docs: add improvement proposal based on multi-agent research

- Created IMPROVEMENT_PROPOSAL.md with analysis findings
- Added capability-index.yaml for orchestrator routing
- Changed agent modes from 'all' to 'subagent' for isolation
- Created Gitea issues #21-25 for tracking improvements:
  - #21: Implement parallelization pattern (P0)
  - #22: Implement evaluator-optimizer pattern (P1)
  - #23: Enforce quality gates (P0)
  - #24: Consolidate overlapping agents (P2)
  - #25: Research milestone with references
¨NW¨
2026-04-05 01:50:12 +01:00
parent 124b7244b4
commit 7a825a4cb2
6 changed files with 931 additions and 4 deletions

.kilo/agents/code-skeptic.md

@@ -1,6 +1,6 @@
---
description: Adversarial code reviewer. Finds problems and issues. Does NOT suggest implementations
-mode: all
+mode: subagent
model: ollama-cloud/minimax-m2.5
color: "#E11D48"
permission:

.kilo/agents/evaluator.md

@@ -1,6 +1,6 @@
---
description: Scores agent effectiveness after task completion for continuous improvement
-mode: all
+mode: subagent
model: ollama-cloud/gpt-oss:120b
color: "#047857"
permission:

.kilo/agents/lead-developer.md

@@ -1,6 +1,6 @@
---
description: Primary code writer for backend and core logic. Writes implementation to pass tests
-mode: all
+mode: subagent
model: ollama-cloud/qwen3-coder:480b
color: "#DC2626"
permission:

.kilo/agents/release-manager.md

@@ -1,6 +1,6 @@
---
description: Manages git operations, semantic versioning, branching, and deployments. Ensures clean history
-mode: all
+mode: subagent
model: ollama-cloud/devstral-2:123b
color: "#581C87"
permission:

.kilo/capability-index.yaml Normal file

@@ -0,0 +1,502 @@
# Capability Index
# Maps agent capabilities for orchestrator routing
agents:
  # Core Development
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
      - implementation
    receives:
      - tests
      - specifications
      - architecture_docs
    produces:
      - code
      - documentation_inline
    forbidden:
      - test_writing
      - code_review
    model: ollama-cloud/qwen3-coder:480b
    mode: subagent
  frontend-developer:
    capabilities:
      - ui_implementation
      - component_creation
      - styling
      - responsive_design
    receives:
      - designs
      - wireframes
      - api_endpoints
    produces:
      - vue_components
      - css_styles
      - frontend_tests
    forbidden:
      - backend_code
    model: ollama-cloud/qwen3-coder:480b
    mode: subagent
  backend-developer:
    capabilities:
      - api_development
      - database_design
      - server_logic
      - authentication
    receives:
      - api_specifications
      - database_requirements
    produces:
      - express_routes
      - database_schema
      - api_documentation
    forbidden:
      - frontend_code
    model: ollama-cloud/qwen3-coder:480b
    mode: subagent
  # Quality Assurance
  sdet-engineer:
    capabilities:
      - unit_tests
      - integration_tests
      - e2e_tests
      - test_planning
      - visual_regression
    receives:
      - code
      - requirements
    produces:
      - test_files
      - test_reports
      - coverage_reports
    forbidden:
      - implementation_code
    model: ollama-cloud/qwen3-coder:480b
    mode: subagent
  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_check
      - issue_identification
    receives:
      - code
    produces:
      - review_comments
      - approval_status
      - issue_list
    forbidden:
      - suggest_implementations
      - write_code
    model: ollama-cloud/minimax-m2.5
    mode: subagent
  # Security & Performance
  security-auditor:
    capabilities:
      - vulnerability_scan
      - owasp_check
      - secret_detection
      - auth_review
    receives:
      - code
      - configuration
    produces:
      - security_report
      - vulnerability_list
    forbidden:
      - fix_vulnerabilities
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  performance-engineer:
    capabilities:
      - performance_analysis
      - n_plus_one_detection
      - memory_leak_check
      - algorithm_analysis
    receives:
      - code
      - performance_requirements
    produces:
      - performance_report
      - optimization_suggestions
    forbidden:
      - write_code
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  # Specialized Development
  browser-automation:
    capabilities:
      - e2e_browser_tests
      - form_filling
      - navigation_testing
      - screenshot_capture
    receives:
      - test_scenarios
      - url_list
    produces:
      - test_results
      - screenshots
    forbidden:
      - unit_testing
    model: ollama-cloud/qwen3-coder:480b
    mode: subagent
  visual-tester:
    capabilities:
      - visual_regression
      - pixel_comparison
      - screenshot_diff
      - ui_validation
    receives:
      - baseline_screenshots
      - new_screenshots
    produces:
      - diff_report
      - visual_issues
    forbidden:
      - code_changes
    model: ollama-cloud/qwen3-coder:480b
    mode: subagent
  # Analysis & Design
  system-analyst:
    capabilities:
      - architecture_design
      - api_specification
      - database_modeling
      - technical_documentation
    receives:
      - requirements
      - user_stories
    produces:
      - architecture_docs
      - api_specs
      - database_schemas
    forbidden:
      - implementation
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  requirement-refiner:
    capabilities:
      - requirement_analysis
      - user_story_creation
      - acceptance_criteria
      - clarification
    receives:
      - raw_requests
      - feature_ideas
    produces:
      - user_stories
      - acceptance_criteria
      - requirements_doc
    forbidden:
      - design_decisions
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  history-miner:
    capabilities:
      - git_search
      - duplicate_detection
      - past_solution_finder
      - pattern_identification
    receives:
      - search_query
      - issue_description
    produces:
      - commit_list
      - duplicate_report
      - related_files
    forbidden:
      - code_changes
    model: ollama-cloud/glm-5
    mode: subagent
  capability-analyst:
    capabilities:
      - gap_analysis
      - capability_mapping
      - recommendation_generation
      - coverage_analysis
    receives:
      - task_requirements
    produces:
      - analysis_report
      - recommendations
      - new_agent_specs
    forbidden:
      - implementation
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  # Process Management
  orchestrator:
    capabilities:
      - task_routing
      - state_management
      - agent_coordination
      - workflow_execution
    receives:
      - issue
      - status_change
    produces:
      - routing_decisions
      - status_updates
    forbidden:
      - code_writing
      - code_review
    model: ollama-cloud/glm-5
    mode: primary
  release-manager:
    capabilities:
      - git_operations
      - version_management
      - changelog_creation
      - deployment
    receives:
      - approved_code
      - release_request
    produces:
      - commits
      - tags
      - releases
    forbidden:
      - code_changes
      - feature_development
    model: ollama-cloud/devstral-2:123b
    mode: subagent
  evaluator:
    capabilities:
      - performance_scoring
      - process_analysis
      - pattern_identification
      - improvement_recommendations
    receives:
      - completed_issue
      - agent_logs
    produces:
      - performance_report
      - scores
      - recommendations
    forbidden:
      - code_changes
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  prompt-optimizer:
    capabilities:
      - prompt_analysis
      - prompt_improvement
      - failure_pattern_detection
    receives:
      - low_scores
      - failure_reports
    produces:
      - improved_prompts
      - optimization_report
    forbidden:
      - agent_creation
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
  # Fixes
  the-fixer:
    capabilities:
      - bug_fixing
      - issue_resolution
      - code_correction
    receives:
      - issue_list
      - code_context
    produces:
      - code_fixes
      - resolution_notes
    forbidden:
      - feature_development
    model: ollama-cloud/minimax-m2.5
    mode: subagent
  # Product Management
  product-owner:
    capabilities:
      - issue_management
      - prioritization
      - backlog_management
      - workflow_completion
    receives:
      - completed_work
      - stakeholder_requests
    produces:
      - priority_order
      - issue_labels
      - issue_closures
    forbidden:
      - implementation
    model: ollama-cloud/glm-5
    mode: subagent
  # Workflow
  workflow-architect:
    capabilities:
      - workflow_design
      - process_definition
      - automation_setup
    receives:
      - workflow_requirements
    produces:
      - workflow_definitions
      - command_files
    forbidden:
      - execution
    model: ollama-cloud/glm-5
    mode: subagent
  # Validation
  markdown-validator:
    capabilities:
      - markdown_validation
      - formatting_check
      - link_validation
    receives:
      - markdown_files
    produces:
      - validation_report
      - corrections
    forbidden:
      - content_creation
    model: ollama-cloud/glm-5
    mode: subagent
  agent-architect:
    capabilities:
      - agent_design
      - prompt_engineering
      - capability_definition
    receives:
      - agent_requirements
    produces:
      - agent_definition
      - integration_plan
    forbidden:
      - agent_execution
    model: ollama-cloud/gpt-oss:120b
    mode: subagent
# Capability Routing Map
capability_routing:
  code_writing: lead-developer
  code_review: code-skeptic
  test_writing: sdet-engineer
  architecture: system-analyst
  security: security-auditor
  performance: performance-engineer
  bug_fixing: the-fixer
  git_operations: release-manager
  ui_implementation: frontend-developer
  api_development: backend-developer
  e2e_testing: browser-automation
  visual_testing: visual-tester
  requirement_analysis: requirement-refiner
  gap_analysis: capability-analyst
  issue_management: product-owner
  prompt_optimization: prompt-optimizer
  workflow_design: workflow-architect
  scoring: evaluator
  duplicate_detection: history-miner
  agent_design: agent-architect
  markdown_validation: markdown-validator
# Parallelizable Tasks
parallel_groups:
  review_phase:
    - security-auditor
    - performance-engineer
    - code-skeptic
  testing_phase:
    - sdet-engineer
    - browser-automation
    - visual-tester
# Evaluator-Optimizer Patterns
iteration_loops:
  code_review:
    evaluator: code-skeptic
    optimizer: the-fixer
    max_iterations: 3
    convergence: all_issues_resolved
  security_review:
    evaluator: security-auditor
    optimizer: the-fixer
    max_iterations: 2
    convergence: no_critical_vulnerabilities
  performance_review:
    evaluator: performance-engineer
    optimizer: the-fixer
    max_iterations: 2
    convergence: all_perf_issues_resolved
# Quality Gates
quality_gates:
  requirements:
    - user_stories_defined
    - acceptance_criteria_complete
    - technical_constraints_documented
  architecture:
    - schema_valid
    - endpoints_documented
    - tech_stack_decided
  implementation:
    - build_success
    - no_type_errors
    - no_lint_errors
  testing:
    - coverage_gte_80
    - all_tests_pass
    - no_critical_bugs
  review:
    - no_critical_issues
    - no_security_vulnerabilities
    - performance_acceptable
  docker:
    - build_success
    - health_check_pass
    - size_under_limit
  documentation:
    - readme_complete
    - api_docs_complete
    - deployment_guide_complete
# State Transitions
workflow_states:
  new: [planned]
  planned: [researching]
  researching: [designed]
  designed: [testing]
  testing: [implementing]
  implementing: [reviewing]
  reviewing: [fixing, perf_check]
  fixing: [reviewing]
  perf_check: [security_check]
  security_check: [releasing]
  releasing: [evaluated]
  evaluated: [completed]

IMPROVEMENT_PROPOSAL.md Normal file

@@ -0,0 +1,425 @@
# Multi-Agent System Improvement Proposal
## Executive Summary
Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.
**Current State:** 22 agents, 18 commands, 12 skills
**Issues:** Mode confusion, serial execution, overlapping capabilities
**Goal:** Optimize for efficiency, maintainability, and quality
---
## Analysis Findings
### 1. Agent Inventory
| Agent | Mode | Role | Issues |
|-------|------|------|--------|
| orchestrator | all | Dispatcher | ✅ Correct |
| capability-analyst | subagent | Gap analysis | ✅ Correct |
| history-miner | subagent | Git search | ✅ Correct |
| requirement-refiner | subagent | User stories | ✅ Correct |
| system-analyst | subagent | Architecture | ✅ Correct |
| sdet-engineer | subagent | Test writing | ✅ Correct |
| lead-developer | all | Code writing | ⚠️ Should be subagent |
| frontend-developer | subagent | UI implementation | ✅ Correct |
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
| workflow-architect | subagent | Create workflows | ✅ Correct |
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
| the-fixer | subagent | Bug fixes | ✅ Correct |
| performance-engineer | subagent | Performance review | ✅ Correct |
| security-auditor | subagent | Security audit | ✅ Correct |
| release-manager | all | Git operations | ⚠️ Should be subagent |
| evaluator | all | Scoring | ⚠️ Should be subagent |
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
| product-owner | subagent | Issue management | ✅ Correct |
| visual-tester | subagent | Visual regression | ✅ Correct |
| browser-automation | subagent | E2E testing | ✅ Correct |
| markdown-validator | subagent | Markdown validation | ✅ Correct |
| agent-architect | subagent | Create agents | ✅ Correct |
### 2. Issue Summary
| Issue | Severity | Impact |
|-------|----------|--------|
| Mode confusion (all vs subagent) | Medium | Context pollution |
| Serial execution of independent tasks | High | Slower execution |
| No parallelization pattern | High | Latency overhead |
| Overlapping agent roles | Low | Maintenance overhead |
| Quality gates not enforced | Medium | Quality variance |
---
## Proposed Improvements
### Improvement 1: Normalize Agent Modes
**Problem:** Four agents (`lead-developer`, `code-skeptic`, `release-manager`, `evaluator`) use `mode: all` but are conceptually subagents that should run in isolated contexts.
**Solution:** Change all specialized agents to `mode: subagent`:
```yaml
# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent
```
**Files to Update:**
- `.kilo/agents/lead-developer.md`
- `.kilo/agents/code-skeptic.md`
- `.kilo/agents/release-manager.md`
- `.kilo/agents/evaluator.md`
**Rationale:** Subagent mode provides:
- Isolated context
- Clear input/output contracts
- Better token efficiency
- Prevents context pollution
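
To confirm that no specialized agent is left on `mode: all` after the change, a small script could scan the agent front matter. The following is a minimal sketch, assuming each agent file begins with a `---`-delimited front matter block containing a top-level `mode:` line; the helper names `read_mode` and `agents_not_isolated` are hypothetical, and the parsing is deliberately naive rather than a full YAML parser.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def read_mode(text: str) -> Optional[str]:
    """Return the `mode` value from a '---'-delimited front matter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "mode":
            return value.strip()
    return None


def agents_not_isolated(
    agent_dir: str, allowed: Tuple[str, ...] = ("orchestrator",)
) -> List[str]:
    """List agent files whose mode is not `subagent`, skipping allowed agent names."""
    offenders = []
    for path in sorted(Path(agent_dir).glob("*.md")):
        if path.stem in allowed:
            continue
        if read_mode(path.read_text(encoding="utf-8")) != "subagent":
            offenders.append(path.name)
    return offenders


if __name__ == "__main__":
    # Assumes the script runs from the repository root.
    print(agents_not_isolated(".kilo/agents"))
```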
---
### Improvement 2: Implement Parallelization Pattern
**Problem:** Security and performance reviews run serially but are independent.
**Solution:** Use orchestrator-workers pattern for parallel execution:
```python
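import asyncio

# NOTE: `Task` stands for the runtime's awaitable subagent call; its exact
# signature and result attributes are assumed here.
async def execute_parallel_reviews():
    """Run security and performance reviews in parallel."""
    tasks = [
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="..."),
    ]
    security, performance = await asyncio.gather(*tasks)

    # Collect all issues from both reviews
    all_issues = [
        *security.security_issues,
        *performance.performance_issues,
    ]
    if all_issues:
        # Hand the combined findings to the fixer
        return await Task(subagent_type="the-fixer", issues=all_issues)
    return None  # Nothing to fix; proceed to release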
```
**New Workflow Step:**
```markdown
## Step 6: Parallel Review
**Agents**: `@security-auditor`, `@performance-engineer` (parallel)
1. Launch both agents simultaneously
2. Wait for both results
3. Aggregate findings
4. If issues found → send to `@the-fixer`
5. If all pass → proceed to release
```
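**Rationale:** Anthropic's "Building Effective Agents" describes parallelization as a way to run independent subtasks concurrently for speed. When the two reviews take a similar amount of time, running them together can cut the review phase's wall-clock time by up to ~50%.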
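
The latency effect can be illustrated with a self-contained sketch that uses only the standard library. The `fake_review` coroutine is a hypothetical stand-in for a real subagent call and simply sleeps; the agent names come from this proposal, while the durations and the sample issue are made up for illustration. The elapsed time should land close to the slower review rather than the sum of both.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_review(name: str, seconds: float, issues: List[str]) -> dict:
    """Stand-in for a subagent call; sleeps to simulate review latency."""
    await asyncio.sleep(seconds)
    return {"agent": name, "issues": issues}


async def run_reviews_in_parallel() -> Tuple[List[str], float]:
    """Run both stand-in reviews concurrently and aggregate their findings."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_review("security-auditor", 0.3, ["hardcoded secret"]),
        fake_review("performance-engineer", 0.3, []),
    )
    elapsed = time.perf_counter() - start
    # Flatten the per-agent issue lists into one list for the fixer.
    all_issues = [issue for result in results for issue in result["issues"]]
    return all_issues, elapsed


if __name__ == "__main__":
    issues, elapsed = asyncio.run(run_reviews_in_parallel())
    print(issues, f"{elapsed:.2f}s")
```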
---
### Improvement 3: Evaluator-Optimizer Pattern
**Problem:** Code review loop is informal - `code-skeptic` → `the-fixer` lacks structured iteration.
**Solution:** Formalize as evaluator-optimizer pattern:
```yaml
# New agent definition
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached
```
**Implementation:**
```python
MAX_ITERATIONS = 3

# NOTE: `task`, `apply_fixes`, and `post_comment` are assumed to be provided
# by the orchestrator runtime.
def review_loop(issue_number, code_context):
    """Evaluator-Optimizer pattern for code review."""
    for _ in range(MAX_ITERATIONS):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)
        if review.verdict == "APPROVED":
            return review
        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context,
        )
        code_context = apply_fixes(code_context, fix.changes)
    # Escalate if not resolved within the iteration budget
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
    return None
```
**Rationale:** A bounded iteration count guarantees that the loop terminates, and anything still unresolved at the limit is escalated to manual review instead of cycling indefinitely.
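
The same loop shape applies to the security and performance reviews, so it may be worth factoring into one generic helper. The sketch below is an illustration under assumptions: `run_iteration_loop` is a hypothetical name, the evaluator and optimizer are passed in as plain callables, and the toy stand-ins only flag and strip `TODO` markers so the example runs without any agent runtime.

```python
from typing import Callable, List, Tuple


def run_iteration_loop(
    evaluate: Callable[[str], List[str]],
    optimize: Callable[[str, List[str]], str],
    artifact: str,
    max_iterations: int = 3,
) -> Tuple[str, bool, int]:
    """Alternate evaluate/optimize until no issues remain or the budget is spent.

    Returns (final_artifact, converged, evaluations_run).
    """
    for attempt in range(1, max_iterations + 1):
        issues = evaluate(artifact)
        if not issues:
            return artifact, True, attempt
        artifact = optimize(artifact, issues)
    # Budget exhausted: the caller should escalate to manual review.
    return artifact, False, max_iterations


def toy_evaluate(code: str) -> List[str]:
    """Toy evaluator: reports an issue while any TODO marker remains."""
    return ["unresolved TODO"] if "TODO" in code else []


def toy_optimize(code: str, issues: List[str]) -> str:
    """Toy optimizer: removes one TODO marker per pass."""
    return code.replace("TODO", "", 1)


if __name__ == "__main__":
    print(run_iteration_loop(toy_evaluate, toy_optimize, "x TODO"))
```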
---
### Improvement 4: Quality Gate Enforcement
**Problem:** Workflow defines quality gates but agents don't enforce them.
**Solution:** Add gate validation to each agent:
```yaml
# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true
```
**Implementation in Workflow:**
```python
class GateError(Exception):
    """Raised when a quality gate fails."""


# NOTE: `run_checks` is assumed to return an object exposing
# `all_passed` (bool) and `failed` (list of check names).
def validate_gate(agent_name, gate_name, artifacts):
    """Validate quality gate before proceeding"""
    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage >= 80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"],
    }
    if gate_name not in gates:
        raise GateError(f"Unknown gate: {gate_name}")
    results = run_checks(gates[gate_name], artifacts)
    if not results.all_passed:
        raise GateError(
            f"Gate {gate_name} failed for {agent_name}: {results.failed}"
        )
    return results
```
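
The `run_checks` helper is left undefined above. One possible shape for it is sketched here, purely as an assumption: a registry maps each check name to a predicate over an `artifacts` dictionary, and the artifact keys (`build_exit_code`, `type_errors`, `coverage`, `tests_failed`) are hypothetical. Check names with no registered predicate are treated as failures so that a typo cannot silently pass a gate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CheckResults:
    """Outcome of a gate run: names of the checks that did not pass."""
    failed: List[str] = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        return not self.failed


# Hypothetical check registry: each predicate inspects the artifacts dict.
CHECKS: Dict[str, Callable[[dict], bool]] = {
    "build_success": lambda a: a.get("build_exit_code") == 0,
    "no_type_errors": lambda a: a.get("type_errors", 0) == 0,
    "coverage >= 80": lambda a: a.get("coverage", 0) >= 80,
    "all_tests_pass": lambda a: a.get("tests_failed", 1) == 0,
}


def run_checks(check_names: List[str], artifacts: dict) -> CheckResults:
    """Evaluate each named check; unknown checks count as failures."""
    results = CheckResults()
    for name in check_names:
        check = CHECKS.get(name)
        if check is None or not check(artifacts):
            results.failed.append(name)
    return results


if __name__ == "__main__":
    outcome = run_checks(
        ["coverage >= 80", "all_tests_pass"],
        {"coverage": 72, "tests_failed": 0},
    )
    print(outcome.all_passed, outcome.failed)
```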
---
### Improvement 5: Agent Capability Consolidation
**Problem:** Some agents have overlapping capabilities.
**Solution:** Merge and clarify responsibilities:
| Merge From | Merge To | Rationale |
|------------|----------|-----------|
| browser-automation | sdet-engineer | E2E testing is SDET domain |
| markdown-validator | requirement-refiner | Validation is refiner's job |
**New SDET Engineer Capabilities:**
```yaml
sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browser: [chromium, firefox, webkit]
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1
```
**Rationale:** Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent.
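
Overlaps of this kind can also be surfaced mechanically. The sketch below assumes the capability lists have already been loaded into a plain dictionary (the excerpt is copied from `.kilo/capability-index.yaml`); `find_overlaps` is a hypothetical helper. It matches capability names exactly, so near-duplicates such as `e2e_tests` and `e2e_browser_tests` would still need a manual look.

```python
from collections import defaultdict
from typing import Dict, List


def find_overlaps(agents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each capability claimed by more than one agent to those agents."""
    owners = defaultdict(list)
    for agent, capabilities in agents.items():
        for capability in capabilities:
            owners[capability].append(agent)
    return {cap: sorted(names) for cap, names in owners.items() if len(names) > 1}


# Excerpt of capability lists from the capability index.
AGENTS = {
    "sdet-engineer": [
        "unit_tests", "integration_tests", "e2e_tests",
        "test_planning", "visual_regression",
    ],
    "visual-tester": [
        "visual_regression", "pixel_comparison",
        "screenshot_diff", "ui_validation",
    ],
    "lead-developer": ["code_writing", "refactoring", "bug_fixing", "implementation"],
    "the-fixer": ["bug_fixing", "issue_resolution", "code_correction"],
}

if __name__ == "__main__":
    for capability, names in find_overlaps(AGENTS).items():
        print(f"{capability}: {', '.join(names)}")
```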
---
### Improvement 6: Add Capability Index
**Problem:** No central registry of what each agent can do.
**Solution:** Create capability index for orchestrator:
```yaml
# .kilo/capability-index.yaml
agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation
  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations
```
**Usage in Orchestrator:**
```python
def route_task(task_type: str) -> str:
    """Route task to appropriate agent based on capability"""
    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer",
    }
    return capability_map.get(task_type, "orchestrator")
```
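
Because the index records both a routing map and per-agent `forbidden` lists, the two can be cross-checked before the orchestrator relies on them. The following is a minimal sketch under the assumption that the index has been loaded into plain dictionaries; `routing_conflicts` is a hypothetical helper, and the sample data is an excerpt of the index.

```python
from typing import Dict, List


def routing_conflicts(
    routing: Dict[str, str], agents: Dict[str, Dict[str, List[str]]]
) -> List[str]:
    """Report routes that point at an unknown agent or at a forbidden capability."""
    problems = []
    for capability, agent in routing.items():
        spec = agents.get(agent)
        if spec is None:
            problems.append(f"{capability}: unknown agent '{agent}'")
        elif capability in spec.get("forbidden", []):
            problems.append(f"{capability}: forbidden for '{agent}'")
    return problems


# Excerpt of the index: only the `forbidden` lists matter for this check.
AGENTS = {
    "lead-developer": {"forbidden": ["test_writing", "code_review"]},
    "code-skeptic": {"forbidden": ["suggest_implementations", "write_code"]},
    "sdet-engineer": {"forbidden": ["implementation_code"]},
}
ROUTING = {
    "code_writing": "lead-developer",
    "code_review": "code-skeptic",
    "test_writing": "sdet-engineer",
}

if __name__ == "__main__":
    print(routing_conflicts(ROUTING, AGENTS))
```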
---
### Improvement 7: Workflow State Machine Enforcement
**Problem:** Workflow state machine is documented but not enforced.
**Solution:** Add explicit state transitions:
```python
# State machine definition
from enum import Enum
from typing import Dict, List


class InvalidTransition(Exception):
    """Raised when a workflow state change is not allowed."""


class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"


# Valid transitions
TRANSITIONS: Dict[WorkflowState, List[WorkflowState]] = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}


def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate state transition"""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(
            f"Cannot go from {current.value} to {next_state.value}"
        )
    return True
```
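
A transition table like this also makes it possible to audit a recorded workflow history after the fact. The sketch below is an illustration only: it restates the table with plain string keys that mirror the enum values above so the block stays self-contained, and `first_invalid_step` is a hypothetical helper that returns the position of the first illegal hop.

```python
from typing import Dict, List

# Transition table keyed by state name, mirroring the TRANSITIONS mapping.
WORKFLOW_STATES: Dict[str, List[str]] = {
    "new": ["planned"],
    "planned": ["researching"],
    "researching": ["designed"],
    "designed": ["testing"],
    "testing": ["implementing"],
    "implementing": ["reviewing"],
    "reviewing": ["fixing", "perf-check"],
    "fixing": ["reviewing"],
    "perf-check": ["security-check"],
    "security-check": ["releasing"],
    "releasing": ["evaluated"],
    "evaluated": ["completed"],
}


def first_invalid_step(path: List[str]) -> int:
    """Return the index of the first illegal transition in `path`, or -1 if all are legal."""
    for index in range(len(path) - 1):
        if path[index + 1] not in WORKFLOW_STATES.get(path[index], []):
            return index
    return -1


if __name__ == "__main__":
    # A review that bounces through one fix cycle before the performance check.
    print(first_invalid_step(["reviewing", "fixing", "reviewing", "perf-check"]))
    # Skipping the review-side checks and jumping straight to release is rejected.
    print(first_invalid_step(["implementing", "reviewing", "releasing"]))
```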
---
## Implementation Priority
| Priority | Improvement | Effort | Impact |
|----------|-------------|--------|--------|
| P0 | Implement Parallelization | Medium | High |
| P0 | Quality Gate Enforcement | Medium | High |
| P1 | Normalize Agent Modes | Low | Medium |
| P1 | Evaluator-Optimizer Pattern | Low | High |
| P2 | Agent Consolidation | Medium | Low |
| P2 | Capability Index | Low | Medium |
| P3 | State Machine Enforcement | Medium | Medium |
---
## Files to Modify
### Must Modify
1. `.kilo/agents/lead-developer.md` - Change mode to `subagent`
2. `.kilo/agents/code-skeptic.md` - Change mode to `subagent`
3. `.kilo/agents/release-manager.md` - Change mode to `subagent`
4. `.kilo/agents/evaluator.md` - Change mode to `subagent`
5. `.kilo/commands/workflow.md` - Add parallel execution
6. `.kilo/agents/orchestrator.md` - Add evaluator-optimizer pattern
### Must Create
1. `.kilo/capability-index.yaml` - Agent capabilities registry
2. `.kilo/skills/quality-gates/SKILL.md` - Gate validation skill
---
## Expected Outcomes
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
| Review iterations | 2-5 | 1-3 | 40% fewer |
| Agent context pollution | High | Low | Isolated |
| Quality gate failures | Manual | Automated | Consistent |
---
## Next Steps
1. **File this proposal as issues** - Create a Gitea issue for each improvement
2. **Run `/pipeline` for each** - Use existing pipeline to implement
3. **Measure improvements** - Use evaluator to track effectiveness
4. **Iterate** - Use prompt-optimizer to refine
---
*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*