# Multi-Agent System Improvement Proposal ## Executive Summary Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes. **Current State:** 22 agents, 18 commands, 12 skills **Issues:** Mode confusion, serial execution, overlapping capabilities **Goal:** Optimize for efficiency, maintainability, and quality --- ## Analysis Findings ### 1. Agent Inventory | Agent | Mode | Role | Issues | |-------|------|------|--------| | orchestrator | all | Dispatcher | ✅ Correct | | capability-analyst | subagent | Gap analysis | ✅ Correct | | history-miner | subagent | Git search | ✅ Correct | | requirement-refiner | subagent | User stories | ✅ Correct | | system-analyst | subagent | Architecture | ✅ Correct | | sdet-engineer | subagent | Test writing | ✅ Correct | | lead-developer | all | Code writing | ⚠️ Should be subagent | | frontend-developer | subagent | UI implementation | ✅ Correct | | backend-developer | subagent | Node/Express/APIs | ✅ Correct | | workflow-architect | subagent | Create workflows | ✅ Correct | | code-skeptic | all | Adversarial review | ⚠️ Should be subagent | | the-fixer | subagent | Bug fixes | ✅ Correct | | performance-engineer | subagent | Performance review | ✅ Correct | | security-auditor | subagent | Security audit | ✅ Correct | | release-manager | all | Git operations | ⚠️ Should be subagent | | evaluator | all | Scoring | ⚠️ Should be subagent | | prompt-optimizer | subagent | Optimize prompts | ✅ Correct | | product-owner | subagent | Issue management | ✅ Correct | | visual-tester | subagent | Visual regression | ✅ Correct | | browser-automation | subagent | E2E testing | ✅ Correct | | markdown-validator | subagent | Markdown validation | ✅ Correct | | agent-architect | subagent | Create agents | ✅ Correct | ### 2. Issue Summary | Issue | Severity | Impact | |-------|----------|--------| | Mode confusion (all vs subagent) | Medium | Context pollution | | Serial execution of independent tasks | High | Slower execution | | No parallelization pattern | High | Latency overhead | | Overlapping agent roles | Low | Maint overhead | | Quality gates not enforced | Medium | Quality variance | --- ## Proposed Improvements ### Improvement 1: Normalize Agent Modes **Problem:** Many agents use `mode: all` but are conceptually subagents that should run in isolated contexts. **Solution:** Change all specialized agents to `mode: subagent`: ```yaml # Before lead-developer: mode: all # After lead-developer: mode: subagent ``` **Files to Update:** - `.kilo/agents/lead-developer.md` - `.kilo/agents/code-skeptic.md` - `.kilo/agents/release-manager.md` - `.kilo/agents/evaluator.md` **Rationale:** Subagent mode provides: - Isolated context - Clear input/output contracts - Better token efficiency - Prevents context pollution --- ### Improvement 2: Implement Parallelization Pattern **Problem:** Security and performance reviews run serially but are independent. **Solution:** Use orchestrator-workers pattern for parallel execution: ```python async def execute_parallel_reviews(): """Run security and performance reviews in parallel""" tasks = [ Task(subagent_type="security-auditor", prompt="..."), Task(subagent_type="performance-engineer", prompt="...") ] results = await asyncio.gather(*tasks) # Collect all issues all_issues = [ *results[0].security_issues, *results[1].performance_issues ] if all_issues: return Task(subagent_type="the-fixer", issues=all_issues) ``` **New Workflow Step:** ```markdown ## Step 6: Parallel Review **Agents**: `@security-auditor`, `@performance-engineer` (parallel) 1. Launch both agents simultaneously 2. Wait for both results 3. Aggregate findings 4. If issues found → send to `@the-fixer` 5. If all pass → proceed to release ``` **Rationale:** Anthropic's research shows parallelization reduces latency for independent tasks by ~50%. --- ### Improvement 3: Evaluator-Optimizer Pattern **Problem:** Code review loop is informal - `code-skeptic` → `the-fixer` lacks structured iteration. **Solution:** Formalize as evaluator-optimizer pattern: ```yaml # New agent definition code-skeptic: role: evaluator outputs: - verdict: APPROVED | REQUEST_CHANGES - issues: List[Issue] - severity: critical | high | medium | low the-fixer: role: optimizer inputs: - issues: List[Issue] - code: CodeContext outputs: - changes: List[Change] - resolution_notes: List[str] # Iteration loop max_iterations: 3 convergence_criteria: all_issues_resolved OR max_iterations_reached ``` **Implementation:** ```python def review_loop(issue_number, code_context): """Evaluator-Optimizer pattern for code review""" for iteration in range(max_iterations=3): # Evaluator reviews review = task(subagent_type="code-skeptic", code=code_context) if review.verdict == "APPROVED": return review # Optimizer fixes fix = task( subagent_type="the-fixer", issues=review.issues, code=code_context ) code_context = apply_fixes(code_context, fix.changes) iteration += 1 # Escalate if not resolved post_comment(issue_number, "⚠️ Max iterations reached, manual review needed") ``` **Rationale:** Structured iteration prevents infinite loops and ensures convergence. --- ### Improvement 4: Quality Gate Enforcement **Problem:** Workflow defines quality gates but agents don't enforce them. **Solution:** Add gate validation to each agent: ```yaml # Add to each agent definition gates: preconditions: - files_exist: true - tests_pass: true postconditions: - build_succeeds: true - coverage_met: true - no_critical_issues: true ``` **Implementation in Workflow:** ```python def validate_gate(agent_name, gate_name, artifacts): """Validate quality gate before proceeding""" gates = { "requirements": ["user_stories_defined", "acceptance_criteria_complete"], "architecture": ["schema_valid", "endpoints_documented"], "implementation": ["build_success", "no_type_errors"], "testing": ["coverage >= 80", "all_tests_pass"], "review": ["no_critical_issues", "no_security_vulnerabilities"], "docker": ["build_success", "health_check_pass"] } gate_checks = gates[gate_name] results = run_checks(gate_checks, artifacts) if not results.all_passed: raise GateError(f"Gate {gate_name} failed: {results.failed}") return results ``` --- ### Improvement 5: Agent Capability Consolidation **Problem:** Some agents have overlapping capabilities. **Solution:** Merge and clarify responsibilities: | Merge From | Merge To | Rationale | |------------|----------|-----------| | browser-automation | sdet-engineer | E2E testing is SDET domain | | markdown-validator | requirement-refiner | Validation is refiner's job | **New SDET Engineer Capabilities:** ```yaml sdet-engineer: capabilities: - unit_tests - integration_tests - e2e_tests: tool: playwright browser: chromium, firefox, webkit - visual_regression: tool: pixelmatch threshold: 0.1 ``` **Rationale:** Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent. --- ### Improvement 6: Add Capability Index **Problem:** No central registry of what each agent can do. **Solution:** Create capability index for orchestrator: ```yaml # .kilo/capability-index.yaml agents: lead-developer: capabilities: - code_writing - refactoring - bug_fixing receives: - tests - specifications produces: - code - documentation code-skeptic: capabilities: - code_review - security_review - style_review receives: - code produces: - review_comments - approval_status forbidden: - suggest_implementations ``` **Usage in Orchestrator:** ```python def route_task(task_type: str) -> str: """Route task to appropriate agent based on capability""" capability_map = { "code_writing": "lead-developer", "code_review": "code-skeptic", "test_writing": "sdet-engineer", "architecture": "system-analyst", "security": "security-auditor", "performance": "performance-engineer" } return capability_map.get(task_type, "orchestrator") ``` --- ### Improvement 7: Workflow State Machine Enforcement **Problem:** Workflow state machine is documented but not enforced. **Solution:** Add explicit state transitions: ```python # State machine definition from enum import Enum from typing import Dict, List class WorkflowState(Enum): NEW = "new" PLANNED = "planned" RESEARCHING = "researching" DESIGNED = "designed" TESTING = "testing" IMPLEMENTING = "implementing" REVIEWING = "reviewing" FIXING = "fixing" PERF_CHECK = "perf-check" SECURITY_CHECK = "security-check" RELEASING = "releasing" EVALUATED = "evaluated" COMPLETED = "completed" # Valid transitions TRANSITIONS = { WorkflowState.NEW: [WorkflowState.PLANNED], WorkflowState.PLANNED: [WorkflowState.RESEARCHING], WorkflowState.RESEARCHING: [WorkflowState.DESIGNED], WorkflowState.DESIGNED: [WorkflowState.TESTING], WorkflowState.TESTING: [WorkflowState.IMPLEMENTING], WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING], WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK], WorkflowState.FIXING: [WorkflowState.REVIEWING], WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK], WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING], WorkflowState.RELEASING: [WorkflowState.EVALUATED], WorkflowState.EVALUATED: [WorkflowState.COMPLETED], } def transition(current: WorkflowState, next_state: WorkflowState) -> bool: """Validate state transition""" valid_next = TRANSITIONS.get(current, []) if next_state not in valid_next: raise InvalidTransition(f"Cannot go from {current} to {next_state}") return True ``` --- ## Implementation Priority | Priority | Improvement | Effort | Impact | |----------|-------------|--------|--------| | P0 | Implement Parallelization | Medium | High | | P0 | Quality Gate Enforcement | Medium | High | | P1 | Normalize Agent Modes | Low | Medium | | P1 | Evaluator-Optimizer Pattern | Low | High | | P2 | Agent Consolidation | Medium | Low | | P2 | Capability Index | Low | Medium | | P3 | State Machine Enforcement | Medium | Medium | --- ## Files to Modify ### Must Modify 1. `.kilo/agents/lead-developer.md` - Change mode to `subagent` 2. `.kilo/agents/code-skeptic.md` - Change mode to `subagent` 3. `.kilo/agents/release-manager.md` - Change mode to `subagent` 4. `.kilo/agents/evaluator.md` - Change mode to `subagent` 5. `.kilo/commands/workflow.md` - Add parallel execution 6. `.kilo/agents/orchestrator.md` - Add evaluator-optimizer pattern ### Must Create 1. `.kilo/capability-index.yaml` - Agent capabilities registry 2. `.kilo/skills/quality-gates/SKILL.md` - Gate validation skill --- ## Expected Outcomes | Metric | Before | After | Improvement | |--------|--------|-------|-------------| | Workflow duration | ~3 hours | ~2 hours | 33% faster | | Review iterations | 2-5 | 1-3 | 40% fewer | | Agent context pollution | High | Low | Isolated | | Quality gate failures | Manual | Automated | Consistent | --- ## Next Steps 1. **Apply this proposal as issues** - Create Gitea issues for each improvement 2. **Run `/pipeline` for each** - Use existing pipeline to implement 3. **Measure improvements** - Use evaluator to track effectiveness 4. **Iterate** - Use prompt-optimizer to refine --- *Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*