# Multi-Agent System Improvement Proposal

## Executive Summary

Based on research from Anthropic's "Building Effective Agents" and the Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.

- **Current state**: 22 agents, 18 commands, 12 skills
- **Issues**: mode confusion, serial execution, overlapping capabilities
- **Goal**: optimize for efficiency, maintainability, and quality

## Analysis Findings
### 1. Agent Inventory
| Agent | Mode | Role | Issues |
|---|---|---|---|
| orchestrator | all | Dispatcher | ✅ Correct |
| capability-analyst | subagent | Gap analysis | ✅ Correct |
| history-miner | subagent | Git search | ✅ Correct |
| requirement-refiner | subagent | User stories | ✅ Correct |
| system-analyst | subagent | Architecture | ✅ Correct |
| sdet-engineer | subagent | Test writing | ✅ Correct |
| lead-developer | all | Code writing | ⚠️ Should be subagent |
| frontend-developer | subagent | UI implementation | ✅ Correct |
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
| workflow-architect | subagent | Create workflows | ✅ Correct |
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
| the-fixer | subagent | Bug fixes | ✅ Correct |
| performance-engineer | subagent | Performance review | ✅ Correct |
| security-auditor | subagent | Security audit | ✅ Correct |
| release-manager | all | Git operations | ⚠️ Should be subagent |
| evaluator | all | Scoring | ⚠️ Should be subagent |
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
| product-owner | subagent | Issue management | ✅ Correct |
| visual-tester | subagent | Visual regression | ✅ Correct |
| browser-automation | subagent | E2E testing | ✅ Correct |
| markdown-validator | subagent | Markdown validation | ✅ Correct |
| agent-architect | subagent | Create agents | ✅ Correct |
### 2. Issue Summary
| Issue | Severity | Impact |
|---|---|---|
| Mode confusion (all vs subagent) | Medium | Context pollution |
| Serial execution of independent tasks | High | Slower execution |
| No parallelization pattern | High | Latency overhead |
| Overlapping agent roles | Low | Maintenance overhead |
| Quality gates not enforced | Medium | Quality variance |
## Proposed Improvements

### Improvement 1: Normalize Agent Modes

**Problem**: Many agents use `mode: all` but are conceptually subagents that should run in isolated contexts.

**Solution**: Change all specialized agents to `mode: subagent`:
```yaml
# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent
```
**Files to update**:

- `.kilo/agents/lead-developer.md`
- `.kilo/agents/code-skeptic.md`
- `.kilo/agents/release-manager.md`
- `.kilo/agents/evaluator.md`
**Rationale**: Subagent mode provides:
- Isolated context
- Clear input/output contracts
- Better token efficiency
- Prevents context pollution
### Improvement 2: Implement Parallelization Pattern

**Problem**: Security and performance reviews run serially but are independent of each other.

**Solution**: Use the orchestrator-workers pattern for parallel execution:
```python
async def execute_parallel_reviews():
    """Run security and performance reviews in parallel (pseudocode)."""
    tasks = [
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="..."),
    ]
    results = await asyncio.gather(*tasks)

    # Collect all issues from both reviews
    all_issues = [
        *results[0].security_issues,
        *results[1].performance_issues,
    ]

    if all_issues:
        return Task(subagent_type="the-fixer", issues=all_issues)
```
**New workflow step**:

```markdown
## Step 6: Parallel Review

**Agents**: `@security-auditor`, `@performance-engineer` (parallel)

1. Launch both agents simultaneously
2. Wait for both results
3. Aggregate findings
4. If issues found → send to `@the-fixer`
5. If all pass → proceed to release
```
**Rationale**: Anthropic's research shows that parallelizing independent tasks cuts latency by roughly 50%.
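As a concrete sanity check of the pattern, here is a minimal self-contained `asyncio` sketch with stub review agents; the function names and findings are illustrative stand-ins, not the real subagent API:

```python
import asyncio

# Stub review agents standing in for the real subagent calls (illustrative).
async def security_audit(code):
    await asyncio.sleep(0.05)  # simulated agent latency
    return ["hardcoded credential in config loader"]

async def performance_review(code):
    await asyncio.sleep(0.05)  # runs concurrently with the security audit
    return ["N+1 query in list endpoint"]

async def parallel_reviews(code):
    # Both reviews start immediately; wall time is roughly max(), not sum()
    security, performance = await asyncio.gather(
        security_audit(code), performance_review(code)
    )
    return [*security, *performance]

issues = asyncio.run(parallel_reviews("..."))
```

Because both coroutines sleep concurrently, the total wall time is about one agent's latency rather than two, which is the whole point of the pattern.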
### Improvement 3: Evaluator-Optimizer Pattern

**Problem**: The code review loop is informal: the `code-skeptic` → `the-fixer` handoff lacks structured iteration.

**Solution**: Formalize it as an evaluator-optimizer pattern:
```yaml
# New agent contracts
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached
```
**Implementation**:
```python
MAX_ITERATIONS = 3

def review_loop(issue_number, code_context):
    """Evaluator-optimizer pattern for code review (pseudocode)."""
    for iteration in range(MAX_ITERATIONS):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)
        if review.verdict == "APPROVED":
            return review

        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context,
        )
        code_context = apply_fixes(code_context, fix.changes)

    # Escalate if not resolved within the iteration budget
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
```
**Rationale**: Structured iteration prevents infinite loops: the review either converges or terminates at the iteration cap with an explicit escalation.
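A minimal runnable sketch of the loop, with stub evaluator and optimizer functions standing in for the real subagent calls (all names here are illustrative), shows the convergence behavior:

```python
# Stub evaluator/optimizer standing in for code-skeptic and the-fixer.
def stub_evaluate(code):
    """Evaluator: request changes while any TODO marker remains."""
    issues = [line for line in code if "TODO" in line]
    return ("APPROVED", []) if not issues else ("REQUEST_CHANGES", issues)

def stub_fix(code, issues):
    """Optimizer: resolve one issue per iteration to show gradual convergence."""
    fixed = list(code)
    for i, line in enumerate(fixed):
        if "TODO" in line:
            fixed[i] = line.replace("TODO", "done")
            break
    return fixed

def review_loop(code, max_iterations=3):
    for iteration in range(max_iterations):
        verdict, issues = stub_evaluate(code)
        if verdict == "APPROVED":
            return ("APPROVED", iteration)
        code = stub_fix(code, issues)
    # Bounded iteration: escalate instead of looping forever
    return ("ESCALATED", max_iterations)

print(review_loop(["x = 1  # TODO", "y = 2"]))  # → ('APPROVED', 1)
```

With more unresolved issues than the iteration budget, the same loop returns `("ESCALATED", 3)` rather than spinning indefinitely.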
### Improvement 4: Quality Gate Enforcement

**Problem**: The workflow defines quality gates but agents don't enforce them.

**Solution**: Add gate validation to each agent:
```yaml
# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true
```
**Implementation in workflow**:
```python
def validate_gate(agent_name, gate_name, artifacts):
    """Validate a quality gate before proceeding."""
    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage >= 80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"],
    }

    gate_checks = gates[gate_name]
    results = run_checks(gate_checks, artifacts)

    if not results.all_passed:
        raise GateError(f"Gate {gate_name} failed: {results.failed}")
    return results
```
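The sketch above leaves `run_checks` abstract. One way to make it concrete is to model checks as named predicates over an artifacts dict; the check names and artifact keys below are illustrative, not the real pipeline's:

```python
# Illustrative gate validation: each check is a named predicate on artifacts.
class GateError(Exception):
    pass

CHECKS = {
    "build_success": lambda a: a.get("build") == "ok",
    "no_type_errors": lambda a: a.get("type_errors", 0) == 0,
    "coverage_met": lambda a: a.get("coverage", 0) >= 80,
}

GATES = {
    "implementation": ["build_success", "no_type_errors"],
    "testing": ["coverage_met"],
}

def validate_gate(gate_name, artifacts):
    # Collect every failed check so the error names all of them at once
    failed = [name for name in GATES[gate_name] if not CHECKS[name](artifacts)]
    if failed:
        raise GateError(f"Gate {gate_name!r} failed: {failed}")
    return True

validate_gate("implementation", {"build": "ok", "type_errors": 0})  # passes
```

Keeping predicates in a registry means a new gate is a data change, not a code change, which matches the "gates defined per agent" direction of this improvement.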
### Improvement 5: Agent Capability Consolidation

**Problem**: Some agents have overlapping capabilities.

**Solution**: Merge agents and clarify responsibilities:
| Merge From | Merge To | Rationale |
|---|---|---|
| browser-automation | sdet-engineer | E2E testing is SDET domain |
| markdown-validator | requirement-refiner | Validation is refiner's job |
**New SDET engineer capabilities**:
```yaml
sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browsers: [chromium, firefox, webkit]
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1
```
**Rationale**: Reduces agent count while maintaining coverage. Browser automation is a capability of the SDET, not a separate agent.
### Improvement 6: Add Capability Index

**Problem**: There is no central registry of what each agent can do.

**Solution**: Create a capability index for the orchestrator:
```yaml
# .kilo/capability-index.yaml
agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation

  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations
```
**Usage in orchestrator**:
```python
def route_task(task_type: str) -> str:
    """Route a task to the appropriate agent based on capability."""
    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer",
    }
    return capability_map.get(task_type, "orchestrator")
```
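A quick sanity check of the routing fallback; the capability map is repeated here so the snippet runs standalone:

```python
# Capability map as above, duplicated so this snippet is self-contained
capability_map = {
    "code_writing": "lead-developer",
    "code_review": "code-skeptic",
    "test_writing": "sdet-engineer",
    "architecture": "system-analyst",
    "security": "security-auditor",
    "performance": "performance-engineer",
}

def route_task(task_type):
    # Unknown task types fall back to the orchestrator itself
    return capability_map.get(task_type, "orchestrator")

print(route_task("code_review"))  # code-skeptic
print(route_task("deployment"))   # orchestrator (unmapped -> fallback)
```

The fallback keeps routing total: the orchestrator handles anything the index does not name, rather than the dispatch failing.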
### Improvement 7: Workflow State Machine Enforcement

**Problem**: The workflow state machine is documented but not enforced.

**Solution**: Add explicit state transitions:
```python
# State machine definition
from enum import Enum
from typing import Dict, List

class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"

class InvalidTransition(Exception):
    """Raised when a workflow step is attempted out of order."""

# Valid transitions
TRANSITIONS: Dict[WorkflowState, List[WorkflowState]] = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate a state transition."""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(f"Cannot go from {current} to {next_state}")
    return True
```
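A condensed, runnable slice of the machine (three states, same guard logic) demonstrates the review/fix cycle and the rejection of invalid jumps:

```python
from enum import Enum

class State(Enum):
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"

TRANSITIONS = {
    State.REVIEWING: [State.FIXING, State.PERF_CHECK],
    State.FIXING: [State.REVIEWING],  # fixes always go back through review
}

class InvalidTransition(Exception):
    pass

def transition(current, next_state):
    if next_state not in TRANSITIONS.get(current, []):
        raise InvalidTransition(f"cannot go from {current.value} to {next_state.value}")
    return True

transition(State.REVIEWING, State.FIXING)  # ok: review may request fixes
# transition(State.FIXING, State.PERF_CHECK) would raise InvalidTransition,
# because fixed code must pass review again before the perf check
```

Encoding `FIXING → REVIEWING` as the only exit from `FIXING` is what makes the review loop enforceable rather than merely documented.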
## Implementation Priority
| Priority | Improvement | Effort | Impact |
|---|---|---|---|
| P0 | Implement Parallelization | Medium | High |
| P0 | Quality Gate Enforcement | Medium | High |
| P1 | Normalize Agent Modes | Low | Medium |
| P1 | Evaluator-Optimizer Pattern | Low | High |
| P2 | Agent Consolidation | Medium | Low |
| P2 | Capability Index | Low | Medium |
| P3 | State Machine Enforcement | Medium | Medium |
## Files to Modify

### Must Modify

- `.kilo/agents/lead-developer.md` - change mode to `subagent`
- `.kilo/agents/code-skeptic.md` - change mode to `subagent`
- `.kilo/agents/release-manager.md` - change mode to `subagent`
- `.kilo/agents/evaluator.md` - change mode to `subagent`
- `.kilo/commands/workflow.md` - add parallel execution
- `.kilo/agents/orchestrator.md` - add evaluator-optimizer pattern

### Must Create

- `.kilo/capability-index.yaml` - agent capabilities registry
- `.kilo/skills/quality-gates/SKILL.md` - gate validation skill
## Expected Outcomes
| Metric | Before | After | Improvement |
|---|---|---|---|
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
| Review iterations | 2-5 | 1-3 | 40% fewer |
| Agent context pollution | High | Low | Isolated |
| Quality gate failures | Manual | Automated | Consistent |
## Next Steps

- **Apply this proposal as issues** - create Gitea issues for each improvement
- **Run `/pipeline` for each** - use the existing pipeline to implement
- **Measure improvements** - use the evaluator to track effectiveness
- **Iterate** - use the prompt-optimizer to refine

*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*