Archive: - docker-compose.yml, Dockerfile.playwright - scripts/ (legacy test scripts) - docs/, .test/ (old documentation and tests) - IMPROVEMENT_PROPOSAL.md (superseded by .kilo/) - BROWSER_VISIBILITY.md, README.Docker.md - cleanup-packages.sh, fix-permissions.sh, install-apaw.sh Keep in root: - .kilo/ (active system) - .claude/ (Claude Code runtime) - AGENTS.md (agent reference) - README.md (main documentation) - src/ (utility code) - package.json, tsconfig.json (project config)
425 lines
12 KiB
Markdown
425 lines
12 KiB
Markdown
# Multi-Agent System Improvement Proposal
|
|
|
|
## Executive Summary
|
|
|
|
Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.
|
|
|
|
**Current State:** 22 agents, 18 commands, 12 skills
|
|
**Issues:** Mode confusion, serial execution, overlapping capabilities
|
|
**Goal:** Optimize for efficiency, maintainability, and quality
|
|
|
|
---
|
|
|
|
## Analysis Findings
|
|
|
|
### 1. Agent Inventory
|
|
|
|
| Agent | Mode | Role | Issues |
|
|
|-------|------|------|--------|
|
|
| orchestrator | all | Dispatcher | ✅ Correct |
|
|
| capability-analyst | subagent | Gap analysis | ✅ Correct |
|
|
| history-miner | subagent | Git search | ✅ Correct |
|
|
| requirement-refiner | subagent | User stories | ✅ Correct |
|
|
| system-analyst | subagent | Architecture | ✅ Correct |
|
|
| sdet-engineer | subagent | Test writing | ✅ Correct |
|
|
| lead-developer | all | Code writing | ⚠️ Should be subagent |
|
|
| frontend-developer | subagent | UI implementation | ✅ Correct |
|
|
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
|
|
| workflow-architect | subagent | Create workflows | ✅ Correct |
|
|
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
|
|
| the-fixer | subagent | Bug fixes | ✅ Correct |
|
|
| performance-engineer | subagent | Performance review | ✅ Correct |
|
|
| security-auditor | subagent | Security audit | ✅ Correct |
|
|
| release-manager | all | Git operations | ⚠️ Should be subagent |
|
|
| evaluator | all | Scoring | ⚠️ Should be subagent |
|
|
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
|
|
| product-owner | subagent | Issue management | ✅ Correct |
|
|
| visual-tester | subagent | Visual regression | ✅ Correct |
|
|
| browser-automation | subagent | E2E testing | ✅ Correct |
|
|
| markdown-validator | subagent | Markdown validation | ✅ Correct |
|
|
| agent-architect | subagent | Create agents | ✅ Correct |
|
|
|
|
### 2. Issue Summary
|
|
|
|
| Issue | Severity | Impact |
|
|
|-------|----------|--------|
|
|
| Mode confusion (all vs subagent) | Medium | Context pollution |
|
|
| Serial execution of independent tasks | High | Slower execution |
|
|
| No parallelization pattern | High | Latency overhead |
|
|
| Overlapping agent roles | Low | Maint overhead |
|
|
| Quality gates not enforced | Medium | Quality variance |
|
|
|
|
---
|
|
|
|
## Proposed Improvements
|
|
|
|
### Improvement 1: Normalize Agent Modes
|
|
|
|
**Problem:** Many agents use `mode: all` but are conceptually subagents that should run in isolated contexts.
|
|
|
|
**Solution:** Change all specialized agents to `mode: subagent`:
|
|
|
|
```yaml
|
|
# Before
|
|
lead-developer:
|
|
mode: all
|
|
|
|
# After
|
|
lead-developer:
|
|
mode: subagent
|
|
```
|
|
|
|
**Files to Update:**
|
|
- `.kilo/agents/lead-developer.md`
|
|
- `.kilo/agents/code-skeptic.md`
|
|
- `.kilo/agents/release-manager.md`
|
|
- `.kilo/agents/evaluator.md`
|
|
|
|
**Rationale:** Subagent mode provides:
|
|
- Isolated context
|
|
- Clear input/output contracts
|
|
- Better token efficiency
|
|
- Prevents context pollution
|
|
|
|
---
|
|
|
|
### Improvement 2: Implement Parallelization Pattern
|
|
|
|
**Problem:** Security and performance reviews run serially but are independent.
|
|
|
|
**Solution:** Use orchestrator-workers pattern for parallel execution:
|
|
|
|
```python
|
|
async def execute_parallel_reviews():
|
|
"""Run security and performance reviews in parallel"""
|
|
|
|
tasks = [
|
|
Task(subagent_type="security-auditor", prompt="..."),
|
|
Task(subagent_type="performance-engineer", prompt="...")
|
|
]
|
|
|
|
results = await asyncio.gather(*tasks)
|
|
|
|
# Collect all issues
|
|
all_issues = [
|
|
*results[0].security_issues,
|
|
*results[1].performance_issues
|
|
]
|
|
|
|
if all_issues:
|
|
return Task(subagent_type="the-fixer", issues=all_issues)
|
|
```
|
|
|
|
**New Workflow Step:**
|
|
|
|
```markdown
|
|
## Step 6: Parallel Review
|
|
|
|
**Agents**: `@security-auditor`, `@performance-engineer` (parallel)
|
|
|
|
1. Launch both agents simultaneously
|
|
2. Wait for both results
|
|
3. Aggregate findings
|
|
4. If issues found → send to `@the-fixer`
|
|
5. If all pass → proceed to release
|
|
```
|
|
|
|
**Rationale:** Anthropic's research shows parallelization reduces latency for independent tasks by ~50%.
|
|
|
|
---
|
|
|
|
### Improvement 3: Evaluator-Optimizer Pattern
|
|
|
|
**Problem:** Code review loop is informal - `code-skeptic` → `the-fixer` lacks structured iteration.
|
|
|
|
**Solution:** Formalize as evaluator-optimizer pattern:
|
|
|
|
```yaml
|
|
# New agent definition
|
|
code-skeptic:
|
|
role: evaluator
|
|
outputs:
|
|
- verdict: APPROVED | REQUEST_CHANGES
|
|
- issues: List[Issue]
|
|
- severity: critical | high | medium | low
|
|
|
|
the-fixer:
|
|
role: optimizer
|
|
inputs:
|
|
- issues: List[Issue]
|
|
- code: CodeContext
|
|
outputs:
|
|
- changes: List[Change]
|
|
- resolution_notes: List[str]
|
|
|
|
# Iteration loop
|
|
max_iterations: 3
|
|
convergence_criteria: all_issues_resolved OR max_iterations_reached
|
|
```
|
|
|
|
**Implementation:**
|
|
|
|
```python
|
|
def review_loop(issue_number, code_context):
|
|
"""Evaluator-Optimizer pattern for code review"""
|
|
|
|
for iteration in range(max_iterations=3):
|
|
# Evaluator reviews
|
|
review = task(subagent_type="code-skeptic", code=code_context)
|
|
|
|
if review.verdict == "APPROVED":
|
|
return review
|
|
|
|
# Optimizer fixes
|
|
fix = task(
|
|
subagent_type="the-fixer",
|
|
issues=review.issues,
|
|
code=code_context
|
|
)
|
|
|
|
code_context = apply_fixes(code_context, fix.changes)
|
|
iteration += 1
|
|
|
|
# Escalate if not resolved
|
|
post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
|
|
```
|
|
|
|
**Rationale:** Structured iteration prevents infinite loops and ensures convergence.
|
|
|
|
---
|
|
|
|
### Improvement 4: Quality Gate Enforcement
|
|
|
|
**Problem:** Workflow defines quality gates but agents don't enforce them.
|
|
|
|
**Solution:** Add gate validation to each agent:
|
|
|
|
```yaml
|
|
# Add to each agent definition
|
|
gates:
|
|
preconditions:
|
|
- files_exist: true
|
|
- tests_pass: true
|
|
postconditions:
|
|
- build_succeeds: true
|
|
- coverage_met: true
|
|
- no_critical_issues: true
|
|
```
|
|
|
|
**Implementation in Workflow:**
|
|
|
|
```python
|
|
def validate_gate(agent_name, gate_name, artifacts):
|
|
"""Validate quality gate before proceeding"""
|
|
|
|
gates = {
|
|
"requirements": ["user_stories_defined", "acceptance_criteria_complete"],
|
|
"architecture": ["schema_valid", "endpoints_documented"],
|
|
"implementation": ["build_success", "no_type_errors"],
|
|
"testing": ["coverage >= 80", "all_tests_pass"],
|
|
"review": ["no_critical_issues", "no_security_vulnerabilities"],
|
|
"docker": ["build_success", "health_check_pass"]
|
|
}
|
|
|
|
gate_checks = gates[gate_name]
|
|
results = run_checks(gate_checks, artifacts)
|
|
|
|
if not results.all_passed:
|
|
raise GateError(f"Gate {gate_name} failed: {results.failed}")
|
|
|
|
return results
|
|
```
|
|
|
|
---
|
|
|
|
### Improvement 5: Agent Capability Consolidation
|
|
|
|
**Problem:** Some agents have overlapping capabilities.
|
|
|
|
**Solution:** Merge and clarify responsibilities:
|
|
|
|
| Merge From | Merge To | Rationale |
|
|
|------------|----------|-----------|
|
|
| browser-automation | sdet-engineer | E2E testing is SDET domain |
|
|
| markdown-validator | requirement-refiner | Validation is refiner's job |
|
|
|
|
**New SDET Engineer Capabilities:**
|
|
|
|
```yaml
|
|
sdet-engineer:
|
|
capabilities:
|
|
- unit_tests
|
|
- integration_tests
|
|
- e2e_tests:
|
|
tool: playwright
|
|
browser: chromium, firefox, webkit
|
|
- visual_regression:
|
|
tool: pixelmatch
|
|
threshold: 0.1
|
|
```
|
|
|
|
**Rationale:** Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent.
|
|
|
|
---
|
|
|
|
### Improvement 6: Add Capability Index
|
|
|
|
**Problem:** No central registry of what each agent can do.
|
|
|
|
**Solution:** Create capability index for orchestrator:
|
|
|
|
```yaml
|
|
# .kilo/capability-index.yaml
|
|
|
|
agents:
|
|
lead-developer:
|
|
capabilities:
|
|
- code_writing
|
|
- refactoring
|
|
- bug_fixing
|
|
receives:
|
|
- tests
|
|
- specifications
|
|
produces:
|
|
- code
|
|
- documentation
|
|
|
|
code-skeptic:
|
|
capabilities:
|
|
- code_review
|
|
- security_review
|
|
- style_review
|
|
receives:
|
|
- code
|
|
produces:
|
|
- review_comments
|
|
- approval_status
|
|
forbidden:
|
|
- suggest_implementations
|
|
```
|
|
|
|
**Usage in Orchestrator:**
|
|
|
|
```python
|
|
def route_task(task_type: str) -> str:
|
|
"""Route task to appropriate agent based on capability"""
|
|
|
|
capability_map = {
|
|
"code_writing": "lead-developer",
|
|
"code_review": "code-skeptic",
|
|
"test_writing": "sdet-engineer",
|
|
"architecture": "system-analyst",
|
|
"security": "security-auditor",
|
|
"performance": "performance-engineer"
|
|
}
|
|
|
|
return capability_map.get(task_type, "orchestrator")
|
|
```
|
|
|
|
---
|
|
|
|
### Improvement 7: Workflow State Machine Enforcement
|
|
|
|
**Problem:** Workflow state machine is documented but not enforced.
|
|
|
|
**Solution:** Add explicit state transitions:
|
|
|
|
```python
|
|
# State machine definition
|
|
from enum import Enum
|
|
from typing import Dict, List
|
|
|
|
class WorkflowState(Enum):
|
|
NEW = "new"
|
|
PLANNED = "planned"
|
|
RESEARCHING = "researching"
|
|
DESIGNED = "designed"
|
|
TESTING = "testing"
|
|
IMPLEMENTING = "implementing"
|
|
REVIEWING = "reviewing"
|
|
FIXING = "fixing"
|
|
PERF_CHECK = "perf-check"
|
|
SECURITY_CHECK = "security-check"
|
|
RELEASING = "releasing"
|
|
EVALUATED = "evaluated"
|
|
COMPLETED = "completed"
|
|
|
|
# Valid transitions
|
|
TRANSITIONS = {
|
|
WorkflowState.NEW: [WorkflowState.PLANNED],
|
|
WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
|
|
WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
|
|
WorkflowState.DESIGNED: [WorkflowState.TESTING],
|
|
WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
|
|
WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
|
|
WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
|
|
WorkflowState.FIXING: [WorkflowState.REVIEWING],
|
|
WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
|
|
WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
|
|
WorkflowState.RELEASING: [WorkflowState.EVALUATED],
|
|
WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
|
|
}
|
|
|
|
def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
|
|
"""Validate state transition"""
|
|
valid_next = TRANSITIONS.get(current, [])
|
|
if next_state not in valid_next:
|
|
raise InvalidTransition(f"Cannot go from {current} to {next_state}")
|
|
return True
|
|
```
|
|
|
|
---
|
|
|
|
## Implementation Priority
|
|
|
|
| Priority | Improvement | Effort | Impact |
|
|
|----------|-------------|--------|--------|
|
|
| P0 | Implement Parallelization | Medium | High |
|
|
| P0 | Quality Gate Enforcement | Medium | High |
|
|
| P1 | Normalize Agent Modes | Low | Medium |
|
|
| P1 | Evaluator-Optimizer Pattern | Low | High |
|
|
| P2 | Agent Consolidation | Medium | Low |
|
|
| P2 | Capability Index | Low | Medium |
|
|
| P3 | State Machine Enforcement | Medium | Medium |
|
|
|
|
---
|
|
|
|
## Files to Modify
|
|
|
|
### Must Modify
|
|
|
|
1. `.kilo/agents/lead-developer.md` - Change mode to `subagent`
|
|
2. `.kilo/agents/code-skeptic.md` - Change mode to `subagent`
|
|
3. `.kilo/agents/release-manager.md` - Change mode to `subagent`
|
|
4. `.kilo/agents/evaluator.md` - Change mode to `subagent`
|
|
5. `.kilo/commands/workflow.md` - Add parallel execution
|
|
6. `.kilo/agents/orchestrator.md` - Add evaluator-optimizer pattern
|
|
|
|
### Must Create
|
|
|
|
1. `.kilo/capability-index.yaml` - Agent capabilities registry
|
|
2. `.kilo/skills/quality-gates/SKILL.md` - Gate validation skill
|
|
|
|
---
|
|
|
|
## Expected Outcomes
|
|
|
|
| Metric | Before | After | Improvement |
|
|
|--------|--------|-------|-------------|
|
|
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
|
|
| Review iterations | 2-5 | 1-3 | 40% fewer |
|
|
| Agent context pollution | High | Low | Isolated |
|
|
| Quality gate failures | Manual | Automated | Consistent |
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
1. **Apply this proposal as issues** - Create Gitea issues for each improvement
|
|
2. **Run `/pipeline` for each** - Use existing pipeline to implement
|
|
3. **Measure improvements** - Use evaluator to track effectiveness
|
|
4. **Iterate** - Use prompt-optimizer to refine
|
|
|
|
---
|
|
|
|
*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research* |