# Multi-Agent System Improvement Proposal
## Executive Summary
Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.
**Current State:** 22 agents, 18 commands, 12 skills
**Issues:** Mode confusion, serial execution, overlapping capabilities
**Goal:** Optimize for efficiency, maintainability, and quality
---
## Analysis Findings
### 1. Agent Inventory
| Agent | Mode | Role | Issues |
|-------|------|------|--------|
| orchestrator | all | Dispatcher | ✅ Correct |
| capability-analyst | subagent | Gap analysis | ✅ Correct |
| history-miner | subagent | Git search | ✅ Correct |
| requirement-refiner | subagent | User stories | ✅ Correct |
| system-analyst | subagent | Architecture | ✅ Correct |
| sdet-engineer | subagent | Test writing | ✅ Correct |
| lead-developer | all | Code writing | ⚠️ Should be subagent |
| frontend-developer | subagent | UI implementation | ✅ Correct |
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
| workflow-architect | subagent | Create workflows | ✅ Correct |
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
| the-fixer | subagent | Bug fixes | ✅ Correct |
| performance-engineer | subagent | Performance review | ✅ Correct |
| security-auditor | subagent | Security audit | ✅ Correct |
| release-manager | all | Git operations | ⚠️ Should be subagent |
| evaluator | all | Scoring | ⚠️ Should be subagent |
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
| product-owner | subagent | Issue management | ✅ Correct |
| visual-tester | subagent | Visual regression | ✅ Correct |
| browser-automation | subagent | E2E testing | ✅ Correct |
| markdown-validator | subagent | Markdown validation | ✅ Correct |
| agent-architect | subagent | Create agents | ✅ Correct |
### 2. Issue Summary
| Issue | Severity | Impact |
|-------|----------|--------|
| Mode confusion (all vs subagent) | Medium | Context pollution |
| Serial execution of independent tasks | High | Slower execution |
| No parallelization pattern | High | Latency overhead |
| Overlapping agent roles | Low | Maintenance overhead |
| Quality gates not enforced | Medium | Quality variance |
---
## Proposed Improvements
### Improvement 1: Normalize Agent Modes
**Problem:** Many agents use `mode: all` but are conceptually subagents that should run in isolated contexts.
**Solution:** Change all specialized agents to `mode: subagent`:
```yaml
# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent
```
**Files to Update:**
- `.kilo/agents/lead-developer.md`
- `.kilo/agents/code-skeptic.md`
- `.kilo/agents/release-manager.md`
- `.kilo/agents/evaluator.md`
**Rationale:** Subagent mode provides:
- Isolated context
- Clear input/output contracts
- Better token efficiency
- Prevents context pollution
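The mode flip is a one-line frontmatter change per file. A minimal sketch of scripting it across the four files listed above (the agent list and `.kilo/agents/*.md` layout come from this proposal; the exact frontmatter shape is an assumption):

```python
import re
from pathlib import Path

# Agents whose mode should flip to subagent, per the inventory table.
AGENTS = ["lead-developer", "code-skeptic", "release-manager", "evaluator"]

def normalize_mode(text: str) -> str:
    """Rewrite a 'mode: all' frontmatter line to 'mode: subagent'."""
    return re.sub(r"^mode:\s*all\s*$", "mode: subagent", text, flags=re.MULTILINE)

def normalize_agents(root: Path = Path(".kilo/agents")) -> None:
    """Apply the mode change to each listed agent definition, if present."""
    for name in AGENTS:
        path = root / f"{name}.md"
        if path.exists():
            path.write_text(normalize_mode(path.read_text()))
```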
---
### Improvement 2: Implement Parallelization Pattern
**Problem:** Security and performance reviews run serially but are independent.
**Solution:** Use orchestrator-workers pattern for parallel execution:
```python
async def execute_parallel_reviews():
    """Run security and performance reviews in parallel"""
    tasks = [
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="...")
    ]
    results = await asyncio.gather(*tasks)

    # Collect all issues
    all_issues = [
        *results[0].security_issues,
        *results[1].performance_issues
    ]
    if all_issues:
        return Task(subagent_type="the-fixer", issues=all_issues)
```
**New Workflow Step:**
```markdown
## Step 6: Parallel Review
**Agents**: `@security-auditor`, `@performance-engineer` (parallel)
1. Launch both agents simultaneously
2. Wait for both results
3. Aggregate findings
4. If issues found → send to `@the-fixer`
5. If all pass → proceed to release
```
**Rationale:** Anthropic's research shows parallelization reduces latency for independent tasks by ~50%.
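The latency claim is easy to sanity-check with plain `asyncio` (the `Task(...)` calls above are pseudocode; here two stand-in reviews simply sleep to simulate independent work):

```python
import asyncio
import time

# Stand-ins for the two independent reviews; each "takes" 0.2 s.
async def security_review() -> list:
    await asyncio.sleep(0.2)
    return ["sec-issue"]

async def performance_review() -> list:
    await asyncio.sleep(0.2)
    return ["perf-issue"]

async def parallel_reviews() -> list:
    # gather() runs both coroutines concurrently: ~0.2 s total, not ~0.4 s.
    sec, perf = await asyncio.gather(security_review(), performance_review())
    return [*sec, *perf]

start = time.perf_counter()
issues = asyncio.run(parallel_reviews())
elapsed = time.perf_counter() - start
```

Serial execution of the same two reviews would take roughly the sum of their durations; `gather` takes roughly the maximum.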
---
### Improvement 3: Evaluator-Optimizer Pattern
**Problem:** The code review loop is informal: the `code-skeptic` → `the-fixer` handoff lacks structured iteration.
**Solution:** Formalize as evaluator-optimizer pattern:
```yaml
# New agent definition
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached
```
**Implementation:**
```python
MAX_ITERATIONS = 3

def review_loop(issue_number, code_context):
    """Evaluator-Optimizer pattern for code review"""
    for iteration in range(MAX_ITERATIONS):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)
        if review.verdict == "APPROVED":
            return review
        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context
        )
        code_context = apply_fixes(code_context, fix.changes)
    # Escalate if not resolved
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
```
**Rationale:** Structured iteration prevents infinite loops and ensures convergence.
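The convergence behavior can be illustrated with a toy model of the loop (the issue-list representation and the one-fix-per-pass optimizer are simplifying assumptions, not the real agents):

```python
# Toy evaluator-optimizer loop: "code" is modeled as a list of open issue
# strings. The evaluator approves once the list is empty; the optimizer
# resolves one issue per pass; the loop escalates after max_iterations.
def review_loop(code_issues: list, max_iterations: int = 3):
    for iteration in range(max_iterations):
        if not code_issues:               # evaluator verdict: APPROVED
            return "APPROVED", iteration
        code_issues = code_issues[1:]     # optimizer resolves one issue
    return "ESCALATED", max_iterations    # convergence not reached
```

With at most three iterations, any review that starts with four or more open issues escalates to manual review instead of looping forever.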
---
### Improvement 4: Quality Gate Enforcement
**Problem:** Workflow defines quality gates but agents don't enforce them.
**Solution:** Add gate validation to each agent:
```yaml
# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true
```
**Implementation in Workflow:**
```python
def validate_gate(agent_name, gate_name, artifacts):
    """Validate quality gate before proceeding"""
    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage >= 80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"]
    }
    gate_checks = gates[gate_name]
    results = run_checks(gate_checks, artifacts)
    if not results.all_passed:
        raise GateError(f"Gate {gate_name} failed: {results.failed}")
    return results
```
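`validate_gate` assumes a `run_checks` helper. A minimal sketch, treating each check name as a boolean flag in the `artifacts` dict (real checks such as `coverage >= 80` would need evaluation logic, which is elided here; the `CheckResults` shape is an assumption):

```python
from dataclasses import dataclass, field

@dataclass
class CheckResults:
    """Accumulates failed check names; all_passed when none failed."""
    failed: list = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        return not self.failed

def run_checks(check_names, artifacts: dict) -> CheckResults:
    """Treat each check name as a boolean flag in the artifacts dict."""
    results = CheckResults()
    for name in check_names:
        if not artifacts.get(name, False):
            results.failed.append(name)
    return results
```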
---
### Improvement 5: Agent Capability Consolidation
**Problem:** Some agents have overlapping capabilities.
**Solution:** Merge and clarify responsibilities:
| Merge From | Merge To | Rationale |
|------------|----------|-----------|
| browser-automation | sdet-engineer | E2E testing is SDET domain |
| markdown-validator | requirement-refiner | Validation is refiner's job |
**New SDET Engineer Capabilities:**
```yaml
sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browsers: [chromium, firefox, webkit]
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1
```
**Rationale:** Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent.
---
### Improvement 6: Add Capability Index
**Problem:** No central registry of what each agent can do.
**Solution:** Create capability index for orchestrator:
```yaml
# .kilo/capability-index.yaml
agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation

  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations
**Usage in Orchestrator:**
```python
def route_task(task_type: str) -> str:
    """Route task to appropriate agent based on capability"""
    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer"
    }
    return capability_map.get(task_type, "orchestrator")
```
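Hardcoding `capability_map` duplicates the index. An alternative sketch derives the routing table by inverting the loaded index (the in-memory dict below mirrors the YAML above; how it would actually be loaded, e.g. via PyYAML, is an assumption):

```python
# The capability index, as it might look after loading
# .kilo/capability-index.yaml into a dict.
index = {
    "lead-developer": {"capabilities": ["code_writing", "refactoring", "bug_fixing"]},
    "code-skeptic": {"capabilities": ["code_review", "security_review", "style_review"]},
}

def build_capability_map(agents: dict) -> dict:
    """Invert agent -> capabilities into capability -> agent."""
    mapping = {}
    for agent, spec in agents.items():
        for cap in spec.get("capabilities", []):
            # First agent listed for a capability wins; a duplicate here
            # would signal the role overlap flagged in Improvement 5.
            mapping.setdefault(cap, agent)
    return mapping

def route_task(task_type: str, mapping: dict) -> str:
    """Fall back to the orchestrator for unknown task types."""
    return mapping.get(task_type, "orchestrator")
```

This keeps the routing table and the capability index from drifting apart, at the cost of a load step at orchestrator startup.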
---
### Improvement 7: Workflow State Machine Enforcement
**Problem:** Workflow state machine is documented but not enforced.
**Solution:** Add explicit state transitions:
```python
# State machine definition
from enum import Enum

class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"

class InvalidTransition(Exception):
    pass

# Valid transitions
TRANSITIONS = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate state transition"""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(f"Cannot go from {current} to {next_state}")
    return True
```
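A usage sketch, redefining a two-transition slice of the machine so the snippet is self-contained (`InvalidTransition` is assumed to be a plain exception class):

```python
from enum import Enum

# A minimal slice of the workflow state machine for demonstration.
class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"

TRANSITIONS = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
}

class InvalidTransition(Exception):
    pass

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Raise unless next_state is a valid successor of current."""
    if next_state not in TRANSITIONS.get(current, []):
        raise InvalidTransition(f"Cannot go from {current} to {next_state}")
    return True

ok = transition(WorkflowState.NEW, WorkflowState.PLANNED)
try:
    # Skipping PLANNED is rejected by the machine.
    transition(WorkflowState.NEW, WorkflowState.RESEARCHING)
    skipped_rejected = False
except InvalidTransition:
    skipped_rejected = True
```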
---
## Implementation Priority
| Priority | Improvement | Effort | Impact |
|----------|-------------|--------|--------|
| P0 | Implement Parallelization | Medium | High |
| P0 | Quality Gate Enforcement | Medium | High |
| P1 | Normalize Agent Modes | Low | Medium |
| P1 | Evaluator-Optimizer Pattern | Low | High |
| P2 | Agent Consolidation | Medium | Low |
| P2 | Capability Index | Low | Medium |
| P3 | State Machine Enforcement | Medium | Medium |
---
## Files to Modify
### Must Modify
1. `.kilo/agents/lead-developer.md` - Change mode to `subagent`
2. `.kilo/agents/code-skeptic.md` - Change mode to `subagent`
3. `.kilo/agents/release-manager.md` - Change mode to `subagent`
4. `.kilo/agents/evaluator.md` - Change mode to `subagent`
5. `.kilo/commands/workflow.md` - Add parallel execution
6. `.kilo/agents/orchestrator.md` - Add evaluator-optimizer pattern
### Must Create
1. `.kilo/capability-index.yaml` - Agent capabilities registry
2. `.kilo/skills/quality-gates/SKILL.md` - Gate validation skill
---
## Expected Outcomes
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
| Review iterations | 2-5 | 1-3 | 40% fewer |
| Agent context pollution | High | Low | Isolated |
| Quality gate failures | Manual | Automated | Consistent |
---
## Next Steps
1. **Apply this proposal as issues** - Create Gitea issues for each improvement
2. **Run `/pipeline` for each** - Use existing pipeline to implement
3. **Measure improvements** - Use evaluator to track effectiveness
4. **Iterate** - Use prompt-optimizer to refine
---
*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*