APAW/archive/IMPROVEMENT_PROPOSAL.md

# Multi-Agent System Improvement Proposal

## Executive Summary

Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.

**Current State:** 22 agents, 18 commands, 12 skills
**Issues:** Mode confusion, serial execution, overlapping capabilities
**Goal:** Optimize for efficiency, maintainability, and quality

---

## Analysis Findings

### 1. Agent Inventory

| Agent | Mode | Role | Issues |
|-------|------|------|--------|
| orchestrator | all | Dispatcher | ✅ Correct |
| capability-analyst | subagent | Gap analysis | ✅ Correct |
| history-miner | subagent | Git search | ✅ Correct |
| requirement-refiner | subagent | User stories | ✅ Correct |
| system-analyst | subagent | Architecture | ✅ Correct |
| sdet-engineer | subagent | Test writing | ✅ Correct |
| lead-developer | all | Code writing | ⚠️ Should be subagent |
| frontend-developer | subagent | UI implementation | ✅ Correct |
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
| workflow-architect | subagent | Create workflows | ✅ Correct |
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
| the-fixer | subagent | Bug fixes | ✅ Correct |
| performance-engineer | subagent | Performance review | ✅ Correct |
| security-auditor | subagent | Security audit | ✅ Correct |
| release-manager | all | Git operations | ⚠️ Should be subagent |
| evaluator | all | Scoring | ⚠️ Should be subagent |
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
| product-owner | subagent | Issue management | ✅ Correct |
| visual-tester | subagent | Visual regression | ✅ Correct |
| browser-automation | subagent | E2E testing | ✅ Correct |
| markdown-validator | subagent | Markdown validation | ✅ Correct |
| agent-architect | subagent | Create agents | ✅ Correct |

### 2. Issue Summary

| Issue | Severity | Impact |
|-------|----------|--------|
| Mode confusion (all vs subagent) | Medium | Context pollution |
| Serial execution of independent tasks | High | Slower execution |
| No parallelization pattern | High | Latency overhead |
| Overlapping agent roles | Low | Maint overhead |
| Quality gates not enforced | Medium | Quality variance |

---

## Proposed Improvements

### Improvement 1: Normalize Agent Modes

**Problem:** Many agents use `mode: all` but are conceptually subagents that should run in isolated contexts.

**Solution:** Change all specialized agents to `mode: subagent`:

```yaml
# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent
```

**Files to Update:**
- `.kilo/agents/lead-developer.md`
- `.kilo/agents/code-skeptic.md`
- `.kilo/agents/release-manager.md`
- `.kilo/agents/evaluator.md`

**Rationale:** Subagent mode provides:
- Isolated context
- Clear input/output contracts
- Better token efficiency
- Prevents context pollution

---

### Improvement 2: Implement Parallelization Pattern

**Problem:** Security and performance reviews run serially but are independent.

**Solution:** Use orchestrator-workers pattern for parallel execution:

```python
async def execute_parallel_reviews():
    """Run security and performance reviews in parallel"""

    tasks = [
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="...")
    ]

    results = await asyncio.gather(*tasks)

    # Collect all issues
    all_issues = [
        *results[0].security_issues,
        *results[1].performance_issues
    ]

    if all_issues:
        return Task(subagent_type="the-fixer", issues=all_issues)
```

**New Workflow Step:**

```markdown
## Step 6: Parallel Review

**Agents**: `@security-auditor`, `@performance-engineer` (parallel)

1. Launch both agents simultaneously
2. Wait for both results
3. Aggregate findings
4. If issues found → send to `@the-fixer`
5. If all pass → proceed to release
```

**Rationale:** Anthropic's research shows parallelization reduces latency for independent tasks by ~50%.

---

### Improvement 3: Evaluator-Optimizer Pattern

**Problem:** Code review loop is informal - `code-skeptic` → `the-fixer` lacks structured iteration.

**Solution:** Formalize as evaluator-optimizer pattern:

```yaml
# New agent definition
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached
```

**Implementation:**

```python
def review_loop(issue_number, code_context):
    """Evaluator-Optimizer pattern for code review"""

    for iteration in range(max_iterations=3):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)

        if review.verdict == "APPROVED":
            return review

        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context
        )

        code_context = apply_fixes(code_context, fix.changes)
        iteration += 1

    # Escalate if not resolved
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
```

**Rationale:** Structured iteration prevents infinite loops and ensures convergence.

---

### Improvement 4: Quality Gate Enforcement

**Problem:** Workflow defines quality gates but agents don't enforce them.

**Solution:** Add gate validation to each agent:

```yaml
# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true
```

**Implementation in Workflow:**

```python
def validate_gate(agent_name, gate_name, artifacts):
    """Validate quality gate before proceeding"""

    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage >= 80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"]
    }

    gate_checks = gates[gate_name]
    results = run_checks(gate_checks, artifacts)

    if not results.all_passed:
        raise GateError(f"Gate {gate_name} failed: {results.failed}")

    return results
```

---

### Improvement 5: Agent Capability Consolidation

**Problem:** Some agents have overlapping capabilities.

**Solution:** Merge and clarify responsibilities:

| Merge From | Merge To | Rationale |
|------------|----------|-----------|
| browser-automation | sdet-engineer | E2E testing is SDET domain |
| markdown-validator | requirement-refiner | Validation is refiner's job |

**New SDET Engineer Capabilities:**

```yaml
sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browser: chromium, firefox, webkit
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1
```

**Rationale:** Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent.

---

### Improvement 6: Add Capability Index

**Problem:** No central registry of what each agent can do.

**Solution:** Create capability index for orchestrator:

```yaml
# .kilo/capability-index.yaml

agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation

  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations
```

**Usage in Orchestrator:**

```python
def route_task(task_type: str) -> str:
    """Route task to appropriate agent based on capability"""

    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer"
    }

    return capability_map.get(task_type, "orchestrator")
```

---

### Improvement 7: Workflow State Machine Enforcement

**Problem:** Workflow state machine is documented but not enforced.

**Solution:** Add explicit state transitions:

```python
# State machine definition
from enum import Enum
from typing import Dict, List

class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"

# Valid transitions
TRANSITIONS = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate state transition"""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(f"Cannot go from {current} to {next_state}")
    return True
```

---

## Implementation Priority

| Priority | Improvement | Effort | Impact |
|----------|-------------|--------|--------|
| P0 | Implement Parallelization | Medium | High |
| P0 | Quality Gate Enforcement | Medium | High |
| P1 | Normalize Agent Modes | Low | Medium |
| P1 | Evaluator-Optimizer Pattern | Low | High |
| P2 | Agent Consolidation | Medium | Low |
| P2 | Capability Index | Low | Medium |
| P3 | State Machine Enforcement | Medium | Medium |

---

## Files to Modify

### Must Modify

1. `.kilo/agents/lead-developer.md` - Change mode to `subagent`
2. `.kilo/agents/code-skeptic.md` - Change mode to `subagent`
3. `.kilo/agents/release-manager.md` - Change mode to `subagent`
4. `.kilo/agents/evaluator.md` - Change mode to `subagent`
5. `.kilo/commands/workflow.md` - Add parallel execution
6. `.kilo/agents/orchestrator.md` - Add evaluator-optimizer pattern

### Must Create

1. `.kilo/capability-index.yaml` - Agent capabilities registry
2. `.kilo/skills/quality-gates/SKILL.md` - Gate validation skill

---

## Expected Outcomes

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
| Review iterations | 2-5 | 1-3 | 40% fewer |
| Agent context pollution | High | Low | Isolated |
| Quality gate failures | Manual | Automated | Consistent |

---

## Next Steps

1. **Apply this proposal as issues** - Create Gitea issues for each improvement
2. **Run `/pipeline` for each** - Use existing pipeline to implement
3. **Measure improvements** - Use evaluator to track effectiveness
4. **Iterate** - Use prompt-optimizer to refine

---

*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*