
Multi-Agent System Improvement Proposal

Executive Summary

Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.

Current State: 22 agents, 18 commands, 12 skills
Issues: Mode confusion, serial execution, overlapping capabilities
Goal: Optimize for efficiency, maintainability, and quality


Analysis Findings

1. Agent Inventory

| Agent | Mode | Role | Issues |
| --- | --- | --- | --- |
| orchestrator | all | Dispatcher | Correct |
| capability-analyst | subagent | Gap analysis | Correct |
| history-miner | subagent | Git search | Correct |
| requirement-refiner | subagent | User stories | Correct |
| system-analyst | subagent | Architecture | Correct |
| sdet-engineer | subagent | Test writing | Correct |
| lead-developer | all | Code writing | ⚠️ Should be subagent |
| frontend-developer | subagent | UI implementation | Correct |
| backend-developer | subagent | Node/Express/APIs | Correct |
| workflow-architect | subagent | Create workflows | Correct |
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
| the-fixer | subagent | Bug fixes | Correct |
| performance-engineer | subagent | Performance review | Correct |
| security-auditor | subagent | Security audit | Correct |
| release-manager | all | Git operations | ⚠️ Should be subagent |
| evaluator | all | Scoring | ⚠️ Should be subagent |
| prompt-optimizer | subagent | Optimize prompts | Correct |
| product-owner | subagent | Issue management | Correct |
| visual-tester | subagent | Visual regression | Correct |
| browser-automation | subagent | E2E testing | Correct |
| markdown-validator | subagent | Markdown validation | Correct |
| agent-architect | subagent | Create agents | Correct |

2. Issue Summary

| Issue | Severity | Impact |
| --- | --- | --- |
| Mode confusion (all vs subagent) | Medium | Context pollution |
| Serial execution of independent tasks | High | Slower execution |
| No parallelization pattern | High | Latency overhead |
| Overlapping agent roles | Low | Maintenance overhead |
| Quality gates not enforced | Medium | Quality variance |

Proposed Improvements

Improvement 1: Normalize Agent Modes

Problem: Many agents use mode: all but are conceptually subagents that should run in isolated contexts.

Solution: Change all specialized agents to mode: subagent:

# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent

Files to Update:

  • .kilo/agents/lead-developer.md
  • .kilo/agents/code-skeptic.md
  • .kilo/agents/release-manager.md
  • .kilo/agents/evaluator.md

Rationale: Subagent mode provides:

  • Isolated context
  • Clear input/output contracts
  • Better token efficiency
  • Prevents context pollution
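The mode change itself is a one-line frontmatter edit per file. A hypothetical migration helper, sketched under the assumption that each agent `.md` file opens with a `---`-fenced YAML frontmatter block containing a `mode:` key (as the definitions above suggest):

```python
import re

def normalize_mode(markdown: str) -> str:
    """Rewrite `mode: all` to `mode: subagent` in an agent definition's
    YAML frontmatter, leaving the rest of the file untouched."""
    # Only touch the frontmatter block between the first pair of --- fences.
    match = re.match(r"(?s)\A(---\n)(.*?)(\n---)", markdown)
    if not match:
        return markdown  # no frontmatter; nothing to do
    head, body, tail = match.groups()
    body = re.sub(r"(?m)^mode:\s*all\s*$", "mode: subagent", body)
    return head + body + tail + markdown[match.end():]
```

Applied to each of the four files listed above, this leaves prompts and other frontmatter keys intact.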

Improvement 2: Implement Parallelization Pattern

Problem: Security and performance reviews run serially but are independent.

Solution: Use orchestrator-workers pattern for parallel execution:

async def execute_parallel_reviews():
    """Run security and performance reviews in parallel"""

    # Pseudocode: Task(...) is assumed to dispatch a subagent and return
    # an awaitable that resolves to that agent's structured result.
    tasks = [
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="...")
    ]

    security_result, performance_result = await asyncio.gather(*tasks)

    # Collect all issues from both reviews
    all_issues = [
        *security_result.security_issues,
        *performance_result.performance_issues
    ]

    if all_issues:
        return Task(subagent_type="the-fixer", issues=all_issues)

New Workflow Step:

## Step 6: Parallel Review

**Agents**: `@security-auditor`, `@performance-engineer` (parallel)

1. Launch both agents simultaneously
2. Wait for both results
3. Aggregate findings
4. If issues found → send to `@the-fixer`
5. If all pass → proceed to release

Rationale: Anthropic's "Building Effective Agents" research shows that parallelizing independent subtasks reduces end-to-end latency; with two equal-length reviews running concurrently, wall-clock time for this step drops by roughly 50%.
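The latency claim can be demonstrated with plain `asyncio`. In this sketch the two stub coroutines stand in for the `security-auditor` and `performance-engineer` runs; the sleeps and the issue string are placeholders:

```python
import asyncio
import time

async def security_review() -> list[str]:
    await asyncio.sleep(0.2)  # stand-in for the security-auditor run
    return ["hardcoded secret in config"]

async def performance_review() -> list[str]:
    await asyncio.sleep(0.2)  # stand-in for the performance-engineer run
    return []

async def parallel_reviews() -> list[str]:
    start = time.perf_counter()
    security, perf = await asyncio.gather(security_review(), performance_review())
    elapsed = time.perf_counter() - start
    # Both stubs sleep 0.2s; run concurrently the total is ~0.2s, not ~0.4s.
    assert elapsed < 0.35
    return [*security, *perf]

issues = asyncio.run(parallel_reviews())
```

Serially, the same two awaits would take the sum of their durations; `asyncio.gather` overlaps them.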


Improvement 3: Evaluator-Optimizer Pattern

Problem: The code review loop is informal: the code-skeptic → the-fixer hand-off lacks structured iteration.

Solution: Formalize as evaluator-optimizer pattern:

# New agent definition
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached

Implementation:

def review_loop(issue_number, code_context, max_iterations=3):
    """Evaluator-Optimizer pattern for code review"""

    for iteration in range(max_iterations):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)

        if review.verdict == "APPROVED":
            return review

        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context
        )

        code_context = apply_fixes(code_context, fix.changes)

    # Escalate if not resolved within max_iterations
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")

Rationale: Structured iteration prevents infinite loops and ensures convergence.
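The loop can be exercised end-to-end with stub agents. In this sketch `evaluate` and `fix` stand in for code-skeptic and the-fixer, and the issue strings are placeholders; a code base is modelled as a set of markers:

```python
MAX_ITERATIONS = 3

def evaluate(code: set[str]) -> dict:
    """Stub code-skeptic: flags any known-bad pattern still present."""
    issues = sorted(code & {"unchecked input", "missing error handling"})
    return {"verdict": "APPROVED" if not issues else "REQUEST_CHANGES",
            "issues": issues}

def fix(code: set[str], issues: list[str]) -> set[str]:
    """Stub the-fixer: resolves one reported issue per pass."""
    return code - {issues[0]}

def review_loop(code: set[str]) -> str:
    for _ in range(MAX_ITERATIONS):
        review = evaluate(code)
        if review["verdict"] == "APPROVED":
            return "APPROVED"
        code = fix(code, review["issues"])
    return "ESCALATED"  # max iterations reached -> manual review

result = review_loop({"feature code", "unchecked input", "missing error handling"})
```

With two outstanding issues and one fix per pass, the loop converges on the third evaluation; a stalled fixer would hit the escalation path instead of looping forever.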


Improvement 4: Quality Gate Enforcement

Problem: Workflow defines quality gates but agents don't enforce them.

Solution: Add gate validation to each agent:

# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true

Implementation in Workflow:

def validate_gate(gate_name, artifacts):
    """Validate quality gate before proceeding"""

    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage >= 80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"]
    }

    gate_checks = gates[gate_name]
    results = run_checks(gate_checks, artifacts)

    if not results.all_passed:
        raise GateError(f"Gate {gate_name} failed: {results.failed}")

    return results
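One concrete way `run_checks` could work is a registry of named predicates over the artifacts produced so far. A minimal sketch, where the check names mirror the table above and the artifact keys (`build_ok`, `failed_tests`, `coverage`, `critical_issues`) are illustrative assumptions rather than a fixed schema:

```python
class GateError(Exception):
    """Raised when a quality gate's checks do not all pass."""

# Named predicates over the workflow's artifacts dict.
CHECKS = {
    "build_success": lambda a: a.get("build_ok", False),
    "all_tests_pass": lambda a: a.get("failed_tests", 1) == 0,
    "coverage_met": lambda a: a.get("coverage", 0.0) >= 0.80,
    "no_critical_issues": lambda a: not a.get("critical_issues"),
}

def run_gate(gate_checks, artifacts):
    """Run each named check; raise GateError listing every failure."""
    failed = [name for name in gate_checks if not CHECKS[name](artifacts)]
    if failed:
        raise GateError(f"Gate failed: {failed}")
    return True

artifacts = {"build_ok": True, "failed_tests": 0, "coverage": 0.85}
run_gate(["build_success", "all_tests_pass", "coverage_met"], artifacts)
```

Collecting every failed check before raising gives the agent one actionable list instead of a failure-at-a-time drip.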

Improvement 5: Agent Capability Consolidation

Problem: Some agents have overlapping capabilities.

Solution: Merge and clarify responsibilities:

| Merge From | Merge To | Rationale |
| --- | --- | --- |
| browser-automation | sdet-engineer | E2E testing is SDET domain |
| markdown-validator | requirement-refiner | Validation is refiner's job |

New SDET Engineer Capabilities:

sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browsers: [chromium, firefox, webkit]
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1

Rationale: Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent.


Improvement 6: Add Capability Index

Problem: No central registry of what each agent can do.

Solution: Create capability index for orchestrator:

# .kilo/capability-index.yaml

agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation
    
  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations

Usage in Orchestrator:

def route_task(task_type: str) -> str:
    """Route task to appropriate agent based on capability"""
    
    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer"
    }
    
    return capability_map.get(task_type, "orchestrator")
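Rather than hand-maintaining the capability map alongside the index, the orchestrator could derive it from the index itself. A sketch, assuming an in-memory mirror of a subset of the YAML above (loading the real `.kilo/capability-index.yaml` would additionally need a YAML parser):

```python
# Mirrors the structure of .kilo/capability-index.yaml (illustrative subset).
CAPABILITY_INDEX = {
    "lead-developer": {"capabilities": ["code_writing", "refactoring", "bug_fixing"]},
    "code-skeptic": {"capabilities": ["code_review", "security_review", "style_review"]},
    "sdet-engineer": {"capabilities": ["unit_tests", "integration_tests", "e2e_tests"]},
}

def build_routing_table(index: dict) -> dict:
    """Invert the index: capability -> first agent that declares it."""
    table = {}
    for agent, spec in index.items():
        for cap in spec["capabilities"]:
            table.setdefault(cap, agent)
    return table

ROUTES = build_routing_table(CAPABILITY_INDEX)

def route_task(task_type: str) -> str:
    # Unknown task types fall back to the orchestrator itself.
    return ROUTES.get(task_type, "orchestrator")
```

Deriving the routing table keeps the index as the single source of truth; adding a capability to an agent file updates routing without touching orchestrator code.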

Improvement 7: Workflow State Machine Enforcement

Problem: Workflow state machine is documented but not enforced.

Solution: Add explicit state transitions:

# State machine definition
from enum import Enum

class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"

# Valid transitions
TRANSITIONS = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}

class InvalidTransition(Exception):
    """Raised on a transition not permitted by TRANSITIONS."""

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate state transition"""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(f"Cannot go from {current.value} to {next_state.value}")
    return True
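The guard in action, restated with a small subset of the states so the snippet stands alone:

```python
from enum import Enum

class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"

# Subset of the full transition table above.
TRANSITIONS = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
}

class InvalidTransition(Exception):
    pass

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    if next_state not in TRANSITIONS.get(current, []):
        raise InvalidTransition(f"Cannot go from {current.value} to {next_state.value}")
    return True

ok = transition(WorkflowState.NEW, WorkflowState.PLANNED)  # valid step
```

Attempting to skip a state (e.g. `new` straight to `researching`) raises `InvalidTransition`, which is exactly the enforcement the documented-but-unenforced state machine currently lacks.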

Implementation Priority

| Priority | Improvement | Effort | Impact |
| --- | --- | --- | --- |
| P0 | Implement Parallelization | Medium | High |
| P0 | Quality Gate Enforcement | Medium | High |
| P1 | Normalize Agent Modes | Low | Medium |
| P1 | Evaluator-Optimizer Pattern | Low | High |
| P2 | Agent Consolidation | Medium | Low |
| P2 | Capability Index | Low | Medium |
| P3 | State Machine Enforcement | Medium | Medium |

Files to Modify

Must Modify

  1. .kilo/agents/lead-developer.md - Change mode to subagent
  2. .kilo/agents/code-skeptic.md - Change mode to subagent
  3. .kilo/agents/release-manager.md - Change mode to subagent
  4. .kilo/agents/evaluator.md - Change mode to subagent
  5. .kilo/commands/workflow.md - Add parallel execution
  6. .kilo/agents/orchestrator.md - Add evaluator-optimizer pattern

Must Create

  1. .kilo/capability-index.yaml - Agent capabilities registry
  2. .kilo/skills/quality-gates/SKILL.md - Gate validation skill

Expected Outcomes

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
| Review iterations | 2-5 | 1-3 | 40% fewer |
| Agent context pollution | High | Low | Isolated |
| Quality gate failures | Manual | Automated | Consistent |

Next Steps

  1. Apply this proposal as issues - Create Gitea issues for each improvement
  2. Run /pipeline for each - Use existing pipeline to implement
  3. Measure improvements - Use evaluator to track effectiveness
  4. Iterate - Use prompt-optimizer to refine

Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research