# Multi-Agent System Improvement Proposal

## Executive Summary

Based on research from Anthropic's "Building Effective Agents" and the Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.

- **Current state**: 22 agents, 18 commands, 12 skills
- **Issues**: mode confusion, serial execution, overlapping capabilities
- **Goal**: optimize for efficiency, maintainability, and quality

## Analysis Findings
### 1. Agent Inventory
| Agent | Mode | Role | Issues |
|---|---|---|---|
| orchestrator | all | Dispatcher | ✅ Correct |
| capability-analyst | subagent | Gap analysis | ✅ Correct |
| history-miner | subagent | Git search | ✅ Correct |
| requirement-refiner | subagent | User stories | ✅ Correct |
| system-analyst | subagent | Architecture | ✅ Correct |
| sdet-engineer | subagent | Test writing | ✅ Correct |
| lead-developer | all | Code writing | ⚠️ Should be subagent |
| frontend-developer | subagent | UI implementation | ✅ Correct |
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
| workflow-architect | subagent | Create workflows | ✅ Correct |
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
| the-fixer | subagent | Bug fixes | ✅ Correct |
| performance-engineer | subagent | Performance review | ✅ Correct |
| security-auditor | subagent | Security audit | ✅ Correct |
| release-manager | all | Git operations | ⚠️ Should be subagent |
| evaluator | all | Scoring | ⚠️ Should be subagent |
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
| product-owner | subagent | Issue management | ✅ Correct |
| visual-tester | subagent | Visual regression | ✅ Correct |
| browser-automation | subagent | E2E testing | ✅ Correct |
| markdown-validator | subagent | Markdown validation | ✅ Correct |
| agent-architect | subagent | Create agents | ✅ Correct |
### 2. Issue Summary
| Issue | Severity | Impact |
|---|---|---|
| Mode confusion (all vs subagent) | Medium | Context pollution |
| Serial execution of independent tasks | High | Slower execution |
| No parallelization pattern | High | Latency overhead |
| Overlapping agent roles | Low | Maintenance overhead |
| Quality gates not enforced | Medium | Quality variance |
## Proposed Improvements

### Improvement 1: Normalize Agent Modes

**Problem**: Many agents use `mode: all` but are conceptually subagents that should run in isolated contexts.

**Solution**: Change all specialized agents to `mode: subagent`:
```yaml
# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent
```
**Files to update**:

- `.kilo/agents/lead-developer.md`
- `.kilo/agents/code-skeptic.md`
- `.kilo/agents/release-manager.md`
- `.kilo/agents/evaluator.md`
**Rationale**: Subagent mode provides:
- Isolated context
- Clear input/output contracts
- Better token efficiency
- Prevents context pollution
### Improvement 2: Implement Parallelization Pattern

**Problem**: Security and performance reviews run serially but are independent of each other.

**Solution**: Use the orchestrator-workers pattern for parallel execution:
```python
async def execute_parallel_reviews():
    """Run security and performance reviews in parallel (pseudocode)."""
    tasks = [
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="..."),
    ]
    results = await asyncio.gather(*tasks)

    # Collect all issues from both reviews
    all_issues = [
        *results[0].security_issues,
        *results[1].performance_issues,
    ]

    if all_issues:
        return Task(subagent_type="the-fixer", issues=all_issues)
```
**New workflow step**:

```markdown
## Step 6: Parallel Review

**Agents**: `@security-auditor`, `@performance-engineer` (parallel)

1. Launch both agents simultaneously
2. Wait for both results
3. Aggregate findings
4. If issues found → send to `@the-fixer`
5. If all pass → proceed to release
```
**Rationale**: Anthropic's research shows that parallelizing independent tasks cuts latency by roughly 50%.
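As a concrete sanity check of the pattern, here is a minimal self-contained `asyncio` sketch with stub review agents; the function names and findings are illustrative stand-ins, not the real subagent API:

```python
import asyncio

# Stub review agents standing in for the real subagent calls (illustrative).
async def security_audit(code):
    await asyncio.sleep(0.05)  # simulated agent latency
    return ["hardcoded credential in config loader"]

async def performance_review(code):
    await asyncio.sleep(0.05)  # runs concurrently with the security audit
    return ["N+1 query in list endpoint"]

async def parallel_reviews(code):
    # Both reviews start immediately; wall time is roughly max(), not sum()
    security, performance = await asyncio.gather(
        security_audit(code), performance_review(code)
    )
    return [*security, *performance]

issues = asyncio.run(parallel_reviews("..."))
```

Because both coroutines sleep concurrently, the total wall time is about one agent's latency rather than two, which is the whole point of the pattern.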
### Improvement 3: Evaluator-Optimizer Pattern

**Problem**: The code review loop is informal: the `code-skeptic` → `the-fixer` handoff lacks structured iteration.

**Solution**: Formalize it as an evaluator-optimizer pattern:
```yaml
# New agent contracts
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached
```
**Implementation**:
```python
MAX_ITERATIONS = 3

def review_loop(issue_number, code_context):
    """Evaluator-optimizer pattern for code review (pseudocode)."""
    for iteration in range(MAX_ITERATIONS):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)
        if review.verdict == "APPROVED":
            return review

        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context,
        )
        code_context = apply_fixes(code_context, fix.changes)

    # Escalate if not resolved within the iteration budget
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
```
**Rationale**: Structured iteration prevents infinite loops: the review either converges or terminates at the iteration cap with an explicit escalation.
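A minimal runnable sketch of the loop, with stub evaluator and optimizer functions standing in for the real subagent calls (all names here are illustrative), shows the convergence behavior:

```python
# Stub evaluator/optimizer standing in for code-skeptic and the-fixer.
def stub_evaluate(code):
    """Evaluator: request changes while any TODO marker remains."""
    issues = [line for line in code if "TODO" in line]
    return ("APPROVED", []) if not issues else ("REQUEST_CHANGES", issues)

def stub_fix(code, issues):
    """Optimizer: resolve one issue per iteration to show gradual convergence."""
    fixed = list(code)
    for i, line in enumerate(fixed):
        if "TODO" in line:
            fixed[i] = line.replace("TODO", "done")
            break
    return fixed

def review_loop(code, max_iterations=3):
    for iteration in range(max_iterations):
        verdict, issues = stub_evaluate(code)
        if verdict == "APPROVED":
            return ("APPROVED", iteration)
        code = stub_fix(code, issues)
    # Bounded iteration: escalate instead of looping forever
    return ("ESCALATED", max_iterations)

print(review_loop(["x = 1  # TODO", "y = 2"]))  # → ('APPROVED', 1)
```

With more unresolved issues than the iteration budget, the same loop returns `("ESCALATED", 3)` rather than spinning indefinitely.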
### Improvement 4: Quality Gate Enforcement

**Problem**: The workflow defines quality gates but agents don't enforce them.

**Solution**: Add gate validation to each agent:
```yaml
# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true
```
**Implementation in workflow**:
```python
def validate_gate(agent_name, gate_name, artifacts):
    """Validate a quality gate before proceeding."""
    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage >= 80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"],
    }

    gate_checks = gates[gate_name]
    results = run_checks(gate_checks, artifacts)

    if not results.all_passed:
        raise GateError(f"Gate {gate_name} failed: {results.failed}")
    return results
```
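The sketch above leaves `run_checks` abstract. One way to make it concrete is to model checks as named predicates over an artifacts dict; the check names and artifact keys below are illustrative, not the real pipeline's:

```python
# Illustrative gate validation: each check is a named predicate on artifacts.
class GateError(Exception):
    pass

CHECKS = {
    "build_success": lambda a: a.get("build") == "ok",
    "no_type_errors": lambda a: a.get("type_errors", 0) == 0,
    "coverage_met": lambda a: a.get("coverage", 0) >= 80,
}

GATES = {
    "implementation": ["build_success", "no_type_errors"],
    "testing": ["coverage_met"],
}

def validate_gate(gate_name, artifacts):
    # Collect every failed check so the error names all of them at once
    failed = [name for name in GATES[gate_name] if not CHECKS[name](artifacts)]
    if failed:
        raise GateError(f"Gate {gate_name!r} failed: {failed}")
    return True

validate_gate("implementation", {"build": "ok", "type_errors": 0})  # passes
```

Keeping predicates in a registry means a new gate is a data change, not a code change, which matches the "gates defined per agent" direction of this improvement.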
### Improvement 5: Agent Capability Consolidation

**Problem**: Some agents have overlapping capabilities.

**Solution**: Merge agents and clarify responsibilities:
| Merge From | Merge To | Rationale |
|---|---|---|
| browser-automation | sdet-engineer | E2E testing is SDET domain |
| markdown-validator | requirement-refiner | Validation is refiner's job |
**New SDET engineer capabilities**:
```yaml
sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browsers: [chromium, firefox, webkit]
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1
```
**Rationale**: Reduces agent count while maintaining coverage. Browser automation is a capability of the SDET, not a separate agent.
### Improvement 6: Add Capability Index

**Problem**: There is no central registry of what each agent can do.

**Solution**: Create a capability index for the orchestrator:
```yaml
# .kilo/capability-index.yaml
agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation

  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations
```
**Usage in orchestrator**:
```python
def route_task(task_type: str) -> str:
    """Route a task to the appropriate agent based on capability."""
    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer",
    }
    return capability_map.get(task_type, "orchestrator")
```
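A quick sanity check of the routing fallback; the capability map is repeated here so the snippet runs standalone:

```python
# Capability map as above, duplicated so this snippet is self-contained
capability_map = {
    "code_writing": "lead-developer",
    "code_review": "code-skeptic",
    "test_writing": "sdet-engineer",
    "architecture": "system-analyst",
    "security": "security-auditor",
    "performance": "performance-engineer",
}

def route_task(task_type):
    # Unknown task types fall back to the orchestrator itself
    return capability_map.get(task_type, "orchestrator")

print(route_task("code_review"))  # code-skeptic
print(route_task("deployment"))   # orchestrator (unmapped -> fallback)
```

The fallback keeps routing total: the orchestrator handles anything the index does not name, rather than the dispatch failing.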
### Improvement 7: Workflow State Machine Enforcement

**Problem**: The workflow state machine is documented but not enforced.

**Solution**: Add explicit state transitions:
```python
# State machine definition
from enum import Enum
from typing import Dict, List

class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"

class InvalidTransition(Exception):
    """Raised when a workflow step is attempted out of order."""

# Valid transitions
TRANSITIONS: Dict[WorkflowState, List[WorkflowState]] = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}

def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate a state transition."""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(f"Cannot go from {current} to {next_state}")
    return True
```
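A condensed, runnable slice of the machine (three states, same guard logic) demonstrates the review/fix cycle and the rejection of invalid jumps:

```python
from enum import Enum

class State(Enum):
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"

TRANSITIONS = {
    State.REVIEWING: [State.FIXING, State.PERF_CHECK],
    State.FIXING: [State.REVIEWING],  # fixes always go back through review
}

class InvalidTransition(Exception):
    pass

def transition(current, next_state):
    if next_state not in TRANSITIONS.get(current, []):
        raise InvalidTransition(f"cannot go from {current.value} to {next_state.value}")
    return True

transition(State.REVIEWING, State.FIXING)  # ok: review may request fixes
# transition(State.FIXING, State.PERF_CHECK) would raise InvalidTransition,
# because fixed code must pass review again before the perf check
```

Encoding `FIXING → REVIEWING` as the only exit from `FIXING` is what makes the review loop enforceable rather than merely documented.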
## Implementation Priority
| Priority | Improvement | Effort | Impact |
|---|---|---|---|
| P0 | Implement Parallelization | Medium | High |
| P0 | Quality Gate Enforcement | Medium | High |
| P1 | Normalize Agent Modes | Low | Medium |
| P1 | Evaluator-Optimizer Pattern | Low | High |
| P2 | Agent Consolidation | Medium | Low |
| P2 | Capability Index | Low | Medium |
| P3 | State Machine Enforcement | Medium | Medium |
## Files to Modify

### Must Modify

- `.kilo/agents/lead-developer.md` - change mode to `subagent`
- `.kilo/agents/code-skeptic.md` - change mode to `subagent`
- `.kilo/agents/release-manager.md` - change mode to `subagent`
- `.kilo/agents/evaluator.md` - change mode to `subagent`
- `.kilo/commands/workflow.md` - add parallel execution
- `.kilo/agents/orchestrator.md` - add evaluator-optimizer pattern

### Must Create

- `.kilo/capability-index.yaml` - agent capabilities registry
- `.kilo/skills/quality-gates/SKILL.md` - gate validation skill
## Expected Outcomes
| Metric | Before | After | Improvement |
|---|---|---|---|
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
| Review iterations | 2-5 | 1-3 | 40% fewer |
| Agent context pollution | High | Low | Isolated |
| Quality gate failures | Manual | Automated | Consistent |
## Next Steps

- **Apply this proposal as issues** - create Gitea issues for each improvement
- **Run `/pipeline` for each** - use the existing pipeline to implement
- **Measure improvements** - use the evaluator to track effectiveness
- **Iterate** - use the prompt-optimizer to refine

*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*