docs: add improvement proposal based on multi-agent research
- Created IMPROVEMENT_PROPOSAL.md with analysis findings
- Added capability-index.yaml for orchestrator routing
- Changed agent modes from 'all' to 'subagent' for isolation
- Created Gitea issues #21-25 for tracking improvements:
  - #21: Implement parallelization pattern (P0)
  - #22: Implement evaluator-optimizer pattern (P1)
  - #23: Enforce quality gates (P0)
  - #24: Consolidate overlapping agents (P2)
  - #25: Research milestone with references
@@ -1,6 +1,6 @@
|
||||
---
|
||||
description: Adversarial code reviewer. Finds problems and issues. Does NOT suggest implementations
|
||||
-mode: all
+mode: subagent
|
||||
model: ollama-cloud/minimax-m2.5
|
||||
color: "#E11D48"
|
||||
permission:
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
description: Scores agent effectiveness after task completion for continuous improvement
|
||||
-mode: all
+mode: subagent
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
color: "#047857"
|
||||
permission:
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
description: Primary code writer for backend and core logic. Writes implementation to pass tests
|
||||
-mode: all
+mode: subagent
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
color: "#DC2626"
|
||||
permission:
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
description: Manages git operations, semantic versioning, branching, and deployments. Ensures clean history
|
||||
-mode: all
+mode: subagent
|
||||
model: ollama-cloud/devstral-2:123b
|
||||
color: "#581C87"
|
||||
permission:
|
||||
|
||||
502
.kilo/capability-index.yaml
Normal file
@@ -0,0 +1,502 @@
|
||||
# Capability Index
|
||||
# Maps agent capabilities for orchestrator routing
|
||||
|
||||
agents:
|
||||
# Core Development
|
||||
lead-developer:
|
||||
capabilities:
|
||||
- code_writing
|
||||
- refactoring
|
||||
- bug_fixing
|
||||
- implementation
|
||||
receives:
|
||||
- tests
|
||||
- specifications
|
||||
- architecture_docs
|
||||
produces:
|
||||
- code
|
||||
- documentation_inline
|
||||
forbidden:
|
||||
- test_writing
|
||||
- code_review
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
mode: subagent
|
||||
|
||||
frontend-developer:
|
||||
capabilities:
|
||||
- ui_implementation
|
||||
- component_creation
|
||||
- styling
|
||||
- responsive_design
|
||||
receives:
|
||||
- designs
|
||||
- wireframes
|
||||
- api_endpoints
|
||||
produces:
|
||||
- vue_components
|
||||
- css_styles
|
||||
- frontend_tests
|
||||
forbidden:
|
||||
- backend_code
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
mode: subagent
|
||||
|
||||
backend-developer:
|
||||
capabilities:
|
||||
- api_development
|
||||
- database_design
|
||||
- server_logic
|
||||
- authentication
|
||||
receives:
|
||||
- api_specifications
|
||||
- database_requirements
|
||||
produces:
|
||||
- express_routes
|
||||
- database_schema
|
||||
- api_documentation
|
||||
forbidden:
|
||||
- frontend_code
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
mode: subagent
|
||||
|
||||
# Quality Assurance
|
||||
sdet-engineer:
|
||||
capabilities:
|
||||
- unit_tests
|
||||
- integration_tests
|
||||
- e2e_tests
|
||||
- test_planning
|
||||
- visual_regression
|
||||
receives:
|
||||
- code
|
||||
- requirements
|
||||
produces:
|
||||
- test_files
|
||||
- test_reports
|
||||
- coverage_reports
|
||||
forbidden:
|
||||
- implementation_code
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
mode: subagent
|
||||
|
||||
code-skeptic:
|
||||
capabilities:
|
||||
- code_review
|
||||
- security_review
|
||||
- style_check
|
||||
- issue_identification
|
||||
receives:
|
||||
- code
|
||||
produces:
|
||||
- review_comments
|
||||
- approval_status
|
||||
- issue_list
|
||||
forbidden:
|
||||
- suggest_implementations
|
||||
- write_code
|
||||
model: ollama-cloud/minimax-m2.5
|
||||
mode: subagent
|
||||
|
||||
# Security & Performance
|
||||
security-auditor:
|
||||
capabilities:
|
||||
- vulnerability_scan
|
||||
- owasp_check
|
||||
- secret_detection
|
||||
- auth_review
|
||||
receives:
|
||||
- code
|
||||
- configuration
|
||||
produces:
|
||||
- security_report
|
||||
- vulnerability_list
|
||||
forbidden:
|
||||
- fix_vulnerabilities
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
performance-engineer:
|
||||
capabilities:
|
||||
- performance_analysis
|
||||
- n_plus_one_detection
|
||||
- memory_leak_check
|
||||
- algorithm_analysis
|
||||
receives:
|
||||
- code
|
||||
- performance_requirements
|
||||
produces:
|
||||
- performance_report
|
||||
- optimization_suggestions
|
||||
forbidden:
|
||||
- write_code
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
# Specialized Development
|
||||
browser-automation:
|
||||
capabilities:
|
||||
- e2e_browser_tests
|
||||
- form_filling
|
||||
- navigation_testing
|
||||
- screenshot_capture
|
||||
receives:
|
||||
- test_scenarios
|
||||
- url_list
|
||||
produces:
|
||||
- test_results
|
||||
- screenshots
|
||||
forbidden:
|
||||
- unit_testing
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
mode: subagent
|
||||
|
||||
visual-tester:
|
||||
capabilities:
|
||||
- visual_regression
|
||||
- pixel_comparison
|
||||
- screenshot_diff
|
||||
- ui_validation
|
||||
receives:
|
||||
- baseline_screenshots
|
||||
- new_screenshots
|
||||
produces:
|
||||
- diff_report
|
||||
- visual_issues
|
||||
forbidden:
|
||||
- code_changes
|
||||
model: ollama-cloud/qwen3-coder:480b
|
||||
mode: subagent
|
||||
|
||||
# Analysis & Design
|
||||
system-analyst:
|
||||
capabilities:
|
||||
- architecture_design
|
||||
- api_specification
|
||||
- database_modeling
|
||||
- technical_documentation
|
||||
receives:
|
||||
- requirements
|
||||
- user_stories
|
||||
produces:
|
||||
- architecture_docs
|
||||
- api_specs
|
||||
- database_schemas
|
||||
forbidden:
|
||||
- implementation
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
requirement-refiner:
|
||||
capabilities:
|
||||
- requirement_analysis
|
||||
- user_story_creation
|
||||
- acceptance_criteria
|
||||
- clarification
|
||||
receives:
|
||||
- raw_requests
|
||||
- feature_ideas
|
||||
produces:
|
||||
- user_stories
|
||||
- acceptance_criteria
|
||||
- requirements_doc
|
||||
forbidden:
|
||||
- design_decisions
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
history-miner:
|
||||
capabilities:
|
||||
- git_search
|
||||
- duplicate_detection
|
||||
- past_solution_finder
|
||||
- pattern_identification
|
||||
receives:
|
||||
- search_query
|
||||
- issue_description
|
||||
produces:
|
||||
- commit_list
|
||||
- duplicate_report
|
||||
- related_files
|
||||
forbidden:
|
||||
- code_changes
|
||||
model: ollama-cloud/glm-5
|
||||
mode: subagent
|
||||
|
||||
capability-analyst:
|
||||
capabilities:
|
||||
- gap_analysis
|
||||
- capability_mapping
|
||||
- recommendation_generation
|
||||
- coverage_analysis
|
||||
receives:
|
||||
- task_requirements
|
||||
produces:
|
||||
- analysis_report
|
||||
- recommendations
|
||||
- new_agent_specs
|
||||
forbidden:
|
||||
- implementation
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
# Process Management
|
||||
orchestrator:
|
||||
capabilities:
|
||||
- task_routing
|
||||
- state_management
|
||||
- agent_coordination
|
||||
- workflow_execution
|
||||
receives:
|
||||
- issue
|
||||
- status_change
|
||||
produces:
|
||||
- routing_decisions
|
||||
- status_updates
|
||||
forbidden:
|
||||
- code_writing
|
||||
- code_review
|
||||
model: ollama-cloud/glm-5
|
||||
mode: primary
|
||||
|
||||
release-manager:
|
||||
capabilities:
|
||||
- git_operations
|
||||
- version_management
|
||||
- changelog_creation
|
||||
- deployment
|
||||
receives:
|
||||
- approved_code
|
||||
- release_request
|
||||
produces:
|
||||
- commits
|
||||
- tags
|
||||
- releases
|
||||
forbidden:
|
||||
- code_changes
|
||||
- feature_development
|
||||
model: ollama-cloud/devstral-2:123b
|
||||
mode: subagent
|
||||
|
||||
evaluator:
|
||||
capabilities:
|
||||
- performance_scoring
|
||||
- process_analysis
|
||||
- pattern_identification
|
||||
- improvement_recommendations
|
||||
receives:
|
||||
- completed_issue
|
||||
- agent_logs
|
||||
produces:
|
||||
- performance_report
|
||||
- scores
|
||||
- recommendations
|
||||
forbidden:
|
||||
- code_changes
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
prompt-optimizer:
|
||||
capabilities:
|
||||
- prompt_analysis
|
||||
- prompt_improvement
|
||||
- failure_pattern_detection
|
||||
receives:
|
||||
- low_scores
|
||||
- failure_reports
|
||||
produces:
|
||||
- improved_prompts
|
||||
- optimization_report
|
||||
forbidden:
|
||||
- agent_creation
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
# Fixes
|
||||
the-fixer:
|
||||
capabilities:
|
||||
- bug_fixing
|
||||
- issue_resolution
|
||||
- code_correction
|
||||
receives:
|
||||
- issue_list
|
||||
- code_context
|
||||
produces:
|
||||
- code_fixes
|
||||
- resolution_notes
|
||||
forbidden:
|
||||
- feature_development
|
||||
model: ollama-cloud/minimax-m2.5
|
||||
mode: subagent
|
||||
|
||||
# Product Management
|
||||
product-owner:
|
||||
capabilities:
|
||||
- issue_management
|
||||
- prioritization
|
||||
- backlog_management
|
||||
- workflow_completion
|
||||
receives:
|
||||
- completed_work
|
||||
- stakeholder_requests
|
||||
produces:
|
||||
- priority_order
|
||||
- issue_labels
|
||||
- issue_closures
|
||||
forbidden:
|
||||
- implementation
|
||||
model: ollama-cloud/glm-5
|
||||
mode: subagent
|
||||
|
||||
# Workflow
|
||||
workflow-architect:
|
||||
capabilities:
|
||||
- workflow_design
|
||||
- process_definition
|
||||
- automation_setup
|
||||
receives:
|
||||
- workflow_requirements
|
||||
produces:
|
||||
- workflow_definitions
|
||||
- command_files
|
||||
forbidden:
|
||||
- execution
|
||||
model: ollama-cloud/glm-5
|
||||
mode: subagent
|
||||
|
||||
# Validation
|
||||
markdown-validator:
|
||||
capabilities:
|
||||
- markdown_validation
|
||||
- formatting_check
|
||||
- link_validation
|
||||
receives:
|
||||
- markdown_files
|
||||
produces:
|
||||
- validation_report
|
||||
- corrections
|
||||
forbidden:
|
||||
- content_creation
|
||||
model: ollama-cloud/glm-5
|
||||
mode: subagent
|
||||
|
||||
agent-architect:
|
||||
capabilities:
|
||||
- agent_design
|
||||
- prompt_engineering
|
||||
- capability_definition
|
||||
receives:
|
||||
- agent_requirements
|
||||
produces:
|
||||
- agent_definition
|
||||
- integration_plan
|
||||
forbidden:
|
||||
- agent_execution
|
||||
model: ollama-cloud/gpt-oss:120b
|
||||
mode: subagent
|
||||
|
||||
# Capability Routing Map
|
||||
capability_routing:
|
||||
code_writing: lead-developer
|
||||
code_review: code-skeptic
|
||||
test_writing: sdet-engineer
|
||||
architecture: system-analyst
|
||||
security: security-auditor
|
||||
performance: performance-engineer
|
||||
bug_fixing: the-fixer
|
||||
git_operations: release-manager
|
||||
ui_implementation: frontend-developer
|
||||
api_development: backend-developer
|
||||
e2e_testing: browser-automation
|
||||
visual_testing: visual-tester
|
||||
requirement_analysis: requirement-refiner
|
||||
gap_analysis: capability-analyst
|
||||
issue_management: product-owner
|
||||
prompt_optimization: prompt-optimizer
|
||||
workflow_design: workflow-architect
|
||||
scoring: evaluator
|
||||
duplicate_detection: history-miner
|
||||
agent_design: agent-architect
|
||||
markdown_validation: markdown-validator
|
||||
|
||||
# Parallelizable Tasks
|
||||
parallel_groups:
|
||||
review_phase:
|
||||
- security-auditor
|
||||
- performance-engineer
|
||||
- code-skeptic
|
||||
testing_phase:
|
||||
- sdet-engineer
|
||||
- browser-automation
|
||||
- visual-tester
|
||||
|
||||
# Evaluator-Optimizer Patterns
|
||||
iteration_loops:
|
||||
code_review:
|
||||
evaluator: code-skeptic
|
||||
optimizer: the-fixer
|
||||
max_iterations: 3
|
||||
convergence: all_issues_resolved
|
||||
|
||||
security_review:
|
||||
evaluator: security-auditor
|
||||
optimizer: the-fixer
|
||||
max_iterations: 2
|
||||
convergence: no_critical_vulnerabilities
|
||||
|
||||
performance_review:
|
||||
evaluator: performance-engineer
|
||||
optimizer: the-fixer
|
||||
max_iterations: 2
|
||||
convergence: all_perf_issues_resolved
|
||||
|
||||
# Quality Gates
|
||||
quality_gates:
|
||||
requirements:
|
||||
- user_stories_defined
|
||||
- acceptance_criteria_complete
|
||||
- technical_constraints_documented
|
||||
|
||||
architecture:
|
||||
- schema_valid
|
||||
- endpoints_documented
|
||||
- tech_stack_decided
|
||||
|
||||
implementation:
|
||||
- build_success
|
||||
- no_type_errors
|
||||
- no_lint_errors
|
||||
|
||||
testing:
|
||||
- coverage_gte_80
|
||||
- all_tests_pass
|
||||
- no_critical_bugs
|
||||
|
||||
review:
|
||||
- no_critical_issues
|
||||
- no_security_vulnerabilities
|
||||
- performance_acceptable
|
||||
|
||||
docker:
|
||||
- build_success
|
||||
- health_check_pass
|
||||
- size_under_limit
|
||||
|
||||
documentation:
|
||||
- readme_complete
|
||||
- api_docs_complete
|
||||
- deployment_guide_complete
|
||||
|
||||
# State Transitions
|
||||
workflow_states:
|
||||
new: [planned]
|
||||
planned: [researching]
|
||||
researching: [designed]
|
||||
designed: [testing]
|
||||
testing: [implementing]
|
||||
implementing: [reviewing]
|
||||
reviewing: [fixing, perf_check]
|
||||
fixing: [reviewing]
|
||||
perf_check: [security_check]
|
||||
security_check: [releasing]
|
||||
releasing: [evaluated]
|
||||
evaluated: [completed]
|
||||
425
IMPROVEMENT_PROPOSAL.md
Normal file
@@ -0,0 +1,425 @@
|
||||
# Multi-Agent System Improvement Proposal
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Based on research from Anthropic's "Building Effective Agents" and Kilo.ai documentation, this proposal outlines improvements to the APAW multi-agent architecture for better development outcomes.
|
||||
|
||||
**Current State:** 22 agents, 18 commands, 12 skills
|
||||
**Issues:** Mode confusion, serial execution, overlapping capabilities
|
||||
**Goal:** Optimize for efficiency, maintainability, and quality
|
||||
|
||||
---
|
||||
|
||||
## Analysis Findings
|
||||
|
||||
### 1. Agent Inventory
|
||||
|
||||
| Agent | Mode | Role | Issues |
|
||||
|-------|------|------|--------|
|
||||
| orchestrator | all | Dispatcher | ✅ Correct |
|
||||
| capability-analyst | subagent | Gap analysis | ✅ Correct |
|
||||
| history-miner | subagent | Git search | ✅ Correct |
|
||||
| requirement-refiner | subagent | User stories | ✅ Correct |
|
||||
| system-analyst | subagent | Architecture | ✅ Correct |
|
||||
| sdet-engineer | subagent | Test writing | ✅ Correct |
|
||||
| lead-developer | all | Code writing | ⚠️ Should be subagent |
|
||||
| frontend-developer | subagent | UI implementation | ✅ Correct |
|
||||
| backend-developer | subagent | Node/Express/APIs | ✅ Correct |
|
||||
| workflow-architect | subagent | Create workflows | ✅ Correct |
|
||||
| code-skeptic | all | Adversarial review | ⚠️ Should be subagent |
|
||||
| the-fixer | subagent | Bug fixes | ✅ Correct |
|
||||
| performance-engineer | subagent | Performance review | ✅ Correct |
|
||||
| security-auditor | subagent | Security audit | ✅ Correct |
|
||||
| release-manager | all | Git operations | ⚠️ Should be subagent |
|
||||
| evaluator | all | Scoring | ⚠️ Should be subagent |
|
||||
| prompt-optimizer | subagent | Optimize prompts | ✅ Correct |
|
||||
| product-owner | subagent | Issue management | ✅ Correct |
|
||||
| visual-tester | subagent | Visual regression | ✅ Correct |
|
||||
| browser-automation | subagent | E2E testing | ✅ Correct |
|
||||
| markdown-validator | subagent | Markdown validation | ✅ Correct |
|
||||
| agent-architect | subagent | Create agents | ✅ Correct |
|
||||
|
||||
### 2. Issue Summary
|
||||
|
||||
| Issue | Severity | Impact |
|
||||
|-------|----------|--------|
|
||||
| Mode confusion (all vs subagent) | Medium | Context pollution |
|
||||
| Serial execution of independent tasks | High | Slower execution |
|
||||
| No parallelization pattern | High | Latency overhead |
|
||||
| Overlapping agent roles | Low | Maintenance overhead |
|
||||
| Quality gates not enforced | Medium | Quality variance |
|
||||
|
||||
---
|
||||
|
||||
## Proposed Improvements
|
||||
|
||||
### Improvement 1: Normalize Agent Modes
|
||||
|
||||
**Problem:** Four agents (`lead-developer`, `code-skeptic`, `release-manager`, `evaluator`) use `mode: all` but are conceptually subagents that should run in isolated contexts.
|
||||
|
||||
**Solution:** Change all specialized agents to `mode: subagent`:
|
||||
|
||||
```yaml
|
||||
# Before
lead-developer:
  mode: all

# After
lead-developer:
  mode: subagent
|
||||
```
|
||||
|
||||
**Files to Update:**
|
||||
- `.kilo/agents/lead-developer.md`
|
||||
- `.kilo/agents/code-skeptic.md`
|
||||
- `.kilo/agents/release-manager.md`
|
||||
- `.kilo/agents/evaluator.md`
|
||||
|
||||
**Rationale:** Subagent mode provides:
|
||||
- Isolated context
|
||||
- Clear input/output contracts
|
||||
- Better token efficiency
|
||||
- Prevents context pollution
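
To find any remaining agents that still declare `mode: all`, the front matter of each agent file can be scanned. The following is a minimal sketch, assuming each file under `.kilo/agents/` starts with a `---` delimited front-matter block of simple `key: value` lines; the helper names `read_mode` and `agents_with_mode` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches a leading front-matter block delimited by '---' lines.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)


def read_mode(markdown_text: str) -> Optional[str]:
    """Return the value of the `mode:` key in the front matter, if present."""
    match = FRONT_MATTER.match(markdown_text)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "mode":
            return value.strip()
    return None


def agents_with_mode(agent_dir: Path, mode: str) -> List[str]:
    """List agent files in agent_dir whose front matter declares `mode`."""
    return sorted(
        path.name
        for path in agent_dir.glob("*.md")
        if read_mode(path.read_text(encoding="utf-8")) == mode
    )
```

Calling `agents_with_mode(Path(".kilo/agents"), "all")` before and after the change gives a quick check that the four files listed above were updated.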
|
||||
|
||||
---
|
||||
|
||||
### Improvement 2: Implement Parallelization Pattern
|
||||
|
||||
**Problem:** Security and performance reviews run serially but are independent.
|
||||
|
||||
**Solution:** Use orchestrator-workers pattern for parallel execution:
|
||||
|
||||
```python
|
||||
import asyncio


async def execute_parallel_reviews():
    """Run security and performance reviews in parallel"""
    # Task stands for the orchestrator's subagent-dispatch call and is
    # assumed here to return an awaitable; it is not defined in this proposal.
    security, performance = await asyncio.gather(
        Task(subagent_type="security-auditor", prompt="..."),
        Task(subagent_type="performance-engineer", prompt="..."),
    )

    # Collect all issues
    all_issues = [
        *security.security_issues,
        *performance.performance_issues,
    ]

    if all_issues:
        # Hand the aggregated findings to the fixer
        return await Task(subagent_type="the-fixer", issues=all_issues)
    return None  # both reviews passed
|
||||
```
|
||||
|
||||
**New Workflow Step:**
|
||||
|
||||
```markdown
|
||||
## Step 6: Parallel Review
|
||||
|
||||
**Agents**: `@security-auditor`, `@performance-engineer` (parallel)
|
||||
|
||||
1. Launch both agents simultaneously
|
||||
2. Wait for both results
|
||||
3. Aggregate findings
|
||||
4. If issues found → send to `@the-fixer`
|
||||
5. If all pass → proceed to release
|
||||
```
|
||||
|
||||
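**Rationale:** Anthropic's "Building Effective Agents" recommends parallelization when subtasks are independent. Run concurrently, the two reviews take as long as the slower one rather than the sum of both, so if their durations are similar this step's latency drops by roughly 50%.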
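
The latency argument can be checked with a toy timing comparison. This is only a sketch: `fake_review` is a hypothetical stand-in that sleeps instead of calling a real subagent, and the 0.2 second durations are arbitrary.

```python
import asyncio
import time
from typing import Tuple


async def fake_review(name: str, seconds: float) -> str:
    """Stand-in for a subagent call; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return name


async def run_serial() -> float:
    """Run both reviews one after the other and return the elapsed time."""
    start = time.perf_counter()
    await fake_review("security-auditor", 0.2)
    await fake_review("performance-engineer", 0.2)
    return time.perf_counter() - start


async def run_parallel() -> float:
    """Run both reviews concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(
        fake_review("security-auditor", 0.2),
        fake_review("performance-engineer", 0.2),
    )
    return time.perf_counter() - start


def compare() -> Tuple[float, float]:
    """Return (serial_seconds, parallel_seconds)."""
    return asyncio.run(run_serial()), asyncio.run(run_parallel())
```

With equal durations the parallel run takes about half as long; when one review is much slower than the other, the saving shrinks toward the duration of the faster review.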
|
||||
|
||||
---
|
||||
|
||||
### Improvement 3: Evaluator-Optimizer Pattern
|
||||
|
||||
**Problem:** The code review loop is informal: the `code-skeptic` → `the-fixer` handoff lacks structured iteration.
|
||||
|
||||
**Solution:** Formalize as evaluator-optimizer pattern:
|
||||
|
||||
```yaml
|
||||
# New agent definition
code-skeptic:
  role: evaluator
  outputs:
    - verdict: APPROVED | REQUEST_CHANGES
    - issues: List[Issue]
    - severity: critical | high | medium | low

the-fixer:
  role: optimizer
  inputs:
    - issues: List[Issue]
    - code: CodeContext
  outputs:
    - changes: List[Change]
    - resolution_notes: List[str]

# Iteration loop
max_iterations: 3
convergence_criteria: all_issues_resolved OR max_iterations_reached
|
||||
```
|
||||
|
||||
**Implementation:**
|
||||
|
||||
```python
|
||||
MAX_ITERATIONS = 3


def review_loop(issue_number, code_context):
    """Evaluator-Optimizer pattern for code review"""
    # task, apply_fixes and post_comment are orchestrator helpers that are
    # assumed to exist; they are not defined in this proposal.
    for _ in range(MAX_ITERATIONS):
        # Evaluator reviews
        review = task(subagent_type="code-skeptic", code=code_context)

        if review.verdict == "APPROVED":
            return review

        # Optimizer fixes
        fix = task(
            subagent_type="the-fixer",
            issues=review.issues,
            code=code_context,
        )

        code_context = apply_fixes(code_context, fix.changes)

    # Escalate if not resolved
    post_comment(issue_number, "⚠️ Max iterations reached, manual review needed")
    return None
|
||||
```
|
||||
|
||||
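**Rationale:** Capping the number of iterations bounds the loop. It does not guarantee convergence: if the reviewer still requests changes after the last pass, the issue is escalated for manual review instead of cycling indefinitely.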
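
The bounded loop can be exercised without any real agents. The sketch below is a variant of the loop that re-reviews after the final fix so the caller knows whether escalation is needed; `Review`, `run_loop`, `toy_evaluate`, and `toy_fix` are hypothetical names, and the "issue" model (any remaining `TODO` marker) is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    verdict: str  # "APPROVED" or "REQUEST_CHANGES"
    issues: List[str] = field(default_factory=list)


def run_loop(
    code: str,
    evaluate: Callable[[str], Review],
    fix: Callable[[str, List[str]], str],
    max_iterations: int = 3,
) -> Tuple[str, Review, bool]:
    """Return (final_code, last_review, escalated)."""
    review = evaluate(code)
    for _ in range(max_iterations):
        if review.verdict == "APPROVED":
            return code, review, False
        code = fix(code, review.issues)
        review = evaluate(code)  # re-review after every fix pass
    # Out of fix passes: escalate only if the last review still fails.
    return code, review, review.verdict != "APPROVED"


def toy_evaluate(code: str) -> Review:
    """Toy evaluator: every TODO marker left in the code is one issue."""
    count = code.count("TODO")
    verdict = "APPROVED" if count == 0 else "REQUEST_CHANGES"
    return Review(verdict, ["TODO"] * count)


def toy_fix(code: str, issues: List[str]) -> str:
    """Toy optimizer: resolves a single issue per pass."""
    return code.replace("TODO", "done", 1)
```

With two markers the loop converges within the budget; with five markers and `max_iterations=3` it stops with two issues left and reports `escalated=True`, which is the point at which the proposal posts a comment for manual review.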
|
||||
|
||||
---
|
||||
|
||||
### Improvement 4: Quality Gate Enforcement
|
||||
|
||||
**Problem:** Workflow defines quality gates but agents don't enforce them.
|
||||
|
||||
**Solution:** Add gate validation to each agent:
|
||||
|
||||
```yaml
|
||||
# Add to each agent definition
gates:
  preconditions:
    - files_exist: true
    - tests_pass: true
  postconditions:
    - build_succeeds: true
    - coverage_met: true
    - no_critical_issues: true
|
||||
```
|
||||
|
||||
**Implementation in Workflow:**
|
||||
|
||||
```python
|
||||
class GateError(Exception):
    """Raised when a quality gate fails."""


def validate_gate(agent_name, gate_name, artifacts):
    """Validate quality gate before proceeding"""

    # Check names follow quality_gates in .kilo/capability-index.yaml
    gates = {
        "requirements": ["user_stories_defined", "acceptance_criteria_complete"],
        "architecture": ["schema_valid", "endpoints_documented"],
        "implementation": ["build_success", "no_type_errors"],
        "testing": ["coverage_gte_80", "all_tests_pass"],
        "review": ["no_critical_issues", "no_security_vulnerabilities"],
        "docker": ["build_success", "health_check_pass"],
    }

    gate_checks = gates[gate_name]
    # run_checks is assumed to be provided by the quality-gates skill and to
    # return an object with `all_passed` and `failed` attributes.
    results = run_checks(gate_checks, artifacts)

    if not results.all_passed:
        raise GateError(
            f"Gate {gate_name} failed for {agent_name}: {results.failed}"
        )

    return results
|
||||
```
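
`validate_gate` relies on a `run_checks` helper that the proposal leaves undefined. One possible shape is sketched below, assuming the check names from `quality_gates` in `.kilo/capability-index.yaml`; the `GateResults` class, the `CHECKS` registry, and the artifact keys (`build_exit_code`, `tests_failed`, `coverage_percent`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GateResults:
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        return not self.failed


# Hypothetical registry: each check inspects a dict of collected artifacts.
CHECKS: Dict[str, Callable[[dict], bool]] = {
    "build_success": lambda a: a.get("build_exit_code") == 0,
    "all_tests_pass": lambda a: a.get("tests_failed", 1) == 0,
    "coverage_gte_80": lambda a: a.get("coverage_percent", 0) >= 80,
}


def run_checks(check_names: List[str], artifacts: dict) -> GateResults:
    """Run the named checks and sort them into passed and failed."""
    results = GateResults()
    for name in check_names:
        check = CHECKS.get(name)
        # An unregistered check counts as failed rather than silently passing.
        ok = check is not None and check(artifacts)
        (results.passed if ok else results.failed).append(name)
    return results
```

Treating unknown checks as failures is a deliberate choice in this sketch: a gate that names a check nobody implemented should block the workflow instead of passing by default.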
|
||||
|
||||
---
|
||||
|
||||
### Improvement 5: Agent Capability Consolidation
|
||||
|
||||
**Problem:** Some agents have overlapping capabilities.
|
||||
|
||||
**Solution:** Merge and clarify responsibilities:
|
||||
|
||||
| Merge From | Merge To | Rationale |
|
||||
|------------|----------|-----------|
|
||||
| browser-automation | sdet-engineer | E2E testing is SDET domain |
|
||||
| markdown-validator | requirement-refiner | Validation is refiner's job |
|
||||
|
||||
**New SDET Engineer Capabilities:**
|
||||
|
||||
```yaml
|
||||
sdet-engineer:
  capabilities:
    - unit_tests
    - integration_tests
    - e2e_tests:
        tool: playwright
        browser: [chromium, firefox, webkit]
    - visual_regression:
        tool: pixelmatch
        threshold: 0.1
|
||||
```
|
||||
|
||||
**Rationale:** Reduces agent count while maintaining coverage. Browser automation is a capability of SDET, not a separate agent.
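
Overlaps can also be detected mechanically from the capability lists. The following sketch inverts an agent-to-capabilities mapping and reports capabilities claimed by more than one agent; `find_overlaps` is a hypothetical helper, and `AGENTS` is a hand-copied excerpt of four entries from `.kilo/capability-index.yaml` rather than a parsed file.

```python
from collections import defaultdict
from typing import Dict, List


def find_overlaps(agents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each capability claimed by more than one agent to those agents."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for agent, capabilities in agents.items():
        for capability in capabilities:
            owners[capability].append(agent)
    return {
        capability: sorted(names)
        for capability, names in owners.items()
        if len(names) > 1
    }


# Excerpt of the capability lists in .kilo/capability-index.yaml
AGENTS = {
    "lead-developer": ["code_writing", "refactoring", "bug_fixing", "implementation"],
    "the-fixer": ["bug_fixing", "issue_resolution", "code_correction"],
    "sdet-engineer": [
        "unit_tests", "integration_tests", "e2e_tests",
        "test_planning", "visual_regression",
    ],
    "visual-tester": [
        "visual_regression", "pixel_comparison",
        "screenshot_diff", "ui_validation",
    ],
}

overlaps = find_overlaps(AGENTS)
```

This only catches exact name matches. Near-duplicates such as `e2e_tests` and `e2e_browser_tests` would still need a manual review or a naming convention.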
|
||||
|
||||
---
|
||||
|
||||
### Improvement 6: Add Capability Index
|
||||
|
||||
**Problem:** No central registry of what each agent can do.
|
||||
|
||||
**Solution:** Create capability index for orchestrator:
|
||||
|
||||
```yaml
|
||||
# .kilo/capability-index.yaml

agents:
  lead-developer:
    capabilities:
      - code_writing
      - refactoring
      - bug_fixing
    receives:
      - tests
      - specifications
    produces:
      - code
      - documentation

  code-skeptic:
    capabilities:
      - code_review
      - security_review
      - style_review
    receives:
      - code
    produces:
      - review_comments
      - approval_status
    forbidden:
      - suggest_implementations
|
||||
```
|
||||
|
||||
**Usage in Orchestrator:**
|
||||
|
||||
```python
|
||||
def route_task(task_type: str) -> str:
    """Route task to appropriate agent based on capability"""

    capability_map = {
        "code_writing": "lead-developer",
        "code_review": "code-skeptic",
        "test_writing": "sdet-engineer",
        "architecture": "system-analyst",
        "security": "security-auditor",
        "performance": "performance-engineer",
    }

    # Unknown task types fall back to the orchestrator
    return capability_map.get(task_type, "orchestrator")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Improvement 7: Workflow State Machine Enforcement
|
||||
|
||||
**Problem:** Workflow state machine is documented but not enforced.
|
||||
|
||||
**Solution:** Add explicit state transitions:
|
||||
|
||||
```python
|
||||
# State machine definition
from enum import Enum
from typing import Dict, List


class InvalidTransition(Exception):
    """Raised when a requested state change is not listed in TRANSITIONS."""


class WorkflowState(Enum):
    NEW = "new"
    PLANNED = "planned"
    RESEARCHING = "researching"
    DESIGNED = "designed"
    TESTING = "testing"
    IMPLEMENTING = "implementing"
    REVIEWING = "reviewing"
    FIXING = "fixing"
    PERF_CHECK = "perf-check"
    SECURITY_CHECK = "security-check"
    RELEASING = "releasing"
    EVALUATED = "evaluated"
    COMPLETED = "completed"


# Valid transitions (COMPLETED is terminal and has no outgoing transitions)
TRANSITIONS: Dict[WorkflowState, List[WorkflowState]] = {
    WorkflowState.NEW: [WorkflowState.PLANNED],
    WorkflowState.PLANNED: [WorkflowState.RESEARCHING],
    WorkflowState.RESEARCHING: [WorkflowState.DESIGNED],
    WorkflowState.DESIGNED: [WorkflowState.TESTING],
    WorkflowState.TESTING: [WorkflowState.IMPLEMENTING],
    WorkflowState.IMPLEMENTING: [WorkflowState.REVIEWING],
    WorkflowState.REVIEWING: [WorkflowState.FIXING, WorkflowState.PERF_CHECK],
    WorkflowState.FIXING: [WorkflowState.REVIEWING],
    WorkflowState.PERF_CHECK: [WorkflowState.SECURITY_CHECK],
    WorkflowState.SECURITY_CHECK: [WorkflowState.RELEASING],
    WorkflowState.RELEASING: [WorkflowState.EVALUATED],
    WorkflowState.EVALUATED: [WorkflowState.COMPLETED],
}


def transition(current: WorkflowState, next_state: WorkflowState) -> bool:
    """Validate state transition"""
    valid_next = TRANSITIONS.get(current, [])
    if next_state not in valid_next:
        raise InvalidTransition(f"Cannot go from {current} to {next_state}")
    return True
|
||||
```
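
Before enforcing the transitions, it may be worth checking that the graph itself has no dead ends. The sketch below runs a breadth-first search over the `workflow_states` mapping copied from `.kilo/capability-index.yaml`; `reachable` and `dead_ends` are hypothetical helper names.

```python
from collections import deque
from typing import Dict, List, Set

# Copied from workflow_states in .kilo/capability-index.yaml
WORKFLOW_STATES: Dict[str, List[str]] = {
    "new": ["planned"],
    "planned": ["researching"],
    "researching": ["designed"],
    "designed": ["testing"],
    "testing": ["implementing"],
    "implementing": ["reviewing"],
    "reviewing": ["fixing", "perf_check"],
    "fixing": ["reviewing"],
    "perf_check": ["security_check"],
    "security_check": ["releasing"],
    "releasing": ["evaluated"],
    "evaluated": ["completed"],
}


def reachable(start: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every state reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def dead_ends(graph: Dict[str, List[str]], terminal: str = "completed") -> List[str]:
    """States reachable from 'new' that can never reach the terminal state."""
    return sorted(
        state
        for state in reachable("new", graph)
        if terminal not in reachable(state, graph)
    )
```

An empty result from `dead_ends(WORKFLOW_STATES)` means every state an issue can enter still has a path to `completed`, including the `reviewing` ↔ `fixing` cycle.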
|
||||
|
||||
---
|
||||
|
||||
## Implementation Priority
|
||||
|
||||
| Priority | Improvement | Effort | Impact |
|
||||
|----------|-------------|--------|--------|
|
||||
| P0 | Implement Parallelization | Medium | High |
|
||||
| P0 | Quality Gate Enforcement | Medium | High |
|
||||
| P1 | Normalize Agent Modes | Low | Medium |
|
||||
| P1 | Evaluator-Optimizer Pattern | Low | High |
|
||||
| P2 | Agent Consolidation | Medium | Low |
|
||||
| P2 | Capability Index | Low | Medium |
|
||||
| P3 | State Machine Enforcement | Medium | Medium |
|
||||
|
||||
---
|
||||
|
||||
## Files to Modify
|
||||
|
||||
### Must Modify
|
||||
|
||||
1. `.kilo/agents/lead-developer.md` - Change mode to `subagent`
|
||||
2. `.kilo/agents/code-skeptic.md` - Change mode to `subagent`
|
||||
3. `.kilo/agents/release-manager.md` - Change mode to `subagent`
|
||||
4. `.kilo/agents/evaluator.md` - Change mode to `subagent`
|
||||
5. `.kilo/commands/workflow.md` - Add parallel execution
|
||||
6. `.kilo/agents/orchestrator.md` - Add evaluator-optimizer pattern
|
||||
|
||||
### Must Create
|
||||
|
||||
1. `.kilo/capability-index.yaml` - Agent capabilities registry
|
||||
2. `.kilo/skills/quality-gates/SKILL.md` - Gate validation skill
|
||||
|
||||
---
|
||||
|
||||
## Expected Outcomes
|
||||
|
||||
| Metric | Before | After | Improvement |
|
||||
|--------|--------|-------|-------------|
|
||||
| Workflow duration | ~3 hours | ~2 hours | 33% faster |
|
||||
| Review iterations | 2-5 | 1-3 | 40% fewer |
|
||||
| Agent context pollution | High | Low | Isolated |
|
||||
| Quality gate failures | Manual | Automated | Consistent |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Apply this proposal as issues** - Create Gitea issues for each improvement
|
||||
2. **Run `/pipeline` for each** - Use existing pipeline to implement
|
||||
3. **Measure improvements** - Use evaluator to track effectiveness
|
||||
4. **Iterate** - Use prompt-optimizer to refine
|
||||
|
||||
---
|
||||
|
||||
*Generated by @capability-analyst based on Anthropic's "Building Effective Agents" research*
|
||||