- Add pipeline-judge agent for objective fitness scoring - Update capability-index.yaml with pipeline-judge, evolution config - Add fitness-evaluation.md workflow for auto-optimization - Update evolution.md command with /evolve CLI - Create .kilo/logs/fitness-history.jsonl for metrics logging - Update AGENTS.md with new workflow state machine - Add 6 new issues to MILESTONE_ISSUES.md for evolution integration - Preserve ideas in agent-evolution/ideas/ Pipeline Judge computes fitness = (test_rate*0.5) + (gates*0.25) + (efficiency*0.25) Auto-triggers prompt-optimizer when fitness < 0.70
6.3 KiB
6.3 KiB
description
| description |
|---|
| Run evolution cycle - judge last workflow, optimize underperforming agents, re-test |
/evolution — Pipeline Evolution Command
Runs the automated evolution cycle on the most recent (or specified) workflow.
Usage
/evolution # evolve last completed workflow
/evolution --issue 42 # evolve workflow for issue #42
/evolution --agent planner # focus evolution on one agent
/evolution --dry-run # show what would change without applying
/evolution --history # print fitness trend chart
/evolution --fitness # run fitness evaluation (alias for /evolve)
Aliases
/evolve— same as/evolution --fitness/evolution log— log agent model change to Gitea
Execution
Step 1: Judge (Fitness Evaluation)
Task(subagent_type: "pipeline-judge")
→ produces fitness report
Step 2: Decide (Threshold Routing)
IF fitness >= 0.85:
echo "✅ Pipeline healthy (fitness: {score}). No action needed."
append to fitness-history.jsonl
EXIT
IF fitness >= 0.70:
echo "⚠ Pipeline marginal (fitness: {score}). Optimizing weak agents..."
identify agents with lowest per-agent scores
Task(subagent_type: "prompt-optimizer", target: weak_agents)
IF fitness < 0.70:
echo "🔴 Pipeline underperforming (fitness: {score}). Major optimization..."
Task(subagent_type: "prompt-optimizer", target: all_flagged_agents)
IF fitness < 0.50:
Task(subagent_type: "agent-architect", action: "redesign", target: worst_agent)
Step 3: Re-test (After Optimization)
Re-run the SAME workflow with updated prompts
Task(subagent_type: "pipeline-judge") → fitness_after
IF fitness_after > fitness_before:
commit prompt changes
echo "📈 Fitness improved: {before} → {after}"
ELSE:
revert prompt changes
echo "📉 No improvement. Reverting."
Step 4: Log
Append to .kilo/logs/fitness-history.jsonl:
{
"ts": "<now>",
"issue": <N>,
"workflow": "<type>",
"fitness_before": <score>,
"fitness_after": <score>,
"agents_optimized": ["planner", "requirement-refiner"],
"tokens_saved": <delta>,
"time_saved_ms": <delta>
}
Subcommands
log — Log Model Change
Log an agent model improvement to Gitea and evolution data.
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"
Steps:
- Read current model from
.kilo/agents/{agent}.md - Get previous model from
agent-evolution/data/agent-versions.json - Calculate improvement (IF score, context window)
- Write to evolution data
- Post Gitea comment
report — Generate Evolution Report
Generate comprehensive report for agent or all agents:
/evolution report # all agents
/evolution report planner # specific agent
Output includes:
- Total agents
- Model changes this month
- Average quality improvement
- Recent changes table
- Performance metrics
- Model distribution
- Recommendations
history — Show Fitness Trend
Print fitness trend chart:
/evolution --history
Output:
Fitness Trend (Last 30 days):
1.00 ┤
0.90 ┤ ╭─╮ ╭──╮
0.80 ┤ ╭─╯ ╰─╮ ╭─╯ ╰──╮
0.70 ┤ ╭─╯ ╰─╯ ╰──╮
0.60 ┤ │ ╰─╮
0.50 ┼─┴───────────────────────────┴──
Apr 1 Apr 8 Apr 15 Apr 22 Apr 29
Avg fitness: 0.82
Trend: ↑ improving
recommend — Get Model Recommendations
/evolution recommend
Shows:
- Agents with fitness < 0.70 (need optimization)
- Agents consuming > 30% of token budget (bottlenecks)
- Model upgrade recommendations
- Priority order
Data Storage
fitness-history.jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"breakdown":{"test_pass_rate":0.95,"quality_gates_rate":0.80,"efficiency_score":0.65},"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47,"verdict":"PASS"}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"breakdown":{"test_pass_rate":1.00,"quality_gates_rate":0.80,"efficiency_score":0.88},"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47,"verdict":"PASS"}
agent-versions.json
{
"version": "1.0",
"agents": {
"capability-analyst": {
"current": {
"model": "qwen/qwen3.6-plus:free",
"provider": "openrouter",
"if_score": 90,
"quality_score": 79,
"context_window": "1M"
},
"history": [
{
"date": "2026-04-05T22:20:00Z",
"type": "model_change",
"from": "ollama-cloud/nemotron-3-super",
"to": "qwen/qwen3.6-plus:free",
"rationale": "Better IF score, FREE via OpenRouter"
}
]
}
}
}
Integration Points
- After
/pipeline: Evaluator scores logged - After model update: Evolution logged
- Weekly: Performance report generated
- On request: Recommendations provided
Configuration
# In capability-index.yaml
evolution:
enabled: true
auto_trigger: true # trigger after every workflow
fitness_threshold: 0.70 # below this → auto-optimize
max_evolution_attempts: 3 # max retries per cycle
fitness_history: .kilo/logs/fitness-history.jsonl
token_budget_default: 50000
time_budget_default: 300
Metrics Tracked
| Metric | Source | Purpose |
|---|---|---|
| Fitness Score | pipeline-judge | Overall pipeline health |
| Test Pass Rate | bun test | Code quality |
| Quality Gates | build/lint/typecheck | Standards compliance |
| Token Cost | pipeline logs | Resource efficiency |
| Wall-Clock Time | pipeline logs | Speed |
| Agent ROI | history analysis | Cost/benefit |
Example Session
$ /evolution
## Pipeline Judgment: Issue #42
**Fitness: 0.82/1.00** [PASS]
| Metric | Value | Weight | Contribution |
|--------|-------|--------|-------------|
| Tests | 95% (45/47) | 50% | 0.475 |
| Gates | 80% (4/5) | 25% | 0.200 |
| Cost | 38.4K tok / 245s | 25% | 0.163 |
**Bottleneck:** lead-developer (31% of tokens)
**Verdict:** PASS - within acceptable range
✅ Logged to .kilo/logs/fitness-history.jsonl
Evolution workflow v2.0 - Objective fitness scoring with pipeline-judge