---
description: Run evolution cycle - judge last workflow, optimize underperforming agents, re-test
---

# /evolution — Pipeline Evolution Command

Runs the automated evolution cycle on the most recent (or a specified) workflow.

## Usage

```
/evolution                   # evolve last completed workflow
/evolution --issue 42        # evolve workflow for issue #42
/evolution --agent planner   # focus evolution on one agent
/evolution --dry-run         # show what would change without applying
/evolution --history         # print fitness trend chart
/evolution --fitness         # run fitness evaluation (alias for /evolve)
```

## Aliases

- `/evolve` — same as `/evolution --fitness`
- `/evolution log` — log an agent model change to Gitea

## Execution

### Step 1: Judge (Fitness Evaluation)

```
Task(subagent_type: "pipeline-judge")
→ produces fitness report
```

### Step 2: Decide (Threshold Routing)

```
IF fitness >= 0.85:
    echo "✅ Pipeline healthy (fitness: {score}). No action needed."
    append to fitness-history.jsonl
    EXIT

ELSE IF fitness >= 0.70:
    echo "⚠ Pipeline marginal (fitness: {score}). Optimizing weak agents..."
    identify agents with lowest per-agent scores
    Task(subagent_type: "prompt-optimizer", target: weak_agents)

ELSE:  # fitness < 0.70
    echo "🔴 Pipeline underperforming (fitness: {score}). Major optimization..."
    Task(subagent_type: "prompt-optimizer", target: all_flagged_agents)
    IF fitness < 0.50:
        Task(subagent_type: "agent-architect", action: "redesign", target: worst_agent)
```

### Step 3: Re-test (After Optimization)

```
Re-run the SAME workflow with updated prompts
Task(subagent_type: "pipeline-judge") → fitness_after

IF fitness_after > fitness_before:
    commit prompt changes
    echo "📈 Fitness improved: {before} → {after}"
ELSE:
    revert prompt changes
    echo "📉 No improvement. Reverting."
```

### Step 4: Log

Append a record to `.kilo/logs/fitness-history.jsonl` (shown expanded here for readability; in the file each record occupies a single line):

```json
{
  "ts": "",
  "issue": ,
  "workflow": "",
  "fitness_before": ,
  "fitness_after": ,
  "agents_optimized": ["planner", "requirement-refiner"],
  "tokens_saved": ,
  "time_saved_ms": 
}
```

## Subcommands

### `log` — Log Model Change

Log an agent model improvement to Gitea and to the evolution data.

```bash
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"
```

Steps:

1. Read the current model from `.kilo/agents/{agent}.md`.
2. Get the previous model from `agent-evolution/data/agent-versions.json`.
3. Calculate the improvement (IF score, context window).
4. Write the change to the evolution data.
5. Post a Gitea comment.

### `report` — Generate Evolution Report

Generate a comprehensive report for a single agent or for all agents:

```bash
/evolution report            # all agents
/evolution report planner    # specific agent
```

Output includes:

- Total agents
- Model changes this month
- Average quality improvement
- Recent changes table
- Performance metrics
- Model distribution
- Recommendations

### `history` — Show Fitness Trend

Print the fitness trend chart:

```bash
/evolution --history
```

Output:

```
Fitness Trend (Last 30 days):

1.00 ┤
0.90 ┤     ╭─╮     ╭──╮
0.80 ┤   ╭─╯ ╰─╮ ╭─╯  ╰──╮
0.70 ┤ ╭─╯     ╰─╯       ╰──╮
0.60 ┤ │                    ╰─╮
0.50 ┼─┴──────────────────────┴──
      Apr 1  Apr 8  Apr 15  Apr 22  Apr 29

Avg fitness: 0.82
Trend: ↑ improving
```

### `recommend` — Get Model Recommendations

```bash
/evolution recommend
```

Shows:

- Agents with fitness < 0.70 (need optimization)
- Agents consuming > 30% of the token budget (bottlenecks)
- Model upgrade recommendations
- Priority order

## Data Storage

### fitness-history.jsonl

```jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"breakdown":{"test_pass_rate":0.95,"quality_gates_rate":0.80,"efficiency_score":0.65},"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47,"verdict":"PASS"}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"breakdown":{"test_pass_rate":1.00,"quality_gates_rate":0.80,"efficiency_score":0.88},"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47,"verdict":"PASS"}
```

### agent-versions.json

```json
{
  "version": "1.0",
  "agents": {
    "capability-analyst": {
      "current": {
        "model": "qwen/qwen3.6-plus:free",
        "provider": "openrouter",
        "if_score": 90,
        "quality_score": 79,
        "context_window": "1M"
      },
      "history": [
        {
          "date": "2026-04-05T22:20:00Z",
          "type": "model_change",
          "from": "ollama-cloud/nemotron-3-super",
          "to": "qwen/qwen3.6-plus:free",
          "rationale": "Better IF score, FREE via OpenRouter"
        }
      ]
    }
  }
}
```

## Integration Points

- **After `/pipeline`**: evaluator scores are logged
- **After a model update**: the evolution is logged
- **Weekly**: a performance report is generated
- **On request**: recommendations are provided

## Configuration

```yaml
# In capability-index.yaml
evolution:
  enabled: true
  auto_trigger: true          # trigger after every workflow
  fitness_threshold: 0.70     # below this → auto-optimize
  max_evolution_attempts: 3   # max retries per cycle
  fitness_history: .kilo/logs/fitness-history.jsonl
  token_budget_default: 50000
  time_budget_default: 300
```

## Metrics Tracked

| Metric | Source | Purpose |
|--------|--------|---------|
| Fitness Score | pipeline-judge | Overall pipeline health |
| Test Pass Rate | bun test | Code quality |
| Quality Gates | build/lint/typecheck | Standards compliance |
| Token Cost | pipeline logs | Resource efficiency |
| Wall-Clock Time | pipeline logs | Speed |
| Agent ROI | history analysis | Cost/benefit |

## Example Session

```bash
$ /evolution

## Pipeline Judgment: Issue #42

**Fitness: 0.82/1.00** [PASS]

| Metric | Value            | Weight | Contribution |
|--------|------------------|--------|--------------|
| Tests  | 95% (45/47)      | 50%    | 0.475        |
| Gates  | 80% (4/5)        | 25%    | 0.200        |
| Cost   | 38.4K tok / 245s | 25%    | 0.163        |

**Bottleneck:** lead-developer (31% of tokens)
**Verdict:** PASS - within acceptable range

✅ Logged to .kilo/logs/fitness-history.jsonl
```

---

*Evolution workflow v2.0 - Objective fitness scoring with pipeline-judge*
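The Step 2 threshold routing, combined with the weighted scoring suggested by the example session table, can be sketched in Python (3.9+). This is a minimal sketch under stated assumptions: fitness is taken to be a plain weighted sum of test pass rate (50%), quality-gate rate (25%) and efficiency score (25%), the weights shown in the example table. The real pipeline-judge may apply further adjustments, so this sketch will not necessarily reproduce logged fitness values. The names `weighted_fitness`, `route`, `Decision` and the action strings are hypothetical.

```python
from dataclasses import dataclass

# Weights from the example session table (Tests 50%, Gates 25%, Cost 25%).
# ASSUMPTION: fitness is a plain weighted sum; pipeline-judge may differ.
WEIGHTS = {"test_pass_rate": 0.50, "quality_gates_rate": 0.25, "efficiency_score": 0.25}

HEALTHY = 0.85   # >= this: no action, just log
MARGINAL = 0.70  # >= this: optimize weak agents only
CRITICAL = 0.50  # < this: also redesign the worst agent


def weighted_fitness(breakdown: dict[str, float]) -> float:
    """Combine per-component scores in [0, 1] into a single fitness value."""
    return sum(weight * breakdown[name] for name, weight in WEIGHTS.items())


@dataclass
class Decision:
    status: str
    actions: list[str]  # hypothetical action labels, one per Task() call


def route(fitness: float) -> Decision:
    """Map a fitness score onto the Step 2 branches."""
    if fitness >= HEALTHY:
        return Decision("healthy", ["log"])
    if fitness >= MARGINAL:
        return Decision("marginal", ["prompt-optimizer:weak_agents"])
    actions = ["prompt-optimizer:all_flagged_agents"]
    if fitness < CRITICAL:
        actions.append("agent-architect:redesign:worst_agent")
    return Decision("underperforming", actions)


if __name__ == "__main__":
    # Breakdown values copied from the issue #42 sample record.
    breakdown = {"test_pass_rate": 0.95, "quality_gates_rate": 0.80, "efficiency_score": 0.65}
    score = weighted_fitness(breakdown)
    print(f"{score:.4f}", route(score))
```

Returning the decision as data rather than dispatching tasks directly keeps the routing easy to test and makes a `--dry-run` mode trivial: print the actions instead of running them.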
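Because `fitness-history.jsonl` holds one JSON object per line, appending a record and summarizing the file for `--history` needs only the Python standard library. This is a minimal sketch, assuming each record carries a numeric `fitness` field as in the Data Storage samples. The trend rule (compare the mean of the newer half of the records with the older half) and the helper names `append_record`, `load_fitness` and `summarize` are hypothetical, not the command's documented behavior.

```python
import json
import tempfile
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single compact JSON line (JSONL)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text readable in the log
        f.write(json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n")


def load_fitness(path: Path) -> list[float]:
    """Return the `fitness` value of every non-empty line, oldest first."""
    scores: list[float] = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                scores.append(float(json.loads(line)["fitness"]))
    return scores


def summarize(scores: list[float]) -> tuple[float, str]:
    """Return (average fitness, trend label).

    HYPOTHETICAL trend rule: compare the mean of the newer half of the
    records with the mean of the older half.
    """
    if not scores:
        raise ValueError("no fitness records")
    avg = sum(scores) / len(scores)
    if len(scores) < 2:
        return avg, "→ flat"
    mid = len(scores) // 2
    older = sum(scores[:mid]) / mid
    newer = sum(scores[mid:]) / (len(scores) - mid)
    if newer > older:
        return avg, "↑ improving"
    if newer < older:
        return avg, "↓ declining"
    return avg, "→ flat"


if __name__ == "__main__":
    # Demo in a throwaway directory; records trimmed from the Data Storage samples.
    log = Path(tempfile.mkdtemp()) / "fitness-history.jsonl"
    append_record(log, {"ts": "2026-04-06T00:00:00Z", "issue": 42, "fitness": 0.82})
    append_record(log, {"ts": "2026-04-06T01:30:00Z", "issue": 43, "fitness": 0.91})
    avg, trend = summarize(load_fitness(log))
    print(f"Avg fitness: {avg:.2f}")
    print(f"Trend: {trend}")
```

Opening the file in append mode and writing one complete line per record means an interrupted run can at worst leave a truncated last line, while all earlier records stay intact.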