feat: add pipeline-judge agent and evolution workflow system

- Add pipeline-judge agent for objective fitness scoring - Update capability-index.yaml with pipeline-judge, evolution config - Add fitness-evaluation.md workflow for auto-optimization - Update evolution.md command with /evolve CLI - Create .kilo/logs/fitness-history.jsonl for metrics logging - Update AGENTS.md with new workflow state machine - Add 6 new issues to MILESTONE_ISSUES.md for evolution integration - Preserve ideas in agent-evolution/ideas/ Pipeline Judge computes fitness = (test_rate*0.5) + (gates*0.25) + (efficiency*0.25) Auto-triggers prompt-optimizer when fitness < 0.70
2026-04-06 00:23:50 +01:00
parent 1ab9939c92
commit fa68141d47
12 changed files with 1653 additions and 193 deletions
--- a/agent-evolution/ideas/evolve-command.md
+++ b/agent-evolution/ideas/evolve-command.md
@@ -0,0 +1,72 @@
+---
+description: Run evolution cycle — judge last workflow, optimize underperforming agents, re-test
+---
+
+# /evolve — Pipeline Evolution Command
+
+Runs the automated evolution cycle on the most recent (or specified) workflow.
+
+## Usage
+
+```
+/evolve                     # evolve last completed workflow
+/evolve --issue 42          # evolve workflow for issue #42
+/evolve --agent planner     # focus evolution on one agent
+/evolve --dry-run           # show what would change without applying
+/evolve --history           # print fitness trend chart
+```
+
+## Execution
+
+### Step 1: Judge
+```
+Task(subagent_type: "pipeline-judge")
+→ produces fitness report
+```
+
+### Step 2: Decide
+```
+IF fitness >= 0.85:
+  echo "✅ Pipeline healthy (fitness: {score}). No action needed."
+  append to fitness-history.jsonl
+  EXIT
+
+IF fitness >= 0.70:
+  echo "⚠ Pipeline marginal (fitness: {score}). Optimizing weak agents..."
+  identify agents with lowest per-agent scores
+  Task(subagent_type: "prompt-optimizer", target: weak_agents)
+
+IF fitness < 0.70:
+  echo "🔴 Pipeline underperforming (fitness: {score}). Major optimization..."
+  Task(subagent_type: "prompt-optimizer", target: all_flagged_agents)
+  IF fitness < 0.50:
+    Task(subagent_type: "agent-architect", action: "redesign", target: worst_agent)
+```
+
+### Step 3: Re-test
+```
+Re-run the SAME workflow with updated prompts
+Task(subagent_type: "pipeline-judge") → fitness_after
+
+IF fitness_after > fitness_before:
+  commit prompt changes
+  echo "📈 Fitness improved: {before} → {after}"
+ELSE:
+  revert prompt changes
+  echo "📉 No improvement. Reverting."
+```
+
+### Step 4: Log
+```
+Append to .kilo/logs/fitness-history.jsonl:
+{
+  "ts": "<now>",
+  "issue": <N>,
+  "workflow": "<type>",
+  "fitness_before": <score>,
+  "fitness_after": <score>,
+  "agents_optimized": ["planner", "requirement-refiner"],
+  "tokens_saved": <delta>,
+  "time_saved_ms": <delta>
+}
+```