
fa68141d47 feat: add pipeline-judge agent and evolution workflow system

- Add pipeline-judge agent for objective fitness scoring
- Update capability-index.yaml with pipeline-judge, evolution config
- Add fitness-evaluation.md workflow for auto-optimization
- Update evolution.md command with /evolve CLI
- Create .kilo/logs/fitness-history.jsonl for metrics logging
- Update AGENTS.md with new workflow state machine
- Add 6 new issues to MILESTONE_ISSUES.md for evolution integration
- Preserve ideas in agent-evolution/ideas/

Pipeline Judge computes fitness = (test_rate*0.5) + (gates*0.25) + (efficiency*0.25)
Auto-triggers prompt-optimizer when fitness < 0.70
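The commit message gives the fitness formula and the auto-trigger threshold. The sketch below transcribes both into Python so the weights can be checked by hand. The function names are hypothetical. It also assumes each input is already normalized to [0, 1], which the commit message does not state.

```python
def pipeline_fitness(test_rate: float, gates: float, efficiency: float) -> float:
    """Weighted fitness score using the weights from the commit message.

    Assumption: every input is pre-normalized to [0, 1].
    """
    for name, value in (("test_rate", test_rate), ("gates", gates), ("efficiency", efficiency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return test_rate * 0.5 + gates * 0.25 + efficiency * 0.25


def needs_prompt_optimizer(fitness: float) -> bool:
    # Auto-trigger rule from the commit message: optimize when fitness < 0.70.
    return fitness < 0.70
```

As a worked example, a 0.9 test pass rate, a 0.8 gate score, and a 0.6 efficiency give 0.45 + 0.20 + 0.15 = 0.80. That is above 0.70, so the prompt-optimizer would not be auto-triggered.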

2026-04-06 00:23:50 +01:00

1.9 KiB


description: Run evolution cycle (judge last workflow, optimize underperforming agents, re-test)

/evolve — Pipeline Evolution Command

Runs the automated evolution cycle on the most recent (or specified) workflow.

Usage

/evolve                     # evolve last completed workflow
/evolve --issue 42          # evolve workflow for issue #42
/evolve --agent planner     # focus evolution on one agent
/evolve --dry-run           # show what would change without applying
/evolve --history           # print fitness trend chart
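/evolve is interpreted by the agent runtime, so there is no Python entry point in the source. As a hypothetical illustration only, here is how the flags listed above could be parsed if the command were wrapped in a script with the standard-library argparse module. The parser name and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wrapper: mirrors the /evolve flags listed under Usage.
    parser = argparse.ArgumentParser(prog="/evolve", description="Run the pipeline evolution cycle")
    parser.add_argument("--issue", type=int, help="evolve the workflow for this issue number")
    parser.add_argument("--agent", help="focus evolution on one agent")
    parser.add_argument("--dry-run", action="store_true", help="show what would change without applying")
    parser.add_argument("--history", action="store_true", help="print the fitness trend chart")
    return parser


args = build_parser().parse_args(["--issue", "42", "--dry-run"])
# argparse stores --dry-run as args.dry_run; omitted options default to None/False.
print(args.issue, args.agent, args.dry_run, args.history)  # prints: 42 None True False
```

Without --issue, args.issue is None. A wrapper would read that as "use the last completed workflow".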

Execution

Step 1: Judge

Task(subagent_type: "pipeline-judge")
→ produces fitness report

Step 2: Decide

IF fitness >= 0.85:
  echo "✅ Pipeline healthy (fitness: {score}). No action needed."
  append to fitness-history.jsonl
  EXIT

IF fitness >= 0.70:
  echo "⚠ Pipeline marginal (fitness: {score}). Optimizing weak agents..."
  identify agents with lowest per-agent scores
  Task(subagent_type: "prompt-optimizer", target: weak_agents)

IF fitness < 0.70:
  echo "🔴 Pipeline underperforming (fitness: {score}). Major optimization..."
  Task(subagent_type: "prompt-optimizer", target: all_flagged_agents)
  IF fitness < 0.50:
    Task(subagent_type: "agent-architect", action: "redesign", target: worst_agent)
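The branching in Step 2 can be written as a pure function, which makes the thresholds easy to test. This is a minimal sketch under several assumptions. The judge is assumed to return per-agent scores as a dict and a list of flagged agents. "Weak agents" is read as the N lowest-scoring agents, with N = 2 chosen arbitrarily. "Worst agent" is read as the single lowest scorer. `Decision`, `decide`, and `weak_count` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Decision:
    action: str                                        # "none", "optimize_weak", or "major"
    optimize: List[str] = field(default_factory=list)  # agents passed to prompt-optimizer
    redesign: Optional[str] = None                     # agent passed to agent-architect, if any


def decide(fitness: float, agent_scores: Dict[str, float],
           flagged: List[str], weak_count: int = 2) -> Decision:
    if fitness >= 0.85:
        # Healthy: log and exit, no optimization.
        return Decision("none")
    if fitness >= 0.70:
        # Marginal: optimize only the lowest-scoring agents (assumed N = weak_count).
        weakest = sorted(agent_scores, key=agent_scores.get)[:weak_count]
        return Decision("optimize_weak", optimize=weakest)
    # Underperforming: optimize every flagged agent; below 0.50 also redesign the worst one.
    worst = min(agent_scores, key=agent_scores.get) if agent_scores else None
    return Decision("major", optimize=list(flagged),
                    redesign=worst if fitness < 0.50 else None)
```

Keeping the decision separate from the Task calls lets the --dry-run flag print a `Decision` without dispatching any subagent.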

Step 3: Re-test

Re-run the SAME workflow with updated prompts
Task(subagent_type: "pipeline-judge") → fitness_after

IF fitness_after > fitness_before:
  commit prompt changes
  echo "📈 Fitness improved: {before} → {after}"
ELSE:
  revert prompt changes
  echo "📉 No improvement. Reverting."
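The Step 3 gate keeps the new prompts only on a strict improvement. A tie counts as "no improvement" and is reverted. The sketch below captures just that rule and the two messages. `retest_gate` is a hypothetical name, and the two-decimal formatting is an assumption. The source does not say how the commit and revert are carried out, so this sketch leaves them out.

```python
from typing import Tuple


def retest_gate(fitness_before: float, fitness_after: float) -> Tuple[bool, str]:
    """Return (keep_changes, message) for the Step 3 re-test."""
    if fitness_after > fitness_before:
        # Strict improvement: keep (commit) the updated prompts.
        return True, f"📈 Fitness improved: {fitness_before:.2f} → {fitness_after:.2f}"
    # Equal or worse fitness: revert the prompt changes.
    return False, "📉 No improvement. Reverting."
```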

Step 4: Log

Append one record, serialized as a single JSON line, to .kilo/logs/fitness-history.jsonl (shown expanded here for readability):
{
  "ts": "<now>",
  "issue": <N>,
  "workflow": "<type>",
  "fitness_before": <score>,
  "fitness_after": <score>,
  "agents_optimized": ["planner", "requirement-refiner"],
  "tokens_saved": <delta>,
  "time_saved_ms": <delta>
}
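Because the log is JSON Lines, each record must be written on exactly one line. The sketch below shows one way to append records and to render a rough text trend for /evolve --history. It is a hypothetical sketch that makes several assumptions: `ts` is an ISO-8601 UTC timestamp, and the chart plots `fitness_after`, falling back to `fitness_before` for runs that exited healthy without a re-test. The function names and bar layout are illustrative only.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path(".kilo/logs/fitness-history.jsonl")


def append_fitness_record(record: dict, path: Path = LOG_PATH) -> None:
    """Append one JSON object as a single line (JSON Lines format)."""
    entry = {"ts": datetime.now(timezone.utc).isoformat(), **record}  # assumed: UTC ISO-8601
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")


def fitness_trend(path: Path = LOG_PATH, width: int = 40) -> str:
    """Render one text bar per logged run, oldest first (a stand-in for --history)."""
    rows = []
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            if not raw.strip():
                continue  # tolerate blank lines
            entry = json.loads(raw)
            score = entry.get("fitness_after")
            if score is None:
                # Assumed: healthy runs may log only fitness_before.
                score = entry.get("fitness_before") or 0.0
            bar = "#" * round(score * width)
            rows.append(f"{entry.get('issue', '?')!s:>6} {bar:<{width}} {score:.2f}")
    return "\n".join(rows)
```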
