APAW/.kilo/shared/self-evolution.md

# Self-Evolution Protocol

Use this protocol when task requirements exceed the capabilities of existing agents.

## Trigger Conditions

1. No agent matches the task requirements
2. The required domain knowledge is not covered by any skill
3. A complex multi-step task needs a new workflow pattern
4. `@capability-analyst` reports a critical gap
5. `/evolution` reports fitness < 0.70 and model research finds a better model
6. Model benchmarks are stale (>7 days old) and research discovers a new model
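The trigger conditions can be read as a single predicate over signals the orchestrator already collects. The sketch below is a minimal TypeScript illustration under stated assumptions: the `EvolutionSignals` interface, its field names, and `firedTriggers` are hypothetical; only the 0.70 fitness threshold and the 7-day staleness window come from the list above.

```typescript
// Hypothetical signal shape; field names are illustrative, not an existing API.
interface EvolutionSignals {
  matchingAgents: number;       // agents whose capabilities match the task
  skillCoversDomain: boolean;   // some skill covers the required domain knowledge
  needsNewWorkflow: boolean;    // task needs a workflow pattern that does not exist yet
  criticalGapReported: boolean; // @capability-analyst flagged a critical gap
  fitness: number | null;       // latest /evolution fitness score, if any
  betterModelFound: boolean;    // model research found a better model
  benchmarkAgeDays: number;     // age of model-benchmarks.json in days
  newModelDiscovered: boolean;  // research discovered a new model
}

const FITNESS_THRESHOLD = 0.7; // condition 5
const STALE_AFTER_DAYS = 7;    // condition 6

// Returns the numbers (1-6) of every trigger condition that fires.
function firedTriggers(s: EvolutionSignals): number[] {
  const fired: number[] = [];
  if (s.matchingAgents === 0) fired.push(1);
  if (!s.skillCoversDomain) fired.push(2);
  if (s.needsNewWorkflow) fired.push(3);
  if (s.criticalGapReported) fired.push(4);
  if (s.fitness !== null && s.fitness < FITNESS_THRESHOLD && s.betterModelFound) fired.push(5);
  if (s.benchmarkAgeDays > STALE_AFTER_DAYS && s.newModelDiscovered) fired.push(6);
  return fired;
}
```

An empty result means no evolution is needed; any non-empty result starts the flow below.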

## Evolution Flow

```
[Gap Detected]
      ↓
1. Create Gitea Milestone → "[Evolution] {gap_description}"
      ↓
2. Create Research Issue → Track research phase
      ↓
3. Run History Search → @history-miner checks git history
      ↓
4. Analyze Gap → @capability-analyst classifies gap
      ↓
5. Design Component → @agent-architect creates specification
      ↓
6. Decision: Agent/Skill/Workflow?
      ↓
7. Create File → .kilo/agents/{name}.md (or skill/workflow)
      ↓
8. Self-Modify → Add permission to orchestrator.md whitelist
      ↓
9. Update capability-index.yaml → Register capabilities
      ↓
10. Verify Access → Test call to new agent
      ↓
11. Update Documentation → KILO_SPEC.md, AGENTS.md, EVOLUTION_LOG.md
      ↓
12. Close Milestone → Record results in Gitea
      ↓
[New Capability Available]
```
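A minimal sketch of the evolution flow as a strictly ordered pipeline that halts at the first failed step, so the milestone is never closed after a failed access check. `Step`, `runEvolution`, and `STEP_NAMES` are hypothetical names; in practice the steps are carried out by the agents named in the diagram, not by placeholder functions.

```typescript
// Hypothetical step shape: each step reports success or failure.
type Step = { name: string; run: () => boolean };

interface EvolutionResult {
  completed: string[]; // steps that finished successfully, in order
  failedAt?: string;   // first step that failed, if any
}

// Runs the steps strictly in order and stops at the first failure, so later
// steps (e.g. closing the milestone) never run after a failed verification.
function runEvolution(steps: Step[]): EvolutionResult {
  const completed: string[] = [];
  for (const step of steps) {
    if (!step.run()) return { completed, failedAt: step.name };
    completed.push(step.name);
  }
  return { completed };
}

// Step names follow the flow diagram above.
const STEP_NAMES = [
  "Create Gitea Milestone",
  "Create Research Issue",
  "Run History Search",
  "Analyze Gap",
  "Design Component",
  "Decide Agent/Skill/Workflow",
  "Create File",
  "Self-Modify",
  "Update capability-index.yaml",
  "Verify Access",
  "Update Documentation",
  "Close Milestone",
];
```

Halting rather than continuing is a design choice of this sketch; it mirrors the rule that the verification step is never skipped.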

## Model Evolution Flow

When an agent's current model is suboptimal (the best model's heatmap score exceeds the current model's by more than 5 points):

```
[Evolution Fitness < 0.85]
       ↓
1. Read model-benchmarks.json → load heatmap, recommendations
       ↓
2. IF stale (>7 days) → @capability-analyst researches models
   → Output: agent-evolution/data/model-research-latest.json
   → Validates against: agent-evolution/data/model-research.schema.json
       ↓
3. Identify agents where best_model ≠ current_model (gap > 5)
       ↓
4. Generate recommendations (action: update_model)
       ↓
5. Dry-run → /evolution --dry-run → Show what would change
       ↓
6. Apply → bun run agent-evolution/scripts/sync-model-research.ts
   → Updates: capability-index.yaml, agent-versions.json, kilo-meta.json, kilo.jsonc
   → Triggers: sync-agents.js --fix → propagates to .md files
   → Validates: sync-agents.js --check
       ↓
7. Re-test → @pipeline-judge → new fitness score
       ↓
8. IF fitness improved → commit changes
   IF fitness regressed → revert via agent-versions.json history
       ↓
9. Log to Gitea + fitness-history.jsonl
       ↓
[Models Optimized]
```
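Steps 3, 4, and 8 of the model flow reduce to two small decisions. The sketch below assumes a hypothetical per-agent score table (`AgentScores`); the actual heatmap layout in `model-benchmarks.json` may differ. Only the `update_model` action name and the 5-point gap come from this document; `recommendUpdates` and `fitnessVerdict` are illustrative helpers.

```typescript
// Hypothetical per-agent slice of the heatmap: model ID -> score.
interface AgentScores {
  agent: string;
  currentModel: string;
  scores: Record<string, number>;
}

interface Recommendation {
  agent: string;
  action: "update_model";
  from: string;
  to: string;
  gap: number;
}

const MIN_SCORE_GAP = 5; // "gap > 5" from step 3

// Steps 3-4: recommend update_model when the best-scoring model beats the
// current model by more than MIN_SCORE_GAP points.
function recommendUpdates(rows: AgentScores[]): Recommendation[] {
  const recs: Recommendation[] = [];
  for (const row of rows) {
    // An unscored current model counts as 0 (assumption).
    const currentScore = row.scores[row.currentModel] ?? 0;
    let bestModel = row.currentModel;
    let bestScore = currentScore;
    for (const [model, score] of Object.entries(row.scores)) {
      if (score > bestScore) {
        bestModel = model;
        bestScore = score;
      }
    }
    const gap = bestScore - currentScore;
    if (bestModel !== row.currentModel && gap > MIN_SCORE_GAP) {
      recs.push({ agent: row.agent, action: "update_model", from: row.currentModel, to: bestModel, gap });
    }
  }
  return recs;
}

// Step 8: commit on improvement, revert on regression. Equal fitness is not
// covered by the flow, so it is surfaced as "undecided" here.
function fitnessVerdict(before: number, after: number): "commit" | "revert" | "undecided" {
  if (after > before) return "commit";
  if (after < before) return "revert";
  return "undecided";
}
```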

## Model Research Data Flow

```
[model-benchmarks.json]          ← Static benchmark data (refreshed weekly)
       ↓ read
[/evolution Step 0]              ← Checks staleness, triggers research if needed
[/research models]               ← Explicit research trigger
       ↓ produces
[model-research-latest.json]     ← Dynamic research output
       ↓ consumed by
[sync-model-research.ts]         ← Applies recommendations
       ↓ updates
[capability-index.yaml]          ← Model assignments
[agent-versions.json]            ← History tracking
[kilo-meta.json]                 ← Source of truth
[kilo.jsonc]                     ← Agent config (verify manually)
[.kilo/agents/*.md]              ← Frontmatter (via sync script)
       ↓ verified by
[sync-agents.js --check]         ← Consistency validation
```
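The staleness check in `/evolution` Step 0 could look like the following sketch. It assumes, hypothetically, that `model-benchmarks.json` records an ISO-8601 `updatedAt` timestamp; that field name and the `isStale`/`nextAction` helpers are illustrative, not part of the existing scripts.

```typescript
const BENCHMARK_MAX_AGE_DAYS = 7; // ">7 days" means stale
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical: `updatedAt` is an ISO-8601 timestamp read from model-benchmarks.json.
function isStale(updatedAt: string, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - new Date(updatedAt).getTime();
  if (Number.isNaN(ageMs)) return true; // unreadable timestamp: treat as stale
  return ageMs > BENCHMARK_MAX_AGE_DAYS * MS_PER_DAY;
}

// /evolution Step 0 (sketch): research new models only when benchmarks are stale.
function nextAction(updatedAt: string, now?: Date): "research" | "use-cached" {
  return isStale(updatedAt, now) ? "research" : "use-cached";
}
```

Treating an unparseable timestamp as stale errs toward re-running research rather than trusting unknown data.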

### Key Files

| File | Purpose | Updated By |
|------|---------|------------|
| `agent-evolution/data/model-benchmarks.json` | Static benchmark data | `/research models`, `/evolution research` |
| `agent-evolution/data/model-research-latest.json` | Latest research output | `/research models`, `/evolution Step 0` |
| `agent-evolution/data/model-research.schema.json` | Validation schema | Manual (schema changes are rare) |
| `agent-evolution/data/model-benchmarks.schema.json` | Benchmarks data schema | Manual |
| `agent-evolution/data/agent-versions.json` | Version history | `sync-model-research.ts` |
| `agent-evolution/scripts/sync-model-research.ts` | Application script | Manual execution |

## Self-Modification Rules

1. ONLY modify your own permission whitelist (in `orchestrator.md`)
2. NEVER modify other agents' definitions
3. ALWAYS create a Gitea milestone before making changes
4. ALWAYS verify access after changes
5. ALWAYS log results to `.kilo/EVOLUTION_LOG.md`
6. NEVER skip the verification step
7. ALWAYS validate research output against the schema before applying it
8. NEVER apply model changes without a dry-run preview first
9. ALWAYS run `sync-agents.js --check` after model changes
10. ALWAYS revert if fitness regresses after a model change
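Rules 7 to 10 act as gates around a model change. A minimal sketch, assuming a hypothetical `ModelChangeState` record that the orchestrator fills in as it goes; requiring `sync-agents.js --check` to pass (not merely run) is an assumption, since rule 9 only requires running it.

```typescript
// Hypothetical record filled in while applying a model change.
interface ModelChangeState {
  schemaValid: boolean;            // rule 7: model-research-latest.json passed schema validation
  dryRunShown: boolean;            // rule 8: /evolution --dry-run preview was shown
  syncCheckPassed: boolean | null; // rule 9: null until `sync-agents.js --check` has run
  fitnessBefore: number;
  fitnessAfter: number | null;     // rule 10: null until @pipeline-judge re-tests
}

// Rules that must hold before sync-model-research.ts is run.
function unmetBeforeApply(s: ModelChangeState): number[] {
  const unmet: number[] = [];
  if (!s.schemaValid) unmet.push(7);
  if (!s.dryRunShown) unmet.push(8);
  return unmet;
}

// Rules that must hold after applying; 10 in the result means "revert now".
function unmetAfterApply(s: ModelChangeState): number[] {
  const unmet: number[] = [];
  if (s.syncCheckPassed !== true) unmet.push(9);
  if (s.fitnessAfter !== null && s.fitnessAfter < s.fitnessBefore) unmet.push(10);
  return unmet;
}
```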

## Evolution Triggers

- Task type not in the capability Routing Map
- `@capability-analyst` reports a critical gap
- Repeated task failures for the same reason
- User requests a new specialized capability

## File Modifications (in order)

1. Create `.kilo/agents/{new-agent}.md` (or skill/workflow)
2. Update `.kilo/agents/orchestrator.md` (add permission)
3. Update `.kilo/capability-index.yaml` (register capabilities)
4. Update `.kilo/KILO_SPEC.md` (document)
5. Update `AGENTS.md` (reference)
6. Append to `.kilo/EVOLUTION_LOG.md` (log entry)
7. Update `agent-evolution/data/model-benchmarks.json` (if model data changed)
8. Update `agent-evolution/data/agent-versions.json` (add history entry)
9. Update `kilo-meta.json` (source of truth for sync)
10. Run `node scripts/sync-agents.js --fix` (propagate to all files)
11. Run `node scripts/sync-agents.js --check` (verify consistency)
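The modification order can be checked mechanically. This sketch (hypothetical `followsOrder` helper and `MODIFICATION_ORDER` list) verifies only that the recorded modifications appear in the documented order; it assumes conditional steps such as step 7 may be omitted, and it does not check that every required step happened.

```typescript
// Documented order of file modifications (steps 1-11 above).
const MODIFICATION_ORDER = [
  ".kilo/agents/{new-agent}.md",
  ".kilo/agents/orchestrator.md",
  ".kilo/capability-index.yaml",
  ".kilo/KILO_SPEC.md",
  "AGENTS.md",
  ".kilo/EVOLUTION_LOG.md",
  "agent-evolution/data/model-benchmarks.json",
  "agent-evolution/data/agent-versions.json",
  "kilo-meta.json",
  "node scripts/sync-agents.js --fix",
  "node scripts/sync-agents.js --check",
] as const;

type Modification = (typeof MODIFICATION_ORDER)[number];

// True if `performed` is an in-order subsequence of MODIFICATION_ORDER:
// steps may be skipped but never reordered or repeated.
function followsOrder(performed: Modification[]): boolean {
  let cursor = 0;
  for (const mod of performed) {
    const idx = MODIFICATION_ORDER.indexOf(mod, cursor);
    if (idx === -1) return false;
    cursor = idx + 1;
  }
  return true;
}
```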

## Verification Checklist

After each evolution (the model-related items apply when model assignments change):
- [ ] Agent file created with valid YAML frontmatter
- [ ] Permission added to orchestrator.md
- [ ] Capability registered in capability-index.yaml
- [ ] Test call succeeds (Task tool returns valid response)
- [ ] KILO_SPEC.md updated with new agent
- [ ] AGENTS.md updated with new agent
- [ ] EVOLUTION_LOG.md updated with entry
- [ ] Gitea milestone closed with results
- [ ] model-research-latest.json validates against schema
- [ ] sync-model-research.ts dry-run shows correct changes
- [ ] capability-index.yaml model field updated for affected agents
- [ ] agent-versions.json history entry added with rationale
- [ ] kilo-meta.json matches new model assignments
- [ ] kilo.jsonc manually verified (sync script does not guarantee this)
- [ ] sync-agents.js --check passes
- [ ] No stale model IDs remain (grep for the previous model IDs)
- [ ] Cloud model suffix is correct (kimi-k2.6:cloud, not kimi-k2.6)
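The last two checklist items amount to text searches over the updated files. A minimal sketch with hypothetical helper names; which models need the `:cloud` suffix is passed in rather than hard-coded, because this document names only `kimi-k2.6`. The boundary rule keeps `kimi-k2.6` from matching inside `kimi-k2.6:cloud`, which a plain substring grep would not.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if `id` appears in `text` as a whole model ID: not preceded by a word
// character, "." or "-", and not followed by one of those or ":".
// So "kimi-k2.6" does not match inside "kimi-k2.6:cloud".
function containsModelId(text: string, id: string): boolean {
  const pattern = new RegExp(`(?<![\\w.-])${escapeRegExp(id)}(?![\\w.:-])`);
  return pattern.test(text);
}

// "No stale model IDs remain": previous model IDs still present in `text`.
function leakedModelIds(text: string, previousIds: string[]): string[] {
  return previousIds.filter((id) => containsModelId(text, id));
}

// "Cloud model suffix is correct": cloud models that appear without ":cloud".
// Which models are cloud-hosted is an input, not something this sketch knows.
function missingCloudSuffix(text: string, cloudModelIds: string[]): string[] {
  return cloudModelIds.filter((id) => containsModelId(text, id));
}
```

Running both helpers over the contents of each updated file (configs and agent `.md` frontmatter) gives a per-file result; any non-empty list fails the checklist item.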