- Integrate apaw_agent_model_research_v3.html as standalone dashboard
- Add model-benchmarks.json with 32 agents, 11 scored models, 11 recommendations
- Add build-research-dashboard.ts: inject live data into template → standalone HTML
- Add rebuild-template.cjs: regenerate template from v3.html source
- Add sync-benchmarks-from-yaml.cjs: sync YAML → JSON round-trip
- Add sync-model-research.ts: apply recommendation matrix to config files
- Add model-benchmarks.schema.json and model-research.schema.json for validation
- Add bidirectional-data-flow.md architecture documentation
- Add log-execution.cjs pipeline hook
- Update capability-index.yaml: add fallback_models, failover_strategy
- Update kilo-meta.json, kilo.jsonc, KILO_SPEC.md with synced models
- Update evolution.md / research.md / self-evolution.md / evolutionary-sync.md docs
- Fix security-auditor.md: quote YAML color (#DC2626)
- Fix orchestrator.md: remove duplicate devops-engineer key
- Build research-dashboard.html (106KB standalone) + dated archive
Self-Evolution Protocol
When task requirements exceed existing agent capabilities.
Trigger Conditions
- No agent matches task requirements
- Required domain knowledge not in any skill
- Complex multi-step task needs new workflow pattern
- @capability-analyst reports critical gap
- /evolution reports fitness < 0.70 and model research finds better model
- Model benchmarks stale (>7 days) and research discovers new model
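The trigger conditions above can be read as a single predicate over a handful of signals. The following is a minimal sketch, not the project's implementation: the `EvolutionSignals` interface, its field names, and the reason labels are all hypothetical. Only the two thresholds (0.70 and 7 days) come from the list above.

```typescript
// Hypothetical input shape: none of these field names come from the repository.
interface EvolutionSignals {
  matchingAgentFound: boolean;      // an existing agent matches the task
  skillCoversDomain: boolean;       // some skill holds the required domain knowledge
  needsNewWorkflowPattern: boolean; // complex multi-step task with no fitting workflow
  criticalGapReported: boolean;     // @capability-analyst reported a critical gap
  fitness: number;                  // latest /evolution fitness score
  betterModelFound: boolean;        // model research found a better model
  benchmarkAgeDays: number;         // age of the model benchmarks in days
  newModelDiscovered: boolean;      // research discovered a new model
}

const FITNESS_THRESHOLD = 0.7; // "fitness < 0.70"
const STALE_AFTER_DAYS = 7;    // "stale (>7 days)"

// Returns one label per trigger condition that currently holds.
// An empty array means no evolution is needed.
function triggeredConditions(s: EvolutionSignals): string[] {
  const reasons: string[] = [];
  if (!s.matchingAgentFound) reasons.push("no-matching-agent");
  if (!s.skillCoversDomain) reasons.push("missing-domain-knowledge");
  if (s.needsNewWorkflowPattern) reasons.push("new-workflow-pattern");
  if (s.criticalGapReported) reasons.push("critical-gap");
  if (s.fitness < FITNESS_THRESHOLD && s.betterModelFound) {
    reasons.push("low-fitness-better-model");
  }
  if (s.benchmarkAgeDays > STALE_AFTER_DAYS && s.newModelDiscovered) {
    reasons.push("stale-benchmarks-new-model");
  }
  return reasons;
}
```

Returning the list of reasons rather than a bare boolean keeps the gap description available for the milestone title in the next section.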
Evolution Flow
[Gap Detected]
↓
1. Create Gitea Milestone → "[Evolution] {gap_description}"
↓
2. Create Research Issue → Track research phase
↓
3. Run History Search → @history-miner checks git history
↓
4. Analyze Gap → @capability-analyst classifies gap
↓
5. Design Component → @agent-architect creates specification
↓
6. Decision: Agent/Skill/Workflow?
↓
7. Create File → .kilo/agents/{name}.md (or skill/workflow)
↓
8. Self-Modify → Add permission to orchestrator.md whitelist
↓
9. Update capability-index.yaml → Register capabilities
↓
10. Verify Access → Test call to new agent
↓
11. Update Documentation → KILO_SPEC.md, AGENTS.md, EVOLUTION_LOG.md
↓
12. Close Milestone → Record results in Gitea
↓
[New Capability Available]
Model Evolution Flow
When an agent's current model is suboptimal (score gap > 5 points in heatmap):
[Evolution Fitness < 0.85]
↓
1. Read model-benchmarks.json → load heatmap, recommendations
↓
2. IF stale (>7 days) → @capability-analyst researches models
→ Output: agent-evolution/data/model-research-latest.json
→ Validates against: agent-evolution/data/model-research.schema.json
↓
3. Identify agents where best_model ≠ current_model (gap > 5)
↓
4. Generate recommendations (action: update_model)
↓
5. Dry-run → /evolution --dry-run → Show what would change
↓
6. Apply → bun run agent-evolution/scripts/sync-model-research.ts
→ Updates: capability-index.yaml, agent-versions.json, kilo-meta.json, kilo.jsonc
→ Triggers: sync-agents.js --fix → propagates to .md files
→ Validates: sync-agents.js --check
↓
7. Re-test → @pipeline-judge → new fitness score
↓
8. IF fitness improved → commit changes
IF fitness regressed → revert via agent-versions.json history
↓
9. Log to Gitea + fitness-history.jsonl
↓
[Models Optimized]
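Steps 3, 4, and 8 of the flow above reduce to two small decisions: which agents have a better-scoring model by more than 5 points, and whether to keep a change after re-testing. The sketch below assumes a simplified heatmap shape (`AgentScore`) that is hypothetical and is not the actual layout of model-benchmarks.json. The agent and model names in the usage example are placeholders.

```typescript
// Hypothetical, simplified heatmap row: model id -> score for one agent.
interface AgentScore {
  agent: string;
  currentModel: string;
  scores: Record<string, number>;
}

interface Recommendation {
  agent: string;
  action: "update_model";
  from: string;
  to: string;
  gap: number;
}

const MIN_GAP = 5; // "gap > 5" from the flow above

// Steps 3-4: recommend a model change where best_model != current_model and gap > 5.
function recommendModelUpdates(agents: AgentScore[]): Recommendation[] {
  const recommendations: Recommendation[] = [];
  for (const a of agents) {
    const currentScore = a.scores[a.currentModel];
    if (currentScore === undefined) continue; // no score for the current model: gap unknown
    let bestModel = a.currentModel;
    let bestScore = currentScore;
    for (const [model, score] of Object.entries(a.scores)) {
      if (score > bestScore) {
        bestModel = model;
        bestScore = score;
      }
    }
    const gap = bestScore - currentScore;
    if (bestModel !== a.currentModel && gap > MIN_GAP) {
      recommendations.push({ agent: a.agent, action: "update_model", from: a.currentModel, to: bestModel, gap });
    }
  }
  return recommendations;
}

// Step 8: commit on improvement, revert on regression.
// The flow does not say what to do when fitness is unchanged; "unchanged" is an assumption.
function decideAfterRetest(before: number, after: number): "commit" | "revert" | "unchanged" {
  if (after > before) return "commit";
  if (after < before) return "revert";
  return "unchanged";
}

// Usage with placeholder names and scores.
const example = recommendModelUpdates([
  { agent: "agent-a", currentModel: "model-x", scores: { "model-x": 80, "model-y": 88 } },
  { agent: "agent-b", currentModel: "model-x", scores: { "model-x": 80, "model-y": 84 } },
]);
console.log(example.map((r) => `${r.agent}: ${r.from} -> ${r.to}`));
```

Only agent-a yields a recommendation here, because agent-b's gap of 4 does not exceed the threshold.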
Model Research Data Flow
[model-benchmarks.json] ← Static benchmark data (refreshed weekly)
↓ read
[/evolution Step 0] ← Checks staleness, triggers research if needed
[/research models] ← Explicit research trigger
↓ produces
[model-research-latest.json] ← Dynamic research output
↓ consumed by
[sync-model-research.ts] ← Applies recommendations
↓ updates
[capability-index.yaml] ← Model assignments
[agent-versions.json] ← History tracking
[kilo-meta.json] ← Source of truth
[kilo.jsonc] ← Agent config (manual verify)
[.kilo/agents/*.md] ← Frontmatter (via sync script)
↓ verified by
[sync-agents.js --check] ← Consistency validation
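The staleness check at the top of this data flow is a date comparison against the 7-day window. A minimal sketch follows, assuming the benchmark file records its last refresh as an ISO 8601 timestamp. How that timestamp is actually stored is not specified here, so the `lastUpdatedIso` parameter is hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the benchmark data is older than maxAgeDays (strictly greater, matching ">7 days").
function isStale(lastUpdatedIso: string, now: Date, maxAgeDays: number = 7): boolean {
  const updatedMs = Date.parse(lastUpdatedIso);
  // Assumption: an unreadable timestamp is treated as stale so that research is re-run.
  if (Number.isNaN(updatedMs)) return true;
  return now.getTime() - updatedMs > maxAgeDays * MS_PER_DAY;
}

// Usage: eight days after the last refresh the data counts as stale.
console.log(isStale("2025-01-01T00:00:00Z", new Date("2025-01-09T00:00:00Z")));
```

Passing `now` explicitly instead of calling `new Date()` inside the function keeps the check deterministic in tests.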
Key Files
| File | Purpose | Updated By |
|---|---|---|
| agent-evolution/data/model-benchmarks.json | Static benchmark data | /research models, /evolution research |
| agent-evolution/data/model-research-latest.json | Latest research output | /research models, /evolution Step 0 |
| agent-evolution/data/model-research.schema.json | Validation schema | Manual (schema changes are rare) |
| agent-evolution/data/model-benchmarks.schema.json | Benchmarks data schema | Manual |
| agent-evolution/data/agent-versions.json | Version history | sync-model-research.ts |
| agent-evolution/scripts/sync-model-research.ts | Application script | Manual execution |
Self-Modification Rules
- ONLY modify own permission whitelist
- NEVER modify other agents' definitions
- ALWAYS create milestone before changes
- ALWAYS verify access after changes
- ALWAYS log results to .kilo/EVOLUTION_LOG.md
- NEVER skip verification step
- ALWAYS validate research output against schema before applying
- NEVER apply model changes without dry-run preview first
- ALWAYS run sync-agents.js --check after model changes
- ALWAYS revert if fitness regresses after model change
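The ordering constraints among the model-change rules above (validate before applying, dry-run before applying, consistency check after applying) can be enforced mechanically. This is one possible sketch: the `ModelChangeGuard` class and the step names are hypothetical and do not exist in the repository.

```typescript
type Step = "validate-schema" | "dry-run" | "apply" | "sync-check";

// Steps that must already have run before each step is allowed.
const PREREQUISITES: Record<Step, Step[]> = {
  "validate-schema": [],
  "dry-run": [],
  "apply": ["validate-schema", "dry-run"], // never apply without validation and a dry-run preview
  "sync-check": ["apply"],                 // the consistency check follows a model change
};

class ModelChangeGuard {
  private done = new Set<Step>();

  // Records a step, or throws if a prerequisite was skipped.
  run(step: Step): void {
    const missing = PREREQUISITES[step].filter((p) => !this.done.has(p));
    if (missing.length > 0) {
      throw new Error(`Cannot run "${step}" before: ${missing.join(", ")}`);
    }
    this.done.add(step);
  }

  // A model change counts as finished only once the consistency check has run.
  isComplete(): boolean {
    return this.done.has("sync-check");
  }
}

// Usage: the steps pass when run in the required order.
const guard = new ModelChangeGuard();
guard.run("validate-schema");
guard.run("dry-run");
guard.run("apply");
guard.run("sync-check");
console.log(guard.isComplete());
```

The guard only tracks ordering. The revert-on-regression rule still depends on comparing fitness scores after the re-test.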
Evolution Triggers
- Task type not in capability Routing Map
- capability-analyst reports critical gap
- Repeated task failures for same reason
- User requests new specialized capability
File Modifications (in order)
- Create .kilo/agents/{new-agent}.md (or skill/workflow)
- Update .kilo/agents/orchestrator.md (add permission)
- Update .kilo/capability-index.yaml (register capabilities)
- Update .kilo/KILO_SPEC.md (document)
- Update AGENTS.md (reference)
- Append to .kilo/EVOLUTION_LOG.md (log entry)
- Update agent-evolution/data/model-benchmarks.json (if model data changed)
- Update agent-evolution/data/agent-versions.json (add history entry)
- Update kilo-meta.json (source of truth for sync)
- Run node scripts/sync-agents.js --fix (propagate to all files)
- Run node scripts/sync-agents.js --check (verify consistency)
Verification Checklist
After each evolution:
- Agent file created and valid YAML frontmatter
- Permission added to orchestrator.md
- Capability registered in capability-index.yaml
- Test call succeeds (Task tool returns valid response)
- KILO_SPEC.md updated with new agent
- AGENTS.md updated with new agent
- EVOLUTION_LOG.md updated with entry
- Gitea milestone closed with results
- model-research-latest.json validates against schema
- sync-model-research.ts dry-run shows correct changes
- capability-index.yaml model field updated for affected agents
- agent-versions.json history entry added with rationale
- kilo-meta.json matches new model assignments
- kilo.jsonc manually verified (sync script does not guarantee this)
- sync-agents.js --check passes
- No stale models leaked (grep for previous model IDs)
- Cloud model suffix correct (kimi-k2.6:cloud, not kimi-k2.6)
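The last two checklist items (stale model leaks and the cloud suffix) are text scans and can be automated. The helpers below are a hedged sketch: the function names are hypothetical, and the scan operates on file content passed in as a string, so reading the config files is left to the caller.

```typescript
// Escapes regex metacharacters so a model ID such as "kimi-k2.6" is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the 1-based line numbers where modelId appears WITHOUT the ":cloud" suffix.
function findBareModelIds(content: string, modelId: string): number[] {
  // Match the ID as a whole token that is not followed by ":cloud".
  const bare = new RegExp("(^|[^\\w.-])" + escapeRegExp(modelId) + "(?!:cloud)(?![\\w-])");
  const hits: number[] = [];
  content.split("\n").forEach((line, index) => {
    if (bare.test(line)) hits.push(index + 1);
  });
  return hits;
}

// Returns the previous model IDs that still occur anywhere in the content.
function findLeakedModelIds(content: string, previousIds: string[]): string[] {
  return previousIds.filter((id) => content.includes(id));
}

// Usage: line 2 uses the bare ID and is reported.
const sample = "model: kimi-k2.6:cloud\nfallback: kimi-k2.6\n";
console.log(findBareModelIds(sample, "kimi-k2.6"));
```

Because `findLeakedModelIds` uses a plain substring match, a previous ID that is a prefix of the new one is also reported, so the result is a list of candidates to review rather than a definitive verdict.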