- Integrate apaw_agent_model_research_v3.html as standalone dashboard
- Add model-benchmarks.json with 32 agents, 11 scored models, 11 recommendations
- Add build-research-dashboard.ts: inject live data into template → standalone HTML
- Add rebuild-template.cjs: regenerate template from v3.html source
- Add sync-benchmarks-from-yaml.cjs: sync YAML → JSON round-trip
- Add sync-model-research.ts: apply recommendation matrix to config files
- Add model-benchmarks.schema.json and model-research.schema.json for validation
- Add bidirectional-data-flow.md architecture documentation
- Add log-execution.cjs pipeline hook
- Update capability-index.yaml: add fallback_models, failover_strategy
- Update kilo-meta.json, kilo.jsonc, KILO_SPEC.md with synced models
- Update evolution.md / research.md / self-evolution.md / evolutionary-sync.md docs
- Fix security-auditor.md: quote YAML color (#DC2626)
- Fix orchestrator.md: remove duplicate devops-engineer key
- Build research-dashboard.html (106KB standalone) + dated archive
# Self-Evolution Protocol

Apply this protocol when task requirements exceed existing agent capabilities.

## Trigger Conditions

1. No agent matches task requirements
2. Required domain knowledge not in any skill
3. Complex multi-step task needs new workflow pattern
4. `@capability-analyst` reports critical gap
5. `/evolution` reports fitness < 0.70 and model research finds better model
6. Model benchmarks stale (>7 days) and research discovers new model

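Conditions 5 and 6 are the only triggers with numeric thresholds, so they can be checked mechanically. The following is a minimal sketch, not the project's implementation: the `TriggerInput` shape and its field names are hypothetical, and only the 0.70 and 7-day thresholds come from the list above.

```typescript
// Thresholds taken from trigger conditions 5 and 6.
const FITNESS_THRESHOLD = 0.7;
const STALE_AFTER_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical input shape; the real reports may use different field names.
interface TriggerInput {
  fitness: number;           // latest score reported by /evolution
  benchmarksUpdatedAt: Date; // last refresh of model-benchmarks.json
  betterModelFound: boolean; // research found a better model
  newModelFound: boolean;    // research discovered a new model
}

// Benchmarks count as stale once they are more than 7 days old.
function isStale(updatedAt: Date, now: Date): boolean {
  return now.getTime() - updatedAt.getTime() > STALE_AFTER_DAYS * MS_PER_DAY;
}

// Returns the numbers of the model-related trigger conditions that fire.
function modelTriggers(input: TriggerInput, now: Date): number[] {
  const fired: number[] = [];
  if (input.fitness < FITNESS_THRESHOLD && input.betterModelFound) {
    fired.push(5);
  }
  if (isStale(input.benchmarksUpdatedAt, now) && input.newModelFound) {
    fired.push(6);
  }
  return fired;
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test against fixed dates.
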
## Evolution Flow

```
[Gap Detected]
     ↓
1. Create Gitea Milestone → "[Evolution] {gap_description}"
     ↓
2. Create Research Issue → Track research phase
     ↓
3. Run History Search → @history-miner checks git history
     ↓
4. Analyze Gap → @capability-analyst classifies gap
     ↓
5. Design Component → @agent-architect creates specification
     ↓
6. Decision: Agent/Skill/Workflow?
     ↓
7. Create File → .kilo/agents/{name}.md (or skill/workflow)
     ↓
8. Self-Modify → Add permission to orchestrator.md whitelist
     ↓
9. Update capability-index.yaml → Register capabilities
     ↓
10. Verify Access → Test call to new agent
     ↓
11. Update Documentation → KILO_SPEC.md, AGENTS.md, EVOLUTION_LOG.md
     ↓
12. Close Milestone → Record results in Gitea
     ↓
[New Capability Available]
```

## Model Evolution Flow

When an agent's current model is suboptimal (score gap > 5 points in heatmap):

```
[Evolution Fitness < 0.85]
     ↓
1. Read model-benchmarks.json → load heatmap, recommendations
     ↓
2. IF stale (>7 days) → @capability-analyst researches models
   → Output: agent-evolution/data/model-research-latest.json
   → Validates against: agent-evolution/data/model-research.schema.json
     ↓
3. Identify agents where best_model ≠ current_model (gap > 5)
     ↓
4. Generate recommendations (action: update_model)
     ↓
5. Dry-run → /evolution --dry-run → Show what would change
     ↓
6. Apply → bun run agent-evolution/scripts/sync-model-research.ts
   → Updates: capability-index.yaml, agent-versions.json, kilo-meta.json, kilo.jsonc
   → Triggers: sync-agents.js --fix → propagates to .md files
   → Validates: sync-agents.js --check
     ↓
7. Re-test → @pipeline-judge → new fitness score
     ↓
8. IF fitness improved → commit changes
   IF fitness regressed → revert via agent-versions.json history
     ↓
9. Log to Gitea + fitness-history.jsonl
     ↓
[Models Optimized]
```

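Steps 3 and 4 of the flow above reduce to a comparison over the heatmap. The sketch below assumes the heatmap can be read as a nested record of agent → model → score. That layout, the `Recommendation` fields other than `action: update_model`, and the function name are illustrative assumptions and may not match the real `model-benchmarks.json`.

```typescript
// Assumed heatmap shape: heatmap[agent][model] = benchmark score.
type Heatmap = Record<string, Record<string, number>>;

// Hypothetical recommendation shape; only `action: update_model` is from the flow.
interface Recommendation {
  agent: string;
  action: "update_model";
  from: string;
  to: string;
  gap: number;
}

const MIN_GAP = 5; // "score gap > 5 points in heatmap"

function recommend(
  heatmap: Heatmap,
  current: Record<string, string>, // agent -> currently assigned model
): Recommendation[] {
  const out: Recommendation[] = [];
  for (const [agent, scores] of Object.entries(heatmap)) {
    const currentModel = current[agent];
    if (currentModel === undefined) continue; // no assignment to compare
    const currentScore = scores[currentModel];
    if (currentScore === undefined) continue; // current model was not scored

    // Find the best-scoring model for this agent.
    let best = currentModel;
    let bestScore = currentScore;
    for (const [model, score] of Object.entries(scores)) {
      if (score > bestScore) {
        best = model;
        bestScore = score;
      }
    }

    // Recommend a switch only when the gap exceeds the threshold.
    const gap = bestScore - currentScore;
    if (best !== currentModel && gap > MIN_GAP) {
      out.push({ agent, action: "update_model", from: currentModel, to: best, gap });
    }
  }
  return out;
}
```

Agents whose current model has no score are skipped rather than guessed at, so a missing benchmark never produces a recommendation on its own.
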
## Model Research Data Flow

```
[model-benchmarks.json]         ← Static benchmark data (refreshed weekly)
     ↓ read
[/evolution Step 0]             ← Checks staleness, triggers research if needed
[/research models]              ← Explicit research trigger
     ↓ produces
[model-research-latest.json]    ← Dynamic research output
     ↓ consumed by
[sync-model-research.ts]        ← Applies recommendations
     ↓ updates
[capability-index.yaml]         ← Model assignments
[agent-versions.json]           ← History tracking
[kilo-meta.json]                ← Source of truth
[kilo.jsonc]                    ← Agent config (manual verify)
[.kilo/agents/*.md]             ← Frontmatter (via sync script)
     ↓ verified by
[sync-agents.js --check]        ← Consistency validation
```

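Because recommendations fan out to several files, a preview step is useful before anything is written. The sketch below illustrates the idea of a dry run over in-memory model assignments. It is not how `sync-model-research.ts` works internally, and `ModelChange`, `Assignments`, and `planSync` are hypothetical names.

```typescript
// Hypothetical shapes for illustration only.
type Assignments = Record<string, string>; // agent -> model

interface ModelChange {
  agent: string;
  from: string;
  to: string;
}

// Computes the change set. With dryRun = true the returned assignments
// equal the input, and only the preview lines describe what would change.
function planSync(
  current: Assignments,
  changes: ModelChange[],
  dryRun: boolean,
): { next: Assignments; preview: string[] } {
  const next: Assignments = { ...current };
  const preview: string[] = [];
  for (const c of changes) {
    // Skip changes whose starting point no longer matches reality.
    if (current[c.agent] !== c.from) {
      preview.push(`skip ${c.agent}: expected ${c.from}, found ${current[c.agent] ?? "none"}`);
      continue;
    }
    preview.push(`${c.agent}: ${c.from} -> ${c.to}`);
    if (!dryRun) next[c.agent] = c.to;
  }
  return { next, preview };
}
```

Running the same function for both the preview and the real apply means the two cannot drift apart: whatever the dry run reports is what the apply performs.
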
### Key Files

| File | Purpose | Updated By |
|------|---------|------------|
| `agent-evolution/data/model-benchmarks.json` | Static benchmark data | `/research models`, `/evolution research` |
| `agent-evolution/data/model-research-latest.json` | Latest research output | `/research models`, `/evolution Step 0` |
| `agent-evolution/data/model-research.schema.json` | Validation schema | Manual (schema changes are rare) |
| `agent-evolution/data/model-benchmarks.schema.json` | Benchmarks data schema | Manual |
| `agent-evolution/data/agent-versions.json` | Version history | `sync-model-research.ts` |
| `agent-evolution/scripts/sync-model-research.ts` | Application script | Manual execution |

## Self-Modification Rules

1. ONLY modify own permission whitelist
2. NEVER modify other agents' definitions
3. ALWAYS create milestone before changes
4. ALWAYS verify access after changes
5. ALWAYS log results to `.kilo/EVOLUTION_LOG.md`
6. NEVER skip verification step
7. ALWAYS validate research output against schema before applying
8. NEVER apply model changes without dry-run preview first
9. ALWAYS run sync-agents.js --check after model changes
10. ALWAYS revert if fitness regresses after model change

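Rule 10 pairs with step 8 of the Model Evolution Flow: commit on improvement, revert on regression. A minimal sketch of that decision follows, assuming a hypothetical `VersionEntry` shape for the history kept in `agent-versions.json` (the real file format may differ). The source does not say what happens when fitness is unchanged; this sketch keeps the previous model in that case, which is an assumption.

```typescript
// Hypothetical history entry; field names are illustrative.
interface VersionEntry {
  model: string;
  fitness: number;
  rationale: string;
}

interface Outcome {
  history: VersionEntry[]; // history after the decision
  active: string;          // model that stays assigned
  reverted: boolean;
}

// Commits the candidate only if it strictly improves on the last entry.
function settle(history: readonly VersionEntry[], candidate: VersionEntry): Outcome {
  const previous = history[history.length - 1];
  if (previous === undefined || candidate.fitness > previous.fitness) {
    // First entry or improvement: record the candidate and keep it active.
    return { history: [...history, candidate], active: candidate.model, reverted: false };
  }
  // Regression (or tie, by assumption): fall back to the previous model.
  return { history: [...history], active: previous.model, reverted: true };
}
```

Returning a new history array instead of mutating the input keeps the revert path trivial: the caller discards the candidate and continues with the state it already had.
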
## Evolution Triggers

- Task type not in capability Routing Map
- capability-analyst reports critical gap
- Repeated task failures for same reason
- User requests new specialized capability

## File Modifications (in order)

1. Create `.kilo/agents/{new-agent}.md` (or skill/workflow)
2. Update `.kilo/agents/orchestrator.md` (add permission)
3. Update `.kilo/capability-index.yaml` (register capabilities)
4. Update `.kilo/KILO_SPEC.md` (document)
5. Update `AGENTS.md` (reference)
6. Append to `.kilo/EVOLUTION_LOG.md` (log entry)
7. Update `agent-evolution/data/model-benchmarks.json` (if model data changed)
8. Update `agent-evolution/data/agent-versions.json` (add history entry)
9. Update `kilo-meta.json` (source of truth for sync)
10. Run `node scripts/sync-agents.js --fix` (propagate to all files)
11. Run `node scripts/sync-agents.js --check` (verify consistency)

## Verification Checklist

After each evolution:

- [ ] Agent file created with valid YAML frontmatter
- [ ] Permission added to orchestrator.md
- [ ] Capability registered in capability-index.yaml
- [ ] Test call succeeds (Task tool returns valid response)
- [ ] KILO_SPEC.md updated with new agent
- [ ] AGENTS.md updated with new agent
- [ ] EVOLUTION_LOG.md updated with entry
- [ ] Gitea milestone closed with results
- [ ] model-research-latest.json validates against schema
- [ ] sync-model-research.ts dry-run shows correct changes
- [ ] capability-index.yaml model field updated for affected agents
- [ ] agent-versions.json history entry added with rationale
- [ ] kilo-meta.json matches new model assignments
- [ ] kilo.jsonc manually verified (sync script does not guarantee this)
- [ ] sync-agents.js --check passes
- [ ] No stale models leaked (grep for previous model IDs)
- [ ] Cloud model suffix correct (kimi-k2.6:cloud, not kimi-k2.6)

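The last two checklist items can be approximated with a text scan over each config file's contents. This is a sketch only: `findModelLeaks` is a hypothetical helper, the caller is assumed to supply both the list of previous model IDs and the list of cloud model base names, and `old-model-1` in the usage example is a made-up ID.

```typescript
// Escapes regex metacharacters so a model ID is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns one message per problem found in `text`.
function findModelLeaks(
  text: string,
  previousModelIds: string[], // IDs that should no longer appear
  cloudModels: string[],      // base names that must carry ":cloud"
): string[] {
  const problems: string[] = [];
  for (const id of previousModelIds) {
    if (text.includes(id)) {
      problems.push(`stale model ID still present: ${id}`);
    }
  }
  for (const base of cloudModels) {
    // Match the base name only when it is NOT followed by ":cloud".
    const bare = new RegExp(`${escapeRegExp(base)}(?!:cloud)`);
    if (bare.test(text)) {
      problems.push(`missing :cloud suffix: ${base}`);
    }
  }
  return problems;
}

// Usage: one correct cloud model, one leftover (hypothetical) old ID.
const sample = "model: kimi-k2.6:cloud\nfallback: old-model-1\n";
const issues = findModelLeaks(sample, ["old-model-1"], ["kimi-k2.6"]);
```

Note that the stale-ID check is a plain substring match, so a previous ID that is a prefix of the new one (for example `kimi-k2.6` versus `kimi-k2.6:cloud`) would be reported as a false positive and belongs in the suffix check instead.
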