- capability-analyst: nemotron-3-super → qwen3.6-plus:free (+23% quality, IF:90, FREE)
- requirement-refiner: nemotron-3-super → glm-5 (+33% quality)
- agent-architect: nemotron-3-super → qwen3.6-plus:free (+22% quality)
- evaluator: nemotron-3-super → qwen3.6-plus:free (+4% quality)
- Add /evolution workflow for tracking agent improvements
- Update agent-versions.json with evolution history
Agent Evolution Workflow
Tracks and records agent model improvements, capability changes, and performance metrics.
Usage
/evolution [action] [agent]
Actions
| Action | Description |
|---|---|
| log | Log an agent improvement to Gitea and evolution data |
| report | Generate evolution report for agent or all agents |
| history | Show model change history |
| metrics | Display performance metrics |
| recommend | Get model recommendations |
Examples
# Log improvement
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"
# Generate report
/evolution report capability-analyst
# Show all changes
/evolution history
# Get recommendations
/evolution recommend
Workflow Steps
Step 1: Parse Command
action=$1
agent=$2
message=$3
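A minimal sketch of how the parse step might validate its input before dispatching. The function name `parse_evolution_args` is hypothetical, and the rule that `log` needs both an agent and a message is an assumption drawn from the examples above, not something the workflow states.

```shell
# Hypothetical helper: parse and validate /evolution arguments.
# Sets the same variables as Step 1: action, agent, message.
parse_evolution_args() {
  action="${1:-}"
  agent="${2:-}"
  message="${3:-}"

  # Accept only the actions listed in the Actions table.
  case "$action" in
    log|report|history|metrics|recommend) ;;
    *)
      echo "unknown action: '${action}'" >&2
      return 1
      ;;
  esac

  # Assumption: 'log' requires both an agent and a message.
  if [ "$action" = "log" ] && { [ -z "$agent" ] || [ -z "$message" ]; }; then
    echo "usage: /evolution log <agent> <message>" >&2
    return 1
  fi
}
```

Defaulting each positional parameter with `${1:-}` keeps the function safe under `set -u` when optional arguments are omitted, as in `/evolution history`.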
Step 2: Execute Action
Log Action
When logging an improvement:
- Read current model
    # From .kilo/agents/{agent}.md
    current_model=$(grep "^model:" ".kilo/agents/${agent}.md" | cut -d' ' -f2)
    # From .kilo/capability-index.yaml
    # (awk splits on any whitespace, so an indented "model:" line still yields the value)
    yaml_model=$(grep -A1 "${agent}:" .kilo/capability-index.yaml | awk '/model:/ {print $2}')
- Get previous model from history
    # Read from agent-evolution/data/agent-versions.json
    previous_model=$(cat agent-evolution/data/agent-versions.json | ...)
- Calculate improvement
    - Look up model scores from capability-index.yaml
    - Compare IF scores
    - Compare context windows
- Write to evolution data
    {
      "agent": "capability-analyst",
      "timestamp": "2026-04-05T22:20:00Z",
      "type": "model_change",
      "from": "ollama-cloud/nemotron-3-super",
      "to": "qwen/qwen3.6-plus:free",
      "improvement": {
        "quality": "+23%",
        "context_window": "130K→1M",
        "if_score": "85→90"
      },
      "rationale": "Better structured output, FREE via OpenRouter"
    }
- Post Gitea comment
    ## 🚀 Agent Evolution: {agent}

    | Metric | Before | After | Change |
    |--------|--------|-------|--------|
    | Model | {old} | {new} | ⬆️ |
    | IF Score | 85 | 90 | +5 |
    | Quality | 64 | 79 | +23% |
    | Context | 130K | 1M | +870K |

    **Rationale**: {message}
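The "Calculate improvement" step above can be sketched as two small helpers. The names `pct_change` and `abs_change` are hypothetical, and the scores are assumed to be plain numbers already looked up from capability-index.yaml.

```shell
# pct_change OLD NEW -> signed, rounded percentage change, e.g. "+23%"
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    # Guard against division by zero when no previous score exists.
    if (old == 0) { print "n/a"; exit }
    printf "%+.0f%%\n", (new - old) / old * 100
  }'
}

# abs_change OLD NEW -> signed integer difference, e.g. "+5"
abs_change() {
  printf '%+d\n' "$(( $2 - $1 ))"
}
```

With the scores from the capability-analyst example, `pct_change 64 79` gives the Quality change and `abs_change 85 90` gives the IF Score change.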
Report Action
Generate comprehensive report:
# Agent Evolution Report
## Overview
- Total agents: 28
- Model changes this month: 4
- Average quality improvement: +21%
## Recent Changes
| Date | Agent | Old Model | New Model | Impact |
|------|-------|-----------|-----------|--------|
| 2026-04-05 | capability-analyst | nemotron-3-super | qwen3.6-plus | +23% |
| 2026-04-05 | requirement-refiner | nemotron-3-super | glm-5 | +33% |
| ... | ... | ... | ... | ... |
## Performance Metrics
### Agent Scores Over Time
capability-analyst: 64 → 79 (+23%)
requirement-refiner: 60 → 80 (+33%)
agent-architect: 67 → 82 (+22%)
evaluator: 78 → 81 (+4%)
### Model Distribution
- qwen3.6-plus: 5 agents
- nemotron-3-super: 8 agents
- glm-5: 3 agents
- minimax-m2.5: 1 agent
- ...
## Recommendations
1. Consider updating history-miner to nemotron-3-super-120b
2. code-skeptic optimal with minimax-m2.5
3. ...
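The Model Distribution section of such a report could be derived from the agent files themselves. This is a sketch, assuming each file in `.kilo/agents/` carries a top-level `model:` line (as the grep in the Log Action implies); the function name `model_distribution` is hypothetical.

```shell
# model_distribution DIR -> "<model>: <count>" lines, most-used model first.
# The body runs in a subshell so its loop variable does not leak.
model_distribution() (
  for f in "$1"/*.md; do
    [ -f "$f" ] || continue
    # Take the first top-level "model:" line of each agent file.
    awk '/^model:/ { print $2; exit }' "$f"
  done | sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $2 ": " $1 }'
)
```

Calling `model_distribution .kilo/agents` prints one line per model, which can be pasted under the Model Distribution heading.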
Step 3: Update Files
After logging:
- Update agent-evolution/data/agent-versions.json
- Post comment to related Gitea issue
- Update capability-index.yaml metrics
Data Storage
agent-versions.json
{
  "version": "1.0",
  "agents": {
    "capability-analyst": {
      "current": {
        "model": "qwen/qwen3.6-plus:free",
        "provider": "openrouter",
        "if_score": 90,
        "quality_score": 79,
        "context_window": "1M"
      },
      "history": [
        {
          "date": "2026-04-05T22:20:00Z",
          "type": "model_change",
          "from": "ollama-cloud/nemotron-3-super",
          "to": "qwen/qwen3.6-plus:free",
          "rationale": "Better IF score, FREE via OpenRouter"
        }
      ]
    }
  }
}
Gitea Issue Comments
Each evolution log posts a formatted comment:
## 🚀 Agent Evolution Log
### {agent}
- **Model**: {old} → {new}
- **Quality**: {old_score} → {new_score} ({change}%)
- **Context**: {old_ctx} → {new_ctx}
- **Rationale**: {reason}
_This change was tracked by /evolution workflow._
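One way to fill in the comment template above is a here-document. The function name `render_evolution_comment` and its argument order are hypothetical; posting the rendered text to Gitea is left out of this sketch, and the old score is assumed to be non-zero.

```shell
# render_evolution_comment AGENT OLD NEW OLD_SCORE NEW_SCORE OLD_CTX NEW_CTX REASON
# Prints the markdown comment body. Runs in a subshell so 'change' stays local.
render_evolution_comment() (
  # Signed, rounded percentage change in quality (assumes OLD_SCORE != 0).
  change="$(awk -v o="$4" -v n="$5" 'BEGIN { printf "%+.0f", (n - o) / o * 100 }')"
  cat <<EOF
## 🚀 Agent Evolution Log
### $1
- **Model**: $2 → $3
- **Quality**: $4 → $5 (${change}%)
- **Context**: $6 → $7
- **Rationale**: $8

_This change was tracked by /evolution workflow._
EOF
)
```

For example, `render_evolution_comment capability-analyst nemotron-3-super qwen3.6-plus 64 79 130K 1M "Better IF score"` produces a body ready to attach to the related issue.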
Integration Points
- After /pipeline: Evaluator scores logged
- After model update: Evolution logged
- Weekly: Performance report generated
- On request: Recommendations provided
Metrics Tracked
| Metric | Source | Purpose |
|---|---|---|
| IF Score | KILO_SPEC.md | Instruction Following |
| Quality Score | Research | Overall performance |
| Context Window | Model spec | Max tokens |
| Provider | Config | API endpoint |
| Cost | Pricing | Resource planning |
| SWE-bench | Research | Code benchmark |
| RULER | Research | Long-context benchmark |
Example Session
$ /evolution log capability-analyst "Updated to qwen3.6-plus for FREE tier and better IF"
✅ Logged evolution for capability-analyst
📊 Quality improvement: +23%
📄 Posted comment to Issue #27
📝 Updated agent-versions.json
Evolution workflow v1.0 - Track agent improvements