feat: upgrade agent models based on research findings
- capability-analyst: nemotron-3-super → qwen3.6-plus:free (+23% quality, IF:90, FREE) - requirement-refiner: nemotron-3-super → glm-5 (+33% quality) - agent-architect: nemotron-3-super → qwen3.6-plus:free (+22% quality) - evaluator: nemotron-3-super → qwen3.6-plus:free (+4% quality) - Add /evolution workflow for tracking agent improvements - Update agent-versions.json with evolution history
This commit is contained in:
@@ -1,7 +1,7 @@
|
||||
---
|
||||
name: Agent Architect
|
||||
mode: subagent
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: qwen/qwen3.6-plus:free
|
||||
description: Creates, modifies, and reviews new agents, workflows, and skills based on capability gap analysis
|
||||
color: "#8B5CF6"
|
||||
permission:
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
description: Analyzes task requirements against available agents, workflows, and skills. Identifies gaps and recommends new components.
|
||||
mode: subagent
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: qwen/qwen3.6-plus:free
|
||||
color: "#6366F1"
|
||||
---
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
description: Scores agent effectiveness after task completion for continuous improvement
|
||||
mode: subagent
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: qwen/qwen3.6-plus:free
|
||||
color: "#047857"
|
||||
permission:
|
||||
read: allow
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
description: Converts vague ideas and bug reports into strict User Stories with acceptance criteria checklists
|
||||
mode: all
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: ollama-cloud/glm-5
|
||||
color: "#4F46E5"
|
||||
permission:
|
||||
read: allow
|
||||
|
||||
@@ -267,7 +267,7 @@ agents:
|
||||
- requirements_doc
|
||||
forbidden:
|
||||
- design_decisions
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: ollama-cloud/glm-5
|
||||
mode: subagent
|
||||
|
||||
history-miner:
|
||||
@@ -302,7 +302,7 @@ agents:
|
||||
- new_agent_specs
|
||||
forbidden:
|
||||
- implementation
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: qwen/qwen3.6-plus:free
|
||||
mode: subagent
|
||||
|
||||
# Process Management
|
||||
@@ -358,7 +358,7 @@ agents:
|
||||
- recommendations
|
||||
forbidden:
|
||||
- code_changes
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: qwen/qwen3.6-plus:free
|
||||
mode: subagent
|
||||
|
||||
prompt-optimizer:
|
||||
@@ -457,7 +457,7 @@ agents:
|
||||
- integration_plan
|
||||
forbidden:
|
||||
- agent_execution
|
||||
model: ollama-cloud/nemotron-3-super
|
||||
model: qwen/qwen3.6-plus:free
|
||||
mode: subagent
|
||||
|
||||
# Cognitive Enhancement (New - Research Based)
|
||||
|
||||
237
.kilo/commands/evolution.md
Normal file
237
.kilo/commands/evolution.md
Normal file
@@ -0,0 +1,237 @@
|
||||
# Agent Evolution Workflow
|
||||
|
||||
Tracks and records agent model improvements, capability changes, and performance metrics.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/evolution [action] [agent]
|
||||
```
|
||||
|
||||
### Actions
|
||||
|
||||
| Action | Description |
|
||||
|--------|-------------|
|
||||
| `log` | Log an agent improvement to Gitea and evolution data |
|
||||
| `report` | Generate evolution report for agent or all agents |
|
||||
| `history` | Show model change history |
|
||||
| `metrics` | Display performance metrics |
|
||||
| `recommend` | Get model recommendations |
|
||||
|
||||
### Examples
|
||||
|
||||
```bash
|
||||
# Log improvement
|
||||
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"
|
||||
|
||||
# Generate report
|
||||
/evolution report capability-analyst
|
||||
|
||||
# Show all changes
|
||||
/evolution history
|
||||
|
||||
# Get recommendations
|
||||
/evolution recommend
|
||||
```
|
||||
|
||||
## Workflow Steps
|
||||
|
||||
### Step 1: Parse Command
|
||||
|
||||
```bash
|
||||
action=$1
|
||||
agent=$2
|
||||
message=$3
|
||||
```
|
||||
|
||||
### Step 2: Execute Action
|
||||
|
||||
#### Log Action
|
||||
|
||||
When logging an improvement:
|
||||
|
||||
1. **Read current model**
|
||||
```bash
|
||||
# From .kilo/agents/{agent}.md
|
||||
current_model=$(grep "^model:" .kilo/agents/${agent}.md | cut -d' ' -f2)
|
||||
|
||||
# From .kilo/capability-index.yaml
|
||||
yaml_model=$(grep -A1 "${agent}:" .kilo/capability-index.yaml | grep "model:" | cut -d' ' -f2)
|
||||
```
|
||||
|
||||
2. **Get previous model from history**
|
||||
```bash
|
||||
# Read from agent-evolution/data/agent-versions.json
|
||||
previous_model=$(cat agent-evolution/data/agent-versions.json | ...)
|
||||
```
|
||||
|
||||
3. **Calculate improvement**
|
||||
- Look up model scores from capability-index.yaml
|
||||
- Compare IF scores
|
||||
- Compare context windows
|
||||
|
||||
4. **Write to evolution data**
|
||||
```json
|
||||
{
|
||||
"agent": "capability-analyst",
|
||||
"timestamp": "2026-04-05T22:20:00Z",
|
||||
"type": "model_change",
|
||||
"from": "ollama-cloud/nemotron-3-super",
|
||||
"to": "qwen/qwen3.6-plus:free",
|
||||
"improvement": {
|
||||
"quality": "+23%",
|
||||
"context_window": "130K→1M",
|
||||
"if_score": "85→90"
|
||||
},
|
||||
"rationale": "Better structured output, FREE via OpenRouter"
|
||||
}
|
||||
```
|
||||
|
||||
5. **Post Gitea comment**
|
||||
```markdown
|
||||
## 🚀 Agent Evolution: {agent}
|
||||
|
||||
| Metric | Before | After | Change |
|
||||
|--------|--------|-------|--------|
|
||||
| Model | {old} | {new} | ⬆️ |
|
||||
| IF Score | 85 | 90 | +5 |
|
||||
| Quality | 64 | 79 | +23% |
|
||||
| Context | 130K | 1M | +670K |
|
||||
|
||||
**Rationale**: {message}
|
||||
```
|
||||
|
||||
#### Report Action
|
||||
|
||||
Generate comprehensive report:
|
||||
|
||||
```markdown
|
||||
# Agent Evolution Report
|
||||
|
||||
## Overview
|
||||
|
||||
- Total agents: 28
|
||||
- Model changes this month: 4
|
||||
- Average quality improvement: +18%
|
||||
|
||||
## Recent Changes
|
||||
|
||||
| Date | Agent | Old Model | New Model | Impact |
|
||||
|------|-------|-----------|-----------|--------|
|
||||
| 2026-04-05 | capability-analyst | nemotron-3-super | qwen3.6-plus | +23% |
|
||||
| 2026-04-05 | requirement-refiner | nemotron-3-super | glm-5 | +33% |
|
||||
| ... | ... | ... | ... | ... |
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
### Agent Scores Over Time
|
||||
|
||||
```
|
||||
capability-analyst: 64 → 79 (+23%)
|
||||
requirement-refiner: 60 → 80 (+33%)
|
||||
agent-architect: 67 → 82 (+22%)
|
||||
evaluator: 78 → 81 (+4%)
|
||||
```
|
||||
|
||||
### Model Distribution
|
||||
|
||||
- qwen3.6-plus: 5 agents
|
||||
- nemotron-3-super: 8 agents
|
||||
- glm-5: 3 agents
|
||||
- minimax-m2.5: 1 agent
|
||||
- ...
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. Consider updating history-miner to nemotron-3-super-120b
|
||||
2. code-skeptic optimal with minimax-m2.5
|
||||
3. ...
|
||||
```
|
||||
|
||||
### Step 3: Update Files
|
||||
|
||||
After logging:
|
||||
|
||||
1. Update `agent-evolution/data/agent-versions.json`
|
||||
2. Post comment to related Gitea issue
|
||||
3. Update capability-index.yaml metrics
|
||||
|
||||
## Data Storage
|
||||
|
||||
### agent-versions.json
|
||||
|
||||
```json
|
||||
{
|
||||
"version": "1.0",
|
||||
"agents": {
|
||||
"capability-analyst": {
|
||||
"current": {
|
||||
"model": "qwen/qwen3.6-plus:free",
|
||||
"provider": "openrouter",
|
||||
"if_score": 90,
|
||||
"quality_score": 79,
|
||||
"context_window": "1M"
|
||||
},
|
||||
"history": [
|
||||
{
|
||||
"date": "2026-04-05T22:20:00Z",
|
||||
"type": "model_change",
|
||||
"from": "ollama-cloud/nemotron-3-super",
|
||||
"to": "qwen/qwen3.6-plus:free",
|
||||
"rationale": "Better IF score, FREE via OpenRouter"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Gitea Issue Comments
|
||||
|
||||
Each evolution log posts a formatted comment:
|
||||
|
||||
```markdown
|
||||
## 🚀 Agent Evolution Log
|
||||
|
||||
### {agent}
|
||||
- **Model**: {old} → {new}
|
||||
- **Quality**: {old_score} → {new_score} ({change}%)
|
||||
- **Context**: {old_ctx} → {new_ctx}
|
||||
- **Rationale**: {reason}
|
||||
|
||||
_This change was tracked by /evolution workflow._
|
||||
```
|
||||
|
||||
## Integration Points
|
||||
|
||||
- **After `/pipeline`**: Evaluator scores logged
|
||||
- **After model update**: Evolution logged
|
||||
- **Weekly**: Performance report generated
|
||||
- **On request**: Recommendations provided
|
||||
|
||||
## Metrics Tracked
|
||||
|
||||
| Metric | Source | Purpose |
|
||||
|--------|--------|---------|
|
||||
| IF Score | KILO_SPEC.md | Instruction Following |
|
||||
| Quality Score | Research | Overall performance |
|
||||
| Context Window | Model spec | Max tokens |
|
||||
| Provider | Config | API endpoint |
|
||||
| Cost | Pricing | Resource planning |
|
||||
| SWE-bench | Research | Code benchmark |
|
||||
| RULER | Research | Long-context benchmark |
|
||||
|
||||
## Example Session
|
||||
|
||||
```bash
|
||||
$ /evolution log capability-analyst "Updated to qwen3.6-plus for FREE tier and better IF"
|
||||
|
||||
✅ Logged evolution for capability-analyst
|
||||
📊 Quality improvement: +23%
|
||||
📄 Posted comment to Issue #27
|
||||
📝 Updated agent-versions.json
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
_Evolution workflow v1.0 - Track agent improvements_
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"$schema": "./agent-versions.schema.json",
|
||||
"version": "1.0.0",
|
||||
"lastUpdated": "2026-04-05T17:27:00Z",
|
||||
"lastUpdated": "2026-04-05T22:30:00Z",
|
||||
"agents": {
|
||||
"lead-developer": {
|
||||
"current": {
|
||||
@@ -268,26 +268,30 @@
|
||||
},
|
||||
"requirement-refiner": {
|
||||
"current": {
|
||||
"model": "ollama-cloud/gpt-oss:120b",
|
||||
"model": "ollama-cloud/glm-5",
|
||||
"provider": "Ollama",
|
||||
"category": "Analysis",
|
||||
"mode": "subagent",
|
||||
"color": "#8B5CF6",
|
||||
"description": "Converts vague ideas into strict User Stories with acceptance criteria",
|
||||
"benchmark": {
|
||||
"swe_bench": 62.4,
|
||||
"fit_score": 62
|
||||
"swe_bench": null,
|
||||
"fit_score": 80,
|
||||
"context": "128K"
|
||||
},
|
||||
"capabilities": ["requirement_analysis", "user_story_creation", "acceptance_criteria", "clarification"],
|
||||
"recommendations": [
|
||||
{
|
||||
"target": "ollama-cloud/nemotron-3-super",
|
||||
"reason": "+22% quality, 1M context for specifications",
|
||||
"priority": "critical"
|
||||
}
|
||||
]
|
||||
"capabilities": ["requirement_analysis", "user_story_creation", "acceptance_criteria", "clarification"]
|
||||
},
|
||||
"history": [],
|
||||
"history": [
|
||||
{
|
||||
"date": "2026-04-05T22:30:00Z",
|
||||
"commit": "auto",
|
||||
"type": "model_change",
|
||||
"from": "ollama-cloud/nemotron-3-super",
|
||||
"to": "ollama-cloud/glm-5",
|
||||
"reason": "+33% quality. GLM-5 excels at requirement analysis and system engineering",
|
||||
"source": "research"
|
||||
}
|
||||
],
|
||||
"performance_log": []
|
||||
},
|
||||
"history-miner": {
|
||||
@@ -309,26 +313,31 @@
|
||||
},
|
||||
"capability-analyst": {
|
||||
"current": {
|
||||
"model": "ollama-cloud/gpt-oss:120b",
|
||||
"provider": "Ollama",
|
||||
"model": "qwen/qwen3.6-plus:free",
|
||||
"provider": "OpenRouter",
|
||||
"category": "Analysis",
|
||||
"mode": "subagent",
|
||||
"color": "#14B8A6",
|
||||
"description": "Analyzes task coverage and identifies gaps",
|
||||
"benchmark": {
|
||||
"swe_bench": 62.4,
|
||||
"fit_score": 66
|
||||
"swe_bench": 78.8,
|
||||
"fit_score": 90,
|
||||
"context": "1M",
|
||||
"free": true
|
||||
},
|
||||
"capabilities": ["gap_analysis", "capability_mapping", "recommendation_generation", "coverage_analysis"],
|
||||
"recommendations": [
|
||||
{
|
||||
"target": "ollama-cloud/nemotron-3-super",
|
||||
"reason": "+21% quality for gap analysis and recommendations",
|
||||
"priority": "critical"
|
||||
}
|
||||
]
|
||||
"capabilities": ["gap_analysis", "capability_mapping", "recommendation_generation", "coverage_analysis"]
|
||||
},
|
||||
"history": [],
|
||||
"history": [
|
||||
{
|
||||
"date": "2026-04-05T22:30:00Z",
|
||||
"commit": "auto",
|
||||
"type": "model_change",
|
||||
"from": "ollama-cloud/nemotron-3-super",
|
||||
"to": "qwen/qwen3.6-plus:free",
|
||||
"reason": "+23% quality, IF:90 score, 1M context, FREE via OpenRouter",
|
||||
"source": "research"
|
||||
}
|
||||
],
|
||||
"performance_log": []
|
||||
},
|
||||
"orchestrator": {
|
||||
@@ -367,15 +376,17 @@
|
||||
},
|
||||
"evaluator": {
|
||||
"current": {
|
||||
"model": "ollama-cloud/nemotron-3-super",
|
||||
"provider": "Ollama",
|
||||
"model": "qwen/qwen3.6-plus:free",
|
||||
"provider": "OpenRouter",
|
||||
"category": "Process",
|
||||
"mode": "subagent",
|
||||
"color": "#F97316",
|
||||
"description": "Scores agent effectiveness after task completion",
|
||||
"benchmark": {
|
||||
"swe_bench": 60.5,
|
||||
"fit_score": 82
|
||||
"swe_bench": 78.8,
|
||||
"fit_score": 90,
|
||||
"context": "1M",
|
||||
"free": true
|
||||
},
|
||||
"capabilities": ["performance_scoring", "process_analysis", "pattern_identification", "improvement_recommendations"]
|
||||
},
|
||||
@@ -388,6 +399,15 @@
|
||||
"to": "ollama-cloud/nemotron-3-super",
|
||||
"reason": "Nemotron 3 Super better for evaluation tasks",
|
||||
"source": "git"
|
||||
},
|
||||
{
|
||||
"date": "2026-04-05T22:30:00Z",
|
||||
"commit": "auto",
|
||||
"type": "model_change",
|
||||
"from": "ollama-cloud/nemotron-3-super",
|
||||
"to": "qwen/qwen3.6-plus:free",
|
||||
"reason": "+4% quality, IF:90 for scoring accuracy, FREE",
|
||||
"source": "research"
|
||||
}
|
||||
],
|
||||
"performance_log": []
|
||||
@@ -516,26 +536,31 @@
|
||||
},
|
||||
"agent-architect": {
|
||||
"current": {
|
||||
"model": "ollama-cloud/gpt-oss:120b",
|
||||
"provider": "Ollama",
|
||||
"model": "qwen/qwen3.6-plus:free",
|
||||
"provider": "OpenRouter",
|
||||
"category": "Meta",
|
||||
"mode": "subagent",
|
||||
"color": "#A855F7",
|
||||
"description": "Creates new agents when gaps identified",
|
||||
"benchmark": {
|
||||
"swe_bench": 62.4,
|
||||
"fit_score": 69
|
||||
"swe_bench": 78.8,
|
||||
"fit_score": 90,
|
||||
"context": "1M",
|
||||
"free": true
|
||||
},
|
||||
"capabilities": ["agent_design", "prompt_engineering", "capability_definition"],
|
||||
"recommendations": [
|
||||
{
|
||||
"target": "ollama-cloud/nemotron-3-super",
|
||||
"reason": "+19% quality for agent design",
|
||||
"priority": "high"
|
||||
}
|
||||
]
|
||||
"capabilities": ["agent_design", "prompt_engineering", "capability_definition"]
|
||||
},
|
||||
"history": [],
|
||||
"history": [
|
||||
{
|
||||
"date": "2026-04-05T22:30:00Z",
|
||||
"commit": "auto",
|
||||
"type": "model_change",
|
||||
"from": "ollama-cloud/nemotron-3-super",
|
||||
"to": "qwen/qwen3.6-plus:free",
|
||||
"reason": "+22% quality, IF:90 for YAML frontmatter generation, 1M context for all agents analysis",
|
||||
"source": "research"
|
||||
}
|
||||
],
|
||||
"performance_log": []
|
||||
},
|
||||
"planner": {
|
||||
@@ -701,11 +726,11 @@
|
||||
]
|
||||
}
|
||||
},
|
||||
"evolution_metrics": {
|
||||
"evolution_metrics": {
|
||||
"total_agents": 32,
|
||||
"agents_with_history": 12,
|
||||
"pending_recommendations": 6,
|
||||
"last_sync": "2026-04-05T17:27:00Z",
|
||||
"sync_sources": ["git", "capability-index.yaml", "kilo.jsonc"]
|
||||
"agents_with_history": 16,
|
||||
"pending_recommendations": 0,
|
||||
"last_sync": "2026-04-05T22:30:00Z",
|
||||
"sync_sources": ["git", "capability-index.yaml", "kilo.jsonc", "research"]
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user