feat: upgrade agent models based on research findings

- capability-analyst: nemotron-3-super → qwen3.6-plus:free (+23% quality, IF:90, FREE)
- requirement-refiner: nemotron-3-super → glm-5 (+33% quality)
- agent-architect: nemotron-3-super → qwen3.6-plus:free (+22% quality)
- evaluator: nemotron-3-super → qwen3.6-plus:free (+4% quality)
- Add /evolution workflow for tracking agent improvements
- Update agent-versions.json with evolution history
2026-04-05 23:37:23 +01:00
parent fe28aa5922
commit a4e09ad5d5
7 changed files with 318 additions and 56 deletions


@@ -1,7 +1,7 @@
---
name: Agent Architect
mode: subagent
-model: ollama-cloud/nemotron-3-super
+model: qwen/qwen3.6-plus:free
description: Creates, modifies, and reviews new agents, workflows, and skills based on capability gap analysis
color: "#8B5CF6"
permission:


@@ -1,7 +1,7 @@
---
description: Analyzes task requirements against available agents, workflows, and skills. Identifies gaps and recommends new components.
mode: subagent
-model: ollama-cloud/nemotron-3-super
+model: qwen/qwen3.6-plus:free
color: "#6366F1"
---


@@ -1,7 +1,7 @@
---
description: Scores agent effectiveness after task completion for continuous improvement
mode: subagent
-model: ollama-cloud/nemotron-3-super
+model: qwen/qwen3.6-plus:free
color: "#047857"
permission:
read: allow


@@ -1,7 +1,7 @@
---
description: Converts vague ideas and bug reports into strict User Stories with acceptance criteria checklists
mode: all
-model: ollama-cloud/nemotron-3-super
+model: ollama-cloud/glm-5
color: "#4F46E5"
permission:
read: allow


@@ -267,7 +267,7 @@ agents:
- requirements_doc
forbidden:
- design_decisions
-model: ollama-cloud/nemotron-3-super
+model: ollama-cloud/glm-5
mode: subagent
history-miner:
@@ -302,7 +302,7 @@ agents:
- new_agent_specs
forbidden:
- implementation
-model: ollama-cloud/nemotron-3-super
+model: qwen/qwen3.6-plus:free
mode: subagent
# Process Management
@@ -358,7 +358,7 @@ agents:
- recommendations
forbidden:
- code_changes
-model: ollama-cloud/nemotron-3-super
+model: qwen/qwen3.6-plus:free
mode: subagent
prompt-optimizer:
@@ -457,7 +457,7 @@ agents:
- integration_plan
forbidden:
- agent_execution
-model: ollama-cloud/nemotron-3-super
+model: qwen/qwen3.6-plus:free
mode: subagent
# Cognitive Enhancement (New - Research Based)

.kilo/commands/evolution.md Normal file

@@ -0,0 +1,237 @@
# Agent Evolution Workflow
Tracks and records agent model improvements, capability changes, and performance metrics.
## Usage
```
/evolution [action] [agent]
```
### Actions
| Action | Description |
|--------|-------------|
| `log` | Log an agent improvement to Gitea and evolution data |
| `report` | Generate evolution report for agent or all agents |
| `history` | Show model change history |
| `metrics` | Display performance metrics |
| `recommend` | Get model recommendations |
### Examples
```bash
# Log improvement
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"
# Generate report
/evolution report capability-analyst
# Show all changes
/evolution history
# Get recommendations
/evolution recommend
```
## Workflow Steps
### Step 1: Parse Command
```bash
action=$1
agent=$2
message=$3
```
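The positional parsing above can also be sketched in Python. This is a minimal illustration, not the workflow's actual implementation: the function name `parse_evolution_command` and the choice to reject unknown actions are assumptions, while the action names come from the Actions table.

```python
import shlex

# Action names taken from the Actions table in this document.
ACTIONS = {"log", "report", "history", "metrics", "recommend"}


def parse_evolution_command(line):
    """Split '/evolution [action] [agent] [message]' into (action, agent, message).

    Hypothetical helper: quoted messages are kept as a single token by shlex,
    and missing trailing arguments are returned as None.
    """
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "/evolution":
        raise ValueError("not an /evolution command")
    action = tokens[1] if len(tokens) > 1 else None
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    agent = tokens[2] if len(tokens) > 2 else None
    message = tokens[3] if len(tokens) > 3 else None
    return action, agent, message
```

Using `shlex.split` rather than a plain whitespace split keeps the quoted rationale intact as the third argument.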
### Step 2: Execute Action
#### Log Action
When logging an improvement:
1. **Read current model**
```bash
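# From .kilo/agents/{agent}.md (first "model:" line in the frontmatter)
current_model=$(grep -m1 '^model:' ".kilo/agents/${agent}.md" | awk '{print $2}')
# From .kilo/capability-index.yaml (first "model:" line inside the agent's block; it is not adjacent to the key)
yaml_model=$(awk -v key="${agent}:" '$1 == key {found=1; next} found && $1 == "model:" {print $2; exit}' .kilo/capability-index.yaml)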
```
2. **Get previous model from history**
```bash
# Read from agent-evolution/data/agent-versions.json
previous_model=$(cat agent-evolution/data/agent-versions.json | ...)
```
3. **Calculate improvement**
- Look up model scores from capability-index.yaml
- Compare IF scores
- Compare context windows
4. **Write to evolution data**
```json
{
"agent": "capability-analyst",
"timestamp": "2026-04-05T22:20:00Z",
"type": "model_change",
"from": "ollama-cloud/nemotron-3-super",
"to": "qwen/qwen3.6-plus:free",
"improvement": {
"quality": "+23%",
"context_window": "130K→1M",
"if_score": "85→90"
},
"rationale": "Better structured output, FREE via OpenRouter"
}
```
5. **Post Gitea comment**
```markdown
## 🚀 Agent Evolution: {agent}
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Model | {old} | {new} | ⬆️ |
| IF Score | 85 | 90 | +5 |
| Quality | 64 | 79 | +23% |
| Context | 130K | 1M | +870K |
**Rationale**: {message}
```
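The "Calculate improvement" step can be illustrated with a short Python sketch. It is only an assumption about how the figures might be derived: the helper names `parse_context` and `improvement`, the input dictionary keys, and the convention that `1M` means 1000K are all hypothetical. The sample numbers (quality 64 to 79, IF 85 to 90, context 130K to 1M) are the ones used in this document.

```python
def parse_context(value):
    """Convert a context-window label such as '130K' or '1M' to thousands of tokens.

    Assumes a decimal convention (1M == 1000K), which may differ from the model spec.
    """
    units = {"K": 1, "M": 1000}
    return int(float(value[:-1]) * units[value[-1].upper()])


def improvement(old, new):
    """Build the 'improvement' fields of an evolution record from two score sets."""
    quality_pct = round((new["quality"] - old["quality"]) / old["quality"] * 100)
    return {
        "quality": f"{quality_pct:+d}%",
        "if_score": f"{old['if_score']}→{new['if_score']}",
        "context_window": f"{old['context']}→{new['context']}",
        # Absolute context gain in thousands of tokens, for the comment table.
        "context_delta_k": parse_context(new["context"]) - parse_context(old["context"]),
    }
```

The quality change is relative to the old score, which is how 64 to 79 becomes +23% rather than +15 points.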
#### Report Action
Generate comprehensive report:
```markdown
# Agent Evolution Report
## Overview
- Total agents: 28
- Model changes this month: 4
- Average quality improvement: +18%
## Recent Changes
| Date | Agent | Old Model | New Model | Impact |
|------|-------|-----------|-----------|--------|
| 2026-04-05 | capability-analyst | nemotron-3-super | qwen3.6-plus | +23% |
| 2026-04-05 | requirement-refiner | nemotron-3-super | glm-5 | +33% |
| ... | ... | ... | ... | ... |
## Performance Metrics
### Agent Scores Over Time
- capability-analyst: 64 → 79 (+23%)
- requirement-refiner: 60 → 80 (+33%)
- agent-architect: 67 → 82 (+22%)
- evaluator: 78 → 81 (+4%)
### Model Distribution
- qwen3.6-plus: 5 agents
- nemotron-3-super: 8 agents
- glm-5: 3 agents
- minimax-m2.5: 1 agent
- ...
## Recommendations
1. Consider updating history-miner to nemotron-3-super-120b
2. code-skeptic optimal with minimax-m2.5
3. ...
```
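The "Model Distribution" part of the report could be derived from `agent-versions.json` roughly as follows. This is a sketch under the assumption that every agent record has a `current.model` field, as in the data shown in this commit; the function name `model_distribution` is hypothetical.

```python
from collections import Counter


def model_distribution(data):
    """Count agents per current model, most common first.

    `data` is the parsed content of agent-versions.json (assumed shape:
    {"agents": {name: {"current": {"model": ...}, ...}}}).
    """
    counts = Counter(rec["current"]["model"] for rec in data["agents"].values())
    return counts.most_common()
```

Each `(model, count)` pair maps directly to one bullet of the distribution list.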
### Step 3: Update Files
After logging:
1. Update `agent-evolution/data/agent-versions.json`
2. Post comment to related Gitea issue
3. Update capability-index.yaml metrics
## Data Storage
### agent-versions.json
```json
{
"version": "1.0",
"agents": {
"capability-analyst": {
"current": {
"model": "qwen/qwen3.6-plus:free",
"provider": "openrouter",
"if_score": 90,
"quality_score": 79,
"context_window": "1M"
},
"history": [
{
"date": "2026-04-05T22:20:00Z",
"type": "model_change",
"from": "ollama-cloud/nemotron-3-super",
"to": "qwen/qwen3.6-plus:free",
"rationale": "Better IF score, FREE via OpenRouter"
}
]
}
}
}
```
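A minimal Python sketch of how a `model_change` entry might be appended to this structure, and how the previous model could be read back from the history. The helper names `log_model_change` and `previous_model` are hypothetical, and the sketch works on an in-memory dictionary rather than the file itself.

```python
import copy


def log_model_change(data, agent, new_model, date, rationale):
    """Return a copy of `data` with a model_change entry appended for `agent`.

    The entry's 'from' value is taken from the agent's current model, so the
    history stays consistent with the 'current' section.
    """
    updated = copy.deepcopy(data)
    record = updated["agents"][agent]
    record["history"].append({
        "date": date,
        "type": "model_change",
        "from": record["current"]["model"],
        "to": new_model,
        "rationale": rationale,
    })
    record["current"]["model"] = new_model
    return updated


def previous_model(data, agent):
    """Return the model the agent used before its latest change, or None."""
    history = data["agents"][agent]["history"]
    return history[-1]["from"] if history else None
```

Returning a modified copy instead of mutating the input makes it easy to compare the before and after states before writing the file back.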
### Gitea Issue Comments
Each evolution log posts a formatted comment:
```markdown
## 🚀 Agent Evolution Log
### {agent}
- **Model**: {old} → {new}
- **Quality**: {old_score} → {new_score} ({change}%)
- **Context**: {old_ctx} → {new_ctx}
- **Rationale**: {reason}
_This change was tracked by /evolution workflow._
```
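The placeholders in the comment template above map naturally onto Python's `str.format`. The sketch below only renders the comment body; posting it to Gitea is out of scope here, and the names `COMMENT_TEMPLATE` and `render_comment` are assumptions for illustration.

```python
# Template text copied from the comment format above; field names match its placeholders.
COMMENT_TEMPLATE = """## 🚀 Agent Evolution Log
### {agent}
- **Model**: {old} → {new}
- **Quality**: {old_score} → {new_score} ({change}%)
- **Context**: {old_ctx} → {new_ctx}
- **Rationale**: {reason}
_This change was tracked by /evolution workflow._"""


def render_comment(**fields):
    """Fill the template; raises KeyError if a placeholder is missing."""
    return COMMENT_TEMPLATE.format(**fields)
```

For example, `render_comment(agent="capability-analyst", old="ollama-cloud/nemotron-3-super", new="qwen/qwen3.6-plus:free", old_score=64, new_score=79, change="+23", old_ctx="130K", new_ctx="1M", reason="Better IF score")` produces the comment body for the change logged in this commit.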
## Integration Points
- **After `/pipeline`**: Evaluator scores logged
- **After model update**: Evolution logged
- **Weekly**: Performance report generated
- **On request**: Recommendations provided
## Metrics Tracked
| Metric | Source | Purpose |
|--------|--------|---------|
| IF Score | KILO_SPEC.md | Instruction Following |
| Quality Score | Research | Overall performance |
| Context Window | Model spec | Max tokens |
| Provider | Config | API endpoint |
| Cost | Pricing | Resource planning |
| SWE-bench | Research | Code benchmark |
| RULER | Research | Long-context benchmark |
## Example Session
```bash
$ /evolution log capability-analyst "Updated to qwen3.6-plus for FREE tier and better IF"
✅ Logged evolution for capability-analyst
📊 Quality improvement: +23%
📄 Posted comment to Issue #27
📝 Updated agent-versions.json
```
---
_Evolution workflow v1.0 - Track agent improvements_


@@ -1,7 +1,7 @@
{
"$schema": "./agent-versions.schema.json",
"version": "1.0.0",
-"lastUpdated": "2026-04-05T17:27:00Z",
+"lastUpdated": "2026-04-05T22:30:00Z",
"agents": {
"lead-developer": {
"current": {
@@ -268,26 +268,30 @@
},
"requirement-refiner": {
"current": {
-"model": "ollama-cloud/gpt-oss:120b",
+"model": "ollama-cloud/glm-5",
"provider": "Ollama",
"category": "Analysis",
"mode": "subagent",
"color": "#8B5CF6",
"description": "Converts vague ideas into strict User Stories with acceptance criteria",
"benchmark": {
-"swe_bench": 62.4,
-"fit_score": 62
+"swe_bench": null,
+"fit_score": 80,
+"context": "128K"
},
-"capabilities": ["requirement_analysis", "user_story_creation", "acceptance_criteria", "clarification"],
-"recommendations": [
-{
-"target": "ollama-cloud/nemotron-3-super",
-"reason": "+22% quality, 1M context for specifications",
-"priority": "critical"
-}
-]
+"capabilities": ["requirement_analysis", "user_story_creation", "acceptance_criteria", "clarification"]
},
-"history": [],
+"history": [
+{
+"date": "2026-04-05T22:30:00Z",
+"commit": "auto",
+"type": "model_change",
+"from": "ollama-cloud/nemotron-3-super",
+"to": "ollama-cloud/glm-5",
+"reason": "+33% quality. GLM-5 excels at requirement analysis and system engineering",
+"source": "research"
+}
+],
"performance_log": []
},
"history-miner": {
@@ -309,26 +313,31 @@
},
"capability-analyst": {
"current": {
-"model": "ollama-cloud/gpt-oss:120b",
-"provider": "Ollama",
+"model": "qwen/qwen3.6-plus:free",
+"provider": "OpenRouter",
"category": "Analysis",
"mode": "subagent",
"color": "#14B8A6",
"description": "Analyzes task coverage and identifies gaps",
"benchmark": {
-"swe_bench": 62.4,
-"fit_score": 66
+"swe_bench": 78.8,
+"fit_score": 90,
+"context": "1M",
+"free": true
},
-"capabilities": ["gap_analysis", "capability_mapping", "recommendation_generation", "coverage_analysis"],
-"recommendations": [
-{
-"target": "ollama-cloud/nemotron-3-super",
-"reason": "+21% quality for gap analysis and recommendations",
-"priority": "critical"
-}
-]
+"capabilities": ["gap_analysis", "capability_mapping", "recommendation_generation", "coverage_analysis"]
},
-"history": [],
+"history": [
+{
+"date": "2026-04-05T22:30:00Z",
+"commit": "auto",
+"type": "model_change",
+"from": "ollama-cloud/nemotron-3-super",
+"to": "qwen/qwen3.6-plus:free",
+"reason": "+23% quality, IF:90 score, 1M context, FREE via OpenRouter",
+"source": "research"
+}
+],
"performance_log": []
},
"orchestrator": {
@@ -367,15 +376,17 @@
},
"evaluator": {
"current": {
-"model": "ollama-cloud/nemotron-3-super",
-"provider": "Ollama",
+"model": "qwen/qwen3.6-plus:free",
+"provider": "OpenRouter",
"category": "Process",
"mode": "subagent",
"color": "#F97316",
"description": "Scores agent effectiveness after task completion",
"benchmark": {
-"swe_bench": 60.5,
-"fit_score": 82
+"swe_bench": 78.8,
+"fit_score": 90,
+"context": "1M",
+"free": true
},
"capabilities": ["performance_scoring", "process_analysis", "pattern_identification", "improvement_recommendations"]
},
@@ -388,6 +399,15 @@
"to": "ollama-cloud/nemotron-3-super",
"reason": "Nemotron 3 Super better for evaluation tasks",
"source": "git"
+},
+{
+"date": "2026-04-05T22:30:00Z",
+"commit": "auto",
+"type": "model_change",
+"from": "ollama-cloud/nemotron-3-super",
+"to": "qwen/qwen3.6-plus:free",
+"reason": "+4% quality, IF:90 for scoring accuracy, FREE",
+"source": "research"
}
],
"performance_log": []
@@ -516,26 +536,31 @@
},
"agent-architect": {
"current": {
-"model": "ollama-cloud/gpt-oss:120b",
-"provider": "Ollama",
+"model": "qwen/qwen3.6-plus:free",
+"provider": "OpenRouter",
"category": "Meta",
"mode": "subagent",
"color": "#A855F7",
"description": "Creates new agents when gaps identified",
"benchmark": {
-"swe_bench": 62.4,
-"fit_score": 69
+"swe_bench": 78.8,
+"fit_score": 90,
+"context": "1M",
+"free": true
},
-"capabilities": ["agent_design", "prompt_engineering", "capability_definition"],
-"recommendations": [
-{
-"target": "ollama-cloud/nemotron-3-super",
-"reason": "+19% quality for agent design",
-"priority": "high"
-}
-]
+"capabilities": ["agent_design", "prompt_engineering", "capability_definition"]
},
-"history": [],
+"history": [
+{
+"date": "2026-04-05T22:30:00Z",
+"commit": "auto",
+"type": "model_change",
+"from": "ollama-cloud/nemotron-3-super",
+"to": "qwen/qwen3.6-plus:free",
+"reason": "+22% quality, IF:90 for YAML frontmatter generation, 1M context for all agents analysis",
+"source": "research"
+}
+],
"performance_log": []
},
"planner": {
@@ -701,11 +726,11 @@
]
}
},
-"evolution_metrics": {
+"evolution_metrics": {
"total_agents": 32,
-"agents_with_history": 12,
-"pending_recommendations": 6,
-"last_sync": "2026-04-05T17:27:00Z",
-"sync_sources": ["git", "capability-index.yaml", "kilo.jsonc"]
+"agents_with_history": 16,
+"pending_recommendations": 0,
+"last_sync": "2026-04-05T22:30:00Z",
+"sync_sources": ["git", "capability-index.yaml", "kilo.jsonc", "research"]
}
}