
a4e09ad5d5 feat: upgrade agent models based on research findings

- capability-analyst: nemotron-3-super → qwen3.6-plus:free (+23% quality, IF:90, FREE)
- requirement-refiner: nemotron-3-super → glm-5 (+33% quality)
- agent-architect: nemotron-3-super → qwen3.6-plus:free (+22% quality)
- evaluator: nemotron-3-super → qwen3.6-plus:free (+4% quality)
- Add /evolution workflow for tracking agent improvements
- Update agent-versions.json with evolution history

2026-04-05 23:37:23 +01:00

5.1 KiB


Agent Evolution Workflow

Tracks and records agent model improvements, capability changes, and performance metrics.

Usage

```
/evolution [action] [agent] ["message"]
```

Actions

| Action | Description |
|--------|-------------|
| `log` | Log an agent improvement to the evolution data and to Gitea |
| `report` | Generate an evolution report for one agent or for all agents |
| `history` | Show the model change history |
| `metrics` | Display performance metrics |
| `recommend` | Get model recommendations |

Examples

```
# Log an improvement
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"

# Generate a report
/evolution report capability-analyst

# Show all changes
/evolution history

# Get recommendations
/evolution recommend
```

Workflow Steps

Step 1: Parse Command

```bash
action=$1
agent=$2
message=$3
```
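Step 1 only assigns the positional parameters. A hypothetical `parse_evolution_command` wrapper could also validate them. The five accepted actions come from the Actions table. Two rules are assumptions of this sketch and are not part of the spec: a missing agent defaults to `all`, and `log` requires both an agent and a message.

```bash
# Hypothetical sketch: validate /evolution arguments before dispatching.
# The five actions come from the Actions table; defaulting the agent to
# "all" and requiring an agent plus a message for `log` are assumptions.
parse_evolution_command() {
  local action=${1:-} agent=${2:-all} message=${3:-}

  case $action in
    log|report|history|metrics|recommend) ;;
    *)
      echo "usage: /evolution {log|report|history|metrics|recommend} [agent] [\"message\"]" >&2
      return 1
      ;;
  esac

  if [[ $action == log && ( $agent == all || -z $message ) ]]; then
    echo "error: log needs an agent name and a message" >&2
    return 1
  fi

  # Print the normalized values, one per line.
  printf '%s\n%s\n%s\n' "$action" "$agent" "$message"
}
```

Printing one value per line keeps the function free of global state. A message that itself contains newlines would need a different encoding.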

Step 2: Execute Action

Log Action

When logging an improvement:

Read current model

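```bash
# From .kilo/agents/{agent}.md (front-matter line "model: <id>")
current_model=$(awk '/^model:/ {print $2; exit}' ".kilo/agents/${agent}.md")

# From .kilo/capability-index.yaml. The "model:" line under the agent key is
# indented, so cut -d' ' -f2 would return an empty field; awk splits on runs
# of whitespace instead.
yaml_model=$(grep -A1 "${agent}:" .kilo/capability-index.yaml | awk '/model:/ {print $2; exit}')
```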
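The Log action reads the model from two places, and the two can drift apart. Below is a minimal sketch of a hypothetical `read_agent_model` helper that compares them. It assumes the same file layouts as the commands above. The optional directory argument (default `.kilo`) exists only so the helper can be exercised against test files.

```bash
# Hypothetical helper: read an agent's model from both sources and warn on drift.
# Assumed layouts (same as the commands above): a top-level "model: <id>" line
# in <root>/agents/<agent>.md, and a "model:" key on the line right after
# "<agent>:" in <root>/capability-index.yaml.
read_agent_model() {
  local agent=$1 root=${2:-.kilo}
  local md_model yaml_model

  # awk splits on runs of whitespace, so indentation is harmless;
  # gsub strips double quotes around the value.
  md_model=$(awk '/^model:/ {gsub(/"/, "", $2); print $2; exit}' \
    "$root/agents/$agent.md")
  yaml_model=$(grep -A1 "^[[:space:]]*${agent}:" "$root/capability-index.yaml" \
    | awk '/model:/ {gsub(/"/, "", $2); print $2; exit}')

  if [[ -n $yaml_model && $md_model != "$yaml_model" ]]; then
    echo "warning: ${agent}.md has '$md_model', capability-index.yaml has '$yaml_model'" >&2
  fi
  printf '%s\n' "$md_model"
}
```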

Get previous model from history

```bash
# Read from agent-evolution/data/agent-versions.json
previous_model=$(cat agent-evolution/data/agent-versions.json | ...)
```
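The filter in the lookup above is elided. One hypothetical way to perform it uses jq, which the workflow itself does not mention. This sketch assumes that Step 3 rewrites agent-versions.json only after logging, so `current.model` still names the pre-change model at this point. If `current` is missing, it falls back to the `to` field of the newest history entry.

```bash
# Hypothetical sketch (requires jq) of the previous-model lookup.
# Assumption: Step 3 rewrites agent-versions.json only after logging, so
# .current.model still names the old model here; if "current" is absent,
# fall back to the "to" field of the newest history entry.
previous_model_of() {
  local agent=$1 file=${2:-agent-evolution/data/agent-versions.json}
  jq -r --arg a "$agent" \
    '.agents[$a].current.model // .agents[$a].history[-1].to // empty' "$file"
}
```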

Calculate improvement
- Look up model scores from capability-index.yaml
- Compare IF scores
- Compare context windows
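The quality figures in this workflow match a rounded relative change, (new − old) / old × 100:

- 64 → 79 gives +23%
- 60 → 80 gives +33%
- 67 → 82 gives +22%
- 78 → 81 gives +4%

The IF score, by contrast, is reported as an absolute difference (85 → 90 is +5). These formulas are inferred from the numbers rather than stated by the source. A sketch of both calculations with awk:

```bash
# Sketch: rounded relative change for quality scores, e.g. 64 -> 79 => +23%.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old == 0) { print "n/a"; exit }
    printf "%+.0f%%\n", (new - old) / old * 100
  }'
}

# Sketch: absolute change for IF scores, e.g. 85 -> 90 => +5.
abs_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+d\n", new - old }'
}
```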

Write to evolution data

```json
{
  "agent": "capability-analyst",
  "timestamp": "2026-04-05T22:20:00Z",
  "type": "model_change",
  "from": "ollama-cloud/nemotron-3-super",
  "to": "qwen/qwen3.6-plus:free",
  "improvement": {
    "quality": "+23%",
    "context_window": "130K→1M",
    "if_score": "85→90"
  },
  "rationale": "Better structured output, FREE via OpenRouter"
}
```
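To produce an entry in this shape from the shell, a hypothetical `evolution_entry` function could use a here-document. The escaping helper is deliberately minimal and handles only backslashes, double quotes, and newlines. The `EVOLUTION_TS` override exists only so the timestamp can be pinned in tests. Both are assumptions of this sketch.

```bash
# Hypothetical sketch: print one evolution entry in the JSON shape above.
# json_escape handles only backslashes, double quotes and newlines.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # escape backslashes first
  s=${s//\"/\\\"}      # then double quotes
  s=${s//$'\n'/\\n}    # then newlines
  printf '%s' "$s"
}

# Arguments: agent from to quality context_window if_score rationale.
# EVOLUTION_TS optionally pins the timestamp (default: current UTC time).
evolution_entry() {
  local agent=$1 from=$2 to=$3 quality=$4 context=$5 if_score=$6 rationale=$7
  local ts=${EVOLUTION_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  cat <<EOF
{
  "agent": "$(json_escape "$agent")",
  "timestamp": "$ts",
  "type": "model_change",
  "from": "$(json_escape "$from")",
  "to": "$(json_escape "$to")",
  "improvement": {
    "quality": "$(json_escape "$quality")",
    "context_window": "$(json_escape "$context")",
    "if_score": "$(json_escape "$if_score")"
  },
  "rationale": "$(json_escape "$rationale")"
}
EOF
}
```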

Post Gitea comment

```markdown
## 🚀 Agent Evolution: {agent}

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Model | {old} | {new} | ⬆️ |
| IF Score | 85 | 90 | +5 |
| Quality | 64 | 79 | +23% |
| Context | 130K | 1M | +870K |

**Rationale**: {message}
```
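Below is a hypothetical `render_evolution_comment` function that fills in this template. The argument order is an assumption. So are the change rules:

- IF score: absolute difference.
- Quality: rounded relative difference.
- Context: difference in K, treating 1M as 1000K.

Together these reproduce the sample rows (+5, +23%, +870K).

```bash
# Hypothetical sketch: render the comment template above.
# Arguments: agent old_model new_model old_if new_if old_quality new_quality
#            old_ctx new_ctx message   (context values like "130K" or "1M")
render_evolution_comment() {
  local agent=$1 old=$2 new=$3 old_if=$4 new_if=$5 old_q=$6 new_q=$7 \
        old_ctx=$8 new_ctx=$9 message=${10}
  local if_change q_change ctx_change

  if_change=$(awk -v a="$old_if" -v b="$new_if" 'BEGIN { printf "%+d", b - a }')
  q_change=$(awk -v a="$old_q" -v b="$new_q" 'BEGIN {
    if (a == 0) { printf "n/a"; exit }
    printf "%+.0f%%", (b - a) / a * 100 }')
  # awk reads the numeric prefix of "130K"; an "M" suffix is scaled to K.
  ctx_change=$(awk -v a="$old_ctx" -v b="$new_ctx" '
    function k(s) { return (s ~ /M$/) ? s * 1000 : s + 0 }
    BEGIN { printf "%+dK", k(b) - k(a) }')

  cat <<EOF
## 🚀 Agent Evolution: $agent

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Model | $old | $new | ⬆️ |
| IF Score | $old_if | $new_if | $if_change |
| Quality | $old_q | $new_q | $q_change |
| Context | $old_ctx | $new_ctx | $ctx_change |

**Rationale**: $message
EOF
}
```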

Report Action

Generate a comprehensive report:

````markdown
# Agent Evolution Report

## Overview

- Total agents: 28
- Model changes this month: 4
- Average quality improvement: +18%

## Recent Changes

| Date | Agent | Old Model | New Model | Impact |
|------|-------|-----------|-----------|--------|
| 2026-04-05 | capability-analyst | nemotron-3-super | qwen3.6-plus | +23% |
| 2026-04-05 | requirement-refiner | nemotron-3-super | glm-5 | +33% |
| ... | ... | ... | ... | ... |

## Performance Metrics

### Agent Scores Over Time

```
capability-analyst:  64 → 79 (+23%)
requirement-refiner: 60 → 80 (+33%)
agent-architect:     67 → 82 (+22%)
evaluator:           78 → 81 (+4%)
```

### Model Distribution

- qwen3.6-plus: 5 agents
- nemotron-3-super: 8 agents
- glm-5: 3 agents
- minimax-m2.5: 1 agent
- ...

## Recommendations

1. Consider updating history-miner to nemotron-3-super-120b
2. code-skeptic is already optimal with minimax-m2.5
3. ...
````
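The Model Distribution section could be computed from the agent files. Below is a hypothetical `model_distribution` sketch that assumes the same `model:` front-matter line used in the Log action. It also assumes that provider prefixes (`qwen/`) and tag suffixes (`:free`) should be dropped so variants of one model are counted together. Both assumptions are about how the report groups models.

```bash
# Hypothetical sketch: tally the "Model Distribution" section from the
# "model:" front-matter lines in .kilo/agents/*.md. Stripping the provider
# prefix (e.g. "qwen/") and any ":tag" suffix (e.g. ":free") so variants
# are counted together is an assumption about how the report groups models.
model_distribution() {
  local dir=${1:-.kilo/agents}
  awk 'FNR == 1 { seen = 0 }
       !seen && /^model:/ {
         m = $2
         gsub(/"/, "", m)
         sub(/^.*\//, "", m)     # drop provider prefix
         sub(/:.*$/, "", m)      # drop tag such as :free
         count[m]++
         seen = 1                # count only the first model line per file
       }
       END {
         for (m in count)
           printf "- %s: %d agent%s\n", m, count[m], (count[m] == 1 ? "" : "s")
       }' "$dir"/*.md | sort -t: -k2,2nr -k1,1
}
```

The output is sorted by count, largest first, so it can be pasted directly under the Model Distribution heading.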

Step 3: Update Files

After logging:

- Update agent-evolution/data/agent-versions.json
- Post a comment to the related Gitea issue
- Update the metrics in capability-index.yaml
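Posting the comment could go through Gitea's REST API. It exposes `POST /api/v1/repos/{owner}/{repo}/issues/{index}/comments` and takes a JSON body of the form `{"body": "..."}`. The sketch below assumes token authentication. It uses placeholder environment variable names that the workflow does not define: `GITEA_URL`, `GITEA_REPO`, `GITEA_TOKEN`, and `DRY_RUN`.

```bash
# Hypothetical sketch: post a comment through Gitea's REST API endpoint
# POST /api/v1/repos/{owner}/{repo}/issues/{index}/comments.
# GITEA_URL, GITEA_REPO ("owner/repo"), GITEA_TOKEN and DRY_RUN are
# placeholder names chosen for this sketch, not settings of the workflow.
post_gitea_comment() {
  local issue=$1 body=$2
  local url="${GITEA_URL:?set GITEA_URL}/api/v1/repos/${GITEA_REPO:?set GITEA_REPO}/issues/${issue}/comments"

  # Minimal JSON string escaping: backslashes first, then quotes and common
  # control characters. Other control characters are not handled.
  local esc=$body
  esc=${esc//\\/\\\\}
  esc=${esc//\"/\\\"}
  esc=${esc//$'\n'/\\n}
  esc=${esc//$'\r'/\\r}
  esc=${esc//$'\t'/\\t}
  local payload="{\"body\": \"$esc\"}"

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    # Print the request instead of sending it.
    printf 'POST %s\n%s\n' "$url" "$payload"
    return 0
  fi

  curl -fsS -X POST "$url" \
    -H "Authorization: token ${GITEA_TOKEN:?set GITEA_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$payload"
}
```

With `DRY_RUN=1` the function only prints the request. This is a convenient way to check the rendered comment before posting it to, for example, Issue #27 from the example session.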

Data Storage

agent-versions.json

```json
{
  "version": "1.0",
  "agents": {
    "capability-analyst": {
      "current": {
        "model": "qwen/qwen3.6-plus:free",
        "provider": "openrouter",
        "if_score": 90,
        "quality_score": 79,
        "context_window": "1M"
      },
      "history": [
        {
          "date": "2026-04-05T22:20:00Z",
          "type": "model_change",
          "from": "ollama-cloud/nemotron-3-super",
          "to": "qwen/qwen3.6-plus:free",
          "rationale": "Better IF score, FREE via OpenRouter"
        }
      ]
    }
  }
}
```
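Below is a hypothetical `record_model_change` function that applies Step 3's JSON update with jq, an assumed dependency, following the schema above. It appends a history entry whose `from` is the old `current.model`, then rewrites `current.model` and `current.provider`. The remaining `current` fields are left as they were.

```bash
# Hypothetical sketch (requires jq): record a model change in
# agent-versions.json following the schema above. Only "model" and
# "provider" under "current" are rewritten; if_score, quality_score and
# context_window keep their old values and must be updated separately.
# EVOLUTION_TS is an optional override so the date can be pinned in tests.
record_model_change() {
  local file=$1 agent=$2 to=$3 provider=$4 rationale=$5
  local ts=${EVOLUTION_TS:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  local tmp
  tmp=$(mktemp) || return 1

  # Write to a temp file first so a jq error cannot truncate the original.
  if jq --arg a "$agent" --arg to "$to" --arg provider "$provider" \
        --arg rationale "$rationale" --arg ts "$ts" '
      .agents[$a] //= {current: {}, history: []}
      | .agents[$a].current.model as $from
      | .agents[$a].history += [{
          date: $ts, type: "model_change",
          from: $from, to: $to, rationale: $rationale
        }]
      | .agents[$a].current.model = $to
      | .agents[$a].current.provider = $provider
    ' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```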

Gitea Issue Comments

Each evolution log posts a formatted comment:

```markdown
## 🚀 Agent Evolution Log

### {agent}
- **Model**: {old} → {new}
- **Quality**: {old_score} → {new_score} ({change}%)
- **Context**: {old_ctx} → {new_ctx}
- **Rationale**: {reason}

_This change was tracked by the /evolution workflow._
```

Integration Points

- After /pipeline: evaluator scores are logged
- After a model update: the evolution is logged
- Weekly: a performance report is generated
- On request: recommendations are provided

Metrics Tracked

| Metric | Source | Purpose |
|--------|--------|---------|
| IF Score | KILO_SPEC.md | Instruction following |
| Quality Score | Research | Overall performance |
| Context Window | Model spec | Maximum context length (tokens) |
| Provider | Config | API endpoint |
| Cost | Pricing | Resource planning |
| SWE-bench | Research | Coding benchmark |
| RULER | Research | Long-context benchmark |

Example Session

```
$ /evolution log capability-analyst "Updated to qwen3.6-plus for FREE tier and better IF"

✅ Logged evolution for capability-analyst
📊 Quality improvement: +23%
📄 Posted comment to Issue #27
📝 Updated agent-versions.json
```

Evolution workflow v1.0 - Track agent improvements
