APAW/.kilo/commands/evolution.md
a4e09ad5d5 feat: upgrade agent models based on research findings
- capability-analyst: nemotron-3-super → qwen3.6-plus:free (+23% quality, IF:90, FREE)
- requirement-refiner: nemotron-3-super → glm-5 (+33% quality)
- agent-architect: nemotron-3-super → qwen3.6-plus:free (+22% quality)
- evaluator: nemotron-3-super → qwen3.6-plus:free (+4% quality)
- Add /evolution workflow for tracking agent improvements
- Update agent-versions.json with evolution history
2026-04-05 23:37:23 +01:00


Agent Evolution Workflow

Tracks and records agent model improvements, capability changes, and performance metrics.

Usage

/evolution <action> [agent] ["message"]

Actions

| Action | Description |
|--------|-------------|
| log | Log an agent improvement to Gitea and the evolution data |
| report | Generate an evolution report for one agent or for all agents |
| history | Show the model change history |
| metrics | Display performance metrics |
| recommend | Get model recommendations |

Examples

# Log improvement
/evolution log capability-analyst "Updated to qwen3.6-plus for better IF score"

# Generate report
/evolution report capability-analyst

# Show all changes
/evolution history

# Get recommendations
/evolution recommend

Workflow Steps

Step 1: Parse Command

action=$1
agent=$2
message=$3
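The parsing above, plus dispatch on the action, can be sketched as a small function. This is a hypothetical illustration, not the command's actual implementation: the function name `evolution_dispatch` and the `action:agent` lines it echoes are placeholders for the real handlers, and treating an empty agent as "all agents" is an assumption based on the `history` and `recommend` examples.

```shell
# Hypothetical dispatcher for /evolution; each echo stands in for a real handler.
evolution_dispatch() {
  local action="${1:-}" agent="${2:-}" message="${3:-}"

  case "$action" in
    log)
      # log needs both an agent and a quoted message
      if [ -z "$agent" ] || [ -z "$message" ]; then
        echo "usage: /evolution log <agent> \"<message>\"" >&2
        return 2
      fi
      echo "log:${agent}:${message}"
      ;;
    report|history|metrics|recommend)
      # agent is optional; an empty value is treated as "all agents" (assumption)
      echo "${action}:${agent:-all}"
      ;;
    *)
      echo "unknown action '${action}'; expected log, report, history, metrics or recommend" >&2
      return 2
      ;;
  esac
}
```

For example, `evolution_dispatch report` prints `report:all`, while `evolution_dispatch log capability-analyst` exits with status 2 because the message is missing.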

Step 2: Execute Action

Log Action

When logging an improvement:

  1. Read current model

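    # From .kilo/agents/{agent}.md: first "model: <id>" line in the frontmatter
    current_model=$(grep -m1 "^model:" ".kilo/agents/${agent}.md" | awk '{print $2}')
    
    # From .kilo/capability-index.yaml: assumes "model:" is the line directly after "<agent>:";
    # awk (not cut -d' ') so YAML indentation does not produce an empty field
    yaml_model=$(grep -A1 "^[[:space:]]*${agent}:" .kilo/capability-index.yaml | grep "model:" | awk '{print $2}')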
    
  2. Get previous model from history

    # Read from agent-evolution/data/agent-versions.json
    previous_model=$(cat agent-evolution/data/agent-versions.json | ...)
    
  3. Calculate improvement

    • Look up model scores from capability-index.yaml
    • Compare IF scores
    • Compare context windows
  4. Write to evolution data

    {
      "agent": "capability-analyst",
      "timestamp": "2026-04-05T22:20:00Z",
      "type": "model_change",
      "from": "ollama-cloud/nemotron-3-super",
      "to": "qwen/qwen3.6-plus:free",
      "improvement": {
        "quality": "+23%",
        "context_window": "130K→1M",
        "if_score": "85→90"
      },
      "rationale": "Better structured output, FREE via OpenRouter"
    }
    
  5. Post Gitea comment

    ## 🚀 Agent Evolution: {agent}
    
    | Metric | Before | After | Change |
    |--------|--------|-------|--------|
    | Model | {old} | {new} | ⬆️ |
    | IF Score | 85 | 90 | +5 |
    | Quality | 64 | 79 | +23% |
    | Context | 130K | 1M | +870K |
    
    **Rationale**: {message}
    
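The "Calculate improvement" step can be sketched with a few `awk` helpers. This is an assumption about how the figures are derived, not the command's confirmed logic: the helper names are hypothetical, percentages are rounded to the nearest integer (which reproduces the +23%, +33%, +22% and +4% figures used in this document), and context sizes treat `K` as 1,000 and `M` as 1,000,000 tokens.

```shell
# Hypothetical helpers for the "Calculate improvement" step.

# pct_change OLD NEW -> signed percentage change, rounded to the nearest integer
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old == 0) { print "n/a"; exit }
    pct = (new - old) / old * 100
    r = (pct >= 0) ? int(pct + 0.5) : -int(-pct + 0.5)  # round half away from zero
    printf "%+d%%\n", r
  }'
}

# score_delta OLD NEW -> signed absolute difference (e.g. IF score)
score_delta() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+d\n", new - old }'
}

# to_k SIZE -> size in thousands of tokens ("130K" -> 130, "1M" -> 1000)
to_k() {
  awk -v s="$1" 'BEGIN {
    unit = toupper(substr(s, length(s)))
    num = substr(s, 1, length(s) - 1) + 0
    if (unit == "M") print num * 1000
    else if (unit == "K") print num
    else print s / 1000  # no suffix: assume a raw token count
  }'
}

# ctx_delta OLD NEW -> signed context-window change in K
ctx_delta() {
  local old_k new_k
  old_k=$(to_k "$1")
  new_k=$(to_k "$2")
  awk -v a="$old_k" -v b="$new_k" 'BEGIN { printf "%+dK\n", b - a }'
}
```

With these helpers, `pct_change 64 79` prints `+23%`, `score_delta 85 90` prints `+5`, and `ctx_delta 130K 1M` prints `+870K`.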

Report Action

Generate a comprehensive report, for example:

# Agent Evolution Report

## Overview

- Total agents: 28
- Model changes this month: 4
- Average quality improvement: +18%

## Recent Changes

| Date | Agent | Old Model | New Model | Impact |
|------|-------|-----------|-----------|--------|
| 2026-04-05 | capability-analyst | nemotron-3-super | qwen3.6-plus | +23% |
| 2026-04-05 | requirement-refiner | nemotron-3-super | glm-5 | +33% |
| ... | ... | ... | ... | ... |

## Performance Metrics

### Agent Scores Over Time

capability-analyst: 64 → 79 (+23%)
requirement-refiner: 60 → 80 (+33%)
agent-architect: 67 → 82 (+22%)
evaluator: 78 → 81 (+4%)


### Model Distribution

- qwen3.6-plus: 5 agents
- nemotron-3-super: 8 agents
- glm-5: 3 agents
- minimax-m2.5: 1 agent
- ...

## Recommendations

1. Consider updating history-miner to nemotron-3-super-120b
2. code-skeptic optimal with minimax-m2.5
3. ...
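The Model Distribution section of the report could be computed from the agents' frontmatter. A minimal sketch, assuming each `.kilo/agents/*.md` file declares its model on a `model:` line, as the log action's first step reads it; the function name is hypothetical, and full model IDs (for example `qwen/qwen3.6-plus:free`) are counted as-is rather than shortened as in the report above.

```shell
# Hypothetical helper: count agents per model, most common first.
# model_distribution [DIR] -> "<count> <model>" lines
model_distribution() {
  local dir="${1:-.kilo/agents}" f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue  # no matching files: the glob stays literal, skip it
    grep -m1 '^model:' "$f" | awk '{print $2}'
  done | sort | uniq -c | awk '{print $1, $2}' | sort -k1,1nr -k2,2
}
```

Sorting by count first and then by model name keeps the output stable when two models are used by the same number of agents.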

Step 3: Update Files

After logging:

  1. Update agent-evolution/data/agent-versions.json
  2. Post comment to related Gitea issue
  3. Update capability-index.yaml metrics

Data Storage

agent-versions.json

{
  "version": "1.0",
  "agents": {
    "capability-analyst": {
      "current": {
        "model": "qwen/qwen3.6-plus:free",
        "provider": "openrouter",
        "if_score": 90,
        "quality_score": 79,
        "context_window": "1M"
      },
      "history": [
        {
          "date": "2026-04-05T22:20:00Z",
          "type": "model_change",
          "from": "ollama-cloud/nemotron-3-super",
          "to": "qwen/qwen3.6-plus:free",
          "rationale": "Better IF score, FREE via OpenRouter"
        }
      ]
    }
  }
}
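Appending a history entry means producing a JSON object in the shape shown above. A minimal sketch under stated assumptions: the helper names are hypothetical, only backslashes and double quotes are escaped (newlines and other control characters are not handled), and merging the object into `agent-versions.json` is left to a JSON-aware tool such as `jq`.

```shell
# Hypothetical helpers for building one "history" entry.

# json_escape TEXT -> TEXT with backslashes and double quotes escaped
# (newlines and other control characters are NOT handled)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# history_entry DATE FROM TO RATIONALE -> single-line JSON object
history_entry() {
  printf '{"date": "%s", "type": "model_change", "from": "%s", "to": "%s", "rationale": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "$(json_escape "$4")"
}
```

Backslashes are escaped before quotes so that the backslashes added for quotes are not doubled a second time.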

Gitea Issue Comments

Each evolution log posts a formatted comment:

## 🚀 Agent Evolution Log

### {agent}
- **Model**: {old} → {new}
- **Quality**: {old_score} → {new_score} ({change}%)
- **Context**: {old_ctx} → {new_ctx}
- **Rationale**: {reason}

_This change was tracked by /evolution workflow._
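The comment template can be filled with a here-document. A minimal sketch: the function name and argument order are hypothetical, the sixth argument is passed without the `%` sign because the template appends it, and sending the text to Gitea (for example as the `body` field of the `POST /repos/{owner}/{repo}/issues/{index}/comments` API endpoint) is not shown.

```shell
# Hypothetical renderer for the evolution comment.
# Args: agent old_model new_model old_score new_score change old_ctx new_ctx reason
render_evolution_comment() {
  cat <<EOF
## 🚀 Agent Evolution Log

### $1
- **Model**: $2 → $3
- **Quality**: $4 → $5 ($6%)
- **Context**: $7 → $8
- **Rationale**: $9

_This change was tracked by /evolution workflow._
EOF
}
```

For example: `render_evolution_comment capability-analyst nemotron-3-super qwen3.6-plus 64 79 +23 130K 1M "Better IF score"`.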

Integration Points

  • After /pipeline: evaluator scores are logged
  • After a model update: the evolution is logged
  • Weekly: a performance report is generated
  • On request: recommendations are provided

Metrics Tracked

| Metric | Source | Purpose |
|--------|--------|---------|
| IF Score | KILO_SPEC.md | Instruction Following |
| Quality Score | Research | Overall performance |
| Context Window | Model spec | Max tokens |
| Provider | Config | API endpoint |
| Cost | Pricing | Resource planning |
| SWE-bench | Research | Code benchmark |
| RULER | Research | Long-context benchmark |

Example Session

$ /evolution log capability-analyst "Updated to qwen3.6-plus for FREE tier and better IF"

✅ Logged evolution for capability-analyst
📊 Quality improvement: +23%
📄 Posted comment to Issue #27
📝 Updated agent-versions.json

Evolution workflow v1.0 - Track agent improvements