APAW/agent-evolution/docs/model-evolution-guard.md
9e48a4960e fix: restore optimal v3 models + add fitness gate protection
- Restore all 30 agents to v3.html heatmap optimal models:
  * frontend-developer: qwen3-coder -> minimax-m2.5 (92★)
  * devops-engineer: nemotron-3-super -> kimi-k2.6:cloud (88★)
  * browser-automation: qwen3-coder -> kimi-k2.6:cloud (86★)
  * agent-architect: glm-5.1 -> kimi-k2.6:cloud (86★)
- Add Model Evolution Guard system:
  * agent-evolution/scripts/lib/fitness-gate.cjs
  * Rejects downgrades >3 points or below score 75
  * Produces detailed diff report before any file modifications
  * Normalized model ID lookup (v3.html ':' vs JSON '-')
- Update sync-benchmarks-from-yaml.cjs with fitness gate
- Update model-benchmarks.json with v3 optimal assignments
- Rebuild research-dashboard.html (104KB, 30 agents, 11 models)
- Add model-evolution-guard.md architecture documentation
- Add v3-optimal-models.json as source-of-truth reference

Fixes regression introduced by commit 3badb25 where models were
silently downgraded from heatmap optimal to inferior assignments.
2026-04-29 23:19:16 +01:00


Model Evolution Guard System

Problem Statement

During the bidirectional sync integration (sync-benchmarks-from-yaml.cjs), the script copied model assignments from capability-index.yaml (which contained suboptimal ones) into model-benchmarks.json and marked them as "current". This silently downgraded several agents from their ★-optimal heatmap models:

Agent             Optimal (v3 heatmap)     Downgraded To      Score Loss
lead-developer    qwen3-coder:480b (92★)   nemotron-3-super   -22
system-analyst    glm-5.1 (90★)            nemotron-3-super   -16
evaluator         glm-5.1                  nemotron-3-super   -16
devops-engineer   kimi-k2.6 (88★)          nemotron-3-super   -10

Root Causes

  1. No single source of truth: capability-index.yaml, kilo-meta.json, agent .md files, and model-benchmarks.json could each claim to be canonical.
  2. No downgrade protection: sync-benchmarks-from-yaml.cjs blindly overwrote scores without checking whether the new model scored lower than the old one.
  3. No fitness gate: changes propagated to all downstream files (dashboard, configs) before any validation.
  4. Bidirectional sync ambiguity: the sync was "YAML → JSON" but looked like "JSON ← YAML", creating confusion about direction.

Architectural Solution: Model Evolution Guard (MEG)

Layer 0: Single Source of Truth

PRIMARY: agent-evolution/data/model-benchmarks.json
  └── source: heatmap_scores from agent_model_scores[]
  └── validated_by: fitness gate (see below)

SECONDARY (derived, read-only for sync):
  ├── .kilo/capability-index.yaml ← receives models FROM benchmarks
  ├── .kilo/agents/*.md ← receive models FROM benchmarks via sync-agents.js
  ├── kilo-meta.json ← receives models FROM benchmarks
  └── kilo.jsonc ← receives models FROM benchmarks

Rule: model-benchmarks.json is the ONLY file that contains heatmap-derived scores. All other configs receive models FROM it, never the reverse.
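Layer 0 can be read as a one-way function from the benchmarks file to the values that every derived config receives. What follows is a minimal sketch under assumptions. The file shape is simplified to a top-level source field (which the verification checklist expects to be locked to "heatmap") and an agent_model_scores array of { agent, model, score } rows. Each agent's highest-scoring model is treated as its ★-optimal assignment. The row layout and the function name deriveConfigModels are hypothetical, not the project's actual schema or API.

```typescript
// Hypothetical, simplified shape of agent-evolution/data/model-benchmarks.json.
interface AgentModelScore {
  agent: string;
  model: string;
  score: number; // heatmap score for this agent/model pair
}

interface ModelBenchmarks {
  source: string; // expected to be locked to "heatmap"
  agent_model_scores: AgentModelScore[];
}

// Derives the agent -> model map that the secondary configs receive.
// Benchmarks are only read here; nothing is ever written back to them.
function deriveConfigModels(benchmarks: Readonly<ModelBenchmarks>): Record<string, string> {
  if (benchmarks.source !== "heatmap") {
    throw new Error(`Refusing to sync: source is "${benchmarks.source}", expected "heatmap"`);
  }
  // Keep the highest-scoring (★-optimal) row per agent.
  const best: Record<string, AgentModelScore> = {};
  for (const row of benchmarks.agent_model_scores) {
    const current = best[row.agent];
    if (current === undefined || row.score > current.score) {
      best[row.agent] = row;
    }
  }
  const assignments: Record<string, string> = {};
  for (const agent of Object.keys(best)) {
    assignments[agent] = best[agent].model;
  }
  return assignments;
}

const example: ModelBenchmarks = {
  source: "heatmap",
  agent_model_scores: [
    { agent: "lead-developer", model: "qwen3-coder:480b", score: 92 },
    { agent: "lead-developer", model: "nemotron-3-super", score: 70 },
    { agent: "system-analyst", model: "glm-5.1", score: 90 },
    { agent: "system-analyst", model: "nemotron-3-super", score: 74 },
  ],
};

// Maps lead-developer to qwen3-coder:480b and system-analyst to glm-5.1.
console.log(deriveConfigModels(example));
```

Because the function only reads benchmarks and returns a fresh object, a sync built on it has no code path that could write YAML-derived models back into the primary file.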

Layer 1: Fitness Gate (Mandatory)

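Every model change must pass the fitness gate. The gate takes the inputs below. A change is "acceptable" only if the proposed score is at least min_global_threshold and falls no more than max_regression points below the previous score: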

interface ModelFitnessGate {
  // Agent's current score with existing model
  previous_score: number;
  
  // Agent's score with proposed model  
  proposed_score: number;
  
  // Absolute minimum score for any agent
  min_global_threshold: number;  // e.g. 75
  
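  // Maximum regression allowed, as a positive number of points
  max_regression: number;  // e.g. 3 (a drop of more than 3 points is rejected)
  
  // Is proposed model in agent's top-N from heatmap?
  // Note: isChangeAcceptable below does not check this field.
  top_n_required: number;  // e.g. top-3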
}

function isChangeAcceptable(gate: ModelFitnessGate): boolean {
  if (gate.proposed_score < gate.min_global_threshold) return false;
  if (gate.proposed_score < gate.previous_score - gate.max_regression) return false;
  return true;
}

Hard rule: If proposed_score < previous_score - 3, the change MUST be rejected with a clear error. No exceptions.
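To make the thresholds concrete, here is a self-contained restatement of isChangeAcceptable that also returns the reason for a rejection. It uses min_global_threshold = 75 and max_regression = 3 (a positive number, since isChangeAcceptable subtracts it from previous_score). Each proposed score is the optimal score from the problem table minus the listed score loss. The evaluator row is omitted because the table gives no score for it. The fourth row is a hypothetical 2-point drop added for contrast. The names rejectionReason and GateInput are illustrative only.

```typescript
// Self-contained restatement of isChangeAcceptable that also reports why.
interface GateInput {
  agent: string;
  previous_score: number;
  proposed_score: number;
}

const MIN_GLOBAL_THRESHOLD = 75;
const MAX_REGRESSION = 3; // allowed drop in points (a positive number)

// Returns null when the change is acceptable, otherwise the first failed rule.
function rejectionReason(g: GateInput): string | null {
  if (g.proposed_score < MIN_GLOBAL_THRESHOLD) {
    return `score ${g.proposed_score} below threshold ${MIN_GLOBAL_THRESHOLD}`;
  }
  if (g.proposed_score < g.previous_score - MAX_REGRESSION) {
    return `regression ${g.previous_score - g.proposed_score} > max ${MAX_REGRESSION}`;
  }
  return null;
}

// Proposed score = optimal score minus the score loss from the problem table.
const cases: GateInput[] = [
  { agent: "lead-developer", previous_score: 92, proposed_score: 70 },
  { agent: "system-analyst", previous_score: 90, proposed_score: 74 },
  { agent: "devops-engineer", previous_score: 88, proposed_score: 78 },
  // Hypothetical 2-point drop, not taken from the table.
  { agent: "hypothetical-agent", previous_score: 88, proposed_score: 86 },
];

for (const c of cases) {
  const reason = rejectionReason(c);
  console.log(`${c.agent}: ${reason === null ? "accepted" : `REJECTED (${reason})`}`);
}
```

With these inputs, lead-developer (70) and system-analyst (74) fail the 75-point floor. devops-engineer clears the floor at 78, so only the regression check catches its 10-point drop. This is why both checks are needed. The 2-point drop is accepted.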

Layer 2: Immutable Recommendations

Recommendations in model-benchmarks.json are append-only. Once a recommendation is generated, it cannot be silently overwritten by a sync — it can only be superseded by a NEW recommendation with a higher timestamp.

{
  "recommendations": [
    {
      "agent": "lead-developer",
      "from_model": "qwen3-coder:480b",
      "to_model": "nemotron-3-super",
      "score_delta": -22,
      "status": "rejected",
      "rejected_at": "2026-04-29T20:00:00Z",
      "rejected_reason": "Downgrade: 92→70 exceeds max regression of 3"
    }
  ]
}
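A minimal sketch of the append-only rule follows. It assumes that every recommendation carries one ISO-8601 UTC timestamp field, called at here; the JSON above uses status-specific names such as rejected_at. Appending returns a new array and never edits existing entries. The effective recommendation for an agent is the one with the latest timestamp. The Recommendation type, its status values, and both function names are hypothetical.

```typescript
// Hypothetical recommendation record; "at" is an ISO-8601 UTC timestamp.
interface Recommendation {
  readonly agent: string;
  readonly from_model: string;
  readonly to_model: string;
  readonly score_delta: number;
  readonly status: "proposed" | "accepted" | "rejected";
  readonly at: string;
}

// Append-only: returns a new log and never edits or drops existing entries.
function appendRecommendation(
  log: ReadonlyArray<Recommendation>,
  rec: Recommendation
): ReadonlyArray<Recommendation> {
  return log.concat([rec]);
}

// The effective entry for an agent is the one with the latest timestamp.
// ISO-8601 UTC strings in one fixed format sort correctly as plain strings.
function effectiveRecommendation(
  log: ReadonlyArray<Recommendation>,
  agent: string
): Recommendation | undefined {
  let latest: Recommendation | undefined;
  for (const rec of log) {
    if (rec.agent === agent && (latest === undefined || rec.at > latest.at)) {
      latest = rec;
    }
  }
  return latest;
}

const rejected: Recommendation = {
  agent: "lead-developer",
  from_model: "qwen3-coder:480b",
  to_model: "nemotron-3-super",
  score_delta: -22,
  status: "rejected",
  at: "2026-04-29T20:00:00Z",
};
// Illustrative later entry (made-up data): it supersedes the first by timestamp.
const later: Recommendation = {
  agent: "lead-developer",
  from_model: "qwen3-coder:480b",
  to_model: "qwen3-coder:480b",
  score_delta: 0,
  status: "accepted",
  at: "2026-04-30T09:00:00Z",
};

const log = appendRecommendation(appendRecommendation([], rejected), later);
console.log(log.length, effectiveRecommendation(log, "lead-developer")?.status); // 2 accepted
```

The rejected entry stays in the log unchanged. Superseding it means adding a newer entry, so the history of why a downgrade was refused is never lost.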

Layer 3: Sync Direction Lock

All sync scripts must declare their direction explicitly:

// ✅ CORRECT: benchmarks → configs
// src: model-benchmarks.json
// dst: capability-index.yaml, agents/*.md, kilo-meta.json
// validates: fitness gate

// ❌ INCORRECT: configs → benchmarks
// This should NEVER happen. Benchmarks come from heatmap analytics only.
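One way to enforce the direction lock is to have each sync script declare its source and destinations as data, and to refuse to run unless the source is model-benchmarks.json and no destination is. This is a sketch only. The SyncDeclaration type, assertDirection, and the two example script names are hypothetical. The file paths come from the Layer 0 diagram.

```typescript
// Hypothetical declaration that each sync script would carry as data.
interface SyncDeclaration {
  name: string;
  src: string;
  dst: string[];
  validates: "fitness-gate";
}

const PRIMARY = "agent-evolution/data/model-benchmarks.json";

// Throws unless the sync runs strictly benchmarks -> configs.
function assertDirection(decl: SyncDeclaration): void {
  if (decl.src !== PRIMARY) {
    throw new Error(`${decl.name}: src must be ${PRIMARY}, got ${decl.src}`);
  }
  if (decl.dst.indexOf(PRIMARY) !== -1) {
    throw new Error(`${decl.name}: ${PRIMARY} must never be a destination`);
  }
}

const ok: SyncDeclaration = {
  name: "example-forward-sync",
  src: PRIMARY,
  dst: [".kilo/capability-index.yaml", ".kilo/agents/*.md", "kilo-meta.json"],
  validates: "fitness-gate",
};
assertDirection(ok); // passes silently

const reversed: SyncDeclaration = {
  name: "example-reverse-sync",
  src: ".kilo/capability-index.yaml",
  dst: [PRIMARY],
  validates: "fitness-gate",
};
try {
  assertDirection(reversed);
} catch (err) {
  console.log((err as Error).message); // configs -> benchmarks is refused
}
```

Keeping the declaration as data, rather than only as a comment, lets a CI step or pre-commit hook check every sync script the same way.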

Layer 4: Diff Report on Every Sync

Before writing any file, the sync script must produce:

=== Model Sync Diff Report ===
Agent              Old Model              Old Score  New Model              New Score  Status
lead-developer     qwen3-coder:480b       92★        nemotron-3-super       70         ⚠️ REJECTED (regression 22 > max 3)
system-analyst     glm-5.1                90★        nemotron-3-super       74         ⚠️ REJECTED (regression 16 > max 3)

No files are modified until the diff has been reviewed, unless --auto-approve is passed. Auto-approval applies to improvements only.
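A sketch of the report step follows. It formats one row per proposed change and lists every gate rule that the row violates. It allows writes only when nothing is rejected and either a reviewer approved the run or, in the --auto-approve case, every change strictly improves the score. The DiffRow type, the column widths, and the helpers pad, reasons, formatReport and canWrite are assumptions made for illustration. "Improvement" is read here as a positive score delta.

```typescript
// Hypothetical row describing one change the sync would make.
interface DiffRow {
  agent: string;
  oldModel: string;
  oldScore: number;
  newModel: string;
  newScore: number;
}

const MIN_SCORE = 75; // global floor
const MAX_DROP = 3;   // allowed regression, in points

// Left-aligns text in a fixed-width column, always leaving a space after it.
function pad(text: string, width: number): string {
  let out = text;
  while (out.length < width) out += " ";
  return out + " ";
}

// Every gate rule the row violates (empty means the change passes).
function reasons(r: DiffRow): string[] {
  const out: string[] = [];
  const drop = r.oldScore - r.newScore;
  if (drop > MAX_DROP) out.push(`regression ${drop} > max ${MAX_DROP}`);
  if (r.newScore < MIN_SCORE) out.push(`score ${r.newScore} < min ${MIN_SCORE}`);
  return out;
}

function formatRow(cells: string[], status: string): string {
  const widths = [18, 22, 5, 22, 5];
  return cells.map((c, i) => pad(c, widths[i])).join("") + status;
}

function formatReport(rows: DiffRow[]): string {
  const lines = ["=== Model Sync Diff Report ===", formatRow(["Agent", "Old Model", "Old", "New Model", "New"], "Status")];
  for (const r of rows) {
    const why = reasons(r);
    const status = why.length === 0 ? "OK" : `REJECTED (${why.join("; ")})`;
    lines.push(formatRow([r.agent, r.oldModel, String(r.oldScore), r.newModel, String(r.newScore)], status));
  }
  return lines.join("\n");
}

// Writes are allowed only with no rejections, and then only after review,
// or with auto-approve when every change strictly improves the score.
function canWrite(rows: DiffRow[], reviewed: boolean, autoApprove: boolean): boolean {
  if (rows.some((r) => reasons(r).length > 0)) return false;
  if (reviewed) return true;
  return autoApprove && rows.every((r) => r.newScore > r.oldScore);
}

const rows: DiffRow[] = [
  { agent: "lead-developer", oldModel: "qwen3-coder:480b", oldScore: 92, newModel: "nemotron-3-super", newScore: 70 },
  { agent: "system-analyst", oldModel: "glm-5.1", oldScore: 90, newModel: "nemotron-3-super", newScore: 74 },
];
console.log(formatReport(rows));
console.log(canWrite(rows, false, true)); // false: both rows are rejected
```

Listing every violated rule, not only the first, shows a reviewer when a change fails both the regression limit and the 75-point floor.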

Layer 5: Recovery Checkpoint

Before any sync that touches model assignments, create a git checkpoint:

# In the sync script: capture the timestamp once so the stash and branch names match
TS=$(date +%s)
git stash push -m "pre-model-sync-$TS"
git checkout -b "auto/model-sync-$TS"

If the fitness gate rejects the changes, roll back automatically:

git checkout HEAD -- kilo-meta.json .kilo/capability-index.yaml .kilo/agents/
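If the sync script is written in TypeScript, the checkpoint and rollback commands above can be built as argument lists. Each list could then be passed to Node's child_process.execFileSync(file, args), which avoids shell quoting issues. The sketch below only builds the lists, so the control flow is easy to test. One timestamp is captured and reused for the stash message and the branch name, and the rollback command is used only when the gate rejects. The function names and the gateRejected stand-in are hypothetical.

```typescript
// Paths restored on rollback, copied from the rollback command above.
const DERIVED_PATHS: string[] = ["kilo-meta.json", ".kilo/capability-index.yaml", ".kilo/agents/"];

// Builds (but does not run) the checkpoint commands. One timestamp is
// captured by the caller and reused, so the stash and branch names match.
function checkpointCommands(epochSeconds: number): string[][] {
  return [
    ["git", "stash", "push", "-m", `pre-model-sync-${epochSeconds}`],
    ["git", "checkout", "-b", `auto/model-sync-${epochSeconds}`],
  ];
}

// Builds the rollback command used when the fitness gate rejects changes.
function rollbackCommand(): string[] {
  return ["git", "checkout", "HEAD", "--"].concat(DERIVED_PATHS);
}

const now = Math.floor(Date.now() / 1000); // seconds since epoch, like `date +%s`
for (const cmd of checkpointCommands(now)) {
  console.log(cmd.join(" "));
}
const gateRejected = true; // stand-in for the real fitness gate result
if (gateRejected) {
  console.log(rollbackCommand().join(" "));
}
```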

Implementation

1. Fitness Gate Module

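// agent-evolution/scripts/lib/fitness-gate.ts
// Minimal types assumed for this sketch; the real model-benchmarks.json schema may differ.
interface ModelBenchmarks {
  // agent -> model -> heatmap score
  scores: Record<string, Record<string, number>>;
}

type GateResult =
  | { acceptable: true; delta: number }
  | { acceptable: false; reason: string };

export class ModelFitnessGate {
  constructor(
    private benchmarks: ModelBenchmarks,
    private minThreshold = 75,
    private maxRegression = 3 // allowed drop, in points
  ) {}

  validateChange(agent: string, fromModel: string, toModel: string): GateResult {
    const oldScore = this.getAgentModelScore(agent, fromModel);
    const newScore = this.getAgentModelScore(agent, toModel);

    if (newScore < this.minThreshold) {
      return { acceptable: false, reason: `Score ${newScore} below threshold ${this.minThreshold}` };
    }

    if (newScore < oldScore - this.maxRegression) {
      return { acceptable: false, reason: `Regression ${oldScore}→${newScore} exceeds max ${this.maxRegression}` };
    }

    return { acceptable: true, delta: newScore - oldScore };
  }

  // Fails loudly when a score is missing, instead of treating it as 0 or letting it pass.
  private getAgentModelScore(agent: string, model: string): number {
    const score = this.benchmarks.scores[agent]?.[model];
    if (score === undefined) {
      throw new Error(`No heatmap score for agent "${agent}" with model "${model}"`);
    }
    return score;
  }
}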
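The commit message above lists a "normalized model ID lookup" because v3.html writes model IDs with ':' where the JSON uses '-'. A score lookup such as getAgentModelScore therefore has to treat both spellings as the same model. The exact normalization rule is not documented here. This sketch assumes it is trimming, lowercasing, and mapping ':' to '-'. The names normalizeModelId, buildScoreTable and lookupScore are hypothetical.

```typescript
// Assumed rule: trim, lowercase, and map ':' to '-' so both spellings match.
function normalizeModelId(id: string): string {
  return id.trim().toLowerCase().replace(/:/g, "-");
}

// agent -> normalized model ID -> heatmap score
type ScoreTable = Record<string, Record<string, number>>;

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function buildScoreTable(rows: { agent: string; model: string; score: number }[]): ScoreTable {
  const table: ScoreTable = {};
  for (const r of rows) {
    if (!hasOwn(table, r.agent)) table[r.agent] = {};
    table[r.agent][normalizeModelId(r.model)] = r.score;
  }
  return table;
}

// Returns undefined for an unknown agent or model rather than a default
// score, so a caller can treat "no score" as a gate failure.
function lookupScore(table: ScoreTable, agent: string, model: string): number | undefined {
  if (!hasOwn(table, agent)) return undefined;
  const key = normalizeModelId(model);
  return hasOwn(table[agent], key) ? table[agent][key] : undefined;
}

const table = buildScoreTable([
  { agent: "devops-engineer", model: "kimi-k2.6:cloud", score: 88 },
  { agent: "lead-developer", model: "qwen3-coder:480b", score: 92 },
]);
console.log(lookupScore(table, "devops-engineer", "kimi-k2.6-cloud")); // 88
console.log(lookupScore(table, "lead-developer", "qwen3-coder:480b")); // 92
```

Normalizing at both write time and lookup time means a model cannot appear unscored just because one file spells its ID differently. Without that, a missing score could silently become a downgrade.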

2. Sync Wrapper

// agent-evolution/scripts/sync-with-guard.cjs (wraps any sync script)
// Node's require() does not try the .cjs extension, so spell it out.
const { validateAllChanges } = require('./lib/fitness-gate.cjs');

// detectChanges() and applyChanges() stand for the wrapped sync script's own
// "plan" and "write" steps; they are not defined here.
const changes = detectChanges();  // what the sync WOULD do
const report = validateAllChanges(changes);

if (report.rejections.length > 0) {
  console.error('❌ FITNESS GATE BLOCKED:');
  report.rejections.forEach(r => console.error(`  ${r.agent}: ${r.reason}`));
  process.exit(1);
}

console.log(`✅ All ${changes.length} changes passed fitness gate`);
applyChanges(changes);

3. Git Checkpoint

#!/bin/bash
# Every sync script must run this first (the shebang has to stay on line 1)
set -e
STASH_NAME="model-sync-$(date +%s)"
git stash push -m "$STASH_NAME" -- kilo-meta.json .kilo/capability-index.yaml .kilo/agents/

Verification Checklist

After implementing the guard:

  • sync-benchmarks-from-yaml.cjs validates every model change against heatmap scores
  • Downgrades of more than 3 points are rejected with a clear error
  • Diff report is printed before any file is written
  • Git checkpoint is created before sync
  • model-benchmarks.json has a locked source: "heatmap" field
  • All sync scripts declare direction: benchmarks → configs only
  • The fitness gate runs as a pre-commit hook and in the CI pipeline

Integration with Existing Workflow

The guard hooks into step 0 of the existing /evolution command:

## Step 0: Model Research & Guard
1. Run heatmap analysis → produce raw scores
2. **Fitness Gate** validates all proposed changes
3. If any downgrade >3 points → HALT, report to human
4. If all pass → generate recommendations append-only
5. Sync to configs with direction lock: benchmarks → configs

Bottom line: Never again should a script silently replace a ★-optimal model with one scoring 20+ points lower.