Deploy Bot
530c1c1384
fix(commands): replace all non-Ollama providers with ollama-cloud/

Removed: openrouter, openai (unreliable/foreign)
Replaced 6 commands:
  /status → ollama-cloud/qwen3.5-122b
  /ask → ollama-cloud/qwen3.5-122b
  /hotfix → ollama-cloud/deepseek-v4-pro-max
  /review → ollama-cloud/kimi-k2.6
  /code → ollama-cloud/deepseek-v4-pro-max (prev commit)
  /plan → ollama-cloud/deepseek-v4-pro-max (prev commit)
All models now served from ollama-cloud/ exclusively.
2026-05-27 14:02:10 +01:00
Deploy Bot
b075189b83
chore(dashboard): rebuild standalone after model migration

- All 18 recommendations applied → pending: 0
- File size: 246.1 KB
2026-05-27 13:47:33 +01:00
Deploy Bot
7635cb62cd
fix(dashboard): heatmap cell click + 5th tab + model sync fixes

- restore hmModal with 4 legacy tabs + new Performance Graph tab
- fix event.target in research-dashboard.template.html switchTab
- fix showCellDetail event.stopPropagation for modal persistence
- update agent models + sync KILO_SPEC.md and AGENTS.md
2026-05-27 13:46:55 +01:00
Deploy Bot
36455ccf24
feat: apply model recommendations - 18 agents migrated to kimi-k2.6

Sourced from agent-evolution/data/evolution.json
Agents: architect-indexer, backend-developer, browser-automation,
  code-skeptic, evaluator, flutter-developer, frontend-developer,
  history-miner, lead-developer, markdown-validator, php-developer,
  product-owner, prompt-optimizer, python-developer,
  requirement-refiner, sdet-engineer, visual-tester,
  workflow-architect
Also synced 4 agents via sync-agents.cjs
2026-05-27 13:38:49 +01:00
Deploy Bot
95e0866b46
fix(dashboard): remove all event.target dependencies
...
- switchTab(tabId, el): uses el or document.querySelector fallback
- switchHmTab(tabName, btn): uses btn or querySelector fallback
- All 6 tab buttons + 4 heatmap modal tabs pass 'this' as parameter
- Rebuilt index.standalone.html (261.6 KB)
- Verified: grep event.target returns 0 occurrences
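The change above replaces the implicit global `event` with an explicitly passed element. A minimal sketch of that pattern follows; the `data-tab` attribute, the `.tab-btn` class, and the `root` parameter are assumptions made for illustration and are not taken from the dashboard source.

```javascript
// Sketch only: the selector names below are hypothetical.
// Resolve the button to highlight without touching the global `event`.
function resolveTabButton(tabId, el, root) {
  // Preferred path: the handler passed itself, e.g. onclick="switchTab('overview', this)"
  if (el) return el;
  // Fallback for programmatic calls that have no element to pass.
  return root.querySelector('[data-tab="' + tabId + '"]');
}

// `root` defaults to `document` in a browser; passing it in keeps the
// function usable with a stub outside the browser.
function switchTab(tabId, el, root = document) {
  const btn = resolveTabButton(tabId, el, root);
  // Clear the active state on every tab button, then set it on the target.
  root.querySelectorAll('.tab-btn').forEach(function (b) {
    b.classList.remove('active');
  });
  if (btn) btn.classList.add('active');
  return btn;
}
```

Passing `this` from the markup avoids depending on `window.event`, which is not reliably set when the function is called from other code rather than from a click.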
2026-05-26 13:22:40 +01:00
Deploy Bot
c212a0a34e
fix(build): remove broken heatmap string replacement
...
- build-standalone-fixed.cjs: removed renderHeatmap() replacement block
- The replacement used string concatenation with '\'' which broke
single quotes in generated HTML, causing SyntaxError: unexpected token
- Original renderHeatmap() in index.html uses template literals (`...`)
which are safe and already contain showCellDetail onclick handler
- Rebuilt index.standalone.html from fixed source
- Zero console errors, zero JS syntax errors verified on port 3003
2026-05-25 22:31:32 +01:00
Deploy Bot
7f1269a370
fix(dashboard): 3 UI bugs + new DB watch tool
...
1. filterCategory: fix inline event.target → uses btn parameter
- All Agents tab filter buttons now correctly toggle active class
2. exportRecommendations/showApplyModal: read from agentData instead of the removed INLINE_RECOMMENDATIONS
- Apply modal shows real recommendations
- Export generates JSON with real data
3. Heatmap cell click: add showCellDetail modal with Chart.js line chart + prompt history
- onclick='showCellDetail(model, agent)' on every td
- renderCellChart computes score history from agent.history
- prompt_change items filtered and displayed
4. watch-db.cjs: incremental DB sync tool
- Polls git for changes in .kilo/agents/*.md and kilo-meta.json
- Detects model_change vs prompt_change by comparing with previous version
- Exports to JSON after sync, logs to .kilo/logs/watch-db.log
- SIGINT/SIGTERM graceful shutdown
- Trigger: npm run evolution:watch
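The model_change vs prompt_change detection described for watch-db.cjs could look roughly like the sketch below. It assumes each agent file carries a YAML frontmatter block with a `model:` key followed by the prompt body; the function names `parseAgentFile` and `classifyChange` are hypothetical and not taken from the actual script.

```javascript
// Sketch only: split an agent .md file into its frontmatter `model:` value
// and its prompt body. The file layout is an assumption.
function parseAgentFile(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) return { model: null, body: text.trim() };
  const modelLine = /^model:\s*(.+)$/m.exec(match[1]);
  return {
    model: modelLine ? modelLine[1].trim() : null,
    body: match[2].trim(),
  };
}

// Compare the previous and current versions of one agent file and report
// which kinds of change occurred. Both kinds can occur in the same commit.
function classifyChange(previousText, currentText) {
  const prev = parseAgentFile(previousText);
  const curr = parseAgentFile(currentText);
  const changes = [];
  if (prev.model !== curr.model) {
    changes.push({ type: 'model_change', from: prev.model, to: curr.model });
  }
  if (prev.body !== curr.body) {
    changes.push({ type: 'prompt_change' });
  }
  return changes;
}
```

Frontmatter edits other than `model:` are ignored in this sketch; a real implementation would have to decide how to record them.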
2026-05-25 21:50:55 +01:00
Deploy Bot
a0604afaf6
chore: archive generated files and clean up runtime outputs
...
- index.standalone.html → agent-evolution/archive/index.standalone-2026-05-25.html (generated build output)
- tests/visual/dashboard-tabs/current/*.png → tests/visual/archive/dashboard-tabs-current-2026-05-25/ (runtime capture output)
- Cleaned empty tests/visual/dashboard-tabs/current/ directory
2026-05-25 21:23:47 +01:00
Deploy Bot
3cca6814f6
test(dashboard): add visual regression baselines for all 6 tabs
...
- Captured via Playwright in Docker container
- Viewport: desktop 1280x720
- Tabs: overview, all_agents, timeline, recommendations, heatmap, impact
- Zero console errors, zero network errors during capture
2026-05-25 21:16:49 +01:00
Deploy Bot
a37bbee9e0
test(dashboard): add SPA screenshot and console error monitoring scripts
...
- capture-dashboard-tabs.cjs: Playwright script to capture all 6 dashboard tabs
- console-error-dashboard.cjs: Console + network error monitor with tab switching
- both scripts run via docker/docker-compose.web-testing.yml Playwright container
- zero console errors and zero network errors verified across all tabs
2026-05-25 21:15:49 +01:00
Deploy Bot
bac09bee02
feat(dashboard): add SPA tab screenshot capture for visual testing
2026-05-25 21:12:29 +01:00
Deploy Bot
9b0f160587
feat(dashboard): unified data pipeline, verified benchmarks, and browser testing
...
- build-standalone-fixed.cjs: reads from 4 real sources (agents md, kilo-meta.json, model-benchmarks-verified.json, agent-versions.json); computes recommendations dynamically
- build-standalone-direct.cjs: direct data export + HTML embed pipeline
- dashboard-smoke-test.ts: Playwright E2E smoke test covering all 6 tabs
- model-benchmarks-verified.json: verified IF scores from artificialanalysis.ai for 15 models (SWE-bench unverifiable → null)
- agent-versions.json: 347 git history entries extracted for 34 agents
- kilo-meta.json: prompt-optimizer → qwen3.5-122b, memory-manager → deepseek-v4-pro-max
- index.html: Recommendations tab rendering updated for dynamic data
- Dockerfile + docker-compose.yml: mount-driven build, no image rebuild for data changes
- README.md: updated dashboard docs and verified benchmark sources
2026-05-25 21:05:14 +01:00
Deploy Bot
f9bed0f262
fix(dashboard): correct computeAgentScore formula and inline benchmark data
...
- SWE=null no longer zeroes score; weight IF at 0.85 for reasoning-only models
- Inline MODEL_BENCHMARKS const (sync script doesn't populate benchmarks)
- Hash fallback tightened from 50-85 to 55-80
- History-miner now shows +10 improvement (82 vs 72) instead of false regression
2026-05-25 16:31:15 +01:00
Deploy Bot
699456b49e
feat(dashboard): replace raw Canvas with Chart.js for all Impact tab charts
...
- Add Chart.js 4.4.7 via CDN + datalabels plugin
- Agent Score: horizontal bar chart, sorted descending, color-coded
- Model Distribution: doughnut with right-side legend + percentages
- Migration Impact: grouped before/after bars with tooltip showing delta
- Dark theme defaults: #8ba3c0 text, #1e2d45 grid
- Chart instances destroyed before re-render to prevent memory leaks
- Responsive: maintainAspectRatio: false
2026-05-25 15:45:14 +01:00
Deploy Bot
19be5cf229
fix(dashboard): rewrite Impact tab charts to work with actual data structure
...
Replaced broken chart functions that expected non-existent fit_score_after/before
with data-agnostic implementations using model names + benchmark lookup.
- Agent Score Bar Chart: horizontal bars per agent, sorted descending, color-coded
- Model Distribution: donut chart with legend on the right
- Migration Impact Bars: before/after comparison from history entries
- Added getModelScore() helper with deterministic fallback
- Added 'Sync Evolution Data' button if data missing
Fixes: canvas dimensions, getBoundingClientRect() == 0 when tab hidden
2026-05-25 15:18:35 +01:00
Deploy Bot
047a87afb4
feat(agent-models): apply MEDIUM+LOW priority model migrations
...
- markdown-validator: deepseek-v4-pro-max → nemotron-3-nano (90% cost cut)
- release-manager: glm-5.1 → kimi-k2.6 (+2 matrix, 1M context for diffs)
- capability-analyst: glm-5.1 → deepseek-v4-pro-max (+4 matrix, 1M ctx)
- browser-automation: qwen3-coder → deepseek-v4-flash (3× faster inference)
- history-miner: nemotron-3-super → qwen3.5-122b (+14 IF, 12.4M pulls)
2026-05-25 15:07:17 +01:00
Deploy Bot
4a0c78e5c9
feat(agent-models): apply CRITICAL+HIGH model migrations from research
...
Migrations based on model-research-2026-05-24:
- prompt-optimizer: qwen3.6-plus → qwen3.5-122b (CRITICAL, IF=92)
- memory-manager: qwen3.6-plus → deepseek-v4-pro-max (CRITICAL, 1M ctx)
- system-analyst: glm-5.1 → deepseek-v4-pro-max (HIGH, matrix +6)
- evaluator: glm-5.1 → qwen3.5-122b (HIGH, IF=92)
- pipeline-judge: glm-5.1 → kimi-k2.6 (HIGH, matrix +8, 1M ctx)
- workflow-architect: glm-5.1 → qwen3.5-122b (HIGH, IF=92)
7 files changed, 12 insertions(+), 12 deletions(-)
Closes: model-research data gaps for idle models
2026-05-25 14:36:31 +01:00
Deploy Bot
81b130471d
fix(tool-use): add question tool schema with mandatory description field
2026-05-25 14:31:54 +01:00
Deploy Bot
e6e8e9cb2a
feat(workflow-cross-checker): add pre-flight inter-agent validation agent with gate protocol
...
- Create .kilo/agents/workflow-cross-checker.md as a process inspector
- Requires bash: ask, task: deny (subagent security compliant)
- Defines Role Boundaries clarifying it does NOT replace code-skeptic, planner, or capability-analyst
- Adds 7-question Uncomfortable Questions Protocol for architecture and conflict validation
- Adds Error Handling table (Gitea API failure, corrupted checkpoint, unreadable logs)
- Inserts Cross-Check Verification (Gate #1/#2/#3) into orchestrator state machine
- Registers agent in kilo-meta.json, kilo.jsonc, capability-index.yaml, AGENTS.md, KILO_SPEC.md
- Model: ollama-cloud/kimi-k2.6 (higher IF 91, better instruction following for structured verdicts)
2026-05-24 00:11:25 +01:00
Deploy Bot
bb043cb23d
feat(landing): add APAW marketing landing page with dark/light theme toggle
...
- Responsive HTML/CSS landing with full project presentation
- 30+ agent matrix table, pipeline phases, evolution section
- Domain skills showcase with Docker-native approach
- Pricing tiers: Developer 35€/mo, Team 200€/mo
- Dark/light theme toggle with system preference detection
- Theme persisted in localStorage, smooth CSS transitions
- Docker container running on port 3002 via nginx:alpine
- Cross-browser compatible, no horizontal scroll, mobile nav
2026-05-23 22:48:19 +01:00
Kilo Orchestrator
ded8e3022d
feat(parallel-coordination): evolution — Gitea comment-based task claiming for parallel agent execution
...
New rule:
- parallel-coordination.md — claim protocol, overlap check, claim release, deadlock prevention
Updated:
- orchestrator.md — Overlap Verification MANDATORY before parallel spawn
- capability-index.yaml — implementation_phase parallel group with claim_protocol
- gns-agent-protocol.md — task_claim and task_claim_release event types
- EVOLUTION_LOG.md — evolution entry #6
Fixes: parallel agents writing to same files, migration collisions, worktree merge conflicts.
No new agent, no new Docker service (per TCA rule).
2026-05-18 16:13:33 +01:00
Kilo Orchestrator
46d6752890
feat(context-window): evolution — Gitea-centric checkpoint pruning + agent context hygiene
...
New rules:
- context-window-budget.md — budget per task size, what to load/offload, recovery protocol
- gns-checkpoint-pruning.md — minimal checkpoint v2 schema, agent entry/exit protocols
Updated:
- orchestrator.md — Context Budget Governance section (prune if consumed > 80%)
- gns-agent-protocol.md — checkpoint schema trimmed (history → history_tail), added current_task + agent_chain
- EVOLUTION_LOG.md — logged evolution entry #5
Fixes: context window overflow, agents loading 15,000+ tokens of irrelevant comments,
state held in RAM instead of offloaded to Gitea.
2026-05-18 15:54:15 +01:00
Kilo Orchestrator
4e9ea678bd
feat(orchestrator): evolution — capability-first routing, parallelization, zero-work policy
...
- orchestrator.md: add Capability-First Routing Protocol (5-step anti-regression)
- orchestrator.md: add Testing Task Routing Matrix (browser-automation, visual-tester)
- orchestrator.md: add Parallelization Protocol (review_phase + testing_phase parallel groups)
- orchestrator.md: add Orchestrator Self-Delegation Prohibition (ZERO WORK POLICY)
- capability-index.yaml: enrich parallel_groups with trigger/criteria/aggregator
- capability-index.yaml: enrich iteration_loops with trigger_on fields
- global.md: add Orchestrator Capability-First Check under Tooling Infrastructure
- docker.md: add Host Installation Prohibition (STOP/READ/DELEGATE/REPORT)
- EVOLUTION_LOG.md: log both evolution entries (2026-05-16T13:00 and 13:06)
Addresses: orchestrator host tool install regression, serial execution waste,
orchestrator self-work bypass of specialized agents.
2026-05-16 13:10:06 +01:00
Deploy Bot
60b14d33d0
fix(installer): install Kilo extension for root + all regular users, remove broken --user-data-dir
2026-05-16 12:13:35 +01:00
Deploy Bot
d796da6ab4
fix(installer): add bun to PATH persistently, suppress debconf dialogs, fix root vscode flags
2026-05-16 11:59:25 +01:00
Deploy Bot
e45cac8709
fix(installer): add --no-sandbox for root VS Code extension install + .work/ in .gitignore
2026-05-16 11:52:51 +01:00
Deploy Bot
879e0e5b7e
feat: add one-command Linux installer with VS Code + Kilo extension + APAW setup
2026-05-16 11:48:39 +01:00
NW
a6516f8595
feat: restore universal blog, booking, ecommerce skills with framework-agnostic schema and API patterns
2026-05-13 18:12:14 +01:00
NW
f65bbf9420
feat: add visual quality rules to frontend-developer agent + new screenshot page
2026-05-13 16:54:29 +01:00
NW
2287122f91
fix(agents): add Tool-First Enforcement to agent definitions and global rules
2026-05-13 09:37:40 +01:00
NW
4c9a95661f
evolution: remove obsolete :cloud suffix from kimi-k2.6 model id across all configs
2026-05-13 09:27:48 +01:00
NW
c031c4b9e5
feat(evolution): add incident-responder agent for server incident response and forensics
2026-05-09 13:31:20 +01:00
NW
8788261d4f
rules: add Task Critical Assessment (TCA) to prevent waste
...
Add task-critical-assessment.md with 5 criteria to evaluate tasks BEFORE execution:
1. Abstraction over local API → reject (MCP lesson)
2. Layer without proven need → reject (hybrid fallback lesson)
3. Environment more complex than task → reject (Docker overlay lesson)
4. No acceptance criteria → require clarification
5. Previously rolled back work → require justification
Link from global.md so every agent runs TCA before starting work.
Prevents repeating the MCP incident: 6 commits, 1700+ lines, 2 days → full revert.
2026-05-09 01:57:50 +01:00
NW
67e8d2e41a
revert: remove MCP Gitea integration, restore direct REST client
...
Remove all MCP-related infrastructure in favor of direct REST API calls.
MCP added layers without value: Docker container, stdio bridge, hybrid fallback,
healthchecks, SSE transport — all of which added failure modes and token overhead.
Deleted:
- docker/mcp-gitea/docker-compose.yml (MCP container config)
- scripts/mcp-gitea-stdio.cjs (stdio bridge)
- scripts/e2e-mcp-stdio-test*.py (MCP E2E tests)
- scripts/test-kilo-mcp-integration.py
- src/kilocode/agent-manager/mcp-gitea-client.ts (548 lines of MCP wrapper)
- MCP-STDIO-SETUP.md (MCP documentation)
- .vscode/settings.json (hardcoded MCP config with token)
- .kilo/skills/mcp-gitea-connection/ and mcp-gitea.research.md
Restored:
- pipeline-runner.ts: HybridGiteaClient → GiteaClient (direct REST)
Removed MCP dependency, imports, and initialization.
No healthcheck waits, no container startup delays.
- process-continuity.md: removed MCP-specific failure modes
- e2e-gns2-test.py: removed Basic Auth, use token auth; fixed spec reference
2026-05-09 01:55:52 +01:00
NW
0f522e61c3
fix(gns-2): replace Basic Auth password with Bearer PAT for MCP
2026-05-09 01:28:40 +01:00
NW
81e4708b5f
docs(gns-2): MCP stdio transport setup instructions
2026-05-09 00:33:21 +01:00
NW
af08e74f72
feat(gns-2): stdio MCP transport with hybrid fallback
2026-05-09 00:28:57 +01:00
NW
106a0291a4
feat(gns2): E2E integration test script for issue #110
...
- Scripts: e2e-gns2-test.py simulates full pipeline through Gitea API
- Supports scoped label replacement (status, budget, cascade)
- Generates GNS_EVENT footers in comments
- Validates checkpoint, labels, timeline, budget, depth
- Uses actual existing labels (status::done, not status::completed)
Refs: Milestone #67, Issue #110
2026-05-08 22:49:02 +01:00
NW
f5966db155
feat(gns2): integrate HybridGiteaClient into PollingSupervisor
...
- PollingSupervisor now uses HybridGiteaClient (MCP primary, REST fallback)
- Added mcpUrl to PipelineConfig
- Supervisor calls initialize() to detect MCP vs REST mode automatically
Refs: Milestone #67, Issue #107
2026-05-08 22:35:21 +01:00
NW
06fb0421ef
fix(process-continuity): operator-free design for MCP Docker integration
...
- Resolve service_healthy deadlock by using service_started instead
- Fix 172.28.0.0/16 network collision by removing ipam config
- Add HybridGiteaClient (mcp → rest → bash fallback)
- Create .kilo/rules/process-continuity.md with 5 operator-free principles:
1. No service_healthy conditions
2. No hardcoded networks
3. Automatic fallback chains
4. Pre-flight validation
5. Self-documenting failures
- Update docker-compose.yml with resilient config:
- start_period: 60s, retries: 5, restart: on-failure:3
- /tools healthcheck (guaranteed endpoint)
- tmpfs for Node.js /tmp
- Resource limits: 256M RAM, 0.5 CPU
- MCP/REST integration test passed (issue #109)
Refs: Milestone #67, Issues #107, #109
2026-05-08 22:31:59 +01:00
NW
3cc6ee2ffe
feat(gns2): Phase 8 MCP Docker containers for Gitea direct integration
...
- docker/mcp-gitea/docker-compose.yml — MCP server container (Sqcoows/forgejo-mcp)
- .kilo/skills/mcp-gitea-connection/SKILL.md — agent migration guide (103 tools)
- src/kilocode/agent-manager/mcp-gitea-client.ts — MCP native client with fallback
- Hybrid mode: MCP primary, REST API fallback if container unavailable
- All 29 Tier 0/1 agents mass-updated with GNS-2 protocol (checkpoint read, event footer)
- Security: no bash for Gitea ops, MCP handles credentials internally
Refs: Milestone #67, Issue #107
2026-05-08 22:16:52 +01:00
NW
bd154f24d0
feat(gns2): mass-update all 30 agents with GNS-2 protocol
...
- 29 agents updated with GNS-2 checkpoint/event protocol
- 12 Tier 0 (leaf) agents: read checkpoint, write event footer, no cascade
- 17 Tier 1 (task) agents: read checkpoint, recommend next agent, no direct task calls
- 3 Tier 2 (meta) agents already updated: capability-analyst, agent-architect, evaluator
- All agents now include GNS_EVENT footer template in comments
- Frontmatter updated with '(GNS-2 Tier N)' classification
Scripts added:
- scripts/mass-update-gns-agents.py — idempotent mass updater
- scripts/validate-gns-agents.py — protocol checker
Refs: Milestone #67, Issues #99-#107
2026-05-08 22:03:08 +01:00
NW
47b027a02f
feat(gns2): Gitea-Nervous-System v2.0 - distributed agent state machine
...
- Add GNS-2 label taxonomy (66 labels) with semantic routing
- Tier 2 agents (capability-analyst, agent-architect, evaluator) enabled for self-cascade
- GNS agent protocol: checkpoint v2 in issue body, machine-readable event footers
- GiteaClient extended: checkpoint CRUD, event parsing, assignee/lock control, triggered issue polling
- PipelineRunner rewritten as PollingSupervisor: reactive instead of active dispatch
- Security: circuit breakers (is_locked), budget governance, depth limits
- Scripts: init-gns-labels.py, validate-gns-agents.py
- Milestone #67 + 7 phase issues (#99-#105) tracking evolution
Refs: Milestone #67, Issues #99-#105
2026-05-08 21:25:38 +01:00
NW
f01e2064fb
feat(evolution): Kilo Code release sync & APAW system hardening (v2026-05-07)
...
Security & Permissions:
- All 30 agents: task[*]=deny, task[subagent]=deny (cascade prevention)
- orchestrator & release-manager: bash=ask (hardening)
- New .kilo/rules/subagent-security.md with audit rules
- Updated .kilo/rules/global.md with Security & Permissions section
- Updated .kilo/agents/orchestrator.md with Security Enforcement block
Session Management:
- New .kilo/rules/session-persistence.md (checkpoint format, worktree isolation)
- Updated .kilo/rules/branch-strategy.md (worktree per agent)
- pipeline-runner.ts: Checkpoint interface + save/load/resume methods
Plan Persistence:
- Updated .kilo/rules/lead-developer.md (plan handover section)
Per-Agent Reasoning:
- capability-index.yaml: reasoning_effort for all 30 agents (xhigh/high/medium/low)
MCP Cleanup:
- New .kilo/skills/docker-security/SKILL.md (--rm, orphaned process cleanup)
Config Validation:
- Updated .kilo/rules/docker.md (startup checks, commit scoping, location awareness)
Docs:
- README.md: v2026-05-07 evolution badges
- .kilo/EVOLUTION_LOG.md: Entry #6 with full metrics
- .gitignore: ignore dist/ + bun.lock
Gitea: Milestone #66, Issues #91-#98
Architect: 9/9 sections fresh (express project type)
2026-05-08 18:54:08 +01:00
NW
74ad7c4b6e
docs(branch-strategy): default branch is dev, not main
...
- Update branch strategy: dev is primary development branch
- main is stable release only
- Add release process: dev → PR → review → main → tag
- Sync .kilo/ to target projects after release
2026-05-07 07:39:00 +01:00
NW
994ca58821
fix(agents): add missing permissions + complete kilo-meta.json
...
- Fix 12 agents missing edit/write/bash permissions
- Add 5 missing agents to kilo-meta.json (architect-indexer, flutter-developer, php-developer, pipeline-judge, python-developer)
- Remove BOM from kilo.jsonc
- All 32 agents now consistent between files and meta
2026-05-07 07:22:32 +01:00
NW
defe57d53a
feat: merge infrastructure skills and workflows from TenerifeProp
...
Add MCP-based infrastructure skills:
- mcp-integration: Playwright + GitMCP
- e2e-testing: Cypress + AntV + Slack
- search-integration: Brave + Tavily + Markitdown
- security-scanner: CVE Search + MCP Validator
- knowledge-base: Docfork + Wikipedia + ArXiv
- prompt-manager: version control + DevTrends
- api-catalog: MCP server registry
- agent-architect-mcp: patterns + OpenAPI converter
Add workflow commands:
- feature.md: full feature pipeline
- hotfix.md: urgent bug fix workflow
Add rules:
- orchestrator-self-evolution.md
- sdet-engineer.md
Add audit:
- WORKFLOW_AUDIT.md
Source: UniqueSoft/TenerifeProp
2026-05-06 23:04:14 +01:00
NW
80dca09ae0
fix: unquoted color, duplicate key, GLM downgrade + cross-platform validator
...
- Fix security-auditor.md color: bare hex value is now quoted
- Fix orchestrator.md duplicate devops-engineer key
- Fix .kilo/kilo.jsonc: orchestrator + root model to kimi-k2.6:cloud
- Update agent-frontmatter-validation.md with diagnostic guide
- Update global.md with YAML frontmatter rules for all agents
- Update agent-architect.md + workflow-architect.md with color checklist
- Add scripts/validate-agents.cjs: zero-dependency, cross-platform, --fix flag, scans worktrees
2026-05-04 22:01:45 +01:00
NW
fb552e0020
feat: v3 optimal model assignments + fitness gate
...
- Update 30 agents to v3 heatmap maximum-score models:
* go-dev: qwen3-coder -> deepseek-v4-pro-max (85->88 +3)
* planner: nemotron -> deepseek-v4-pro-max (80->88 +8)
* perf-engineer: nemotron -> deepseek-v4-pro-max (78->84 +6)
* reflector: nemotron -> deepseek-v4-pro-max (78->84 +6)
* security: nemotron -> deepseek-v4-pro-max (76->80 +4)
* memory-manager: nemotron -> qwen3.6-plus (86->87 +1)
* frontend: kimi-k2.5 -> minimax-m2.5 (92)
* the-fixer: minimax-m2.5 -> kimi-k2.6 (88->90 +2)
* browser-auto: kimi-k2.6 -> qwen3-coder (86->87 +1)
* prompt-opt: glm-5.1 -> qwen3.6-plus (82->83 +1)
* backend: deepseek-v3.2 -> qwen3-coder (91)
* capability-analyst: nemotron -> glm-5.1 (85)
* release-man: devstral-2 -> glm-5.1 (82)
* evaluator: nemotron -> glm-5.1 (86)
* workflow-arch: gpt-oss -> glm-5.1 (84)
- Add Model Evolution Guard:
* fitness-gate.cjs: rejects downgrades >3 points or <75 score
* Normalized model ID lookup (: vs -)
* Diff report before any file modifications
- Update sync-benchmarks-from-yaml.cjs with fitness gate
- Sync kilo-meta.json, kilo.jsonc, .md agent files
- Rebuild research-dashboard.html (104KB, 30 agents, 11 models)
Total improvement: +105 points across 11 agents
Source: v3.html heatmap IF-adjusted composite scores
2026-04-30 08:42:10 +01:00
NW
9e48a4960e
fix: restore optimal v3 models + add fitness gate protection
...
- Restore all 30 agents to v3.html heatmap optimal models:
* frontend-developer: qwen3-coder -> minimax-m2.5 (92★)
* devops-engineer: nemotron-3-super -> kimi-k2.6:cloud (88★)
* browser-automation: qwen3-coder -> kimi-k2.6:cloud (86★)
* agent-architect: glm-5.1 -> kimi-k2.6:cloud (86★)
- Add Model Evolution Guard system:
* agent-evolution/scripts/lib/fitness-gate.cjs
* Rejects downgrades >3 points or below score 75
* Produces detailed diff report before any file modifications
* Normalized model ID lookup (v3.html ':' vs JSON '-')
- Update sync-benchmarks-from-yaml.cjs with fitness gate
- Update model-benchmarks.json with v3 optimal assignments
- Rebuild research-dashboard.html (104KB, 30 agents, 11 models)
- Add model-evolution-guard.md architecture documentation
- Add v3-optimal-models.json as source-of-truth reference
Fixes regression introduced by commit 3badb25 where models were
silently downgraded from heatmap optimal to inferior assignments.
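The gate rules stated above (reject a downgrade of more than 3 points, reject any score below 75, and look models up by a normalized ID) can be sketched as follows. This is an illustration under assumptions, not the contents of fitness-gate.cjs: the function names, the shape of the `scores` map (normalized model ID to score, for one agent), and the sample numbers in the usage note are all hypothetical.

```javascript
// Sketch only: thresholds come from the commit message, everything else is assumed.
const MAX_DOWNGRADE = 3; // "Rejects downgrades >3 points"
const MIN_SCORE = 75;    // "or below score 75"

// v3.html uses ':' where the JSON files use '-', so compare on one form.
function normalizeModelId(id) {
  return id.trim().toLowerCase().replace(/:/g, '-');
}

// `scores` maps normalized model IDs to the score for a single agent.
function checkFitness(scores, currentModel, proposedModel) {
  const current = scores[normalizeModelId(currentModel)];
  const proposed = scores[normalizeModelId(proposedModel)];
  if (proposed === undefined) {
    return { ok: false, reason: 'unknown proposed model' };
  }
  if (proposed < MIN_SCORE) {
    return { ok: false, reason: 'score below ' + MIN_SCORE };
  }
  if (current !== undefined && current - proposed > MAX_DOWNGRADE) {
    return { ok: false, reason: 'downgrade of ' + (current - proposed) + ' points' };
  }
  return { ok: true, reason: 'accepted' };
}
```

With a hypothetical map such as `{ 'minimax-m2.5': 92, 'qwen3-coder': 85, 'kimi-k2.6-cloud': 90 }`, a move from `minimax-m2.5` to `qwen3-coder` would be rejected as a 7-point downgrade, while a move to `kimi-k2.6:cloud` would pass because the ID normalizes to `kimi-k2.6-cloud` and the drop is only 2 points.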
2026-04-29 23:19:16 +01:00