Commit Graph

30 Commits

Author SHA1 Message Date
Deploy Bot
15ff887788 fix(dashboard): prevent column reset on evolve-agent reload
- Remove localStorage wipe before mergeCachedResults() — was deleting
  cached research results when Evolve button triggered load()
- mergeCachedResults(): only fill gaps (existing===undefined||0), never
  override scores from DB — prevents stale cache from shadowing live data
2026-05-28 12:24:40 +01:00
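A minimal sketch of the gap-filling merge described in the commit above, assuming scores are held in a flat mapping keyed by an agent/model string. The function name merge_cached_results and the data shape are illustrative assumptions, not the dashboard's actual JavaScript.

```python
from typing import Dict, Optional

# Hypothetical shape: "agent|model" -> score; None stands in for JS `undefined`.
Scores = Dict[str, Optional[float]]


def merge_cached_results(live: Scores, cached: Scores) -> Scores:
    """Return a copy of the live scores with gaps filled from the cache."""
    merged = dict(live)
    for key, cached_score in cached.items():
        existing = merged.get(key)
        # Only a missing entry or a zero score counts as a gap.
        # Any other value came from the DB and must not be overridden.
        if existing is None or existing == 0:
            merged[key] = cached_score
    return merged
```

With this rule a stale cached value can never shadow a non-zero score loaded from the database.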
Deploy Bot
56eb5c7eb6 fix(dashboard): new model columns now appear after live research
- updateCell(): auto-create agent entry in reportData.agents if missing
- updateCell(): add model to allAvailableModels for modal checkboxes
- mergeCachedResults(): auto-create agent entry, normalize model names via modelShort()
- mergeCachedResults(): add new models to allAvailableModels for modal picker
- MODEL_BENCHMARKS: add deepseek-v4-pro (was missing, only had deepseek-v4-pro-max)
2026-05-28 12:05:25 +01:00
Deploy Bot
4071551476 feat(scripts): add real-fit evaluation engine and supporting test scripts
- real-fit-engine.py: refactored to support --from-report, improved Ollama v1/chat/completions compatibility, agent name normalization
- run-focused-eval.py: run evaluations for specific agent/model pairs from CLI
- test_ollama_minimal.py/test_real_api.py: Ollama API connectivity tests
- real-fit-architecture.md: architecture overview document
- tests/scripts/: E2E landing test, analytics capture, evolution heatmap verification
- Remove real-fit-recalc.py (superseded by --from-report flag)
2026-05-28 11:57:46 +01:00
Deploy Bot
b95fd41587 feat(evolution): add real-fit dashboard, API, report builder, and docker compose
- real-fit.html: API-driven research dashboard with agent/model heatmap, detail modal with score breakdown and evaluator commentary
- api.py: FastAPI backend serving /api/real-fit-report (dynamic from SQLite), /api/research, /api/evolve-agent/start
- rebuild-report.py: generates real-fit-report.json from SQLite DB for static fallback
- docker-compose.yml: add evolution-api service (Python 3.12, uvicorn) for research endpoints
- index.standalone.html: sync with dashboard data updates
- archive/index.html: standalone dashboard snapshot (263KB)
- .gitignore: exclude *.db, research-jobs.json from tracking
2026-05-28 11:55:49 +01:00
Deploy Bot
dbbf4c32e1 feat(landing): add state API service with real-fit score drill-down
- Add apaw-state-api Flask service (landing/api/server.py) that serves
  agent fit scores, best models, and explanations from real-fit.db
- Add nginx proxy rule: /api/state → apaw-state-api:8080
- Add fit-score drill-down modal (click heatmap cell → score breakdown
  + explanation) in api.js, styles.css, and index.html
- Add real-fit-recalc.py script for offline score recalculation from
  stored SQLite responses
- Add real-fit-engine.py (evaluation engine) and sync-dashboard-data.py
- Add Dockerfile ENTRYPOINT + entrypoint.sh for landing container
- Add docker-compose.ollama.yml for local Ollama inference
- Update kilo.jsonc command models and agent-versions.json
- Regenerate index.standalone.html with latest dashboard data
- Add .gitignore entries for __pycache__, runtime data, and backups
2026-05-27 19:53:40 +01:00
Deploy Bot
954c739dc9 chore(archive): move untracked files + clean working tree
Archived to agent-evolution/archive/:
- test scripts, specs, data exports
- dashboard-user-journey.md → .kilo/archive/

Clean: all non-ollama models verified (openrouter, openai removed)
2026-05-27 14:04:37 +01:00
Deploy Bot
b075189b83 chore(dashboard): rebuild standalone after model migration
- All 18 recommendations applied → pending: 0
- File size: 246.1 KB
2026-05-27 13:47:33 +01:00
Deploy Bot
7635cb62cd fix(dashboard): heatmap cell click + 5th tab + model sync fixes
- restore hmModal with 4 legacy tabs + new Performance Graph tab
- fix event.target in research-dashboard.template.html switchTab
- fix showCellDetail event.stopPropagation for modal persistence
- update agent models + sync KILO_SPEC.md and AGENTS.md
2026-05-27 13:46:55 +01:00
Deploy Bot
36455ccf24 feat: apply model recommendations - 18 agents migrated to kimi-k2.6
Sourced from agent-evolution/data/evolution.json
Agents: architect-indexer, backend-developer, browser-automation,
  code-skeptic, evaluator, flutter-developer, frontend-developer,
  history-miner, lead-developer, markdown-validator, php-developer,
  product-owner, prompt-optimizer, python-developer,
  requirement-refiner, sdet-engineer, visual-tester,
  workflow-architect
Also synced 4 agents via sync-agents.cjs
2026-05-27 13:38:49 +01:00
Deploy Bot
95e0866b46 fix(dashboard): remove all event.target dependencies
- switchTab(tabId, el): uses el or document.querySelector fallback
- switchHmTab(tabName, btn): uses btn or querySelector fallback
- All 6 tab buttons + 4 heatmap modal tabs pass 'this' as parameter
- Rebuilt index.standalone.html (261.6 KB)
- Verified: grep event.target returns 0 occurrences
2026-05-26 13:22:40 +01:00
Deploy Bot
c212a0a34e fix(build): remove broken heatmap string replacement
- build-standalone-fixed.cjs: removed renderHeatmap() replacement block
- The replacement used string concatenation with '\'' which broke
  single quotes in generated HTML, causing SyntaxError: unexpected token
- Original renderHeatmap() in index.html uses template literals (`...`)
  which are safe and already contain showCellDetail onclick handler
- Rebuilt index.standalone.html from fixed source
- Zero console errors, zero JS syntax errors verified on port 3003
2026-05-25 22:31:32 +01:00
Deploy Bot
7f1269a370 fix(dashboard): 3 UI bugs + new DB watch tool
1. filterCategory: fix inline event.target → uses btn parameter
   - All Agents tab filter buttons now correctly toggle active class

2. exportRecommendations/showApplyModal: read from agentData, not removed INLINE_RECOMMENDATIONS
   - Apply modal shows real recommendations
   - Export generates JSON with real data

3. Heatmap cell click: add showCellDetail modal with Chart.js line chart + prompt history
   - onclick='showCellDetail(model, agent)' on every td
   - renderCellChart computes score history from agent.history
   - prompt_change items filtered and displayed

4. watch-db.cjs: incremental DB sync tool
   - Polls git for changes in .kilo/agents/*.md and kilo-meta.json
   - Detects model_change vs prompt_change by comparing with previous version
   - Exports to JSON after sync, logs to .kilo/logs/watch-db.log
   - SIGINT/SIGTERM graceful shutdown
   - Trigger: npm run evolution:watch
2026-05-25 21:50:55 +01:00
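The change detection described for watch-db.cjs (model_change vs prompt_change, found by comparing with the previous version) could look roughly like the sketch below. The record fields "model" and "prompt" and the function name classify_change are assumptions made for illustration; the real tool is a Node script whose internals are not shown in this log.

```python
from typing import Dict, List


def classify_change(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Compare two versions of an agent definition and label what changed."""
    changes: List[str] = []
    # Hypothetical field: the model id taken from the agent's frontmatter.
    if previous.get("model") != current.get("model"):
        changes.append("model_change")
    # Hypothetical field: the prompt body of the agent's .md file.
    if previous.get("prompt") != current.get("prompt"):
        changes.append("prompt_change")
    return changes
```

A single commit may change both fields, so the sketch returns a list rather than one label.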
Deploy Bot
a0604afaf6 chore: archive generated files and clean up runtime outputs
- index.standalone.html → agent-evolution/archive/index.standalone-2026-05-25.html (generated build output)
- tests/visual/dashboard-tabs/current/*.png → tests/visual/archive/dashboard-tabs-current-2026-05-25/ (runtime capture output)
- Cleaned empty tests/visual/dashboard-tabs/current/ directory
2026-05-25 21:23:47 +01:00
Deploy Bot
9b0f160587 feat(dashboard): unified data pipeline, verified benchmarks, and browser testing
- build-standalone-fixed.cjs: reads from 4 real sources (agents md, kilo-meta.json, model-benchmarks-verified.json, agent-versions.json); computes recommendations dynamically
- build-standalone-direct.cjs: direct data export + HTML embed pipeline
- dashboard-smoke-test.ts: Playwright E2E smoke test covering all 6 tabs
- model-benchmarks-verified.json: verified IF scores from artificialanalysis.ai for 15 models (SWE-bench unverifiable → null)
- agent-versions.json: 347 git history entries extracted for 34 agents
- kilo-meta.json: prompt-optimizer → qwen3.5-122b, memory-manager → deepseek-v4-pro-max
- index.html: Recommendations tab rendering updated for dynamic data
- Dockerfile + docker-compose.yml: mount-driven build, no image rebuild for data changes
- README.md: updated dashboard docs and verified benchmark sources
2026-05-25 21:05:14 +01:00
Deploy Bot
f9bed0f262 fix(dashboard): correct computeAgentScore formula and inline benchmark data
- SWE=null no longer zeroes score; weight IF at 0.85 for reasoning-only models
- Inline MODEL_BENCHMARKS const (sync script doesn't populate benchmarks)
- Hash fallback tightened from 50-85 to 55-80
- History-miner now shows +10 improvement (82 vs 72) instead of false regression
2026-05-25 16:31:15 +01:00
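One way to get a deterministic fallback score confined to the 55-80 range mentioned above is to hash the model name and fold the result into that interval. This is only a sketch: the hash function (SHA-256 here) and the name fallback_score are assumptions, and the dashboard's own hash is not shown in this log.

```python
import hashlib


def fallback_score(model_name: str, low: int = 55, high: int = 80) -> int:
    """Map a model name to a stable pseudo-score in [low, high]."""
    digest = hashlib.sha256(model_name.encode("utf-8")).digest()
    # Take the first 4 bytes as an unsigned integer (assumed scheme).
    value = int.from_bytes(digest[:4], "big")
    # Fold into the inclusive range; the same name always gives the same score.
    return low + value % (high - low + 1)
```

Because the result depends only on the name, a model without benchmark data keeps the same placeholder score across rebuilds.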
Deploy Bot
699456b49e feat(dashboard): replace raw Canvas with Chart.js for all Impact tab charts
- Add Chart.js 4.4.7 via CDN + datalabels plugin
- Agent Score: horizontal bar chart, sorted descending, color-coded
- Model Distribution: doughnut with right-side legend + percentages
- Migration Impact: grouped before/after bars with tooltip showing delta
- Dark theme defaults: #8ba3c0 text, #1e2d45 grid
- Chart instances destroyed before re-render to prevent memory leaks
- Responsive: maintainAspectRatio: false
2026-05-25 15:45:14 +01:00
Deploy Bot
19be5cf229 fix(dashboard): rewrite Impact tab charts to work with actual data structure
Replaced broken chart functions that expected non-existent fit_score_after/before
with data-agnostic implementations using model names + benchmark lookup.

- Agent Score Bar Chart: horizontal bars per agent, sorted descending, color-coded
- Model Distribution: donut chart with legend on the right
- Migration Impact Bars: before/after comparison from history entries
- Added getModelScore() helper with deterministic fallback
- Added 'Sync Evolution Data' button if data missing

Fixes: canvas dimensions, getBoundingClientRect() == 0 when tab hidden
2026-05-25 15:18:35 +01:00
Deploy Bot
047a87afb4 feat(agent-models): apply MEDIUM+LOW priority model migrations
- markdown-validator: deepseek-v4-pro-max → nemotron-3-nano (90% cost cut)
- release-manager: glm-5.1 → kimi-k2.6 (+2 matrix, 1M context for diffs)
- capability-analyst: glm-5.1 → deepseek-v4-pro-max (+4 matrix, 1M ctx)
- browser-automation: qwen3-coder → deepseek-v4-flash (3× faster inference)
- history-miner: nemotron-3-super → qwen3.5-122b (+14 IF, 12.4M pulls)
2026-05-25 15:07:17 +01:00
NW
4c9a95661f evolution: remove obsolete :cloud suffix from kimi-k2.6 model id across all configs
2026-05-13 09:27:48 +01:00
NW
fb552e0020 feat: v3 optimal model assignments + fitness gate
- Update 30 agents to v3 heatmap maximum-score models:
  * go-dev: qwen3-coder -> deepseek-v4-pro-max (85->88 +3)
  * planner: nemotron -> deepseek-v4-pro-max (80->88 +8)
  * perf-engineer: nemotron -> deepseek-v4-pro-max (78->84 +6)
  * reflector: nemotron -> deepseek-v4-pro-max (78->84 +6)
  * security: nemotron -> deepseek-v4-pro-max (76->80 +4)
  * memory-manager: nemotron -> qwen3.6-plus (86->87 +1)
  * frontend: kimi-k2.5 -> minimax-m2.5 (92)
  * the-fixer: minimax-m2.5 -> kimi-k2.6 (88->90 +2)
  * browser-auto: kimi-k2.6 -> qwen3-coder (86->87 +1)
  * prompt-opt: glm-5.1 -> qwen3.6-plus (82->83 +1)
  * backend: deepseek-v3.2 -> qwen3-coder (91)
  * capability-analyst: nemotron -> glm-5.1 (85)
  * release-man: devstral-2 -> glm-5.1 (82)
  * evaluator: nemotron -> glm-5.1 (86)
  * workflow-arch: gpt-oss -> glm-5.1 (84)

- Add Model Evolution Guard:
  * fitness-gate.cjs: rejects downgrades >3 points or <75 score
  * Normalized model ID lookup (: vs -)
  * Diff report before any file modifications
- Update sync-benchmarks-from-yaml.cjs with fitness gate
- Sync kilo-meta.json, kilo.jsonc, .md agent files
- Rebuild research-dashboard.html (104KB, 30 agents, 11 models)

Total improvement: +105 points across 11 agents
Source: v3.html heatmap IF-adjusted composite scores
2026-04-30 08:42:10 +01:00
NW
9e48a4960e fix: restore optimal v3 models + add fitness gate protection
- Restore all 30 agents to v3.html heatmap optimal models:
  * frontend-developer: qwen3-coder -> minimax-m2.5 (92★)
  * devops-engineer: nemotron-3-super -> kimi-k2.6:cloud (88★)
  * browser-automation: qwen3-coder -> kimi-k2.6:cloud (86★)
  * agent-architect: glm-5.1 -> kimi-k2.6:cloud (86★)
- Add Model Evolution Guard system:
  * agent-evolution/scripts/lib/fitness-gate.cjs
  * Rejects downgrades >3 points or below score 75
  * Produces detailed diff report before any file modifications
  * Normalized model ID lookup (v3.html ':' vs JSON '-')
- Update sync-benchmarks-from-yaml.cjs with fitness gate
- Update model-benchmarks.json with v3 optimal assignments
- Rebuild research-dashboard.html (104KB, 30 agents, 11 models)
- Add model-evolution-guard.md architecture documentation
- Add v3-optimal-models.json as source-of-truth reference

Fixes regression introduced by commit 3badb25 where models were
silently downgraded from heatmap optimal to inferior assignments.
2026-04-29 23:19:16 +01:00
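The two gate rules stated above (reject a downgrade of more than 3 points, reject any score below 75) and the ':' vs '-' ID normalization can be sketched as follows. The thresholds come from the commit message; the function names, the GateResult type, and the plain character swap are assumptions, not the contents of fitness-gate.cjs.

```python
from dataclasses import dataclass

MAX_DOWNGRADE = 3  # points, from the commit message
MIN_SCORE = 75     # from the commit message


@dataclass
class GateResult:
    accepted: bool
    reason: str


def normalize_model_id(model_id: str) -> str:
    # Assumed normalization: v3.html writes ':' where the JSON files write '-'.
    return model_id.replace(":", "-")


def check_model_change(current_score: float, proposed_score: float) -> GateResult:
    """Apply the two rejection rules to a proposed model reassignment."""
    if proposed_score < MIN_SCORE:
        return GateResult(False, "score %s is below %s" % (proposed_score, MIN_SCORE))
    drop = current_score - proposed_score
    if drop > MAX_DOWNGRADE:
        return GateResult(False, "downgrade of %s exceeds %s points" % (drop, MAX_DOWNGRADE))
    return GateResult(True, "ok")
```

Under these rules a drop of exactly 3 points still passes, since the message says "more than 3".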
NW
d1516f4856 chore: organize temporary research artifacts into archive
- Create agent-evolution/archive/ with scripts/, reports/, data/
- Move 11 Python migration/diagnostic scripts
- Move 7 intermediate report files (json, md, txt)
- Move test data and old dashboard builds
- Add archive/README.md with full index of contents
- Update .gitignore to exclude archive/scripts, reports, data
- Keep archive/README.md tracked for documentation
2026-04-29 21:14:23 +01:00
NW
3badb259cc feat: bidirectional research dashboard + agent config fixes
- Integrate apaw_agent_model_research_v3.html as standalone dashboard
- Add model-benchmarks.json with 32 agents, 11 scored models, 11 recommendations
- Add build-research-dashboard.ts: inject live data into template → standalone HTML
- Add rebuild-template.cjs: regenerate template from v3.html source
- Add sync-benchmarks-from-yaml.cjs: sync YAML → JSON round-trip
- Add sync-model-research.ts: apply recommendation matrix to config files
- Add model-benchmarks.schema.json and model-research.schema.json for validation
- Add bidirectional-data-flow.md architecture documentation
- Add log-execution.cjs pipeline hook
- Update capability-index.yaml: add fallback_models, failover_strategy
- Update kilo-meta.json, kilo.jsonc, KILO_SPEC.md with synced models
- Update evolution.md / research.md / self-evolution.md / evolutionary-sync.md docs
- Fix security-auditor.md: quote YAML color (#DC2626)
- Fix orchestrator.md: remove duplicate devops-engineer key
- Build research-dashboard.html (106KB standalone) + dated archive
2026-04-29 21:04:22 +01:00
NW
3127d82102 feat: sync agent evolution data and add self-diagnostic report
2026-04-23 07:58:44 +01:00
NW
fa68141d47 feat: add pipeline-judge agent and evolution workflow system
- Add pipeline-judge agent for objective fitness scoring
- Update capability-index.yaml with pipeline-judge, evolution config
- Add fitness-evaluation.md workflow for auto-optimization
- Update evolution.md command with /evolve CLI
- Create .kilo/logs/fitness-history.jsonl for metrics logging
- Update AGENTS.md with new workflow state machine
- Add 6 new issues to MILESTONE_ISSUES.md for evolution integration
- Preserve ideas in agent-evolution/ideas/

Pipeline Judge computes fitness = (test_rate*0.5) + (gates*0.25) + (efficiency*0.25)
Auto-triggers prompt-optimizer when fitness < 0.70
2026-04-06 00:23:50 +01:00
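The fitness formula and trigger threshold quoted above translate directly into code. The sketch assumes all three inputs are already normalized to the range 0 to 1; the function names are illustrative and do not come from the pipeline-judge agent itself.

```python
def pipeline_fitness(test_rate: float, gates: float, efficiency: float) -> float:
    """fitness = (test_rate*0.5) + (gates*0.25) + (efficiency*0.25)"""
    # Inputs are assumed to be fractions in [0, 1].
    return test_rate * 0.5 + gates * 0.25 + efficiency * 0.25


def needs_prompt_optimizer(fitness: float, threshold: float = 0.70) -> bool:
    """The commit states the optimizer auto-triggers when fitness < 0.70."""
    return fitness < threshold
```

For example, a run with test_rate 0.8, gates 0.5 and efficiency 0.5 gives 0.4 + 0.125 + 0.125 = 0.65, which falls under the 0.70 threshold.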
NW
1ab9939c92 fix: correct OpenRouter model paths across all files
Fixed format from 'qwen/...' to 'openrouter/qwen/...' for:
- product-owner.md
- prompt-optimizer.md
- workflow-architect.md
- status.md, blog.md, booking.md, commerce.md
- kilo.jsonc (default model + ask agent)
- agent-frontmatter-validation.md
- agent-versions.json (recommendations and history)
2026-04-05 23:47:14 +01:00
NW
6ba325cec5 fix: correct model path format for OpenRouter
Changed qwen/qwen3.6-plus:free to openrouter/qwen/qwen3.6-plus:free
for capability-analyst, agent-architect, and evaluator agents.
2026-04-05 23:42:32 +01:00
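The path correction above amounts to prefixing the provider segment. A small idempotent helper, sketched here with the hypothetical name to_openrouter_path, shows the transformation; it is not a script from the repository.

```python
OPENROUTER_PREFIX = "openrouter/"


def to_openrouter_path(model: str) -> str:
    """Prefix a model path with 'openrouter/' unless it is already prefixed."""
    if model.startswith(OPENROUTER_PREFIX):
        return model
    return OPENROUTER_PREFIX + model
```

Checking for the existing prefix keeps the helper safe to run twice over the same config files.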
NW
a4e09ad5d5 feat: upgrade agent models based on research findings
- capability-analyst: nemotron-3-super → qwen3.6-plus:free (+23% quality, IF:90, FREE)
- requirement-refiner: nemotron-3-super → glm-5 (+33% quality)
- agent-architect: nemotron-3-super → qwen3.6-plus:free (+22% quality)
- evaluator: nemotron-3-super → qwen3.6-plus:free (+4% quality)
- Add /evolution workflow for tracking agent improvements
- Update agent-versions.json with evolution history
2026-04-05 23:37:23 +01:00
NW
fe28aa5922 chore: reorganize project structure and update README
- Move docker-compose.evolution.yml to agent-evolution/docker-compose.yml
- Update README with current agent lineup (28+ agents)
- Fix model references in README tables
- Add recent commits history
- Simplify architecture overview
2026-04-05 23:02:44 +01:00
NW
15a7b4b7a4 feat: add Agent Evolution Dashboard
- Create agent-evolution/ directory with standalone dashboard
- Add interactive HTML dashboard with agent/model matrix
- Add heatmap view for agent-model compatibility scores
- Add recommendations tab with optimization suggestions
- Add Gitea integration preparation (history timeline)
- Add Docker configuration for deployment
- Add build scripts for standalone HTML generation
- Add sync scripts for agent data synchronization
- Add milestone and issues documentation
- Add skills and rules for evolution sync
- Update AGENTS.md with dashboard documentation
- Update package.json with evolution scripts

Features:
- 28 agents with model assignments and fit scores
- 8 models with benchmarks (SWE-bench, RULER, Terminal)
- 11 recommendations for model optimization
- History timeline with agent changes
- Interactive modal windows for model details
- Filter and search functionality
- Russian language interface
- Works offline (file://) with embedded data

Docker:
- Dockerfile for standalone deployment
- docker-compose.evolution.yml
- docker-run.sh/docker-run.bat scripts

NPM scripts:
- sync:evolution - sync and build dashboard
- evolution:open - open in browser
- evolution:dashboard - start dev server

Status: PAUSED - foundation complete, Gitea integration pending
2026-04-05 19:58:59 +01:00