- build-standalone-fixed.cjs: reads from 4 real sources (agents md, kilo-meta.json, model-benchmarks-verified.json, agent-versions.json); computes recommendations dynamically
- build-standalone-direct.cjs: direct data export + HTML embed pipeline
- dashboard-smoke-test.ts: Playwright E2E smoke test covering all 6 tabs
- model-benchmarks-verified.json: verified IF scores from artificialanalysis.ai for 15 models (SWE-bench unverifiable → null)
- agent-versions.json: 347 git history entries extracted for 34 agents
- kilo-meta.json: prompt-optimizer → qwen3.5-122b, memory-manager → deepseek-v4-pro-max
- index.html: Recommendations tab rendering updated for dynamic data
- Dockerfile + docker-compose.yml: mount-driven build, no image rebuild for data changes
- README.md: updated dashboard docs and verified benchmark sources
# APAW Agent Evolution Dashboard

## Overview

This is a standalone HTML dashboard that visualizes agent model assignments, performance scores, and recommendations for the APAW codebase.

## Features

- Real-time agent model & performance tracking
- Agent × Model compatibility heatmap
- Performance impact analysis with Chart.js visualizations
- Model recommendation engine with priority scoring
- Evolution timeline and history tracking

## Data Sources

The dashboard pulls data from three primary sources:

1. **`.kilo/agents/*.md`** - Agent definitions with model assignments, modes, colors, and descriptions
2. **`kilo-meta.json`** - Central registry of agent metadata, categories, and capabilities
3. **`model-benchmarks-verified.json`** - IF scores and context window data for all supported models
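
As a rough illustration of how these three sources might be joined into one record per agent, the sketch below reads flat `key: value` frontmatter from an agent file and looks the agent up in the other two files. The function names, the JSON shapes (`agents`, `models`, `if_score`, `context_window`), and the model identifier format are assumptions made for this example, not the actual schema.

```javascript
// Minimal frontmatter reader: handles only flat `key: value` pairs between
// the opening and closing `---` lines (no nesting, lists, or multi-line values).
function parseFrontmatter(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return {};
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(sep + 1).trim().replace(/^["']|["']$/g, "");
    if (key) fields[key] = value;
  }
  return fields;
}

// Joins one agent definition with the registry and benchmark data.
// The shapes of `meta` and `benchmarks` are hypothetical.
function buildAgentRecord(name, markdown, meta, benchmarks) {
  const fm = parseFrontmatter(markdown);
  const registry = (meta.agents && meta.agents[name]) || {};
  const bench = (benchmarks.models && benchmarks.models[fm.model]) || null;
  return {
    name,
    model: fm.model || null,
    mode: fm.mode || null,
    category: registry.category || null,
    ifScore: bench ? bench.if_score : null,
    contextWindow: bench ? bench.context_window : null,
  };
}

// Illustrative input only; the frontmatter values and category are assumptions.
const sampleMarkdown = [
  "---",
  "description: Manages agent memory",
  "mode: subagent",
  "model: deepseek-v4-pro-max",
  "---",
  "# Memory Manager",
].join("\n");

const record = buildAgentRecord(
  "memory-manager",
  sampleMarkdown,
  { agents: { "memory-manager": { category: "Cognitive Enhancement" } } },
  { models: { "deepseek-v4-pro-max": { if_score: 89, context_window: null } } }
);
console.log(record.model, record.ifScore);
```

A real build would likely use a full YAML parser for the frontmatter; the hand-rolled reader here only keeps the example free of dependencies.
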

## Build Process

The `build-standalone-fixed.cjs` script:

1. Parses all agent YAML frontmatter
2. Computes composite performance scores using IF scores and context windows
3. Generates model recommendations based on score improvements
4. Embeds unified JSON data directly into the HTML file
5. Updates JavaScript functions to use embedded data
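
Steps 2 to 4 above can be sketched as follows. This is only an illustration under stated assumptions: the weights, the normalization cap, the `minGain` threshold, the `__DASHBOARD_DATA__` marker, and all function names are hypothetical, since the actual formula and embedding mechanism used by `build-standalone-fixed.cjs` are not described here.

```javascript
// Hypothetical weights and cap; the real script's formula may differ.
const IF_WEIGHT = 0.8;
const CONTEXT_WEIGHT = 0.2;
const MAX_CONTEXT = 1000000; // assumed cap used to normalize context windows

// Step 2: combine IF score and context window into one number (0 to 100).
function compositeScore(model) {
  if (model.ifScore == null) return null; // unverified scores stay null
  const context = Math.min(model.contextWindow || 0, MAX_CONTEXT) / MAX_CONTEXT;
  return IF_WEIGHT * model.ifScore + CONTEXT_WEIGHT * context * 100;
}

// Step 3: recommend the best-scoring model when it beats the agent's
// current model by at least `minGain` points; largest gains come first.
function recommend(agents, models, minGain = 1) {
  const scored = Object.entries(models)
    .map(([id, m]) => ({ id, score: compositeScore(m) }))
    .filter((m) => m.score !== null);
  const recs = [];
  for (const agent of agents) {
    const current = scored.find((m) => m.id === agent.model);
    if (!current) continue; // no verified score for the current model
    const best = scored.reduce((a, b) => (b.score > a.score ? b : a));
    const gain = best.score - current.score;
    if (best.id !== agent.model && gain >= minGain) {
      recs.push({ agent: agent.name, from: agent.model, to: best.id, gain });
    }
  }
  return recs.sort((a, b) => b.gain - a.gain);
}

// Step 4: embed the data as JSON. `</` is written as `<\/` (a valid JSON
// escape) so the payload cannot close its <script> tag, without resorting
// to \u003c-style escapes. A function replacer keeps `$` sequences literal.
function embedData(templateHtml, data, marker = "__DASHBOARD_DATA__") {
  const json = JSON.stringify(data).replace(/<\//g, "<\\/");
  return templateHtml.replace(marker, () => json);
}

// Illustrative run; model IDs are assumed, context windows are left unset.
const exampleModels = {
  "deepseek-v4-pro-max": { ifScore: 89, contextWindow: null },
  "kimi-k2.6": { ifScore: 91, contextWindow: null },
};
const exampleRecs = recommend(
  [{ name: "memory-manager", model: "deepseek-v4-pro-max" }],
  exampleModels
);
const exampleHtml = embedData(
  '<script id="dashboard-data" type="application/json">__DASHBOARD_DATA__</script>',
  { recommendations: exampleRecs }
);
console.log(exampleHtml.length > 0);
```

Sorting by `gain` is one simple way to obtain a priority ordering; a real engine might also weigh agent category or cost.
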

## Validation

The build process ensures:
- ✅ No Unicode escape sequences (no `\u003c` or `\u003e`)
- ✅ Valid embedded JSON structure
- ✅ Clean standalone HTML file with no external dependencies
- ✅ Proper function updates (`init`, `renderHeatmap`, `renderRecommendations`)
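
The checks above could be approximated with a small post-build validator like the one below. It is a sketch, not the build's actual validation code: the `dashboard-data` script ID, the regular expressions, and the function name `validateStandaloneHtml` are assumptions, and the function check only confirms that the three functions are declared, not that they were updated correctly.

```javascript
// Returns a list of human-readable problems; an empty list means all checks passed.
function validateStandaloneHtml(
  html,
  requiredFunctions = ["init", "renderHeatmap", "renderRecommendations"]
) {
  const problems = [];

  // 1. No literal \u003c or \u003e escape sequences anywhere in the file.
  if (/\\u003[ce]/i.test(html)) {
    problems.push("contains \\u003c or \\u003e escape sequences");
  }

  // 2. The embedded data block exists and parses as JSON.
  //    The script ID and type are hypothetical.
  const dataBlock =
    /<script id="dashboard-data" type="application\/json">([\s\S]*?)<\/script>/.exec(html);
  if (!dataBlock) {
    problems.push("embedded data block not found");
  } else {
    try {
      JSON.parse(dataBlock[1]);
    } catch (err) {
      problems.push("embedded JSON is invalid: " + err.message);
    }
  }

  // 3. No <script> or <link> tag that loads from an http(s) URL.
  if (/<(?:script|link)\b[^>]*\b(?:src|href)\s*=\s*["']?https?:\/\//i.test(html)) {
    problems.push("references an external script or stylesheet");
  }

  // 4. Each expected function is declared somewhere in the file.
  for (const name of requiredFunctions) {
    if (!new RegExp("function\\s+" + name + "\\s*\\(").test(html)) {
      problems.push("missing function " + name);
    }
  }
  return problems;
}
```

A build script could call this on the generated file contents and exit with a non-zero status when the returned list is not empty.
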

## Output Files

- `index.standalone.html` - Self-contained dashboard with embedded data
- `data/index.html` - Copy of standalone dashboard for web serving

## Usage

Open `index.standalone.html` in any modern browser. No server or external dependencies are required.

## Agent Count

The dashboard currently tracks **34 agents** across multiple categories:
- Core Development
- Quality Assurance
- Security
- Analysis
- Process Management
- Cognitive Enhancement
- Testing

## Model Support

The dashboard supports 15 models with verified IF scores from artificialanalysis.ai:
- DeepSeek V4-Pro Max (IF: 89)
- DeepSeek V4-Flash (IF: 86)
- Kimi K2.6 (IF: 91)
- Qwen3-Coder 480B (IF: 88)
- GLM-5.1 (IF: 90)
- And 10 more models