- Integrate apaw_agent_model_research_v3.html as standalone dashboard
- Add model-benchmarks.json with 32 agents, 11 scored models, 11 recommendations
- Add build-research-dashboard.ts: inject live data into template → standalone HTML
- Add rebuild-template.cjs: regenerate template from v3.html source
- Add sync-benchmarks-from-yaml.cjs: sync YAML → JSON round-trip
- Add sync-model-research.ts: apply recommendation matrix to config files
- Add model-benchmarks.schema.json and model-research.schema.json for validation
- Add bidirectional-data-flow.md architecture documentation
- Add log-execution.cjs pipeline hook
- Update capability-index.yaml: add fallback_models, failover_strategy
- Update kilo-meta.json, kilo.jsonc, KILO_SPEC.md with synced models
- Update evolution.md / research.md / self-evolution.md / evolutionary-sync.md docs
- Fix security-auditor.md: quote YAML color (#DC2626)
- Fix orchestrator.md: remove duplicate devops-engineer key
- Build research-dashboard.html (106KB standalone) + dated archive

# Agent Evolution Dashboard

An interactive dashboard for tracking the evolution of the APAW agent system.

## 🚀 Quick Start

### Sync data

```bash
# Sync agents and build the standalone HTML
bun run sync:evolution

# Only build the HTML from existing data
bun run evolution:build
```

### Open in a browser

**Option 1: Local file (recommended)**

```bash
# Windows
start agent-evolution\index.standalone.html

# macOS
open agent-evolution/index.standalone.html

# Linux
xdg-open agent-evolution/index.standalone.html

# Or via the npm script
bun run evolution:open
```

**Option 2: HTTP server**

```bash
cd agent-evolution
python -m http.server 3001

# Open http://localhost:3001
```

**Option 3: Docker**

```bash
# Linux/macOS
bash agent-evolution/docker-run.sh restart

# Windows
agent-evolution\docker-run.bat restart

# Open http://localhost:3001
```

## 📁 File Structure

### Quick launch

```bash
# Linux/macOS
bash agent-evolution/docker-run.sh restart

# Windows
agent-evolution\docker-run.bat restart

# Open http://localhost:3001 in a browser
```

### Docker Compose

```bash
# Standard start
docker-compose -f docker-compose.evolution.yml up -d

# With the nginx reverse proxy
docker-compose -f docker-compose.evolution.yml --profile nginx up -d

# Stop
docker-compose -f docker-compose.evolution.yml down
```

### Container management

```bash
# Linux/macOS
bash agent-evolution/docker-run.sh build    # Build the image
bash agent-evolution/docker-run.sh run      # Run the container
bash agent-evolution/docker-run.sh stop     # Stop
bash agent-evolution/docker-run.sh restart  # Rebuild and run
bash agent-evolution/docker-run.sh logs     # Logs
bash agent-evolution/docker-run.sh open     # Open in a browser
bash agent-evolution/docker-run.sh sync     # Sync data
bash agent-evolution/docker-run.sh status   # Status
bash agent-evolution/docker-run.sh clean    # Remove everything
bash agent-evolution/docker-run.sh dev      # Dev mode with hot reload

# Windows
agent-evolution\docker-run.bat build
agent-evolution\docker-run.bat run
agent-evolution\docker-run.bat stop
agent-evolution\docker-run.bat restart
agent-evolution\docker-run.bat logs
agent-evolution\docker-run.bat open
agent-evolution\docker-run.bat sync
agent-evolution\docker-run.bat status
agent-evolution\docker-run.bat clean
agent-evolution\docker-run.bat dev
```

### NPM Scripts

```bash
bun run evolution:build      # Build the Docker image
bun run evolution:run        # Run the container
bun run evolution:stop       # Stop
bun run evolution:dev        # Docker Compose
bun run evolution:logs       # Logs
bun run research:dashboard   # Build research dashboard
bun run research:watch       # Watch mode for dashboard
bun run research:sync        # Sync model research to agents
```

## Structure

```
agent-evolution/
├── data/
│   ├── agent-versions.json          # Current state + history
│   └── agent-versions.schema.json   # JSON Schema
├── scripts/
│   └── sync-agent-history.ts        # Sync script
├── index.html                       # Dashboard UI
└── README.md                        # This file
```

## Research Dashboard (Model Benchmarks)

### Generate from live data

```bash
# Build research dashboard from model-benchmarks.json
bun run agent-evolution/scripts/build-research-dashboard.ts

# Watch mode: auto-rebuild on data changes
bun run agent-evolution/scripts/build-research-dashboard.ts --watch

# Open in browser
start agent-evolution/research-dashboard.html
```

### Output files

| File | Description |
|------|-------------|
| `research-dashboard.html` | Latest interactive dashboard (all 6 tabs) |
| `dist/research-dashboard-YYYY_MM_DD.html` | Dated archive |
| `research-dashboard.template.html` | Template for generation |

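
The dated archive name in the table above follows a `YYYY_MM_DD` pattern. A minimal sketch of how such a path could be produced is shown here; `archivePath` is a hypothetical helper, and the use of UTC is an assumption (the actual build script may use local time).

```typescript
// Hypothetical helper: formats the dated archive path from the table above.
// Assumption: the date is taken in UTC; the real build script may differ.
function archivePath(date: Date): string {
  const yyyy = date.getUTCFullYear().toString();
  const mm = (date.getUTCMonth() + 1).toString().padStart(2, "0");
  const dd = date.getUTCDate().toString().padStart(2, "0");
  return `dist/research-dashboard-${yyyy}_${mm}_${dd}.html`;
}

// For 27 April 2026 this yields "dist/research-dashboard-2026_04_27.html".
const exampleArchive = archivePath(new Date(Date.UTC(2026, 3, 27)));
```

Zero-padding the month and day keeps the archives sorted chronologically when listed by name.
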
### Dashboard tabs

1. **Overview** (Обзор): stat cards, current config table, agent count, model count
2. **Groq**: free tier models with RPM/RPD/TPM/TPD limits, speed indicators
3. **Models** (Модели): filterable cards with SWE-bench, IF scores, context windows, tags
4. **Matrix** (Матрица): Agent×Model heatmap with IF adjustment, tooltips, color coding
5. **Recommendations** (Рекомендации): selectable cards with JSON export, impact analysis
6. **Profit analysis** (Анализ профита): before/after comparison, canvas charts, closed-source comparison

### Source data

The dashboard reads from `agent-evolution/data/model-benchmarks.json`:

- 15 models with benchmarks (SWE-bench, IF scores)
- 36 agent configurations
- 33 agent×model score matrices
- 11 recommendations
- 5 Groq models with rate limits
- Closed-source comparison data

To refresh the data, run `/research models` or `/evolution research`.

## Quick Start

```bash
# Sync agent data
bun run sync:evolution

# Start the dashboard
bun run evolution:dashboard

# Open in a browser
bun run evolution:open
# or visit http://localhost:3001
```

## Dashboard Features

### 1. Overview

- **Statistics**: total number of agents, agents with history, recommendations
- **Recent Changes**: latest model and prompt changes
- **Pending Recommendations**: critical update recommendations

### 2. All Agents

- Search and filter by category
- Agent cards showing:
  - Current model
  - Fit Score
  - Number of capabilities
  - Change history

### 3. Timeline

- Full chronology of changes
- Event types: `model_change`, `prompt_change`, `agent_created`
- Filter by date

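
The date filter and event types listed above can be sketched as a pure function over timeline events. This is only an illustration: `TimelineEvent`, `filterTimeline`, the inclusive date range, and the newest-first ordering are assumptions, and the `planner` events are made-up sample data.

```typescript
type TimelineEventType = "model_change" | "prompt_change" | "agent_created";

interface TimelineEvent {
  date: string; // ISO 8601 timestamp
  type: TimelineEventType;
  agent: string;
}

// Keeps events inside [from, to] (inclusive), optionally narrowed to some
// event types, and returns them newest first (ordering is an assumption).
function filterTimeline(
  events: TimelineEvent[],
  from: Date,
  to: Date,
  types?: TimelineEventType[],
): TimelineEvent[] {
  return events
    .filter((e) => {
      const t = Date.parse(e.date);
      return t >= from.getTime() && t <= to.getTime();
    })
    .filter((e) => types === undefined || types.includes(e.type))
    .sort((a, b) => Date.parse(b.date) - Date.parse(a.date));
}

// Sample data: the first entry mirrors the README, the others are hypothetical.
const sampleEvents: TimelineEvent[] = [
  { date: "2026-04-05T05:21:00Z", type: "model_change", agent: "lead-developer" },
  { date: "2026-04-10T09:00:00Z", type: "prompt_change", agent: "planner" },
  { date: "2026-04-20T12:00:00Z", type: "model_change", agent: "planner" },
];

const firstHalfOfApril = filterTimeline(
  sampleEvents,
  new Date("2026-04-01T00:00:00Z"),
  new Date("2026-04-15T00:00:00Z"),
);
```

Because `filter` returns a new array, the sort does not reorder the caller's data.
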
### 4. Recommendations

- Agents with pending recommendations
- Priorities: critical, high, medium, low
- Export to JSON

### 5. Model Matrix

- Agent × Model table
- Fit Score for each pair
- Provider distribution visualization

## Data Sources

### 1. Agent Files (`.kilo/agents/*.md`)

```yaml
---
model: ollama-cloud/qwen3-coder:480b
description: Primary code writer
mode: subagent
color: "#DC2626"
---
```

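
To illustrate how the `model` field could be read from an agent file, here is a minimal sketch of a frontmatter reader. `parseFrontmatter` is a hypothetical helper, not the project's sync script, and it only handles flat `key: value` pairs; a real implementation would more likely use a YAML library.

```typescript
// Minimal sketch: reads flat `key: value` pairs between the `---` fences.
// Not a YAML parser: nested values, lists and multi-line strings are out of scope.
function parseFrontmatter(markdown: string): Record<string, string> {
  const lines = markdown.split(/\r?\n/);
  const fields: Record<string, string> = {};
  if (lines[0]?.trim() !== "---") return fields; // no frontmatter block
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // closing fence
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    // Split on the first colon only, so `qwen3-coder:480b` stays intact.
    const raw = line.slice(colon + 1).trim();
    fields[key] = raw.replace(/^(["'])(.*)\1$/, "$2"); // strip matching quotes
  }
  return fields;
}

const agentFile = [
  "---",
  "model: ollama-cloud/qwen3-coder:480b",
  "description: Primary code writer",
  "mode: subagent",
  'color: "#DC2626"',
  "---",
  "Agent prompt body goes here.",
].join("\n");

const agentFrontmatter = parseFrontmatter(agentFile);
```

Note that the color value has to be quoted in the file itself: an unquoted `#DC2626` would be read by YAML as a comment.
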
### 2. Capability Index (`.kilo/capability-index.yaml`)

```yaml
agents:
  lead-developer:
    model: ollama-cloud/qwen3-coder:480b
    capabilities: [code_writing, refactoring]
```

### 3. Kilo Config (`.kilo/kilo.jsonc`)

```json
{
  "agent": {
    "lead-developer": {
      "model": "ollama-cloud/qwen3-coder:480b"
    }
  }
}
```

### 4. Git History

```bash
git log --all --oneline -- ".kilo/agents/"
```

### 5. Gitea Issue Comments

```markdown
## ✅ lead-developer completed

**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts
```

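
A comment in the layout above could be turned into a structured record with a few regular expressions. The sketch below assumes the comment always follows that exact layout; `CompletionReport` and `parseCompletionComment` are hypothetical names, not part of the project.

```typescript
interface CompletionReport {
  agent: string;
  score: number; // out of 10
  durationHours: number;
  files: string[];
}

// Returns null when the comment does not match the expected layout.
function parseCompletionComment(body: string): CompletionReport | null {
  const agent = /^## ✅ (\S+) completed/m.exec(body);
  const score = /\*\*Score\*\*: (\d+)\/10/.exec(body);
  const duration = /\*\*Duration\*\*: (\d+(?:\.\d+)?)h/.exec(body);
  const files = /\*\*Files\*\*: (.+)/.exec(body);
  if (!agent || !score || !duration || !files) return null;
  return {
    agent: agent[1] ?? "",
    score: Number(score[1]),
    durationHours: Number(duration[1]),
    files: (files[1] ?? "")
      .split(",")
      .map((f) => f.trim())
      .filter((f) => f.length > 0),
  };
}

const sampleComment = [
  "## ✅ lead-developer completed",
  "",
  "**Score**: 8/10",
  "**Duration**: 1.2h",
  "**Files**: src/auth.ts, src/user.ts",
].join("\n");

const completionReport = parseCompletionComment(sampleComment);
```

Returning `null` for anything that does not match keeps ordinary discussion comments out of the performance log.
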
### 6. Model Benchmarks (`agent-evolution/data/model-benchmarks.json`)

Research data extracted from `apaw_agent_model_research_v3.html`:

- Static benchmark scores (SWE-bench, IF scores, context windows)
- Heatmap compatibility matrix
- Provider rate limits
- Recommendation history

### 7. Model Research Output (`agent-evolution/data/model-research-latest.json`)

Dynamic research results:

- Fresh model data from provider APIs
- IF-adjusted agent×model scores
- Pending recommendations with impact levels
- Ready-to-apply YAML patches

## JSON Schema

Format of `agent-versions.json`:

```json
{
  "version": "1.0.0",
  "lastUpdated": "2026-04-05T17:27:00Z",
  "agents": {
    "lead-developer": {
      "current": {
        "model": "ollama-cloud/qwen3-coder:480b",
        "provider": "Ollama",
        "category": "Core Dev",
        "fit_score": 92
      },
      "history": [
        {
          "date": "2026-04-05T05:21:00Z",
          "commit": "caf77f53c8",
          "type": "model_change",
          "from": null,
          "to": "ollama-cloud/qwen3-coder:480b",
          "reason": "Initial configuration"
        }
      ],
      "performance_log": [
        {
          "date": "2026-04-05T10:30:00Z",
          "issue": 42,
          "score": 8,
          "duration_ms": 120000,
          "success": true
        }
      ]
    }
  }
}
```

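
For code that consumes this file, the shape above can be mirrored as TypeScript types. The sketch below is inferred from the sample only: which fields are required is an assumption, the authoritative definition is `agent-versions.schema.json`, and `currentModels` is a hypothetical helper.

```typescript
type ChangeType = "model_change" | "prompt_change" | "agent_created";

interface HistoryEntry {
  date: string; // ISO 8601 timestamp
  commit: string;
  type: ChangeType;
  from: string | null; // null for the initial configuration
  to: string;
  reason: string;
}

interface PerformanceLogEntry {
  date: string;
  issue: number;
  score: number;
  duration_ms: number;
  success: boolean;
}

interface AgentRecord {
  current: { model: string; provider: string; category: string; fit_score: number };
  history: HistoryEntry[];
  performance_log: PerformanceLogEntry[];
}

interface AgentVersions {
  version: string;
  lastUpdated: string;
  agents: Record<string, AgentRecord>;
}

// Maps each agent name to its current model.
function currentModels(data: AgentVersions): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [agentName, record] of Object.entries(data.agents)) {
    result[agentName] = record.current.model;
  }
  return result;
}

const sampleVersions: AgentVersions = {
  version: "1.0.0",
  lastUpdated: "2026-04-05T17:27:00Z",
  agents: {
    "lead-developer": {
      current: { model: "ollama-cloud/qwen3-coder:480b", provider: "Ollama", category: "Core Dev", fit_score: 92 },
      history: [
        { date: "2026-04-05T05:21:00Z", commit: "caf77f53c8", type: "model_change", from: null, to: "ollama-cloud/qwen3-coder:480b", reason: "Initial configuration" },
      ],
      performance_log: [
        { date: "2026-04-05T10:30:00Z", issue: 42, score: 8, duration_ms: 120000, success: true },
      ],
    },
  },
};

const modelsByAgent = currentModels(sampleVersions);
```
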
## Model Research Data

### model-benchmarks.json

Comprehensive benchmark data from the HTML research file:

```json
{
  "version": "1.0.0",
  "generated": "2026-04-27T17:44:44Z",
  "total_agents": 36,
  "total_models_tracked": 11,
  "models": [
    {
      "id": "ollama-cloud/qwen3-coder:480b",
      "name": "Qwen3-Coder 480B",
      "organization": "Qwen",
      "swe_bench": 66.5,
      "if_score": 88,
      "context_window": "256K→1M",
      "categories": ["coding", "agent"],
      "provider": "ollama"
    }
  ],
  "agent_current_config": [
    { "agent": "lead-developer", "model": "ollama-cloud/qwen3-coder:480b", "fit_score": 92, "status": "optimal" }
  ],
  "recommendations": [
    {
      "agent": "planner",
      "current_model": "nemotron-3-super",
      "recommended_model": "deepseek-v4-pro-max",
      "impact": "high",
      "expected_improvement": { "quality": "+10%", "speed": "~1x", "context_window": "1M" }
    }
  ]
}
```

### model-research-latest.json

Latest research output (overwritten each cycle):

- Generated by `/research models` or `/evolution Step 0`
- Validated against `model-research.schema.json`
- Consumed by `sync-model-research.ts`

### sync-model-research.ts

Applies model recommendations to configuration:

```bash
# Dry-run first
bun run agent-evolution/scripts/sync-model-research.ts --dry-run

# Apply all pending recommendations
bun run agent-evolution/scripts/sync-model-research.ts

# Apply for single agent
bun run agent-evolution/scripts/sync-model-research.ts --agent planner
```

Updates:

1. `.kilo/capability-index.yaml`: model assignments
2. `kilo-meta.json`: source of truth
3. `kilo.jsonc`: agent config
4. `agent-evolution/data/agent-versions.json`: history tracking
5. `.kilo/agents/*.md` frontmatter (via `sync-agents.js --fix`)

After applying the recommendations, the script rebuilds the dashboard automatically.

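
The idea behind a dry run can be sketched as a pure planning step that lists the intended changes without writing any file. This is not the actual logic of `sync-model-research.ts`; `planChanges`, `PlannedChange`, and the `onlyAgent` parameter are hypothetical and only mirror the `--dry-run` and `--agent` flags conceptually.

```typescript
interface Recommendation {
  agent: string;
  current_model: string;
  recommended_model: string;
  impact: string;
}

interface PlannedChange {
  agent: string;
  from: string;
  to: string;
}

// Lists the changes a sync run would make, without touching any file.
// `assigned` maps agent name to its currently configured model.
function planChanges(
  recommendations: Recommendation[],
  assigned: Record<string, string>,
  onlyAgent?: string,
): PlannedChange[] {
  const plan: PlannedChange[] = [];
  for (const rec of recommendations) {
    if (onlyAgent !== undefined && rec.agent !== onlyAgent) continue;
    const from = assigned[rec.agent];
    if (from === undefined) continue; // unknown agent: skip
    if (from === rec.recommended_model) continue; // already applied
    plan.push({ agent: rec.agent, from, to: rec.recommended_model });
  }
  return plan;
}

const pendingRecommendations: Recommendation[] = [
  { agent: "planner", current_model: "nemotron-3-super", recommended_model: "deepseek-v4-pro-max", impact: "high" },
];
const assignedModels: Record<string, string> = {
  planner: "nemotron-3-super",
  "lead-developer": "ollama-cloud/qwen3-coder:480b",
};

const dryRunPlan = planChanges(pendingRecommendations, assignedModels);
```

Separating planning from writing makes it possible to review the plan first and to skip recommendations that were already applied.
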
## Integration

### In the pipeline

Add to `.kilo/commands/pipeline.md`:

```yaml
post_steps:
  - name: sync_evolution
    run: bun run sync:evolution
```

### In Gitea webhooks

```jsonc
// Add a webhook in Gitea
{
  "url": "http://localhost:3000/api/evolution/webhook",
  "events": ["issue_comment", "issues"]
}
```

### Reading from code

```typescript
import { agentEvolution } from './agent-evolution/scripts/sync-agent-history';

// Get all agents
const agents = await agentEvolution.getAllAgents();

// Get the history of a specific agent
const history = await agentEvolution.getAgentHistory('lead-developer');

// Record a model change
await agentEvolution.recordChange({
  agent: 'security-auditor',
  type: 'model_change',
  from: 'gpt-oss:120b',
  to: 'nemotron-3-super',
  reason: 'Better reasoning for security analysis',
  source: 'manual'
});
```

## Recommendations

### Priorities

| Priority | Criteria | Action |
|----------|----------|--------|
| Critical | Fit score < 70 | Update immediately |
| High | Model is unavailable | Switch to fallback |
| Medium | A better model is available | Consider updating |
| Low | Optimization is possible | Optional |

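
The priority table can be read as a decision rule. The sketch below encodes it under two assumptions: the checks run from most to least severe, and the input flags (`modelAvailable`, `betterModelAvailable`, `optimizationPossible`) are computed elsewhere. All names are hypothetical.

```typescript
type RecommendationPriority = "critical" | "high" | "medium" | "low";

interface AgentSignal {
  fitScore: number;
  modelAvailable: boolean;
  betterModelAvailable: boolean;
  optimizationPossible: boolean;
}

// Checks run from most to least severe; that ordering is an assumption.
function classifyPriority(signal: AgentSignal): RecommendationPriority | null {
  if (signal.fitScore < 70) return "critical";
  if (!signal.modelAvailable) return "high";
  if (signal.betterModelAvailable) return "medium";
  if (signal.optimizationPossible) return "low";
  return null; // nothing to recommend
}

const lowFitPriority = classifyPriority({
  fitScore: 65,
  modelAvailable: true,
  betterModelAvailable: true,
  optimizationPossible: false,
});
```

With this ordering, an agent that scores below 70 is always reported as critical, even when a better model is also available.
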
### Example recommendations

```json
{
  "agent": "requirement-refiner",
  "recommendations": [{
    "target": "ollama-cloud/nemotron-3-super",
    "reason": "+22% quality, 1M context for specifications",
    "priority": "critical"
  }]
}
```

## Monitoring

### Agent metrics

- **Average Score**: mean score over the last 10 executions
- **Success Rate**: percentage of successful executions
- **Average Duration**: mean execution time
- **Files per Task**: mean number of files per task

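
As a rough illustration, the first three agent metrics can be derived from `performance_log` entries. The sketch assumes the log is ordered oldest to newest and that only Average Score uses the 10-execution window; Files per Task is left out because the sample log entry carries no file count. `computeAgentMetrics` is a hypothetical helper.

```typescript
interface PerformanceLogEntry {
  date: string;
  issue: number;
  score: number;
  duration_ms: number;
  success: boolean;
}

interface AgentMetrics {
  averageScore: number; // over the last `windowSize` executions
  successRate: number; // percentage, 0 to 100
  averageDurationMs: number;
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Assumes `log` is ordered oldest to newest and `windowSize` is positive.
function computeAgentMetrics(log: PerformanceLogEntry[], windowSize = 10): AgentMetrics {
  const recent = log.slice(-windowSize);
  return {
    averageScore: mean(recent.map((e) => e.score)),
    successRate: mean(log.map((e) => (e.success ? 1 : 0))) * 100,
    averageDurationMs: mean(log.map((e) => e.duration_ms)),
  };
}

const singleRunMetrics = computeAgentMetrics([
  { date: "2026-04-05T10:30:00Z", issue: 42, score: 8, duration_ms: 120000, success: true },
]);
```

An empty log yields zeros rather than `NaN`, which keeps the dashboard cards renderable for agents that have not run yet.
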
### System metrics

- **Total Agents**: number of active agents
- **Agents with History**: number of agents that have a change history
- **Pending Recommendations**: number of recommendations
- **Provider Distribution**: distribution of agents across providers

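
Three of these system metrics can be computed directly from the `agents` map of `agent-versions.json`. The sketch below assumes that every listed agent counts as active and that any non-empty `history` array counts as having history; the pending recommendation count is omitted because its source is not shown here. Names and sample providers are illustrative.

```typescript
interface AgentEntry {
  current: { provider: string };
  history: unknown[];
}

interface SystemMetrics {
  totalAgents: number;
  agentsWithHistory: number;
  providerDistribution: Record<string, number>;
}

function computeSystemMetrics(agents: Record<string, AgentEntry>): SystemMetrics {
  const entries = Object.values(agents);
  const providerDistribution: Record<string, number> = {};
  for (const entry of entries) {
    const provider = entry.current.provider;
    providerDistribution[provider] = (providerDistribution[provider] ?? 0) + 1;
  }
  return {
    totalAgents: entries.length,
    // Assumption: any non-empty history array counts as "with history".
    agentsWithHistory: entries.filter((e) => e.history.length > 0).length,
    providerDistribution,
  };
}

// Hypothetical sample: agent names come from the README, providers are made up.
const sampleSystemMetrics = computeSystemMetrics({
  "lead-developer": { current: { provider: "Ollama" }, history: [{}] },
  planner: { current: { provider: "Ollama" }, history: [] },
  "security-auditor": { current: { provider: "Groq" }, history: [{}, {}] },
});
```
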
## Maintenance

### Cleaning up history

```bash
# Remove duplicates
bun run agent-evolution/scripts/cleanup.ts --dedupe

# Merge related changes
bun run agent-evolution/scripts/cleanup.ts --merge
```

### Exporting data

```bash
# Export to CSV
bun run agent-evolution/scripts/export.ts --format csv

# Export to Markdown
bun run agent-evolution/scripts/export.ts --format md
```

### Backup

```bash
# Create a backup
cp agent-evolution/data/agent-versions.json agent-evolution/data/backup/agent-versions-$(date +%Y%m%d).json

# Restore from a backup
cp agent-evolution/data/backup/agent-versions-20260405.json agent-evolution/data/agent-versions.json
```

## Future Improvements

1. **API Endpoints**:
   - `GET /api/evolution/agents`: list agents
   - `GET /api/evolution/agents/:name/history`: agent history
   - `POST /api/evolution/sync`: trigger a sync

2. **Real-time Updates**:
   - WebSocket for dashboard updates
   - Automatic refresh on changes

3. **Analytics**:
   - Performance charts over time
   - Model comparison
   - Performance forecasting

4. **Integration**:
   - Slack/Telegram notifications
   - Automatic application of recommendations
   - A/B testing of models

## Bidirectional Data Flow

```
[/research models] OR [/evolution Step 0]
↓
[agent-evolution/data/model-research-latest.json]
↓
[bun run sync-model-research.ts]
↓
[.kilo/capability-index.yaml] → updated model assignments
[kilo-meta.json] → updated source of truth
[kilo.jsonc] → updated config
[agent-versions.json] → history entries
[.kilo/agents/*.md] → frontmatter updated
↓
[sync-agents.js --fix] → propagate to all files
↓
[bun run build-research-dashboard.ts]
↓
[research-dashboard.html] → live dashboard
[dist/dashboard-YYYY_MM_DD.html] → dated archive
↓
[/research models] ← loop continues
```

### Data staleness check

```bash
# Check if benchmarks need refresh
node -e "
  const d = require('./agent-evolution/data/model-benchmarks.json');
  const days = (Date.now() - new Date(d.generated)) / (1000*60*60*24);
  console.log(days > 7 ? 'STALE: needs refresh' : 'FRESH', Math.round(days), 'days old');
"
```

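
For use inside a script, the same 7-day rule can be written as a typed function with the clock and threshold passed in, which makes it straightforward to test. `benchmarkAgeDays` and `isStale` are hypothetical names. One deliberate difference: an unparsable `generated` value is treated as stale, whereas a plain `days > 7` comparison on `NaN` would report it as fresh.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Age of the benchmark file in days, given its `generated` timestamp.
function benchmarkAgeDays(generated: string, now: Date): number {
  return (now.getTime() - new Date(generated).getTime()) / DAY_MS;
}

// Stale when older than `maxAgeDays`, or when the timestamp cannot be parsed.
function isStale(generated: string, now: Date, maxAgeDays = 7): boolean {
  const age = benchmarkAgeDays(generated, now);
  return Number.isNaN(age) || age > maxAgeDays;
}

// Four days after the sample `generated` value: still fresh.
const staleAfterFourDays = isStale("2026-04-27T17:44:44Z", new Date("2026-05-01T17:44:44Z"));
```
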
### Auto-refresh pipeline

```yaml
# In capability-index.yaml
evolution:
  auto_trigger: true
  max_evolution_attempts: 3
  dashboard_rebuild: true  # new: auto-rebuild on model changes
```