# Kilo Code Agents Reference

This file configures AI agent behavior for the APAW project, a self-improving code pipeline with Gitea logging.

## Pipeline Workflow

The main workflow is `/pipeline`. Use it to process an issue through all agents automatically.

```
User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging
```

## Commands (Slash Commands)

| Command | Description | Usage |
|---------|-------------|-------|
| `/pipeline` | Run full agent pipeline for issue | `/pipeline 42` |
| `/status` | Check pipeline status for issue | `/status 42` |
| `/evolve` | Run evolution cycle with fitness scoring | `/evolve --issue 42` |
| `/evaluate` | Generate performance report | `/evaluate 42` |
| `/plan` | Creates detailed task plans | `/plan feature X` |
| `/ask` | Answers codebase questions | `/ask how does auth work` |
| `/debug` | Analyzes and fixes bugs | `/debug error in login` |
| `/code` | Quick code generation | `/code add validation` |
| `/research [topic]` | Run research and self-improvement | `/research multi-agent` |
| `/evolution log` | Log agent model change | `/evolution log planner "reason"` |
| `/evolution report` | Generate evolution report | `/evolution report` |
| `/web-test` | Visual regression testing in Docker | `/web-test https://bbox.wtf` |
| `/e2e-test` | E2E browser automation tests | `/e2e-test https://my-app.com` |

## Pipeline Agents (Subagents)

These agents are invoked automatically by `/pipeline` or manually via `@mention`:

### Core Development

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@requirement-refiner` | Converts ideas to User Stories | glm-5.1 | thinking | history-miner, system-analyst |
| `@history-miner` | Finds duplicates in git | nemotron-3-super | — | *(read-only)* |
| `@system-analyst` | Designs specifications | glm-5.1 | thinking | sdet-engineer, orchestrator |
| `@sdet-engineer` | Writes tests (TDD) | qwen3-coder:480b | thinking | lead-developer, orchestrator |
| `@lead-developer` | Implements code | qwen3-coder:480b | thinking | code-skeptic, orchestrator |
| `@frontend-developer` | UI implementation | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@backend-developer` | Node.js/Express/APIs | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@go-developer` | Go backend services | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@flutter-developer` | Flutter mobile apps | qwen3-coder:480b | — | code-skeptic, orchestrator |

### Quality Assurance

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@code-skeptic` | Adversarial review | minimax-m2.5 | — | the-fixer, performance-engineer, orchestrator |
| `@the-fixer` | Fixes issues | minimax-m2.5 | — | code-skeptic, orchestrator |
| `@performance-engineer` | Performance review | nemotron-3-super | — | the-fixer, security-auditor, orchestrator |
| `@security-auditor` | Security audit | nemotron-3-super | — | the-fixer, release-manager, orchestrator |
| `@visual-tester` | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | — | the-fixer, orchestrator |
| `@browser-automation` | E2E testing | qwen3-coder:480b | — | orchestrator |

### DevOps & Infrastructure

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@devops-engineer` | Docker/K8s/CI-CD | nemotron-3-super | — | code-skeptic, security-auditor, orchestrator |
| `@release-manager` | Git operations, releases | glm-5.1 | — | evaluator |

### Meta & Process

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@evaluator` | Scores effectiveness | glm-5.1 | thinking | prompt-optimizer, product-owner, orchestrator |
| `@pipeline-judge` | Objective fitness scoring | glm-5.1 | — | prompt-optimizer |
| `@prompt-optimizer` | Improves prompts | glm-5.1 | instant | *(edits files)* |
| `@product-owner` | Manages issues/tracking | glm-5.1 | — | *(read-only)* |

### Analysis & Design

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@capability-analyst` | Analyzes task coverage | glm-5.1 | — | agent-architect, orchestrator |
| `@agent-architect` | Creates new agents | glm-5.1 | thinking | capability-analyst, requirement-refiner, system-analyst |
| `@workflow-architect` | Creates workflows | glm-5.1 | thinking | *(edits files)* |
| `@markdown-validator` | Validates Markdown | nemotron-3-nano:30b | — | orchestrator |

### Cognitive Enhancement

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@planner` | Task decomposition | nemotron-3-super | — | *(read-only)* |
| `@reflector` | Self-reflection | nemotron-3-super | — | *(read-only)* |
| `@memory-manager` | Memory systems | nemotron-3-super | — | *(read-only)* |

## Workflow State Machine

```
[new]
    ↓ @requirement-refiner
[planned]
    ↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
    ↓ @history-miner
[researching]
    ↓ @system-analyst
[designed]
    ↓ @sdet-engineer (writes failing tests)
[testing]
    ↓ @lead-developer (makes tests pass)
[implementing]
    ↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
    ↓ @review-watcher → (auto-validate) → create fix tasks
    ↓ [pass]
[perf-check]
    ↓ @performance-engineer
[security-check]
    ↓ @security-auditor
[releasing]
    ↓ @release-manager
[evaluated]
    ↓ @evaluator (subjective score 1-10)
    ├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
    └── [score < 7] → @prompt-optimizer → [evaluated]
    ↓
[@pipeline-judge] ← runs tests, measures tokens/time
    ↓
fitness score
    ↓
┌──────────────────────────────────────┐
│ fitness >= 0.85                      │──→ [completed]
│ fitness 0.70-0.84                    │──→ @prompt-optimizer → [evolving]
│ fitness 0.50-0.69                    │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50                       │──→ @agent-architect → redesign
└──────────────────────────────────────┘
    ↓
[evolving] → re-run workflow → [@pipeline-judge]
    ↓
compare fitness_before vs fitness_after
    ↓
[improved?] → commit prompts → [completed]
    └─ [not improved?] → revert → try different strategy
```

## Capability Analysis Flow

When starting a complex task:

```
[User Request]
    ↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
    ↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
    ↓
[Recommendations] → Create new or enhance existing?
    ↓
[Decision]
    ├── [Create New] → [@agent-architect] → Create component → Review
    └── [Enhance] → [@lead-developer] → Modify existing
    ↓
[Integration] ← Verify new component works with system
    ↓
[Complete] ← Task can now be handled
```

## Gitea Integration

### Status Labels

The pipeline uses Gitea labels to track progress:

- `status: new` → `status: planned` → `status: researching` → ...
- Agents add and remove labels automatically

### Performance Logging

Each agent logs its result as a Gitea issue comment:

```markdown
## ✅ lead-developer completed

**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts

### Notes
- Clean implementation
- Follows existing patterns
- Tests passing
```

### Efficiency Tracking

Scores are saved to `.kilo/logs/efficiency_score.json`:

```json
{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}
```

### Fitness Tracking

Fitness scores are saved to `.kilo/logs/fitness-history.jsonl`, one JSON object per line:

```jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
```

## Manual Agent Invocation

Use the Task tool to invoke a subagent:

```
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"
```

Or via `@mention`:

```
@lead-developer implement authentication flow
```

## Environment Variables

Required for Gitea integration:

```bash
GITEA_API_URL=https://git.softuniq.eu/api/v1
GITEA_TOKEN=your-token-here
```

## Self-Improvement Cycle

1. **Pipeline runs** for each issue
2. **Evaluator scores** each agent (1-10, subjective)
3. **Pipeline Judge measures** fitness objectively (0.0-1.0)
4. **Low fitness (<0.70)** triggers prompt-optimizer
5. **Prompt optimizer** analyzes failures and improves prompts
6. **Re-run workflow** with improved prompts
7. **Compare fitness** before and after, and commit if improved
8. **Log results** to `.kilo/logs/fitness-history.jsonl`

### Evaluator vs Pipeline Judge

| Aspect | Evaluator | Pipeline Judge |
|--------|-----------|----------------|
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |

### Fitness Score Components

```
fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)

where:
  test_pass_rate     = passed_tests / total_tests
  quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
  efficiency_score   = 1.0 - clamp(normalized_cost, 0, 1)
```

## Architecture Files

| File | Purpose |
|------|---------|
| `AGENTS.md` | This file (main config) |
| `.kilo/agents/*.md` | Agent definitions with prompts |
| `.kilo/commands/*.md` | Workflow commands |
| `.kilo/rules/*.md` | Custom rules loaded globally |
| `.kilo/skills/` | Skill modules |
| `src/kilocode/` | TypeScript API for programmatic use |

## Using the TypeScript API

```typescript
// createPipelineRunner is called below, so it is imported here too
// (assumed to be exported from the same module as the other names).
import { createPipelineRunner, PipelineRunner, GiteaClient, decideRouting } from './src/kilocode/index.js'

// Build a runner with the Gitea token from the environment, then process issue #42.
const runner = await createPipelineRunner({ giteaToken: process.env.GITEA_TOKEN })
await runner.run({ issueNumber: 42 })
```

## Agent Evolution Dashboard

Track agent model changes, performance, and recommendations in real time.

### Access

```bash
# Sync agent data
bun run sync:evolution

# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001
```

### Dashboard Tabs

| Tab | Description |
|-----|-------------|
| **Overview** | Stats, recent changes, pending recommendations |
| **All Agents** | Filterable agent cards with history |
| **Timeline** | Full evolution history |
| **Recommendations** | Priority-based model suggestions |
| **Model Matrix** | Agent × Model mapping with fit scores |

### Data Sources

| Source | What it tracks |
|--------|----------------|
| `.kilo/agents/*.md` | Model, description, capabilities |
| `.kilo/kilo.jsonc` | Model assignments |
| `.kilo/capability-index.yaml` | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |

### Evolution Data Structure

```json
{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}
```

### Recommendations Priority

| Priority | When | Example |
|----------|------|---------|
| **Critical** | Fit score < 70 | Immediate model change required |
| **High** | Model unavailable | Switch to fallback |
| **Medium** | Better model available | Consider upgrade |
| **Low** | Optimization possible | Optional improvement |

## Code Style

- Use TypeScript for new files
- Follow existing patterns
- Write tests before code (TDD)
- Keep functions under 50 lines
- Use early returns
- No comments unless explicitly requested
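
## Sketch: Fitness Score Calculation

The weighted formula under "Fitness Score Components" and the fitness thresholds in the workflow state machine can be illustrated with a short TypeScript sketch. This is not the `@pipeline-judge` implementation. The `FitnessInputs` shape, the function names, and the `NextStep` labels are hypothetical, and the sketch makes two assumptions: `normalizedCost` arrives already normalized, and a fitness below 0.50 routes to redesign rather than to a major prompt rewrite.

```typescript
// Hypothetical input shape; field names are illustrative, not the project's API.
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number;
  totalGates: number; // e.g. 5 gates: build, lint, types, tests, coverage
  normalizedCost: number; // assumed pre-normalized; clamped to [0, 1] below
}

// Hypothetical labels for the four threshold bands in the state machine.
type NextStep =
  | "completed"
  | "optimize-prompts"
  | "optimize-prompts-major"
  | "redesign-agent";

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Assumption: an empty denominator counts as a rate of 0 instead of dividing by zero.
const rate = (passed: number, total: number): number =>
  total > 0 ? passed / total : 0;

function computeFitness(i: FitnessInputs): number {
  const testPassRate = rate(i.passedTests, i.totalTests);
  const qualityGatesRate = rate(i.passedGates, i.totalGates);
  const efficiencyScore = 1.0 - clamp(i.normalizedCost, 0, 1);
  // Weights 0.50 / 0.25 / 0.25 come from the documented formula.
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}

function nextStep(fitness: number): NextStep {
  if (fitness >= 0.85) return "completed";
  if (fitness >= 0.7) return "optimize-prompts";
  if (fitness >= 0.5) return "optimize-prompts-major";
  return "redesign-agent";
}

// Made-up inputs for illustration only.
const example = computeFitness({
  passedTests: 45,
  totalTests: 47,
  passedGates: 4,
  totalGates: 5,
  normalizedCost: 0.4,
});
console.log(example.toFixed(3), nextStep(example));
```

Keeping the score calculation separate from the routing decision lets each be tested on its own, and the early returns in `nextStep` check the bands from highest to lowest so that no range can match twice.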
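
## Sketch: Recommendation Priority

The "Recommendations Priority" table can be read as an ordered set of rules. The sketch below is one possible reading, not the dashboard's actual logic. The `AgentSignals` shape and the `recommendationPriority` function are hypothetical, and checking the rows from top to bottom (Critical first) is an assumption.

```typescript
type Priority = "critical" | "high" | "medium" | "low";

// Hypothetical per-agent signals; only fitScore mirrors a documented field (`fit_score`).
interface AgentSignals {
  fitScore: number; // 0-100 scale, as in the evolution data example
  modelAvailable: boolean;
  betterModelAvailable: boolean;
  optimizationPossible: boolean;
}

// Returns the first matching priority, or null when no recommendation applies.
function recommendationPriority(s: AgentSignals): Priority | null {
  if (s.fitScore < 70) return "critical"; // immediate model change required
  if (!s.modelAvailable) return "high"; // switch to fallback
  if (s.betterModelAvailable) return "medium"; // consider upgrade
  if (s.optimizationPossible) return "low"; // optional improvement
  return null;
}

// Made-up signals for illustration only.
const leadDeveloper: AgentSignals = {
  fitScore: 92,
  modelAvailable: true,
  betterModelAvailable: true,
  optimizationPossible: false,
};
console.log(recommendationPriority(leadDeveloper));
```

Returning `null` for the no-match case keeps "no recommendation" distinct from the lowest priority.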