# Kilo Code Agents Reference

This file configures AI agent behavior for the project - a self-improving code pipeline with Gitea logging.

## Pipeline Workflow

The main workflow is `/pipeline` - use it to process issues through all agents automatically.

```
User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging
```

## Commands (Slash Commands)

| Command | Description | Usage |
|---------|-------------|-------|
| `/pipeline` | Run full agent pipeline for issue | `/pipeline 42` |
| `/nextjs` | Next.js 14+ full-stack app pipeline | `/nextjs my-app` |
| `/vue` | Vue/Nuxt 3 full-stack app pipeline | `/vue my-app` |
| `/laravel` | Laravel full-stack app pipeline | `/laravel my-app` |
| `/wordpress` | WordPress plugin/site pipeline | `/wordpress my-plugin` |
| `/feature` | Feature development pipeline | `/feature` |
| `/commerce` | E-commerce site pipeline | `/commerce` |
| `/status` | Check pipeline status for issue | `/status 42` |
| `/evolve` | Run evolution cycle with fitness scoring | `/evolve --issue 42` |
| `/evaluate` | Generate performance report | `/evaluate 42` |
| `/plan` | Creates detailed task plans | `/plan feature X` |
| `/ask` | Answers codebase questions | `/ask how does auth work` |
| `/debug` | Analyzes and fixes bugs | `/debug error in login` |
| `/code` | Quick code generation | `/code add validation` |
| `/research [topic]` | Run research and self-improvement | `/research multi-agent` |
| `/evolution log` | Log agent model change | `/evolution log planner "reason"` |
| `/evolution report` | Generate evolution report | `/evolution report` |
| `/index-project` | Index codebase into .architect/ for agent orientation | `/index-project` |
| `/web-test` | Visual regression testing in Docker | `/web-test https://bbox.wtf` |
| `/e2e-test` | E2E browser automation tests | `/e2e-test https://my-app.com` |

## Pipeline Agents (Subagents)

These agents are invoked automatically by `/pipeline` or manually via `@mention`:

### Core Development

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@requirement-refiner` | Converts ideas to User Stories | glm-5.1 | thinking | history-miner, system-analyst |
| `@history-miner` | Finds duplicates in git | nemotron-3-super | — | *(read-only)* |
| `@system-analyst` | Designs specifications | glm-5.1 | thinking | sdet-engineer, orchestrator |
| `@sdet-engineer` | Writes tests (TDD) | qwen3-coder:480b | thinking | lead-developer, orchestrator |
| `@lead-developer` | Implements code | qwen3-coder:480b | thinking | code-skeptic, orchestrator |
| `@frontend-developer` | UI (Next.js, Vue/Nuxt, React) | qwen3-coder:480b | — | code-skeptic, visual-tester, orchestrator |
| `@backend-developer` | Node.js/Express/APIs | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@php-developer` | PHP/Laravel/Symfony/WordPress | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| `@python-developer` | Python/Django/FastAPI | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| `@go-developer` | Go backend services | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@flutter-developer` | Flutter mobile apps | qwen3-coder:480b | — | code-skeptic, orchestrator |

### Quality Assurance

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@code-skeptic` | Adversarial review | minimax-m2.5 | — | the-fixer, performance-engineer, orchestrator |
| `@the-fixer` | Fixes issues | minimax-m2.5 | — | code-skeptic, orchestrator |
| `@performance-engineer` | Performance review | nemotron-3-super | — | the-fixer, security-auditor, orchestrator |
| `@security-auditor` | Security audit | nemotron-3-super | — | the-fixer, release-manager, orchestrator |
| `@visual-tester` | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | — | the-fixer, orchestrator |
| `@browser-automation` | E2E testing | qwen3-coder:480b | — | orchestrator |

### DevOps & Infrastructure

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@devops-engineer` | Docker/K8s/CI-CD | nemotron-3-super | — | code-skeptic, security-auditor, orchestrator |
| `@release-manager` | Git operations, releases | glm-5.1 | — | evaluator |

### Meta & Process

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@evaluator` | Scores effectiveness | glm-5.1 | thinking | prompt-optimizer, product-owner, orchestrator |
| `@pipeline-judge` | Objective fitness scoring | glm-5.1 | — | prompt-optimizer |
| `@prompt-optimizer` | Improves prompts | glm-5.1 | instant | *(edits files)* |
| `@product-owner` | Manages issues/tracking | glm-5.1 | — | *(read-only)* |

### Analysis & Design

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@capability-analyst` | Analyzes task coverage | glm-5.1 | — | agent-architect, orchestrator |
| `@agent-architect` | Creates new agents | glm-5.1 | thinking | capability-analyst, requirement-refiner, system-analyst |
| `@workflow-architect` | Creates workflows | glm-5.1 | thinking | *(edits files)* |
| `@markdown-validator` | Validates Markdown | nemotron-3-nano:30b | — | orchestrator |
| `@architect-indexer` | Maps project codebase into .architect/ | glm-5.1 | thinking | system-analyst, orchestrator |

### Cognitive Enhancement

| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@planner` | Task decomposition | nemotron-3-super | — | *(read-only)* |
| `@reflector` | Self-reflection | nemotron-3-super | — | *(read-only)* |
| `@memory-manager` | Memory systems | nemotron-3-super | — | *(read-only)* |

## Workflow State Machine

```
[new]
    ↓ @requirement-refiner
[planned]
    ↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
    ↓ @history-miner
[researching]
    ↓ @system-analyst
[designed]
    ↓ @sdet-engineer (writes failing tests)
[testing]
    ↓ @lead-developer (makes tests pass)
[implementing]
    ↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
    ↓ @review-watcher → (auto-validate) → create fix tasks
    ↓ [pass]
[perf-check]
    ↓ @performance-engineer
[security-check]
    ↓ @security-auditor
[releasing]
    ↓ @release-manager
[evaluated]
    ↓ @evaluator (subjective score 1-10)
    ├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
    └── [score < 7] → @prompt-optimizer → [@evaluated]
    ↓
[@pipeline-judge] ← runs tests, measures tokens/time
    ↓
fitness score
    ↓
┌──────────────────────────────────────┐
│ fitness >= 0.85                      │──→ [completed]
│ fitness 0.70-0.84                    │──→ @prompt-optimizer → [evolving]
│ fitness < 0.70                       │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50                       │──→ @agent-architect → redesign
└──────────────────────────────────────┘
    ↓
[evolving] → re-run workflow → [@pipeline-judge]
    ↓
compare fitness_before vs fitness_after
    ↓
[improved?] → commit prompts → [completed]
    └─ [not improved?] → revert → try different strategy
```

## Capability Analysis Flow

When starting a complex task:

```
[User Request]
    ↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
    ↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
    ↓
[Recommendations] → Create new or enhance existing?
    ↓
[Decision]
    ├── [Create New] → [@agent-architect] → Create component → Review
    └── [Enhance] → [@lead-developer] → Modify existing
    ↓
[Integration] ← Verify new component works with system
    ↓
[Complete] ← Task can now be handled
```

## Gitea Integration

### Status Labels

Pipeline uses Gitea labels to track progress:

- `status: new` → `status: planned` → `status: researching` → ...
- Agents add/remove labels automatically

### Performance Logging

Each agent logs to Gitea issue comments:

```markdown
## ✅ lead-developer completed

**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts

### Notes
- Clean implementation
- Follows existing patterns
- Tests passing
```

### Efficiency Tracking

Scores saved to `.kilo/logs/efficiency_score.json`:

```json
{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}
```

### Fitness Tracking

Fitness scores saved to `.kilo/logs/fitness-history.jsonl`:

```jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
```

## Manual Agent Invocation

```
// Use Task tool to invoke subagent
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"
```

Or via `@mention`:

```
@lead-developer implement authentication flow
```

## Environment Variables

Gitea integration uses centralized authentication (see `.kilo/shared/gitea-auth.md` and `.kilo/gitea.jsonc`):

| Variable | Required | Description |
|----------|----------|-------------|
| `GITEA_API_URL` | No | API base URL (default: `https://git.softuniq.eu/api/v1`) |
| `GITEA_TOKEN` | Preferred | Pre-existing API token |
| `GITEA_USER` | Fallback | Username for Basic Auth token creation |
| `GITEA_PASS` | Fallback | Password for Basic Auth token creation |
| `GITEA_TARGET_REPO` | No | Override target project (auto-detected otherwise) |

Auth resolution: `GITEA_TOKEN` → `GITEA_USER+GITEA_PASS` → `ValueError`. **NEVER hardcode credentials.**

## Self-Improvement Cycle

1. **Pipeline runs** for each issue
2. **Evaluator scores** each agent (1-10) - subjective
3. **Pipeline Judge measures** fitness objectively (0.0-1.0)
4. **Low fitness (<0.70)** triggers prompt-optimizer
5. **Prompt optimizer** analyzes failures and improves prompts
6. **Re-run workflow** with improved prompts
7. **Compare fitness** before/after - commit if improved
8. **Log results** to `.kilo/logs/fitness-history.jsonl`

### Evaluator vs Pipeline Judge

| Aspect | Evaluator | Pipeline Judge |
|--------|-----------|----------------|
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |

### Fitness Score Components

```
fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)

where:
  test_pass_rate     = passed_tests / total_tests
  quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
  efficiency_score   = 1.0 - clamp(normalized_cost, 0, 1)
```

## Architecture Files

| File | Purpose |
|------|---------|
| `AGENTS.md` | This file - main config |
| `.kilo/agents/*.md` | Agent definitions with prompts |
| `.kilo/commands/*.md` | Workflow commands |
| `.kilo/rules/*.md` | Custom rules loaded globally |
| `.kilo/skills/` | Skill modules |
| `.kilo/shared/gitea-auth.md` | Centralized Gitea auth (env vars, no hardcoded creds) |
| `.kilo/gitea.jsonc` | Gitea auth structure (env var mapping) |
| `.kilo/shared/gitea-api.md` | Centralized Gitea API client |
| `.kilo/shared/gitea-commenting.md` | Comment format for Gitea |
| `.kilo/shared/self-evolution.md` | Self-evolution protocol |
| `.kilo/rules/architect-first-contact.md` | First-contact project indexing rules |
| `.kilo/skills/project-mapping/SKILL.md` | Project mapping skill (`.architect/` system) |
| `.architect/` | Project codebase map (auto-indexed, see below) |
| `src/kilocode/` | TypeScript API for programmatic use |

## `.architect/` Project Mapping

The `.architect/` directory is the **project brain**: a structured, auto-indexed map of the codebase that all agents read before starting work.

### When Is It Used

1. **Orchestrator first contact**: Before routing any task, the orchestrator checks `.architect/state.json`
2. **Stale or missing**: Triggers `architect-indexer` to build/update
3. **Fresh**: Agents read relevant sections for context
4. **After changes**: `lead-developer`/`the-fixer` mark affected sections as stale

### Structure

```
.architect/
├── README.md               # Navigation index (auto-updated)
├── project.json            # Machine-readable project metadata
├── state.json              # Index freshness state (hashes, timestamps)
├── architecture/
│   ├── overview.md         # Architecture pattern, layers, boundaries
│   └── dependency-graph.md # Module dependency graph
├── entities/
│   └── entities.md         # Domain entities, fields, relationships
├── db-schema/
│   └── schema.md           # Tables, columns, indexes, foreign keys
├── api-surface/
│   └── endpoints.md        # API endpoints, methods, auth, controllers
├── conventions/
│   └── conventions.md      # Naming, patterns, forbidden practices
├── maps/
│   ├── file-graph.json     # Programmatic file→imports/exports graph
│   └── module-graph.json   # Programmatic module→dependencies graph
└── tech-stack/
    └── stack.md            # Languages, frameworks, databases, tools
```

### Context Injection Per Agent

| Agent | `.architect/` Sections |
|-------|----------------------|
| system-analyst | architecture/overview, entities, db-schema, api-surface |
| sdet-engineer | api-surface, entities, conventions |
| lead-developer | conventions, entities, architecture/overview |
| code-skeptic | conventions, architecture/dependency-graph |
| the-fixer | conventions, relevant file section |
| php-developer | conventions, entities, db-schema, api-surface |
| python-developer | conventions, entities, db-schema, api-surface |
| go-developer | conventions, entities, db-schema, api-surface |
| frontend-developer | conventions, api-surface, architecture/overview |
| backend-developer | conventions, entities, db-schema, api-surface |

### Staleness Triggers

| Event | Sections Marked Stale |
|-------|----------------------|
| New/removed file | file_graph, module_graph |
| New dependency | tech_stack (full reindex) |
| New migration | db_schema |
| New model/entity | entities |
| New API endpoint | api_surface |
| Convention change | conventions |
| Structural refactor | architecture_overview, dependency_graph |

## Using the TypeScript API

```typescript
// Assumes createPipelineRunner is exported from the package index, as the call below requires
import { createPipelineRunner } from './src/kilocode/index.js'

const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})

await runner.run({ issueNumber: 42 })
```

## Agent Evolution Dashboard

Track agent model changes, performance, and recommendations in real-time.

### Access

```bash
# Sync agent data
bun run sync:evolution

# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001
```

### Dashboard Tabs

| Tab | Description |
|-----|-------------|
| **Overview** | Stats, recent changes, pending recommendations |
| **All Agents** | Filterable agent cards with history |
| **Timeline** | Full evolution history |
| **Recommendations** | Priority-based model suggestions |
| **Model Matrix** | Agent × Model mapping with fit scores |

### Data Sources

| Source | What it tracks |
|--------|----------------|
| `.kilo/agents/*.md` | Model, description, capabilities |
| `.kilo/kilo.jsonc` | Model assignments |
| `.kilo/capability-index.yaml` | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |

### Evolution Data Structure

```json
{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}
```

### Recommendations Priority

| Priority | When | Example |
|----------|------|---------|
| **Critical** | Fit score < 70 | Immediate model change required |
| **High** | Model unavailable | Switch to fallback |
| **Medium** | Better model available | Consider upgrade |
| **Low** | Optimization possible | Optional improvement |

## Agent Execution Monitoring

Every agent invocation is logged to `.kilo/logs/agent-executions.jsonl` for project-level monitoring.

### Log Format

```jsonl
{"ts":"2026-04-18T14:00:00Z","agent":"php-developer","issue":42,"project":"UniqueSoft/my-shop","task":"Create Product model","subtask_type":"model_creation","duration_ms":45000,"tokens_used":8500,"status":"success","files":["app/Models/Product.php"],"score":8,"next_agent":"code-skeptic"}
```

### Monitoring Commands

```bash
# Agent stats report
bun run scripts/agent-stats.ts

# Stats for last 7 days
bun run scripts/agent-stats.ts --last 7

# Stats for specific project
bun run scripts/agent-stats.ts --project UniqueSoft/my-shop
```

### Required Logging Fields

| Field | Description |
|-------|-------------|
| `agent` | Agent name |
| `issue` | Gitea issue number |
| `project` | Target project repo (NOT hardcoded APAW) |
| `task` | Atomic task description |
| `duration_ms` | Execution time |
| `tokens_used` | Token estimate |
| `status` | success/fail/pass/blocked |

## Critical Rules

### Target Project (NOT APAW)

**Issues MUST be created in the target project repository, NOT in APAW.** APAW is the agent framework, not the default project.

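Where the target repository has to be resolved in code rather than in a shell script, the rule might be sketched as follows. `parseOwnerRepo` and `resolveTargetRepo` are hypothetical helper names, not part of the documented API. The `GITEA_TARGET_REPO` override comes from the Environment Variables table, and the remote URL shapes handled here (SSH and HTTPS) are an assumption.

```typescript
// Hypothetical helpers for illustration; not part of src/kilocode.

// Extract "owner/repo" from a git remote URL (SSH or HTTPS form assumed).
function parseOwnerRepo(remoteUrl: string): string | null {
  // Drop trailing slashes and an optional ".git" suffix first,
  // then capture the last two path segments.
  const cleaned = remoteUrl.trim().replace(/\/+$/, "").replace(/\.git$/, "");
  const match = cleaned.match(/[:/]([^/:]+\/[^/]+)$/);
  return match ? match[1] : null;
}

// Prefer the explicit override, otherwise fall back to the remote URL.
function resolveTargetRepo(
  env: Record<string, string | undefined>,
  remoteUrl: string,
): string {
  const repo = env.GITEA_TARGET_REPO || parseOwnerRepo(remoteUrl);
  if (!repo) {
    throw new Error("Cannot determine target repository");
  }
  return repo;
}

console.log(parseOwnerRepo("git@git.softuniq.eu:UniqueSoft/my-shop.git"));
```

For shell scripts, the detection from the git remote is a one-liner:
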
```bash
# Auto-detect owner/repo from the git remote (SSH or HTTPS).
# Strip trailing slashes and ".git" first: sed has no non-greedy "+?",
# so matching the suffix inside the capture would leave ".git" in the result.
TARGET_REPO=$(git remote get-url origin | sed -E 's|/*$||; s|\.git$||; s|.*[:/]([^/]+/[^/]+)$|\1|')
```

### Atomic Tasks (1 action = 1 task)

Every agent invocation solves exactly ONE atomic task:

- ❌ "Implement the entire e-commerce backend"
- ✅ "Create Product model with migration"
- ✅ "Add POST /api/products endpoint"

### Modular Code

- Maximum 100 lines per file
- Maximum 30 lines per function
- Features organized as independent modules
- Cross-module communication via events/interfaces only

### Token Budgets

| Task Size | Max Tokens | Example |
|----------|-----------|---------|
| Tiny | 2,000 | Fix typo, add config |
| Small | 5,000 | Create model + migration |
| Medium | 10,000 | Create API endpoint + test |
| Large | 20,000 | Create service with 3 methods |

## Code Style

- Use TypeScript for new files
- Follow existing patterns
- Write tests before code (TDD)
- Keep functions under 50 lines
- Use early returns
- No comments unless explicitly requested
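
To make the formula from Fitness Score Components and the routing bands from the Workflow State Machine concrete, here is a minimal TypeScript sketch. `FitnessInputs`, `computeFitness`, and `routeByFitness` are hypothetical names, not the `src/kilocode/` API. This file does not say how `normalized_cost` is derived, so it is taken as an input. Treating an empty denominator as a rate of 0 and checking the lowest band first (to resolve the overlapping `< 0.70` and `< 0.50` rows) are both assumptions.

```typescript
// Hypothetical sketch; names are illustrative, not the src/kilocode API.
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number; // build, lint, types, tests, coverage
  totalGates: number;
  normalizedCost: number; // assumed precomputed; clamped to [0, 1] below
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Assumption: an empty denominator counts as a rate of 0 instead of dividing by zero.
const rate = (passed: number, total: number): number =>
  total > 0 ? passed / total : 0;

// fitness = test_pass_rate * 0.50 + quality_gates_rate * 0.25 + efficiency_score * 0.25
function computeFitness(i: FitnessInputs): number {
  const testPassRate = rate(i.passedTests, i.totalTests);
  const qualityGatesRate = rate(i.passedGates, i.totalGates);
  const efficiencyScore = 1.0 - clamp(i.normalizedCost, 0, 1);
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}

type Route = "completed" | "optimize" | "optimize-major" | "redesign";

// Bands follow the state machine; the lowest band is checked first (assumption).
function routeByFitness(fitness: number): Route {
  if (fitness < 0.5) return "redesign"; // @agent-architect
  if (fitness < 0.7) return "optimize-major"; // @prompt-optimizer (major)
  if (fitness < 0.85) return "optimize"; // @prompt-optimizer
  return "completed";
}

const example = computeFitness({
  passedTests: 45,
  totalTests: 47,
  passedGates: 4,
  totalGates: 5,
  normalizedCost: 0.4,
});
console.log(example.toFixed(2), routeByFitness(example));
```

Taking `normalizedCost` as an input keeps the sketch independent of how tokens and time are combined, which is left to `@pipeline-judge`.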