# Kilo Code Agents Reference

This file configures AI agent behavior for the APAW project, a self-improving code pipeline with Gitea logging.
## Pipeline Workflow

The main workflow is `/pipeline`; use it to process issues through all agents automatically.

```text
User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging
```
## Commands (Slash Commands)

| Command | Description | Usage |
|---|---|---|
| `/pipeline <issue>` | Run full agent pipeline for an issue | `/pipeline 42` |
| `/status <issue>` | Check pipeline status for an issue | `/status 42` |
| `/evolve` | Run evolution cycle with fitness scoring | `/evolve --issue 42` |
| `/evaluate <issue>` | Generate a performance report | `/evaluate 42` |
| `/plan` | Creates detailed task plans | `/plan feature X` |
| `/ask` | Answers codebase questions | `/ask how does auth work` |
| `/debug` | Analyzes and fixes bugs | `/debug error in login` |
| `/code` | Quick code generation | `/code add validation` |
| `/research [topic]` | Run research and self-improvement | `/research multi-agent` |
| `/evolution log` | Log an agent model change | `/evolution log planner "reason"` |
| `/evolution report` | Generate an evolution report | `/evolution report` |
## Pipeline Agents (Subagents)

These agents are invoked automatically by `/pipeline` or manually via `@mention`:
### Core Development

| Agent | Role | When Invoked |
|---|---|---|
| `@requirement-refiner` | Converts ideas to User Stories | Issue status: `new` |
| `@history-miner` | Finds duplicates in git history | Status: `planned` |
| `@system-analyst` | Designs specifications | Status: `researching` |
| `@sdet-engineer` | Writes tests (TDD) | Status: `designed` |
| `@lead-developer` | Implements code | Status: `testing` (tests fail) |
| `@frontend-developer` | UI implementation | When UI work is needed |
| `@backend-developer` | Node.js/Express/APIs | When backend work is needed |
| `@flutter-developer` | Flutter mobile apps | When mobile development is needed |
| `@go-developer` | Go backend services | When a Go backend is needed |
### Quality Assurance

| Agent | Role | When Invoked |
|---|---|---|
| `@code-skeptic` | Adversarial review | Status: `implementing` |
| `@the-fixer` | Fixes issues | When review fails |
| `@performance-engineer` | Performance review | After code-skeptic |
| `@security-auditor` | Security audit | After performance review |
| `@visual-tester` | Visual regression | When UI changes |
### Cognitive Enhancement (New)

| Agent | Role | When Invoked |
|---|---|---|
| `@planner` | Task decomposition (CoT/ToT) | Complex tasks |
| `@reflector` | Self-reflection (Reflexion) | After each agent |
| `@memory-manager` | Memory systems | Context management |
### Meta & Process

| Agent | Role | When Invoked |
|---|---|---|
| `@release-manager` | Git operations | Status: `releasing` |
| `@evaluator` | Scores effectiveness | Status: `evaluated` |
| `@pipeline-judge` | Objective fitness scoring | After workflow completes |
| `@prompt-optimizer` | Improves prompts | When fitness < 0.70 |
| `@capability-analyst` | Analyzes task coverage | When starting a new task |
| `@agent-architect` | Creates new agents | When gaps are identified |
| `@workflow-architect` | Creates workflows | When a new workflow is needed |
| `@markdown-validator` | Validates Markdown | Before issue creation |
## Workflow State Machine

```text
[new]
  ↓ @requirement-refiner
[planned]
  ↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
  ↓ @history-miner
[researching]
  ↓ @system-analyst
[designed]
  ↓ @sdet-engineer (writes failing tests)
[testing]
  ↓ @lead-developer (makes tests pass)
[implementing]
  ↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
  ↓ @review-watcher → (auto-validate) → create fix tasks
  ↓ [pass]
[perf-check]
  ↓ @performance-engineer
[security-check]
  ↓ @security-auditor
[releasing]
  ↓ @release-manager
[evaluated]
  ↓ @evaluator (subjective score 1-10)
  ├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
  └── [score < 7] → @prompt-optimizer → [evaluated]
  ↓
[@pipeline-judge] ← runs tests, measures tokens/time
  ↓
fitness score
  ↓
┌──────────────────────────────────────┐
│ fitness >= 0.85   │──→ [completed]
│ fitness 0.70-0.84 │──→ @prompt-optimizer → [evolving]
│ fitness < 0.70    │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50    │──→ @agent-architect → redesign
└──────────────────────────────────────┘
  ↓
[evolving] → re-run workflow → [@pipeline-judge]
  ↓
compare fitness_before vs fitness_after
  ↓
[improved?] → commit prompts → [completed]
  └─ [not improved?] → revert → try different strategy
```
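The fitness-threshold routing at the end of the state machine can be sketched as a small decision function. This is an illustrative sketch only; the type and function names are not part of the actual codebase.

```typescript
// Hypothetical sketch of the post-judge routing shown in the diagram.
type NextStep =
  | { state: 'completed' }
  | { state: 'evolving'; agent: 'prompt-optimizer'; mode: 'minor' | 'major' }
  | { state: 'redesign'; agent: 'agent-architect' }

function routeByFitness(fitness: number): NextStep {
  // Thresholds taken from the diagram: >= 0.85 done, 0.70-0.84 minor
  // optimization, 0.50-0.69 major optimization, < 0.50 full redesign.
  if (fitness >= 0.85) return { state: 'completed' }
  if (fitness >= 0.70) return { state: 'evolving', agent: 'prompt-optimizer', mode: 'minor' }
  if (fitness >= 0.50) return { state: 'evolving', agent: 'prompt-optimizer', mode: 'major' }
  return { state: 'redesign', agent: 'agent-architect' }
}
```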
## Capability Analysis Flow

When starting a complex task:

```text
[User Request]
  ↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
  ↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
  ↓
[Recommendations] → Create new or enhance existing?
  ↓
[Decision]
  ├── [Create New] → [@agent-architect] → Create component → Review
  └── [Enhance] → [@lead-developer] → Modify existing
  ↓
[Integration] ← Verify new component works with system
  ↓
[Complete] ← Task can now be handled
```
## Gitea Integration

### Status Labels

The pipeline uses Gitea labels to track progress:

`status: new` → `status: planned` → `status: researching` → ...

Agents add and remove these labels automatically.
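The label progression could be modeled as a simple lookup. This is a hypothetical helper, not part of the actual codebase, and the full label order is inferred from the workflow state machine (whether intermediate states such as `perf-check` exist as Gitea labels is an assumption).

```typescript
// Assumed label order, inferred from the workflow state machine.
const STATUS_ORDER: readonly string[] = [
  'status: new', 'status: planned', 'status: researching', 'status: designed',
  'status: testing', 'status: implementing', 'status: reviewing',
  'status: perf-check', 'status: security-check', 'status: releasing',
  'status: evaluated',
]

// Returns the next label in the pipeline, or undefined at the end
// (or for an unknown label).
function nextStatus(current: string): string | undefined {
  const idx = STATUS_ORDER.indexOf(current)
  if (idx < 0 || idx + 1 >= STATUS_ORDER.length) return undefined
  return STATUS_ORDER[idx + 1]
}
```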
### Performance Logging

Each agent logs to Gitea issue comments:

```markdown
## ✅ lead-developer completed
**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts

### Notes
- Clean implementation
- Follows existing patterns
- Tests passing
```
### Efficiency Tracking

Scores are saved to `.kilo/logs/efficiency_score.json`:

```json
{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}
```
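An entry like the one above can be summarized per issue, for example by averaging the per-agent scores. This is an illustrative sketch; the interface mirrors the JSON sample, but the type and function names are assumptions.

```typescript
// Shape mirrors the efficiency_score.json example above.
interface EfficiencyEntry {
  issue: number
  date: string
  agents: Record<string, number> // agent name -> score 1..10
  iterations: number
  duration_hours: number
}

// Hypothetical helper: mean of the per-agent scores for one entry.
function averageAgentScore(entry: EfficiencyEntry): number {
  const scores = Object.values(entry.agents)
  if (scores.length === 0) return 0
  return scores.reduce((a, b) => a + b, 0) / scores.length
}
```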
### Fitness Tracking

Fitness scores are saved to `.kilo/logs/fitness-history.jsonl`:

```jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
```
## Manual Agent Invocation

Use the Task tool to invoke a subagent:

```text
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"
```

Or via `@mention`:

```text
@lead-developer implement authentication flow
```
## Environment Variables

Required for Gitea integration:

```bash
GITEA_API_URL=https://git.softuniq.eu/api/v1
GITEA_TOKEN=your-token-here
```
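A small guard can validate these variables up front instead of failing mid-pipeline. This helper is hypothetical (not part of the actual API); only the two variable names come from the section above.

```typescript
// Hypothetical helper: fail fast if the Gitea variables are missing.
function requireGiteaEnv(
  env: Record<string, string | undefined>
): { apiUrl: string; token: string } {
  const apiUrl = env.GITEA_API_URL
  const token = env.GITEA_TOKEN
  if (!apiUrl || !token) {
    throw new Error('GITEA_API_URL and GITEA_TOKEN must be set for Gitea integration')
  }
  return { apiUrl, token }
}
```

In a Node/Bun script you would call it as `requireGiteaEnv(process.env)`.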
## Self-Improvement Cycle

1. Pipeline runs for each issue
2. Evaluator scores each agent (1-10, subjective)
3. Pipeline Judge measures fitness objectively (0.0-1.0)
4. Low fitness (< 0.70) triggers the prompt-optimizer
5. Prompt optimizer analyzes failures and improves prompts
6. Workflow re-runs with the improved prompts
7. Fitness before/after is compared; prompts are committed if improved
8. Results are logged to `.kilo/logs/fitness-history.jsonl`
## Evaluator vs Pipeline Judge
| Aspect | Evaluator | Pipeline Judge |
|---|---|---|
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |
## Fitness Score Components

```text
fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)

where:
  test_pass_rate     = passed_tests / total_tests
  quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
  efficiency_score   = 1.0 - clamp(normalized_cost, 0, 1)
```
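The formula translates directly to TypeScript. This is an illustrative sketch of the calculation, not the pipeline-judge's actual implementation; the function name and input shape are assumptions.

```typescript
// Illustrative implementation of the fitness formula above.
interface FitnessInputs {
  testsPassed: number
  testsTotal: number
  gatesPassed: number
  gatesTotal: number     // build, lint, types, tests, coverage => typically 5
  normalizedCost: number // 0 = free, 1 = at or above the cost budget
}

function computeFitness(i: FitnessInputs): number {
  const clamp = (x: number, lo: number, hi: number) =>
    Math.min(Math.max(x, lo), hi)
  const testPassRate = i.testsTotal > 0 ? i.testsPassed / i.testsTotal : 0
  const qualityGatesRate = i.gatesTotal > 0 ? i.gatesPassed / i.gatesTotal : 0
  const efficiencyScore = 1.0 - clamp(i.normalizedCost, 0, 1)
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25
}
```

With all tests and gates passing at zero cost the score is 1.0; the test weight (0.50) dominates, so failing tests pull fitness down fastest.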
## Architecture Files

| File | Purpose |
|---|---|
| `AGENTS.md` | This file, the main config |
| `.kilo/agents/*.md` | Agent definitions with prompts |
| `.kilo/commands/*.md` | Workflow commands |
| `.kilo/rules/*.md` | Custom rules loaded globally |
| `.kilo/skills/` | Skill modules |
| `src/kilocode/` | TypeScript API for programmatic use |
## Using the TypeScript API

```typescript
import { createPipelineRunner } from './src/kilocode/index.js'

const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})

await runner.run({ issueNumber: 42 })
```

The module also exports `PipelineRunner`, `GiteaClient`, and `decideRouting` for lower-level use.
## Agent Evolution Dashboard

Track agent model changes, performance, and recommendations in real time.

### Access

```bash
# Sync agent data
bun run sync:evolution

# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001
```
### Dashboard Tabs
| Tab | Description |
|---|---|
| Overview | Stats, recent changes, pending recommendations |
| All Agents | Filterable agent cards with history |
| Timeline | Full evolution history |
| Recommendations | Priority-based model suggestions |
| Model Matrix | Agent × Model mapping with fit scores |
### Data Sources

| Source | What it tracks |
|---|---|
| `.kilo/agents/*.md` | Model, description, capabilities |
| `.kilo/kilo.jsonc` | Model assignments |
| `.kilo/capability-index.yaml` | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |
### Evolution Data Structure

```json
{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}
```
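For programmatic consumers, the structure above maps to a few interfaces. This is a type sketch mirroring the JSON sample; the field names follow the sample, but the interface names are assumptions and the actual codebase may define them differently.

```typescript
// Type sketch of the evolution data structure shown above.
interface ModelChange { type: string; from: string; to: string }
interface PerformanceEntry { issue: number; score: number; success: boolean }

interface AgentEvolution {
  current: { model: string; fit_score: number }
  history: ModelChange[]
  performance_log: PerformanceEntry[]
}

interface EvolutionData {
  agents: Record<string, AgentEvolution> // keyed by agent name
}
```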
### Recommendations Priority
| Priority | When | Example |
|---|---|---|
| Critical | Fit score < 70 | Immediate model change required |
| High | Model unavailable | Switch to fallback |
| Medium | Better model available | Consider upgrade |
| Low | Optimization possible | Optional improvement |
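The priority rules in the table above can be read as a first-match cascade. A minimal sketch, assuming illustrative input names (the actual dashboard's data model is not shown here):

```typescript
// Hypothetical mapping of the priority table to a first-match cascade.
type Priority = 'critical' | 'high' | 'medium' | 'low'

interface AgentStatus {
  fitScore: number        // 0..100 fit score for the current model
  modelAvailable: boolean // is the assigned model still reachable?
  betterModelExists: boolean
}

function recommendationPriority(s: AgentStatus): Priority {
  if (s.fitScore < 70) return 'critical'   // immediate model change required
  if (!s.modelAvailable) return 'high'     // switch to fallback
  if (s.betterModelExists) return 'medium' // consider upgrade
  return 'low'                             // optional improvement
}
```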
## Code Style
- Use TypeScript for new files
- Follow existing patterns
- Write tests before code (TDD)
- Keep functions under 50 lines
- Use early returns
- No comments unless explicitly requested