The regex r'[:/]([^/]+/[^/]+?)(?:\.git)?$' fails on URLs with a trailing slash, such as 'https://git.softuniq.eu/UniqueSoft/APAW/', because the final '/' prevents the pattern from matching. The fix strips trailing slashes before matching: .rstrip('/') in Python and sed 's:/*$::' in Bash, applied to every get_target_repo() implementation across 11 files.
Kilo Code Agents Reference
This file configures AI agent behavior for the APAW project - a self-improving code pipeline with Gitea logging.
Pipeline Workflow
The main workflow is /pipeline - use it to process issues through all agents automatically.
User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging
Commands (Slash Commands)
| Command | Description | Usage |
|---|---|---|
| /pipeline <issue> | Run full agent pipeline for issue | /pipeline 42 |
| /nextjs | Next.js 14+ full-stack app pipeline | /nextjs my-app |
| /vue | Vue/Nuxt 3 full-stack app pipeline | /vue my-app |
| /laravel | Laravel full-stack app pipeline | /laravel my-app |
| /wordpress | WordPress plugin/site pipeline | /wordpress my-plugin |
| /feature | Feature development pipeline | /feature |
| /commerce | E-commerce site pipeline | /commerce |
| /status <issue> | Check pipeline status for issue | /status 42 |
| /evolve | Run evolution cycle with fitness scoring | /evolve --issue 42 |
| /evaluate <issue> | Generate performance report | /evaluate 42 |
| /plan | Creates detailed task plans | /plan feature X |
| /ask | Answers codebase questions | /ask how does auth work |
| /debug | Analyzes and fixes bugs | /debug error in login |
| /code | Quick code generation | /code add validation |
| /research [topic] | Run research and self-improvement | /research multi-agent |
| /evolution log | Log agent model change | /evolution log planner "reason" |
| /evolution report | Generate evolution report | /evolution report |
| /web-test <url> | Visual regression testing in Docker | /web-test https://bbox.wtf |
| /e2e-test <url> | E2E browser automation tests | /e2e-test https://my-app.com |
Pipeline Agents (Subagents)
These agents are invoked automatically by /pipeline or manually via @mention:
Core Development
| Agent | Role | Model | Variant | Can Call |
|---|---|---|---|---|
| @requirement-refiner | Converts ideas to User Stories | glm-5.1 | thinking | history-miner, system-analyst |
| @history-miner | Finds duplicates in git | nemotron-3-super | — | (read-only) |
| @system-analyst | Designs specifications | glm-5.1 | thinking | sdet-engineer, orchestrator |
| @sdet-engineer | Writes tests (TDD) | qwen3-coder:480b | thinking | lead-developer, orchestrator |
| @lead-developer | Implements code | qwen3-coder:480b | thinking | code-skeptic, orchestrator |
| @frontend-developer | UI (Next.js, Vue/Nuxt, React) | qwen3-coder:480b | — | code-skeptic, visual-tester, orchestrator |
| @backend-developer | Node.js/Express/APIs | qwen3-coder:480b | — | code-skeptic, orchestrator |
| @php-developer | PHP/Laravel/Symfony/WordPress | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| @python-developer | Python/Django/FastAPI | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| @go-developer | Go backend services | qwen3-coder:480b | — | code-skeptic, orchestrator |
| @flutter-developer | Flutter mobile apps | qwen3-coder:480b | — | code-skeptic, orchestrator |
Quality Assurance
| Agent | Role | Model | Variant | Can Call |
|---|---|---|---|---|
| @code-skeptic | Adversarial review | minimax-m2.5 | — | the-fixer, performance-engineer, orchestrator |
| @the-fixer | Fixes issues | minimax-m2.5 | — | code-skeptic, orchestrator |
| @performance-engineer | Performance review | nemotron-3-super | — | the-fixer, security-auditor, orchestrator |
| @security-auditor | Security audit | nemotron-3-super | — | the-fixer, release-manager, orchestrator |
| @visual-tester | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | — | the-fixer, orchestrator |
| @browser-automation | E2E testing | qwen3-coder:480b | — | orchestrator |
DevOps & Infrastructure
| Agent | Role | Model | Variant | Can Call |
|---|---|---|---|---|
| @devops-engineer | Docker/K8s/CI-CD | nemotron-3-super | — | code-skeptic, security-auditor, orchestrator |
| @release-manager | Git operations, releases | glm-5.1 | — | evaluator |
Meta & Process
| Agent | Role | Model | Variant | Can Call |
|---|---|---|---|---|
| @evaluator | Scores effectiveness | glm-5.1 | thinking | prompt-optimizer, product-owner, orchestrator |
| @pipeline-judge | Objective fitness scoring | glm-5.1 | — | prompt-optimizer |
| @prompt-optimizer | Improves prompts | glm-5.1 | instant | (edits files) |
| @product-owner | Manages issues/tracking | glm-5.1 | — | (read-only) |
Analysis & Design
| Agent | Role | Model | Variant | Can Call |
|---|---|---|---|---|
| @capability-analyst | Analyzes task coverage | glm-5.1 | — | agent-architect, orchestrator |
| @agent-architect | Creates new agents | glm-5.1 | thinking | capability-analyst, requirement-refiner, system-analyst |
| @workflow-architect | Creates workflows | glm-5.1 | thinking | (edits files) |
| @markdown-validator | Validates Markdown | nemotron-3-nano:30b | — | orchestrator |
Cognitive Enhancement
| Agent | Role | Model | Variant | Can Call |
|---|---|---|---|---|
| @planner | Task decomposition | nemotron-3-super | — | (read-only) |
| @reflector | Self-reflection | nemotron-3-super | — | (read-only) |
| @memory-manager | Memory systems | nemotron-3-super | — | (read-only) |
Workflow State Machine
[new]
↓ @requirement-refiner
[planned]
↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
↓ @history-miner
[researching]
↓ @system-analyst
[designed]
↓ @sdet-engineer (writes failing tests)
[testing]
↓ @lead-developer (makes tests pass)
[implementing]
↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
↓ @review-watcher → (auto-validate) → create fix tasks
↓ [pass]
[perf-check]
↓ @performance-engineer
[security-check]
↓ @security-auditor
[releasing]
↓ @release-manager
[evaluated]
↓ @evaluator (subjective score 1-10)
├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
└── [score < 7] → @prompt-optimizer → [evaluated]
↓
[@pipeline-judge] ← runs tests, measures tokens/time
↓
fitness score
↓
┌──────────────────────────────────────┐
│ fitness >= 0.85 │──→ [completed]
│ fitness 0.70-0.84 │──→ @prompt-optimizer → [evolving]
│ fitness 0.50-0.69 │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50 │──→ @agent-architect → redesign
└──────────────────────────────────────┘
↓
[evolving] → re-run workflow → [@pipeline-judge]
↓
compare fitness_before vs fitness_after
↓
[improved?] → commit prompts → [completed]
└─ [not improved?] → revert → try different strategy
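The fitness branch at the end of the diagram can be sketched as a small routing function. This is an illustration only: `routeByFitness`, `decideEvolution`, and the route names are hypothetical and not part of the documented API, and the sketch assumes the "< 0.50" rule takes precedence over the broader "< 0.70" rule.

```typescript
// Hypothetical helper names; the thresholds come from the workflow diagram.
type FitnessRoute = "completed" | "optimize" | "optimize-major" | "redesign";

// Map a pipeline-judge fitness score (0.0-1.0) to the next step.
// Assumption: the "< 0.50" rule wins over the broader "< 0.70" rule.
function routeByFitness(fitness: number): FitnessRoute {
  if (fitness >= 0.85) return "completed"; // [completed]
  if (fitness >= 0.7) return "optimize"; // @prompt-optimizer → [evolving]
  if (fitness >= 0.5) return "optimize-major"; // @prompt-optimizer (major)
  return "redesign"; // @agent-architect → redesign
}

// After an [evolving] re-run, keep the new prompts only if fitness improved.
function decideEvolution(before: number, after: number): "commit" | "revert" {
  return after > before ? "commit" : "revert";
}
```

For example, a run scored 0.82 would be routed to a regular prompt optimization pass, and the optimized prompts would be committed only if the re-run scores higher than 0.82.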
Capability Analysis Flow
When starting a complex task:
[User Request]
↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
↓
[Recommendations] → Create new or enhance existing?
↓
[Decision]
├── [Create New] → [@agent-architect] → Create component → Review
└── [Enhance] → [@lead-developer] → Modify existing
↓
[Integration] ← Verify new component works with system
↓
[Complete] ← Task can now be handled
Gitea Integration
Status Labels
Pipeline uses Gitea labels to track progress:
- status: new → status: planned → status: researching → ...
- Agents add/remove labels automatically
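A minimal sketch of the label swap, assuming each issue carries exactly one `status: …` label at a time. `setStatusLabel` is a hypothetical helper; how the labels are actually written back to Gitea is not shown here.

```typescript
// Prefix shared by all pipeline status labels (taken from the list above).
const STATUS_PREFIX = "status: ";

// Return a new label list with any existing status label replaced by `next`.
// Non-status labels (for example "bug") are left untouched.
function setStatusLabel(labels: string[], next: string): string[] {
  const others = labels.filter((label) => label.indexOf(STATUS_PREFIX) !== 0);
  return others.concat([STATUS_PREFIX + next]);
}
```

Calling `setStatusLabel(["bug", "status: new"], "planned")` keeps `bug` and replaces the status label with `status: planned`.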
Performance Logging
Each agent logs to Gitea issue comments:
## ✅ lead-developer completed
**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts
### Notes
- Clean implementation
- Follows existing patterns
- Tests passing
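The comment body above could be produced by a formatter along these lines. This is a sketch: the `AgentReport` shape and `formatAgentComment` are hypothetical names, and the layout simply mirrors the example comment.

```typescript
// Hypothetical report shape; fields mirror the example comment.
interface AgentReport {
  agent: string;
  score: number; // evaluator score, 1-10
  durationHours: number;
  files: string[];
  notes: string[];
}

// Build the Markdown body for a Gitea issue comment.
function formatAgentComment(report: AgentReport): string {
  const lines = [
    `## ✅ ${report.agent} completed`,
    `**Score**: ${report.score}/10`,
    `**Duration**: ${report.durationHours}h`,
    `**Files**: ${report.files.join(", ")}`,
    "### Notes",
  ];
  return lines.concat(report.notes.map((note) => `- ${note}`)).join("\n");
}
```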
Efficiency Tracking
Scores saved to .kilo/logs/efficiency_score.json:
{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}
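As an illustration of how this file could be consumed, the sketch below types the structure and averages each agent's score across the history. The interfaces are inferred from the example above, and `averageAgentScores` is a hypothetical helper.

```typescript
// Types inferred from the example file; the real schema may carry more fields.
interface EfficiencyEntry {
  issue: number;
  date: string;
  agents: Record<string, number>; // agent name → score (1-10)
  iterations: number;
  duration_hours: number;
}

interface EfficiencyLog {
  version: string;
  history: EfficiencyEntry[];
}

// Average every agent's score over all history entries it appears in.
function averageAgentScores(log: EfficiencyLog): Record<string, number> {
  const totals: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const entry of log.history) {
    for (const agent of Object.keys(entry.agents)) {
      totals[agent] = (totals[agent] ?? 0) + (entry.agents[agent] ?? 0);
      counts[agent] = (counts[agent] ?? 0) + 1;
    }
  }
  const averages: Record<string, number> = {};
  for (const agent of Object.keys(totals)) {
    averages[agent] = (totals[agent] ?? 0) / (counts[agent] ?? 1);
  }
  return averages;
}
```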
Fitness Tracking
Fitness scores saved to .kilo/logs/fitness-history.jsonl:
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
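Because the file is JSON Lines (one JSON object per line), it can be read without loading a JSON array. The sketch below assumes the field names shown in the two sample records; `parseFitnessHistory` and `latestFitness` are hypothetical helpers.

```typescript
// Record shape inferred from the sample lines above.
interface FitnessRecord {
  ts: string;
  issue: number;
  workflow: string;
  fitness: number;
  tokens: number;
  time_ms: number;
  tests_passed: number;
  tests_total: number;
}

// Parse JSONL text: one record per non-empty line.
function parseFitnessHistory(jsonl: string): FitnessRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as FitnessRecord);
}

// Most recent fitness for an issue, assuming records are appended in time order.
function latestFitness(records: FitnessRecord[], issue: number): number | undefined {
  return records.filter((record) => record.issue === issue).pop()?.fitness;
}
```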
Manual Agent Invocation
// Use Task tool to invoke subagent
Task tool with:
subagent_type: "lead-developer"
prompt: "Implement authentication for issue #42"
Or via @mention:
@lead-developer implement authentication flow
Environment Variables
Gitea integration uses centralized authentication (see .kilo/shared/gitea-auth.md and .kilo/gitea.jsonc):
| Variable | Required | Description |
|---|---|---|
| GITEA_API_URL | No | API base URL (default: https://git.softuniq.eu/api/v1) |
| GITEA_TOKEN | Preferred | Pre-existing API token |
| GITEA_USER | Fallback | Username for Basic Auth token creation |
| GITEA_PASS | Fallback | Password for Basic Auth token creation |
| GITEA_TARGET_REPO | No | Override target project (auto-detected otherwise) |
Auth resolution: GITEA_TOKEN → GITEA_USER+GITEA_PASS → ValueError. NEVER hardcode credentials.
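The resolution order can be sketched as follows. This is not the shared client itself: `resolveGiteaAuth`, `resolveApiUrl`, and the `GiteaAuth` type are hypothetical, the environment is passed in as a plain object, and a generic `Error` stands in for the `ValueError` named above. The Basic Auth branch only returns the credentials; the token creation request it would feed is not shown.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical result type: either a ready token or Basic Auth credentials.
type GiteaAuth =
  | { kind: "token"; token: string }
  | { kind: "basic"; user: string; pass: string };

// Default taken from the table above.
const DEFAULT_API_URL = "https://git.softuniq.eu/api/v1";

function resolveApiUrl(env: Env): string {
  return env["GITEA_API_URL"] || DEFAULT_API_URL;
}

// GITEA_TOKEN → GITEA_USER + GITEA_PASS → error. Nothing is hardcoded.
function resolveGiteaAuth(env: Env): GiteaAuth {
  const token = env["GITEA_TOKEN"];
  if (token) return { kind: "token", token };

  const user = env["GITEA_USER"];
  const pass = env["GITEA_PASS"];
  if (user && pass) return { kind: "basic", user, pass };

  throw new Error("Set GITEA_TOKEN, or both GITEA_USER and GITEA_PASS");
}
```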
Self-Improvement Cycle
- Pipeline runs for each issue
- Evaluator scores each agent (1-10) - subjective
- Pipeline Judge measures fitness objectively (0.0-1.0)
- Fitness below 0.85 triggers prompt-optimizer (a major rewrite below 0.70)
- Prompt optimizer analyzes failures and improves prompts
- Re-run workflow with improved prompts
- Compare fitness before/after - commit if improved
- Log results to .kilo/logs/fitness-history.jsonl
Evaluator vs Pipeline Judge
| Aspect | Evaluator | Pipeline Judge |
|---|---|---|
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |
Fitness Score Components
fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)
where:
test_pass_rate = passed_tests / total_tests
quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
efficiency_score = 1.0 - clamp(normalized_cost, 0, 1)
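The formula translates directly into code. In the sketch below `computeFitness` and `FitnessInputs` are hypothetical names, `normalizedCost` is taken as an input because its derivation from tokens and time is not specified here, and a zero denominator is treated as a rate of 0 (an assumption).

```typescript
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number; // build, lint, types, tests, coverage
  totalGates: number;
  normalizedCost: number; // how this is derived is not specified in this file
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Assumption: an empty denominator counts as a rate of 0, not an error.
function rate(passed: number, total: number): number {
  return total > 0 ? passed / total : 0;
}

// fitness = 0.50 * test_pass_rate + 0.25 * quality_gates_rate + 0.25 * efficiency_score
function computeFitness(inputs: FitnessInputs): number {
  const testPassRate = rate(inputs.passedTests, inputs.totalTests);
  const qualityGatesRate = rate(inputs.passedGates, inputs.totalGates);
  const efficiencyScore = 1.0 - clamp(inputs.normalizedCost, 0, 1);
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}
```

With all tests and gates passing and a normalized cost of 0, the three terms sum to 1.0; a normalized cost of 1 or more caps the score at 0.75.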
Architecture Files
| File | Purpose |
|---|---|
| AGENTS.md | This file - main config |
| .kilo/agents/*.md | Agent definitions with prompts |
| .kilo/commands/*.md | Workflow commands |
| .kilo/rules/*.md | Custom rules loaded globally |
| .kilo/skills/ | Skill modules |
| .kilo/shared/gitea-auth.md | Centralized Gitea auth (env vars, no hardcoded creds) |
| .kilo/gitea.jsonc | Gitea auth structure (env var mapping) |
| .kilo/shared/gitea-api.md | Centralized Gitea API client |
| .kilo/shared/gitea-commenting.md | Comment format for Gitea |
| .kilo/shared/self-evolution.md | Self-evolution protocol |
| src/kilocode/ | TypeScript API for programmatic use |
Using the TypeScript API
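import {
  createPipelineRunner,
  PipelineRunner,
  GiteaClient,
  decideRouting
} from './src/kilocode/index.js'

// Create a runner authenticated with the token from the environment
const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})

// Run the full pipeline for issue #42
await runner.run({ issueNumber: 42 })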
Agent Evolution Dashboard
Track agent model changes, performance, and recommendations in real-time.
Access
# Sync agent data
bun run sync:evolution
# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001
Dashboard Tabs
| Tab | Description |
|---|---|
| Overview | Stats, recent changes, pending recommendations |
| All Agents | Filterable agent cards with history |
| Timeline | Full evolution history |
| Recommendations | Priority-based model suggestions |
| Model Matrix | Agent × Model mapping with fit scores |
Data Sources
| Source | What it tracks |
|---|---|
| .kilo/agents/*.md | Model, description, capabilities |
| .kilo/kilo.jsonc | Model assignments |
| .kilo/capability-index.yaml | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |
Evolution Data Structure
{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}
Recommendations Priority
| Priority | When | Example |
|---|---|---|
| Critical | Fit score < 70 | Immediate model change required |
| High | Model unavailable | Switch to fallback |
| Medium | Better model available | Consider upgrade |
| Low | Optimization possible | Optional improvement |
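One way to read the table is as an ordered list of checks, where the first matching row wins. The sketch below assumes that ordering. `AgentHealth` and `recommendPriority` are hypothetical names, and the boolean inputs stand in for checks the dashboard would perform.

```typescript
type Priority = "critical" | "high" | "medium" | "low";

// Hypothetical inputs; each flag corresponds to one row of the table.
interface AgentHealth {
  fitScore: number; // 0-100, as in "fit_score"
  modelAvailable: boolean;
  betterModelAvailable: boolean;
  optimizationPossible: boolean;
}

// Return the highest applicable priority, or null when nothing is recommended.
// Assumption: rows are evaluated top to bottom and the first match wins.
function recommendPriority(health: AgentHealth): Priority | null {
  if (health.fitScore < 70) return "critical";
  if (!health.modelAvailable) return "high";
  if (health.betterModelAvailable) return "medium";
  if (health.optimizationPossible) return "low";
  return null;
}
```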
Agent Execution Monitoring
Every agent invocation is logged to .kilo/logs/agent-executions.jsonl for project-level monitoring.
Log Format
{"ts":"2026-04-18T14:00:00Z","agent":"php-developer","issue":42,"project":"UniqueSoft/my-shop","task":"Create Product model","subtask_type":"model_creation","duration_ms":45000,"tokens_used":8500,"status":"success","files":["app/Models/Product.php"],"score":8,"next_agent":"code-skeptic"}
Monitoring Commands
# Agent stats report
bun run scripts/agent-stats.ts
# Stats for last 7 days
bun run scripts/agent-stats.ts --last 7
# Stats for specific project
bun run scripts/agent-stats.ts --project UniqueSoft/my-shop
Required Logging Fields
| Field | Description |
|---|---|
| agent | Agent name |
| issue | Gitea issue number |
| project | Target project repo (NOT hardcoded APAW) |
| task | Atomic task description |
| duration_ms | Execution time |
| tokens_used | Token estimate |
| status | success/fail/pass/blocked |
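A log line can be checked against these required fields before it is appended. The sketch below is illustrative: `validateLogLine` is a hypothetical helper, and it checks only field presence and the status value, not field types.

```typescript
// Required fields and allowed statuses, copied from the table above.
const REQUIRED_FIELDS = [
  "agent", "issue", "project", "task", "duration_ms", "tokens_used", "status",
] as const;
const VALID_STATUSES: string[] = ["success", "fail", "pass", "blocked"];

// Return a list of problems for one JSONL line; an empty list means it is valid.
function validateLogLine(line: string): string[] {
  const entry = JSON.parse(line) as Record<string, unknown>;
  const problems: string[] = REQUIRED_FIELDS
    .filter((field) => entry[field] === undefined || entry[field] === null)
    .map((field) => `missing field: ${field}`);

  const statusValue = entry["status"];
  if (typeof statusValue === "string" && VALID_STATUSES.indexOf(statusValue) === -1) {
    problems.push(`invalid status: ${statusValue}`);
  }
  return problems;
}
```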
Critical Rules
Target Project (NOT APAW)
Issues MUST be created in the target project repository, NOT in APAW. APAW is the agent framework, not the default project.
# Auto-detect owner/repo from the git remote.
# Strip trailing slashes and the .git suffix first: sed -E has no lazy quantifiers.
TARGET_REPO=$(git remote get-url origin | sed -E 's|/*$||; s|\.git$||; s|.*[:/]([^/]+/[^/]+)$|\1|')
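The same detection can be sketched in TypeScript. `getTargetRepo` and `resolveTargetRepo` here are illustrative, not the project's `get_target_repo()` implementations: they take the remote URL as a string instead of calling git, and accept an optional override that stands in for `GITEA_TARGET_REPO`.

```typescript
// Extract "owner/repo" from an HTTPS or SSH remote URL.
// Trailing slashes and a ".git" suffix are removed before matching.
function getTargetRepo(remoteUrl: string): string {
  const cleaned = remoteUrl.trim().replace(/\/+$/, "").replace(/\.git$/, "");
  const match = /[:\/]([^\/]+\/[^\/]+)$/.exec(cleaned);
  const repo = match ? match[1] : undefined;
  if (!repo) {
    throw new Error(`Cannot detect target repo from remote: ${remoteUrl}`);
  }
  return repo;
}

// An explicit override (e.g. the GITEA_TARGET_REPO value) wins over auto-detection.
function resolveTargetRepo(remoteUrl: string, override?: string): string {
  return override ? override : getTargetRepo(remoteUrl);
}
```

Both `https://git.softuniq.eu/UniqueSoft/APAW/` and an SSH-style remote such as `git@git.softuniq.eu:UniqueSoft/APAW.git` resolve to `UniqueSoft/APAW`.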
Atomic Tasks (1 action = 1 task)
Every agent invocation solves exactly ONE atomic task:
- ❌ "Implement the entire e-commerce backend"
- ✅ "Create Product model with migration"
- ✅ "Add POST /api/products endpoint"
Modular Code
- Maximum 100 lines per file
- Maximum 30 lines per function
- Features organized as independent modules
- Cross-module communication via events/interfaces only
Token Budgets
| Task Size | Max Tokens | Example |
|---|---|---|
| Tiny | 2,000 | Fix typo, add config |
| Small | 5,000 | Create model + migration |
| Medium | 10,000 | Create API endpoint + test |
| Large | 20,000 | Create service with 3 methods |
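The budgets lend themselves to a simple lookup and check. In this sketch `TOKEN_BUDGETS`, `isWithinBudget`, and `smallestFittingSize` are hypothetical names. The numbers come from the table, and how a task is assigned its size is left out.

```typescript
type TaskSize = "tiny" | "small" | "medium" | "large";

// Maximum tokens per task size, from the table above.
const TOKEN_BUDGETS: Record<TaskSize, number> = {
  tiny: 2000,
  small: 5000,
  medium: 10000,
  large: 20000,
};

const SIZES_ASCENDING: TaskSize[] = ["tiny", "small", "medium", "large"];

// True when the tokens used stay within the budget for the declared size.
function isWithinBudget(size: TaskSize, tokensUsed: number): boolean {
  return tokensUsed <= TOKEN_BUDGETS[size];
}

// Smallest size whose budget covers the estimate, or null if the task
// exceeds even the Large budget and should be split into atomic tasks.
function smallestFittingSize(estimatedTokens: number): TaskSize | null {
  for (const size of SIZES_ASCENDING) {
    if (estimatedTokens <= TOKEN_BUDGETS[size]) return size;
  }
  return null;
}
```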
Code Style
- Use TypeScript for new files
- Follow existing patterns
- Write tests before code (TDD)
- Keep functions under 50 lines
- Use early returns
- No comments unless explicitly requested