b46a1a20a8 feat: add PHP development stack, atomic tasks, modular code rules, agent monitoring, fix target project detection

7 evolutionary tasks implemented:

1. PHP web development: php-developer agent + 6 skills (Laravel, Symfony, WordPress, security, testing, modular architecture) + 2 pipeline commands (/laravel, /wordpress)

2. Atomic task decomposition: 1 action = 1 task rule, task sizing guide, decomposition protocol for orchestrator, token budgets per complexity

3. Modular code rules: max 100 lines/file, max 30 lines/function, service/repository patterns, cross-module communication via events only

4. Gitea-centric workflow: mandatory issue creation before work, research with links, progress checkboxes, screenshots on test, git history as knowledge base

5. Fix: target project auto-detection — removed all hardcoded UniqueSoft/APAW from API calls, added get_target_repo() via git remote, GITEA_TARGET_REPO env override

6. Agent execution monitoring: agent-executions.jsonl logging, agent-stats.ts statistics script, required fields per invocation, Gitea comment includes duration/tokens

7. Token optimization: 1 action = 1 task principle, token budgets by task type, routing matrix, no scope creep, skip unnecessary pipeline steps

2026-04-18 23:43:04 +01:00

Kilo Code Agents Reference

This file configures AI agent behavior for the APAW project - a self-improving code pipeline with Gitea logging.

Pipeline Workflow

The main workflow is /pipeline. Use it to process an issue through all agents automatically.

User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging

Commands (Slash Commands)

Command	Description	Usage
`/pipeline <issue>`	Run full agent pipeline for issue	`/pipeline 42`
`/status <issue>`	Check pipeline status for issue	`/status 42`
`/evolve`	Run evolution cycle with fitness scoring	`/evolve --issue 42`
`/evaluate <issue>`	Generate performance report	`/evaluate 42`
`/plan`	Creates detailed task plans	`/plan feature X`
`/ask`	Answers codebase questions	`/ask how does auth work`
`/debug`	Analyzes and fixes bugs	`/debug error in login`
`/code`	Quick code generation	`/code add validation`
`/research [topic]`	Run research and self-improvement	`/research multi-agent`
`/evolution log`	Log agent model change	`/evolution log planner "reason"`
`/evolution report`	Generate evolution report	`/evolution report`
`/web-test <url>`	Visual regression testing in Docker	`/web-test https://bbox.wtf`
`/e2e-test <url>`	E2E browser automation tests	`/e2e-test https://my-app.com`

Pipeline Agents (Subagents)

These agents are invoked automatically by /pipeline or manually via @mention:

Core Development

Agent	Role	Model	Variant	Can Call
`@requirement-refiner`	Converts ideas to User Stories	glm-5.1	thinking	history-miner, system-analyst
`@history-miner`	Finds duplicates in git	nemotron-3-super	—	(read-only)
`@system-analyst`	Designs specifications	glm-5.1	thinking	sdet-engineer, orchestrator
`@sdet-engineer`	Writes tests (TDD)	qwen3-coder:480b	thinking	lead-developer, orchestrator
`@lead-developer`	Implements code	qwen3-coder:480b	thinking	code-skeptic, orchestrator
`@frontend-developer`	UI implementation	qwen3-coder:480b	—	code-skeptic, orchestrator
`@backend-developer`	Node.js/Express/APIs	qwen3-coder:480b	—	code-skeptic, orchestrator
`@php-developer`	PHP/Laravel/Symfony/WordPress	qwen3-coder:480b	thinking	code-skeptic, security-auditor, orchestrator
`@go-developer`	Go backend services	qwen3-coder:480b	—	code-skeptic, orchestrator
`@flutter-developer`	Flutter mobile apps	qwen3-coder:480b	—	code-skeptic, orchestrator

Quality Assurance

Agent	Role	Model	Variant	Can Call
`@code-skeptic`	Adversarial review	minimax-m2.5	—	the-fixer, performance-engineer, orchestrator
`@the-fixer`	Fixes issues	minimax-m2.5	—	code-skeptic, orchestrator
`@performance-engineer`	Performance review	nemotron-3-super	—	the-fixer, security-auditor, orchestrator
`@security-auditor`	Security audit	nemotron-3-super	—	the-fixer, release-manager, orchestrator
`@visual-tester`	Visual regression + bbox extraction + console/network errors	qwen3-coder:480b	—	the-fixer, orchestrator
`@browser-automation`	E2E testing	qwen3-coder:480b	—	orchestrator

DevOps & Infrastructure

Agent	Role	Model	Variant	Can Call
`@devops-engineer`	Docker/K8s/CI-CD	nemotron-3-super	—	code-skeptic, security-auditor, orchestrator
`@release-manager`	Git operations, releases	glm-5.1	—	evaluator

Meta & Process

Agent	Role	Model	Variant	Can Call
`@evaluator`	Scores effectiveness	glm-5.1	thinking	prompt-optimizer, product-owner, orchestrator
`@pipeline-judge`	Objective fitness scoring	glm-5.1	—	prompt-optimizer
`@prompt-optimizer`	Improves prompts	glm-5.1	instant	(edits files)
`@product-owner`	Manages issues/tracking	glm-5.1	—	(read-only)

Analysis & Design

Agent	Role	Model	Variant	Can Call
`@capability-analyst`	Analyzes task coverage	glm-5.1	—	agent-architect, orchestrator
`@agent-architect`	Creates new agents	glm-5.1	thinking	capability-analyst, requirement-refiner, system-analyst
`@workflow-architect`	Creates workflows	glm-5.1	thinking	(edits files)
`@markdown-validator`	Validates Markdown	nemotron-3-nano:30b	—	orchestrator

Cognitive Enhancement

Agent	Role	Model	Variant	Can Call
`@planner`	Task decomposition	nemotron-3-super	—	(read-only)
`@reflector`	Self-reflection	nemotron-3-super	—	(read-only)
`@memory-manager`	Memory systems	nemotron-3-super	—	(read-only)

Workflow State Machine

[new] 
  ↓ @requirement-refiner
[planned] 
  ↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
  ↓ @history-miner
[researching] 
  ↓ @system-analyst
[designed] 
  ↓ @sdet-engineer (writes failing tests)
[testing] 
  ↓ @lead-developer (makes tests pass)
[implementing] 
  ↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
  ↓ @review-watcher → (auto-validate) → create fix tasks
  ↓ [pass]
[perf-check] 
  ↓ @performance-engineer
[security-check] 
  ↓ @security-auditor
[releasing] 
  ↓ @release-manager
[evaluated] 
  ↓ @evaluator (subjective score 1-10)
  ├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
  └── [score < 7] → @prompt-optimizer → [evaluated]
        ↓
    [@pipeline-judge] ← runs tests, measures tokens/time
        ↓
    fitness score
        ↓
┌──────────────────────────────────────┐
│ fitness >= 0.85                      │──→ [completed]
│ fitness 0.70-0.84                    │──→ @prompt-optimizer → [evolving]
│ fitness 0.50-0.69                    │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50                       │──→ @agent-architect → redesign
└──────────────────────────────────────┘
        ↓
[evolving] → re-run workflow → [@pipeline-judge]
        ↓
    compare fitness_before vs fitness_after
        ↓
    [improved?] → commit prompts → [completed]
              └─ [not improved?] → revert → try different strategy
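
The fitness bands and the before/after comparison above can be sketched as two small pure functions. This is only an illustration: the function names and action labels are hypothetical, and it assumes the most specific band wins (a fitness below 0.50 goes to redesign rather than to the prompt optimizer) and that values between 0.84 and 0.85 fall into the 0.70 band.

```typescript
// Hypothetical action labels; they mirror the arrows in the state machine.
type FitnessAction =
  | "completed"
  | "prompt-optimizer"
  | "prompt-optimizer-major"
  | "agent-architect-redesign";

// Map a fitness score (0.0-1.0) to the next step of the workflow.
function routeByFitness(fitness: number): FitnessAction {
  if (fitness >= 0.85) return "completed";
  if (fitness >= 0.7) return "prompt-optimizer";
  if (fitness >= 0.5) return "prompt-optimizer-major";
  return "agent-architect-redesign";
}

// After an [evolving] re-run: keep the new prompts only if fitness improved.
function evolutionOutcome(fitnessBefore: number, fitnessAfter: number): "commit" | "revert" {
  return fitnessAfter > fitnessBefore ? "commit" : "revert";
}
```

For example, a run scoring 0.82 would be routed to the prompt optimizer, and a re-run that reaches 0.91 would have its prompt changes committed.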

Capability Analysis Flow

When starting a complex task:

[User Request]
      ↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
      ↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
      ↓
[Recommendations] → Create new or enhance existing?
      ↓
[Decision]
  ├── [Create New] → [@agent-architect] → Create component → Review
  └── [Enhance] → [@lead-developer] → Modify existing
      ↓
[Integration] ← Verify new component works with system
      ↓
[Complete] ← Task can now be handled

Gitea Integration

Status Labels

The pipeline uses Gitea labels to track progress:

status: new → status: planned → status: researching → ...
Agents add/remove labels automatically

Performance Logging

Each agent logs to Gitea issue comments:

## ✅ lead-developer completed

**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts

### Notes
- Clean implementation
- Follows existing patterns
- Tests passing

Efficiency Tracking

Scores are saved to .kilo/logs/efficiency_score.json:

{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}

Fitness Tracking

Fitness scores are appended to .kilo/logs/fitness-history.jsonl, one JSON object per line:

{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
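
A minimal sketch of reading this history, assuming each non-empty line is a complete JSON object with the fields shown above. The `FitnessRecord` type and both function names are illustrative, not part of the project's API. The functions take the file contents as a string, so reading the file is left to the caller.

```typescript
// Shape of one line in fitness-history.jsonl, inferred from the sample above.
interface FitnessRecord {
  ts: string;
  issue: number;
  workflow: string;
  fitness: number;
  tokens: number;
  time_ms: number;
  tests_passed: number;
  tests_total: number;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseFitnessHistory(jsonl: string): FitnessRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as FitnessRecord);
}

// Mean fitness across records; returns 0 for an empty history.
function averageFitness(records: FitnessRecord[]): number {
  if (records.length === 0) return 0;
  const sum = records.reduce((acc, r) => acc + r.fitness, 0);
  return sum / records.length;
}
```

Because the file is append-only JSONL, it can be processed line by line without loading a surrounding array, which is one reason to prefer this format for run logs.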

Manual Agent Invocation

// Use Task tool to invoke subagent
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"

Or via @mention:

@lead-developer implement authentication flow

Environment Variables

Required for Gitea integration:

GITEA_API_URL=https://git.softuniq.eu/api/v1
GITEA_TOKEN=your-token-here

Self-Improvement Cycle

1. The pipeline runs for each issue
2. Evaluator scores each agent (1-10), subjectively
3. Pipeline Judge measures fitness objectively (0.0-1.0)
4. Fitness below 0.85 triggers prompt-optimizer (a major rewrite below 0.70)
5. Prompt optimizer analyzes failures and improves prompts
6. The workflow is re-run with the improved prompts
7. Fitness is compared before and after; prompts are committed only if it improved
8. Results are logged to .kilo/logs/fitness-history.jsonl

Evaluator vs Pipeline Judge

Aspect	Evaluator	Pipeline Judge
Type	Subjective	Objective
Score	1-10 (opinion)	0.0-1.0 (metrics)
Metrics	Observations	Tests, tokens, time
Trigger	After workflow	After evaluator
Action	Logs to Gitea	Triggers optimization

Fitness Score Components

fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)

where:
  test_pass_rate = passed_tests / total_tests
  quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
  efficiency_score = 1.0 - clamp(normalized_cost, 0, 1)
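
The formula can be sketched in TypeScript as follows. The interface and function names are illustrative assumptions. The source does not define how `normalized_cost` is derived from tokens and time, so it is taken here as an input. Treating an empty denominator as a rate of 0 is also an assumption.

```typescript
// Inputs to the fitness formula; field names are illustrative.
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number;
  totalGates: number;
  normalizedCost: number; // assumed to be normalized by the caller
}

// Clamp x into the range [lo, hi].
function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Ratio with a guard against division by zero (assumption: empty total means 0).
function rate(passed: number, total: number): number {
  return total > 0 ? passed / total : 0;
}

// fitness = test_pass_rate * 0.50 + quality_gates_rate * 0.25 + efficiency_score * 0.25
function computeFitness(input: FitnessInputs): number {
  const testPassRate = rate(input.passedTests, input.totalTests);
  const qualityGatesRate = rate(input.passedGates, input.totalGates);
  const efficiencyScore = 1.0 - clamp(input.normalizedCost, 0, 1);
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}
```

With 45 of 47 tests passing, all 5 gates passing, and a normalized cost of 0.4, the three terms are roughly 0.479, 0.25, and 0.15, giving a fitness of about 0.88.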

Architecture Files

File	Purpose
`AGENTS.md`	This file - main config
`.kilo/agents/*.md`	Agent definitions with prompts
`.kilo/commands/*.md`	Workflow commands
`.kilo/rules/*.md`	Custom rules loaded globally
`.kilo/skills/`	Skill modules
`src/kilocode/`	TypeScript API for programmatic use

Using the TypeScript API

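// Assumes createPipelineRunner is exported from the same entry point.
import {
  createPipelineRunner,
  PipelineRunner,
  GiteaClient,
  decideRouting
} from './src/kilocode/index.js'

// Create a runner authenticated with the Gitea token from the environment
const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})

// Run the full pipeline for issue #42
await runner.run({ issueNumber: 42 })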

Agent Evolution Dashboard

Track agent model changes, performance, and recommendations in real-time.

Access

# Sync agent data
bun run sync:evolution

# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001

Dashboard Tabs

Tab	Description
Overview	Stats, recent changes, pending recommendations
All Agents	Filterable agent cards with history
Timeline	Full evolution history
Recommendations	Priority-based model suggestions
Model Matrix	Agent × Model mapping with fit scores

Data Sources

Source	What it tracks
`.kilo/agents/*.md`	Model, description, capabilities
`.kilo/kilo.jsonc`	Model assignments
`.kilo/capability-index.yaml`	Capability routing
Git History	Model and prompt changes
Gitea Comments	Performance scores

Evolution Data Structure

{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}

Recommendations Priority

Priority	When	Example
Critical	Fit score < 70	Immediate model change required
High	Model unavailable	Switch to fallback
Medium	Better model available	Consider upgrade
Low	Optimization possible	Optional improvement
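
One way to read the table is as an ordered set of checks, where the first matching row decides the priority. The sketch below assumes exactly that precedence. The `AgentModelState` shape, its field names, and the `"none"` fallback are hypothetical and not taken from the dashboard code.

```typescript
type Priority = "critical" | "high" | "medium" | "low" | "none";

// Hypothetical snapshot of an agent's model situation.
interface AgentModelState {
  fitScore: number; // 0-100, as in the evolution data structure
  modelAvailable: boolean;
  betterModelAvailable: boolean;
  optimizationPossible: boolean;
}

// Evaluate the table rows top to bottom; the first match wins (assumed precedence).
function recommendPriority(state: AgentModelState): Priority {
  if (state.fitScore < 70) return "critical";
  if (!state.modelAvailable) return "high";
  if (state.betterModelAvailable) return "medium";
  if (state.optimizationPossible) return "low";
  return "none";
}
```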

Agent Execution Monitoring

Every agent invocation is logged to .kilo/logs/agent-executions.jsonl for project-level monitoring.

Log Format

{"ts":"2026-04-18T14:00:00Z","agent":"php-developer","issue":42,"project":"UniqueSoft/my-shop","task":"Create Product model","subtask_type":"model_creation","duration_ms":45000,"tokens_used":8500,"status":"success","files":["app/Models/Product.php"],"score":8,"next_agent":"code-skeptic"}

Monitoring Commands

# Agent stats report
bun run scripts/agent-stats.ts

# Stats for last 7 days
bun run scripts/agent-stats.ts --last 7

# Stats for specific project
bun run scripts/agent-stats.ts --project UniqueSoft/my-shop

Required Logging Fields

Field	Description
`agent`	Agent name
`issue`	Gitea issue number
`project`	Target project repo (NOT hardcoded APAW)
`task`	Atomic task description
`duration_ms`	Execution time
`tokens_used`	Token estimate
`status`	success/fail/pass/blocked
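
A small validator for these required fields might look like the sketch below. It is not the project's `agent-stats.ts`; the function names and the returned shape are assumptions made for illustration, and it checks only field presence and the four status values listed above.

```typescript
// Required fields and allowed statuses, taken from the table above.
const REQUIRED_FIELDS = [
  "agent", "issue", "project", "task", "duration_ms", "tokens_used", "status",
] as const;
const VALID_STATUSES = ["success", "fail", "pass", "blocked"];

// Return the names of required fields that are absent or null.
function missingFields(entry: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (field) => entry[field] === undefined || entry[field] === null,
  );
}

// Validate one JSONL line from agent-executions.jsonl.
function validateLogLine(line: string): { ok: boolean; problems: string[] } {
  const entry = JSON.parse(line) as Record<string, unknown>;
  const problems = missingFields(entry).map((f) => "missing field: " + f);
  const status = entry["status"];
  if (typeof status === "string" && VALID_STATUSES.indexOf(status) === -1) {
    problems.push("unknown status: " + status);
  }
  return { ok: problems.length === 0, problems };
}
```

Optional fields such as `files`, `score`, or `next_agent` pass through unchecked, so the log format can grow without breaking validation.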

Critical Rules

Target Project (NOT APAW)

Issues MUST be created in the target project repository, NOT in APAW. APAW is the agent framework, not the default project.

# Auto-detect owner/repo from git remote (HTTPS or SSH URL, with or without a .git suffix)
TARGET_REPO=$(git remote get-url origin | sed -E 's|\.git$||; s|.*[:/]([^/]+/[^/]+)$|\1|')
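
The same detection can be sketched in TypeScript for callers that already have the remote URL as a string. The function names here are hypothetical and are not the project's `get_target_repo()` implementation. The optional override parameter stands in for the `GITEA_TARGET_REPO` environment variable mentioned in the commit notes.

```typescript
// Extract "owner/repo" from an HTTPS or SSH git remote URL, or null if it does not match.
function parseTargetRepo(remoteUrl: string): string | null {
  const trimmed = remoteUrl.trim().replace(/\.git$/, "");
  // Last two path segments, preceded by ":" (SSH form) or "/" (HTTPS form).
  const match = trimmed.match(/[:/]([^/:]+\/[^/]+)$/);
  return match?.[1] ?? null;
}

// An explicit override (e.g. the value of GITEA_TARGET_REPO) wins over auto-detection.
function resolveTargetRepo(remoteUrl: string, override?: string): string | null {
  return override && override.length > 0 ? override : parseTargetRepo(remoteUrl);
}
```

Keeping the parser free of process and environment access makes it easy to test against both URL forms.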

Atomic Tasks (1 action = 1 task)

Every agent invocation solves exactly ONE atomic task:

❌ "Implement the entire e-commerce backend"
✅ "Create Product model with migration"
✅ "Add POST /api/products endpoint"

Modular Code

Maximum 100 lines per file
Maximum 30 lines per function
Features organized as independent modules
Cross-module communication via events/interfaces only

Token Budgets

Task Size	Max Tokens	Example
Tiny	2,000	Fix typo, add config
Small	5,000	Create model + migration
Medium	10,000	Create API endpoint + test
Large	20,000	Create service with 3 methods
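
The budget table translates directly into a lookup. The sketch below is illustrative only: the names are hypothetical, and treating an estimate above the Large budget as a signal to decompose the task further is an assumption drawn from the atomic-task rule.

```typescript
type TaskSize = "tiny" | "small" | "medium" | "large";

// Maximum tokens per task size, from the table above.
const TOKEN_BUDGETS: Record<TaskSize, number> = {
  tiny: 2000,
  small: 5000,
  medium: 10000,
  large: 20000,
};

// True if a task of the given size stayed within its budget.
function isWithinBudget(size: TaskSize, tokensUsed: number): boolean {
  return tokensUsed <= TOKEN_BUDGETS[size];
}

// Smallest size whose budget covers the estimate, or null if none does.
function smallestFittingSize(estimatedTokens: number): TaskSize | null {
  const order: TaskSize[] = ["tiny", "small", "medium", "large"];
  for (const size of order) {
    if (estimatedTokens <= TOKEN_BUDGETS[size]) return size;
  }
  return null; // exceeds the Large budget: assumed to mean the task should be split
}
```

For instance, an invocation that used 8,500 tokens fits the Medium budget but would exceed the Small one.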

Code Style

Use TypeScript for new files
Follow existing patterns
Write tests before code (TDD)
Keep functions to 30 lines or fewer (see Modular Code)
Use early returns
No comments unless explicitly requested
