APAW/AGENTS.md
9d85dd9f83 merge: dev into main - centralized auth + trailing-slash fix + all recent features
- Security: extricate hardcoded Gitea credentials, add centralized auth module
- Fix: get_target_repo() regex now handles trailing slashes (.rstrip('/') in Python, sed 's:/*' in Bash)
- Fix: task-analysis broken functions (orphaned req references, stray parentheses)
- Documentation: README.md, STRUCTURE.md, AGENTS.md updated with auth section
- Evolution: Entry #5 documenting credentials extrication
2026-04-19 12:20:38 +01:00

Kilo Code Agents Reference

This file configures AI agent behavior for the project - a self-improving code pipeline with Gitea logging.

Pipeline Workflow

The main workflow is /pipeline - use it to process issues through all agents automatically.

User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging

Commands (Slash Commands)

| Command | Description | Usage |
| --- | --- | --- |
| `/pipeline <issue>` | Run full agent pipeline for issue | `/pipeline 42` |
| `/nextjs` | Next.js 14+ full-stack app pipeline | `/nextjs my-app` |
| `/vue` | Vue/Nuxt 3 full-stack app pipeline | `/vue my-app` |
| `/laravel` | Laravel full-stack app pipeline | `/laravel my-app` |
| `/wordpress` | WordPress plugin/site pipeline | `/wordpress my-plugin` |
| `/feature` | Feature development pipeline | `/feature` |
| `/commerce` | E-commerce site pipeline | `/commerce` |
| `/status <issue>` | Check pipeline status for issue | `/status 42` |
| `/evolve` | Run evolution cycle with fitness scoring | `/evolve --issue 42` |
| `/evaluate <issue>` | Generate performance report | `/evaluate 42` |
| `/plan` | Creates detailed task plans | `/plan feature X` |
| `/ask` | Answers codebase questions | `/ask how does auth work` |
| `/debug` | Analyzes and fixes bugs | `/debug error in login` |
| `/code` | Quick code generation | `/code add validation` |
| `/research [topic]` | Run research and self-improvement | `/research multi-agent` |
| `/evolution log` | Log agent model change | `/evolution log planner "reason"` |
| `/evolution report` | Generate evolution report | `/evolution report` |
| `/web-test <url>` | Visual regression testing in Docker | `/web-test https://bbox.wtf` |
| `/e2e-test <url>` | E2E browser automation tests | `/e2e-test https://my-app.com` |

Pipeline Agents (Subagents)

These agents are invoked automatically by /pipeline or manually via @mention:

Core Development

| Agent | Role | Model | Variant | Can Call |
| --- | --- | --- | --- | --- |
| @requirement-refiner | Converts ideas to User Stories | glm-5.1 | thinking | history-miner, system-analyst |
| @history-miner | Finds duplicates in git | nemotron-3-super | | (read-only) |
| @system-analyst | Designs specifications | glm-5.1 | thinking | sdet-engineer, orchestrator |
| @sdet-engineer | Writes tests (TDD) | qwen3-coder:480b | thinking | lead-developer, orchestrator |
| @lead-developer | Implements code | qwen3-coder:480b | thinking | code-skeptic, orchestrator |
| @frontend-developer | UI (Next.js, Vue/Nuxt, React) | qwen3-coder:480b | | code-skeptic, visual-tester, orchestrator |
| @backend-developer | Node.js/Express/APIs | qwen3-coder:480b | | code-skeptic, orchestrator |
| @php-developer | PHP/Laravel/Symfony/WordPress | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| @python-developer | Python/Django/FastAPI | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| @go-developer | Go backend services | qwen3-coder:480b | | code-skeptic, orchestrator |
| @flutter-developer | Flutter mobile apps | qwen3-coder:480b | | code-skeptic, orchestrator |

Quality Assurance

| Agent | Role | Model | Variant | Can Call |
| --- | --- | --- | --- | --- |
| @code-skeptic | Adversarial review | minimax-m2.5 | | the-fixer, performance-engineer, orchestrator |
| @the-fixer | Fixes issues | minimax-m2.5 | | code-skeptic, orchestrator |
| @performance-engineer | Performance review | nemotron-3-super | | the-fixer, security-auditor, orchestrator |
| @security-auditor | Security audit | nemotron-3-super | | the-fixer, release-manager, orchestrator |
| @visual-tester | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | | the-fixer, orchestrator |
| @browser-automation | E2E testing | qwen3-coder:480b | | orchestrator |

DevOps & Infrastructure

| Agent | Role | Model | Variant | Can Call |
| --- | --- | --- | --- | --- |
| @devops-engineer | Docker/K8s/CI-CD | nemotron-3-super | | code-skeptic, security-auditor, orchestrator |
| @release-manager | Git operations, releases | glm-5.1 | | evaluator |

Meta & Process

| Agent | Role | Model | Variant | Can Call |
| --- | --- | --- | --- | --- |
| @evaluator | Scores effectiveness | glm-5.1 | thinking | prompt-optimizer, product-owner, orchestrator |
| @pipeline-judge | Objective fitness scoring | glm-5.1 | | prompt-optimizer |
| @prompt-optimizer | Improves prompts | glm-5.1 | instant | (edits files) |
| @product-owner | Manages issues/tracking | glm-5.1 | | (read-only) |

Analysis & Design

| Agent | Role | Model | Variant | Can Call |
| --- | --- | --- | --- | --- |
| @capability-analyst | Analyzes task coverage | glm-5.1 | | agent-architect, orchestrator |
| @agent-architect | Creates new agents | glm-5.1 | thinking | capability-analyst, requirement-refiner, system-analyst |
| @workflow-architect | Creates workflows | glm-5.1 | thinking | (edits files) |
| @markdown-validator | Validates Markdown | nemotron-3-nano:30b | | orchestrator |

Cognitive Enhancement

| Agent | Role | Model | Variant | Can Call |
| --- | --- | --- | --- | --- |
| @planner | Task decomposition | nemotron-3-super | | (read-only) |
| @reflector | Self-reflection | nemotron-3-super | | (read-only) |
| @memory-manager | Memory systems | nemotron-3-super | | (read-only) |

Workflow State Machine

[new] 
  ↓ @requirement-refiner
[planned] 
  ↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
  ↓ @history-miner
[researching] 
  ↓ @system-analyst
[designed] 
  ↓ @sdet-engineer (writes failing tests)
[testing] 
  ↓ @lead-developer (makes tests pass)
[implementing] 
  ↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
  ↓ @review-watcher → (auto-validate) → create fix tasks
  ↓ [pass]
[perf-check] 
  ↓ @performance-engineer
[security-check] 
  ↓ @security-auditor
[releasing] 
  ↓ @release-manager
[evaluated] 
  ↓ @evaluator (subjective score 1-10)
  ├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
  └── [score < 7] → @prompt-optimizer → [evaluated]
        ↓
    [@pipeline-judge] ← runs tests, measures tokens/time
        ↓
    fitness score
        ↓
┌──────────────────────────────────────┐
│ fitness >= 0.85                      │──→ [completed]
│ fitness 0.70-0.84                    │──→ @prompt-optimizer → [evolving]
│ fitness 0.50-0.69                    │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50                       │──→ @agent-architect → redesign
└──────────────────────────────────────┘
        ↓
[evolving] → re-run workflow → [@pipeline-judge]
        ↓
    compare fitness_before vs fitness_after
        ↓
    [improved?] → commit prompts → [completed]
              └─ [not improved?] → revert → try different strategy
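
The fitness routing box above can be read as a small decision function. The sketch below is illustrative only: `NextStep` and `routeByFitness` are hypothetical names, not part of `src/kilocode/`, and it assumes the bands are disjoint, so a score below 0.50 goes to @agent-architect rather than to the major prompt rewrite.

```typescript
// Hypothetical sketch of the fitness routing table; names are not from the real API.
type NextStep =
  | "completed"
  | "prompt-optimizer"
  | "prompt-optimizer-major"
  | "agent-architect";

function routeByFitness(fitness: number): NextStep {
  // Fitness is documented as a value in the 0.0-1.0 range.
  if (Number.isNaN(fitness) || fitness < 0 || fitness > 1) {
    throw new RangeError(`fitness must be in [0, 1], got ${fitness}`);
  }
  if (fitness >= 0.85) return "completed";
  if (fitness >= 0.7) return "prompt-optimizer";
  // Assumption: 0.50-0.69 means a major prompt rewrite, below 0.50 a redesign.
  if (fitness >= 0.5) return "prompt-optimizer-major";
  return "agent-architect";
}
```

Checking the thresholds from highest to lowest keeps each band to a single comparison.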

Capability Analysis Flow

When starting a complex task:

[User Request]
      ↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
      ↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
      ↓
[Recommendations] → Create new or enhance existing?
      ↓
[Decision]
  ├── [Create New] → [@agent-architect] → Create component → Review
  └── [Enhance] → [@lead-developer] → Modify existing
      ↓
[Integration] ← Verify new component works with system
      ↓
[Complete] ← Task can now be handled

Gitea Integration

Status Labels

Pipeline uses Gitea labels to track progress:

  • status: new → status: planned → status: researching → ...
  • Agents add/remove labels automatically

Performance Logging

Each agent logs to Gitea issue comments:

## ✅ lead-developer completed

**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts

### Notes
- Clean implementation
- Follows existing patterns
- Tests passing

Efficiency Tracking

Scores saved to .kilo/logs/efficiency_score.json:

{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}

Fitness Tracking

Fitness scores saved to .kilo/logs/fitness-history.jsonl:

{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}

Manual Agent Invocation

// Use Task tool to invoke subagent
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"

Or via @mention:

@lead-developer implement authentication flow

Environment Variables

Gitea integration uses centralized authentication (see .kilo/shared/gitea-auth.md and .kilo/gitea.jsonc):

| Variable | Required | Description |
| --- | --- | --- |
| GITEA_API_URL | No | API base URL (default: https://git.softuniq.eu/api/v1) |
| GITEA_TOKEN | Preferred | Pre-existing API token |
| GITEA_USER | Fallback | Username for Basic Auth token creation |
| GITEA_PASS | Fallback | Password for Basic Auth token creation |
| GITEA_TARGET_REPO | No | Override target project (auto-detected otherwise) |

Auth resolution: GITEA_TOKEN → GITEA_USER + GITEA_PASS → ValueError. NEVER hardcode credentials.
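
The resolution order can be sketched in TypeScript as follows. This is not the centralized auth module itself: `GiteaAuth`, `resolveGiteaAuth` and `resolveApiUrl` are hypothetical names, and the sketch throws a plain `Error` where the documented flow ends in a ValueError.

```typescript
// Hypothetical sketch of the documented resolution order; not the real auth module.
type GiteaAuth =
  | { kind: "token"; token: string }
  | { kind: "basic"; user: string; pass: string };

type Env = Record<string, string | undefined>;

function resolveGiteaAuth(env: Env): GiteaAuth {
  const { GITEA_TOKEN: token, GITEA_USER: user, GITEA_PASS: pass } = env;
  // 1. A pre-existing API token is preferred.
  if (token) return { kind: "token", token };
  // 2. Otherwise fall back to username + password (used to create a token).
  if (user && pass) return { kind: "basic", user, pass };
  // 3. Nothing usable: fail loudly instead of falling back to hardcoded values.
  throw new Error("Set GITEA_TOKEN, or both GITEA_USER and GITEA_PASS");
}

function resolveApiUrl(env: Env): string {
  // Default API base URL as documented for GITEA_API_URL.
  return env.GITEA_API_URL ?? "https://git.softuniq.eu/api/v1";
}
```

Passing the environment in as an argument (for example `resolveGiteaAuth(process.env)`) keeps the function testable without modifying real environment variables.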

Self-Improvement Cycle

  1. Pipeline runs for each issue
  2. Evaluator scores each agent (1-10) - subjective
  3. Pipeline Judge measures fitness objectively (0.0-1.0)
  4. Fitness below 0.85 triggers prompt-optimizer (a major rewrite below 0.70)
  5. Prompt optimizer analyzes failures and improves prompts
  6. Re-run workflow with improved prompts
  7. Compare fitness before/after - commit if improved
  8. Log results to .kilo/logs/fitness-history.jsonl
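
Step 7 amounts to a comparison between two fitness values. A minimal sketch, assuming "improved" means strictly higher fitness; `decideEvolution` and its optional `minGain` margin are hypothetical and not taken from the project.

```typescript
// Hypothetical sketch of step 7: keep the new prompts only if fitness improved.
type EvolutionOutcome = "commit" | "revert";

function decideEvolution(
  fitnessBefore: number,
  fitnessAfter: number,
  minGain = 0, // assumed margin; 0 means any strict improvement counts
): EvolutionOutcome {
  return fitnessAfter - fitnessBefore > minGain ? "commit" : "revert";
}
```

A non-zero `minGain` could be used to ignore small run-to-run noise in the fitness score.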

Evaluator vs Pipeline Judge

| Aspect | Evaluator | Pipeline Judge |
| --- | --- | --- |
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |

Fitness Score Components

fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)

where:
  test_pass_rate = passed_tests / total_tests
  quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
  efficiency_score = 1.0 - clamp(normalized_cost, 0, 1)
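
The formula translates directly into code. In the sketch below, `FitnessInputs` and `computeFitness` are hypothetical names, `normalizedCost` is taken as an input because its derivation is not specified here, and treating an empty test or gate set as a rate of 0 is an assumption.

```typescript
// Hypothetical sketch of the fitness formula above; not the pipeline-judge implementation.
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number; // build, lint, types, tests, coverage
  totalGates: number;
  normalizedCost: number; // assumed to be precomputed by the caller
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(Math.max(x, lo), hi);

// Assumption: an empty denominator yields a rate of 0 instead of NaN.
const ratio = (num: number, den: number): number => (den > 0 ? num / den : 0);

function computeFitness(i: FitnessInputs): number {
  const testPassRate = ratio(i.passedTests, i.totalTests);
  const qualityGatesRate = ratio(i.passedGates, i.totalGates);
  const efficiencyScore = 1.0 - clamp(i.normalizedCost, 0, 1);
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}
```

Because the weights sum to 1.0 and every component lies in [0, 1], the result stays within the 0.0-1.0 range used by the routing thresholds.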

Architecture Files

| File | Purpose |
| --- | --- |
| AGENTS.md | This file - main config |
| .kilo/agents/*.md | Agent definitions with prompts |
| .kilo/commands/*.md | Workflow commands |
| .kilo/rules/*.md | Custom rules loaded globally |
| .kilo/skills/ | Skill modules |
| .kilo/shared/gitea-auth.md | Centralized Gitea auth (env vars, no hardcoded creds) |
| .kilo/gitea.jsonc | Gitea auth structure (env var mapping) |
| .kilo/shared/gitea-api.md | Centralized Gitea API client |
| .kilo/shared/gitea-commenting.md | Comment format for Gitea |
| .kilo/shared/self-evolution.md | Self-evolution protocol |
| src/kilocode/ | TypeScript API for programmatic use |

Using the TypeScript API

import {
  createPipelineRunner,
  PipelineRunner,
  GiteaClient,
  decideRouting
} from './src/kilocode/index.js'

// createPipelineRunner is called below, so it has to be imported too
const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})

await runner.run({ issueNumber: 42 })

Agent Evolution Dashboard

Track agent model changes, performance, and recommendations in real-time.

Access

# Sync agent data
bun run sync:evolution

# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001

Dashboard Tabs

| Tab | Description |
| --- | --- |
| Overview | Stats, recent changes, pending recommendations |
| All Agents | Filterable agent cards with history |
| Timeline | Full evolution history |
| Recommendations | Priority-based model suggestions |
| Model Matrix | Agent × Model mapping with fit scores |

Data Sources

| Source | What it tracks |
| --- | --- |
| .kilo/agents/*.md | Model, description, capabilities |
| .kilo/kilo.jsonc | Model assignments |
| .kilo/capability-index.yaml | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |

Evolution Data Structure

{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}

Recommendations Priority

| Priority | When | Example |
| --- | --- | --- |
| Critical | Fit score < 70 | Immediate model change required |
| High | Model unavailable | Switch to fallback |
| Medium | Better model available | Consider upgrade |
| Low | Optimization possible | Optional improvement |

Agent Execution Monitoring

Every agent invocation is logged to .kilo/logs/agent-executions.jsonl for project-level monitoring.

Log Format

{"ts":"2026-04-18T14:00:00Z","agent":"php-developer","issue":42,"project":"UniqueSoft/my-shop","task":"Create Product model","subtask_type":"model_creation","duration_ms":45000,"tokens_used":8500,"status":"success","files":["app/Models/Product.php"],"score":8,"next_agent":"code-skeptic"}

Monitoring Commands

# Agent stats report
bun run scripts/agent-stats.ts

# Stats for last 7 days
bun run scripts/agent-stats.ts --last 7

# Stats for specific project
bun run scripts/agent-stats.ts --project UniqueSoft/my-shop

Required Logging Fields

| Field | Description |
| --- | --- |
| agent | Agent name |
| issue | Gitea issue number |
| project | Target project repo (NOT hardcoded APAW) |
| task | Atomic task description |
| duration_ms | Execution time |
| tokens_used | Token estimate |
| status | success/fail/pass/blocked |
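
A log entry can be checked against this list before it is appended to `.kilo/logs/agent-executions.jsonl`. The sketch below is a hypothetical validator (`validateExecutionEntry` is not an existing project function); it only checks that the required fields are present and that `status` is one of the four listed values.

```typescript
// Hypothetical validator for the required logging fields listed above.
const REQUIRED_FIELDS = [
  "agent",
  "issue",
  "project",
  "task",
  "duration_ms",
  "tokens_used",
  "status",
] as const;

const ALLOWED_STATUS = new Set<string>(["success", "fail", "pass", "blocked"]);

// Returns a list of problems; an empty list means the entry is acceptable.
function validateExecutionEntry(entry: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (entry[field] === undefined || entry[field] === null) {
      problems.push(`missing field: ${field}`);
    }
  }
  const status = entry["status"];
  if (typeof status === "string" && !ALLOWED_STATUS.has(status)) {
    problems.push(`unknown status: ${status}`);
  }
  return problems;
}
```

Returning a list of problems rather than throwing lets a caller report every missing field in one pass.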

Critical Rules

Target Project (NOT APAW)

Issues MUST be created in the target project repository, NOT in APAW. APAW is the agent framework, not the default project.

# Auto-detect from git remote (strips trailing slashes and a .git suffix)
TARGET_REPO=$(git remote get-url origin | sed -E 's:/*$::; s|\.git$||; s|.*[:/]([^/]+/[^/]+)$|\1|')
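
The same extraction can be sketched in TypeScript for use from scripts. `parseTargetRepo` is a hypothetical helper, not part of `src/kilocode/`; it assumes the remote URL ends in `owner/repo`, optionally followed by `.git` and trailing slashes.

```typescript
// Hypothetical helper: derive "owner/repo" from an HTTPS or SSH git remote URL.
function parseTargetRepo(remoteUrl: string): string {
  const cleaned = remoteUrl
    .trim()
    .replace(/\/+$/, "") // drop trailing slashes
    .replace(/\.git$/, ""); // drop an optional .git suffix
  // The last two path segments, preceded by "/" (HTTPS) or ":" (scp-style SSH).
  const repo = cleaned.match(/[:/]([^/:]+\/[^/]+)$/)?.[1];
  if (!repo) {
    throw new Error(`Cannot derive owner/repo from: ${remoteUrl}`);
  }
  return repo;
}
```

Stripping the `.git` suffix in its own step avoids relying on lazy quantifiers, which not every regex dialect supports.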

Atomic Tasks (1 action = 1 task)

Every agent invocation solves exactly ONE atomic task:

  • Too broad: "Implement the entire e-commerce backend"
  • Atomic: "Create Product model with migration"
  • Atomic: "Add POST /api/products endpoint"

Modular Code

  • Maximum 100 lines per file
  • Maximum 30 lines per function
  • Features organized as independent modules
  • Cross-module communication via events/interfaces only

Token Budgets

| Task Size | Max Tokens | Example |
| --- | --- | --- |
| Tiny | 2,000 | Fix typo, add config |
| Small | 5,000 | Create model + migration |
| Medium | 10,000 | Create API endpoint + test |
| Large | 20,000 | Create service with 3 methods |
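
These budgets can be enforced with a simple lookup. The sketch below is illustrative: `TaskSize`, `TOKEN_BUDGETS` and `isWithinBudget` are hypothetical names, and it assumes the budget is an inclusive upper bound.

```typescript
// Hypothetical lookup of the token budgets listed above.
type TaskSize = "tiny" | "small" | "medium" | "large";

const TOKEN_BUDGETS: Record<TaskSize, number> = {
  tiny: 2_000,
  small: 5_000,
  medium: 10_000,
  large: 20_000,
};

// Assumption: using exactly the budget still counts as within budget.
function isWithinBudget(size: TaskSize, tokensUsed: number): boolean {
  return tokensUsed <= TOKEN_BUDGETS[size];
}
```

For example, a run that used 8,500 tokens fits a Medium task but exceeds the budget of a Small one.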

Code Style

  • Use TypeScript for new files
  • Follow existing patterns
  • Write tests before code (TDD)
  • Keep functions under 50 lines
  • Use early returns
  • No comments unless explicitly requested