# Kilo Code Agents Reference

This file configures AI agent behavior for the APAW project, a self-improving code pipeline that logs its work to Gitea.

## Pipeline Workflow

The main workflow is `/pipeline`. Use it to process an issue through all agents automatically.

```
User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging
```

## Commands (Slash Commands)

| Command | Description | Usage |
|---------|-------------|-------|
| `/pipeline <issue>` | Run full agent pipeline for issue | `/pipeline 42` |
| `/status <issue>` | Check pipeline status for issue | `/status 42` |
| `/evolve` | Run evolution cycle with fitness scoring | `/evolve --issue 42` |
| `/evaluate <issue>` | Generate performance report | `/evaluate 42` |
| `/plan` | Creates detailed task plans | `/plan feature X` |
| `/ask` | Answers codebase questions | `/ask how does auth work` |
| `/debug` | Analyzes and fixes bugs | `/debug error in login` |
| `/code` | Quick code generation | `/code add validation` |
| `/research [topic]` | Run research and self-improvement | `/research multi-agent` |
| `/evolution log` | Log agent model change | `/evolution log planner "reason"` |
| `/evolution report` | Generate evolution report | `/evolution report` |
| `/web-test <url>` | Visual regression testing in Docker | `/web-test https://bbox.wtf` |
| `/e2e-test <url>` | E2E browser automation tests | `/e2e-test https://my-app.com` |

## Pipeline Agents (Subagents)

These agents are invoked automatically by `/pipeline` or manually via `@mention`:

### Core Development
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@requirement-refiner` | Converts ideas to User Stories | glm-5.1 | thinking | history-miner, system-analyst |
| `@history-miner` | Finds duplicates in git | nemotron-3-super | — | *(read-only)* |
| `@system-analyst` | Designs specifications | glm-5.1 | thinking | sdet-engineer, orchestrator |
| `@sdet-engineer` | Writes tests (TDD) | qwen3-coder:480b | thinking | lead-developer, orchestrator |
| `@lead-developer` | Implements code | qwen3-coder:480b | thinking | code-skeptic, orchestrator |
| `@frontend-developer` | UI implementation | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@backend-developer` | Node.js/Express/APIs | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@php-developer` | PHP/Laravel/Symfony/WordPress | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| `@go-developer` | Go backend services | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@flutter-developer` | Flutter mobile apps | qwen3-coder:480b | — | code-skeptic, orchestrator |

### Quality Assurance
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@code-skeptic` | Adversarial review | minimax-m2.5 | — | the-fixer, performance-engineer, orchestrator |
| `@the-fixer` | Fixes issues | minimax-m2.5 | — | code-skeptic, orchestrator |
| `@performance-engineer` | Performance review | nemotron-3-super | — | the-fixer, security-auditor, orchestrator |
| `@security-auditor` | Security audit | nemotron-3-super | — | the-fixer, release-manager, orchestrator |
| `@visual-tester` | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | — | the-fixer, orchestrator |
| `@browser-automation` | E2E testing | qwen3-coder:480b | — | orchestrator |

### DevOps & Infrastructure
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@devops-engineer` | Docker/K8s/CI-CD | nemotron-3-super | — | code-skeptic, security-auditor, orchestrator |
| `@release-manager` | Git operations, releases | glm-5.1 | — | evaluator |

### Meta & Process
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@evaluator` | Scores effectiveness | glm-5.1 | thinking | prompt-optimizer, product-owner, orchestrator |
| `@pipeline-judge` | Objective fitness scoring | glm-5.1 | — | prompt-optimizer |
| `@prompt-optimizer` | Improves prompts | glm-5.1 | instant | *(edits files)* |
| `@product-owner` | Manages issues/tracking | glm-5.1 | — | *(read-only)* |

### Analysis & Design
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@capability-analyst` | Analyzes task coverage | glm-5.1 | — | agent-architect, orchestrator |
| `@agent-architect` | Creates new agents | glm-5.1 | thinking | capability-analyst, requirement-refiner, system-analyst |
| `@workflow-architect` | Creates workflows | glm-5.1 | thinking | *(edits files)* |
| `@markdown-validator` | Validates Markdown | nemotron-3-nano:30b | — | orchestrator |

### Cognitive Enhancement
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@planner` | Task decomposition | nemotron-3-super | — | *(read-only)* |
| `@reflector` | Self-reflection | nemotron-3-super | — | *(read-only)* |
| `@memory-manager` | Memory systems | nemotron-3-super | — | *(read-only)* |

## Workflow State Machine

```
[new]
  ↓ @requirement-refiner
[planned]
  ↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
  ↓ @history-miner
[researching]
  ↓ @system-analyst
[designed]
  ↓ @sdet-engineer (writes failing tests)
[testing]
  ↓ @lead-developer (makes tests pass)
[implementing]
  ↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
  ↓ @review-watcher → (auto-validate) → create fix tasks
  ↓ [pass]
[perf-check]
  ↓ @performance-engineer
[security-check]
  ↓ @security-auditor
[releasing]
  ↓ @release-manager
[evaluated]
  ↓ @evaluator (subjective score 1-10)
  ├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
  └── [score < 7] → @prompt-optimizer → [evaluated]
        ↓
    [@pipeline-judge] ← runs tests, measures tokens/time
        ↓
    fitness score
        ↓
┌──────────────────────────────────────┐
│ fitness >= 0.85                      │──→ [completed]
│ fitness 0.70-0.84                    │──→ @prompt-optimizer → [evolving]
│ fitness 0.50-0.69                    │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50                       │──→ @agent-architect → redesign
└──────────────────────────────────────┘
        ↓
[evolving] → re-run workflow → [@pipeline-judge]
        ↓
    compare fitness_before vs fitness_after
        ↓
    [improved?] → commit prompts → [completed]
              └─ [not improved?] → revert → try different strategy
```
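
The fitness bands at the end of the state machine can be read as a simple routing function. The sketch below is illustrative only: `routeByFitness` and the `FitnessAction` names are hypothetical, and it assumes the lowest matching band wins, so a fitness below 0.50 goes to `@agent-architect` rather than `@prompt-optimizer`.

```typescript
// Hypothetical action names; only the thresholds come from the diagram above.
type FitnessAction =
  | "completed"
  | "prompt-optimizer"
  | "prompt-optimizer-major"
  | "agent-architect-redesign";

// Map a fitness score (0.0-1.0) to the next pipeline action.
function routeByFitness(fitness: number): FitnessAction {
  if (!Number.isFinite(fitness) || fitness < 0 || fitness > 1) {
    throw new RangeError(`fitness must be within 0.0-1.0, got ${fitness}`);
  }
  if (fitness >= 0.85) return "completed";
  if (fitness >= 0.7) return "prompt-optimizer";
  if (fitness >= 0.5) return "prompt-optimizer-major";
  return "agent-architect-redesign";
}
```

For example, `routeByFitness(0.82)` falls in the 0.70-0.84 band and yields `"prompt-optimizer"`.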

## Capability Analysis Flow

When starting a complex task:

```
[User Request]
      ↓
[@capability-analyst] ← Analyzes requirements vs existing capabilities
      ↓
[Gap Analysis] ← Identifies missing agents, workflows, skills
      ↓
[Recommendations] → Create new or enhance existing?
      ↓
[Decision]
  ├── [Create New] → [@agent-architect] → Create component → Review
  └── [Enhance] → [@lead-developer] → Modify existing
      ↓
[Integration] ← Verify new component works with system
      ↓
[Complete] ← Task can now be handled
```

## Gitea Integration

### Status Labels

The pipeline uses Gitea labels to track progress:
- `status: new` → `status: planned` → `status: researching` → ...
- Agents add/remove labels automatically

### Performance Logging

Each agent logs its result as a comment on the Gitea issue:
```markdown
## ✅ lead-developer completed

**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts

### Notes
- Clean implementation
- Follows existing patterns
- Tests passing
```
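
A comment in this shape could be rendered from structured data before it is posted. The following is a minimal sketch: `AgentReport` and `renderAgentComment` are hypothetical names, and the field set is inferred from the sample comment rather than from the agents' actual code.

```typescript
// Hypothetical report shape, inferred from the sample comment above.
interface AgentReport {
  agent: string;
  score: number; // 1-10
  durationHours: number;
  files: string[];
  notes: string[];
}

// Build the Markdown body of a completion comment.
function renderAgentComment(report: AgentReport): string {
  return [
    `## ✅ ${report.agent} completed`,
    "",
    `**Score**: ${report.score}/10`,
    `**Duration**: ${report.durationHours}h`,
    `**Files**: ${report.files.join(", ")}`,
    "",
    "### Notes",
    ...report.notes.map((note) => `- ${note}`),
  ].join("\n");
}
```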

### Efficiency Tracking

Scores are saved to `.kilo/logs/efficiency_score.json`:
```json
{
  "version": "1.0",
  "history": [
    {
      "issue": 42,
      "date": "2024-01-02T10:00:00Z",
      "agents": {
        "lead-developer": 8,
        "code-skeptic": 7,
        "the-fixer": 9
      },
      "iterations": 2,
      "duration_hours": 1.5
    }
  ]
}
```

### Fitness Tracking

Fitness scores are saved to `.kilo/logs/fitness-history.jsonl`, one JSON object per line:
```jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
```
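
Because the history is JSON Lines, it can be read one line at a time. The sketch below parses the text of such a file and averages the fitness values. `FitnessRecord`, `parseFitnessHistory`, and `averageFitness` are hypothetical names, the record type mirrors only the fields in the sample, and returning 0 for an empty history is an assumed convention.

```typescript
// Record shape taken from the sample lines above.
interface FitnessRecord {
  ts: string;
  issue: number;
  workflow: string;
  fitness: number;
  tokens: number;
  time_ms: number;
  tests_passed: number;
  tests_total: number;
}

// Parse JSONL text, skipping blank lines.
function parseFitnessHistory(jsonl: string): FitnessRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as FitnessRecord);
}

// Mean fitness across records; 0 for an empty history (assumed convention).
function averageFitness(records: FitnessRecord[]): number {
  if (records.length === 0) return 0;
  const total = records.reduce((sum, record) => sum + record.fitness, 0);
  return total / records.length;
}
```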

## Manual Agent Invocation

```
# Use the Task tool to invoke a subagent
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"
```

Or via `@mention`:
```
@lead-developer implement authentication flow
```

## Environment Variables

Required for Gitea integration:
```bash
GITEA_API_URL=https://git.softuniq.eu/api/v1
GITEA_TOKEN=your-token-here
```
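
A pipeline entry point may want to fail fast when either variable is missing. This is a minimal sketch, not the project's actual loader: `GiteaConfig` and `readGiteaConfig` are hypothetical names, and the environment is passed in as a plain object so the function can be exercised without touching the real process environment.

```typescript
// Hypothetical config shape for the two variables listed above.
interface GiteaConfig {
  apiUrl: string;
  token: string;
}

// Read the Gitea settings, reporting every missing variable at once.
function readGiteaConfig(env: Record<string, string | undefined>): GiteaConfig {
  const apiUrl = env["GITEA_API_URL"];
  const token = env["GITEA_TOKEN"];
  if (!apiUrl || !token) {
    const missing = [
      apiUrl ? null : "GITEA_API_URL",
      token ? null : "GITEA_TOKEN",
    ].filter((variable): variable is string => variable !== null);
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { apiUrl, token };
}
```

In a Node.js or Bun process this would typically be called as `readGiteaConfig(process.env)`.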

## Self-Improvement Cycle

1. **Pipeline runs** for each issue
2. **Evaluator scores** each agent (1-10) - subjective
3. **Pipeline Judge measures** fitness objectively (0.0-1.0)
4. **Fitness below 0.85** triggers prompt-optimizer (a major rewrite below 0.70)
5. **Prompt optimizer** analyzes failures and improves prompts
6. **Re-run workflow** with improved prompts
7. **Compare fitness** before/after - commit if improved
8. **Log results** to `.kilo/logs/fitness-history.jsonl`
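
Step 7 reduces to a comparison of two fitness values. The sketch below assumes that only a strict improvement counts, so an unchanged score is reverted; that tie-breaking rule and the name `decideEvolutionOutcome` are assumptions, not something this file specifies.

```typescript
type EvolutionOutcome = "commit" | "revert";

// Keep the optimized prompts only when fitness strictly improved (assumed rule).
function decideEvolutionOutcome(
  fitnessBefore: number,
  fitnessAfter: number,
): EvolutionOutcome {
  return fitnessAfter > fitnessBefore ? "commit" : "revert";
}
```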

### Evaluator vs Pipeline Judge

| Aspect | Evaluator | Pipeline Judge |
|--------|-----------|----------------|
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |

### Fitness Score Components

```
fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)

where:
  test_pass_rate = passed_tests / total_tests
  quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
  efficiency_score = 1.0 - clamp(normalized_cost, 0, 1)
```
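
The formula translates directly into code. In the sketch below, `FitnessInputs` and `computeFitness` are hypothetical names, `normalizedCost` is assumed to be computed elsewhere (this file does not define how cost is normalized), and a rate with a zero denominator is treated as 0, which is also an assumption.

```typescript
// Inputs to the fitness formula; normalizedCost is assumed to be precomputed.
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number;
  totalGates: number;
  normalizedCost: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Ratio that yields 0 instead of NaN when nothing was measured (assumed convention).
function rate(passed: number, total: number): number {
  return total > 0 ? passed / total : 0;
}

// fitness = test_pass_rate * 0.50 + quality_gates_rate * 0.25 + efficiency_score * 0.25
function computeFitness(inputs: FitnessInputs): number {
  const testPassRate = rate(inputs.passedTests, inputs.totalTests);
  const qualityGatesRate = rate(inputs.passedGates, inputs.totalGates);
  const efficiencyScore = 1.0 - clamp(inputs.normalizedCost, 0, 1);
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}
```

With half the tests passing, all five gates green, and a normalized cost of 0.5, the terms are 0.25 + 0.25 + 0.125, giving a fitness of 0.625.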

## Architecture Files

| File | Purpose |
|------|---------|
| `AGENTS.md` | This file - main config |
| `.kilo/agents/*.md` | Agent definitions with prompts |
| `.kilo/commands/*.md` | Workflow commands |
| `.kilo/rules/*.md` | Custom rules loaded globally |
| `.kilo/skills/` | Skill modules |
| `src/kilocode/` | TypeScript API for programmatic use |

## Using the TypeScript API

```typescript
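import {
  createPipelineRunner,
  PipelineRunner,
  GiteaClient,
  decideRouting
} from './src/kilocode/index.js'

// Create a runner authenticated against Gitea
const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})

// Run the full pipeline for issue #42
await runner.run({ issueNumber: 42 })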
```

## Agent Evolution Dashboard

Track agent model changes, performance, and recommendations in real time.

### Access

```bash
# Sync agent data
bun run sync:evolution

# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001
```

### Dashboard Tabs

| Tab | Description |
|-----|-------------|
| **Overview** | Stats, recent changes, pending recommendations |
| **All Agents** | Filterable agent cards with history |
| **Timeline** | Full evolution history |
| **Recommendations** | Priority-based model suggestions |
| **Model Matrix** | Agent × Model mapping with fit scores |

### Data Sources

| Source | What it tracks |
|--------|----------------|
| `.kilo/agents/*.md` | Model, description, capabilities |
| `.kilo/kilo.jsonc` | Model assignments |
| `.kilo/capability-index.yaml` | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |

### Evolution Data Structure

```json
{
  "agents": {
    "lead-developer": {
      "current": { "model": "qwen3-coder:480b", "fit_score": 92 },
      "history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
      "performance_log": [{ "issue": 42, "score": 8, "success": true }]
    }
  }
}
```
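
For consumers of this data, the structure can be described with a few types. The sketch below mirrors only the fields visible in the sample; the interface names and the two helper functions are hypothetical, and the real data may carry additional fields.

```typescript
// Types mirror the JSON sample above; anything beyond it is unknown.
interface PerformanceEntry {
  issue: number;
  score: number;
  success: boolean;
}

interface HistoryEntry {
  type: string;
  from: string;
  to: string;
}

interface AgentEvolution {
  current: { model: string; fit_score: number };
  history: HistoryEntry[];
  performance_log: PerformanceEntry[];
}

interface EvolutionData {
  agents: Record<string, AgentEvolution>;
}

// Share of logged runs marked successful; 0 when nothing is logged (assumed convention).
function successRate(agent: AgentEvolution): number {
  const runs = agent.performance_log;
  if (runs.length === 0) return 0;
  return runs.filter((run) => run.success).length / runs.length;
}

// Most recent model change, assuming history is stored oldest first.
function latestModelChange(agent: AgentEvolution): HistoryEntry | undefined {
  const changes = agent.history.filter((entry) => entry.type === "model_change");
  return changes[changes.length - 1];
}
```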

### Recommendations Priority

| Priority | When | Example |
|----------|------|---------|
| **Critical** | Fit score < 70 | Immediate model change required |
| **High** | Model unavailable | Switch to fallback |
| **Medium** | Better model available | Consider upgrade |
| **Low** | Optimization possible | Optional improvement |
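
The priority table can be sketched as a first-match rule list. The input shape `AgentModelStatus`, the function name, and the extra `"none"` result for agents that need no recommendation are all assumptions; only the four conditions and their order come from the table.

```typescript
type RecommendationPriority = "critical" | "high" | "medium" | "low" | "none";

// Hypothetical snapshot of an agent's model situation.
interface AgentModelStatus {
  fitScore: number; // 0-100
  modelAvailable: boolean;
  betterModelAvailable: boolean;
  optimizationPossible: boolean;
}

// Return the first matching row of the priority table, top to bottom.
function recommendationPriority(status: AgentModelStatus): RecommendationPriority {
  if (status.fitScore < 70) return "critical";
  if (!status.modelAvailable) return "high";
  if (status.betterModelAvailable) return "medium";
  if (status.optimizationPossible) return "low";
  return "none";
}
```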

## Agent Execution Monitoring

Every agent invocation is logged to `.kilo/logs/agent-executions.jsonl` for project-level monitoring.

### Log Format

```jsonl
{"ts":"2026-04-18T14:00:00Z","agent":"php-developer","issue":42,"project":"UniqueSoft/my-shop","task":"Create Product model","subtask_type":"model_creation","duration_ms":45000,"tokens_used":8500,"status":"success","files":["app/Models/Product.php"],"score":8,"next_agent":"code-skeptic"}
```

### Monitoring Commands

```bash
# Agent stats report
bun run scripts/agent-stats.ts

# Stats for last 7 days
bun run scripts/agent-stats.ts --last 7

# Stats for specific project
bun run scripts/agent-stats.ts --project UniqueSoft/my-shop
```

### Required Logging Fields

| Field | Description |
|-------|-------------|
| `agent` | Agent name |
| `issue` | Gitea issue number |
| `project` | Target project repo (NOT hardcoded APAW) |
| `task` | Atomic task description |
| `duration_ms` | Execution time |
| `tokens_used` | Token estimate |
| `status` | success/fail/pass/blocked |
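
A log line can be checked against these required fields before it is appended. This is a minimal sketch under stated assumptions: `findLogEntryProblems` is a hypothetical helper, it checks presence only (not types), and it treats the four status values in the table as the complete allowed set.

```typescript
// Required fields and status values, copied from the table above.
const REQUIRED_LOG_FIELDS = [
  "agent",
  "issue",
  "project",
  "task",
  "duration_ms",
  "tokens_used",
  "status",
] as const;

const ALLOWED_STATUSES: readonly string[] = ["success", "fail", "pass", "blocked"];

// Return a list of problems for one JSONL line; an empty list means the entry is acceptable.
function findLogEntryProblems(line: string): string[] {
  const entry = JSON.parse(line) as Record<string, unknown>;
  const problems: string[] = REQUIRED_LOG_FIELDS
    .filter((field) => entry[field] === undefined || entry[field] === null)
    .map((field) => `missing field: ${field}`);
  const statusValue = entry["status"];
  if (typeof statusValue === "string" && !ALLOWED_STATUSES.includes(statusValue)) {
    problems.push(`unknown status: ${statusValue}`);
  }
  return problems;
}
```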

## Critical Rules

### Target Project (NOT APAW)

**Issues MUST be created in the target project repository, NOT in APAW.** APAW is the agent framework, not the default project.

```bash
# Auto-detect from git remote
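# Strip a trailing .git first, then keep the final owner/repo pair (works for HTTPS and SSH remotes)
TARGET_REPO=$(git remote get-url origin | sed -E 's|\.git$||; s|.*[:/]([^/]+/[^/]+)$|\1|')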
```
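
The same `owner/repo` extraction can be sketched in TypeScript for use inside the programmatic API. `parseTargetRepo` is a hypothetical helper, not an existing export, and it assumes the remote URL ends in exactly `owner/repo` with an optional `.git` suffix.

```typescript
// Extract "owner/repo" from an HTTPS or SSH git remote URL.
function parseTargetRepo(remoteUrl: string): string {
  const withoutSuffix = remoteUrl.trim().replace(/\.git$/, "");
  // The last two path segments, preceded by "/" (HTTPS) or ":" (scp-style SSH).
  const match = /[:/]([^/:]+\/[^/]+)$/.exec(withoutSuffix);
  const repo = match?.[1];
  if (!repo) {
    throw new Error(`Cannot detect target repository from remote: ${remoteUrl}`);
  }
  return repo;
}
```

Both `https://git.softuniq.eu/UniqueSoft/my-shop.git` and `git@git.softuniq.eu:UniqueSoft/my-shop.git` resolve to `UniqueSoft/my-shop`.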

### Atomic Tasks (1 action = 1 task)

Every agent invocation solves exactly ONE atomic task:
- ❌ "Implement the entire e-commerce backend"
- ✅ "Create Product model with migration"
- ✅ "Add POST /api/products endpoint"

### Modular Code

- Maximum 100 lines per file
- Maximum 30 lines per function
- Features organized as independent modules
- Cross-module communication via events/interfaces only

### Token Budgets

| Task Size | Max Tokens | Example |
|----------|-----------|---------|
| Tiny | 2,000 | Fix typo, add config |
| Small | 5,000 | Create model + migration |
| Medium | 10,000 | Create API endpoint + test |
| Large | 20,000 | Create service with 3 methods |
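
These budgets lend themselves to a small lookup. The sketch below is illustrative: the `TaskSize` keys, `isWithinBudget`, and `smallestFittingSize` are hypothetical names, and treating a usage exactly equal to the limit as within budget is an assumption.

```typescript
type TaskSize = "tiny" | "small" | "medium" | "large";

// Maximum tokens per task size, copied from the table above.
const TOKEN_BUDGETS: Record<TaskSize, number> = {
  tiny: 2_000,
  small: 5_000,
  medium: 10_000,
  large: 20_000,
};

// True when the usage fits the budget for the given size (limit inclusive, by assumption).
function isWithinBudget(size: TaskSize, tokensUsed: number): boolean {
  return tokensUsed <= TOKEN_BUDGETS[size];
}

// Smallest size whose budget covers the usage, or undefined when even "large" is exceeded.
function smallestFittingSize(tokensUsed: number): TaskSize | undefined {
  const ordered: TaskSize[] = ["tiny", "small", "medium", "large"];
  return ordered.find((size) => isWithinBudget(size, tokensUsed));
}
```

A result of `undefined` from `smallestFittingSize` suggests the task is not atomic and should be split further.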

## Code Style

- Use TypeScript for new files
- Follow existing patterns
- Write tests before code (TDD)
- Keep functions within the 30-line maximum set under Modular Code
- Use early returns
- No comments unless explicitly requested