b46a1a20a8 feat: add PHP development stack, atomic tasks, modular code rules, agent monitoring, fix target project detection
7 evolutionary tasks implemented:

1. PHP web development: php-developer agent + 6 skills (Laravel, Symfony, WordPress, security, testing, modular architecture) + 2 pipeline commands (/laravel, /wordpress)

2. Atomic task decomposition: 1 action = 1 task rule, task sizing guide, decomposition protocol for orchestrator, token budgets per complexity

3. Modular code rules: max 100 lines/file, max 30 lines/function, service/repository patterns, cross-module communication via events only

4. Gitea-centric workflow: mandatory issue creation before work, research with links, progress checkboxes, screenshots on test, git history as knowledge base

5. Fix: target project auto-detection — removed all hardcoded UniqueSoft/APAW from API calls, added get_target_repo() via git remote, GITEA_TARGET_REPO env override

6. Agent execution monitoring: agent-executions.jsonl logging, agent-stats.ts statistics script, required fields per invocation, Gitea comment includes duration/tokens

7. Token optimization: 1 action = 1 task principle, token budgets by task type, routing matrix, no scope creep, skip unnecessary pipeline steps
2026-04-18 23:43:04 +01:00

# Kilo Code Agents Reference
This file configures AI agent behavior for the APAW project - a self-improving code pipeline with Gitea logging.
## Pipeline Workflow
The main workflow is `/pipeline`. It processes an issue through all agents automatically and logs each step to Gitea.
```
User: /pipeline 42
Agent: Runs full pipeline for issue #42 with Gitea logging
```
## Commands (Slash Commands)
| Command | Description | Usage |
|---------|-------------|-------|
| `/pipeline <issue>` | Run full agent pipeline for issue | `/pipeline 42` |
| `/status <issue>` | Check pipeline status for issue | `/status 42` |
| `/evolve` | Run evolution cycle with fitness scoring | `/evolve --issue 42` |
| `/evaluate <issue>` | Generate performance report | `/evaluate 42` |
| `/plan` | Create detailed task plans | `/plan feature X` |
| `/ask` | Answer codebase questions | `/ask how does auth work` |
| `/debug` | Analyze and fix bugs | `/debug error in login` |
| `/code` | Generate code quickly | `/code add validation` |
| `/research [topic]` | Run research and self-improvement | `/research multi-agent` |
| `/evolution log` | Log agent model change | `/evolution log planner "reason"` |
| `/evolution report` | Generate evolution report | `/evolution report` |
| `/web-test <url>` | Visual regression testing in Docker | `/web-test https://bbox.wtf` |
| `/e2e-test <url>` | E2E browser automation tests | `/e2e-test https://my-app.com` |
## Pipeline Agents (Subagents)
These agents are invoked automatically by `/pipeline` or manually via `@mention`:
### Core Development
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@requirement-refiner` | Converts ideas to User Stories | glm-5.1 | thinking | history-miner, system-analyst |
| `@history-miner` | Finds duplicates in git | nemotron-3-super | — | *(read-only)* |
| `@system-analyst` | Designs specifications | glm-5.1 | thinking | sdet-engineer, orchestrator |
| `@sdet-engineer` | Writes tests (TDD) | qwen3-coder:480b | thinking | lead-developer, orchestrator |
| `@lead-developer` | Implements code | qwen3-coder:480b | thinking | code-skeptic, orchestrator |
| `@frontend-developer` | UI implementation | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@backend-developer` | Node.js/Express/APIs | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@php-developer` | PHP/Laravel/Symfony/WordPress | qwen3-coder:480b | thinking | code-skeptic, security-auditor, orchestrator |
| `@go-developer` | Go backend services | qwen3-coder:480b | — | code-skeptic, orchestrator |
| `@flutter-developer` | Flutter mobile apps | qwen3-coder:480b | — | code-skeptic, orchestrator |
### Quality Assurance
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@code-skeptic` | Adversarial review | minimax-m2.5 | — | the-fixer, performance-engineer, orchestrator |
| `@the-fixer` | Fixes issues | minimax-m2.5 | — | code-skeptic, orchestrator |
| `@performance-engineer` | Performance review | nemotron-3-super | — | the-fixer, security-auditor, orchestrator |
| `@security-auditor` | Security audit | nemotron-3-super | — | the-fixer, release-manager, orchestrator |
| `@visual-tester` | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | — | the-fixer, orchestrator |
| `@browser-automation` | E2E testing | qwen3-coder:480b | — | orchestrator |
### DevOps & Infrastructure
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@devops-engineer` | Docker/K8s/CI-CD | nemotron-3-super | — | code-skeptic, security-auditor, orchestrator |
| `@release-manager` | Git operations, releases | glm-5.1 | — | evaluator |
### Meta & Process
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@evaluator` | Scores effectiveness | glm-5.1 | thinking | prompt-optimizer, product-owner, orchestrator |
| `@pipeline-judge` | Objective fitness scoring | glm-5.1 | — | prompt-optimizer |
| `@prompt-optimizer` | Improves prompts | glm-5.1 | instant | *(edits files)* |
| `@product-owner` | Manages issues/tracking | glm-5.1 | — | *(read-only)* |
### Analysis & Design
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@capability-analyst` | Analyzes task coverage | glm-5.1 | — | agent-architect, orchestrator |
| `@agent-architect` | Creates new agents | glm-5.1 | thinking | capability-analyst, requirement-refiner, system-analyst |
| `@workflow-architect` | Creates workflows | glm-5.1 | thinking | *(edits files)* |
| `@markdown-validator` | Validates Markdown | nemotron-3-nano:30b | — | orchestrator |
### Cognitive Enhancement
| Agent | Role | Model | Variant | Can Call |
|-------|------|-------|---------|----------|
| `@planner` | Task decomposition | nemotron-3-super | — | *(read-only)* |
| `@reflector` | Self-reflection | nemotron-3-super | — | *(read-only)* |
| `@memory-manager` | Memory systems | nemotron-3-super | — | *(read-only)* |
## Workflow State Machine
```
[new]
↓ @requirement-refiner
[planned]
↓ @capability-analyst → (gaps?) → @agent-architect → create new agents
↓ @history-miner
[researching]
↓ @system-analyst
[designed]
↓ @sdet-engineer (writes failing tests)
[testing]
↓ @lead-developer (makes tests pass)
[implementing]
↓ @code-skeptic (review)
[reviewing] ──[fail]──→ [fixing] ──→ [reviewing]
↓ @review-watcher → (auto-validate) → create fix tasks
↓ [pass]
[perf-check]
↓ @performance-engineer
[security-check]
↓ @security-auditor
[releasing]
↓ @release-manager
[evaluated]
↓ @evaluator (subjective score 1-10)
├── [score ≥ 7] → [@pipeline-judge] → fitness scoring
└── [score < 7] → @prompt-optimizer → [evaluated]
[@pipeline-judge] ← runs tests, measures tokens/time
fitness score
┌──────────────────────────────────────┐
│ fitness >= 0.85                      │──→ [completed]
│ fitness 0.70-0.84                    │──→ @prompt-optimizer → [evolving]
│ fitness 0.50-0.69                    │──→ @prompt-optimizer (major) → [evolving]
│ fitness < 0.50                       │──→ @agent-architect → redesign
└──────────────────────────────────────┘
[evolving] → re-run workflow → [@pipeline-judge]
compare fitness_before vs fitness_after
[improved?] → commit prompts → [completed]
└─ [not improved?] → revert → try different strategy
```
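The fitness bands at the end of the state machine can be read as a small routing function. The following is a minimal sketch, not the project's `decideRouting`: the name `nextStepForFitness`, the `NextStep` shape, and the choice to treat the `0.70-0.84` band as the half-open range `[0.70, 0.85)` are all assumptions made for illustration.

```typescript
// Hypothetical sketch: names and return shape are illustrative, not the project's API.
type NextStep =
  | { state: "completed" }
  | { state: "evolving"; agent: "prompt-optimizer"; major: boolean }
  | { state: "redesign"; agent: "agent-architect" };

function nextStepForFitness(fitness: number): NextStep {
  if (Number.isNaN(fitness) || fitness < 0 || fitness > 1) {
    throw new RangeError(`fitness must be in [0, 1], got ${fitness}`);
  }
  // Check the lowest band first: below 0.50 goes to redesign, not to a prompt rework.
  if (fitness < 0.5) return { state: "redesign", agent: "agent-architect" };
  if (fitness < 0.7) return { state: "evolving", agent: "prompt-optimizer", major: true };
  // Assumption: values between 0.84 and 0.85 belong to the minor-optimization band.
  if (fitness < 0.85) return { state: "evolving", agent: "prompt-optimizer", major: false };
  return { state: "completed" };
}
```

With the sample values from the fitness log, `nextStepForFitness(0.82)` routes to a minor prompt optimization and `nextStepForFitness(0.91)` completes the workflow.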
## Capability Analysis Flow
When starting a complex task:
```
[User Request]
[@capability-analyst] ← Analyzes requirements vs existing capabilities
[Gap Analysis] ← Identifies missing agents, workflows, skills
[Recommendations] → Create new or enhance existing?
[Decision]
├── [Create New] → [@agent-architect] → Create component → Review
└── [Enhance] → [@lead-developer] → Modify existing
[Integration] ← Verify new component works with system
[Complete] ← Task can now be handled
```
## Gitea Integration
### Status Labels
Pipeline uses Gitea labels to track progress:
- `status: new` → `status: planned` → `status: researching` → ...
- Agents add/remove labels automatically
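The label swap an agent performs on each transition can be sketched as a pure function. This assumes an issue carries at most one `status: ` label at a time and that other labels are left untouched; `withStatus` is a hypothetical helper, and the actual Gitea API call is deliberately left out.

```typescript
// Hypothetical helper: computes the label set after a status transition.
const STATUS_PREFIX = "status: ";

function withStatus(labels: string[], nextStatus: string): string[] {
  // Drop whatever status label is currently set, keep everything else.
  const kept = labels.filter((label) => !label.startsWith(STATUS_PREFIX));
  return [...kept, `${STATUS_PREFIX}${nextStatus}`];
}
```

For example, `withStatus(["bug", "status: new"], "planned")` yields `["bug", "status: planned"]`.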
### Performance Logging
Each agent logs to Gitea issue comments:
```markdown
## ✅ lead-developer completed
**Score**: 8/10
**Duration**: 1.2h
**Files**: src/auth.ts, src/user.ts
### Notes
- Clean implementation
- Follows existing patterns
- Tests passing
```
### Efficiency Tracking
Scores saved to `.kilo/logs/efficiency_score.json`:
```json
{
"version": "1.0",
"history": [
{
"issue": 42,
"date": "2024-01-02T10:00:00Z",
"agents": {
"lead-developer": 8,
"code-skeptic": 7,
"the-fixer": 9
},
"iterations": 2,
"duration_hours": 1.5
}
]
}
```
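To show how this file might be consumed, here is a minimal sketch that averages each agent's score across the `history` array. The interface names and `averageScores` are assumptions that mirror the JSON above; nothing here is taken from the project's code.

```typescript
// Types mirror the JSON structure shown above (names are illustrative).
interface EfficiencyEntry {
  issue: number;
  date: string;
  agents: Record<string, number>;
  iterations: number;
  duration_hours: number;
}

interface EfficiencyLog {
  version: string;
  history: EfficiencyEntry[];
}

// Average evaluator score per agent over all logged issues.
function averageScores(log: EfficiencyLog): Record<string, number> {
  const totals: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const entry of log.history) {
    for (const agent of Object.keys(entry.agents)) {
      totals[agent] = (totals[agent] ?? 0) + entry.agents[agent];
      counts[agent] = (counts[agent] ?? 0) + 1;
    }
  }
  const averages: Record<string, number> = {};
  for (const agent of Object.keys(totals)) {
    averages[agent] = totals[agent] / counts[agent];
  }
  return averages;
}
```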
### Fitness Tracking
Fitness scores saved to `.kilo/logs/fitness-history.jsonl`:
```jsonl
{"ts":"2026-04-06T00:00:00Z","issue":42,"workflow":"feature","fitness":0.82,"tokens":38400,"time_ms":245000,"tests_passed":45,"tests_total":47}
{"ts":"2026-04-06T01:30:00Z","issue":43,"workflow":"bugfix","fitness":0.91,"tokens":12000,"time_ms":85000,"tests_passed":47,"tests_total":47}
```
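Because the file is JSON Lines, each line parses independently. The sketch below reads such a log from a string and computes the mean fitness. `FitnessRecord`, `parseFitnessHistory`, and `meanFitness` are hypothetical names based on the fields in the sample above, and returning `0` for an empty log is an assumption.

```typescript
// Field names follow the sample records above.
interface FitnessRecord {
  ts: string;
  issue: number;
  workflow: string;
  fitness: number;
  tokens: number;
  time_ms: number;
  tests_passed: number;
  tests_total: number;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseFitnessHistory(text: string): FitnessRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as FitnessRecord);
}

// Mean fitness across all records (assumption: 0 when the log is empty).
function meanFitness(records: FitnessRecord[]): number {
  if (records.length === 0) return 0;
  return records.reduce((sum, r) => sum + r.fitness, 0) / records.length;
}
```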
## Manual Agent Invocation
```
# Use the Task tool to invoke a subagent
Task tool with:
  subagent_type: "lead-developer"
  prompt: "Implement authentication for issue #42"
```
Or via `@mention`:
```
@lead-developer implement authentication flow
```
## Environment Variables
Required for Gitea integration:
```bash
GITEA_API_URL=https://git.softuniq.eu/api/v1
GITEA_TOKEN=your-token-here
```
## Self-Improvement Cycle
1. **Pipeline runs** for each issue
2. **Evaluator scores** each agent (1-10) - subjective
3. **Pipeline Judge measures** fitness objectively (0.0-1.0)
4. **Fitness below 0.85** triggers prompt-optimizer (a major rework below 0.70; below 0.50 the agent is redesigned by agent-architect)
5. **Prompt optimizer** analyzes failures and improves prompts
6. **Re-run workflow** with improved prompts
7. **Compare fitness** before/after - commit if improved
8. **Log results** to `.kilo/logs/fitness-history.jsonl`
### Evaluator vs Pipeline Judge
| Aspect | Evaluator | Pipeline Judge |
|--------|-----------|----------------|
| Type | Subjective | Objective |
| Score | 1-10 (opinion) | 0.0-1.0 (metrics) |
| Metrics | Observations | Tests, tokens, time |
| Trigger | After workflow | After evaluator |
| Action | Logs to Gitea | Triggers optimization |
### Fitness Score Components
```
fitness = (test_pass_rate × 0.50) + (quality_gates_rate × 0.25) + (efficiency_score × 0.25)
where:
test_pass_rate = passed_tests / total_tests
quality_gates_rate = passed_gates / total_gates (build, lint, types, tests, coverage)
efficiency_score = 1.0 - clamp(normalized_cost, 0, 1)
```
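The formula translates directly into code. The sketch below is an illustration under stated assumptions: `normalizedCost` is taken as an input already scaled by the caller (the document does not define how tokens and time are normalized), and an empty denominator is treated as a rate of `0`. The names `FitnessInputs` and `computeFitness` are hypothetical.

```typescript
interface FitnessInputs {
  passedTests: number;
  totalTests: number;
  passedGates: number;
  totalGates: number;
  // Assumption: the caller has already scaled token/time cost to roughly 0..1.
  normalizedCost: number;
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Assumption: an empty denominator counts as a rate of 0 instead of NaN.
const ratio = (num: number, den: number): number => (den > 0 ? num / den : 0);

function computeFitness(i: FitnessInputs): number {
  const testPassRate = ratio(i.passedTests, i.totalTests);
  const qualityGatesRate = ratio(i.passedGates, i.totalGates);
  const efficiencyScore = 1.0 - clamp(i.normalizedCost, 0, 1);
  return testPassRate * 0.5 + qualityGatesRate * 0.25 + efficiencyScore * 0.25;
}
```

As a worked example with made-up inputs: 45 of 47 tests passing, 4 of 5 gates passing, and a normalized cost of 0.5 give `0.5 × 0.957 + 0.25 × 0.8 + 0.25 × 0.5 ≈ 0.80`.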
## Architecture Files
| File | Purpose |
|------|---------|
| `AGENTS.md` | This file - main config |
| `.kilo/agents/*.md` | Agent definitions with prompts |
| `.kilo/commands/*.md` | Workflow commands |
| `.kilo/rules/*.md` | Custom rules loaded globally |
| `.kilo/skills/` | Skill modules |
| `src/kilocode/` | TypeScript API for programmatic use |
## Using the TypeScript API
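```typescript
// createPipelineRunner is called below, so it is imported here as well
// (assumed to be exported from the same module as PipelineRunner).
import {
  createPipelineRunner,
  PipelineRunner,
  GiteaClient,
  decideRouting
} from './src/kilocode/index.js'

// Build a runner authenticated against Gitea, then process issue #42.
const runner = await createPipelineRunner({
  giteaToken: process.env.GITEA_TOKEN
})
await runner.run({ issueNumber: 42 })
```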
## Agent Evolution Dashboard
Track agent model changes, performance, and recommendations in real-time.
### Access
```bash
# Sync agent data
bun run sync:evolution
# Open dashboard
bun run evolution:dashboard
bun run evolution:open
# or visit http://localhost:3001
```
### Dashboard Tabs
| Tab | Description |
|-----|-------------|
| **Overview** | Stats, recent changes, pending recommendations |
| **All Agents** | Filterable agent cards with history |
| **Timeline** | Full evolution history |
| **Recommendations** | Priority-based model suggestions |
| **Model Matrix** | Agent × Model mapping with fit scores |
### Data Sources
| Source | What it tracks |
|--------|----------------|
| `.kilo/agents/*.md` | Model, description, capabilities |
| `.kilo/kilo.jsonc` | Model assignments |
| `.kilo/capability-index.yaml` | Capability routing |
| Git History | Model and prompt changes |
| Gitea Comments | Performance scores |
### Evolution Data Structure
```json
{
"agents": {
"lead-developer": {
"current": { "model": "qwen3-coder:480b", "fit_score": 92 },
"history": [{ "type": "model_change", "from": "deepseek", "to": "qwen3" }],
"performance_log": [{ "issue": 42, "score": 8, "success": true }]
}
}
}
```
### Recommendations Priority
| Priority | When | Example |
|----------|------|---------|
| **Critical** | Fit score < 70 | Immediate model change required |
| **High** | Model unavailable | Switch to fallback |
| **Medium** | Better model available | Consider upgrade |
| **Low** | Optimization possible | Optional improvement |
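The table can be read as a priority cascade. The sketch below assumes the highest matching priority wins when several conditions hold at once; the `AgentModelState` fields and `recommendationPriority` are hypothetical names, and only the `< 70` threshold comes from the table.

```typescript
// Hypothetical input shape: one flag per row of the table above.
interface AgentModelState {
  fitScore: number; // 0..100, as in the evolution data
  modelAvailable: boolean;
  betterModelAvailable: boolean;
  optimizationPossible: boolean;
}

type Priority = "critical" | "high" | "medium" | "low" | null;

// Assumption: conditions are checked from most to least urgent.
function recommendationPriority(s: AgentModelState): Priority {
  if (s.fitScore < 70) return "critical";
  if (!s.modelAvailable) return "high";
  if (s.betterModelAvailable) return "medium";
  if (s.optimizationPossible) return "low";
  return null; // nothing to recommend
}
```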
## Agent Execution Monitoring
Every agent invocation is logged to `.kilo/logs/agent-executions.jsonl` for project-level monitoring.
### Log Format
```jsonl
{"ts":"2026-04-18T14:00:00Z","agent":"php-developer","issue":42,"project":"UniqueSoft/my-shop","task":"Create Product model","subtask_type":"model_creation","duration_ms":45000,"tokens_used":8500,"status":"success","files":["app/Models/Product.php"],"score":8,"next_agent":"code-skeptic"}
```
### Monitoring Commands
```bash
# Agent stats report
bun run scripts/agent-stats.ts
# Stats for last 7 days
bun run scripts/agent-stats.ts --last 7
# Stats for specific project
bun run scripts/agent-stats.ts --project UniqueSoft/my-shop
```
### Required Logging Fields
| Field | Description |
|-------|-------------|
| `agent` | Agent name |
| `issue` | Gitea issue number |
| `project` | Target project repo (NOT hardcoded APAW) |
| `task` | Atomic task description |
| `duration_ms` | Execution time |
| `tokens_used` | Token estimate |
| `status` | success/fail/pass/blocked |
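A logger could reject incomplete entries before appending them. The following is a minimal sketch of such a check; `REQUIRED_FIELDS` copies the table above, while `missingFields` is a hypothetical helper and not part of `agent-stats.ts`.

```typescript
// Required fields, copied from the table above.
const REQUIRED_FIELDS = [
  "agent",
  "issue",
  "project",
  "task",
  "duration_ms",
  "tokens_used",
  "status",
] as const;

// Returns the names of required fields that are absent or null in an entry.
function missingFields(entry: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (field) => entry[field] === undefined || entry[field] === null
  );
}
```

An entry is safe to append when `missingFields(entry)` returns an empty array.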
## Critical Rules
### Target Project (NOT APAW)
**Issues MUST be created in the target project repository, NOT in APAW.** APAW is the agent framework, not the default project.
```bash
# Auto-detect owner/repo from the git remote (HTTPS or SSH URL, with or without .git)
TARGET_REPO=$(git remote get-url origin | sed -E 's|\.git$||; s|.*[:/]([^/]+/[^/]+)$|\1|')
```
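The same detection can be sketched in TypeScript as a pure function over the remote URL. `parseTargetRepo` is a hypothetical name; the optional `override` parameter stands in for the `GITEA_TARGET_REPO` environment variable, which is assumed here to take precedence over the remote when set.

```typescript
// Hypothetical helper: derive "owner/repo" from a git remote URL.
// `override` models GITEA_TARGET_REPO (assumed to win over the remote when set).
function parseTargetRepo(remoteUrl: string, override?: string): string {
  if (override && override.trim() !== "") return override.trim();
  // Matches the last two path segments of HTTPS and SSH remotes, dropping ".git".
  const match = remoteUrl
    .trim()
    .match(/[:\/]([^\/:]+\/[^\/]+?)(?:\.git)?\/?$/);
  if (!match) {
    throw new Error(`Cannot derive owner/repo from remote: ${remoteUrl}`);
  }
  return match[1];
}
```

Both `https://git.softuniq.eu/UniqueSoft/my-shop.git` and `git@git.softuniq.eu:UniqueSoft/my-shop.git` resolve to `UniqueSoft/my-shop`.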
### Atomic Tasks (1 action = 1 task)
Every agent invocation solves exactly ONE atomic task:
- ❌ "Implement the entire e-commerce backend"
- ✅ "Create Product model with migration"
- ✅ "Add POST /api/products endpoint"
### Modular Code
- Maximum 100 lines per file
- Maximum 30 lines per function
- Features organized as independent modules
- Cross-module communication via events/interfaces only
### Token Budgets
| Task Size | Max Tokens | Example |
|----------|-----------|---------|
| Tiny | 2,000 | Fix typo, add config |
| Small | 5,000 | Create model + migration |
| Medium | 10,000 | Create API endpoint + test |
| Large | 20,000 | Create service with 3 methods |
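The budget table can be encoded as a lookup, which makes it easy to flag work that should be split further. In this sketch, `TOKEN_BUDGETS` copies the table, and `smallestFittingSize` is a hypothetical helper; treating anything above the Large budget as "needs decomposition" is an assumption drawn from the atomic-task rule.

```typescript
type TaskSize = "tiny" | "small" | "medium" | "large";

// Budgets copied from the table above.
const TOKEN_BUDGETS: Record<TaskSize, number> = {
  tiny: 2_000,
  small: 5_000,
  medium: 10_000,
  large: 20_000,
};

// Smallest size whose budget covers the estimate, or null when the task
// exceeds even the Large budget and should be decomposed (assumption).
function smallestFittingSize(estimatedTokens: number): TaskSize | null {
  const order: TaskSize[] = ["tiny", "small", "medium", "large"];
  return order.find((size) => estimatedTokens <= TOKEN_BUDGETS[size]) ?? null;
}
```

For instance, the 8,500-token invocation in the execution log sample would fit the Medium budget.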
## Code Style
- Use TypeScript for new files
- Follow existing patterns
- Write tests before code (TDD)
- Keep functions to 30 lines or fewer (see Modular Code)
- Use early returns
- No comments unless explicitly requested