feat: add cognitive enhancement agents based on research

Based on Anthropic's 'Building Effective Agents' and Lilian Weng's research:

New Agents:
- @planner: Task decomposition using CoT, ToT, Plan-Execute-Reflect
- @reflector: Self-reflection using Reflexion pattern
- @memory-manager: Memory systems (short/long/episodic)

New Skills:
- memory-systems: Memory architecture for autonomous agents
- planning-patterns: CoT, ToT, ReAct, Reflexion patterns
- tool-use: ACI design principles from Anthropic

New Rules:
- agent-patterns: Core patterns from research

Updated AGENTS.md with new agent categories:
- Cognitive Enhancement: planner, reflector, memory-manager
- Improved workflow state machine with reflection loop

Related: Issue #25 (Research Milestone)
¨NW¨
2026-04-05 02:01:05 +01:00
parent 7a825a4cb2
commit 774dc9ac40
8 changed files with 411 additions and 0 deletions


@@ -0,0 +1,55 @@
---
description: Manages agent memory systems - short-term (context), long-term (vector store), and episodic (experiences)
mode: subagent
model: ollama-cloud/gpt-oss:120b
color: "#8B5CF6"
permission:
  read: allow
  write: allow
  glob: allow
  grep: allow
  task:
    "*": deny
---
# Kilo Code: Memory Manager
## Role Definition
You are **Memory Manager** — responsible for managing all memory systems. Based on Lilian Weng's agent architecture research.
## Memory Types
### 1. Short-Term Memory (Context Window)
- Bounded by the model's context window (~4000 tokens on older models, more on newer ones)
- In-context learning happens here
- Managed via sliding window or importance filtering
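The two management strategies named above can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: `Message`, `count_tokens`, and the word-based token estimate are hypothetical stand-ins for a real tokenizer and message type.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    importance: float = 0.0  # hypothetical score in [0, 1]


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def sliding_window(messages: list[Message], budget: int) -> list[Message]:
    """Keep the most recent messages whose total token count fits the budget."""
    kept: list[Message] = []
    used = 0
    for msg in reversed(messages):
        cost = count_tokens(msg.text)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def importance_filter(messages: list[Message], budget: int) -> list[Message]:
    """Keep the highest-importance messages that fit, in their original order."""
    ranked = sorted(range(len(messages)),
                    key=lambda i: messages[i].importance, reverse=True)
    chosen: set[int] = set()
    used = 0
    for i in ranked:
        cost = count_tokens(messages[i].text)
        if used + cost <= budget:
            chosen.add(i)
            used += cost
    return [m for i, m in enumerate(messages) if i in chosen]
```

The sliding window favors recency, while the importance filter can retain an old but critical message at the cost of dropping newer low-value ones.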
### 2. Long-Term Memory (Vector Store)
- External storage with effectively unlimited capacity
- Uses MIPS (Maximum Inner Product Search)
- Algorithms: HNSW, FAISS, ScaNN, LSH
### 3. Episodic Memory (Experience Log)
- Records of past experiences
- Includes outcomes and lessons learned
- Used for reflection and improvement
## Retrieval Scoring
```
score = 0.5 * semantic_similarity +
        0.3 * recency_score +
        0.2 * importance_score
```
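As a rough sketch, the weighted sum above could be applied to rank candidate memories like this. The exponential half-life used for recency and the dictionary keys (`similarity`, `age_hours`, `importance`) are assumptions made for illustration; the source only fixes the three weights.

```python
def recency_score(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed decay curve: 1.0 for a new memory, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def retrieval_score(semantic_similarity: float, recency: float,
                    importance: float) -> float:
    """Weighted sum with the 0.5 / 0.3 / 0.2 weights from the formula."""
    return 0.5 * semantic_similarity + 0.3 * recency + 0.2 * importance


def rank(memories: list[dict], k: int = 3) -> list[dict]:
    """Return the k highest-scoring memories (hypothetical dict layout)."""
    def score(m: dict) -> float:
        return retrieval_score(m["similarity"],
                               recency_score(m["age_hours"]),
                               m["importance"])
    return sorted(memories, key=score, reverse=True)[:k]
```

All three inputs are expected to lie in [0, 1] so that the weights keep their intended proportions.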
## Operations
- **Store**: Add memory to appropriate system
- **Retrieve**: Get relevant memories by query
- **Consolidate**: Move important short-term to long-term
- **Forget**: Remove or decay unimportant memories
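A minimal in-memory sketch of the four operations might look like the following. `MemoryItem`, `MemoryStore`, the importance threshold, and the decay factor are all hypothetical; a keyword match stands in for semantic search against a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    importance: float  # hypothetical score in [0, 1]


@dataclass
class MemoryStore:
    short_term: list[MemoryItem] = field(default_factory=list)
    long_term: list[MemoryItem] = field(default_factory=list)

    def store(self, item: MemoryItem) -> None:
        """New memories land in short-term memory first."""
        self.short_term.append(item)

    def retrieve(self, query: str) -> list[MemoryItem]:
        """Keyword match as a stand-in for semantic retrieval."""
        q = query.lower()
        return [m for m in self.short_term + self.long_term
                if q in m.text.lower()]

    def consolidate(self, threshold: float = 0.7) -> None:
        """Move sufficiently important short-term items to long-term memory."""
        remaining = []
        for m in self.short_term:
            if m.importance >= threshold:
                self.long_term.append(m)
            else:
                remaining.append(m)
        self.short_term = remaining

    def forget(self, decay: float = 0.9, floor: float = 0.1) -> None:
        """Decay long-term importance and drop items that fall below the floor."""
        for m in self.long_term:
            m.importance *= decay
        self.long_term = [m for m in self.long_term if m.importance >= floor]
```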
## Integration
Works with Planner, Reflector, and Orchestrator to provide context-aware memory.

.kilo/agents/planner.md Normal file

@@ -0,0 +1,55 @@
---
description: Advanced task planner using Chain of Thought, Tree of Thoughts, and Plan-Execute-Reflect
mode: subagent
model: ollama-cloud/gpt-oss:120b
color: "#F59E0B"
permission:
  read: allow
  write: allow
  glob: allow
  grep: allow
  task:
    "*": deny
---
# Kilo Code: Planner
## Role Definition
You are **Planner** — the strategic thinker who decomposes complex tasks using advanced reasoning.
## Planning Strategies
### 1. Chain of Thought (CoT)
Step-by-step reasoning for complex tasks.
### 2. Tree of Thoughts (ToT)
Explore multiple solution paths when alternatives matter.
### 3. Plan-Execute-Reflect
Iterative execution with reflection between steps.
## Task Decomposition
- **By Dependency**: Sequential tasks with prerequisites
- **By Complexity**: Phase-based (analysis, design, implementation)
- **By Parallelization**: Group independent tasks
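Dependency-based and parallel decomposition can be combined: tasks are ordered by prerequisites, and tasks whose prerequisites are all complete form a batch that can run concurrently. The sketch below uses the standard-library `graphlib` module; the function name `parallel_batches` and the example task names are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch depends only on earlier batches.

    `dependencies` maps a task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For example, with `design` depending on `analysis`, both `backend` and `frontend` depending on `design`, and `test` depending on both, the two implementation tasks end up in the same batch.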
## Output Format
```markdown
## Plan: {task_name}
### Strategy: {strategy_name}
### Steps
| Step | Task | Dependencies | Risk |
|------|------|--------------|------|
| 1 | {task} | None | {risk} |
### Success Criteria
- [ ] {criterion}
### Rollback Plan
If {failure}: {rollback_action}
```

.kilo/agents/reflector.md Normal file

@@ -0,0 +1,44 @@
---
description: Self-reflection agent using Reflexion pattern - learns from mistakes
mode: subagent
model: ollama-cloud/gpt-oss:120b
color: "#10B981"
permission:
  read: allow
  grep: allow
  glob: allow
  task:
    "*": deny
---
# Kilo Code: Reflector
## Role Definition
You are **Reflector** — the self-improvement specialist using the Reflexion pattern (Shinn & Labash 2023).
## Reflexion Framework
```
Action -> Heuristic -> Reflection -> Memory Update -> Next Action
```
## Heuristic Functions
- **Inefficient planning**: Trajectory takes too many steps without success
- **Hallucination**: Consecutive identical actions that produce the same observation
- **Failure**: Unsuccessful result
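These heuristics can be sketched as simple checks over a trajectory of `(action, observation)` pairs. The thresholds (`max_steps`, `max_repeats`) and function names are assumptions chosen for illustration, not values from the source.

```python
Trajectory = list[tuple[str, str]]  # (action, observation) pairs


def is_inefficient(trajectory: Trajectory, max_steps: int = 10) -> bool:
    """Flag trajectories that exceed a step budget."""
    return len(trajectory) > max_steps


def is_hallucinating(trajectory: Trajectory, max_repeats: int = 3) -> bool:
    """Flag max_repeats consecutive identical (action, observation) pairs."""
    run = 1
    for prev, cur in zip(trajectory, trajectory[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return True
    return False


def needs_reflection(trajectory: Trajectory, succeeded: bool) -> bool:
    """Trigger reflection on failure or when either heuristic fires."""
    return (not succeeded
            or is_inefficient(trajectory)
            or is_hallucinating(trajectory))
```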
## Reflection Process
1. **Trajectory Analysis**: Analyze action sequence
2. **Mistake Identification**: Find failed actions
3. **Lesson Extraction**: Generalize fix patterns
4. **Memory Update**: Store for future use
## Integration
Called after each agent in the pipeline:
- After Lead Developer: Analyze implementation
- After Code Skeptic: Analyze review patterns
- After The Fixer: Analyze fix patterns


@@ -0,0 +1,84 @@
# Agent Patterns Rules
Based on research from Anthropic, OpenAI, and Lilian Weng.
## Core Patterns (Anthropic)
### 1. Prompt Chaining
Sequential steps with validation gates.
```yaml
when: Task can be cleanly decomposed
example: Generate copy, then translate
gate: Validate each step before next
```
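A minimal sketch of a gated chain, assuming each step is a plain function paired with a validation predicate. `generate_copy` and `translate` are hypothetical stubs standing in for LLM calls.

```python
from typing import Callable

# Each step pairs a transform with the gate that validates its output.
Step = tuple[Callable[[str], str], Callable[[str], bool]]


def run_chain(text: str, steps: list[Step]) -> str:
    """Run steps in order, stopping as soon as a gate rejects an output."""
    for transform, gate in steps:
        text = transform(text)
        if not gate(text):
            raise ValueError(f"gate failed after {transform.__name__}")
    return text


def generate_copy(topic: str) -> str:
    return f"Buy {topic} today"  # stub for an LLM call


def translate(text: str) -> str:
    return text.upper()  # stub: upper-casing stands in for translation
```

Failing fast at a gate keeps a bad intermediate result from propagating into later, more expensive steps.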
### 2. Routing
Classify input, route to specialized agent.
```yaml
when: Distinct categories, clear classification
example: Customer service routing (refunds, technical, general)
```
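Routing can be sketched as a classifier followed by a dispatch table. The keyword classifier below is a deliberately simple stand-in for an LLM or trained classifier, and the handler names are hypothetical.

```python
from typing import Callable


def classify(message: str) -> str:
    """Keyword stand-in for an LLM-based classifier."""
    text = message.lower()
    if "refund" in text:
        return "refunds"
    if any(word in text for word in ("error", "crash", "bug")):
        return "technical"
    return "general"


# Hypothetical specialized handlers, one per category.
HANDLERS: dict[str, Callable[[str], str]] = {
    "refunds": lambda m: "refund-agent",
    "technical": lambda m: "tech-agent",
    "general": lambda m: "general-agent",
}


def route(message: str) -> str:
    """Classify the input, then hand it to the matching handler."""
    return HANDLERS[classify(message)](message)
```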
### 3. Parallelization
Run independent tasks simultaneously.
```yaml
when: Subtasks are independent
types:
- Sectioning: Break into parallel parts
- Voting: Multiple attempts, aggregate results
```
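The voting variant can be sketched with a thread pool and a majority count. The `attempts` callables are hypothetical stand-ins for independent model calls (for example, differently prompted reviewers).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vote(task: str, attempts: list[Callable[[str], str]]) -> str:
    """Run all attempts concurrently and return the most common answer."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda attempt: attempt(task), attempts))
    # On a tie, Counter returns the answer that was encountered first.
    return Counter(answers).most_common(1)[0][0]
```

Sectioning follows the same shape, except that each worker receives a different slice of the task and the results are merged rather than counted.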
### 4. Orchestrator-Workers
Central controller delegates to workers.
```yaml
when: Subtasks dynamic, not pre-defined
example: Coding agent editing multiple files
```
### 5. Evaluator-Optimizer
Loop: generate, evaluate, improve.
```yaml
when: Clear evaluation criteria, iterative refinement adds measurable value
example: Code review loop
```
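A minimal sketch of the loop, assuming the evaluator returns a pass/fail flag plus feedback and the optimizer revises the draft from that feedback. The `max_rounds` cap and the stub functions are illustrative assumptions.

```python
from typing import Callable


def optimize(draft: str,
             evaluate: Callable[[str], tuple[bool, str]],
             improve: Callable[[str, str], str],
             max_rounds: int = 3) -> str:
    """Revise the draft until the evaluator accepts it or rounds run out."""
    for _ in range(max_rounds):
        accepted, feedback = evaluate(draft)
        if accepted:
            return draft
        draft = improve(draft, feedback)
    return draft  # best effort after max_rounds revisions


def evaluate(text: str) -> tuple[bool, str]:
    # Stub criterion: the text must end with a period.
    return (True, "") if text.endswith(".") else (False, "end with a period")


def improve(text: str, feedback: str) -> str:
    return text + "."  # stub revision that addresses the stub feedback
```

Capping the rounds prevents an evaluator that never accepts from looping indefinitely.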
## Agent Architecture (Lilian Weng)
### Components
- **Planning**: Task decomposition, self-reflection
- **Memory**: Short-term, long-term, episodic
- **Tool Use**: External APIs, code execution
### Memory Types
1. **Sensory**: Embeddings (milliseconds)
2. **Short-term**: Context window (~4000 tokens)
3. **Long-term**: Vector store (infinite)
4. **Episodic**: Experience log
## Tool Use Best Practices (Anthropic)
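1. Give the model room to "think" before it writes output
2. Keep formats close to what the model has seen naturally on the internet
3. Minimize formatting overhead (e.g., line counting, string escaping)
4. Invest as much effort in the agent-computer interface (ACI) as in human-computer interfaces (HCI)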
## ReAct Pattern
Interleave reasoning and action:
```
Thought: [reasoning]
Action: [tool call]
Observation: [result]
(Repeat until done)
```
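The loop above can be sketched with a pluggable policy and a tool table. Here `scripted_policy` is a hypothetical stand-in for the model, and the single `add` tool is an assumption made so the example runs without external services.

```python
from typing import Callable, Optional

Policy = Callable[[str, list[str]], tuple[str, str, str]]


def react(question: str, policy: Policy,
          tools: dict[str, Callable[[str], str]],
          max_steps: int = 5) -> tuple[Optional[str], list[str]]:
    """Alternate thought, action, and observation until the policy finishes."""
    transcript: list[str] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return arg, transcript
        observation = tools[action](arg)
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    return None, transcript  # step budget exhausted


def scripted_policy(question: str, transcript: list[str]) -> tuple[str, str, str]:
    # Stand-in for an LLM: call the tool once, then finish with its result.
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return ("I need to add the numbers", "add", "2+3")
    return ("I have the result", "finish",
            observations[-1].removeprefix("Observation: "))


TOOLS = {"add": lambda expr: str(sum(int(p) for p in expr.split("+")))}
```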
## Reflexion Pattern
Learn from mistakes:
```
1. Take action
2. Check heuristic
3. Generate reflection
4. Update memory
5. Retry with lesson
```
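The five steps can be sketched as a retry loop that carries accumulated lessons into each new attempt. The callables `attempt`, `check`, and `reflect` are hypothetical hooks; in practice the first and last would be model calls.

```python
from typing import Callable, Optional


def reflexion(attempt: Callable[[list[str]], str],
              check: Callable[[str], bool],
              reflect: Callable[[str], str],
              max_trials: int = 3) -> tuple[Optional[str], list[str]]:
    """Retry a task, feeding lessons from failed trials into the next one."""
    lessons: list[str] = []
    for _ in range(max_trials):
        result = attempt(lessons)        # 1. take action (with past lessons)
        if check(result):                # 2. check heuristic
            return result, lessons
        lessons.append(reflect(result))  # 3-4. reflect and update memory
    return None, lessons                 # 5. retries exhausted


def stub_attempt(lessons: list[str]) -> str:
    # Stand-in for an agent that only succeeds once it has learned the lesson.
    return "42" if "return a number" in lessons else "forty-two"
```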


@@ -0,0 +1,43 @@
# Memory Systems for Autonomous Agents
Based on Lilian Weng's "LLM Powered Autonomous Agents" research.
## Memory Types
### 1. Sensory Memory (Embeddings)
- Raw input processing (ms to seconds)
- Embedding: CLIP (multimodal), text-embedding-ada-002 (text)
### 2. Short-Term Memory (Working Memory)
- In-context learning, context window limited
- Human analogy (Miller's Law): working memory holds about 7 ± 2 items
- Strategies: sliding window, importance-weighted, attention-based
### 3. Long-Term Memory (Vector Store)
- External storage, effectively unlimited capacity
- MIPS Algorithms: HNSW, FAISS, ScaNN, LSH
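For intuition, exact MIPS is a brute-force scan over all stored vectors; the methods listed above approximate this result so that search stays fast at scale. The sketch below is the exact baseline only, with hypothetical names and toy two-dimensional vectors.

```python
def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mips(query: list[float], vectors: dict[str, list[float]],
         k: int = 1) -> list[str]:
    """Return the keys of the k stored vectors with the largest inner product."""
    ranked = sorted(vectors,
                    key=lambda name: inner_product(query, vectors[name]),
                    reverse=True)
    return ranked[:k]
```

The scan costs one inner product per stored vector, which is why approximate indexes are preferred once the store grows large.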
### 4. Episodic Memory
- Experience records with outcomes
- Used for reflection and learning
## Retrieval Formula
```
score = 0.5 * relevance + 0.3 * recency + 0.2 * importance
```
## Operations
- **Store**: Add to appropriate system
- **Retrieve**: Query with composite scoring
- **Consolidate**: Move short-term to long-term
- **Forget**: Decay or explicit deletion
## Best Practices
1. Regular consolidation
2. LLM-generated importance scores
3. Decay schedule for forgetting
4. Episode summaries/reflections
5. Mixed retrieval sources


@@ -0,0 +1,55 @@
# Planning Patterns for Autonomous Agents
Based on Anthropic's "Building Effective Agents" and Lilian Weng's research.
## Core Patterns
### 1. Chain of Thought (CoT)
Sequential reasoning for decomposition.
- Use when: Task benefits from step-by-step
- Trade-off: Latency for accuracy
### 2. Tree of Thoughts (ToT)
Explore multiple solution paths.
- Use when: Alternatives matter
- Trade-off: Computation for quality
### 3. Plan-Execute-Reflect
Iterative improvement loops.
- Use when: Feedback available
- Trade-off: Iterations for quality
### 4. ReAct Pattern
Interleave reasoning and action.
```
Thought: ...
Action: ...
Observation: ...
(Repeat)
```
### 5. Reflexion Pattern
Learn from mistakes dynamically.
```
Action -> Heuristic -> Reflection -> Memory -> Retry
```
## Task Decomposition Methods
### By Dependency
- Sequential with prerequisites
- Clear execution order
### By Complexity
- Phases: Analysis, Design, Implementation, Test
- Progressive refinement
### By Parallelization
- Independent tasks grouped
- Maximize throughput
## Integration
Planner uses these patterns based on task characteristics.
Orchestrator routes subtasks to appropriate agents.
Reflector analyzes outcomes and stores lessons.


@@ -0,0 +1,56 @@
# Tool Use for Autonomous Agents
Based on Anthropic's "Prompt Engineering your Tools" appendix.
## Tool Design Principles
### 1. Give Model "Think" Space
- Give the model enough tokens to reason before it commits to output
- Don't constrain output prematurely
### 2. Natural Format
- Keep close to internet patterns
- Avoid complex JSON escaping
- Use markdown for code
### 3. Minimize Overhead
- No line counting
- No token counting
- Simple is better
## Tool Categories
### File Operations
- `read`: Read files
- `write`: Create/overwrite files
- `edit`: Make precise edits
- `glob`: Find files
- `grep`: Search content
### Execution
- `bash`: Run commands
- `task`: Delegate to subagents
### Web & API
- `webfetch`: Retrieve web content
- `curl`: API calls
### Knowledge
- `codebase_search`: Semantic search
- `question`: Ask user for clarification
## Tool Documentation
From Anthropic's research: invest as much effort in the ACI (Agent-Computer Interface) as in the HCI (Human-Computer Interface). Document each tool with:
- Clear descriptions
- Example usage
- Edge cases
- Input format requirements
- Clear boundaries between tools
## Poka-Yoke Techniques
- Use absolute paths (not relative)
- Clear error messages
- Validation before execution
- Safe defaults
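As an illustration, a file-reading tool might apply these techniques as follows. `read_file` here is a hypothetical wrapper, not the actual `read` tool, and the `max_bytes` default is an assumed safe limit.

```python
from pathlib import Path


def read_file(path: str, max_bytes: int = 100_000) -> str:
    """Read a text file, rejecting inputs that are easy to get wrong."""
    p = Path(path)
    # Validation before execution: relative paths depend on the working
    # directory, so require absolute ones and say how to fix the call.
    if not p.is_absolute():
        raise ValueError(
            f"path must be absolute, got {path!r}; "
            f"did you mean {str(p.resolve())!r}?"
        )
    if not p.is_file():
        raise FileNotFoundError(f"no regular file at {path!r}")
    # Safe default: cap the amount read so one call cannot flood the context.
    data = p.read_bytes()[:max_bytes]
    return data.decode("utf-8", errors="replace")
```

The error messages name both the problem and a likely correction, which gives the calling agent something concrete to retry with.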


@@ -27,6 +27,7 @@ Agent: Runs full pipeline for issue #42 with Gitea logging
These agents are invoked automatically by `/pipeline` or manually via `@mention`:
### Core Development
| Agent | Role | When Invoked |
|-------|------|--------------|
| `@requirement-refiner` | Converts ideas to User Stories | Issue status: new |
@@ -35,15 +36,33 @@ These agents are invoked automatically by `/pipeline` or manually via `@mention`
| `@sdet-engineer` | Writes tests (TDD) | Status: designed |
| `@lead-developer` | Implements code | Status: testing (tests fail) |
| `@frontend-developer` | UI implementation | When UI work needed |
| `@backend-developer` | Node.js/Express/APIs | When backend needed |
### Quality Assurance
| Agent | Role | When Invoked |
|-------|------|--------------|
| `@code-skeptic` | Adversarial review | Status: implementing |
| `@the-fixer` | Fixes issues | When review fails |
| `@performance-engineer` | Performance review | After code-skeptic |
| `@security-auditor` | Security audit | After performance |
| `@visual-tester` | Visual regression | When UI changes |
### Cognitive Enhancement (New)
| Agent | Role | When Invoked |
|-------|------|--------------|
| `@planner` | Task decomposition (CoT/ToT) | Complex tasks |
| `@reflector` | Self-reflection (Reflexion) | After each agent |
| `@memory-manager` | Memory systems | Context management |
### Meta & Process
| Agent | Role | When Invoked |
|-------|------|--------------|
| `@release-manager` | Git operations | Status: releasing |
| `@evaluator` | Scores effectiveness | Status: evaluated |
| `@prompt-optimizer` | Improves prompts | When score < 7 |
| `@capability-analyst` | Analyzes task coverage | When starting new task |
| `@agent-architect` | Creates new agents | When gaps identified |
| `@workflow-architect` | Creates workflows | New workflow needed |
| `@markdown-validator` | Validates Markdown | Before issue creation |
## Workflow State Machine