feat: add cognitive enhancement agents based on research

Based on Anthropic 'Building Effective Agents' and Lilian Weng's research:

New Agents:
- @planner: Task decomposition using CoT, ToT, Plan-Execute-Reflect
- @reflector: Self-reflection using Reflexion pattern
- @memory-manager: Memory systems (short/long/episodic)

New Skills:
- memory-systems: Memory architecture for autonomous agents
- planning-patterns: CoT, ToT, ReAct, Reflexion patterns
- tool-use: ACI design principles from Anthropic

New Rules:
- agent-patterns: Core patterns from research

Updated AGENTS.md with new agent categories:
- Cognitive Enhancement: planner, reflector, memory-manager
- Improved workflow state machine with reflection loop

Related: Issue #25 (Research Milestone)
2026-04-05 02:01:05 +01:00
parent 7a825a4cb2
commit 774dc9ac40
8 changed files with 411 additions and 0 deletions
--- a/.kilo/agents/memory-manager.md
+++ b/.kilo/agents/memory-manager.md
@@ -0,0 +1,55 @@
+---
+description: Manages agent memory systems - short-term (context), long-term (vector store), and episodic (experiences)
+mode: subagent
+model: ollama-cloud/gpt-oss:120b
+color: "#8B5CF6"
+permission:
+  read: allow
+  write: allow
+  glob: allow
+  grep: allow
+  task:
+    "*": deny
+---
+
+# Kilo Code: Memory Manager
+
+## Role Definition
+
+You are **Memory Manager** — responsible for managing all memory systems. Based on Lilian Weng's agent architecture research.
+
+## Memory Types
+
+### 1. Short-Term Memory (Context Window)
+- Bounded by the model's context window (~4000 tokens on older models, much larger on newer ones)
+- In-context learning happens here
+- Managed via sliding window or importance filtering
+
+### 2. Long-Term Memory (Vector Store)
+- External storage with effectively unbounded capacity
+- Retrieved via approximate nearest-neighbor search, typically MIPS (Maximum Inner Product Search)
+- Common ANN algorithms and libraries: HNSW, FAISS, ScaNN, LSH
+
+### 3. Episodic Memory (Experience Log)
+- Records of past experiences
+- Includes outcomes and lessons learned
+- Used for reflection and improvement
+
+## Retrieval Scoring
+
+```
+relevance = 0.5 * semantic_similarity + 
+            0.3 * recency_score + 
+            0.2 * importance_score
+```
+
+## Operations
+
+- **Store**: Add memory to appropriate system
+- **Retrieve**: Get relevant memories by query
+- **Consolidate**: Move important short-term to long-term
+- **Forget**: Remove or decay unimportant memories
+
+## Integration
+
+Works with Planner, Reflector, and Orchestrator to provide context-aware memory.
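Note on `memory-manager.md`: the weighted retrieval score in its Retrieval Scoring section could be sketched in Python roughly as below. Only the 0.5 / 0.3 / 0.2 weights come from the file. The `Memory` dataclass, the `[0, 1]` value ranges, and the exponential recency decay with a 24-hour half-life are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # semantic similarity to the query, assumed in [0, 1]
    age_hours: float   # hours since the memory was last accessed
    importance: float  # assumed in [0, 1], e.g. from an LLM rating


def recency_score(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def retrieval_score(m: Memory) -> float:
    """Weighted sum using the 0.5 / 0.3 / 0.2 weights from the file."""
    return (0.5 * m.similarity
            + 0.3 * recency_score(m.age_hours)
            + 0.2 * m.importance)


def retrieve(memories: list[Memory], top_k: int = 2) -> list[Memory]:
    """Return the top_k memories, highest score first."""
    return sorted(memories, key=retrieval_score, reverse=True)[:top_k]
```

Under these assumptions a fresh, maximally similar and important memory scores 1.0, and the score falls as the memory ages even if similarity is unchanged.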
--- a/.kilo/agents/planner.md
+++ b/.kilo/agents/planner.md
@@ -0,0 +1,55 @@
+---
+description: Advanced task planner using Chain of Thought, Tree of Thoughts, and Plan-Execute-Reflect
+mode: subagent
+model: ollama-cloud/gpt-oss:120b
+color: "#F59E0B"
+permission:
+  read: allow
+  write: allow
+  glob: allow
+  grep: allow
+  task:
+    "*": deny
+---
+
+# Kilo Code: Planner
+
+## Role Definition
+
+You are **Planner** — the strategic thinker who decomposes complex tasks using advanced reasoning.
+
+## Planning Strategies
+
+### 1. Chain of Thought (CoT)
+Step-by-step reasoning for complex tasks.
+
+### 2. Tree of Thoughts (ToT)
+Explore multiple solution paths when alternatives matter.
+
+### 3. Plan-Execute-Reflect
+Iterative execution with reflection between steps.
+
+## Task Decomposition
+
+- **By Dependency**: Sequential tasks with prerequisites
+- **By Complexity**: Phase-based (analysis, design, implementation)
+- **By Parallelization**: Group independent tasks
+
+## Output Format
+
+```markdown
+## Plan: {task_name}
+
+### Strategy: {strategy_name}
+
+### Steps
+| Step | Task | Dependencies | Risk |
+|------|------|--------------|------|
+| 1 | {task} | None | {risk} |
+
+### Success Criteria
+- [ ] {criterion}
+
+### Rollback Plan
+If {failure}: {rollback_action}
+```
--- a/.kilo/agents/reflector.md
+++ b/.kilo/agents/reflector.md
@@ -0,0 +1,44 @@
+---
+description: Self-reflection agent using Reflexion pattern - learns from mistakes
+mode: subagent
+model: ollama-cloud/gpt-oss:120b
+color: "#10B981"
+permission:
+  read: allow
+  grep: allow
+  glob: allow
+  task:
+    "*": deny
+---
+
+# Kilo Code: Reflector
+
+## Role Definition
+
+You are **Reflector** — the self-improvement specialist using Reflexion pattern (Shinn & Labash 2023).
+
+## Reflexion Framework
+
+```
+Action -> Heuristic -> Reflection -> Memory Update -> Next Action
+```
+
+## Heuristic Functions
+
+- **Inefficient planning**: Trajectory takes too many steps without success
+- **Hallucination**: Consecutive identical actions that yield the same observation
+- **Failure**: Action or task ends with an unsuccessful result
+
+## Reflection Process
+
+1. **Trajectory Analysis**: Analyze action sequence
+2. **Mistake Identification**: Find failed actions
+3. **Lesson Extraction**: Generalize fix patterns
+4. **Memory Update**: Store for future use
+
+## Integration
+
+Called after each agent in the pipeline:
+- After Lead Developer: Analyze implementation
+- After Code Skeptic: Analyze review patterns
+- After The Fixer: Analyze fix patterns
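Note on `reflector.md`: the three heuristic checks it lists could be approximated as below. This is a sketch under stated assumptions, not the agent's actual logic. The `Step` record, the `max_steps` and `max_repeats` thresholds, and treating a "repeat" as the same action producing the same observation are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    observation: str
    success: bool = True


def check_heuristics(trajectory: list[Step],
                     max_steps: int = 10,
                     max_repeats: int = 3) -> list[str]:
    """Return the heuristic flags that a trajectory triggers."""
    flags = []
    # Inefficient planning: the trajectory is longer than the step budget.
    if len(trajectory) > max_steps:
        flags.append("inefficient_planning")
    # Hallucination: a run of consecutive identical (action, observation) pairs.
    run = 1
    for prev, cur in zip(trajectory, trajectory[1:]):
        same = (cur.action, cur.observation) == (prev.action, prev.observation)
        run = run + 1 if same else 1
        if run >= max_repeats:
            flags.append("hallucination")
            break
    # Failure: the final step did not succeed.
    if trajectory and not trajectory[-1].success:
        flags.append("failure")
    return flags
```

A non-empty result would be the trigger for generating a reflection and storing it before the next attempt.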
--- a/.kilo/rules/agent-patterns.md
+++ b/.kilo/rules/agent-patterns.md
@@ -0,0 +1,84 @@
+# Agent Patterns Rules
+
+Based on research from Anthropic, OpenAI, and Lilian Weng.
+
+## Core Patterns (Anthropic)
+
+### 1. Prompt Chaining
+Sequential steps with validation gates.
+```yaml
+when: Task can be cleanly decomposed
+example: Generate copy, then translate
+gate: Validate each step before next
+```
+
+### 2. Routing
+Classify input, route to specialized agent.
+```yaml
+when: Distinct categories, clear classification
+example: Customer service routing (refunds, technical, general)
+```
+
+### 3. Parallelization
+Run independent tasks simultaneously.
+```yaml
+when: Subtasks are independent
+types:
+  - Sectioning: Break into parallel parts
+  - Voting: Multiple attempts, aggregate results
+```
+
+### 4. Orchestrator-Workers
+Central controller delegates to workers.
+```yaml
+when: Subtasks dynamic, not pre-defined
+example: Coding agent editing multiple files
+```
+
+### 5. Evaluator-Optimizer
+Loop: generate, evaluate, improve.
+```yaml
+when: Clear evaluation criteria; iterative refinement adds measurable value
+example: Code review loop
+```
+
+## Agent Architecture (Lilian Weng)
+
+### Components
+- **Planning**: Task decomposition, self-reflection
+- **Memory**: Short-term, long-term, episodic
+- **Tool Use**: External APIs, code execution
+
+### Memory Types
+1. **Sensory**: Embeddings (milliseconds)
+2. **Short-term**: Context window (~4000 tokens)
+3. **Long-term**: Vector store (effectively unbounded)
+4. **Episodic**: Experience log
+
+## Tool Use Best Practices (Anthropic)
+
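+1. Give the model space to "think" before it commits to output
+2. Keep formats close to what the model has seen in text on the internet
+3. Minimize formatting overhead (e.g. line counts, string escaping)
+4. Invest as much effort in the agent-computer interface (ACI) as in human-computer interfaces (HCI)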
+
+## ReAct Pattern
+
+Interleave reasoning and action:
+```
+Thought: [reasoning]
+Action: [tool call]
+Observation: [result]
+(Repeat until done)
+```
+
+## Reflexion Pattern
+
+Learn from mistakes:
+```
+1. Take action
+2. Check heuristic
+3. Generate reflection
+4. Update memory
+5. Retry with lesson
+```
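Note on `agent-patterns.md`: the Thought / Action / Observation loop from its ReAct section might look like the sketch below. The `TOOLS` registry, the `finish` pseudo-tool, and `scripted_policy` (a stand-in for a real model call) are hypothetical names introduced only to keep the example self-contained.

```python
from typing import Callable

# Hypothetical tool registry; a real agent would register its own tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split())),
}

Policy = Callable[[list[str]], tuple[str, str, str]]


def scripted_policy(history: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, tool_name, tool_input)."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I need to add the numbers.", "add", "2 3")
    result = history[-1].removeprefix("Observation: ")
    return ("I have the result.", "finish", result)


def react(policy: Policy, max_turns: int = 5) -> tuple[str, list[str]]:
    """Interleave reasoning and tool calls until the policy finishes."""
    history: list[str] = []
    for _ in range(max_turns):
        thought, tool, arg = policy(history)
        history.append(f"Thought: {thought}")
        if tool == "finish":
            return arg, history
        history.append(f"Action: {tool}({arg})")
        history.append(f"Observation: {TOOLS[tool](arg)}")
    raise RuntimeError("no answer within max_turns")


answer, trace = react(scripted_policy)
print("\n".join(trace))
```

The `max_turns` cap is a design choice in this sketch: it bounds the "repeat until done" step so a policy that never finishes cannot loop forever.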
--- a/.kilo/skills/memory-systems/SKILL.md
+++ b/.kilo/skills/memory-systems/SKILL.md
@@ -0,0 +1,43 @@
+# Memory Systems for Autonomous Agents
+
+Based on Lilian Weng's "LLM Powered Autonomous Agents" research.
+
+## Memory Types
+
+### 1. Sensory Memory (Embeddings)
+- Raw input processing (ms to seconds)
+- Embedding: CLIP (multimodal), text-embedding-ada-002 (text)
+
+### 2. Short-Term Memory (Working Memory)
+- In-context learning, context window limited
+- Human analogue: Miller's Law (7 ± 2 items in working memory)
+- Strategies: sliding window, importance-weighted, attention-based
+
+### 3. Long-Term Memory (Vector Store)
+- External storage, effectively unbounded capacity
+- ANN algorithms and libraries for MIPS: HNSW, FAISS, ScaNN, LSH
+
+### 4. Episodic Memory
+- Experience records with outcomes
+- Used for reflection and learning
+
+## Retrieval Formula
+
+```
+score = 0.5 * relevance + 0.3 * recency + 0.2 * importance
+```
+
+## Operations
+
+- **Store**: Add to appropriate system
+- **Retrieve**: Query with composite scoring
+- **Consolidate**: Move short-term to long-term
+- **Forget**: Decay or explicit deletion
+
+## Best Practices
+
+1. Regular consolidation
+2. LLM-generated importance scores
+3. Decay schedule for forgetting
+4. Episode summaries/reflections
+5. Mixed retrieval sources
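Note on `memory-systems/SKILL.md`: its Consolidate and Forget operations could be sketched as below. The skill file does not specify thresholds or a decay schedule, so the `threshold`, `decay`, and `floor` parameters and the `Item` / `MemoryStore` classes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    text: str
    importance: float  # assumed in [0, 1]


@dataclass
class MemoryStore:
    short_term: list[Item] = field(default_factory=list)
    long_term: list[Item] = field(default_factory=list)

    def store(self, item: Item) -> None:
        """New memories always enter short-term memory first."""
        self.short_term.append(item)

    def consolidate(self, threshold: float = 0.7) -> None:
        """Move short-term items at or above the threshold to long-term."""
        keep = []
        for item in self.short_term:
            target = self.long_term if item.importance >= threshold else keep
            target.append(item)
        self.short_term = keep

    def forget(self, decay: float = 0.9, floor: float = 0.1) -> None:
        """Decay long-term importance, then drop items below the floor."""
        for item in self.long_term:
            item.importance *= decay
        self.long_term = [i for i in self.long_term if i.importance >= floor]
```

Calling `forget` on a schedule gives the decay behaviour from the Best Practices list, and `consolidate` is the step that would run at regular intervals.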
--- a/.kilo/skills/planning-patterns/SKILL.md
+++ b/.kilo/skills/planning-patterns/SKILL.md
@@ -0,0 +1,55 @@
+# Planning Patterns for Autonomous Agents
+
+Based on Anthropic's "Building Effective Agents" and Lilian Weng's research.
+
+## Core Patterns
+
+### 1. Chain of Thought (CoT)
+Sequential reasoning for decomposition.
+- Use when: Task benefits from step-by-step
+- Trade-off: Latency for accuracy
+
+### 2. Tree of Thoughts (ToT)
+Explore multiple solution paths.
+- Use when: Alternatives matter
+- Trade-off: Computation for quality
+
+### 3. Plan-Execute-Reflect
+Iterative improvement loops.
+- Use when: Feedback available
+- Trade-off: Iterations for quality
+
+### 4. ReAct Pattern
+Interleave reasoning and action.
+```
+Thought: ...
+Action: ...
+Observation: ...
+(Repeat)
+```
+
+### 5. Reflexion Pattern
+Learn from mistakes dynamically.
+```
+Action -> Heuristic -> Reflection -> Memory -> Retry
+```
+
+## Task Decomposition Methods
+
+### By Dependency
+- Sequential with prerequisites
+- Clear execution order
+
+### By Complexity
+- Phases: Analysis, Design, Implementation, Test
+- Progressive refinement
+
+### By Parallelization
+- Independent tasks grouped
+- Maximize throughput
+
+## Integration
+
+Planner uses these patterns based on task characteristics.
+Orchestrator routes subtasks to appropriate agents.
+Reflector analyzes outcomes and stores lessons.
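Note on `planning-patterns/SKILL.md`: decomposition by dependency and by parallelization can be combined by topologically sorting the task graph into batches, where every task in a batch is independent of the others. The sketch below uses the standard-library `graphlib` module. The function name `execution_batches` and the example task names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into ordered batches; tasks in one batch can run in parallel.

    `deps` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


plan = execution_batches({
    "design": {"analysis"},
    "implement": {"design"},
    "write_tests": {"design"},
    "verify": {"implement", "write_tests"},
})
print(plan)
```

Here `implement` and `write_tests` land in the same batch because both depend only on `design`.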
--- a/.kilo/skills/tool-use/SKILL.md
+++ b/.kilo/skills/tool-use/SKILL.md
@@ -0,0 +1,56 @@
+# Tool Use for Autonomous Agents
+
+Based on the "Prompt engineering your tools" appendix of Anthropic's "Building Effective Agents".
+
+## Tool Design Principles
+
+### 1. Give Model "Think" Space
+- Allow tokens before writing
+- Don't constrain output prematurely
+
+### 2. Natural Format
+- Keep formats close to what appears naturally in text on the internet
+- Avoid complex JSON escaping
+- Use markdown for code
+
+### 3. Minimize Overhead
+- Don't make the model keep accurate line counts
+- Don't make the model count tokens
+- Simple is better
+
+## Tool Categories
+
+### File Operations
+- `read`: Read files
+- `write`: Create/overwrite files
+- `edit`: Make precise edits
+- `glob`: Find files
+- `grep`: Search content
+
+### Execution
+- `bash`: Run commands
+- `task`: Delegate to subagents
+
+### Web & API
+- `webfetch`: Retrieve web content
+- `curl`: API calls
+
+### Knowledge
+- `codebase_search`: Semantic search
+- `question`: Ask user for clarification
+
+## Tool Documentation
+
+From Anthropic's research: invest as much effort in the agent-computer interface (ACI) as in the human-computer interface (HCI). Good tool documentation covers:
+- Clear descriptions
+- Example usage
+- Edge cases
+- Input format requirements
+- Clear boundaries between tools
+
+## Poka-Yoke Techniques
+
+- Use absolute paths (not relative)
+- Clear error messages
+- Validation before execution
+- Safe defaults
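Note on `tool-use/SKILL.md`: the first three Poka-Yoke techniques (absolute paths, clear error messages, validation before execution) could be enforced by a guard like the one below, run before any file tool executes. The function name `validate_path` and the idea of confining paths to a workspace root are assumptions, not part of the skill file.

```python
from pathlib import Path


def validate_path(path: str, workspace: str) -> Path:
    """Reject relative or out-of-workspace paths before a tool runs."""
    candidate = Path(path)
    if not candidate.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    root = Path(workspace).resolve()
    resolved = candidate.resolve()  # also collapses any ".." segments
    if not resolved.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"{resolved} is outside the workspace {root}")
    return resolved
```

Raising early with a message that names the offending value gives the model something concrete to correct on its next attempt.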
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -27,6 +27,7 @@ Agent: Runs full pipeline for issue #42 with Gitea logging

 These agents are invoked automatically by `/pipeline` or manually via `@mention`:

+### Core Development
 | Agent | Role | When Invoked |
 |-------|------|--------------|
 | `@requirement-refiner` | Converts ideas to User Stories | Issue status: new |
@@ -35,15 +36,33 @@ These agents are invoked automatically by `/pipeline` or manually via `@mention`
 | `@sdet-engineer` | Writes tests (TDD) | Status: designed |
 | `@lead-developer` | Implements code | Status: testing (tests fail) |
 | `@frontend-developer` | UI implementation | When UI work needed |
+| `@backend-developer` | Node.js/Express/APIs | When backend needed |
+
+### Quality Assurance
+| Agent | Role | When Invoked |
+|-------|------|--------------|
 | `@code-skeptic` | Adversarial review | Status: implementing |
 | `@the-fixer` | Fixes issues | When review fails |
 | `@performance-engineer` | Performance review | After code-skeptic |
 | `@security-auditor` | Security audit | After performance |
+| `@visual-tester` | Visual regression | When UI changes |
+
+### Cognitive Enhancement (New)
+| Agent | Role | When Invoked |
+|-------|------|--------------|
+| `@planner` | Task decomposition (CoT/ToT) | Complex tasks |
+| `@reflector` | Self-reflection (Reflexion) | After each agent |
+| `@memory-manager` | Memory systems | Context management |
+
+### Meta & Process
+| Agent | Role | When Invoked |
+|-------|------|--------------|
 | `@release-manager` | Git operations | Status: releasing |
 | `@evaluator` | Scores effectiveness | Status: evaluated |
 | `@prompt-optimizer` | Improves prompts | When score < 7 |
 | `@capability-analyst` | Analyzes task coverage | When starting new task |
 | `@agent-architect` | Creates new agents | When gaps identified |
+| `@workflow-architect` | Creates workflows | New workflow needed |
 | `@markdown-validator` | Validates Markdown | Before issue creation |

 ## Workflow State Machine