feat: add cognitive enhancement agents based on research

Based on Anthropic 'Building Effective Agents' and Lilian Weng's research:

New Agents:
- @planner: Task decomposition using CoT, ToT, Plan-Execute-Reflect
- @reflector: Self-reflection using Reflexion pattern
- @memory-manager: Memory systems (short/long/episodic)

New Skills:
- memory-systems: Memory architecture for autonomous agents
- planning-patterns: CoT, ToT, ReAct, Reflexion patterns
- tool-use: ACI design principles from Anthropic

New Rules:
- agent-patterns: Core patterns from research

Updated AGENTS.md with new agent categories:
- Cognitive Enhancement: planner, reflector, memory-manager
- Improved workflow state machine with reflection loop

Related: Issue #25 (Research Milestone)
2026-04-05 02:01:05 +01:00
parent 7a825a4cb2
commit 774dc9ac40
8 changed files with 411 additions and 0 deletions
--- a/.kilo/agents/memory-manager.md
+++ b/.kilo/agents/memory-manager.md
@@ -0,0 +1,55 @@
 ---
 description: Manages agent memory systems - short-term (context), long-term (vector store), and episodic (experiences)
 mode: subagent
 model: ollama-cloud/gpt-oss:120b
 color: "#8B5CF6"
 permission:
  read: allow
  write: allow
  glob: allow
  grep: allow
  task:
    "*": deny
 ---
 # Kilo Code: Memory Manager
 ## Role Definition
 You are **Memory Manager** — responsible for managing all memory systems. Based on Lilian Weng's agent architecture research.
 ## Memory Types
 ### 1. Short-Term Memory (Context Window)
 - Bounded by the model's context window (roughly 4000 tokens on older models, considerably more on newer ones)
 - In-context learning happens here
 - Managed via sliding window or importance filtering
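The sliding-window strategy above can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: `SlidingWindow`, `count_tokens`, and the whitespace-based token count are assumptions made for the example, and a real system would use the model's tokenizer.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class SlidingWindow:
    """Keeps the most recent messages that fit within a token budget (hypothetical helper)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages: deque[str] = deque()
        self.used = 0

    def add(self, message: str) -> None:
        self.messages.append(message)
        self.used += count_tokens(message)
        # Evict the oldest messages until the budget is respected,
        # but always keep the newest message.
        while self.used > self.max_tokens and len(self.messages) > 1:
            self.used -= count_tokens(self.messages.popleft())


window = SlidingWindow(max_tokens=6)
for msg in ["read the config file", "run the tests", "fix the failing test"]:
    window.add(msg)
print(list(window.messages))
```

Importance filtering would replace the "evict oldest" rule with "evict lowest-scoring", at the cost of needing a score per message.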
 ### 2. Long-Term Memory (Vector Store)
 - External storage with effectively unbounded capacity
 - Retrieved via MIPS (Maximum Inner Product Search), usually approximated for speed
 - Approximate nearest-neighbor algorithms and libraries: LSH, HNSW, FAISS, ScaNN
 ### 3. Episodic Memory (Experience Log)
 - Records of past experiences
 - Includes outcomes and lessons learned
 - Used for reflection and improvement
 ## Retrieval Scoring
 ```
 relevance = 0.5 * semantic_similarity + 
            0.3 * recency_score + 
            0.2 * importance_score
 ```
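A minimal sketch of the scoring formula above. The weights come from the formula; the `Memory` fields, the exponential half-life used for `recency_score`, and the sample data are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # semantic similarity to the query, in [0, 1]
    age_hours: float   # time since the memory was last accessed
    importance: float  # importance score, in [0, 1]


def recency_score(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a fresh memory, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def relevance(m: Memory) -> float:
    """Weighted sum from the formula above."""
    return 0.5 * m.similarity + 0.3 * recency_score(m.age_hours) + 0.2 * m.importance


memories = [
    Memory("old but on-topic", similarity=0.9, age_hours=240.0, importance=0.2),
    Memory("fresh and important", similarity=0.6, age_hours=0.0, importance=0.9),
]
ranked = sorted(memories, key=relevance, reverse=True)
print([m.text for m in ranked])
```

With these weights a fresh, important memory can outrank a more similar but stale one, which is the intended effect of mixing the three signals.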
 ## Operations
 - **Store**: Add memory to appropriate system
 - **Retrieve**: Get relevant memories by query
 - **Consolidate**: Move important short-term to long-term
 - **Forget**: Remove or decay unimportant memories
 ## Integration
 Works with Planner, Reflector, and Orchestrator to provide context-aware memory.
--- a/.kilo/agents/planner.md
+++ b/.kilo/agents/planner.md
@@ -0,0 +1,55 @@
 ---
 description: Advanced task planner using Chain of Thought, Tree of Thoughts, and Plan-Execute-Reflect
 mode: subagent
 model: ollama-cloud/gpt-oss:120b
 color: "#F59E0B"
 permission:
  read: allow
  write: allow
  glob: allow
  grep: allow
  task:
    "*": deny
 ---
 # Kilo Code: Planner
 ## Role Definition
 You are **Planner** — the strategic thinker who decomposes complex tasks using advanced reasoning.
 ## Planning Strategies
 ### 1. Chain of Thought (CoT)
 Step-by-step reasoning for complex tasks.
 ### 2. Tree of Thoughts (ToT)
 Explore multiple solution paths when alternatives matter.
 ### 3. Plan-Execute-Reflect
 Iterative execution with reflection between steps.
 ## Task Decomposition
 - **By Dependency**: Sequential tasks with prerequisites
 - **By Complexity**: Phase-based (analysis, design, implementation)
 - **By Parallelization**: Group independent tasks
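Decomposition by dependency and by parallelization can be combined: order tasks by their prerequisites, then group the tasks that become ready at the same time. The sketch below uses the standard library's `graphlib`; the function name `parallel_batches` and the task graph are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "design": {"analysis"},
    "backend": {"design"},
    "frontend": {"design"},
    "test": {"backend", "frontend"},
}
print(parallel_batches(tasks))
```

Each inner list is a group of independent tasks that could be dispatched together; the outer list gives the sequential order.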
 ## Output Format
 ```markdown
 ## Plan: {task_name}
 ### Strategy: {strategy_name}
 ### Steps
 | Step | Task | Dependencies | Risk |
 |------|------|--------------|------|
 | 1 | {task} | None | {risk} |
 ### Success Criteria
 - [ ] {criterion}
 ### Rollback Plan
 If {failure}: {rollback_action}
 ```
--- a/.kilo/agents/reflector.md
+++ b/.kilo/agents/reflector.md
@@ -0,0 +1,44 @@
 ---
 description: Self-reflection agent using Reflexion pattern - learns from mistakes
 mode: subagent
 model: ollama-cloud/gpt-oss:120b
 color: "#10B981"
 permission:
  read: allow
  grep: allow
  glob: allow
  task:
    "*": deny
 ---
 # Kilo Code: Reflector
 ## Role Definition
 You are **Reflector** — the self-improvement specialist using Reflexion pattern (Shinn & Labash 2023).
 ## Reflexion Framework
 ```
 Action -> Heuristic -> Reflection -> Memory Update -> Next Action
 ```
 ## Heuristic Functions
 - **Inefficient planning**: Trajectory takes too many steps without success
 - **Hallucination**: Consecutive identical actions that lead to the same observation
 - **Failure**: Unsuccessful result
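The heuristics above can be sketched as simple checks over a trajectory of `(action, observation)` pairs. The function names and the thresholds (`max_repeats`, `max_steps`) are assumptions chosen for the example, not values from the Reflexion paper.

```python
Step = tuple[str, str]  # (action, observation)


def is_hallucinating(trajectory: list[Step], max_repeats: int = 3) -> bool:
    """True if the last `max_repeats` steps are identical action/observation pairs."""
    if len(trajectory) < max_repeats:
        return False
    tail = trajectory[-max_repeats:]
    return all(step == tail[0] for step in tail)


def is_inefficient(trajectory: list[Step], succeeded: bool, max_steps: int = 10) -> bool:
    """True if the trajectory ran long without reaching success."""
    return not succeeded and len(trajectory) > max_steps


def should_reflect(trajectory: list[Step], succeeded: bool) -> bool:
    """Trigger reflection on failure, a stuck loop, or an overlong trajectory."""
    return (
        not succeeded
        or is_hallucinating(trajectory)
        or is_inefficient(trajectory, succeeded)
    )


stuck = [("open door", "door is locked")] * 3
print(is_hallucinating(stuck))
```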
 ## Reflection Process
 1. **Trajectory Analysis**: Analyze action sequence
 2. **Mistake Identification**: Find failed actions
 3. **Lesson Extraction**: Generalize fix patterns
 4. **Memory Update**: Store for future use
 ## Integration
 Called after each agent in pipeline:
 - After Lead Developer: Analyze implementation
 - After Code Skeptic: Analyze review patterns
 - After The Fixer: Analyze fix patterns
--- a/.kilo/rules/agent-patterns.md
+++ b/.kilo/rules/agent-patterns.md
@@ -0,0 +1,84 @@
 # Agent Patterns Rules
 Based on research from Anthropic, OpenAI, and Lilian Weng.
 ## Core Patterns (Anthropic)
 ### 1. Prompt Chaining
 Sequential steps with validation gates.
 ```yaml
 when: Task can be cleanly decomposed
 example: Generate copy, then translate
 gate: Validate each step before next
 ```
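A minimal sketch of prompt chaining with a gate after every step. The step functions stand in for LLM calls and are purely illustrative (the "translation" just upper-cases the text); `run_chain` is a hypothetical helper.

```python
from typing import Callable

Step = Callable[[str], str]
Gate = Callable[[str], bool]


def run_chain(text: str, steps: list[tuple[Step, Gate]]) -> str:
    """Run steps in order; stop as soon as a gate rejects an intermediate result."""
    for i, (step, gate) in enumerate(steps, start=1):
        text = step(text)
        if not gate(text):
            raise ValueError(f"gate failed after step {i}")
    return text


def generate_copy(topic: str) -> str:
    return f"Buy {topic} today"  # stand-in for an LLM call


def translate(copy: str) -> str:
    return copy.upper()  # stand-in for a translation call


result = run_chain(
    "coffee",
    [
        (generate_copy, lambda s: len(s) < 80),  # gate: copy is short enough
        (translate, lambda s: s.isupper()),      # gate: "translation" was applied
    ],
)
print(result)
```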
 ### 2. Routing
 Classify input, route to specialized agent.
 ```yaml
 when: Distinct categories, clear classification
 example: Customer service routing (refunds, technical, general)
 ```
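Routing reduces to a classifier plus a dispatch table. In this sketch a keyword match stands in for an LLM classifier, and the handler names are hypothetical.

```python
from typing import Callable


def classify(message: str) -> str:
    """Keyword stand-in for an LLM classifier."""
    text = message.lower()
    if "refund" in text:
        return "refunds"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


# Each category maps to a specialized handler (here, trivial placeholders).
HANDLERS: dict[str, Callable[[str], str]] = {
    "refunds": lambda m: f"[refunds agent] {m}",
    "technical": lambda m: f"[technical agent] {m}",
    "general": lambda m: f"[general agent] {m}",
}


def route(message: str) -> str:
    return HANDLERS[classify(message)](message)


print(route("The app shows an error on startup"))
```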
 ### 3. Parallelization
 Run independent tasks simultaneously.
 ```yaml
 when: Subtasks are independent
 types:
  - Sectioning: Break into parallel parts
  - Voting: Multiple attempts, aggregate results
 ```
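The voting variant can be sketched with a thread pool and a majority count. The reviewers below are trivial string checks standing in for independent LLM calls; `majority_vote` is a hypothetical helper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[str], bool]  # returns True when the code looks unsafe


def majority_vote(reviewers: list[Reviewer], code: str) -> bool:
    """Run all reviewers concurrently and return the most common verdict."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda review: review(code), reviewers))
    verdict, _count = Counter(verdicts).most_common(1)[0]
    return verdict


# Stand-in reviewers; an odd number avoids ties.
reviewers: list[Reviewer] = [
    lambda code: "eval(" in code,
    lambda code: "exec(" in code,
    lambda code: "input(" in code,
]
print(majority_vote(reviewers, "eval(input())"))
```

Sectioning uses the same pool but gives each worker a different part of the task and concatenates the results instead of counting them.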
 ### 4. Orchestrator-Workers
 Central controller delegates to workers.
 ```yaml
 when: Subtasks dynamic, not pre-defined
 example: Coding agent editing multiple files
 ```
 ### 5. Evaluator-Optimizer
 Loop: generate, evaluate, improve.
 ```yaml
 when: Clear evaluation criteria, iterative refinement measurably improves results
 example: Code review loop
 ```
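A minimal sketch of the generate, evaluate, improve loop. The evaluator and improver are toy stand-ins for LLM calls, and `max_rounds` is an assumed safety bound so the loop always terminates.

```python
from typing import Callable

Evaluate = Callable[[str], tuple[bool, str]]  # returns (accepted, feedback)
Improve = Callable[[str, str], str]           # takes (draft, feedback)


def optimize(draft: str, evaluate: Evaluate, improve: Improve, max_rounds: int = 3) -> str:
    """Refine the draft until the evaluator accepts it or the round budget runs out."""
    for _ in range(max_rounds):
        accepted, feedback = evaluate(draft)
        if accepted:
            return draft
        draft = improve(draft, feedback)
    return draft


def evaluate(draft: str) -> tuple[bool, str]:
    if not draft[0].isupper():
        return False, "capitalize"
    if not draft.endswith("."):
        return False, "add period"
    return True, ""


def improve(draft: str, feedback: str) -> str:
    if feedback == "capitalize":
        return draft[0].upper() + draft[1:]
    if feedback == "add period":
        return draft + "."
    return draft


print(optimize("fix login bug", evaluate, improve))
```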
 ## Agent Architecture (Lilian Weng)
 ### Components
 - **Planning**: Task decomposition, self-reflection
 - **Memory**: Short-term, long-term, episodic
 - **Tool Use**: External APIs, code execution
 ### Memory Types
 1. **Sensory**: Embedding representations of raw input
 2. **Short-term**: Context window (finite; ~4000 tokens on older models)
 3. **Long-term**: External vector store (effectively unbounded)
 4. **Episodic**: Experience log
 ## Tool Use Best Practices (Anthropic)
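 1. Give the model enough tokens to "think" before it commits to an output
 2. Keep formats close to what the model has seen in naturally occurring internet text
 3. Minimize formatting overhead (line counts, string escaping)
 4. Invest as much effort in the agent-computer interface (ACI) as in human-computer interfaces (HCI)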
 ## ReAct Pattern
 Interleave reasoning and action:
 ```
 Thought: [reasoning]
 Action: [tool call]
 Observation: [result]
 (Repeat until done)
 ```
 ## Reflexion Pattern
 Learn from mistakes:
 ```
 1. Take action
 2. Check heuristic
 3. Generate reflection
 4. Update memory
 5. Retry with lesson
 ```
--- a/.kilo/skills/memory-systems/SKILL.md
+++ b/.kilo/skills/memory-systems/SKILL.md
@@ -0,0 +1,43 @@
 # Memory Systems for Autonomous Agents
 Based on Lilian Weng's "LLM Powered Autonomous Agents" research.
 ## Memory Types
 ### 1. Sensory Memory (Embeddings)
 - Raw input processing (ms to seconds)
 - Embedding: CLIP (multimodal), text-embedding-ada-002 (text)
 ### 2. Short-Term Memory (Working Memory)
 - In-context learning, limited by the context window
 - Human analogy: Miller's Law (working memory holds about 7 ± 2 items)
 - Strategies: sliding window, importance-weighted, attention-based
 ### 3. Long-Term Memory (Vector Store)
 - External storage, effectively unbounded capacity
 - MIPS (Maximum Inner Product Search) algorithms and libraries: HNSW, FAISS, ScaNN, LSH
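For reference, exact MIPS is a brute-force scan over every stored vector; the algorithms listed above trade a little accuracy for much faster lookups. The sketch below shows only the exact baseline, with made-up three-dimensional embeddings and hypothetical names.

```python
def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mips(query: list[float], store: dict[str, list[float]], k: int = 1) -> list[str]:
    """Exact maximum inner product search: score every vector, keep the top k keys."""
    scored = sorted(
        store.items(),
        key=lambda item: inner_product(query, item[1]),
        reverse=True,
    )
    return [key for key, _vector in scored[:k]]


# Made-up embeddings for illustration; real ones come from an embedding model.
store = {
    "auth bug": [0.9, 0.1, 0.0],
    "css tweak": [0.0, 0.2, 0.9],
    "login fix": [0.8, 0.3, 0.1],
}
print(mips([1.0, 0.0, 0.0], store, k=2))
```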
 ### 4. Episodic Memory
 - Experience records with outcomes
 - Used for reflection and learning
 ## Retrieval Formula
 ```
 score = 0.5 * relevance + 0.3 * recency + 0.2 * importance
 ```
 ## Operations
 - **Store**: Add to appropriate system
 - **Retrieve**: Query with composite scoring
 - **Consolidate**: Move short-term to long-term
 - **Forget**: Decay or explicit deletion
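Consolidate and Forget can be sketched as two passes over plain lists. The `threshold`, `decay`, and `floor` values are assumptions for the example, as is the `Item` type; a real system would back `long_term` with a vector store.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    importance: float


def consolidate(short_term: list[Item], long_term: list[Item], threshold: float = 0.7) -> None:
    """Move items at or above the threshold from short-term to long-term memory."""
    keep = []
    for item in short_term:
        (long_term if item.importance >= threshold else keep).append(item)
    short_term[:] = keep


def forget(long_term: list[Item], decay: float = 0.5, floor: float = 0.2) -> None:
    """Decay every importance score, then drop items that fall below the floor."""
    for item in long_term:
        item.importance *= decay
    long_term[:] = [item for item in long_term if item.importance >= floor]


short_term = [Item("user prefers tabs", 0.9), Item("typo in log line", 0.1)]
long_term = [Item("stale note", 0.3)]
consolidate(short_term, long_term)
forget(long_term)
print([item.text for item in short_term], [item.text for item in long_term])
```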
 ## Best Practices
 1. Regular consolidation
 2. LLM-generated importance scores
 3. Decay schedule for forgetting
 4. Episode summaries/reflections
 5. Mixed retrieval sources
--- a/.kilo/skills/planning-patterns/SKILL.md
+++ b/.kilo/skills/planning-patterns/SKILL.md
@@ -0,0 +1,55 @@
 # Planning Patterns for Autonomous Agents
 Based on Anthropic's "Building Effective Agents" and Lilian Weng's research.
 ## Core Patterns
 ### 1. Chain of Thought (CoT)
 Sequential reasoning for decomposition.
 - Use when: Task benefits from step-by-step
 - Trade-off: Latency for accuracy
 ### 2. Tree of Thoughts (ToT)
 Explore multiple solution paths.
 - Use when: Alternatives matter
 - Trade-off: Computation for quality
 ### 3. Plan-Execute-Reflect
 Iterative improvement loops.
 - Use when: Feedback available
 - Trade-off: Iterations for quality
 ### 4. ReAct Pattern
 Interleave reasoning and action.
 ```
 Thought: ...
 Action: ...
 Observation: ...
 (Repeat)
 ```
 ### 5. Reflexion Pattern
 Learn from mistakes dynamically.
 ```
 Action -> Heuristic -> Reflection -> Memory -> Retry
 ```
 ## Task Decomposition Methods
 ### By Dependency
 - Sequential with prerequisites
 - Clear execution order
 ### By Complexity
 - Phases: Analysis, Design, Implementation, Test
 - Progressive refinement
 ### By Parallelization
 - Independent tasks grouped
 - Maximize throughput
 ## Integration
 Planner uses these patterns based on task characteristics.
 Orchestrator routes subtasks to appropriate agents.
 Reflector analyzes outcomes and stores lessons.
--- a/.kilo/skills/tool-use/SKILL.md
+++ b/.kilo/skills/tool-use/SKILL.md
@@ -0,0 +1,56 @@
 # Tool Use for Autonomous Agents
 Based on Appendix 2 ("Prompt engineering your tools") of Anthropic's "Building Effective Agents".
 ## Tool Design Principles
 ### 1. Give Model "Think" Space
 - Give the model enough tokens to reason before it starts writing
 - Don't constrain output prematurely
 ### 2. Natural Format
 - Keep formats close to what the model has seen in naturally occurring internet text
 - Avoid complex JSON escaping
 - Use markdown for code
 ### 3. Minimize Overhead
 - No line counting
 - No token counting
 - Simple is better
 ## Tool Categories
 ### File Operations
 - `read`: Read files
 - `write`: Create/overwrite files
 - `edit`: Make precise edits
 - `glob`: Find files
 - `grep`: Search content
 ### Execution
 - `bash`: Run commands
 - `task`: Delegate to subagents
 ### Web & API
 - `webfetch`: Retrieve web content
 - `curl`: API calls
 ### Knowledge
 - `codebase_search`: Semantic search
 - `question`: Ask user for clarification
 ## Tool Documentation
 From Anthropic's research: invest as much effort in the ACI (Agent-Computer Interface) as in the HCI (Human-Computer Interface). Good tool documentation includes:
 - Clear descriptions
 - Example usage
 - Edge cases
 - Input format requirements
 - Clear boundaries between tools
 ## Poka-Yoke Techniques
 - Use absolute paths (not relative)
 - Clear error messages
 - Validation before execution
 - Safe defaults
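A minimal sketch of the first three techniques applied to a file-reading tool: reject relative paths, validate before doing anything, and say how to fix the call in the error message. The function name `validate_path` and the `/workspace` root are hypothetical, and the example assumes POSIX-style paths.

```python
from pathlib import Path


def validate_path(path: str, workspace: str) -> Path:
    """Validate a tool's path argument before any file access happens."""
    candidate = Path(path)
    if not candidate.is_absolute():
        suggestion = Path(workspace) / path
        raise ValueError(f"path must be absolute, got {path!r}; did you mean {str(suggestion)!r}?")
    resolved = candidate.resolve()
    root = Path(workspace).resolve()
    if not resolved.is_relative_to(root):
        raise ValueError(f"path {path!r} is outside the workspace {str(root)!r}")
    return resolved


try:
    validate_path("src/app.py", "/workspace")
except ValueError as err:
    print(err)
```

Rejecting the relative path outright, rather than silently resolving it against the current directory, makes it hard for the model to make the mistake twice.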
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -27,6 +27,7 @@ Agent: Runs full pipeline for issue #42 with Gitea logging
 These agents are invoked automatically by `/pipeline` or manually via `@mention`:
 ### Core Development
 | Agent | Role | When Invoked |
 |-------|------|--------------|
 | `@requirement-refiner` | Converts ideas to User Stories | Issue status: new |
@@ -35,15 +36,33 @@ These agents are invoked automatically by `/pipeline` or manually via `@mention`
 | `@sdet-engineer` | Writes tests (TDD) | Status: designed |
 | `@lead-developer` | Implements code | Status: testing (tests fail) |
 | `@frontend-developer` | UI implementation | When UI work needed |
 | `@backend-developer` | Node.js/Express/APIs | When backend needed |
 ### Quality Assurance
 | Agent | Role | When Invoked |
 |-------|------|--------------|
 | `@code-skeptic` | Adversarial review | Status: implementing |
 | `@the-fixer` | Fixes issues | When review fails |
 | `@performance-engineer` | Performance review | After code-skeptic |
 | `@security-auditor` | Security audit | After performance |
 | `@visual-tester` | Visual regression | When UI changes |
 ### Cognitive Enhancement (New)
 | Agent | Role | When Invoked |
 |-------|------|--------------|
 | `@planner` | Task decomposition (CoT/ToT) | Complex tasks |
 | `@reflector` | Self-reflection (Reflexion) | After each agent |
 | `@memory-manager` | Memory systems | Context management |
 ### Meta & Process
 | Agent | Role | When Invoked |
 |-------|------|--------------|
 | `@release-manager` | Git operations | Status: releasing |
 | `@evaluator` | Scores effectiveness | Status: evaluated |
 | `@prompt-optimizer` | Improves prompts | When score < 7 |
 | `@capability-analyst` | Analyzes task coverage | When starting new task |
 | `@agent-architect` | Creates new agents | When gaps identified |
 | `@workflow-architect` | Creates workflows | New workflow needed |
 | `@markdown-validator` | Validates Markdown | Before issue creation |
 ## Workflow State Machine