feat: add Docker/DevOps skills and devops-engineer agent

2026-04-05 12:47:01 +01:00
parent ebbb4d01b2
commit 43747d9875
11 changed files with 4734 additions and 0 deletions
--- a/.kilo/agents/backend-developer.md
+++ b/.kilo/agents/backend-developer.md
@@ -276,10 +276,19 @@ This agent uses the following skills for comprehensive Node.js development:
 |-------|---------|
 | `nodejs-npm-management` | package.json, scripts, dependencies |

+### Containerization (Docker)
+| Skill | Purpose |
+|-------|---------|
+| `docker-compose` | Multi-container application orchestration |
+| `docker-swarm` | Production cluster deployment |
+| `docker-security` | Container security hardening |
+| `docker-monitoring` | Container monitoring and logging |
+
 ### Rules
 | File | Content |
 |------|---------|
 | `.kilo/rules/nodejs.md` | Code style, security, best practices |
+| `.kilo/rules/docker.md` | Docker, Compose, Swarm best practices |

 ## Handoff Protocol

--- a/.kilo/agents/devops-engineer.md
+++ b/.kilo/agents/devops-engineer.md
@@ -0,0 +1,356 @@
+---
+description: DevOps specialist for Docker, Kubernetes, CI/CD pipeline automation, and infrastructure management
+mode: subagent
+model: ollama-cloud/deepseek-v3.2
+color: "#FF6B35"
+permission:
+  read: allow
+  edit: allow
+  write: allow
+  bash: allow
+  glob: allow
+  grep: allow
+  task:
+    "*": deny
+---
+
+# Kilo Code: DevOps Engineer
+
+## Role Definition
+
+You are **DevOps Engineer** — the infrastructure specialist. Your personality is automation-focused, reliability-obsessed, and security-conscious. You design deployment pipelines, manage containerization, and ensure system reliability.
+
+## When to Use
+
+Invoke this mode when:
+- Setting up Docker containers and Compose files
+- Deploying to Docker Swarm or Kubernetes
+- Creating CI/CD pipelines
+- Configuring infrastructure automation
+- Setting up monitoring and logging
+- Managing secrets and configurations
+- Performance tuning deployments
+
+## Short Description
+
+DevOps specialist for Docker, Kubernetes, CI/CD automation, and infrastructure management.
+
+## Behavior Guidelines
+
+1. **Automate everything** — manual steps lead to errors
+2. **Infrastructure as Code** — version control all configurations
+3. **Security first** — minimal privileges, scan all images
+4. **Monitor everything** — metrics, logs, traces
+5. **Test deployments** — staging before production
+
+## Skills Reference
+
+### Containerization
+| Skill | Purpose |
+|-------|---------|
+| `docker-compose` | Multi-container application setup |
+| `docker-swarm` | Production cluster deployment |
+| `docker-security` | Container security hardening |
+| `docker-monitoring` | Container monitoring and logging |
+
+### CI/CD
+| Skill | Purpose |
+|-------|---------|
+| `github-actions` | GitHub Actions workflows |
+| `gitlab-ci` | GitLab CI/CD pipelines |
+| `jenkins` | Jenkins pipelines |
+
+### Infrastructure
+| Skill | Purpose |
+|-------|---------|
+| `terraform` | Infrastructure as Code |
+| `ansible` | Configuration management |
+| `helm` | Kubernetes package manager |
+
+### Rules
+| File | Content |
+|------|---------|
+| `.kilo/rules/docker.md` | Docker best practices |
+
+## Tech Stack
+
+| Layer | Technologies |
+|-------|-------------|
+| Containers | Docker, Docker Compose, Docker Swarm |
+| Orchestration | Kubernetes, Helm |
+| CI/CD | GitHub Actions, GitLab CI, Jenkins |
+| Monitoring | Prometheus, Grafana, Loki |
+| Logging | ELK Stack, Fluentd |
+| Secrets | Docker Secrets, Vault |
+
+## Output Format
+
+```markdown
+## DevOps Implementation: [Feature]
+
+### Container Configuration
+- Base image: node:20-alpine
+- Multi-stage build: ✅
+- Non-root user: ✅
+- Health checks: ✅
+
+### Deployment Configuration
+- Service: api
+- Replicas: 3
+- Resource limits: CPU 1, Memory 1G
+- Networks: app-network (overlay)
+
+### Security Measures
+- ✅ Non-root user (appuser:1001)
+- ✅ Read-only filesystem
+- ✅ Dropped capabilities (ALL)
+- ✅ No new privileges
+- ✅ Security scanning in CI/CD
+
+### Monitoring
+- Health endpoint: /health
+- Metrics: Prometheus /metrics
+- Logging: JSON structured logs
+
+---
+Status: deployed
+@CodeSkeptic ready for review
+```
+
+## Dockerfile Patterns
+
+### Multi-stage Production Build
+
+```dockerfile
+# Build stage
+FROM node:20-alpine AS builder
+WORKDIR /app
+COPY package*.json ./
+RUN npm ci --only=production
+COPY . .
+RUN npm run build
+
+# Production stage
+FROM node:20-alpine
+RUN addgroup -g 1001 appgroup && \
+    adduser -u 1001 -G appgroup -D appuser
+WORKDIR /app
+COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
+COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
+USER appuser
+EXPOSE 3000
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"
+CMD ["node", "dist/index.js"]
+```
+
+### Development Build
+
+```dockerfile
+FROM node:20-alpine
+WORKDIR /app
+COPY package*.json ./
+RUN npm install
+COPY . .
+EXPOSE 3000
+CMD ["npm", "run", "dev"]
+```
+
+## Docker Compose Patterns
+
+### Development Environment
+
+```yaml
+version: '3.8'
+
+services:
+  app:
+    build:
+      context: .
+      dockerfile: Dockerfile.dev
+    volumes:
+      - .:/app
+      - /app/node_modules
+    environment:
+      - NODE_ENV=development
+      - DATABASE_URL=postgres://db:5432/app
+    ports:
+      - "3000:3000"
+    depends_on:
+      db:
+        condition: service_healthy
+  
+  db:
+    image: postgres:15-alpine
+    environment:
+      POSTGRES_DB: app
+      POSTGRES_USER: app
+      POSTGRES_PASSWORD: ${DB_PASSWORD}
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U app"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+volumes:
+  postgres-data:
+```
+
+### Production Environment
+
+```yaml
+version: '3.8'
+
+services:
+  app:
+    image: myapp:${VERSION}
+    deploy:
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+      rollback_config:
+        parallelism: 1
+        delay: 10s
+      restart_policy:
+        condition: on-failure
+        max_attempts: 3
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    healthcheck:
+      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+    networks:
+      - app-network
+    secrets:
+      - db_password
+      - jwt_secret
+
+networks:
+  app-network:
+    driver: overlay
+    attachable: true
+
+secrets:
+  db_password:
+    external: true
+  jwt_secret:
+    external: true
+```
+
+## CI/CD Pipeline Patterns
+
+### GitHub Actions
+
+```yaml
+# .github/workflows/docker.yml
+name: Docker CI/CD
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v2
+      
+      - name: Login to Registry
+        uses: docker/login-action@v2
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      
+      - name: Build and Push
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+      
+      - name: Scan Image
+        uses: aquasecurity/trivy-action@master
+        with:
+          image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
+          format: 'table'
+          exit-code: '1'
+          severity: 'CRITICAL,HIGH'
+  
+  deploy:
+    needs: build
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to Swarm
+        run: |
+          docker stack deploy -c docker-compose.prod.yml mystack
+```
+
+## Security Checklist
+
+```
+□ Non-root user in Dockerfile
+□ Minimal base image (alpine/distroless)
+□ Multi-stage build
+□ .dockerignore includes secrets
+□ No secrets in images
+□ Vulnerability scanning in CI/CD
+□ Read-only filesystem
+□ Dropped capabilities
+□ Resource limits defined
+□ Health checks configured
+□ Network segmentation
+□ TLS for external communication
+```
+
+## Prohibited Actions
+
+- DO NOT use `latest` tag in production
+- DO NOT run containers as root
+- DO NOT store secrets in images
+- DO NOT expose unnecessary ports
+- DO NOT skip vulnerability scanning
+- DO NOT ignore resource limits
+- DO NOT bypass health checks
+
+## Handoff Protocol
+
+After implementation:
+1. Verify containers are running
+2. Check health endpoints
+3. Review resource usage
+4. Validate security configuration
+5. Test deployment updates
+6. Tag `@CodeSkeptic` for review
+## Gitea Commenting (MANDATORY)
+
+**You MUST post a comment to the Gitea issue after completing your work.**
+
+Post a comment with:
+1. ✅ Success: What was done, files changed, duration
+2. ❌ Error: What failed, why, and blocker
+3. ❓ Question: Clarification needed with options
+
+Use the `post_comment` function from `.kilo/skills/gitea-commenting/SKILL.md`.
+
+**NO EXCEPTIONS** - Always comment to Gitea.
--- a/.kilo/agents/security-auditor.md
+++ b/.kilo/agents/security-auditor.md
@@ -115,8 +115,41 @@ gitleaks --path .

 # Check for exposed env
 grep -r "API_KEY\|PASSWORD\|SECRET" --include="*.ts" --include="*.js"
+
+# Docker image vulnerability scan
+trivy image myapp:latest
+docker scout vulnerabilities myapp:latest
+
+# Docker secrets scan
+gitleaks --image myapp:latest
 ```

+## Docker Security Checklist
+
+```
+□ Running as non-root user
+□ Using minimal base images (alpine/distroless)
+□ Using specific image versions (not latest)
+□ No secrets in images
+□ Read-only filesystem where possible
+□ Capabilities dropped to minimum
+□ No new privileges flag set
+□ Resource limits defined
+□ Health checks configured
+□ Network segmentation implemented
+□ TLS for external communication
+□ Secrets managed via Docker secrets/vault
+□ Vulnerability scanning in CI/CD
+□ Base images regularly updated
+```
+
+## Skills Reference
+
+| Skill | Purpose |
+|-------|---------|
+| `docker-security` | Container security hardening |
+| `nodejs-security-owasp` | Node.js OWASP Top 10 |
+
 ## Prohibited Actions

 - DO NOT approve with critical/high vulnerabilities
--- a/.kilo/rules/docker.md
+++ b/.kilo/rules/docker.md
@@ -0,0 +1,549 @@
+# Docker & Containerization Rules
+
+Essential rules for Docker, Docker Compose, Docker Swarm, and container technologies.
+
+## Dockerfile Best Practices
+
+### Layer Optimization
+
+- Minimize layers by combining commands
+- Order layers from least to most frequently changing
+- Use multi-stage builds to reduce image size
+- Clean up package manager caches
+
+```dockerfile
+# ✅ Good: Multi-stage build with layer optimization
+FROM node:20-alpine AS builder
+WORKDIR /app
+COPY package*.json ./
+RUN npm ci --only=production
+
+FROM node:20-alpine
+WORKDIR /app
+COPY --from=builder /app/node_modules ./node_modules
+COPY . .
+USER node
+EXPOSE 3000
+CMD ["node", "server.js"]
+
+# ❌ Bad: Single stage, many layers
+FROM node:20
+RUN npm install -g nodemon
+WORKDIR /app
+COPY . .
+RUN npm install
+EXPOSE 3000
+CMD ["nodemon", "server.js"]
+```
+
+### Security
+
+- Run as non-root user
+- Use specific image versions, not `latest`
+- Scan images for vulnerabilities
+- Don't store secrets in images
+
+```dockerfile
+# ✅ Good
+FROM node:20-alpine
+RUN addgroup -g 1001 appgroup && \
+    adduser -u 1001 -G appgroup -D appuser
+WORKDIR /app
+COPY --chown=appuser:appgroup . .
+USER appuser
+CMD ["node", "server.js"]
+
+# ❌ Bad
+FROM node:latest  # Unpredictable version
+# Running as root (default)
+COPY . .
+CMD ["node", "server.js"]
+```
+
+### Caching Strategy
+
+```dockerfile
+# ✅ Good: Dependencies cached separately
+COPY package*.json ./
+RUN npm ci
+COPY . .
+
+# ❌ Bad: All code copied before dependencies
+COPY . .
+RUN npm install
+```
+
+## Docker Compose
+
+### Service Structure
+
+- Use version 3.8+ for modern features
+- Define services in logical order
+- Use environment variables for configuration
+- Set resource limits
+
+```yaml
+# ✅ Good
+version: '3.8'
+
+services:
+  app:
+    image: myapp:latest
+    build:
+      context: .
+      dockerfile: Dockerfile
+    environment:
+      - NODE_ENV=production
+      - DATABASE_URL=postgres://db:5432/app
+    depends_on:
+      db:
+        condition: service_healthy
+    networks:
+      - app-network
+    deploy:
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 512M
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+
+  db:
+    image: postgres:15-alpine
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+    environment:
+      POSTGRES_DB: app
+      POSTGRES_USER: ${DB_USER}
+      POSTGRES_PASSWORD: ${DB_PASSWORD}
+    networks:
+      - app-network
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+networks:
+  app-network:
+    driver: bridge
+
+volumes:
+  postgres-data:
+```
+
+### Environment Variables
+
+- Use `.env` files for local development
+- Never commit `.env` files with secrets
+- Use Docker secrets for sensitive data in Swarm
+
+```bash
+# .env (gitignored)
+NODE_ENV=production
+DB_PASSWORD=secure_password_here
+JWT_SECRET=your_jwt_secret_here
+```
+
+```yaml
+# docker-compose.yml
+services:
+  app:
+    env_file:
+      - .env
+    # OR explicit for non-sensitive
+    environment:
+      - NODE_ENV=production
+    # Secrets for sensitive data in Swarm
+    secrets:
+      - db_password
+```
+
+### Network Patterns
+
+```yaml
+# ✅ Good: Separated networks for security
+networks:
+  frontend:
+    driver: bridge
+  backend:
+    driver: bridge
+    internal: true  # No external access
+
+services:
+  web:
+    networks:
+      - frontend
+      - backend
+  api:
+    networks:
+      - backend
+  db:
+    networks:
+      - backend
+```
+
+### Volume Management
+
+```yaml
+# ✅ Good: Named volumes with labels
+volumes:
+  postgres-data:
+    driver: local
+    labels:
+      - "app=myapp"
+      - "type=database"
+
+services:
+  db:
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+      - ./init-scripts:/docker-entrypoint-initdb.d:ro
+```
+
+## Docker Swarm
+
+### Service Deployment
+
+```yaml
+# docker-compose.yml (Swarm compatible)
+version: '3.8'
+
+services:
+  api:
+    image: myapp/api:latest
+    deploy:
+      mode: replicated
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+      rollback_config:
+        parallelism: 1
+        delay: 10s
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+      placement:
+        constraints:
+          - node.role == worker
+        preferences:
+          - spread: node.id
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 512M
+        reservations:
+          cpus: '0.25'
+          memory: 256M
+    networks:
+      - app-network
+    secrets:
+      - db_password
+      - jwt_secret
+    configs:
+      - app_config
+
+networks:
+  app-network:
+    driver: overlay
+    attachable: true
+
+secrets:
+  db_password:
+    external: true
+  jwt_secret:
+    external: true
+
+configs:
+  app_config:
+    external: true
+```
+
+### Stack Deployment
+
+```bash
+# Deploy stack
+docker stack deploy -c docker-compose.yml mystack
+
+# List services
+docker stack services mystack
+
+# Scale service
+docker service scale mystack_api=5
+
+# Update service
+docker service update --image myapp/api:v2 mystack_api
+
+# Rollback
+docker service rollback mystack_api
+```
+
+### Health Checks
+
+```yaml
+services:
+  api:
+    # Health check in Dockerfile
+    healthcheck:
+      test: ["CMD", "node", "healthcheck.js"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+
+    # Or in compose
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+```
+
+### Secrets Management
+
+```bash
+# Create secret
+echo "my_secret_password" | docker secret create db_password -
+
+# Create secret from file
+docker secret create jwt_secret ./jwt_secret.txt
+
+# List secrets
+docker secret ls
+
+# Use in compose
+secrets:
+  db_password:
+    external: true
+```
+
+### Config Management
+
+```bash
+# Create config
+docker config create app_config ./config.json
+
+# Use in compose
+configs:
+  app_config:
+    external: true
+
+services:
+  api:
+    configs:
+      - app_config
+```
+
+## Container Security
+
+### Image Security
+
+```bash
+# Scan image for vulnerabilities
+docker scout vulnerabilities myapp:latest
+trivy image myapp:latest
+
+# Check image for secrets
+gitleaks --image myapp:latest
+```
+
+### Runtime Security
+
+```dockerfile
+# ✅ Good: Security measures
+FROM node:20-alpine
+
+# Create non-root user
+RUN addgroup -g 1001 appgroup && \
+    adduser -u 1001 -G appgroup -D appuser
+
+# Set read-only filesystem
+RUN chmod -R 755 /app && \
+    chown -R appuser:appgroup /app
+
+WORKDIR /app
+COPY --chown=appuser:appgroup . .
+
+# Drop all capabilities
+USER appuser
+VOLUME ["/tmp"]
+
+CMD ["node", "server.js"]
+```
+
+### Network Security
+
+```yaml
+# ✅ Good: Limited network access
+services:
+  api:
+    networks:
+      - backend
+    # No ports exposed to host
+
+  db:
+    networks:
+      - backend
+    # Internal network only
+
+networks:
+  backend:
+    internal: true  # No internet access
+```
+
+### Resource Limits
+
+```yaml
+services:
+  api:
+    deploy:
+      resources:
+        limits:
+          cpus: '1.0'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+```
+
+## Common Patterns
+
+### Development Setup
+
+```yaml
+# docker-compose.dev.yml
+version: '3.8'
+services:
+  app:
+    build:
+      context: .
+      dockerfile: Dockerfile.dev
+    volumes:
+      - .:/app
+      - /app/node_modules
+    environment:
+      - NODE_ENV=development
+    ports:
+      - "3000:3000"
+    command: npm run dev
+```
+
+### Production Setup
+
+```yaml
+# docker-compose.prod.yml
+version: '3.8'
+services:
+  app:
+    image: myapp:${VERSION}
+    environment:
+      - NODE_ENV=production
+    deploy:
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+    healthcheck:
+      test: ["CMD", "node", "healthcheck.js"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+```
+
+### Multi-Environment
+
+```bash
+# Override files
+docker-compose -f docker-compose.yml -f docker-compose.dev.yml up
+docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
+```
+
+### Logging
+
+```yaml
+services:
+  app:
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+        labels: "app,environment"
+```
+
+## CI/CD Integration
+
+### Build Pipeline
+
+```yaml
+# .github/workflows/docker.yml
+name: Docker Build
+
+on:
+  push:
+    branches: [main]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      
+      - name: Build image
+        run: docker build -t myapp:${{ github.sha }} .
+      
+      - name: Scan image
+        run: trivy image myapp:${{ github.sha }}
+      
+      - name: Push to registry
+        run: |
+          echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USER }} --password-stdin
+          docker push myapp:${{ github.sha }}
+```
+
+## Troubleshooting
+
+### Common Commands
+
+```bash
+# View logs
+docker-compose logs -f app
+
+# Execute in container
+docker-compose exec app sh
+
+# Check health
+docker inspect --format='{{.State.Health.Status}}' <container>
+
+# View resource usage
+docker stats
+
+# Remove unused resources
+docker system prune -a
+
+# Debug network
+docker network inspect app-network
+
+# Swarm diagnostics
+docker node ls
+docker service ps mystack_api
+```
+
+## Prohibitions
+
+- DO NOT run containers as root
+- DO NOT use `latest` tag in production
+- DO NOT expose unnecessary ports
+- DO NOT store secrets in images
+- DO NOT use privileged mode unnecessarily
+- DO NOT mount host directories without restrictions
+- DO NOT skip health checks in production
+- DO NOT ignore vulnerability scans
--- a/.kilo/skills/docker-compose/SKILL.md
+++ b/.kilo/skills/docker-compose/SKILL.md
@@ -0,0 +1,576 @@
+# Skill: Docker Compose
+
+## Purpose
+
+Comprehensive skill for Docker Compose configuration, orchestration, and multi-container application deployment.
+
+## Overview
+
+Docker Compose is a tool for defining and running multi-container Docker applications. Use this skill when working with local development environments, CI/CD pipelines, and production deployments.
+
+## When to Use
+
+- Setting up local development environments
+- Configuring multi-container applications
+- Managing service dependencies
+- Implementing health checks and waiting strategies
+- Creating development/production configurations
+
+## Skill Files Structure
+
+```
+docker-compose/
+├── SKILL.md              # This file
+├── patterns/
+│   ├── basic-service.md  # Basic service templates
+│   ├── networking.md     # Network patterns
+│   ├── volumes.md        # Volume management
+│   └── healthchecks.md   # Health check patterns
+└── examples/
+    ├── nodejs-api.md      # Node.js API template
+    ├── postgres.md       # PostgreSQL template
+    └── redis.md          # Redis template
+```
+
+## Core Patterns
+
+### 1. Basic Service Configuration
+
+```yaml
+version: '3.8'
+
+services:
+  app:
+    build:
+      context: .
+      dockerfile: Dockerfile
+      args:
+        - NODE_ENV=production
+    image: myapp:latest
+    container_name: myapp
+    restart: unless-stopped
+    ports:
+      - "3000:3000"
+    environment:
+      - NODE_ENV=production
+      - DATABASE_URL=postgres://db:5432/app
+    volumes:
+      - ./data:/app/data
+    networks:
+      - app-network
+    depends_on:
+      db:
+        condition: service_healthy
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+```
+
+### 2. Environment Configuration
+
+```yaml
+# Use .env file for secrets
+services:
+  app:
+    env_file:
+      - .env
+      - .env.local
+    environment:
+      # Non-sensitive defaults
+      - NODE_ENV=production
+      - LOG_LEVEL=info
+      # Override from .env
+      - DATABASE_URL=${DATABASE_URL}
+      - JWT_SECRET=${JWT_SECRET}
+```
+
+### 3. Network Patterns
+
+```yaml
+# Isolated networks for security
+networks:
+  frontend:
+    driver: bridge
+  backend:
+    driver: bridge
+    internal: true  # No external access
+
+services:
+  web:
+    networks:
+      - frontend
+      - backend
+  
+  api:
+    networks:
+      - backend
+  
+  db:
+    networks:
+      - backend
+```
+
+### 4. Volume Patterns
+
+```yaml
+volumes:
+  # Named volume (managed by Docker)
+  postgres-data:
+    driver: local
+  
+  # Bind mount (host directory)
+  # ./data:/app/data
+
+services:
+  db:
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+      - ./init-scripts:/docker-entrypoint-initdb.d:ro
+  
+  app:
+    volumes:
+      - ./config:/app/config:ro
+      - app-logs:/app/logs
+
+volumes:
+  app-logs:
+```
+
+### 5. Health Checks & Dependencies
+
+```yaml
+services:
+  db:
+    image: postgres:15-alpine
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+  
+  app:
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_started
+```
+
+### 6. Multi-Environment Configurations
+
+```yaml
+# docker-compose.yml (base)
+version: '3.8'
+services:
+  app:
+    image: myapp:latest
+    environment:
+      - NODE_ENV=production
+
+# docker-compose.dev.yml (development override)
+version: '3.8'
+services:
+  app:
+    build:
+      context: .
+      dockerfile: Dockerfile.dev
+    volumes:
+      - .:/app
+      - /app/node_modules
+    environment:
+      - NODE_ENV=development
+    ports:
+      - "3000:3000"
+    command: npm run dev
+
+# docker-compose.prod.yml (production override)
+version: '3.8'
+services:
+  app:
+    image: myapp:${VERSION}
+    deploy:
+      replicas: 3
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+    healthcheck:
+      test: ["CMD", "node", "healthcheck.js"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+```
+
+## Service Templates
+
+### Node.js API
+
+```yaml
+services:
+  api:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    environment:
+      - NODE_ENV=production
+      - PORT=3000
+      - DATABASE_URL=postgres://db:5432/app
+      - REDIS_URL=redis://redis:6379
+    ports:
+      - "3000:3000"
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_started
+    networks:
+      - backend
+    healthcheck:
+      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+```
+
+### PostgreSQL Database
+
+```yaml
+services:
+  db:
+    image: postgres:15-alpine
+    environment:
+      POSTGRES_DB: app
+      POSTGRES_USER: ${DB_USER:-app}
+      POSTGRES_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD required}
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+      - ./init-scripts:/docker-entrypoint-initdb.d:ro
+    networks:
+      - backend
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER -d $POSTGRES_DB"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+    deploy:
+      resources:
+        limits:
+          memory: 512M
+
+volumes:
+  postgres-data:
+```
+
+### Redis Cache
+
+```yaml
+services:
+  redis:
+    image: redis:7-alpine
+    command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
+    volumes:
+      - redis-data:/data
+    networks:
+      - backend
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+volumes:
+  redis-data:
+```
+
+### Nginx Reverse Proxy
+
+```yaml
+services:
+  nginx:
+    image: nginx:alpine
+    ports:
+      - "80:80"
+      - "443:443"
+    volumes:
+      - ./nginx.conf:/etc/nginx/nginx.conf:ro
+      - ./ssl:/etc/nginx/ssl:ro
+    depends_on:
+      - api
+    networks:
+      - frontend
+      - backend
+    healthcheck:
+      test: ["CMD", "nginx", "-t"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+```
+
+## Common Commands
+
+```bash
+# Start services
+docker-compose up -d
+
+# Start specific service
+docker-compose up -d app
+
+# View logs
+docker-compose logs -f app
+
+# Execute command in container
+docker-compose exec app sh
+docker-compose exec app npm test
+
+# Stop services
+docker-compose down
+
+# Stop and remove volumes
+docker-compose down -v
+
+# Rebuild images
+docker-compose build --no-cache app
+
+# Scale service
+docker-compose up -d --scale api=3
+
+# Multi-environment
+docker-compose -f docker-compose.yml -f docker-compose.dev.yml up
+docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
+```
+
+## Best Practices
+
+### Security
+
+1. **Never store secrets in images**
+   ```yaml
+   # Bad
+   environment:
+     - DB_PASSWORD=password123
+   
+   # Good
+   secrets:
+     - db_password
+   secrets:
+     db_password:
+       file: ./secrets/db_password.txt
+   ```
+
+2. **Use non-root user**
+   ```yaml
+   services:
+     app:
+       user: "1000:1000"
+   ```
+
+3. **Limit resources**
+   ```yaml
+   services:
+     app:
+       deploy:
+         resources:
+           limits:
+             cpus: '1'
+             memory: 1G
+   ```
+
+4. **Use internal networks for databases**
+   ```yaml
+   networks:
+     backend:
+       internal: true
+   ```
+
+### Performance
+
+1. **Enable health checks**
+   ```yaml
+   healthcheck:
+     test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+     interval: 30s
+     timeout: 10s
+     retries: 3
+     start_period: 40s
+   ```
+
+2. **Use .dockerignore**
+   ```
+   node_modules
+   .git
+   .env
+   *.log
+   coverage
+   .nyc_output
+   ```
+
+3. **Optimize build cache**
+   ```yaml
+   build:
+     context: .
+     dockerfile: Dockerfile
+     args:
+       - NODE_ENV=production
+   ```
+
+### Development
+
+1. **Use volumes for hot reload**
+   ```yaml
+   services:
+     app:
+       volumes:
+         - .:/app
+         - /app/node_modules  # Anonymous volume for node_modules
+   ```
+
+2. **Keep containers running**
+   ```yaml
+   services:
+     app:
+       stdin_open: true  # -i
+       tty: true         # -t
+   ```
+
+### Production
+
+1. **Use specific image versions**
+   ```yaml
+   # Bad
+   image: node:latest
+   
+   # Good
+   image: node:20-alpine
+   ```
+
+2. **Configure logging**
+   ```yaml
+   services:
+     app:
+       logging:
+         driver: "json-file"
+         options:
+           max-size: "10m"
+           max-file: "3"
+   ```
+
+3. **Restart policies**
+   ```yaml
+   services:
+     app:
+       restart: unless-stopped
+   ```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Container won't start**
+   ```bash
+   # Check logs
+   docker-compose logs app
+   
+   # Check container status
+   docker-compose ps
+   
+   # Inspect container
+   docker inspect myapp_app_1
+   ```
+
+2. **Network connectivity issues**
+   ```bash
+   # List networks
+   docker network ls
+   
+   # Inspect network
+   docker network inspect myapp_default
+   
+   # Test connectivity
+   docker-compose exec app ping db
+   ```
+
+3. **Volume permission issues**
+   ```bash
+   # Check volume
+   docker volume inspect myapp_postgres-data
+   
+   # Fix permissions (if needed)
+   docker-compose exec app chown -R node:node /app/data
+   ```
+
+4. **Health check failing**
+   ```bash
+   # Run health check manually
+   docker-compose exec app curl -f http://localhost:3000/health
+   
+   # Check health status
+   docker inspect --format='{{.State.Health.Status}}' myapp_app_1
+   ```
+
+5. **Out of disk space**
+   ```bash
+   # Clean up
+   docker system prune -a --volumes
+   
+   # Check disk usage
+   docker system df
+   ```
+
+## Integration with CI/CD
+
+### GitHub Actions
+
+```yaml
+# .github/workflows/test.yml
+name: Test
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      
+      - name: Build and test
+        run: |
+          docker-compose -f docker-compose.yml -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from app
+      
+      - name: Cleanup
+        if: always()
+        run: docker-compose down -v
+```
+
+### GitLab CI
+
+```yaml
+# .gitlab-ci.yml
+stages:
+  - test
+  - build
+
+test:
+  stage: test
+  script:
+    - docker-compose -f docker-compose.yml -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from app
+  after_script:
+    - docker-compose down -v
+
+build:
+  stage: build
+  script:
+    - docker build -t myapp:$CI_COMMIT_SHA .
+    - docker push myapp:$CI_COMMIT_SHA
+```
+
+## Related Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `docker-swarm` | Orchestration with Docker Swarm |
+| `docker-security` | Container security patterns |
+| `docker-networking` | Advanced networking techniques |
+| `docker-monitoring` | Container monitoring and logging |
--- a/.kilo/skills/docker-compose/patterns/basic-service.md
+++ b/.kilo/skills/docker-compose/patterns/basic-service.md
@@ -0,0 +1,447 @@
+# Docker Compose Patterns
+
+## Pattern: Multi-Service Application
+
+Complete pattern for a typical web application with API, database, cache, and reverse proxy.
+
+```yaml
+version: '3.8'
+
+services:
+  # Reverse Proxy
+  nginx:
+    image: nginx:alpine
+    ports:
+      - "80:80"
+      - "443:443"
+    volumes:
+      - ./nginx.conf:/etc/nginx/nginx.conf:ro
+      - ./ssl:/etc/nginx/ssl:ro
+    depends_on:
+      - api
+    networks:
+      - frontend
+    deploy:
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 256M
+    healthcheck:
+      test: ["CMD", "nginx", "-t"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  # API Service
+  api:
+    build:
+      context: ./api
+      dockerfile: Dockerfile
+    environment:
+      - NODE_ENV=production
+      - DATABASE_URL=postgres://db:5432/app
+      - REDIS_URL=redis://cache:6379
+    depends_on:
+      db:
+        condition: service_healthy
+      cache:
+        condition: service_started
+    networks:
+      - frontend
+      - backend
+    deploy:
+      replicas: 3
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    healthcheck:
+      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+
+  # Database
+  db:
+    image: postgres:15-alpine
+    environment:
+      POSTGRES_DB: app
+      POSTGRES_USER: ${DB_USER:-app}
+      POSTGRES_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD required}
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+      - ./init-scripts:/docker-entrypoint-initdb.d:ro
+    networks:
+      - backend
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER -d $POSTGRES_DB"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+    deploy:
+      resources:
+        limits:
+          cpus: '2'
+          memory: 2G
+
+  # Cache
+  cache:
+    image: redis:7-alpine
+    command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
+    volumes:
+      - redis-data:/data
+    networks:
+      - backend
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+networks:
+  frontend:
+    driver: bridge
+  backend:
+    driver: bridge
+    internal: true  # No external access
+
+volumes:
+  postgres-data:
+    driver: local
+  redis-data:
+    driver: local
+```
+
+## Pattern: Development Override
+
+Development-specific configuration with hot reload and debugging.
+
+```yaml
+# docker-compose.dev.yml
+version: '3.8'
+
+services:
+  api:
+    build:
+      context: ./api
+      dockerfile: Dockerfile.dev
+    volumes:
+      - ./api/src:/app/src:ro
+      - ./api/tests:/app/tests:ro
+      - /app/node_modules
+    environment:
+      - NODE_ENV=development
+      - DEBUG=app:*
+    ports:
+      - "3000:3000"
+      - "9229:9229"  # Node.js debugger
+    command: npm run dev
+
+  db:
+    ports:
+      - "5432:5432"  # Expose for local tools
+
+  cache:
+    ports:
+      - "6379:6379"  # Expose for local tools
+```
+
+```bash
+# Usage
+docker-compose -f docker-compose.yml -f docker-compose.dev.yml up
+```
+
+## Pattern: Production Override
+
+Production-optimized configuration with security and performance settings.
+
+```yaml
+# docker-compose.prod.yml
+version: '3.8'
+
+services:
+  api:
+    image: myapp/api:${VERSION}
+    deploy:
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+      rollback_config:
+        parallelism: 1
+        delay: 10s
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    environment:
+      - NODE_ENV=production
+    secrets:
+      - db_password
+      - jwt_secret
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "5"
+
+secrets:
+  db_password:
+    external: true
+  jwt_secret:
+    external: true
+```
+
+```bash
+# Usage
+docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
+```
+
+## Pattern: Health Check Dependency
+
+Waiting for dependent services to be healthy before starting.
+
+```yaml
+services:
+  app:
+    depends_on:
+      db:
+        condition: service_healthy
+      cache:
+        condition: service_healthy
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+
+  db:
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  cache:
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+```
+
+## Pattern: Secrets Management
+
+Using Docker secrets for sensitive data (Swarm mode).
+
+```yaml
+services:
+  app:
+    secrets:
+      - db_password
+      - api_key
+      - jwt_secret
+    environment:
+      - DB_PASSWORD_FILE=/run/secrets/db_password
+      - API_KEY_FILE=/run/secrets/api_key
+      - JWT_SECRET_FILE=/run/secrets/jwt_secret
+
+secrets:
+  db_password:
+    file: ./secrets/db_password.txt
+  api_key:
+    file: ./secrets/api_key.txt
+  jwt_secret:
+    external: true  # Created via: echo "secret" | docker secret create jwt_secret -
+```
+
+## Pattern: Resource Limits
+
+Setting resource constraints for containers.
+
+```yaml
+services:
+  api:
+    deploy:
+      resources:
+        limits:
+          cpus: '1.0'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    # Alternative for non-Swarm
+    mem_limit: 1G
+    memswap_limit: 1G
+    cpus: 1
+```
+
+## Pattern: Network Isolation
+
+Segmenting networks for security.
+
+```yaml
+services:
+  web:
+    networks:
+      - frontend
+      - backend
+
+  api:
+    networks:
+      - backend
+      - database
+
+  db:
+    networks:
+      - database
+
+networks:
+  frontend:
+    driver: bridge
+  backend:
+    driver: bridge
+  database:
+    driver: bridge
+    internal: true  # No internet access
+```
+
+## Pattern: Volume Management
+
+Different volume types for different use cases.
+
+```yaml
+services:
+  app:
+    volumes:
+      # Named volume (managed by Docker)
+      - app-data:/app/data
+      # Bind mount (host directory)
+      - ./config:/app/config:ro
+      # Anonymous volume (for node_modules)
+      - /app/node_modules
+      # tmpfs (temporary in-memory)
+      - type: tmpfs
+        target: /tmp
+        tmpfs:
+          size: 100M
+
+volumes:
+  app-data:
+    driver: local
+    labels:
+      - "app=myapp"
+      - "type=persistent"
+```
+
+## Pattern: Logging Configuration
+
+Configuring logging drivers and options.
+
+```yaml
+services:
+  app:
+    logging:
+      driver: "json-file"  # Default
+      options:
+        max-size: "10m"
+        max-file: "3"
+        labels: "app,environment"
+        tag: "{{.ImageName}}/{{.Name}}"
+
+  # Syslog logging
+  app-syslog:
+    logging:
+      driver: "syslog"
+      options:
+        syslog-address: "tcp://logserver:514"
+        syslog-facility: "daemon"
+        tag: "myapp"
+
+  # Fluentd logging
+  app-fluentd:
+    logging:
+      driver: "fluentd"
+      options:
+        fluentd-address: "localhost:24224"
+        tag: "myapp.api"
+```
+
+## Pattern: Multi-Environment
+
+Managing multiple environments with overrides.
+
+```bash
+# Directory structure
+# docker-compose.yml          # Base configuration
+# docker-compose.dev.yml      # Development overrides
+# docker-compose.staging.yml   # Staging overrides
+# docker-compose.prod.yml      # Production overrides
+# .env                         # Environment variables
+# .env.dev                     # Development variables
+# .env.staging                 # Staging variables
+# .env.prod                    # Production variables
+
+# Development
+docker-compose --env-file .env.dev \
+  -f docker-compose.yml -f docker-compose.dev.yml up
+
+# Staging
+docker-compose --env-file .env.staging \
+  -f docker-compose.yml -f docker-compose.staging.yml up -d
+
+# Production
+docker-compose --env-file .env.prod \
+  -f docker-compose.yml -f docker-compose.prod.yml up -d
+```
+
+## Pattern: CI/CD Testing
+
+Running tests in isolated containers.
+
+```yaml
+# docker-compose.test.yml
+version: '3.8'
+
+services:
+  app:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    environment:
+      - NODE_ENV=test
+      - DATABASE_URL=postgres://test:test@db:5432/test
+    depends_on:
+      - db
+    command: npm test
+    networks:
+      - test-network
+
+  db:
+    image: postgres:15-alpine
+    environment:
+      POSTGRES_DB: test
+      POSTGRES_USER: test
+      POSTGRES_PASSWORD: test
+    networks:
+      - test-network
+
+networks:
+  test-network:
+    driver: bridge
+```
+
+```bash
+# CI pipeline
+docker-compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from app
+docker-compose -f docker-compose.test.yml down -v
+```
--- a/.kilo/skills/docker-monitoring/SKILL.md
+++ b/.kilo/skills/docker-monitoring/SKILL.md
@@ -0,0 +1,756 @@
+# Skill: Docker Monitoring & Logging
+
+## Purpose
+
+Comprehensive skill for Docker container monitoring, logging, metrics collection, and observability.
+
+## Overview
+
+Container monitoring is essential for understanding application health, performance, and troubleshooting issues in production. Use this skill for setting up monitoring stacks, configuring logging, and implementing observability.
+
+## When to Use
+
+- Setting up container monitoring
+- Configuring centralized logging
+- Implementing health checks
+- Performance optimization
+- Troubleshooting container issues
+- Alerting configuration
+
+## Monitoring Stack
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                 Container Monitoring Stack                   │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
+│  │   Grafana   │  │  Prometheus │  │  Alertmgr   │         │
+│  │  Dashboard  │  │   Metrics   │  │  Alerts     │         │
+│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘         │
+│         │                │                │                 │
+│  ┌──────┴────────────────┴────────────────┴──────┐         │
+│  │              Container Observability           │         │
+│  └──────┬────────────────┬───────────────────────┘         │
+│         │                │                                  │
+│  ┌──────┴──────┐  ┌──────┴──────┐  ┌─────────────┐         │
+│  │  cAdvisor   │  │ node-exporter│  │ Loki/EFK    │         │
+│  │ Container   │  │ Node Metrics│  │ Logging     │         │
+│  │ Metrics     │  │             │  │             │         │
+│  └─────────────┘  └─────────────┘  └─────────────┘         │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Health Checks
+
+### 1. Dockerfile Health Check
+
+```dockerfile
+FROM node:20-alpine
+
+WORKDIR /app
+COPY . .
+RUN npm ci --only=production
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
+
+# Or for Alpine (no wget)
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+  CMD curl -f http://localhost:3000/health || exit 1
+
+# Or use Node.js for health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"
+```
+
+### 2. Docker Compose Health Check
+
+```yaml
+services:
+  api:
+    image: myapp:latest
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+    
+  db:
+    image: postgres:15-alpine
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+```
+
+### 3. Docker Swarm Health Check
+
+```yaml
+services:
+  api:
+    image: myapp:latest
+    deploy:
+      update_config:
+        failure_action: rollback
+        monitor: 30s
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+```
+
+### 4. Application Health Endpoint
+
+```javascript
+// Node.js health check endpoint
+const express = require('express');
+const app = express();
+
+// Dependencies status
+async function checkHealth() {
+  const checks = {
+    database: await checkDatabase(),
+    redis: await checkRedis(),
+    disk: checkDiskSpace(),
+    memory: checkMemory()
+  };
+  
+  const healthy = Object.values(checks).every(c => c === 'healthy');
+  
+  return {
+    status: healthy ? 'healthy' : 'unhealthy',
+    timestamp: new Date().toISOString(),
+    checks
+  };
+}
+
+app.get('/health', async (req, res) => {
+  const health = await checkHealth();
+  const status = health.status === 'healthy' ? 200 : 503;
+  res.status(status).json(health);
+});
+
+app.get('/health/live', (req, res) => {
+  // Liveness probe - is the app running?
+  res.status(200).json({ status: 'alive' });
+});
+
+app.get('/health/ready', async (req, res) => {
+  // Readiness probe - is the app ready to serve?
+  const ready = await isReady();
+  res.status(ready ? 200 : 503).json({ ready });
+});
+```
+
+## Logging
+
+### 1. Docker Logging Drivers
+
+```yaml
+# JSON file driver (default)
+services:
+  api:
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+        labels: "app,environment"
+
+# Syslog driver
+services:
+  api:
+    logging:
+      driver: "syslog"
+      options:
+        syslog-address: "tcp://logserver:514"
+        syslog-facility: "daemon"
+        tag: "myapp"
+
+# Journald driver
+services:
+  api:
+    logging:
+      driver: "journald"
+      options:
+        labels: "app,environment"
+
+# Fluentd driver
+services:
+  api:
+    logging:
+      driver: "fluentd"
+      options:
+        fluentd-address: "localhost:24224"
+        tag: "myapp.api"
+```
+
+### 2. Structured Logging
+
+```javascript
+// Pino for structured logging
+const pino = require('pino');
+
+const logger = pino({
+  level: process.env.LOG_LEVEL || 'info',
+  formatters: {
+    level: (label) => ({ level: label })
+  },
+  timestamp: pino.stdTimeFunctions.isoTime
+});
+
+// Log with context
+logger.info({
+  userId: '123',
+  action: 'login',
+  ip: '192.168.1.1'
+}, 'User logged in');
+
+// Output:
+// {"level":"info","time":"2024-01-01T12:00:00.000Z","userId":"123","action":"login","ip":"192.168.1.1","msg":"User logged in"}
+```
+
+### 3. EFK Stack (Elasticsearch, Fluentd, Kibana)
+
+```yaml
+# docker-compose.yml
+version: '3.8'
+
+services:
+  elasticsearch:
+    image: elasticsearch:8.10.0
+    environment:
+      - discovery.type=single-node
+      - xpack.security.enabled=false
+    volumes:
+      - elasticsearch-data:/usr/share/elasticsearch/data
+    networks:
+      - logging
+
+  fluentd:
+    image: fluent/fluentd:v1.16
+    volumes:
+      - ./fluentd/conf:/fluentd/etc
+    ports:
+      - "24224:24224"
+    networks:
+      - logging
+
+  kibana:
+    image: kibana:8.10.0
+    environment:
+      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
+    ports:
+      - "5601:5601"
+    networks:
+      - logging
+
+  app:
+    image: myapp:latest
+    logging:
+      driver: "fluentd"
+      options:
+        fluentd-address: "localhost:24224"
+        tag: "myapp.api"
+    networks:
+      - logging
+
+volumes:
+  elasticsearch-data:
+
+networks:
+  logging:
+```
+
+### 4. Loki Stack (Promtail, Loki, Grafana)
+
+```yaml
+# docker-compose.yml
+version: '3.8'
+
+services:
+  loki:
+    image: grafana/loki:latest
+    ports:
+      - "3100:3100"
+    volumes:
+      - ./loki-config.yml:/etc/loki/local-config.yaml
+    command: -config.file=/etc/loki/local-config.yaml
+    networks:
+      - monitoring
+
+  promtail:
+    image: grafana/promtail:latest
+    volumes:
+      - /var/log:/var/log
+      - ./promtail-config.yml:/etc/promtail/config.yml
+    command: -config.file=/etc/promtail/config.yml
+    networks:
+      - monitoring
+
+  grafana:
+    image: grafana/grafana:latest
+    ports:
+      - "3000:3000"
+    environment:
+      - GF_SECURITY_ADMIN_PASSWORD=admin
+    volumes:
+      - grafana-data:/var/lib/grafana
+    networks:
+      - monitoring
+
+  app:
+    image: myapp:latest
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+    networks:
+      - monitoring
+
+volumes:
+  grafana-data:
+
+networks:
+  monitoring:
+```
+
+## Metrics Collection
+
+### 1. Prometheus + cAdvisor
+
+```yaml
+# docker-compose.yml
+version: '3.8'
+
+services:
+  prometheus:
+    image: prom/prometheus:latest
+    ports:
+      - "9090:9090"
+    volumes:
+      - ./prometheus.yml:/etc/prometheus/prometheus.yml
+      - prometheus-data:/prometheus
+    command:
+      - '--config.file=/etc/prometheus/prometheus.yml'
+      - '--storage.tsdb.retention.time=30d'
+    networks:
+      - monitoring
+
+  cadvisor:
+    image: gcr.io/cadvisor/cadvisor:latest
+    ports:
+      - "8080:8080"
+    volumes:
+      - /:/rootfs:ro
+      - /var/run:/var/run:ro
+      - /sys:/sys:ro
+      - /var/lib/docker/:/var/lib/docker:ro
+    networks:
+      - monitoring
+
+  node_exporter:
+    image: prom/node-exporter:latest
+    ports:
+      - "9100:9100"
+    volumes:
+      - /proc:/host/proc:ro
+      - /sys:/host/sys:ro
+      - /:/rootfs:ro
+    command:
+      - '--path.procfs=/host/proc'
+      - '--path.rootfs=/rootfs'
+      - '--path.sysfs=/host/sys'
+    networks:
+      - monitoring
+
+  grafana:
+    image: grafana/grafana:latest
+    ports:
+      - "3000:3000"
+    environment:
+      - GF_SECURITY_ADMIN_PASSWORD=admin
+    volumes:
+      - grafana-data:/var/lib/grafana
+    networks:
+      - monitoring
+
+volumes:
+  prometheus-data:
+  grafana-data:
+
+networks:
+  monitoring:
+```
+
+### 2. Prometheus Configuration
+
+```yaml
+# prometheus.yml
+global:
+  scrape_interval: 15s
+  evaluation_interval: 15s
+
+scrape_configs:
+  # Prometheus itself
+  - job_name: 'prometheus'
+    static_configs:
+      - targets: ['prometheus:9090']
+
+  # cAdvisor (container metrics)
+  - job_name: 'cadvisor'
+    static_configs:
+      - targets: ['cadvisor:8080']
+
+  # Node exporter (host metrics)
+  - job_name: 'node'
+    static_configs:
+      - targets: ['node_exporter:9100']
+
+  # Application metrics
+  - job_name: 'app'
+    static_configs:
+      - targets: ['app:3000']
+    metrics_path: '/metrics'
+```
+
+### 3. Application Metrics (Prometheus Client)
+
+```javascript
+// Node.js with prom-client
+const promClient = require('prom-client');
+
+// Enable default metrics
+promClient.collectDefaultMetrics();
+
+// Custom metrics
+const httpRequestDuration = new promClient.Histogram({
+  name: 'http_request_duration_seconds',
+  help: 'Duration of HTTP requests in seconds',
+  labelNames: ['method', 'route', 'status_code'],
+  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
+});
+
+const activeConnections = new promClient.Gauge({
+  name: 'active_connections',
+  help: 'Number of active connections'
+});
+
+const dbQueryDuration = new promClient.Histogram({
+  name: 'db_query_duration_seconds',
+  help: 'Duration of database queries in seconds',
+  labelNames: ['query_type', 'table'],
+  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2]
+});
+
+// Middleware for HTTP metrics
+app.use((req, res, next) => {
+  const end = httpRequestDuration.startTimer();
+  res.on('finish', () => {
+    end({ method: req.method, route: req.route?.path || req.path, status_code: res.statusCode });
+  });
+  next();
+});
+
+// Metrics endpoint
+app.get('/metrics', async (req, res) => {
+  res.set('Content-Type', promClient.register.contentType);
+  res.send(await promClient.register.metrics());
+});
+```
+
+### 4. Grafana Dashboards
+
+```json
+// Dashboard JSON for container metrics
+{
+  "dashboard": {
+    "title": "Docker Container Metrics",
+    "panels": [
+      {
+        "title": "Container CPU Usage",
+        "targets": [
+          {
+            "expr": "rate(container_cpu_usage_seconds_total{name=~\".+\"}[5m]) * 100",
+            "legendFormat": "{{name}}"
+          }
+        ]
+      },
+      {
+        "title": "Container Memory Usage",
+        "targets": [
+          {
+            "expr": "container_memory_usage_bytes{name=~\".+\"} / 1024 / 1024",
+            "legendFormat": "{{name}} MB"
+          }
+        ]
+      },
+      {
+        "title": "Container Network I/O",
+        "targets": [
+          {
+            "expr": "rate(container_network_receive_bytes_total{name=~\".+\"}[5m])",
+            "legendFormat": "{{name}} RX"
+          },
+          {
+            "expr": "rate(container_network_transmit_bytes_total{name=~\".+\"}[5m])",
+            "legendFormat": "{{name}} TX"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+## Alerting
+
+### 1. Alertmanager Configuration
+
+```yaml
+# alertmanager.yml
+global:
+  smtp_smarthost: 'smtp.example.com:587'
+  smtp_from: 'alerts@example.com'
+  smtp_auth_username: 'alerts@example.com'
+  smtp_auth_password: 'password'
+
+route:
+  group_by: ['alertname', 'severity']
+  group_wait: 30s
+  group_interval: 5m
+  repeat_interval: 1h
+  receiver: 'team-email'
+  routes:
+    - match:
+        severity: critical
+      receiver: 'team-email-critical'
+    - match:
+        severity: warning
+      receiver: 'team-email-warning'
+
+receivers:
+  - name: 'team-email-critical'
+    email_configs:
+      - to: 'critical@example.com'
+        send_resolved: true
+  
+  - name: 'team-email-warning'
+    email_configs:
+      - to: 'warnings@example.com'
+        send_resolved: true
+```
+
+### 2. Prometheus Alert Rules
+
+```yaml
+# alerts.yml
+groups:
+  - name: container_alerts
+    rules:
+      # Container down
+      - alert: ContainerDown
+        expr: absent(container_last_seen{name=~".+"})
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Container {{ $labels.name }} is down"
+          description: "Container {{ $labels.name }} has been down for more than 5 minutes."
+      
+      # High CPU
+      - alert: HighCpuUsage
+        expr: rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) * 100 > 80
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High CPU usage on {{ $labels.name }}"
+          description: "Container {{ $labels.name }} CPU usage is {{ $value }}%."
+      
+      # High Memory
+      - alert: HighMemoryUsage
+        expr: (container_memory_usage_bytes{name=~".+"} / container_spec_memory_limit_bytes{name=~".+"}) * 100 > 80
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High memory usage on {{ $labels.name }}"
+          description: "Container {{ $labels.name }} memory usage is {{ $value }}%."
+      
+      # Container restart
+      - alert: ContainerRestart
+        expr: increase(container_restart_count{name=~".+"}[1h]) > 0
+        labels:
+          severity: warning
+        annotations:
+          summary: "Container {{ $labels.name }} restarted"
+          description: "Container {{ $labels.name }} has restarted {{ $value }} times in the last hour."
+      
+      # No health check
+      - alert: NoHealthCheck
+        expr: container_health_status{name=~".+"} == 0
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Health check failing for {{ $labels.name }}"
+          description: "Container {{ $labels.name }} health check has been failing for 5 minutes."
+```
+
+## Observability Best Practices
+
+### 1. Three Pillars
+
+| Pillar | Tool | Purpose |
+|--------|------|---------|
+| Metrics | Prometheus | Quantitative measurements |
+| Logs | Loki/EFK | Event records |
+| Traces | Jaeger/Zipkin | Request flow |
+
+### 2. Metrics Categories
+
+```yaml
+# Four Golden Signals (Google SRE)
+
+# 1. Latency
+- http_request_duration_seconds
+- db_query_duration_seconds
+
+# 2. Traffic
+- http_requests_per_second
+- active_connections
+
+# 3. Errors
+- http_requests_failed_total
+- error_rate
+
+# 4. Saturation
+- container_memory_usage_bytes
+- container_cpu_usage_seconds_total
+```
+
+### 3. Service Level Objectives (SLOs)
+
+```yaml
+# Prometheus recording rules for SLO
+groups:
+  - name: slo_rules
+    rules:
+      - record: slo:availability:ratio_5m
+        expr: |
+          sum(rate(http_requests_total{status!~"5.."}[5m])) /
+          sum(rate(http_requests_total[5m]))
+      
+      - record: slo:latency:p99_5m
+        expr: |
+          histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
+      
+      - record: slo:error_rate:ratio_5m
+        expr: |
+          sum(rate(http_requests_total{status=~"5.."}[5m])) /
+          sum(rate(http_requests_total[5m]))
+```
+
+## Troubleshooting Commands
+
+```bash
+# View container logs
+docker logs <container_id>
+docker logs -f --tail 100 <container_id>
+
+# View resource usage
+docker stats
+docker stats --no-stream
+
+# Inspect container
+docker inspect <container_id>
+
+# Check health status
+docker inspect --format='{{.State.Health.Status}}' <container_id>
+
+# View processes
+docker top <container_id>
+
+# Execute commands
+docker exec -it <container_id> sh
+docker exec <container_id> df -h
+
+# View network
+docker network inspect <network_name>
+
+# View disk usage
+docker system df
+docker system df -v
+
+# Prune unused resources
+docker system prune -a --volumes
+
+# Swarm service logs
+docker service logs <service_name>
+docker service ps <service_name>
+
+# Swarm node status
+docker node ls
+docker node inspect <node_id>
+```
+
+## Performance Tuning
+
+### 1. Container Resource Limits
+
+```yaml
+services:
+  api:
+    deploy:
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+```
+
+### 2. Logging Performance
+
+```yaml
+services:
+  api:
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+        # Reduce logging overhead
+        labels: "level,requestId"
+```
+
+### 3. Prometheus Optimization
+
+```yaml
+# prometheus.yml
+global:
+  scrape_interval: 15s      # Balance between granularity and load
+  evaluation_interval: 15s
+
+# Retention
+command:
+  - '--storage.tsdb.retention.time=30d'
+  - '--storage.tsdb.retention.size=10GB'
+```
+
+## Related Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `docker-compose` | Local development setup |
+| `docker-swarm` | Production orchestration |
+| `docker-security` | Container security |
+| `kubernetes` | Advanced orchestration |
--- a/.kilo/skills/docker-security/SKILL.md
+++ b/.kilo/skills/docker-security/SKILL.md
@@ -0,0 +1,685 @@
+# Skill: Docker Security
+
+## Purpose
+
+Comprehensive skill for Docker container security, vulnerability scanning, secrets management, and hardening best practices.
+
+## Overview
+
+Container security is essential for production deployments. Use this skill when scanning for vulnerabilities, configuring security settings, managing secrets, and implementing security best practices.
+
+## When to Use
+
+- Security hardening containers
+- Scanning images for vulnerabilities
+- Managing secrets and credentials
+- Configuring container isolation
+- Implementing least privilege
+- Security audits
+
+## Security Layers
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    Container Security Layers                 │
+├─────────────────────────────────────────────────────────────┤
+│  1. Host Security                                            │
+│     - Kernel hardening                                       │
+│     - SELinux/AppArmor                                       │
+│     - cgroups namespace                                      │
+├─────────────────────────────────────────────────────────────┤
+│  2. Container Runtime Security                               │
+│     - User namespace                                         │
+│     - Seccomp profiles                                       │
+│     - Capability dropping                                    │
+├─────────────────────────────────────────────────────────────┤
+│  3. Image Security                                           │
+│     - Minimal base images                                    │
+│     - Vulnerability scanning                                 │
+│     - No secrets in images                                   │
+├─────────────────────────────────────────────────────────────┤
+│  4. Network Security                                         │
+│     - Network policies                                       │
+│     - TLS encryption                                         │
+│     - Ingress controls                                       │
+├─────────────────────────────────────────────────────────────┤
+│  5. Application Security                                     │
+│     - Input validation                                       │
+│     - Authentication                                         │
+│     - Authorization                                          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Image Security
+
+### 1. Base Image Selection
+
+```dockerfile
+# ✅ Good: Minimal, specific version
+FROM node:20-alpine
+
+# ✅ Better: Distroless (minimal attack surface)
+FROM gcr.io/distroless/nodejs20-debian12
+
+# ❌ Bad: Large base, latest tag
+FROM node:latest
+```
+
+### 2. Multi-stage Builds
+
+```dockerfile
+# Build stage
+FROM node:20-alpine AS builder
+WORKDIR /app
+COPY package*.json ./
+RUN npm ci
+COPY . .
+RUN npm run build
+
+# Runtime stage
+FROM node:20-alpine
+RUN addgroup -g 1001 appgroup && \
+    adduser -u 1001 -G appgroup -D appuser
+WORKDIR /app
+COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
+COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
+USER appuser
+CMD ["node", "dist/index.js"]
+```
+
+### 3. Vulnerability Scanning
+
+```bash
+# Scan with Trivy
+trivy image myapp:latest
+
+# Scan with Docker Scout
+docker scout vulnerabilities myapp:latest
+
+# Scan with Grype
+grype myapp:latest
+
+# CI/CD integration
+trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest
+```
+
+### 4. No Secrets in Images
+
+```dockerfile
+# ❌ Never do this
+ENV DATABASE_PASSWORD=password123
+COPY .env ./
+
+# ✅ Use runtime secrets
+# Secrets are mounted at runtime
+RUN --mount=type=secret,id=db_password \
+    export DB_PASSWORD=$(cat /run/secrets/db_password)
+```
+
+## Container Runtime Security
+
+### 1. Non-root User
+
+```dockerfile
+# Create non-root user
+FROM alpine:3.18
+RUN addgroup -g 1001 appgroup && \
+    adduser -u 1001 -G appgroup -D appuser
+WORKDIR /app
+COPY --chown=appuser:appgroup . .
+USER appuser
+CMD ["./app"]
+```
+
+### 2. Read-only Filesystem
+
+```yaml
+# docker-compose.yml
+services:
+  app:
+    image: myapp:latest
+    read_only: true
+    tmpfs:
+      - /tmp
+      - /var/cache
+```
+
+### 3. Capability Dropping
+
+```yaml
+# Drop all capabilities
+services:
+  app:
+    image: myapp:latest
+    cap_drop:
+      - ALL
+    cap_add:
+      - CHOWN      # Only needed capabilities
+      - SETGID
+      - SETUID
+```
+
+### 4. Security Options
+
+```yaml
+services:
+  app:
+    image: myapp:latest
+    security_opt:
+      - no-new-privileges:true    # Prevent privilege escalation
+      - seccomp:default.json       # Seccomp profile
+      - apparmor:docker-default    # AppArmor profile
+```
+
+### 5. Resource Limits
+
+```yaml
+services:
+  app:
+    image: myapp:latest
+    deploy:
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    pids_limit: 100  # Limit process count
+```
+
+## Secrets Management
+
+### 1. Docker Secrets (Swarm)
+
+```bash
+# Create secret
+echo "my_password" | docker secret create db_password -
+
+# Create from file
+docker secret create jwt_secret ./secrets/jwt.txt
+```
+
+```yaml
+# docker-compose.yml (Swarm)
+services:
+  api:
+    image: myapp:latest
+    secrets:
+      - db_password
+      - jwt_secret
+    environment:
+      - DB_PASSWORD_FILE=/run/secrets/db_password
+
+secrets:
+  db_password:
+    external: true
+  jwt_secret:
+    external: true
+```
+
+### 2. Docker Compose Secrets (Non-Swarm)
+
+```yaml
+# docker-compose.yml
+services:
+  api:
+    image: myapp:latest
+    secrets:
+      - db_password
+    environment:
+      - DB_PASSWORD_FILE=/run/secrets/db_password
+
+secrets:
+  db_password:
+    file: ./secrets/db_password.txt
+```
+
+### 3. Environment Variables (Development)
+
+```yaml
+# docker-compose.yml (development only)
+services:
+  api:
+    image: myapp:latest
+    env_file:
+      - .env    # Add .env to .gitignore!
+```
+
+```bash
+# .env (NEVER COMMIT)
+DATABASE_URL=postgres://...
+JWT_SECRET=secret123
+API_KEY=key123
+```
+
+### 4. Reading Secrets in Application
+
+```javascript
+// Node.js
+const fs = require('fs');
+
+function getSecret(secretName, envName) {
+  // Try file-based secret first (Docker secrets)
+  const secretPath = `/run/secrets/${secretName}`;
+  if (fs.existsSync(secretPath)) {
+    return fs.readFileSync(secretPath, 'utf8').trim();
+  }
+  // Fallback to environment variable (development)
+  return process.env[envName];
+}
+
+const dbPassword = getSecret('db_password', 'DB_PASSWORD');
+```
+
+## Network Security
+
+### 1. Network Segmentation
+
+```yaml
+# Separate networks for different access levels
+networks:
+  frontend:
+    driver: bridge
+  
+  backend:
+    driver: bridge
+    internal: true  # No external access
+  
+  database:
+    driver: bridge
+    internal: true
+
+services:
+  web:
+    networks:
+      - frontend
+  
+  api:
+    networks:
+      - frontend
+      - backend
+  
+  db:
+    networks:
+      - database
+  
+  cache:
+    networks:
+      - database
+```
+
+### 2. Port Exposure
+
+```yaml
+# ✅ Good: Only expose necessary ports
+services:
+  api:
+    ports:
+      - "3000:3000"  # API port only
+  
+  db:
+    # No ports exposed - only accessible inside network
+    networks:
+      - database
+
+# ❌ Bad: Exposing database to host
+services:
+  db:
+    ports:
+      - "5432:5432"  # Security risk!
+```
+
+### 3. TLS Configuration
+
+```yaml
+services:
+  nginx:
+    image: nginx:alpine
+    ports:
+      - "443:443"
+    volumes:
+      - ./ssl/cert.pem:/etc/nginx/ssl/cert.pem:ro
+      - ./ssl/key.pem:/etc/nginx/ssl/key.pem:ro
+    configs:
+      - source: nginx_config
+        target: /etc/nginx/nginx.conf
+
+configs:
+  nginx_config:
+    file: ./nginx.conf
+```
+
+### 4. Ingress Controls
+
+```yaml
+# Limit connections
+services:
+  api:
+    image: myapp:latest
+    ports:
+      - target: 3000
+        published: 3000
+        mode: host  # Bypass ingress mesh for performance
+    deploy:
+      endpoint_mode: dnsrr
+      resources:
+        limits:
+          memory: 1G
+```
+
+## Security Profiles
+
+### 1. Seccomp Profile
+
+```json
+// default-seccomp.json
+{
+  "defaultAction": "SCMP_ACT_ERRNO",
+  "architectures": ["SCMP_ARCH_X86_64"],
+  "syscalls": [
+    {
+      "names": ["read", "write", "exit", "exit_group"],
+      "action": "SCMP_ACT_ALLOW"
+    },
+    {
+      "names": ["open", "openat", "close"],
+      "action": "SCMP_ACT_ALLOW"
+    }
+  ]
+}
+```
+
+```yaml
+# Use custom seccomp profile
+services:
+  api:
+    security_opt:
+      - seccomp:./seccomp.json
+```
+
+### 2. AppArmor Profile
+
+```bash
+# Create AppArmor profile
+cat > /etc/apparmor.d/docker-myapp <<EOF
+#include <tunables/global>
+profile docker-myapp flags=(attach_disconnected,mediate_deleted) {
+  #include <abstractions/base>
+  
+  network inet tcp,
+  network inet udp,
+  
+  /app/** r,
+  /app/** w,
+  
+  deny /** rw,
+}
+EOF
+
+# Load profile
+apparmor_parser -r /etc/apparmor.d/docker-myapp
+```
+
+```yaml
+# Use AppArmor profile
+services:
+  api:
+    security_opt:
+      - apparmor:docker-myapp
+```
+
+## Security Scanning
+
+### 1. Image Vulnerability Scan
+
+```bash
+# Trivy scan
+trivy image --severity HIGH,CRITICAL myapp:latest
+
+# Docker Scout
+docker scout vulnerabilities myapp:latest
+
+# Grype
+grype myapp:latest
+
+# Output JSON for CI
+trivy image --format json --output results.json myapp:latest
+```
+
+### 2. Base Image Updates
+
+```bash
+# Check base image for updates
+docker pull node:20-alpine
+
+# Rebuild with updated base
+docker build --no-cache -t myapp:latest .
+
+# Scan new image
+trivy image myapp:latest
+```
+
+### 3. Dependency Audit
+
+```bash
+# Node.js
+npm audit
+npm audit fix
+
+# Python
+pip-audit
+
+# Go
+go list -m all | nancy
+
+# General
+snyk test
+```
+
+### 4. Secret Detection
+
+```bash
+# Scan for secrets
+gitleaks --path . --verbose
+
+# Pre-commit hook
+gitleaks protect --staged
+
+# Docker image
+gitleaks --image myapp:latest
+```
+
+## CI/CD Security Integration
+
+### GitHub Actions
+
+```yaml
+# .github/workflows/security.yml
+name: Security Scan
+
+on: [push, pull_request]
+
+jobs:
+  scan:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      
+      - name: Run Trivy vulnerability scanner
+        uses: aquasecurity/trivy-action@master
+        with:
+          image-ref: 'myapp:${{ github.sha }}'
+          format: 'table'
+          exit-code: '1'
+          severity: 'CRITICAL,HIGH'
+      
+      - name: Run Gitleaks secret scan
+        uses: gitleaks/gitleaks-action@v2
+        with:
+          args: --path=.
+```
+
+### GitLab CI
+
+```yaml
+# .gitlab-ci.yml
+security_scan:
+  stage: test
+  image: docker:24
+  services:
+    - docker:dind
+  script:
+    - docker build -t myapp:$CI_COMMIT_SHA .
+    - trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:$CI_COMMIT_SHA
+    - gitleaks --path . --verbose
+```
+
+## Security Checklist
+
+### Dockerfile Security
+
+- [ ] Using minimal base image (alpine/distroless)
+- [ ] Specific version tags, not `latest`
+- [ ] Running as non-root user
+- [ ] No secrets in image
+- [ ] `.dockerignore` includes `.env`, `.git`, `.credentials`
+- [ ] COPY instead of ADD (unless needed)
+- [ ] Multi-stage build for smaller image
+- [ ] HEALTHCHECK defined
+
+### Runtime Security
+
+- [ ] Read-only filesystem
+- [ ] Capabilities dropped
+- [ ] No new privileges
+- [ ] Resource limits set
+- [ ] User namespace enabled (if available)
+- [ ] Seccomp/AppArmor profiles applied
+
+### Network Security
+
+- [ ] Only necessary ports exposed
+- [ ] Internal networks for sensitive services
+- [ ] TLS for external communication
+- [ ] Network segmentation
+
+### Secrets Management
+
+- [ ] No secrets in images
+- [ ] Using Docker secrets or external vault
+- [ ] `.env` files gitignored
+- [ ] Secret rotation implemented
+
+### CI/CD Security
+
+- [ ] Vulnerability scanning in pipeline
+- [ ] Secret detection pre-commit
+- [ ] Dependency audit automated
+- [ ] Base images updated regularly
+
+## Remediation Priority
+
+| Severity | Priority | Timeline |
+|----------|----------|----------|
+| Critical | P0 | Immediately (24h) |
+| High | P1 | Within 7 days |
+| Medium | P2 | Within 30 days |
+| Low | P3 | Next release |
+
+## Security Tools
+
+| Tool | Purpose |
+|------|---------|
+| Trivy | Image vulnerability scanning |
+| Docker Scout | Docker's built-in scanner |
+| Grype | Vulnerability scanner |
+| Gitleaks | Secret detection |
+| Snyk | Dependency scanning |
+| Falco | Runtime security monitoring |
+| Anchore | Container security analysis |
+| Clair | Open-source vulnerability scanner |
+
+## Common Vulnerabilities
+
+### CVE Examples
+
+```yaml
+# Check for specific CVE
+trivy image --vulnerabilities CVE-2021-44228 myapp:latest
+
+# Ignore specific CVE (use carefully)
+trivy image --ignorefile .trivyignore myapp:latest
+
+# .trivyignore
+CVE-2021-12345  # Known and accepted
+```
+
+### Log4j Example (CVE-2021-44228)
+
+```bash
+# Check for vulnerable versions
+docker images --format '{{.Repository}}:{{.Tag}}' | xargs -I {} \
+  trivy image --vulnerabilities CVE-2021-44228 {}
+
+# Update and rebuild
+FROM node:20-alpine
+# Ensure no vulnerable log4j dependency
+RUN npm audit fix
+```
+
+## Incident Response
+
+### Security Breach Steps
+
+1. **Isolate**
+   ```bash
+   # Stop container
+   docker stop <container_id>
+   
+   # Remove from network
+   docker network disconnect app-network <container_id>
+   ```
+
+2. **Preserve Evidence**
+   ```bash
+   # Save container state
+   docker commit <container_id> incident-container
+   
+   # Export logs
+   docker logs <container_id> > incident-logs.txt
+   docker export <container_id> > incident-container.tar
+   ```
+
+3. **Analyze**
+   ```bash
+   # Inspect container
+   docker inspect <container_id>
+   
+   # Check image
+   trivy image <image_name>
+   
+   # Review process history
+   docker history <image_name>
+   ```
+
+4. **Remediate**
+   ```bash
+   # Update base image
+   docker pull node:20-alpine
+   
+   # Rebuild
+   docker build --no-cache -t myapp:fixed .
+   
+   # Scan
+   trivy image myapp:fixed
+   ```
+
+## Related Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `docker-compose` | Local development setup |
+| `docker-swarm` | Production orchestration |
+| `docker-monitoring` | Security monitoring |
+| `docker-networking` | Network security |
--- a/.kilo/skills/docker-swarm/SKILL.md
+++ b/.kilo/skills/docker-swarm/SKILL.md
@@ -0,0 +1,757 @@
+# Skill: Docker Swarm
+
+## Purpose
+
+Comprehensive skill for Docker Swarm orchestration, cluster management, and production-ready container deployment.
+
+## Overview
+
+Docker Swarm is Docker's native clustering and orchestration solution. Use this skill for production deployments, high availability setups, and managing containerized applications at scale.
+
+## When to Use
+
+- Deploying applications in production clusters
+- Setting up high availability services
+- Scaling services dynamically
+- Managing rolling updates
+- Handling secrets and configs securely
+- Multi-node orchestration
+
+## Core Concepts
+
+### Swarm Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                     Docker Swarm Cluster                     │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
+│  │   Manager   │  │   Manager   │  │   Manager   │ (HA)     │
+│  │   Node 1    │  │   Node 2    │  │   Node 3    │          │
+│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘          │
+│         │                │                │                  │
+│  ┌──────┴────────────────┴────────────────┴──────┐          │
+│  │              Internal Network                 │          │
+│  └──────┬────────────────┬──────────────────────┘          │
+│         │                │                                   │
+│  ┌──────┴──────┐  ┌──────┴──────┐  ┌─────────────┐          │
+│  │   Worker    │  │   Worker    │  │   Worker     │          │
+│  │   Node 4    │  │   Node 5    │  │   Node 6     │          │
+│  └─────────────┘  └─────────────┘  └─────────────┘          │
+│                                                              │
+│  Services: api, web, db, redis, queue                       │
+│  Tasks: Running containers distributed across nodes          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Key Components
+
+| Component | Description |
+|-----------|-------------|
+| **Service** | Definition of a container (image, ports, replicas) |
+| **Task** | Single running instance of a service |
+| **Stack** | Group of related services (like docker-compose) |
+| **Node** | Docker daemon participating in swarm |
+| **Overlay Network** | Network spanning multiple nodes |
+
+## Skill Files Structure
+
+```
+docker-swarm/
+├── SKILL.md              # This file
+├── patterns/
+│   ├── services.md      # Service deployment patterns
+│   ├── networking.md    # Overlay network patterns
+│   ├── secrets.md       # Secrets management
+│   └── configs.md       # Config management
+└── examples/
+    ├── ha-web-app.md    # High availability web app
+    ├── microservices.md # Microservices deployment
+    └── database.md      # Database cluster setup
+```
+
+## Core Patterns
+
+### 1. Initialize Swarm
+
+```bash
+# Initialize swarm on manager node
+docker swarm init --advertise-addr <MANAGER_IP>
+
+# Get join token for workers
+docker swarm join-token -q worker
+
+# Get join token for managers
+docker swarm join-token -q manager
+
+# Join swarm (on worker nodes)
+docker swarm join --token <TOKEN> <MANAGER_IP>:2377
+
+# Check swarm status
+docker node ls
+```
+
+### 2. Service Deployment
+
+```yaml
+# docker-compose.yml (Swarm stack)
+version: '3.8'
+
+services:
+  api:
+    image: myapp/api:latest
+    deploy:
+      mode: replicated
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+        order: start-first
+      rollback_config:
+        parallelism: 1
+        delay: 10s
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+      placement:
+        constraints:
+          - node.role == worker
+        preferences:
+          - spread: node.id
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    networks:
+      - app-network
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+    secrets:
+      - db_password
+      - jwt_secret
+    configs:
+      - app_config
+
+networks:
+  app-network:
+    driver: overlay
+    attachable: true
+
+secrets:
+  db_password:
+    external: true
+  jwt_secret:
+    external: true
+
+configs:
+  app_config:
+    external: true
+```
+
+### 3. Deploy Stack
+
+```bash
+# Create secrets (before deploying)
+echo "my_db_password" | docker secret create db_password -
+docker secret create jwt_secret ./jwt_secret.txt
+
+# Create configs
+docker config create app_config ./config.json
+
+# Deploy stack
+docker stack deploy -c docker-compose.yml mystack
+
+# List services
+docker stack services mystack
+
+# List tasks
+docker stack ps mystack
+
+# Remove stack
+docker stack rm mystack
+```
+
+### 4. Service Management
+
+```bash
+# Scale service
+docker service scale mystack_api=5
+
+# Update service image
+docker service update --image myapp/api:v2 mystack_api
+
+# Update environment variable
+docker service update --env-add NODE_ENV=staging mystack_api
+
+# Add constraint
+docker service update --constraint-add 'node.labels.region==us-east' mystack_api
+
+# Rollback service
+docker service rollback mystack_api
+
+# View service details
+docker service inspect mystack_api
+
+# View service logs
+docker service logs -f mystack_api
+```
+
+### 5. Secrets Management
+
+```bash
+# Create secret from stdin
+echo "my_secret" | docker secret create db_password -
+
+# Create secret from file
+docker secret create jwt_secret ./secrets/jwt.txt
+
+# List secrets
+docker secret ls
+
+# Inspect secret metadata
+docker secret inspect db_password
+
+# Use secret in service
+docker service create \
+  --name api \
+  --secret db_password \
+  --secret jwt_secret \
+  myapp/api:latest
+
+# Remove secret
+docker secret rm db_password
+```
+
+### 6. Config Management
+
+```bash
+# Create config
+docker config create app_config ./config.json
+
+# List configs
+docker config ls
+
+# Use config in service
+docker service create \
+  --name api \
+  --config source=app_config,target=/app/config.json \
+  myapp/api:latest
+
+# Update config (create new version)
+docker config create app_config_v2 ./config-v2.json
+
+# Update service with new config
+docker service update \
+  --config-rm app_config \
+  --config-add source=app_config_v2,target=/app/config.json \
+  mystack_api
+```
+
+### 7. Overlay Networks
+
+```yaml
+# Create overlay network
+networks:
+  frontend:
+    driver: overlay
+    attachable: true
+  
+  backend:
+    driver: overlay
+    attachable: true
+    internal: true  # No external access
+
+services:
+  web:
+    networks:
+      - frontend
+      - backend
+  
+  api:
+    networks:
+      - backend
+  
+  db:
+    networks:
+      - backend
+```
+
+```bash
+# Create network manually
+docker network create --driver overlay --attachable my-network
+
+# List networks
+docker network ls
+
+# Inspect network
+docker network inspect my-network
+```
+
+## Deployment Strategies
+
+### Rolling Update
+
+```yaml
+services:
+  api:
+    deploy:
+      update_config:
+        parallelism: 2        # Update 2 tasks at a time
+        delay: 10s           # Wait 10s between updates
+        failure_action: rollback
+        monitor: 30s         # Monitor for 30s after update
+        max_failure_ratio: 0.3  # Allow 30% failures
+```
+
+### Blue-Green Deployment
+
+```bash
+# Deploy new version alongside existing
+docker service create \
+  --name api-v2 \
+  --mode replicated \
+  --replicas 3 \
+  --network app-network \
+  myapp/api:v2
+
+# Update router to point to new version
+# (Using nginx/traefik config update)
+
+# Remove old version
+docker service rm api-v1
+```
+
+### Canary Deployment
+
+```yaml
+# Deploy canary version
+version: '3.8'
+services:
+  api:
+    image: myapp/api:v1
+    deploy:
+      replicas: 9
+      # ... 90% of traffic
+  
+  api-canary:
+    image: myapp/api:v2
+    deploy:
+      replicas: 1
+      # ... 10% of traffic
+```
+
+### Global Services
+
+```yaml
+# Run one instance on every node
+services:
+  monitoring:
+    image: myapp/monitoring:latest
+    deploy:
+      mode: global
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+```
+
+## High Availability Patterns
+
+### 1. Multi-Manager Setup
+
+```bash
+# Create 3 manager nodes for HA
+docker swarm init --advertise-addr <MANAGER1_IP>
+
+# On manager2
+docker swarm join --token <MANAGER_TOKEN> <MANAGER1_IP>:2377
+
+# On manager3
+docker swarm join --token <MANAGER_TOKEN> <MANAGER1_IP>:2377
+
+# Promote worker to manager
+docker node promote <NODE_ID>
+
+# Demote manager to worker
+docker node demote <NODE_ID>
+```
+
+### 2. Placement Constraints
+
+```yaml
+services:
+  db:
+    image: postgres:15
+    deploy:
+      placement:
+        constraints:
+          - node.role == worker
+          - node.labels.database == true
+        preferences:
+          - spread: node.labels.zone  # Spread across zones
+
+  cache:
+    image: redis:7
+    deploy:
+      placement:
+        constraints:
+          - node.labels.cache == true
+```
+
+### 3. Resource Management
+
+```yaml
+services:
+  api:
+    deploy:
+      resources:
+        limits:
+          cpus: '2'
+          memory: 2G
+        reservations:
+          cpus: '1'
+          memory: 1G
+      restart_policy:
+        condition: on-failure
+        max_attempts: 3
+```
+
+### 4. Health Checks
+
+```yaml
+services:
+  api:
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+    deploy:
+      update_config:
+        failure_action: rollback
+        monitor: 30s
+```
+
+## Service Discovery & Load Balancing
+
+### Built-in Load Balancing
+
+```yaml
+# Swarm provides automatic load balancing
+services:
+  api:
+    deploy:
+      replicas: 3
+    ports:
+      - "3000:3000"  # Requests are load balanced across replicas
+
+# Virtual IP (VIP) - default mode
+# DNS round-robin
+services:
+  api:
+    deploy:
+      endpoint_mode: dnsrr
+```
+
+### Ingress Network
+
+```yaml
+# Publishing ports
+services:
+  web:
+    ports:
+      - "80:80"      # Published on all nodes
+      - "443:443"
+    deploy:
+      mode: ingress  # Default, routed through mesh
+```
+
+### Host Mode
+
+```yaml
+# Bypass load balancer (for performance)
+services:
+  web:
+    ports:
+      - target: 80
+        published: 80
+        mode: host  # Direct port mapping
+    deploy:
+      mode: global  # One per node
+```
+
+## Monitoring & Logging
+
+### Logging Drivers
+
+```yaml
+services:
+  api:
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+        labels: "app,environment"
+  
+  # Or use syslog
+  api:
+    logging:
+      driver: "syslog"
+      options:
+        syslog-address: "tcp://logserver:514"
+        syslog-facility: "daemon"
+```
+
+### Viewing Logs
+
+```bash
+# Service logs
+docker service logs mystack_api
+
+# Filter by time
+docker service logs --since 1h mystack_api
+
+# Follow logs
+docker service logs -f mystack_api
+
+# All tasks
+docker service logs --tail 100 mystack_api
+```
+
+### Monitoring Commands
+
+```bash
+# Node status
+docker node ls
+
+# Service status
+docker service ls
+
+# Task status
+docker service ps mystack_api
+
+# Resource usage
+docker stats
+
+# Service inspect
+docker service inspect mystack_api --pretty
+```
+
+## Backup & Recovery
+
+### Backup Swarm State
+
+```bash
+# On manager node
+docker pull swaggercodebreaker/swarmctl
+docker run --rm -v /var/lib/docker/swarm:/ swarmctl export > swarm-backup.json
+
+# Or manual backup
+cp -r /var/lib/docker/swarm/raft ~/swarm-backup/
+```
+
+### Recovery
+
+```bash
+# Unlock swarm after restart (if encrypted)
+docker swarm unlock
+
+# Force new cluster (disaster recovery)
+docker swarm init --force-new-cluster
+
+# Restore from backup
+docker swarm init --force-new-cluster
+docker service create --name restore-app ...
+```
+
+## Common Operations
+
+### Node Management
+
+```bash
+# List nodes
+docker node ls
+
+# Inspect node
+docker node inspect <NODE_ID>
+
+# Drain node (for maintenance)
+docker node update --availability drain <NODE_ID>
+
+# Activate node
+docker node update --availability active <NODE_ID>
+
+# Add labels
+docker node update --label-add region=us-east <NODE_ID>
+
+# Remove node
+docker node rm <NODE_ID>
+```
+
+### Service Debugging
+
+```bash
+# View service tasks
+docker service ps mystack_api
+
+# View task details
+docker inspect <TASK_ID>
+
+# Run temporary container for debugging
+docker run --rm -it --network mystack_app-network \
+  myapp/api:latest sh
+
+# Check service logs
+docker service logs mystack_api
+
+# Execute command in running container
+docker exec -it <CONTAINER_ID> sh
+```
+
+### Network Debugging
+
+```bash
+# List networks
+docker network ls
+
+# Inspect overlay network
+docker network inspect mystack_app-network
+
+# Test connectivity
+docker run --rm --network mystack_app-network alpine ping api
+
+# DNS resolution
+docker run --rm --network mystack_app-network alpine nslookup api
+```
+
+## Production Checklist
+
+- [ ] At least 3 manager nodes for HA
+- [ ] Quorum maintained (odd number of managers)
+- [ ] Resources limited for all services
+- [ ] Health checks configured
+- [ ] Rolling update strategy defined
+- [ ] Rollback strategy configured
+- [ ] Secrets used for sensitive data
+- [ ] Configs for environment settings
+- [ ] Overlay networks properly segmented
+- [ ] Logging driver configured
+- [ ] Monitoring solution deployed
+- [ ] Backup strategy implemented
+- [ ] Node labels for placement constraints
+- [ ] Resource reservations set
+
+## Best Practices
+
+1. **Resource Planning**
+   ```yaml
+   deploy:
+     resources:
+       limits:
+         cpus: '1'
+         memory: 1G
+       reservations:
+         cpus: '0.5'
+         memory: 512M
+   ```
+
+2. **Rolling Updates**
+   ```yaml
+   deploy:
+     update_config:
+       parallelism: 1
+       delay: 10s
+       failure_action: rollback
+       monitor: 30s
+   ```
+
+3. **Placement Constraints**
+   ```yaml
+   deploy:
+     placement:
+       constraints:
+         - node.role == worker
+       preferences:
+         - spread: node.labels.zone
+   ```
+
+4. **Network Segmentation**
+   ```yaml
+   networks:
+     frontend:
+       driver: overlay
+     backend:
+       driver: overlay
+       internal: true
+   ```
+
+5. **Secrets Management**
+   ```yaml
+   secrets:
+     - db_password
+     - jwt_secret
+   ```
+
+## Troubleshooting
+
+### Service Won't Start
+
+```bash
+# Check task status
+docker service ps mystack_api --no-trunc
+
+# Check logs
+docker service logs mystack_api
+
+# Check node resources
+docker node ls
+docker stats
+
+# Check network
+docker network inspect mystack_app-network
+```
+
+### Task Keeps Restarting
+
+```bash
+# Check restart policy
+docker service inspect mystack_api --pretty
+
+# Check container logs
+docker service logs --tail 50 mystack_api
+
+# Check health check
+docker inspect <CONTAINER_ID> --format='{{.State.Health}}'
+```
+
+### Network Issues
+
+```bash
+# Verify overlay network
+docker network inspect mystack_app-network
+
+# Check DNS resolution
+docker run --rm --network mystack_app-network alpine nslookup api
+
+# Check connectivity
+docker run --rm --network mystack_app-network alpine ping api
+```
+
+## Related Skills
+
+| Skill | Purpose |
+|-------|---------|
+| `docker-compose` | Local development with Compose |
+| `docker-security` | Container security patterns |
+| `kubernetes` | Kubernetes orchestration |
+| `docker-monitoring` | Container monitoring setup |
--- a/.kilo/skills/docker-swarm/examples/ha-web-app.md
+++ b/.kilo/skills/docker-swarm/examples/ha-web-app.md
@@ -0,0 +1,519 @@
+# Docker Swarm Deployment Examples
+
+## Example: High Availability Web Application
+
+Complete example of deploying a production-ready web application with Docker Swarm.
+
+### docker-compose.yml (Swarm Stack)
+
+```yaml
+version: '3.8'
+
+services:
+  # Reverse Proxy with SSL
+  nginx:
+    image: nginx:alpine
+    ports:
+      - "80:80"
+      - "443:443"
+    configs:
+      - source: nginx_config
+        target: /etc/nginx/nginx.conf
+    secrets:
+      - ssl_cert
+      - ssl_key
+    networks:
+      - frontend
+    deploy:
+      replicas: 2
+      placement:
+        constraints:
+          - node.role == worker
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 256M
+    healthcheck:
+      test: ["CMD", "nginx", "-t"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  # API Service
+  api:
+    image: myapp/api:latest
+    environment:
+      - NODE_ENV=production
+      - DATABASE_URL=postgres://app:${DB_PASSWORD}@db:5432/app
+      - REDIS_URL=redis://cache:6379
+    configs:
+      - source: app_config
+        target: /app/config.json
+    secrets:
+      - jwt_secret
+    networks:
+      - frontend
+      - backend
+    deploy:
+      replicas: 3
+      update_config:
+        parallelism: 1
+        delay: 10s
+        failure_action: rollback
+        order: start-first
+      rollback_config:
+        parallelism: 1
+        delay: 10s
+      restart_policy:
+        condition: on-failure
+        delay: 5s
+        max_attempts: 3
+        window: 120s
+      placement:
+        constraints:
+          - node.role == worker
+        preferences:
+          - spread: node.id
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+        reservations:
+          cpus: '0.5'
+          memory: 512M
+    healthcheck:
+      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s
+
+  # Background Worker
+  worker:
+    image: myapp/worker:latest
+    environment:
+      - NODE_ENV=production
+      - DATABASE_URL=postgres://app:${DB_PASSWORD}@db:5432/app
+    secrets:
+      - jwt_secret
+    networks:
+      - backend
+    deploy:
+      replicas: 2
+      restart_policy:
+        condition: on-failure
+        delay: 10s
+        max_attempts: 5
+      placement:
+        constraints:
+          - node.role == worker
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 512M
+
+  # Database (PostgreSQL with Replication)
+  db:
+    image: postgres:15-alpine
+    environment:
+      POSTGRES_DB: app
+      POSTGRES_USER: app
+      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
+    secrets:
+      - db_password
+    volumes:
+      - postgres-data:/var/lib/postgresql/data
+    networks:
+      - backend
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.labels.database == true
+      resources:
+        limits:
+          cpus: '2'
+          memory: 2G
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U app -d app"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  # Redis Cache
+  cache:
+    image: redis:7-alpine
+    command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru
+    volumes:
+      - redis-data:/data
+    networks:
+      - backend
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.labels.cache == true
+      resources:
+        limits:
+          cpus: '0.5'
+          memory: 512M
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  # Monitoring (Prometheus)
+  prometheus:
+    image: prom/prometheus:latest
+    configs:
+      - source: prometheus_config
+        target: /etc/prometheus/prometheus.yml
+    volumes:
+      - prometheus-data:/prometheus
+    networks:
+      - monitoring
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.role == manager
+    command:
+      - '--config.file=/etc/prometheus/prometheus.yml'
+      - '--storage.tsdb.retention.time=30d'
+
+  # Monitoring (Grafana)
+  grafana:
+    image: grafana/grafana:latest
+    ports:
+      - "3000:3000"
+    environment:
+      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
+    volumes:
+      - grafana-data:/var/lib/grafana
+    networks:
+      - monitoring
+    deploy:
+      replicas: 1
+      placement:
+        constraints:
+          - node.role == manager
+
+networks:
+  frontend:
+    driver: overlay
+    attachable: true
+  backend:
+    driver: overlay
+    internal: true
+  monitoring:
+    driver: overlay
+    attachable: true
+
+volumes:
+  postgres-data:
+  redis-data:
+  prometheus-data:
+  grafana-data:
+
+configs:
+  nginx_config:
+    file: ./configs/nginx.conf
+  app_config:
+    file: ./configs/app.json
+  prometheus_config:
+    file: ./configs/prometheus.yml
+
+secrets:
+  db_password:
+    file: ./secrets/db_password.txt
+  jwt_secret:
+    file: ./secrets/jwt_secret.txt
+  ssl_cert:
+    file: ./secrets/ssl_cert.pem
+  ssl_key:
+    file: ./secrets/ssl_key.pem
+```
+
+### Deployment Script
+
+```bash
+#!/bin/bash
+# deploy.sh
+
+set -e
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+NC='\033[0m'
+
+# Configuration
+STACK_NAME="myapp"
+COMPOSE_FILE="docker-compose.yml"
+
+echo "Starting deployment for ${STACK_NAME}..."
+
+# Check if running on Swarm
+if ! docker info | grep -q "Swarm: active"; then
+  echo -e "${RED}Error: Not running in Swarm mode${NC}"
+  echo "Initialize Swarm with: docker swarm init"
+  exit 1
+fi
+
+# Create secrets (if not exists)
+echo "Checking secrets..."
+for secret in db_password jwt_secret ssl_cert ssl_key; do
+  if ! docker secret inspect ${secret} > /dev/null 2>&1; then
+    if [ -f "./secrets/${secret}.txt" ]; then
+      docker secret create ${secret} ./secrets/${secret}.txt
+      echo -e "${GREEN}Created secret: ${secret}${NC}"
+    else
+      echo -e "${RED}Missing secret file: ./secrets/${secret}.txt${NC}"
+      exit 1
+    fi
+  else
+    echo "Secret ${secret} already exists"
+  fi
+done
+
+# Create configs
+echo "Creating configs..."
+docker config rm nginx_config 2>/dev/null || true
+docker config create nginx_config ./configs/nginx.conf
+
+docker config rm app_config 2>/dev/null || true
+docker config create app_config ./configs/app.json
+
+docker config rm prometheus_config 2>/dev/null || true
+docker config create prometheus_config ./configs/prometheus.yml
+
+# Deploy stack
+echo "Deploying stack..."
+docker stack deploy -c ${COMPOSE_FILE} ${STACK_NAME}
+
+# Wait for services to start
+echo "Waiting for services to start..."
+sleep 30
+
+# Show status
+docker stack services ${STACK_NAME}
+
+# Check health
+echo "Checking service health..."
+for service in nginx api worker db cache prometheus grafana; do
+  REPLICAS=$(docker service ls --filter name=${STACK_NAME}_${service} --format "{{.Replicas}}")
+  echo "${service}: ${REPLICAS}"
+done
+
+echo -e "${GREEN}Deployment complete!${NC}"
+echo "Check status: docker stack services ${STACK_NAME}"
+echo "View logs: docker service logs -f ${STACK_NAME}_api"
+```
+
+### Service Update Script
+
+```bash
+#!/bin/bash
+# update-service.sh
+
+set -e
+
+SERVICE_NAME=$1
+NEW_IMAGE=$2
+
+if [ -z "$SERVICE_NAME" ] || [ -z "$NEW_IMAGE" ]; then
+  echo "Usage: ./update-service.sh <service-name> <new-image>"
+  echo "Example: ./update-service.sh myapp_api myapp/api:v2"
+  exit 1
+fi
+
+FULL_SERVICE_NAME="${STACK_NAME}_${SERVICE_NAME}"
+
+echo "Updating ${FULL_SERVICE_NAME} to ${NEW_IMAGE}..."
+
+# Update service with rollback on failure
+docker service update \
+  --image ${NEW_IMAGE} \
+  --update-parallelism 1 \
+  --update-delay 10s \
+  --update-failure-action rollback \
+  --update-monitor 30s \
+  ${FULL_SERVICE_NAME}
+
+# Wait for update
+echo "Waiting for update to complete..."
+sleep 30
+
+# Check status
+docker service ps ${FULL_SERVICE_NAME}
+
+echo "Update complete!"
+```
+
+### Rollback Script
+
+```bash
+#!/bin/bash
+# rollback-service.sh
+
+set -e
+
+SERVICE_NAME=$1
+STACK_NAME="myapp"
+
+if [ -z "$SERVICE_NAME" ]; then
+  echo "Usage: ./rollback-service.sh <service-name>"
+  exit 1
+fi
+
+FULL_SERVICE_NAME="${STACK_NAME}_${SERVICE_NAME}"
+
+echo "Rolling back ${FULL_SERVICE_NAME}..."
+
+docker service rollback ${FULL_SERVICE_NAME}
+
+sleep 30
+
+docker service ps ${FULL_SERVICE_NAME}
+
+echo "Rollback complete!"
+```
+
+### Monitoring Dashboard (Grafana)
+
+```json
+{
+  "dashboard": {
+    "title": "Docker Swarm Overview",
+    "panels": [
+      {
+        "title": "Running Tasks",
+        "targets": [
+          {
+            "expr": "count(container_tasks_state{state=\"running\"})"
+          }
+        ]
+      },
+      {
+        "title": "CPU Usage per Service",
+        "targets": [
+          {
+            "expr": "rate(container_cpu_usage_seconds_total{name=~\".+\"}[5m]) * 100",
+            "legendFormat": "{{name}}"
+          }
+        ]
+      },
+      {
+        "title": "Memory Usage per Service",
+        "targets": [
+          {
+            "expr": "container_memory_usage_bytes{name=~\".+\"} / 1024 / 1024",
+            "legendFormat": "{{name}} MB"
+          }
+        ]
+      },
+      {
+        "title": "Network I/O",
+        "targets": [
+          {
+            "expr": "rate(container_network_receive_bytes_total{name=~\".+\"}[5m])",
+            "legendFormat": "{{name}} RX"
+          },
+          {
+            "expr": "rate(container_network_transmit_bytes_total{name=~\".+\"}[5m])",
+            "legendFormat": "{{name}} TX"
+          }
+        ]
+      },
+      {
+        "title": "Service Health",
+        "targets": [
+          {
+            "expr": "container_health_status{name=~\".+\"}"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+### Prometheus Configuration
+
+```yaml
+# prometheus.yml
+global:
+  scrape_interval: 15s
+  evaluation_interval: 15m
+
+alerting:
+  alertmanagers:
+    - static_configs:
+        - targets:
+          - alertmanager:9093
+
+rule_files:
+  - /etc/prometheus/alerts.yml
+
+scrape_configs:
+  - job_name: 'prometheus'
+    static_configs:
+      - targets: ['prometheus:9090']
+
+  - job_name: 'cadvisor'
+    static_configs:
+      - targets: ['cadvisor:8080']
+
+  - job_name: 'node'
+    static_configs:
+      - targets: ['node-exporter:9100']
+
+  - job_name: 'api'
+    static_configs:
+      - targets: ['api:3000']
+    metrics_path: '/metrics'
+```
+
+### Alert Rules
+
+```yaml
+# alerts.yml
+groups:
+  - name: swarm_alerts
+    rules:
+      - alert: ServiceDown
+        expr: count(container_tasks_state{state="running"}) == 0
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Service {{ $labels.service }} is down"
+          description: "No running tasks for service {{ $labels.service }}"
+
+      - alert: HighCpuUsage
+        expr: rate(container_cpu_usage_seconds_total[5m]) * 100 > 80
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High CPU usage on {{ $labels.name }}"
+          description: "Container {{ $labels.name }} CPU usage is {{ $value }}%"
+
+      - alert: HighMemoryUsage
+        expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) * 100 > 80
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High memory usage on {{ $labels.name }}"
+          description: "Container {{ $labels.name }} memory usage is {{ $value }}%"
+
+      - alert: ContainerRestart
+        expr: increase(container_restart_count[1h]) > 0
+        labels:
+          severity: warning
+        annotations:
+          summary: "Container {{ $labels.name }} restarted"
+          description: "Container {{ $labels.name }} restarted {{ $value }} times in the last hour"
+```
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,6 +38,7 @@ These agents are invoked automatically by `/pipeline` or manually via `@mention`
 | `@lead-developer` | Implements code | Status: testing (tests fail) |
 | `@frontend-developer` | UI implementation | When UI work needed |
 | `@backend-developer` | Node.js/Express/APIs | When backend needed |
+| `@devops-engineer` | Docker/Kubernetes/CI/CD | When deployment/infra needed |

 ### Quality Assurance
 | Agent | Role | When Invoked |
@@ -48,6 +49,12 @@ These agents are invoked automatically by `/pipeline` or manually via `@mention`
 | `@security-auditor` | Security audit | After performance |
 | `@visual-tester` | Visual regression | When UI changes |

+### DevOps & Infrastructure
+| Agent | Role | When Invoked |
+|-------|------|--------------|
+| `@devops-engineer` | Docker/Swarm/K8s deployment | When deployment needed |
+| `@security-auditor` | Container security scan | After deployment config |
+
 ### Cognitive Enhancement (New)
 | Agent | Role | When Invoked |
 |-------|------|--------------|
@@ -207,6 +214,46 @@ GITEA_TOKEN=your-token-here
 | `.kilo/skills/` | Skill modules |
 | `src/kilocode/` | TypeScript API for programmatic use |

+## Skills Reference
+
+### Containerization Skills
+| Skill | Purpose | Location |
+|-------|---------|----------|
+| `docker-compose` | Multi-container orchestration | `.kilo/skills/docker-compose/` |
+| `docker-swarm` | Production cluster deployment | `.kilo/skills/docker-swarm/` |
+| `docker-security` | Container security hardening | `.kilo/skills/docker-security/` |
+| `docker-monitoring` | Container monitoring/logging | `.kilo/skills/docker-monitoring/` |
+
+### Node.js Skills
+| Skill | Purpose | Location |
+|-------|---------|----------|
+| `nodejs-express-patterns` | Express routing, middleware | `.kilo/skills/nodejs-express-patterns/` |
+| `nodejs-auth-jwt` | JWT authentication | `.kilo/skills/nodejs-auth-jwt/` |
+| `nodejs-security-owasp` | OWASP security | `.kilo/skills/nodejs-security-owasp/` |
+
+### Database Skills
+| Skill | Purpose | Location |
+|-------|---------|----------|
+| `postgresql-patterns` | PostgreSQL patterns | `.kilo/skills/postgresql-patterns/` |
+| `sqlite-patterns` | SQLite patterns | `.kilo/skills/sqlite-patterns/` |
+| `clickhouse-patterns` | ClickHouse patterns | `.kilo/skills/clickhouse-patterns/` |
+
+### Go Skills
+| Skill | Purpose | Location |
+|-------|---------|----------|
+| `go-modules` | Go modules management | `.kilo/skills/go-modules/` |
+| `go-concurrency` | Goroutines and channels | `.kilo/skills/go-concurrency/` |
+| `go-testing` | Go testing patterns | `.kilo/skills/go-testing/` |
+| `go-security` | Go security patterns | `.kilo/skills/go-security/` |
+
+### Process Skills
+| Skill | Purpose | Location |
+|-------|---------|----------|
+| `planning-patterns` | CoT/ToT planning | `.kilo/skills/planning-patterns/` |
+| `memory-systems` | Memory management | `.kilo/skills/memory-systems/` |
+| `tool-use` | Tool usage patterns | `.kilo/skills/tool-use/` |
+| `research-cycle` | Self-improvement cycle | `.kilo/skills/research-cycle/` |
+
 ## Using the TypeScript API

 ```typescript