APAW/.kilo/rules/process-continuity.md
NW 06fb0421ef fix(process-continuity): operator-free design for MCP Docker integration
- Resolve service_healthy deadlock by using service_started instead
- Fix 172.28.0.0/16 network collision by removing ipam config
- Add HybridGiteaClient (mcp → rest → bash fallback)
- Create .kilo/rules/process-continuity.md with 5 operator-free principles:
  1. No service_healthy conditions
  2. No hardcoded networks
  3. Automatic fallback chains
  4. Pre-flight validation
  5. Self-documenting failures
- Update docker-compose.yml with resilient config:
  - start_period: 60s, retries: 5, restart: on-failure:3
  - /tools healthcheck (guaranteed endpoint)
  - tmpfs for Node.js /tmp
  - Resource limits: 256M RAM, 0.5 CPU
- MCP/REST integration test passed (issue #109)

Refs: Milestone #67, Issues #107, #109
2026-05-08 22:31:59 +01:00

GNS-2: Process Continuity Rules

Problem

The pipeline repeatedly broke in Phase 8 (MCP Docker integration) because:

  1. service_healthy deadlock (docker-compose.yml): the container could not start because depends_on made it wait for a healthcheck that could only pass once the container was already running
  2. Network overlap: the hardcoded subnet 172.28.0.0/16 conflicted with existing Docker networks
  3. Undocumented MCP transport: the SSE (Server-Sent Events) protocol is not supported by the current Kilo Code infrastructure, and there was no automated fallback
  4. Operator dependency: the process stopped whenever it hit a technical barrier and waited for a human decision

Root Cause

| Failure | Why it happened | Operator-Free Fix |
| --- | --- | --- |
| service_healthy deadlock | Docker Compose blocked startup waiting for a healthcheck on a container that wasn't yet running | Use condition: service_started for depends_on |
| Subnet 172.28.0.0/16 conflict | Hardcoded IP range overlapped with host Docker networks | Remove ipam config, let Docker auto-assign |
| SSE transport unsupported | forgejo-mcp exposes MCP over SSE; current agent infrastructure uses HTTP REST + bash curl | Hybrid client with MCP → REST fallback |
| /health endpoint mismatch | Healthcheck probed /health, but the MCP server exposed a different URL | Probe /tools (guaranteed endpoint) instead |

Operator-Free Design Principles

1. No service_healthy Conditions

# PROBLEM: deadlock
depends_on:
  service:
    condition: service_healthy  # Container waits for itself

# FIX: allow startup, healthcheck as observer only  
depends_on:
  service:
    condition: service_started

2. No Hardcoded Networks

# PROBLEM: overlap
networks:
  gns-network:
    ipam:
      config:
        - subnet: 172.28.0.0/16  # May conflict

# FIX: Docker auto-assigns
networks:
  gns-network:
    driver: bridge

3. Automatic Fallback Chains

// Hybrid client: tries MCP first, falls back to REST, then to bash curl
async function createIssueWithFallback(payload) {
  try {
    return await mcpClient.createIssue(payload)
  } catch (mcpError) {
    console.warn(`MCP failed: ${mcpError}`)
  }
  try {
    return await restClient.createIssue(payload)
  } catch (restError) {
    console.warn(`REST failed: ${restError}`)
  }
  // Final fallback: bash curl (emergency only)
  return bashCurl(payload)
}
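
The try/catch chain above generalizes to any number of transports. The following is a minimal sketch, assuming each transport can be wrapped as a named async function; `runWithFallback` and `Transport` are hypothetical names for illustration and are not part of HybridGiteaClient.

```typescript
// Hypothetical helper: names and shapes are illustrative assumptions.
type Transport<T> = { name: string; run: () => Promise<T> };

// Tries each transport in order and returns the first successful result.
// Every failure is collected so the final error explains the whole chain.
async function runWithFallback<T>(
  transports: Transport<T>[],
): Promise<{ via: string; result: T }> {
  const failures: string[] = [];
  for (const transport of transports) {
    try {
      const result = await transport.run();
      return { via: transport.name, result };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push(`${transport.name}: ${message}`);
      console.warn(`${transport.name} failed: ${message}`);
    }
  }
  throw new Error(`All transports failed (${failures.join("; ")})`);
}
```

Returning the name of the transport that succeeded (`via`) lets the caller record which fallback is currently active, for example in the `fallback_active` field of a failure event.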

4. Pre-flight Validation

Before starting containers, validate prerequisites:

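# Check if the port is already in use (any HTTP response counts); if so, use another
PORT=3001
if curl -s -o /dev/null "http://localhost:${PORT}/"; then PORT=3002; fi

# Remove a stale network so Docker can recreate it
docker network ls --format '{{.Name}}' | grep -qx gns-network && docker network rm gns-network

# Check env vars are set; fall back to a placeholder so startup is not blocked
if [ -z "$FORGEJO_TOKEN" ]; then
  echo "WARNING: FORGEJO_TOKEN not set, using dummy value" >&2
  export FORGEJO_TOKEN="dummy"
fi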
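
The environment check can also live in the client code, so that missing variables produce warnings instead of a hard stop. This is a sketch under stated assumptions: `resolveEnv` is a hypothetical helper, and the fallback values passed to it are placeholders chosen by the caller.

```typescript
// Hypothetical pre-flight helper; the fallback values are placeholders.
type PreflightResult = { env: Record<string, string>; warnings: string[] };

// Fills in a safe startup default for every missing or empty variable and
// records one warning per substitution, so startup never blocks on it.
function resolveEnv(
  source: Record<string, string | undefined>,
  fallbacks: Record<string, string>,
): PreflightResult {
  const env: Record<string, string> = {};
  const warnings: string[] = [];
  Object.keys(fallbacks).forEach((key) => {
    const value = source[key];
    if (value === undefined || value === "") {
      env[key] = fallbacks[key];
      warnings.push(`WARNING: ${key} not set, using dummy value`);
    } else {
      env[key] = value;
    }
  });
  return { env, warnings };
}
```

In a Node.js process the first argument would typically be `process.env`. The warnings list can be printed to the console and also attached to a failure event if the dummy value later causes a connection error.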

5. Self-Documenting Failures

If the process must stop, write an explicit "why" and "what to do" to both:

  • Console output (human readable)
  • Gitea issue comment (machine readable, includes GNS_EVENT)

## 🚫 Agent Blocked

**Reason**: MCP server not reachable on localhost:3001
**Action**: Run `docker compose -f docker/mcp-gitea/docker-compose.yml up -d`
**Fallback**: Operations will use REST API until MCP is available
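
A notice in this shape can be generated from structured data so that the console output and the issue comment never drift apart. A minimal sketch follows; `BlockedNotice` and `renderBlockedNotice` are hypothetical names, and the fields simply mirror the three labels in the message above.

```typescript
// Hypothetical shape; field names mirror the labels in the message above.
type BlockedNotice = { reason: string; action: string; fallback: string };

// Renders the notice as Markdown so the same text can be printed to the
// console and posted as an issue comment.
function renderBlockedNotice(notice: BlockedNotice): string {
  return [
    "## 🚫 Agent Blocked",
    "",
    `**Reason**: ${notice.reason}`,
    `**Action**: ${notice.action}`,
    `**Fallback**: ${notice.fallback}`,
  ].join("\n");
}
```

Requiring all three fields at the type level makes it impossible to emit a failure that says what happened without also saying what to do next.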

Implementation Checklist

For every new container/service:

  • Healthcheck probes a guaranteed endpoint (/tools, rather than /health, which the server may not expose)
  • No service_healthy conditions in depends_on
  • No hardcoded subnets or IPs
  • Environment variables have safe fallbacks for startup
  • Error boundaries in all async operations (try/catch)
  • Error messages include both "what happened" and "next step"
  • All operator-required steps are documented as a checklist in the issue body
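
Two of these checklist items (no service_healthy conditions, no hardcoded subnets) can be checked mechanically. The sketch below assumes the compose file has already been parsed into a plain object by some YAML library; `ComposeFile` and `lintCompose` are hypothetical names, and only the long form of depends_on is modeled, since the short list form carries no condition.

```typescript
// Hypothetical, minimal view of a parsed docker-compose.yml. Parsing the
// YAML itself is out of scope here.
type ComposeFile = {
  services?: Record<string, { depends_on?: Record<string, { condition?: string }> }>;
  networks?: Record<string, { driver?: string; ipam?: { config?: { subnet?: string }[] } } | null>;
};

// Returns one human-readable finding per checklist violation.
function lintCompose(compose: ComposeFile): string[] {
  const findings: string[] = [];
  const services = compose.services ?? {};
  for (const serviceName of Object.keys(services)) {
    const dependsOn = services[serviceName]?.depends_on ?? {};
    for (const dependency of Object.keys(dependsOn)) {
      if (dependsOn[dependency]?.condition === "service_healthy") {
        findings.push(
          `${serviceName}: depends_on ${dependency} uses service_healthy; use service_started`,
        );
      }
    }
  }
  const networks = compose.networks ?? {};
  for (const networkName of Object.keys(networks)) {
    const subnets = networks[networkName]?.ipam?.config ?? [];
    for (const entry of subnets) {
      if (entry.subnet) {
        findings.push(`${networkName}: hardcoded subnet ${entry.subnet}; remove the ipam config`);
      }
    }
  }
  return findings;
}
```

An empty result means both rules hold. Each finding already names the offending service or network and the suggested fix, in line with the "what happened" plus "next step" rule.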

GNS-2 Event Format for Failures

<!-- GNS_EVENT: {
  "type": "system_failure",
  "failure_point": "mcp_container_startup",
  "requires_operator": true,
  "reason": "FORGEJO_TOKEN not set, container cannot connect to Gitea; used dummy token",
  "recovery_steps": [
    "Set FORGEJO_TOKEN in docker/mcp-gitea/.env",
    "Restart: docker compose -f docker/mcp-gitea/docker-compose.yml up -d"
  ],
  "fallback_active": "REST API (gitea-client.ts)",
  "timestamp": "2026-05-08T22:23:00Z"
} -->
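
For illustration, an event in this format could be written and read back as follows. The field names are taken from the example above; the `GnsFailureEvent` type and both function names are assumptions, and the decoder does not validate the parsed object beyond checking that it is well-formed JSON.

```typescript
// Field names follow the example above; the type itself is an assumption.
type GnsFailureEvent = {
  type: "system_failure";
  failure_point: string;
  requires_operator: boolean;
  reason: string;
  recovery_steps: string[];
  fallback_active: string;
  timestamp: string; // ISO 8601, UTC
};

// Wraps the event in an HTML comment so it stays invisible in the rendered
// issue comment but can still be read by tooling. String values must not
// contain "-->", which would end the comment early.
function encodeGnsEvent(failure: GnsFailureEvent): string {
  return `<!-- GNS_EVENT: ${JSON.stringify(failure, null, 2)} -->`;
}

// Extracts the first GNS_EVENT from a comment body, or returns null if
// none is present or the JSON is malformed.
function decodeGnsEvent(commentBody: string): GnsFailureEvent | null {
  const match = /<!--\s*GNS_EVENT:\s*([\s\S]*?)\s*-->/.exec(commentBody);
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as GnsFailureEvent;
  } catch {
    return null;
  }
}
```

Because the encoded event is an HTML comment, it can be appended to the human-readable notice in the same issue comment without changing how the comment renders.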

Reference