- SwarmServiceInfo.ports typed as string[]|null, normalised to [] in listSwarmServices()
- SwarmServiceInfo.labels typed as Record|null, normalised to {} in listSwarmServices()
- NodeInfo.labels typed as nullable, normalised via .map() in Nodes.tsx before render
- ServiceRow now uses (svc.ports ?? []).length and .map() — no crash when null
- Image display wrapped in IIFE to avoid double-split problem
- agents/nodes arrays normalised with .map() guards before render
- gateway-proxy.ts: listSwarmServices() deserialises and patches null fields server-side
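
The null-patching described above can be sketched as follows. This is a minimal illustration, not the code in gateway-proxy.ts: the `RawSwarmService` shape, the `id` field, and the helper names are assumptions. Only the `ports` and `labels` fields and their `[]` / `{}` defaults come from the notes.

```typescript
// Hypothetical wire shape: the gateway may send null for empty collections.
interface RawSwarmService {
  id: string;
  ports: string[] | null;
  labels: Record<string, string> | null;
}

// Shape after normalisation: collections are never null.
interface NormalisedSwarmService {
  id: string;
  ports: string[];
  labels: Record<string, string>;
}

// Replace null collections with empty ones so callers can use .length and .map() safely.
function normaliseService(raw: RawSwarmService): NormalisedSwarmService {
  return { ...raw, ports: raw.ports ?? [], labels: raw.labels ?? {} };
}

// Also tolerate a null or missing list (assumed to be possible for an empty Swarm).
function normaliseServices(
  raw: RawSwarmService[] | null | undefined,
): NormalisedSwarmService[] {
  return (raw ?? []).map(normaliseService);
}
```

Normalising once on the server side means the `(svc.ports ?? [])` guards in the UI act as a second line of defence rather than the only one.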
## 1. Fix /nodes Swarm Status Display
- Add SwarmStatusBanner component: clear green/red/loading state
- Shows nodeId, managerAddr, isManager badge
- Error state explains what to check (docker.sock mount)
- Header now shows 'swarm unreachable — check gateway' vs 'active'
- swarmOk now checks nodeId presence, not just data existence
## 2. Autonomous Agent Container
- New docker/Dockerfile.agent — builds Go agent binary from gateway/cmd/agent/
- New gateway/cmd/agent/main.go — standalone HTTP microservice:
* GET /health — liveness probe with idle time info
* POST /task — receives task, forwards to Gateway orchestrator
* GET /info — agent metadata (id, hostname, gateway url)
* Idle watchdog: calls /api/swarm/agents/{name}/stop after IdleTimeoutMinutes
* Connects to Swarm overlay network (goclaw-net) → reaches DB/Gateway by DNS
* Env: AGENT_ID, GATEWAY_URL, DATABASE_URL, IDLE_TIMEOUT_MINUTES
## 3. Swarm Manager Agent (auto-stop after 15min idle)
- New gateway/internal/api/swarm_manager.go:
* SwarmManager goroutine checks every 60s
* Scales idle GoClaw agent services to 0 replicas after 15 min
* Tracks lastActivity from task UpdatedAt timestamps
- New REST endpoints in gateway:
* GET /api/swarm/agents — list agents with idleMinutes
* POST /api/swarm/agents/{name}/start — scale up agent
* POST /api/swarm/agents/{name}/stop — scale to 0
* DELETE /api/swarm/services/{id} — remove service permanently
- SwarmManager started as background goroutine in main.go with context cancel
## 4. Docker Client Enhancements
- Added NetworkAttachment type and Networks field to ServiceSpec
- CreateAgentServiceFull(opts) — supports overlay networks, custom labels
- CreateAgentService() delegates to CreateAgentServiceFull for backward compat
- RemoveService(id) — DELETE /v1.44/services/{id}
- GetServiceLastActivity(id) — finds latest task UpdatedAt for idle detection
## 5. tRPC & Gateway Proxy
- New functions: removeSwarmService, listSwarmAgents, startSwarmAgent, stopSwarmAgent
- SwarmAgentInfo type with idleMinutes, lastActivity, desiredReplicas
- createAgentService now accepts networks[] parameter
- New tRPC endpoints: nodes.removeService, nodes.listAgents, nodes.startAgent, nodes.stopAgent
## 6. Nodes.tsx UI Overhaul
- SwarmStatusBanner component at top — no more silent 'connecting…'
- New 'Agents' tab with AgentManagerRow: idle time, auto-stop warning, start/stop/remove buttons
- IdleColor coding: green < 5m, yellow 5-10m, red 10m+ with countdown to auto-stop
- ServiceRow: added Remove button with confirmation dialog
- RemoveConfirmDialog component
- DeployAgentDialog: added overlay networks field, default env includes GATEWAY_URL
- All queries refetch after agent start/stop/remove
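
The idle colour thresholds and the auto-stop countdown can be expressed as two small pure functions. This is a sketch under assumptions: the function names and the boundary handling (exactly 5 min counts as yellow, exactly 10 min as red) are illustrative, and the 15-minute limit is taken from the Swarm Manager notes above.

```typescript
type IdleColor = "green" | "yellow" | "red";

// Idle agents are scaled to 0 after 15 minutes (per the Swarm Manager notes).
const AUTO_STOP_MINUTES = 15;

// green < 5m, yellow 5-10m, red 10m+
function idleColor(idleMinutes: number): IdleColor {
  if (idleMinutes < 5) return "green";
  if (idleMinutes < 10) return "yellow";
  return "red";
}

// Minutes left before auto-stop, clamped at 0 once the limit has passed.
function minutesUntilAutoStop(
  idleMinutes: number,
  limit: number = AUTO_STOP_MINUTES,
): number {
  return Math.max(0, limit - idleMinutes);
}
```

For example, an agent idle for 12 minutes would render red with 3 minutes left on the countdown.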
Problem: when the LLM returned empty content or the call failed with a network error, the orchestrator
stopped immediately with (no response), which the user saw as a blank reply.
Solution — 4-layer retry system:
## Go Gateway (gateway/internal/orchestrator/orchestrator.go)
- Extracted shared runLoop() used by Chat(), ChatWithEvents(), ChatWithEventsAndRetry()
- Added RetryPolicy struct: MaxLLMRetries (default 3), InitialDelay (2s),
MaxDelay (30s), RetryOnEmpty (true)
- callLLMWithRetry(): wraps every LLM call with exponential back-off:
* retries on HTTP/network error
* retries on empty choices array
* retries when content=="" AND finish_reason!="tool_calls" (soft empty)
* strips tools on attempt > 1 (avoids repeated tool-format errors)
* logs each attempt; total attempts = MaxLLMRetries + 1 (default: 4)
- Added ChatWithEventsAndRetry() with onRetry callback for client visibility
- SetRetryPolicy() for runtime override
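
The retry rules listed above can be sketched as follows. The gateway itself is written in Go; this TypeScript version is only an illustration of the policy, not a port of orchestrator.go. The doubling factor, the millisecond units, the `LLMResponse` shape, the injected `sleep`, and the choice to gate only the soft-empty check behind `retryOnEmpty` are all assumptions.

```typescript
interface RetryPolicy {
  maxLLMRetries: number; // extra attempts after the first one
  initialDelayMs: number;
  maxDelayMs: number;
  retryOnEmpty: boolean;
}

// Defaults from the notes: 3 retries, 2s initial delay, 30s cap, retry on empty.
const defaultPolicy: RetryPolicy = {
  maxLLMRetries: 3,
  initialDelayMs: 2000,
  maxDelayMs: 30000,
  retryOnEmpty: true,
};

// Hypothetical, simplified response shape.
interface LLMChoice { content: string; finishReason: string; }
interface LLMResponse { choices: LLMChoice[]; }

// Delay before retry number `retry` (1-based); doubling per retry is an assumption.
function backoffDelayMs(retry: number, p: RetryPolicy): number {
  return Math.min(p.initialDelayMs * Math.pow(2, retry - 1), p.maxDelayMs);
}

// Returns a reason when the response should be retried, or null when it is usable.
function emptyReason(resp: LLMResponse, p: RetryPolicy): string | null {
  if (resp.choices.length === 0) return "empty choices";
  const first = resp.choices[0];
  const softEmpty = first.content === "" && first.finishReason !== "tool_calls";
  return p.retryOnEmpty && softEmpty ? "empty content" : null;
}

async function callLLMWithRetry(
  call: (withTools: boolean) => Promise<LLMResponse>,
  p: RetryPolicy = defaultPolicy,
  onRetry: (retry: number, reason: string) => void = () => {},
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<LLMResponse> {
  let lastReason = "unknown";
  // Total attempts = maxLLMRetries + 1.
  for (let attempt = 1; attempt <= p.maxLLMRetries + 1; attempt++) {
    if (attempt > 1) {
      onRetry(attempt - 1, lastReason);
      await sleep(backoffDelayMs(attempt - 1, p));
    }
    try {
      // Tools are only sent on the first attempt and stripped afterwards.
      const resp = await call(attempt === 1);
      const reason = emptyReason(resp, p);
      if (reason === null) return resp;
      lastReason = reason;
    } catch (err) {
      lastReason = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`LLM call failed after ${p.maxLLMRetries + 1} attempts: ${lastReason}`);
}
```

With the defaults and a doubling factor, the waits before retries 1 to 3 would be 2s, 4s, and 8s, so the 30s cap only matters for larger retry counts.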
## Config (gateway/config/config.go)
- New fields: MaxLLMRetries (GATEWAY_MAX_LLM_RETRIES, default 3)
RetryDelaySecs (GATEWAY_RETRY_DELAY_SECS, default 2)
## main.go — wires retry policy from config into orchestrator
## docker-compose.yml
- GATEWAY_REQUEST_TIMEOUT_SECS: 120 → 300 (accommodates up to 4 attempts, i.e. 3 retries)
- GATEWAY_MAX_LLM_RETRIES=3, GATEWAY_RETRY_DELAY_SECS=2 env vars
## API (handlers.go)
- StartChatSession goroutine now uses ChatWithEventsAndRetry
- onRetry callback emits "thinking" DB event with content "⟳ Retry N: reason"
so the client sees retry progress in the console panel
## Frontend (client/src/lib/chatStore.ts + client/src/pages/Chat.tsx)
- ConsoleEntry gains content?: string and new type "retry"
- thinking events with content starting "⟳ Retry" → type=retry (amber)
- Chat ConsolePanel renders retry events in amber with RefreshCw icon
and shows the retry reason string underneath
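
The mapping from "thinking" events to retry console entries can be sketched like this. The `ConsoleEntry` fields follow the notes above; the event shape, the helper names, and the regular expression used to parse "⟳ Retry N: reason" are assumptions.

```typescript
type ConsoleEntryType = "thinking" | "retry";

interface ConsoleEntry {
  type: ConsoleEntryType;
  content?: string;
}

const RETRY_PREFIX = "⟳ Retry";

// A thinking event whose content starts with the retry prefix becomes type "retry".
function toConsoleEntry(event: { type: "thinking"; content?: string }): ConsoleEntry {
  const isRetry = event.content !== undefined && event.content.startsWith(RETRY_PREFIX);
  const entry: ConsoleEntry = { type: isRetry ? "retry" : "thinking" };
  if (event.content !== undefined) entry.content = event.content;
  return entry;
}

// Split "⟳ Retry N: reason" into its parts so the reason can be rendered on its own line.
function parseRetry(content: string): { attempt: number; reason: string } | null {
  const match = /^⟳ Retry (\d+): ([\s\S]*)$/.exec(content);
  return match ? { attempt: Number(match[1]), reason: match[2] } : null;
}
```

Reusing the existing "thinking" event with a content prefix avoids a schema change for DB events, at the cost of string matching on the client.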
1. AgentDetailModal – fix provider not being pre-selected on edit open:
- Add resolveProviderValue() that does exact → case-insensitive → partial
match between stored provider string and connectedProviders list
- Re-resolve provider in a second useEffect once providers load from API
- Add safety-net SelectItem for stored value not found in providers list
2. AgentCreateModal – refactor Deploy Agent form:
- Fix Provider + Model fields layout (grid-cols-2 with w-full truncate to
prevent overflow/merging)
- Add Wand2 'Auto-fill' button next to Agent Name field that calls
agentCompiler.compile (existing LLM endpoint) with name+description as
spec — fills role, model, temperature, systemPrompt automatically
- Add Sparkles hint text explaining the magic wand functionality
- Auto-select first provider/model when data loads
- All fields use font-mono + proper label spacing
3. Both modals – MaxTokens auto-fill from Ollama API:
- Add getOllamaModelInfo() in gateway-proxy.ts: calls Ollama /api/show,
extracts {arch}.context_length from model_info, returns contextLength +
parameterSize, family, quantization, capabilities
- Add ollama.modelInfo tRPC query endpoint in routers.ts (input: modelId)
- Both modals query trpc.ollama.modelInfo on model selection change
- Auto-set maxTokens to context_length from API (262144 for kimi-k2.5 etc.)
- Show 'max N from API' hint + clickable link to set full context window
- Loading spinner while fetching model info
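
The exact → case-insensitive → partial matching described for resolveProviderValue() in item 1 could look like the sketch below. The signature, the trimming, and the direction of the partial match (either string containing the other) are assumptions rather than the modal's actual code.

```typescript
// Returns the matching entry from `connected`, or undefined when nothing matches.
function resolveProviderValue(stored: string, connected: string[]): string | undefined {
  const needle = stored.trim();
  if (needle === "") return undefined;

  // 1. Exact match.
  const exact = connected.find((p) => p === needle);
  if (exact !== undefined) return exact;

  // 2. Case-insensitive match.
  const lower = needle.toLowerCase();
  const caseInsensitive = connected.find((p) => p.toLowerCase() === lower);
  if (caseInsensitive !== undefined) return caseInsensitive;

  // 3. Partial match: either string contains the other (assumed direction).
  return connected.find((p) => {
    const candidate = p.toLowerCase();
    return candidate !== "" && (candidate.includes(lower) || lower.includes(candidate));
  });
}
```

When this returns undefined, the safety-net SelectItem mentioned above would still let the stored value be displayed.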
- db.go: added SaveMetric(MetricInput) and SaveHistory(HistoryInput) methods
that write directly to MySQL; non-fatal (log-only on error)
- handlers.go (OrchestratorStream): after each SSE stream finishes, an async
goroutine saves agentMetrics (agentId, requestId, tokens, processingTimeMs,
model, toolsCalled, status) and agentHistory (userMessage, agentResponse);
both error and success paths covered; orchAgentID resolved from DB
- routers.ts (agents.chat): saveMetric() called for both success and error paths
in the Node.js direct-chat fallback (was only saving agentHistory before)
- Verified: agentMetrics row ID=2 shows processingTimeMs=2133, totalTokens=143,
model=minimax-m2.7, Cyrillic text stored correctly as UTF-8
- Chat.tsx: rewritten to use global chatStore singleton — SSE connection survives
page navigation; added StopCircle cancel button; scrolls only when near bottom
- chatStore.ts: new module-level singleton (EventTarget pattern) that holds all
conversation/console state; TextDecoder with stream:true for correct UTF-8
- handlers.go (ProvidersReload): now accepts decrypted key in request body from
Node.js so Go gateway can actually use the API key without sharing crypto logic
- providers.ts (activateProvider): sends decrypted key to gateway via
notifyGatewayReload(); seedDefaultProvider also calls notifyGatewayReload()
- seed.ts: on startup, after seeding, pushes active provider to gateway with
retry loop (5 retries × 3 s) to wait for gateway readiness
- index.ts (SSE proxy): TextDecoder('utf-8', {stream:true}) already correct;
confirmed Cyrillic text arrives ungarbled (e.g. 'Привет!' not '??????????')
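
The reason streaming decode matters for Cyrillic text is that each letter occupies two bytes in UTF-8, so an SSE chunk boundary can fall in the middle of a character. The sketch below shows the difference; `decodeChunks` is a hypothetical helper. Note that in the WHATWG Encoding API `stream` is an option of `decode()`, not of the `TextDecoder` constructor.

```typescript
// Decode a sequence of byte chunks, with or without streaming mode.
function decodeChunks(chunks: Uint8Array[], streaming: boolean): string {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of chunks) {
    // With stream: true, an incomplete trailing sequence is buffered for the next call.
    out += decoder.decode(chunk, { stream: streaming });
  }
  // A final call without arguments flushes anything still buffered.
  if (streaming) out += decoder.decode();
  return out;
}

const bytes = new TextEncoder().encode("Привет!");
// Split after 3 bytes, i.e. in the middle of the second letter.
const chunks = [bytes.slice(0, 3), bytes.slice(3)];

const streamed = decodeChunks(chunks, true);  // text survives the split
const naive = decodeChunks(chunks, false);    // contains U+FFFD replacement characters
```

Without streaming mode, each chunk is decoded in isolation and the split character is replaced by U+FFFD on both sides of the boundary.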
Problems fixed:
1. 401 unauthorized on chat — OLLAMA_API_KEY was not set in containers
- Created docker/.env with real API key
- Added OLLAMA_BASE_URL + OLLAMA_API_KEY to control-center in docker-compose.yml
2. AgentDetailModal/AgentCreateModal showed hardcoded providers list
(Ollama, OpenAI, Anthropic, Mistral, Groq) regardless of what is configured
- Removed const PROVIDERS = [...] from both modals
- Now loads providers via trpc.config.providers (server-side)
- Only shows providers that are actually configured in env
3. Settings.tsx had API key hardcoded in frontend source code (security issue)
- API key removed from frontend
- New trpc.config.providers endpoint returns masked key (first 8 chars + ***)
- Shows red warning badge 'NO KEY — chat will fail' if key is missing
- Base URL read from server env, not hardcoded
New tRPC endpoint: config.providers
- Returns list of configured providers with name, baseUrl, hasKey, maskedKey
- Provider name auto-detected from URL (ollama.com → 'Ollama Cloud', etc.)
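
A minimal sketch of the masking and name detection behind config.providers follows. The helper names, the fallback to the hostname, and the "Custom" label are assumptions; only the "first 8 chars + ***" rule and the ollama.com → 'Ollama Cloud' mapping come from the notes.

```typescript
// Mask an API key for display: first 8 characters followed by "***".
function maskKey(key: string | undefined): { hasKey: boolean; maskedKey: string } {
  if (!key) return { hasKey: false, maskedKey: "" };
  return { hasKey: true, maskedKey: key.slice(0, 8) + "***" };
}

// Derive a display name from the base URL.
function detectProviderName(baseUrl: string): string {
  let host: string;
  try {
    host = new URL(baseUrl).hostname.toLowerCase();
  } catch {
    return "Custom"; // hypothetical label for an unparsable URL
  }
  if (host === "ollama.com" || host.endsWith(".ollama.com")) return "Ollama Cloud";
  return host; // hypothetical fallback: show the hostname itself
}
```

One caveat with this rule: a key of 8 characters or fewer would be shown in full, so a stricter implementation might mask short keys entirely.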
- Dashboard.tsx: removed 3 hardcoded mock constants (NODES/AGENTS/ACTIVITY_LOG)
- Swarm Nodes panel: real data from trpc.nodes.list (swarm nodes or containers)
- Container stats: live CPU%/MEM from trpc.nodes.stats, rendered as progress bars
- Active Agents panel: real agents from trpc.agents.list with isActive/isSystem/model/role
- Activity Feed: generated from active agents list (live agent names, models, timestamps)
- Metric cards: real counts from trpc.dashboard.stats (uptime, nodes, agents, gateway)
- All 3 panels have loading state (Loader2 spinner) and empty/error state
- Hero banner subtitle uses real stats.nodes and stats.agents counts
- Cluster Topology footer shows real uptime from dashboard.stats
- server/index.ts: documented as @deprecated legacy static-only entry point
- Added JSDoc block explaining this file is NOT the production server
- Points to server/_core/index.ts as the real server with tRPC/OAuth/seed
- Added console.log WARNING on startup to prevent accidental use
- File retained as historical artefact per Phase 17 decision
- todo.md: Phase 16 debt items closed as [x], Phase 17 section added
- ADR-001: Streaming LLM — status DEFERRED, Phase 18 plan documented
(Go Gateway stream:true + tRPC subscription + Chat.tsx EventSource)
- ADR-002: Authentication — status ACCEPTED as internal tool
(OAuth already partial; protectedProcedure path documented for future)
- Phase 9 routers.ts orchestrator migration verified as complete
- AgentDetailModal: load real models from API with loading indicator;
fallback to current agent model when API unavailable; show count badge
- AgentCreateModal: remove broken provider-filter on models list;
add loading indicator and disabled state during fetch; show count badge
- gateway/orchestrator: add resolveModel() — validates desired model
against LLM API before use; auto-fallback to first available model
to prevent 401/404 errors (fixes glm-5 unauthorized in chat)
- gateway/orchestrator: add ModelWarning field to ChatResult struct
- gateway-proxy.ts: add modelWarning field to GatewayChatResult
- Chat.tsx: display modelWarning as amber badge next to model name
- todo.md: add Phase 16 section with bug fixes and tech debt notes
- server/seed.ts: 6 default system agents seeded on first startup
- server/seed.test.ts: 18 vitest tests (69 total, all pass)
- server/_core/index.ts: seedDefaults() integrated into startup
- Deployed to production 2.59.219.61: all 6 agents confirmed in DB
- Gitea: committed and pushed (73a26d8)
- DB schema: all missing tables created (agentHistory, agentMetrics, agents, toolDefinitions + isSystem/isOrchestrator columns)
Implemented:
- gateway/internal/docker/client.go: Docker API client over a unix socket (/var/run/docker.sock)
- IsSwarmActive(), GetSwarmInfo(), ListNodes(), ListContainers(), GetContainerStats()
- CalcCPUPercent() for computing CPU%
- gateway/internal/api/handlers.go: new endpoints
- GET /api/nodes: list of Swarm nodes, or the standalone Docker host
- GET /api/nodes/stats: live CPU/RAM statistics for containers
- POST /api/tools/execute: tool execution
- gateway/cmd/gateway/main.go: new routes registered
- server/gateway-proxy.ts: added getGatewayNodes() and getGatewayNodeStats()
- server/routers.ts: added nodes router (nodes.list, nodes.stats)
- client/src/pages/Nodes.tsx: fully rewritten to use real data
- Auto-refresh: 10s for nodes, 15s for container stats
- Swarm mode: shows all cluster nodes
- Standalone mode: shows the local Docker host + containers
- CPU/RAM gauges from real docker stats
- Error state when the Gateway is unavailable
- Loading skeleton
- server/nodes.test.ts: 14 new vitest tests
- All 51 tests pass
Fixed:
- Chat.tsx: removed the hardcoded model "qwen2.5:7b" from the mutation; the orchestrator now uses the model from the DB config (minimax-m2.7)
- Chat.tsx: added Streamdown for markdown rendering of orchestrator responses
- Confirmed: tool calling works; the command "Покажи файлы проекта" ("Show the project files") calls file_list and returns the project structure
- Confirmed: the model in the header shows "minimax-m2.7" from the DB
- TypeScript: 0 errors (pnpm tsc --noEmit)
- Tests: 24/24 passed
## Phase 1 (Fixed): Agent Management UI
- Fixed authorization: agents switched to publicProcedure
- AgentDetailModal: 5 tabs (General, LLM Params, Tools, History, Stats)
- Full editing: model, provider, temperature, topP, maxTokens, frequencyPenalty, presencePenalty, systemPrompt
- allowedTools and allowedDomains managed via tags
- AgentCreateModal: agent creation with model selection from the Ollama API
- Metrics button on every agent card
## Phase 2+3: Tool Binding System
- server/tools.ts: registry of 10 tools (http_get, http_post, shell_exec, file_read, file_write, docker_list, docker_exec, docker_logs, browser_navigate, browser_screenshot)
- Safe execution: checks the agent's allowedTools and accessControl from the DB
- tools.execute tRPC endpoint
- Tools.tsx: tool management page with test execution
- Added an "Инструменты" ("Tools") item to the sidebar navigation
## Phase 4: Metrics & History
- AgentMetrics.tsx: detailed per-agent metrics page
- Request Timeline: hourly bar chart (success/error)
- Conversation Log: conversation history with pagination
- Raw Metrics Table: all metrics with tokens and timing
- Time range selector: 6h/24h/48h/7d
- Route /agents/:id/metrics
## Tests: 24/24 passed
- server/auth.logout.test.ts (1)
- server/agents.test.ts (7)
- server/tools.test.ts (13)
- server/ollama.test.ts (3)