Problem: when the LLM returned empty content or the call failed with a network
error, the orchestrator stopped immediately with "(no response)", which the user
saw as a blank reply.
Solution — 4-layer retry system:
## Go Gateway (gateway/internal/orchestrator/orchestrator.go)
- Extracted shared runLoop() used by Chat(), ChatWithEvents(), ChatWithEventsAndRetry()
- Added RetryPolicy struct: MaxLLMRetries (default 3), InitialDelay (2s),
  MaxDelay (30s), RetryOnEmpty (true)
- callLLMWithRetry(): wraps every LLM call with exponential back-off:
  * retries on HTTP/network error
  * retries on empty choices array
  * retries when content=="" AND finish_reason!="tool_calls" (soft empty)
  * strips tools on attempt > 1 (avoids repeated tool-format errors)
  * logs each attempt; total attempts = MaxLLMRetries + 1 (default: 4)
- Added ChatWithEventsAndRetry() with onRetry callback for client visibility
- SetRetryPolicy() for runtime override
## Config (gateway/config/config.go)
- New fields:
  * MaxLLMRetries (GATEWAY_MAX_LLM_RETRIES, default 3)
  * RetryDelaySecs (GATEWAY_RETRY_DELAY_SECS, default 2)
## main.go
- Wires the retry policy from config into the orchestrator
## docker-compose.yml
- GATEWAY_REQUEST_TIMEOUT_SECS: 120 → 300 (accommodates up to 4 attempts,
  i.e. 3 retries)
- Added env vars GATEWAY_MAX_LLM_RETRIES=3 and GATEWAY_RETRY_DELAY_SECS=2
## API (handlers.go)
- StartChatSession goroutine now uses ChatWithEventsAndRetry
- onRetry callback emits a "thinking" DB event with content "⟳ Retry N: reason"
  so the client sees retry progress in the console panel
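
A sketch of how the retry event content could be built and recognised. The `sessionEvent` type and both function names are hypothetical; only the event type "thinking" and the "⟳ Retry N: reason" format come from this change.

```go
package main

import (
	"fmt"
	"strings"
)

// retryPrefix is the marker the client looks for at the start of the content.
const retryPrefix = "⟳ Retry"

// sessionEvent is a hypothetical stand-in for the stored DB event.
type sessionEvent struct {
	Type    string
	Content string
}

// newRetryEvent builds the "thinking" event emitted from the onRetry callback.
func newRetryEvent(attempt int, reason string) sessionEvent {
	return sessionEvent{
		Type:    "thinking",
		Content: fmt.Sprintf("%s %d: %s", retryPrefix, attempt, reason),
	}
}

// isRetryEvent applies the same prefix check the client uses to pick the amber style.
func isRetryEvent(e sessionEvent) bool {
	return e.Type == "thinking" && strings.HasPrefix(e.Content, retryPrefix)
}

func main() {
	e := newRetryEvent(2, "empty content")
	fmt.Println(e.Content, isRetryEvent(e))
}
```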
## Frontend (client/src/lib/chatStore.ts + client/src/pages/Chat.tsx)
- ConsoleEntry gains an optional content?: string field and a new type "retry"
- thinking events whose content starts with "⟳ Retry" → type=retry (amber)
- Chat ConsolePanel renders retry events in amber with the RefreshCw icon
  and shows the retry reason string underneath
Problems fixed:
1. 401 Unauthorized on chat: OLLAMA_API_KEY was not set in the containers
   - Created docker/.env with real API key
   - Added OLLAMA_BASE_URL + OLLAMA_API_KEY to control-center in docker-compose.yml
2. AgentDetailModal/AgentCreateModal showed a hardcoded providers list
   (Ollama, OpenAI, Anthropic, Mistral, Groq) regardless of what is configured
   - Removed const PROVIDERS = [...] from both modals
   - Now loads providers via trpc.config.providers (server-side)
   - Only shows providers that are actually configured in env
3. Settings.tsx had the API key hardcoded in frontend source code (security issue)
   - API key removed from frontend
   - New trpc.config.providers endpoint returns a masked key (first 8 chars + ***)
   - Shows red warning badge 'NO KEY — chat will fail' if the key is missing
   - Base URL is read from server env, not hardcoded
New tRPC endpoint: config.providers
- Returns list of configured providers with name, baseUrl, hasKey, maskedKey
- Provider name auto-detected from URL (ollama.com → 'Ollama Cloud', etc.)
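
The masking and name-detection rules can be sketched as below. The real endpoint is a tRPC procedure; Go is used here only to illustrate the logic, and every name in the block is hypothetical. Returning only "***" for keys of 8 characters or fewer and falling back to the URL host for unrecognised providers are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// providerInfo mirrors the fields listed for config.providers (types assumed).
type providerInfo struct {
	Name      string
	BaseURL   string
	HasKey    bool
	MaskedKey string
}

// maskKey keeps the first 8 characters and appends "***".
// Short keys are fully hidden so they are never shown whole (assumption).
func maskKey(key string) string {
	if key == "" {
		return ""
	}
	r := []rune(key)
	if len(r) <= 8 {
		return "***"
	}
	return string(r[:8]) + "***"
}

// providerName maps a base URL to a display name. Only the ollama.com
// mapping comes from this change; the host fallback is an assumption.
func providerName(baseURL string) string {
	u, err := url.Parse(baseURL)
	if err != nil || u.Hostname() == "" {
		return "Unknown"
	}
	host := u.Hostname()
	if host == "ollama.com" || strings.HasSuffix(host, ".ollama.com") {
		return "Ollama Cloud"
	}
	return host
}

// describeProvider assembles the record returned to the client.
func describeProvider(baseURL, key string) providerInfo {
	return providerInfo{
		Name:      providerName(baseURL),
		BaseURL:   baseURL,
		HasKey:    key != "",
		MaskedKey: maskKey(key),
	}
}

func main() {
	fmt.Printf("%+v\n", describeProvider("https://ollama.com", "sk-1234567890abcdef"))
}
```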