open-webui-custom/backend
Classic298 efe5416f83 fix: reduce TTFT by caching model lookups in chat completion (#20886)

Skip expensive get_all_models() calls when models are already cached
in app.state. This significantly reduces Time To First Token (TTFT)
for chat completions and embeddings requests.

Previously, every request called get_all_models(), which fetches model
lists from all configured backends. Now we check the cache first and
only call get_all_models() on a cache miss.
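
The cache-first lookup described above can be sketched roughly as follows.
This is an illustration under assumptions, not the code from this commit:
the state attribute name MODELS, the helper names get_models_cached() and
lookup_model(), and the stubbed get_all_models() body are all hypothetical.

```python
import asyncio
from types import SimpleNamespace

# Hypothetical stand-in for app.state; the attribute name MODELS is an
# assumption made for this sketch and is not taken from the commit.
state = SimpleNamespace(MODELS=None)
fetch_count = 0  # counts how often the expensive path runs


async def get_all_models():
    """Stub for the expensive call that queries every configured backend."""
    global fetch_count
    fetch_count += 1
    await asyncio.sleep(0)  # placeholder for the network round trips
    models = {"model-a": {"id": "model-a"}, "model-b": {"id": "model-b"}}
    state.MODELS = models  # populate the cache for later requests
    return models


async def get_models_cached():
    """Return the cached models if present; fetch only on a cache miss."""
    models = state.MODELS
    if not models:
        models = await get_all_models()
    return models


async def lookup_model(model_id):
    """Resolve one model, as a request handler would before proxying."""
    models = await get_models_cached()
    return models.get(model_id)


async def demo():
    first = await lookup_model("model-a")   # cache miss: triggers a fetch
    second = await lookup_model("model-b")  # cache hit: no fetch
    return first, second, fetch_count


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch only the first lookup pays for the backend fetch. The cached
list can go stale; how the real cache is refreshed is not covered by this
commit message.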

Affected endpoints:
- openai: generate_chat_completion, embeddings
- ollama: embed, embeddings

Fixes #20069

Co-authored-by: Michael <42099345+mickeytheseal@users.noreply.github.com>
2026-02-11 18:29:10 -06:00