open-webui-custom/backend/open_webui
Classic298 efe5416f83 fix: reduce TTFT by caching model lookups in chat completion (#20886)

Skip expensive get_all_models() calls when models are already cached
in app.state. This significantly reduces Time To First Token (TTFT)
for chat completions and embeddings requests.

Previously, every request called get_all_models(), which fetches model
lists from all configured backends. Now we check the cache first and
call get_all_models() only on a cache miss.

Affected endpoints:
- openai: generate_chat_completion, embeddings
- ollama: embed, embeddings

Fixes #20069

Co-authored-by: Michael <42099345+mickeytheseal@users.noreply.github.com>
2026-02-11 18:29:10 -06:00