open-webui-custom

Files

Classic298 efe5416f83 fix: reduce TTFT by caching model lookups in chat completion (#20886 )

fix: reduce TTFT by caching model lookups in chat completion

Skip expensive get_all_models() calls when models are already cached
in app.state. This significantly reduces Time To First Token (TTFT)
for chat completions and embeddings requests.

Previously, every request called get_all_models() which fetches model
lists from all configured backends. Now we check the cache first and
only call get_all_models() on cache miss.

Affected endpoints:
- openai: generate_chat_completion, embeddings
- ollama: embed, embeddings

Fixes #20069

Co-authored-by: Michael <42099345+mickeytheseal@users.noreply.github.com>

2026-02-11 18:29:10 -06:00