open-webui-custom/backend/open_webui at efe5416f8373455217572f50464860fdd9da63f3

OpenWebUI/open-webui-custom

Classic298 efe5416f83 fix: reduce TTFT by caching model lookups in chat completion (#20886)

Skip expensive get_all_models() calls when models are already cached
in app.state. This significantly reduces Time To First Token (TTFT)
for chat completions and embeddings requests.

Previously, every request called get_all_models() which fetches model
lists from all configured backends. Now we check the cache first and
only call get_all_models() on cache miss.

Affected endpoints:
- openai: generate_chat_completion, embeddings
- ollama: embed, embeddings

Fixes #20069

Co-authored-by: Michael <42099345+mickeytheseal@users.noreply.github.com>

2026-02-11 18:29:10 -06:00

refac: mv backend files to /open_webui dir

2024-09-04 16:54:48 +02:00

chore: format

2026-02-11 16:24:11 -06:00

chore: format

2026-02-11 16:24:11 -06:00

refac

2026-02-11 16:45:47 -06:00

chore: format

2026-02-11 16:24:11 -06:00

fix: reduce TTFT by caching model lookups in chat completion (#20886)

2026-02-11 18:29:10 -06:00

chore: format

2026-02-11 16:24:11 -06:00

refac

2025-08-10 00:02:58 +04:00

chore: format

2026-02-11 16:24:11 -06:00

rm: outdated tests

2025-12-28 23:35:09 +04:00

chore: format

2026-02-11 16:24:11 -06:00

refac

2026-02-11 18:19:01 -06:00

__init__.py

Update __init__.py

2025-04-15 09:55:35 +02:00

alembic.ini

fix: Alembic CLI commands from failing

2025-08-15 04:17:47 -04:00

config.py

chore: format

2026-02-11 16:24:11 -06:00

constants.py

feat/enh: optional password validation

2025-11-20 17:44:49 -05:00

env.py

chore: format

2026-02-11 16:24:11 -06:00

functions.py

chore: format

2026-02-11 16:24:11 -06:00

main.py

Fix idle in transaction leaks in Open WebUI (#20868)

2026-02-11 18:20:03 -06:00

tasks.py

chore: format

2026-02-11 16:24:11 -06:00