fix: reduce TTFT by caching model lookups in chat completion

Skip expensive get_all_models() calls when models are already cached
in app.state. This significantly reduces Time To First Token (TTFT)
for chat completion and embedding requests.

Previously, every request called get_all_models(), which fetches model
lists from all configured backends. Now we check the cache first and
only call get_all_models() on a cache miss.
Affected endpoints:
- openai: generate_chat_completion, embeddings
- ollama: embed, embeddings

Fixes #20069
Co-authored-by: Michael <42099345+mickeytheseal@users.noreply.github.com>