The sharePublic prop in editor components (Knowledge, Tools, Skills,
Prompts, Models) incorrectly included an "|| edit" / "|| write_access"
condition, allowing users with write access to see and use the "Public"
sharing option regardless of their actual public sharing permission.
Additionally, all backend access/update endpoints only verified write
authorization but did not check the corresponding sharing.public_*
permission, allowing direct API calls to bypass frontend restrictions
entirely.
Frontend: removed the edit/write_access bypass from sharePublic in all
five editor components so visibility is gated solely by the user's
sharing.public_* permission or admin role.
Backend: added has_public_read_access_grant checks to the access/update
endpoints in knowledge.py, tools.py, prompts.py, skills.py, models.py,
and notes.py. Public grants are silently stripped when the user lacks
the corresponding permission.
Fixes#21356
* fix: prevent worker death during document upload by using run_coroutine_threadsafe
Replace asyncio.run() with asyncio.run_coroutine_threadsafe() in
save_docs_to_vector_db() to prevent uvicorn worker health check failures.
The issue: asyncio.run() creates a new event loop and blocks the thread
completely, preventing the worker from responding to health checks during
long-running embedding operations (>5 seconds default timeout).
The fix: Schedule the async embedding work on the main event loop using
run_coroutine_threadsafe(). This keeps the main loop responsive to health
check pings while the sync caller waits for the result.
Changes:
- main.py: Store main event loop reference in app.state.main_loop at startup
- retrieval.py: Use run_coroutine_threadsafe() instead of asyncio.run()
https://claude.ai/code/session_01UQSYvSTkXb57sFb7M85Kcw
* add env var
---------
Co-authored-by: Claude <noreply@anthropic.com>
fix: reduce TTFT by caching model lookups in chat completion
Skip expensive get_all_models() calls when models are already cached
in app.state. This significantly reduces Time To First Token (TTFT)
for chat completions and embeddings requests.
Previously, every request called get_all_models() which fetches model
lists from all configured backends. Now we check the cache first and
only call get_all_models() on cache miss.
Affected endpoints:
- openai: generate_chat_completion, embeddings
- ollama: embed, embeddings
Fixes#20069
Co-authored-by: Michael <42099345+mickeytheseal@users.noreply.github.com>
Each access to request.app.state.config.<KEY> triggers a synchronous
Redis GET. In get_all_models_responses() and get_merged_models(), the
config keys OPENAI_API_BASE_URLS, OPENAI_API_KEYS, and
OPENAI_API_CONFIGS were read on every loop iteration — resulting
in some cases in 200-300 Redis round-trips for OPENAI_API_BASE_URLS alone.
Read each config value once into a local variable at the start of the
function and reuse it throughout.