fix: remove invalid expunge call on Pydantic FileModel
Files.get_file_by_id() returns a Pydantic FileModel, not an SQLAlchemy
ORM object. Calling db.expunge() on a Pydantic model fails with
UnmappedInstanceError since it lacks _sa_instance_state.
The expunge was also unnecessary because subsequent DB updates already
use fresh sessions via get_db() context manager.
Fixes#20925
Remove Depends(get_session) from POST /query endpoint to prevent database connections from being held during embedding API calls (1-5+ seconds).
The Memories.get_memories_by_user_id() function manages its own short-lived session internally, releasing the connection before the slow EMBEDDING_FUNCTION() call begins.
Remove Depends(get_session) from POST /create endpoint to prevent database connections from being held during embedding API calls (1-5+ seconds).
The has_permission() and Knowledges.insert_new_knowledge() functions manage their own short-lived sessions internally, releasing connections before the slow embed_knowledge_base_metadata() call begins.
Remove Depends(get_session) from POST /{id}/update endpoint to prevent database connections from being held during embedding API calls (1-5+ seconds).
All database operations (get_knowledge_by_id, has_access, has_permission, update_knowledge_by_id, get_file_metadatas_by_id) manage their own short-lived sessions internally, releasing connections before and after the slow embed_knowledge_base_metadata() call.
Remove Depends(get_session) from the /v1/completions endpoint to prevent database connections from being held during the entire duration of LLM calls.
Previously, the database session was acquired at request start and held until the response completed. Under concurrent load, this exhausted the connection pool, causing QueuePool timeout errors.
The fix allows Models.get_model_by_id() and has_access() to manage their own short-lived sessions internally, releasing the connection immediately after authorization checks complete.
Remove Depends(get_session) from the /v1/chat/completions endpoint to prevent database connections from being held during the entire duration of LLM calls.
Previously, the database session was acquired at request start and held until the streaming response completed. Under concurrent load, this exhausted the connection pool, causing QueuePool timeout errors.
The fix allows Models.get_model_by_id() and has_access() to manage their own short-lived sessions internally, releasing the connection immediately after authorization checks complete.
Remove Depends(get_session) from the /api/chat endpoint to prevent database connections from being held during the entire duration of LLM calls (30-60+ seconds for streaming responses).
Previously, the database session was acquired at request start and held until the streaming response completed. Under concurrent load, this exhausted the connection pool, causing QueuePool timeout errors for other database operations.
The fix allows Models.get_model_by_id() and has_access() to manage their own short-lived sessions internally, releasing the connection immediately after the quick authorization checks complete - before the slow external LLM API call begins.
Remove Depends(get_session) from the /chat/completions endpoint to prevent database connections from being held during the entire duration of LLM calls (30-60+ seconds for streaming responses).
Previously, the database session was acquired at request start and held until the streaming response completed. Under concurrent load, this exhausted the connection pool, causing QueuePool timeout errors for other database operations.
The fix allows Models.get_model_by_id() and has_access() to manage their own short-lived sessions internally, releasing the connection immediately after the quick authorization checks complete - before the slow external LLM API call begins.
Remove Depends(get_session) from POST /add endpoint to prevent database connections from being held during embedding API calls (1-5+ seconds).
The Memories.insert_new_memory() function manages its own short-lived session internally, releasing the connection before the slow EMBEDDING_FUNCTION() call begins.
Remove Depends(get_session) from POST /metadata/reindex endpoint to prevent database connections from being held during N embedding API calls.
This endpoint is CRITICAL as it loops through ALL knowledge bases and calls embed_knowledge_base_metadata() for each one. With the original code, a single connection would be held for the entire duration (potentially minutes for large deployments), completely exhausting the pool.
The Knowledges.get_knowledge_bases() function manages its own short-lived session, releasing the connection before the embedding loop begins.
Remove Depends(get_session) from POST /process/files/batch endpoint to prevent database connections from being held during batch embedding API calls (5-60+ seconds for large batches).
The save_docs_to_vector_db() function makes external embedding API calls. Post-embedding file updates (Files.update_file_by_id) manage their own short-lived sessions internally, releasing connections promptly.
Remove Depends(get_session) from POST /reset to prevent catastrophic connection pool exhaustion.
This endpoint was holding a SINGLE database connection while executing N PARALLEL embedding API calls via asyncio.gather(). For a user with 100 memories, this meant one connection blocked for potentially MINUTES (100 calls * 1-5 seconds each, even in parallel due to rate limits).
A single user triggering /reset could completely starve the connection pool, causing QueuePool timeout errors across the entire application.
The Memories.get_memories_by_user_id() function now manages its own short-lived session, releasing the connection immediately before the massive parallel embedding operation begins.
Refactored the file processing status streaming endpoint to avoid holding
a database connection for the entire stream duration (up to 2 hours).
Changes:
- Each status poll now creates its own short-lived database session instead
of capturing the request's session in the generator closure
- Increased poll interval from 0.5s to 1s, halving database queries with
negligible UX impact
This prevents a single file status stream from blocking a connection pool
slot for hours, which could contribute to pool exhaustion under load.