212 Commits

Author SHA1 Message Date
Classic298
00b3583dc2 fix: fix reindex not working due to unnecessary dupe check (#20857)
* Update retrieval.py

* Update knowledge.py

* Update retrieval.py

* Update knowledge.py
2026-01-21 18:36:08 -05:00
Timothy Jaeryang Baek
ecbdef732b enh: PDF_LOADER_MODE 2026-01-21 23:51:36 +04:00
Classic298
182d5e8591 fix(db): release connection before embedding in process_files_batch (#20576)
Remove Depends(get_session) from POST /process/files/batch endpoint to prevent database connections from being held during batch embedding API calls (5-60+ seconds for large batches).

The save_docs_to_vector_db() function makes external embedding API calls. Post-embedding file updates (Files.update_file_by_id) manage their own short-lived sessions internally, releasing connections promptly.
2026-01-11 23:32:56 +04:00
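The pattern described in commit 182d5e8591 above (do not hold a database connection across a slow embedding call) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `short_session`, `embed`, and `process_batch` are hypothetical stand-ins, and the `events` list exists only to show the order of operations.

```python
from contextlib import contextmanager

events = []  # records the order of operations for illustration


@contextmanager
def short_session():
    """Hypothetical stand-in for a short-lived database session."""
    events.append("open")
    try:
        yield
    finally:
        events.append("close")


def embed(texts):
    """Hypothetical stand-in for a slow external embedding API call."""
    events.append("embed")
    return [[float(len(t))] for t in texts]


def process_batch(texts):
    # 1. Read what is needed, then release the connection.
    with short_session():
        pending = list(texts)
    # 2. The slow call runs while no session is open.
    vectors = embed(pending)
    # 3. Write the results in a new short-lived session.
    with short_session():
        saved = dict(zip(pending, vectors))
    return saved
```

The point of the ordering is that the connection is returned before the embedding call begins, so a long batch does not tie up a pooled connection.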
G30
4b4743b497 feat: enforce permissions in backend (#20471)
* feat: enforce image generation permissions in backend

* feat: enforce web search permissions in backend

* feat: enforce audio (tts/stt) permissions in backend
2026-01-08 02:48:35 +04:00
Timothy Jaeryang Baek
1d08376860 refac 2026-01-05 18:55:44 +04:00
Timothy Jaeryang Baek
d3ab9f4b96 fix: failed hash in files 2026-01-05 18:21:00 +04:00
Classic298
614cb56420 feat: Add configurable DDGS backend selection with UI support (#20366)
* init

* Update WebSearch.svelte

* reorder
2026-01-05 03:05:56 +04:00
Timothy Jaeryang Baek
dc2c2f2295 refac 2026-01-03 19:48:37 +04:00
Timothy Jaeryang Baek
c324359580 feat: chunk min size target for md header splitter
Co-Authored-By: Classic298 <27028174+Classic298@users.noreply.github.com>
2026-01-03 19:47:29 +04:00
Timothy Jaeryang Baek
f7f8a263b9 feat: JINA_API_BASE_URL 2026-01-01 02:17:47 +04:00
Timothy Jaeryang Baek
89ad1c68d1 enh: FIRECRAWL_TIMEOUT 2026-01-01 02:07:22 +04:00
Classic298
431632d530 fix: normalize local CrossEncoder reranking scores for relevance threshold (#20228)
* Update utils.py

* Update retrieval.py

* Update utils.py

* Update retrieval.py

* add env var

* rename to SENTENCE_TRANSFORMERS_CROSS_ENCODER_SIGMOID_ACTIVATION_FUNCTION
2025-12-31 15:48:31 -05:00
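The normalization named in commit 431632d530 above can be illustrated with a short sketch. It assumes the local CrossEncoder returns unbounded raw scores and that the relevance threshold is expressed in the 0-to-1 range; the names `sigmoid` and `filter_by_relevance` are hypothetical and not taken from the repository.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to the range 0..1; stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def filter_by_relevance(docs, raw_scores, threshold):
    """Keep (doc, score) pairs whose normalized score meets the threshold."""
    scored = [(doc, sigmoid(s)) for doc, s in zip(docs, raw_scores)]
    return [(doc, score) for doc, score in scored if score >= threshold]
```

Without the sigmoid step, a threshold such as 0.5 would be compared against raw scores that can be negative or much larger than 1, so the cutoff would not behave as a relevance fraction.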
Classic298
201c38a08a fix: prevent delete_entries_from_collection crash when file is None (#20274)
Add null check after Files.get_file_by_id() before accessing file.hash. Raises HTTP 404 instead of crashing with AttributeError when file doesn't exist.
2025-12-31 02:31:26 -05:00
Classic298
46f867cda6 fix: prevent save_docs_to_vector_db crash on empty result.ids (#20275)
Add check that result.ids exists and has length > 0 before accessing result.ids[0]. Prevents IndexError when query returns empty results.
2025-12-31 02:31:05 -05:00
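The guard described in commit 46f867cda6 above amounts to checking for an empty result before indexing into it. A minimal sketch, assuming only that the query result exposes an `ids` attribute; the helper name `first_id_or_none` is hypothetical:

```python
def first_id_or_none(result):
    """Return result.ids[0], or None when the result or its ids are missing or empty."""
    ids = getattr(result, "ids", None) if result is not None else None
    if ids is not None and len(ids) > 0:
        return ids[0]
    return None
```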
Timothy Jaeryang Baek
08bf4670ec refac 2025-12-30 19:38:45 +04:00
Timothy Jaeryang Baek
18a33a079b refac 2025-12-30 19:33:30 +04:00
Timothy Jaeryang Baek
d3a682759f enh: ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER 2025-12-30 19:31:59 +04:00
Timothy Jaeryang Baek
b1d0f00d8c refac/enh: db session sharing 2025-12-29 00:21:18 +04:00
Timothy Jaeryang Baek
c96549eaa7 refac 2025-12-21 18:08:36 +04:00
Classic298
4fd790f7dd feat: Apply WEB_SEARCH_CONCURRENT_REQUESTS to all search engines using semaphore (#20070)
* sequential

* zero default

* fix
2025-12-21 07:18:00 -05:00
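Limiting concurrent search requests with a semaphore, as commit 4fd790f7dd above describes, might look like the sketch below. The function `run_limited` and its parameters are hypothetical, and treating a limit of 0 as "no limit" is an assumption made for this example, not something the commit message confirms.

```python
import asyncio


async def run_limited(queries, search, max_concurrent):
    """Run `search(query)` for every query, at most `max_concurrent` at a time.

    A limit of 0 or less is treated here as "no limit" (an assumption).
    """
    semaphore = asyncio.Semaphore(max_concurrent) if max_concurrent > 0 else None

    async def guarded(query):
        if semaphore is None:
            return await search(query)
        # Only `max_concurrent` coroutines may be inside this block at once.
        async with semaphore:
            return await search(query)

    # gather preserves the order of the input queries in its results.
    return await asyncio.gather(*(guarded(q) for q in queries))
```

With `max_concurrent=1` the requests run one after another, which may be what the "sequential" bullet in the commit refers to.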
Classic298
48ccb1e170 fix: consolidate psql cleanup logic and fix web add with cleanup (#20072)
* sequential

* consolidate logic and fix for web add

* Update WebSearch.svelte

* Update retrieval.py

* Update retrieval.py

* Update WebSearch.svelte
2025-12-21 07:14:29 -05:00
okamototk
37085ed42b chore: update langchain 1.2.0 (#19991)
* chore: update langchain 1.2.0

* chore: format
2025-12-20 08:50:44 -05:00
Classic298
2e7c7d635d fix: prevent ExternalReranker from blocking event loop during RAG queries (#20049)
* fix: prevent ExternalReranker from blocking event loop during RAG queries (#120)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
Fixes #19900

* Merge pull request open-webui#19030 from open-webui/dev (#122)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
Fixes #19900

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-20 08:43:40 -05:00
Timothy Jaeryang Baek
afaa404fe4 enh: mineru api timeout 2025-12-20 17:39:33 +04:00
Classic298
823b9a6dd9 chore/perf: Remove old SRC level log env vars with no impact (#20045)
* Update openai.py

* Update env.py

* Merge pull request open-webui#19030 from open-webui/dev (#119)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-20 08:16:14 -05:00
Boris Bocquet
bc681f8258 feat: new environment variable SEARXNG_LANGUAGE in the persistent config, also editable in the Admin > Web Search panel when SearXNG is selected. It is sent to SearXNG as the search language (the "language" argument). Previously it was fixed to en-US; the default is now "all". (#19909) 2025-12-14 12:38:47 -05:00
Timothy Jaeryang Baek
b02397e460 feat: WEB_LOADER_TIMEOUT 2025-12-08 11:49:27 -05:00
Classic298
1779090bdb fix: add missing env var parameter pass through for enable async embedding (#19748)
* Add enable_async parameter to embedding function

* Add enable_async parameter to RAG configuration
2025-12-04 14:59:09 -05:00
Henne
a7e614ca4c feat: Adds document intelligence model configuration (#19692)
* Adds document intelligence model configuration

Enables the configuration of the Document Intelligence model to be used by the RAG pipeline.

This allows users to specify the model they want to use for document processing, providing flexibility and control over the extraction process.

* Added Title to Document Intelligence Model Config
2025-12-02 14:41:09 -05:00
Timothy Jaeryang Baek
6ce9afd95d refac 2025-12-02 09:21:03 -05:00
Timothy Jaeryang Baek
4370dee79e fix: async save docs to vector db 2025-11-25 17:19:33 -05:00
Timothy Jaeryang Baek
8b2015a97b refac 2025-11-25 16:28:06 -05:00
Timothy Jaeryang Baek
6235243b62 refac 2025-11-25 05:07:53 -05:00
Timothy Jaeryang Baek
488631db98 refac 2025-11-25 02:05:27 -05:00
Timothy Jaeryang Baek
2328dc284e feat/enh: async embedding processing setting
Co-Authored-By: Classic298 <27028174+Classic298@users.noreply.github.com>
2025-11-25 01:55:43 -05:00
Timothy Jaeryang Baek
9c19d0abd4 refac/breaking: docling params 2025-11-24 16:01:13 -05:00
Timothy Jaeryang Baek
48d1e67e79 chore: format 2025-11-23 20:15:52 -05:00
Classic298
902c6cfbea perf: 50x performance improvement for external embeddings (#19296)
* Update utils.py (#77)

Co-authored-by: Claude <noreply@anthropic.com>

* refactor: address code review feedback for embedding performance improvements (#92)

Co-authored-by: Claude <noreply@anthropic.com>

* fix: prevent sentence transformers from blocking async event loop (#95)

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-22 20:54:59 -05:00
Jacob Leksan
07ef295a77 feat: Adding file metadata to hybrid search (#19095)
* Added metadata to hybrid search

* Add config and env plus refac

* consistency

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
2025-11-18 15:29:07 -05:00
Timothy Jaeryang Baek
42071cb8e8 refac 2025-11-18 15:27:26 -05:00
Sang Lê
64747f7f79 Add Azure Search (#19104)
Co-authored-by: Tim Baek <tim@openwebui.com>
2025-11-13 19:12:34 -05:00
Classic298
ad17d35ac4 feat: Add custom API endpoint and user info headers for Perplexity Search (#31) (#19147)
Co-authored-by: Claude <noreply@anthropic.com>
2025-11-12 22:53:54 -05:00
Timothy Jaeryang Baek
413fa27b18 refac 2025-11-09 21:09:59 -05:00
Timothy Jaeryang Baek
a65cc196a5 refac: batch file processing
Co-Authored-By: Sihyeon Jang <24850223+sihyeonn@users.noreply.github.com>
2025-11-09 21:06:21 -05:00
Timothy Jaeryang Baek
e69c2cf3f6 refac 2025-11-09 16:12:38 -05:00
Timothy Jaeryang Baek
25c7f101f2 enh: optionally add user headers external websearch
Co-Authored-By: Classic298 <27028174+Classic298@users.noreply.github.com>
2025-11-09 16:09:29 -05:00
Timothy Jaeryang Baek
e2b9942648 feat: Optionally forward user headers to external document loader
Co-Authored-By: Classic298 <27028174+Classic298@users.noreply.github.com>
2025-11-06 00:05:46 -05:00
Timothy Jaeryang Baek
415b93c7c3 enh: configurable mistral ocr base url 2025-11-05 23:25:51 -05:00
Timothy Jaeryang Baek
a4fd26b478 enh/fix: google pse referer header 2025-11-04 13:50:07 -05:00
palazski
288b323df8 feat: use MINERU_PARAMS json field for mineru settings 2025-10-15 22:59:59 +03:00