96 Commits

Author SHA1 Message Date
Athanasios Oikonomou
657162e96d feat(ocr): add support for Docling OCR engine and language configuration
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.

Fixes #13133
2025-05-03 00:32:06 +03:00
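
Note on 657162e96d: the environment-variable side of this configuration might look roughly like the sketch below. Only the two variable names come from the commit message. The helper name `read_docling_ocr_config`, the empty defaults, the comma-separated language format, and the example values are assumptions made for illustration, not the project's actual code.

```python
import os
from typing import Mapping, Optional


def read_docling_ocr_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the two variables named in the commit message.

    Hypothetical helper: the defaults and the comma-separated language
    format are assumptions of this sketch, not taken from the commit.
    """
    env = os.environ if env is None else env
    engine = env.get("DOCLING_OCR_ENGINE", "").strip()
    raw_langs = env.get("DOCLING_OCR_LANG", "")
    # Split "eng, deu" into ["eng", "deu"], dropping empty pieces.
    langs = [part.strip() for part in raw_langs.split(",") if part.strip()]
    return {"ocr_engine": engine or None, "ocr_lang": langs}


# Example values are illustrative only.
config = read_docling_ocr_config(
    {"DOCLING_OCR_ENGINE": "tesseract", "DOCLING_OCR_LANG": "eng, deu"}
)
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise without touching the real process environment.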
Tim Jaeryang Baek
e87f2669fa
Merge pull request #13191 from tth37/feat_firecrawl_search_engine
feat: Add Firecrawl search engine
2025-04-29 08:38:28 -07:00
Tim Jaeryang Baek
7b863465a9
Merge pull request #13311 from stephen304/yacy-support
feat: Yacy search support
2025-04-29 08:35:10 -07:00
Stephen Smith
240d91d38d Add yacy config for user/pass, automatically add yacy json api path 2025-04-26 22:28:30 -04:00
Stephen Smith
0f73b96616 first pass at yacy support copied from searxng 2025-04-26 14:07:13 -04:00
tth37
92dbeb1939 feat: Add Firecrawl search engine 2025-04-24 14:57:28 +08:00
Timothy Jaeryang Baek
732d7aee70 enh: sentence transformers env vars
Co-Authored-By: DrZoidberg09 <96449693+drzoidberg09@users.noreply.github.com>
2025-04-24 01:55:18 +09:00
Timothy Jaeryang Baek
09874ab83d fix: FireCrawlLoader 2025-04-24 01:40:34 +09:00
Timothy Jaeryang Baek
43efff0fe6 refac 2025-04-22 23:22:50 +09:00
Tim Jaeryang Baek
87844a8042
Merge pull request #12822 from tth37/feat_external_search_loader
feat: Support for Self-Hosted/External Web Search/Loader Engines
2025-04-18 23:51:27 -07:00
Youggls
9669cd3454 fix: use run_in_threadpool for search_web to prevent blocking
Used fastapi's run_in_threadpool function to execute the search_web function,
preventing the synchronous function from blocking the entire web search process.
2025-04-17 17:23:20 +08:00
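
Note on 9669cd3454: the technique described here, handing a synchronous function to a worker thread so it does not block the event loop, can be sketched as follows. `run_in_threadpool` is the FastAPI helper named in the commit message. The body of `search_web`, the `process_web_search` wrapper, and the `asyncio.to_thread` fallback (used only when FastAPI is not installed) are assumptions for this sketch rather than the project's actual code.

```python
import asyncio
import time

try:
    from fastapi.concurrency import run_in_threadpool
except ImportError:
    # Fallback so the sketch runs without FastAPI installed (assumption).
    async def run_in_threadpool(func, *args, **kwargs):
        return await asyncio.to_thread(func, *args, **kwargs)


def search_web(query: str) -> list[str]:
    """Hypothetical stand-in for a synchronous, blocking search call."""
    time.sleep(0.1)  # simulates blocking network I/O
    return [f"result for {query}"]


async def process_web_search(queries: list[str]) -> list[list[str]]:
    # Each blocking call runs in a worker thread, so the event loop
    # remains free to serve other requests while the searches run.
    tasks = [run_in_threadpool(search_web, q) for q in queries]
    return await asyncio.gather(*tasks)


results = asyncio.run(process_web_search(["a", "b"]))
```

Calling `search_web` directly inside an `async` function would stall every other coroutine for the duration of the call. Offloading it to a thread avoids that without rewriting the search function itself.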
tth37
85f8e91288 feat: Allow admin editing external search/loader settings 2025-04-14 18:19:26 +08:00
Timothy Jaeryang Baek
70718dda90 refac 2025-04-13 22:31:43 -07:00
tth37
839ba22c90 feat: Backend for Self-Hosted/External Web Search/Loader Engines 2025-04-14 01:49:05 +08:00
Timothy Jaeryang Baek
888b468576 fix 2025-04-12 23:00:34 -07:00
Timothy Jaeryang Baek
4dafbbccfc fix: rag template display issue 2025-04-12 22:55:24 -07:00
tth37
8d53f1e770 fix: small bugs on updated web/rag settings 2025-04-13 12:55:50 +08:00
Timothy Jaeryang Baek
48a23ce3fe refac: web/rag config 2025-04-12 16:33:36 -07:00
tth37
5eac5960ef feat: Add frontend configuration for web loader 2025-04-12 17:13:30 +08:00
Youggls
3e2a6df1fb feat: Add sougou web search API for backend, add config panel in for frontend. 2025-04-10 14:51:44 +08:00
Timothy Jaeryang Baek
914eb49767 chore: include accelerate dependency 2025-04-06 17:44:05 -07:00
Timothy Jaeryang Baek
cbe2056587 fix: audio file upload response issue 2025-04-06 17:31:50 -07:00
Timothy Jaeryang Baek
f243e523a6 refac 2025-04-06 15:52:38 -07:00
Timothy Jaeryang Baek
155dbd5a66 refac 2025-04-06 15:45:48 -07:00
Timothy Jaeryang Baek
9825d03602
Merge pull request #12507 from Ithanil/fix_web_result_collection_source_ids
fix: fix web results all getting the same source id when using embedding and retrieval
2025-04-06 15:43:21 -07:00
Jan Kessler
a506a1a61e
only keep URLs as sources for which the content could actually be retrieved 2025-04-06 20:31:12 +02:00
Jan Kessler
4476060044
fix web results all getting the same source id when using embedding and retrieval 2025-04-06 15:51:05 +02:00
Marko Henning
3b2b6e183d Added missing parameter for query_doc_with_hybrid_search. 2025-04-04 15:30:57 +02:00
Timothy Jaeryang Baek
94bf49440d enh: unload hybrid model if set to False 2025-04-02 18:15:14 -07:00
Patrick Wachter
1ac6879268
Add Mistral OCR integration and configuration support 2025-04-01 14:24:33 +02:00
Timothy Jaeryang Baek
cafc5413f5 refac 2025-03-31 14:13:27 -07:00
Timothy Jaeryang Baek
d542881ee4 refac 2025-03-30 21:55:20 -07:00
Timothy Jaeryang Baek
433b5bddc1
Merge pull request #8594 from jayteaftw/main
feat: Support for instruct/prefixing embeddings
2025-03-30 21:54:44 -07:00
Timothy Jaeryang Baek
4a79320253 chore: format 2025-03-27 01:40:28 -07:00
Timothy Jaeryang Baek
9d834a8e90
Merge branch 'dev' into k_reranker 2025-03-26 20:50:31 -07:00
Marko Henning
41a4cf7106 Added new k_reranker parameter 2025-03-06 10:47:57 +01:00
Fabio Polito
9aa407dbd2 feat: merge with main 2025-03-05 22:04:34 +00:00
Timothy Jaeryang Baek
efe8c4ca69 chore: format 2025-03-01 07:28:00 -08:00
Timothy Jaeryang Baek
d0ddb0637e enh: web embed bypass embedding and retrieval support 2025-02-27 16:34:05 -08:00
Timothy Jaeryang Baek
1b56a8f3cb
Merge pull request #10864 from kurtdami/perplexity_integration
feat: add perplexity integration to web search
2025-02-27 13:51:03 -08:00
kurtdami
b061775932 feat: add perplexity integration to web search 2025-02-27 00:30:48 -08:00
Timothy Jaeryang Baek
57010901e6 enh: bypass embedding and retrieval 2025-02-26 15:42:19 -08:00
Timothy Jaeryang Baek
78a8ef8e66 refac: audio file handling 2025-02-26 13:09:52 -08:00
Timothy Jaeryang Baek
3be5e3129b
Merge pull request #10752 from NovoNordisk-OpenSource/yvedeng/standardize-logging
refactor: replace print statements with logging
2025-02-25 10:53:02 -08:00
Yifang Deng
0e5d5ecb81
refactor: replace print statements with logging for better error tracking 2025-02-25 15:53:55 +01:00
hurxxxx
4cc3102758 feat: onedrive file picker integration 2025-02-25 01:47:07 +09:00
Timothy Jaeryang Baek
b14e75dd6c feat: added Trust Proxy Environment switch in Web Search admin settings tab.
Co-Authored-By: harry zhou <67385896+harryzhou2000@users.noreply.github.com>
2025-02-21 13:40:11 -08:00
Timothy Jaeryang Baek
ab1b910d80
Merge pull request #10486 from Micca/feature/document_intelligence_support
Feat: Adding Support for Azure AI Document Intelligence for Content Extraction (Revised)
2025-02-21 10:56:18 -08:00
Timothy Jaeryang Baek
81715f6553 enh: RAG full context mode 2025-02-18 21:14:58 -08:00
Rory
10e0c81de9 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
2025-02-17 21:53:39 -06:00