Commit Graph

286 Commits

Author SHA1 Message Date
Stephen Smith
240d91d38d Add yacy config for user/pass, automatically add yacy json api path 2025-04-26 22:28:30 -04:00
Stephen Smith
0f73b96616 first pass at yacy support copied from searxng 2025-04-26 14:07:13 -04:00
tth37
92dbeb1939 feat: Add Firecrawl search engine 2025-04-24 14:57:28 +08:00
tth37
8f7195ceda fix: FireCrawlLoader default mode to scrape 2025-04-24 01:17:35 +08:00
Tim Jaeryang Baek
91e758f3ec
Merge pull request #13165 from feddersen-group/perf/parallel_knowledge_search
perf: all knowledge searches in parallel in non-hybrid mode
2025-04-23 10:01:06 -07:00
Timothy Jaeryang Baek
09874ab83d fix: FireCrawlLoader 2025-04-24 01:40:34 +09:00
Alexander Grimm
d182155fac ~ call knowledge searches in parallel in non-hybrid mode 2025-04-23 09:20:51 +00:00
Tim Jaeryang Baek
faa3cac0e4
Merge pull request #13107 from tth37/fix_tavily_max_results
fix: `max_results` in Tavily search handler
2025-04-22 23:47:36 -07:00
tth37
bc315bd530 fix: max_results in Tavily search api 2025-04-21 20:59:47 +08:00
Athanasios Oikonomou
1e291aff25 feat: Add abstract base class for vector database integration
- Created `VectorDBBase` as an abstract base class to standardize vector database operations.
- Added required methods for common vector database operations: `has_collection`, `delete_collection`, `insert`, `upsert`, `search`, `query`, `get`, `delete`, `reset`.
- The base class can now be extended by any vector database implementation (e.g., Qdrant, Pinecone) to ensure a consistent API across different database systems.
2025-04-21 08:27:27 +03:00
ayan4m1
039dec6820 fix: pass header to Tika if PDF_EXTRACT_IMAGES is true 2025-04-20 17:36:40 +02:00
Athanasios Oikonomou
e000c56ef7 feat(vector-db): add support for Pinecone client
Adds Pinecone as a supported vector database option.

- Implements `PineconeClient` with support for common operations: `add`, `query`, `delete`, `reset`.
- Emulates namespace support using metadata filtering (`collection_name` prefix).
- Dynamically configures Pinecone via the following env vars:
  - `PINECONE_API_KEY`
  - `PINECONE_ENVIRONMENT`
  - `PINECONE_INDEX_NAME`
  - `PINECONE_DIMENSION`
  - `PINECONE_METRIC`
  - `PINECONE_CLOUD`
- Integrates cleanly with the vector DB abstraction layer.
- Includes markdown documentation under `docs/getting-started/env-configuration.md`.

BREAKING CHANGE: None
2025-04-20 11:08:51 +03:00
Tim Jaeryang Baek
87844a8042
Merge pull request #12822 from tth37/feat_external_search_loader
feat: Support for Self-Hosted/External Web Search/Loader Engines
2025-04-18 23:51:27 -07:00
Juan Calderon-Perez
6188c0c5b7 Add suport for Qdrant GRPC 2025-04-17 01:13:49 -04:00
Juan Calderon-Perez
b4d0d840d1
Fix formatting of qdrant.py 2025-04-15 08:56:51 -04:00
Athanasios Oikonomou
575c12f80c feat: add QDRANT_ON_DISK configuration option for Qdrant integration
This commit will allow configuring the on_disk client parameter, to reduce the memory usage.
https://qdrant.tech/documentation/concepts/storage/?q=mmap#configuring-memmap-storage
Default is false, keeping vectors in memory.
2025-04-15 01:40:57 +03:00
tth37
008fec80c1 fix: Update external search/loader method to POST 2025-04-14 18:17:27 +08:00
tth37
22f0365cef format 2025-04-14 02:05:58 +08:00
tth37
839ba22c90 feat: Backend for Self-Hosted/External Web Search/Loader Engines 2025-04-14 01:49:05 +08:00
Timothy Jaeryang Baek
91a455a284 chore: format 2025-04-12 16:35:11 -07:00
Timothy Jaeryang Baek
48a23ce3fe refac: web/rag config 2025-04-12 16:33:36 -07:00
Tim Jaeryang Baek
62ef0bad6f
Merge pull request #12680 from lucyknada/patch-1
fix #12678
2025-04-10 08:46:41 -07:00
Timothy Jaeryang Baek
63e5200e2f refac 2025-04-10 08:46:12 -07:00
Youggls
3e2a6df1fb feat: Add sougou web search API for backend, add config panel in for frontend. 2025-04-10 14:51:44 +08:00
lucy
bc295546cd
fix #12678 2025-04-10 07:23:34 +02:00
Tim Jaeryang Baek
2575dac4ed
Merge pull request #12604 from maurerle/ddg_improve_stacktrace
**fix** improve stack trace of duckduckgo exception
2025-04-08 13:03:57 -07:00
Robert Norberg
2337b36609
add debug logging to RAG utils 2025-04-08 12:08:32 -04:00
Florian Maurer
760ea3f4af
duckduckgo: backend api has been deprecated since december
also increase duckduckgo-search version

see 3ee8e08b1c
2025-04-08 14:02:06 +02:00
Florian Maurer
337c7caafa
improve stack trace of duckduckgo exception
* fix search_results out of scope
* ddgs.text does already always return a list
2025-04-08 13:52:23 +02:00
Timothy Jaeryang Baek
65ed76abe1 refac: embedding prefix 2025-04-06 17:17:24 -07:00
Timothy Jaeryang Baek
ef787e4a79
Merge pull request #12486 from FabioPolito24/text-file-handling-docling
fix: text file handling with docling
2025-04-05 09:55:51 -07:00
Fabio Polito
cd0a1b4852 fix: fix for text file handling with docling 2025-04-05 16:44:08 +00:00
Juan Calderon-Perez
324550423c
Fix formatting issues 2025-04-05 10:03:24 -04:00
Phlogi
8cf8121812
Update utils.py
Avoid running any tasks for collections that failed to fetch data (have assigned None)
2025-04-05 10:41:21 +02:00
Patrick Wachter
0ac00b9256
refactor: update import path for MistralLoader 2025-04-02 13:56:10 +02:00
Patrick Wachter
c5a8d2f857
refactor: update MistralLoader documentation and adjust parameters for signed URL retrieval 2025-04-01 20:14:34 +02:00
Patrick Wachter
93d7702e8c
refactor: move MistralLoader to a separate module and just use the requests package instead of mistralai 2025-04-01 20:14:34 +02:00
Patrick Wachter
1ac6879268
Add Mistral OCR integration and configuration support 2025-04-01 14:24:33 +02:00
Timothy Jaeryang Baek
391dd33da3 chore: format 2025-03-31 17:59:21 -07:00
Timothy Jaeryang Baek
3ba12e7a43
Merge pull request #12239 from Phlogi/dev-threads-on-hybrid
perf: parallelize hybrid search
2025-03-31 17:06:32 -07:00
Timothy Jaeryang Baek
cafc5413f5 refac 2025-03-31 14:13:27 -07:00
Phlogi
9c64310db5
Run hybrid_search in parallel 2025-03-31 16:43:37 +02:00
Timothy Jaeryang Baek
4b75966401 refac: embedding prefix var naming 2025-03-30 21:55:15 -07:00
Timothy Jaeryang Baek
433b5bddc1
Merge pull request #8594 from jayteaftw/main
feat: Support for instruct/prefixing embeddings
2025-03-30 21:54:44 -07:00
Timothy Jaeryang Baek
50b8dec3ac fix/refac: hybrid search 2025-03-30 20:48:22 -07:00
Timothy Jaeryang Baek
ce0d82b55f
Merge pull request #12132 from Phlogi/dev-fetch-documents-once
Avoid multiple data fetching
2025-03-30 20:44:32 -07:00
Junaid Pinjari
e782e7d3a7 Fix: CSV loader encoding issue using autodetect_encoding=True 2025-03-29 13:14:53 +05:30
Phlogi
04bf9ddab2
Avoid multiple data fetching 2025-03-27 19:05:20 +01:00
Timothy Jaeryang Baek
4a79320253 chore: format 2025-03-27 01:40:28 -07:00
Timothy Jaeryang Baek
7490bc9100
Merge branch 'dev' into fix-db-order 2025-03-26 20:55:42 -07:00
Timothy Jaeryang Baek
9d834a8e90
Merge branch 'dev' into k_reranker 2025-03-26 20:50:31 -07:00
Marko Henning
7531b7dcaa Satisfy github format check 2025-03-25 19:09:17 +01:00
Iván Baldo
115e46a6a2 Fix: Tika 3.1.0.0 sends a lot of blank lines which degrades the RAG results, strip them. 2025-03-25 14:53:14 -03:00
Marko Henning
94d9d3d590 Fix: Normalze all database distances to score in [0, 1] 2025-03-25 16:46:14 +01:00
Timothy Jaeryang Baek
38d524f6a0 chore: format 2025-03-24 11:35:32 -07:00
Jonathan Flower
bdd236fa3a improved error handling for deleting collections that do not exist in chromadb 2025-03-22 09:59:06 -04:00
Timothy Jaeryang Baek
8aa6dade41
Merge pull request #11876 from mahenning/fix--rag-sorting
Fix: wrong citation order for chromadb, wrong order for hybrid search
2025-03-20 17:54:22 -07:00
Timothy Jaeryang Baek
9b20ef4922 refac 2025-03-20 14:01:47 -07:00
genjuro
07098c6352 perf: set shorter timeout for playwright and make it configurable 2025-03-20 15:28:09 +08:00
Marko Henning
5f48af5b91 Revert the ordering change with chromadb, not necessary with reranker results 2025-03-19 17:04:45 +01:00
Marko Henning
ec8fc727b8 Fix wrong order for chromadb 2025-03-19 16:06:10 +01:00
leilibj
3e8546135d
fix: correct incorrect usage of log.exception method 2025-03-19 13:04:34 +08:00
Marko Henning
5ab789e83e Add documentation on chroma special case 2025-03-18 16:44:58 +01:00
Marko Henning
ba676b7ed6 Use k_reranker also for result merge, and add special sorting use case for ChromaDB 2025-03-18 16:25:24 +01:00
Marko Henning
f13948d805 Fixed typo 2025-03-18 12:14:59 +01:00
Marko Henning
c877b59cbc Address edge case with k < k_reranker, sort results for cutting off 2025-03-18 11:31:17 +01:00
orenzhang
c761e4fd08
feat(trace): opentelemetry instrument 2025-03-10 22:27:31 +08:00
Fabio Polito
9d6743824e fix: fix params DoclingLoader 2025-03-09 16:12:14 +00:00
Fabio Polito
0aa42615f9 Merge remote-tracking branch 'upstream/dev' into docling_context_extraction_engine
merge upstream
2025-03-08 18:52:51 +00:00
Timothy Jaeryang Baek
22b88f9593
Merge pull request #11324 from kela4/main
fix: opensearch vector db query structures, result mapping, filters, bulk query actions, knn_vector usage
2025-03-08 12:19:38 -04:00
Luke
7917128ed3 enh: enable configuration for tavily extract depth 2025-03-08 00:43:02 -05:00
Fabio Polito
e3eef58310 feat: merge with dev 2025-03-07 00:22:47 +00:00
Luke
987954c817 feat: Add Tavily extract web loader integration 2025-03-06 18:15:18 -05:00
Katharina
6cb0c0339a fix: opensearch vector db query structures, result mapping, filters, bulk query actions, knn_vector usage 2025-03-06 23:49:54 +01:00
Fabio Polito
98857184ff Merge remote-tracking branch 'upstream/dev' into docling_context_extraction_engine
merge with dev branch
2025-03-06 12:12:50 +00:00
Marko Henning
41a4cf7106 Added new k_reranker parameter 2025-03-06 10:47:57 +01:00
Timothy Jaeryang Baek
d4fca9dabf chore: format 2025-03-05 19:17:41 -08:00
Fabio Polito
0716f96da8 style: change style in DoclingLoader 2025-03-05 23:15:55 +00:00
Fabio Polito
9aa407dbd2 feat: merge with main 2025-03-05 22:04:34 +00:00
ofek
a8f205213c fixed es bugs 2025-03-05 23:19:56 +02:00
Fabio Polito
a44b35e99e fix: fix DoclingLoader input params 2025-03-05 17:53:45 +00:00
Timothy Jaeryang Baek
7b442e4be0
Merge pull request #11141 from Youggls/dev
fix: correct parameter name for MilvusClient instantiation
2025-03-04 00:54:49 -08:00
Timothy Jaeryang Baek
39ea59edc8 chore: format 2025-03-04 00:32:27 -08:00
Perry Li
67ed61d022
fixbug: correct parameter name for MilvusClient instantiation
Replace incorrect parameter 'database=MILVUS_DB' with valid 'db_name=MILVUS_DB'
2025-03-04 16:02:19 +08:00
ofek
737dfd2763 added elasticsearch support 2025-03-03 23:39:42 +02:00
Timothy Jaeryang Baek
6471f12668
Merge pull request #11033 from dtaivpp/main
fix: Changed to use collection_name and fixed bulk indexing missing index.
2025-03-01 16:00:13 -08:00
David Tippett
f3c4c2b8e3
Changed to use colleciton name and fixed bulk indexing missing index. 2025-03-01 13:26:19 -05:00
Timothy Jaeryang Baek
d0ddb0637e enh: web embed bypass embedding and retrieval support 2025-02-27 16:34:05 -08:00
Timothy Jaeryang Baek
1b56a8f3cb
Merge pull request #10864 from kurtdami/perplexity_integration
feat: add perplexity integration to web search
2025-02-27 13:51:03 -08:00
kurtdami
b061775932 feat: add perplexity integration to web search 2025-02-27 00:30:48 -08:00
Timothy Jaeryang Baek
ce7cf62a55 refac: dedup 2025-02-26 23:51:39 -08:00
Timothy Jaeryang Baek
ddb30589e3 chore: format
HIDE MODELS
2025-02-26 22:18:18 -08:00
Timothy Jaeryang Baek
57010901e6 enh: bypass embedding and retrieval 2025-02-26 15:42:19 -08:00
Timothy Jaeryang Baek
34aeaaf020 refac 2025-02-26 13:54:26 -08:00
Timothy Jaeryang Baek
46ac6f2b29 fix 2025-02-26 12:53:07 -08:00
Timothy Jaeryang Baek
33d3558ca9
Merge pull request #10817 from NovoNordisk-OpenSource/ivaroli/adding-json-as-supported-file-type
fix: Using the TextLoader instead of Tika for JSON files
2025-02-26 12:49:29 -08:00
Ívar Óli Sigurðsson
c5a09cdd21 adding a comma 2025-02-26 15:27:03 +01:00
Ívar Óli Sigurðsson
661711164a Adding json as a known source for Tika 2025-02-26 15:11:21 +01:00
Timothy Jaeryang Baek
3be5e3129b
Merge pull request #10752 from NovoNordisk-OpenSource/yvedeng/standardize-logging
refactor: replace print statements with logging
2025-02-25 10:53:02 -08:00
Yifang Deng
0e5d5ecb81
refactor: replace print statements with logging for better error tracking 2025-02-25 15:53:55 +01:00
Timothy Jaeryang Baek
ab1b910d80
Merge pull request #10486 from Micca/feature/document_intelligence_support
Feat: Adding Support for Azure AI Document Intelligence for Content Extraction (Revised)
2025-02-21 10:56:18 -08:00
Timothy Jaeryang Baek
93d486d50e revert: faulty dedup code 2025-02-20 11:02:45 -08:00
Timothy Jaeryang Baek
eeb00a5ca2 chore: format 2025-02-20 01:01:29 -08:00
Youggls
0fb3c08181 feat: Add Firecrawl web loader integration 2025-02-19 16:54:44 +08:00
Timothy Jaeryang Baek
c073b8b4ee refac 2025-02-18 23:49:27 -08:00
Timothy Jaeryang Baek
5465cabd40 refac 2025-02-18 21:17:09 -08:00
Timothy Jaeryang Baek
81715f6553 enh: RAG full context mode 2025-02-18 21:14:58 -08:00
Timothy Jaeryang Baek
1bbecd46c8
Merge pull request #10052 from roryeckel/playwright
Support Playwright RAG Web Loader: Revised
2025-02-18 19:57:48 -08:00
Timothy Jaeryang Baek
4ef7aff663 refac 2025-02-18 19:35:22 -08:00
mikhail-khludnev
925bfe840b dedupe results from multiple queries 2025-02-18 20:10:57 +03:00
Rory
10e0c81de9 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
2025-02-17 21:53:39 -06:00
Rory
bc82f48ebf refac: RAG_WEB_LOADER -> RAG_WEB_LOADER_ENGINE 2025-02-17 21:43:32 -06:00
Timothy Jaeryang Baek
ba6cde8a87 fix: include_domain does NOT exist 2025-02-17 19:20:49 -08:00
Timothy Jaeryang Baek
dbe5d1ca08 refac 2025-02-17 18:16:23 -08:00
Timothy Jaeryang Baek
ca0b7217d2 enh: full context web search 2025-02-17 18:14:26 -08:00
Rory
66c2acc08d Merge branch 'dev' into playwright 2025-02-15 22:14:16 -06:00
Timothy Jaeryang Baek
b0ad5cd863
Merge pull request #10076 from crizCraig/local_date
fix: return local date from `getFormattedDate`
2025-02-15 20:10:56 -08:00
Timothy Jaeryang Baek
3d0c06ccee refac: duckduckgo 2025-02-15 16:45:56 -08:00
Craig Quiter
e67eb89e05 style: black format 2025-02-15 10:53:16 -08:00
Rory
8e9b00a017 Fix docstring 2025-02-14 22:48:15 -06:00
Rory
aa2b764d74 Finalize incomplete merge to update playwright branch
Introduced feature parity for trust_env
2025-02-14 22:32:45 -06:00
Rory
4da220c513 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/config.py
#	backend/open_webui/main.py
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
#	backend/open_webui/utils/middleware.py
#	pyproject.toml
2025-02-14 20:48:22 -06:00
Guofeng Yi
b38acc8559
Merge branch 'dev' into feate-webloader-support-proxy 2025-02-15 09:50:02 +08:00
Timothy Jaeryang Baek
3e543691a4
Merge pull request #9988 from Yimi81/feat-support-async-load
feat: websearch support async docs load
2025-02-14 14:10:46 -08:00
LiuC0j
5ca39eb9fd
Update tavily.py 2025-02-14 14:56:01 +01:00
Fabio Polito
2419ef06a0 feat: docling support for document preprocessing 2025-02-14 12:08:03 +00:00
Yimi81
d3f71930f0 web loader support proxy 2025-02-14 07:15:09 +00:00
Yimi81
ceef600223 support async load for websearch 2025-02-14 07:05:10 +00:00
xring
27d395ba06 feat: add web search via SerpApi 2025-02-14 12:24:58 +08:00
Timothy Jaeryang Baek
5626426c31 chore: format 2025-02-12 23:28:57 -08:00
Rory
40d4db97e6 Merge remote-tracking branch 'upstream/dev' into playwright 2025-02-12 22:32:44 -06:00
Timothy Jaeryang Baek
a5bba20915
Merge pull request #9837 from silverriver/patch-1
feat Make Google PSE search return more than 10 google search results
2025-02-11 21:36:53 -08:00
Silver
7e08373ae5
Update google_pse.py to return results more than 10 2025-02-12 13:01:09 +08:00
Timothy Jaeryang Baek
8906a2e260
Merge pull request #9803 from BochaAI/main
add Bocha
2025-02-11 21:01:04 -08:00
luckyman-yan
31360fe991 add Bocha 2025-02-10 16:44:47 +08:00
Timothy Jaeryang Baek
60095598ec chore: format 2025-02-09 22:20:47 -08:00
Rory
2c711d8365 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/requirements.txt
2025-02-09 23:52:21 -06:00
Timothy Jaeryang Baek
d5a815b19c
Merge pull request #9693 from vinsdragonis/main
fix: Fixed error occurring when using OpenSearch as a vector db
2025-02-09 13:06:19 -08:00
Mazurek Michal
35f3824932 feat: Implement Document Intelligence as Content Extraction Engine 2025-02-07 13:44:47 +01:00
binxn
88db4ca7ba
Update jina_search.py
Updated Jina's search function in order to use POST and make use of the result count passed by the user

Note: Jina supports a max of 10 result count
2025-02-06 14:30:27 +01:00
Vineeth B V
7c78facfd9
Update opensearch.py 2025-02-06 13:36:11 +05:30
Vineeth B V
fd6b039859
Added a query method for OpenSearch vector db.
- This PR aims to address the error 400: "**'OpenSearchClient' object has no attribute 'query'**".
- With the implemented query() method, this issue should be resolved and allow uploaded documents to be vectorized and retrieved based on the given query.
2025-02-06 12:04:14 +05:30
Rory
ec6fe9939b Merge remote-tracking branch 'upstream/dev' into playwright 2025-02-05 17:47:58 -06:00
JT
40dea3fbe1
Merge branch 'dev' into main 2025-02-05 15:15:24 -08:00
jayteaftw
157c781b0a Merge branch 'main' of https://github.com/jayteaftw/open-webui 2025-02-05 14:07:59 -08:00
jayteaftw
6d2f87e904 Added server side Prefixing 2025-02-05 14:03:16 -08:00
Timothy Jaeryang Baek
e41a2682f5 chore: format 2025-02-05 00:07:45 -08:00
Timothy Jaeryang Baek
f6f8c08cb0
Merge pull request #9068 from df-cgdm/main
**feat** Add user related headers when calling an external embedding api
2025-02-05 00:05:44 -08:00
Timothy Jaeryang Baek
5cda8a57e7
Merge pull request #9337 from abdalrohman/exa_integration
feat: implement Exa search engine integration
2025-02-04 14:00:06 -08:00
JT
81102f4be2
Merge branch 'open-webui:main' into main 2025-02-04 13:06:04 -08:00
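
Note on commit 1e291aff25: the message describes `VectorDBBase` as an abstract base class requiring `has_collection`, `delete_collection`, `insert`, `upsert`, `search`, `query`, `get`, `delete`, and `reset`. The sketch below shows roughly what such a contract could look like. Only the class name and the method names come from the commit message. The parameter lists, return types, item layout (`id`, `vector`, `metadata`), and the `InMemoryVectorDB` class are assumptions made for illustration and are not taken from the repository.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]  # assumed layout: {"id": str, "vector": [float], "metadata": dict}


class VectorDBBase(ABC):
    """Method names follow the commit message; signatures are hypothetical."""

    @abstractmethod
    def has_collection(self, collection_name: str) -> bool: ...

    @abstractmethod
    def delete_collection(self, collection_name: str) -> None: ...

    @abstractmethod
    def insert(self, collection_name: str, items: List[Item]) -> None: ...

    @abstractmethod
    def upsert(self, collection_name: str, items: List[Item]) -> None: ...

    @abstractmethod
    def search(self, collection_name: str, vector: List[float], limit: int) -> List[Item]: ...

    @abstractmethod
    def query(self, collection_name: str, filter: Dict[str, Any],
              limit: Optional[int] = None) -> List[Item]: ...

    @abstractmethod
    def get(self, collection_name: str) -> List[Item]: ...

    @abstractmethod
    def delete(self, collection_name: str, ids: List[str]) -> None: ...

    @abstractmethod
    def reset(self) -> None: ...


class InMemoryVectorDB(VectorDBBase):
    """Toy backend (hypothetical) that keeps every collection in a dict."""

    def __init__(self) -> None:
        self._collections: Dict[str, Dict[str, Item]] = {}

    def has_collection(self, collection_name: str) -> bool:
        return collection_name in self._collections

    def delete_collection(self, collection_name: str) -> None:
        self._collections.pop(collection_name, None)

    def insert(self, collection_name: str, items: List[Item]) -> None:
        col = self._collections.setdefault(collection_name, {})
        for item in items:
            if item["id"] in col:
                raise ValueError(f"duplicate id: {item['id']}")
            col[item["id"]] = item

    def upsert(self, collection_name: str, items: List[Item]) -> None:
        col = self._collections.setdefault(collection_name, {})
        for item in items:
            col[item["id"]] = item  # overwrite existing ids

    def search(self, collection_name: str, vector: List[float], limit: int) -> List[Item]:
        col = self._collections.get(collection_name, {})
        # Rank by dot product, highest first.
        def score(item: Item) -> float:
            return sum(a * b for a, b in zip(item["vector"], vector))
        return sorted(col.values(), key=score, reverse=True)[:limit]

    def query(self, collection_name: str, filter: Dict[str, Any],
              limit: Optional[int] = None) -> List[Item]:
        col = self._collections.get(collection_name, {})
        # Keep items whose metadata matches every key/value in the filter.
        hits = [i for i in col.values()
                if all(i.get("metadata", {}).get(k) == v for k, v in filter.items())]
        return hits if limit is None else hits[:limit]

    def get(self, collection_name: str) -> List[Item]:
        return list(self._collections.get(collection_name, {}).values())

    def delete(self, collection_name: str, ids: List[str]) -> None:
        col = self._collections.get(collection_name, {})
        for item_id in ids:
            col.pop(item_id, None)

    def reset(self) -> None:
        self._collections.clear()
```

A backend that omits any of the nine methods cannot be instantiated, which is presumably how the "consistent API across different database systems" mentioned in the commit message is enforced.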