Commit Graph

288 Commits

Author SHA1 Message Date
Derek Wischusen
42be1f956a Add Azure OpenAI embedding support 2025-05-19 22:58:04 -04:00
Marcelo Mendoza
d6ad96affb fix: use get method for title and snippet in search results 2025-05-19 17:24:47 +02:00
Timothy Jaeryang Baek
6692fb2181 chore: format 2025-05-17 01:00:37 +04:00
Kiet Trinh
418ac1a8da refac: Rename Qdrant multi-tenancy variable for improved clarity and consistency 2025-05-15 09:09:24 +00:00
Kiet Trinh
485bd7666c fix: Update Qdrant multi-tenancy variable name for consistency in configuration 2025-05-15 08:02:58 +00:00
LoiTra
184d8dfd7e
feat: Implement Qdrant multi-tenancy support with collection management and tenant isolation 2025-05-15 11:28:06 +07:00
Timothy Jaeryang Baek
b143c71da2 refac: AIOHTTP_CLIENT_SESSION_SSL 2025-05-14 23:33:52 +04:00
Timothy Jaeryang Baek
42382b5167 fix 2025-05-14 22:46:01 +04:00
Timothy Jaeryang Baek
8732b64b6b feat: external document loader support
2025-05-14 22:28:40 +04:00
Timothy Jaeryang Baek
de70d0cb64 feat: docling do picture description support 2025-05-14 21:26:49 +04:00
hwzhuhao
6f869ded43 feat: Add vector type and vector factory class for vector database integration 2025-05-14 21:30:50 +08:00
Timothy Jaeryang Baek
6b5f99bf66 fix: external reranker
2025-05-10 19:33:34 +04:00
Timothy Jaeryang Baek
c61790b355 chore: format 2025-05-10 19:00:01 +04:00
Timothy Jaeryang Baek
d5fd3b3600 feat: external reranker
Co-Authored-By: Brendan Campbell <20541191+bcambs09@users.noreply.github.com>
2025-05-10 18:25:20 +04:00
PVBLIC Foundation
3f58a17e47
Update pinecone.py
- Removed the unused Pinecone REST-client import; we now only import ServerlessSpec and the gRPC client.
- Enhanced close()
  - Call self.client.close() to explicitly shut down the underlying gRPC channel.
  - Log success or a warning on failure.
  - Still tear down the thread-pool executor afterward.
- Context-manager support
  - Added __enter__()/__exit__() so you can do:

with PineconeClient() as client:
    client.insert(...)
# automatically calls client.close()
2025-05-10 06:07:27 -07:00
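The context-manager support described in 3f58a17e47 could look roughly like the sketch below. `ClosingClient` is a hypothetical stand-in, not the project's `PineconeClient`; it only shows how `__enter__()`/`__exit__()` can delegate to `close()`.

```python
class ClosingClient:
    """Hypothetical stand-in for a client that owns a closable resource."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real client would shut down its channel and executor here.
        self.closed = True

    def __enter__(self) -> "ClosingClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # let any exception from the with-body propagate


with ClosingClient() as client:
    pass  # use the client here
```

Returning `False` from `__exit__()` means errors raised inside the `with` body are not hidden; the client is closed either way.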
PVBLIC Foundation
12c2138982
Update pinecone.py
Refactor and added debug
2025-05-09 18:15:22 -07:00
PVBLIC Foundation
b38711a581
Update pinecone.py 2025-05-08 16:02:47 -07:00
PVBLIC Foundation
04b9065f08
Update pinecone.py
Now supports batched insert, upsert, and delete operations using a default batch size of 100, reducing API strain and improving throughput. All blocking calls to the Pinecone API are wrapped in asyncio.to_thread(...), ensuring async safety and preventing event loop blocking. The implementation includes zero-vector handling for efficient metadata-only queries, normalized cosine distance scores for accurate ranking, and protections against empty input operations. Logs for batch durations have been streamlined to minimize noise, while preserving key info-level success logs.
2025-05-08 15:53:30 -07:00
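The batching and `asyncio.to_thread(...)` wrapping described in 04b9065f08 can be sketched as follows. This is a minimal illustration under assumptions: `chunked`, `upsert_batched`, and the `blocking_upsert` callable are hypothetical names, and only the batch size of 100 comes from the commit message.

```python
import asyncio
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 100  # default batch size named in the commit message


def chunked(items: Sequence, size: int = BATCH_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def upsert_batched(
    items: Sequence, blocking_upsert: Callable[[Sequence], None]
) -> int:
    """Send items in batches; each blocking call runs in a worker thread."""
    if not items:  # guard against empty input
        return 0
    batches = 0
    for batch in chunked(items):
        # to_thread keeps the event loop free while the blocking call runs.
        await asyncio.to_thread(blocking_upsert, batch)
        batches += 1
    return batches
```

With 250 items this makes three calls, of 100, 100, and 50 items.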
Matt Harrison
2df9f7fb4d fix: remove import for os module in milvus.py 2025-05-08 00:28:24 -04:00
Matt Harrison
731251d11a refac: streamline Milvus index type handling using configuration options 2025-05-07 23:39:56 -04:00
Matt Harrison
5e46c27806 refac: enhance MilvusClient with dynamic index type and improved logging 2025-05-07 21:51:28 -04:00
Timothy Jaeryang Baek
6359cb55fe chore: format 2025-05-07 02:01:03 +04:00
Tim Jaeryang Baek
ea07e242f5
Merge pull request #13528 from Classic298/dev
feat: Enhance YouTube Transcription Loader for multi-language support
2025-05-07 00:44:45 +04:00
Classic298
1dcbec71ec
Update youtube.py 2025-05-06 17:14:00 +02:00
Classic298
87dcbd198c
Update youtube.py 2025-05-06 17:11:03 +02:00
Classic298
d7927506f1
Update youtube.py 2025-05-06 17:06:21 +02:00
Classic298
f65dc715f9
Update youtube.py 2025-05-06 16:30:18 +02:00
Classic298
c69278c13c
Update youtube.py 2025-05-06 16:24:27 +02:00
Classic298
a129e0954e
Update youtube.py 2025-05-06 16:22:40 +02:00
Classic298
5e1cb76b93
Update youtube.py 2025-05-06 16:16:58 +02:00
Timothy Jaeryang Baek
e63b8b3879 refac
2025-05-06 00:46:32 +04:00
Timothy Jaeryang Baek
27da31dc83 fix: tikaloader extract images 2025-05-05 23:40:34 +04:00
Classic298
67a612fe24
Update youtube.py 2025-05-05 20:40:48 +02:00
Classic298
791dd24ace
Update youtube.py 2025-05-05 20:08:25 +02:00
Classic298
9cf3381381
Update youtube.py 2025-05-05 20:07:52 +02:00
Classic298
b0d74a59f1
Update youtube.py 2025-05-05 20:07:37 +02:00
Classic298
1a30b3746e
Update youtube.py 2025-05-05 20:03:00 +02:00
Classic298
0a3817ed86
Update youtube.py 2025-05-05 20:00:10 +02:00
Classic298
0a845db8ec
Update youtube.py 2025-05-05 19:57:21 +02:00
Classic298
7680ac2517
Update youtube.py 2025-05-05 19:57:06 +02:00
Timothy Jaeryang Baek
4cfb99248d chore: format 2025-05-03 23:48:24 +04:00
Athanasios Oikonomou
657162e96d feat(ocr): add support for Docling OCR engine and language configuration
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.

Fixes #13133
2025-05-03 00:32:06 +03:00
Tim Jaeryang Baek
7d184c3a14
Merge pull request #13085 from ayan4m1/fix/tika-image-ocr
fix: pass extractInlineImages header to Tika if PDF_EXTRACT_IMAGES is true
2025-05-02 03:47:51 -07:00
Tim Jaeryang Baek
61580e9490
Merge pull request #13404 from NoMoreFood/dev
fix: Use SHA256 For Query Result Computation
2025-05-01 04:55:16 -07:00
Bryan Berns
32257089f9 Use SHA256 For Query Result Computation 2025-05-01 03:56:20 -04:00
Alexander Grimm
da9966aca1 ~ truncate vectors for pgvector if too big 2025-04-30 05:35:17 +00:00
Tim Jaeryang Baek
4ee5dd58b7
Merge pull request #13177 from tth37/fix_firecrawl_loader_default_mode
fix: FireCrawlLoader default mode to scrape
2025-04-29 08:39:06 -07:00
Tim Jaeryang Baek
e87f2669fa
Merge pull request #13191 from tth37/feat_firecrawl_search_engine
feat: Add Firecrawl search engine
2025-04-29 08:38:28 -07:00
Tim Jaeryang Baek
7b863465a9
Merge pull request #13311 from stephen304/yacy-support
feat: Yacy search support
2025-04-29 08:35:10 -07:00
Stephen Smith
ea16426a8d Remove unused kwargs in yacy, update comments. 2025-04-27 00:41:46 -04:00
Stephen Smith
f9b9217e98 Set Yacy search to text 2025-04-26 23:13:31 -04:00
Stephen Smith
e6d43d70f3 Don't request nav and pass count to Yacy 2025-04-26 23:08:16 -04:00
Stephen Smith
240d91d38d Add yacy config for user/pass, automatically add yacy json api path 2025-04-26 22:28:30 -04:00
Stephen Smith
0f73b96616 first pass at yacy support copied from searxng 2025-04-26 14:07:13 -04:00
tth37
92dbeb1939 feat: Add Firecrawl search engine 2025-04-24 14:57:28 +08:00
tth37
8f7195ceda fix: FireCrawlLoader default mode to scrape 2025-04-24 01:17:35 +08:00
Tim Jaeryang Baek
91e758f3ec
Merge pull request #13165 from feddersen-group/perf/parallel_knowledge_search
perf: all knowledge searches in parallel in non-hybrid mode
2025-04-23 10:01:06 -07:00
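The log does not say how the knowledge searches are parallelized. One common option is a thread pool, sketched here under that assumption; `search_all` and `search_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def search_all(
    collection_names: list[str], search_one: Callable[[str], list]
) -> dict[str, list]:
    """Run one search per collection concurrently and key results by name."""
    if not collection_names:  # ThreadPoolExecutor needs at least one worker
        return {}
    with ThreadPoolExecutor(max_workers=len(collection_names)) as pool:
        # map() returns results in the same order as the input names.
        results = pool.map(search_one, collection_names)
        return dict(zip(collection_names, results))
```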
Timothy Jaeryang Baek
09874ab83d fix: FireCrawlLoader 2025-04-24 01:40:34 +09:00
Alexander Grimm
d182155fac ~ call knowledge searches in parallel in non-hybrid mode 2025-04-23 09:20:51 +00:00
Tim Jaeryang Baek
faa3cac0e4
Merge pull request #13107 from tth37/fix_tavily_max_results
fix: `max_results` in Tavily search handler
2025-04-22 23:47:36 -07:00
tth37
bc315bd530 fix: max_results in Tavily search api 2025-04-21 20:59:47 +08:00
Athanasios Oikonomou
1e291aff25 feat: Add abstract base class for vector database integration
- Created `VectorDBBase` as an abstract base class to standardize vector database operations.
- Added required methods for common vector database operations: `has_collection`, `delete_collection`, `insert`, `upsert`, `search`, `query`, `get`, `delete`, `reset`.
- The base class can now be extended by any vector database implementation (e.g., Qdrant, Pinecone) to ensure a consistent API across different database systems.
2025-04-21 08:27:27 +03:00
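An abstract base class with the nine methods listed in 1e291aff25 might be shaped like the sketch below. The class and method names come from the commit message; every parameter and return type is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class VectorDBBase(ABC):
    """Method names from the commit message; all signatures are assumed."""

    @abstractmethod
    def has_collection(self, collection_name: str) -> bool: ...

    @abstractmethod
    def delete_collection(self, collection_name: str) -> None: ...

    @abstractmethod
    def insert(self, collection_name: str, items: list[dict[str, Any]]) -> None: ...

    @abstractmethod
    def upsert(self, collection_name: str, items: list[dict[str, Any]]) -> None: ...

    @abstractmethod
    def search(self, collection_name: str, vectors: list[list[float]], limit: int) -> Any: ...

    @abstractmethod
    def query(self, collection_name: str, filter: dict[str, Any], limit: Optional[int] = None) -> Any: ...

    @abstractmethod
    def get(self, collection_name: str) -> Any: ...

    @abstractmethod
    def delete(self, collection_name: str, ids: Optional[list[str]] = None) -> None: ...

    @abstractmethod
    def reset(self) -> None: ...
```

Because every method is abstract, `VectorDBBase()` raises `TypeError`, and so does any subclass that leaves one of the methods unimplemented.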
ayan4m1
039dec6820 fix: pass header to Tika if PDF_EXTRACT_IMAGES is true 2025-04-20 17:36:40 +02:00
Athanasios Oikonomou
e000c56ef7 feat(vector-db): add support for Pinecone client
Adds Pinecone as a supported vector database option.

- Implements `PineconeClient` with support for common operations: `add`, `query`, `delete`, `reset`.
- Emulates namespace support using metadata filtering (`collection_name` prefix).
- Dynamically configures Pinecone via the following env vars:
  - `PINECONE_API_KEY`
  - `PINECONE_ENVIRONMENT`
  - `PINECONE_INDEX_NAME`
  - `PINECONE_DIMENSION`
  - `PINECONE_METRIC`
  - `PINECONE_CLOUD`
- Integrates cleanly with the vector DB abstraction layer.
- Includes markdown documentation under `docs/getting-started/env-configuration.md`.

BREAKING CHANGE: None
2025-04-20 11:08:51 +03:00
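Reading the six environment variables listed in e000c56ef7 could be sketched as follows. `PineconeSettings` and `load_pinecone_settings` are hypothetical names, and treating every variable as required is an assumption; the commit message does not state any defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PineconeSettings:
    api_key: str
    environment: str
    index_name: str
    dimension: int
    metric: str
    cloud: str


def load_pinecone_settings(env: Mapping[str, str] = os.environ) -> PineconeSettings:
    """Read the six variables; a missing one raises KeyError (assumed behavior)."""
    return PineconeSettings(
        api_key=env["PINECONE_API_KEY"],
        environment=env["PINECONE_ENVIRONMENT"],
        index_name=env["PINECONE_INDEX_NAME"],
        dimension=int(env["PINECONE_DIMENSION"]),  # text in the environment, int in use
        metric=env["PINECONE_METRIC"],
        cloud=env["PINECONE_CLOUD"],
    )
```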
Tim Jaeryang Baek
87844a8042
Merge pull request #12822 from tth37/feat_external_search_loader
feat: Support for Self-Hosted/External Web Search/Loader Engines
2025-04-18 23:51:27 -07:00
Juan Calderon-Perez
6188c0c5b7 Add support for Qdrant GRPC 2025-04-17 01:13:49 -04:00
Juan Calderon-Perez
b4d0d840d1
Fix formatting of qdrant.py 2025-04-15 08:56:51 -04:00
Athanasios Oikonomou
575c12f80c feat: add QDRANT_ON_DISK configuration option for Qdrant integration
This commit will allow configuring the on_disk client parameter, to reduce the memory usage.
https://qdrant.tech/documentation/concepts/storage/?q=mmap#configuring-memmap-storage
Default is false, keeping vectors in memory.
2025-04-15 01:40:57 +03:00
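A boolean option such as `QDRANT_ON_DISK` is usually parsed from text. The sketch below keeps the default of false stated in 575c12f80c; the helper name `qdrant_on_disk` and the rule that only the string "true" enables the option are assumptions.

```python
import os
from typing import Mapping


def qdrant_on_disk(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when QDRANT_ON_DISK is "true" (case-insensitive)."""
    return env.get("QDRANT_ON_DISK", "false").strip().lower() == "true"
```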
tth37
008fec80c1 fix: Update external search/loader method to POST 2025-04-14 18:17:27 +08:00
tth37
22f0365cef format 2025-04-14 02:05:58 +08:00
tth37
839ba22c90 feat: Backend for Self-Hosted/External Web Search/Loader Engines 2025-04-14 01:49:05 +08:00
Timothy Jaeryang Baek
91a455a284 chore: format 2025-04-12 16:35:11 -07:00
Timothy Jaeryang Baek
48a23ce3fe refac: web/rag config 2025-04-12 16:33:36 -07:00
Tim Jaeryang Baek
62ef0bad6f
Merge pull request #12680 from lucyknada/patch-1
fix #12678
2025-04-10 08:46:41 -07:00
Timothy Jaeryang Baek
63e5200e2f refac 2025-04-10 08:46:12 -07:00
Youggls
3e2a6df1fb feat: Add sougou web search API for backend, add config panel for frontend. 2025-04-10 14:51:44 +08:00
lucy
bc295546cd
fix #12678 2025-04-10 07:23:34 +02:00
Tim Jaeryang Baek
2575dac4ed
Merge pull request #12604 from maurerle/ddg_improve_stacktrace
**fix** improve stack trace of duckduckgo exception
2025-04-08 13:03:57 -07:00
Robert Norberg
2337b36609
add debug logging to RAG utils 2025-04-08 12:08:32 -04:00
Florian Maurer
760ea3f4af
duckduckgo: backend api has been deprecated since december
also increase duckduckgo-search version

see 3ee8e08b1c
2025-04-08 14:02:06 +02:00
Florian Maurer
337c7caafa
improve stack trace of duckduckgo exception
* fix search_results out of scope
* ddgs.text does already always return a list
2025-04-08 13:52:23 +02:00
Timothy Jaeryang Baek
65ed76abe1 refac: embedding prefix 2025-04-06 17:17:24 -07:00
Timothy Jaeryang Baek
ef787e4a79
Merge pull request #12486 from FabioPolito24/text-file-handling-docling
fix: text file handling with docling
2025-04-05 09:55:51 -07:00
Fabio Polito
cd0a1b4852 fix: fix for text file handling with docling 2025-04-05 16:44:08 +00:00
Juan Calderon-Perez
324550423c
Fix formatting issues 2025-04-05 10:03:24 -04:00
Phlogi
8cf8121812
Update utils.py
Avoid running any tasks for collections that failed to fetch data (have assigned None)
2025-04-05 10:41:21 +02:00
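The guard described in 8cf8121812 amounts to filtering out collections whose fetch produced `None` before scheduling any work. A minimal sketch, assuming the fetched data is held in a dict keyed by collection name; `runnable_collections` is a hypothetical name.

```python
from typing import Optional


def runnable_collections(fetched: dict[str, Optional[dict]]) -> list[str]:
    """Return the names of collections whose data was fetched successfully."""
    # Only None marks a failed fetch; an empty dict is still a valid result.
    return [name for name, data in fetched.items() if data is not None]
```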
Patrick Wachter
0ac00b9256
refactor: update import path for MistralLoader 2025-04-02 13:56:10 +02:00
Patrick Wachter
c5a8d2f857
refactor: update MistralLoader documentation and adjust parameters for signed URL retrieval 2025-04-01 20:14:34 +02:00
Patrick Wachter
93d7702e8c
refactor: move MistralLoader to a separate module and just use the requests package instead of mistralai 2025-04-01 20:14:34 +02:00
Patrick Wachter
1ac6879268
Add Mistral OCR integration and configuration support 2025-04-01 14:24:33 +02:00
Timothy Jaeryang Baek
391dd33da3 chore: format 2025-03-31 17:59:21 -07:00
Timothy Jaeryang Baek
3ba12e7a43
Merge pull request #12239 from Phlogi/dev-threads-on-hybrid
perf: parallelize hybrid search
2025-03-31 17:06:32 -07:00
Timothy Jaeryang Baek
cafc5413f5 refac 2025-03-31 14:13:27 -07:00
Phlogi
9c64310db5
Run hybrid_search in parallel 2025-03-31 16:43:37 +02:00
Timothy Jaeryang Baek
4b75966401 refac: embedding prefix var naming 2025-03-30 21:55:15 -07:00
Timothy Jaeryang Baek
433b5bddc1
Merge pull request #8594 from jayteaftw/main
feat: Support for instruct/prefixing embeddings
2025-03-30 21:54:44 -07:00
Timothy Jaeryang Baek
50b8dec3ac fix/refac: hybrid search 2025-03-30 20:48:22 -07:00
Timothy Jaeryang Baek
ce0d82b55f
Merge pull request #12132 from Phlogi/dev-fetch-documents-once
Avoid multiple data fetching
2025-03-30 20:44:32 -07:00
Junaid Pinjari
e782e7d3a7 Fix: CSV loader encoding issue using autodetect_encoding=True 2025-03-29 13:14:53 +05:30
Phlogi
04bf9ddab2
Avoid multiple data fetching 2025-03-27 19:05:20 +01:00