Commit Graph

171 Commits

Author SHA1 Message Date
Michael Poluektov
038fc48ac0 replace == None with is None 2024-08-14 13:39:53 +01:00
Alexandre GODARD
7a8f8960c5 Update main.py
Fix typo in update_reranking_model
2024-08-13 17:51:25 +02:00
José Luis Di Biase
23c9122458 chore RAG: adding languages known extension for erlang, elixir, haskell and jsx/tsx
Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>
2024-07-18 17:48:39 -03:00
Timothy J. Baek
dbc352f01b refac: documents file handling 2024-07-15 13:05:38 +02:00
Timothy Jaeryang Baek
7e6c5193d6 Merge pull request #3688 from leobenkel/no-trace-when-success
fix: Remove the tracestack when the collection already exists
2024-07-07 09:00:23 -07:00
Leo Benkel
a73a9c7310 Remove the tracestack when the collection already exists 2024-07-06 23:20:41 +02:00
Timothy J. Baek
a392865615 refac 2024-07-01 17:11:09 -07:00
Timothy Jaeryang Baek
3c1ea24374 Merge pull request #3582 from nickovs/tika-document-text
feat: Support Tika for document text extraction
2024-07-01 17:07:40 -07:00
Nicko van Someren
7aa35a3757 Added HTML and Typescript UI components to support configuration of text extraction engine.
Updated RAG /config and /config/update endpoints to support UI updates.

Fixed .dockerignore to prevent Python venv from being copied into Docker image.
2024-07-01 12:10:59 -06:00
Jun Siang Cheah
a48ac6a209 refac: lazily load sentence_transformers to reduce start up memory usage 2024-07-01 08:13:56 +08:00
Nicko van Someren
9cf622d981 Added support for using Apache Tika as a document loader.
Added persistent configuration options to configure use and location of Tika service.

Updated backend.apps.rag.main:get_loader() to make use of Tika document loader.
2024-06-30 15:49:15 -06:00
Timothy J. Baek
3f5f410453 refac 2024-06-27 11:29:59 -07:00
Timothy Jaeryang Baek
fd96c9c68d Merge pull request #3380 from Yash-1511/main
feat: add jina_search as new websearch provider
2024-06-22 15:19:38 -07:00
Yash-1511
7c9fb9199e feat: add jina_search as new websearch provider 2024-06-22 20:06:15 +05:30
Timothy J. Baek
83986620ee refac 2024-06-18 14:15:08 -07:00
Timothy J. Baek
9e7b7a895e refac: file upload 2024-06-18 13:50:18 -07:00
Timothy J. Baek
b1d83fc42c chore: format 2024-06-17 14:32:23 -07:00
Que Nguyen
a3ac9ee774 Refactor main.py
Rename RAG_WEB_SEARCH_WHITE_LIST_DOMAINS to RAG_WEB_SEARCH_DOMAIN_FILTER_LIST
2024-06-17 14:31:44 +07:00
Que Nguyen
a02ba52de8 Merge branch 'dev' into searxng 2024-06-15 23:44:31 +07:00
Yash-1511
b9da72560a feat: add tavily web search in web search provider 2024-06-14 20:44:11 +05:30
Que Nguyen
7b5f434a07 Implement domain whitelisting for web search results 2024-06-13 07:14:48 +07:00
Timothy J. Baek
1163745a03 revert 2024-06-12 11:08:05 -07:00
Timothy J. Baek
c0ca447041 chore: format 2024-06-12 01:37:53 -07:00
Timothy Jaeryang Baek
5d3db15eca Merge pull request #3049 from que-nguyen/dev
Refactor URL validation function
2024-06-12 01:36:34 -07:00
Timothy J. Baek
e8fc522eba chore: format 2024-06-12 00:18:22 -07:00
Que Nguyen
eb7bba81fe Refactor URL validation function
- The check for private IP addresses often did not yield the expected results, especially with errors like: `[Errno -2] Name or service not known`.
- Removed the check for private IP addresses in the URL validation process.
- Simplified the `validate_url` function to focus on validating the URL format and checking the existence of the URL using a HEAD request.
2024-06-12 08:15:04 +07:00
Timothy Jaeryang Baek
d709038b5b Merge pull request #3029 from Yash-1511/main
feat: add DuckDuckGo search functionality using duckduckgo_search library
2024-06-11 09:53:26 -07:00
Que Nguyen
3bec60b80c Fixed the issue where a single URL error disrupts the data loading process in Web Search mode
To address the unresolved issue in the LangChain library where a single URL error disrupts the data loading process, the lazy_load method in the WebBaseLoader class has been modified. The enhanced method now handles exceptions appropriately, logging errors and continuing with the remaining URLs.
2024-06-11 22:06:14 +07:00
Yash-1511
83f9475584 feat: add DuckDuckGo search functionality using duckduckgo_search library 2024-06-11 19:49:08 +05:30
teampen
14d33f0fcc Merge branch 'add-serply' into dev 2024-06-09 21:40:50 -04:00
teampen
efb4a710c8 adding Serply as an alternative web search 2024-06-09 20:44:34 -04:00
mindspawn
6f9148ac4c Update main.py 2024-06-07 21:41:30 -07:00
mindspawn
4ecc1c06d3 Update main.py 2024-06-07 21:18:04 -07:00
Timothy J. Baek
0495f01acb feat: reset upload dir 2024-06-03 21:45:36 -07:00
Jun Siang Cheah
0cb8163321 feat: add RAG_EMBEDDING_OPENAI_BATCH_SIZE to batch multiple embeddings 2024-06-02 15:34:31 +01:00
Timothy J. Baek
a53796270f refac: web search config 2024-06-01 20:08:08 -07:00
Timothy J. Baek
fbdfb7e4fa refac: web search 2024-06-01 19:57:00 -07:00
Timothy J. Baek
999d2bc21b refac: web search 2024-06-01 19:52:12 -07:00
Timothy J. Baek
912a704fdc refac: web search settings 2024-06-01 19:40:48 -07:00
Timothy J. Baek
ea6b8984ab refac: web search 2024-06-01 19:03:56 -07:00
Timothy J. Baek
74a8deb19f refac 2024-05-27 14:25:36 -07:00
Timothy J. Baek
4685f523b6 refac 2024-05-27 12:48:08 -07:00
Jun Siang Cheah
276b7b90b8 Merge remote-tracking branch 'upstream/dev' into feat/backend-web-search 2024-05-26 11:31:23 +01:00
Timothy J. Baek
84bfebd05e fix 2024-05-26 01:17:57 -07:00
Jun Siang Cheah
224a578e6b Merge remote-tracking branch 'upstream/dev' into feat/backend-web-search 2024-05-20 19:53:23 +01:00
Jun Siang Cheah
eb509c460a Merge remote-tracking branch 'origin/dev' into feat/backend-web-search 2024-05-20 18:01:29 +01:00
Timothy J. Baek
322db31dc9 fix: rag 2024-05-20 07:22:43 -07:00
Timothy J. Baek
5376525777 refac 2024-05-19 06:51:32 -07:00
Timothy J. Baek
400bfa5a02 fix: rag config.json 2024-05-17 19:53:38 -07:00
Jun Siang Cheah
5e1c408937 Merge branch 'dev' into feat/backend-web-search 2024-05-14 14:03:23 +08:00