Commit Graph

207 Commits

Author SHA1 Message Date
Timothy J. Baek
c7a9b5ccfa refac: chat completion middleware 2024-07-01 19:33:58 -07:00
Timothy J. Baek
a392865615 refac 2024-07-01 17:11:09 -07:00
Timothy Jaeryang Baek
3c1ea24374
Merge pull request #3582 from nickovs/tika-document-text
feat: Support Tika for document text extraction
2024-07-01 17:07:40 -07:00
Nicko van Someren
7aa35a3757 Added HTML and TypeScript UI components to support configuration of text extraction engine.
Updated RAG /config and /config/update endpoints to support UI updates.

Fixed .dockerignore to prevent Python venv from being copied into Docker image.
2024-07-01 12:10:59 -06:00
Jun Siang Cheah
a48ac6a209 refac: lazily load sentence_transformers to reduce start up memory usage 2024-07-01 08:13:56 +08:00
Nicko van Someren
9cf622d981 Added support for using Apache Tika as a document loader.
Added persistent configuration options to configure use and location of Tika service.

Updated backend.apps.rag.main:get_loader() to make use of Tika document loader.
2024-06-30 15:49:15 -06:00
Timothy J. Baek
3f5f410453 refac 2024-06-27 11:29:59 -07:00
Timothy J. Baek
6ee94c5e97 chore: format 2024-06-22 16:15:19 -07:00
Timothy Jaeryang Baek
fd96c9c68d
Merge pull request #3380 from Yash-1511/main
feat: add jina_search as new websearch provider
2024-06-22 15:19:38 -07:00
Yash-1511
7c9fb9199e feat: add jina_search as new websearch provider 2024-06-22 20:06:15 +05:30
Que Nguyen
9e87012489
Fix: Rename 'whitelist' to 'filter_list' in function 2024-06-19 18:22:29 +07:00
Timothy J. Baek
20e4f6cc16 refac 2024-06-18 14:55:18 -07:00
Timothy J. Baek
83986620ee refac 2024-06-18 14:15:08 -07:00
Timothy J. Baek
9e7b7a895e refac: file upload 2024-06-18 13:50:18 -07:00
Timothy J. Baek
b1d83fc42c chore: format 2024-06-17 14:32:23 -07:00
Que Nguyen
c487385980
Set filter_list as optional param in serpstack.py 2024-06-17 14:38:11 +07:00
Que Nguyen
bcb84235b1
Set filter_list as optional param in serply.py 2024-06-17 14:37:52 +07:00
Que Nguyen
6b8290fa6d
Set filter_list as optional param in serper.py 2024-06-17 14:37:26 +07:00
Que Nguyen
9c446d9fb4
Set filter_list as optional param in searxng.py 2024-06-17 14:36:56 +07:00
Que Nguyen
3cc0e3ecb6
Refactor rag/main.py
Renamed function get_filtered_results
2024-06-17 14:36:26 +07:00
Que Nguyen
d8beed13b4
Set filter_list as optional param in google_pse.py 2024-06-17 14:35:27 +07:00
Que Nguyen
7d2ad8c4bf
Set filter_list as optional param in duckduckgo.py 2024-06-17 14:34:59 +07:00
Que Nguyen
a02139ba9d
Set filter_list as optional param in brave.py 2024-06-17 14:34:17 +07:00
Que Nguyen
a3ac9ee774
Refactor main.py
Rename RAG_WEB_SEARCH_WHITE_LIST_DOMAINS to RAG_WEB_SEARCH_DOMAIN_FILTER_LIST
2024-06-17 14:31:44 +07:00
Que Nguyen
a02ba52de8
Merge branch 'dev' into searxng 2024-06-15 23:44:31 +07:00
Yash-1511
b9da72560a feat: add tavily web search in web search provider 2024-06-14 20:44:11 +05:30
Que Nguyen
7b5f434a07
Implement domain whitelisting for web search results 2024-06-13 07:14:48 +07:00
Timothy J. Baek
c794d59fd5 revert: do not change the default 2024-06-12 11:47:19 -07:00
Timothy Jaeryang Baek
90dadf0bec
Merge pull request #3073 from que-nguyen/searxng
Set searxng language to auto and enable safesearch (moderate).
2024-06-12 11:26:10 -07:00
Timothy J. Baek
1163745a03 revert 2024-06-12 11:08:05 -07:00
Que Nguyen
305ec59d76
Set searxng language as 'auto' and enable safesearch (moderate).
Configure searxng with language param set to auto and add "safesearch": 1 (moderate) for safer web results.
2024-06-12 21:33:33 +07:00
Timothy J. Baek
c0ca447041 chore: format 2024-06-12 01:37:53 -07:00
Timothy Jaeryang Baek
5d3db15eca
Merge pull request #3049 from que-nguyen/dev
Refactor URL validation function
2024-06-12 01:36:34 -07:00
Timothy J. Baek
e8fc522eba chore: format 2024-06-12 00:18:22 -07:00
Que Nguyen
eb7bba81fe
Refactor URL validation function
- The check for private IP addresses often did not yield the expected results, especially with errors like: `[Errno -2] Name or service not known`.
- Removed the check for private IP addresses in the URL validation process.
- Simplified the `validate_url` function to focus on validating the URL format and checking the existence of the URL using a HEAD request.
2024-06-12 08:15:04 +07:00
Timothy Jaeryang Baek
d709038b5b
Merge pull request #3029 from Yash-1511/main
feat: add DuckDuckGo search functionality using duckduckgo_search library
2024-06-11 09:53:26 -07:00
Que Nguyen
3bec60b80c
Fixed the issue where a single URL error disrupts the data loading process in Web Search mode
To address the unresolved issue in the LangChain library where a single URL error disrupts the data loading process, the lazy_load method in the WebBaseLoader class has been modified. The enhanced method now handles exceptions appropriately, logging errors and continuing with the remaining URLs.
2024-06-11 22:06:14 +07:00
Yash-1511
83f9475584 feat: add DuckDuckGo search functionality using duckduckgo_search library 2024-06-11 19:49:08 +05:30
Timothy J. Baek
bd5a8567ef refac: tools & rag 2024-06-11 01:10:24 -07:00
Timothy J. Baek
644f0fe6c3 chore: version bump 2024-06-10 13:52:35 -07:00
teampen
14d33f0fcc Merge branch 'add-serply' into dev 2024-06-09 21:40:50 -04:00
teampen
efb4a710c8 adding Serply as an alternative web search 2024-06-09 20:44:34 -04:00
Timothy J. Baek
f2b9a5f5bf refac: rag 2024-06-09 03:01:25 -07:00
mindspawn
6f9148ac4c
Update main.py 2024-06-07 21:41:30 -07:00
mindspawn
4ecc1c06d3
Update main.py 2024-06-07 21:18:04 -07:00
Timothy J. Baek
0495f01acb feat: reset upload dir 2024-06-03 21:45:36 -07:00
Timothy J. Baek
61867c1545 Update searxng.py 2024-06-03 17:02:50 -07:00
Timothy J. Baek
4068a421bf fix 2024-06-03 17:00:35 -07:00
Timothy Jaeryang Baek
768941bded
Merge pull request #2785 from cheahjs/feat/openai-embeddings-batch
feat: add RAG_EMBEDDING_OPENAI_BATCH_SIZE to batch multiple embeddings
2024-06-03 13:50:14 -07:00
Jun Siang Cheah
7fefbb316d fix: add backwards compat with older searxng urls 2024-06-03 21:13:10 +01:00
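
Note on the filter_list commits above (7b5f434a07 through 9e87012489): they describe restricting web search results to a list of domains and making filter_list an optional parameter in each search provider. The following is a minimal sketch of that idea, not the project's actual code. The function name get_filtered_results is taken from commit 3cc0e3ecb6, but its signature, the "link" key on each result, and the subdomain-matching rule are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def get_filtered_results(
    results: list[dict], filter_list: Optional[list[str]] = None
) -> list[dict]:
    """Keep only results whose host matches a domain in filter_list.

    Hypothetical signature: each result is assumed to be a dict with a
    "link" key holding the result URL. An empty or missing filter_list
    means "no filtering", which is what makes the parameter optional.
    """
    if not filter_list:
        return results

    filtered = []
    for result in results:
        # hostname is None for malformed or relative URLs; treat as no match.
        host = urlparse(result.get("link", "")).hostname or ""
        # Assumed rule: match the domain itself or any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in filter_list):
            filtered.append(result)
    return filtered


if __name__ == "__main__":
    sample = [
        {"link": "https://docs.example.com/a"},
        {"link": "https://other.org/b"},
    ]
    print(get_filtered_results(sample, ["example.com"]))
```

Returning the input unchanged when no list is given lets every provider call the helper unconditionally, which is one plausible reason for making the parameter optional across brave.py, duckduckgo.py, google_pse.py, searxng.py, serper.py, serply.py, and serpstack.py.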