Commit Graph

230 Commits

Author SHA1 Message Date
Aarni Koskela
61bb1f1dc8 fix: do not use hardware ID in document ID generation 2024-05-07 11:42:05 +03:00
Timothy Jaeryang Baek
635951b55c Merge branch 'dev' into feat/backend-web-search 2024-05-06 16:26:44 -07:00
Timothy J. Baek
64ed0d1089 refac: include source name to citation 2024-05-06 16:16:26 -07:00
Timothy J. Baek
4c490132ba refac: styling 2024-05-06 16:16:26 -07:00
Jun Siang Cheah
0872bea790 feat: show RAG query results as citations 2024-05-06 16:14:10 -07:00
Timothy J. Baek
cecb87b8c2 feat: web_loader_ssl_verification setting 2024-05-06 14:50:55 -07:00
Timothy J. Baek
95f579cabe feat: rag ssl verification env var
Co-Authored-By: Tobias Steidle <tobias.steidle@softwaredev.de>
2024-05-06 13:12:08 -07:00
Jun Siang Cheah
8b3e370a6e fix: run formatter 2024-05-06 17:11:04 +08:00
Jun Siang Cheah
83f086ccdd fix: do not return raw search exception due to API keys in URLs 2024-05-06 17:09:04 +08:00
Jun Siang Cheah
99e4edd364 feat: add websearch endpoint to RAG API
fix: google PSE endpoint uses GET

fix: google PSE returns link, not url

fix: serper wrong field
2024-05-06 17:09:04 +08:00
Jun Siang Cheah
501ff7a98b feat: backend implementation of various search APIs 2024-05-06 12:28:41 +08:00
tabacoWang
fffd283b0c fix:
fix: Change the type from int to float
2024-05-02 13:45:19 +08:00
Timothy J. Baek
0595c04909 feat: youtube rag 2024-05-01 17:17:00 -07:00
Yanyutin753
c0bb32d768 📌 fixed a bug where RAG would not reply after not reading the file correctly 2024-04-30 13:51:30 +08:00
Timothy Jaeryang Baek
1afc49c1e4 Merge pull request #1862 from cheahjs/feat/filter-local-rag-fetch
feat: add ENABLE_LOCAL_WEB_FETCH to protect against SSRF attacks
2024-04-29 15:51:17 -07:00
Jun Siang Cheah
1c4e63f71e feat: add ENABLE_LOCAL_WEB_FETCH to protect against SSRF attacks 2024-04-29 20:55:17 +01:00
Steven Kreitzer
5b8fd14470 fix: various api rag results 2024-04-29 12:17:36 -05:00
Yanyutin753
b0245a7eff feat added environment variables and sync.yml 2024-04-28 06:54:26 +08:00
Timothy J. Baek
ce9a5d12e0 refac: rag pipeline 2024-04-27 15:38:50 -04:00
Timothy J. Baek
8f1563a7a5 fix: typo 2024-04-27 15:03:49 -04:00
Timothy J. Baek
9be56d68e0 refac: naming convention 2024-04-27 15:02:57 -04:00
Timothy J. Baek
cebf733b9d refac: naming convention 2024-04-26 14:41:39 -04:00
Steven Kreitzer
69822e4c25 fix: sort ranking hybrid 2024-04-26 07:56:41 -05:00
Steven Kreitzer
9755cd5baa feat: toggle hybrid search 2024-04-25 17:51:38 -05:00
Timothy J. Baek
984dbf13ab revert: original rag pipeline 2024-04-25 17:03:00 -04:00
Steven Kreitzer
1c1d2c254d fix: query collection api call 2024-04-25 13:38:18 -05:00
Steven Kreitzer
72090fab88 chore: update log line 2024-04-25 13:28:31 -05:00
Steven Kreitzer
c9c9660459 fix: address comment in pr #1687 2024-04-25 07:50:42 -05:00
Steven Kreitzer
c0259aad67 feat: hybrid search and reranking support 2024-04-24 07:55:10 -05:00
Steven Kreitzer
4e0b32b505 feat: hybrid search 2024-04-22 18:33:43 -05:00
Steven Kreitzer
f3e5700d49 feat: move to native sentence_transformer 2024-04-22 14:20:41 -05:00
Timothy J. Baek
713934edb6 refac 2024-04-20 15:21:52 -05:00
Timothy J. Baek
710850e442 refac: audio 2024-04-20 15:15:59 -05:00
Timothy J. Baek
741ed5dc4c fix 2024-04-14 19:56:33 -04:00
Timothy J. Baek
b1b72441bb feat: openai embeddings integration 2024-04-14 19:48:15 -04:00
Timothy J. Baek
b48e73fa43 feat: openai embeddings support 2024-04-14 19:15:39 -04:00
Timothy J. Baek
36ce157907 fix: integration 2024-04-14 18:47:45 -04:00
Timothy J. Baek
9cdb5bf9fe feat: frontend integration 2024-04-14 18:31:40 -04:00
Timothy J. Baek
2952e61167 feat: external embeddings support 2024-04-14 17:55:00 -04:00
Timothy Jaeryang Baek
b9cadff16b Merge pull request #1419 from lainedfles/embedding-model-fix-and-manual-update
feat: improve embedding model update & resolve network dependency
2024-04-10 01:10:07 -07:00
Timothy J. Baek
582d11f191 refac: RAG_EMBEDDING_MODEL_PATH removed 2024-04-10 00:59:05 -07:00
Timothy J. Baek
cb2158a794 fix 2024-04-10 00:51:16 -07:00
Timothy J. Baek
abfcceecef refac 2024-04-10 00:46:09 -07:00
Timothy J. Baek
f4b87ecb23 refac 2024-04-10 00:33:45 -07:00
Steven Kreitzer
0bae789d39 fix: support batching chromadb 2024-04-09 10:13:29 -05:00
lainedfles
506a061387 Merge branch 'dev' into embedding-model-fix-and-manual-update 2024-04-08 14:57:54 -06:00
Jannik S
3b3d0cce1e Merge branch 'dev' into dockerfile-optimisation 2024-04-08 09:15:00 +02:00
Timothy J. Baek
e61e1b079f fix: file upload issue 2024-04-04 17:38:59 -07:00
Self Denial
9f82f5abba Formatting... 2024-04-04 12:09:48 -06:00
Self Denial
075fbedb02 More format fixes 2024-04-04 12:07:42 -06:00
Self Denial
bcf79c8366 Format fixes 2024-04-04 12:02:48 -06:00
Self Denial
3b66aa55c0 Improve embedding model update & resolve network dependency
* Add config variable RAG_EMBEDDING_MODEL_AUTO_UPDATE to control update behavior
* Add RAG utils embedding_model_get_path() function to output the filesystem path in addition to update of the model using huggingface_hub
* Update and utilize existing RAG functions in main: get_embedding_model() & update_embedding_model()
* Add GUI setting to execute manual update process
2024-04-04 11:01:23 -06:00
Mmx233
947c392f72 fix: manually check the docs' filename 2024-04-03 23:37:13 +08:00
Jannik Streidl
9bcb37ea10 fixes and updates 2024-04-02 14:47:52 +02:00
Jannik S
099b1d066b Revert "Merge Updates & Dockerfile improvements" (#3)
This reverts commit 9763d885be.
2024-04-02 11:28:04 +02:00
lainedfles
9763d885be Merge Updates & Dockerfile improvements 2024-04-02 11:25:20 +02:00
Timothy J. Baek
5558514ff1 fix 2024-04-01 15:23:12 -07:00
KoreLogic Disclosures
6c96361402 Suggested mitigation for KL-CAN-2024-002. 2024-04-01 15:55:14 -05:00
Timothy J. Baek
a6c154d839 feat: rag context logging 2024-03-31 14:02:31 -07:00
Self Denial
144c9059a3 Improve logging. Move print() statements to appropriate log().
Add COMFYUI and WEBHOOK logging and associated environment variable
control. Add WEBHOOK payload & request debug logs.
2024-03-31 13:17:29 -06:00
Timothy J. Baek
3688955c77 fix: encoding issue 2024-03-25 23:50:52 -07:00
Timothy J. Baek
6307adfba1 feat: better error handling 2024-03-25 23:47:08 -07:00
Doug Danat
c91a5d8b1f switch to using BeautifulSoup HTML loader so title is also captured 2024-03-25 11:26:18 +01:00
Doug Danat
784a6ec85e include html langchain loader for RAG 2024-03-25 09:50:53 +01:00
Timothy Jaeryang Baek
371dfc1143 Merge branch 'dev' into debug_print 2024-03-24 18:04:03 -05:00
Timothy J. Baek
ff8a55a861 refac: rag api 2024-03-24 00:41:41 -07:00
Timothy J. Baek
7e0ea8f77d feat: RAG text ingestion(store) api 2024-03-24 00:40:27 -07:00
Jannik Streidl
fdef2abdfb cuda fix 2024-03-22 12:48:48 +01:00
Self Denial
e6dd0bfbe0 Migrate to python logging module with env var control. 2024-03-20 17:11:36 -06:00
Jannik Streidl
1f6739337b docker improvements & changed universal device type env for different models used 2024-03-20 08:44:09 +01:00
Timothy J. Baek
91efd6cb63 fix: file upload encoding issue 2024-03-15 23:52:37 -07:00
Timothy J. Baek
072b499a50 fix: backslash rag content issue 2024-03-15 13:34:52 -07:00
Timothy J. Baek
8df6b137cb fix: rag 2024-03-10 18:40:50 -07:00
Timothy J. Baek
98948814fd feat: toggle pdf ocr 2024-03-10 13:32:34 -07:00
Timothy J. Baek
c49491e516 refac: rag to backend 2024-03-08 22:34:47 -08:00
Timothy J. Baek
7e5e2c42c9 refac: rag routes 2024-03-08 19:26:39 -08:00
Timothy J. Baek
b88c64f80e fix: ocr issue 2024-03-06 17:54:42 -08:00
Timothy J. Baek
bb98c10abb revert: ocr feature 2024-03-06 17:04:40 -08:00
Timothy Jaeryang Baek
8fb5f54751 Merge pull request #1050 from jannikstdl/rag-pdf-ocr
feat: added ocr functionality to the pdf loader
2024-03-06 00:45:33 -05:00
Jannik Streidl
089a63e0c6 feat: added ocr functionality to the pdf loader 2024-03-05 22:25:25 +01:00
Firat Birlik
6782e95c75 recreate rag collection is now optional and only used for web requests 2024-03-04 10:00:06 -06:00
Firat Birlik
5d4ff85228 recreate rag collection instead of falling back to stale version 2024-03-03 21:25:00 -06:00
Timothy J. Baek
47a05a47b4 feat: add rag top k value setting 2024-03-02 18:56:57 -08:00
Ased Mammad
b473ad574f fix: RAG scan unsupported mimetype
This fixes an issue with RAG that stops loading documents as soon
as it reaches a file with unsupported mimetype.
2024-02-23 14:27:31 +03:30
Timothy J. Baek
7c127c35fc feat: dynamic embedding model load 2024-02-19 11:05:45 -08:00
Jannik Streidl
acf999013b storing vectordb in project cache folder + device types 2024-02-19 07:51:17 +01:00
Timothy J. Baek
0cb0358485 refac: more descriptive var names 2024-02-18 11:16:10 -08:00
Jannik S
4b88e7e44f Merge branch 'main' into choose-embedding-model 2024-02-18 09:20:54 +01:00
Jannik Streidl
bc3dd34d8b collection query fix 2024-02-18 09:17:43 +01:00
Timothy J. Baek
07b451995e feat: reset rag template 2024-02-17 22:49:18 -08:00
Timothy J. Baek
5270efa9e5 feat: editable rag template 2024-02-17 22:41:03 -08:00
Timothy J. Baek
ccf08fb91e feat: editable chunk params 2024-02-17 22:29:52 -08:00
Timothy J. Baek
a94e4161f7 fix: file content type issue 2024-02-17 21:31:46 -08:00
Timothy J. Baek
e07001e5f6 feat: rag folder scan support 2024-02-17 21:06:08 -08:00
Jannik Streidl
1846c1e80d choose embedding model when using docker 2024-02-17 19:38:29 +01:00
Tim Farrell
08e8e922fd Endpoint role-checking was redundantly applied but FastAPI provides a nice abstraction mechanic...so I applied it. There should be no logical changes in this code; only simpler, cleaner ways for doing the same thing. 2024-02-08 18:05:01 -06:00
Timothy J. Baek
683650ec00 feat: collection rag integration 2024-02-03 15:57:06 -08:00
Timothy J. Baek
00803c92f2 feat: doc tagging 2024-02-03 14:44:49 -08:00
Timothy J. Baek
50f7b20ac2 refac 2024-02-01 13:35:41 -08:00
Timothy J. Baek
28226a6f97 feat: web rag support 2024-01-26 22:17:28 -08:00
Timothy J. Baek
4e468dc58c refac 2024-01-25 00:24:49 -08:00
Timothy Jaeryang Baek
fa5918ad13 Merge branch 'main' into main 2024-01-25 00:13:12 -08:00
Marclass
8bfda730d9 add excel document support 2024-01-23 14:03:22 -07:00
Timothy Jaeryang Baek
ca943d0795 Merge pull request #549 from Marclass/main
Bugfix: Fix toast error popup when front end can't figure out file type.
2024-01-22 23:13:53 -08:00
Timothy Jaeryang Baek
7054f02891 Merge pull request #466 from baumandm/feat/epub-support
feat: Add epub support
2024-01-22 23:12:46 -08:00
Marclass
7eea3ef313 copy list of file ext from backend to front end 2024-01-23 00:00:07 -07:00
Marclass
35ace57784 add rst document for RAG 2024-01-19 10:48:04 -07:00
Dave Bauman
f559068186 feat: Add epub support 2024-01-19 12:23:59 -05:00
Marclass
aa1d386042 Allow any file to be used for RAG.
Changed RAG parser to prefer file extensions over MIME content types. If the type of file is not recognized assume it's a text file.
2024-01-18 20:41:14 -07:00
Marclass
6070e6bcd1 add svelte type to RAG 2024-01-17 20:10:34 -07:00
Marclass
cf6b3fa48a remove html type and add js/css 2024-01-17 00:34:22 -07:00
Marclass
43d8466677 feat: Add RAG support for various programming languages
Enables RAG for golang, python, java, sh, bat, powershell, cmd, js, css, c/c++/c#, sql, logs, ini, perl, r, dart, docker, env, php, haskell, lua, conf, plsql, ruby, db2, scala, bash, swift, vue, html, xml, and other arbitrary text files.
2024-01-17 00:09:47 -07:00
Timothy J. Baek
c1ec604f21 feat: rag md support 2024-01-09 15:24:53 -08:00
Timothy J. Baek
54c4e0761a feat: documents file upload 2024-01-08 01:26:15 -08:00
Timothy J. Baek
57c050326c feat: docx support 2024-01-07 13:56:01 -08:00
Timothy J. Baek
9a63376e55 feat: file upload error handling 2024-01-07 09:33:34 -08:00
Timothy J. Baek
b37b157638 feat: reset vectordb storage support 2024-01-07 09:15:45 -08:00
Timothy J. Baek
d4b2578f6e feat: rag csv support 2024-01-07 09:05:52 -08:00
Timothy J. Baek
d6a1bf1406 refac: file upload 2024-01-07 09:00:30 -08:00
Timothy J. Baek
ffd0a5a2a0 Update main.py 2024-01-07 08:34:05 -08:00
Timothy J. Baek
c68bb3b950 docker: slim 2024-01-07 08:28:35 -08:00
Timothy J. Baek
464d0fb016 fix: update langchain.document_loaders 2024-01-07 02:49:13 -08:00
Timothy J. Baek
70d2571be1 feat: rag backend auth 2024-01-07 02:46:12 -08:00
Timothy J. Baek
142269374f feat: vectordb query error handling 2024-01-07 01:59:00 -08:00
Timothy J. Baek
ad3d69be30 refac 2024-01-07 01:54:58 -08:00
Timothy J. Baek
9634e2da3e feat: full integration 2024-01-07 01:40:36 -08:00
Timothy J. Baek
fef4725d56 feat: frontend file upload support 2024-01-07 00:57:10 -08:00
Timothy J. Baek
cd86c36953 feat: pdf data load 2024-01-06 23:40:51 -08:00
Timothy J. Baek
784b369cc9 feat: chromadb vector store api 2024-01-06 22:59:22 -08:00
Timothy J. Baek
b2c9f6dff8 feat: rag api endpoint 2024-01-06 22:07:20 -08:00