Commit Graph

146 Commits

Author SHA1 Message Date
Timothy J. Baek
b1b72441bb feat: openai embeddings integration 2024-04-14 19:48:15 -04:00
Timothy J. Baek
b48e73fa43 feat: openai embeddings support 2024-04-14 19:15:39 -04:00
Timothy J. Baek
36ce157907 fix: integration 2024-04-14 18:47:45 -04:00
Timothy J. Baek
9cdb5bf9fe feat: frontend integration 2024-04-14 18:31:40 -04:00
Timothy J. Baek
2952e61167 feat: external embeddings support 2024-04-14 17:55:00 -04:00
Timothy Jaeryang Baek
b9cadff16b Merge pull request #1419 from lainedfles/embedding-model-fix-and-manual-update
feat: improve embedding model update & resolve network dependency
2024-04-10 01:10:07 -07:00
Timothy J. Baek
582d11f191 refac: RAG_EMBEDDING_MODEL_PATH removed 2024-04-10 00:59:05 -07:00
Timothy J. Baek
cb2158a794 fix 2024-04-10 00:51:16 -07:00
Timothy J. Baek
abfcceecef refac 2024-04-10 00:46:09 -07:00
Timothy J. Baek
f4b87ecb23 refac 2024-04-10 00:33:45 -07:00
Steven Kreitzer
0bae789d39 fix: support batching chromadb 2024-04-09 10:13:29 -05:00
lainedfles
506a061387 Merge branch 'dev' into embedding-model-fix-and-manual-update 2024-04-08 14:57:54 -06:00
Jannik S
3b3d0cce1e Merge branch 'dev' into dockerfile-optimisation 2024-04-08 09:15:00 +02:00
Timothy J. Baek
e61e1b079f fix: file upload issue 2024-04-04 17:38:59 -07:00
Self Denial
9f82f5abba Formatting... 2024-04-04 12:09:48 -06:00
Self Denial
075fbedb02 More format fixes 2024-04-04 12:07:42 -06:00
Self Denial
bcf79c8366 Format fixes 2024-04-04 12:02:48 -06:00
Self Denial
3b66aa55c0 Improve embedding model update & resolve network dependency
* Add config variable RAG_EMBEDDING_MODEL_AUTO_UPDATE to control update behavior
* Add RAG utils embedding_model_get_path() function to output the filesystem path in addition to update of the model using huggingface_hub
* Update and utilize existing RAG functions in main: get_embedding_model() & update_embedding_model()
* Add GUI setting to execute manual update process
2024-04-04 11:01:23 -06:00
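The behavior described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `auto_update_enabled` and `resolve_model_path` are hypothetical names, treating only the string "true" as enabled is an assumption, and `download` stands in for a callable shaped like `huggingface_hub.snapshot_download(repo_id=..., local_files_only=...)`.

```python
import os


def auto_update_enabled(env=None):
    """Read RAG_EMBEDDING_MODEL_AUTO_UPDATE (assumption: only "true" enables it)."""
    env = os.environ if env is None else env
    return env.get("RAG_EMBEDDING_MODEL_AUTO_UPDATE", "").strip().lower() == "true"


def resolve_model_path(model, download, update_model=False):
    """Return a filesystem path for `model` (hypothetical helper).

    `download` is any callable accepting repo_id and local_files_only,
    for example huggingface_hub.snapshot_download.
    """
    # A model that is already a local directory needs no lookup.
    if os.path.isdir(model):
        return model
    # Only allow network access when an update was requested; otherwise
    # ask the downloader to resolve the model from its local cache.
    return download(repo_id=model, local_files_only=not update_model)
```

Passing the downloader in as a parameter keeps the sketch usable without network access; with the real library one would pass `snapshot_download` itself and `update_model=auto_update_enabled()`.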
Mmx233
947c392f72 fix: manually check the docs' filename 2024-04-03 23:37:13 +08:00
Jannik Streidl
9bcb37ea10 fixes and updates 2024-04-02 14:47:52 +02:00
Jannik S
099b1d066b Revert "Merge Updates & Dockerfile improvements" (#3)
This reverts commit 9763d885be.
2024-04-02 11:28:04 +02:00
lainedfles
9763d885be Merge Updates & Dockerfile improvements 2024-04-02 11:25:20 +02:00
Timothy J. Baek
5558514ff1 fix 2024-04-01 15:23:12 -07:00
KoreLogic Disclosures
6c96361402 Suggested mitigation for KL-CAN-2024-002. 2024-04-01 15:55:14 -05:00
Timothy J. Baek
a6c154d839 feat: rag context logging 2024-03-31 14:02:31 -07:00
Self Denial
144c9059a3 Improve logging. Move print() statements to appropriate log().
Add COMFYUI and WEBHOOK logging and associated environment variable
control. Add WEBHOOK payload & request debug logs.
2024-03-31 13:17:29 -06:00
Timothy J. Baek
3688955c77 fix: encoding issue 2024-03-25 23:50:52 -07:00
Timothy J. Baek
6307adfba1 feat: better error handling 2024-03-25 23:47:08 -07:00
Doug Danat
c91a5d8b1f switch to using BeautifulSoup HTML loader so title is also captured 2024-03-25 11:26:18 +01:00
Doug Danat
784a6ec85e include html langchain loader for RAG 2024-03-25 09:50:53 +01:00
Timothy Jaeryang Baek
371dfc1143 Merge branch 'dev' into debug_print 2024-03-24 18:04:03 -05:00
Timothy J. Baek
ff8a55a861 refac: rag api 2024-03-24 00:41:41 -07:00
Timothy J. Baek
7e0ea8f77d feat: RAG text ingestion(store) api 2024-03-24 00:40:27 -07:00
Jannik Streidl
fdef2abdfb cuda fix 2024-03-22 12:48:48 +01:00
Self Denial
e6dd0bfbe0 Migrate to python logging module with env var control. 2024-03-20 17:11:36 -06:00
Jannik Streidl
1f6739337b docker improvements & changed universal device type env for different models used 2024-03-20 08:44:09 +01:00
Timothy J. Baek
91efd6cb63 fix: file upload encoding issue 2024-03-15 23:52:37 -07:00
Timothy J. Baek
072b499a50 fix: backslash rag content issue 2024-03-15 13:34:52 -07:00
Timothy J. Baek
8df6b137cb fix: rag 2024-03-10 18:40:50 -07:00
Timothy J. Baek
98948814fd feat: toggle pdf ocr 2024-03-10 13:32:34 -07:00
Timothy J. Baek
c49491e516 refac: rag to backend 2024-03-08 22:34:47 -08:00
Timothy J. Baek
7e5e2c42c9 refac: rag routes 2024-03-08 19:26:39 -08:00
Timothy J. Baek
b88c64f80e fix: ocr issue 2024-03-06 17:54:42 -08:00
Timothy J. Baek
bb98c10abb revert: ocr feature 2024-03-06 17:04:40 -08:00
Timothy Jaeryang Baek
8fb5f54751 Merge pull request #1050 from jannikstdl/rag-pdf-ocr
feat: added ocr functionality to the pdf loader
2024-03-06 00:45:33 -05:00
Jannik Streidl
089a63e0c6 feat: added ocr functionality to the pdf loader 2024-03-05 22:25:25 +01:00
Firat Birlik
6782e95c75 recreate rag collection is now optional and only used for web requests 2024-03-04 10:00:06 -06:00
Firat Birlik
5d4ff85228 recreate rag collection instead of falling back to stale version 2024-03-03 21:25:00 -06:00
Timothy J. Baek
47a05a47b4 feat: add rag top k value setting 2024-03-02 18:56:57 -08:00
Ased Mammad
b473ad574f fix: RAG scan unsupported mimetype
This fixes an issue with RAG that stops loading documents as soon
as it reaches a file with unsupported mimetype.
2024-02-23 14:27:31 +03:30
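The fix described above amounts to skipping an unsupported file rather than aborting the scan. A minimal sketch, assuming a hypothetical `load` callable that raises `ValueError` for types it cannot handle (the project's actual loader and exception type may differ):

```python
def scan_documents(paths, load):
    """Load every path, skipping files whose type `load` rejects."""
    loaded, skipped = [], []
    for path in paths:
        try:
            loaded.append(load(path))
        except ValueError:
            # Unsupported type: record it and keep scanning instead of stopping.
            skipped.append(path)
    return loaded, skipped
```

Returning the skipped paths alongside the loaded documents lets the caller report what was ignored.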
Timothy J. Baek
7c127c35fc feat: dynamic embedding model load 2024-02-19 11:05:45 -08:00
Jannik Streidl
acf999013b storing vectordb in project cache folder + device types 2024-02-19 07:51:17 +01:00
Timothy J. Baek
0cb0358485 refac: more descriptive var names 2024-02-18 11:16:10 -08:00
Jannik S
4b88e7e44f Merge branch 'main' into choose-embedding-model 2024-02-18 09:20:54 +01:00
Jannik Streidl
bc3dd34d8b collection query fix 2024-02-18 09:17:43 +01:00
Timothy J. Baek
07b451995e feat: reset rag template 2024-02-17 22:49:18 -08:00
Timothy J. Baek
5270efa9e5 feat: editable rag template 2024-02-17 22:41:03 -08:00
Timothy J. Baek
ccf08fb91e feat: editable chunk params 2024-02-17 22:29:52 -08:00
Timothy J. Baek
a94e4161f7 fix: file content type issue 2024-02-17 21:31:46 -08:00
Timothy J. Baek
e07001e5f6 feat: rag folder scan support 2024-02-17 21:06:08 -08:00
Jannik Streidl
1846c1e80d choose embedding model when using docker 2024-02-17 19:38:29 +01:00
Tim Farrell
08e8e922fd Endpoint role-checking was redundantly applied but FastAPI provides a nice abstraction mechanic...so I applied it. There should be no logical changes in this code; only simpler, cleaner ways for doing the same thing. 2024-02-08 18:05:01 -06:00
Timothy J. Baek
683650ec00 feat: collection rag integration 2024-02-03 15:57:06 -08:00
Timothy J. Baek
00803c92f2 feat: doc tagging 2024-02-03 14:44:49 -08:00
Timothy J. Baek
50f7b20ac2 refac 2024-02-01 13:35:41 -08:00
Timothy J. Baek
28226a6f97 feat: web rag support 2024-01-26 22:17:28 -08:00
Timothy J. Baek
4e468dc58c refac 2024-01-25 00:24:49 -08:00
Timothy Jaeryang Baek
fa5918ad13 Merge branch 'main' into main 2024-01-25 00:13:12 -08:00
Marclass
8bfda730d9 add excel document support 2024-01-23 14:03:22 -07:00
Timothy Jaeryang Baek
ca943d0795 Merge pull request #549 from Marclass/main
Bugfix: Fix toast error popup when front end can't figure out file type.
2024-01-22 23:13:53 -08:00
Timothy Jaeryang Baek
7054f02891 Merge pull request #466 from baumandm/feat/epub-support
feat: Add epub support
2024-01-22 23:12:46 -08:00
Marclass
7eea3ef313 copy list of file ext from backend to front end 2024-01-23 00:00:07 -07:00
Marclass
35ace57784 add rst document for RAG 2024-01-19 10:48:04 -07:00
Dave Bauman
f559068186 feat: Add epub support 2024-01-19 12:23:59 -05:00
Marclass
aa1d386042 Allow any file to be used for RAG.
Changed RAG parser to prefer file extensions over MIME content types. If the type of file is not recognized assume it's a text file.
2024-01-18 20:41:14 -07:00
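The selection order described in this commit (file extension first, MIME content type second, plain text as the fallback) could look roughly like the sketch below. The lookup tables and the `choose_loader` name are hypothetical and far shorter than any real list; the returned loader names are plain strings for illustration only.

```python
import os

# Hypothetical tables for illustration; the project's real lists differ.
EXTENSION_LOADERS = {"pdf": "pdf", "csv": "csv", "docx": "docx"}
KNOWN_TEXT_EXTENSIONS = {"py", "go", "java", "sh", "sql", "md", "css", "js"}
MIME_LOADERS = {"application/pdf": "pdf", "text/csv": "csv"}


def choose_loader(filename, content_type=None):
    """Return (loader_name, known) for a file, preferring the extension."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    # 1. The file extension wins when it is recognized.
    if ext in EXTENSION_LOADERS:
        return EXTENSION_LOADERS[ext], True
    if ext in KNOWN_TEXT_EXTENSIONS:
        return "text", True
    # 2. Otherwise fall back to the MIME content type.
    if content_type in MIME_LOADERS:
        return MIME_LOADERS[content_type], True
    # 3. Unrecognized files are assumed to be plain text.
    return "text", False
```

The second element of the result flags whether the type was recognized, so a caller can warn about a guessed type without rejecting the file.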
Marclass
6070e6bcd1 add svelte type to RAG 2024-01-17 20:10:34 -07:00
Marclass
cf6b3fa48a remove html type and add js/css 2024-01-17 00:34:22 -07:00
Marclass
43d8466677 feat: Add RAG support for various programming languages
Enables RAG for golang, python, java, sh, bat, powershell, cmd, js, css, c/c++/c#, sql, logs, ini, perl, r, dart, docker, env, php, haskell, lua, conf, plsql, ruby, db2, scala, bash, swift, vue, html, xml, and other arbitrary text files.
2024-01-17 00:09:47 -07:00
Timothy J. Baek
c1ec604f21 feat: rag md support 2024-01-09 15:24:53 -08:00
Timothy J. Baek
54c4e0761a feat: documents file upload 2024-01-08 01:26:15 -08:00
Timothy J. Baek
57c050326c feat: docx support 2024-01-07 13:56:01 -08:00
Timothy J. Baek
9a63376e55 feat: file upload error handling 2024-01-07 09:33:34 -08:00
Timothy J. Baek
b37b157638 feat: reset vectordb storage support 2024-01-07 09:15:45 -08:00
Timothy J. Baek
d4b2578f6e feat: rag csv support 2024-01-07 09:05:52 -08:00
Timothy J. Baek
d6a1bf1406 refac: file upload 2024-01-07 09:00:30 -08:00
Timothy J. Baek
ffd0a5a2a0 Update main.py 2024-01-07 08:34:05 -08:00
Timothy J. Baek
c68bb3b950 docker: slim 2024-01-07 08:28:35 -08:00
Timothy J. Baek
464d0fb016 fix: update langchain.document_loaders 2024-01-07 02:49:13 -08:00
Timothy J. Baek
70d2571be1 feat: rag backend auth 2024-01-07 02:46:12 -08:00
Timothy J. Baek
142269374f feat: vectordb query error handling 2024-01-07 01:59:00 -08:00
Timothy J. Baek
ad3d69be30 refac 2024-01-07 01:54:58 -08:00
Timothy J. Baek
9634e2da3e feat: full integration 2024-01-07 01:40:36 -08:00
Timothy J. Baek
fef4725d56 feat: frontend file upload support 2024-01-07 00:57:10 -08:00
Timothy J. Baek
cd86c36953 feat: pdf data load 2024-01-06 23:40:51 -08:00
Timothy J. Baek
784b369cc9 feat: chromadb vector store api 2024-01-06 22:59:22 -08:00
Timothy J. Baek
b2c9f6dff8 feat: rag api endpoint 2024-01-06 22:07:20 -08:00