Commit Graph

118 Commits

Author SHA1 Message Date
Doug Danat
c91a5d8b1f switch to using BeautifulSoup HTML loader so title is also captured 2024-03-25 11:26:18 +01:00
Doug Danat
784a6ec85e include html langchain loader for RAG 2024-03-25 09:50:53 +01:00
Timothy Jaeryang Baek
371dfc1143 Merge branch 'dev' into debug_print 2024-03-24 18:04:03 -05:00
Timothy J. Baek
ff8a55a861 refac: rag api 2024-03-24 00:41:41 -07:00
Timothy J. Baek
7e0ea8f77d feat: RAG text ingestion(store) api 2024-03-24 00:40:27 -07:00
Jannik Streidl
fdef2abdfb cuda fix 2024-03-22 12:48:48 +01:00
Self Denial
e6dd0bfbe0 Migrate to python logging module with env var control. 2024-03-20 17:11:36 -06:00
Jannik Streidl
1f6739337b docker improvements & changed universal device type env for different models used 2024-03-20 08:44:09 +01:00
Timothy J. Baek
91efd6cb63 fix: file upload encoding issue 2024-03-15 23:52:37 -07:00
Timothy J. Baek
072b499a50 fix: backslash rag content issue 2024-03-15 13:34:52 -07:00
Timothy J. Baek
8df6b137cb fix: rag 2024-03-10 18:40:50 -07:00
Timothy J. Baek
98948814fd feat: toggle pdf ocr 2024-03-10 13:32:34 -07:00
Timothy J. Baek
c49491e516 refac: rag to backend 2024-03-08 22:34:47 -08:00
Timothy J. Baek
7e5e2c42c9 refac: rag routes 2024-03-08 19:26:39 -08:00
Timothy J. Baek
b88c64f80e fix: ocr issue 2024-03-06 17:54:42 -08:00
Timothy J. Baek
bb98c10abb revert: ocr feature 2024-03-06 17:04:40 -08:00
Timothy Jaeryang Baek
8fb5f54751 Merge pull request #1050 from jannikstdl/rag-pdf-ocr
feat: added ocr functionality to the pdf loader
2024-03-06 00:45:33 -05:00
Jannik Streidl
089a63e0c6 feat: added ocr functionality to the pdf loader 2024-03-05 22:25:25 +01:00
Firat Birlik
6782e95c75 recreate rag collection is now optional and only used for web requests 2024-03-04 10:00:06 -06:00
Firat Birlik
5d4ff85228 recreate rag collection instead of falling back to stale version 2024-03-03 21:25:00 -06:00
Timothy J. Baek
47a05a47b4 feat: add rag top k value setting 2024-03-02 18:56:57 -08:00
Ased Mammad
b473ad574f fix: RAG scan unsupported mimetype
This fixes an issue with RAG that stops loading documents as soon
as it reaches a file with unsupported mimetype.
2024-02-23 14:27:31 +03:30
Timothy J. Baek
7c127c35fc feat: dynamic embedding model load 2024-02-19 11:05:45 -08:00
Jannik Streidl
acf999013b storing vectordb in project cache folder + device types 2024-02-19 07:51:17 +01:00
Timothy J. Baek
0cb0358485 refac: more descriptive var names 2024-02-18 11:16:10 -08:00
Jannik S
4b88e7e44f Merge branch 'main' into choose-embedding-model 2024-02-18 09:20:54 +01:00
Jannik Streidl
bc3dd34d8b collection query fix 2024-02-18 09:17:43 +01:00
Timothy J. Baek
07b451995e feat: reset rag template 2024-02-17 22:49:18 -08:00
Timothy J. Baek
5270efa9e5 feat: editable rag template 2024-02-17 22:41:03 -08:00
Timothy J. Baek
ccf08fb91e feat: editable chunk params 2024-02-17 22:29:52 -08:00
Timothy J. Baek
a94e4161f7 fix: file content type issue 2024-02-17 21:31:46 -08:00
Timothy J. Baek
e07001e5f6 feat: rag folder scan support 2024-02-17 21:06:08 -08:00
Jannik Streidl
1846c1e80d choose embedding model when using docker 2024-02-17 19:38:29 +01:00
Tim Farrell
08e8e922fd Endpoint role-checking was redundantly applied but FastAPI provides a nice abstraction mechanic...so I applied it. There should be no logical changes in this code; only simpler, cleaner ways for doing the same thing. 2024-02-08 18:05:01 -06:00
Timothy J. Baek
683650ec00 feat: collection rag integration 2024-02-03 15:57:06 -08:00
Timothy J. Baek
00803c92f2 feat: doc tagging 2024-02-03 14:44:49 -08:00
Timothy J. Baek
50f7b20ac2 refac 2024-02-01 13:35:41 -08:00
Timothy J. Baek
28226a6f97 feat: web rag support 2024-01-26 22:17:28 -08:00
Timothy J. Baek
4e468dc58c refac 2024-01-25 00:24:49 -08:00
Timothy Jaeryang Baek
fa5918ad13 Merge branch 'main' into main 2024-01-25 00:13:12 -08:00
Marclass
8bfda730d9 add excel document support 2024-01-23 14:03:22 -07:00
Timothy Jaeryang Baek
ca943d0795 Merge pull request #549 from Marclass/main
Bugfix: Fix toast error popup when front end can't figure out file type.
2024-01-22 23:13:53 -08:00
Timothy Jaeryang Baek
7054f02891 Merge pull request #466 from baumandm/feat/epub-support
feat: Add epub support
2024-01-22 23:12:46 -08:00
Marclass
7eea3ef313 copy list of file ext from backend to front end 2024-01-23 00:00:07 -07:00
Marclass
35ace57784 add rst document for RAG 2024-01-19 10:48:04 -07:00
Dave Bauman
f559068186 feat: Add epub support 2024-01-19 12:23:59 -05:00
Marclass
aa1d386042 Allow any file to be used for RAG.
Changed RAG parser to prefer file extensions over MIME content types. If the type of file is not recognized assume it's a text file.
2024-01-18 20:41:14 -07:00
Marclass
6070e6bcd1 add svelte type to RAG 2024-01-17 20:10:34 -07:00
Marclass
cf6b3fa48a remove html type and add js/css 2024-01-17 00:34:22 -07:00
Marclass
43d8466677 feat: Add RAG support for various programming languages
Enables RAG for golang, python, java, sh, bat, powershell, cmd, js, css, c/c++/c#, sql, logs, ini, perl, r, dart, docker, env, php, haskell, lua, conf, plsql, ruby, db2, scalla, bash, swift, vue, html, xml, and other arbitrary text files.
2024-01-17 00:09:47 -07:00
Timothy J. Baek
c1ec604f21 feat: rag md support 2024-01-09 15:24:53 -08:00
Timothy J. Baek
54c4e0761a feat: documents file upload 2024-01-08 01:26:15 -08:00
Timothy J. Baek
57c050326c feat: docx support 2024-01-07 13:56:01 -08:00
Timothy J. Baek
9a63376e55 feat: file upload error handling 2024-01-07 09:33:34 -08:00
Timothy J. Baek
b37b157638 feat: reset vectordb storage support 2024-01-07 09:15:45 -08:00
Timothy J. Baek
d4b2578f6e feat: rag csv support 2024-01-07 09:05:52 -08:00
Timothy J. Baek
d6a1bf1406 refac: file upload 2024-01-07 09:00:30 -08:00
Timothy J. Baek
ffd0a5a2a0 Update main.py 2024-01-07 08:34:05 -08:00
Timothy J. Baek
c68bb3b950 docker: slim 2024-01-07 08:28:35 -08:00
Timothy J. Baek
464d0fb016 fix: update langchain.document_loaders 2024-01-07 02:49:13 -08:00
Timothy J. Baek
70d2571be1 feat: rag backend auth 2024-01-07 02:46:12 -08:00
Timothy J. Baek
142269374f feat: vectordb query error handling 2024-01-07 01:59:00 -08:00
Timothy J. Baek
ad3d69be30 refac 2024-01-07 01:54:58 -08:00
Timothy J. Baek
9634e2da3e feat: full integration 2024-01-07 01:40:36 -08:00
Timothy J. Baek
fef4725d56 feat: frontend file upload support 2024-01-07 00:57:10 -08:00
Timothy J. Baek
cd86c36953 feat: pdf data load 2024-01-06 23:40:51 -08:00
Timothy J. Baek
784b369cc9 feat: chromadb vector store api 2024-01-06 22:59:22 -08:00
Timothy J. Baek
b2c9f6dff8 feat: rag api endpoint 2024-01-06 22:07:20 -08:00