Timothy Jaeryang Baek
7490bc9100
Merge branch 'dev' into fix-db-order
2025-03-26 20:55:42 -07:00
Timothy Jaeryang Baek
9d834a8e90
Merge branch 'dev' into k_reranker
2025-03-26 20:50:31 -07:00
Marko Henning
7531b7dcaa
Satisfy github format check
2025-03-25 19:09:17 +01:00
Iván Baldo
115e46a6a2
Fix: Tika 3.1.0.0 sends a lot of blank lines which degrades the RAG results, strip them.
2025-03-25 14:53:14 -03:00
Marko Henning
94d9d3d590
Fix: Normalize all database distances to score in [0, 1]
2025-03-25 16:46:14 +01:00
Timothy Jaeryang Baek
38d524f6a0
chore: format
2025-03-24 11:35:32 -07:00
Jonathan Flower
bdd236fa3a
improved error handling for deleting collections that do not exist in chromadb
2025-03-22 09:59:06 -04:00
Timothy Jaeryang Baek
8aa6dade41
Merge pull request #11876 from mahenning/fix--rag-sorting
Fix: wrong citation order for chromadb, wrong order for hybrid search
2025-03-20 17:54:22 -07:00
Timothy Jaeryang Baek
9b20ef4922
refac
2025-03-20 14:01:47 -07:00
genjuro
07098c6352
perf: set shorter timeout for playwright and make it configurable
2025-03-20 15:28:09 +08:00
Marko Henning
5f48af5b91
Revert the ordering change with chromadb, not necessary with reranker results
2025-03-19 17:04:45 +01:00
Marko Henning
ec8fc727b8
Fix wrong order for chromadb
2025-03-19 16:06:10 +01:00
leilibj
3e8546135d
fix: correct incorrect usage of log.exception method
2025-03-19 13:04:34 +08:00
Marko Henning
5ab789e83e
Add documentation on chroma special case
2025-03-18 16:44:58 +01:00
Marko Henning
ba676b7ed6
Use k_reranker also for result merge, and add special sorting use case for ChromaDB
2025-03-18 16:25:24 +01:00
Marko Henning
f13948d805
Fixed typo
2025-03-18 12:14:59 +01:00
Marko Henning
c877b59cbc
Address edge case with k < k_reranker, sort results for cutting off
2025-03-18 11:31:17 +01:00
orenzhang
c761e4fd08
feat(trace): opentelemetry instrument
2025-03-10 22:27:31 +08:00
Fabio Polito
9d6743824e
fix: fix params DoclingLoader
2025-03-09 16:12:14 +00:00
Fabio Polito
0aa42615f9
Merge remote-tracking branch 'upstream/dev' into docling_context_extraction_engine
merge upstream
2025-03-08 18:52:51 +00:00
Timothy Jaeryang Baek
22b88f9593
Merge pull request #11324 from kela4/main
fix: opensearch vector db query structures, result mapping, filters, bulk query actions, knn_vector usage
2025-03-08 12:19:38 -04:00
Luke
7917128ed3
enh: enable configuration for tavily extract depth
2025-03-08 00:43:02 -05:00
Fabio Polito
e3eef58310
feat: merge with dev
2025-03-07 00:22:47 +00:00
Luke
987954c817
feat: Add Tavily extract web loader integration
2025-03-06 18:15:18 -05:00
Katharina
6cb0c0339a
fix: opensearch vector db query structures, result mapping, filters, bulk query actions, knn_vector usage
2025-03-06 23:49:54 +01:00
Fabio Polito
98857184ff
Merge remote-tracking branch 'upstream/dev' into docling_context_extraction_engine
merge with dev branch
2025-03-06 12:12:50 +00:00
Marko Henning
41a4cf7106
Added new k_reranker parameter
2025-03-06 10:47:57 +01:00
Timothy Jaeryang Baek
d4fca9dabf
chore: format
2025-03-05 19:17:41 -08:00
Fabio Polito
0716f96da8
style: change style in DoclingLoader
2025-03-05 23:15:55 +00:00
Fabio Polito
9aa407dbd2
feat: merge with main
2025-03-05 22:04:34 +00:00
ofek
a8f205213c
fixed es bugs
2025-03-05 23:19:56 +02:00
Fabio Polito
a44b35e99e
fix: fix DoclingLoader input params
2025-03-05 17:53:45 +00:00
Timothy Jaeryang Baek
7b442e4be0
Merge pull request #11141 from Youggls/dev
fix: correct parameter name for MilvusClient instantiation
2025-03-04 00:54:49 -08:00
Timothy Jaeryang Baek
39ea59edc8
chore: format
2025-03-04 00:32:27 -08:00
Perry Li
67ed61d022
fixbug: correct parameter name for MilvusClient instantiation
Replace incorrect parameter 'database=MILVUS_DB' with valid 'db_name=MILVUS_DB'
2025-03-04 16:02:19 +08:00
ofek
737dfd2763
added elasticsearch support
2025-03-03 23:39:42 +02:00
Timothy Jaeryang Baek
6471f12668
Merge pull request #11033 from dtaivpp/main
fix: Changed to use collection_name and fixed bulk indexing missing index.
2025-03-01 16:00:13 -08:00
David Tippett
f3c4c2b8e3
Changed to use collection name and fixed bulk indexing missing index.
2025-03-01 13:26:19 -05:00
Timothy Jaeryang Baek
d0ddb0637e
enh: web embed bypass embedding and retrieval support
2025-02-27 16:34:05 -08:00
Timothy Jaeryang Baek
1b56a8f3cb
Merge pull request #10864 from kurtdami/perplexity_integration
feat: add perplexity integration to web search
2025-02-27 13:51:03 -08:00
kurtdami
b061775932
feat: add perplexity integration to web search
2025-02-27 00:30:48 -08:00
Timothy Jaeryang Baek
ce7cf62a55
refac: dedup
2025-02-26 23:51:39 -08:00
Timothy Jaeryang Baek
ddb30589e3
chore: format
HIDE MODELS
2025-02-26 22:18:18 -08:00
Timothy Jaeryang Baek
57010901e6
enh: bypass embedding and retrieval
2025-02-26 15:42:19 -08:00
Timothy Jaeryang Baek
34aeaaf020
refac
2025-02-26 13:54:26 -08:00
Timothy Jaeryang Baek
46ac6f2b29
fix
2025-02-26 12:53:07 -08:00
Timothy Jaeryang Baek
33d3558ca9
Merge pull request #10817 from NovoNordisk-OpenSource/ivaroli/adding-json-as-supported-file-type
fix: Using the TextLoader instead of Tika for JSON files
2025-02-26 12:49:29 -08:00
Ívar Óli Sigurðsson
c5a09cdd21
adding a comma
2025-02-26 15:27:03 +01:00
Ívar Óli Sigurðsson
661711164a
Adding json as a known source for Tika
2025-02-26 15:11:21 +01:00
Timothy Jaeryang Baek
3be5e3129b
Merge pull request #10752 from NovoNordisk-OpenSource/yvedeng/standardize-logging
refactor: replace print statements with logging
2025-02-25 10:53:02 -08:00
Yifang Deng
0e5d5ecb81
refactor: replace print statements with logging for better error tracking
2025-02-25 15:53:55 +01:00
Timothy Jaeryang Baek
ab1b910d80
Merge pull request #10486 from Micca/feature/document_intelligence_support
Feat: Adding Support for Azure AI Document Intelligence for Content Extraction (Revised)
2025-02-21 10:56:18 -08:00
Timothy Jaeryang Baek
93d486d50e
revert: faulty dedup code
2025-02-20 11:02:45 -08:00
Timothy Jaeryang Baek
eeb00a5ca2
chore: format
2025-02-20 01:01:29 -08:00
Youggls
0fb3c08181
feat: Add Firecrawl web loader integration
2025-02-19 16:54:44 +08:00
Timothy Jaeryang Baek
c073b8b4ee
refac
2025-02-18 23:49:27 -08:00
Timothy Jaeryang Baek
5465cabd40
refac
2025-02-18 21:17:09 -08:00
Timothy Jaeryang Baek
81715f6553
enh: RAG full context mode
2025-02-18 21:14:58 -08:00
Timothy Jaeryang Baek
1bbecd46c8
Merge pull request #10052 from roryeckel/playwright
Support Playwright RAG Web Loader: Revised
2025-02-18 19:57:48 -08:00
Timothy Jaeryang Baek
4ef7aff663
refac
2025-02-18 19:35:22 -08:00
mikhail-khludnev
925bfe840b
dedupe results from multiple queries
2025-02-18 20:10:57 +03:00
Rory
10e0c81de9
Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
# backend/open_webui/retrieval/web/utils.py
# backend/open_webui/routers/retrieval.py
2025-02-17 21:53:39 -06:00
Rory
bc82f48ebf
refac: RAG_WEB_LOADER -> RAG_WEB_LOADER_ENGINE
2025-02-17 21:43:32 -06:00
Timothy Jaeryang Baek
ba6cde8a87
fix: include_domain does NOT exist
2025-02-17 19:20:49 -08:00
Timothy Jaeryang Baek
dbe5d1ca08
refac
2025-02-17 18:16:23 -08:00
Timothy Jaeryang Baek
ca0b7217d2
enh: full context web search
2025-02-17 18:14:26 -08:00
Rory
66c2acc08d
Merge branch 'dev' into playwright
2025-02-15 22:14:16 -06:00
Timothy Jaeryang Baek
b0ad5cd863
Merge pull request #10076 from crizCraig/local_date
fix: return local date from `getFormattedDate`
2025-02-15 20:10:56 -08:00
Timothy Jaeryang Baek
3d0c06ccee
refac: duckduckgo
2025-02-15 16:45:56 -08:00
Craig Quiter
e67eb89e05
style: black format
2025-02-15 10:53:16 -08:00
Rory
8e9b00a017
Fix docstring
2025-02-14 22:48:15 -06:00
Rory
aa2b764d74
Finalize incomplete merge to update playwright branch
Introduced feature parity for trust_env
2025-02-14 22:32:45 -06:00
Rory
4da220c513
Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
# backend/open_webui/config.py
# backend/open_webui/main.py
# backend/open_webui/retrieval/web/utils.py
# backend/open_webui/routers/retrieval.py
# backend/open_webui/utils/middleware.py
# pyproject.toml
2025-02-14 20:48:22 -06:00
Guofeng Yi
b38acc8559
Merge branch 'dev' into feate-webloader-support-proxy
2025-02-15 09:50:02 +08:00
Timothy Jaeryang Baek
3e543691a4
Merge pull request #9988 from Yimi81/feat-support-async-load
feat: websearch support async docs load
2025-02-14 14:10:46 -08:00
LiuC0j
5ca39eb9fd
Update tavily.py
2025-02-14 14:56:01 +01:00
Fabio Polito
2419ef06a0
feat: docling support for document preprocessing
2025-02-14 12:08:03 +00:00
Yimi81
d3f71930f0
web loader support proxy
2025-02-14 07:15:09 +00:00
Yimi81
ceef600223
support async load for websearch
2025-02-14 07:05:10 +00:00
xring
27d395ba06
feat: add web search via SerpApi
2025-02-14 12:24:58 +08:00
Timothy Jaeryang Baek
5626426c31
chore: format
2025-02-12 23:28:57 -08:00
Rory
40d4db97e6
Merge remote-tracking branch 'upstream/dev' into playwright
2025-02-12 22:32:44 -06:00
Timothy Jaeryang Baek
a5bba20915
Merge pull request #9837 from silverriver/patch-1
feat Make Google PSE search return more than 10 google search results
2025-02-11 21:36:53 -08:00
Silver
7e08373ae5
Update google_pse.py to return results more than 10
2025-02-12 13:01:09 +08:00
Timothy Jaeryang Baek
8906a2e260
Merge pull request #9803 from BochaAI/main
add Bocha
2025-02-11 21:01:04 -08:00
luckyman-yan
31360fe991
add Bocha
2025-02-10 16:44:47 +08:00
Timothy Jaeryang Baek
60095598ec
chore: format
2025-02-09 22:20:47 -08:00
Rory
2c711d8365
Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
# backend/requirements.txt
2025-02-09 23:52:21 -06:00
Timothy Jaeryang Baek
d5a815b19c
Merge pull request #9693 from vinsdragonis/main
fix: Fixed error occurring when using OpenSearch as a vector db
2025-02-09 13:06:19 -08:00
Mazurek Michal
35f3824932
feat: Implement Document Intelligence as Content Extraction Engine
2025-02-07 13:44:47 +01:00
binxn
88db4ca7ba
Update jina_search.py
Updated Jina's search function in order to use POST and make use of the result count passed by the user
Note: Jina supports a max of 10 result count
2025-02-06 14:30:27 +01:00
Vineeth B V
7c78facfd9
Update opensearch.py
2025-02-06 13:36:11 +05:30
Vineeth B V
fd6b039859
Added a query method for OpenSearch vector db.
- This PR aims to address the error 400: "**'OpenSearchClient' object has no attribute 'query'**".
- With the implemented query() method, this issue should be resolved and allow uploaded documents to be vectorized and retrieved based on the given query.
2025-02-06 12:04:14 +05:30
Rory
ec6fe9939b
Merge remote-tracking branch 'upstream/dev' into playwright
2025-02-05 17:47:58 -06:00
JT
40dea3fbe1
Merge branch 'dev' into main
2025-02-05 15:15:24 -08:00
jayteaftw
157c781b0a
Merge branch 'main' of https://github.com/jayteaftw/open-webui
2025-02-05 14:07:59 -08:00
jayteaftw
6d2f87e904
Added server side Prefixing
2025-02-05 14:03:16 -08:00
Timothy Jaeryang Baek
e41a2682f5
chore: format
2025-02-05 00:07:45 -08:00
Timothy Jaeryang Baek
f6f8c08cb0
Merge pull request #9068 from df-cgdm/main
**feat** Add user related headers when calling an external embedding api
2025-02-05 00:05:44 -08:00
Timothy Jaeryang Baek
5cda8a57e7
Merge pull request #9337 from abdalrohman/exa_integration
feat: implement Exa search engine integration
2025-02-04 14:00:06 -08:00
JT
81102f4be2
Merge branch 'open-webui:main' into main
2025-02-04 13:06:04 -08:00
jvinolus
7b8e5d4e7c
Fixed errors and added more support
2025-02-04 13:04:36 -08:00
M.Abdulrahman Alnaseer
2bb6b49f11
feat: implement Exa search engine integration
2025-02-04 21:13:16 +03:00
Timothy Jaeryang Baek
3adfa29f7d
chore: format
2025-02-03 21:56:35 -08:00
Rory
7bac1a170d
Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
# backend/open_webui/retrieval/web/utils.py
2025-02-03 22:32:46 -06:00
Rory
1b581b714f
Moving code out of playwright branch
2025-02-03 18:47:26 -06:00
Rory
3db6b4352f
fix: Filter out invalid RAG web URLs (continued)
2025-02-03 18:18:49 -06:00
Rory
121a13d4ed
fix: Filter to valid RAG web search URLs
2025-02-03 17:37:20 -06:00
Rory
f837d2cdbb
Merge branch 'dev' of https://github.com/open-webui/open-webui
# Conflicts:
# src/lib/i18n/locales/sr-RS/translation.json
2025-02-02 20:31:27 -06:00
Rory
8da33721d5
Support PLAYWRIGHT_WS_URI
2025-02-02 17:58:09 -06:00
Rory
a84e488a4e
Fix playwright in docker by updating unstructured
2025-02-01 22:58:28 -06:00
Sajid Ali
7b31c75271
Milvus: new optional config var, MILVUS_TOKEN
modified: backend/open_webui/config.py
modified: backend/open_webui/retrieval/vector/dbs/milvus.py
2025-01-31 17:01:00 -05:00
Rory
2452e271cd
Refine RAG_WEB_LOADER
2025-01-30 20:31:31 -06:00
Didier FOURNOUT
6ca295ec59
Add user related headers when calling an external embedding api
2025-01-29 10:55:52 +00:00
Rory
4e8b390682
Add RAG_WEB_LOADER + Playwright mode + improve stability of search
2025-01-28 23:03:15 -06:00
Timothy Jaeryang Baek
7a70fd1312
fix: bing search
2025-01-20 22:52:19 -08:00
Timothy Jaeryang Baek
bdc60e7850
chore: format backend
2025-01-19 11:59:07 -08:00
jvinolus
47b8412695
Initialize support for prefixing embeddings
2025-01-15 17:05:04 -08:00
Sajid Ali
7a95df008e
Milvus: add new config var, MILVUS_DB
modified: backend/open_webui/config.py
modified: backend/open_webui/retrieval/vector/dbs/milvus.py
2025-01-14 15:48:15 -05:00
Timothy Jaeryang Baek
942fd384de
refac: chroma
2025-01-08 13:18:14 -08:00
Timothy Jaeryang Baek
c79b975ad0
refac: chroma
2025-01-08 00:21:50 -08:00
Timothy Jaeryang Baek
0e7c3d4eb8
Merge pull request #8379 from qiaozhi199/main
Fix the issue of inaccurate answers after enabling RAG query generation
2025-01-07 23:53:31 -08:00
Jason Kidd
b3a52be401
fix: Pgvector vector column size check was failing on initialization of database
2025-01-07 09:15:13 -08:00
zhiguo.qiao
91f22a8a8d
Return the top k results with the highest similarity.
2025-01-07 17:41:30 +08:00
Timothy Jaeryang Baek
0e805e7dc4
Merge pull request #8298 from jk-f5/feat/pg_vector_size
feat: Allow setting the initial vector length on pgvector document_chunk table
2025-01-03 13:03:53 -08:00
Yaroslav Halchenko
8f1953e667
[DATALAD RUNCMD] run codespell throughout fixing few left typos automagically
=== Do not change lines below ===
{
"chain": [],
"cmd": "codespell -w",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
2025-01-03 15:07:21 -05:00
Jason Kidd
70b74b5217
feat: Allow setting the initial vector length on pgvector document_chunk table
2025-01-03 09:18:59 -08:00
Timothy Jaeryang Baek
fd0170c179
revert
2024-12-30 16:55:29 -08:00
Timothy Jaeryang Baek
9b56b64cfa
Merge pull request #8212 from ashm-dev/main
feat: Small optimization
2024-12-30 16:00:18 -08:00
shamil
a0aee4ff28
feat: Small optimization
2024-12-30 13:45:20 +03:00
Vishwanath Martur
00e6ffe83c
Fix offline docker container startup issue
Related to #7207
Modify the code to allow the docker container to start in an offline environment for versions >= 0.4.0.
* **backend/open_webui/retrieval/utils.py**
- Import `OFFLINE_MODE` from `open_webui.env`.
- Set `local_files_only` to `True` when `OFFLINE_MODE` is enabled in `snapshot_kwargs`.
* **backend/open_webui/env.py**
- Add logic to set `HF_HUB_OFFLINE` environment variable to `1` when `OFFLINE_MODE` is enabled.
* **README.md**
- Document setting `HF_HUB_OFFLINE` environment variable to `1` for offline environments.
2024-12-29 11:53:09 +05:30
Timothy Jaeryang Baek
50f36a5262
refac: styling
2024-12-19 20:56:16 -08:00
Timothy Jaeryang Baek
e500461dc0
refac
2024-12-17 18:40:50 -08:00
Timothy Jaeryang Baek
f341971eae
fix
2024-12-15 23:41:17 -08:00
MooreDerek
4905c180a5
Only log file contents in debug
2024-12-16 15:58:26 +13:00
Timothy Jaeryang Baek
867c4bc0d0
wip: retrieval
2024-12-11 18:05:42 -08:00
Timothy Jaeryang Baek
d3d161f723
wip
2024-12-10 00:54:13 -08:00