Athanasios Oikonomou
657162e96d
feat(ocr): add support for Docling OCR engine and language configuration
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.
Fixes #13133
2025-05-03 00:32:06 +03:00
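Note on 657162e96d: the commit body names two environment variables but does not show how they are read. A minimal sketch, assuming `DOCLING_OCR_LANG` holds a comma-separated list and that unset values are simply omitted. The function name `read_docling_ocr_settings` and the keys `ocr_engine` and `ocr_lang` are hypothetical, chosen for illustration, and are not taken from the commit itself:

```python
import os


def read_docling_ocr_settings(env=None):
    """Collect Docling OCR settings from environment-style variables.

    Hypothetical helper: only the two variable names come from the
    commit message; the return shape is an assumption.
    """
    env = os.environ if env is None else env

    engine = env.get("DOCLING_OCR_ENGINE", "").strip()
    raw_lang = env.get("DOCLING_OCR_LANG", "")

    # Assumption: multiple languages are separated by commas.
    langs = [part.strip() for part in raw_lang.split(",") if part.strip()]

    settings = {}
    if engine:
        settings["ocr_engine"] = engine
    if langs:
        settings["ocr_lang"] = langs
    return settings
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real process environment.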
ayan4m1
039dec6820
fix: pass header to Tika if PDF_EXTRACT_IMAGES is true
2025-04-20 17:36:40 +02:00
Timothy Jaeryang Baek
ef787e4a79
Merge pull request #12486 from FabioPolito24/text-file-handling-docling
fix: text file handling with docling
2025-04-05 09:55:51 -07:00
Fabio Polito
cd0a1b4852
fix: fix for text file handling with docling
2025-04-05 16:44:08 +00:00
Patrick Wachter
0ac00b9256
refactor: update import path for MistralLoader
2025-04-02 13:56:10 +02:00
Patrick Wachter
93d7702e8c
refactor: move MistralLoader to a separate module and just use the requests package instead of mistralai
2025-04-01 20:14:34 +02:00
Patrick Wachter
1ac6879268
Add Mistral OCR integration and configuration support
2025-04-01 14:24:33 +02:00
Junaid Pinjari
e782e7d3a7
Fix: CSV loader encoding issue using autodetect_encoding=True
2025-03-29 13:14:53 +05:30
Iván Baldo
115e46a6a2
Fix: Tika 3.1.0.0 sends a lot of blank lines which degrades the RAG results, strip them.
2025-03-25 14:53:14 -03:00
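Note on 115e46a6a2: the subject line says blank lines in the Tika output are stripped, without showing how. One possible approach is sketched below, assuming every empty or whitespace-only line is dropped. The commit may instead collapse runs of blank lines, and the name `strip_blank_lines` is hypothetical:

```python
def strip_blank_lines(text):
    """Drop every empty or whitespace-only line from extracted text."""
    # Keep a line only if it contains a non-whitespace character;
    # the content of kept lines is left untouched.
    return "\n".join(line for line in text.splitlines() if line.strip())
```

Dropping all blank lines also removes paragraph breaks, so a splitter that relies on them would need a gentler variant that keeps one blank line per run.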
Fabio Polito
9d6743824e
fix: fix params DoclingLoader
2025-03-09 16:12:14 +00:00
Fabio Polito
0716f96da8
style: change style in DoclingLoader
2025-03-05 23:15:55 +00:00
Fabio Polito
9aa407dbd2
feat: merge with main
2025-03-05 22:04:34 +00:00
Fabio Polito
a44b35e99e
fix: fix DoclingLoader input params
2025-03-05 17:53:45 +00:00
Timothy Jaeryang Baek
33d3558ca9
Merge pull request #10817 from NovoNordisk-OpenSource/ivaroli/adding-json-as-supported-file-type
fix: Using the TextLoader instead of Tika for JSON files
2025-02-26 12:49:29 -08:00
Ívar Óli Sigurðsson
c5a09cdd21
adding a comma
2025-02-26 15:27:03 +01:00
Ívar Óli Sigurðsson
661711164a
Adding json as a known source for Tika
2025-02-26 15:11:21 +01:00
Fabio Polito
2419ef06a0
feat: docling support for document preprocessing
2025-02-14 12:08:03 +00:00
Mazurek Michal
35f3824932
feat: Implement Document Intelligence as Content Extraction Engine
2025-02-07 13:44:47 +01:00
Timothy Jaeryang Baek
f341971eae
fix
2024-12-15 23:41:17 -08:00
MooreDerek
4905c180a5
Only log file contents in debug
2024-12-16 15:58:26 +13:00
Timothy Jaeryang Baek
d3d161f723
wip
2024-12-10 00:54:13 -08:00