Timothy Jaeryang Baek
7f75acff96
chore: format
2025-06-08 22:08:25 +04:00
Timothy Jaeryang Baek
0cd400f5ee
refac: docling picture describe params
2025-06-08 20:02:14 +04:00
Tim Jaeryang Baek
6bf393a480
Merge pull request #14787 from vaclcer/vaclavs-custom-docling
...
feat: Customize Docling's "Describe Pictures" feature
2025-06-08 19:02:36 +04:00
Tim Jaeryang Baek
50d9a2ac58
Merge pull request #14781 from lucyknada/patch-2
...
fix: fix #14752 and add manual transcription retrieval
2025-06-08 18:40:28 +04:00
Vaclav Cerny
99f05561f8
Add configuration options for picture description modes and update related components
2025-06-08 16:30:26 +02:00
lucy
b0965a8184
fixes #14752 and adds manual transcription option
2025-06-08 14:26:24 +02:00
Timothy Jaeryang Baek
5e35aab292
chore: format
2025-06-05 01:12:28 +04:00
Vaclav Cerny
9772c18b20
fix(loader): remove deprecated picture description configuration
2025-06-04 17:21:44 +02:00
Vaclav Cerny
c71236ba07
feat(loader): enhance picture description prompt for improved detail and clarity
2025-06-04 14:25:31 +02:00
Vaclav Cerny
c4278f4784
fix description vs classification mismatch
2025-06-04 14:13:00 +02:00
Vaclav Cerny
8644e81a1c
feat(loader): add picture description configuration for DoclingLoader
2025-06-04 12:34:39 +02:00
Timothy Jaeryang Baek
4d364e2967
refac: remove msg from known type
2025-06-03 16:27:28 +04:00
PVBLIC Foundation
cf3635ba25
Update mistral.py
...
1. Intelligent Error Handling
   - Added _is_retryable_error() method to distinguish retryable from non-retryable errors
   - Prevents unnecessary retries on client errors (4xx) that won't succeed
   - Caps retry delay at 30 seconds to prevent excessive waiting
2. Optimized Timeout Configuration
   - Upload: capped at 2 minutes (previously used the full 5-minute timeout)
   - URL requests: 30 seconds (should be fast)
   - OCR processing: full timeout (can take time)
   - Cleanup: 30 seconds (should be quick)
3. Enhanced Connection Pool
   - Increased connection limits: 20 total, 10 per host
   - Longer DNS cache TTL (10 minutes vs 5 minutes)
   - Increased keepalive timeout (60s vs 30s)
   - Added async DNS resolver for better performance
   - Granular timeout controls (connect, read, total)
4. Concurrency Control for Batch Processing
   - Added semaphore-based concurrency control (default: 5 concurrent)
   - Prevents overwhelming the API while maintaining throughput
   - Configurable concurrency limit per workload
5. Memory-Efficient Result Processing
   - Early exit for empty content validation
   - Better error metadata for debugging
   - Added content length tracking
   - Streamlined page processing logic
6. General Performance Improvements
   - Better error logging with truncated responses
   - Optimized metadata creation
   - Improved debug logging efficiency
2025-05-30 20:06:29 -07:00
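The retry and concurrency items in the mistral.py commit message above can be illustrated with a minimal sketch. This is not the code from the commit: `is_retryable_status`, `retry_delay`, and `process_batch` are hypothetical names, the exponential backoff scheme is an assumption, and so is the choice to still retry 408 and 429 responses. Only the 30-second cap and the default of 5 concurrent tasks come from the commit message.

```python
import asyncio

MAX_RETRY_DELAY = 30.0  # seconds; the commit message caps retry delay at 30 s


def is_retryable_status(status: int) -> bool:
    """Hypothetical classifier: retry server errors, skip most client errors."""
    if status in (408, 429):  # assumption: request timeouts and rate limits are transient
        return True
    return status >= 500  # other 4xx responses will not succeed on retry


def retry_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff (an assumed scheme), capped at MAX_RETRY_DELAY."""
    return min(base * (2 ** attempt), MAX_RETRY_DELAY)


async def process_batch(items, worker, max_concurrent: int = 5):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:  # blocks while max_concurrent tasks are already running
            return await worker(item)

    # gather preserves the order of the input items in its result list
    return await asyncio.gather(*(guarded(item) for item in items))


if __name__ == "__main__":
    async def fake_ocr(page: int) -> str:
        await asyncio.sleep(0.01)  # stand-in for a network call
        return f"page-{page}"

    print(asyncio.run(process_batch(range(8), fake_ocr, max_concurrent=5)))
```

The semaphore bounds the number of requests in flight without serializing them, which is the trade-off the commit message describes as protecting the API while maintaining throughput.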
Timothy Jaeryang Baek
7dc7d5c028
refac: PLEASE FOLLOW EXISTING CONVENTION
2025-05-29 03:47:02 +04:00
Timothy Jaeryang Baek
551597b9cc
chore: format
2025-05-29 02:36:33 +04:00
Hisma
e12a79c0e2
fix: handle json output format correctly
2025-05-27 01:12:03 -04:00
Hisma
a9405cc101
feat: Marker api content extraction support
2025-05-27 00:44:07 -04:00
Timothy Jaeryang Baek
8b5e89eada
chore: format
2025-05-24 00:43:38 +04:00
PVBLIC Foundation
bf193dfb5d
Update mistral.py
2025-05-23 10:00:19 -07:00
sree
f408b08965
minor bug fix for external document loader not working
2025-05-20 11:10:23 +05:30
Timothy Jaeryang Baek
8732b64b6b
feat: external document loader support
2025-05-14 22:28:40 +04:00
Timothy Jaeryang Baek
de70d0cb64
feat: docling do picture description support
2025-05-14 21:26:49 +04:00
Timothy Jaeryang Baek
6359cb55fe
chore: format
2025-05-07 02:01:03 +04:00
Tim Jaeryang Baek
ea07e242f5
Merge pull request #13528 from Classic298/dev
...
feat: Enhance YouTube Transcription Loader for multi-language support
2025-05-07 00:44:45 +04:00
Classic298
1dcbec71ec
Update youtube.py
2025-05-06 17:14:00 +02:00
Classic298
87dcbd198c
Update youtube.py
2025-05-06 17:11:03 +02:00
Classic298
d7927506f1
Update youtube.py
2025-05-06 17:06:21 +02:00
Classic298
f65dc715f9
Update youtube.py
2025-05-06 16:30:18 +02:00
Classic298
c69278c13c
Update youtube.py
2025-05-06 16:24:27 +02:00
Classic298
a129e0954e
Update youtube.py
2025-05-06 16:22:40 +02:00
Classic298
5e1cb76b93
Update youtube.py
2025-05-06 16:16:58 +02:00
Timothy Jaeryang Baek
e63b8b3879
refac
2025-05-06 00:46:32 +04:00
Timothy Jaeryang Baek
27da31dc83
fix: tikaloader extract images
2025-05-05 23:40:34 +04:00
Classic298
67a612fe24
Update youtube.py
2025-05-05 20:40:48 +02:00
Classic298
791dd24ace
Update youtube.py
2025-05-05 20:08:25 +02:00
Classic298
9cf3381381
Update youtube.py
2025-05-05 20:07:52 +02:00
Classic298
b0d74a59f1
Update youtube.py
2025-05-05 20:07:37 +02:00
Classic298
1a30b3746e
Update youtube.py
2025-05-05 20:03:00 +02:00
Classic298
0a3817ed86
Update youtube.py
2025-05-05 20:00:10 +02:00
Classic298
0a845db8ec
Update youtube.py
2025-05-05 19:57:21 +02:00
Classic298
7680ac2517
Update youtube.py
2025-05-05 19:57:06 +02:00
Athanasios Oikonomou
657162e96d
feat(ocr): add support for Docling OCR engine and language configuration
...
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.
Fixes #13133
2025-05-03 00:32:06 +03:00
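As a rough illustration of the environment-variable configuration described in the Docling OCR commit above, the sketch below reads `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`. The function name `docling_ocr_settings`, the returned dictionary keys, and the comma-separated language format are assumptions made for this example, not details confirmed by the commit message.

```python
import os


def docling_ocr_settings(env=os.environ):
    """Hypothetical helper: collect Docling OCR settings from environment variables."""
    engine = env.get("DOCLING_OCR_ENGINE", "").strip()
    raw_langs = env.get("DOCLING_OCR_LANG", "")
    # Assumption: multiple languages are given as a comma-separated list.
    langs = [part.strip() for part in raw_langs.split(",") if part.strip()]
    return {"ocr_engine": engine or None, "ocr_lang": langs}


if __name__ == "__main__":
    print(docling_ocr_settings({"DOCLING_OCR_ENGINE": "tesseract", "DOCLING_OCR_LANG": "eng, deu"}))
```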
Tim Jaeryang Baek
7d184c3a14
Merge pull request #13085 from ayan4m1/fix/tika-image-ocr
...
fix: pass extractInlineImages header to Tika if PDF_EXTRACT_IMAGES is true
2025-05-02 03:47:51 -07:00
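The Tika fix above amounts to adding one request header when image extraction is enabled. A minimal sketch, assuming the Tika Server header is `X-Tika-PDFextractInlineImages` and using a hypothetical helper name `tika_headers`; the actual loader code is not shown in this log.

```python
def tika_headers(mime_type=None, extract_images=False):
    """Hypothetical helper: build request headers for a Tika Server upload."""
    headers = {"Content-Type": mime_type} if mime_type else {}
    if extract_images:  # corresponds to PDF_EXTRACT_IMAGES being true
        # Assumed header name for enabling inline image extraction in PDFs.
        headers["X-Tika-PDFextractInlineImages"] = "true"
    return headers


if __name__ == "__main__":
    print(tika_headers("application/pdf", extract_images=True))
```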
ayan4m1
039dec6820
fix: pass header to Tika if PDF_EXTRACT_IMAGES is true
2025-04-20 17:36:40 +02:00
tth37
008fec80c1
fix: Update external search/loader method to POST
2025-04-14 18:17:27 +08:00
tth37
22f0365cef
format
2025-04-14 02:05:58 +08:00
tth37
839ba22c90
feat: Backend for Self-Hosted/External Web Search/Loader Engines
2025-04-14 01:49:05 +08:00
lucy
bc295546cd
fix #12678
2025-04-10 07:23:34 +02:00
Timothy Jaeryang Baek
ef787e4a79
Merge pull request #12486 from FabioPolito24/text-file-handling-docling
...
fix: text file handling with docling
2025-04-05 09:55:51 -07:00
Fabio Polito
cd0a1b4852
fix: fix for text file handling with docling
2025-04-05 16:44:08 +00:00