Merge pull request #13085 from ayan4m1/fix/tika-image-ocr

fix: pass extractInlineImages header to Tika if PDF_EXTRACT_IMAGES is true
Tim Jaeryang Baek 2025-05-02 03:47:51 -07:00 committed by GitHub
commit 7d184c3a14

@@ -99,6 +99,9 @@ class TikaLoader:
         else:
             headers = {}
 
+        if self.kwargs.get("PDF_EXTRACT_IMAGES") == True:
+            headers['X-Tika-PDFextractInlineImages'] = 'true'
+
         endpoint = self.url
         if not endpoint.endswith("/"):
             endpoint += "/"
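
The header logic in this hunk can be tried in isolation with a minimal sketch like the one below. The function names `add_inline_image_header` and `normalize_endpoint` are hypothetical and do not appear in the commit; only the header name `X-Tika-PDFextractInlineImages`, the `PDF_EXTRACT_IMAGES` flag, and the trailing-slash handling come from the diff. The sketch takes the existing headers as an argument because the branch that builds them is outside the hunk.

```python
from typing import Dict


def add_inline_image_header(headers: Dict[str, str], extract_images: bool) -> Dict[str, str]:
    """Hypothetical helper: return a copy of `headers`, adding the Tika
    inline-image header when image extraction is enabled."""
    result = dict(headers)  # copy so the caller's dict is not mutated
    if extract_images:
        # Header name taken from the diff; the value is the string 'true'.
        result["X-Tika-PDFextractInlineImages"] = "true"
    return result


def normalize_endpoint(url: str) -> str:
    """Hypothetical helper: ensure the base URL ends with a slash,
    as the context lines of the hunk do."""
    return url if url.endswith("/") else url + "/"


if __name__ == "__main__":
    # The URL is a placeholder, not a value from the commit.
    kwargs = {"PDF_EXTRACT_IMAGES": True}
    headers = add_inline_image_header({}, kwargs.get("PDF_EXTRACT_IMAGES") is True)
    print(headers)
    print(normalize_endpoint("http://localhost:9998"))
```

One difference from the committed code: the sketch checks the flag with `is True`, whereas the diff uses `== True`, which would also accept the integer `1`.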