Merge pull request #12056 from ivanbaldo/fix-tika-spaces

**fix**: Tika 3.1.0.0 sends a lot of blank lines which degrades the RAG results and don't look good on the UI, strip them.
This commit is contained in:
Timothy Jaeryang Baek 2025-03-25 21:00:49 -07:00 committed by GitHub
commit 9ef5868226
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -105,7 +105,7 @@ class TikaLoader:
if r.ok:
raw_metadata = r.json()
text = raw_metadata.get("X-TIKA:content", "<No text content found>")
text = raw_metadata.get("X-TIKA:content", "<No text content found>").strip()
if "Content-Type" in raw_metadata:
headers["Content-Type"] = raw_metadata["Content-Type"]