Merge pull request #13731 from tth37/fix_duplicate_web_search_urls

fix: Duplicate web search urls
Commit 8acc1ab425 by Tim Jaeryang Baek, 2025-05-09 16:43:11 +04:00, committed by GitHub.
No known key found for this signature in database. GPG Key ID: B5690EEEBB952194

```diff
@@ -1590,6 +1590,11 @@ async def process_web_search(
     try:
         urls = [result.link for result in web_results]
+        # Remove duplicates
+        urls = list(dict.fromkeys(urls))
+        log.debug(f"urls: {urls}")
+
         loader = get_web_loader(
             urls,
             verify_ssl=request.app.state.config.ENABLE_WEB_LOADER_SSL_VERIFICATION,
@@ -1601,10 +1606,6 @@ async def process_web_search(
             doc.metadata.get("source") for doc in docs if doc.metadata.get("source")
         ]  # only keep URLs
-        # Remove duplicates
-        urls = list(dict.fromkeys(urls))
-        log.debug(f"urls: {urls}")
         if request.app.state.config.BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL:
             return {
                 "status": True,
```
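The change moves the de-duplication so it runs on the raw search-result URLs before the web loader fetches them, instead of only after the documents have been loaded. The `dict.fromkeys` idiom it uses removes duplicates while preserving first-seen order; a minimal standalone sketch (with made-up example URLs):

```python
# dict.fromkeys builds a dict whose keys are the items in order of first
# appearance; since Python 3.7, dicts preserve insertion order, so
# converting the keys back to a list deduplicates without reordering.
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",  # duplicate search result
]
deduped = list(dict.fromkeys(urls))
print(deduped)  # ['https://example.com/a', 'https://example.com/b']
```

Deduplicating before `get_web_loader` avoids fetching the same page twice, rather than merely cleaning up the URL list after the redundant requests have already been made.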