Merge pull request #13731 from tth37/fix_duplicate_web_search_urls

fix: Duplicate web search urls
Commit by Tim Jaeryang Baek, committed via GitHub on 2025-05-09 16:43:11 +04:00.


@@ -1590,6 +1590,11 @@ async def process_web_search(
     try:
         urls = [result.link for result in web_results]
+
+        # Remove duplicates
+        urls = list(dict.fromkeys(urls))
+
+        log.debug(f"urls: {urls}")
         loader = get_web_loader(
             urls,
             verify_ssl=request.app.state.config.ENABLE_WEB_LOADER_SSL_VERIFICATION,
@@ -1601,10 +1606,6 @@ async def process_web_search(
                 doc.metadata.get("source") for doc in docs if doc.metadata.get("source")
             ]  # only keep URLs
-
-            # Remove duplicates
-            urls = list(dict.fromkeys(urls))
-            log.debug(f"urls: {urls}")
 
         if request.app.state.config.BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL:
             return {
                 "status": True,