metadata
name: scrape-web
description: Fetch a web page and extract clean text for indexing
task: research
tools:
- scrape_web
Workflow
- Receive a full https:// URL from the user or orchestrator.
- Call scrape_web with the URL.
- Return title, extracted text, and final URL metadata.
- Pass the ExtractedDocument to extract_and_index for MemRAG storage.
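
The workflow steps can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the field names on ExtractedDocument, the `validate_url` and `run_workflow` helpers, and the call signatures assumed for scrape_web and extract_and_index are all hypothetical, since the source does not define them.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class ExtractedDocument:
    # Assumed shape: fields mirror "title, extracted text, and final URL".
    title: str
    text: str
    final_url: str


def validate_url(url: str) -> str:
    """Accept only full https:// URLs that include a host (workflow step 1)."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected a full https:// URL, got {url!r}")
    return url


def run_workflow(
    url: str,
    scrape_web: Callable[[str], ExtractedDocument],
    extract_and_index: Callable[[ExtractedDocument], object],
) -> ExtractedDocument:
    """Hypothetical driver: validate, scrape, hand off for indexing, return."""
    # Step 2: call the scraping tool (signature assumed: URL in, document out).
    doc = scrape_web(validate_url(url))
    # Step 4: pass the document on for storage (signature assumed).
    extract_and_index(doc)
    # Step 3: the caller receives title, text, and final URL via the document.
    return doc
```

Passing the two tools in as callables keeps the sketch independent of how they are really exposed, so stand-ins can be substituted when trying it out.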
See references/html-cleanup.md for extraction settings and references/allowed-domains.md for rate-limit notes.