Web Scraper
Overview
Web Scraper Drivers scrape text from the web. WebLoader uses them to provide its functionality. All Web Scraper Drivers implement the following method:
scrape_url() scrapes text from a website and returns a TextArtifact. The format of the scraped text is determined by the Driver.
Web Scraper Drivers
Proxy
The ProxyWebScraperDriver uses the requests library with a provided set of proxies to do web scraping. Paid web scraping services like ZenRows or ScraperAPI offer a way to use their API via a set of proxies passed to requests.get().
Example using ProxyWebScraperDriver directly:
import os

from griptape.drivers.web_scraper.proxy import ProxyWebScraperDriver

# Options are joined with "&" and sent in the password part of the proxy URL.
query_params = [
    "markdown_response=true",
    "js_render=false",
    "premium_proxy=false",
]
proxy_url = f"http://{os.environ['ZENROWS_API_KEY']}:{'&'.join(query_params)}@proxy.zenrows.com:8001"

driver = ProxyWebScraperDriver(
    proxies={
        "http": proxy_url,
        "https": proxy_url,
    },
    # verify=False turns off TLS certificate verification in requests.
    params={"verify": False},
)

driver.scrape_url("https://griptape.ai")
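The proxy URL in the example above places the API key in the username slot and the `&`-joined options in the password slot. A minimal sketch of that composition follows, using a placeholder key. `build_proxy_url` is a hypothetical helper name, not part of griptape, and the host and port are simply copied from the example.

```python
from urllib.parse import urlsplit


def build_proxy_url(api_key, options, host="proxy.zenrows.com", port=8001):
    """Compose a proxy URL of the form http://<key>:<k=v&k=v>@<host>:<port>."""
    password = "&".join(f"{key}={value}" for key, value in options.items())
    return f"http://{api_key}:{password}@{host}:{port}"


proxy_url = build_proxy_url(
    "PLACEHOLDER_KEY",  # stand-in; the example above reads ZENROWS_API_KEY from the environment
    {"markdown_response": "true", "js_render": "false", "premium_proxy": "false"},
)

# The same URL is used for both schemes, as in the example above.
proxies = {"http": proxy_url, "https": proxy_url}

# urlsplit shows which part of the URL each value ends up in.
parts = urlsplit(proxy_url)
print(parts.username)  # the API key
print(parts.password)  # the joined options
print(parts.hostname, parts.port)
```

Building the URL in one helper keeps the key and the option string out of hand-written f-strings, which makes it easier to change options per request.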
Markdownify
Info
This driver requires the drivers-web-scraper-markdownify extra and the playwright browsers to be installed.
To install the playwright browsers, run playwright install in your terminal. If you are using uv, run uv run playwright install instead. The playwright command should already be installed as a dependency of the drivers-web-scraper-markdownify extra. For more details about playwright, see the playwright docs.
Note that if you skip installing the playwright browsers, you will see the following error when you run your code:
playwright._impl._errors.Error: Executable doesn't exist at ...
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ playwright install ║
║ ║
║ <3 Playwright Team ║
╚════════════════════════════════════════════════════════════╝
The MarkdownifyWebScraperDriver outputs the scraped text in markdown format. It uses playwright to render web pages, including dynamically loaded content, and a combination of beautifulsoup4 and markdownify to convert the rendered page. It makes a best effort to produce a markdown representation that is concise yet human (and LLM) readable.
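To make the conversion step concrete, here is a toy sketch that uses only Python's standard library. It is not how the driver is implemented (the driver relies on playwright, beautifulsoup4, and markdownify); it only illustrates the idea of dropping non-content tags and rewriting headings, links, and list items as markdown. `ToyMarkdownParser` and `to_markdown` are hypothetical names, and the `#`-style headings are a choice of this sketch.

```python
from html.parser import HTMLParser


class ToyMarkdownParser(HTMLParser):
    """Convert a tiny subset of HTML (h1-h3, p, a, li) to markdown."""

    SKIPPED = {"script", "style"}  # tags whose text is never page content

    def __init__(self):
        super().__init__()
        self.parts = []      # output fragments, joined at the end
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.href = None     # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("\n* ")
        elif tag == "p":
            self.parts.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None
        elif tag in ("h1", "h2", "h3", "p"):
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data.strip("\n"))


def to_markdown(html):
    parser = ToyMarkdownParser()
    parser.feed(html)
    parser.close()
    # Drop the empty lines left behind by block-level tags.
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)


html = """
<html><head><style>body {color: red}</style></head>
<body>
<h1>Title</h1>
<p>See the <a href="/docs">Docs</a>.</p>
<ul><li>One</li><li>Two</li></ul>
<script>console.log("ignored")</script>
</body></html>
"""
print(to_markdown(html))
```

A real page needs far more care (nested lists, tables, images, whitespace), which is why the driver delegates this work to dedicated libraries.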
Example using MarkdownifyWebScraperDriver directly:
from griptape.drivers.web_scraper.markdownify import MarkdownifyWebScraperDriver
driver = MarkdownifyWebScraperDriver()
driver.scrape_url("https://griptape.ai")
Example of using MarkdownifyWebScraperDriver with an agent:
from griptape.drivers.web_scraper.markdownify import MarkdownifyWebScraperDriver
from griptape.loaders import WebLoader
from griptape.structures import Agent
from griptape.tools import WebScraperTool

agent = Agent(
    tools=[
        WebScraperTool(
            web_loader=WebLoader(web_scraper_driver=MarkdownifyWebScraperDriver(timeout=1000)),
            off_prompt=False,
        ),
    ],
)

agent.run("List all email addresses on griptape.ai in a flat numbered markdown list.")
[02/27/25 20:24:25] INFO PromptTask c8efa9f397014ff480eccaeaa5791978
Input: List all email addresses on griptape.ai in a
flat numbered markdown list.
[02/27/25 20:24:39] INFO Subtask 0c9408d854f74c5b9482600a3225234e
Actions: [
{
"tag": "call_zcOOIWYhARbzGdl1LmcOMFZV",
"name": "WebScraperTool",
"path": "get_content",
"input": {
"values": {
"url": "https://griptape.ai"
}
}
}
]
[02/27/25 20:24:42] INFO Subtask 0c9408d854f74c5b9482600a3225234e
Response: We value your privacy
This website or its third-party tools process
personal data. You can opt out of the sale of your
personal information by clicking on the “Do Not
Sell or Share My Personal Information” link.
Do Not Sell or Share My Personal Information
Opt-out Preferences
We use third-party cookies that help us analyze how
you use this website, store your preferences, and
provide the content and advertisements that are
relevant to you. However, you can opt out of these
cookies by checking "Do Not Sell or Share My
Personal Information" and clicking the "Save My
Preferences" button. Once you opt out, you can opt
in again at any time by unchecking "Do Not Sell or
Share My Personal Information" and clicking the
"Save My Preferences" button.
Do Not Sell or Share My Personal Information
Cancel Save My Preferences
Powered by
* Products
[AI Framework](/ai-framework)[AI
Cloud](/cloud)[AI Cloud
Pricing](/pricing-griptape-cloud)
* [Docs](https://docs.griptape.ai/stable/)
* How to
[Samples](/sample-applications)[Learn](https://le
arn.griptape.ai/latest/)
* Resources
[Blog](/blog)[Community](https://discord.gg/gript
ape)[GitHub](https://github.com/griptape-ai/griptap
e)[ComfyUI Nodes](/griptape-comfyui-nodes)
* [About](/team)
* [Request
Demo](https://calendly.com/griptape-ai/product-over
view-demo?month=2024-12)
* [Log In](https://cloud.griptape.ai/)
* [Start
FREE](https://cloud.griptape.ai/account?signup=true
)
Build Production Ready AI Agents
================================
Everything You Need to Build Reliable AI Agents
Quickly and Securely, Using Your Data.Built for the
Enterprise. Deploy Anywhere.
---------------------------------------------------
---------------------------------------------------
--------------------------
[Start
Building](https://cloud.griptape.ai/account?signup=
true)[GitHub](https://github.com/griptape-ai/gripta
pe)
Build, Deploy, and Scale End-to-End Solutions, from
LLM-Powered Data Prep and Retrieval to AI Agents,
Pipelines, and Workflows.
===================================================
===================================================
=========================
### Griptape gives developers everything they need,
from the open source AI framework ([Griptape AI
Framework](/ai-framework)) to the execution runtime
([Griptape AI Cloud](/cloud)).
### Build &Secure
1. Build your business logic using predictable,
programmable python - don’t gamble on prompting.
2. Turn any developer into an AI developer.
3. Off-Prompt™ gives you better security,
performance, and lower costs.
[Learn More](/ai-framework)
### Deploy &Scale
1. Deploy and run the ETL, RAG, and structures you
developed.
2. Simple API abstractions.
3. Skip the infrastructure management.
4. Scale seamlessly so you can grow with your
workload requirements.
[Learn More](/cloud)
### Manage &Monitor
1. Monitor directly in Griptape Cloud or integrate
with any third-party service.
2. Measure performance, reliability, and spending
across the organization
3. Enforce policy for each user, structure, task,
and query
[Learn More](/cloud)
Sample Applications Built on Griptape
=====================================
[#### Transform Data with Find and
Replace](https://github.com/griptape-ai/griptape-sa
mple-structures/tree/main/griptape_find_replace_tra
nsform)
[#### Event Handler for LLM-Powered Slack
Applications](https://github.com/griptape-ai/gripta
pe-sample-structures/tree/main/griptape_slack_handl
er)
[#### Keep Private Data ‘Off Prompt’ with
TaskMemory](https://github.com/griptape-ai/griptape
-sample-structures/tree/main/griptape_off_prompt)
[More Apps](/sample-applications)
🎢 Griptape AI Framework
=======================
### Griptape provides clean and clear abstractions
for building Gen AI Agents, Systems of Agents,
Pipelines, Workflows, and RAG implementations
without having to spend weeks learning Gen AI nor
need to ever learn Prompt Engineering.
### Build
Build ETL pipelines to prep your data for secure
LLM access.
### Compose
Compose retrieval patterns that give fast,
accurate, detailed information.
### Write
Write agents, pipelines, and workflows (i.e.
structures) to integrate your business logic.
[Learn More](/ai-framework)
🌩️ Griptape AI Cloud
====================
### Skip the infrastructure management. We’ll host
and operate everything for you, from the data
processing pipeline to the retrieval-ready database
to the serverless application runtime. Simple to
complex, one layer of the stack or the whole
enchilada, we’ve got you covered.
### Automated Data Prep(ETL)
Connect any data source and extract. Prep/transform
it (extract, clean, chunk, embed, add metadata).
Load it into a vector database index.
### Retrieval as a Service(RAG)
Generate answers, summaries, and details from your
own data. Use ready-made retrieval patterns,
customize them to fit your use case, or compose
your own from scratch (Modular RAG).
### Structure Runtime(RUN)
Build your own AI agents, pipelines, and workflows.
Real-time interfaces, transactional processes,
batch workloads. Plug them into client
applications.
[Learn More](/cloud)
Learn more...
=============
[#### Griptape Rules. No, not like
that.](/blog/griptape-rules-no-not-like-that)
[#### New Features in Griptape Framework
1.3](/blog/new-features-in-griptape-framework-1-3)
[#### Announcing Griptape Framework v1.2 with
Structured
Output](/blog/announcing-griptape-framework-v1-2-wi
th-structured-output)
[AI Blog](/blog)
Be part of our community.
=========================
### Join our social channels for the latest news,
tutorials, and exclusive insights.
Our partners
============
Resources
[Docs](https://docs.griptape.ai/latest/)[Learning](
https://learn.griptape.ai/latest/)[Github](https://
github.com/griptape-ai/griptape)[Blog](/blog)
Company
[Brand
Guidelines](/brand-guidelines)[Careers](/team)[Crun
chbase](https://www.crunchbase.com/organization/gri
ptape)[AI Glossary](/ai-glossary)
Contact
[hello@griptape.ai](mailto:hello@griptpae.ai)[press
@griptape.ai](mailto:press@griptpae.ai)[careers@gri
ptape.ai](mailto:careers@griptpae.ai)
About
[Privacy Policy](/privacy-policy)[Terms of
Service](/terms-of-service)
© Griptape, Inc
[Sitemap](https://www.griptape.ai/sitemap.xml)
[02/27/25 20:24:43] INFO PromptTask c8efa9f397014ff480eccaeaa5791978
Output: Here are the email addresses found on the
Griptape.ai website:
1. [hello@griptape.ai](mailto:hello@griptape.ai)
2. [press@griptape.ai](mailto:press@griptape.ai)
3.
[careers@griptape.ai](mailto:careers@griptape.ai)
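When the goal is only to pull addresses out of scraped markdown, a deterministic pass can complement the agent. The sketch below is an illustration outside griptape: it scans markdown for `[text](mailto:...)` links with a regular expression and returns both the link text and the link target. The sample string is the Contact line from the response above with the log's line wrapping removed; `extract_mailto_links` is a hypothetical helper name.

```python
import re

# Matches markdown links of the form [text](mailto:address).
MAILTO_LINK = re.compile(r"\[([^\]]+)\]\(mailto:([^)\s]+)\)")


def extract_mailto_links(markdown):
    """Return (link text, mailto target) pairs in document order."""
    return MAILTO_LINK.findall(markdown)


contact_line = (
    "[hello@griptape.ai](mailto:hello@griptpae.ai)"
    "[press@griptape.ai](mailto:press@griptpae.ai)"
    "[careers@griptape.ai](mailto:careers@griptpae.ai)"
)

for number, (text, target) in enumerate(extract_mailto_links(contact_line), start=1):
    # Flag links whose visible text and mailto target disagree.
    flag = "" if text == target else "  (link target differs)"
    print(f"{number}. {text} -> {target}{flag}")
```

In the scraped response the `mailto:` targets are spelled `griptpae.ai` while the link text reads `griptape.ai`. Returning both values makes that mismatch visible, whereas the agent's answer reports `griptape.ai` in both positions.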
Trafilatura
Info
This driver requires the drivers-web-scraper-trafilatura extra.
The TrafilaturaWebScraperDriver scrapes text from a webpage using the Trafilatura library.
Example of using TrafilaturaWebScraperDriver directly: