Skip to content

Web Scraper

Overview

Web Scraper Drivers can be used to scrape text from the web. They are used by WebLoader to provide its functionality. All Web Scraper Drivers implement the following methods:

  • scrape_url() scrapes text from a website and returns a TextArtifact. The format of the scrapped text is determined by the Driver.

Web Scraper Drivers

Proxy

The ProxyWebScraperDriver uses the requests library with a provided set of proxies to do web scraping. Paid webscraping services like ZenRows or ScraperAPI offer a way to use their API via a set of proxies passed to requests.get()

Example using ProxyWebScraperDriver directly:

import os

from griptape.drivers.web_scraper.proxy import ProxyWebScraperDriver

query_params = [
    "markdown_response=true",
    "js_render=false",
    "premium_proxy=false",
]
proxy_url = f"http://{os.environ['ZENROWS_API_KEY']}:{'&'.join(query_params)}@proxy.zenrows.com:8001"

driver = ProxyWebScraperDriver(
    proxies={
        "http": proxy_url,
        "https": proxy_url,
    },
    params={"verify": False},
)

driver.scrape_url("https://griptape.ai")

Markdownify

Info

This driver requires the drivers-web-scraper-markdownify extra and the playwright browsers to be installed.

To install the playwright browsers, run playwright install in your terminal. If you are using uv, run uv run playwright install instead. The playwright command should already be installed as a dependency of the drivers-web-scraper-markdownify extra. For more details about playwright, see the playwright docs.

Note that if you skip installing the playwright browsers, you will see the following error when you run your code:

playwright._impl._errors.Error: Executable doesn't exist at ...
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated.       ║
║ Please run the following command to download new browsers: ║
║                                                            ║
║     playwright install                                     ║
║                                                            ║
║ <3 Playwright Team                                         ║
╚════════════════════════════════════════════════════════════╝

The MarkdownifyWebScraperDriver outputs the scraped text in markdown format. It uses playwright to render web pages along with dynamically loaded content, and a combination of beautifulsoup4 and markdownify to produce a markdown representation of a webpage. It makes a best effort to produce a markdown representation of a webpage that is concise yet human (and LLM) readable.

Example using MarkdownifyWebScraperDriver directly:

from griptape.drivers.web_scraper.markdownify import MarkdownifyWebScraperDriver

driver = MarkdownifyWebScraperDriver()

driver.scrape_url("https://griptape.ai")

Example of using MarkdownifyWebScraperDriver with an agent:

from griptape.drivers.web_scraper.markdownify import MarkdownifyWebScraperDriver
from griptape.loaders import WebLoader
from griptape.structures import Agent
from griptape.tools import WebScraperTool

agent = Agent(
    tools=[
        WebScraperTool(
            web_loader=WebLoader(web_scraper_driver=MarkdownifyWebScraperDriver(timeout=1000)),
            off_prompt=False,
        ),
    ],
)
agent.run("List all email addresses on griptape.ai in a flat numbered markdown list.")
[02/27/25 20:24:25] INFO     PromptTask c8efa9f397014ff480eccaeaa5791978        
                             Input: List all email addresses on griptape.ai in a
                             flat numbered markdown list.                       
[02/27/25 20:24:39] INFO     Subtask 0c9408d854f74c5b9482600a3225234e           
                             Actions: [                                         
                               {                                                
                                 "tag": "call_zcOOIWYhARbzGdl1LmcOMFZV",        
                                 "name": "WebScraperTool",                      
                                 "path": "get_content",                         
                                 "input": {                                     
                                   "values": {                                  
                                     "url": "https://griptape.ai"               
                                   }                                            
                                 }                                              
                               }                                                
                             ]                                                  
[02/27/25 20:24:42] INFO     Subtask 0c9408d854f74c5b9482600a3225234e           
                             Response: We value your privacy                    

                             This website or its third-party tools process      
                             personal data. You can opt out of the sale of your 
                             personal information by clicking on the “Do Not    
                             Sell or Share My Personal Information” link.       

                             Do Not Sell or Share My Personal Information       

                             Opt-out Preferences                                

                             We use third-party cookies that help us analyze how
                             you use this website, store your preferences, and  
                             provide the content and advertisements that are    
                             relevant to you. However, you can opt out of these 
                             cookies by checking "Do Not Sell or Share My       
                             Personal Information" and clicking the "Save My    
                             Preferences" button. Once you opt out, you can opt 
                             in again at any time by unchecking "Do Not Sell or 
                             Share My Personal Information" and clicking the    
                             "Save My Preferences" button.                      

                             Do Not Sell or Share My Personal Information       

                             Cancel   Save My Preferences                       

                             Powered by                                         

                             * Products                                         

                               [AI Framework](/ai-framework)[AI                 
                             Cloud](/cloud)[AI Cloud                            
                             Pricing](/pricing-griptape-cloud)                  
                             * [Docs](https://docs.griptape.ai/stable/)         
                             * How to                                           

                               [Samples](/sample-applications)[Learn](https://le
                             arn.griptape.ai/latest/)                           
                             * Resources                                        

                               [Blog](/blog)[Community](https://discord.gg/gript
                             ape)[GitHub](https://github.com/griptape-ai/griptap
                             e)[ComfyUI Nodes](/griptape-comfyui-nodes)         
                             * [About](/team)                                   
                             * [Request                                         
                             Demo](https://calendly.com/griptape-ai/product-over
                             view-demo?month=2024-12)                           
                             * [Log In](https://cloud.griptape.ai/)             
                             * [Start                                           
                             FREE](https://cloud.griptape.ai/account?signup=true
                             )                                                  

                             Build Production Ready AI Agents                   
                             ================================                   

                             Everything You Need to Build Reliable AI Agents    
                             Quickly and Securely, Using Your Data.Built for the
                             Enterprise. Deploy Anywhere.                       
                             ---------------------------------------------------
                             ---------------------------------------------------
                             --------------------------                         

                             [Start                                             
                             Building](https://cloud.griptape.ai/account?signup=
                             true)[GitHub](https://github.com/griptape-ai/gripta
                             pe)                                                

                             Build, Deploy, and Scale End-to-End Solutions, from
                             LLM-Powered Data Prep and Retrieval to AI Agents,  
                             Pipelines, and Workflows.                          
                             ===================================================
                             ===================================================
                             =========================                          

                             ### Griptape gives developers everything they need,
                             from the open source AI framework ([Griptape AI    
                             Framework](/ai-framework)) to the execution runtime
                             ([Griptape AI Cloud](/cloud)).                     

                             ### Build &Secure                                  

                             1. Build your business logic using predictable,    
                             programmable python - don’t gamble on prompting.   
                             2. Turn any developer into an AI developer.        
                             3. Off-Prompt™ gives you better security,          
                             performance, and lower costs.                      

                             [Learn More](/ai-framework)                        

                             ### Deploy &Scale                                  

                             1. Deploy and run the ETL, RAG, and structures you 
                             developed.                                         
                             2. Simple API abstractions.                        
                             3. Skip the infrastructure management.             
                             4. Scale seamlessly so you can grow with your      
                             workload requirements.                             

                             [Learn More](/cloud)                               

                             ### Manage &Monitor                                

                             1. Monitor directly in Griptape Cloud or integrate 
                             with any third-party service.                      
                             2. Measure performance, reliability, and spending  
                             across the organization                            
                             3. Enforce policy for each user, structure, task,  
                             and query                                          

                             [Learn More](/cloud)                               

                             Sample Applications Built on Griptape              
                             =====================================              

                             [#### Transform Data with Find and                 
                             Replace](https://github.com/griptape-ai/griptape-sa
                             mple-structures/tree/main/griptape_find_replace_tra
                             nsform)                                            

                             [#### Event Handler for LLM-Powered Slack          
                             Applications](https://github.com/griptape-ai/gripta
                             pe-sample-structures/tree/main/griptape_slack_handl
                             er)                                                

                             [#### Keep Private Data ‘Off Prompt’ with          
                             TaskMemory](https://github.com/griptape-ai/griptape
                             -sample-structures/tree/main/griptape_off_prompt)  

                             [More Apps](/sample-applications)                  

                             🎢 Griptape AI Framework                           
                             =======================                            

                             ### Griptape provides clean and clear abstractions 
                             for building Gen AI Agents, Systems of Agents,     
                             Pipelines, Workflows, and RAG implementations      
                             without having to spend weeks learning Gen AI nor  
                             need to ever learn Prompt Engineering.             

                             ### Build                                          

                             Build ETL pipelines to prep your data for secure   
                             LLM access.                                        

                             ### Compose                                        

                             Compose retrieval patterns that give fast,         
                             accurate, detailed information.                    

                             ### Write                                          

                             Write agents, pipelines, and workflows (i.e.       
                             structures) to integrate your business logic.      

                             [Learn More](/ai-framework)                        

                             🌩️ Griptape AI Cloud                                
                             ====================                               

                             ### Skip the infrastructure management. We’ll host 
                             and operate everything for you, from the data      
                             processing pipeline to the retrieval-ready database
                             to the serverless application runtime. Simple to   
                             complex, one layer of the stack or the whole       
                             enchilada, we’ve got you covered.                  

                             ### Automated Data Prep(ETL)                       

                             Connect any data source and extract. Prep/transform
                             it (extract, clean, chunk, embed, add metadata).   
                             Load it into a vector database index.              

                             ### Retrieval as a Service(RAG)                    

                             Generate answers, summaries, and details from your 
                             own data. Use ready-made retrieval patterns,       
                             customize them to fit your use case, or compose    
                             your own from scratch (Modular RAG).               

                             ### Structure Runtime(RUN)                         

                             Build your own AI agents, pipelines, and workflows.
                             Real-time interfaces, transactional processes,     
                             batch workloads. Plug them into client             
                             applications.                                      

                             [Learn More](/cloud)                               

                             Learn more...                                      
                             =============                                      

                             [#### Griptape Rules. No, not like                 
                             that.](/blog/griptape-rules-no-not-like-that)      

                             [#### New Features in Griptape Framework           
                             1.3](/blog/new-features-in-griptape-framework-1-3) 

                             [#### Announcing Griptape Framework v1.2 with      
                             Structured                                         
                             Output](/blog/announcing-griptape-framework-v1-2-wi
                             th-structured-output)                              

                             [AI Blog](/blog)                                   

                             Be part of our community.                          
                             =========================                          

                             ### Join our social channels for the latest news,  
                             tutorials, and exclusive insights.                 

                             Our partners                                       
                             ============                                       

                             Resources                                          

                             [Docs](https://docs.griptape.ai/latest/)[Learning](
                             https://learn.griptape.ai/latest/)[Github](https://
                             github.com/griptape-ai/griptape)[Blog](/blog)      

                             Company                                            

                             [Brand                                             
                             Guidelines](/brand-guidelines)[Careers](/team)[Crun
                             chbase](https://www.crunchbase.com/organization/gri
                             ptape)[AI Glossary](/ai-glossary)                  

                             Contact                                            

                             [hello@griptape.ai](mailto:hello@griptpae.ai)[press
                             @griptape.ai](mailto:press@griptpae.ai)[careers@gri
                             ptape.ai](mailto:careers@griptpae.ai)              

                             About                                              

                             [Privacy Policy](/privacy-policy)[Terms of         
                             Service](/terms-of-service)                        

                             © Griptape, Inc                                    

                             [Sitemap](https://www.griptape.ai/sitemap.xml)     
[02/27/25 20:24:43] INFO     PromptTask c8efa9f397014ff480eccaeaa5791978        
                             Output: Here are the email addresses found on the  
                             Griptape.ai website:                               

                             1. [hello@griptape.ai](mailto:hello@griptape.ai)   
                             2. [press@griptape.ai](mailto:press@griptape.ai)   
                             3.                                                 
                             [careers@griptape.ai](mailto:careers@griptape.ai)  

Trafilatura

Info

This driver requires the drivers-web-scraper-trafilatura extra.

The TrafilaturaWebScraperDriver scrapes text from a webpage using the Trafilatura library.

Example of using TrafilaturaWebScraperDriver directly:

from griptape.drivers.web_scraper.trafilatura import TrafilaturaWebScraperDriver

driver = TrafilaturaWebScraperDriver()

driver.scrape_url("https://griptape.ai")