FireCrawl
Load data from a URL using FireCrawl.

FireCrawl Document Loader
FireCrawl is a web crawling and scraping service that turns websites into clean markdown or structured data, including pages that depend on JavaScript rendering. This module loads and processes web content through the FireCrawl API.
With this module, you can:
Scrape single web pages
Crawl entire websites
Extract structured data
Handle JavaScript-rendered content
Process content with text splitters
Customize metadata extraction
Support multiple operation modes
Inputs
Required Parameters
URL: The webpage or website URL to process
Connect Credential: FireCrawl API credentials
Mode: Choose one of:
Scrape: Single page extraction
Crawl: Multi-page website crawling
Extract: Structured data extraction
Optional Parameters
Text Splitter: A text splitter to process the extracted content
Scrape Options:
Include Tags: HTML tags to include
Exclude Tags: HTML tags to exclude
Mobile: Use mobile user agent
Skip TLS Verification: Skip TLS/SSL certificate verification
Timeout: Request timeout
Additional Metadata: JSON object with additional metadata
Omit Metadata Keys: Comma-separated list of metadata keys to omit
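The two metadata inputs amount to a merge followed by a filter. The sketch below is a hypothetical helper (`applyMetadataOptions` is not a Flowise or FireCrawl API). It assumes Additional Metadata arrives as a JSON string and Omit Metadata Keys as a comma-separated string, and it applies the omission after merging, so a listed key is dropped even if it was supplied as additional metadata; the loader's actual order may differ.

```typescript
// Hypothetical document shape: pageContent plus a flat metadata object.
type Metadata = Record<string, unknown>;

interface Doc {
  pageContent: string;
  metadata: Metadata;
}

// Hypothetical helper, not the Flowise implementation: merges the
// "Additional Metadata" JSON into each document, then drops every key
// listed in "Omit Metadata Keys" (comma-separated).
function applyMetadataOptions(
  docs: Doc[],
  additionalMetadataJson: string,
  omitMetadataKeys: string,
): Doc[] {
  // An empty field means "no additional metadata".
  const additional: Metadata = additionalMetadataJson.trim()
    ? JSON.parse(additionalMetadataJson)
    : {};
  // Split on commas, trim whitespace, and ignore empty entries.
  const omit = new Set(
    omitMetadataKeys
      .split(",")
      .map((key) => key.trim())
      .filter((key) => key.length > 0),
  );

  return docs.map((doc) => {
    const merged: Metadata = { ...doc.metadata, ...additional };
    const kept: Metadata = {};
    for (const [key, value] of Object.entries(merged)) {
      if (!omit.has(key)) kept[key] = value;
    }
    return { pageContent: doc.pageContent, metadata: kept };
  });
}

// Example: tag every page with a project name and drop the description.
const docs: Doc[] = [
  {
    pageContent: "# Home",
    metadata: { title: "Home", description: "Landing page", sourceURL: "https://example.com" },
  },
];
const result = applyMetadataOptions(docs, '{"project": "docs-bot"}', "description, language");
console.log(result[0].metadata);
```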
Outputs
Document: Array of document objects containing metadata and pageContent
Text: A single string formed by concatenating the pageContent of all documents
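The relationship between the two outputs can be sketched in a few lines. `toText` is a hypothetical helper, and the newline separator is an assumption; the loader may join pages differently.

```typescript
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: builds the "Text" output by joining the pageContent
// of every document. The default "\n" separator is an assumption.
function toText(docs: Doc[], separator = "\n"): string {
  return docs.map((doc) => doc.pageContent).join(separator);
}

const docs: Doc[] = [
  { pageContent: "# Page one", metadata: { sourceURL: "https://example.com/a" } },
  { pageContent: "# Page two", metadata: { sourceURL: "https://example.com/b" } },
];
console.log(toText(docs));
```

Metadata is not part of the Text output; only the Document output carries it.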
Features
Multiple operation modes
Advanced scraping options
Structured data extraction
JavaScript rendering
Mobile device emulation
Custom timeout settings
Error handling
Operation Modes
Scrape Mode
Single page processing
Main content extraction
Format selection
Custom tag filtering
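A single-page scrape boils down to one JSON request carrying the Scrape Options. The sketch below assumes FireCrawl's v1 REST API (`POST https://api.firecrawl.dev/v1/scrape` with a bearer-token API key) and its field names `formats`, `onlyMainContent`, `includeTags`, `excludeTags`, `mobile`, `skipTlsVerification`, and `timeout`; check these against the current FireCrawl API reference. `buildScrapeBody` and `scrape` are hypothetical helpers, not part of Flowise.

```typescript
// Options mirroring the loader's "Scrape Options" inputs.
interface ScrapeOptions {
  includeTags?: string[];
  excludeTags?: string[];
  mobile?: boolean;
  skipTlsVerification?: boolean;
  timeout?: number; // assumed to be milliseconds, as in FireCrawl's REST API
}

// Builds the JSON body for a single-page scrape (field names assumed from FireCrawl v1).
function buildScrapeBody(url: string, options: ScrapeOptions = {}) {
  return {
    url,
    formats: ["markdown"], // request markdown, which becomes pageContent
    onlyMainContent: true, // main content extraction
    ...options,
  };
}

// Sends the request. Needs a runtime with global fetch (e.g. Node 18+) and a FireCrawl API key.
async function scrape(url: string, apiKey: string, options: ScrapeOptions = {}) {
  const response = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildScrapeBody(url, options)),
  });
  if (!response.ok) {
    throw new Error(`FireCrawl scrape failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Build (but do not send) a request that drops navigation and footers and emulates a phone.
const body = buildScrapeBody("https://example.com", { excludeTags: ["nav", "footer"], mobile: true });
console.log(JSON.stringify(body));
```

Keeping the body builder separate from the network call makes the option mapping easy to inspect without an API key.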
Crawl Mode
Multi-page crawling
Subdomain handling
Sitemap processing
Link extraction
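A crawl runs as an asynchronous job, so the loader has to check the job's status until it finishes. The polling loop below is a minimal sketch under stated assumptions: the status values `scraping`, `completed`, and `failed` are assumed from FireCrawl's v1 crawl status endpoint (`GET /v1/crawl/{id}`), and `getStatus` is injected so the loop can run without network access. `waitForCrawl` and `CrawlJob` are hypothetical names.

```typescript
// Status strings assumed from FireCrawl's v1 crawl status endpoint; verify against the API reference.
type CrawlStatus = "scraping" | "completed" | "failed";

interface CrawlJob<T> {
  status: CrawlStatus;
  data?: T[];
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls a crawl job until it completes or fails. In practice getStatus would call
// the status endpoint; here it is a parameter so the loop is testable offline.
async function waitForCrawl<T>(
  getStatus: () => Promise<CrawlJob<T>>,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<T[]> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getStatus();
    if (job.status === "completed") return job.data ?? [];
    if (job.status === "failed") throw new Error("Crawl job failed");
    await sleep(intervalMs);
  }
  throw new Error(`Crawl did not finish after ${maxAttempts} checks`);
}

// Usage with a fake job that completes on the third check.
let checks = 0;
const fakeStatus = async (): Promise<CrawlJob<string>> => {
  checks += 1;
  return checks < 3
    ? { status: "scraping" }
    : { status: "completed", data: ["# Page A", "# Page B"] };
};

waitForCrawl(fakeStatus, 10).then((pages) => console.log(pages.length, "pages"));
```

Capping the number of checks keeps a stuck job from blocking the flow indefinitely.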
Extract Mode
Structured data extraction
Schema-based parsing
LLM-powered extraction
Custom extraction prompts
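Extract mode pairs one or more URLs with a prompt, a JSON Schema, or both. The sketch below only builds the request body; the field names `urls`, `prompt`, and `schema` are assumed from FireCrawl's v1 extract endpoint (`POST https://api.firecrawl.dev/v1/extract`). `buildExtractBody` and `productSchema` are hypothetical, and rejecting a request with neither a prompt nor a schema is this sketch's own design choice.

```typescript
// Illustrative JSON Schema describing the fields to extract.
const productSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
  },
  required: ["name", "price"],
};

interface ExtractBody {
  urls: string[];
  prompt?: string;
  schema?: object;
}

// Builds the body for a structured extraction request (field names assumed from FireCrawl v1).
function buildExtractBody(urls: string[], prompt?: string, schema?: object): ExtractBody {
  if (!prompt && !schema) {
    throw new Error("Provide a prompt, a schema, or both");
  }
  const body: ExtractBody = { urls };
  if (prompt) body.prompt = prompt;
  if (schema) body.schema = schema;
  return body;
}

const body = buildExtractBody(
  ["https://example.com/product"],
  "Extract the product name and price",
  productSchema,
);
console.log(JSON.stringify(body, null, 2));
```

The schema constrains the shape of the LLM's answer, while the prompt tells it what to look for on the page.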
Document Structure
Each document contains:
pageContent: Extracted content in markdown format
metadata:
title: Page title
description: Meta description
language: Content language
sourceURL: Original URL
Additional custom metadata
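The document structure above can be sketched as a mapping from one scrape result to one document. The result shape (a `markdown` string plus a `metadata` object) is an assumption based on FireCrawl's v1 responses; `toDocument` and the `crawledBy` key are hypothetical and only illustrate where custom metadata ends up.

```typescript
// Subset of a FireCrawl scrape result assumed by this sketch.
interface ScrapeResult {
  markdown?: string;
  metadata?: {
    title?: string;
    description?: string;
    language?: string;
    sourceURL?: string;
    [key: string]: unknown;
  };
}

interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Maps one scrape result to a document: markdown becomes pageContent, the page
// metadata is kept, and any custom metadata is layered on top.
function toDocument(result: ScrapeResult, custom: Record<string, unknown> = {}): Doc {
  return {
    pageContent: result.markdown ?? "",
    metadata: { ...(result.metadata ?? {}), ...custom },
  };
}

const doc = toDocument(
  {
    markdown: "# Example Domain\n\nThis domain is for use in examples.",
    metadata: { title: "Example Domain", language: "en", sourceURL: "https://example.com" },
  },
  { crawledBy: "flowise" },
);
console.log(doc.metadata);
```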
Notes
Requires a valid FireCrawl API key
Supports multiple content formats
Handles rate limiting
Job status monitoring
Error handling and retries
Customizable request options
Memory-efficient processing
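A common way to handle rate limiting and transient errors is to retry with exponential backoff. The sketch below illustrates that general technique; it is not a description of how this loader retries. It treats HTTP 429 and 5xx responses as retryable, and `withRetry` and `HttpResult` are hypothetical names.

```typescript
interface HttpResult {
  status: number;
  body?: unknown;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a request when it is rate-limited (HTTP 429) or hits a server error (5xx),
// doubling the delay after each failed attempt (exponential backoff).
async function withRetry(
  send: () => Promise<HttpResult>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<HttpResult> {
  for (let attempt = 0; ; attempt++) {
    const result = await send();
    const retryable = result.status === 429 || result.status >= 500;
    // Return successes, non-retryable errors, and the last attempt as-is.
    if (!retryable || attempt >= maxRetries) return result;
    await sleep(baseDelayMs * 2 ** attempt);
  }
}

// Usage with a fake request that is rate-limited twice before succeeding.
let calls = 0;
const flaky = async (): Promise<HttpResult> => {
  calls += 1;
  return calls < 3 ? { status: 429 } : { status: 200, body: { success: true } };
};
withRetry(flaky, 3, 10).then((res) => console.log(res.status, "after", calls, "calls"));
```

Client errors other than 429 (for example 401 for a bad API key) are returned immediately, since retrying them cannot succeed.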