# Spider - web search & crawler
Scrape & Crawl the web with Spider - the fastest open source web scraper & crawler.

Spider is the fastest open source web scraper & crawler that returns LLM-ready data. To get started using this node, you need an API key from Spider.cloud.
## Get Started

1. Go to the Spider.cloud website and sign up for a free account.
2. Go to the API Keys page and create a new API key.
3. Copy the API key and paste it into the "Credential" field in the Spider node.
## Features

* Two operation modes: Scrape and Crawl
* Text splitting capabilities
* Customizable metadata handling
* Flexible parameter configuration
* Multiple output formats
* Markdown-formatted content
* Rate limit handling
## Inputs

### Required Parameters

* Mode: Choose between:
  * Scrape: Extract data from a single page
  * Crawl: Extract data from multiple pages within the same domain
* Web Page URL: The target URL to scrape or crawl (e.g., https://spider.cloud)
* Credential: Spider API key
### Optional Parameters
* Text Splitter: A text splitter to process the extracted content
* Limit: Maximum number of pages to crawl (default: 25; only applicable in Crawl mode)
* Additional Metadata: JSON object with additional metadata to add to the documents
* Additional Parameters: JSON object with Spider API parameters (see the sketch after this list)
  * Example: `{ "anti_bot": true }`
  * Note: `return_format` is always set to `markdown`
* Omit Metadata Keys: Comma-separated list of metadata keys to exclude
  * Format: `key1, key2, key3.nestedKey1`
  * Use `*` to remove all default metadata
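For reference, the two JSON-style fields above can be sketched as plain values. This is illustrative only: apart from `anti_bot`, which appears in the example above, any Spider API parameters you add should be verified against the Spider.cloud API reference.

```typescript
// A minimal sketch of an Additional Parameters value. Only anti_bot comes
// from this guide; check any other Spider parameters against the
// Spider.cloud API reference. return_format does not need to be set here
// because the node always forces it to "markdown".
const additionalParameters = {
  anti_bot: true,
};

// Omit Metadata Keys uses the comma-separated format shown above; nested
// keys use dot notation, and "*" would remove all default metadata.
const omitMetadataKeys = 'key1, key2, key3.nestedKey1';
```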
## Outputs

* Document: Array of document objects containing:
  * `metadata`: Page metadata and custom fields
  * `pageContent`: Extracted content in markdown format
* Text: Concatenated string of all extracted content
## Document Structure

Each document contains:

* `pageContent`: The main content from the webpage in markdown format
* `metadata`:
  * `source`: The URL of the page
  * Additional custom metadata (if specified)
  * Filtered metadata (based on omitted keys)
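Put together, a returned document can be pictured roughly as follows. The interface below is only a sketch of the shape described in this section, not the node's actual type definitions.

```typescript
// A sketch of the document shape described above; the interface is
// illustrative and does not come from the node's source code.
interface SpiderDocument {
  pageContent: string;          // page content in markdown format
  metadata: {
    source: string;             // the URL of the page
    [key: string]: unknown;     // additional custom metadata, minus any omitted keys
  };
}

const doc: SpiderDocument = {
  pageContent: '# Example Page\n\nContent extracted by Spider, in markdown…',
  metadata: {
    source: 'https://spider.cloud',
  },
};
```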
## Usage Examples
### Basic Scraping
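A basic single-page scrape only needs the required parameters. The object below is a sketch whose field names simply mirror the Inputs section above; it is not an actual configuration format used by the node.

```typescript
// Illustrative values for a single-page scrape; the field names mirror the
// Inputs section above and are not a real configuration schema.
const basicScrape = {
  mode: 'Scrape',                    // extract data from a single page
  webPageUrl: 'https://spider.cloud',
  credential: 'SPIDER_API_KEY',      // reference to your Spider API key credential
};
```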
### Advanced Crawling
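A multi-page crawl adds the crawl-specific options. As with the basic example, this is only a sketch of the field values described under Inputs; the `project` metadata field and the chosen limit are made-up values for illustration.

```typescript
// Illustrative values for crawling up to 50 pages of one domain, with the
// optional fields filled in. Field names mirror the Inputs section above;
// the metadata value is a hypothetical example.
const advancedCrawl = {
  mode: 'Crawl',                     // extract data from multiple pages within the same domain
  webPageUrl: 'https://spider.cloud',
  credential: 'SPIDER_API_KEY',
  limit: 50,                         // overrides the default of 25 pages
  additionalMetadata: { project: 'docs-import' },   // hypothetical custom metadata
  additionalParameters: { anti_bot: true },
  omitMetadataKeys: 'key1, key2',
};
```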
## Notes

* The crawler respects the specified limit for crawl operations
* All content is returned in markdown format
* Error handling is built-in for both scraping and crawling operations
* Invalid JSON configurations are handled gracefully
* Memory-efficient processing of large websites
* Supports both single-page and multi-page extraction
* Automatic metadata handling and filtering