Apify Website Content Crawler

Load data from Apify Website Content Crawler.

Apify is a web scraping and data extraction platform that provides an app store with more than a thousand ready-made cloud tools called Actors.

The Website Content Crawler Actor can deeply crawl websites, clean their HTML by removing cookie modals, footers, and navigation, and then transform the HTML into Markdown. This Markdown can then be stored in a vector database for semantic search or Retrieval-Augmented Generation (RAG).

Crawl Entire Website

  1. (Optional) Connect Text Splitter.

  2. Connect Apify API (create a new credential with your Apify API token).

  3. Input one or more URLs (separated by commas) where the crawler will start, e.g. https://docs.flowiseai.com/.

  4. (Optional) Specify additional parameters such as maximum crawling depth and the maximum number of pages to crawl.
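
To make steps 3 and 4 concrete, here is a minimal sketch of how a comma-separated URL string and the optional limits could be turned into an input object for the crawler. The helper `build_crawler_input` is hypothetical, and the field names `startUrls`, `maxCrawlDepth`, and `maxCrawlPages` are assumptions about the Actor's input schema, so check them against the Actor's own documentation before relying on them.

```python
from typing import Any, Dict, Optional


def build_crawler_input(
    urls: str,
    max_crawl_depth: Optional[int] = None,
    max_crawl_pages: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a crawler input dictionary from a comma-separated URL string.

    The key names below are assumed, not confirmed by this page.
    """
    # Split on commas, trim whitespace, and drop empty entries.
    start_urls = [{"url": u.strip()} for u in urls.split(",") if u.strip()]
    if not start_urls:
        raise ValueError("at least one start URL is required")

    run_input: Dict[str, Any] = {"startUrls": start_urls}

    # Only include the optional limits when the caller set them.
    if max_crawl_depth is not None:
        run_input["maxCrawlDepth"] = max_crawl_depth
    if max_crawl_pages is not None:
        run_input["maxCrawlPages"] = max_crawl_pages
    return run_input


if __name__ == "__main__":
    print(build_crawler_input("https://docs.flowiseai.com/", max_crawl_depth=2))
```

Leaving the optional limits out of the dictionary when they are unset lets the crawler fall back to its own defaults instead of receiving explicit empty values.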

Output

Loads website content as a Document.
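
As a rough illustration of what "loads website content as a Document" can mean, the sketch below maps crawled page records to simple document objects. The `Document` class here is a hypothetical stand-in, not the actual Flowise or LangChain type, and the record keys `markdown`, `text`, and `url` are assumptions about the shape of the crawler's results.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Document:
    """Hypothetical stand-in for a loader's document type."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def items_to_documents(items: Iterable[Dict[str, Any]]) -> List[Document]:
    """Convert crawled page records into Document objects.

    Assumes each record may carry 'markdown', 'text', and 'url' keys.
    """
    documents: List[Document] = []
    for item in items:
        # Prefer Markdown, fall back to plain text, skip empty pages.
        content = item.get("markdown") or item.get("text") or ""
        if not content:
            continue
        documents.append(
            Document(page_content=content, metadata={"source": item.get("url", "")})
        )
    return documents
```

Keeping the page URL in the metadata makes it possible to cite the source page later, for example when the documents are retrieved from a vector database.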

Resources
