GitBook

Load data from GitBook.

GitBook Document Loader

GitBook is a modern documentation platform that helps teams share knowledge. This module provides a document loader for content published on GitBook documentation sites.

The loader can:

  • Load content from specific GitBook pages

  • Crawl entire GitBook documentation sites

  • Extract structured content

  • Process content with text splitters

  • Customize metadata extraction

  • Handle recursive page loading

Inputs

Required Parameters

  • Web Path: The URL to the GitBook page or root path

    • Single page: e.g., https://docs.gitbook.com/product-tour/navigation

    • Root path: e.g., https://docs.gitbook.com/
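The two kinds of Web Path above are both absolute URLs, so a quick pre-check can catch malformed input before it is passed to the loader. The following is a minimal sketch, not the loader's own validation: `check_web_path` is a hypothetical helper, and the rule it applies (an `http` or `https` scheme plus a host) is an assumption.

```python
from urllib.parse import urlparse


def check_web_path(web_path: str) -> str:
    """Return the trimmed web path if it looks like an absolute HTTP(S) URL.

    Hypothetical helper: the acceptance rule here is an assumption,
    not the loader's documented behavior.
    """
    cleaned = web_path.strip()
    parsed = urlparse(cleaned)
    # Require an http/https scheme and a non-empty host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid web path: {web_path!r}")
    return cleaned


# The two example paths from the parameter list above.
single_page = check_web_path("https://docs.gitbook.com/product-tour/navigation")
root_path = check_web_path("https://docs.gitbook.com/")
```

A path without a scheme, such as `docs.gitbook.com`, raises `ValueError` in this sketch.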

Optional Parameters

  • Should Load All Paths: Whether to recursively load all pages from the root path

  • Text Splitter: A text splitter to process the extracted content

  • Additional Metadata: JSON object of extra metadata to add to the extracted documents

  • Omit Metadata Keys: Comma-separated list of metadata keys to omit
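To illustrate how the two metadata options might interact, here is a small sketch. `apply_metadata_options` is a hypothetical helper, and the order of operations (merge the additional metadata first, then drop omitted keys) is an assumption rather than confirmed loader behavior. The title and the `team` key are made-up sample values.

```python
import json


def apply_metadata_options(metadata, additional_metadata="", omit_keys=""):
    """Merge extra metadata into a document's metadata, then drop omitted keys.

    Hypothetical helper; merge-then-omit ordering is an assumption.
    """
    merged = dict(metadata)
    if additional_metadata:
        # Additional Metadata is supplied as a JSON object.
        merged.update(json.loads(additional_metadata))
    # Omit Metadata Keys is a comma-separated list; tolerate stray spaces.
    omitted = {key.strip() for key in omit_keys.split(",") if key.strip()}
    return {key: value for key, value in merged.items() if key not in omitted}


result = apply_metadata_options(
    {"title": "Navigation", "url": "https://docs.gitbook.com/product-tour/navigation"},
    additional_metadata='{"team": "docs"}',
    omit_keys="url, missing_key",
)
print(result)  # {'title': 'Navigation', 'team': 'docs'}
```

Keys listed for omission that do not exist in the metadata are simply ignored in this sketch.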

Outputs

  • Document: Array of document objects containing metadata and pageContent

  • Text: A single string formed by concatenating the pageContent of all documents
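The relationship between the two outputs can be pictured as follows. This is an illustrative sketch only: `to_text` is a hypothetical function, the sample documents are placeholders, and the newline separator is an assumption, since the separator is not specified here.

```python
# Placeholder documents in the shape described for the Document output.
documents = [
    {"pageContent": "First page text.", "metadata": {"title": "Page one"}},
    {"pageContent": "Second page text.", "metadata": {"title": "Page two"}},
]


def to_text(docs, separator="\n"):
    """Join the pageContent of each document into one string.

    The separator is an assumption; the real output may differ.
    """
    return separator.join(doc["pageContent"] for doc in docs)


text = to_text(documents)
```

The Document output keeps per-page metadata, while the Text output discards it, so the Text output suits cases where only the raw content is needed.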

Features

  • Single page loading

  • Recursive site crawling

  • Content extraction

  • Text splitting support

  • Metadata customization

  • Error handling

  • Path management

Loading Modes

Single Page Mode

  • Loads content from a specific page

  • Extracts page content and metadata

  • Preserves page structure

  • Faster than All Paths mode when only one page is needed

All Paths Mode

  • Recursively loads all pages from root

  • Maintains site hierarchy

  • Extracts all available content

  • Preserves navigation structure
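The difference between the two modes can be sketched as a choice between loading one page and traversing every page reachable from the root. The code below is a conceptual model, not the loader's implementation: `collect_paths`, the in-memory `SITE` link map, and the breadth-first order are all assumptions made for illustration, and no network requests are made.

```python
from collections import deque


def collect_paths(root, links, load_all_paths=False):
    """Return the pages to load, in the order they are discovered."""
    if not load_all_paths:
        # Single Page mode: only the given page is loaded.
        return [root]
    seen = {root}
    queue = deque([root])
    ordered = []
    while queue:
        page = queue.popleft()
        ordered.append(page)
        for child in links.get(page, []):
            if child not in seen:  # skip pages already queued, avoiding loops
                seen.add(child)
                queue.append(child)
    return ordered


# Hypothetical site map: each page lists the pages it links to.
SITE = {
    "/": ["/getting-started", "/product-tour"],
    "/product-tour": ["/product-tour/navigation", "/"],
}

single = collect_paths("/product-tour/navigation", SITE)
everything = collect_paths("/", SITE, load_all_paths=True)
```

Tracking visited pages matters in the recursive case because documentation pages commonly link back to the root, which would otherwise cause an endless loop.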

Document Structure

Each document contains:

  • pageContent: Extracted content from the page

  • metadata:

    • title: Page title

    • url: Original page URL

    • Additional custom metadata
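The structure above can be written down as a typed dictionary. This is a sketch for orientation: the `GitBookDocument` type name is hypothetical, and the sample values, including the `team` key standing in for custom metadata, are placeholders.

```python
from typing import Any, Dict, TypedDict


class GitBookDocument(TypedDict):
    """Shape of one loaded document (type name is hypothetical)."""

    pageContent: str
    metadata: Dict[str, Any]


doc: GitBookDocument = {
    "pageContent": "Example page text.",  # placeholder content
    "metadata": {
        "title": "Navigation",  # placeholder title
        "url": "https://docs.gitbook.com/product-tour/navigation",
        "team": "docs",  # example of additional custom metadata
    },
}
```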

Notes

  • Supports both single page and full site loading

  • Handles GitBook's dynamic content

  • Preserves document structure

  • Supports custom metadata addition

  • Error handling for invalid URLs

  • Memory-efficient processing

  • Flexible output formats
