Epub File

EPUB (Electronic Publication) is a free and open e-book standard by the International Digital Publishing Forum (IDPF). This module provides functionality to load and process EPUB files within your workflow.

This module provides a sophisticated EPUB document loader that can:

  • Load single or multiple EPUB files

  • Support both base64-encoded files and files from storage

  • Extract content per chapter or per file

  • Process content with text splitters

  • Handle metadata extraction

  • Manage temporary file processing

Inputs

Required Parameters

  • EPUB File: The EPUB file(s) to process (.epub extension required)

  • Usage: Choose between:

    • One document per chapter: Split content by chapters

    • One document per file: Process entire file as one document

Optional Parameters

  • Text Splitter: A text splitter to process the extracted content

  • Additional Metadata: JSON object with additional metadata

  • Omit Metadata Keys: Comma-separated list of metadata keys to omit

Outputs

  • Document: Array of document objects containing metadata and pageContent

  • Text: Concatenated string from pageContent of documents

Features

  • Multiple file processing

  • Chapter-level splitting

  • File-level processing

  • Storage integration

  • Metadata customization

  • Text splitting support

  • Temporary file handling

  • Error handling

Processing Modes

Per Chapter Mode

  • Creates separate documents for each chapter

  • Maintains chapter structure

  • Preserves chapter metadata

  • Better for detailed analysis

Per File Mode

  • Processes entire file as one document

  • Maintains overall structure

  • Simpler document organization

  • Better for overview analysis

Notes

  • Supports both local and storage-based files

  • Handles base64 encoded content

  • Automatically cleans up temporary files

  • Preserves document structure

  • Supports custom metadata addition

  • Error handling for invalid files

  • Memory-efficient processing

Last updated