Docx File

Load data from DOCX files.

DOCX, the Microsoft Word document format, is widely used for creating and editing text documents. This module loads and processes DOCX files within your workflow.

The loader can:

Load single or multiple DOCX files
Support both base64-encoded files and files from storage
Extract text content with metadata
Integrate with text splitters for content processing
Handle custom metadata management
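
To make the extraction step concrete, here is a minimal sketch of reading paragraph text out of a DOCX using only the Python standard library. It relies on the fact that a DOCX file is a ZIP archive whose main body lives in word/document.xml. The helper names extract_docx_text and load_docx, and the "source": "blob" metadata value, are hypothetical and are not taken from this module's implementation.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Union

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of a DOCX given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across one or more <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


def load_docx(source: Union[str, bytes]) -> dict:
    """Hypothetical helper: accept base64 text or raw bytes, return one document."""
    data = base64.b64decode(source) if isinstance(source, str) else source
    # The metadata shape here is an assumption made for illustration.
    return {"pageContent": extract_docx_text(data), "metadata": {"source": "blob"}}
```

A real loader would also need to handle tables, headers, footnotes, and malformed archives, which this sketch ignores.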

Inputs

DOCX File: The DOCX file(s) to process (.docx extension required)
Text Splitter (optional): A text splitter to process the extracted content
Additional Metadata (optional): JSON object with additional metadata to add to documents
Omit Metadata Keys (optional): Comma-separated list of metadata keys to omit from the default metadata
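
The two metadata inputs can be read as a small transformation over each document's default metadata. The sketch below is one plausible interpretation, not the module's actual code: the function name build_metadata is hypothetical, and the order of operations (omit keys from the defaults first, then merge the additional JSON object on top) is an assumption based on the input descriptions above.

```python
import json


def build_metadata(default_metadata, additional_metadata="", omit_keys=""):
    """Hypothetical helper combining the two optional metadata inputs."""
    # Parse the comma-separated key list, ignoring blanks and stray spaces.
    omit = {key.strip() for key in omit_keys.split(",") if key.strip()}
    # Drop the omitted keys from the default metadata only.
    metadata = {k: v for k, v in default_metadata.items() if k not in omit}
    # Merge the user-supplied JSON object, if any, on top (assumed order).
    if additional_metadata:
        extra = json.loads(additional_metadata)
        if not isinstance(extra, dict):
            raise ValueError("Additional Metadata must be a JSON object")
        metadata.update(extra)
    return metadata
```

For example, passing the defaults {"source": "report.docx", "blobType": "x"} with Additional Metadata '{"team": "docs"}' and Omit Metadata Keys "blobType" would leave only source and team under these assumptions.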

Outputs

Document: Array of document objects containing metadata and pageContent
Text: Concatenated string from pageContent of all documents
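
The relationship between the two outputs can be illustrated with a short sketch. The document shape (pageContent plus metadata) follows the description above. The helper name to_text and the newline separator are assumptions, since the page does not state how the pieces are joined.

```python
def to_text(documents, separator="\n"):
    """Hypothetical helper: build the Text output from the Document output."""
    # The separator is an assumption; the source only says "concatenated".
    return separator.join(doc["pageContent"] for doc in documents)


documents = [
    {"pageContent": "First chunk.", "metadata": {"source": "a.docx"}},
    {"pageContent": "Second chunk.", "metadata": {"source": "a.docx"}},
]
text = to_text(documents)
```

The Document output suits downstream steps that need per-chunk metadata, while the Text output is convenient when only the raw content matters.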

Features

Multiple file processing support
Flexible text splitting options
Customizable metadata handling
Storage integration support
Base64 and blob handling capabilities
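
As an illustration of what a text splitter does to the extracted content, here is a minimal fixed-size splitter with overlap. It is a stand-in for whichever splitter is attached to the Text Splitter input, not a reproduction of any of them. The names split_text and split_documents and the default sizes are hypothetical.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into chunks of at most chunk_size characters with overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def split_documents(documents, chunk_size=1000, chunk_overlap=200):
    """Apply split_text to each document, copying its metadata to every chunk."""
    return [
        {"pageContent": chunk, "metadata": dict(doc["metadata"])}
        for doc in documents
        for chunk in split_text(doc["pageContent"], chunk_size, chunk_overlap)
    ]
```

Splitting on characters keeps the sketch short. Splitters used in practice usually prefer paragraph or sentence boundaries so that chunks stay readable.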

