Docx File
Load data from DOCX files.
Last updated
Load data from DOCX files.
Last updated
Microsoft Word Document (DOCX) is a widely used document format for creating and editing text documents. This module provides functionality to load and process DOCX files within your workflow.
This module provides a comprehensive DOCX document loader that can:
Load single or multiple DOCX files
Support both base64-encoded files and files from storage
Extract text content with metadata
Integrate with text splitters for content processing
Handle custom metadata management
DOCX File: The DOCX file(s) to process (.docx extension required)
Text Splitter (optional): A text splitter to process the extracted content
Additional Metadata (optional): JSON object with additional metadata to add to documents
Omit Metadata Keys (optional): Comma-separated list of metadata keys to omit from the default metadata
Document: Array of document objects containing metadata and pageContent
Text: Concatenated string from pageContent of all documents
Multiple file processing support
Flexible text splitting options
Customizable metadata handling
Storage integration support
Base64 and blob handling capabilities