File
The File Loader is a document loader that supports multiple file formats, including TXT, JSON, JSONL, CSV, DOCX, PDF, Excel, and PowerPoint. It exposes a single interface for loading and processing all of these file types.
The loader can:
Load multiple file formats
Support both base64-encoded files and files from storage
Handle PDF-specific processing options
Process JSON and JSONL with pointer extraction
Support text splitting
Customize metadata extraction
Handle file storage integration
File: The file(s) to process (supports multiple formats)
Text Splitter: A text splitter to process the extracted content
PDF Usage: Choose between:
One document per page
One document per file
Use Legacy Build: Use the legacy PDF build to work around compatibility issues
JSONL Pointer Extraction: Name of the pointer used to extract values from JSONL files
Additional Metadata: JSON object with additional metadata
Omit Metadata Keys: Comma-separated list of metadata keys to omit
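The two metadata inputs above can be illustrated with a minimal sketch. It assumes that the additional metadata is merged into each document's existing metadata first and that the omitted keys are removed afterwards; the loader's actual order of operations is not stated here. The function name `apply_metadata_options` is hypothetical.

```python
def apply_metadata_options(metadata, additional_metadata=None, omit_keys=""):
    """Merge extra metadata into a document's metadata, then drop omitted keys.

    Assumption: merging happens before omission, and `omit_keys` is a
    comma-separated string such as "page, source".
    """
    # Later keys win, so additional metadata overrides existing values.
    merged = {**metadata, **(additional_metadata or {})}
    # Split on commas, trim whitespace, and ignore empty entries.
    to_omit = {key.strip() for key in omit_keys.split(",") if key.strip()}
    return {key: value for key, value in merged.items() if key not in to_omit}
```

For example, calling it with `{"source": "a.pdf", "page": 1}`, `{"team": "docs"}`, and `"page, missing"` keeps `source` and `team` and silently ignores the key that does not exist.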
Document: Array of document objects containing metadata and pageContent
Text: A single string concatenated from the pageContent of all documents
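The relationship between the two outputs can be sketched as follows. The document shape (`pageContent` plus `metadata`) comes from the description above; the newline separator and the function name `to_text` are assumptions made for illustration.

```python
def to_text(documents, separator="\n"):
    """Join the pageContent of every document into one string.

    Assumption: documents are joined with a newline; the real separator
    used by the loader is not specified in this page.
    """
    return separator.join(doc["pageContent"] for doc in documents)


# Example of the Document output: a list of objects with metadata and pageContent.
documents = [
    {"pageContent": "first page", "metadata": {"source": "report.pdf", "page": 1}},
    {"pageContent": "second page", "metadata": {"source": "report.pdf", "page": 2}},
]
text = to_text(documents)
```

The Document output suits downstream steps that need per-document metadata, while the Text output suits steps that only need the raw content.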
Text Files (.txt)
JSON Files (.json)
JSONL Files (.jsonl)
CSV Files (.csv)
PDF Files (.pdf)
Word Documents (.docx)
Excel Files (.xlsx, .xls)
PowerPoint Files (.pptx, .ppt)
And more...
Multi-format support
Storage integration
PDF processing options
JSON pointer extraction
Text splitting support
Metadata customization
Error handling
MIME type detection
Per-page splitting
Single document mode
Legacy build support
OCR compatibility
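The difference between per-page splitting and single document mode can be sketched without a PDF library by starting from text that has already been extracted page by page. The option values `"perPage"` and `"perFile"`, the metadata keys, and the function name are hypothetical; they only illustrate the two shapes of output.

```python
def build_pdf_documents(pages, source, usage="perPage"):
    """Turn a list of per-page strings into documents.

    Assumption: `pages` holds already-extracted text, one string per page.
    """
    if usage == "perPage":
        # One document per page, with a 1-based page number in the metadata.
        return [
            {"pageContent": page_text, "metadata": {"source": source, "page": number}}
            for number, page_text in enumerate(pages, start=1)
        ]
    if usage == "perFile":
        # One document for the whole file, with all pages joined together.
        return [
            {
                "pageContent": "\n".join(pages),
                "metadata": {"source": source, "pages": len(pages)},
            }
        ]
    raise ValueError(f"Unknown PDF usage: {usage!r}")
```

Per-page output keeps page numbers available for citations, while per-file output keeps the text contiguous, which can matter when a text splitter is applied afterwards.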
Pointer-based extraction
Structured data handling
Array processing
Nested object support
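Pointer-based extraction from JSONL can be sketched with a small resolver that follows JSON Pointer syntax (RFC 6901), including nested objects and array indices. Whether the loader uses full JSON Pointer syntax or a plain key name is not stated here, so this sketch accepts both; the function names are hypothetical.

```python
import json


def resolve_pointer(value, pointer):
    """Resolve a JSON Pointer such as "/meta/tags/0" against parsed JSON."""
    if pointer == "":
        return value
    if not pointer.startswith("/"):
        # Assumption: a bare name like "text" is treated as "/text".
        pointer = "/" + pointer
    for token in pointer.split("/")[1:]:
        # Unescape per RFC 6901: "~1" becomes "/", then "~0" becomes "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(value, list):
            value = value[int(token)]
        else:
            value = value[token]
    return value


def extract_from_jsonl(text, pointer):
    """Parse each non-empty line as JSON and extract the pointed-to value."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # Skip blank lines between records.
        results.append(resolve_pointer(json.loads(line), pointer))
    return results
```

A missing key raises `KeyError` in this sketch; a production loader would more likely report which line failed.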
Automatically detects file type
Handles multiple files simultaneously
Supports file storage integration
Preserves file metadata
Handles large files efficiently
Error handling for invalid files
Memory-efficient processing
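Automatic file type detection and handling of invalid files can be sketched as a lookup from file extension to loader kind. This is a simplification: the loader is described as using MIME type detection, whereas this sketch swaps in an extension lookup for clarity. The mapping, the loader names, and the function names are all hypothetical, and only the formats listed on this page are included.

```python
from pathlib import PurePath

# Hypothetical mapping from extension to loader kind.
LOADER_BY_EXTENSION = {
    ".txt": "text",
    ".json": "json",
    ".jsonl": "jsonl",
    ".csv": "csv",
    ".pdf": "pdf",
    ".docx": "docx",
    ".xlsx": "excel",
    ".xls": "excel",
    ".pptx": "powerpoint",
    ".ppt": "powerpoint",
}


def detect_loader(filename):
    """Pick a loader kind from the file extension, case-insensitively."""
    extension = PurePath(filename).suffix.lower()
    if extension not in LOADER_BY_EXTENSION:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return LOADER_BY_EXTENSION[extension]


def group_by_loader(filenames):
    """Group several files by loader kind so each group can be processed together."""
    groups = {}
    for filename in filenames:
        groups.setdefault(detect_loader(filename), []).append(filename)
    return groups
```

Raising on the first unsupported file is one possible policy; skipping the file and collecting the error would be an equally reasonable choice when many files are loaded at once.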