# Microsoft Word

<figure><img src="/files/EaS3bHpmn4BTnznGeIjH" alt="" width="234"><figcaption></figcaption></figure>

Microsoft Word is a word processing software for creating and editing text documents. This module provides functionality to load and process Word documents using officeparser.

This module provides a sophisticated Word document loader that can:

* Load Word documents
* Extract text content
* Split content into sections
* Handle page numbering
* Process metadata per section
* Support multiple section formats
* Handle various section separators

## Inputs

### Required Parameters

* **Word File**: The Word file(s) to process (.doc, .docx)

### Optional Parameters

* **Text Splitter**: A text splitter to process the extracted content
* **Additional Metadata**: JSON object with additional metadata
* **Omit Metadata Keys**: Comma-separated list of metadata keys to omit

## Outputs

* **Document**: Array of document objects containing metadata and pageContent
* **Text**: Concatenated string from pageContent of documents

## Features

* Text extraction
* Section separation
* Metadata handling
* Error handling
* Memory-efficient processing
* Heuristic section detection
* Content filtering

## Section Detection Methods

### Pattern Recognition

The loader attempts to identify sections using common patterns:

* "Page X" markers
* "Section X" markers
* "Chapter X" markers
* Numbered sections (e.g., "1. ", "2. ")
* ALL CAPS headings
* Long underscore separators
* Long dash separators

### Fallback Mechanisms

If pattern recognition fails:

1. Split by multiple newlines
2. Split by double newlines
3. Treat content as single section

## Document Structure

Each document contains:

* **pageContent**: Extracted text content from the section
* **metadata**:
  * documentType: "word"
  * pageNumber: Sequential section number
  * Additional custom metadata

## Content Processing

* Empty sections are filtered out
* Leading/trailing whitespace removed
* Minimum content length validation
* Reasonable section count validation

## Metadata Attributes

Default attributes include:

* documentType: Type of document (string)
* pageCount: Number of pages/sections (number)
* Custom metadata from input

## Notes

* Uses officeparser for extraction
* Handles various document formats
* Intelligent section detection
* Content validation
* Memory-efficient processing
* Error handling for invalid files
* Flexible output formats
* Robust fallback mechanisms


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.flowiseai.com/integrations/langchain/document-loaders/microsoft-word.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
