llama-index readers file integration
Author: Your Name
Maintainer: FarisHijazi, Haowjy, ephe-meral, hursh-desai, iamarunbrahma, jon-chuang, mmaatouk, ravi03071991, sangwongenip, thejessezhang
Requires Python: <4.0,>=3.9
LlamaIndex Readers Integration: File
This package is the default integration for the file loaders used within SimpleDirectoryReader.
It provides the following loaders:
- DocxReader
- HWPReader
- PDFReader
- EpubReader
- FlatReader
- HTMLTagReader
- ImageCaptionReader
- ImageReader
- ImageVisionLLMReader
- IPYNBReader
- MarkdownReader
- MboxReader
- PptxReader
- PandasCSVReader
- VideoAudioReader
- UnstructuredReader
- PyMuPDFReader
- ImageTabularChartReader
- XMLReader
- PagedCSVReader
- CSVReader
- RTFReader
Installation
pip install llama-index-readers-file
Usage
Once installed, you can import any of the loaders. The examples below show how to use them with SimpleDirectoryReader.
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import (
    DocxReader,
    HWPReader,
    PDFReader,
    EpubReader,
    FlatReader,
    HTMLTagReader,
    ImageCaptionReader,
    ImageReader,
    ImageVisionLLMReader,
    IPYNBReader,
    MarkdownReader,
    MboxReader,
    PptxReader,
    PandasCSVReader,
    VideoAudioReader,
    UnstructuredReader,
    PyMuPDFReader,
    ImageTabularChartReader,
    XMLReader,
    PagedCSVReader,
    CSVReader,
    RTFReader,
)

# PDF Reader with `SimpleDirectoryReader`
parser = PDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Docx Reader example
parser = DocxReader()
file_extractor = {".docx": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# HWP Reader example
parser = HWPReader()
file_extractor = {".hwp": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Epub Reader example
parser = EpubReader()
file_extractor = {".epub": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Flat Reader example
parser = FlatReader()
file_extractor = {".txt": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# HTML Tag Reader example
parser = HTMLTagReader()
file_extractor = {".html": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Image Reader example
parser = ImageReader()
file_extractor = {
    ".jpg": parser,
    ".jpeg": parser,
    ".png": parser,
}  # Add other image formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# IPYNB Reader example
parser = IPYNBReader()
file_extractor = {".ipynb": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Markdown Reader example
parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Mbox Reader example
parser = MboxReader()
file_extractor = {".mbox": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
# Basic usage - extracts text, tables, charts, and speaker notes
parser = PptxReader()
# Advanced usage - control parsing behavior
parser = PptxReader(
extract_images=True, # Enable image captioning
context_consolidation_with_llm=True, # Use LLM for content synthesis
num_workers=4, # Parallel processing
batch_size=10, # Slides processed per worker batch
raise_on_error=True, # Raise value error if file_parsing is not successful
)
file_extractor = {".pptx": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()

# Pandas CSV Reader example
parser = PandasCSVReader()
file_extractor = {".csv": parser}  # Add other CSV formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# PyMuPDF Reader example
parser = PyMuPDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# XML Reader example
parser = XMLReader()
file_extractor = {".xml": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Paged CSV Reader example
parser = PagedCSVReader()
file_extractor = {".csv": parser}  # Add other CSV formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# CSV Reader example
parser = CSVReader()
file_extractor = {".csv": parser}  # Add other CSV formats as needed
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
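
Each example above builds its own file_extractor, and some reuse the same extension key (".pdf" for PDFReader and PyMuPDFReader, ".csv" for the three CSV readers). Because file_extractor is a plain dict, it can hold only one reader per extension. The following is a minimal sketch of one way to combine several readers into a single mapping and catch accidental duplicates. The names build_file_extractor and load_directory are hypothetical helpers for illustration, not part of this package.

```python
from typing import Any, Dict, Iterable, Tuple


def build_file_extractor(
    assignments: Iterable[Tuple[Any, Iterable[str]]]
) -> Dict[str, Any]:
    """Merge (reader, extensions) pairs into one extension -> reader dict.

    Hypothetical helper: raises ValueError if an extension is assigned twice,
    since a dict can hold only one reader per extension.
    """
    file_extractor: Dict[str, Any] = {}
    for reader, extensions in assignments:
        for ext in extensions:
            if ext in file_extractor:
                raise ValueError(f"{ext} is already mapped to a reader")
            file_extractor[ext] = reader
    return file_extractor


def load_directory(path: str = "./data"):
    """Load a directory with one combined mapping (requires the package)."""
    # Imported lazily so build_file_extractor can be used and tested alone.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import DocxReader, ImageReader, PDFReader

    file_extractor = build_file_extractor(
        [
            (PDFReader(), [".pdf"]),
            (DocxReader(), [".docx"]),
            (ImageReader(), [".jpg", ".jpeg", ".png"]),
        ]
    )
    return SimpleDirectoryReader(
        path, file_extractor=file_extractor
    ).load_data()
```

Registering a second reader for ".pdf" or ".csv" in the same call raises an error instead of silently replacing the first one.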
These loaders are designed to load data into LlamaIndex.
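
As a rough illustration of loading data into LlamaIndex, the sketch below reads a directory, reports how many documents each file produced, and builds an index from them. Several things here are assumptions rather than statements from this README: that each document carries a "file_name" entry in its metadata, that VectorStoreIndex from llama_index.core is the index you want, and that an embedding model is already configured in your environment. The names count_documents_per_file and build_index are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_documents_per_file(documents: Iterable[Any]) -> Dict[str, int]:
    """Count how many documents each source file produced.

    Assumes every document exposes a `metadata` dict; a missing
    "file_name" key is counted under "<unknown>".
    """
    counts = Counter(
        doc.metadata.get("file_name", "<unknown>") for doc in documents
    )
    return dict(counts)


def build_index(path: str = "./data"):
    """Load PDFs from `path` and index them (requires the packages)."""
    # Imported lazily so count_documents_per_file works without LlamaIndex.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.readers.file import PDFReader

    documents = SimpleDirectoryReader(
        path, file_extractor={".pdf": PDFReader()}
    ).load_data()
    # A single file may yield more than one document.
    print(count_documents_per_file(documents))
    # Assumption: an embedding model is configured for the index.
    return VectorStoreIndex.from_documents(documents)
```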
Release history
- 0.5.4 (Sep 08, 2025)
- 0.5.3 (Sep 04, 2025)
- 0.5.2 (Aug 16, 2025)
- 0.5.1 (Aug 12, 2025)
- 0.5.0 (Jul 30, 2025)
- 0.4.11 (Jul 07, 2025)
- 0.4.9 (Jun 05, 2025)
- 0.4.8 (May 27, 2025)
- 0.4.7 (Mar 24, 2025)
- 0.4.6 (Mar 04, 2025)
- 0.4.5 (Feb 11, 2025)
- 0.4.4 (Jan 23, 2025)
- 0.4.3 (Jan 10, 2025)
- 0.4.2 (Jan 03, 2025)
- 0.4.1 (Dec 05, 2024)
- 0.4.0 (Nov 18, 2024)
- 0.3.0 (Nov 11, 2024)
- 0.2.2 (Sep 19, 2024)
- 0.2.1 (Sep 05, 2024)
- 0.2.0 (Aug 22, 2024)
- 0.1.33 (Aug 09, 2024)
- 0.1.32 (Jul 29, 2024)
- 0.1.31 (Jul 25, 2024)
- 0.1.30 (Jul 11, 2024)
- 0.1.29 (Jul 08, 2024)
- 0.1.28 (Jul 08, 2024)
- 0.1.27 (Jul 04, 2024)
- 0.1.26 (Jul 02, 2024)
- 0.1.25 (Jun 12, 2024)
- 0.1.23 (May 23, 2024)
- 0.1.22 (May 08, 2024)
- 0.1.21 (May 07, 2024)
- 0.1.20 (May 02, 2024)
- 0.1.19 (Apr 15, 2024)
- 0.1.18 (Apr 14, 2024)
- 0.1.17 (Apr 13, 2024)
- 0.1.16 (Apr 09, 2024)
- 0.1.15 (Apr 08, 2024)
- 0.1.14 (Apr 08, 2024)
- 0.1.13 (Mar 29, 2024)
- 0.1.12 (Mar 22, 2024)
- 0.1.11 (Mar 14, 2024)
- 0.1.9 (Mar 09, 2024)
- 0.1.8 (Mar 06, 2024)
- 0.1.7 (Mar 05, 2024)
- 0.1.6 (Feb 26, 2024)
- 0.1.5 (Feb 21, 2024)
- 0.1.4 (Feb 20, 2024)
- 0.1.3 (Feb 13, 2024)
- 0.1.2 (Feb 13, 2024)
- 0.1.1 (Feb 12, 2024)
- 0.1.0 (Feb 10, 2024)
- 0.0.8 (Feb 08, 2024)
- 0.0.7 (Feb 08, 2024)
- 0.0.6 (Feb 05, 2024)
- 0.0.5 (Feb 05, 2024)
- 0.0.4 (Feb 05, 2024)
- 0.0.3 (Feb 05, 2024)
- 0.0.2 (Feb 05, 2024)
- 0.0.2a0 (Feb 05, 2024)
- 0.0.1 (Feb 04, 2024)