License
- OSI Approved :: MIT License
Development Status
- 5 - Production/Stable
Intended Audience
- Developers
Operating System
- MacOS
- POSIX
- Unix
- Microsoft :: Windows
Programming Language
- Python :: 3
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
- Python :: 3.13
- Python :: 3.14
- C++
- C
Topic
- Software Development :: Libraries
- Software Development :: Libraries :: Python Modules
- System :: Archiving
indexed_zstd
A Python module for fast random access to zstd-compressed files without full decompression.
IndexedZstdFile implements Python's io.BufferedReader interface, so it works as a drop-in replacement for a binary-mode open() on .zst files, supporting seek(), read(), readline(), tell(), and context managers.
Under the hood it uses libzstd-seek to build a jump table of frame boundaries, so a seek in a multi-frame archive goes straight to the frame containing the target offset instead of decompressing everything before it.
This project is based on indexed_bzip2, adapted to target zstd.
How it works
Zstd files can contain multiple independently compressed frames. indexed_zstd scans frame boundaries on first access and builds an in-memory jump table that maps uncompressed offsets to compressed positions. When you seek(), only the relevant frame is decompressed.
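The lookup described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the library's implementation: the table layout, the offsets, and the helper name locate are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical jump table: one (uncompressed_start, compressed_start) pair per
# frame, sorted by uncompressed_start. The numbers are made up for illustration.
JUMP_TABLE = [(0, 0), (1000, 412), (2000, 797), (3000, 1203)]


def locate(jump_table, offset):
    """Return (compressed_start, bytes_to_skip) for an uncompressed offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    starts = [entry[0] for entry in jump_table]
    # Index of the last frame whose uncompressed start is <= offset.
    index = bisect_right(starts, offset) - 1
    uncompressed_start, compressed_start = jump_table[index]
    return compressed_start, offset - uncompressed_start


print(locate(JUMP_TABLE, 2500))  # (797, 500)
```

Here a seek to uncompressed offset 2500 starts decompression at compressed position 797 and discards the first 500 bytes of that frame.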
Seeking within a frame is emulated by decompressing from the frame start, so the more frames your archive has, the faster random access will be.
To create multi-frame archives use t2sz or the --stream-size option of the zstd CLI.
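To get a feel for how frame size affects seek cost, the following rough model may help. It assumes every frame holds the same number of uncompressed bytes, which real archives need not do, and the function name and target offset are hypothetical.

```python
def bytes_decompressed_for_seek(frame_size, offset):
    """Bytes decompressed to reach `offset`, starting from its frame's start.

    Simplified model: every frame holds exactly `frame_size` uncompressed
    bytes, and reaching a position means decompressing from the frame start.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return offset % frame_size


target = 900_000_000  # hypothetical uncompressed offset
for frame_size in (1 << 30, 1 << 24, 1 << 20):
    wasted = bytes_decompressed_for_seek(frame_size, target)
    print(f"frame size {frame_size:>10}: decompress {wasted} bytes first")
```

In this model a single 1 GiB frame forces the whole prefix to be decompressed, while smaller frames cut the discarded prefix to at most one frame's worth of data.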
Installation
pip (recommended)
Pre-built wheels are available for Linux, macOS, and Windows:
pip install indexed-zstd
If no wheel is available for your platform, pip will build from source automatically.
In that case you need zstd development headers and a C++17 compiler:
# Debian/Ubuntu
sudo apt install libzstd-dev
# macOS
brew install zstd
conda
conda install -c conda-forge indexed_zstd
Arch Linux (AUR)
yay -S python-indexed-zstd
Usage
Basic random access
from indexed_zstd import IndexedZstdFile

with IndexedZstdFile("example.zst") as f:
    f.seek(1024)
    data = f.read(256)
    print(f.tell())      # 1280
    print(f.seekable())  # True
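Because the class follows the standard file interface, code written against any seekable binary stream should work unchanged. The sketch below uses a hypothetical helper, read_range, and substitutes io.BytesIO for IndexedZstdFile so that it runs without a .zst file at hand.

```python
import io


def read_range(fileobj, start, length):
    """Read `length` bytes starting at uncompressed offset `start`."""
    fileobj.seek(start)
    return fileobj.read(length)


# io.BytesIO stands in for IndexedZstdFile so the sketch runs without a .zst file.
stream = io.BytesIO(bytes(range(256)) * 8)
chunk = read_range(stream, 1024, 256)
print(len(chunk), stream.tell())  # 256 1280
```

Passing an opened IndexedZstdFile instead of the BytesIO object is expected to behave the same way, given the interface described above.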
Reading line by line
from indexed_zstd import IndexedZstdFile

with IndexedZstdFile("logfile.zst") as f:
    for line in f:
        if b"ERROR" in line:
            print(line.decode())
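If you would rather iterate over str lines than bytes, the standard library's io.TextIOWrapper can wrap any buffered binary stream. The sketch below assumes this also holds for IndexedZstdFile, since it inherits from io.BufferedReader; error_lines is a hypothetical helper and io.BytesIO stands in for the compressed file.

```python
import io


def error_lines(binary_stream, encoding="utf-8"):
    """Yield decoded lines containing "ERROR" from a binary stream."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, errors="replace")
    for line in text:
        if "ERROR" in line:
            yield line.rstrip("\n")


# io.BytesIO stands in for an opened IndexedZstdFile here.
sample = io.BytesIO(b"INFO start\nERROR disk full\nINFO done\n")
print(list(error_lines(sample)))  # ['ERROR disk full']
```

Note that TextIOWrapper takes ownership of the stream it wraps and closes it when the wrapper is discarded, so do not reuse the binary stream afterwards.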
Opening by file descriptor
import os
from indexed_zstd import IndexedZstdFile

fd = os.open("example.zst", os.O_RDONLY)
with IndexedZstdFile(fd) as f:
    data = f.read()
Inspecting frame structure
from indexed_zstd import IndexedZstdFile

with IndexedZstdFile("example.zst") as f:
    print(f.size())              # uncompressed size in bytes
    print(f.number_of_frames())  # number of zstd frames
    print(f.is_multiframe())     # True if more than one frame
    print(f.block_offsets())     # {compressed_offset: uncompressed_offset, ...}
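The offsets dictionary can be turned into per-frame sizes, which is a quick way to check whether frames are small enough for fast seeking. This is a sketch with made-up numbers; frame_spans is a hypothetical helper, and whether the real dictionary contains a trailing end-of-stream entry is not assumed either way.

```python
def frame_spans(block_offsets, total_size):
    """Return [(uncompressed_start, uncompressed_length), ...] per frame.

    `block_offsets` maps compressed offsets to uncompressed offsets, as in
    the comment on block_offsets() above; `total_size` is the value of size().
    """
    starts = sorted(set(block_offsets.values()))
    ends = starts[1:] + [total_size]
    # Drop zero-length spans, e.g. a trailing end-of-stream marker if present.
    return [(s, e - s) for s, e in zip(starts, ends) if e > s]


# Made-up offsets for illustration only.
offsets = {0: 0, 412: 1000, 797: 2000}
print(frame_spans(offsets, total_size=2600))
# [(0, 1000), (1000, 1000), (2000, 600)]
```

With a real file, the two arguments would come from f.block_offsets() and f.size().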
API reference
IndexedZstdFile inherits from io.BufferedReader and adds:
| Method | Description |
|---|---|
| size() | Uncompressed file size in bytes |
| number_of_frames() | Total number of zstd frames |
| is_multiframe() | True if the file contains more than one frame |
| block_offsets() | dict mapping compressed offsets to uncompressed offsets |
| available_block_offsets() | Same as block_offsets(), but returns only the offsets discovered so far |
| set_block_offsets(offsets) | Manually set the jump table from a dict |
| block_offsets_complete() | True if the jump table has been fully built |
| tell_compressed() | Current position in the compressed stream |
All standard io.BufferedReader methods are available: read(), readline(), readlines(), seek(), tell(), seekable(), readable(), fileno(), close(), etc.
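One plausible use of block_offsets() together with set_block_offsets() is to store the jump table next to the archive and reload it on a later run. The sketch below covers only the storage half, with made-up offsets and hypothetical helper names. JSON is just one possible format, and whether reloading actually skips the initial scan is an assumption to verify against the library.

```python
import json
import os
import tempfile


def save_offsets(offsets, path):
    """Write a {compressed_offset: uncompressed_offset} dict as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(offsets, fh)


def load_offsets(path):
    """Read the dict back, restoring the integer keys JSON turned into strings."""
    with open(path, encoding="utf-8") as fh:
        return {int(k): int(v) for k, v in json.load(fh).items()}


# Round-trip demonstration with made-up offsets.
offsets = {0: 0, 412: 1000, 797: 2000}
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "example.zst.index.json")
    save_offsets(offsets, index_path)
    restored = load_offsets(index_path)
print(restored == offsets)  # True
```

With the real library, offsets would come from f.block_offsets() and the restored dict would be passed to f.set_block_offsets() after opening the same archive again.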
Testing
The test suite requires gen_seekable (built from the bundled submodule) and covers API, error paths, round-trip, reference, and heavy-data scenarios.
# Build gen_seekable from the submodule
cmake -S indexed_zstd/libzstd-seek -B indexed_zstd/libzstd-seek/build -DBUILD_TESTS=ON
cmake --build indexed_zstd/libzstd-seek/build --target gen_seekable
# Add it to PATH
export PATH="$PWD/indexed_zstd/libzstd-seek/build/tests:$PATH"
# Run the standard test suite (111 tests)
python -m pytest tests/ -v -m "not heavy and not reference"
Additional test categories (optional):
| Marker | Requirements | Description |
|---|---|---|
| reference | zstd CLI in PATH | Compares library output against zstd -d |
| heavy | t2sz in PATH | Large realistic data tests (10-50 MB) |
# Run everything including reference and heavy tests
python -m pytest tests/ -v
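If you want to pick the marker expression automatically, a small helper can check which optional tools are installed. This is a hypothetical convenience script, not part of the project: the tool names come from the table above, and marker_expression is an invented name.

```python
import shutil

# Marker name -> executable it needs, taken from the table above.
REQUIRED_TOOLS = {"reference": "zstd", "heavy": "t2sz"}


def marker_expression(which=shutil.which):
    """Build a pytest -m expression that excludes markers whose tool is missing."""
    missing = [m for m, tool in sorted(REQUIRED_TOOLS.items()) if which(tool) is None]
    return " and ".join(f"not {m}" for m in missing)


print(repr(marker_expression()))
```

The printed string can be passed to pytest via -m. An empty string means both tools were found and nothing needs to be excluded.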
Building from source
Requires a C++17 compiler, Cython, and platform-specific zstd libraries.
# Clone with submodules (includes libzstd-seek)
git clone --recurse-submodules https://github.com/martinellimarco/indexed_zstd.git
cd indexed_zstd
pip install cython setuptools
Linux
sudo apt install libzstd-dev # Debian/Ubuntu
pip install .
macOS
brew install zstd
pip install .
Windows
Requires Visual Studio Build Tools with the C++ workload.
python libzstd/_get_zstd.py # downloads zstd headers and DLL
pip install .