The fastest semantic text chunking library
Meta
Author: Bhavnick Minhas
Requires Python: >=3.8
Classifiers
Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
- OSI Approved :: Apache Software License
Programming Language
- Rust
- Python :: Implementation :: CPython
- Python :: 3
- Python :: 3.8
- Python :: 3.9
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
Topic
- Text Processing
chonkie-core
the fastest text chunking library, with up to 1 TB/s throughput
you know how every chunking library claims to be fast? yeah, we actually meant it.
chonkie-core splits text at semantic boundaries (periods, newlines, the usual suspects) and does it stupid fast. we're talking "chunk the entire english wikipedia in 120ms" fast.
want to know how? read the blog post where we nerd out about SIMD instructions and lookup tables.
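to make "splitting at semantic boundaries" concrete, here is a rough pure-python sketch of the general idea: take a window of at most `size` bytes, look backward for the last delimiter, and cut right after it. this is not chonkie-core's implementation (that one is rust with SIMD); the function name `chunk_at_delimiters`, the hard split when no delimiter is found, and keeping the delimiter at the end of the chunk are all assumptions made for illustration.

```python
from typing import Iterator


def chunk_at_delimiters(
    data: bytes, size: int = 4096, delimiters: bytes = b"\n.?"
) -> Iterator[memoryview]:
    """Yield zero-copy slices of at most `size` bytes.

    Hypothetical helper: each slice ends just after the last delimiter
    found in its window; if the window has no delimiter, it is cut at `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    view = memoryview(data)
    start = 0
    while start < len(data):
        end = min(start + size, len(data))
        if end < len(data):
            # search backward from the window end for the last delimiter byte
            cut = max(
                (data.rfind(bytes([d]), start, end) for d in delimiters),
                default=-1,
            )
            if cut != -1:
                end = cut + 1  # keep the delimiter with the chunk it closes
        yield view[start:end]
        start = end


data = b"Hello world. How are you? I'm fine.\nThanks for asking."
for piece in chunk_at_delimiters(data, size=20):
    print(bytes(piece))
```

with `size=20`, the sample text splits after "world.", after "you?", and after the newline, so no chunk ends in the middle of a sentence.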
installation
pip install chonkie-core
looking for rust or javascript?
usage
from chonkie_core import Chunker
text = "Hello world. How are you? I'm fine.\nThanks for asking."
# with defaults (4KB chunks, split at \n . ?)
for chunk in Chunker(text):
    print(bytes(chunk))
# with custom size
for chunk in Chunker(text, size=1024):
    print(bytes(chunk))
# with custom delimiters
for chunk in Chunker(text, delimiters=".?!\n"):
    print(bytes(chunk))
# with multi-byte pattern (e.g., metaspace ▁ for SentencePiece tokenizers)
for chunk in Chunker(text, pattern="▁", prefix=True):
    print(bytes(chunk))
# with consecutive pattern handling (split at START of runs, not middle)
for chunk in Chunker("word   next", pattern=" ", consecutive=True):
    print(bytes(chunk))
# with forward fallback (search forward if no pattern in backward window)
for chunk in Chunker(text, pattern=" ", forward_fallback=True):
    print(bytes(chunk))
# collect all chunks
chunks = list(Chunker(text))
chunks are returned as memoryview objects (zero-copy slices of the original text).
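a quick sketch of what "zero-copy" means here, using only the python standard library rather than chonkie-core itself: slicing a `memoryview` gives a window onto the same underlying buffer, and bytes are only copied when you explicitly convert. the sample text and the slice bounds are made up for illustration.

```python
# Encode once; every slice below refers to this same buffer.
data = "Hello world. How are you?".encode("utf-8")
view = memoryview(data)

# Slicing a memoryview does not copy the bytes.
chunk = view[:12]
print(chunk.obj is data)  # the slice still points at the original bytes object

# Converting to bytes (and then to str) is where a copy is made.
text = bytes(chunk).decode("utf-8")
print(text)
```

so if you only need offsets or lengths, you can keep working with the views; call `bytes(chunk)` or decode only for the chunks you actually need as standalone objects.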
citation
if you use chonkie-core in your research, please cite it as follows:
@software{chunk2025,
  author = {Minhas, Bhavnick},
  title = {chunk: The fastest text chunking library},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/chonkie-inc/chunk}},
}
license
licensed under either of Apache License, Version 2.0 or MIT license at your option.
0.10.1 (Mar 30, 2026)
0.10.0 (Mar 29, 2026)
0.9.2 (Jan 21, 2026)
0.9.1 (Jan 21, 2026)
0.9.0 (Jan 21, 2026)
0.8.0 (Jan 21, 2026)
0.7.0 (Jan 21, 2026)
0.5.0 (Jan 21, 2026)
Files in release
chonkie_core-0.10.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (368.4KiB)
chonkie_core-0.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385.6KiB)
chonkie_core-0.10.1-cp310-cp310-win_amd64.whl (225.5KiB)
chonkie_core-0.10.1-cp311-cp311-macosx_10_12_x86_64.whl (346.5KiB)
chonkie_core-0.10.1-cp311-cp311-macosx_11_0_arm64.whl (334.3KiB)
chonkie_core-0.10.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (368.2KiB)
chonkie_core-0.10.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385.5KiB)
chonkie_core-0.10.1-cp311-cp311-win_amd64.whl (225.8KiB)
chonkie_core-0.10.1-cp312-cp312-macosx_10_12_x86_64.whl (344.6KiB)
chonkie_core-0.10.1-cp312-cp312-macosx_11_0_arm64.whl (331.9KiB)
chonkie_core-0.10.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (364.4KiB)
chonkie_core-0.10.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (383.1KiB)
chonkie_core-0.10.1-cp312-cp312-win_amd64.whl (223.7KiB)
chonkie_core-0.10.1-cp313-cp313-macosx_10_12_x86_64.whl (344.1KiB)
chonkie_core-0.10.1-cp313-cp313-macosx_11_0_arm64.whl (331.6KiB)
chonkie_core-0.10.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (364.0KiB)
chonkie_core-0.10.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (382.4KiB)
chonkie_core-0.10.1-cp313-cp313-win_amd64.whl (224.0KiB)
chonkie_core-0.10.1-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (363.7KiB)
chonkie_core-0.10.1-cp314-cp314-macosx_10_12_x86_64.whl (344.5KiB)
chonkie_core-0.10.1-cp314-cp314-macosx_11_0_arm64.whl (331.8KiB)
chonkie_core-0.10.1-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (364.2KiB)
chonkie_core-0.10.1-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (382.0KiB)
chonkie_core-0.10.1-cp314-cp314-win_amd64.whl (223.6KiB)
chonkie_core-0.10.1-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (363.1KiB)
chonkie_core-0.10.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (370.9KiB)
chonkie_core-0.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (387.4KiB)
chonkie_core-0.10.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (370.8KiB)
chonkie_core-0.10.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (387.4KiB)
chonkie_core-0.10.1-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (369.8KiB)
chonkie_core-0.10.1-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (386.9KiB)
chonkie_core-0.10.1.tar.gz (52.1KiB)
Extras: None
Dependencies: numpy (>=1.20)