chonkie-core 0.10.1


pip install chonkie-core

Released: Mar 30, 2026


Meta
Author: Bhavnick Minhas
Requires Python: >=3.8

Classifiers

Development Status
  • 4 - Beta

Intended Audience
  • Developers

License
  • OSI Approved :: MIT License
  • OSI Approved :: Apache Software License

Programming Language
  • Rust
  • Python :: Implementation :: CPython
  • Python :: 3
  • Python :: 3.8
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12

Topic
  • Text Processing

chonkie-core

the fastest text chunking library: up to 1 TB/s throughput


you know how every chunking library claims to be fast? yeah, we actually meant it.

chonkie-core splits text at semantic boundaries (periods, newlines, the usual suspects) and does it stupid fast. we're talking "chunk the entire english wikipedia in 120ms" fast.

want to know how? read the blog post where we nerd out about SIMD instructions and lookup tables.
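
for a rough picture of what "split at a delimiter, within a size limit" means, here is a minimal pure-python sketch. it is not chonkie-core's implementation (no SIMD, no lookup tables), and the function name `chunk_at_delimiters` is hypothetical. it assumes the simplest strategy: look backward inside a size-limited window for the last delimiter byte, and hard-cut at the size limit if there is none.

```python
from typing import Iterator


def chunk_at_delimiters(
    data: bytes, size: int = 4096, delimiters: bytes = b"\n.?"
) -> Iterator[memoryview]:
    """Yield zero-copy slices of `data`, each at most `size` bytes long."""
    if size <= 0:
        raise ValueError("size must be positive")
    view = memoryview(data)
    start, n = 0, len(data)
    while start < n:
        end = min(start + size, n)
        if end < n:
            # search backward inside the window for the last delimiter byte
            cut = max(
                (data.rfind(bytes([d]), start, end) for d in delimiters),
                default=-1,
            )
            if cut >= start:
                end = cut + 1  # keep the delimiter at the end of the chunk
            # otherwise: no delimiter in the window, hard cut at `size` bytes
        yield view[start:end]
        start = end


text = "Hello world. How are you? I'm fine.\nThanks for asking."
chunks = [bytes(c) for c in chunk_at_delimiters(text.encode("utf-8"), size=20)]
print(chunks)
```

every input byte lands in exactly one chunk, so joining the chunks gives back the original bytes.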

📦 installation

pip install chonkie-core

looking for rust or javascript?

🚀 usage

from chonkie_core import Chunker

text = "Hello world. How are you? I'm fine.\nThanks for asking."

# with defaults (4KB chunks, split at \n . ?)
for chunk in Chunker(text):
    print(bytes(chunk))

# with custom size
for chunk in Chunker(text, size=1024):
    print(bytes(chunk))

# with custom delimiters
for chunk in Chunker(text, delimiters=".?!\n"):
    print(bytes(chunk))

# with multi-byte pattern (e.g., metaspace ▁ for SentencePiece tokenizers)
for chunk in Chunker(text, pattern="▁", prefix=True):
    print(bytes(chunk))

# with consecutive pattern handling (split at START of runs, not middle)
for chunk in Chunker("word   next", pattern=" ", consecutive=True):
    print(bytes(chunk))

# with forward fallback (search forward if no pattern in backward window)
for chunk in Chunker(text, pattern=" ", forward_fallback=True):
    print(bytes(chunk))

# collect all chunks
chunks = list(Chunker(text))
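
the "split at START of runs, not middle" comment above can be pictured with a small stdlib-only sketch. the helper `run_start` is hypothetical and only shows the idea of moving a split point to the first match in a run of repeated patterns. how chonkie-core does this internally, and which side of the split the pattern ends up on, is not specified here.

```python
def run_start(data: bytes, pos: int, pattern: bytes) -> int:
    """Walk left from a match at `pos` to the first match of the same run."""
    step = len(pattern)
    while pos >= step and data[pos - step:pos] == pattern:
        pos -= step
    return pos


data = b"word   next"
last = data.rfind(b" ")              # a backward search lands on the last space
first = run_start(data, last, b" ")  # move to the first space of the run

print(data[:last], data[last:])      # cut in the middle of the run
print(data[:first], data[first:])    # cut at the start of the run
```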

chunks are returned as memoryview objects (zero-copy slices of the original text).
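
to see what that means in practice, here is a stdlib-only sketch that uses a plain `memoryview` slice as a stand-in for a chunk (it does not import chonkie-core). slicing the view does not copy anything, while `bytes(chunk)` or `chunk.tobytes()` does, so only convert when you actually need `bytes` or `str`.

```python
data = "Hello world. How are you?".encode("utf-8")
view = memoryview(data)

chunk = view[:12]                 # slicing a memoryview does not copy the bytes
same_buffer = chunk.obj is data   # the slice still points at the original buffer

as_bytes = chunk.tobytes()        # this makes a copy
as_text = as_bytes.decode("utf-8")
print(same_buffer, len(chunk), as_text)
```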

๐Ÿ“ citation

if you use chonkie-core in your research, please cite it as follows:

@software{chunk2025,
  author = {Minhas, Bhavnick},
  title = {chunk: The fastest text chunking library},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/chonkie-inc/chunk}},
}

📄 license

licensed under either of Apache License, Version 2.0 or MIT license at your option.

Wheel compatibility matrix

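Platform: interpreters with a wheel in this release (t = additional flag t)
macosx_10_12_x86_64: CPython 3.11, 3.12, 3.13, 3.14
macosx_11_0_arm64: CPython 3.11, 3.12, 3.13, 3.14
manylinux2014_aarch64: CPython 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.13t, 3.14t; PyPy 3.11 (pp73)
manylinux2014_x86_64: CPython 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14; PyPy 3.11 (pp73)
manylinux_2_17_aarch64: CPython 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.13t, 3.14t; PyPy 3.11 (pp73)
manylinux_2_17_x86_64: CPython 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14; PyPy 3.11 (pp73)
win_amd64: CPython 3.10, 3.11, 3.12, 3.13, 3.14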

Files in release

chonkie_core-0.10.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (368.4KiB)
chonkie_core-0.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385.6KiB)
chonkie_core-0.10.1-cp310-cp310-win_amd64.whl (225.5KiB)
chonkie_core-0.10.1-cp311-cp311-macosx_10_12_x86_64.whl (346.5KiB)
chonkie_core-0.10.1-cp311-cp311-macosx_11_0_arm64.whl (334.3KiB)
chonkie_core-0.10.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (368.2KiB)
chonkie_core-0.10.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385.5KiB)
chonkie_core-0.10.1-cp311-cp311-win_amd64.whl (225.8KiB)
chonkie_core-0.10.1-cp312-cp312-macosx_10_12_x86_64.whl (344.6KiB)
chonkie_core-0.10.1-cp312-cp312-macosx_11_0_arm64.whl (331.9KiB)
chonkie_core-0.10.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (364.4KiB)
chonkie_core-0.10.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (383.1KiB)
chonkie_core-0.10.1-cp312-cp312-win_amd64.whl (223.7KiB)
chonkie_core-0.10.1-cp313-cp313-macosx_10_12_x86_64.whl (344.1KiB)
chonkie_core-0.10.1-cp313-cp313-macosx_11_0_arm64.whl (331.6KiB)
chonkie_core-0.10.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (364.0KiB)
chonkie_core-0.10.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (382.4KiB)
chonkie_core-0.10.1-cp313-cp313-win_amd64.whl (224.0KiB)
chonkie_core-0.10.1-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (363.7KiB)
chonkie_core-0.10.1-cp314-cp314-macosx_10_12_x86_64.whl (344.5KiB)
chonkie_core-0.10.1-cp314-cp314-macosx_11_0_arm64.whl (331.8KiB)
chonkie_core-0.10.1-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (364.2KiB)
chonkie_core-0.10.1-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (382.0KiB)
chonkie_core-0.10.1-cp314-cp314-win_amd64.whl (223.6KiB)
chonkie_core-0.10.1-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (363.1KiB)
chonkie_core-0.10.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (370.9KiB)
chonkie_core-0.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (387.4KiB)
chonkie_core-0.10.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (370.8KiB)
chonkie_core-0.10.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (387.4KiB)
chonkie_core-0.10.1-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (369.8KiB)
chonkie_core-0.10.1-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (386.9KiB)
chonkie_core-0.10.1.tar.gz (52.1KiB)
Extras: None
Dependencies:
numpy (>=1.20)