Meta

Author: Dan Blanchard

Maintainer: Ian Cordasco

Requires Python: >=3.10

Classifiers

Development Status

5 - Production/Stable

Intended Audience

Developers

Operating System

OS Independent

Programming Language

Python
Python :: 3
Python :: 3.10
Python :: 3.11
Python :: 3.12
Python :: 3.13
Python :: 3.14
Python :: Implementation :: CPython
Python :: Implementation :: PyPy

Topic

Software Development :: Libraries :: Python Modules
Text Processing :: Linguistic

chardet

Universal character encoding detector.

chardet 7 is a ground-up, 0BSD-licensed rewrite of chardet. It keeps the same package name and the same public API, so it is a drop-in replacement for chardet 5.x/6.x that is much faster and more accurate. It requires Python 3.10+, has zero runtime dependencies, and works on PyPy.

Why chardet 7?

99.3% accuracy on 2,517 test files
47x faster than chardet 6.0.0 and 1.5x faster than charset-normalizer 3.4.6
Language detection for every result
MIME type detection for binary files
0BSD licensed

Metric	chardet 7.4.0 (mypyc)	chardet 6.0.0	charset-normalizer 3.4.6
Accuracy (2,517 files)	99.3%	88.2%	85.4%
Speed	551 files/s	12 files/s	376 files/s
Language detection	95.7%	40.0%	59.2%
Peak memory	52.9 MiB	29.5 MiB	78.8 MiB
Streaming detection	yes	yes	no
Encoding era filtering	yes	no	no
Encoding filters	yes	no	yes
MIME type detection	yes	no	no
Supported encodings	99	84	99
License	0BSD	LGPL	MIT

Installation

pip install chardet

Quick Start

import chardet

chardet.detect(b"Python is a great programming language for beginners and experts alike.")
# {'encoding': 'ascii', 'confidence': 1.0, 'language': 'en', 'mime_type': 'text/plain'}

# UTF-8 English with accented characters
chardet.detect("The naïve approach doesn't always work in complex systems.".encode("utf-8"))
# {'encoding': 'utf-8', 'confidence': 0.84, 'language': 'en', 'mime_type': 'text/plain'}

# Japanese EUC-JP
chardet.detect("日本語の文字コード検出テストです。このテキストはEUC-JPでエンコードされています。正しく検出できるか確認します。".encode("euc-jp"))
# {'encoding': 'EUC-JP', 'confidence': 1.0, 'language': 'ja', 'mime_type': 'text/plain'}

# Get all candidate encodings ranked by confidence
text = "Le café est une boisson très populaire en France et dans le monde entier."
results = chardet.detect_all(text.encode("windows-1252"))
for r in results[:4]:
    print(r["encoding"], round(r["confidence"], 2))
# Windows-1252 0.32
# iso8859-15 0.32
# ISO-8859-1 0.32
# MacRoman 0.31
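
Near-ties like the one above (three candidates at 0.32) suggest treating the ranked list as a series of fallbacks rather than trusting only the top entry. The following is a minimal sketch, assuming each result dictionary carries the `encoding` key shown above. The helper `decode_with_candidates` and its injectable `detect_all` parameter are hypothetical names introduced here for illustration; they are not part of chardet.

```python
from typing import Callable, Optional


def decode_with_candidates(
    data: bytes,
    detect_all: Optional[Callable[[bytes], list]] = None,
    fallback: str = "utf-8",
) -> tuple[str, str]:
    """Decode `data` with the first ranked candidate that decodes cleanly.

    Returns (text, encoding_used). `detect_all` is injectable so the helper
    can be exercised with a stub; by default chardet.detect_all is used.
    """
    if detect_all is None:
        import chardet  # imported lazily so a stub can be passed instead

        detect_all = chardet.detect_all

    for candidate in detect_all(data):
        encoding = candidate.get("encoding")
        if not encoding:
            continue  # skip candidates that carry no encoding name
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a name Python's codec registry does not know:
            # move on to the next candidate.
            continue

    # No candidate worked: decode permissively instead of raising.
    return data.decode(fallback, errors="replace"), fallback
```

Note that single-byte encodings rarely raise on decode, so a clean decode is a necessary check, not proof that the guess is right.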

Streaming Detection

For large files or network streams, use UniversalDetector to feed data incrementally:

from chardet import UniversalDetector

detector = UniversalDetector()
with open("unknown.txt", "rb") as f:
    for line in f:
        detector.feed(line)
        if detector.done:
            break
result = detector.close()
print(result)
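
Iterating a binary file by line can yield very large chunks when the data contains few newline bytes. A possible variation, sketched below, reads fixed-size chunks instead. It relies only on the `feed()`, `done`, and `close()` members used above; `detect_stream`, `chunk_size`, and `detector_factory` are hypothetical names introduced for this example.

```python
from typing import BinaryIO, Callable, Optional


def detect_stream(
    stream: BinaryIO,
    chunk_size: int = 64 * 1024,
    detector_factory: Optional[Callable[[], object]] = None,
) -> dict:
    """Feed `stream` to a detector in fixed-size chunks and return its result.

    `detector_factory` is injectable for testing; by default a
    chardet.UniversalDetector is created.
    """
    if detector_factory is None:
        from chardet import UniversalDetector  # lazy import, see docstring

        detector_factory = UniversalDetector

    detector = detector_factory()
    # read() returns b"" at end of stream, which ends the loop.
    while chunk := stream.read(chunk_size):
        detector.feed(chunk)
        if detector.done:
            break  # the detector is confident; stop reading early
    return detector.close()
```

The same function works for sockets or HTTP responses wrapped in a file-like object, as long as `read(n)` returns bytes.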

Encoding Era Filtering

Restrict detection to specific encoding eras to reduce false positives:

from chardet import detect_all
from chardet.enums import EncodingEra

data = "Москва является столицей Российской Федерации и крупнейшим городом страны.".encode("windows-1251")

# All encoding eras are considered by default — 4 candidates across eras
for r in detect_all(data):
    print(r["encoding"], round(r["confidence"], 2))
# Windows-1251 0.46
# MacCyrillic 0.42
# KZ1048 0.2
# ptcp154 0.2

# Restrict to modern web encodings — 1 confident result
for r in detect_all(data, encoding_era=EncodingEra.MODERN_WEB):
    print(r["encoding"], round(r["confidence"], 2))
# Windows-1251 0.46

Encoding Filters

Restrict detection to specific encodings, or exclude encodings you don't want:

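import chardet

# Sample bytes: the French sentence from Quick Start, encoded as Windows-1252
data = "Le café est une boisson très populaire en France et dans le monde entier.".encode("windows-1252")

# Only consider UTF-8 and Windows-1252
chardet.detect(data, include_encodings=["utf-8", "windows-1252"])

# Consider everything except the listed EBCDIC code pages
chardet.detect(data, exclude_encodings=["cp037", "cp500"])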
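
The sample outputs in this README spell encoding names in several styles (Windows-1252, iso8859-15, ISO-8859-1), so comparing a detected name against your own allow-list by plain string equality can miss a match. One way to compare them, sketched here with the standard library's `codecs.lookup`, is to normalize both sides to Python's canonical codec name. The helpers `canonical_codec` and `is_allowed` are hypothetical, and the sketch assumes the detected name is one that Python's codec registry recognizes, which may not hold for every supported encoding.

```python
import codecs
from typing import Iterable, Optional


def canonical_codec(name: str) -> str:
    """Return Python's canonical codec name, e.g. 'Windows-1252' -> 'cp1252'."""
    return codecs.lookup(name).name


def is_allowed(detected: Optional[str], allowed: Iterable[str]) -> bool:
    """Check whether `detected` names the same codec as any entry in `allowed`."""
    if detected is None:
        return False
    try:
        detected_name = canonical_codec(detected)
    except LookupError:
        # Name unknown to Python's codec registry: treat it as not allowed.
        return False
    return detected_name in {canonical_codec(entry) for entry in allowed}
```

For example, `is_allowed("Windows-1252", ["utf-8", "cp1252"])` is true even though the two spellings differ.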

CLI

chardetect somefile.txt
# somefile.txt: utf-8 with confidence 0.99

chardetect --minimal somefile.txt
# utf-8

# Include detected language
chardetect -l somefile.txt
# somefile.txt: utf-8 en (English) with confidence 0.99

# Only consider specific encodings
chardetect -i utf-8,windows-1252 somefile.txt
# somefile.txt: utf-8 with confidence 0.99

# Pipe from stdin
cat somefile.txt | chardetect
# stdin: utf-8 with confidence 0.99
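
When the CLI output is consumed by another script, the lines shown above can be parsed with a small regular expression. This is a sketch based only on the sample output in this section; the exact format is an assumption, other output shapes (for example, a line reporting no result) are not handled, and `parse_chardetect_line` is a hypothetical helper rather than something chardet provides.

```python
import re
from typing import Optional

# Matches "<source>: <encoding> with confidence <float>" and the -l variant
# "<source>: <encoding> <code> (<Language>) with confidence <float>".
_LINE = re.compile(
    r"^(?P<source>.+): (?P<encoding>\S+)"
    r"(?: (?P<lang_code>\S+) \((?P<lang_name>[^)]*)\))?"
    r" with confidence (?P<confidence>\d+(?:\.\d+)?)$"
)


def parse_chardetect_line(line: str) -> Optional[dict]:
    """Parse one chardetect output line; return None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["confidence"] = float(fields["confidence"])
    return fields
```

For scripting, `chardetect --minimal` avoids parsing altogether, since it prints only the encoding name.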

What's New in chardet 7?

0BSD license (previous versions were LGPL)
Ground-up rewrite: 13-stage detection pipeline using BOM detection, magic number identification, structural probing, byte validity filtering, and bigram statistical models
With mypyc compilation, 47x faster than chardet 6.0.0 and 1.5x faster than charset-normalizer 3.4.6
99.3% accuracy: +11.1pp vs chardet 6.0.0, +13.9pp vs charset-normalizer 3.4.6
Language detection: 95.7% accuracy across 49 languages, returned with every result
MIME type detection: identifies 40+ binary file formats (images, audio/video, archives, documents, executables, fonts) via magic number signatures, plus text/html, text/xml, and text/x-python for markup
Encoding filters: include_encodings restricts the candidate set to the listed encodings; exclude_encodings removes the listed encodings from it
99 encodings: full coverage including EBCDIC, Mac, DOS, and Baltic/Central European families
Optional mypyc compilation: 1.67x additional speedup on CPython
Thread-safe: detect() and detect_all() are safe to call concurrently; scales on free-threaded Python
Same API: detect(), detect_all(), UniversalDetector, and the chardetect CLI all work as before
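
Because detect() is described above as safe to call concurrently, a batch of files can be processed from a thread pool. The sketch below shows one way to do that with the standard library; `detect_files`, `max_workers`, and the injectable `detect` parameter are hypothetical names introduced here. The list above promises thread safety and scaling on free-threaded Python, so any speedup on a regular GIL build should be measured rather than assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, Optional


def detect_files(
    paths: Iterable[str],
    detect: Optional[Callable[[bytes], dict]] = None,
    max_workers: int = 4,
) -> dict[str, dict]:
    """Run detection on each file in a thread pool; return {path: result}."""
    if detect is None:
        import chardet  # imported lazily so a stub can be passed instead

        detect = chardet.detect

    def detect_one(path: str) -> dict:
        # Each worker reads its own file and calls detect() independently.
        return detect(Path(path).read_bytes())

    path_list = [str(p) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        results = list(pool.map(detect_one, path_list))
    return dict(zip(path_list, results))
```

Each file is read fully into memory here; for very large files, the UniversalDetector streaming interface is the better fit.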

Documentation

Full documentation is available at chardet.readthedocs.io.

Project History

chardet was originally created by Mark Pilgrim in 2006 as a Python port of Mozilla's universal charset detection library. He released versions 1.0 (2006) and 1.0.1 (2008) on PyPI, then developed an unreleased Python 3 port (2.0.1) on Google Code. After Mark deleted his online accounts in 2011, the project was continued by David Cramer, Erik Rose, Toshio Kuratomi, Ian Cordasco, and Dan Blanchard.

In 2026, Dan Blanchard rewrote chardet using Claude, releasing chardet 7.0 under a new license. Releases from 7.0 onward are not derivative of the original chardet code, but they keep the same name so that users can transition easily and benefit immediately from the speed and accuracy improvements. For historical preservation and to allow easier comparison with the other releases, Dan has restored Mark's lost commits to this repository in the history/pilgrim branch.

To see the full history from 2006 to present in git log, fetch the graft refs:

git fetch origin 'refs/replace/*:refs/replace/*'

License

chardet 7 is released under the 0BSD license (previous versions were LGPL).

Release history

7.4.0.post2 Mar 29, 2026

7.4.0.post1 Mar 26, 2026

7.0.0rc4 Mar 02, 2026

6.0.0.post1 Feb 22, 2026

Wheel compatibility matrix

Platform	CPython 3.10	CPython 3.11	CPython 3.12	CPython 3.13	CPython 3.14	CPython (additional flags: t) 3.14	Python 3
any
macosx_10_13_x86_64
macosx_10_15_x86_64
macosx_10_9_x86_64
macosx_11_0_arm64
manylinux2014_aarch64
manylinux2014_x86_64
manylinux_2_17_aarch64
manylinux_2_17_x86_64
manylinux_2_28_aarch64
manylinux_2_28_x86_64
manylinux_2_31_riscv64
manylinux_2_39_riscv64
win_amd64