Universal encoding detector
Author: Mark Pilgrim
chardet guesses the encoding of text files.
Detects:
ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
EUC-KR, ISO-2022-KR (Korean)
KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
ISO-8859-2, windows-1250 (Hungarian)
ISO-8859-5, windows-1251 (Bulgarian)
windows-1252 (English)
ISO-8859-7, windows-1253 (Greek)
ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
TIS-620 (Thai)
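Some of the Unicode variants listed above can be recognised from a byte order mark (BOM) before any statistical analysis is needed. The following sketch illustrates that idea using only Python's standard `codecs` module. It is not chardet's own code. The function name `sniff_bom` and the returned labels are hypothetical, and it only handles the standard big- and little-endian byte orders.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs must be tested first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (encoding label, payload without the BOM), or None if no BOM.

    Hypothetical helper for illustration. Note the inherent ambiguity: UTF-16-LE
    text whose first character is U+0000 looks like a UTF-32-LE BOM.
    """
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label, data[len(bom):]
    return None  # no BOM: a real detector would now analyse the bytes statistically
```

Each label is also a name Python's codecs accept, so `payload.decode(label)` works on the returned bytes.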
Requires Python 2.1 or later.
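From Python code, the main entry point is `chardet.detect()`, which takes a bytes object and returns a dict whose `encoding` and `confidence` keys hold the guess. The following is a minimal sketch of decoding bytes with that guess. It is written for Python 3. The helper name `decode_guessing`, the UTF-8 fallback, and the `detect` parameter are assumptions of this example and not part of chardet. The `detect` parameter exists only so the function can be tried without chardet installed.

```python
from typing import Callable, Optional


def decode_guessing(
    data: bytes,
    detect: Optional[Callable[[bytes], dict]] = None,
) -> str:
    """Decode `data` with the encoding chardet guesses for it (hypothetical helper)."""
    if detect is None:
        import chardet  # third-party package, imported only when actually needed
        detect = chardet.detect
    guess = detect(data)
    encoding = guess.get("encoding")  # may be None when no guess could be made
    if encoding is not None:
        try:
            # errors="replace" keeps decoding even if the guess is slightly off
            return data.decode(encoding, errors="replace")
        except LookupError:
            pass  # the reported name has no matching Python codec
    # No usable guess: fall back to UTF-8 (an arbitrary choice for this sketch)
    return data.decode("utf-8", errors="replace")
```

With chardet installed, `decode_guessing(raw_bytes)` is all a caller needs. The `confidence` value is ignored here, but a stricter caller could reject low-confidence guesses.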
Command-line Tool
chardet comes with a command-line script which reports on the encodings of one or more files:
    % chardetect.py somefile someotherfile
    somefile: windows-1252 with confidence 0.5
    someotherfile: ascii with confidence 1.0
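A report in that format can be produced with chardet's incremental `UniversalDetector` class. The class is fed data piece by piece and sets `done` once it is confident, so large files need not be read in full. The following is a rough sketch of such a script, not the actual chardetect.py source. The function names are hypothetical. The chardet import is placed inside `detect_file` so the formatting helper can be used without chardet installed.

```python
from typing import Iterable, Optional


def report_line(name: str, encoding: Optional[str], confidence: float) -> str:
    # Mirrors the "name: encoding with confidence N" lines of the command-line tool
    return f"{name}: {encoding} with confidence {confidence}"


def detect_file(path: str) -> dict:
    from chardet.universaldetector import UniversalDetector  # third-party

    detector = UniversalDetector()
    with open(path, "rb") as f:
        for line in f:
            detector.feed(line)
            if detector.done:  # confident already; stop reading early
                break
    detector.close()  # finalises detector.result
    return detector.result


def main(paths: Iterable[str]) -> None:
    for path in paths:
        result = detect_file(path)
        print(report_line(path, result["encoding"], result["confidence"]))
```

Saved as a script, it could end with `main(sys.argv[1:])` (after `import sys`) to take file names from the command line.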
Release History
5.2.0 (Aug 01, 2023)
5.1.0 (Dec 01, 2022)
5.0.0 (Jun 25, 2022)
4.0.0 (Dec 10, 2020)
3.0.4 (Jun 08, 2017)
3.0.3 (May 16, 2017)
3.0.2 (Apr 12, 2017)
3.0.1 (Apr 11, 2017)
3.0.0 (Apr 11, 2017)
2.3.0 (Oct 07, 2014)
2.2.1 (Dec 18, 2013)
2.1.1 (Oct 01, 2012)
1.1 (Jul 27, 2012)
1.0.1 (Apr 19, 2008)
1.0 (Dec 23, 2006)
No dependencies