chardet

Meta

Author: Mark Pilgrim

Maintainer: Daniel Blanchard

Requires Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*

Classifiers

Development Status

5 - Production/Stable

Intended Audience

Developers

License

OSI Approved :: GNU Library or Lesser General Public License (LGPL)

Operating System

OS Independent

Programming Language

Python
Python :: 2
Python :: 2.7
Python :: 3
Python :: 3.5
Python :: 3.6
Python :: 3.7
Python :: 3.8
Python :: 3.9
Python :: Implementation :: CPython
Python :: Implementation :: PyPy

Topic

Software Development :: Libraries :: Python Modules
Text Processing :: Linguistic

Chardet: The Universal Character Encoding Detector

Detects

ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
EUC-KR, ISO-2022-KR (Korean)
KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
ISO-8859-5, windows-1251 (Bulgarian)
ISO-8859-1, windows-1252 (Western European languages)
ISO-8859-7, windows-1253 (Greek)
ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
TIS-620 (Thai)

Requires Python 2.7 or 3.5+.
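The UTF-16 and UTF-32 variants listed above can often be recognized from a leading byte-order mark (BOM) alone. The following is a simplified, stdlib-only sketch of that first step. It is not chardet's own code. The function name `sniff_bom` is hypothetical, and the names used for the two unusual UCS-4 byte orders are assumed to match what chardet reports. A real detector must also handle text that has no BOM, which this sketch does not attempt.

```python
import codecs

# Hypothetical BOM table for illustration (not chardet's internals).
# Longer marks come first: the UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), and FE FF 00 00 starts with the UTF-16 BE BOM (FE FF).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32"),
    (codecs.BOM_UTF32_BE, "UTF-32"),
    (b"\xfe\xff\x00\x00", "X-ISO-10646-UCS-4-3412"),  # assumed name, unusual octet order
    (b"\x00\x00\xff\xfe", "X-ISO-10646-UCS-4-2143"),  # assumed name, unusual octet order
    (codecs.BOM_UTF8, "UTF-8-SIG"),
    (codecs.BOM_UTF16_LE, "UTF-16"),
    (codecs.BOM_UTF16_BE, "UTF-16"),
]


def sniff_bom(data):
    """Return (encoding_name, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: a statistical detector would have to take over from here.
    return None, 0
```

The order of the table is the main design choice. Checking the four-byte marks before the two-byte ones prevents a UTF-32 file from being misreported as UTF-16.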

Installation

Install from PyPI:

pip install chardet

Documentation

For users, docs are now available at https://chardet.readthedocs.io/.

Command-line Tool

chardet comes with a command-line script, chardetect, that reports the detected encoding of one or more files:

% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
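A similar report can be produced from Python. This sketch assumes chardet is installed. It uses the `UniversalDetector` class (`feed()`, `done`, `close()`, `result`) to read each file incrementally and stops early once the detector is confident. The helper names `detect_file` and `report` are hypothetical. They only imitate the output format shown above and are not the real `chardetect` script.

```python
from chardet.universaldetector import UniversalDetector


def detect_file(path):
    """Return chardet's result dict (encoding, confidence, language) for one file."""
    detector = UniversalDetector()
    with open(path, "rb") as f:
        for line in f:
            detector.feed(line)
            # Stop reading once the detector has settled on an answer.
            if detector.done:
                break
    # close() finalizes the guess, even if the detector never reached "done".
    detector.close()
    return detector.result


def report(paths):
    """Print one line per file, in the same shape as the chardetect output above."""
    for path in paths:
        result = detect_file(path)
        print("{}: {} with confidence {}".format(
            path, result["encoding"], result["confidence"]))
```

For example, `report(["somefile", "someotherfile"])` prints one line per file. For a byte string that is already in memory, `chardet.detect(data)` returns the same kind of result dictionary in a single call.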

About

This is a continuation of Mark Pilgrim’s excellent chardet. Previously, two versions had to be maintained: one that supported Python 2.x and one that supported Python 3.x. We’ve recently merged with Ian Cordasco’s charade fork, so there is now one coherent version that works with Python 2.7 and 3.5+.