cchardet 2.1.7


pip install cchardet

Released: Oct 27, 2020

Meta
Author: PyYoshi

Classifiers

License
  • OSI Approved :: Mozilla Public License 1.1 (MPL 1.1)
  • OSI Approved :: GNU General Public License (GPL)
  • OSI Approved :: GNU Library or Lesser General Public License (LGPL)

Programming Language
  • Python :: 3
  • Python :: 3.6
  • Python :: 3.7
  • Python :: 3.8
  • Python :: 3.9

Topic
  • Software Development :: Libraries

cChardet

cChardet is a high-speed universal character encoding detector, implemented as a binding to uchardet.

Supported Languages/Encodings

  • International (Unicode)

    • UTF-8

    • UTF-16BE / UTF-16LE

    • UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431

  • Arabic

    • ISO-8859-6

    • WINDOWS-1256

  • Bulgarian

    • ISO-8859-5

    • WINDOWS-1251

  • Chinese

    • ISO-2022-CN

    • BIG5

    • EUC-TW

    • GB18030

    • HZ-GB-2312

  • Croatian

    • ISO-8859-2

    • ISO-8859-13

    • ISO-8859-16

    • Windows-1250

    • IBM852

    • MAC-CENTRALEUROPE

  • Czech

    • Windows-1250

    • ISO-8859-2

    • IBM852

    • MAC-CENTRALEUROPE

  • Danish

    • ISO-8859-1

    • ISO-8859-15

    • WINDOWS-1252

  • English

    • ASCII

  • Esperanto

    • ISO-8859-3

  • Estonian

    • ISO-8859-4

    • ISO-8859-13

    • Windows-1252

    • Windows-1257

  • Finnish

    • ISO-8859-1

    • ISO-8859-4

    • ISO-8859-9

    • ISO-8859-13

    • ISO-8859-15

    • WINDOWS-1252

  • French

    • ISO-8859-1

    • ISO-8859-15

    • WINDOWS-1252

  • German

    • ISO-8859-1

    • WINDOWS-1252

  • Greek

    • ISO-8859-7

    • WINDOWS-1253

  • Hebrew

    • ISO-8859-8

    • WINDOWS-1255

  • Hungarian

    • ISO-8859-2

    • WINDOWS-1250

  • Irish Gaelic

    • ISO-8859-1

    • ISO-8859-9

    • ISO-8859-15

    • WINDOWS-1252

  • Italian

    • ISO-8859-1

    • ISO-8859-3

    • ISO-8859-9

    • ISO-8859-15

    • WINDOWS-1252

  • Japanese

    • ISO-2022-JP

    • SHIFT_JIS

    • EUC-JP

  • Korean

    • ISO-2022-KR

    • EUC-KR / UHC

  • Lithuanian

    • ISO-8859-4

    • ISO-8859-10

    • ISO-8859-13

  • Latvian

    • ISO-8859-4

    • ISO-8859-10

    • ISO-8859-13

  • Maltese

    • ISO-8859-3

  • Polish

    • ISO-8859-2

    • ISO-8859-13

    • ISO-8859-16

    • Windows-1250

    • IBM852

    • MAC-CENTRALEUROPE

  • Portuguese

    • ISO-8859-1

    • ISO-8859-9

    • ISO-8859-15

    • WINDOWS-1252

  • Romanian

    • ISO-8859-2

    • ISO-8859-16

    • Windows-1250

    • IBM852

  • Russian

    • ISO-8859-5

    • KOI8-R

    • WINDOWS-1251

    • MAC-CYRILLIC

    • IBM866

    • IBM855

  • Slovak

    • Windows-1250

    • ISO-8859-2

    • IBM852

    • MAC-CENTRALEUROPE

  • Slovene

    • ISO-8859-2

    • ISO-8859-16

    • Windows-1250

    • IBM852

    • M

Example

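# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() expects bytes, not str.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with "encoding" and "confidence" keys.
result = chardet.detect(msg)
print(result)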
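
The result of detect() is a dict with "encoding" and "confidence" keys, and "encoding" can be None when nothing is recognized. Some of the names in the list of supported encodings (EUC-TW, for example) also have no matching codec in the Python standard library. The following is a minimal sketch of decoding with such a result while handling both cases. The helper name decode_detected, its fallback parameter, and the hand-written result dict are illustrative assumptions and are not part of cchardet.

```python
import codecs


def decode_detected(data, result, fallback="utf-8"):
    """Decode bytes using a detect()-style result dict (hypothetical helper).

    Falls back to `fallback` when no encoding was reported or when Python
    has no codec registered under the reported name.
    """
    encoding = result.get("encoding")
    if encoding is not None:
        try:
            codecs.lookup(encoding)
        except LookupError:
            encoding = None  # name unknown to Python; use the fallback
    return data.decode(encoding or fallback, errors="replace")


# Hand-written stand-in for a detect() result, so the sketch runs
# without cchardet installed.
sample = "千夜一夜物語".encode("shift_jis")
text = decode_detected(sample, {"encoding": "SHIFT_JIS", "confidence": 0.99})
print(text)
```

In real use, the second argument would be the dict returned by cchardet.detect(data). Whether errors="replace" is acceptable depends on the application; strict decoding may be preferable when silent substitution would hide problems.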

Benchmark

$ cd src/
$ pip install chardet
$ python tests/bench.py

Results

CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz

RAM: DDR3 1600MHz 16GB

Platform: Ubuntu 16.04 amd64

Python 3.6.1

Library            Request (call/s)
chardet v3.0.2     0.35
cchardet v2.0.1    1467.77
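
A calls-per-second figure like the ones above can be obtained by timing repeated detect() calls on the same input. The sketch below shows one way to do that with the standard library only. It is not the project's tests/bench.py; the function calls_per_second and the stand-in fake_detect are assumptions made for illustration.

```python
import time


def calls_per_second(detect, payload, repeat=100):
    """Return how many detect(payload) calls complete per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(payload)
    elapsed = time.perf_counter() - start
    return repeat / elapsed if elapsed > 0 else float("inf")


def fake_detect(data):
    """Stand-in detector so the sketch runs without third-party packages."""
    return {"encoding": "ASCII" if data.isascii() else None, "confidence": 1.0}


rate = calls_per_second(fake_detect, b"hello world" * 1000)
print(f"{rate:.2f} call/s")
```

To compare the two libraries, pass chardet.detect and cchardet.detect in place of fake_detect and use the same payload for both.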

LICENSE

See COPYING file.

Platform

Support

  • Windows i686, x86_64

  • Linux i686, x86_64

  • macOS x86_64

Do not Support

CHANGES

2.1.7 (2020-10-27)

  • support Python 3.9

  • drop support for Python 3.5

2.1.6 (2020-03-17)

  • drop support for Python 2.7

  • support GitHub Actions

  • update dev-dependencies

2.1.5 (2019-09-27)

  • update language models (uchardet)

  • add ISO-8859-2 test (currently disabled)

  • support Python 3.8

  • drop support for Python 3.4

2.1.4 (2018-09-27)

  • disable LTO because it degraded performance

2.1.3 (2018-09-26)

  • support Python 3.7

2.1.2 (2018-09-26)

  • enable LTO for wheel builds

  • update Cython

2.1.1 (2017-07-01)

  • fix different results with different chunk sizes

  • fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior

  • include COPYING in package

2.1.0 (2017-05-15)

2.0.1 (2017-04-25)

  • fix an issue where UTF-8 with a BOM would not be detected as UTF-8-SIG (fix #28)

  • allow passing NULL bytes to feed() / detect() (fix #27)

2.0.0 (2017-04-06)

  • Improve tests

2.0a4 (2017-04-05)

  • Update uchardet repo (Fix buffer overflow)

2.0a3 (2017-03-29)

  • Implement UniversalDetector (like chardet)

2.0a2 (2017-03-28)

  • Update uchardet repo (Fix memory leak)

2.0a1 (2017-03-28)

1.1.3 (2017-02-26)

  • Support AArch64

1.1.2 (2017-01-08)

  • Support Python 3.6

1.1.1 (2016-11-05)

  • Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)

  • Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)

  • Support manylinux1 wheel

1.1.0 (2016-10-17)

  • Add Detector class

  • Improve unit tests

Wheel compatibility matrix

Platform              CPython 3.6  CPython 3.7  CPython 3.8  CPython 3.9
macosx_10_9_x86_64    ✔            ✔            ✔            ✔
manylinux1_i686       ✔            ✔            ✔            ✔
manylinux1_x86_64     ✔            ✔            ✔            ✔
manylinux2010_i686    ✔            ✔            ✔            ✔
manylinux2010_x86_64  ✔            ✔            ✔            ✔
win32                 ✔            ✔            ✔            ✔
win_amd64             ✔            ✔            ✔            ✔

Files in release

cchardet-2.1.7-cp36-cp36m-macosx_10_9_x86_64.whl (121.2KiB)
cchardet-2.1.7-cp36-cp36m-manylinux1_i686.whl (248.4KiB)
cchardet-2.1.7-cp36-cp36m-manylinux1_x86_64.whl (257.1KiB)
cchardet-2.1.7-cp36-cp36m-manylinux2010_i686.whl (248.4KiB)
cchardet-2.1.7-cp36-cp36m-manylinux2010_x86_64.whl (257.1KiB)
cchardet-2.1.7-cp36-cp36m-win32.whl (109.0KiB)
cchardet-2.1.7-cp36-cp36m-win_amd64.whl (112.4KiB)
cchardet-2.1.7-cp37-cp37m-macosx_10_9_x86_64.whl (121.1KiB)
cchardet-2.1.7-cp37-cp37m-manylinux1_i686.whl (249.1KiB)
cchardet-2.1.7-cp37-cp37m-manylinux1_x86_64.whl (257.5KiB)
cchardet-2.1.7-cp37-cp37m-manylinux2010_i686.whl (249.1KiB)
cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl (257.5KiB)
cchardet-2.1.7-cp37-cp37m-win32.whl (108.9KiB)
cchardet-2.1.7-cp37-cp37m-win_amd64.whl (112.2KiB)
cchardet-2.1.7-cp38-cp38-macosx_10_9_x86_64.whl (121.2KiB)
cchardet-2.1.7-cp38-cp38-manylinux1_i686.whl (251.1KiB)
cchardet-2.1.7-cp38-cp38-manylinux1_x86_64.whl (259.8KiB)
cchardet-2.1.7-cp38-cp38-manylinux2010_i686.whl (251.1KiB)
cchardet-2.1.7-cp38-cp38-manylinux2010_x86_64.whl (259.8KiB)
cchardet-2.1.7-cp38-cp38-win32.whl (109.2KiB)
cchardet-2.1.7-cp38-cp38-win_amd64.whl (112.5KiB)
cchardet-2.1.7-cp39-cp39-macosx_10_9_x86_64.whl (121.4KiB)
cchardet-2.1.7-cp39-cp39-manylinux1_i686.whl (250.3KiB)
cchardet-2.1.7-cp39-cp39-manylinux1_x86_64.whl (259.2KiB)
cchardet-2.1.7-cp39-cp39-manylinux2010_i686.whl (250.3KiB)
cchardet-2.1.7-cp39-cp39-manylinux2010_x86_64.whl (259.2KiB)
cchardet-2.1.7-cp39-cp39-win32.whl (109.2KiB)
cchardet-2.1.7-cp39-cp39-win_amd64.whl (112.4KiB)
cchardet-2.1.7.tar.gz (638.3KiB)
No dependencies
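
The wheel file names in this release follow the standard wheel naming layout: name, version, Python tag, ABI tag, and platform tag, joined by hyphens. The sketch below splits a file name into those parts, which is enough to see which interpreter and platform a given file targets. WheelTags and parse_wheel_filename are hypothetical names introduced for this example and are not provided by cchardet or pip.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel file name into its hyphen-separated tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_filename("cchardet-2.1.7-cp39-cp39-win_amd64.whl")
print(tags.python_tag, tags.platform_tag)
```

pip performs this matching itself when installing, so the helper is only useful for inspecting a list of release files by hand.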