neologdn 0.5.6


pip install neologdn

Released: Dec 02, 2025


Meta
Author: Yukino Ikegami
Maintainer: Yukino Ikegami

Classifiers

Development Status
  • 4 - Beta

Intended Audience
  • Science/Research
  • Developers

Natural Language
  • Japanese

Programming Language
  • Cython
  • Python
  • Python :: 3
  • Python :: 3.8
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: 3.14
  • Python :: Free Threading

Topic
  • Text Processing :: Linguistic
  • Text Processing

neologdn

neologdn is a Japanese text normalizer for mecab-neologd.

The normalization is based on neologd's rules: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja

Some optional features are also added.
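
As a rough illustration of the width-folding part of such normalization, the sketch below uses Unicode NFKC from the Python standard library. This is only an approximation under stated assumptions: it is not neologdn's rule set (NFKC does not shorten long vowels, remove tildes, or strip spaces), and the helper name width_normalize_sketch is hypothetical.

```python
import unicodedata


def width_normalize_sketch(text):
    """Fold character widths with Unicode NFKC (an approximation, not neologdn's rules)."""
    return unicodedata.normalize("NFKC", text)


# Half-width katakana spelling of "ハンカクカナ", written with escapes so that
# the half-width form survives copy and paste.
halfwidth_kana = "\uff8a\uff9d\uff76\uff78\uff76\uff85"

# Full-width ASCII variants sit 0xFEE0 above their ASCII code points
# (valid for U+0021 through U+007E).
fullwidth_ascii = "".join(chr(ord(c) + 0xFEE0) for c in "PRML!?")

print(width_normalize_sketch(halfwidth_kana))   # ハンカクカナ
print(width_normalize_sketch(fullwidth_ascii))  # PRML!?
```

For real input, neologdn.normalize should be preferred, since it applies the neologd rules rather than plain NFKC.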

Contributions are welcome!

NOTE: Installing this module requires a C++11 compiler.

Installation

pip install neologdn

If setuptools is not installed, you must install it:

pip install setuptools

If you encounter the following error:

ERROR: Could not find a version that satisfies the requirement setuptools (from versions: none)

Then the following commands may solve this error:

pip install wheel
pip install --no-build-isolation neologdn

Usage

import neologdn
neologdn.normalize("ﾊﾝｶｸｶﾅ")  # half-width katakana become full-width
# => 'ハンカクカナ'
neologdn.normalize("全角記号！？＠＃")  # full-width symbols become half-width
# => '全角記号!?@#'
neologdn.normalize("全角記号例外「・」")
# => '全角記号例外「・」'
neologdn.normalize("長音短縮ウェーーーーイ")
# => '長音短縮ウェーイ'
neologdn.normalize("チルダ削除ウェ~∼∾〜〰～イ")
# => 'チルダ削除ウェイ'
neologdn.normalize("いろんなハイフン˗֊‐‑‒–⁃⁻₋−")
# => 'いろんなハイフン-'
neologdn.normalize("   PRML  副 読 本   ")
# => 'PRML副読本'
neologdn.normalize(" Natural Language Processing ")
# => 'Natural Language Processing'
neologdn.normalize("かわいいいいいいいいい", repeat=6)
# => 'かわいいいいいい'
neologdn.normalize("無駄無駄無駄無駄ァ", repeat=1)
# => '無駄ァ'
neologdn.normalize("1995〜2001年", tilde="normalize")
# => '1995~2001年'
neologdn.normalize("1995~2001年", tilde="normalize_zenkaku")
# => '1995〜2001年'
neologdn.normalize("1995〜2001年", tilde="ignore")  # Don't convert tilde
# => '1995〜2001年'
neologdn.normalize("1995〜2001年", tilde="remove")
# => '19952001年'
neologdn.normalize("1995〜2001年")  # Default parameter
# => '19952001年'
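
To make the repeat parameter in the examples above more concrete, here is a minimal pure-Python sketch built on a backreference regular expression. It only reproduces the behaviour shown in the two repeat examples; it is an assumption about the idea, not neologdn's actual Cython implementation, and the name shorten_repeat_sketch is hypothetical.

```python
import re


def shorten_repeat_sketch(text, repeat):
    """Collapse any substring repeated more than `repeat` times in a row
    down to exactly `repeat` copies."""
    # (.+?) lazily captures the shortest candidate unit; \1{n,} then requires
    # at least n further copies, so the whole match has more than n copies.
    pattern = re.compile(r"(.+?)\1{%d,}" % repeat)
    return pattern.sub(lambda m: m.group(1) * repeat, text)


print(shorten_repeat_sketch("かわいいいいいいいいい", 6))  # かわいいいいいい
print(shorten_repeat_sketch("無駄無駄無駄無駄ァ", 1))      # 無駄ァ
```

Runs that do not exceed the threshold are left untouched, which is why a short word such as "かわいい" is unchanged with repeat=6.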

Benchmark

# Sample code from
# https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja#python-written-by-hideaki-t--overlast
import normalize_neologd

%timeit normalize(normalize_neologd.normalize_neologd)
# => 9.55 s ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

import neologdn
%timeit normalize(neologdn.normalize)
# => 6.66 s ± 35.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

neologdn is about 1.43 times faster than the sample code.

Details are described in the following notebook: https://github.com/ikegami-yukino/neologdn/blob/master/benchmark/benchmark.ipynb
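
The normalize(...) helper used with %timeit above is defined in the notebook, not on this page. As a hedged sketch of how such a comparison can be run outside IPython, the code below times an arbitrary normalizer over a list of sentences with the standard timeit module and recomputes the reported ratio. The names time_normalizer and speedup, the stand-in corpus, and the use of NFKC as a placeholder normalizer are all assumptions made for illustration.

```python
import timeit
import unicodedata


def time_normalizer(func, sentences, repeat=3):
    """Return the best wall-clock time in seconds for applying func to every sentence."""
    def run():
        for sentence in sentences:
            func(sentence)
    # number=1 runs the whole corpus once per measurement; keep the fastest run.
    return min(timeit.repeat(run, number=1, repeat=repeat))


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in corpus and placeholder normalizer (swap in neologdn.normalize if installed).
sentences = ["長音短縮ウェーーーーイ"] * 1000
elapsed = time_normalizer(lambda s: unicodedata.normalize("NFKC", s), sentences)
print(f"placeholder normalizer: {elapsed:.4f} s")

# Ratio of the two timings reported above: 9.55 s versus 6.66 s.
print(round(speedup(9.55, 6.66), 2))  # 1.43
```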

License

Apache Software License.

CHANGES

0.5.6 (2025-12-02)

  • Support Python 3.14 and 3.14t (free-threaded)
  • Normalize the left double quotation mark “ (U+201C) to the double quotation mark " (U+0022)

0.5.4 (2025-03-15)

  • Support Python 3.13
  • Fix tilde loss after latin and whitespace (Many thanks @a-lucky)

0.5.3 (2024-05-03)

  • Support Python 3.12

0.5.2 (2023-08-03)

  • Support Python 3.10 and 3.11 (Many thanks @polm)

0.5.1 (2021-05-02)

  • Improve performance of shorten_repeat function (Many thanks @yskn67)
  • Add tilde option to normalize function

0.4 (2018-12-06)

  • Add shorten_repeat function, which shortens contiguous repeated substrings. For example: neologdn.normalize("無駄無駄無駄無駄ァ", repeat=1) -> 無駄ァ

0.3.2 (2018-05-17)

  • Add an option to suppress removal of spaces between Japanese characters

0.2.2 (2018-03-10)

  • Fix bug (daku-ten & handaku-ten)
  • Support macOS 10.13 (Many thanks @r9y9)

0.2.1 (2017-01-23)

  • Fix bug (check whether the character preceding a daku-ten character is in the maps) (Many thanks @unnonouno)

0.2 (2016-04-12)

  • Add a threshold for lengthened expressions (repeated characters)

0.1.2 (2016-03-29)

  • Fix installation bug

0.1.1.1 (2016-03-19)

  • Support Windows
  • Explicitly specify -std=c++11 in the build (Many thanks @id774)

0.1.1 (2015-10-10)

Initial release.

Contribution

Contributions are welcome! See: https://github.com/ikegami-yukino/neologdn/blob/master/.github/CONTRIBUTING.md

Cited by

Book

  • 山本 和英. テキスト処理の要素技術. 近代科学社. P.41. 2021.

Wheel compatibility matrix

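Columns are CPython versions; 3.14t is the free-threaded build (additional flags: t). Cells are derived from the files in this release.

Platform                 3.8   3.9   3.10  3.11  3.12  3.13  3.14  3.14t
macosx_10_13_universal2  -     -     -     -     yes   yes   -     -
macosx_10_13_x86_64      -     -     -     -     yes   yes   -     -
macosx_10_15_universal2  -     -     -     -     -     -     yes   yes
macosx_10_15_x86_64      -     -     -     -     -     -     yes   yes
macosx_10_9_universal2   yes   yes   yes   yes   -     -     -     -
macosx_10_9_x86_64       yes   yes   yes   yes   -     -     -     -
macosx_11_0_arm64        yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_24_aarch64   yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_24_ppc64le   yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_24_s390x     yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_24_x86_64    yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_28_aarch64   yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_28_ppc64le   yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_28_s390x     yes   yes   yes   yes   yes   yes   yes   yes
manylinux_2_28_x86_64    yes   yes   yes   yes   yes   yes   yes   yes
musllinux_1_2_aarch64    yes   yes   yes   yes   yes   yes   yes   yes
musllinux_1_2_ppc64le    yes   yes   yes   yes   yes   yes   yes   yes
musllinux_1_2_s390x      yes   yes   yes   yes   yes   yes   yes   yes
musllinux_1_2_x86_64     yes   yes   yes   yes   yes   yes   yes   yes
win32                    yes   yes   yes   yes   yes   yes   yes   yes
win_amd64                yes   yes   yes   yes   yes   yes   yes   yes
win_arm64                -     yes   yes   yes   yes   yes   yes   yes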

Files in release

neologdn-0.5.6-cp310-cp310-macosx_10_9_universal2.whl (83.8KiB)
neologdn-0.5.6-cp310-cp310-macosx_10_9_x86_64.whl (47.6KiB)
neologdn-0.5.6-cp310-cp310-macosx_11_0_arm64.whl (45.6KiB)
neologdn-0.5.6-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (240.8KiB)
neologdn-0.5.6-cp310-cp310-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (247.8KiB)
neologdn-0.5.6-cp310-cp310-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (248.7KiB)
neologdn-0.5.6-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (241.9KiB)
neologdn-0.5.6-cp310-cp310-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp310-cp310-musllinux_1_2_ppc64le.whl (1.2MiB)
neologdn-0.5.6-cp310-cp310-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp310-cp310-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp310-cp310-win32.whl (43.8KiB)
neologdn-0.5.6-cp310-cp310-win_amd64.whl (46.5KiB)
neologdn-0.5.6-cp310-cp310-win_arm64.whl (41.5KiB)
neologdn-0.5.6-cp311-cp311-macosx_10_9_universal2.whl (83.6KiB)
neologdn-0.5.6-cp311-cp311-macosx_10_9_x86_64.whl (47.5KiB)
neologdn-0.5.6-cp311-cp311-macosx_11_0_arm64.whl (45.5KiB)
neologdn-0.5.6-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (253.6KiB)
neologdn-0.5.6-cp311-cp311-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (261.9KiB)
neologdn-0.5.6-cp311-cp311-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (261.5KiB)
neologdn-0.5.6-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (255.6KiB)
neologdn-0.5.6-cp311-cp311-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp311-cp311-musllinux_1_2_ppc64le.whl (1.3MiB)
neologdn-0.5.6-cp311-cp311-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp311-cp311-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp311-cp311-win32.whl (43.8KiB)
neologdn-0.5.6-cp311-cp311-win_amd64.whl (46.6KiB)
neologdn-0.5.6-cp311-cp311-win_arm64.whl (41.6KiB)
neologdn-0.5.6-cp312-cp312-macosx_10_13_universal2.whl (82.8KiB)
neologdn-0.5.6-cp312-cp312-macosx_10_13_x86_64.whl (46.9KiB)
neologdn-0.5.6-cp312-cp312-macosx_11_0_arm64.whl (45.2KiB)
neologdn-0.5.6-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (256.5KiB)
neologdn-0.5.6-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (264.2KiB)
neologdn-0.5.6-cp312-cp312-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (268.5KiB)
neologdn-0.5.6-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (260.3KiB)
neologdn-0.5.6-cp312-cp312-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp312-cp312-musllinux_1_2_ppc64le.whl (1.3MiB)
neologdn-0.5.6-cp312-cp312-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp312-cp312-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp312-cp312-win32.whl (43.4KiB)
neologdn-0.5.6-cp312-cp312-win_amd64.whl (46.2KiB)
neologdn-0.5.6-cp312-cp312-win_arm64.whl (41.0KiB)
neologdn-0.5.6-cp313-cp313-macosx_10_13_universal2.whl (82.0KiB)
neologdn-0.5.6-cp313-cp313-macosx_10_13_x86_64.whl (46.5KiB)
neologdn-0.5.6-cp313-cp313-macosx_11_0_arm64.whl (44.8KiB)
neologdn-0.5.6-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (255.4KiB)
neologdn-0.5.6-cp313-cp313-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (262.4KiB)
neologdn-0.5.6-cp313-cp313-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (266.7KiB)
neologdn-0.5.6-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (260.1KiB)
neologdn-0.5.6-cp313-cp313-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp313-cp313-musllinux_1_2_ppc64le.whl (1.3MiB)
neologdn-0.5.6-cp313-cp313-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp313-cp313-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp313-cp313-win32.whl (43.0KiB)
neologdn-0.5.6-cp313-cp313-win_amd64.whl (45.7KiB)
neologdn-0.5.6-cp313-cp313-win_arm64.whl (40.7KiB)
neologdn-0.5.6-cp314-cp314-macosx_10_15_universal2.whl (81.9KiB)
neologdn-0.5.6-cp314-cp314-macosx_10_15_x86_64.whl (46.4KiB)
neologdn-0.5.6-cp314-cp314-macosx_11_0_arm64.whl (44.9KiB)
neologdn-0.5.6-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (255.2KiB)
neologdn-0.5.6-cp314-cp314-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (262.8KiB)
neologdn-0.5.6-cp314-cp314-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (263.6KiB)
neologdn-0.5.6-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (258.3KiB)
neologdn-0.5.6-cp314-cp314-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp314-cp314-musllinux_1_2_ppc64le.whl (1.3MiB)
neologdn-0.5.6-cp314-cp314-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp314-cp314-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp314-cp314-win32.whl (44.6KiB)
neologdn-0.5.6-cp314-cp314-win_amd64.whl (47.2KiB)
neologdn-0.5.6-cp314-cp314-win_arm64.whl (42.2KiB)
neologdn-0.5.6-cp314-cp314t-macosx_10_15_universal2.whl (87.8KiB)
neologdn-0.5.6-cp314-cp314t-macosx_10_15_x86_64.whl (49.4KiB)
neologdn-0.5.6-cp314-cp314t-macosx_11_0_arm64.whl (48.0KiB)
neologdn-0.5.6-cp314-cp314t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (304.3KiB)
neologdn-0.5.6-cp314-cp314t-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (306.7KiB)
neologdn-0.5.6-cp314-cp314t-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (308.2KiB)
neologdn-0.5.6-cp314-cp314t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (299.8KiB)
neologdn-0.5.6-cp314-cp314t-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp314-cp314t-musllinux_1_2_ppc64le.whl (1.3MiB)
neologdn-0.5.6-cp314-cp314t-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp314-cp314t-musllinux_1_2_x86_64.whl (1.3MiB)
neologdn-0.5.6-cp314-cp314t-win32.whl (48.0KiB)
neologdn-0.5.6-cp314-cp314t-win_amd64.whl (51.6KiB)
neologdn-0.5.6-cp314-cp314t-win_arm64.whl (44.7KiB)
neologdn-0.5.6-cp38-cp38-macosx_10_9_universal2.whl (84.9KiB)
neologdn-0.5.6-cp38-cp38-macosx_10_9_x86_64.whl (48.1KiB)
neologdn-0.5.6-cp38-cp38-macosx_11_0_arm64.whl (46.1KiB)
neologdn-0.5.6-cp38-cp38-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (243.0KiB)
neologdn-0.5.6-cp38-cp38-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (249.7KiB)
neologdn-0.5.6-cp38-cp38-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (251.0KiB)
neologdn-0.5.6-cp38-cp38-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (243.8KiB)
neologdn-0.5.6-cp38-cp38-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp38-cp38-musllinux_1_2_ppc64le.whl (1.2MiB)
neologdn-0.5.6-cp38-cp38-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp38-cp38-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp38-cp38-win32.whl (44.4KiB)
neologdn-0.5.6-cp38-cp38-win_amd64.whl (47.1KiB)
neologdn-0.5.6-cp39-cp39-macosx_10_9_universal2.whl (84.1KiB)
neologdn-0.5.6-cp39-cp39-macosx_10_9_x86_64.whl (47.8KiB)
neologdn-0.5.6-cp39-cp39-macosx_11_0_arm64.whl (45.7KiB)
neologdn-0.5.6-cp39-cp39-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (239.6KiB)
neologdn-0.5.6-cp39-cp39-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl (246.6KiB)
neologdn-0.5.6-cp39-cp39-manylinux_2_24_s390x.manylinux_2_28_s390x.whl (247.6KiB)
neologdn-0.5.6-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (240.5KiB)
neologdn-0.5.6-cp39-cp39-musllinux_1_2_aarch64.whl (1.2MiB)
neologdn-0.5.6-cp39-cp39-musllinux_1_2_ppc64le.whl (1.2MiB)
neologdn-0.5.6-cp39-cp39-musllinux_1_2_s390x.whl (1.4MiB)
neologdn-0.5.6-cp39-cp39-musllinux_1_2_x86_64.whl (1.2MiB)
neologdn-0.5.6-cp39-cp39-win32.whl (43.9KiB)
neologdn-0.5.6-cp39-cp39-win_amd64.whl (46.6KiB)
neologdn-0.5.6-cp39-cp39-win_arm64.whl (41.6KiB)
neologdn-0.5.6.tar.gz (102.7KiB)
No dependencies
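
The wheel file names listed above follow the standard wheel naming scheme: name, version, Python tag, ABI tag, and platform tag separated by dashes, with dots separating alternatives inside a tag. The sketch below splits one of the listed names into those fields. It assumes no optional build tag is present (true for the files above), and the function name parse_wheel_filename is hypothetical.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields.

    Assumes the five-field form without an optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        # Each tag may hold several dot-separated alternatives.
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "neologdn-0.5.6-cp314-cp314t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl"
)
print(info["platforms"])  # ['manylinux_2_24_x86_64', 'manylinux_2_28_x86_64']
print(info["abi"])        # ['cp314t'], the trailing "t" marks the free-threaded build
```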