pythainlp 5.3.3



Released: Mar 26, 2026


Meta
Author: Korakot Chaovavanich, Charin Polpanumas, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, Can Udomcharoenchaikit
Maintainer: Wannaphong Phatthiyaphaibun, Arthit Suriyawongkul
Requires Python: >=3.9

Classifiers

Development Status
  • 5 - Production/Stable

Intended Audience
  • Developers

Natural Language
  • Thai

Programming Language
  • Python :: 3
  • Python :: 3 :: Only
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: 3.14

Topic
  • Scientific/Engineering :: Artificial Intelligence
  • Software Development :: Localization
  • Text Processing
  • Text Processing :: General
  • Text Processing :: Linguistic

PyThaiNLP: Thai Natural Language Processing in Python


pythainlp.org | Tutorials | License info | Model cards | Adopters | เอกสารภาษาไทย

Designed to be a Thai-focused counterpart to NLTK, PyThaiNLP provides standard tools for linguistic analysis under an Apache-2.0 license, with its data and models covered by CC0-1.0 and CC-BY-4.0.

pip install pythainlp
Version  Python version  Changes  Documentation
5.3.3    3.9+            Log      pythainlp.org/docs
dev      3.9+            Log      pythainlp.org/dev-docs

Features

  • Linguistic units: Sentence, word, and subword segmentation (sent_tokenize, word_tokenize, subword_tokenize).

  • Tagging: Part-of-speech tagging (pos_tag).

  • Transliteration: Romanization (transliterate) and IPA conversion.

  • Correction: Spelling suggestion and correction (spell, correct).

  • Utilities: Soundex, collation, number-to-text (bahttext), datetime formatting (thai_strftime), and keyboard layout correction.

  • Data: Built-in Thai character sets, word lists, and stop words.

  • CLI: Command-line interface via thainlp.

    thainlp data catalog  # List datasets
    thainlp help          # Show usage
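
A minimal sketch of how a few of these functions might be called from Python. It assumes the import paths pythainlp.tokenize.word_tokenize, pythainlp.transliterate.romanize, and pythainlp.util.bahttext, each with its default engine. The helper name quick_tour is illustrative only and not part of the library. Imports are deferred into the function so the snippet loads even where PyThaiNLP is not installed.

```python
def quick_tour(text: str, amount: float) -> dict:
    """Run a few PyThaiNLP functions and collect their results.

    Requires `pip install pythainlp`. The imports are deferred so that
    defining this function does not itself need the library.
    """
    from pythainlp.tokenize import word_tokenize
    from pythainlp.transliterate import romanize
    from pythainlp.util import bahttext

    return {
        "words": word_tokenize(text),  # word segmentation, default engine
        "romanized": romanize(text),   # romanization, default engine
        "baht": bahttext(amount),      # number spelled out as Thai baht text
    }
```

Calling quick_tour("ภาษาไทย", 21.5) should return a dict holding a list of tokens, a romanized string, and the amount written out in Thai.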
    

Installation options

To install with specific extras (e.g., translate, wordnet, full):

pip install "pythainlp[extra1,extra2,...]"

Available extras include:

  • compact — install a stable and small subset of dependencies (recommended)
  • translate — machine translation support
  • wordnet — WordNet support
  • full — install all optional dependencies (may introduce conflicts)

The documentation website maintains the full list of extras. To see the specific libraries included in each extra, please inspect the [project.optional-dependencies] section of pyproject.toml.

Environment variables

  • PYTHAINLP_DATA (current): Path to the data directory (default: ~/pythainlp-data).

  • PYTHAINLP_DATA_DIR (deprecated; use PYTHAINLP_DATA): Legacy alias for PYTHAINLP_DATA. Emits a DeprecationWarning. Setting both raises ValueError.

  • PYTHAINLP_OFFLINE (current): Set to 1 to disable automatic corpus downloads. Explicit download() calls still work.

  • PYTHAINLP_READ_ONLY (current): Set to 1 to enable read-only mode, which prevents implicit background writes to PyThaiNLP's internal data directory (corpus downloads, catalog updates, directory creation). Explicit user-initiated saves to user-specified paths are unaffected.

  • PYTHAINLP_READ_MODE (deprecated; use PYTHAINLP_READ_ONLY): Legacy alias for PYTHAINLP_READ_ONLY. Emits a DeprecationWarning. Setting both raises ValueError.
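
The alias rules above can be modelled in a few lines. This is only a sketch of the documented behaviour, not PyThaiNLP's actual implementation: the helper names resolve_env and data_dir are hypothetical, and the exact warning and error messages are assumptions.

```python
import os
import warnings
from typing import Mapping, Optional


def resolve_env(
    current: str, legacy: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the value of `current`, falling back to the deprecated `legacy`."""
    env = os.environ if env is None else env
    new_value = env.get(current)
    old_value = env.get(legacy)
    if new_value is not None and old_value is not None:
        # Documented rule: setting both names is an error.
        raise ValueError(f"Set only one of {current} and {legacy}.")
    if old_value is not None:
        # Documented rule: the legacy name still works but warns.
        warnings.warn(
            f"{legacy} is deprecated; use {current}.",
            DeprecationWarning,
            stacklevel=2,
        )
        return old_value
    return new_value


def data_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the data directory, defaulting to ~/pythainlp-data."""
    value = resolve_env("PYTHAINLP_DATA", "PYTHAINLP_DATA_DIR", env)
    return value or os.path.join(os.path.expanduser("~"), "pythainlp-data")
```

Passing an explicit mapping as env keeps the sketch easy to experiment with, since it avoids touching the real process environment.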

Data directory

PyThaiNLP downloads data (see the data catalog db.json at pythainlp-corpus) to ~/pythainlp-data by default. Set the PYTHAINLP_DATA environment variable to override this location. (PYTHAINLP_DATA_DIR is still accepted but deprecated.)

When using PyThaiNLP in distributed computing environments (e.g., Apache Spark), set the PYTHAINLP_DATA environment variable inside the function that will be distributed to worker nodes. See details in the documentation.
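
One way to follow this advice is to wrap the distributed function so that the variable is set on the worker before any PyThaiNLP code runs. The sketch below rests on assumptions: the decorator name with_pythainlp_data and the path /tmp/pythainlp-data are hypothetical, and pythainlp.tokenize.word_tokenize is imported inside the function body so that the import happens after the variable is set.

```python
import os
from functools import wraps


def with_pythainlp_data(data_dir: str):
    """Decorator: set PYTHAINLP_DATA in the current process before each call."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Runs wherever the function runs, e.g. on a worker node.
            os.environ["PYTHAINLP_DATA"] = data_dir
            return func(*args, **kwargs)

        return wrapper

    return decorator


@with_pythainlp_data("/tmp/pythainlp-data")  # hypothetical worker-local path
def tokenize(text: str) -> list:
    # Imported here, after the variable is set (requires pythainlp).
    from pythainlp.tokenize import word_tokenize

    return word_tokenize(text)
```

A function decorated this way could then be handed to a Spark transformation such as RDD.map, so each worker sets the variable for itself.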

Offline mode

Set PYTHAINLP_OFFLINE=1 to disable automatic corpus downloads. When this variable is set and a corpus is not already cached locally, a FileNotFoundError is raised instead of attempting a network download. Explicit calls to pythainlp.corpus.download() are unaffected. Use pythainlp.is_offline_mode() to check the current state programmatically.

import pythainlp
print(pythainlp.is_offline_mode())  # True if PYTHAINLP_OFFLINE=1

Read-only mode

Set PYTHAINLP_READ_ONLY=1 to prevent implicit background writes to PyThaiNLP's internal data directory. This blocks corpus downloads, catalog updates, and automatic data directory creation — writes that happen as side effects the user may not be aware of.

Note: Read-only mode is more restrictive than offline mode. PYTHAINLP_OFFLINE=1 blocks only automatic downloads triggered by get_corpus_path(); explicit pythainlp.corpus.download() calls still work. PYTHAINLP_READ_ONLY=1 also blocks explicit download() calls, because any download requires writing to the data directory. Use PYTHAINLP_READ_ONLY when the data directory is on a read-only file system (e.g., a read-only Docker volume or a shared cluster mount).

Operations where the user explicitly specifies an output path are unaffected (e.g., model.save("path"), tagger.train(..., save_loc="path"), thainlp misspell --output myfile.txt).
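
The difference between the two modes can be summarised as a small truth table in code. This is a model of the behaviour described above, not library code: the Permissions type and permissions function are hypothetical, and treating only the value 1 as "enabled" is an assumption based on the wording "Set to 1".

```python
from typing import Mapping, NamedTuple


class Permissions(NamedTuple):
    automatic_download: bool        # downloads triggered implicitly
    explicit_download: bool         # pythainlp.corpus.download() calls
    implicit_data_dir_writes: bool  # catalog updates, directory creation
    user_path_writes: bool          # saves to paths the user specifies


def permissions(env: Mapping[str, str]) -> Permissions:
    """Model which operations the documented modes allow."""
    offline = env.get("PYTHAINLP_OFFLINE") == "1"
    read_only = env.get("PYTHAINLP_READ_ONLY") == "1"
    return Permissions(
        automatic_download=not (offline or read_only),
        explicit_download=not read_only,
        implicit_data_dir_writes=not read_only,
        user_path_writes=True,  # unaffected by either mode
    )
```

For example, permissions({"PYTHAINLP_OFFLINE": "1"}) still allows explicit downloads, whereas permissions({"PYTHAINLP_READ_ONLY": "1"}) blocks them.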

Use pythainlp.is_read_only_mode() to check the current state programmatically.

import pythainlp
print(pythainlp.is_read_only_mode())  # True if PYTHAINLP_READ_ONLY=1

Testing

We test core functionality on all officially supported Python versions.

See tests/README.md for the test matrix and other details.

Contribute to PyThaiNLP

Please fork and create a pull request. See CONTRIBUTING.md for guidelines and algorithm references.

Citations

If you use the PyThaiNLP library in your project, please cite the software as follows:

Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai. “PyThaiNLP: Thai Natural Language Processing in Python”. Zenodo, 2 June 2024. https://doi.org/10.5281/zenodo.3519354.

with this BibTeX entry:

@software{pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat",
    doi = {10.5281/zenodo.3519354},
    license = {Apache-2.0},
    month = jun,
    url = {https://github.com/PyThaiNLP/pythainlp/},
    version = {v5.0.4},
    year = {2024},
}

To cite our NLP-OSS 2023 academic paper, please use the following:

Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. PyThaiNLP: Thai Natural Language Processing in Python. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. Empirical Methods in Natural Language Processing.

with this BibTeX entry:

@inproceedings{phatthiyaphaibun-etal-2023-pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat  and
      Limkonchotiwat, Peerat  and
      Suntorntip, Thanathip  and
      Udomcharoenchaikit, Can",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.4",
    pages = "25--36",
    abstract = "We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.",
}

Sponsors

See SPONSORS.md

Acknowledgements

PyThaiNLP was founded by Wannaphong Phatthiyaphaibun in 2016. His contributions from 2021 were made during a PhD studentship supported by Vidyasirimedhi Institute of Science and Technology (VISTEC).

The contributions of Arthit Suriyawongkul to PyThaiNLP from November 2017 until August 2019 were funded by Wisesight. His contributions from November 2019 until October 2024 were made during a PhD studentship supported by Taighde Éireann – Research Ireland under Grant Number 18/CRT/6224 (Research Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real)).

The contributions of Pattarawat Chormai to PyThaiNLP from 2018 until 2019 were made during a research internship at the Natural Language Processing Lab, Department of Linguistics, Faculty of Arts, Chulalongkorn University.

The contributions of Korakot Chaovavanich and Lalita Lowphansirikul to PyThaiNLP from 2019 until 2022 were funded by the VISTEC-depa Thailand AI Research Institute.

The Mac Mini M1 used for macOS testing was donated by MacStadium. This hardware was essential for the project's testing suite from October 2022 to October 2023, filling a critical gap before GitHub Actions introduced native support for Apple Silicon runners.

The only official repository is at https://github.com/PyThaiNLP/pythainlp, with a mirror at https://gitlab.com/pythainlp/pythainlp.

Beware of malware if you use code from places other than these two.

Made with ❤️ | PyThaiNLP Team 💻 | "We build Thai NLP" 🇹🇭

Release history

5.3.3 Mar 26, 2026
5.3.2 Mar 19, 2026
5.3.1 Mar 14, 2026
5.3.0 Mar 10, 2026
5.2.0 Dec 20, 2025
5.2.0b1 Dec 10, 2025
5.1.2 May 09, 2025
5.1.1 Mar 31, 2025
5.1.0 Feb 25, 2025
5.1.0b2 Feb 09, 2025
5.1.0b1 Dec 27, 2024
5.0.5 Dec 14, 2024
5.0.4 Jun 02, 2024
5.0.3 May 12, 2024
5.0.2 Apr 03, 2024
5.0.1 Feb 10, 2024
5.0.0 Feb 10, 2024
5.0.0b1 Feb 05, 2024
5.0.0.dev2 Jan 15, 2024
5.0.0.dev1 Dec 19, 2023
5.0.0.dev0 Nov 26, 2023
4.1.0b5 Sep 24, 2023
4.1.0b4 Sep 05, 2023
4.1.0b3 Aug 04, 2023
4.1.0b2 Jul 27, 2023
4.1.0b1 Jul 24, 2023
4.0.2 May 30, 2023
4.0.1 May 03, 2023
4.0.0 Apr 14, 2023
4.0.0b1 Apr 01, 2023
3.1.1 Oct 30, 2022
3.1.0 Sep 24, 2022
3.1.0b0 Sep 20, 2022
3.1.0.dev3 Sep 18, 2022
3.1.0.dev2 Sep 15, 2022
3.1.0.dev1 Sep 01, 2022
3.1.0.dev0 Aug 31, 2022
3.0.10 Sep 20, 2022
3.0.9 Sep 14, 2022
3.0.8 May 16, 2022
3.0.7 May 16, 2022
3.0.5 Feb 14, 2022
3.0.4 Feb 14, 2022
3.0.3 Feb 09, 2022
3.0.2 Feb 09, 2022
3.0.1 Feb 09, 2022
3.0.0 Jan 29, 2022
3.0.0b0 Jan 20, 2022
3.0.0.dev0 Dec 27, 2021
2.4.0.dev0 Aug 01, 2021
2.3.2 Aug 25, 2021
2.3.1 Apr 04, 2021
2.3.1.dev0 Apr 04, 2021
2.3.0 Mar 31, 2021
2.3.0b1 Mar 23, 2021
2.3.0.dev0 Mar 16, 2021
2.2.6 Dec 13, 2020
2.2.5 Nov 16, 2020
2.2.4 Sep 17, 2020
2.2.3 Aug 01, 2020
2.2.2 Jul 09, 2020
2.2.1 Jun 27, 2020
2.2.0 Jun 24, 2020
2.2.0b1 Jun 15, 2020
2.2.0.dev1 May 23, 2020
2.2.0.dev0 May 01, 2020
2.1.4 Feb 07, 2020
2.1.4.dev0 Feb 07, 2020
2.1.3 Jan 11, 2020
2.1.2 Dec 31, 2019
2.1.1 Dec 19, 2019
2.1 Dec 10, 2019
2.1.dev8 Nov 16, 2019
2.1.dev7 Oct 25, 2019
2.1.dev6 Sep 26, 2019
2.1.dev4 Sep 21, 2019
2.1.dev3 Sep 03, 2019
2.1.dev2 Aug 21, 2019
2.1.dev1 May 10, 2019
2.0.7 Aug 16, 2019
2.0.6 Jun 27, 2019
2.0.5 May 09, 2019
2.0.4 Apr 20, 2019
2.0.3 Apr 14, 2019
2.0.2 Apr 11, 2019
2.0.1 Apr 07, 2019
2.0 Mar 31, 2019
1.7.4 Mar 09, 2019
1.7.3 Feb 10, 2019
1.7.2 Dec 28, 2018
1.7.1 Oct 31, 2018
1.7.0.1 Sep 29, 2018
1.7.0 Sep 22, 2018
1.6.0.7 Jun 22, 2018
1.6.0.6 Jun 11, 2018
1.6.0.5 May 13, 2018
1.6.0.4 Mar 04, 2018
1.6.0.3 Mar 04, 2018
1.6.0.2 Feb 25, 2018
1.5.4.2 Nov 20, 2017
1.5.4.1 Nov 03, 2017
1.5.4 Nov 02, 2017
1.5.3 Sep 21, 2017
1.5.2 Sep 20, 2017
1.5.1 Sep 11, 2017
1.5 Sep 03, 2017
1.4.1 Jul 27, 2017
1.3 May 30, 2017
1.2 Apr 02, 2017
1.1 Feb 05, 2017
1.0.0 Jan 04, 2017
0.0.9 Dec 27, 2016
0.0.8 Dec 27, 2016
0.0.7 Aug 07, 2016
0.0.6 Jul 16, 2016
0.0.5 Jul 10, 2016
0.0.4 Jul 10, 2016

Wheel compatibility matrix

Platform Python 3
any

Dependencies

  • importlib-resources
  • tzdata