nlpo3 1.3.1


pip install nlpo3

Released: Nov 11, 2024


Meta
Author: Thanathip Suntorntip, Arthit Suriyawongkul, Wannaphong Phatthiyaphaibun
Requires Python: >=3.7

Classifiers

Development Status
  • 5 - Production/Stable

Programming Language
  • Python :: 3 :: Only
  • Python :: 3.7
  • Python :: 3.8
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: Implementation :: CPython
  • Python :: Implementation :: PyPy

Intended Audience
  • Developers

License
  • OSI Approved :: Apache Software License

Natural Language
  • Thai

Topic
  • Text Processing :: Linguistic
  • Software Development :: Libraries :: Python Modules

SPDX-FileCopyrightText: 2024 PyThaiNLP Project
SPDX-License-Identifier: Apache-2.0

nlpO3 Python binding

Python binding for nlpO3, a Thai natural language processing library in Rust.

To install:

pip install nlpo3

Features

  • Thai word tokenizer
    • segment() - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
      • 2.5x faster than a similar pure-Python implementation (PyThaiNLP's newmm)
    • load_dict() - loads a dictionary from a plain text file (one word per line)
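
To illustrate the maximal-matching idea behind segment(), here is a minimal pure-Python sketch. It is not nlpO3's implementation: it assumes that maximal matching means choosing the segmentation with the fewest dictionary words, it ignores Thai Character Cluster boundaries, and it emits each uncovered character as its own token. The names maximal_matching and UNKNOWN_COST are hypothetical.

```python
from typing import List, Set

# Hypothetical penalty for a character not covered by any dictionary word.
# It only needs to be larger than the cost of one dictionary word (1).
UNKNOWN_COST = 10


def maximal_matching(text: str, words: Set[str]) -> List[str]:
    """Segment text into the lowest-cost sequence of dictionary words."""
    max_len = max((len(w) for w in words), default=1)
    n = len(text)
    # best[i] holds (cost, tokens) for the suffix text[i:]
    best = [(0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        # Fallback: treat text[i] as a one-character unknown token
        cost, tokens = best[i + 1]
        choice = (cost + UNKNOWN_COST, [text[i]] + tokens)
        # Try every dictionary word that starts at position i
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in words:
                cost, tokens = best[j]
                if cost + 1 < choice[0]:
                    choice = (cost + 1, [text[i:j]] + tokens)
        best[i] = choice
    return best[0][1]


print(maximal_matching("สวัสดีครับ", {"สวัสดี", "ครับ"}))
```

A greedy longest-match scan would split "abcde" with the words {"ab", "abc", "cde"} into "abc", "d", "e"; the sketch above prefers "ab", "cde" because that segmentation uses fewer tokens and leaves no unknown characters.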

Use

Load the file path/to/dict.file into memory and assign it the name dict_name.

Then tokenize a text with the dict_name dictionary:

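from nlpo3 import load_dict, segment

# Load the dictionary file and register it under the name "dict_name"
load_dict("path/to/dict.file", "dict_name")
# Tokenize the text with the dictionary registered above
segment("สวัสดีครับ", "dict_name")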

It returns a list of strings:

['สวัสดี', 'ครับ']

(The result depends on the words included in the dictionary.)
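
To try the calls above end to end without a real dictionary, the following sketch writes a throwaway dictionary file in the one-word-per-line format and tokenizes with it. The helper write_dict_file, the file name demo_dict.txt, and the dictionary name demo_dict are hypothetical, and UTF-8 encoding for the dictionary file is an assumption. The nlpo3 calls are skipped if the package is not installed.

```python
import tempfile
from pathlib import Path


def write_dict_file(words, path):
    """Write unique, non-empty words to path, one word per line."""
    unique = sorted({w.strip() for w in words if w.strip()})
    # Assumption: the dictionary file is read as UTF-8 text
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


# Build a tiny dictionary in a temporary directory
dict_path = Path(tempfile.mkdtemp()) / "demo_dict.txt"
word_count = write_dict_file(["สวัสดี", "ครับ", "ครับ", " "], dict_path)

try:
    from nlpo3 import load_dict, segment
except ImportError:  # nlpo3 is not installed; skip the tokenization step
    load_dict = segment = None

if segment is not None:
    load_dict(str(dict_path), "demo_dict")
    print(segment("สวัสดีครับ", "demo_dict"))
```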

Use multithreaded mode, again with the dict_name dictionary:

segment("สวัสดีครับ", dict_name="dict_name", parallel=True)

Use safe mode to avoid long wait times in edge cases where the text has many ambiguous word boundaries:

segment("สวัสดีครับ", dict_name="dict_name", safe=True)
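
Whether parallel or safe mode pays off depends on the text, so it can help to measure. The sketch below is a small timing harness, not part of nlpO3: time_tokenizer and compare_modes are hypothetical helpers, and the segment function is passed in as an argument so the harness can be tried with any callable that accepts the same keyword arguments shown above.

```python
import time


def time_tokenizer(tokenize, texts, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for text in texts:
            tokenize(text)
        best = min(best, time.perf_counter() - start)
    return best


def compare_modes(segment, texts, dict_name):
    """Time the default, parallel, and safe modes on the same texts."""
    modes = {
        "default": {},
        "parallel": {"parallel": True},
        "safe": {"safe": True},
    }
    return {
        label: time_tokenizer(
            lambda t, kw=kwargs: segment(t, dict_name=dict_name, **kw), texts
        )
        for label, kwargs in modes.items()
    }
```

With nlpo3 installed and a dictionary already loaded under the name dict_name, a call would look like compare_modes(segment, my_texts, "dict_name"), where my_texts is your own list of strings.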

Dictionary

  • In the interest of library size, nlpO3 does not assume which dictionary the user would like to use and does not ship with one.
  • A dictionary is needed for the dictionary-based word tokenizer.
  • For tokenization dictionary, try

Build

Requirements

  • Rust 2018 Edition
  • Python 3.7 or newer (PyO3's minimum supported version)
  • Python Development Headers
    • Ubuntu: sudo apt-get install python3-dev
    • macOS: No action needed
  • PyO3 - already included in Cargo.toml
  • setuptools-rust

Steps

python -m pip install --upgrade build
python -m build

This should generate a wheel file in the dist/ directory, which can be installed with pip.

To install a wheel from a local directory (the file name depends on your Python version and platform):

pip install dist/nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl 
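
The wheel file name in the command above encodes the Python version, ABI, and platform it was built for. As a rough aid for picking the right file, this sketch splits a wheel file name into its parts following the standard wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The names WheelName and parse_wheel_name are hypothetical; for production use, a maintained parser such as the one in the packaging library is a safer choice.

```python
from typing import List, NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: List[str]
    abi_tags: List[str]
    platform_tags: List[str]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    # Each tag field may hold several tags joined by "."
    return WheelName(dist, version, build,
                     py.split("."), abi.split("."), plat.split("."))


name = parse_wheel_name("nlpo3-1.3.1-cp311-cp311-macosx_12_0_x86_64.whl")
print(name.python_tags, name.platform_tags)
```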

Test

To run the Python unit tests:

cd tests
python -m unittest
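
For readers adding their own checks, here is a sketch of what a test module picked up by python -m unittest could look like. It is not taken from the project's tests directory: the class name, the dictionary name test_dict, the tiny word list, and the UTF-8 file encoding are all assumptions, and the tests are skipped when nlpo3 is not installed.

```python
import tempfile
import unittest
from pathlib import Path

try:
    from nlpo3 import load_dict, segment
except ImportError:  # nlpo3 is not installed; the tests below are skipped
    load_dict = segment = None


@unittest.skipIf(segment is None, "nlpo3 is not installed")
class TestSegment(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Write a two-word dictionary (one word per line) and load it
        cls.tmp = tempfile.TemporaryDirectory()
        dict_path = Path(cls.tmp.name) / "words.txt"
        dict_path.write_text("สวัสดี\nครับ\n", encoding="utf-8")
        load_dict(str(dict_path), "test_dict")

    @classmethod
    def tearDownClass(cls):
        cls.tmp.cleanup()

    def test_returns_list_of_strings(self):
        tokens = segment("สวัสดีครับ", "test_dict")
        self.assertIsInstance(tokens, list)
        self.assertTrue(all(isinstance(t, str) for t in tokens))

    def test_tokens_rejoin_to_input(self):
        # Assumption: tokenization keeps every character of the input
        text = "สวัสดีครับ"
        self.assertEqual("".join(segment(text, "test_dict")), text)
```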

Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues

License

nlpO3 Python binding is copyrighted by its authors and licensed under the terms of the Apache Software License 2.0 (Apache-2.0). See the LICENSE file for details.

Binary wheels

A pre-built binary package is available from PyPI for these platforms:

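Python     OS         Architecture  Has binary wheel?
3.13       Windows    x86           Yes
3.13       Windows    AMD64         Yes
3.13       macOS      x86_64        Yes
3.13       macOS      arm64         Yes
3.13       manylinux  x86_64        Yes
3.13       manylinux  i686          Yes
3.13       musllinux  x86_64        Yes
3.12       Windows    x86           Yes
3.12       Windows    AMD64         Yes
3.12       macOS      x86_64        Yes
3.12       macOS      arm64         Yes
3.12       manylinux  x86_64        Yes
3.12       manylinux  i686          Yes
3.12       musllinux  x86_64        Yes
3.11       Windows    x86           Yes
3.11       Windows    AMD64         Yes
3.11       macOS      x86_64        Yes
3.11       macOS      arm64         Yes
3.11       manylinux  x86_64        Yes
3.11       manylinux  i686          Yes
3.11       musllinux  x86_64        Yes
3.10       Windows    x86           Yes
3.10       Windows    AMD64         Yes
3.10       macOS      x86_64        Yes
3.10       macOS      arm64         Yes
3.10       manylinux  x86_64        Yes
3.10       manylinux  i686          Yes
3.10       musllinux  x86_64        Yes
3.9        Windows    x86           Yes
3.9        Windows    AMD64         Yes
3.9        macOS      x86_64        Yes
3.9        macOS      arm64         Yes
3.9        manylinux  x86_64        Yes
3.9        manylinux  i686          Yes
3.9        musllinux  x86_64        Yes
3.8        Windows    x86           Yes
3.8        Windows    AMD64         Yes
3.8        macOS      x86_64        Yes
3.8        macOS      arm64         Yes
3.8        manylinux  x86_64        Yes
3.8        manylinux  i686          Yes
3.8        musllinux  x86_64        Yes
3.7        Windows    x86           Yes
3.7        Windows    AMD64         Yes
3.7        macOS      x86_64        Yes
3.7        macOS      arm64         No
3.7        manylinux  x86_64        Yes
3.7        manylinux  i686          Yes
3.7        musllinux  x86_64        Yes
PyPy 3.10  Windows    x86           No
PyPy 3.10  Windows    AMD64         Yes
PyPy 3.10  macOS      x86_64        Yes
PyPy 3.10  macOS      arm64         Yes
PyPy 3.10  manylinux  x86_64        Yes
PyPy 3.10  manylinux  i686          Yes
PyPy 3.9   Windows    x86           No
PyPy 3.9   Windows    AMD64         Yes
PyPy 3.9   macOS      x86_64        Yes
PyPy 3.9   macOS      arm64         Yes
PyPy 3.9   manylinux  x86_64        Yes
PyPy 3.9   manylinux  i686          Yes
PyPy 3.8   Windows    x86           No
PyPy 3.8   Windows    AMD64         Yes
PyPy 3.8   macOS      x86_64        Yes
PyPy 3.8   macOS      arm64         Yes
PyPy 3.8   manylinux  x86_64        Yes
PyPy 3.8   manylinux  i686          Yes
PyPy 3.7   Windows    x86           No
PyPy 3.7   Windows    AMD64         Yes
PyPy 3.7   macOS      x86_64        Yes
PyPy 3.7   macOS      arm64         No
PyPy 3.7   manylinux  x86_64        Yes
PyPy 3.7   manylinux  i686          Yes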

Wheel compatibility matrix

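Platform               Python versions with a wheel (PyPy builds are pp73)
macosx_10_13_x86_64    CPython 3.12, 3.13
macosx_10_15_x86_64    PyPy 3.9, 3.10
macosx_10_9_x86_64     CPython 3.7, 3.8, 3.9, 3.10, 3.11; PyPy 3.7, 3.8
macosx_11_0_arm64      CPython 3.8, 3.9, 3.10, 3.11, 3.12, 3.13; PyPy 3.8, 3.9, 3.10
manylinux2014_i686     CPython 3.7 to 3.13; PyPy 3.7 to 3.10
manylinux2014_x86_64   CPython 3.7 to 3.13; PyPy 3.7 to 3.10
manylinux_2_17_i686    CPython 3.7 to 3.13; PyPy 3.7 to 3.10
manylinux_2_17_x86_64  CPython 3.7 to 3.13; PyPy 3.7 to 3.10
musllinux_1_2_x86_64   CPython 3.7 to 3.13
win32                  CPython 3.7 to 3.13
win_amd64              CPython 3.7 to 3.13; PyPy 3.7 to 3.10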

Files in release

nlpo3-1.3.1-cp310-cp310-macosx_10_9_x86_64.whl (702.9KiB)
nlpo3-1.3.1-cp310-cp310-macosx_11_0_arm64.whl (662.9KiB)
nlpo3-1.3.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl (800.3KiB)
nlpo3-1.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (800.0KiB)
nlpo3-1.3.1-cp310-cp310-musllinux_1_2_x86_64.whl (860.0KiB)
nlpo3-1.3.1-cp310-cp310-win32.whl (504.6KiB)
nlpo3-1.3.1-cp310-cp310-win_amd64.whl (568.6KiB)
nlpo3-1.3.1-cp311-cp311-macosx_10_9_x86_64.whl (703.1KiB)
nlpo3-1.3.1-cp311-cp311-macosx_11_0_arm64.whl (662.8KiB)
nlpo3-1.3.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl (800.0KiB)
nlpo3-1.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (799.8KiB)
nlpo3-1.3.1-cp311-cp311-musllinux_1_2_x86_64.whl (859.7KiB)
nlpo3-1.3.1-cp311-cp311-win32.whl (504.5KiB)
nlpo3-1.3.1-cp311-cp311-win_amd64.whl (568.6KiB)
nlpo3-1.3.1-cp312-cp312-macosx_10_13_x86_64.whl (702.6KiB)
nlpo3-1.3.1-cp312-cp312-macosx_11_0_arm64.whl (662.7KiB)
nlpo3-1.3.1-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl (799.7KiB)
nlpo3-1.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (799.6KiB)
nlpo3-1.3.1-cp312-cp312-musllinux_1_2_x86_64.whl (859.4KiB)
nlpo3-1.3.1-cp312-cp312-win32.whl (504.7KiB)
nlpo3-1.3.1-cp312-cp312-win_amd64.whl (568.7KiB)
nlpo3-1.3.1-cp313-cp313-macosx_10_13_x86_64.whl (702.1KiB)
nlpo3-1.3.1-cp313-cp313-macosx_11_0_arm64.whl (662.0KiB)
nlpo3-1.3.1-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl (799.1KiB)
nlpo3-1.3.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (799.3KiB)
nlpo3-1.3.1-cp313-cp313-musllinux_1_2_x86_64.whl (859.0KiB)
nlpo3-1.3.1-cp313-cp313-win32.whl (504.1KiB)
nlpo3-1.3.1-cp313-cp313-win_amd64.whl (568.0KiB)
nlpo3-1.3.1-cp37-cp37m-macosx_10_9_x86_64.whl (703.0KiB)
nlpo3-1.3.1-cp37-cp37m-manylinux_2_17_i686.manylinux2014_i686.whl (800.1KiB)
nlpo3-1.3.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (799.8KiB)
nlpo3-1.3.1-cp37-cp37m-musllinux_1_2_x86_64.whl (859.7KiB)
nlpo3-1.3.1-cp37-cp37m-win32.whl (504.4KiB)
nlpo3-1.3.1-cp37-cp37m-win_amd64.whl (568.2KiB)
nlpo3-1.3.1-cp38-cp38-macosx_10_9_x86_64.whl (703.1KiB)
nlpo3-1.3.1-cp38-cp38-macosx_11_0_arm64.whl (662.7KiB)
nlpo3-1.3.1-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl (800.3KiB)
nlpo3-1.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (799.7KiB)
nlpo3-1.3.1-cp38-cp38-musllinux_1_2_x86_64.whl (859.6KiB)
nlpo3-1.3.1-cp38-cp38-win32.whl (504.4KiB)
nlpo3-1.3.1-cp38-cp38-win_amd64.whl (568.4KiB)
nlpo3-1.3.1-cp39-cp39-macosx_10_9_x86_64.whl (703.0KiB)
nlpo3-1.3.1-cp39-cp39-macosx_11_0_arm64.whl (663.0KiB)
nlpo3-1.3.1-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl (800.5KiB)
nlpo3-1.3.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (799.8KiB)
nlpo3-1.3.1-cp39-cp39-musllinux_1_2_x86_64.whl (859.9KiB)
nlpo3-1.3.1-cp39-cp39-win32.whl (504.6KiB)
nlpo3-1.3.1-cp39-cp39-win_amd64.whl (568.7KiB)
nlpo3-1.3.1-pp310-pypy310_pp73-macosx_10_15_x86_64.whl (704.8KiB)
nlpo3-1.3.1-pp310-pypy310_pp73-macosx_11_0_arm64.whl (664.1KiB)
nlpo3-1.3.1-pp310-pypy310_pp73-manylinux_2_17_i686.manylinux2014_i686.whl (802.3KiB)
nlpo3-1.3.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.3KiB)
nlpo3-1.3.1-pp310-pypy310_pp73-win_amd64.whl (569.6KiB)
nlpo3-1.3.1-pp37-pypy37_pp73-macosx_10_9_x86_64.whl (707.3KiB)
nlpo3-1.3.1-pp37-pypy37_pp73-manylinux_2_17_i686.manylinux2014_i686.whl (804.5KiB)
nlpo3-1.3.1-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (804.4KiB)
nlpo3-1.3.1-pp37-pypy37_pp73-win_amd64.whl (571.1KiB)
nlpo3-1.3.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl (704.0KiB)
nlpo3-1.3.1-pp38-pypy38_pp73-macosx_11_0_arm64.whl (664.0KiB)
nlpo3-1.3.1-pp38-pypy38_pp73-manylinux_2_17_i686.manylinux2014_i686.whl (802.1KiB)
nlpo3-1.3.1-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.4KiB)
nlpo3-1.3.1-pp38-pypy38_pp73-win_amd64.whl (569.4KiB)
nlpo3-1.3.1-pp39-pypy39_pp73-macosx_10_15_x86_64.whl (704.0KiB)
nlpo3-1.3.1-pp39-pypy39_pp73-macosx_11_0_arm64.whl (664.1KiB)
nlpo3-1.3.1-pp39-pypy39_pp73-manylinux_2_17_i686.manylinux2014_i686.whl (802.5KiB)
nlpo3-1.3.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.3KiB)
nlpo3-1.3.1-pp39-pypy39_pp73-win_amd64.whl (569.4KiB)
nlpo3-1.3.1.tar.gz (16.6KiB)
No dependencies