License
- OSI Approved :: MIT License
Natural Language
- Japanese
Operating System
- MacOS
- MacOS :: MacOS X
- Microsoft :: Windows
- POSIX :: Linux
Programming Language
- Python :: 3
- Python :: 3.9
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
- Python :: 3.13
Topic
- Scientific/Engineering
- Software Development :: Libraries
- Software Development :: Libraries :: Python Modules
- Text Processing
- Text Processing :: Linguistic
rhoknp: Yet another Python binding for Juman++/KNP/KWJA
Documentation: https://rhoknp.readthedocs.io/en/latest/
Source Code: https://github.com/ku-nlp/rhoknp
rhoknp is a Python binding for Juman++, KNP, and KWJA.[^1]
[^1]: The logo was generated by OpenAI DALL·E 2.
```python
import rhoknp

# Perform morphological analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply_to_sentence(
    "電気抵抗率は電気の通しにくさを表す物性値である。"
)

# Access the result
for morpheme in sentence.morphemes:  # a.k.a. keitai-so
    ...

# Save the result
with open("result.jumanpp", "wt", encoding="utf-8") as f:
    f.write(sentence.to_jumanpp())

# Load the result
with open("result.jumanpp", "rt", encoding="utf-8") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())
```
Requirements
Installation
```shell
pip install rhoknp
```
Quick tour
Let's begin by using Juman++ with rhoknp. Here, we present a simple example demonstrating how Juman++ can be used to analyze a sentence.
```python
import rhoknp

# Perform morphological analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply_to_sentence("電気抵抗率は電気の通しにくさを表す物性値である。")
```
You can easily access the individual morphemes that make up the sentence.
```python
for morpheme in sentence.morphemes:  # a.k.a. keitai-so
    ...
```
Sentence objects can be saved in the JUMAN format.
```python
# Save the sentence in the JUMAN format
with open("sentence.jumanpp", "wt", encoding="utf-8") as f:
    f.write(sentence.to_jumanpp())

# Load the sentence
with open("sentence.jumanpp", "rt", encoding="utf-8") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())
```
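To make the saved file less opaque, here is a minimal stdlib-only sketch of reading JUMAN-format text. It assumes the usual layout of one morpheme per line with space-separated fields (surface form, reading, lemma, part of speech and its ID, POS subcategory and its ID, conjugation type and its ID, conjugation form and its ID, then semantic information), with `EOS` closing each sentence. `JumanLine`, `parse_juman_line`, and `parse_juman_sentence` are hypothetical helpers, not part of rhoknp, and the sample lines (including their numeric IDs) are hand-written for illustration; in practice `rhoknp.Sentence.from_jumanpp` does this parsing for you.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JumanLine:
    """One morpheme line of JUMAN-format text (hypothetical helper, not rhoknp's Morpheme)."""

    surface: str
    reading: str
    lemma: str
    pos: str
    pos_id: int
    subpos: str
    subpos_id: int
    conjtype: str
    conjtype_id: int
    conjform: str
    conjform_id: int
    semantics: str  # "NIL" or a double-quoted string that may contain spaces


def parse_juman_line(line: str) -> JumanLine:
    # Split into at most 12 fields so spaces inside the semantic info stay intact.
    f = line.rstrip("\r\n").split(" ", 11)
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}: {line!r}")
    return JumanLine(
        f[0], f[1], f[2],
        f[3], int(f[4]),
        f[5], int(f[6]),
        f[7], int(f[8]),
        f[9], int(f[10]),
        f[11],
    )


def parse_juman_sentence(text: str) -> List[JumanLine]:
    morphemes = []
    for line in text.splitlines():
        if line.startswith("#"):  # comment lines, e.g. sentence IDs
            continue
        if line == "EOS":  # end of the sentence
            break
        morphemes.append(parse_juman_line(line))
    return morphemes


# Hand-written sample; the numeric IDs are illustrative.
sample = (
    "電気 でんき 電気 名詞 6 普通名詞 1 * 0 * 0 NIL\n"
    "抵抗 ていこう 抵抗 名詞 6 サ変名詞 2 * 0 * 0 NIL\n"
    "EOS\n"
)
for m in parse_juman_sentence(sample):
    print(m.surface, m.reading, m.pos, m.subpos)
```

Splitting with a maximum of 11 splits is the key design choice: the semantic-information field is the only one that may contain spaces, so it is kept whole as the last field.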
Nearly the same API is available for KNP.
```python
# Perform language analysis by KNP
knp = rhoknp.KNP()
sentence = knp.apply_to_sentence("電気抵抗率は電気の通しにくさを表す物性値である。")
```
KNP analyzes a sentence at multiple levels: clauses, phrases (bunsetsu), base phrases (kihon-ku), and morphemes.
```python
for clause in sentence.clauses:  # a.k.a. setsu
    ...
for phrase in sentence.phrases:  # a.k.a. bunsetsu
    ...
for base_phrase in sentence.base_phrases:  # a.k.a. kihon-ku
    ...
for morpheme in sentence.morphemes:  # a.k.a. keitai-so
    ...
```
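One way to picture these levels is as nested groupings over the same morphemes: a clause contains phrases, a phrase contains base phrases, and a base phrase contains morphemes. The sketch below models that nesting with hypothetical, simplified dataclasses (`ToyClause`, `ToyPhrase`, `ToyBasePhrase` are stand-ins, not rhoknp's classes) and shows how a flat list of morphemes can be derived from a higher level, which is the kind of access the loops above provide. The segmentation of the fragment is hand-made for illustration; real KNP output may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBasePhrase:  # kihon-ku (hypothetical stand-in)
    morphemes: List[str]


@dataclass
class ToyPhrase:  # bunsetsu (hypothetical stand-in)
    base_phrases: List[ToyBasePhrase]

    @property
    def morphemes(self) -> List[str]:
        # Flatten the base phrases into their morphemes, preserving order.
        return [m for bp in self.base_phrases for m in bp.morphemes]


@dataclass
class ToyClause:  # setsu (hypothetical stand-in)
    phrases: List[ToyPhrase]

    @property
    def base_phrases(self) -> List[ToyBasePhrase]:
        return [bp for p in self.phrases for bp in p.base_phrases]

    @property
    def morphemes(self) -> List[str]:
        return [m for p in self.phrases for m in p.morphemes]


# Hand-made segmentation of a sentence fragment, for illustration only.
clause = ToyClause([
    ToyPhrase([ToyBasePhrase(["電気"]), ToyBasePhrase(["抵抗"]), ToyBasePhrase(["率", "は"])]),
    ToyPhrase([ToyBasePhrase(["電気", "の"])]),
])
print(len(clause.phrases), len(clause.base_phrases), len(clause.morphemes))
print("".join(clause.morphemes))
```

Because every level is a partition of the level below it, joining the morphemes of any unit reproduces that unit's surface text.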
Sentence objects can be saved in the KNP format.
```python
# Save the sentence in the KNP format
with open("sentence.knp", "wt", encoding="utf-8") as f:
    f.write(sentence.to_knp())

# Load the sentence
with open("sentence.knp", "rt", encoding="utf-8") as f:
    sentence = rhoknp.Sentence.from_knp(f.read())
```
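In the KNP format, a phrase (bunsetsu) starts with a line beginning with `*` and a base phrase (kihon-ku) with a line beginning with `+`, each giving the index of its head and a dependency type (for example `1D`, or `-1D` for the root), followed by the morpheme lines it covers; `EOS` ends the sentence. The stdlib-only sketch below groups morpheme lines under those markers to show how the levels above can be recovered from the saved text. `group_knp_lines` is a hypothetical helper, the sample is hand-written and abbreviated (the `<...>` feature annotations KNP attaches are omitted), and edge cases are ignored; rhoknp's `Sentence.from_knp` handles the full format.

```python
import re
from typing import List

# Header lines: "*" (phrase) or "+" (base phrase), the head index, and a dependency type.
HEADER = re.compile(r"([*+]) (-?\d+)([DPIA])")


def group_knp_lines(text: str) -> List[List[List[str]]]:
    """Group morpheme surface forms as phrases -> base phrases -> morphemes."""
    phrases: List[List[List[str]]] = []
    for line in text.splitlines():
        if line.startswith("#"):  # comment lines, e.g. sentence IDs
            continue
        if line == "EOS":  # end of the sentence
            break
        header = HEADER.match(line)
        if header and header.group(1) == "*":  # start a new phrase (bunsetsu)
            phrases.append([])
        elif header:  # start a new base phrase (kihon-ku) in the current phrase
            phrases[-1].append([])
        else:  # morpheme line; its first field is the surface form
            phrases[-1][-1].append(line.split(" ", 1)[0])
    return phrases


# Hand-written, abbreviated sample for illustration.
sample = "\n".join([
    "* 1D",
    "+ 1D",
    "電気 でんき 電気 名詞 6 普通名詞 1 * 0 * 0 NIL",
    "の の の 助詞 9 接続助詞 3 * 0 * 0 NIL",
    "* -1D",
    "+ -1D",
    "性質 せいしつ 性質 名詞 6 普通名詞 1 * 0 * 0 NIL",
    "EOS",
])
print(group_knp_lines(sample))
```

Matching the full header pattern, rather than just a leading `*` or `+`, keeps a morpheme whose surface form is itself `*` from being mistaken for a phrase boundary.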
Furthermore, rhoknp provides convenient APIs for document-level language analysis.
```python
document = rhoknp.Document.from_raw_text(
    "電気抵抗率は電気の通しにくさを表す物性値である。単に抵抗率とも呼ばれる。"
)
# If you know sentence boundaries, you can use `Document.from_sentences` instead.
document = rhoknp.Document.from_sentences(
    [
        "電気抵抗率は電気の通しにくさを表す物性値である。",
        "単に抵抗率とも呼ばれる。",
    ]
)
```
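`Document.from_sentences` takes a list of sentence strings. As a rough illustration of producing such a list yourself, the sketch below splits raw Japanese text after each full stop `。`. This naive rule is an assumption made for the example (it ignores quotes, brackets, `！`, `？`, and other cases) and is not how rhoknp or KWJA detects sentence boundaries; `naive_split_sentences` is a hypothetical name.

```python
import re
from typing import List


def naive_split_sentences(text: str) -> List[str]:
    # Split at the zero-width position right after each "。",
    # so the full stop stays attached to its sentence.
    parts = re.split(r"(?<=。)", text)
    # Drop empty pieces (e.g. the one after a trailing "。") and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


text = "電気抵抗率は電気の通しにくさを表す物性値である。単に抵抗率とも呼ばれる。"
print(naive_split_sentences(text))
```

The resulting list has the same shape as the argument passed to `Document.from_sentences` above.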
Document objects can be handled in the same way as Sentence objects.
```python
# Perform morphological analysis by Juman++
document = jumanpp.apply_to_document(document)

# Access language units in the document
for sentence in document.sentences:
    ...
for morpheme in document.morphemes:
    ...

# Save the Juman++ analysis result
with open("document.jumanpp", "wt", encoding="utf-8") as f:
    f.write(document.to_jumanpp())

# Load the Juman++ analysis result
with open("document.jumanpp", "rt", encoding="utf-8") as f:
    document = rhoknp.Document.from_jumanpp(f.read())
```
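Since each sentence in the JUMAN format ends with an `EOS` line, a document-level file can be pictured as the per-sentence outputs concatenated. Assuming that layout, the stdlib-only sketch below splits such text back into per-sentence chunks, each of which has the shape accepted by `rhoknp.Sentence.from_jumanpp`. `split_on_eos` is a hypothetical helper and the sample is hand-written; `Document.from_jumanpp` already does this work for you.

```python
from typing import List


def split_on_eos(text: str) -> List[str]:
    """Split concatenated JUMAN-format text into per-sentence chunks (hypothetical helper)."""
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip("\r\n") == "EOS":  # each sentence ends with an EOS line
            chunks.append("".join(current))
            current = []
    if any(line.strip() for line in current):
        raise ValueError("trailing lines without a closing EOS")
    return chunks


# Hand-written sample; the numeric IDs are illustrative.
doc_text = (
    "電気 でんき 電気 名詞 6 普通名詞 1 * 0 * 0 NIL\n"
    "EOS\n"
    "単に たんに 単に 副詞 8 * 0 * 0 * 0 NIL\n"
    "EOS\n"
)
chunks = split_on_eos(doc_text)
print(len(chunks))
```

Keeping line endings (`keepends=True`) means the chunks concatenate back to exactly the original text.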
For more information, please refer to the examples and documentation.
Main differences from pyknp
pyknp serves as the official Python binding for Juman++ and KNP. In developing rhoknp, we redesigned the API based on how pyknp is currently used. The key differences between the two are as follows:
- Support for document-level language analysis: rhoknp allows you to load and instantiate the results of document-level language analysis, including cohesion analysis and discourse relation analysis.
- Strict type-awareness: rhoknp is thoroughly type-annotated, which enables strict static type checking and improves code clarity.
- Comprehensive test suite: rhoknp is extensively tested, and its code coverage report is available on Codecov.
License
MIT
Contributing
We warmly welcome contributions to rhoknp. You can get started by reading the contribution guide.