Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models for Japanese and other languages
Meta
Author: Koichi Yasuoka
Requires Python: >=3.7
Classifiers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Operating System
- OS Independent
Natural Language
- Japanese
- Korean
- Chinese (Simplified)
- Chinese (Traditional)
- Thai
- Vietnamese
- English
- German
- Serbian
Topic
- Text Processing :: Linguistic
esupar
Tokenizer, POS-tagger, and dependency-parser with Transformers and SuPar.
Basic usage
>>> import esupar
>>> nlp=esupar.load("ja")
>>> doc=nlp("太郎は花子が読んでいる本を次郎に渡した")
>>> print(doc)
1 太郎 _ PROPN _ _ 12 nsubj _ SpaceAfter=No
2 は _ ADP _ _ 1 case _ SpaceAfter=No
3 花子 _ PROPN _ _ 5 nsubj _ SpaceAfter=No
4 が _ ADP _ _ 3 case _ SpaceAfter=No
5 読ん _ VERB _ _ 8 acl _ SpaceAfter=No
6 で _ SCONJ _ _ 5 mark _ SpaceAfter=No
7 いる _ AUX _ _ 5 aux _ SpaceAfter=No
8 本 _ NOUN _ _ 12 obj _ SpaceAfter=No
9 を _ ADP _ _ 8 case _ SpaceAfter=No
10 次郎 _ PROPN _ _ 12 obl _ SpaceAfter=No
11 に _ ADP _ _ 10 case _ SpaceAfter=No
12 渡し _ VERB _ _ 0 root _ SpaceAfter=No
13 た _ AUX _ _ 12 aux _ _
>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
太郎 PROPN ═╗<════════╗ nsubj(主語)
は   ADP   <╝         ║ case(格表示)
花子 PROPN ═╗<══╗     ║ nsubj(主語)
が   ADP   <╝   ║     ║ case(格表示)
読ん VERB  ═╗═╗═╝<╗   ║ acl(連体修飾節)
で   SCONJ <╝ ║   ║   ║ mark(標識)
いる AUX   <══╝   ║   ║ aux(動詞補助成分)
本   NOUN  ═╗═════╝<╗ ║ obj(目的語)
を   ADP   <╝       ║ ║ case(格表示)
次郎 PROPN ═╗<╗     ║ ║ obl(斜格補語)
に   ADP   <╝ ║     ║ ║ case(格表示)
渡し VERB  ═╗═╝═════╝═╝ root(親)
た   AUX   <╝           aux(動詞補助成分)
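The text printed by print(doc) above is in CoNLL-U format, so it can be post-processed without any extra library. The following is a minimal sketch, assuming the ten columns are tab-separated as the CoNLL-U format specifies. `Token`, `parse_conllu`, and `dependents` are hypothetical helper names introduced here for illustration and are not part of esupar. The sample rebuilds the output shown above so that the block runs without downloading a model; with a live pipeline, `str(doc)` could be passed in its place.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def parse_conllu(text: str) -> List[Token]:
    """Parse CoNLL-U text into tokens, keeping ID, FORM, UPOS, HEAD and DEPREL."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges such as "1-2" and empty nodes such as "1.1"
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def dependents(tokens: List[Token], head_id: int) -> List[Token]:
    """Return the tokens whose HEAD is head_id, in sentence order."""
    return [t for t in tokens if t.head == head_id]


# (ID, FORM, UPOS, HEAD, DEPREL) copied from the output above.
ROWS = [
    ("1", "太郎", "PROPN", "12", "nsubj"),
    ("2", "は", "ADP", "1", "case"),
    ("3", "花子", "PROPN", "5", "nsubj"),
    ("4", "が", "ADP", "3", "case"),
    ("5", "読ん", "VERB", "8", "acl"),
    ("6", "で", "SCONJ", "5", "mark"),
    ("7", "いる", "AUX", "5", "aux"),
    ("8", "本", "NOUN", "12", "obj"),
    ("9", "を", "ADP", "8", "case"),
    ("10", "次郎", "PROPN", "12", "obl"),
    ("11", "に", "ADP", "10", "case"),
    ("12", "渡し", "VERB", "0", "root"),
    ("13", "た", "AUX", "12", "aux"),
]
SAMPLE = "\n".join(
    "\t".join([i, form, "_", upos, "_", "_", head, rel, "_",
               "_" if i == "13" else "SpaceAfter=No"])
    for i, form, upos, head, rel in ROWS
)

tokens = parse_conllu(SAMPLE)
root = next(t for t in tokens if t.head == 0)
print(root.form)                                       # 渡し
print([t.form for t in dependents(tokens, root.id)])   # ['太郎', '本', '次郎', 'た']
```

The root is the token whose HEAD is 0, and its direct dependents are the arcs that end in the rightmost column of the deplacy rendering.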
esupar.load(model)
loads a natural language processor pipeline, working on Universal Dependencies. Available model options are:
- model="ja": Japanese model bert-base-japanese-upos (default)
- model="ja_large": Japanese model bert-large-japanese-upos
- model="ja_luw_small": Japanese long-unit-word model roberta-small-japanese-char-luw-upos
- model="ja_luw_base": Japanese long-unit-word model bert-base-japanese-luw-upos
- model="ja_luw_large": Japanese long-unit-word model bert-large-japanese-luw-upos
- model="ko": Korean model roberta-base-korean-upos
- model="ko_large": Korean model roberta-large-korean-upos
- model="ko_morph_base": Korean morpheme model roberta-base-korean-morph-upos
- model="ko_morph_large": Korean morpheme model roberta-large-korean-morph-upos
- model="zh": Chinese model chinese-bert-wwm-ext-upos
- model="zh_base": Chinese model chinese-roberta-base-upos
- model="zh_large": Chinese model chinese-roberta-large-upos
- model="lzh": Classical Chinese model roberta-classical-chinese-base-upos
- model="lzh_large": Classical Chinese model roberta-classical-chinese-large-upos
- model="th": Thai model roberta-base-thai-spm-upos
- model="vi": Vietnamese model bert-base-vietnamese-upos
- model="en": English model roberta-base-english-upos
- model="en_large": English model roberta-large-english-upos
- model="de": German model bert-base-german-upos
- model="de_large": German model bert-large-german-upos
- model="sr": Serbian (Cyrillic and Latin) model gpt2-small-serbian-upos
- model="sr_large": Serbian (Cyrillic and Latin) model gpt2-large-serbian-upos
- model="cop": Coptic model roberta-base-coptic-upos
- model="ain": Ainu model roberta-base-ainu-upos
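Because several languages come in a base and a `_large` variant while others have only one option, it can help to resolve the option string before calling esupar.load. The following is a rough sketch: `MODEL_OPTIONS` copies a subset of the entries listed above, and `resolve_option` is a hypothetical helper introduced here, not an esupar function. The esupar.load call is left as a comment because it downloads model weights.

```python
# Subset of the options listed above: option string -> underlying model name.
MODEL_OPTIONS = {
    "ja": "bert-base-japanese-upos",
    "ja_large": "bert-large-japanese-upos",
    "ko": "roberta-base-korean-upos",
    "ko_large": "roberta-large-korean-upos",
    "zh": "chinese-bert-wwm-ext-upos",
    "zh_large": "chinese-roberta-large-upos",
    "th": "roberta-base-thai-spm-upos",
    "vi": "bert-base-vietnamese-upos",
    "en": "roberta-base-english-upos",
    "en_large": "roberta-large-english-upos",
}


def resolve_option(lang: str, large: bool = False) -> str:
    """Return the option string for lang, using the *_large variant when requested and listed.

    Falls back to the plain option when no large variant is in the table.
    """
    candidate = f"{lang}_large" if large else lang
    if candidate in MODEL_OPTIONS:
        return candidate
    if lang in MODEL_OPTIONS:
        return lang
    raise ValueError(f"no model option known for language {lang!r}")


option = resolve_option("ja", large=True)
print(option, "->", MODEL_OPTIONS[option])  # ja_large -> bert-large-japanese-upos

# With esupar installed, the resolved option would then be passed on:
# import esupar
# nlp = esupar.load(option)
```

Thai has no `_large` entry in the list, so `resolve_option("th", large=True)` falls back to `"th"` in this sketch.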
Installation for Linux
pip3 install esupar --user
Installation for Cygwin64
Make sure to get the following packages:
- python37-devel
- python37-pip
- python37-cython
- python37-numpy
- python37-wheel
- gcc-g++
- mingw64-x86_64-gcc-g++
- git
- curl
- make
- cmake
and then run:
curl -L https://raw.githubusercontent.com/KoichiYasuoka/CygTorch/master/installer/supar.sh | sh
pip3.7 install esupar
Installation for Google Colaboratory
!pip install esupar
Try notebook.
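After any of the installation routes above, a quick check that the package is visible to the current interpreter can save a confusing import error later. The following is a small sketch using only the standard library. `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or later, which is slightly newer than the minimum this package declares.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("esupar")
if version is None:
    print("esupar is not installed in this environment")
else:
    print(f"esupar {version} is installed")
```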
Author
Koichi Yasuoka (安岡孝一)