Chinese Words Segmentation Utilities
Meta
Author: Sun, Junyi
Classifiers
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Natural Language
- Chinese (Simplified)
- Chinese (Traditional)
Programming Language
- Python
- Python :: 2
- Python :: 2.6
- Python :: 2.7
- Python :: 3
- Python :: 3.2
- Python :: 3.3
- Python :: 3.4
Topic
- Text Processing
- Text Processing :: Indexing
- Text Processing :: Linguistic
jieba
“Jieba” (结巴, Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module.
See README.md for the full documentation.
GitHub: https://github.com/fxsjy/jieba
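Features
- Supports three segmentation modes:
  - Precise mode: tries to cut the sentence into the most accurate segmentation; suited to text analysis.
  - Full mode: scans out every word in the sentence that can form a dictionary word; very fast, but cannot resolve ambiguity.
  - Search engine mode: builds on precise mode and cuts long words again to improve recall; suited to segmentation for search engines.
- Supports Traditional Chinese segmentation
- Supports custom dictionaries
- MIT license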
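Installation
The code is compatible with both Python 2 and Python 3.
- Fully automatic installation: easy_install jieba, or pip install jieba / pip3 install jieba
- Semi-automatic installation: download the package from https://pypi.python.org/pypi/jieba/ , unpack it, and run python setup.py install
- Manual installation: place the jieba directory in the current directory or in the site-packages directory
- Use import jieba to load the module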
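After any of the installation routes above, a quick check can confirm that the package is importable. The following is a minimal sketch for Python 3; the helper name `jieba_is_installed` is hypothetical and is not part of jieba.

```python
import importlib.util


def jieba_is_installed():
    """Return True if the jieba package can be found on the import path."""
    return importlib.util.find_spec("jieba") is not None


if jieba_is_installed():
    import jieba
    # Fall back to "unknown" in case the installed copy exposes no version string.
    print("jieba version:", getattr(jieba, "__version__", "unknown"))
else:
    print("jieba not found; try: pip install jieba")
```

For a manual installation, run the check from the directory that contains the jieba folder so that it is on the import path.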
Release history
- 0.42.1: Jan 20, 2020
- 0.42: Jan 13, 2020
- 0.41: Jan 08, 2020
- 0.40: Dec 25, 2019
- 0.39: Aug 28, 2017
- 0.38: Dec 16, 2015
- 0.37: Jun 27, 2015
- 0.36.2: Apr 17, 2015
- 0.36: Mar 20, 2015
- 0.35: Nov 13, 2014
- 0.34: Oct 20, 2014
- 0.33: Aug 31, 2014
- 0.32: Feb 07, 2014
- 0.31: Jul 29, 2013
- 0.30: Jul 01, 2013
- 0.29.1: Jun 17, 2013
- 0.29: Jun 07, 2013
- 0.28.4: May 31, 2013
- 0.28.3: May 02, 2013
- 0.28.2: Apr 28, 2013
- 0.28.1: Apr 27, 2013
- 0.28: Apr 27, 2013
- 0.27: Apr 22, 2013
- 0.26.1: Apr 07, 2013
- 0.26: Apr 07, 2013
- 0.25: Feb 18, 2013
- 0.24: Dec 28, 2012
- 0.23: Dec 12, 2012
- 0.22: Nov 27, 2012
- 0.21: Nov 23, 2012
- 0.20: Nov 06, 2012
No dependencies