Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: MIT License
- OSI Approved :: BSD License
Natural Language
- Arabic
- Basque
- Catalan
- Danish
- Dutch
- English
- Esperanto
- Finnish
- French
- German
- Greek
- Hindi
- Hungarian
- Indonesian
- Irish
- Italian
- Lithuanian
- Nepali
- Norwegian
- Portuguese
- Romanian
- Russian
- Serbian
- Spanish
- Swedish
- Tamil
- Turkish
Operating System
- OS Independent
Programming Language
- C
- Other
- Python
- Python :: 2
- Python :: 2.6
- Python :: 2.7
- Python :: 3
- Python :: 3.3
- Python :: 3.4
- Python :: 3.5
- Python :: 3.6
- Python :: 3.7
- Python :: 3.8
- Python :: 3.9
- Python :: 3.10
- Python :: 3.11
- Python :: 3.12
- Python :: 3.13
Topic
- Database
- Internet :: WWW/HTTP :: Indexing/Search
- Text Processing :: Indexing
- Text Processing :: Linguistic
Stemming algorithms
PyStemmer provides access to efficient algorithms for calculating a “stemmed” form of a word. The stemmed form has most of the common morphological endings removed, so that related words ideally reduce to the same linguistic base form. This is most useful when building search engines and information retrieval software: for example, a search with stemming enabled should be able to find a document containing “cycling” given the query “cycles”.
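The search scenario above can be sketched as a tiny inverted index whose terms are stemmed at both indexing and query time. This is only an illustration under stated assumptions: `build_index`, `search`, and the sample documents are hypothetical names invented here, and the fallback `stem` function is a crude stand-in (not a real stemmer) that exists only so the sketch still runs where PyStemmer is not installed.

```python
from collections import defaultdict

try:
    import Stemmer  # the PyStemmer module

    stem = Stemmer.Stemmer("english").stemWord
except ImportError:
    # Hypothetical stand-in so the sketch runs without PyStemmer.
    # It strips a few English suffixes and is NOT a real stemmer.
    def stem(word):
        for suffix in ("ing", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word


def build_index(documents):
    """Map each stemmed term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            # Drop surrounding punctuation before stemming.
            index[stem(token.strip(".,;:!?\"'"))].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that match every stemmed query term."""
    terms = [stem(token) for token in query.lower().split()]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


documents = {
    1: "Cycling is a popular way to commute.",
    2: "The orchestra rehearsed all evening.",
}
index = build_index(documents)
print(search(index, "cycles"))  # prints {1}
```

Because the same stemmer is applied to documents and queries, “cycling” and “cycles” meet at a shared index key even though neither string contains the other.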
PyStemmer provides algorithms for several (mainly European) languages, by wrapping the libstemmer library from the Snowball project in a Python module.
It also provides access to the classic Porter stemming algorithm for English. Although this has been superseded by an improved algorithm, the original may be of interest to information retrieval researchers who wish to reproduce the results of earlier experiments.
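To see where the original Porter algorithm and the improved English algorithm disagree, one option is to stem the same words with both and compare. The sketch below assumes the algorithm names `porter` and `english` appear in the list returned by `Stemmer.algorithms()`; `compare_stemmers` and the word list are hypothetical and chosen only for illustration. The guard around the import simply lets the snippet do nothing where PyStemmer is not installed.

```python
try:
    import Stemmer  # the PyStemmer module
except ImportError:
    Stemmer = None  # lets the sketch load where PyStemmer is absent


def compare_stemmers(words, first="porter", second="english"):
    """Return (word, first_stem, second_stem) tuples, or None without PyStemmer."""
    if Stemmer is None:
        return None
    stemmer_a = Stemmer.Stemmer(first)
    stemmer_b = Stemmer.Stemmer(second)
    return list(zip(words, stemmer_a.stemWords(words), stemmer_b.stemWords(words)))


if Stemmer is not None:
    # Names that can be passed to Stemmer.Stemmer(...)
    print(Stemmer.algorithms())

rows = compare_stemmers(["cycling", "cycles", "generously", "fairly"])
for word, porter_stem, english_stem in rows or []:
    marker = "" if porter_stem == english_stem else "  <- differs"
    print(f"{word:12} {porter_stem:12} {english_stem:12}{marker}")
```

Running a comparison like this over an existing test collection is a cheap way to check whether switching algorithms would change which terms are conflated.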