pyxdameraulevenshtein 1.10.0


pip install pyxdameraulevenshtein

Released: Mar 18, 2026

Meta
Author: Geoffrey Fairchild
Maintainer: Geoffrey Fairchild
Requires Python: >=3.9

Classifiers

Development Status
  • 5 - Production/Stable

Intended Audience
  • Developers
  • Education
  • Science/Research

License
  • OSI Approved :: BSD License

Operating System
  • OS Independent

Programming Language
  • Cython
  • Python :: 3
  • Python :: 3.9
  • Python :: 3.10
  • Python :: 3.11
  • Python :: 3.12
  • Python :: 3.13
  • Python :: 3.14

Topic
  • Scientific/Engineering :: Bio-Informatics
  • Scientific/Engineering :: Information Analysis
  • Text Processing :: Linguistic

pyxDamerauLevenshtein

LICENSE

This software is licensed under the BSD 3-Clause License. Please refer to the separate LICENSE file for the exact text of the license. You are obligated to give attribution if you use this code.

ABOUT

pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance. Courtesy Wikipedia:

In information theory and computer science, the Damerau-Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a "distance" (string metric) between two strings, i.e., finite sequence of symbols, given by counting the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters.

This implementation is based on Michael Homer's pure Python implementation, which implements the optimal string alignment distance algorithm. It runs in O(N*M) time using O(M) space, where N and M are the lengths of the two sequences. It supports Unicode characters.
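
For readers who want to see the shape of the algorithm, here is a minimal pure-Python sketch of the optimal string alignment distance. It is an illustration written for this page, not the library's Cython source; the name osa_distance is hypothetical. It keeps only three rows of the dynamic-programming table at a time, which is one way to reach O(M) space.

```python
def osa_distance(seq1, seq2):
    """Optimal string alignment distance (illustrative sketch, not the library code)."""
    n, m = len(seq1), len(seq2)
    two_ago = [0] * (m + 1)        # row i-2, only read by the transposition case
    one_ago = list(range(m + 1))   # row i-1; row 0 is the distance from an empty prefix
    for i in range(1, n + 1):
        this_row = [i] + [0] * m   # column 0: delete all i leading items of seq1
        for j in range(1, m + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            this_row[j] = min(
                one_ago[j] + 1,         # deletion
                this_row[j - 1] + 1,    # insertion
                one_ago[j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items
            if (i > 1 and j > 1
                    and seq1[i - 1] == seq2[j - 2]
                    and seq1[i - 2] == seq2[j - 1]):
                this_row[j] = min(this_row[j], two_ago[j - 2] + 1)
        two_ago, one_ago = one_ago, this_row
    return one_ago[m]
```

Because this is the optimal string alignment variant, no substring is edited more than once, so osa_distance('ca', 'abc') is 3 rather than the 2 that the unrestricted Damerau-Levenshtein distance would give.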

REQUIREMENTS

This code requires Python 3.9+, a C compiler such as GCC, and Cython.

INSTALL

pyxDamerauLevenshtein is available on PyPI at https://pypi.org/project/pyxDamerauLevenshtein/.

Install using pip:

pip install pyxDamerauLevenshtein

Install from source:

pip install .

USING THIS CODE

The following functions are available:

  • Edit distance (damerau_levenshtein_distance)

    • Compute the raw distance between two sequences (i.e., the minimum number of operations necessary to transform one sequence into the other).
    • Supports any sequence type: str, list, tuple, range, and more.
    • Optionally accepts a max_distance integer threshold. If the true distance exceeds it, max_distance + 1 is returned immediately, avoiding unnecessary computation.
  • Normalized edit distance (normalized_damerau_levenshtein_distance)

    • Compute the ratio of the edit distance to the length of the longer sequence, i.e., max(len(seq1), len(seq2)). 0.0 means that the sequences are identical, while 1.0 means that they have nothing in common. Note that this definition is the exact opposite of difflib.SequenceMatcher.ratio().
    • Optionally accepts a max_distance float threshold. If the true normalized distance exceeds it, a value greater than max_distance is returned immediately.
  • Edit distance against a sequence of sequences (damerau_levenshtein_distance_seqs)

    • Compute the raw distances between a sequence and each sequence within another sequence (e.g., list, tuple).
    • Optionally accepts a max_distance threshold forwarded to each individual computation.
  • Normalized edit distance against a sequence of sequences (normalized_damerau_levenshtein_distance_seqs)

    • Compute the normalized distances between a sequence and each sequence within another sequence (e.g., list, tuple).
    • Optionally accepts a max_distance threshold forwarded to each individual computation.
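
The max_distance behavior described in the list above can be pictured with a pure-Python sketch. This is an assumption about one common way to stop early, not the library's actual Cython code, and the name bounded_osa_distance is hypothetical. It relies on two facts: the difference in lengths is a lower bound on the distance, and the smallest value in a row of the table never decreases from one row to the next.

```python
def bounded_osa_distance(seq1, seq2, max_distance):
    """Return the distance, or max_distance + 1 as soon as it must exceed max_distance.

    Illustrative sketch only; the library's internal cutoff may differ.
    """
    n, m = len(seq1), len(seq2)
    # At least |n - m| insertions or deletions are always needed.
    if abs(n - m) > max_distance:
        return max_distance + 1
    two_ago = [0] * (m + 1)
    one_ago = list(range(m + 1))
    for i in range(1, n + 1):
        this_row = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            this_row[j] = min(one_ago[j] + 1,
                              this_row[j - 1] + 1,
                              one_ago[j - 1] + cost)
            if (i > 1 and j > 1
                    and seq1[i - 1] == seq2[j - 2]
                    and seq1[i - 2] == seq2[j - 1]):
                this_row[j] = min(this_row[j], two_ago[j - 2] + 1)
        # Row minima never decrease, so the final answer cannot drop
        # back under the threshold once every cell is above it.
        if min(this_row) > max_distance:
            return max_distance + 1
        two_ago, one_ago = one_ago, this_row
    return min(one_ago[m], max_distance + 1)
```

With this sketch, a caller only needs to test whether the result is greater than max_distance to know that the pair was rejected.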

Basic use:

from pyxdameraulevenshtein import damerau_levenshtein_distance, normalized_damerau_levenshtein_distance
damerau_levenshtein_distance('smtih', 'smith')  # expected result: 1
normalized_damerau_levenshtein_distance('smtih', 'smith')  # expected result: 0.2
damerau_levenshtein_distance([1, 2, 3, 4, 5, 6], [7, 8, 9, 7, 10, 11, 4])  # expected result: 7

# max_distance short-circuits when the true distance exceeds the threshold
damerau_levenshtein_distance('saturday', 'sunday', max_distance=2)  # expected result: 3 (max_distance + 1)
normalized_damerau_levenshtein_distance('smtih', 'smith', max_distance=0.5)  # expected result: 0.2 (within threshold)

from pyxdameraulevenshtein import damerau_levenshtein_distance_seqs, normalized_damerau_levenshtein_distance_seqs
array = ['test1', 'test12', 'test123']
damerau_levenshtein_distance_seqs('test', array)  # expected result: [1, 2, 3]
normalized_damerau_levenshtein_distance_seqs('test', array)  # expected result: [0.2, 0.3333333333333333, 0.42857142857142855]
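
A typical use of the *_seqs functions is picking the closest candidate to a query. The helper below is a hypothetical sketch, not part of the package. It takes the distance function as a parameter and assumes only what is documented above: that the function is called as f(seq, seqs) and returns one distance per candidate, in order.

```python
def closest_match(query, candidates, distances_fn):
    """Return (candidate, distance) for the candidate nearest to query.

    distances_fn is expected to behave like damerau_levenshtein_distance_seqs:
    called as distances_fn(query, candidates), it returns a list of distances.
    Ties go to the earliest candidate; an empty candidate list raises ValueError.
    """
    distances = distances_fn(query, candidates)
    best = min(range(len(candidates)), key=lambda k: distances[k])
    return candidates[best], distances[best]
```

For example, closest_match('test', array, damerau_levenshtein_distance_seqs) would be called with the array from the snippet above; passing normalized_damerau_levenshtein_distance_seqs instead ranks candidates by normalized distance.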

DIFFERENCES

Other Python DL implementations:

pyxDamerauLevenshtein differs from other Python implementations in that it is both fast via Cython and supports Unicode. Michael Homer's implementation is fast for pure Python, but it is about two orders of magnitude slower than this Cython implementation. jellyfish provides C implementations for a variety of string comparison metrics and is sometimes faster than pyxDamerauLevenshtein.

In the benchmark below, Python's standard-library difflib.SequenceMatcher.ratio() runs about five times faster than Michael Homer's implementation but is still more than an order of magnitude slower than this DL implementation. difflib, however, uses a different algorithm (the Ratcliff/Obershelp algorithm).
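
Since difflib reports a similarity (1.0 for identical sequences) while the normalized distance here reports a dissimilarity (0.0 for identical sequences), the two scales run in opposite directions. The small sketch below inverts difflib's ratio so both point the same way; difflib_dissimilarity is a hypothetical helper name. Because the algorithms differ, the resulting values need not coincide with the normalized DL distance.

```python
import difflib


def difflib_dissimilarity(seq1, seq2):
    """Invert difflib's similarity ratio so that 0.0 means identical."""
    ratio = difflib.SequenceMatcher(None, seq1, seq2).ratio()
    return 1.0 - ratio
```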

Performance differences (on an Intel i7-2600 running at 3.4 GHz):

>>> import timeit
>>> #this implementation:
... timeit.timeit("damerau_levenshtein_distance('e0zdvfb840174ut74j2v7gabx1 5bs', 'qpk5vei 4tzo0bglx8rl7e 2h4uei7')", 'from pyxdameraulevenshtein import damerau_levenshtein_distance', number=500000)
7.417556047439575
>>> #Michael Homer's native Python implementation:
... timeit.timeit("dameraulevenshtein('e0zdvfb840174ut74j2v7gabx1 5bs', 'qpk5vei 4tzo0bglx8rl7e 2h4uei7')", 'from dameraulevenshtein import dameraulevenshtein', number=500000)
667.0276439189911
>>> #difflib
... timeit.timeit("difflib.SequenceMatcher(None, 'e0zdvfb840174ut74j2v7gabx1 5bs', 'qpk5vei 4tzo0bglx8rl7e 2h4uei7').ratio()", 'import difflib', number=500000)
135.41051697731018

Wheel compatibility matrix

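Platform               CPython 3.9  CPython 3.10  CPython 3.11  CPython 3.12  CPython 3.13  CPython 3.14
macosx_10_13_x86_64    -            -             -             yes           yes           yes
macosx_10_9_x86_64     yes          yes           yes           -             -             -
macosx_11_0_arm64      yes          yes           yes           yes           yes           yes
manylinux1_x86_64      yes          yes           yes           yes           yes           yes
manylinux_2_28_x86_64  yes          yes           yes           yes           yes           yes
manylinux_2_5_x86_64   yes          yes           yes           yes           yes           yes
musllinux_1_2_x86_64   yes          yes           yes           yes           yes           yes
win_amd64              yes          yes           yes           yes           yes           yes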

Files in release

pyxdameraulevenshtein-1.10.0-cp310-cp310-macosx_10_9_x86_64.whl (116.2KiB)
pyxdameraulevenshtein-1.10.0-cp310-cp310-macosx_11_0_arm64.whl (116.8KiB)
pyxdameraulevenshtein-1.10.0-cp310-cp310-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (122.7KiB)
pyxdameraulevenshtein-1.10.0-cp310-cp310-musllinux_1_2_x86_64.whl (123.1KiB)
pyxdameraulevenshtein-1.10.0-cp310-cp310-win_amd64.whl (114.1KiB)
pyxdameraulevenshtein-1.10.0-cp311-cp311-macosx_10_9_x86_64.whl (116.0KiB)
pyxdameraulevenshtein-1.10.0-cp311-cp311-macosx_11_0_arm64.whl (116.3KiB)
pyxdameraulevenshtein-1.10.0-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (122.5KiB)
pyxdameraulevenshtein-1.10.0-cp311-cp311-musllinux_1_2_x86_64.whl (122.8KiB)
pyxdameraulevenshtein-1.10.0-cp311-cp311-win_amd64.whl (114.1KiB)
pyxdameraulevenshtein-1.10.0-cp312-cp312-macosx_10_13_x86_64.whl (117.5KiB)
pyxdameraulevenshtein-1.10.0-cp312-cp312-macosx_11_0_arm64.whl (117.5KiB)
pyxdameraulevenshtein-1.10.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (123.7KiB)
pyxdameraulevenshtein-1.10.0-cp312-cp312-musllinux_1_2_x86_64.whl (124.1KiB)
pyxdameraulevenshtein-1.10.0-cp312-cp312-win_amd64.whl (114.6KiB)
pyxdameraulevenshtein-1.10.0-cp313-cp313-macosx_10_13_x86_64.whl (117.9KiB)
pyxdameraulevenshtein-1.10.0-cp313-cp313-macosx_11_0_arm64.whl (117.7KiB)
pyxdameraulevenshtein-1.10.0-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (123.8KiB)
pyxdameraulevenshtein-1.10.0-cp313-cp313-musllinux_1_2_x86_64.whl (124.2KiB)
pyxdameraulevenshtein-1.10.0-cp313-cp313-win_amd64.whl (114.7KiB)
pyxdameraulevenshtein-1.10.0-cp314-cp314-macosx_10_13_x86_64.whl (121.5KiB)
pyxdameraulevenshtein-1.10.0-cp314-cp314-macosx_11_0_arm64.whl (121.4KiB)
pyxdameraulevenshtein-1.10.0-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (123.8KiB)
pyxdameraulevenshtein-1.10.0-cp314-cp314-musllinux_1_2_x86_64.whl (127.6KiB)
pyxdameraulevenshtein-1.10.0-cp314-cp314-win_amd64.whl (118.9KiB)
pyxdameraulevenshtein-1.10.0-cp39-cp39-macosx_10_9_x86_64.whl (116.2KiB)
pyxdameraulevenshtein-1.10.0-cp39-cp39-macosx_11_0_arm64.whl (116.9KiB)
pyxdameraulevenshtein-1.10.0-cp39-cp39-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (122.8KiB)
pyxdameraulevenshtein-1.10.0-cp39-cp39-musllinux_1_2_x86_64.whl (123.3KiB)
pyxdameraulevenshtein-1.10.0-cp39-cp39-win_amd64.whl (114.2KiB)
pyxdameraulevenshtein-1.10.0.tar.gz (86.7KiB)
No dependencies