How to do fuzzy text matching in Python


While regular expressions come in handy when you want to check whether a certain pattern appears in a text, fuzzy text matching is extremely useful when you want to check whether two strings are similar.

The Levenshtein distance is a well-known metric for comparing two sequences (here, strings). It counts the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one into the other.

e.g. house => mouse: 1 step (substitute h with m)
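To make the "steps" concrete, here is a minimal pure-Python sketch of the classic dynamic-programming Levenshtein algorithm. It is for illustration only: the function name levenshtein is a hypothetical helper, not part of any library, and it assumes every insertion, deletion and substitution costs 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (hypothetical helper)."""
    # previous[j] = distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("house", "mouse"))     # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept in memory, so the sketch uses O(len(b)) space instead of a full matrix.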

This makes it a great fit for auto-correcting the mistakes in a user entry 😉

Or for finding the closest answer to a user input (e.g. in a search engine).

The thefuzz library

To compute similarity scores based on the Levenshtein distance, we can use the thefuzz library.

Installation

pip3 install thefuzz[speedup]
Installation of the thefuzz Python library

Import

from thefuzz import fuzz
How to import the library

Using the simple ratio

The fuzz.ratio() method gives you a score from 0 to 100 indicating how similar the two strings are.

fuzz.ratio("this is a test", "this is a test!")
This returns a score of 97 (out of 100).
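Where does 97 come from? Assuming thefuzz runs on its usual backend (RapidFuzz, or python-Levenshtein in older versions), ratio is a normalized Indel similarity: only insertions and deletions are counted, which works out to 2 × (length of the longest common subsequence) / (total length of both strings), scaled to 100. The sketch below recomputes it in plain Python; lcs_length and indel_ratio are hypothetical helper names, not part of thefuzz.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (hypothetical helper)."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)  # extend a common subsequence
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_ratio(a: str, b: str) -> int:
    """Assumed equivalent of fuzz.ratio: 100 * 2 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # arbitrary choice for this sketch: two empty strings count as identical
    return round(100 * 2 * lcs_length(a, b) / total)


print(indel_ratio("this is a test", "this is a test!"))  # 97
```

Here all 14 characters of the first string are shared, out of 29 characters in total: 100 × 28 / 29 ≈ 96.6, which rounds to 97.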

If you need more than the simple ratio, thefuzz offers other scoring methods; have a look at the documentation on GitHub, which is quite straightforward.

Bravo! You now know how to do fuzzy text matching in Python!