How to do fuzzy text matching in Python
While regular expressions come in handy when you want to check whether a certain pattern occurs in a text, fuzzy text matching is extremely useful when you want to check whether two strings are similar.
The Levenshtein distance is a well-known metric for comparing two sequences (strings, in our case): it counts the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one into the other.
e.g. house => mouse : 1 step
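To make the step counting concrete, here is a minimal sketch of the classic dynamic-programming way to compute the Levenshtein distance. The function name `levenshtein` is chosen for illustration only and is not part of any library; each insertion, deletion, or substitution counts as one step.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    # Before the loop, previous[j] is the distance between "" and b[:j],
    # which is simply j insertions.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("house", "mouse"))     # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept in memory at a time, which is enough because each cell depends only on the current and the previous row.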
This makes it a good fit for auto-correcting mistakes in user input, or for finding the closest answer to a user query (e.g. in a search engine).
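As a rough illustration of the search use case, the sketch below picks the closest candidate to a misspelled query using `difflib.get_close_matches` from the Python standard library. Note that `difflib` uses its own similarity measure rather than the Levenshtein distance, and that the `cities` list, the queries, and the `closest_match` helper are made up for this example.

```python
from difflib import get_close_matches

# Hypothetical list of valid entries, for illustration only
cities = ["Paris", "London", "Berlin", "Madrid"]


def closest_match(query, candidates):
    """Return the candidate most similar to query, or None if nothing is close."""
    # n=1 keeps only the best match; cutoff=0.6 is difflib's default threshold
    matches = get_close_matches(query, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(closest_match("Lodnon", cities))  # London
print(closest_match("xyz", cities))     # None
```

Raising `cutoff` makes the matching stricter, which may be preferable when a wrong suggestion is worse than no suggestion.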
Thefuzz library
To compute this Levenshtein distance, we can use the thefuzz library.
Installation
pip3 install thefuzz[speedup]
Import
from thefuzz import fuzz
Using the simple ratio
The fuzz.ratio() function gives you a score between 0 and 100 indicating how similar the two strings are.
fuzz.ratio("this is a test", "this is a test!")
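To get a feel for where such a score comes from, the sketch below computes a comparable 0 to 100 score with `difflib.SequenceMatcher` from the standard library. It is an approximation, assuming the score is twice the number of matching characters divided by the combined length of both strings; thefuzz's own implementation may differ in its details, and `similarity_score` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    """Return a similarity score between 0 and 100 for two strings."""
    # ratio() returns 2 * M / T, where M is the number of matching characters
    # and T is the combined length of both strings
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(similarity_score("this is a test", "this is a test!"))  # 97
print(similarity_score("house", "mouse"))                     # 80
```

Here the two test strings share 14 characters out of a combined length of 29, so the score is 2 * 14 / 29, or about 97.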
thefuzz offers other methods beyond the simple ratio. If you need more, have a look at the documentation on GitHub, which is quite straightforward.
Bravo! You now know how to do fuzzy text matching in Python!