How to do fuzzy text matching in Python
While regular expressions come in handy when you want to check whether a certain pattern appears in a text, fuzzy text matching is extremely useful when you want to check whether two strings are similar.
The Levenshtein distance is a well-known metric for this: it counts the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one sequence into another (here, strings).
e.g. house => mouse : 1 step (replace "h" with "m")
That makes it a good fit for autocorrecting mistakes in user input,
or for returning the closest answer to a user's query (e.g. in a search engine).
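To make the idea of "steps" concrete, here is a minimal pure-Python sketch of the distance and of a closest-answer lookup built on it. The names `levenshtein` and `closest_match` are made up for this illustration and are not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def closest_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest distance to query (hypothetical helper)."""
    return min(candidates, key=lambda candidate: levenshtein(query, candidate))


print(levenshtein("house", "mouse"))                      # 1
print(closest_match("hous", ["mouse", "house", "horse"]))  # house
```

This version keeps only two rows of the edit table in memory, which is enough when you only need the distance and not the edits themselves.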
Thefuzz library
To compute string similarity based on the Levenshtein distance, we can use the thefuzz library.
Installation
Install the package from PyPI with pip:
pip install thefuzz
Import
Import the fuzz module from the package:
from thefuzz import fuzz
Using the simple ratio
The fuzz.ratio() function gives you a score between 0 and 100 indicating how similar the two strings are.
If you need more than the simple ratio, the library offers other functions as well. Have a look at the documentation on GitHub, which is quite straightforward.
Bravo! You now know how to do fuzzy text matching in Python!