How to do fuzzy text matching in Python

7-Day Challenge

Land Your First Data Science Job

A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.

Build portfolios that hiring managers love
Master the Python and SQL essentials to be industry-ready
Practice with real interview questions from tech companies
Access to the $100k/y Data Scientist Cheatsheet

Join thousands of developers who transformed their careers through our challenge. Unsubscribe anytime.

While regular expressions come in handy when you want to check whether a certain pattern occurs in a text, fuzzy text matching is extremely useful when you want to check whether two strings are similar.

The Levenshtein distance is a well-known metric for this. It counts the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one sequence (here, a string) into another.

e.g. house => mouse: 1 step (substitute "h" with "m")
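To make the idea of "counting steps" concrete, here is a minimal pure-Python sketch of the classic dynamic-programming computation of the Levenshtein distance. It is only an illustration of the metric, not how thefuzz computes its scores, and the function name `levenshtein` is a hypothetical choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b` (illustrative helper)."""
    # previous[j] holds the distance between the current prefix of `a`
    # and b[:j]; start with the distances from the empty prefix of `a`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("house", "mouse"))     # 1
print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept in memory, since each row depends solely on the one before it.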

This makes it well suited to auto-correcting mistakes in user input 😉, or to finding the closest answer to a user query (e.g. in a search engine).
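For the "closest answer" use case, a minimal sketch using only Python's standard library is shown below. Note that `difflib.get_close_matches` ranks candidates with difflib's `SequenceMatcher` (a Ratcliff/Obershelp-style similarity), not with the Levenshtein distance. The `vocabulary` list and the `suggest` helper are hypothetical.

```python
import difflib
from typing import List, Optional

# Hypothetical list of valid entries (e.g. dictionary words or product names).
vocabulary = ["apple", "ape", "peach", "puppy"]


def suggest(user_input: str, choices: List[str]) -> Optional[str]:
    """Return the most similar entry in `choices`, or None if no entry
    reaches the similarity cutoff (0.6 on a 0-1 scale)."""
    matches = difflib.get_close_matches(user_input, choices, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("appel", vocabulary))  # apple
print(suggest("xyz", vocabulary))    # None
```

Raising `cutoff` makes the suggestions stricter; lowering it lets more distant typos through.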

The thefuzz library

To compute similarity scores based on this kind of edit distance, we can use the thefuzz library.

Installation

pip3 install thefuzz[speedup]
Installation of the thefuzz Python library

Import

from thefuzz import fuzz
How to import the library

Using the simple ratio

The fuzz.ratio() function returns a score between 0 and 100 indicating how similar the two strings are.

fuzz.ratio("this is a test", "this is a test!")
This outputs a score of 97 (out of 100)
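Where does 97 come from? The score can be understood as a ratio of the form 2*M/T, where M is the number of matching characters and T the total number of characters in both strings. Here M = 14 and T = 14 + 15 = 29, so 2 * 14 / 29 ≈ 0.966, i.e. 97 after scaling to 100 and rounding. The sketch below reproduces this arithmetic with the standard library's `difflib.SequenceMatcher.ratio()`. It is a stand-in, not thefuzz's implementation, so scores may differ on other inputs; `similarity_score` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matching characters and T the combined length of both strings;
    # scale it to 0-100 and round to an integer.
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(similarity_score("this is a test", "this is a test!"))  # 97
```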

Beyond the simple ratio, thefuzz offers other scoring methods. If you need more, have a look at the documentation on GitHub, which is quite straightforward.
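As an illustration of what those other scorers do, the sketch below compares two strings regardless of word order: lowercase both, sort their words, then compare. thefuzz provides a scorer built on this idea, `fuzz.token_sort_ratio()`. This standard-library version is a simplified approximation that skips the string preprocessing (such as removing punctuation) the library applies by default, and `token_sort_score` is a hypothetical name.

```python
from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> int:
    """Similarity (0-100) of two strings after lowercasing them and
    sorting their words, so that word order does not affect the score."""
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))

    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())


print(token_sort_score("New York Mets", "mets new york"))  # 100
```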

Bravo! You now know how to do fuzzy text matching in Python!