How to filter a DataFrame

2 min readPandasDataFrameFiltering
7-Day Challenge

Land Your First Data Science Job

A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.

Build portfolios that hiring managers love
Master the Python and SQL essentials to be industry-ready
Practice with real interview questions from tech companies
Access to the $100k/y Data Scientist Cheatsheet

Join thousands of developers who transformed their careers through our challenge. Unsubscribe anytime.

Here is how you can apply a filter on a pandas DataFrame.

We define a sample DataFrame

import pandas as pd

# We read a sample dataset from the web.
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
We read our sample dataset

You can download it here if the above script doesn't work for you and read it like so

import pandas as pd

# We read a sample dataset
df = pd.read_csv('./iris.csv')
We read our sample dataset

One conditional statement mask

A condition within squared brackets

# The mask
mask = df["sepal_length"] > .4

# We apply the mask
print(df[mask])
Defining a conditional statement as a mask

it is also possible to it in one-line

print(df[df["sepal_length"] > .4])
Directly within the squared brackets

N conditional statements

It is possible to add more than one conditional statement like so

# The mask
mask_1 = df["sepal_length"] > .4
mask_2 = df["sepal_width"] > 3.1

# We apply the mask
print(df[mask_1 & mask_2])
Defining two conditional statements as masks 

Or the oneliner :

# We apply the mask
print(df[(df["sepal_length"] > .4) & (df["sepal_width"] > 3.1)])
In one line of code

What can you filter for?

Where text equals

# We apply the mask
print(df[df["species"] == "setosa"])
Find all the setosa occurences

Where text contains

There is a similar method for filtering where the text contains a string.

That might be useful when you are trying to filter out a specific email out of an email list and knows the prefix before the @ sign.

# We apply the mask
print(df[df["species"].str.contains("vir")])
We select only the species containing "vir"

Filter for specific dates

Here is how to filter for dates that start on a specific date and end on another specific date.

# we import the library
import pandas as pd

dates = pd.date_range(start="2021-01-01", end="2022-01-02", freq="D")

# we create the sample dataframe with dates
df = pd.DataFrame({"date": dates,
                   "col1":range(len(dates))})

# We filter for rows that starts on the 2021-06-01 and ends on the 2021-07-01
df[(df["date"] >= "2021-06-01") & (df["date"] <= "2021-07-01")]

Boolean filter

# we import the Pandas library
import pandas as pd

dates = pd.date_range(start="2021-01-01", end="2022-01-02", freq="D")

# we create the sample dataframe with dates
df = pd.DataFrame({"date": dates,
                   "col1":range(len(dates))})

# We filter for rows that starts on the 2021-06-01 and ends on the 2021-07-01
df[(df["date"] >= "2021-06-01") & (df["date"] <= "2021-07-01")]
7-Day Challenge

Land Your First Data Science Job

A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.

Build portfolios that hiring managers love
Master the Python and SQL essentials to be industry-ready
Practice with real interview questions from tech companies
Access to the $100k/y Data Scientist Cheatsheet

Join thousands of developers who transformed their careers through our challenge. Unsubscribe anytime.

Free Newsletter

Master Data Science in Days, Not Months 🚀

Skip the theoretical rabbit holes. Get practical data science skills delivered in bite-sized lessons – Approach used by real data scientist. Not bookworms. 📚

Weekly simple and practical lessons
Access to ready to use code examples
Skip the math, focus on results
Learn while drinking your coffee

By subscribing, you agree to receive our newsletter. You can unsubscribe at any time.