How to filter a Pandas DataFrame using Python
A DataFrame is one of the core data structures of the Pandas library. It is widely used in data science.
Filtering comes in handy when you want to compute statistics on a subset of rows, among other tasks.
But what can you filter for?
We define a sample DataFrame
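As a minimal sketch, a sample DataFrame could look like the one below. The column names (`name`, `email`, `age`, `is_active`) and all values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: every column name and value here is an assumption.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "email": ["alice@example.com", "bob@example.org",
              "carol@example.com", "dave@example.net"],
    "age": [25, 32, 47, 19],
    "is_active": [True, False, True, False],
})

print(df)
```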
Where text equals
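One way to keep only the rows where a text column equals a given value is to compare the column with `==` and use the resulting boolean mask to index the DataFrame. The `name` column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (column name and values are assumptions).
df = pd.DataFrame({"name": ["Alice", "Bob", "Carol", "Bob"],
                   "age": [25, 32, 47, 51]})

# The comparison returns a boolean Series: True where the name is exactly "Bob".
mask = df["name"] == "Bob"

# Indexing with the mask keeps only the rows where it is True.
equals_bob = df[mask]
print(equals_bob)
```

The comparison is case-sensitive, so `"bob"` would not match any row in this sketch.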
Where text contains
A similar approach filters the rows where the text contains a given substring.
This is useful when you are looking for a specific address in an email list and only know the prefix before the @ sign.
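A substring filter can be sketched with `Series.str.contains`. The `email` column and the addresses are hypothetical. `regex=False` treats the pattern as plain text, and `na=False` counts missing values as non-matches so the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical email list (values are assumptions); one entry is missing.
df = pd.DataFrame({"email": ["alice@example.com", "bob@example.org",
                             None, "carol@example.com"]})

# True where the address contains the known prefix followed by the @ sign.
mask = df["email"].str.contains("carol@", regex=False, na=False)

matches = df[mask]
print(matches)
```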
Where the number is bigger than
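A numeric filter works the same way as a text filter: compare the column with `>` and index with the result. The `age` column and the threshold of 30 are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data (column names and values are assumptions).
df = pd.DataFrame({"name": ["Alice", "Bob", "Carol", "Dave"],
                   "age": [25, 32, 47, 19]})

# Keep the rows where age is strictly greater than 30.
older_than_30 = df[df["age"] > 30]
print(older_than_30)
```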
Where the number is lower or equal
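For "lower or equal", the `<=` operator includes the boundary value itself. As before, the `age` column and the threshold of 25 are assumptions.

```python
import pandas as pd

# Hypothetical sample data (column names and values are assumptions).
df = pd.DataFrame({"name": ["Alice", "Bob", "Carol", "Dave"],
                   "age": [25, 32, 47, 19]})

# Keep the rows where age is lower than or equal to 25 (25 itself is included).
at_most_25 = df[df["age"] <= 25]
print(at_most_25)
```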
Filter for specific dates
Here is how to keep the rows whose dates fall between a start date and an end date, both included.
# Import the Pandas library
import pandas as pd
# Build one date per day from 2021-01-01 to 2022-01-02
dates = pd.date_range(start="2021-01-01", end="2022-01-02", freq="D")
# Create the sample DataFrame with the dates and a running counter
df = pd.DataFrame({"date": dates,
                   "col1": range(len(dates))})
# Keep the rows dated from 2021-06-01 through 2021-07-01 (both ends included)
df[(df["date"] >= "2021-06-01") & (df["date"] <= "2021-07-01")]
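As an alternative sketch, not taken from the original article, the same date range can be selected with `Series.between`, which includes both endpoints by default. The variable name `in_range` is an assumption.

```python
import pandas as pd

# Same sample data as in the date example: one row per day.
dates = pd.date_range(start="2021-01-01", end="2022-01-02", freq="D")
df = pd.DataFrame({"date": dates,
                   "col1": range(len(dates))})

# between() is inclusive on both ends by default,
# so it matches the >= and <= comparisons combined with &.
in_range = df[df["date"].between("2021-06-01", "2021-07-01")]
print(len(in_range))
```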
Boolean filter
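A minimal sketch, assuming "Boolean filter" means selecting rows based on a column that holds True/False values. The `is_active` column is hypothetical. A boolean column can be used directly as a mask, and `~` inverts it.

```python
import pandas as pd

# Hypothetical sample data (column names and values are assumptions).
df = pd.DataFrame({"name": ["Alice", "Bob", "Carol", "Dave"],
                   "is_active": [True, False, True, False]})

# The boolean column itself is the mask: keep rows where it is True.
active = df[df["is_active"]]

# The ~ operator negates the mask: keep rows where it is False.
inactive = df[~df["is_active"]]

print(active)
print(inactive)
```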