How to apply a filter to a Pandas DataFrame

Boolean indexing is the most common way to filter a Pandas DataFrame: you pass a boolean Series to df[...], and only the rows where it is True are kept. Here's an example:

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40], 'Salary': [50000, 55000, 60000, 65000]}
df = pd.DataFrame(data)

# Filter the DataFrame to only include rows where the salary is greater than 55000
filtered_df = df[df['Salary'] > 55000]
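To see why this works, it can help to look at the boolean mask on its own. The sketch below rebuilds the same sample DataFrame and stores the comparison in a variable named mask (a name chosen here for illustration) before using it to index:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 55000, 60000, 65000],
})

# The comparison produces one True/False value per row, aligned with df's index.
mask = df['Salary'] > 55000
print(mask.tolist())  # [False, False, True, True]

# Indexing with the mask keeps only the rows marked True.
filtered_df = df[mask]
print(filtered_df['Name'].tolist())  # ['Charlie', 'David']
```

Note that Bob's salary of exactly 55000 is excluded, because the comparison is strict.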

You can combine multiple conditions using the & (and) and | (or) operators. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >. Here's an example:

filtered_df = df[(df['Salary'] > 55000) & (df['Age'] > 30)]
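The | (or) and ~ (not) operators follow the same pattern. The following is a small sketch on the same sample data; the variable names either and not_high are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 55000, 60000, 65000],
})

# OR: keep rows where at least one condition holds.
either = df[(df['Salary'] > 55000) | (df['Age'] < 30)]
print(either['Name'].tolist())  # ['Alice', 'Charlie', 'David']

# NOT: ~ inverts a boolean mask.
not_high = df[~(df['Salary'] > 55000)]
print(not_high['Name'].tolist())  # ['Alice', 'Bob']
```

Python's own and, or, and not keywords do not work on a whole Series here; they raise a ValueError because the truth value of a Series is ambiguous.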

You can also use the query() method, which takes the condition as a string and lets you refer to columns by name:

filtered_df = df.query('Salary > 55000 and Age > 30')
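If the thresholds live in Python variables rather than in the string itself, query() can reference them with an @ prefix. A minimal sketch, where min_salary and min_age are hypothetical variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 55000, 60000, 65000],
})

# Hypothetical thresholds kept outside the query string.
min_salary = 55000
min_age = 30

# '@name' pulls the value of a local Python variable into the expression.
filtered_df = df.query('Salary > @min_salary and Age > @min_age')
print(filtered_df['Name'].tolist())  # ['Charlie', 'David']
```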

You can also pass the same boolean mask to the .loc[] accessor. Note that .loc[] takes the mask itself, not a string condition, and it accepts an optional column selection as a second argument:

filtered_df = df.loc[df['Salary'] > 55000]
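One reason to prefer .loc[] is that it can filter rows and pick columns in a single step. A short sketch on the same sample data (the variable name names_and_salaries is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 55000, 60000, 65000],
})

# First argument: boolean row mask. Second argument: the columns to keep.
names_and_salaries = df.loc[df['Salary'] > 55000, ['Name', 'Salary']]
print(list(names_and_salaries.columns))      # ['Name', 'Salary']
print(names_and_salaries['Name'].tolist())   # ['Charlie', 'David']
```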

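The .where() method behaves differently. It returns a new DataFrame with the same shape as the original: rows that satisfy the condition are kept as they are, and the values in every other row are replaced with NaN. The non-matching rows are therefore still present, and you need to drop them (for example with .dropna()) if you want only the matching rows.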

filtered_df = df.where(df['Salary'] > 55000)
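The difference from boolean indexing is easiest to see by checking the shape of the result. The sketch below assumes the original DataFrame has no missing values of its own, since dropna() would remove those rows too; the names masked and rows_only are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 55000, 60000, 65000],
})

# where() keeps all four rows; non-matching rows are filled with NaN.
masked = df.where(df['Salary'] > 55000)
print(masked.shape)                    # (4, 3)
print(masked['Name'].isna().tolist())  # [True, True, False, False]

# Dropping the NaN rows leaves only the matching ones.
# The integer columns were upcast to float to hold NaN, and stay float here.
rows_only = masked.dropna()
print(rows_only['Name'].tolist())      # ['Charlie', 'David']
```

If you only need the matching rows, plain boolean indexing is simpler and preserves the original column types.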

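If the condition needs a custom function, you can build the mask with the .apply() method and axis=1, which calls the function once per row. This is much slower than vectorized comparisons on large DataFrames, so it is best reserved for logic that cannot be expressed with column operations.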

filtered_df = df[df.apply(lambda row: row['Salary'] > 55000 and row['Age'] > 30, axis=1)]
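For this particular condition, the row-wise function and the vectorized mask select the same rows. The sketch below compares the two; the function name is_senior_high_earner is hypothetical and only used for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 55000, 60000, 65000],
})

def is_senior_high_earner(row):
    # Called once per row; 'row' is a Series indexed by column name.
    return row['Salary'] > 55000 and row['Age'] > 30

# Row-wise: flexible, but runs a Python function for every row.
via_apply = df[df.apply(is_senior_high_earner, axis=1)]

# Vectorized: the same condition expressed with column operations.
via_mask = df[(df['Salary'] > 55000) & (df['Age'] > 30)]

print(via_apply.equals(via_mask))  # True
```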

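These examples do not all apply the same condition. The first boolean-indexing example and the .loc[] example keep the rows where the salary is greater than 55000. The combined-condition, query(), and .apply() examples keep only the rows where the salary is greater than 55000 and Age is greater than 30. With this sample data, both conditions happen to select the same two rows (Charlie and David). The .where() example is the exception: its result still has all four rows, with NaN in the rows that do not match.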