How to apply a filter to a Pandas DataFrame
You can use boolean indexing to filter a Pandas DataFrame based on certain conditions. Here's an example:
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40], 'Salary': [50000, 55000, 60000, 65000]}
df = pd.DataFrame(data)
# Filter the DataFrame to only include rows where the salary is greater than 55000
filtered_df = df[df['Salary'] > 55000]
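To make the mechanism visible, here is a small sketch that rebuilds the same sample DataFrame and prints the intermediate boolean mask that the comparison produces. The names mask and rest_df are illustrative only.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [50000, 55000, 60000, 65000]}
df = pd.DataFrame(data)

# The comparison yields a boolean Series aligned with df's index
mask = df['Salary'] > 55000
print(mask.tolist())  # [False, False, True, True]

# Indexing with the mask keeps only the rows where the mask is True
filtered_df = df[mask]
print(filtered_df['Name'].tolist())  # ['Charlie', 'David']

# ~ inverts the mask, selecting the complementary rows
rest_df = df[~mask]
print(rest_df['Name'].tolist())  # ['Alice', 'Bob']
```

The filtered rows keep their original index labels (2 and 3 here); call .reset_index(drop=True) on the result if you want a fresh 0-based index.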
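You can also combine multiple conditions using the & (and) and | (or) operators. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >. Here's an example: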
filtered_df = df[(df['Salary'] > 55000) & (df['Age'] > 30)]
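A sketch of the remaining operators on the same sample data: | for "or", ~ for "not", and what happens if Python's plain and is used instead. The thresholds and the variable names (either_df, not_older_df) are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [50000, 55000, 60000, 65000]})

# | keeps rows where either condition holds
either_df = df[(df['Age'] < 30) | (df['Salary'] > 60000)]
print(either_df['Name'].tolist())  # ['Alice', 'David']

# ~ negates a condition: rows where Age is NOT greater than 30
not_older_df = df[~(df['Age'] > 30)]
print(not_older_df['Name'].tolist())  # ['Alice', 'Bob']

# Python's `and` asks for a single truth value of a whole Series,
# which pandas refuses with a ValueError
raised = False
try:
    df[(df['Salary'] > 55000) and (df['Age'] > 30)]
except ValueError:
    raised = True
print('and raised ValueError:', raised)  # True
```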
You can also use the query() method, which filters the DataFrame with a string expression that refers to columns by name:
filtered_df = df.query('Salary > 55000 and Age > 30')
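query() can also reference local Python variables with the @ prefix, and column names containing spaces can be quoted with backticks. A minimal sketch on the same sample data; min_salary, min_age and the renamed 'Annual Salary' column are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [50000, 55000, 60000, 65000]})

min_salary = 55000
min_age = 30

# @name refers to a Python variable in the calling scope
filtered_df = df.query('Salary > @min_salary and Age > @min_age')
print(filtered_df['Name'].tolist())  # ['Charlie', 'David']

# Backticks quote a column name that is not a valid Python identifier
df2 = df.rename(columns={'Salary': 'Annual Salary'})
high_earners = df2.query('`Annual Salary` > 55000')
print(high_earners['Name'].tolist())  # ['Charlie', 'David']
```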
You can also use the .loc[] accessor for filtering, passing the boolean condition (not a string) as the row selector:
filtered_df = df.loc[df['Salary'] > 55000]
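Because .loc takes a row selector and a column selector, it can filter rows and pick columns in one step. It is also the usual way to modify only the filtered rows, since chained indexing such as df[mask]['Salary'] = ... may write to a temporary copy instead of df. A sketch on the same sample data; the 1000 raise is an arbitrary value.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [50000, 55000, 60000, 65000]})

mask = df['Salary'] > 55000

# Rows chosen by the mask, columns chosen by label
names_and_salaries = df.loc[mask, ['Name', 'Salary']]
print(names_and_salaries)

# Assign to the filtered rows of one column directly on df
df.loc[mask, 'Salary'] += 1000
print(df['Salary'].tolist())  # [50000, 55000, 61000, 66000]
```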
It's also possible to use the .where() method, but note that it does not drop any rows: it returns a DataFrame with the same shape as the original, keeping values where the condition is True and replacing the rest with NaN.
filtered_df = df.where(df['Salary'] > 55000)
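The sketch below shows what .where() returns for the sample data, how to drop the masked rows afterwards, and how the other argument substitutes a value instead of NaN. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [50000, 55000, 60000, 65000]})

# Same shape as df; Alice's and Bob's rows become all NaN
masked = df.where(df['Salary'] > 55000)
print(masked)

# Introducing NaN upcasts the integer columns to float
print(masked['Age'].dtype)  # float64

# Drop rows that are entirely NaN; a row that was already all-missing
# in the original would also be dropped
recovered = masked.dropna(how='all')
print(recovered['Name'].tolist())  # ['Charlie', 'David']

# `other` fills non-matching positions with a value instead of NaN
zeroed = df[['Age', 'Salary']].where(df['Salary'] > 55000, 0)
print(zeroed)
```

DataFrame.mask() is the inverse: it replaces values where the condition is True.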
You can also use the .apply() method with axis=1 when the filter needs a custom function. The function is called once per row, so on large DataFrames this is usually much slower than a vectorized condition.
filtered_df = df[df.apply(lambda x: x['Salary'] > 55000 and x['Age'] > 30, axis=1)]
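A sketch of the custom-function approach with the row logic moved into a named function, then compared against the equivalent vectorized mask. is_senior_high_earner is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [50000, 55000, 60000, 65000]})

def is_senior_high_earner(row):
    # Hypothetical rule for illustration; row is one DataFrame row as a Series
    return row['Salary'] > 55000 and row['Age'] > 30

# apply(axis=1) calls the function once per row and returns a boolean Series
row_mask = df.apply(is_senior_high_earner, axis=1)
via_apply = df[row_mask]

# Equivalent vectorized mask, evaluated column-wise
via_vectorized = df[(df['Salary'] > 55000) & (df['Age'] > 30)]

print(via_apply.equals(via_vectorized))  # True
```

Reaching for apply() makes most sense when the rule cannot be expressed with column operations; otherwise the vectorized form is simpler and faster.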
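With this sample data, every example except .where() leaves filtered_df containing only the rows for Charlie and David. The Salary > 55000 filters and the combined Salary > 55000 and Age > 30 filters happen to select the same rows here; on other data they can differ. The .where() example instead keeps all four rows, with NaN in place of Alice's and Bob's values.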
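One way to check this is to run every variant on the same sample data and compare the results with DataFrame.equals(), which also compares the index and dtypes. This is a sketch; the labels in the results dictionary are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [50000, 55000, 60000, 65000]})

# Each filter exactly as written in the examples above
results = {
    'salary mask': df[df['Salary'] > 55000],
    'combined mask': df[(df['Salary'] > 55000) & (df['Age'] > 30)],
    'query': df.query('Salary > 55000 and Age > 30'),
    'loc': df.loc[df['Salary'] > 55000],
    'apply': df[df.apply(lambda x: x['Salary'] > 55000 and x['Age'] > 30, axis=1)],
}

reference = results['combined mask']
for label, result in results.items():
    print(f"{label:14} {result['Name'].tolist()} same={result.equals(reference)}")

# .where() keeps every row, so its shape differs from the others
print(df.where(df['Salary'] > 55000).shape)  # (4, 3)
```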