How to group a DataFrame by a specific column with Pandas using Python
The Pandas .groupby() method is a powerful tool: it splits a DataFrame into groups of rows that share the same value in a column, so that each group can then be aggregated.
So, why aggregate?
Here is a simple example.
Imagine you run an e-commerce store and want to know which of your products sold the most overall.
To find out, we first group the rows by product, then call .sum() on the grouped sales column to obtain the total sales amount per product.
Here is the code:
# Import the Pandas library
import pandas as pd
# We create our example DataFrame
df = pd.DataFrame({
    "product": ["Stickers", "T-shirts", "Mug", "Stickers", "Jeans", "Mug"],
    "sales_in_usd": [10000, 2142, 3321, 11141, 12133, 3321],
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
})
# We print the total sales amount per product (all years combined)
print(df.groupby("product")["sales_in_usd"].sum())
# We can also plot it as a bar chart (this requires matplotlib to be installed)
df.groupby("product")["sales_in_usd"].sum().plot(kind="bar", title="Total sales per product")
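The code above prints the totals, but it does not yet point to the best-selling product. One possible way to answer the original question is sketched below, using the same example data. The variable names totals, ranked and best_product are only illustrative and are not part of the original code.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "product": ["Stickers", "T-shirts", "Mug", "Stickers", "Jeans", "Mug"],
    "sales_in_usd": [10000, 2142, 3321, 11141, 12133, 3321],
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
})

# Total sales per product (all years combined), as a Series indexed by product
totals = df.groupby("product")["sales_in_usd"].sum()

# Rank the products from highest to lowest total sales
ranked = totals.sort_values(ascending=False)
print(ranked)

# idxmax() returns the index label (here, the product name) of the largest total
best_product = totals.idxmax()
print(best_product, totals[best_product])
```

With this data, Stickers come out on top with 21141 USD (10000 in 2020 plus 11141 in 2021).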
Here you are! You now know how to group a DataFrame by a specific column with Pandas using Python.
More on DataFrames
If you want to know more about DataFrames and Pandas, check out the other articles I wrote on the topic: