How to group a DataFrame by a specific column with Pandas using Python
Land Your First Data Science Job
A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.
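The Pandas .groupby() method is a powerful tool for aggregation: it splits a DataFrame into groups of rows that share the same value in a column, so you can compute a summary such as a sum or a mean for each group.
So, why aggregate?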
Here is a simple example.
Imagine you run an e-commerce store and want to know which of your products sold the most overall.
To find out, we first group the rows by product, then call .sum() on the resulting groups to obtain the total sales amount per product.
Here is the code:
# Import the Pandas library
import pandas as pd

# Create the example DataFrame
df = pd.DataFrame({
    "product": ["Stickers", "T-shirts", "Mug", "Stickers", "Jeans", "Mug"],
    "sales_in_usd": [10000, 2142, 3321, 11141, 12133, 3321],
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
})

# Print the total sales amount per product (all years combined)
print(df.groupby("product")["sales_in_usd"].sum())

# Plot the totals as a bar chart (plotting requires matplotlib to be installed)
df.groupby("product")["sales_in_usd"].sum().plot(kind='bar', title="total sales per product")
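The code above gives the total per product, but the original question was which product sold the most. Here is a minimal sketch of one way to answer it, reusing the same example data. The variable names (totals, ranked, best_product, best_total) are only illustrative.

```python
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({
    "product": ["Stickers", "T-shirts", "Mug", "Stickers", "Jeans", "Mug"],
    "sales_in_usd": [10000, 2142, 3321, 11141, 12133, 3321],
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
})

# Total sales per product: a Series indexed by product name
totals = df.groupby("product")["sales_in_usd"].sum()

# Rank the products from highest to lowest total sales
ranked = totals.sort_values(ascending=False)
print(ranked)

# idxmax() returns the index label (here, the product) with the largest total
best_product = totals.idxmax()
best_total = totals.max()
print(f"Best seller: {best_product} ({best_total} USD)")
```

With this data, Stickers comes out on top because its 2020 and 2021 sales are added together.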
Here you are! You now know how to group a DataFrame by a specific column with Pandas using Python.
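The same pattern should work for any other column. As a small sketch, here is the example data grouped by year instead of product. The name sales_per_year is just an illustrative choice.

```python
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({
    "product": ["Stickers", "T-shirts", "Mug", "Stickers", "Jeans", "Mug"],
    "sales_in_usd": [10000, 2142, 3321, 11141, 12133, 3321],
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
})

# Group by the "year" column this time and sum the sales within each year
sales_per_year = df.groupby("year")["sales_in_usd"].sum()
print(sales_per_year)
```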
More on DataFrames
If you want to know more about DataFrames and Pandas, check out the other articles I wrote on the topic.