How to compute the standard deviation of a DataFrame column

You will often need to compute the standard deviation of a DataFrame column.

In statistics, the standard deviation is commonly denoted by the Greek letter sigma (σ).

A well-known analysis is to approximate the range a value is likely to fall in using its two-sigma interval.

Why?

The two sigmas search

For data that follows a normal distribution, about 95% of the values lie between mu - 2 sigma and mu + 2 sigma, where mu is the mean.

So a new value of the variable is very likely to end up in this range.
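
The 95% figure can be checked empirically. The sketch below draws simulated values from a normal distribution and measures the share that falls within two sigmas of the mean. The values of mu and sigma and the sample size are arbitrary assumptions chosen for illustration; they do not come from the iris data.

```python
import numpy as np

# Seeded generator so the simulation is reproducible.
rng = np.random.default_rng(seed=0)

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 5.0, 0.5
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# Boolean mask: True where a sample lies inside [mu - 2 sigma, mu + 2 sigma].
inside = (samples >= mu - 2 * sigma) & (samples <= mu + 2 * sigma)

# The mean of a boolean array is the share of True values.
coverage = inside.mean()
print(f"Share of samples within two sigmas: {coverage:.3f}")
```

The printed share should be close to 0.95. Keep in mind that the rule only holds this well when the data is approximately normal.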

Let's have a look at some real-world example data.

Reading an example dataframe

import pandas as pd

# We read a sample dataset from the web.
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

Here we have an example dataset about iris flowers.

Let's look at the sepal_length of versicolor irises and compute its mean and standard deviation.

This is how we compute the standard deviation using the std() method of the selected column (a pandas Series).

Computing the standard deviation

sigma = df[df["species"] == "versicolor"]["sepal_length"].std()
Using the std() method, we can compute the standard deviation, also known as sigma.
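
One detail worth knowing: by default, the pandas std() method computes the sample standard deviation (ddof=1), while numpy.std computes the population standard deviation (ddof=0). The sketch below compares the two on a small made-up Series; the values are hypothetical and are not taken from the iris dataset.

```python
import numpy as np
import pandas as pd

# Made-up values, used only to illustrate the ddof parameter.
values = pd.Series([5.0, 5.5, 6.0, 6.5, 7.0])

# pandas default: sample standard deviation (divides by n - 1).
sample_std = values.std()

# Population standard deviation (divides by n).
population_std = values.std(ddof=0)

# numpy default is ddof=0, so it matches the population version.
numpy_std = np.std(values.to_numpy())

print(f"pandas std (ddof=1): {sample_std:.4f}")
print(f"pandas std (ddof=0): {population_std:.4f}")
print(f"numpy std (default): {numpy_std:.4f}")
```

On large samples the difference is negligible, but on a handful of rows it can visibly shift the two-sigma bounds.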

Computing the two sigma range

# Mean sepal length of the versicolor irises.
mu = df[df["species"] == "versicolor"]["sepal_length"].mean()
# Two-sigma bounds around the mean (sigma was computed in the previous step).
upper_bound = mu + 2 * sigma
lower_bound = mu - 2 * sigma

print(f"The sepal length of a versicolor iris is very likely to fall between {lower_bound:.2f} and {upper_bound:.2f}")
Using the mean() method we can compute the mean, then derive the two sigma range from mu and sigma.
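
The same computation can be wrapped in a small helper and checked against the data itself. In the sketch below, two_sigma_range is a hypothetical helper name, not a pandas function, and the sample values are made up for illustration. It also measures which share of the observations actually falls inside the range.

```python
import pandas as pd


def two_sigma_range(values: pd.Series) -> tuple[float, float]:
    """Return (mu - 2 sigma, mu + 2 sigma) for a numeric Series."""
    mu = values.mean()
    sigma = values.std()  # sample standard deviation (pandas default, ddof=1)
    return mu - 2 * sigma, mu + 2 * sigma


# Made-up lengths with one unusually large value.
lengths = pd.Series([5.5, 5.7, 5.9, 6.0, 6.1, 6.3, 7.9])

lower, upper = two_sigma_range(lengths)

# between() is inclusive on both ends by default.
share_inside = lengths.between(lower, upper).mean()

print(f"Two sigma range: {lower:.2f} to {upper:.2f}")
print(f"Share of observations inside the range: {share_inside:.2%}")
```

On a small or skewed sample like this one, the share inside the range can differ noticeably from 95%, so it is worth checking before relying on the rule.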

There you go: you now know how to compute the two-sigma range of a DataFrame column and can use it as a quick check on where its values are likely to fall.
