How to compute the standard deviation of a DataFrame column
Land Your First Data Science Job
A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.
You will often need to compute the standard deviation of a DataFrame column.
In statistics, the standard deviation is usually referred to as sigma.
A well-known rule of thumb is to approximate the likely range of a value from its two-sigma interval.
Why?

For data that is roughly normally distributed, about 95% of the values lie between mu - 2 sigma and mu + 2 sigma, where mu is the mean.
So a value of our variable is very likely to fall within this range.
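To make the 95% figure concrete, here is a minimal sketch that checks it on simulated data. The normal distribution parameters, the sample size and the seed are arbitrary assumptions chosen for illustration; they do not come from the dataset used later in this article.

```python
import numpy as np

# Hypothetical simulated data: 100,000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=10.0, scale=3.0, size=100_000)

mu = samples.mean()
sigma = samples.std(ddof=1)  # sample standard deviation

# Boolean mask of the samples inside [mu - 2 sigma, mu + 2 sigma].
inside = (samples >= mu - 2 * sigma) & (samples <= mu + 2 * sigma)
coverage = inside.mean()  # share of True values

print(f"Share of samples within two sigma: {coverage:.3f}")
```

The printed share should land close to 0.95. For data that is far from normal (strongly skewed, for instance) the share can differ noticeably.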
Let's have a look at a real-world example dataset.
Reading an example dataframe
import pandas as pd
# We read a sample dataset from the web.
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
Here we have an example dataset about iris flowers.
Let's look at the sepal_length of versicolor irises and compute its mean and standard deviation.
Selecting a single column returns a Series, so the standard deviation is computed with the Series.std() method (DataFrame.std() does the same for every numeric column at once).
Computing the standard deviation
sigma = df[df["species"] == "versicolor"]["sepal_length"].std()
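One detail worth knowing: by default, pandas computes the sample standard deviation (ddof=1, dividing by n - 1), while NumPy's np.std defaults to the population version (ddof=0). The sketch below illustrates the difference on a small set of made-up values, which are an assumption for the example and not part of the iris data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, not taken from the iris dataset.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = values.std()              # pandas default: ddof=1, divides by n - 1
population_std = values.std(ddof=0)    # divides by n
numpy_std = np.std(values.to_numpy())  # NumPy default: ddof=0

print(f"pandas default (ddof=1): {sample_std:.4f}")
print(f"pandas with ddof=0:      {population_std:.4f}")
print(f"numpy default (ddof=0):  {numpy_std:.4f}")
```

With only a handful of rows the two versions differ visibly; with a larger group the gap shrinks.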
Computing the two sigma range
mu = df[df["species"] == "versicolor"]["sepal_length"].mean()
upper_bound = mu + 2 * sigma
lower_bound = mu - 2 * sigma
print(f"The sepal length of a versicolor is very likely to fall between {lower_bound:.2f} and {upper_bound:.2f}")
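If you need this range for several columns or groups, the steps above could be wrapped in a small helper. This is only a sketch: the function name two_sigma_range and the toy measurements are hypothetical, introduced here for illustration. The helper also reports the share of observations that actually fall inside the range, which is a quick way to see how well the 95% rule holds for your data.

```python
import pandas as pd


def two_sigma_range(values: pd.Series) -> tuple[float, float, float]:
    """Return (lower_bound, upper_bound, coverage) for a numeric Series."""
    mu = values.mean()
    sigma = values.std()  # sample standard deviation (ddof=1)
    lower, upper = mu - 2 * sigma, mu + 2 * sigma
    # Share of observations inside the range (bounds included).
    coverage = values.between(lower, upper).mean()
    return float(lower), float(upper), float(coverage)


# Hypothetical measurements used only to illustrate the call.
toy = pd.Series([5.1, 5.5, 5.7, 5.9, 6.0, 6.1, 6.3, 6.6, 7.0, 9.5])
lower, upper, coverage = two_sigma_range(toy)
print(f"range: {lower:.2f} to {upper:.2f}, coverage: {coverage:.0%}")
```

On the iris data, the same helper could be called with df.loc[df["species"] == "versicolor", "sepal_length"].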
There you go: you now know how to compute the two-sigma range and can use it in your own statistical analyses.