It is often useful to look at the summary statistics of each column to get a quick first analysis of our data.

To do so, we can use the Pandas .describe() method.

import pandas as pd

# We read a sample dataset from the web.
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

We use a sample DataFrame

print(df.describe())

We use the describe method to print each column's statistics

For each numeric column, you end up with the following indicators, in this order:

The count
The mean
The standard deviation
The minimum
The 25th percentile
The 50th percentile (the median)
The 75th percentile
The maximum

This gives you a good overview of what your data looks like.
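Since .describe() returns a regular DataFrame, individual statistics can be read back by label. The sketch below uses a small made-up DataFrame (the column names and values are assumptions for illustration, not the iris data) so that the numbers are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data, standing in for the iris dataset loaded above.
df = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [10.0, 20.0, 30.0, 40.0],
})

stats = df.describe()

# The index holds the statistic names, in the order pandas reports them.
print(list(stats.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Individual values can be looked up by statistic name and column name.
print(stats.loc["mean", "length"])  # 2.5
print(stats.loc["50%", "width"])    # 25.0
```

Looking values up with .loc is handy when only one or two figures are needed, for example to compare the mean and the median of a column.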

Categorical Variables

By default, .describe() only computes statistics for numeric columns. If you also have categorical variables, you can pass include="all", which includes every column type.

print(df.describe(include='all'))

Describe all variable types
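To see what include="all" adds, here is a minimal sketch on a made-up DataFrame with one text column and one numeric column (the names and values are assumptions for illustration). Text columns get the rows count, unique, top and freq, while the numeric rows are left as NaN for them, and the other way around.

```python
import pandas as pd

# Hypothetical toy data: one categorical (text) column, one numeric column.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica"],
    "petal_length": [1.4, 1.3, 5.5],
})

summary = df.describe(include="all")
print(summary)

# Statistics that only apply to the text column.
print(summary.loc["unique", "species"])  # 2 distinct values
print(summary.loc["top", "species"])     # setosa (the most frequent value)
print(summary.loc["freq", "species"])    # 2 (how often the top value occurs)

# Numeric statistics are not defined for the text column.
print(pd.isna(summary.loc["mean", "species"]))  # True
```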

Timestamps

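.describe() can also summarize timestamps. On pandas 1.1 to 1.5 you have to ask for it with the datetime_is_numeric=True parameter. pandas 2.0 removed this parameter and always treats datetime columns as numeric, so a plain .describe() is enough there.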

print(df_containing_timestamp.describe(datetime_is_numeric=True))

Describe timestamp data
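The following is a rough sketch of a call that should work whether or not the installed pandas version still accepts datetime_is_numeric. The DataFrame contents are made up for illustration, and the try/except fallback is an assumption about how you might want to handle both cases, not an official pandas recipe.

```python
import pandas as pd

# Hypothetical data with one timestamp column and one numeric column.
df_containing_timestamp = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    "value": [1.0, 2.0, 3.0],
})

try:
    # Accepted by pandas 1.1 to 1.5.
    summary = df_containing_timestamp.describe(datetime_is_numeric=True)
except TypeError:
    # pandas 2.0+ no longer has the parameter and summarizes
    # datetime columns as numeric by default.
    summary = df_containing_timestamp.describe()

# The timestamp column gets a mean, a min, a max and percentiles.
print(summary.loc["min", "timestamp"])   # earliest timestamp
print(summary.loc["mean", "timestamp"])  # average timestamp
print(summary.loc["max", "timestamp"])   # latest timestamp
```

Note that no standard deviation is computed for timestamp columns, so the std row is empty for them.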
