How to check if your data is correlated in Python with a scatter matrix

A good way to start a statistical analysis is to get a broad picture of your data.

What is the distribution of each variable? Are some variables correlated? Answering questions like these will help you take your analysis further.

A quick way to check visually for correlation between variables is the pandas.plotting.scatter_matrix() function.

It plots every numeric variable against every other one in a grid of scatter plots, which makes it ideal for spotting correlations at a glance.

On the diagonal, where a variable would be plotted against itself, it shows a histogram of that variable's distribution instead.

Here is an example using a dataset of penguin measurements (fetched from the seaborn library):

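# Import pandas and its scatter_matrix function
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

# We use seaborn only for the example dataset
import seaborn as sns

# Load an example dataset from the seaborn library
# (downloaded from the internet the first time it is used)
df = sns.load_dataset("penguins")

# Plot the scatter matrix; non-numeric columns such as species are skipped
scatter_matrix(df, figsize=(10, 10))

# Display the figure (not needed in a Jupyter notebook)
plt.show()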
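scatter_matrix() also accepts a few options worth knowing. The sketch below runs on synthetic data (an assumption, so it works offline without seaborn) with hypothetical column names. alpha, diagonal and hist_kwds are scatter_matrix() parameters, while extra keyword arguments such as c are passed on to Matplotlib's scatter() for the off-diagonal plots. The function returns a 2D NumPy array of Axes, one per pair of numeric columns.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs headless; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
n = 200

# Hypothetical synthetic columns loosely imitating penguin measurements
flipper = rng.normal(200, 14, n)
mass = 50 * flipper - 6000 + rng.normal(0, 400, n)
bill = rng.normal(44, 5, n)
group = rng.choice(["A", "B"], n)

df = pd.DataFrame({"flipper": flipper, "mass": mass, "bill": bill, "group": group})

# The non-numeric "group" column is ignored by scatter_matrix,
# but we can still use it to colour the points.
colors = df["group"].map({"A": "tab:blue", "B": "tab:orange"}).tolist()

axes = scatter_matrix(
    df,
    figsize=(8, 8),
    alpha=0.6,               # semi-transparent points reveal overlapping data
    diagonal="hist",         # histogram of each variable on the diagonal
    hist_kwds={"bins": 20},  # forwarded to the diagonal histograms
    c=colors,                # forwarded to each off-diagonal scatter plot
)

plt.savefig("scatter_matrix.png")
print(axes.shape)
```

Because only the three numeric columns are plotted, axes is a 3 by 3 grid. With real data that contains missing values, drop them first (for example with df.dropna()) before passing a per-row colour list, so the colours stay aligned with the plotted points.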
An example of a scatter_matrix() plot

A crude analysis

Visually, we can imagine drawing a line through each scatter plot to get a rough sense of the correlation before going deeper into the analysis.
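The "draw a line" idea can also be done numerically with a least-squares fit. This minimal sketch uses NumPy's polyfit with degree 1; fit_line is a hypothetical helper and the data are made up so the answer is known in advance (y = 3x + 2). The sign of the slope gives the direction of the relationship, but its size depends on the units of x and y, so it does not measure how strong the correlation is.

```python
import numpy as np

def fit_line(x, y):
    """Hypothetical helper: least-squares fit of y = slope * x + intercept."""
    # polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

# Made-up, perfectly linear data: y = 3x + 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x + 2

slope, intercept = fit_line(x, y)
print(round(slope, 6), round(intercept, 6))
```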

Here is a reminder of what the different types of correlation look like:

The different correlation types (source: Byjus)
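To connect these pictures to a number, here is a small sketch that generates synthetic data (an assumption, not penguin data) for a positive, a negative and no correlation, and summarizes each with the Pearson correlation coefficient r from np.corrcoef. r ranges from -1 (perfect negative linear relationship) to +1 (perfect positive), and values near 0 mean there is no linear relationship.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
x = rng.normal(size=n)
noise = rng.normal(size=n)

# Synthetic examples of each correlation type
examples = {
    "positive": x + 0.3 * noise,   # y rises as x rises
    "negative": -x + 0.3 * noise,  # y falls as x rises
    "none": noise,                 # y is unrelated to x
}

for name, y in examples.items():
    r = np.corrcoef(x, y)[0, 1]    # off-diagonal entry = Pearson r between x and y
    print(f"{name:>8}: r = {r:+.2f}")
```

Keep in mind that r only captures linear relationships: a strong curved pattern can still give an r close to 0.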

Here we could deduce that there is a fairly clear positive correlation between flipper length and body mass, and a weaker positive correlation between bill length and body mass.

This would make sense physically, since bigger penguins tend to be heavier and to have longer flippers and bills.
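To double-check a visual impression like this with numbers, the Pearson correlation coefficients can be computed directly with pandas. The sketch below wraps DataFrame.corr() in a hypothetical helper, correlation_table, and tries it on a made-up toy DataFrame whose correlations are known in advance (+1 and -1).

```python
import pandas as pd

def correlation_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pairwise Pearson correlation of the numeric columns."""
    numeric = df.select_dtypes(include="number")  # drop text columns such as species
    return numeric.corr()  # Pearson by default; missing values are excluded pair by pair

# Made-up toy data with known answers
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],     # b = 2a, so r(a, b) = +1
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],      # c = 6 - a, so r(a, c) = -1
    "label": ["x", "y", "x", "y", "x"],  # non-numeric, dropped
})

corr = correlation_table(toy)
print(corr.round(2))
```

With the penguins DataFrame loaded earlier, correlation_table(df).loc["flipper_length_mm", "body_mass_g"] would give the coefficient for that pair. Calling numeric.corr(method="spearman") instead would measure monotonic rather than strictly linear relationships.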

But don't forget: never take anything for granted. You are only looking at a sample from a bigger population, so what you see here is not necessarily representative of all penguins.
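One way to get a feel for how much a sample correlation could vary is a percentile bootstrap: resample the observed pairs with replacement many times and look at the spread of the resulting r values. This technique is swapped in here as an illustration, not something the article itself uses; bootstrap_corr_ci is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=2_000, level=0.95, seed=0):
    """Hypothetical helper: percentile bootstrap interval for Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    rs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (x, y) pairs with replacement
        rs[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    tail = (1 - level) / 2
    low, high = np.quantile(rs, [tail, 1 - tail])
    return low, high

# Synthetic sample: 100 observations with a moderate positive relationship
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
low, high = bootstrap_corr_ci(x, y)
print(f"r = {r:.2f}, 95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```

A wide interval is a reminder that another sample of the same population could show a noticeably different correlation.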

That's it! I hope you liked it. If you have any questions, you can reach me on my Telegram.

More on fundamentals

If you want to learn more about Python fundamentals without the headaches, check out my other articles in Fundamentals - The Python You Need, where we gathered the only Python essentials that you will probably ever need.