How to check if your data is correlated in Python with a scatter matrix

A good way to start a statistical analysis is to get a broad picture of your data.

What is the distribution of each variable? Are any of the variables correlated? Answering these questions early will help you go further in your analysis.

A good way to look for correlation between variables is the pandas.plotting.scatter_matrix() function.

It plots every numeric variable against every other one, which is ideal if you want to check for correlation.

Furthermore, it also plots the distribution of each variable on the diagonal.

Here is an example with a dataset of penguin observations (fetched from the seaborn library):

# We import the pandas library and its scatter_matrix function
import pandas as pd
from pandas.plotting import scatter_matrix

# We use seaborn only for the example dataset
import seaborn as sns

# We get an example dataset from the seaborn library
df = sns.load_dataset("penguins")

# We plot the scatter matrix
scatter_matrix(df, figsize=(10,10))

Figure: An example of scatter_matrix() plot
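
The call above relies on the defaults. Here is a minimal sketch of a few options that scatter_matrix() accepts, run on a small made-up DataFrame. The column names and values are invented for illustration and are not penguin data; selecting the numeric columns by hand is optional, since scatter_matrix() ignores non-numeric columns anyway.

```python
import matplotlib
matplotlib.use("Agg")  # optional: non-interactive backend, so this also runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # built to follow x
    "noise": rng.normal(size=200),                  # unrelated to x
    "label": ["a", "b"] * 100,                      # non-numeric column
})

# Keep the numeric columns explicitly and drop rows with missing values
numeric = demo.select_dtypes(include="number").dropna()

# alpha makes overlapping points easier to read,
# diagonal="hist" draws a histogram of each variable on the diagonal,
# hist_kwds is passed on to the histogram call
axes = scatter_matrix(
    numeric,
    figsize=(8, 8),
    alpha=0.5,
    diagonal="hist",
    hist_kwds={"bins": 20},
)

# One subplot per pair of numeric columns
print(axes.shape)
```

The function returns a grid of matplotlib axes, so you can still adjust individual subplots afterwards.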

A crude analysis

From a visual standpoint, we can mentally draw a line through each cloud of points to check for correlation before going deeper into the analysis.
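
If you would rather let Python draw that line, here is a minimal sketch using numpy.polyfit() to fit a straight line through two variables. The data is made up for illustration (x and y are hypothetical names, not columns of the penguins dataset), and a least-squares line is only one possible way to summarize the trend.

```python
import numpy as np

# Made-up data, for illustration only: y is built to increase with x
rng = np.random.default_rng(1)
x = rng.uniform(170, 230, size=100)
y = 50 * x - 5800 + rng.normal(scale=300, size=100)

# Fit a straight line y = slope * x + intercept (least squares)
slope, intercept = np.polyfit(x, y, deg=1)

# A positive slope suggests a positive relationship,
# a negative slope a negative one, a slope near zero none at all
if slope > 0:
    print("y tends to increase with x")
elif slope < 0:
    print("y tends to decrease with x")
else:
    print("no linear trend")
```

The slope only gives the direction of the trend. It says nothing about how tightly the points follow the line.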

Remember what the different correlation types look like:

Figure: The different correlation types (source: Byjus)

Here we could deduce that there is some positive correlation between flipper length and body mass, as well as between bill length and body mass.

This would make sense from a physical standpoint.
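
To put a number on what the eye sees, you can compute the Pearson correlation coefficients with DataFrame.corr(). Below is a minimal sketch on made-up data (the columns x, y and z are invented for illustration). With the penguins DataFrame loaded earlier, the same call would be df.select_dtypes(include="number").corr().

```python
import numpy as np
import pandas as pd

# Made-up data, for illustration only
rng = np.random.default_rng(2)
x = rng.normal(size=300)
demo = pd.DataFrame({
    "x": x,
    "y": x + rng.normal(scale=0.5, size=300),  # built to follow x
    "z": rng.normal(size=300),                  # unrelated to x
})

# Pearson correlation between every pair of numeric columns.
# Values close to 1 or -1 mean a strong linear relationship,
# values close to 0 mean little or no linear relationship.
corr = demo.select_dtypes(include="number").corr(method="pearson")
print(corr.round(2))
```

The result is a square table with one row and one column per variable, laid out like the scatter matrix itself.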

But don't forget: never take anything for granted. You are only looking at a sample of a bigger population. What you see in this sample is not necessarily a correct representation of all penguins.
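
One way to get a feel for this uncertainty is the bootstrap: resample your observations with replacement many times and look at how much the correlation moves. This is a technique I am bringing in for illustration, not something scatter_matrix() does. The sketch below uses made-up data, and bootstrap_corr is a hypothetical helper name.

```python
import numpy as np

# Made-up sample, for illustration only
rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)


def bootstrap_corr(x, y, n_resamples=2000, seed=0):
    """Return the Pearson correlation of n_resamples bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(x)
    out = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n row indices with replacement, keeping x and y paired
        idx = rng.integers(0, n, size=n)
        out[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    return out


observed = np.corrcoef(x, y)[0, 1]
samples = bootstrap_corr(x, y)

# Range covering the middle 95% of the resampled correlations
low, high = np.percentile(samples, [2.5, 97.5])
print(f"observed r = {observed:.2f}, 95% bootstrap interval = [{low:.2f}, {high:.2f}]")
```

A wide interval is a hint that the sample is too small to say much about the whole population.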

That's it! I hope you liked it. If you have any questions, you can reach me on Telegram.

More on fundamentals

If you want to know more about Python fundamentals without headaches... check out the other articles I wrote by clicking just here:

Fundamentals - The Python You Need
We gathered the only Python essentials that you will probably ever need.