How to check if your data is correlated in Python with a scatter matrix
Land Your First Data Science Job
A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.
A good way to start a statistical analysis is to get a broad picture of your data.
How are your variables distributed? Are any of them correlated? Answering these questions first will help you go further in your analysis.
A good way to visually check for correlation between variables is to use the pandas.plotting.scatter_matrix() function.
It plots every numeric variable against every other one, which is ideal if you want to spot correlations at a glance.
On the diagonal, it also plots the distribution of each variable (a histogram by default).
Here is an example with a dataset containing observations from penguins (fetched from the seaborn library)
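The plot can be reproduced with something like the sketch below. The helper name plot_scatter_matrix and the styling values (alpha, figsize) are my own choices for illustration, not part of pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix


def plot_scatter_matrix(df: pd.DataFrame):
    """Draw a scatter matrix of the numeric columns of df (hypothetical helper)."""
    # Keep only the numeric columns and drop rows with missing values
    numeric = df.select_dtypes(include="number").dropna()
    # Off-diagonal cells: pairwise scatter plots; diagonal cells: histograms
    axes = scatter_matrix(numeric, alpha=0.5, figsize=(10, 10), diagonal="hist")
    return axes
```

With seaborn installed, calling `plot_scatter_matrix(sns.load_dataset("penguins"))` followed by `plt.show()` should display one row and one column per numeric measurement (bill length, bill depth, flipper length, body mass). Note that load_dataset downloads the data the first time it is called.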
A crude analysis
From a visual standpoint, we can imagine drawing a straight line through each cloud of points to check for correlation before going deeper into the analysis.
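Remember what the main types of correlation look like: positive (the points rise from left to right), negative (the points fall from left to right), and none (no visible trend).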
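If you would rather compute that line than eyeball it, a least-squares fit is one option. This is a minimal sketch using numpy.polyfit; the helper name fit_line and the sample points are made up for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ignore pairs where either value is missing
    mask = ~(np.isnan(x) | np.isnan(y))
    # A degree-1 polynomial fit returns the coefficients [slope, intercept]
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    return slope, intercept


# Made-up points lying on the line y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

A positive slope suggests a positive correlation and a negative slope a negative one. The slope alone does not tell you how tightly the points follow the line.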
Here we can deduce that there is a positive correlation between flipper length and body mass, as well as between bill length and body mass.
This would make sense from a physical standpoint: a heavier penguin tends to be a bigger penguin overall.
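To put a number on a visual impression like this, you can compute the Pearson correlation coefficient for each pair of numeric columns. The sketch below uses DataFrame.corr(); the helper name correlation_table and the toy data are only there for illustration.

```python
import pandas as pd


def correlation_table(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between every pair of numeric columns."""
    # Select numeric columns first so text columns (species, island...) are skipped
    return df.select_dtypes(include="number").corr(method="pearson")


# Made-up data: y rises with x, z falls as x rises
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})
print(correlation_table(toy))
```

Values close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate little or none. The same function can be applied to the penguins DataFrame.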
But don't forget: never take anything for granted. You are only looking at a sample of a bigger population, so what you see is not necessarily a correct representation of all penguins.
That's it! I hope you liked it. If you have any questions, you can reach me on Telegram.