How to analyze the correlation between two variables

There are several ways to analyze the correlation between two variables in Python. Here are a few examples:

  • Using numpy.corrcoef(): This function computes the Pearson correlation coefficient between two variables. It takes two arrays as input and returns a 2×2 correlation matrix: the diagonal entries are 1, and the off-diagonal entries hold the correlation between x and y.
import numpy as np
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
print(np.corrcoef(x, y))
  • Using pandas.DataFrame.corr(): This method returns the pairwise correlation between the columns of a DataFrame, itself as a DataFrame. It uses the Pearson method by default.
import pandas as pd
df = pd.DataFrame({'x': x, 'y': y})
print(df.corr())
  • Using scipy.stats.pearsonr(): This function returns the Pearson correlation coefficient and the p-value for testing non-correlation. It takes two arrays as input and returns a tuple of correlation coefficient and p-value.
from scipy.stats import pearsonr
corr, p_value = pearsonr(x, y)
print(corr)
  • Using seaborn.pairplot(): This function quickly visualizes the relationships between multiple variables. It creates a grid of scatterplots for every pair of numeric columns, with each variable's own distribution shown on the diagonal.
import seaborn as sns
sns.pairplot(df)
  • Using scipy.stats.spearmanr(): This function returns the Spearman rank-order correlation coefficient and the p-value for testing non-correlation. It takes two arrays as input and returns a tuple of correlation coefficient and p-value.
from scipy.stats import spearmanr
corr, p_value = spearmanr(x, y)
print(corr)
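
The numeric approaches above can be compared side by side. The following is a minimal sketch rather than a definitive recipe: the data (y = x cubed) is made up for illustration and is not from the examples above. It shows how to pull the single coefficient out of the matrices returned by numpy and pandas, and how Pearson and Spearman can differ when the relationship is monotonic but not linear.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Illustrative data (an assumption, not from the article):
# y increases monotonically with x, but not linearly.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = x ** 3

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x, y).
r_numpy = np.corrcoef(x, y)[0, 1]

# DataFrame.corr returns a labelled matrix; select the pair by column name.
df = pd.DataFrame({"x": x, "y": y})
r_pandas = df.corr().loc["x", "y"]

# pearsonr and spearmanr each return a coefficient and a p-value.
r_pearson, p_pearson = pearsonr(x, y)
rho_spearman, p_spearman = spearmanr(x, y)

print(f"numpy    Pearson r:    {r_numpy:.4f}")
print(f"pandas   Pearson r:    {r_pandas:.4f}")
print(f"scipy    Pearson r:    {r_pearson:.4f} (p = {p_pearson:.4g})")
print(f"scipy    Spearman rho: {rho_spearman:.4f} (p = {p_spearman:.4g})")
```

The three Pearson values agree because they compute the same statistic. Spearman works on ranks, so it reaches 1 for any strictly increasing relationship, while Pearson stays below 1 here because the points do not fall on a straight line.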

In all the above examples, x and y can be any array-like sequences of equal length, such as Python lists, NumPy arrays, or pandas Series, containing the two variables whose correlation is to be analyzed. The correlation coefficient ranges from -1 to 1, where -1 represents a perfect negative correlation, 0 represents no linear correlation (or no monotonic correlation, in the case of Spearman), and 1 represents a perfect positive correlation.
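
To make the endpoints of that range concrete, here is a short sketch using made-up data (the arrays below are assumptions chosen for illustration). It also shows a case where the coefficient is 0 even though y depends entirely on x, because the dependence is symmetric rather than linear.

```python
import numpy as np

# Illustrative values (an assumption), symmetric around zero.
x = np.array([-3, -2, -1, 0, 1, 2, 3])

# A perfectly increasing linear relationship.
r_positive = np.corrcoef(x, 2 * x + 1)[0, 1]

# A perfectly decreasing linear relationship.
r_negative = np.corrcoef(x, -x)[0, 1]

# y = x squared: fully determined by x, but with no linear trend.
r_symmetric = np.corrcoef(x, x ** 2)[0, 1]

print(f"y = 2x + 1: r = {r_positive:.2f}")
print(f"y = -x:     r = {r_negative:.2f}")
print(f"y = x^2:    r = {abs(r_symmetric):.2f}")
```

A coefficient near 0 therefore rules out a linear trend, not every kind of relationship, which is one reason a scatterplot is worth checking alongside the number.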
