How to analyze the correlation between two variables

There are several ways to analyze the correlation between two variables in Python. Here are a few examples:

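  • Using numpy.corrcoef(): This function computes Pearson correlation coefficients. Given two 1-D arrays, it returns a 2x2 matrix: the diagonal entries are 1 (each variable correlated with itself) and the off-diagonal entries are the correlation between x and y.
import numpy as np
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
# 2x2 matrix; entry [0, 1] (or [1, 0]) is the correlation between x and y
print(np.corrcoef(x, y))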
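  • Using pandas.DataFrame.corr(): This method computes the pairwise correlation between the columns of a DataFrame, excluding missing values, and returns the result as a DataFrame. It uses Pearson's coefficient by default.
import pandas as pd
# Reuses the x and y lists defined above; each becomes a column
df = pd.DataFrame({'x': x, 'y': y})
# Pearson by default; method='spearman' or method='kendall' gives rank-based coefficients
print(df.corr())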
  • Using scipy.stats.pearsonr(): This function returns the Pearson correlation coefficient together with the p-value for testing the null hypothesis of no correlation. It takes two 1-D arrays of equal length, and its result can be unpacked into the coefficient and the p-value.
from scipy.stats import pearsonr
corr, p_value = pearsonr(x, y)
print(corr)
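  • Using seaborn.pairplot(): This function quickly visualizes the relationships between multiple variables. It draws a grid of scatterplots of every numeric column of a DataFrame against every other column, with each variable's distribution on the diagonal. It produces a plot rather than a coefficient, which makes it useful for spotting nonlinear patterns or outliers.
import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(df)
plt.show()  # needed to display the figure outside a notebook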
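  • Using scipy.stats.spearmanr(): This function returns the Spearman rank-order correlation coefficient together with the p-value for testing the null hypothesis of no correlation. Because it works on ranks, it measures how well the relationship is described by a monotonic function rather than a straight line. As with pearsonr(), its result can be unpacked into the coefficient and the p-value.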
from scipy.stats import spearmanr
corr, p_value = spearmanr(x, y)
print(corr)
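
As a cross-check on what these functions compute, here is a minimal sketch that calculates Pearson's coefficient directly from its definition (the sum of the products of the mean-centered values, divided by the square root of the product of their sums of squares) and compares it with np.corrcoef(), DataFrame.corr() and pearsonr(). The helper pearson_manual and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pearson_manual(x, y):
    """Hypothetical helper: Pearson's r computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each variable on its mean
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of cross-products divided by the root of the product of the sums of squares
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # illustrative data, not from the examples above

r_manual = pearson_manual(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix
r_pandas = pd.DataFrame({'x': x, 'y': y}).corr().loc['x', 'y']
r_scipy, p_value = pearsonr(x, y)

print(r_manual, r_numpy, r_pandas, r_scipy)
```

All four printed values should agree up to floating-point rounding (about 0.775 for this data, the square root of 0.6).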

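In all the above examples, x and y can be Python lists, NumPy arrays, or pandas Series of equal length holding the two variables whose correlation is to be analysed, and df is a pandas DataFrame with one column per variable. Note that the correlation coefficient ranges from -1 to 1: -1 represents a perfect negative correlation, 0 represents no correlation, and 1 represents a perfect positive correlation. For Pearson's coefficient these values describe a linear relationship, and for Spearman's a monotonic one, so a coefficient near 0 does not rule out a strong nonlinear relationship.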
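The following sketch illustrates how to read these values, assuming NumPy and SciPy are installed; the data are synthetic and chosen only to make each case obvious. It contrasts perfect linear relationships, a monotonic but nonlinear one (where Spearman's coefficient is 1 but Pearson's is not), and a symmetric U-shaped one where both coefficients are 0 even though y is completely determined by x.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11, dtype=float)  # 1.0, 2.0, ..., 10.0

# Perfect linear relationships: Pearson's r is +1 or -1 (up to rounding)
r_linear_pos = pearsonr(x, 2 * x + 1)[0]
r_linear_neg = pearsonr(x, -3 * x)[0]

# Monotonic but nonlinear: the ranks agree perfectly, so Spearman's rho is 1,
# while Pearson's r stays below 1 because the points do not lie on a straight line
y_exp = np.exp(x)
r_exp_pearson = pearsonr(x, y_exp)[0]
r_exp_spearman = spearmanr(x, y_exp)[0]

# U-shaped relationship on data symmetric around 0: y depends entirely on x,
# yet both coefficients are 0 because the relationship is neither linear nor monotonic
x_sym = np.arange(-5, 6, dtype=float)  # -5.0, -4.0, ..., 5.0
y_sq = x_sym ** 2
r_sq_pearson = pearsonr(x_sym, y_sq)[0]
r_sq_spearman = spearmanr(x_sym, y_sq)[0]

print(f"linear:      r = {r_linear_pos:.3f}, r = {r_linear_neg:.3f}")
print(f"exponential: Pearson = {r_exp_pearson:.3f}, Spearman = {r_exp_spearman:.3f}")
print(f"quadratic:   Pearson = {r_sq_pearson:.3f}, Spearman = {r_sq_spearman:.3f}")
```

The quadratic case is the one to remember: a coefficient of 0 means no linear (or monotonic) association, not that the variables are unrelated, which is why a scatterplot such as seaborn.pairplot() is a useful companion to the numbers.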