How to analyze the correlation between two variables
There are several ways to analyze the correlation between two variables in Python. Here are a few examples:
- Using numpy.corrcoef(): This function computes the Pearson correlation coefficient. It takes two arrays as input and returns a 2×2 correlation matrix; the off-diagonal entries hold the coefficient between the two variables.
import numpy as np
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
print(np.corrcoef(x, y))
- Using pandas.DataFrame.corr(): This method returns the pairwise correlation between the columns of a DataFrame, as a DataFrame. It computes the Pearson coefficient by default.
import pandas as pd
df = pd.DataFrame({'x': x, 'y': y})
print(df.corr())
- Using scipy.stats.pearsonr(): This function returns the Pearson correlation coefficient and the p-value for testing non-correlation. It takes two arrays as input and returns a tuple of correlation coefficient and p-value.
from scipy.stats import pearsonr
corr, p_value = pearsonr(x, y)
print(corr)
- Using seaborn.pairplot(): This function quickly visualizes the relationships between multiple variables. It draws a grid of scatterplots for every pair of variables, with each variable's own distribution on the diagonal.
import seaborn as sns
sns.pairplot(df)
- Using scipy.stats.spearmanr(): This function returns the Spearman rank-order correlation coefficient and the p-value for testing non-correlation. It takes two arrays as input and returns a tuple of correlation coefficient and p-value.
from scipy.stats import spearmanr
corr, p_value = spearmanr(x, y)
print(corr)
In all the above examples, x and y can be Python lists, NumPy arrays, or pandas Series of equal length containing the two variables whose correlation is to be analyzed. The correlation coefficient ranges from -1 to 1, where -1 represents a perfect negative correlation, 0 represents no correlation (no linear relationship for Pearson, no monotonic relationship for Spearman), and 1 represents a perfect positive correlation.
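To see how the Pearson and Spearman coefficients can differ, here is a minimal sketch on illustrative data that is an assumption of this example, not taken from the snippets above: y increases steadily with x, but along a curve rather than a straight line. It also shows one way to pull the single coefficient out of the matrix that numpy.corrcoef() returns.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative data (an assumption for this sketch):
# y always increases with x, but not in a straight line.
x = np.arange(1, 11)
y = x ** 3

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry [0, 1]
# is the Pearson coefficient between x and y.
r_numpy = np.corrcoef(x, y)[0, 1]

# pearsonr measures linear association; spearmanr measures
# monotonic (rank-based) association.
r_pearson, p_pearson = pearsonr(x, y)
rho_spearman, p_spearman = spearmanr(x, y)

print(f"Pearson (numpy): {r_numpy:.3f}")
print(f"Pearson (scipy): {r_pearson:.3f}")
print(f"Spearman:        {rho_spearman:.3f}")
```

On data like this, the Spearman coefficient should reach 1 because the ranks of x and y match exactly, while the Pearson coefficient stays below 1 because the relationship is not linear. Which one to report depends on whether you care about a straight-line relationship or any consistently increasing or decreasing one.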