Sometimes you need to compare features before using it for your linear models.
A necessary transformation comes when the features are not equal in terms of magnitude.
There exist many transformations
- log transform
- min-max scaling
- power transform
And another transform which uses z-scores.
What are z-scores ?
According to the booktext definition a zscore is
The distance (standard dev) from which our value is to the mean.
e.g. A given height : 159 and our sample pop has
Mean (mu) : 168
Standard deviation (sigma) : 12
To illustrate the example
mu = 168 sigma = 12 x = 159 z = (x - mu) / sigma
A real world example with data
import numpy as np # We define a list of our observable heights heights = [159, 165, 170, 154, 190, 174] # We compute our sample mean and std mu = np.mean(heights) sigma = np.std(heights) # We compute the zscore for everyone of our observation z = (heights - mu) / sigma print(z) >> array([-0.8333471 , -0.31609717, 0.11494443, -1.2643887 , 1.83911083, 0.45977771])