Using z-score to rank features

1 min

Sometimes you need to compare features before using it for your linear models.

A necessary transformation comes when the features are not equal in terms of magnitude.

There exist many transformations

  • log transform
  • min-max scaling
  • power transform

And another transform which uses z-scores.

What are z-scores ?

According to the booktext definition a zscore is

The distance (standard dev) from which our value is to the mean.

e.g. A given height : 159 and our sample pop has

Mean (mu) : 168

Standard deviation (sigma) : 12

Z-score formula

Python code

To illustrate the example

mu = 168
sigma = 12
x = 159

z = (x - mu) / sigma

A real world example with data

import numpy as np

# We define a list of our observable heights
heights = [159, 165, 170, 154, 190, 174]

# We compute our sample mean and std
mu = np.mean(heights)
sigma = np.std(heights)

# We compute the zscore for everyone of our observation
z = (heights - mu) / sigma

print(z)
>> array([-0.8333471 , -0.31609717,  0.11494443, -1.2643887 ,  1.83911083,
        0.45977771])