Using z-score to rank features

Jun 30th 2021 • 1 min

Sometimes you need to compare features before using it for your linear models.

A necessary transformation comes when the features are not equal in terms of magnitude.

There exist many transformations

log transform
min-max scaling
power transform

And another transform which uses z-scores.

What are z-scores ?

According to the booktext definition a zscore is

The distance (standard dev) from which our value is to the mean.

e.g. A given height : 159 and our sample pop has

Mean (mu) : 168

Standard deviation (sigma) : 12

Z-score formula

Python code

To illustrate the example

mu = 168
sigma = 12
x = 159

z = (x - mu) / sigma

A real world example with data

import numpy as np

# We define a list of our observable heights
heights = [159, 165, 170, 154, 190, 174]

# We compute our sample mean and std
mu = np.mean(heights)
sigma = np.std(heights)

# We compute the zscore for everyone of our observation
z = (heights - mu) / sigma

print(z)
>> array([-0.8333471 , -0.31609717,  0.11494443, -1.2643887 ,  1.83911083,
        0.45977771])

Hey! I'm Bastien! 👋

Using z-score to rank features

What are z-scores ?

Python code

Hey! I'm Bastien! 👋

Click on my face to learn about my story

Best Articles

Introduction to Volume Profiles in Python

Carry Trading: A step by step Guide to Profitable Strategies and Risk Management using Python

How to Read a Folder of CSVs in Python Using DuckDB

How to crawl multiple web pages using Python

How long does it take to learn Python for Data Science in 2023