Using z-score to rank features
Land Your First Data Science Job
A proven roadmap to prepare for $75K+ entry-level data roles. Perfect for Data Scientist ready to level up their career.
Sometimes you need to compare features before using them in your linear models.
A transformation becomes necessary when the features differ in magnitude.
Many transformations exist:
- log transform
- min-max scaling
- power transform
Another transform uses z-scores.
What are z-scores?
According to the textbook definition, a z-score is
the distance, measured in standard deviations, between our value and the mean.
For example, take a given height of 159, drawn from a sample population with
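To make these options concrete, here is a minimal sketch that applies each one to the same feature with NumPy. The `income` values are invented for illustration, and the square root stands in as one simple example of a power transform; other choices are possible.

```python
import numpy as np

# Hypothetical feature values, invented for illustration only
income = np.array([20_000.0, 35_000.0, 50_000.0, 120_000.0])

# Log transform: compresses large values
log_scaled = np.log(income)

# Min-max scaling: maps the values onto the [0, 1] range
min_max_scaled = (income - income.min()) / (income.max() - income.min())

# Power transform (square root chosen here as one simple example)
power_scaled = np.sqrt(income)

# z-score: centers on the mean, in units of standard deviation
z_scaled = (income - income.mean()) / income.std()
```

After min-max scaling the smallest value is 0 and the largest is 1, while the z-scored values have mean 0 and standard deviation 1.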
Mean (mu) : 168
Standard deviation (sigma) : 12
Python code
To illustrate the example
mu = 168
sigma = 12
x = 159
z = (x - mu) / sigma
print(z)  # -0.75, i.e. 0.75 standard deviations below the mean
A real-world example with data
import numpy as np
# Define an array of observed heights
heights = np.array([159, 165, 170, 154, 190, 174])
# Compute the mean and standard deviation
# (np.std defaults to ddof=0, the population standard deviation)
mu = np.mean(heights)
sigma = np.std(heights)
# Compute the z-score of every observation
z = (heights - mu) / sigma
print(repr(z))
>> array([-0.8333471 , -0.31609717, 0.11494443, -1.2643887 , 1.83911083, 0.45977771])
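The same computation extends to several features at once, which is what makes features of different magnitudes comparable. The sketch below is an illustration under stated assumptions, not part of the original example: the `weights` column is made up, and `axis=0` is used so that each column gets its own mean and standard deviation.

```python
import numpy as np

heights = np.array([159, 165, 170, 154, 190, 174])  # heights from the example above
weights = np.array([52, 61, 70, 49, 95, 72])        # hypothetical second feature

# One row per observation, one column per feature
X = np.column_stack([heights, weights])

# Column-wise z-scores: each feature is centered and scaled independently
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z)
```

Each column of `Z` now has mean 0 and standard deviation 1, so a value in one column can be compared directly with a value in the other.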