Standardization vs Normalization

Swagata Ashwani
May 21, 2022

Feature scaling is an important step in the data preparation process. Before training, features should be brought to a comparable scale so that no single feature dominates the learning algorithm. This matters especially for distance-based algorithms such as k-nearest neighbors or k-means, where a feature with a large range can dominate the distance computation.

There are two common methods for feature scaling:

Standardization

Also known as z-score standardization, this method rescales a dataset so that it has a mean of 0 and a standard deviation of 1.

x_standard = (xₙ − x̄) / s

  • xₙ: The nth value in the dataset
  • x̄: The sample mean
  • s: The sample standard deviation

Standardization is less affected by outliers than min-max scaling because the transformed values are not squeezed into a predefined range. Extreme values do still influence the mean and standard deviation, though, so it is not completely immune to them.
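
As a concrete illustration, here is a minimal sketch in Python that applies the formula above, first by hand with NumPy and then with scikit-learn's StandardScaler. The array x is a made-up toy feature column, not data from any real dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature column (made-up values for illustration)
x = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# Manual z-score: subtract the mean, divide by the standard deviation
x_standard = (x - x.mean()) / x.std()

# The same transform with scikit-learn. StandardScaler also divides by the
# population standard deviation (ddof=0), so the two results match.
x_sklearn = StandardScaler().fit_transform(x)

print(x_standard.ravel())  # transformed values have mean 0 and std 1
print(x_sklearn.ravel())
```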

Normalization

Also known as min-max scaling, this method rescales values into the range [0, 1]. This is useful when all features need to share the same positive scale. However, it is sensitive to outliers: a single extreme value sets the min or max and compresses the remaining values into a narrow band.

X_normal = (X − X_min) / (X_max − X_min)

  • X_min: The minimum value in the dataset
  • X_max: The maximum value in the dataset
  • X: The current value in the dataset
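
Here is the matching sketch for min-max scaling, again done by hand and with scikit-learn's MinMaxScaler, using the same made-up toy column as before.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same toy feature column as above
x = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# Manual min-max scaling into [0, 1]
x_normal = (x - x.min()) / (x.max() - x.min())

# The same transform with scikit-learn (default feature_range=(0, 1))
x_sklearn = MinMaxScaler().fit_transform(x)

print(x_normal.ravel())   # [0.   0.25 0.5  0.75 1.  ]
print(x_sklearn.ravel())
```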
