How to deal with missing data in a dataset?

Swagata Ashwani
May 15, 2022

One of the most common challenges data scientists face is missing data in their datasets. Why is missing data a problem?

Data science is about finding patterns in data and generating insights from them, so gaps in the data matter. Missing values reduce the information available to a statistical model, and ignoring them, or handling them carelessly, can bias the estimates and invalidate the statistical results.

Broadly, there are two ways to deal with missing data:

Deleting the data

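If the missing data is in a variable (column) that is not significant to the model, you can simply drop that variable.

If the variable is significant but only a small number of rows have missing values, you can delete those rows instead. This is safest when the values are missing completely at random; if whether a value is missing depends on the data itself, dropping rows can bias the results.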
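Both deletion options map directly onto pandas. The sketch below uses a small made-up DataFrame in which a hypothetical `notes` column is assumed to be irrelevant to the model, while `income` matters but is missing in only one row:

```python
import numpy as np
import pandas as pd

# Made-up data: "notes" is mostly empty and assumed to be irrelevant to the model
df = pd.DataFrame({
    "age": [25, 31, 35, 40, 29, 52],
    "income": [50000, 62000, np.nan, 58000, 61000, 70000],
    "notes": [np.nan, np.nan, "vip", np.nan, np.nan, np.nan],
})

# Case 1: the variable is not significant to the model, so drop the whole column
df = df.drop(columns=["notes"])

# Case 2: the variable matters but few rows are affected, so drop those rows
print(df.isna().mean())  # fraction of missing values per column
df_complete = df.dropna()
print(df_complete)
```

Checking `df.isna().mean()` first is one simple way to confirm that the rows you are about to drop really are a small fraction of the data.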

Imputing the data

1. Replacing with a scalar value

This strategy simply replaces each missing value with a single value, such as one of the following:

a. Mean

b. Median

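c. Mode (the most frequent value), which also works for categorical columns

d. Random value (for example, one drawn from the observed values in the column)

e. Previous value (forward fill)

f. Next value (backward fill)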
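In pandas, each of these scalar strategies is roughly a one-liner on a Series. The sketch below uses a made-up numeric column; drawing the random value from the column's observed values is one common choice, assumed here rather than taken from the article:

```python
import numpy as np
import pandas as pd

# Made-up column with two gaps and one large value (90)
s = pd.Series([10.0, np.nan, 20.0, 20.0, np.nan, 90.0])

filled_mean = s.fillna(s.mean())          # mean of observed values: 35.0
filled_median = s.fillna(s.median())      # median of observed values: 20.0
filled_mode = s.fillna(s.mode().iloc[0])  # most frequent observed value: 20.0
filled_ffill = s.ffill()                  # previous value (forward fill)
filled_bfill = s.bfill()                  # next value (backward fill)

# Random value: sample replacements from the observed (non-missing) values
rng = np.random.default_rng(seed=0)
observed = s.dropna().to_numpy()
filled_random = s.copy()
mask = filled_random.isna()
filled_random.loc[mask] = rng.choice(observed, size=int(mask.sum()))
```

The single large value pulls the mean (35) well above the median and mode (20), which is why the median is often preferred for skewed columns.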

2. Interpolation and model-based imputation

a. Linear Interpolation

Linear interpolation approximates an unknown value of a function from two known values on either side of it. Given known points (x0, y0) and (x1, y1) and a missing value at x between them, the estimate is y = y0 + (y1 - y0)(x - x0)/(x1 - x0). The same formula can be read as a weighted average, y = y0(x1 - x)/(x1 - x0) + y1(x - x0)/(x1 - x0), where each known point's weight shrinks as its distance from the unknown point grows. The closer point therefore has more influence than the farther one.
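pandas provides this through `Series.interpolate`. The sketch below, on made-up data, also spells out the formula as a small hypothetical helper `lerp` so it can be checked against the library:

```python
import numpy as np
import pandas as pd


def lerp(x, x0, y0, x1, y1):
    """Linear interpolation between known points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Evenly spaced readings with gaps: method="linear" treats positions as equally spaced
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
linear = s.interpolate(method="linear")

# Unevenly spaced points: method="index" uses the actual index values as x
t = pd.Series([0.0, np.nan, 10.0], index=[0, 8, 10])
by_index = t.interpolate(method="index")

print(linear.tolist())                              # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(by_index.loc[8], lerp(8, 0, 0.0, 10, 10.0))  # 8.0 8.0
```

Because it relies on neighbouring values, linear interpolation only makes sense when the rows have a meaningful order, such as a time series.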

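b. Multiple Imputation

Unlike linear interpolation, multiple imputation does not substitute a single value for each missing data point. Instead, it creates several completed copies of the dataset, each filled with different plausible values that reflect the natural variability and uncertainty about the true values. The analysis is run on each copy, and the results are pooled into one estimate.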
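One way to do this in Python, assumed here rather than taken from the article, is scikit-learn's `IterativeImputer` with `sample_posterior=True`: each run with a different `random_state` draws different plausible values, giving several completed datasets. The estimator is experimental and must be enabled with a special import. The sketch uses synthetic data and pools only the point estimate (a column mean); a full analysis would also combine within- and between-imputation variance (Rubin's rules).

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (unlocks IterativeImputer)
from sklearn.impute import IterativeImputer

# Synthetic data: column 2 depends on column 0, and about 20% of column 2 is missing
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 2 * X[:, 0] + rng.normal(scale=0.5, size=100)
X[rng.random(100) < 0.2, 2] = np.nan

# Create m completed datasets, each with different random draws for the missing values
m = 5
completed = [
    IterativeImputer(sample_posterior=True, max_iter=10, random_state=i).fit_transform(X)
    for i in range(m)
]

# Run the analysis (here: the mean of column 2) on each dataset, then pool
estimates = np.array([data[:, 2].mean() for data in completed])
pooled_estimate = estimates.mean()        # pooled point estimate
between_variance = estimates.var(ddof=1)  # spread caused by imputation uncertainty
print(pooled_estimate, between_variance)
```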

c. K Nearest Neighbors

In this method, each missing value is estimated from the k most similar rows (its nearest neighbors), found with a chosen distance measure, and the neighbors' average for that feature is used as the imputed value. The data scientist must choose the number of neighbors k and the distance metric.
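scikit-learn's `KNNImputer` implements this idea (a tooling assumption; the article names no library). By default it uses a Euclidean distance that skips missing coordinates (`nan_euclidean`) and takes a plain, unweighted average of the neighbors' values. The small matrix below is made up for illustration:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# k = 2 neighbors, nan-aware Euclidean distance, unweighted average
imputer = KNNImputer(n_neighbors=2, metric="nan_euclidean", weights="uniform")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the gap in the first row becomes 4.0, the average of its two nearest neighbors' values (3 and 5), and the gap in the third row becomes 5.5, the average of 3 and 8. Because distances depend on feature scale, features are usually standardized before running KNN imputation.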

Swagata Ashwani

I love talking data! Data scientist with a passion for finding optimized solutions in the AI space. Follow me here: https://www.linkedin.com/in/swagata-ashwani/