Bootstrapping — What is the goal?

Swagata Ashwani
2 min read · May 28, 2022

Photo by Christopher Gower on Unsplash

What is Bootstrapping?

To understand Bootstrapping, let us start with a simple problem:
We are given a bunch of prices of houses, and we want to know the median price of a house.
It is easy to compute the median directly, but how can we compute the error bars?

If it were the mean, we could make some assumptions (for example, that the prices are independent draws from one distribution with finite variance) and apply the standard error formula to get a good estimate.
However, no equally simple formula exists for the median.
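
For comparison, here is a minimal sketch of what the standard technique looks like for the mean, assuming the values are independent draws from one distribution with finite variance. The function name `mean_with_standard_error` and the toy numbers are made up for illustration.

```python
import math
import statistics


def mean_with_standard_error(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator).
    sample_sd = statistics.stdev(values)
    # Standard error of the mean: s / sqrt(n).
    return mean, sample_sd / math.sqrt(n)


# Toy values, for illustration only (not real house prices).
values = [1, 2, 4, 5, 7, 9, 10]
mean, sem = mean_with_standard_error(values)
print(f"mean = {mean:.2f}, standard error = {sem:.2f}")
```

Nothing this short is available for the median, which is the gap bootstrapping fills.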

In general, when there is no explicit formula for the distribution of errors, there is no simple way to assess the accuracy of a measured value.
However, if we had infinite data, it would be easy to solve this problem:
Measure the quantity in many independent datasets of the same fixed size.
Use the empirical distribution of those measurements as the distribution of the quantity.
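
The thought experiment above can be sketched in code if we pretend, purely for illustration, that we know the true distribution and can draw from it as often as we like. The log-normal price model, its parameters, and the helper names below are assumptions, not something taken from real data.

```python
import random
import statistics


def medians_of_independent_datasets(draw_one, dataset_size, n_datasets, seed=0):
    """Draw many independent datasets and return the median of each one."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_datasets):
        dataset = [draw_one(rng) for _ in range(dataset_size)]
        medians.append(statistics.median(dataset))
    return medians


def draw_price(rng):
    # ASSUMPTION: a log-normal "true" distribution, chosen only for illustration.
    return rng.lognormvariate(12.5, 0.5)


medians = medians_of_independent_datasets(draw_price, dataset_size=1000, n_datasets=500)
# The spread of these medians is the error bar we are after.
print("spread of the median:", statistics.stdev(medians))
```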

The problem here is that we never will have infinite data!

We might get our 1000 data points only once, and then need to work from that.
The question becomes:

“How can we expand a single fixed dataset to treat it like 1000 independent ones?”

There is a solution!!

Sampling with Replacement

What happens if we treat our data as the true distribution, and draw synthetic datasets from it?
To create each synthetic dataset, we sample with replacement from our dataset until we have as many points as the original, so any value can appear more than once or not at all:
Given dataset:

[1, 2, 4, 5, 7, 9, 10]

Median: 5

Potential samples with associated medians:
[1, 1, 2, 4, 9, 10, 10], median: 4
[2, 4, 5, 5, 7, 7, 7], median: 5
[1, 1, 1, 1, 1, 1, 1], median: 1
[1, 2, 4, 5, 7, 9, 10], median: 5
… and so on

The distribution of these medians gives us an estimate of the true distribution of the median over datasets of this size, and its spread gives us the error bars we were looking for.
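
Here is a minimal sketch of the procedure, using only Python's standard library. The helper names `bootstrap_medians` and `percentile_interval`, the number of resamples, and the simple nearest-rank percentile rule are illustrative choices, not the only way to do it.

```python
import random
import statistics


def bootstrap_medians(data, n_resamples=10_000, seed=0):
    """Resample `data` with replacement and return the median of each resample."""
    rng = random.Random(seed)
    n = len(data)
    # rng.choices draws WITH replacement, so values can repeat within a resample.
    return [statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Rough percentile interval using a nearest-rank rule (an assumption)."""
    ordered = sorted(values)

    def pick(percent):
        index = round(percent / 100 * (len(ordered) - 1))
        return ordered[index]

    return pick(lower), pick(upper)


data = [1, 2, 4, 5, 7, 9, 10]
medians = bootstrap_medians(data)
low, high = percentile_interval(medians)
print("median of the data:", statistics.median(data))
print("approximate 95% interval for the median:", (low, high))
```

With only seven points the interval is coarse, because every bootstrapped median has to be one of the original values. With a larger dataset the same code gives a much finer picture.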

[Figure: true distribution of the medians]

[Figure: bootstrapped distribution of the medians]

Written by Swagata Ashwani

I love talking Data! Data Scientist with a passion for finding optimized solutions in the AI space. Follow me here: https://www.linkedin.com/in/swagata-ashwani/