Reducible and Irreducible Errors

Senthilkumar Bala
2 min read · Dec 27, 2020


There is no such thing as the perfect blue in this imperfect world

Suppose you are an aspiring data scientist on the procurement team of a large company and want to get your hands dirty with your first prediction task.

The first task you picked up is to forecast prices for different categories of products purchased from suppliers.

Current state

Let us assume “Y” is the actual price from a supplier for a particular product in the current scheme of things.

Y = f(X), where

X = the set of variables that you believe influence Y, and

f(X) = the function that captures the relationship between X and Y

We all know that we live in an imperfect world, so it would be completely unrealistic to expect to capture every variable that can influence an outcome. There will always be “extra” variables that influence Y and that we might be unaware of. Let’s aggregate these extra variables into something called an error term, ε.

So the expression for Y now becomes

Y = f(X) + ε
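To make this concrete, here is a minimal sketch of such a price process. Everything in it is an assumption for illustration: the linear form of `f`, the single predictor (order volume), and Gaussian noise with a standard deviation of 5 are all made up, not taken from any real procurement data.

```python
import random

def f(x):
    # Hypothetical "true" relationship: price rises linearly with order volume.
    return 50.0 + 2.0 * x

def observe_price(x, rng, noise_sd=5.0):
    # Actual price Y = systematic part f(x) + error term epsilon.
    # epsilon stands in for all the variables we did not capture in X.
    epsilon = rng.gauss(0.0, noise_sd)
    return f(x) + epsilon

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(5):
    x = rng.uniform(0, 100)
    print(f"x={x:6.2f}  f(x)={f(x):7.2f}  Y={observe_price(x, rng):7.2f}")
```

Each printed Y sits near f(x) but never exactly on it. That gap is ε.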

Prediction

Now comes the fun part: the prediction. In the simplest terms, prediction is about designing an approximation of f(X) that gets very close to the value of Y for the given values of X. Let us call this approximation our estimator function 𝕗 and its output our predicted value 𝕐.

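𝕐 = 𝕗(X)

Note that ε does not appear here. It cannot be computed from X, so the best our prediction can do is estimate f(X).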

Great! Now that we have the actual value (Y) and the predicted value (𝕐), the difference between Y and 𝕐 is the prediction error.

Prediction error is influenced by two factors:

  1. The difference between f(X) and 𝕗(X), which we call the Reducible Error
  2. ε, which we call the Irreducible Error
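One common way to make these two factors precise is to look at the average squared prediction error at a fixed X. When ε averages to zero, this splits into (f(X) − 𝕗(X))² plus the variance of ε. The sketch below checks that numerically. The functions `f` and `f_hat`, the noise level, and the chosen `x` are all hypothetical values picked for illustration.

```python
import random

def f(x):
    # Hypothetical true function.
    return 50.0 + 2.0 * x

def f_hat(x):
    # Hypothetical imperfect estimator (slightly wrong intercept and slope).
    return 45.0 + 2.1 * x

NOISE_SD = 5.0
rng = random.Random(42)
x = 30.0
n = 100_000

# Simulate many actual prices at the same x and score the fixed prediction.
total = 0.0
for _ in range(n):
    y = f(x) + rng.gauss(0.0, NOISE_SD)
    total += (y - f_hat(x)) ** 2
mse = total / n

reducible = (f(x) - f_hat(x)) ** 2   # gap between f and the estimator
irreducible = NOISE_SD ** 2          # variance of epsilon

print(f"simulated MSE           : {mse:.2f}")
print(f"reducible + irreducible : {reducible + irreducible:.2f}")
```

The two printed numbers should agree up to simulation noise.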

Given enough data, time, and a flexible enough model, we can find an estimator 𝕗 that gets close to f and bring the Reducible Error close to 0.

So, Reducible Errors are the errors that we have control over and can reduce by choosing better models.

Irreducible Errors are the errors caused by variables beyond the realm of X (our set of predictor variables). No matter how good our estimator is, we can’t reduce these errors by modelling alone. This error stays in the system and is commonly called Noise, or simply the Randomness in the Universe.

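Strictly speaking, this error is irreducible only for a given X. If you understand the domain better, you can bring down the Noise by adding new attributes to X that explain part of what used to sit in ε. Removing attributes that carry no signal can also help, but that improves the estimator (the Reducible Error) rather than the Noise itself.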
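Here is a small sketch of what adding an attribute to X can do. It assumes a made-up price process that depends on order volume and a fuel-cost index, and it uses the true coefficients directly instead of fitting a model, so that any remaining error is purely noise. All names and numbers are hypothetical.

```python
import random

rng = random.Random(7)
n = 50_000

rows = []
for _ in range(n):
    volume = rng.uniform(0, 100)
    fuel = rng.gauss(0.0, 3.0)      # hypothetical fuel-cost index, centred at 0
    noise = rng.gauss(0.0, 1.0)     # everything else we still cannot observe
    price = 50.0 + 2.0 * volume + 4.0 * fuel + noise
    rows.append((volume, fuel, price))

def mse(predict):
    return sum((price - predict(volume, fuel)) ** 2
               for volume, fuel, price in rows) / n

# X = {volume}: the fuel effect is invisible, so it behaves like noise.
mse_volume_only = mse(lambda volume, fuel: 50.0 + 2.0 * volume)

# X = {volume, fuel}: the fuel effect is now part of f(X).
mse_with_fuel = mse(lambda volume, fuel: 50.0 + 2.0 * volume + 4.0 * fuel)

print(f"error floor with volume only   : {mse_volume_only:.1f}")
print(f"error floor with volume + fuel : {mse_with_fuel:.1f}")
```

Even a perfect model of volume alone is stuck with a large error. Once fuel is in X, most of that error is gone.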

As a novice data scientist, always start by reducing the Reducible Errors that you have control over. Some error will always remain that cannot be brought down, and that is what you call the Irreducible Error.
