Optimization in Machine Learning
The goal of building machine learning models is to reduce the error to a minimum, so that we end up with a general model that performs well on unseen data.
We want to learn better and better models, such that the overall model error gets smaller and smaller … ideally, as small as possible! That's where Optimization comes into the picture.
Optimization
In ML, we use optimization to minimize an error function of the ML model.
Error function: error = f(w), where w is the input to the function (in ML, the model's weights/parameters) and f is the function that maps them to an error value
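To make this concrete, here is a minimal sketch of such an error function for a one-weight linear model, using mean squared error. The toy data and the choice of MSE are assumptions for illustration only:

```python
import numpy as np

# Toy data, assumed for illustration: y is roughly 3 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

def f(w):
    """Error function error = f(w): mean squared error of the model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)

print(f(0.0))  # large error far from the best weight
print(f(3.0))  # small error near the best weight (~0.025)
```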
Optimizing the error function:
Minimizing means finding the input that results in the lowest value of the function
Maximizing means finding the input that results in the largest value
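For intuition only, continuing the toy snippet above, we could find the minimizing input by brute force: evaluate f on a grid of candidate inputs and keep the one with the lowest value (maximizing would keep the largest instead, via np.argmax). The grid range here is an arbitrary choice:

```python
# Brute-force minimization for intuition (continues the snippet above)
ws = np.linspace(-5.0, 10.0, 1001)       # candidate inputs
errors = np.array([f(w) for w in ws])    # error at each candidate
w_best = ws[np.argmin(errors)]           # input giving the lowest error
print(w_best)                            # close to w = 3
```

Brute force only works for tiny problems like this one; real models have far too many weights to scan.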
To get the optimized model, we need to reach the lowest point of the error function. The best-known way to do that is a method called Gradient Descent.
Gradient Descent Optimization:
Gradient: direction and rate of the fastest increase of a function.
It can be calculated with the partial derivatives of the function with respect to each input variable in w.
Because it has a direction, the gradient is a “vector”.
As we move towards the bottom of the function, the gradient gets smaller and eventually becomes zero (i.e., the function can no longer change, can no longer decrease: it has reached the min!)
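As a sketch of what this looks like in code, here is the gradient of a simple two-variable function (g is an arbitrary illustrative choice), written analytically as a vector of partial derivatives and checked with finite differences:

```python
import numpy as np

def g(w):
    """Illustrative function of two variables: g(w1, w2) = w1^2 + 3 * w2^2."""
    return w[0] ** 2 + 3 * w[1] ** 2

def grad_g(w):
    """Analytic gradient: the vector of partial derivatives [dg/dw1, dg/dw2]."""
    return np.array([2 * w[0], 6 * w[1]])

def numerical_grad(func, w, eps=1e-6):
    """Finite-difference approximation of each partial derivative."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (func(w + step) - func(w - step)) / (2.0 * eps)
    return grad

w = np.array([1.0, -2.0])
print(grad_g(w))                     # [  2. -12.]
print(numerical_grad(g, w))          # approximately the same
print(grad_g(np.array([0.0, 0.0])))  # zero at the minimum, as described above
```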
The Gradient Descent method uses gradients to find the minimum of a function iteratively:
it takes steps (proportional to the size of the gradient) towards the minimum, in the direction opposite to the gradient.
Gradient Descent Algorithm:
1. Start at an initial point w0 (an initial guess for the input)
2. At each iteration, step against the gradient and update: w ← w − α ∇f(w), where ∇f(w) is the gradient at the current point and α is the step size (learning rate)
3. Stop when the gradient is (close to) zero, i.e., when the function can no longer decrease
An important thing to remember here is to choose the step size carefully: if we choose too large a step size, the updates can overshoot the minimum and the iterations may never converge, while too small a step size makes convergence very slow and computationally expensive. The sketch below shows both behaviours.
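Putting the pieces together, here is a minimal gradient descent sketch for the toy error function f from the first snippet (it reuses that snippet's x and y). The learning rates, iteration count, and starting point are all illustrative assumptions:

```python
def grad_f(w):
    """Gradient of the toy MSE f(w) = mean((w*x - y)**2): df/dw = mean(2 * x * (w*x - y))."""
    return np.mean(2 * x * (w * x - y))

def gradient_descent(grad, w0, step_size, n_iters=100):
    """Repeatedly step against the gradient: w <- w - step_size * grad(w)."""
    w = w0
    for _ in range(n_iters):
        w = w - step_size * grad(w)
    return w

print(gradient_descent(grad_f, w0=0.0, step_size=0.01))  # converges near w = 3
print(gradient_descent(grad_f, w0=0.0, step_size=0.2))   # too large: the iterates blow up
```

With step_size=0.01 every update shrinks the error a little and the weight settles near 3; with step_size=0.2 each update overshoots so badly that the iterates grow without bound, which is exactly the trade-off described above.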
Happy Optimizing!