Line search methods in optimization

We will review the theory for line search methods in optimization, and end with a practical implementation.
In all optimization problems, we are ultimately interested in using a computer to find the parameters x that minimize some function f(x) (or, equivalently, minimize -f(x) if it is a maximization problem). Starting from an initial guess x_0, it is common to proceed in one of three ways:
*Gradient-free optimization* — don’t laugh! Everyone does this. Here we are just guessing the next parameters:

x_{k+1} = x_k + p

for some random guess p to try to minimize the function, and evaluating the function f(x_k + p) at the new point to check if the step should be accepted. Actually we all do this — in almost any optimization algorithm, some hyperparameters need to be tuned (e.g. learning rates in machine learning, or initialization of parameters in any optimization problem). While some automatic strategies for tuning these also exist, it’s common to just try different values and see what works best.
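As a minimal sketch of this accept-if-better random search (using numpy; the function and parameter names here are illustrative, not from any particular library):

```python
import numpy as np

def random_search(f, x0, n_iters=1000, step=0.1, seed=0):
    """Gradient-free optimization: propose a random step p and
    accept it only if it decreases the objective f."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(n_iters):
        p = step * rng.standard_normal(x.shape)  # random guess p
        f_new = f(x + p)
        if f_new < fx:  # accept only improving steps
            x, fx = x + p, f_new
    return x

# Minimize a simple quadratic f(x) = ||x||^2 (minimum at the origin)
x_min = random_search(lambda x: np.sum(x**2), x0=[2.0, -1.5])
```

Even this naive scheme makes steady progress on smooth low-dimensional problems, though it scales poorly with dimension.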
*First order methods* — these are methods that use the first derivative \nabla f(x) to choose the search direction. A common update rule is gradient descent:

x_{k+1} = x_k - \eta \nabla f(x_k)

for a hyperparameter \eta > 0, the learning rate.
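A bare-bones gradient descent loop might look like this (a sketch assuming numpy and a user-supplied gradient function; names are illustrative):

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, n_iters=100):
    """First order method: x_{k+1} = x_k - eta * grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - eta * grad_f(x)
    return x

# f(x) = ||x||^2 has gradient 2x and its minimum at the origin
x_min = gradient_descent(lambda x: 2 * x, x0=[2.0, -1.5])
```

Note that the fixed step size \eta must be chosen carefully: too large and the iterates diverge, too small and convergence is slow — which is precisely the motivation for line search methods.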
*Second order methods* — here we are using the Hessian \nabla^2 f(x) to pick the next step. A common update rule is Newton’s method:

x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)

Often, a step size \eta \in (0,1] is included,

x_{k+1} = x_k - \eta [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)

which is sometimes known as the damped or relaxed Newton’s method.
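A sketch of the (damped) Newton update, assuming numpy and that we can evaluate both the gradient and the Hessian (function names are illustrative):

```python
import numpy as np

def damped_newton(grad_f, hess_f, x0, eta=1.0, n_iters=20):
    """Second order method: x_{k+1} = x_k - eta * H(x_k)^{-1} grad f(x_k).
    We solve the linear system rather than forming the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - eta * np.linalg.solve(hess_f(x), grad_f(x))
    return x

# Quadratic f(x) = x^T A x with A positive definite:
# gradient is 2 A x, Hessian is 2 A, minimum at the origin
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x_min = damped_newton(lambda x: 2 * A @ x, lambda x: 2 * A, x0=[2.0, -1.5])
```

On a quadratic with \eta = 1, Newton’s method converges in a single step, because the second order model of the function is exact.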
This, however, is often not sufficient. First, the Hessian is usually expensive to compute. Second, we are generally interested in finding a *descent direction*, i.e. one for which the product

\nabla f(x)^T p = -\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)

is negative. This is equivalent to requiring that
\nabla^2 f(x) is positive definite. To ensure that we are always following a descent direction, *quasi-Newton methods* replace the true inverse Hessian with a positive definite estimate, e.g. by regularizing it with a small diagonal matrix:

[\nabla^2 f(x) + \epsilon I]^{-1}

for a small \epsilon > 0, with I the identity matrix. Note that we *always* choose \epsilon = 0 if the true Hessian \nabla^2 f(x) is positive definite. Additionally, it is popular to construct an estimate from changing first order gradients that is also cheaper to compute, e.g. using the BFGS algorithm.
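One way to sketch the diagonal regularization above, assuming numpy: increase \epsilon until a Cholesky factorization succeeds, i.e. until \nabla^2 f(x) + \epsilon I is positive definite, leaving \epsilon = 0 when the Hessian already is (the function name and the \epsilon schedule are illustrative choices, not a standard API):

```python
import numpy as np

def regularized_newton_step(grad, hess, eps_init=1e-3):
    """Return a descent direction p = -(H + eps*I)^{-1} grad, increasing
    eps until H + eps*I is positive definite (eps stays 0 if H already is)."""
    n = hess.shape[0]
    eps = 0.0
    while True:
        try:
            # Cholesky succeeds only for positive definite matrices
            np.linalg.cholesky(hess + eps * np.eye(n))
            break
        except np.linalg.LinAlgError:
            eps = eps_init if eps == 0.0 else 10 * eps
    # Solve (H + eps*I) p = -grad instead of forming the inverse
    return -np.linalg.solve(hess + eps * np.eye(n), grad)

# Indefinite Hessian: the raw Newton direction may point uphill,
# but the regularized direction is guaranteed to be a descent direction
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
p = regularized_newton_step(g, H)
```

Here g @ p < 0 holds by construction, since H + \epsilon I is positive definite once the loop exits.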