Line search methods in optimization

We will review the theory for line search methods in optimization, and end with a practical implementation.

You can find the complete code for this tutorial on GitHub.

Motivation

In all optimization problems, we are ultimately interested in using a computer to find the parameters x that minimize some function f(x) (or maximize it, which is the same as minimizing -f(x)). Starting from an initial guess x_0, it is common to proceed in one of three ways:

  • Gradient-free optimization — don’t laugh! Everyone does this. Here we simply guess the next parameters:

    x_{k+1} = x_k + p

for some random guess p, and evaluate the function f(x_k + p) at the new point to check whether the step should be accepted. We all do this in some form — in almost any optimization algorithm, some hyperparameters need to be tuned (e.g. learning rates in machine learning, or the initialization of parameters in any optimization problem). While automatic strategies for tuning these exist, it is common to simply try different values and see what works best.

  • First order methods — these use the first derivative \nabla f(x) to compute the search direction. A common update rule is gradient descent:

    x_{k+1} = x_k - \lambda \nabla f(x_k)

for a hyperparameter (the learning rate) \lambda.

  • Second order methods — here we use the Hessian \nabla^2 f(x) to pick the next step. A common update rule is Newton’s rule:

    x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)

Often, a step size \eta \in (0,1] is included, which gives what is sometimes known as the damped or relaxed Newton’s method. Both the gradient descent and Newton updates are sketched in code below.
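To make these update rules concrete, here is a minimal sketch in Python (using NumPy) of the gradient descent and damped Newton updates on a small quadratic objective. The quadratic, step sizes and iteration counts are illustrative choices for this sketch, not part of the tutorial’s implementation.

```python
import numpy as np

# Illustrative objective (not from the tutorial): a 2D quadratic
# f(x) = 0.5 x^T A x - b^T x, whose minimizer is A^{-1} b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

def hess_f(x):
    return A

def gradient_descent(x0, lam=0.1, n_steps=100):
    # First order update: x_{k+1} = x_k - lambda * grad f(x_k)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lam * grad_f(x)
    return x

def damped_newton(x0, eta=1.0, n_steps=10):
    # Second order update: x_{k+1} = x_k - eta * [hess f(x_k)]^{-1} grad f(x_k)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        p = np.linalg.solve(hess_f(x), grad_f(x))
        x = x - eta * p
    return x

x0 = np.array([5.0, 5.0])
print(gradient_descent(x0))  # approaches the minimizer [0.6, -0.8]
print(damped_newton(x0))     # Newton solves a quadratic exactly in one full step
```

Note that both updates above use a fixed step size; choosing the step size adaptively is exactly the job of the line search methods this tutorial builds towards.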

This, however, is often not sufficient. First, the Hessian is usually expensive to compute. Second, we are generally interested in finding a descent direction, i.e. one for which the product

\nabla f(x)^T p = -\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)

is negative. This is equivalent to requiring that \nabla^2 f(x) is positive definite. To ensure that we are always following a descent direction, quasi-Newton methods replace the true inverse Hessian with a positive definite estimate, e.g. by regularizing it with a small diagonal matrix:

x_{k+1} = x_k - \eta [\nabla^2 f(x) + \epsilon I]^{-1} \nabla f(x)

for a small \epsilon and I the identity matrix. Note that we always choose \epsilon = 0 if the true Hessian \nabla^2 f(x) is already positive definite. Additionally, it is popular to build the positive definite estimate from changes in the first order gradients, which is also cheaper to compute, e.g. using the BFGS algorithm.
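As a rough sketch of this regularization idea (assuming a simple “increase \epsilon until a Cholesky factorization succeeds” rule, which is one common choice and not necessarily the one used in the tutorial’s code), the following shows how \epsilon stays at zero for a positive definite Hessian and grows just enough to turn an indefinite Newton step into a descent direction:

```python
import numpy as np

def regularized_newton_direction(grad, hess, eps0=1e-6, growth=10.0):
    # Keep eps = 0 when the Hessian is already positive definite; otherwise
    # increase eps until (hess + eps * I) is positive definite, so that
    # p = -(hess + eps * I)^{-1} grad is guaranteed to be a descent direction.
    n = grad.shape[0]
    eps = 0.0
    while True:
        try:
            # A Cholesky factorization succeeds iff the matrix is positive definite.
            np.linalg.cholesky(hess + eps * np.eye(n))
            break
        except np.linalg.LinAlgError:
            eps = eps0 if eps == 0.0 else eps * growth
    # Solve (hess + eps * I) p = -grad for the regularized Newton direction.
    p = -np.linalg.solve(hess + eps * np.eye(n), grad)
    return p

# Indefinite Hessian chosen so that the raw Newton step is NOT a descent direction.
H = np.array([[1.0, 0.0],
              [0.0, -2.0]])
g = np.array([1.0, 2.0])
p = regularized_newton_direction(g, H)
print(p, g @ p)  # g @ p < 0, so p points downhill
```

In practice one would often not form the Hessian at all and instead rely on a quasi-Newton routine such as scipy.optimize.minimize(fun, x0, method='BFGS'), which maintains a positive definite estimate from gradient differences, as mentioned above.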

