Optimization is at the core of modern Machine Learning. Why? Linear regression minimizes the sum of squared errors. Logistic regression minimizes the negative log-likelihood. Support vector machines maximize the margin between the two classes (equivalently, they minimize the negative margin).

Instead of an explanation heavy with mathematical definitions and equations, let us understand optimization in simpler terms. “It’s the last day of the month and you are left with only $30. You are craving some delicious food. You want to have at least one item from among a pizza ($5 per piece), a burger ($3 per piece) and a coke ($1 per piece). You can’t have more than 4 pizzas and should order at least two burgers.” There you go! A real-life optimization problem: maximizing or minimizing an objective function subject to some constraints.
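Under one reading of that craving (an assumed objective, since the story never states one: maximize the total number of items you can afford), this becomes a small linear program. Here is a minimal sketch with scipy.optimize.linprog; it treats quantities as continuous for simplicity, whereas a real order would call for integer programming:

```python
from scipy.optimize import linprog

# Decision variables: x = [pizzas, burgers, cokes]
# Assumed objective: maximize p + b + c, i.e. minimize -(p + b + c)
c = [-1, -1, -1]

# Budget constraint: 5p + 3b + 1c <= 30
A_ub = [[5, 3, 1]]
b_ub = [30]

# Bounds: at most 4 pizzas, at least 2 burgers, cokes unbounded above
bounds = [(0, 4), (2, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x, -res.fun)  # optimal quantities and the maximized item count
```

With these numbers the solver spends the mandatory $6 on two burgers and puts the remaining $24 into cokes.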

Is the above problem a linear or a non-linear optimization? Linear optimization involves an objective function and constraints made up of only linear (first-degree) terms; if the objective or any constraint contains second-degree terms or other nonlinear expressions, it’s a nonlinear optimization. Our food problem, with its first-degree costs and quantity limits, is linear.
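To make the distinction concrete, here is a pair of toy objectives (hypothetical functions, purely for illustration):

```python
import math

# Linear objective: only first-degree terms, e.g. the total cost of an order
def linear_obj(p, b, c):
    return 5 * p + 3 * b + 1 * c

# Nonlinear objective: a squared term and a trigonometric term
def nonlinear_obj(x):
    return x ** 2 + math.sin(x)
```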

We shall only worry about the non-linear optimization paradigm in this article; these problems are much more complex to solve than their linear counterparts…

Let’s play a game! There is a function f(x), which I won’t reveal. It takes a real number as input. Your task: approximate the minimum (lowest) value of the function over the interval [-150, 350]. I will tell you the value of the function at any x you ask about. But here’s the catch: you have only a limited number of asks, five to be exact. Between -150 and 350 there are infinitely many real numbers, so choose carefully.

Let’s say your first ask is the nice-and-easy 0.

Now anyone rational would choose one positive and one negative number next, so let me do that on your behalf: I will share f(x) at x = 150 and x = -50.

That’s an interesting set of points, isn’t it? Are you thinking what I’m thinking: choosing the next point around -50, to check whether our function attains its minimum somewhere nearby? In optimization terms this is called Exploitation: when you know that the function value at a point is closer to your target than the other values you have seen, you keep probing that region.

But I don’t want to spend my last two chances exploiting. Maybe the desired value lies somewhere between 0 and 100, a region we haven’t even explored yet. In optimization terminology this is called Exploration: you already know what the function outputs at a few points, so instead of exploiting around those known values, you probe new regions in search of your target. A good optimization algorithm should strike a balance between exploration and exploitation, as the sketch below illustrates.
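Here is a minimal sketch of that balancing act in Python. It assumes a made-up hidden function (the real f is never revealed in the game) and spends the last two asks using a Gaussian-process surrogate with a lower-confidence-bound rule, the same idea Bayesian Optimization formalizes:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# A stand-in for the hidden function (an assumption; the game never reveals f)
def f(x):
    return (x - 120) ** 2 / 500.0 + 10 * np.sin(x / 40.0)

lo, hi = -150.0, 350.0

# The three asks made so far in the game: 0, 150 and -50
X = np.array([[0.0], [150.0], [-50.0]])
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(length_scale=50.0, nu=2.5),
                              normalize_y=True)

# Spend the remaining two asks with a lower-confidence-bound rule:
# minimizing mean - kappa * std trades off exploitation (low predicted mean)
# against exploration (high predictive uncertainty).
kappa = 2.0
grid = np.linspace(lo, hi, 1001).reshape(-1, 1)
for _ in range(2):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmin(mu - kappa * sigma)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next))

best = X[np.argmin(y)][0]
print(f"best x found: {best:.1f}, f(best) = {y.min():.3f}")
```

A larger kappa pushes the rule toward exploration; kappa near zero makes it purely exploitative.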

