If you are given a distribution of grades, heights or income across a population (data that is often modelled with a normal distribution), one of the things you may want to do is find the mean and standard deviation that best approximate your data. This is an example of something we can achieve using least-square regression. In this article, we will see how to use this method to find the optimal parameters of a function, in order to obtain the most accurate approximation of a set of data points.

We will start by tackling the simplest case: linear least squares. It is simple enough to be solved algebraically, and we will derive a general formula for the slope and y-intercept that best approximate a dataset. We will then move on to the general case, where we will try to minimise the cost of the predictions given by any type of function. We will focus on the implementation of the most basic method, but we will also briefly discuss some potential improvements. Finally, we will see how a single function call in SciPy can perform non-linear least-square regression efficiently.

This article assumes (very) basic knowledge of calculus (using the chain rule to differentiate functions) and Python/NumPy.

In a nutshell

Let’s say you have a set of data points and a function f : (x, a1, …, an) ↦ y, which depends on an input x and on several parameters. Your data has been generated by a function of the form f, with unknown parameters and some noise. Your goal is to find the best parameters a1 to an, so as to obtain a mathematical expression that approximates the dataset.

You start with an initial guess, setting all the parameters to random values. Now, in order to know how to improve your model, you need to calculate its cost, which measures how bad its predictions are. The cost is given by the formula below:

$$C = \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2$$

where $n$ is the number of observations and $\hat{y}_i = f(x_i)$ is the model's prediction for the i-th input.

Squaring the differences ensures that the cost is positive (predicting 3 instead of 1 is as bad as predicting −1) and exacerbates large errors while mitigating small ones.
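
As a concrete illustration, here is a minimal NumPy sketch of this setup. The model `f`, its true parameters, the noise level and the random initial guess are all hypothetical, chosen only to show how the cost is computed:

```python
import numpy as np

def f(x, a1, a2):
    # Hypothetical model: a line with slope a1 and y-intercept a2.
    return a1 * x + a2

def cost(params, x, y):
    # Sum of the squared differences between predictions and observations.
    predictions = f(x, *params)
    return np.sum((predictions - y) ** 2)

# Synthetic data: generated by f with known parameters, plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = f(x, 2.0, 1.0) + rng.normal(scale=0.5, size=x.size)

initial_guess = rng.uniform(-1, 1, size=2)  # random starting parameters
print(cost(initial_guess, x, y))            # the quantity we want to minimise
```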

Using the derivatives of the cost function with respect to every parameter, we can obtain an expression for the values that minimise the cost and therefore give the best results.
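
We will come back to this at the end of the article, but as a quick preview, here is a minimal sketch (reusing the hypothetical model and data from the block above) where `scipy.optimize.curve_fit` performs the whole minimisation in a single call:

```python
from scipy.optimize import curve_fit

# curve_fit minimises the sum of squared residuals numerically and returns
# the optimal parameters together with their estimated covariance matrix.
best_params, covariance = curve_fit(f, x, y, p0=initial_guess)
print(best_params)  # should be close to the true parameters (2.0, 1.0)
```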

#python #mathematics #statistics #machine-learning #computer-science

A Handwritten Introduction to Linear and Non-Linear Least-Square Regression