Generalized additive models (GAMs) provide a general framework for extending a standard linear model by allowing nonlinear functions of each of the variables, while maintaining additivity. Let’s see exactly what that means.

Linear models are simple to describe and implement, and they have an advantage over other approaches in terms of interpretation and inference. But they are limited in predictive power, that is, in how accurately we can predict the output. Suppose we have data consisting of p features (X1, X2, …, Xp) and an output Y. The corresponding linear model (also known as a **multiple linear regression** model) to predict the output is:

Y = β0 + β1X1 + β2X2 + ··· + βpXp + ε
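
As a quick illustration, here is a minimal sketch of fitting such a multiple linear regression with scikit-learn. The data, true coefficients, and noise scale below are made up purely for demonstration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: n = 100 observations of p = 3 features (hypothetical values)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # columns play the role of X1, X2, X3
y = (2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.3 * X[:, 2]
     + rng.normal(scale=0.5, size=100))            # last term plays the role of ε

model = LinearRegression().fit(X, y)
print(model.intercept_)   # estimate of β0
print(model.coef_)        # estimates of β1, β2, β3
```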

where β0, β1, …, βp are the parameters of the model and ε is the irreducible error. A natural way to allow for non-linear relationships between each feature and the response (output) is to replace each linear component βjXj with a (smooth) nonlinear function fj(Xj), where fj corresponds to the jth feature. We would then write the model as

Y = β0 + f1(X1) + f2(X2) + f3(X3) + ··· + fp(Xp) + ε
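
To make the additive structure concrete, here is a minimal sketch using the pygam library (one of several ways to fit a GAM in Python); the synthetic data and the true nonlinear effects are made up for the example:

```python
import numpy as np
from pygam import LinearGAM, s

# Synthetic data with a nonlinear effect of each feature (values are made up)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=200)

# One smooth term s(j) per feature; their contributions are summed,
# which is exactly why the model is called "additive"
gam = LinearGAM(s(0) + s(1)).fit(X, y)
gam.summary()
```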

This is an example of a GAM. It is called an additive model because we calculate a separate fj for each Xj and then add together all of their contributions. Now the question is: how do we find these nonlinear functions? It turns out there are various methods, but here we will specifically look at natural splines, using the example below:

Wage = β0 + f1(year) + f2(age) + f3(education) + ε    (1)
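
In practice, a model like equation (1) can be fit with ordinary least squares once each feature is expanded in a natural spline basis. Here is a hedged sketch using statsmodels with patsy’s cr() natural cubic spline transform; it assumes a Wage-style data set (as in ISLR) with columns 'wage', 'year', 'age', and a categorical 'education', and the file path and df choices are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical path to a Wage-style data set
df = pd.read_csv("Wage.csv")

# cr() builds a natural cubic spline basis; df controls its flexibility.
# education is categorical, so it enters as dummy variables via C().
fit = smf.ols("wage ~ cr(year, df=4) + cr(age, df=5) + C(education)",
              data=df).fit()
print(fit.summary())
```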

Before discussing natural splines, it is worth noting that the relationships in real-world data are often nonlinear and frequently very complex; even a single standard nonlinear function may not be a good approximation. **Natural splines are piecewise polynomials of degree d whose first d−1 derivatives are continuous at the knots, with additional boundary constraints** (for the common natural cubic spline, the fit is required to be linear beyond the boundary knots). Instead of fitting a single high-degree polynomial over the entire range of a feature, piecewise polynomial regression fits separate low-degree polynomials over different regions of that range.

To be concrete, in equation (1) we are predicting wage from year, age and education. We fit the functions independently, holding the other features constant; for example, we model ‘wage’ as a function of ‘age’ keeping ‘year’ and ‘education’ fixed. We know that as ‘age’ increases, ‘wage’ increases, but after retirement wages fall. That means the relationship is increasing up to a certain age and decreasing afterwards, so we fit one polynomial up to, say, age 60 to capture the increasing relationship, and another polynomial after age 60 to capture the decreasing one. This lets us flexibly extract the relationship between each feature and the response, and the continuity constraints on the derivatives let us join the two polynomials smoothly, as the sketch below illustrates.
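The sketch below shows one way to build such a basis with patsy, placing an interior knot at age 60 (the knot location and the range of ages are assumptions made for this example):

```python
import numpy as np
from patsy import dmatrix

# Hypothetical ages; place a knot at 60, where the trend is assumed to change
age = np.linspace(20, 80, 200)
basis = dmatrix("cr(age, knots=[60])", {"age": age},
                return_type="dataframe")

# Each column is one smooth basis function; a linear model in these columns
# gives a curve that can rise before 60 and fall after it, joined smoothly
# at the knot because the basis functions have continuous derivatives.
print(basis.shape)
```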

#machine-learning #data-science #deep-learning
