
In this blog, we will look at the techniques used to overcome overfitting in a lasso regression model. Regularization is one of the most widely used methods to make a model generalize better.

Lasso regression is a regularization technique. It is used over plain regression methods for more accurate prediction. This model uses shrinkage, where data values are shrunk towards a central point such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This particular type of regression is well-suited for models showing high levels of multicollinearity, or when you want to automate certain parts of model selection, such as variable selection/parameter elimination.

Lasso Regression uses the L1 regularization technique (discussed later in this article). It is useful when we have many features because it automatically performs feature selection.

The word “LASSO” stands for **L**east **A**bsolute **S**hrinkage and **S**election **O**perator. It is a statistical formula for the regularisation of data models and feature selection.

Regularization is an important concept used to avoid overfitting the data, especially when performance on the training and test data differs widely.

Regularization is implemented by adding a “penalty” term to the best fit derived from the training data, to achieve *lower variance* on the test data; it also restricts the influence of predictor variables on the output variable by compressing their coefficients.

In regularization, we normally keep the same number of features but reduce the magnitude of the coefficients. We can do this using different types of regression techniques that apply regularization to overcome the overfitting problem. So, let us discuss them. Before we move further, you can also upskill with the help of online courses on Linear Regression in Python and enhance your skills.

There are two main regularization techniques, namely Ridge Regression and Lasso Regression. They both differ in the way they assign a penalty to the coefficients. In this blog, we will try to understand more about Lasso Regularization technique.

If a regression model uses the L1 regularization technique, it is called Lasso Regression; if it uses the L2 regularization technique, it is called Ridge Regression. We will study both in the later sections.

L1 regularization adds a penalty equal to the absolute value of the magnitude of each coefficient. This regularization type can result in sparse models with few coefficients: some coefficients may become exactly zero and be eliminated from the model. Larger penalties result in coefficient values closer to zero (ideal for producing simpler models). L2 regularization, on the other hand, does not eliminate coefficients and does not produce sparse models. Thus, Lasso Regression is easier to interpret than Ridge. While there are ample resources available online to help you understand the subject, there’s nothing quite like a certificate. Check out Great Learning’s **PG program in Artificial Intelligence** to upskill in the domain. This course will help you learn from a top-ranking global school to build job-ready AIML skills. This 12-month program offers a hands-on learning experience with top faculty and mentors. On completion, you will receive a Certificate from The University of Texas at Austin and Great Lakes Executive Learning.
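A minimal sketch can make this sparsity difference concrete. The synthetic data and the `alpha` values below are illustrative assumptions, not part of the original article: only the first three features actually drive the target, and we compare how many coefficients each model zeroes out.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first three features influence the target; the rest are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso drives irrelevant coefficients exactly to zero; Ridge only shrinks them
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

Running this, the lasso model eliminates several of the noise features outright, while ridge keeps every coefficient small but nonzero, which is exactly why lasso models are easier to interpret.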

*Also Read: Python Tutorial for Beginners*

The lasso cost function is:

**Residual Sum of Squares + λ × (sum of the absolute values of the coefficients)**

Where,

- λ denotes the amount of shrinkage.
- λ = 0 implies all features are considered; this is equivalent to linear regression, where only the residual sum of squares is used to build the predictive model
- λ = ∞ implies no feature is considered, i.e., as λ approaches infinity, more and more features are eliminated
- Bias increases as λ increases
- Variance increases as λ decreases
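The cost function above can be written out directly. This is a small sketch of the formula, with a toy dataset and a hypothetical `lasso_cost` helper introduced only for illustration:

```python
import numpy as np

def lasso_cost(X, y, coef, intercept, lam):
    """Residual sum of squares + lam * (sum of absolute coefficient values)."""
    residuals = y - (X @ coef + intercept)
    rss = np.sum(residuals ** 2)
    return rss + lam * np.sum(np.abs(coef))

# Toy check: a perfect fit has zero RSS, so the cost is just the penalty
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
coef = np.array([2.0])
print(lasso_cost(X, y, coef, 0.0, lam=0.0))   # λ = 0: plain RSS -> 0.0
print(lasso_cost(X, y, coef, 0.0, lam=1.0))   # λ = 1: adds |2.0| -> 2.0
```

With λ = 0 the cost reduces to ordinary least squares, and as λ grows the penalty term dominates and pushes coefficients towards zero, matching the bullet points above.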

For this example code, we will consider a dataset from MachineHack’s Predicting Restaurant Food Cost Hackathon.

The task here is about predicting the average price for a meal. The data consists of the following features.

Size of training set: 12,690 records

Size of test set: 4,231 records

**TITLE**: A feature of the restaurant that can help identify what it serves and for whom it is suitable.

**RESTAURANT_ID**: A unique ID for each restaurant.

**CUISINES**: The variety of cuisines that the restaurant offers.

**TIME**: The open hours of the restaurant.

**CITY**: The city in which the restaurant is located.

**LOCALITY**: The locality of the restaurant.

**RATING**: The average rating of the restaurant by customers.

**VOTES**: The overall votes received by the restaurant.

**COST**: The average cost of a two-person meal.

After completing all the steps up to (but excluding) feature scaling, we can proceed to building a lasso regression model. We skip a separate feature-scaling step because scaling can be handled while fitting the model.

*Also Read: Top Machine Learning Interview Questions*

```
import numpy as np
```

**Creating New Train and Validation Datasets**

```
from sklearn.model_selection import train_test_split

# new_data_train holds the preprocessed training data from the earlier steps
data_train, data_val = train_test_split(new_data_train, test_size=0.2, random_state=2)
```

**Classifying Predictors and Target**

```
# Classifying independent and dependent features
# Dependent variable (COST is the last column)
Y_train = data_train.iloc[:, -1].values
# Independent variables
X_train = data_train.iloc[:, 0:-1].values
# Independent variables for the validation set
X_test = data_val.iloc[:, 0:-1].values
```

**Evaluating the Model with RMSLE**

```
def score(y_pred, y_true):
    # RMSLE-based score: 1 minus the root mean squared log error
    error = np.square(np.log10(y_pred + 1) - np.log10(y_true + 1)).mean() ** 0.5
    return 1 - error

actual_cost = np.asarray(data_val['COST'])
```

**Building the Lasso Regressor**

```
# Lasso Regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The `normalize` argument of Lasso was removed in scikit-learn 1.2,
# so we scale explicitly with a StandardScaler inside a pipeline
lasso_reg = make_pipeline(StandardScaler(), Lasso())
# Fitting the training data to the Lasso regressor
lasso_reg.fit(X_train, Y_train)
# Predicting for X_test
y_pred_lass = lasso_reg.predict(X_test)
# Printing the score with RMSLE
print("\n\nLasso SCORE : ", score(y_pred_lass, actual_cost))
```

**0.7335508027883148**

**The lasso regression attained a score of about 0.73 on this dataset.**

*Also Read: What is Linear Regression in Machine Learning?*

Let us take “The Big Mart Sales” dataset, in which we have product-wise sales for multiple outlets of a chain.

In the dataset, we can see characteristics of the sold items (fat content, visibility, type, price), some characteristics of the outlets (year of establishment, size, location, type), and the number of units sold of each item. Let’s see if we can predict sales using these features.

Let us take a snapshot of the dataset:

**Let’s Code!**
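A hedged sketch of fitting a lasso model on Big-Mart-style data follows. The column names (`Item_MRP`, `Item_Visibility`, `Outlet_Age`, `Item_Outlet_Sales`) and the synthetic stand-in data are assumptions for illustration only; with the real CSV you would load it with pandas and encode the categorical columns first.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Big Mart data (hypothetical columns)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Item_MRP": rng.uniform(30, 270, 500),        # list price of the item
    "Item_Visibility": rng.uniform(0, 0.3, 500),  # share of display area
    "Outlet_Age": rng.integers(5, 35, 500),       # years since establishment
})
# Hypothetical sales relationship with noise
df["Item_Outlet_Sales"] = (
    15 * df["Item_MRP"] - 2000 * df["Item_Visibility"]
    + rng.normal(0, 300, 500)
)

X = df.drop(columns="Item_Outlet_Sales")
y = df["Item_Outlet_Sales"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale, then fit the lasso regressor
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X_tr, y_tr)
print("R^2 on held-out data:", round(model.score(X_te, y_te), 3))
```

The same pipeline applies unchanged once the real dataset is loaded and encoded; only the DataFrame construction above would be replaced by `pd.read_csv` and the preprocessing steps.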

**Quick check –** Deep Learning Course

Lasso regression differs from ridge regression in that it uses absolute coefficient values in its penalty term.

Because the penalty term only considers the absolute values of the coefficients (weights), the optimization algorithm will penalize large coefficients. This penalty is known as the L1 norm.

In the image above, we can see the constraint regions (blue areas); the left one is for lasso and the right one is for ridge, along with the contours (green ellipses) of the loss function, i.e., the RSS.

In both cases, the coefficient estimates are given by the first point at which a contour (an ellipse) touches the constraint region (circle for ridge, diamond for lasso).

The lasso constraint, because of its diamond shape, has corners on each axis, so the ellipse will often touch the constraint region at an axis. When that happens, at least one of the coefficients equals zero.

Thus, when λ is sufficiently large, lasso regression will shrink some of the coefficient estimates exactly to 0. That is why lasso provides sparse solutions.

The main problem with lasso regression arises when we have correlated variables: it retains only one of them and sets the other correlated variables to zero. This may lead to some loss of information, resulting in lower accuracy of our model.

That was Lasso Regularization technique, and I hope now you can comprehend it in a better way. You can use this to improve the accuracy of your machine learning models.

**Quick check –** Free Machine Learning Course

**Interpretations**:

- Geometric Interpretations
- Bayesian Interpretations
- Convex relaxation Interpretations
- Making λ easier to interpret with an accuracy-simplicity tradeoff

**Generalizations**

- Elastic Net
- Group Lasso
- Fused Lasso
- Adaptive Lasso
- Prior Lasso
- Quasi-norms and bridge regression

**What is Lasso regression used for?**

**A**: Lasso regression is used for automatic variable elimination and feature selection.

**What is lasso and ridge regression?**

**A**: Lasso regression can shrink coefficients all the way to zero, while ridge regression is a model tuning method used for analyzing data suffering from multicollinearity.

**What is Lasso Regression in machine learning?**

**A:** Lasso regression is a regularization technique used for more accurate prediction.

**Why does Lasso shrink coefficients to zero?**

**A:** The L1 regularization performed by Lasso causes the regression coefficients of less contributing variables to shrink to zero or near zero.

**Is lasso better than Ridge?**

**A:** Lasso is often considered better than ridge when feature selection matters, as it keeps only some features and shrinks the coefficients of the others to zero.

**How does Lasso regression work?**

**A:** Lasso regression uses shrinkage, where the data values are shrunk towards a central point such as the mean value.

**What is the Lasso penalty?**

**A:** The Lasso penalty shrinks or reduces the coefficient value towards zero. The less contributing variable is therefore allowed to have a zero or near-zero coefficient.

**Is lasso L1 or L2?**

**A:** A regression model using the L1 regularization technique is called Lasso Regression, while a model using L2 is called Ridge Regression. The difference between the two lies in the penalty term.

**Is lasso supervised or unsupervised?**

**A:** Lasso is a supervised regularization method used in machine learning.

*If you are a beginner in the field, take up a PG program in Artificial Intelligence and Machine Learning offered by Great Learning.*

Original article source at: https://www.mygreatlearning.com

#Lasso #regression

