Alec Nikolaus

Regression-based decision trees: Predicting Average Daily Rates for Hotels

The purpose of a decision tree is to visualise the features of a model by means of a tree-like graph and to infer the importance (or lack thereof) of each feature in affecting the output variable.

The decision tree structure consists of:

  • Nodes: a root node where the first split is made, internal decision nodes, and leaf nodes that hold the final output.
  • Branches: represent the outcome of each decision taken at a node.
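To see nodes and branches concretely, scikit-learn's `export_text` prints a trained tree's decision nodes and leaves. This is an illustrative sketch using the built-in Iris dataset, not the hotel data analysed below.

```python
# Inspect a trained tree's structure: each printed line is a decision node
# (a split on a feature) or a leaf (an output class).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

print(export_text(tree, feature_names=iris.feature_names))
```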

[Image: Photo by OpenClipart-Vectors from Pixabay]

In this example, a regression-based decision tree is formulated to predict ADR (average daily rate) for a hotel given certain customer attributes.

Background

This study focuses on hotel booking analysis. When it comes to hotel bookings, average daily rate (ADR) is a particularly important metric. This reflects the average rate per day that a particular customer pays throughout their stay.

This analysis is based on the original study by Antonio, Almeida, and Nunes, 2016.

Gauging ADR allows a hotel to identify its most profitable customers more accurately and tailor its marketing strategies accordingly.

The chosen features that form the input to the decision tree are as follows:

  1. IsCanceled
  2. Country of origin
  3. Market segment
  4. Deposit type
  5. Customer type
  6. Required car parking spaces
  7. Arrival Date: Year
  8. Arrival Date: Month
  9. Arrival Date: Week Number
  10. Arrival Date: Day of Month
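The modelling step can be sketched along these lines. This is a hypothetical illustration, not the author's exact code: the data below is a synthetic stand-in, since in practice the categorical features above (country, market segment, deposit type, etc.) would first be numerically encoded.

```python
# Sketch: fit a regression tree on encoded booking features to predict ADR.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(500, 10)).astype(float)  # 10 encoded features
y = 50 + 5 * X[:, 2] + rng.normal(0, 5, size=500)      # synthetic ADR target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X_train, y_train)

rmse = mean_squared_error(y_test, reg.predict(X_test)) ** 0.5
print(f"Test RMSE: {rmse:.2f}")
```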

#regression #python #machine-learning #data-science #decision-tree


Decision Tree - Classification

Decision Tree is one of the most widely used machine learning algorithms. It is a supervised learning algorithm that can perform both classification and regression tasks.

As the name suggests, it uses a tree-like structure to make decisions on the given dataset. Each internal node of the tree represents a “decision” taken by the model based on one of our attributes. From these decisions, we can separate classes or predict values.

Let’s look at both classification and regression operations one by one.

Classification

In Classification, each leaf node of our decision tree represents a **class** based on the decisions we make on attributes at internal nodes.

To understand it more clearly, let us look at an example. I have used the Iris Flower Dataset from the sklearn library. You can refer to the complete code on GitHub — Here.

[Figure: decision tree trained on the Iris dataset]

A node’s samples attribute counts how many training instances it applies to. For example, 100 training instances have a petal width ≤ 2.45 cm.

A node’s value attribute tells you how many training instances of each class this node applies to. For example, the bottom-right node applies to 0 Iris-Setosa, 0 Iris- Versicolor, and 43 Iris-Virginica.

And a node’s gini attribute measures its impurity: a node is “pure” (gini=0) if all training instances it applies to belong to the same class. For example, since the depth-1 left node applies only to Iris-Setosa training instances, it is pure and its gini score is 0.

Gini Impurity Formula:

G = 1 − Σⱼ pⱼ²

where, pⱼ is the ratio of instances of class j among all training instances at that node.
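To make the formula concrete, here is a small helper (not from the original post) that computes Gini impurity from the per-class counts at a node:

```python
# Gini impurity from class counts at a node: G = 1 - sum_j (p_j)^2,
# where p_j is the fraction of the node's instances belonging to class j.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# A pure node has impurity 0; a node with counts [0, 49, 5] is nearly pure.
print(gini([50, 0, 0]))            # → 0.0
print(round(gini([0, 49, 5]), 3))  # → 0.168
```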

Based on the decisions made at each internal node, we can sketch decision boundaries to visualize the model.

Image for post

But how do we find these boundaries ?

We use Classification And Regression Tree (CART) to find these boundaries.

CART is a simple algorithm that finds an attribute *k* and a threshold *t*ₖ that produce the purest subsets. The purest split is the one where each resulting subset contains the highest possible proportion of a single class. For example, the left node at depth 2 has a maximum proportion of the Iris-Versicolor class, i.e., 49 of 54. The CART algorithm splits the training set in the way that yields the minimum weighted gini impurity. The CART cost function is given as:

J(k, tₖ) = (m_left / m) · G_left + (m_right / m) · G_right

where G_left/right measures the impurity of the left/right subset and m_left/right is the number of instances in the left/right subset.

After successfully splitting the dataset into two, we repeat the process on either sides of the tree.

We can implement a decision tree directly with the help of the Scikit-learn library. It provides a class called DecisionTreeClassifier which trains the model for us, and we can adjust the hyperparameters as per our requirements.

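A minimal sketch of that workflow (the hyperparameters here are illustrative, not the article's exact settings):

```python
# Train a decision tree classifier on the Iris petal features and
# predict the class of a new flower.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Predict the class of a flower with 5 cm long, 1.5 cm wide petals.
print(iris.target_names[tree_clf.predict([[5.0, 1.5]])[0]])
```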

#machine-learning #decision-tree #decision-tree-classifier #decision-tree-regressor #deep-learning

Using Decision Trees to Predict Conversion Rate

Introduction

Vandelay Industries has collected some data about people that visit their online store. The data includes basic information about shoppers such as their country, age, how many pages they visited during a session, if they are a new or returning user, which marketing channel they entered the site through, and whether or not they made a purchase (converted).

Goal

Given this information, our task is to predict conversion rate, and make recommendations to the product team and the marketing team on ways to improve conversion rate.

Exploratory Data Analysis

The data set has been pre-cleaned and does not contain missing values. So we can dive right into EDA and plotting the feature distributions.

[Figure: distributions of the features. Image: Deandra Alvear]

By plotting the distribution of each feature, we gain several new insights about the shoppers that visit Vandelay Industries’s website.

  1. **More than half the shoppers in the data set are located in the US:** Vandelay Industries is most likely a US-based company.

  2. **age ranges from 17 to 123:** Upon closer examination, there are two shoppers between 100 and 123 years old. There’s no way to verify these records, but in a data set with 300,000+ shoppers, it is unlikely these two records will affect our results, so I’ll keep them in.

  3. **There are twice as many new shoppers as returning shoppers:** even so, Vandelay Industries is doing well at getting a third of its shoppers to return to its website.

  4. **Most shoppers enter the site by clicking a search engine result:** as opposed to entering via an advertisement or by typing in the website address directly.

  5. **Most shoppers visit fewer than 10 pages during a session.**

  6. **The classes in the converted column are imbalanced:** a quick calculation shows the site’s current conversion rate is around 3%. A quick search suggests the average conversion rate for an e-commerce platform is 1–2%, so the company isn’t performing poorly by any means. This will be our target attribute.

At this point I know quite a bit about this data set; now I can explore the relationships between the features. Since there is a mix of numerical and categorical features, some feature engineering will need to be done. This step is outside the scope of this article, but can be found in this Jupyter Notebook.

[Figure: correlation matrix of the features]

It appears that none of the columns are correlated with each other except total_pages_visited and converted, which are positively correlated. This suggests that as shoppers view more pages, the probability of them converting increases. Conversely, a negative correlation would imply that as shoppers visit more pages in a session, the probability of them converting decreases; typically we see this when a feature on a website isn’t functioning correctly.

Now I’ll implement a baseline model to predict conversion rate.
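The actual notebook isn't reproduced here, but a baseline along these lines might look like the following. The data is a synthetic stand-in with a small positive class mirroring the low conversion rate described above; `class_weight="balanced"` is one common way to handle the imbalance.

```python
# Baseline: a shallow decision tree on imbalanced conversion data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 10_000
pages = rng.poisson(5, n)                        # total_pages_visited stand-in
p = 1 / (1 + np.exp(-(pages - 10.5)))            # conversion more likely with more pages
converted = rng.random(n) < p                    # small positive class

X = pages.reshape(-1, 1).astype(float)
X_tr, X_te, y_tr, y_te = train_test_split(X, converted, stratify=converted, random_state=1)

clf = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=1)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```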

#business-analytics #decision-tree #machine-learning #conversion-rate #data-science #data analytic

Jones Brianna

Hotel Booking App Development

https://www.mobiwebtech.com/create-online-hotel-booking-app-like-oyo-rooms-airbnb/

If you are interested in creating a hotel booking app like Oyo Rooms, Mobiweb Technologies is the perfect partner for you in this technological world. Our developers have extensive experience in providing mobile app development solutions. Contact us by visiting our official website.

#hotel-booking-app-development #hotel-booking-app-developer #hotel-booking-app-like-oyo #hotel-booking-software-development #create-hotel-booking-app #hotel-booking-web-development

Sowing the Seeds of Decision Tree Regression

Article 5 of Machine Learning Series

In this article we will discuss decision trees, one of the supervised learning algorithms, commonly referred to as CART, which can be used for both regression and classification problems.

As the name suggests, the primary role of this algorithm is to make decisions using a tree structure. To find solutions, a decision tree makes sequential, hierarchical decisions about the outcome variable based on the predictor data. Decision trees are generally used when working with non-linear data.


Because of their simplicity and the fact that they are easy to understand and implement, they are widely used in a large number of industries.

Getting Acquainted With Some New Terms

Now, before we move further it’s important that we understand some important terminologies associated with the algorithm. Decision Trees are made up of a number of nodes, each of which represents a particular feature. The first node of a decision tree is generally referred to as the Root Node.

[Figure: decision tree with root node, branches, and leaf nodes]

The depth of the tree is the total number of levels present in the tree, excluding the root node. A branch denotes a decision and can be visualized as a link between nodes. A leaf holds the final output: the class (for classification) or value (for regression) assigned to the samples that reach it.

How Does The Decision Tree Work

Decision Trees progressively divide data sets into small data groups until they reach sets that are small enough to be described by some label. At the same time an associated decision tree is incrementally developed.

Decision trees apply a top-down approach to the data. Splits can be either binary or multiway (CART always produces binary splits). The algorithm partitions the data into a set of rectangles and fits the model over each of these rectangles. The more rectangles (splits), the greater the complexity.

[Figure: recursive partitioning of the feature space into rectangles]

A downside of overly complex decision trees is that they are likely to overfit: the model learns the training data so well that it struggles to generalize to new, unseen data.

At each split, the algorithm examines all the features of the dataset to find the best possible split, dividing the data into smaller and smaller subgroups until the tree is complete.
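This trade-off can be sketched briefly (illustrative values, not from the original article): a shallow regression tree carves the feature space into only a few rectangles and underfits, while a deep one carves many and starts fitting the noise.

```python
# Fit regression trees of different depths to noisy non-linear data and
# compare how many leaf regions ("rectangles") each one creates.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)   # at most 4 regions: coarse fit
deep = DecisionTreeRegressor(max_depth=12).fit(X, y)     # many regions: risks overfitting
print(shallow.get_n_leaves(), deep.get_n_leaves())
```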

#supervised-learning #regression #entropy #machine-learning #decision-tree #deep learning