1595727180

Wouldn’t it be awesome to understand the underlying principles used to build a decision tree? In this post, I will demonstrate how to build a decision tree, in particular a classification tree, using two different criteria: _gini impurity_ and _entropy_, supplemented by a step-by-step explanation. I hope you will have a better appreciation of how decision trees are built by the end of this post! 🎓

Decision tree built on past experience to assess whether to watch a particular movie

To keep things manageable and hopefully a bit of fun, we will create a tiny fictitious dataset inspired by the six main characters from the sitcom Friends:

Note: the values in the data were adjusted to make it fit for the example

Let’s entertain the idea that this data is correct for the purpose of this post. We will build a decision tree to classify if a character is a parent using the rest of the columns. In other words, we will build a classification tree with the following inputs and output:

◼ ️**inputs | features:** *was_on_a_break, is_married, has_pet*

◼️ **output | target:** *is_parent*

If you enjoy math, I encourage you to calculate manually alongside this guide to make the most of this post. In this section, the characters who are parents are abbreviated as **pa** and the non-parents as **np** for brevity.

Decision trees are built top-down by recursively splitting each node into two child nodes. We can find the right split for a node with the following steps:

**STEP 1:** Calculate gini impurity (from here onwards, gini) for the node to split from

**STEP 2:** Find all possible splits

**STEP 3:** Calculate gini for both nodes for each split

**STEP 4:** Calculate the weighted average gini for each split

**STEP 5:** Determine the best split: the one with the lowest weighted average gini

**STEP 6:** Calculate information gain: split if information gain is positive

The very top node that includes everyone from the training data is known as *root node*. Let’s determine the best split for the root node with the steps.
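The six steps can be sketched in a few lines of Python. This is only an illustration: the rows below are hypothetical placeholder values (they are not the table from this post), and the helper names `gini` and `weighted_gini` are made up for the example. Only the column names come from the post.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, feature, target="is_parent"):
    """Weighted average gini of the two child nodes after splitting on a boolean feature."""
    yes = [r[target] for r in rows if r[feature]]
    no = [r[target] for r in rows if not r[feature]]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

# HYPOTHETICAL data: six made-up rows, NOT the values used in the post
rows = [
    {"was_on_a_break": True,  "is_married": True,  "has_pet": False, "is_parent": "pa"},
    {"was_on_a_break": False, "is_married": True,  "has_pet": False, "is_parent": "pa"},
    {"was_on_a_break": False, "is_married": True,  "has_pet": True,  "is_parent": "pa"},
    {"was_on_a_break": False, "is_married": False, "has_pet": True,  "is_parent": "pa"},
    {"was_on_a_break": True,  "is_married": False, "has_pet": False, "is_parent": "np"},
    {"was_on_a_break": False, "is_married": False, "has_pet": True,  "is_parent": "np"},
]
features = ["was_on_a_break", "is_married", "has_pet"]

root_gini = gini([r["is_parent"] for r in rows])         # STEP 1
scores = {f: weighted_gini(rows, f) for f in features}   # STEPS 2 to 4
best_feature = min(scores, key=scores.get)               # STEP 5
gain = root_gini - scores[best_feature]                  # STEP 6

print(f"root gini: {root_gini:.3f}")
for f, s in scores.items():
    print(f"{f}: weighted gini {s:.3f}")
print(f"best split: {best_feature}, information gain {gain:.3f}")
```

With boolean features there is exactly one candidate split per feature, so "find all possible splits" reduces to looping over the feature names.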

#supervised-learning #machine-learning #data-science #classification #decision-tree

1596286260

**Decision Tree** is one of the most widely used machine learning algorithms. It is a supervised learning algorithm that can perform both classification and regression tasks.

As the name suggests, it uses a tree-like structure to make decisions on the given dataset. Each internal node of the tree represents a “decision” taken by the model based on one of our attributes. From these decisions, we can separate classes or predict values.

Let’s look at both classification and regression operations one by one.

In classification, each **leaf node** of our decision tree represents a **class** based on the decisions we make on attributes at internal nodes.

To understand this better, let us look at an example. I have used the Iris flower dataset from the sklearn library. You can refer to the complete code on GitHub.

A node’s **samples** attribute counts how many training instances it applies to. For example, 100 training instances have a petal length greater than 2.45 cm.

A node’s **value** attribute tells you how many training instances of each class this node applies to. For example, the bottom-right node applies to 0 Iris-Setosa, 0 Iris-Versicolor, and 43 Iris-Virginica.

And a node’s **gini** attribute measures its impurity: a node is “pure” (gini=0) if all training instances it applies to belong to the same class. For example, since the depth-1 left node applies only to Iris-Setosa training instances, it is pure and its gini score is 0.

Gini Impurity Formula

$$G = 1 - \sum_{j} p_j^2$$

where **pⱼ** is the ratio of instances of class j among all training instances at that node.
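As a quick check, gini impurity (one minus the sum of the squared class ratios) can be computed directly from a node's class counts. The helper name `gini_from_counts` is made up for this sketch. The counts are assumptions: `[50, 0, 0]` relies on the Iris dataset having 50 Setosa flowers, and `[0, 49, 5]` assumes the 5 non-Versicolor instances in the 54-instance node are all Iris-Virginica.

```python
def gini_from_counts(counts):
    """Gini impurity of a node given its per-class instance counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# A pure node: every instance belongs to one class
print(gini_from_counts([50, 0, 0]))   # 0.0

# A mostly-Versicolor node with 49 of 54 instances in one class
print(round(gini_from_counts([0, 49, 5]), 3))   # 0.168
```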

Based on the decisions made at each internal node, we can sketch *decision boundaries* to visualize the model.

But how do we find these boundaries?

We use the Classification And Regression Tree (CART) algorithm to find these boundaries.

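**CART** is a simple algorithm that searches for the attribute _k_ and threshold _t_ₖ that produce the purest subsets. A subset is purer when a larger proportion of its instances belongs to one particular class. For example, the left node at depth 2 is dominated by the Iris-Versicolor class, which accounts for 49 of its 54 instances. In other words, we split the training set in such a way that the size-weighted gini impurity of the two subsets is as small as possible. The *CART cost function* is given as:

$$J(k, t_k) = \frac{m_{\text{left}}}{m}\,G_{\text{left}} + \frac{m_{\text{right}}}{m}\,G_{\text{right}}$$

where *G* is the gini impurity of the left or right subset, *m* with a subscript is the number of instances in that subset, and *m* is the number of instances at the node being split.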

After successfully splitting the dataset into two, we repeat the process on each side of the tree.
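To make the search concrete, here is a minimal sketch of a single CART step on the Iris data: it tries every attribute and every midpoint between consecutive observed values, and keeps the pair with the lowest size-weighted gini. The function names `gini` and `best_split` are made up for the example. This is a brute-force illustration, not how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.datasets import load_iris

def gini(y):
    """Gini impurity of an array of integer class labels."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (attribute index k, threshold t_k, cost) with the lowest weighted gini."""
    m = len(y)
    best = (None, None, np.inf)
    for k in range(X.shape[1]):
        values = np.unique(X[:, k])                   # sorted distinct values
        thresholds = (values[:-1] + values[1:]) / 2   # midpoints between neighbours
        for t in thresholds:
            left = X[:, k] <= t
            m_left, m_right = left.sum(), m - left.sum()
            cost = m_left / m * gini(y[left]) + m_right / m * gini(y[~left])
            if cost < best[2]:
                best = (k, t, cost)
    return best

X, y = load_iris(return_X_y=True)
k, t, cost = best_split(X, y)
print(f"split on attribute {k} at threshold {t:.2f}, weighted gini {cost:.3f}")
```

Repeating `best_split` on the rows that fall on each side of the threshold grows the rest of the tree.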

We can implement a decision tree directly with the scikit-learn library. It has a class called `DecisionTreeClassifier` that trains the model for us, and we can adjust the hyperparameters as per our requirements.
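A minimal sketch of that workflow is shown below. The hyperparameter values (`max_depth=2`, `random_state=42`) and the sample measurement are arbitrary choices for illustration, not the settings used in the original code.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# criterion="gini" is the default; max_depth limits how many levels the tree may grow
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Predict the class of one flower: sepal length, sepal width, petal length, petal width (cm)
sample = [[5.0, 3.5, 1.5, 0.2]]
predicted = clf.predict(sample)[0]
print(iris.target_names[predicted])

# Gini impurity of every node in the fitted tree, root first
print(clf.tree_.impurity)
```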

#machine-learning #decision-tree #decision-tree-classifier #decision-tree-regressor #deep learning

1596285180

Both `Regression Trees` and `Classification Trees` are part of the `CART (Classification And Regression Tree) Algorithm`. As we mentioned in the *Regression Trees* article, a tree is composed of three major parts: the root node, decision nodes, and terminal/leaf nodes.

The criterion used here for node splitting differs from the one used in Regression Trees. As before, we will run our example and then learn how the model is being trained.

Three measures are commonly used for attribute selection. The `Gini impurity` measure is the one used by the `CART` classifier. For more information on these, see Wikipedia.

`iris` data set

```
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from io import StringIO  # standard-library replacement for the old `six` import
from IPython.display import Image
# pip/conda install pydotplus
import pydotplus
from sklearn import datasets

# Load the iris data: features as a NumPy array, labels as integer class ids
iris = datasets.load_iris()
xList = iris.data
labels = iris.target

# Put the features in a DataFrame and append the target column
dataset = pd.DataFrame(data=xList, columns=iris.feature_names)
dataset['target'] = labels
targetNames = iris.target_names
print(targetNames)
print(dataset)
```

Iris Flowers

When an observation (row) is passed to a non-terminal node, the row answers the node’s question. If it answers yes, the row of attributes is passed to the child node below and to the left of the current node. If it answers no, the row of attributes is passed to the child node below and to the right of the current node. The process continues recursively until the row arrives at a terminal (that is, leaf) node, where a prediction is assigned to the row. In a classification tree, the value assigned by the leaf node is the most common class among the training observations that wound up in that leaf (in a regression tree, it is the mean of their outcomes).
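The walk from root to leaf can be reproduced by hand from a fitted scikit-learn tree, which stores its structure in arrays on the `tree_` attribute. The sketch below is an illustration under a few assumptions: the function name `predict_one` is made up, the tree is fitted on the full Iris data with `random_state=0`, and rows are cast to `float32` because scikit-learn does the same internally. In scikit-learn, a row goes left when its value is less than or equal to the node's threshold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

def predict_one(row):
    """Follow one row from the root to a leaf and return the leaf's majority class."""
    node = 0                                   # start at the root
    while tree.children_left[node] != -1:      # -1 marks a leaf node
        if row[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]    # answer is yes: go left
        else:
            node = tree.children_right[node]   # answer is no: go right
    # tree.value holds the per-class totals (or shares) of the training rows in the node
    return clf.classes_[np.argmax(tree.value[node][0])]

manual = np.array([predict_one(row) for row in X.astype(np.float32)])
print((manual == clf.predict(X)).all())
```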

Classification trees split a node into two sub-nodes. Splitting into sub-nodes will increase the homogeneity of resultant sub-nodes. In other words, we can say that the purity of the node increases with respect to the target variable. The decision tree splits the nodes on all available variables and then selects the split which results in most homogeneous/pure sub-nodes.

Several measures are used to determine which attribute/feature to split on and which value of that attribute to use as the split point. Some of these measures are:

- Gini index (the default in scikit-learn's `DecisionTreeClassifier`).
- Entropy.
- Information gain.

We will start with the `Gini index` measure and try to understand it.

Gini index is an impurity measure used to evaluate splits in the dataset. It is calculated by summing the squared probabilities of each class (target class) within a certain attribute/feature and then subtracting that sum from one.
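Written as a formula, with the symbol names chosen here for illustration (C classes, and p_i the share of class i in the node), the definition and a worked value for the unsplit iris data (three classes of 50 flowers each) look like this:

```latex
\text{Gini} = 1 - \sum_{i=1}^{C} p_i^{2}
\qquad\text{e.g.}\qquad
1 - \left[\left(\tfrac{50}{150}\right)^{2} + \left(\tfrac{50}{150}\right)^{2} + \left(\tfrac{50}{150}\right)^{2}\right]
= 1 - \tfrac{3}{9} = \tfrac{2}{3} \approx 0.667
```

A node containing a single class gives 1 - 1 = 0, the lowest possible value.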

#machine-learning #mls #decision-tree #decision-tree-classifier #classification #deep learning

1592847556

Binary Decision Trees

Binary decision trees are a supervised machine-learning technique that operates by subjecting attributes to a series of binary (yes/no) decisions. Each decision has one of two possible outcomes, and each outcome leads either to another decision or to a prediction.

#decision-tree-regressor #decision-tree #artificial-intelligence #mls #machine-learning #programming

1602997200

Decision trees are simple to implement and easy to interpret. A decision tree is a flow diagram used to predict a course of action or a probability. Each branch of the tree represents an outcome, a decision, or a reaction. Decision trees can be applied in a wide variety of situations, from personal choices to complex problems, and following the sequence of steps makes the reasoning easy to understand.

In programming, we regularly use if-else conditions, and a decision tree works in much the same way.

The below tree shows a simple implementation of different nodes in the decision tree.

- **Root Node:** The top node, which holds the base feature.
- **Parent Node:** A node that originates from the root node. It can also be described as a decision node, where a Yes/No or True/False decision is made.
- **Child Node:** A node that originates from a parent node. Child nodes are created when the decision made at a parent node is not yet satisfactory, and this continues until we arrive at a leaf node, where one class (Yes/No) dominates completely.
- **Leaf Node:** Also called a terminal node or final decision node, where we reach a conclusion.
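The if-else analogy and the node types above can be seen by printing a fitted tree as nested rules. The sketch below uses scikit-learn's `export_text` on the Iris dataset. The dataset and the `max_depth=2` setting are assumptions made for the illustration and are not part of the original article.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each "|---" line is a condition (root or decision node) or a "class:" line (leaf node)
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The first condition printed is the root node, indented conditions are its child nodes, and every branch ends in a leaf that names the predicted class.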

#gini-index #entropy #decision-tree #decision-tree-classifier #machine-learning

1596960480

A classification tree is very similar to a regression tree, except that it is used to predict a qualitative response rather than a quantitative one. For a classification tree, we predict that each observation belongs to the most commonly occurring class of training observations in the region to which it belongs.

In classification, RSS cannot be used as the criterion for making the binary splits. An alternative to RSS is the **classification error rate**. Since we assign an observation in a given region to the most commonly occurring class of training observations in that region, **the classification error rate is simply the fraction of the training observations in that region that do not belong to the most common class:**

$$E = 1 - \max_{k}(\hat{p}_{mk})$$

Here p̂mk is the proportion of training observations in the mth region that are from the kth class.

However, classification error is not sufficiently sensitive for tree-growing, and in practice two other measures are favoured.

1. **Gini index** is defined by

$$G = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk})$$

a measure of total variance across the K classes. It takes on a small value if all of the p̂mk’s are close to 0 or 1. Therefore it is referred to as a **measure of node purity**: a small value shows that a node contains predominantly observations from a single class.

2. **Cross-entropy** is given by

$$D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$$

Since 0 ≤ p̂mk ≤ 1, it follows that 0 ≤ −p̂mk log p̂mk. The cross-entropy will take on a value near zero if the p̂mk’s are all near 0 or near 1. Therefore, the cross-entropy will take on a small value if the mth node is pure. It turns out that the Gini index and the cross-entropy are quite similar numerically.

The Gini index and the cross-entropy are more sensitive to node purity than the classification error rate. Any of these three approaches might be used when pruning the tree, but the classification error rate is preferable if the prediction accuracy of the final pruned tree is the goal.
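The difference in sensitivity can be seen with a small sketch. The function names and the class counts are illustrative assumptions, not taken from this post: a node with 400 observations of each of two classes is split in two different ways, and both splits have the same weighted classification error while the Gini index and cross-entropy prefer the split that produces a pure node.

```python
import math

def classification_error(p):
    """1 minus the share of the most common class."""
    return 1.0 - max(p)

def gini_index(p):
    """Sum over classes of p * (1 - p)."""
    return sum(pk * (1.0 - pk) for pk in p)

def cross_entropy(p):
    """Minus the sum of p * log(p), treating 0 * log(0) as 0."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def weighted(measure, children):
    """Size-weighted average of a measure over child nodes given as class counts."""
    total = sum(sum(counts) for counts in children)
    return sum(
        sum(counts) / total * measure([c / sum(counts) for c in counts])
        for counts in children
    )

# Illustrative class counts per child node (assumed values)
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]   # the second child is pure

for name, measure in [("error", classification_error),
                      ("gini", gini_index),
                      ("cross-entropy", cross_entropy)]:
    print(f"{name}: A={weighted(measure, split_a):.3f}  B={weighted(measure, split_b):.3f}")
```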

**Decision tree classifier** **using sklearn**

To implement the decision tree classifier, we’re going to use scikit-learn, and we’ll import `DecisionTreeClassifier` from `sklearn.tree`.

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
```

**Load the Data**

Once the libraries are imported, our next step is to load the data, which is the **iris dataset**. It is a classic and very easy multi-class classification dataset available in sklearn datasets. This dataset consists of the petal and sepal lengths and widths of 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray. The rows are the samples and the columns are Sepal Length, Sepal Width, Petal Length and Petal Width.
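A minimal sketch of the loading step, assuming the `datasets.load_iris()` loader from scikit-learn. The variable names `X` and `y` are a common convention and not necessarily the ones used in the rest of the original post.

```python
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target   # features and integer class labels

print(X.shape)              # (150, 4)
print(iris.feature_names)   # the four column names
print(iris.target_names)    # the three iris species
```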

#iris-dataset #decision-tree #decision-tree-classifier #machine-learning #sklearn #deep learning