Article 5 of Machine Learning Series

In this article we will discuss decision trees, a supervised learning algorithm commonly referred to as CART (Classification and Regression Trees), which can be used for both regression and classification problems.

As the name suggests, the primary role of this algorithm is to make decisions using a tree structure. To find a solution, a decision tree makes sequential, hierarchical decisions about the outcome variable based on the predictor data. Decision trees are often used when working with non-linear data.
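To make this concrete, here is a minimal sketch of fitting a decision tree classifier, assuming scikit-learn is available. The tiny two-feature dataset and its labels are made up purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy dataset: [weight_kg, has_fur] -> class (0 = cat, 1 = dog)
X = [[4.0, 1], [5.0, 1], [20.0, 1], [30.0, 1]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The tree makes a hierarchical decision on the predictors for each new sample
print(clf.predict([[4.5, 1], [25.0, 1]]))  # → [0 1]
```

Here a single split on the weight feature is enough to separate the two classes, so the fitted tree is just a root node with two leaves.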


Because they are simple to understand and implement, decision trees are widely used across many industries.

Getting Acquainted With Some New Terms

Now, before we move further, it's important to understand some terminology associated with the algorithm. A decision tree is made up of a number of nodes, each of which tests a particular feature. The first node of a decision tree is referred to as the root node.


The depth of the tree is the number of levels in the tree, excluding the root node. A branch denotes a decision and can be visualized as a link between nodes. A leaf is a terminal node that tells you which class a sample belongs to.
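These quantities can be read straight off a fitted tree. A small sketch, again assuming scikit-learn, with a made-up one-feature dataset:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical one-feature dataset: one threshold cleanly separates the classes
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print(tree.get_depth())     # levels below the root node → 1
print(tree.get_n_leaves())  # terminal nodes, each assigning a class → 2
```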

How Does The Decision Tree Work

Decision trees progressively divide the data set into smaller groups until each group is small enough to be described by a single label. At the same time, the associated tree is built incrementally.

Decision trees apply a top-down approach to the data. The split at each node can be either binary or multiway; CART uses binary splits. The algorithm partitions the feature space into a set of rectangles and fits a simple model (such as a constant) in each one. The more rectangles (splits) there are, the greater the complexity of the model.
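The core of this top-down procedure is choosing the best binary split at each node. A plain-Python sketch of one common criterion, minimizing the weighted Gini impurity over all candidate thresholds on a single feature (the function names and toy data here are illustrative, not a real library's API):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= t on one feature."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → (3.0, 0.0): splitting at x <= 3.0 separates the classes perfectly
```

A full tree builder would apply `best_split` across every feature, partition the data on the winner, and recurse on each side until a stopping criterion is met.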


A downside of very complex decision trees is that they are likely to overfit: the model learns the training data so well that it struggles to generalize to new, unseen data.
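One common remedy is to limit the tree's complexity, for instance with a maximum depth. A sketch of the effect, assuming scikit-learn; the synthetic dataset and parameter choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy dataset: flip_y adds label noise, which a deep tree will memorize
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)             # unrestricted
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)  # capped depth

# The unrestricted tree fits the training set perfectly, noise included;
# the depth-capped tree trades some training accuracy for a simpler model.
print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

Comparing the train and test scores of the two trees makes the gap between memorization and generalization visible.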

At each node, the algorithm examines the features of the data set to find the best possible split, dividing the data into smaller and smaller subgroups until a stopping criterion is met and the tree is complete.
