Decision Tree from Scratch

For complex, non-linear data, tree-based algorithms such as Decision Tree, Random Forest, and XGBoost often work better than many other algorithms. But have you ever wondered how these algorithms work?

In this article, let’s understand how a Decision Tree works, and then we will implement it in Python.

What is a Decision Tree?

A decision tree is a type of supervised learning algorithm. It consists of decision nodes and leaf nodes. A decision node has two or more branches, whereas a leaf node has none. A leaf node represents a final outcome: a class label for classification or a numeric value for regression. The topmost decision node in the tree, which corresponds to the best predictor (the most important feature), is called the root node.
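To make the node terminology concrete, here is a minimal sketch of how such a tree could be represented and walked in Python. The class names `DecisionNode` and `Leaf`, the `predict` helper, and the toy thresholds are all hypothetical, chosen only for illustration. The sketch also assumes binary splits on numeric features, which is just one possible design.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    # A leaf has no branches; it only stores the final outcome.
    prediction: Union[str, float]


@dataclass
class DecisionNode:
    # A decision node asks a question about one feature and branches on the answer.
    feature: int       # index of the feature to test
    threshold: float   # go left if x[feature] <= threshold, otherwise go right
    left: Union["DecisionNode", Leaf]
    right: Union["DecisionNode", Leaf]


def predict(node: Union[DecisionNode, Leaf], x: Sequence[float]) -> Union[str, float]:
    # Start at the root and follow branches until a leaf is reached.
    while isinstance(node, DecisionNode):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# A hand-built toy tree: the root tests feature 0, its right child tests feature 1.
root = DecisionNode(
    feature=0,
    threshold=2.5,
    left=Leaf("small"),
    right=DecisionNode(feature=1, threshold=1.0, left=Leaf("medium"), right=Leaf("large")),
)

print(predict(root, [1.0, 5.0]))
print(predict(root, [3.0, 2.0]))
```

Here the tree is written by hand; a learning algorithm's job is to choose the features and thresholds from data.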

Decision trees are commonly built with Hunt’s algorithm, which is both greedy and recursive. Greedy means that at each step it makes the decision that looks best at that moment, and recursive means that it splits the larger question into smaller questions and resolves them the same way.
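The greedy and recursive ideas can be sketched in a few lines of Python. This is only an illustration, not the implementation developed in this article. The helper names `majority`, `errors`, and `build` are hypothetical. The split score is also an assumption made for simplicity: it counts the samples that disagree with the majority label on each side, and real algorithms typically use other criteria. Leaves are stored as plain labels and decision nodes as dictionaries.

```python
from collections import Counter


def majority(labels):
    # Most common label in a group of samples.
    return Counter(labels).most_common(1)[0][0]


def errors(labels):
    # Number of samples that disagree with the group's majority label.
    return len(labels) - Counter(labels).most_common(1)[0][1]


def build(rows, labels):
    # Base case: every sample has the same label, so this becomes a leaf.
    if len(set(labels)) == 1:
        return labels[0]

    # Greedy step: try every (feature, threshold) pair and keep the one
    # with the fewest errors right now, without looking ahead.
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            cost = errors([labels[i] for i in left]) + errors([labels[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)

    # No usable split (for example, identical rows): fall back to a leaf.
    if best is None:
        return majority(labels)

    # Recursive step: solve the two smaller problems the same way.
    _, f, t, left, right = best
    return {
        "feature": f,
        "threshold": t,
        "left": build([rows[i] for i in left], [labels[i] for i in left]),
        "right": build([rows[i] for i in right], [labels[i] for i in right]),
    }


tree = build([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"])
print(tree)
```

Each call only commits to the single best split for the samples it was given, then hands the two resulting subsets to the same function.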

Several algorithms exist for building a decision tree, such as CART, ID3, and C4.5. In this blog, we are interested in CART. So, let’s talk about CART.

