What is Decision Tree Regression?

Decision trees are mostly used for classification problems. However, let us try to understand how they apply to regression and also why using them for regression isn’t a great idea.

Decision tree regression divides the data into multiple splits. Each split typically answers a simple if-else condition. The algorithm decides the optimal number of splits in the data. Since this way of splitting data closely resembles the branches of a tree, this is probably why it is known as a decision tree. In fact, the nodes at the last level (e.g., Fit, Unfit) are known as leaves. In a regression tree, each leaf holds a number rather than a label, typically the average target value of the training rows that end up in that leaf.
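To make the idea of a split concrete, here is a minimal sketch of how one split could be chosen for a single feature: try every threshold between neighbouring values and keep the one with the smallest total squared error around the two group averages. The function name best_split and the tiny dataset are made up for illustration; real libraries use more refined versions of this search and repeat it recursively on each side.

```python
def best_split(xs, ys):
    """Return (threshold, error) for the single split with the lowest squared error."""
    def sse(values):
        # Sum of squared errors around the mean of the group.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_threshold, best_error = None, float("inf")
    candidates = sorted(set(xs))
    # Try a threshold halfway between each pair of neighbouring x values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Hypothetical data: position levels and made-up salaries.
levels = [1, 2, 3, 4, 5, 6]
salaries = [10, 12, 11, 50, 52, 51]
threshold, error = best_split(levels, salaries)
print(threshold, error)
```

With this toy data, the chosen threshold separates the three low salaries from the three high ones, and each side would predict its own average.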

For example, in the image above there is data about customers: their age, whether they eat pizza or not, and whether they exercise or not. By performing decision tree regression, the data is split into two categories by age, i.e., age < 30 and age > 30. Within the age < 30 category, the data is split again into two categories by eating habits, i.e., people who eat pizza and people who do not. The same goes for exercise. With these splits, we can account for the behavior of the customers based on their choices, and we end up deciding whether they are fit or unfit.
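The splits described above read naturally as nested if-else conditions. The sketch below is only an illustration: the function name classify_customer is made up, and which leaf ends up Fit or Unfit, the assumption that the exercise split sits under the older age group, and the handling of age exactly 30 are all assumptions, since the text does not spell them out.

```python
def classify_customer(age, eats_pizza, exercises):
    """Walk a small hand-written tree and return the leaf label."""
    if age < 30:
        # Younger customers are split by eating habits (assumed leaf labels).
        if eats_pizza:
            return "Unfit"
        return "Fit"
    else:
        # Older customers are split by exercise (assumed leaf labels).
        if exercises:
            return "Fit"
        return "Unfit"


print(classify_customer(25, eats_pizza=True, exercises=False))
print(classify_customer(45, eats_pizza=False, exercises=True))
```

A fitted tree is exactly this kind of structure, except that the algorithm picks the conditions and thresholds from the data instead of a person writing them by hand.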

Implementation in Python

Let us dive deep into Python, build a decision tree regression model, and try to predict the salary of an employee at a (hypothetical) level of 6.5.

Before you move forward, please download the CSV data file from my GitHub Gist.

https://gist.github.com/tharunpeddisetty/433e5fe5af0e6b6cdd9d7df3339931a5
Once you open the link, you can find the "Download ZIP" button in the top-right corner of the window. Go ahead and download the files.
The download contains 1) the Python file and 2) the data file (.csv).
Rename the folder as you like, store it in the desired location, and you are all set. If you are a beginner, I highly recommend opening your Python IDE and following the steps below, because here I write detailed comments (the statements after #; these are not executed when you run the code) on how the code works. You can use the actual Python file as a backup or for future reference.

Importing Libraries

import numpy as np  # numerical arrays
import matplotlib.pyplot as plt  # plotting the results
import pandas as pd  # loading the CSV data file
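As a rough preview of where these steps lead, here is a minimal sketch of fitting a tree and predicting for level 6.5. It assumes scikit-learn's DecisionTreeRegressor, which this section has not introduced, and it uses made-up levels and salaries rather than the contents of the CSV file.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: position levels 1..10 and made-up salaries (not the CSV contents).
X = np.arange(1, 11).reshape(-1, 1)  # features must be 2D: (n_samples, n_features)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# random_state fixes the tie-breaking so repeated runs give the same tree.
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# predict() also expects a 2D input, hence the double brackets.
prediction = regressor.predict([[6.5]])
print(prediction)
```

Because every leaf of this fully grown tree holds a single training row, the prediction for 6.5 is simply one of the training salaries and not a value in between. That step-like behavior is one reason a single tree is often a poor fit for smooth, one-feature regression problems.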


Baby Steps Towards Data Science: Decision Tree Regression in Python