# How to do Decision Trees in Python and R

A decision tree is a supervised learning method that partitions data based on conditions on the features of the data. For instance, a pharmaceutical company might want to evaluate how the effects of four different types of drugs vary based on information collected from clinical trials. The company may build a decision tree to recommend drugs for future patients with similar conditions. In another example, a finance firm may build a similar classification tree to classify future investments into different types, given variables from prior investment data. Decision trees can also model continuous outcomes; in this case (a regression tree), they use the same partitioning to predict a numeric value rather than a class.

Random forests are ensembles of decision trees that build randomness into the estimators, and they often result in improved accuracy over a single decision tree. The tradeoff for this improved accuracy is that random forests are closer to a black box algorithm: it is much harder to interpret or explain how the predictors lead to the outcome.

Here, I demonstrate how to build decision trees in both Python and R using Spotify data. The goal is to build a tree, or trees, that will classify songs by genre.

## Overview of the Method

Decision trees are built using recursive partitioning, meaning that at each node the data is split on the feature and threshold that best separate the outcome at that step. For instance, the root node begins with all the samples of the classes of interest. Perhaps age is the feature that best splits the classes of the data at a certain threshold. Each resulting subset is then split again at the next level, and so on, until the leaves are pure or a stopping rule (such as a maximum depth) is reached.
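To make recursive partitioning concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the implementation used by any particular library: the function names (`best_split`, `build_tree`, `predict`), the `max_depth` stopping rule, and the toy data (an age-like first feature and a binary second feature) are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) pair that lowers impurity the most, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; a leaf stores the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the class stored there."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical toy data: column 0 is an age-like value, column 1 is a binary flag.
rows = [[25, 1], [30, 0], [45, 1], [50, 0], [60, 1], [65, 0]]
labels = ["A", "A", "A", "B", "B", "B"]

tree = build_tree(rows, labels)
print(tree)  # {'feature': 0, 'threshold': 45, 'left': 'A', 'right': 'B'}
print(predict(tree, [40, 1]))  # A
```

On this toy data a single split on the first feature separates the classes perfectly, so the recursion stops after one level because no further split can reduce the impurity.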

## Evaluation

For a continuous outcome, the best split is the one that reduces the error sum of squares the most. For classification, the purity of the resulting nodes is used to quantify which attribute is the best, or most predictive, for splitting the data. A split is pure when every sample in a resulting node belongs to the same class, so the tree makes the same decision for all of them. The goal is to minimize impurity at each step of the tree. In other words, we want each split to yield high information gain, meaning a large drop in entropy (or information disorder). With high information gain, we can be more confident that the splits are reliable and reflect true features in the sample. The Gini impurity (the probability that a randomly chosen sample would be classified incorrectly if it were labeled at random according to the class distribution in the node) is one measure of this purity, where a value of 0 indicates that the node is completely pure.
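The impurity measures above are short to compute by hand. The following sketch uses the standard definitions of Gini impurity, entropy (in bits), and information gain; the helper names and the toy genre labels are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def class_proportions(labels):
    """Fraction of samples belonging to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits."""
    return sum(-p * log2(p) for p in class_proportions(labels))

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node holding four jazz songs and four punk songs.
parent = ["jazz"] * 4 + ["punk"] * 4
pure_split = [["jazz"] * 4, ["punk"] * 4]
mixed_split = [["jazz", "jazz", "punk", "punk"], ["jazz", "jazz", "punk", "punk"]]

print(gini(parent))                           # 0.5
print(gini(pure_split[0]))                    # 0.0
print(information_gain(parent, pure_split))   # 1.0
print(information_gain(parent, mixed_split))  # 0.0
```

The pure split recovers the full bit of entropy in the parent node, while the mixed split leaves each child as disordered as the parent and gains nothing.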

## Research Question

Using data from the Spotify API, let’s build a decision tree that will predict the genre of songs in a dataset of jazz, folk, and punk music, and then test the tree on songs unseen by the algorithm.

## Top 30 Python Tips and Tricks for Beginners

Welcome to my blog. In this article, you are going to learn the top 10 Python tips and tricks.

### 8) Check The Memory Usage Of An Object.
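The code for this tip is not shown here, so the following is only one plausible way to do it: the standard library's `sys.getsizeof` reports the size of an object in bytes. The variable names are illustrative.

```python
import sys

numbers = list(range(1000))

# Size of the list object itself, in bytes. The exact value depends on the
# Python version and platform, so none is shown here.
size_in_bytes = sys.getsizeof(numbers)
print(size_in_bytes)
```

Note that `sys.getsizeof` is shallow: for a container it counts the container itself but not the objects it refers to.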

## Lambda, Map, and Filter Functions in Python

Welcome to my blog. In this article, we will learn about Python's lambda, map, and filter functions.

Lambda function in Python: a lambda is a one-line anonymous function. It takes any number of arguments but can only have one expression. The Python lambda syntax is:

Syntax: `x = lambda arguments: expression`

Now I will show you some Python lambda function examples:
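The original examples are not included here, so the following is a small illustrative set rather than the author's own. It shows a lambda on its own and as the function passed to `map` and `filter`; the variable names are assumptions.

```python
# A lambda with two arguments and a single expression.
add = lambda a, b: a + b
print(add(2, 3))  # 5

# map applies the lambda to every item of the list.
squares = list(map(lambda n: n * n, [1, 2, 3, 4]))
print(squares)  # [1, 4, 9, 16]

# filter keeps only the items for which the lambda returns True.
evens = list(filter(lambda n: n % 2 == 0, [1, 2, 3, 4, 5, 6]))
print(evens)  # [2, 4, 6]
```

In Python 3, `map` and `filter` return lazy iterators, which is why the results are wrapped in `list` before printing.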

## R vs Python: What Should Beginners Learn?

### Let go of any doubts or confusion, make the right choice and then focus and thrive as a data scientist.

I currently lead a research group with data scientists who use both R and Python. I have been in this field for over 14 years. I have witnessed the growth of both languages over the years and there is now a thriving community behind both.

I did not have a straightforward journey and learned many things the hard way. However, you can avoid making the mistakes I made and lead a more focussed, more rewarding journey and reach your goals quicker than others.

Before I dive in, let’s get something out of the way. R and Python are just tools to do the same thing. Data Science. Neither of the tools is inherently better than the other. Both tools have been evolving over the years (and will likely continue to do so).

Therefore, the short answer on whether you should learn Python or R is: it depends.

The longer answer, if you can spare a few minutes, will help you focus on what really matters and avoid the most common mistakes that enthusiastic beginners make on their way to becoming expert data scientists.

## Python Tricks Every Developer Should Know

Python is awesome. It’s one of the easiest languages, with simple and intuitive syntax. But wait, have you ever thought that there might be ways to write your Python code even more simply?

In this tutorial, you’re going to learn a variety of Python tricks that you can use to write your Python code in a more readable and efficient way like a pro.

### Let’s get started

Swapping values in Python

Instead of creating a temporary variable to hold one of the values while swapping, you can do this instead:

```python
>>> FirstName = "kalebu"
>>> LastName = "Jordan"
>>> FirstName, LastName = LastName, FirstName  # tuple unpacking swaps both names at once
>>> print(FirstName, LastName)
Jordan kalebu
```

## How to Remove all Duplicate Files on your Drive via Python

Today you’re going to learn how to use Python programming in a way that can ultimately save a lot of space on your drive by removing all the duplicates.

### Intro

In many situations you may find yourself with duplicate files on your disk, but tracking and checking them manually can be tedious.

Here’s a solution.

But how do we do it?

If we were to read each whole file and compare it to the rest of the files recursively through the given directory, it would take a very long time. So how do we do it?

The answer is hashing. With hashing, we can generate a string of letters and numbers that acts as the identity of a given file, and if we find any other file with the same identity, we delete it.

There’s a variety of hashing algorithms out there, such as:

• md5
• sha1
• sha224, sha256, sha384 and sha512
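The hashing idea can be sketched with the standard library's `hashlib` and `os` modules. This is a minimal illustration, not the article's own script: the function names, the choice of `sha256`, and the chunk size are assumptions. The sketch only reports duplicates; it does not delete anything.

```python
import hashlib
import os

def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files are never loaded fully into memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def find_duplicates(directory):
    """Map each duplicate file path to the first file seen with the same hash."""
    seen = {}        # digest -> first path with that content
    duplicates = {}  # duplicate path -> original path
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            digest = file_hash(path)
            if digest in seen:
                duplicates[path] = seen[digest]
            else:
                seen[digest] = path
    return duplicates
```

To actually free the space, you could loop over the returned paths and call `os.remove` on each one, ideally after reviewing the list, since deletion cannot be undone.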

