Dedric Reinger

1599104280

Visualizing Decision Trees in Jupyter Notebook

Decision tree regressors and classifiers are widely used, both as standalone algorithms and as components of more complex models. Visualizing them is crucial for understanding how the algorithm arrives at its decisions, which matters in business applications.

In this short tutorial, I briefly describe how to visualize decision tree models from the scikit-learn library. Note: Graphviz must be installed and configured to run the code below.

As a toy dataset, I will use the well-known Iris dataset. Let’s import the main libraries and load the data for the experiment.
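A minimal sketch of this step, assuming the copy of Iris bundled with scikit-learn (`sklearn.datasets.load_iris`) rather than any particular download source the author may have used:

```python
from sklearn.datasets import load_iris

# load_iris ships with scikit-learn, so no network access is needed
iris = load_iris()
X = iris.data                     # (150, 4) array of measurements in cm
y = iris.target                   # integer class labels 0, 1, 2
feature_names = iris.feature_names

print(X.shape, feature_names)
```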

Next, we create a simple decision tree classifier and fit it on the full dataset.
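A sketch of the fitting step, assuming default hyperparameters; `random_state=0` is an added assumption that only makes the result repeatable:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Default settings: the tree grows until leaves are pure (or cannot be split)
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Accuracy on the same data it was fit on, not an estimate of generalization
print("training accuracy:", clf.score(X, y))
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```

Because the tree is fit on the full dataset, the training accuracy says nothing about performance on unseen data; it is only a sanity check before visualizing.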

Now for the interesting part. We export the fitted decision tree as a .dot file, the text format used by Graphviz. The tree.dot file will be saved in the same directory as your Jupyter notebook. Don’t forget to pass the feature_names parameter, which sets the feature names shown when the tree is displayed.
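A sketch of the export step using scikit-learn's `export_graphviz`. The `class_names`, `filled`, and `rounded` arguments are optional extras added here for readability, not necessarily the author's settings:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Write the tree in Graphviz DOT format next to the notebook
export_graphviz(
    clf,
    out_file="tree.dot",
    feature_names=iris.feature_names,       # labels used in the split conditions
    class_names=list(iris.target_names),    # optional: show class names at nodes
    filled=True,                            # optional: color nodes by majority class
    rounded=True,                           # optional: rounded node boxes
)

with open("tree.dot") as f:
    content = f.read()
print(content[:80])
```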

Next, we run the following command to convert the .dot file to a .png file. This works only inside Jupyter Notebook (or IPython), because the “!” prefix tells the notebook to pass the line to the system shell.
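A typical form of this notebook command is `!dot -Tpng tree.dot -o tree.png`. The sketch below does the same conversion from plain Python with `subprocess`, assuming Graphviz's `dot` executable is on PATH; it regenerates tree.dot first so it can run on its own:

```python
import os
import shutil
import subprocess

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Recreate tree.dot so this cell is self-contained
iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
export_graphviz(clf, out_file="tree.dot", feature_names=iris.feature_names)

# Notebook equivalent (shell escape):  !dot -Tpng tree.dot -o tree.png
dot_path = shutil.which("dot")
if dot_path is not None:
    # -Tpng selects PNG output, -o names the output file
    subprocess.run([dot_path, "-Tpng", "tree.dot", "-o", "tree.png"], check=True)
    print("wrote tree.png:", os.path.exists("tree.png"))
else:
    print("Graphviz 'dot' not found; install Graphviz and add it to PATH")
```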

After this step, the tree.png file will appear in the same folder, and we can display it with common Python libraries.
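A minimal sketch of the display step, assuming matplotlib is installed and tree.png was produced by the Graphviz conversion (the author's exact library choice is not specified):

```python
import os

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

if os.path.exists("tree.png"):
    img = mpimg.imread("tree.png")   # PNG loaded as an (H, W, channels) array
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(img)
    ax.axis("off")                   # hide the pixel axes around the picture
    plt.show()
else:
    print("tree.png not found; run the Graphviz conversion first")
```

Inside a notebook, `IPython.display.Image(filename="tree.png")` is an equally common way to show the file inline.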

#decision-tree #machine-learning #data-science #python #visualization

Decision Tree: Classification

Decision Tree is one of the most widely used machine learning algorithms. It is a supervised learning algorithm that can perform both classification and regression tasks.

As the name suggests, it uses a tree-like structure to make decisions on the given dataset. Each internal node of the tree represents a “decision” the model makes based on one of the attributes. From these decisions, we can separate classes or predict values.

Let’s look at classification and regression one at a time.

Classification

In classification, each leaf node of the decision tree represents a class, based on the decisions made on attributes at the internal nodes.

To understand this better, let us look at an example. I used the Iris Flower Dataset from the scikit-learn library. You can refer to the complete code on GitHub.


A node’s samples attribute counts how many training instances it applies to. For example, 100 training instances have a petal length greater than 2.45 cm.

A node’s value attribute tells you how many training instances of each class this node applies to. For example, the bottom-right node applies to 0 Iris-Setosa, 0 Iris-Versicolor, and 43 Iris-Virginica.

And a node’s gini attribute measures its impurity: a node is “pure” (gini=0) if all training instances it applies to belong to the same class. For example, since the depth-1 left node applies only to Iris-Setosa training instances, it is pure and its gini score is 0.
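The samples, value, and gini fields can also be read straight from a fitted model. A minimal sketch, assuming a depth-2 tree on the two petal features (the figure's exact settings are not given), that prints the fitted `tree_` arrays `n_node_samples`, `value`, and `impurity`. Some scikit-learn versions store `value` as class fractions rather than counts, so the sketch rescales it by the node's sample count:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]   # petal length and petal width, in cm
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

t = tree_clf.tree_
for node in range(t.node_count):
    samples = int(t.n_node_samples[node])    # the "samples" field
    v = t.value[node].ravel()
    # Per-class counts, whether this version stores counts or fractions
    counts = np.rint(v / v.sum() * samples).astype(int)
    gini = t.impurity[node]                  # the "gini" field
    print(f"node {node}: samples={samples}, value={counts.tolist()}, gini={gini:.3f}")
```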

Gini Impurity Formula

$$G = 1 - \sum_{j=1}^{n} p_j^2$$

where pⱼ is the ratio of class j instances among all training instances at that node, and n is the number of classes.
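To make the formula concrete, here is a small plain-Python sketch of Gini impurity, G = 1 − Σⱼ pⱼ², applied to the depth-2 left node of the Iris example. That node holds 49 Iris-Versicolor out of 54 instances; since Setosa was split off at depth 1, the remaining 5 are Iris-Virginica. The function name `gini` is illustrative:

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_j p_j**2 for the class labels reaching one node."""
    m = len(labels)
    if m == 0:
        return 0.0
    return 1.0 - sum((count / m) ** 2 for count in Counter(labels).values())


# Depth-2 left node: 49 Versicolor and 5 Virginica
node = ["versicolor"] * 49 + ["virginica"] * 5
print(round(gini(node), 3))             # 0.168

# A pure node has impurity 0
print(gini(["setosa"] * 50))            # 0.0
```

The arithmetic is 1 − (49/54)² − (5/54)² = 490/2916 ≈ 0.168: the node is mostly, but not entirely, one class.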

Based on the decisions made at each internal node, we can sketch decision boundaries to visualize the model.


But how do we find these boundaries?

Scikit-Learn uses the Classification and Regression Tree (CART) algorithm to find these boundaries.

CART is a simple algorithm that searches for a single feature k and a threshold tₖ that split the training set into the purest possible subsets, meaning each subset is dominated as much as possible by one class. For example, the left node at depth 2 is dominated by Iris-Versicolor, which accounts for 49 of its 54 instances. CART picks the split that minimizes the size-weighted Gini impurity of the two subsets. The CART cost function is given as:

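$$J(k, t_k) = \frac{m_{\text{left}}}{m} G_{\text{left}} + \frac{m_{\text{right}}}{m} G_{\text{right}}$$

where G_left and G_right are the Gini impurities of the left and right subsets, m_left and m_right are the numbers of instances in them, and m = m_left + m_right.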
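A plain-Python sketch of one CART split search, assuming Gini impurity and midpoints between consecutive distinct feature values as candidate thresholds. It is an illustration of the cost function, not scikit-learn's optimized implementation; `cart_cost` and `best_split` are hypothetical helper names:

```python
from collections import Counter

from sklearn.datasets import load_iris


def gini(labels):
    """Gini impurity G = 1 - sum_j p_j**2 of the labels that reach a node."""
    m = len(labels)
    if m == 0:
        return 0.0
    return 1.0 - sum((c / m) ** 2 for c in Counter(labels).values())


def cart_cost(left, right):
    """J(k, t_k) = m_left/m * G_left + m_right/m * G_right."""
    m = len(left) + len(right)
    return len(left) / m * gini(left) + len(right) / m * gini(right)


def best_split(X, y):
    """Try every feature k and midpoint threshold t_k; keep the lowest cost."""
    best_cost, best_k, best_t = float("inf"), None, None
    for k in range(len(X[0])):
        values = sorted({row[k] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t_k = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[k] <= t_k]
            right = [label for row, label in zip(X, y) if row[k] > t_k]
            cost = cart_cost(left, right)
            if cost < best_cost:
                best_cost, best_k, best_t = cost, k, t_k
    return best_cost, best_k, best_t


iris = load_iris()
X = iris.data[:, 2:].tolist()   # petal length, petal width
y = iris.target.tolist()
cost, k, t_k = best_split(X, y)
print(k, round(t_k, 2), round(cost, 3))
```

On the petal features, the cheapest first split isolates all 50 Setosa at petal length ≤ 2.45 cm, leaving a 50/50 mix on the right, so J = (100/150) × 0.5 = 1/3.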

Once the training set is split in two, CART applies the same procedure recursively to each subset, stopping when a node is pure or a limit such as the maximum depth is reached.
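One way to watch this recursion is to fit trees with increasing `max_depth` and print them with scikit-learn's `export_text`. A sketch, assuming the two petal features and `random_state=42` (the root threshold may appear as petal length ≤ 2.45 or petal width ≤ 0.80, since both isolate Setosa equally well):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]                       # petal length, petal width
names = ["petal length (cm)", "petal width (cm)"]

reports = {}
for depth in (1, 2, 3):
    # Each extra level lets CART repeat the split search inside the previous leaves
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X, iris.target)
    reports[depth] = export_text(clf, feature_names=names)
    print(f"max_depth={depth}\n{reports[depth]}")
```

At depth 2 only the impure right branch is split again; the pure Setosa leaf is left alone.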

We can implement a decision tree directly with the Scikit-Learn library. Its DecisionTreeClassifier class fits the model for us, and we can adjust the hyperparameters to suit our requirements.
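A sketch of typical usage with a held-out test set. The hyperparameter values below are illustrative assumptions, not tuned or taken from the original code:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

clf = DecisionTreeClassifier(
    criterion="gini",       # impurity measure used in the CART cost function
    max_depth=3,            # stop growing after 3 levels
    min_samples_leaf=2,     # every leaf keeps at least 2 training samples
    random_state=0,
)
clf.fit(X_train, y_train)

test_acc = clf.score(X_test, y_test)
print(f"test accuracy: {test_acc:.3f}")
print(clf.predict_proba(X_test[:1]))   # class probabilities from the leaf's class ratios
```

Limiting `max_depth` and `min_samples_leaf` regularizes the tree; an unconstrained tree tends to fit the training set perfectly and generalize worse.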


#machine-learning #decision-tree #decision-tree-classifier #decision-tree-regressor #deep learning

Rodrigo Senra - Jupyter Notebooks

Our guest today is technical director at Work & Co, holds a PhD in Computer Science, has contributed to numerous open-source Python projects, helped found the Associação Python Brasil (Python Brazil Association), and received the Prêmio Dorneles Tremea (Dorneles Tremea Award) for contributions to the Python Brasil community.

#alexandre oliva #anaconda #apache zeppelin #associação python brasil #azure notebooks #beakerx #binder #c++ #closure #colaboratory #donald knuth #fernando pérez #fortran #graphql #guido van rossum #ipython #java #javascript #json #jupyter kernels #jupyter notebooks #jupyterhub #jupyterlab #latex #lisp #literate programming #lua #matlab #perl #cinerdia #prêmio dorneles tremea #python #r #rodrigo senra #scala #spark notebook #tcl #typescript #zope

Jupyter Notebooks in Visual Studio Code

In this episode, Robert is joined by Jeffrey Mew, who shows how you can natively edit Jupyter notebooks in Visual Studio Code. Jupyter is an open-source project that enables you to easily combine Markdown text and executable Python source code on one canvas called a notebook. These notebooks contain live code, equations, visualizations and narrative text. Jeffrey shows how easy it is to work with Jupyter notebooks in Visual Studio Code.

#vscode #python #jupyter #jupyter-notebooks #machine-learning

Johnson Duke

1578547566

How to work with Jupyter Notebook in Visual Studio Code

In this episode, Robert is joined by Jeffrey Mew, who shows how you can natively edit Jupyter notebooks in Visual Studio Code. Jupyter is an open-source project that enables you to easily combine Markdown text and executable Python source code on one canvas called a notebook. These notebooks contain live code, equations, visualizations and narrative text. Jeffrey shows how easy it is to work with Jupyter notebooks in Visual Studio Code.

#vscode #jupyter-notebook #jupyter #python #machine-learning