Introducing a customizable and interactable Decision Tree-Framework written in Python. This story will introduce yet another implementation of Decision Trees, which I wrote as part of my thesis.
Fast Track
A notebook can be found in the _examples_ directory of the repository or directly here (every illustration you see in this article is generated in that notebook, so you will be able to create them on your own).

This story will introduce yet another implementation of Decision Trees, which I wrote as part of my thesis. The work is divided into three chapters as follows:
Firstly, I will try to motivate why I decided to take the time to come up with my own implementation of Decision Trees; I will list some of its features, but also the _disadvantages_ of the current implementation.
Secondly, I will guide you through the basic usage of HDTree using code snippets, explaining some details along the way.
Lastly, there will be some hints on how to customize and extend _HDTree_ with your own ideas.
However, this article will not *guide you through all of the* basics of Decision Trees. There are plenty of resources out there [1][2][3][16], and I see no need to repeat all of that again. Others have done it, and I would not be able to do it better. You don’t need to be an *expert* in Decision Trees to understand this article. A basic level of understanding should be sufficient to follow along, although some experience in the ML domain is a plus.
In the course of my work, I ended up working with Decision Trees. My actual goal is to implement a human-centric ML model, where _HDTree_ (Human Decision Tree, for that matter) is an optional ingredient used as part of an actual user interface for that model. While this story focuses solely on HDTree, I might write a follow-up describing the other components in detail.
Features of HDTree & Comparison with scikit-learn Decision Trees
Naturally, I stumbled upon the scikit-learn implementation⁴ of decision trees. I guess many practitioners do. And let's make something clear from the beginning: nothing is wrong with it.
The scikit-learn implementation has a lot of pros:
That said, surely it also has some shortcomings:
Features of HDTree
HDTree comes with a solution to most of the shortcomings mentioned in the above list, while sacrificing many of the advantages of the scikit-learn implementation. We will come back to those points later, so don’t worry if you do not understand every part of the following list yet:
👍 interact with the learning-behavior
👍 core components are modular and fairly easy to extend (implement an interface)
👍 purely written in Python (more approachable)
👍 rich visualization
👍 support for categorical data
👍 support for missing values
👍 support for multivariate splits
👍 easy interface to navigate through the tree structure
👍 support for **n-ary splits** (> 2 child nodes)
👍 textual representations of decision paths
👍 encourages explainability by printing human-readable text
👎 slow
👎 not battle-tested (it _will _have bugs)
👎 mediocre software quality
👎 not so many pruning options (it supports some basic options, though)
⚠️ Although the disadvantages are few in number, they are critical. Let us make that clear right away: do not throw big data at it. You will wait forever. Do not use it in production. It may break unexpectedly. You have been warned! ⚠️
Some of these problems may get fixed over time. However, the training speed probably will remain slow (inference is okay, though). You will have to come up with a better solution to fix that. You are very welcome to contribute 😃.
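To make a few of the listed features more tangible (categorical data, missing values and n-ary splits), here is a minimal sketch in plain Python. It is not HDTree's API; the function names `entropy`, `nary_split` and `information_gain`, the toy data, and the policy of giving missing values a child node of their own are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def nary_split(rows, labels, attribute):
    """Group labels into one child per distinct value of `attribute`.

    Missing values (None) end up in a child of their own. That is just
    one possible policy, chosen here for illustration.
    """
    children = defaultdict(list)
    for row, label in zip(rows, labels):
        children[row.get(attribute)].append(label)
    return dict(children)


def information_gain(labels, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = len(labels)
    weighted = sum(len(c) / total * entropy(c) for c in children.values())
    return entropy(labels) - weighted


# Toy data, made up for this example (hypothetical attribute "color").
rows = [
    {"color": "red"}, {"color": "red"},
    {"color": "green"}, {"color": "green"},
    {"color": "blue"}, {"color": None},
]
labels = ["yes", "yes", "no", "no", "yes", "no"]

children = nary_split(rows, labels, "color")
gain = information_gain(labels, children)
print(len(children), round(gain, 3))
```

A single split on the categorical attribute produces four child nodes here (three colors plus the missing-value group), which a strictly binary tree would need several levels to express.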
That said, what would be possible use cases?