Machine Learning: Algorithm Classification Overview

In this post, we are going to have a look at the most widely used machine learning algorithms. There is a huge variety of them, and it is easy to feel confused when you hear such terms as “instance-based learning algorithms” and “perceptron”.

Machine learning algorithms are usually divided into groups based on their learning style, their function, or the problems they solve. In this post, you will find a classification based on learning style. I will also mention the common tasks that these algorithms help to solve.

The number of machine learning algorithms in use today is large, and I will not cover all of them. However, I would like to provide an overview of the most commonly used ones.

Supervised learning algorithms

If you’re not familiar with such terms as “supervised learning” and “unsupervised learning”, check out our AI vs. ML post where this topic is covered in detail. Now, let’s get familiar with the algorithms.

1. Classification algorithms

Naive Bayes

Bayesian algorithms are a family of probabilistic classifiers used in ML based on applying Bayes’ theorem.

The Naive Bayes classifier was one of the first algorithms used for machine learning. It is suitable for binary and multiclass classification and makes predictions based on historical data. A classic example is spam filtering: such systems used Naive Bayes up until 2010 and showed satisfactory results. However, when Bayesian poisoning appeared (spammers padding messages with legitimate-looking words to fool the filter), programmers started to think of other ways to filter email.

Using Bayes’ theorem, it is possible to tell how the occurrence of an event impacts the probability of another event.

For example, this algorithm calculates the probability that a certain email is or isn’t spam based on the words it typically contains. Common spam words and phrases are “offer”, “order now”, or “additional income”. If the algorithm detects them, there is a high probability that the email is spam.
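To make the spam example concrete, here is a minimal sketch of Bayes’ theorem applied to a single word. The function name `posterior_spam` and all the percentages are made up for illustration; they do not come from a real spam corpus.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any email, spam or not.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word


# Hypothetical numbers: "offer" appears in 40% of spam emails and in 5% of
# legitimate ones, and 20% of all incoming mail is spam.
print(round(posterior_spam(0.40, 0.05, 0.20), 3))  # prints 0.667
```

With these assumed numbers, seeing the word “offer” raises the probability of spam from 20% to about 67%.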

Naive Bayes assumes that the features are independent of each other given the class. This assumption rarely holds exactly for real data, which is why the algorithm is called naive.
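The independence assumption is what keeps the calculation simple: the per-word probabilities are just multiplied together. Below is a rough sketch of that idea. The function `naive_bayes_spam` and the probability tables are hypothetical, and a real filter would estimate these values from training data.

```python
import math


def naive_bayes_spam(words, p_word_given_spam, p_word_given_ham, p_spam):
    """Return P(spam | words), treating words as independent given the class."""
    # Work with logarithms so that many small probabilities do not underflow.
    log_spam = math.log(p_spam)
    log_ham = math.log(1.0 - p_spam)
    for word in words:
        # Words missing from either table carry no evidence and are skipped.
        if word in p_word_given_spam and word in p_word_given_ham:
            # Independence assumption: likelihoods multiply (log-likelihoods add).
            log_spam += math.log(p_word_given_spam[word])
            log_ham += math.log(p_word_given_ham[word])
    # Normalize the two class scores into a probability of spam.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical per-word probabilities, for illustration only.
p_spam_words = {"offer": 0.40, "income": 0.30, "meeting": 0.02}
p_ham_words = {"offer": 0.05, "income": 0.03, "meeting": 0.20}

print(naive_bayes_spam(["offer", "income"], p_spam_words, p_ham_words, p_spam=0.20))
print(naive_bayes_spam(["meeting"], p_spam_words, p_ham_words, p_spam=0.20))
```

With these assumed tables, the first email scores well above 0.5 and the second well below it.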

Multinomial Naive Bayes

Apart from the basic Naive Bayes classifier, there are other algorithms in this group. One example is Multinomial Naive Bayes, which is usually applied to document classification based on how frequently certain words appear in the document.
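Here is a small sketch of how word frequencies can drive document classification. It is a simplified, from-scratch version written for this post, not the implementation of any particular library. The function names, the add-one smoothing, and the tiny training set are my own assumptions.

```python
from collections import Counter
import math


def train_multinomial_nb(documents):
    """documents: list of (text, label) pairs. Returns the counts used for prediction."""
    class_docs = Counter()   # number of documents per class
    word_counts = {}         # per-class word frequencies
    vocabulary = set()
    for text, label in documents:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def classify_document(text, class_docs, word_counts, vocabulary):
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training set.
docs = [
    ("goal match team goal", "sports"),
    ("team wins match", "sports"),
    ("election vote parliament", "politics"),
    ("vote vote election debate", "politics"),
]
model = train_multinomial_nb(docs)
print(classify_document("goal goal vote", *model))  # prints sports
```

A repeated word contributes once per occurrence, so the two mentions of “goal” outweigh the single “vote”.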

Bayesian algorithms are still used for text categorization and fraud detection. They can also be applied to machine vision (for example, face detection), market segmentation, and bioinformatics.

Logistic regression

Even though the name might seem counterintuitive, logistic regression is actually a type of classification algorithm.

Logistic regression is a model that uses a logistic function to turn a weighted combination of the input variables into the probability of an output class. StatQuest made a great video that explains the difference between linear and logistic regression, using obese mice as the example.
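As a rough illustration, the sketch below shows how a logistic function turns a weighted sum of inputs into a probability and then into a class. The weight and bias values are invented for the example (a mouse’s weight in grams as the only feature). In practice they would be learned from data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Turn the probability into a class label (1 or 0)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model: one feature (weight in grams), made-up coefficients.
weights, bias = [0.3], -9.0
print(predict([40.0], weights, bias))  # heavier mouse
print(predict([20.0], weights, bias))  # lighter mouse
```

The output is a class label rather than a number on a continuous scale, which is why logistic regression counts as a classification algorithm.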

Decision trees

A decision tree is a simple way to visualize a decision-making model in the form of a tree. The advantages of decision trees are that they are easy to understand, interpret and visualize. Also, they demand little effort for data preparation.

However, they also have a big disadvantage. Trees can be unstable: even small variations in the data can produce a very different tree (high variance). It is also possible to create over-complex trees that do not generalize well. This is called overfitting. Bagging, boosting, and regularization help to fight this problem. We are going to talk about them later in the post.

The elements of every decision tree are:

Root node. This node asks the main question. It has arrows pointing down from it but no arrows pointing to it. For example, imagine you are building a tree to decide what kind of pasta you should have for dinner.
Branches. A subsection of a tree is called a branch or, sometimes, a sub-tree.
Decision nodes. These are subnodes of the root node that can themselves split into more nodes. Your decision nodes can be “carbonara?” or “with mushrooms?”.
Leaves or terminal nodes. These nodes do not split. They represent final decisions or predictions.

Also, it is important to mention splitting. This is the process of dividing a node into subnodes. For instance, if you’re not a vegetarian, carbonara is okay. But if you are, eat pasta with mushrooms. There is also a process of node removal called pruning.
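The pasta example can be sketched as a tiny tree in code. This is only an illustration of the structure (root, decision nodes, leaves), not a tree learned from data. The dictionary layout, the `decide` function, and the extra “likes_mushrooms” question and tomato sauce leaf are my own additions.

```python
# Each dict is a node that asks a question; each string is a leaf (final decision).
pasta_tree = {
    "question": "vegetarian",                 # root node
    "yes": {
        "question": "likes_mushrooms",        # decision node (hypothetical)
        "yes": "pasta with mushrooms",        # leaf
        "no": "pasta with tomato sauce",      # leaf (hypothetical)
    },
    "no": "carbonara",                        # leaf
}


def decide(node, answers):
    """Walk from the root down to a leaf, following the answers at each node."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


print(decide(pasta_tree, {"vegetarian": False}))
print(decide(pasta_tree, {"vegetarian": True, "likes_mushrooms": True}))
```

In this representation, pruning would mean replacing a nested dictionary (a whole branch) with a single leaf string.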

Decision tree algorithms are often referred to as CART (Classification and Regression Trees). Decision trees can work with categorical or numerical data.

Regression trees are used when the target variable is numerical.
Classification trees are used when the target variable is categorical (classes).
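One way to see the difference is to look at what a leaf predicts. A common convention, which I assume here, is that a classification leaf returns the most frequent class among the training samples that reached it, while a regression leaf returns their mean. The function names and sample values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most frequent class among the samples in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the average target value of the samples in the leaf."""
    return mean(values)


print(classification_leaf(["carbonara", "carbonara", "mushrooms"]))  # prints carbonara
print(regression_leaf([10.0, 12.0, 14.0]))                           # prints 12.0
```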

Decision trees are quite intuitive to understand and use. That is why tree diagrams are commonly applied in a broad range of industries and disciplines. GreyAtom provides a broad overview of different types of decision trees and their practical applications.

SVM (Support Vector Machine)

Support vector machines are another group of algorithms used for classification and, sometimes, regression tasks. SVMs are popular because they often give quite accurate results with comparatively little computation power.

The goal of the SVM is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly separates the data points of different classes. The accuracy of the results depends on which hyperplane we choose. We should find the one with the maximum margin, that is, the greatest distance to the nearest data points of both classes.

With two features, this hyperplane can be drawn as a line that separates one class from another. Data points that fall on different sides of the hyperplane are attributed to different classes.
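The sketch below does not train an SVM. It only shows how a hyperplane that has already been chosen assigns classes and how the margin can be measured. The hyperplane is described by a weight vector `w` and an offset `b`. Both values, as well as the function names and sample points, are made up for illustration.

```python
import math


def decision_value(x, w, b):
    """Signed score w·x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def side_of_hyperplane(x, w, b):
    """Assign class +1 or -1 depending on the side the point falls on."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def distance_to_hyperplane(x, w, b):
    """Geometric distance from the point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(x, w, b)) / norm


def margin(points, w, b):
    """Distance from the hyperplane to the closest point; an SVM tries to maximize this."""
    return min(distance_to_hyperplane(x, w, b) for x in points)


# Hypothetical hyperplane in two dimensions: 3*x1 + 4*x2 - 5 = 0.
w, b = [3.0, 4.0], -5.0
points = [(3.0, 4.0), (0.0, 0.0)]

print([side_of_hyperplane(p, w, b) for p in points])  # prints [1, -1]
print(margin(points, w, b))                           # prints 1.0
```

Training an SVM amounts to searching for the `w` and `b` that make this margin as large as possible while keeping the classes on opposite sides.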
