The 10 Algorithms every Machine Learning Engineer should know

This article introduces you to the top 10 algorithms that every machine learning engineer must know.

Computers are able to see, hear and learn. Welcome to the future.

And Machine Learning is the future. According to Forbes, machine learning patents grew at a 34% rate between 2013 and 2017, and this is only set to increase in the coming years. Moreover, a Harvard Business Review article called Data Scientist the “Sexiest Job of the 21st Century” (and that’s incentive right there!).

In these highly dynamic times, various machine learning algorithms have been developed to solve complex real-world problems. These algorithms are highly automated and self-modifying: they continue to improve over time as more data is added, with minimal human intervention required. So this article deals with the top 10 machine learning algorithms.

But before diving into the algorithms themselves, let’s briefly cover the different types they can belong to.

Types of Machine Learning Algorithms –

Machine Learning algorithms can be classified into 3 different types, namely:

Supervised Machine Learning Algorithms:

Imagine a teacher supervising a class. The teacher already knows the correct answers but the learning process doesn’t stop until the students learn the answers as well (poor kids!). This is the essence of Supervised Machine Learning Algorithms. Here, the algorithm is the student that learns from a training dataset and makes predictions that are corrected by the teacher. This learning process continues until the algorithm achieves the required level of performance.

Unsupervised Machine Learning Algorithms:

In this case, there is no teacher for the class and the poor students are left to learn for themselves! This means that for Unsupervised Machine Learning Algorithms, there is no specific answer to be learned and there is no teacher. The algorithm is left unsupervised to find the underlying structure in the data in order to learn more and more about the data itself.

Reinforcement Machine Learning Algorithms:

Well, here the hypothetical students learn from their own mistakes over time (that’s like life!). So Reinforcement Machine Learning Algorithms learn optimal actions through trial and error. This means that the algorithm decides the next action based on its current state, learning behaviors that will maximize the reward in the future.
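
To make this concrete, here is a minimal Q-learning sketch on an invented toy world (every number here is made up for illustration): the agent tries actions at random, observes rewards, and gradually learns which action is best in each state.

import numpy as np

# Toy world: 5 states in a row, 2 actions (0 = left, 1 = right),
# and a reward only for reaching the last state.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # expected future reward per (state, action)
alpha, gamma = 0.1, 0.9               # learning rate and discount factor
rng = np.random.default_rng(0)

for _ in range(500):                  # episodes of trial and error
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions)            # try an action at random
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # nudge the estimate toward reward plus discounted best future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q[:-1].argmax(axis=1))  # best learned action per non-terminal state: [1 1 1 1]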

Top Machine Learning Algorithms

There are specific machine learning algorithms that were developed to handle complex real-world data problems. So, now that we have seen the types of machine learning algorithms, let’s study the top machine learning algorithms that exist and are actually used by data scientists.

1. Naïve Bayes Classifier Algorithm –

What would happen if you had to classify texts such as web pages, documents or emails manually? Well, you would go mad! But thankfully this task is performed by the Naïve Bayes Classifier Algorithm. This algorithm is based on the Bayes Theorem of Probability (you probably read that in maths) and it assigns an element to one of the available categories.

Bayes’ Theorem lets us compute the probability of a class given the observed features:

P(y|X) = P(X|y) · P(y) / P(X)

where y is the class variable and X = (x1, x2, …, xn) is a dependent feature vector of size n. The “naïve” part is the assumption that the features are independent of each other given the class, so P(y|x1, …, xn) works out proportional to P(y) · P(x1|y) · … · P(xn|y).

An example of the Naïve Bayes Classifier Algorithm usage is for Email Spam Filtering. Gmail uses this algorithm to classify an email as Spam or Not Spam.
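
Here is a rough sketch of how such a filter might be built with scikit-learn’s MultinomialNB (the emails and labels are invented purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money claim your prize", "lunch with the team"]
labels = [1, 0, 1, 0]                    # 1 = Spam, 0 = Not Spam

vectorizer = CountVectorizer()           # bag-of-words word counts
X = vectorizer.fit_transform(emails)

model = MultinomialNB().fit(X, labels)   # learns P(word | class) and P(class)
print(model.predict(vectorizer.transform(["claim your free prize now"])))  # [1]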

2. K Means Clustering Algorithm –

Let’s imagine that you want to search the term “date” on Wikipedia. Now, “date” can refer to a fruit, a particular day or even a romantic evening with your love!!! So Wikipedia groups the web pages that talk about the same ideas using the K Means Clustering Algorithm (since it is a popular algorithm for cluster analysis).

The K Means Clustering Algorithm partitions a given data set into K clusters. In this manner, the output contains K clusters with the input data partitioned among them (just as the pages with the different “date” meanings were partitioned).
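
Here is a tiny sketch with scikit-learn (the 2-D points are invented; a real search engine would cluster high-dimensional document vectors):

import numpy as np
from sklearn.cluster import KMeans

# Six invented 2-D points forming two obvious groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # which of the K clusters each point fell into
print(kmeans.cluster_centers_)  # the centre of each cluster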

3. Support Vector Machine Algorithm –

The Support Vector Machine Algorithm is used for classification or regression problems. In this, the data is divided into different classes by finding a boundary (a hyperplane) which separates the data set into classes. The Support Vector Machine Algorithm tries to find the hyperplane that maximizes the distance between the classes (known as margin maximization), as this increases the likelihood of classifying new data accurately.
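
Here is a minimal sketch of margin maximization on synthetic data, assuming scikit-learn:

from sklearn import svm
from sklearn.datasets import make_blobs

# Two synthetic, separable groups of points (illustration only)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

clf = svm.SVC(kernel="linear", C=1000)  # a large C prioritises correct separation
clf.fit(X, y)

# The support vectors are the points that pin down the maximum-margin hyperplane
print(clf.support_vectors_)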

An example of the Support Vector Machine Algorithm usage is the comparison of stock performance for stocks in the same sector. This helps financial institutions make investment decisions.

4. Apriori Algorithm –

The Apriori Algorithm generates association rules in the IF-THEN format. This means that IF event A occurs, THEN event B also occurs with a certain probability. For example: IF a person buys a car, THEN they also buy car insurance. The Apriori Algorithm generates this association rule by observing the number of people who bought car insurance after buying a car.
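
Here is a sketch of how such a rule could be mined, assuming the third-party mlxtend library and a made-up table of transactions:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Invented one-hot transactions: did each customer buy each item?
transactions = pd.DataFrame({
    'car':           [True, True, True, False, True],
    'car_insurance': [True, True, False, False, True],
    'bicycle':       [False, False, True, True, False],
})

frequent = apriori(transactions, min_support=0.4, use_colnames=True)  # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])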

An example of the Apriori Algorithm usage is for Google auto-complete. When a word is typed in Google, the Apriori Algorithm looks for the associated words that are usually typed after that word and displays the possibilities.

5. Linear Regression Algorithm –

The Linear Regression Algorithm shows the relationship between an independent and a dependent variable. It demonstrates the impact on the dependent variable when the independent variable is changed. The independent variable is called the explanatory variable and the dependent variable is called the factor of interest.

An example of the Linear Regression Algorithm usage is for risk assessment in the insurance domain. Linear Regression analysis can be used to find the number of claims for customers of multiple ages and then deduce how the risk increases as the age of the customer increases.
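
Here is a toy sketch of that insurance example with invented numbers, assuming scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

# Invented figures: customer age (explanatory) vs. number of claims (interest)
ages = np.array([[20], [30], [40], [50], [60]])
claims = np.array([1, 2, 2, 4, 5])

reg = LinearRegression().fit(ages, claims)
print(reg.coef_[0], reg.intercept_)  # slope and intercept of the fitted line
print(reg.predict([[45]]))           # expected claims for a 45-year-old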

6. Logistic Regression Algorithm –

The Logistic Regression Algorithm deals in discrete values, whereas the Linear Regression Algorithm handles predictions in continuous values. So Logistic Regression is suited for binary classification, wherein if an event occurs it is classified as 1 and if not, it is classified as 0. Hence, the probability of a particular event occurring is predicted based on the given predictor variables.

An example of the Logistic Regression Algorithm usage is in politics to predict if a particular candidate will win or lose a political election.
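
A sketch of that election example with invented data, assuming scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: campaign spend vs. election outcome (1 = won, 0 = lost)
spend = np.array([[10], [20], [35], [50], [65], [80]])
won = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(spend, won)
print(clf.predict([[45]]))        # predicted class, 0 or 1
print(clf.predict_proba([[45]]))  # probabilities of losing and winning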

7. Decision Trees Algorithm –

Suppose that you want to decide the venue for your birthday. So there are many questions that factor in your decision such as “Is the restaurant Italian?”, “Does the restaurant have live music?”, “Is the restaurant close to your house?” etc. Each of these questions has a YES or NO answer that contributes to your decision.

This is basically what happens in the Decision Trees Algorithm. Here all possible outcomes of a decision are shown using a tree branching methodology. The internal nodes are tests on various attributes, the branches of the tree are the outcomes of those tests, and the leaf nodes represent the final decision made after computing all of the attributes.

An example of the Decision Trees Algorithm usage is in the banking industry to classify loan applicants by their probability of defaulting on their loan payments.
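
A sketch of such a classifier on invented applicant data, printing the learned YES/NO tests:

from sklearn.tree import DecisionTreeClassifier, export_text

# Invented applicants: [income in $k, has existing debt]; label 1 = defaulted
X = [[25, 1], [60, 0], [40, 1], [80, 0], [30, 1], [75, 1]]
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=['income_k', 'has_debt']))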

8. Random Forests Algorithm –

The Random Forests Algorithm handles one of the main limitations of the Decision Trees Algorithm: a single tree tends to overfit the training data, so the accuracy of its predictions decreases as the number of decisions in the tree increases.

So, in the Random Forests Algorithm, multiple decision trees are built, each from a random subset of the data. Each individual tree is grown using the CART (Classification and Regression Trees) method. In the end, the final prediction of the Random Forests Algorithm is obtained by polling the results of all the decision trees (a majority vote for classification, an average for regression).
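
A sketch on synthetic data, assuming scikit-learn (make_classification simply generates random labelled points for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for real measurements
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))        # majority vote across the 100 trees
print(forest.predict_proba(X[:3]))  # fraction of trees voting for each class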

An example of the Random Forests Algorithm usage is in the automobile industry to predict the future breakdown of any particular automobile part.

9. K Nearest Neighbours Algorithm –

The K Nearest Neighbours Algorithm divides the data points into different classes based on a similarity measure such as a distance function. A prediction is then made for a new data point by searching through the entire data set for the K most similar instances (the neighbours) and summarizing the output variable for these K instances. For regression problems this might be the mean of the outcomes, and for classification problems this might be the mode (the most frequent class).

The K Nearest Neighbours Algorithm can require a lot of memory or space to store all of the data, but only performs a calculation (or learns) when a prediction is needed, just in time.
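
A minimal sketch with invented 1-D data, assuming scikit-learn:

from sklearn.neighbors import KNeighborsClassifier

# Invented points in two classes
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # "fit" here only stores the data
print(knn.predict([[4]]))  # mode of the 3 nearest neighbours' labels -> [0]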

10. Artificial Neural Networks Algorithm –

The human brain contains neurons that are the basis of our retentive power and sharp wit (at least for some of us!). So Artificial Neural Networks try to replicate the neurons in the human brain by creating interconnected nodes. These nodes take in information from other neurons, perform various operations as required and then pass the result on as output to other neurons.
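
A small sketch using scikit-learn’s multi-layer perceptron (a simple neural network) on the 8x8 handwritten-digit images bundled with the library:

from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

digits = load_digits()  # 8x8 images of handwritten digits and their labels

# One hidden layer of 32 interconnected nodes ("neurons")
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
mlp.fit(digits.data, digits.target)
print(mlp.score(digits.data, digits.target))  # accuracy on the training images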

An example of Artificial Neural Networks is human facial recognition. Images with human faces can be identified and differentiated from “non-facial” images. However, this could take multiple hours depending on the number of images in the database, whereas the human mind can do this instantly.

Machine Learning Full Course - Learn Machine Learning

This complete Machine Learning full course video covers all the topics that you need to know to become a master in the field of Machine Learning.

It covers all the basics of Machine Learning (01:46), the different types of Machine Learning (18:32), and the various applications of Machine Learning used in different industries (04:54:48). This video will help you learn different Machine Learning algorithms in Python. Linear Regression, Logistic Regression (23:38), K Means Clustering (01:26:20), Decision Tree (02:15:15), and Support Vector Machines (03:48:31) are some of the important algorithms you will understand with a hands-on demo. Finally, you will see the essential skills required to become a Machine Learning Engineer (04:59:46) and come across a few important Machine Learning interview questions (05:09:03). Now, let's get started with Machine Learning.

The topics below are explained in this Machine Learning course for beginners:

  1. Basics of Machine Learning - 01:46

  2. Why Machine Learning - 09:18

  3. What is Machine Learning - 13:25

  4. Types of Machine Learning - 18:32

  5. Supervised Learning - 18:44

  6. Reinforcement Learning - 21:06

  7. Supervised VS Unsupervised - 22:26

  8. Linear Regression - 23:38

  9. Introduction to Machine Learning - 25:08

  10. Application of Linear Regression - 26:40

  11. Understanding Linear Regression - 27:19

  12. Regression Equation - 28:00

  13. Multiple Linear Regression - 35:57

  14. Logistic Regression - 55:45

  15. What is Logistic Regression - 56:04

  16. What is Linear Regression - 59:35

  17. Comparing Linear & Logistic Regression - 01:05:28

  18. What is K-Means Clustering - 01:26:20

  19. How does K-Means Clustering work - 01:38:00

  20. What is Decision Tree - 02:15:15

  21. How does Decision Tree work - 02:25:15 

  22. Random Forest Tutorial - 02:39:56

  23. Why Random Forest - 02:41:52

  24. What is Random Forest - 02:43:21

  25. How does Random Forest work - 02:52:02

  26. K-Nearest Neighbors Algorithm Tutorial - 03:22:02

  27. Why KNN - 03:24:11

  28. What is KNN - 03:24:24

  29. How do we choose 'K' - 03:25:38

  30. When do we use KNN - 03:27:37

  31. Applications of Support Vector Machine - 03:48:31

  32. Why Support Vector Machine - 03:48:55

  33. What is Support Vector Machine - 03:50:34

  34. Advantages of Support Vector Machine - 03:54:54

  35. What is Naive Bayes - 04:13:06

  36. Where is Naive Bayes used - 04:17:45

  37. Top 10 Application of Machine Learning - 04:54:48

  38. How to become a Machine Learning Engineer - 04:59:46

  39. Machine Learning Interview Questions - 05:09:03

Machine Learning | Machine Learning Guide for Beginners

Machine learning problems can generally be divided into three types: classification and regression, which are forms of supervised learning, and unsupervised learning, which in the context of machine learning applications often refers to clustering.

In the following article, I am going to give a brief introduction to each of these three problem types and will include a walkthrough using the popular Python library scikit-learn.

Before I start, I’ll give a brief explanation of the meaning behind the terms supervised and unsupervised learning.

Supervised Learning: In supervised learning, you have a known set of inputs (features) and a known set of outputs (labels). Traditionally these are known as X and y. The goal of the algorithm is to learn the mapping function that maps the input to the output, so that when given new examples of X the machine can correctly predict the corresponding y labels.

Unsupervised Learning: In unsupervised learning, you only have a set of inputs (X) and no corresponding labels (y). The goal of the algorithm is to find previously unknown patterns in the data. Quite often these algorithms are used to find meaningful clusters of similar samples of X so in effect finding the categories intrinsic to the data.

Classification

In classification, the outputs (y) are categories. These can be binary: for example, classifying spam email vs not spam email. They can also be multiple categories, such as classifying species of flowers; this is known as multiclass classification.

Let’s walk through a simple example of classification using scikit-learn. If you don’t already have this installed it can be installed either via pip or conda as outlined here.

Scikit-learn has a number of datasets that can be directly accessed via the library. For ease in this article, I will be using these example datasets throughout. To illustrate classification I will use the wine dataset which is a multiclass classification problem. In the dataset, the inputs (X) consist of 13 features relating to various properties of each wine type. The known outputs (y) are wine types which in the dataset have been given a number 0, 1 or 2.

The imports I am using for all the code in this article are shown below.

import pandas as pd
import numpy as np
from sklearn.datasets import load_wine
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import f1_score
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn import linear_model
from sklearn.linear_model import ElasticNetCV
from sklearn.svm import SVR
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from yellowbrick.cluster import SilhouetteVisualizer

In the below code I am downloading the data and converting it to a pandas data frame.

wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['TARGET'] = pd.Series(wine.target)

The next stage in a supervised learning problem is to split the data into test and train sets. The train set can be used by the algorithm to learn the mapping between inputs and outputs, and then the reserved test set can be used to evaluate how well the model has learned this mapping. In the below code I am using the scikit-learn model_selection function train_test_split to do this.

X_w = wine_df.drop(['TARGET'], axis=1)
y_w = wine_df['TARGET']
X_train_w, X_test_w, y_train_w, y_test_w = train_test_split(X_w, y_w, test_size=0.2)

In the next step, we need to choose the algorithm best suited to learn the mapping in our chosen dataset. In scikit-learn there are many different algorithms to choose from, all of which use different functions and methods to learn the mapping; you can view the full list here.

To determine the best model I am running the following code. I am training the model using a selection of algorithms and obtaining the F1-score for each one. The F1 score, which is the harmonic mean of precision and recall (F1 = 2 × precision × recall / (precision + recall)), is a good indicator of the overall accuracy of a classifier. I have written a detailed description of the various metrics that can be used to evaluate a classifier here.

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="rbf", C=0.025, probability=True),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
    ]
for classifier in classifiers:
    model = classifier
    model.fit(X_train_w, y_train_w)  
    y_pred_w = model.predict(X_test_w)
    print(classifier)
    print("model score: %.3f" % f1_score(y_test_w, y_pred_w, average='weighted'))

A perfect F1 score would be 1.0, therefore, the closer the number is to 1.0 the better the model performance. The results above suggest that the Random Forest Classifier is the best model for this dataset.

Regression

In regression, the outputs (y) are continuous values rather than categories. An example of regression would be predicting how many sales a store may make next month, or what the future price of your house might be.

Again, to illustrate regression I will use a dataset from scikit-learn known as the Boston housing dataset. This consists of 13 features (X) which are various properties of a house, such as the number of rooms, the age and the crime rate for the location. The output (y) is the price of the house.

I am loading the data using the code below and splitting it into test and train sets using the same method I used for the wine dataset.

boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['TARGET'] = pd.Series(boston.target)
X_b = boston_df.drop(['TARGET'], axis=1)
y_b = boston_df['TARGET']
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X_b, y_b, test_size=0.2)

We can use this cheat sheet to see the available algorithms suited to regression problems in scikit-learn. We will use similar code to the classification problem to loop through a selection and print out the scores for each.

There are a number of different metrics used to evaluate regression models. These are all essentially error metrics and measure the difference between the actual and predicted values achieved by the model. I have used the root mean squared error (RMSE). For this metric, the closer to zero the value is the better the performance of the model. This article gives a really good explanation of error metrics for regression problems.
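
For reference, the RMSE is simply the square root of the average squared difference between the predicted values ŷᵢ and the actual values yᵢ:

RMSE = √( (1/n) Σᵢ (yᵢ − ŷᵢ)² )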

regressors = [
    linear_model.Lasso(alpha=0.1),
    linear_model.LinearRegression(),
    ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0),
    SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma='auto'),
    linear_model.Ridge(alpha=.5)
    ]
for regressor in regressors:
    model = regressor
    model.fit(X_train_b, y_train_b)
    y_pred_b = model.predict(X_test_b)
    print(regressor)
    print("root mean squared error: %.3f" % sqrt(mean_squared_error(y_test_b, y_pred_b)))

The RMSE scores suggest that the linear regression and ridge regression algorithms perform best for this dataset.

Unsupervised learning

There are a number of different types of unsupervised learning, but for simplicity here I am going to focus on clustering methods. There are many different algorithms for clustering, all of which use slightly different techniques to find clusters of inputs.

Probably one of the most widely used methods is K-means. This algorithm performs an iterative process whereby a specified number of randomly initialised means (centroids) is generated. The distance from each data point to the centroids is calculated, usually using Euclidean distance, and each point is assigned to its nearest centroid, creating clusters of similar values. The centroid of each cluster then becomes the new mean, and this process is repeated until the result converges.
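
To make that loop concrete, here is a minimal NumPy sketch of the iterative process (a simplified illustration, not the scikit-learn implementation used below; it assumes no cluster ever ends up empty):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # start from k randomly chosen data points as the initial means
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to its nearest mean
        # each centroid becomes the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments are stable: converged
        centroids = new_centroids
    return labels, centroids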

Let’s use the wine dataset we used in the classification task, with the y labels removed, and see how well the k-means algorithm can identify the wine types from the inputs.

As we are only using the inputs for this model I am splitting the data into test and train using a slightly different method.

np.random.seed(0)
msk = np.random.rand(len(X_w)) < 0.8
train_w = X_w[msk]
test_w = X_w[~msk]

As Kmeans is reliant on the distance metric to determine the clusters it is usually necessary to perform feature scaling (ensuring that all features have the same scale) before training the model. In the below code I am using the MinMaxScaler to scale the features so that all values fall between 0 and 1.

x = train_w.values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
X_scaled = pd.DataFrame(x_scaled,columns=train_w.columns)

With K-means you have to specify the number of clusters the algorithm should use. So one of the first steps is to identify the optimum number of clusters. This is achieved by iterating through a number of values of k and plotting the results on a chart. This is known as the Elbow method as it typically produces a plot with a curve that looks a little like the curve of your elbow. The yellowbrick library (which is a great library for visualising scikit-learn models and can be pip installed) has a really nice plot for this. The code below produces this visualisation.

model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,8))
visualizer.fit(X_scaled)       
visualizer.show()

Ordinarily, we wouldn’t already know how many categories a dataset contains when using a clustering technique. However, in this case we know that there are three wine types in the data, and the curve has correctly selected three as the optimum number of clusters to use in the model.

The next step is to initialise the K-means algorithm and fit the model to the training data and evaluate how effectively the algorithm has clustered the data.

One method used for this is known as the silhouette score. This measures the consistency of values within the clusters. Or in other words how similar to each other the values in each cluster are, and how much separation there is between the clusters. The silhouette score is calculated for each value and will range from -1 to +1. These values are then plotted to form a silhouette plot. Again yellowbrick provides a simple way to construct this type of plot. The code below creates this visualisation for the wine dataset.

model = KMeans(3, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
visualizer.fit(X_scaled)      
visualizer.show()

A silhouette plot can be interpreted in the following way:

  • The closer the mean score (which is the red dotted line in the above) is to +1 the better matched the data points are within the cluster.
  • Data points with a score of 0 are very close to the decision boundary for another cluster (so the separation is low).
  • Negative values indicate that the data points may have been assigned to the wrong cluster.
  • The width of each cluster should be reasonably uniform; if it isn’t, an incorrect value of k may have been used.

The plot for the wine data set above shows that cluster 0 may not be as consistent as the others due to most data points being below the average score and a few data points having a score below 0.

Silhouette scores can be particularly useful in comparing one algorithm against another or different values of k.

In this post, I wanted to give a brief introduction to each of the three types of machine learning. There are many other steps involved in all of these processes including feature engineering, data processing and hyperparameter optimisation to determine both the best data preprocessing techniques and the best models to use.

Thanks for reading!