The term ‘machine learning’ is used a lot these days. So, what exactly is it and why is it so popular? Simply put, machine learning is a sub-field of artificial intelligence, where we teach a machine how to learn with the help of input data.
Machine Learning Tutorial: What is Machine Learning?
You have probably come across the term machine learning and wondered what exactly it is. This machine learning tutorial will clear up that confusion!
Machine learning is a field of artificial intelligence with the help of which you can perform magic! Yes, you read it right. Let’s take some real-life examples to understand this. I believe all of you must have heard of Google’s self-driving car. A car which drives by itself without any human support; that is just amazing, isn’t it?
Now, how about virtual personal assistants such as Apple’s Siri or Microsoft’s Cortana? If you ask Siri for the distance between the Earth and the Moon, it will immediately reply that the distance is about 384,400 km.
You must also have used Google Maps. If you want to go from New Jersey to New York by road, Google Maps will show you the distance between the two places, the shortest route and how much traffic there is along the way.
Now, you would agree with me that all of these are some magical applications, and the magic behind these applications is machine learning. So, simply put, machine learning is a sub-domain of artificial intelligence, where a machine is provided data to learn and make insightful decisions.
Now that we have understood what machine learning is, let’s go ahead in this machine learning tutorial and look at the types of machine learning algorithms:
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning
Now, let’s go ahead and understand each of these machine learning algorithms in detail.
Machine Learning Tutorial: Supervised Learning
In supervised learning, the machine learns from labelled data, i.e. the result for each input is already known. In other words, there is an input variable and an output variable, and we have to learn a function that maps the input to the output. The input variable is known as the independent variable and the output variable is known as the dependent variable.
Let’s take this example to understand supervised learning in a better way.
So, this is an apple, isn’t it? Now, how do you know, it’s an apple? Well, as a kid, you would have come across an apple and you were told that it’s an apple and your brain learnt that anything which looks like that is an apple.
Now, let’s apply the same analogy to a machine. Let’s say we feed in different images of apples to the machine and all of these images have the label “apple” associated with them.
Similarly, we will feed in different images of oranges to the machine and all of these images would have the label “orange” associated with them. So, here we are feeding in input data to the machine which is labelled.
This part of supervised learning, where the machine learns the features of the input data along with its labels, is known as ‘training’.
Once the training is done, the machine is fed new data, or test data, to determine how well the training has worked.
So, if we feed this new image of an orange to the machine without its label, the machine should be able to predict the correct label based on its training.
This is the concept of supervised learning, where we train the machine using labelled data and then use this training to find new insights.
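The train-then-predict idea above can be sketched in a few lines of plain Python. The two numeric features (redness and roundness, each between 0 and 1) and all of their values are made up for illustration, and the nearest-neighbour rule is only one simple stand-in for the learning step; the tutorial does not prescribe a particular algorithm.

```python
import math

# Hypothetical labelled training data: (redness, roundness) -> label.
training_data = [
    ((0.90, 0.80), "apple"),
    ((0.85, 0.75), "apple"),
    ((0.80, 0.85), "apple"),
    ((0.30, 0.95), "orange"),
    ((0.35, 0.90), "orange"),
    ((0.25, 0.92), "orange"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda item: math.dist(item[0], features))
    return nearest[1]


# New, unlabelled examples (again as made-up feature values).
new_orange = (0.28, 0.93)
new_apple = (0.88, 0.78)
print(predict(new_orange), predict(new_apple))
```

Here "training" amounts to storing the labelled examples; most real models instead compress them into learned parameters.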
Supervised learning algorithms are of two types: regression and classification. Moving on in this machine learning tutorial, we will understand these two comprehensively.
Since regression is a supervised learning technique, there will be an input variable as well as an output variable. The point to keep in mind is that the output variable, i.e. the dependent variable, is a continuous numerical value.
Watch this complete Machine Learning Tutorial Video: https://www.youtube.com/watch?v=4gqZLajDWh8&feature=youtu.be
Let’s take this example to understand regression:
Let’s say you have two variables, “Number of hours studied” and “Number of marks scored”. Here we want to understand how the number of marks scored by a student changes with the number of hours studied, i.e. “Marks scored” is the dependent variable and “Hours studied” is the independent variable.
Now, based on this data, I want to know how many hours a student should study to score exactly 60 marks. This is where regression techniques come in. The regression model would learn that there is an increment of 10 marks for every extra hour studied, so to score 60 marks the student has to study for 6 hours.
Note that “marks scored” is the dependent variable and it is a continuous numerical value.
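As a rough sketch of the hours-and-marks example, the snippet below fits a straight line with NumPy and inverts it to find the hours needed for 60 marks. The five data points are an assumption chosen to match the "10 marks per extra hour" pattern described above; the original data table is not reproduced here.

```python
import numpy as np

# Assumed data consistent with the text: 10 extra marks per extra hour studied.
hours = np.array([1, 2, 3, 4, 5], dtype=float)
marks = np.array([10, 20, 30, 40, 50], dtype=float)

# Fit marks = slope * hours + intercept by least squares.
slope, intercept = np.polyfit(hours, marks, deg=1)

# Invert the fitted line to find the hours needed for a target score.
target_marks = 60
hours_needed = (target_marks - intercept) / slope
print(f"slope={slope:.1f}, hours for {target_marks} marks={hours_needed:.1f}")
```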
So, this is how regression algorithms work. Now, let’s move on to the next type of supervised learning algorithm: classification.
Classification algorithms also need both the input data as well as the output data. Here, the output variable or the dependent variable should be categorical in nature.
Let’s take this example to understand classification.
Consider these three variables: “Person has lung cancer or not”, “Weight of the person” and “Number of cigarettes smoked in a day”. Here, we want to determine whether a person has lung cancer based on their weight and the number of cigarettes they smoke in a day, i.e. “Having lung cancer” is the dependent variable and “Weight” and “Number of cigarettes smoked” are the independent variables.
Again, you need to note here that “Having lung cancer” is a categorical variable, which has two categories, “yes” and “No”. Based on the independent variables, we classify whether the person has lung cancer or not.
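To make the categorical output concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier. The six records are invented toy numbers, not real medical data, and the choice of classifier is an assumption made only for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up toy records: [weight in kg, cigarettes per day]. Not real medical data.
X = [[60, 0], [72, 2], [80, 0], [65, 20], [90, 25], [70, 30]]
y = ["no", "no", "no", "yes", "yes", "yes"]  # categorical dependent variable

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Predict the category for two new people.
predictions = model.predict([[75, 1], [68, 22]])
print(list(predictions))
```

The model returns one of the two categories for each new person rather than a number on a continuous scale, which is what separates classification from regression.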
Now, there are a variety of classification algorithms available such as:
Support Vector Machine
Let’s go ahead and understand one of these algorithms -> “Decision Tree”.
Decision Tree Classifier
The decision tree is a very popular machine learning classifier. As the name states, a decision tree has an inverted tree-like structure. The topmost node in the tree is known as the root node and the nodes at the bottom of the tree are known as the leaf nodes. Every internal node has a test condition and, based on the result of that test, the tree branches to either its left child or its right child.
Let’s go through this example on decision tree. Here, we are trying to determine whether a person would watch the movie “Avengers” based on a series of test conditions.
Here, the test condition on the root node is “Likes action films”. If it evaluates to true, you go to the left child, else to the right child. If you do like action films, then on the left child there is another test condition, “Movie length greater than 2 hours”. If this evaluates to true, you again go to the left child, i.e. you are fine watching a movie which is longer than 2 hours. On that left child there is another test condition, “Likes Robert Downey Jr”, and if this evaluates to true, it means that the person is interested in watching “Avengers”. So, this is how a decision tree classifier works.
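The walk from root to leaf can be sketched with a nested dictionary and a short loop. Only the all-"yes" path is described in the text; the outcome on every "no" branch, and the key names used below, are assumptions added so that the example is complete.

```python
# Each internal node holds a test condition; each leaf holds a prediction.
# The "no" leaves are assumed, since the text only describes the all-"yes" path.
tree = {
    "test": "likes_action_films",
    "yes": {
        "test": "ok_with_movies_over_2_hours",
        "yes": {
            "test": "likes_robert_downey_jr",
            "yes": "will watch Avengers",
            "no": "will not watch Avengers",
        },
        "no": "will not watch Avengers",
    },
    "no": "will not watch Avengers",
}


def classify(node, person):
    """Walk from the root to a leaf, following the answer to each test."""
    while isinstance(node, dict):
        branch = "yes" if person[node["test"]] else "no"
        node = node[branch]
    return node


fan = {
    "likes_action_films": True,
    "ok_with_movies_over_2_hours": True,
    "likes_robert_downey_jr": True,
}
skeptic = {
    "likes_action_films": False,
    "ok_with_movies_over_2_hours": True,
    "likes_robert_downey_jr": True,
}
print(classify(tree, fan))
print(classify(tree, skeptic))
```

A trained decision tree has the same shape; the difference is that the tests and their order are chosen automatically from labelled data instead of being written by hand.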
Now that we have understood what supervised learning is, let’s move ahead in this machine learning tutorial and understand unsupervised learning.
Machine Learning Tutorial: Unsupervised Learning
In unsupervised learning, the machine learns from unlabeled data, i.e. the result for the input data is not known beforehand. Here, the algorithm tries to determine the underlying structure of the data.
Now, let’s go through this example to see how unsupervised learning works.
Here, we have a bunch of fruits and none of these fruits have labels associated with them. Let’s take these fruits and feed them to an unsupervised learning model. The model determines the features associated with the data, finds that all the apples are similar in nature and thus groups them together. Similarly, it finds that all the bananas have similar features and groups them together, and the same is the case with all the mangoes.
So, even though there are no class labels associated with the data, the model is able to group the items into different clusters on the basis of their similarity.
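The fruit-grouping idea can be sketched with scikit-learn's KMeans. The two features (length and weight) and their values are hypothetical, and k-means is just one possible clustering algorithm; no labels are passed to the model at any point.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per fruit: [length in cm, weight in g]. No labels are given.
fruits = np.array([
    [7.0, 150.0], [7.2, 152.0], [6.9, 149.0],     # apple-like items
    [18.0, 120.0], [18.5, 122.0], [17.8, 119.0],  # banana-like items
    [11.0, 300.0], [11.4, 305.0], [10.8, 298.0],  # mango-like items
])

model = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = model.fit_predict(fruits)
print(cluster_ids)
```

The cluster numbers themselves are arbitrary. What matters is that similar items end up sharing the same number.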
These are some unsupervised learning algorithms:
Principal Component Analysis
Further in this machine learning tutorial, we go through the next type of machine learning algorithm – Semi-supervised learning.
Machine Learning Tutorial: Semi-Supervised Learning
In semi-supervised learning the machine learns from a combination of labelled and unlabeled data, i.e. you can consider semi-supervised learning to be an amalgamation of both supervised learning and unsupervised learning.
Let’s go through this example. Here, we have a bunch of different items: phones, apples, books and chairs. As you can see, only a small proportion of the items are labelled and the rest are unlabeled. The basic idea is to start off by grouping similar data together. So, all the phones would be put into one group, the apples into another, and the same is the case with the books and chairs.
Now we have four clusters containing similar data in them. Here, the algorithm assumes that all the data points which are in proximity tend to have the same label associated with them. Now, the semi-supervised algorithm uses the existing labelled data to assign labels to the rest of the unlabeled data.
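A minimal cluster-then-label sketch of this idea is shown below. The 2-D feature vectors are hypothetical, k-means is an assumed choice for the grouping step, and taking the majority of the known labels in each cluster is one simple way to spread them.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D feature vectors for 12 items in four well-separated groups.
X = np.array([
    [0.0, 0.0], [0.3, 0.1], [0.1, 0.4],        # phones
    [10.0, 0.0], [10.2, 0.3], [9.8, 0.1],      # apples
    [0.0, 10.0], [0.2, 10.3], [0.4, 9.9],      # books
    [10.0, 10.0], [10.3, 9.8], [9.9, 10.2],    # chairs
])
# Only one item per group is labelled; None marks an unlabelled item.
labels = ["phone", None, None, "apple", None, None,
          "book", None, None, "chair", None, None]

# Step 1: group similar items together.
cluster_ids = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Step 2: within each cluster, give every member the most common known label.
completed = list(labels)
for cluster in set(cluster_ids.tolist()):
    members = [i for i, c in enumerate(cluster_ids) if c == cluster]
    known = [labels[i] for i in members if labels[i] is not None]
    if known:
        majority = Counter(known).most_common(1)[0][0]
        for i in members:
            completed[i] = majority
print(completed)
```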
So, this is the underlying concept of semi-supervised learning. Now, in this machine learning tutorial, let’s head on to the final type of machine learning algorithm, which is reinforcement learning.
Machine Learning Tutorial: Reinforcement Learning
In reinforcement learning, the algorithm learns through a system of rewards and punishments, and the goal is to maximize the total reward. Let’s go through this example to understand reinforcement learning.
Here we have a self-driving car which is supposed to reach its destination without hitting any barricades. The self-driving car is the agent and the road is the environment.
Now, the car takes an action and goes straight, but when it goes straight, it directly hits the barricade. Since the car has taken a wrong action, it is punished.
So, the car learns that going straight is wrong and that it has to go right. When it goes right, it is given a reward. This process continues and the car learns how to drive by itself without hitting any barricades.
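The reward-and-punishment loop can be sketched as a tiny value-learning agent. This is a deliberately simplified, single-situation version (closer to a two-armed bandit than to a full driving task), and the reward values, learning rate and exploration rate are all assumptions chosen for illustration.

```python
import random

random.seed(0)

# Assumed rewards: hitting the barricade is punished, avoiding it is rewarded.
rewards = {"straight": -1.0, "right": 1.0}

# The agent's running estimate of how good each action is.
q_values = {action: 0.0 for action in rewards}

learning_rate = 0.5
exploration_rate = 0.2

for episode in range(50):
    if random.random() < exploration_rate:
        action = random.choice(list(rewards))      # explore: try a random action
    else:
        action = max(q_values, key=q_values.get)   # exploit: use the best estimate
    reward = rewards[action]
    # Move the estimate for the chosen action towards the reward received.
    q_values[action] += learning_rate * (reward - q_values[action])

best_action = max(q_values, key=q_values.get)
print(q_values, best_action)
```

After a few episodes the estimate for going right is higher than the estimate for going straight, so the agent prefers to go right.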
And this brings us to the end of this “Machine Learning Tutorial”. We covered what machine learning is and then looked at the types of machine learning.
Now, if you are interested in doing an end-to-end certification course in Machine Learning, you can check out Intellipaat’s Machine Learning Course with Python.
Originally published at www.intellipaat.com on August 26, 2019.
This complete Machine Learning full course video covers all the topics that you need to know to become a master in the field of Machine Learning.
Machine Learning Full Course | Learn Machine Learning | Machine Learning Tutorial
It covers all the basics of Machine Learning (01:46), the different types of Machine Learning (18:32), and the various applications of Machine Learning used in different industries (04:54:48). This video will help you learn different Machine Learning algorithms in Python. Linear Regression, Logistic Regression (23:38), K Means Clustering (01:26:20), Decision Tree (02:15:15), and Support Vector Machines (03:48:31) are some of the important algorithms you will understand with a hands-on demo. Finally, you will see the essential skills required to become a Machine Learning Engineer (04:59:46) and come across a few important Machine Learning interview questions (05:09:03). Now, let's get started with Machine Learning.
Below topics are explained in this Machine Learning course for beginners:
Basics of Machine Learning - 01:46
Why Machine Learning - 09:18
What is Machine Learning - 13:25
Types of Machine Learning - 18:32
Supervised Learning - 18:44
Reinforcement Learning - 21:06
Supervised VS Unsupervised - 22:26
Linear Regression - 23:38
Introduction to Machine Learning - 25:08
Application of Linear Regression - 26:40
Understanding Linear Regression - 27:19
Regression Equation - 28:00
Multiple Linear Regression - 35:57
Logistic Regression - 55:45
What is Logistic Regression - 56:04
What is Linear Regression - 59:35
Comparing Linear & Logistic Regression - 01:05:28
What is K-Means Clustering - 01:26:20
How does K-Means Clustering work - 01:38:00
What is Decision Tree - 02:15:15
How does Decision Tree work - 02:25:15
Random Forest Tutorial - 02:39:56
Why Random Forest - 02:41:52
What is Random Forest - 02:43:21
How does Decision Tree work - 02:52:02
K-Nearest Neighbors Algorithm Tutorial - 03:22:02
Why KNN - 03:24:11
What is KNN - 03:24:24
How do we choose 'K' - 03:25:38
When do we use KNN - 03:27:37
Applications of Support Vector Machine - 03:48:31
Why Support Vector Machine - 03:48:55
What is Support Vector Machine - 03:50:34
Advantages of Support Vector Machine - 03:54:54
What is Naive Bayes - 04:13:06
Where is Naive Bayes used - 04:17:45
Top 10 Application of Machine Learning - 04:54:48
How to become a Machine Learning Engineer - 04:59:46
Machine Learning Interview Questions - 05:09:03
Machine learning problems can generally be divided into three types: classification and regression, which are both forms of supervised learning, and unsupervised learning, which in the context of machine learning applications often refers to clustering.
In the following article, I am going to give a brief introduction to each of these three problems and will include a walkthrough in the popular Python library scikit-learn.
Before I start I’ll give a brief explanation for the meaning behind the terms supervised and unsupervised learning.
Supervised Learning: In supervised learning, you have a known set of inputs (features) and a known set of outputs (labels). Traditionally these are known as X and y. The goal of the algorithm is to learn the function that maps the inputs to the outputs, so that when given new examples of X the machine can correctly predict the corresponding y labels.
Unsupervised Learning: In unsupervised learning, you only have a set of inputs (X) and no corresponding labels (y). The goal of the algorithm is to find previously unknown patterns in the data. Quite often these algorithms are used to find meaningful clusters of similar samples of X so in effect finding the categories intrinsic to the data.
In classification, the outputs (y) are categories. These can be binary, for example, if we were classifying spam email vs not spam email. There can also be more than two categories, such as when classifying species of flowers. This is known as multiclass classification.
Let’s walk through a simple example of classification using scikit-learn. If you don’t already have this installed it can be installed either via pip or conda as outlined here.
Scikit-learn has a number of datasets that can be directly accessed via the library. For ease in this article, I will be using these example datasets throughout. To illustrate classification I will use the wine dataset which is a multiclass classification problem. In the dataset, the inputs (X) consist of 13 features relating to various properties of each wine type. The known outputs (y) are wine types which in the dataset have been given a number 0, 1 or 2.
The imports I am using for all the code in this article are shown below.
import pandas as pd
import numpy as np
from math import sqrt

from sklearn.datasets import load_wine
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import f1_score
from sklearn.metrics import mean_squared_error

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

from sklearn import linear_model
from sklearn.linear_model import ElasticNetCV
from sklearn.svm import SVR

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from yellowbrick.cluster import SilhouetteVisualizer
In the below code I am downloading the data and converting to a pandas data frame.
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['TARGET'] = pd.Series(wine.target)
The next stage in a supervised learning problem is to split the data into test and train sets. The train set can be used by the algorithm to learn the mapping between inputs and outputs, and then the reserved test set can be used to evaluate how well the model has learned this mapping. In the below code I am using the scikit-learn model_selection function train_test_split to do this.
X_w = wine_df.drop(['TARGET'], axis=1)
y_w = wine_df['TARGET']
X_train_w, X_test_w, y_train_w, y_test_w = train_test_split(X_w, y_w, test_size=0.2)
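Because train_test_split shuffles the rows, the split above is different on every run. The sketch below shows the same 80/20 split with two optional arguments added: random_state to make the shuffle repeatable and stratify to keep the class proportions similar in both sets. Both arguments are suggestions and are not used in the rest of the article.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# random_state fixes the shuffle; stratify=y keeps the wine-type proportions
# similar in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

With 178 wines in the dataset, this leaves 142 rows for training and 36 for testing.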
In the next step, we need to choose the algorithm that is best suited to learning the mapping in the chosen dataset. In scikit-learn there are many different algorithms to choose from, all of which use different functions and methods to learn the mapping. You can view the full list here.
To determine the best model I am running the following code. I am training a model with each of a selection of algorithms and obtaining the F1 score for each one. The F1 score is the harmonic mean of precision and recall, which makes it a good single-number indicator of a classifier’s overall performance. I have written a detailed description of the various metrics that can be used to evaluate a classifier here.
classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="rbf", C=0.025, probability=True),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
]

for classifier in classifiers:
    model = classifier
    model.fit(X_train_w, y_train_w)
    y_pred_w = model.predict(X_test_w)
    print(classifier)
    print("model score: %.3f" % f1_score(y_test_w, y_pred_w, average='weighted'))
A perfect F1 score would be 1.0, therefore, the closer the number is to 1.0 the better the model performance. The results above suggest that the Random Forest Classifier is the best model for this dataset.
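To show what average='weighted' computes, here is a small hand-rolled version in plain Python. It is a sketch for understanding rather than a replacement for f1_score, and the six example labels are made up.

```python
def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    total = 0.0
    for label in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        support = tp + fn  # number of true examples of this class
        total += f1 * support
    return total / len(y_true)


# Made-up labels for three wine types.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

In this example the per-class F1 scores are 0.5, 0.8 and about 0.667, and each class has two true examples, so the weighted score is their plain average.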
In regression, the outputs (y) are continuous values rather than categories. An example of regression would be predicting how many sales a store may make next month, or what the future price of your house might be.
Again, to illustrate regression I will use a dataset from scikit-learn known as the Boston housing dataset. This consists of 13 features (X) which are various properties of a house and its location, such as the number of rooms, the age and the local crime rate. The output (y) is the price of the house.
I am loading the data using the code below and splitting it into test and train sets using the same method I used for the wine dataset.
boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['TARGET'] = pd.Series(boston.target)

X_b = boston_df.drop(['TARGET'], axis=1)
y_b = boston_df['TARGET']
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X_b, y_b, test_size=0.2)
We can use this cheat sheet to see the available algorithms suited to regression problems in scikit-learn. We will use similar code to the classification problem to loop through a selection and print out the scores for each.
There are a number of different metrics used to evaluate regression models. These are all essentially error metrics and measure the difference between the actual and predicted values achieved by the model. I have used the root mean squared error (RMSE). For this metric, the closer to zero the value is the better the performance of the model. This article gives a really good explanation of error metrics for regression problems.
regressors = [
    linear_model.Lasso(alpha=0.1),
    linear_model.LinearRegression(),
    # Arguments that only restated defaults are omitted here. normalize=False and
    # gamma='auto_deprecated' are no longer accepted by current scikit-learn releases.
    ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0),
    SVR(kernel='rbf', C=1.0, epsilon=0.1, gamma='auto'),
    linear_model.Ridge(alpha=.5)
]

for regressor in regressors:
    model = regressor
    model.fit(X_train_b, y_train_b)
    y_pred_b = model.predict(X_test_b)
    print(regressor)
    # Taking the square root of the mean squared error gives the RMSE
    print("root mean squared error: %.3f" % sqrt(mean_squared_error(y_test_b, y_pred_b)))
The RMSE scores suggest that the linear regression and ridge regression algorithms perform best for this dataset.
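For reference, RMSE is simply the square root of the average squared difference between the actual and predicted values. The sketch below computes it by hand on four made-up values.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up actual and predicted values.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
error = rmse(actual, predicted)
print(round(error, 3))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones. The result is in the same units as y.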
There are a number of different types of unsupervised learning but for simplicity here I am going to focus on the clustering methods. There are many different algorithms for clustering all of which use slightly different techniques to find clusters of inputs.
Probably one of the most widely used methods is k-means. This algorithm is iterative. A specified number of means (centroids) are first initialised at random. The Euclidean distance from each data point to each centroid is then calculated, and every point is assigned to its nearest centroid, creating clusters of similar values. The mean of the points in each cluster becomes that cluster’s new centroid, and the process is repeated until the assignments stop changing.
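The iterative procedure can be sketched in a few lines of NumPy. This is a bare-bones illustration, not scikit-learn's implementation: it picks the starting centroids from the data points at random, and the function name, arguments and toy data are all assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: assign points to the nearest centroid, then update centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Each centroid moves to the mean of its points (unchanged if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids have stopped moving
        centroids = new_centroids
    return labels, centroids


# Two small, well-separated groups of made-up points.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

k-means only finds a local optimum, so the result can depend on the starting centroids. This is why library implementations usually run several random starts and keep the best one.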
Let’s use the wine dataset we used in the classification task, with the y labels removed, and see how well the k-means algorithm can identify the wine types from the inputs.
As we are only using the inputs for this model I am splitting the data into test and train using a slightly different method.
np.random.seed(0)
msk = np.random.rand(len(X_w)) < 0.8
train_w = X_w[msk]
test_w = X_w[~msk]
As Kmeans is reliant on the distance metric to determine the clusters it is usually necessary to perform feature scaling (ensuring that all features have the same scale) before training the model. In the below code I am using the MinMaxScaler to scale the features so that all values fall between 0 and 1.
x = train_w.values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
X_scaled = pd.DataFrame(x_scaled, columns=train_w.columns)
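Min-max scaling maps each feature to the range 0 to 1 by subtracting the column minimum and dividing by the column range. The NumPy sketch below spells this out on made-up numbers and applies the minimum and maximum learned from the training rows to a test row. It assumes no column is constant, since that would cause a division by zero.

```python
import numpy as np

# Made-up data: two features on very different scales.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
test = np.array([[2.5, 25.0]])

# Learn the per-column minimum and maximum from the training data only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# Apply the same transformation to both sets.
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)
print(train_scaled)
print(test_scaled)
```

With MinMaxScaler this corresponds to calling fit_transform on the training data and transform on the test data.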
With K-means you have to specify the number of clusters the algorithm should use. So one of the first steps is to identify the optimum number of clusters. This is achieved by iterating through a number of values of k and plotting the results on a chart. This is known as the Elbow method as it typically produces a plot with a curve that looks a little like the curve of your elbow. The yellowbrick library (which is a great library for visualising scikit-learn models and can be pip installed) has a really nice plot for this. The code below produces this visualisation.
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,8))
visualizer.fit(X_scaled)
visualizer.show()
Ordinarily, we wouldn’t already know how many categories there are in a dataset where we are using a clustering technique. However, in this case, we know that there are three wine types in the data, and the elbow of the curve correctly falls at three as the optimum number of clusters to use in the model.
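The numbers behind an elbow plot can also be produced without a plotting library. The sketch below fits k-means for k from 1 to 7 and records the within-cluster sum of squared distances, which scikit-learn exposes as the inertia_ attribute. For brevity it scales and clusters the whole wine dataset rather than the training split used above, so the values will not match the plot exactly.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(load_wine().data)

inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Sum of squared distances from each point to its assigned centroid.
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

The "elbow" is the value of k after which adding more clusters gives only a small further drop in this number.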
The next step is to initialise the K-means algorithm and fit the model to the training data and evaluate how effectively the algorithm has clustered the data.
One method used for this is known as the silhouette score. This measures the consistency of values within the clusters, or in other words, how similar to each other the values in each cluster are and how much separation there is between the clusters. The silhouette score is calculated for each value and ranges from -1 to +1. These values are then plotted to form a silhouette plot. Again, yellowbrick provides a simple way to construct this type of plot. The code below creates this visualisation for the wine dataset.
model = KMeans(3, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
visualizer.fit(X_scaled)
visualizer.show()
A silhouette plot can be interpreted in the following way:
The plot for the wine data set above shows that cluster 0 may not be as consistent as the others due to most data points being below the average score and a few data points having a score below 0.
Silhouette scores can be particularly useful in comparing one algorithm against another or different values of k.
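As a sketch of that kind of comparison, the snippet below uses scikit-learn's silhouette_score, which returns the mean silhouette over all samples, to compare several values of k. As a simplification it uses the whole scaled wine dataset rather than the training split, and the range of k values is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(load_wine().data)

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean silhouette over all samples

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("highest mean silhouette at k =", best_k)
```

The silhouette is undefined for a single cluster, so the comparison starts at k=2.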
In this post, I wanted to give a brief introduction to each of the three types of machine learning. There are many other steps involved in all of these processes including feature engineering, data processing and hyperparameter optimisation to determine both the best data preprocessing techniques and the best models to use.
Thanks for reading!