The Machine Learning Crash Course โ Part 2: Linear Regression - Welcome back to the second part of the Machine Learning Crash Course...๐๐๐๐๐
Welcome back to the second part of the Machine Learning Crash Course...๐๐๐๐๐
In the first part weโve covered the basic terminologies of Machine Learning and have taken a first look at Colab โ a Python-based development environment which is great for solving Machine Learning exercises with Python and TensorFlow.
In this second part weโll move on and start with the first practical machine learning scenario which is solving a simple linear regression problem. First, letโs clarify what linear regression is in general.
The first Machine Learning exercise weโre going to solve is a simple linear regression task. Linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. If only one independent variable is used weโre talking about a simple linear regression. A simple linear regression is what weโll be using for the Machine Learning exercise in this tutorial:
y = 2x + 30
In this example x is the independant variable and y is the dependant variable. For every input value of x the corresponding output value y can be determined.
To get started letโs create a new Python 3 Colab notebook first. Go to https://colab.research.google.com login with your Google account and create a new notebook which is initially empty.
As the first step we need to make sure to import needed libraries. Weโll use TensorFlow, NumPy and Matplotlib. Insert the following lines of code in code cell:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
The first line of imports is just added because of compatibility reasons and can be ignored. The import statements for TensorFlow, NumPy and Matplotlib are working out of the box because all three libraries are preinstalled in the Colab environment.
Having imported the needed libraries the next step is to prepare the training data which should be used to train our model. Insert another code cell and insert the following Python code:
values_x = np.array([-10, 0, 2, 6, 12, 15], dtype=float)
values_y = np.array([10, 30, 34, 42, 54, 60], dtype=float)
for i,x in enumerate(values_x):
print("X: {} Y: {}".format(x, values_y[i]))
Two NumPy arrays are initialised here. The first array (values_x) is containing the x values of our linear regression. This is the independent variable of y = 2x + 30. For each of the x values in the first array the second array (values_y) contains the corresponding y value.
By using a for-loop the value pairs are printed out:
If you like you can also use Matplotlib to visualise the the linear regression function as a graph:
x = np.linspace(-10,10,100)
plt.title('Graph of y=2x+30')
plt.plot(x, x*2+30);
Next, weโre ready to create the model (neural network) which we need to solve our linear regression task. Insert the following code into the notebook:
model = tf.keras.Sequential([
tf.keras.layers.Dense(units=1, input_shape=[1])
])
model.compile(loss='mean_squared_error',
optimizer=tf.keras.optimizers.Adam(0.1))
Here weโre using the TensorFlow-integrated Keras API to create out neural network. In order to create a new sequential model the tf.keras.Sequential method is used.
Note:
Keras is a high-level interface for neural networks that runs on top of different back-ends. Its API is user-friendly, yet flexible enough to build all kinds of applications. Keras quickly gained traction after its introduction and in 2017, the Keras API was integrated into core Tensorflow as tf.keras
In Keras, you assemble layers to build models. A model is (usually) a graph of layers. The most common type of model is a stack of layers: the* tf.keras.Sequential* model.
The call of the Sequential method is expecting to get an array (stack) of layers. In our case it is just one layer of type Dense. A Dense layer can be seen as a linear operation in which every input is connected to every output by a weight and a bias. The number of inputs is specified by the first parameter units. The number of neurons in the layer is determined by the value of the parameter input_shape.
In our case we only need one input element because for the linear regression problem weโre trying to solve by the neural network weโve only defined one dependant variable (x). Furthermore the Dense layer is setup in the most simple way: it consists of just one neuron:
With that simple neural network defined itโs easy to take a look at some of the insights to further understand how the neurons work. Each neuron has a specific weight which is adapted when the training is performed. The weight of every neuron in the fully connected Dense layer is multiplied with each input variable. As we only have defined one input variable (x) this input is multiplied with the weight w1 of the first and only neuron of the defined Dense layer. Furthermore, for each Dense layer a bias (b1) is added to the formula:
Now we can see why it is sufficient to only ad a Dense layer with just one neuron to solve our simple linear regression problem. By training the model the weight of the neuron will approach a value of 2 and the bias will approach a value of 30. The trained neuron will then be able to provide the output for inputs of x.
Having added the Dense layer to the sequential model we finally need to compile the model in order to make it usable for training and prediction in the next step.
The compilation of the model is done by executing the method model.compile:
model.compile(loss='mean_squared_error',
optimizer=tf.keras.optimizers.Adam(0.1))
Here we need to specify which loss function and which type of optimizer to use.
Loss function:
Loss functions are a central concept in machine learning. By using loss functions the machine learning algorithm is able to measure how much a prediction deviates from the actual result. Based on that determination the machine algorithm knows if the prediction results are getting better or worse.
The mean squred error is a specific loss function which is suitable to train a model for a linear regession problem.
As the name suggests, Mean square error is measured as the average of squared difference between predictions and actual observations. Due to squaring, predictions which are far away from actual values are penalized heavily in comparison to less deviated predictions.
Optimizer:
Based on the outcome which is calculated by the loss function the optimizer is used to determine the learning rate which is applied for the parameters in the model (weights and biases).
In our example weโre making use of the Adam optimizer which is great for linear regression tasks.
The model is ready and the next thing we need to do is to train the model with the test data. This is being done by using the model.fit method:
history = model.fit(values_x, values_y, epochs=500, verbose=False)
As the first and the second argument weโre passing in the test values which are available in arrays values_x and values_y. The third argument is the number of epochs which will be used for training. An epoch is an iteration over the entire x
and y
data provided. In our example weโre using 500 iterations over the test data set to train the model.
After executing the training of the model letโs take a look inside the development of the loss over all 500 epochs. This can be printed out as a diagram by using the following three lines of code:
plt.xlabel("Epoch Number")
plt.ylabel("Loss Magnidute")
plt.plot(history.history['loss'])
The result should be a diagram that looks like the following:
Here you can see that the loss gets better and better from epoch to epoch. Over the 500 epochs used for training weโre able to see that the loss magnitude is approaching zero which shows that the model is able to predict values with a high accuracy.
Now that the model is fully trained letโs try to perform a prediction by calling function model.predict.
print(model.predict([20.0]))
The argument which is passed into the predict method is an array containing the *x *value for which the corresponding *y *value should be determined. The expected result should be somewhere near 70 (because of y=2x+30) . The output can be seen in the following:
Here weโre getting returned the value 70.05354 which is pretty close to 70.0, so that our model is working as expected.
Weโre able to get more model insights by taking a look at the weight and the bias which is determined for the first layer:
print("These are the layer variables: {}".format(model.layers[0].get_weights()))
As expected weโre getting returned two parameter for our first and only layer in the model:
The two parameters corresponds to the two variables we have in the model:
For the weight the value which is determined is near the target value of 2 and for the bias the value which is determined is near the target value of 30 (according to our linear regression formula: y = 2x + 30).
This complete Machine Learning full course video covers all the topics that you need to know to become a master in the field of Machine Learning.
Machine Learning Full Course | Learn Machine Learning | Machine Learning Tutorial
It covers all the basics of Machine Learning (01:46), the different types of Machine Learning (18:32), and the various applications of Machine Learning used in different industries (04:54:48).This video will help you learn different Machine Learning algorithms in Python. Linear Regression, Logistic Regression (23:38), K Means Clustering (01:26:20), Decision Tree (02:15:15), and Support Vector Machines (03:48:31) are some of the important algorithms you will understand with a hands-on demo. Finally, you will see the essential skills required to become a Machine Learning Engineer (04:59:46) and come across a few important Machine Learning interview questions (05:09:03). Now, let's get started with Machine Learning.
Below topics are explained in this Machine Learning course for beginners:
Basics of Machine Learning - 01:46
Why Machine Learning - 09:18
What is Machine Learning - 13:25
Types of Machine Learning - 18:32
Supervised Learning - 18:44
Reinforcement Learning - 21:06
Supervised VS Unsupervised - 22:26
Linear Regression - 23:38
Introduction to Machine Learning - 25:08
Application of Linear Regression - 26:40
Understanding Linear Regression - 27:19
Regression Equation - 28:00
Multiple Linear Regression - 35:57
Logistic Regression - 55:45
What is Logistic Regression - 56:04
What is Linear Regression - 59:35
Comparing Linear & Logistic Regression - 01:05:28
What is K-Means Clustering - 01:26:20
How does K-Means Clustering work - 01:38:00
What is Decision Tree - 02:15:15
How does Decision Tree work - 02:25:15ย
Random Forest Tutorial - 02:39:56
Why Random Forest - 02:41:52
What is Random Forest - 02:43:21
How does Decision Tree work- 02:52:02
K-Nearest Neighbors Algorithm Tutorial - 03:22:02
Why KNN - 03:24:11
What is KNN - 03:24:24
How do we choose 'K' - 03:25:38
When do we use KNN - 03:27:37
Applications of Support Vector Machine - 03:48:31
Why Support Vector Machine - 03:48:55
What Support Vector Machine - 03:50:34
Advantages of Support Vector Machine - 03:54:54
What is Naive Bayes - 04:13:06
Where is Naive Bayes used - 04:17:45
Top 10 Application of Machine Learning - 04:54:48
How to become a Machine Learning Engineerย - 04:59:46
Machine Learning Interview Questions - 05:09:03
Machine learning problems can generally be divided into three types. Classification and regression, which are known as supervised learning, and unsupervised learning which in the context of machine learning applications often refers to clustering.
Machine learning problems can generally be divided into three types. Classification and regression, which are known as supervised learning, and unsupervised learning which in the context of machine learning applications often refers to clustering.
In the following article, I am going to give a brief introduction to each of these three problems and will include a walkthrough in the popular python library scikit-learn.
Before I start Iโll give a brief explanation for the meaning behind the terms supervised and unsupervised learning.
Supervised Learning: In supervised learning, you have a known set of inputs (features) and a known set of outputs (labels). Traditionally these are known as X and y. The goal of the algorithm is to learn the mapping function that maps the input to the output. So that when given new examples of X the machine can correctly predict the corresponding y labels.
Unsupervised Learning: In unsupervised learning, you only have a set of inputs (X) and no corresponding labels (y). The goal of the algorithm is to find previously unknown patterns in the data. Quite often these algorithms are used to find meaningful clusters of similar samples of X so in effect finding the categories intrinsic to the data.
In classification, the outputs (y) are categories. These can be binary, for example, if we were classifying spam email vs not spam email. They can also be multiple categories such as classifying species of flowers, this is known as multiclass classification.
Letโs walk through a simple example of classification using scikit-learn. If you donโt already have this installed it can be installed either via pip or conda as outlined here.
Scikit-learn has a number of datasets that can be directly accessed via the library. For ease in this article, I will be using these example datasets throughout. To illustrate classification I will use the wine dataset which is a multiclass classification problem. In the dataset, the inputs (X) consist of 13 features relating to various properties of each wine type. The known outputs (y) are wine types which in the dataset have been given a number 0, 1 or 2.
The imports I am using for all the code in this article are shown below.
import pandas as pd
import numpy as np
from sklearn.datasets import load_wine
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import f1_score
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn import linear_model
from sklearn.linear_model import ElasticNetCV
from sklearn.svm import SVR
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
from yellowbrick.cluster import SilhouetteVisualizer
In the below code I am downloading the data and converting to a pandas data frame.
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df['TARGET'] = pd.Series(wine.target)
The next stage in a supervised learning problem is to split the data into test and train sets. The train set can be used by the algorithm to learn the mapping between inputs and outputs, and then the reserved test set can be used to evaluate how well the model has learned this mapping. In the below code I am using the scikit-learn model_selection function train_test_split
to do this.
X_w = wine_df.drop(['TARGET'], axis=1)
y_w = wine_df['TARGET']
X_train_w, X_test_w, y_train_w, y_test_w = train_test_split(X_w, y_w, test_size=0.2)
In the next step, we need to choose the algorithm that will be best suited to learn the mapping in your chosen dataset. In scikit-learn there are many different algorithms to choose from, all of which use different functions and methods to learn the mapping, you can view the full list here.
To determine the best model I am running the following code. I am training the model using a selection of algorithms and obtaining the F1-score for each one. The F1 score is a good indicator of the overall accuracy of a classifier. I have written a detailed description of the various metrics that can be used to evaluate a classifier here.
classifiers = [
KNeighborsClassifier(3),
SVC(kernel="rbf", C=0.025, probability=True),
NuSVC(probability=True),
DecisionTreeClassifier(),
RandomForestClassifier(),
AdaBoostClassifier(),
GradientBoostingClassifier()
]
for classifier in classifiers:
model = classifier
model.fit(X_train_w, y_train_w)
y_pred_w = model.predict(X_test_w)
print(classifier)
print("model score: %.3f" % f1_score(y_test_w, y_pred_w, average='weighted'))
A perfect F1 score would be 1.0, therefore, the closer the number is to 1.0 the better the model performance. The results above suggest that the Random Forest Classifier is the best model for this dataset.
In regression, the outputs (y) are continuous values rather than categories. An example of regression would be predicting how many sales a store may make next month, or what the future price of your house might be.
Again to illustrate regression I will use a dataset from scikit-learn known as the boston housing dataset. This consists of 13 features (X) which are various properties of a house such as the number of rooms, the age and crime rate for the location. The output (y) is the price of the house.
I am loading the data using the code below and splitting it into test and train sets using the same method I used for the wine dataset.
boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['TARGET'] = pd.Series(boston.target)
X_b = boston_df.drop(['TARGET'], axis=1)
y_b = boston_df['TARGET']
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X_b, y_b, test_size=0.2)
We can use this cheat sheet to see the available algorithms suited to regression problems in scikit-learn. We will use similar code to the classification problem to loop through a selection and print out the scores for each.
There are a number of different metrics used to evaluate regression models. These are all essentially error metrics and measure the difference between the actual and predicted values achieved by the model. I have used the root mean squared error (RMSE). For this metric, the closer to zero the value is the better the performance of the model. This article gives a really good explanation of error metrics for regression problems.
regressors = [
linear_model.Lasso(alpha=0.1),
linear_model.LinearRegression(),
ElasticNetCV(alphas=None, copy_X=True, cv=5, eps=0.001, fit_intercept=True,
l1_ratio=0.5, max_iter=1000, n_alphas=100, n_jobs=None,
normalize=False, positive=False, precompute='auto', random_state=0,
selection='cyclic', tol=0.0001, verbose=0),
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
tol=0.001, verbose=False),
linear_model.Ridge(alpha=.5)
]
for regressor in regressors:
model = regressor
model.fit(X_train_b, y_train_b)
y_pred_b = model.predict(X_test_b)
print(regressor)
print("mean squared error: %.3f" % sqrt(mean_squared_error(y_test_b, y_pred_b)))
The RMSE score suggests that either the linear regression and ridge regression algorithms perform best for this dataset.
There are a number of different types of unsupervised learning but for simplicity here I am going to focus on the clustering methods. There are many different algorithms for clustering all of which use slightly different techniques to find clusters of inputs.
Probably one of the most widely used methods is Kmeans. This algorithm performs an iterative process whereby a specified number of randomly generated means are initiated. A distance metric, Euclidean distance is calculated for each data point from the centroids, thus creating clusters of similar values. The centroid of each cluster then becomes the new mean and this process is repeated until the optimum result has been achieved.
Letโs use the wine dataset we used in the classification task, with the y labels removed, and see how well the k-means algorithm can identify the wine types from the inputs.
As we are only using the inputs for this model I am splitting the data into test and train using a slightly different method.
np.random.seed(0)
msk = np.random.rand(len(X_w)) < 0.8
train_w = X_w[msk]
test_w = X_w[~msk]
As Kmeans is reliant on the distance metric to determine the clusters it is usually necessary to perform feature scaling (ensuring that all features have the same scale) before training the model. In the below code I am using the MinMaxScaler to scale the features so that all values fall between 0 and 1.
x = train_w.values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
X_scaled = pd.DataFrame(x_scaled,columns=train_w.columns)
With K-means you have to specify the number of clusters the algorithm should use. So one of the first steps is to identify the optimum number of clusters. This is achieved by iterating through a number of values of k and plotting the results on a chart. This is known as the Elbow method as it typically produces a plot with a curve that looks a little like the curve of your elbow. The yellowbrick library (which is a great library for visualising scikit-learn models and can be pip installed) has a really nice plot for this. The code below produces this visualisation.
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,8))
visualizer.fit(X_scaled)
visualizer.show()
Ordinarily, we wouldnโt already know how many categories we have in a dataset where we are using a clustering technique. However, in this case, we know that there are three wine types in the data โ the curve has correctly selected three as the optimum number of clusters to use in the model.
The next step is to initialise the K-means algorithm and fit the model to the training data and evaluate how effectively the algorithm has clustered the data.
One method used for this is known as the silhouette score. This measures the consistency of values within the clusters. Or in other words how similar to each other the values in each cluster are, and how much separation there is between the clusters. The silhouette score is calculated for each value and will range from -1 to +1. These values are then plotted to form a silhouette plot. Again yellowbrick provides a simple way to construct this type of plot. The code below creates this visualisation for the wine dataset.
model = KMeans(3, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
visualizer.fit(X_scaled)
visualizer.show()
A silhouette plot can be interpreted in the following way:
The plot for the wine data set above shows that cluster 0 may not be as consistent as the others due to most data points being below the average score and a few data points having a score below 0.
Silhouette scores can be particularly useful in comparing one algorithm against another or different values of k.
In this post, I wanted to give a brief introduction to each of the three types of machine learning. There are many other steps involved in all of these processes including feature engineering, data processing and hyperparameter optimisation to determine both the best data preprocessing techniques and the best models to use.
Thanks for reading!