Classifying Urban Sounds using Deep learning

Classifying Urban Sounds using Deep learning

“Classifying Urban Sounds using Deep learning”, where I demonstrate how to classify different sounds using AI.The following is an overview of the project, outlining the approach, dataset and tools used and also the results.

I recently completed Udacity’s Machine Learning Engineer Nanodegree Capstone Project, titled “Classifying Urban Sounds using Deep learning”, where I demonstrate how to classify different sounds using AI.

The following is an overview of the project, outlining the approach, dataset and tools used and also the results. Full links to all the code, Jupyter notebooks, and report will be posted below.

I’ll be writing a full review of the Udacity experience soon for anyone considering taking the program…. watch this space.


Automatic environmental sound classification is a growing area of research with numerous real world applications. Whilst there is a large body of research in related audio fields such as speech and music, work on the classification of environmental sounds is comparatively scarce.

Likewise, observing the recent advancements in the field of image classification where convolutional neural networks are used to to classify images with high accuracy and at scale, it begs the question of the applicability of these techniques in other domains, such as sound classification.

There is a plethora of real world applications for this research, such as:

• Content-based multimedia indexing and retrieval
• Assisting deaf individuals in their daily activities
• Smart home use cases such as 360-degree safety and security capabilities
• Industrial uses such as predictive maintenance

Problem statement

The following will demonstrate how to apply Deep Learning techniques to the classification of environmental sounds, specifically focusing on the identification of particular urban sounds.

When given an audio sample in a computer readable format (such as a .wav file) of a few seconds duration, we want to be able to determine if it contains one of the target urban sounds with a corresponding Classification Accuracy score.


For this we will use a dataset called Urbansound8K. The dataset contains 8732 sound excerpts (<=4s) of urban sounds from 10 classes, which are:

• Air Conditioner
• Car Horn
• Children Playing
• Dog bark
• Drilling
• Engine Idling
• Gun Shot
• Jackhammer
• Siren
• Street Music

A sample of this dataset is included with the accompanying git repo and the full dataset can be downloaded from here.

Audio file overview

These sound excerpts are digital audio files in .wav format. Sound waves are digitised by sampling them at discrete intervals known as the sampling rate (typically 44.1kHz for CD quality audio meaning samples are taken 44,100 times per second).

Each sample is the amplitude of the wave at a particular time interval, where the bit depth determines how detailed the sample will be also known as the dynamic range of the signal (typically 16bit which means a sample can range from 65,536 amplitude values).

A sound wave, in red, represented digitally, in blue (after sampling and 4-bit quantisation), with the resulting array shown on the right. Original © Aquegg | Wikimedia Commons

The above image shows how a sound excerpt is taken from a waveform and turned into a one dimensional array or vector of amplitude values.

The following is a great Youtube video for explaining the fundamental concepts of digital audio sampling.

Data Exploratory

From a visual inspection we can see that it is tricky to visualise the difference between some of the classes. Particularly, the waveforms for repetitive sounds for air conditioner, drilling, engine idling and jackhammer are similar in shape.

Next, we will do a deeper dive into each of the audio files properties extracting, number of audio channels, sample rate and bit-depth using the following code.

# Load various imports 
import pandas as pd
import os
import librosa
import librosa.display

from helpers.wavfilehelper import WavFileHelper
wavfilehelper = WavFileHelper()

audiodata = []
for index, row in metadata.iterrows():
    file_name = os.path.join(os.path.abspath('/UrbanSound8K/audio/'),'fold'+str(row["fold"])+'/',str(row["slice_file_name"]))
    data = wavfilehelper.read_file_properties(file_name)

# Convert into a Panda dataframe
audiodf = pd.DataFrame(audiodata, columns=['num_channels','sample_rate','bit_depth'])

import struct

class WavFileHelper():
    def read_file_properties(self, filename):

        wave_file = open(filename,"rb")
        riff =
        fmt =
        num_channels_string = fmt[10:12]
        num_channels = struct.unpack('<H', num_channels_string)[0]

        sample_rate_string = fmt[12:16]
        sample_rate = struct.unpack("<I",sample_rate_string)[0]
        bit_depth_string = fmt[22:24]
        bit_depth = struct.unpack("<H",bit_depth_string)[0]

        return (num_channels, sample_rate, bit_depth)

Here we can see that the dataset has a range of varying audio properties that will need standardising before we can use it to train our model.

Audio channels most of the samples have two audio channels (meaning stereo) with a few with just the one channel (mono).

Sample rate there is a wide range of Sample rates that have been used across all the samples which is a concern (ranging from 96kHz to 8kHz).

Bit-depth there is also a range of bit-depths (ranging from 4bit to 32bit).

Data pre-processing

In the previous section we identified the following audio properties that need preprocessing to ensure consistency across the whole dataset:

• Audio Channels
• Sample rate
• Bit-depth

Librosa is a Python package for music and audio processing by Brian McFee and will allow us to load audio in our notebook as a numpy array for analysis and manipulation.

For much of the preprocessing we will be able to use Librosa’s load() function, which by default converts the sampling rate to 22.05 KHz, normalise the data so the bit-depth values range between -1 and 1 and flattens the audio channels into mono.

Extract features

The next step is to extract the features we will need to train our model. To do this, we are going to create a visual representation of each of the audio samples which will allow us to identify features for classification, using the same techniques used to classify images with high accuracy.

Spectrograms are a useful technique for visualising the spectrum of frequencies of a sound and how they vary during a very short period of time. We will be using a similar technique known as Mel-Frequency Cepstral Coefficients (MFCC).

The main difference is that a spectrogram uses a linear spaced frequency scale (so each frequency bin is spaced an equal number of Hertz apart), whereas an MFCC uses a quasi-logarithmic spaced frequencyscale, which is more similar to how the human auditory system processes sounds.

The image below compares three different visual representations of a sound wave, the first being the time domain representation, comparing amplitude over time. The next is a spectrogram showing the energy in different frequency bands changing over time, then finally an MFCC which we can see is very similar to a spectrogrambutwithmore distinguishable detail.

For each audio file in the dataset, we will extract an MFCC (meaning we have an image representation for each audio sample) and store it in a Panda Dataframe along with it’s classification label. For this we will use Librosa’s mfcc() function which generates an MFCC from time series audio data.

def extract_features(file_name):
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        mfccsscaled = np.mean(mfccs.T,axis=0)
    except Exception as e:
        print("Error encountered while parsing file: ", file)
        return None 
    return mfccsscaled
# Load various imports 
import pandas as pd
import os
import librosa

# Set the path to the full UrbanSound dataset 
fulldatasetpath = '/Urban Sound/UrbanSound8K/audio/'

metadata = pd.read_csv(fulldatasetpath + '../metadata/UrbanSound8K.csv')

features = []

# Iterate through each sound file and extract the features 
for index, row in metadata.iterrows():
    file_name = os.path.join(os.path.abspath(fulldatasetpath),'fold'+str(row["fold"])+'/',str(row["slice_file_name"]))
    class_label = row["class_name"]
    data = extract_features(file_name)
    features.append([data, class_label])

# Convert into a Panda dataframe 
featuresdf = pd.DataFrame(features, columns=['feature','class_label'])

print('Finished feature extraction from ', len(featuresdf), ' files')
Converting the data and labels then splitting the dataset

We will use sklearn.preprocessing.LabelEncoder to encode the categorical text data into model-understandable numerical data.

Then we will use sklearn.model_selection.train_test_split to split the dataset into training and testing sets. The testing set size will be 20% and we will set a random state.

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# Convert features and corresponding classification labels into numpy arrays
X = np.array(featuresdf.feature.tolist())
y = np.array(featuresdf.class_label.tolist())

# Encode the classification labels
le = LabelEncoder()
yy = to_categorical(le.fit_transform(y)) 

# split the dataset 
from sklearn.model_selection import train_test_split 

x_train, x_test, y_train, y_test = train_test_split(X, yy, test_size=0.2, random_state = 42)
Building our model

The next step will be to build and train a Deep Neural Network with these data sets and make predictions.

Here we will use a Convolutional Neural Network (CNN). CNN’s typically make good classifiers and perform particular well with image classification tasks due to their feature extraction and classification parts. I believe that this will be very effective at finding patterns within the MFCC’s much like they are effective at finding patterns within images.

We will use a sequential model, starting with a simple model architecture, consisting of four Conv2D convolution layers, with our final output layer being a dense layer. Our output layer will have 10 nodes (num_labels) which matches the number of possible classifications.

See the full report for an in-depth breakdown of the chosen layers, we also compare the performance of the CNN with a more traditional MLP.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn import metrics 

num_rows = 40
num_columns = 174
num_channels = 1

x_train = x_train.reshape(x_train.shape[0], num_rows, num_columns, num_channels)
x_test = x_test.reshape(x_test.shape[0], num_rows, num_columns, num_channels)

num_labels = yy.shape[1]
filter_size = 2

# Construct model 
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, input_shape=(num_rows, num_columns, num_channels), activation='relu'))

model.add(Conv2D(filters=32, kernel_size=2, activation='relu'))

model.add(Conv2D(filters=64, kernel_size=2, activation='relu'))

model.add(Conv2D(filters=128, kernel_size=2, activation='relu'))

model.add(Dense(num_labels, activation='softmax'))

For compiling our model, we will use the following three parameters:

# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# Display model architecture summary 

# Calculate pre-training accuracy 
score = model.evaluate(x_test, y_test, verbose=1)
accuracy = 100*score[1]

print("Pre-training accuracy: %.4f%%" % accuracy) 

Here we will train the model. As training a CNN can take a significant amount of time, we will start with a low number of epochs and a low batch size. If we can see from the output that the model is converging, we will increase both numbers.

from keras.callbacks import ModelCheckpoint 
from datetime import datetime 

num_epochs = 72
num_batch_size = 256

checkpointer = ModelCheckpoint(filepath='saved_models/', 
                               verbose=1, save_best_only=True)
start =, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), callbacks=[checkpointer], verbose=1)

duration = - start
print("Training completed in time: ", duration)

The following will review the accuracy of the model on both the training and test data sets.

# Evaluating the model on the training and testing set
score = model.evaluate(x_train, y_train, verbose=0)
print("Training Accuracy: ", score[1])

score = model.evaluate(x_test, y_test, verbose=0)
print("Testing Accuracy: ", score[1])

Our trained model obtained a Training accuracy of 98.19% and a Testing accuracy of 91.92%.

The performance is very good and the model has generalised well, seeming to predict well when tested against new audio data.


It was earlier noted in our data exploration, that it is difficult to visualise the difference between some of the classes. In particular, the following sub-groups are similar in shape:

• Repetitive sounds for air conditioner, drilling, engine idling and jackhammer.
• Sharp peaks for dog barking and gun shot.
• Similar pattern for children playing and street music.

Using a confusion matrix we will examine if the final model also struggled to differentiate between these classes.

The Confusion Matrix tells a different story. Here we can see that our model struggles the most with the following sub-groups:

• air conditioner, jackhammer and street music.
• car horn, drilling, and street music.
• air conditioner, children playing and engine idling.
• jackhammer and drilling.
• air conditioner, car horn, children playing and street music.

This shows us that the problem is more nuanced than our initial assessment and gives some in- sights into the features that the CNN is extracting to make its classifications. For example, street music is one of the commonly misclassified classes and could be to a wide variety of different samples within the class.

Next steps

The next logical step is to see how the above can be applied both to Real-time streaming audio and using real world sounds.

Audio in the real world is much more of a ‘messier’ challenge, as we will need to accommodate for different background sounds, different volume levels of the target sound and the likelihood of echos.

Doing this in Real-time also poses its challenges as the model will have to perform well with low-latency, as will the MFCC calculation, all whilst synchronising with the audio buffer thread without any delays.

Angular 9 Tutorial: Learn to Build a CRUD Angular App Quickly

What's new in Bootstrap 5 and when Bootstrap 5 release date?

What’s new in HTML6

How to Build Progressive Web Apps (PWA) using Angular 9

What is new features in Javascript ES2020 ECMAScript 2020

Artificial Intelligence vs. Machine Learning vs. Deep Learning

Artificial Intelligence vs. Machine Learning vs. Deep Learning

Learn the Difference between the most popular Buzzwords in today's tech. World — AI, Machine Learning and Deep Learning

In this article, we are going to discuss we difference between Artificial Intelligence, Machine Learning, and Deep Learning.

Furthermore, we will address the question of why Deep Learning as a young emerging field is far superior to traditional Machine Learning.

Artificial Intelligence, Machine Learning, and Deep Learning are popular buzzwords that everyone seems to use nowadays.

But still, there is a big misconception among many people about the meaning of these terms.

In the worst case, one may think that these terms describe the same thing — which is simply false.

A large number of companies claim nowadays to incorporate some kind of “ Artificial Intelligence” (AI) in their applications or services.

But artificial intelligence is only a broader term that describes applications when a machine mimics “ cognitive “ functions that humans associate with other human minds, such as “learning” and “problem-solving”.

On a lower level, an AI can be only a programmed rule that determines the machine to behave in a certain way in certain situations. So basically Artificial Intelligence can be nothing more than just a bunch of if-else statements.

An if-else statement is a simple rule explicitly programmed by a human. Consider a very abstract, simple example of a robot who is moving on a road. A possible programmed rule for that robot could look as follows:

Instead, when speaking of Artificial Intelligence it's only worthwhile to consider two different approaches: Machine Learning and Deep Learning. Both are subfields of Artificial Intelligence

Machine Learning vs Deep Learning

Now that we now better understand what Artificial Intelligence means we can take a closer look at Machine Learning and Deep Learning and make a clearer distinguishment between these two.

Machine Learning incorporates “ classical” algorithms for various kinds of tasks such as clustering, regression or classification. Machine Learning algorithms must be trained on data. The more data you provide to your algorithm, the better it gets.

The “training” part of a Machine Learning model means that this model tries to optimize along a certain dimension. In other words, the Machine Learning models try to minimize the error between their predictions and the actual ground truth values.

For this we must define a so-called error function, also called a loss-function or an objective function … because after all the model has an objective. This objective could be for example classification of data into different categories (e.g. cat and dog pictures) or prediction of the expected price of a stock in the near future.

When someone says they are working with a machine-learning algorithm, you can get to the gist of its value by asking: What’s the objective function?

At this point, you may ask: How do we minimize the error?

One way would be to compare the prediction of the model with the ground truth value and adjust the parameters of the model in a way so that next time, the error between these two values is smaller. This is repeated again and again and again.

Thousands and millions of times, until the parameters of the model that determine the predictions are so good, that the difference between the predictions of the model and the ground truth labels are as small as possible.

In short machine learning models are optimization algorithms. If you tune them right, they minimize their error by guessing and guessing and guessing again.

Machine Learning is old…

Machine Learning is a pretty old field and incorporates methods and algorithms that have been around for dozens of years, some of them since as early as the sixties.

Some known methods of classification and prediction are the Naive Bayes Classifier and the Support Vector Machines. In addition to the classification, there are also clustering algorithms such as the well-known K-Means and tree-based clustering. To reduce the dimensionality of data to gain more insights about it’ nature methods such as Principal component analysis and tSNE are used.

Deep Learning — The next big Thing

Deep Learning, on the other hand, is a very young field of Artificial Intelligence that is powered by artificial neural networks.

It can be viewed again as a subfield of Machine Learning since Deep Learning algorithms also require data in order to learn to solve tasks. Although methods of Deep Learning are able to perform the same tasks as classic Machine Learning algorithms, it is not the other way round.

Artificial neural networks have unique capabilities that enable Deep Learning models to solve tasks that Machine Learning models could never solve.

All recent advances in intelligence are due to Deep Learning. Without Deep Learning we would not have self-driving cars, chatbots or personal assistants like Alexa and Siri. Google Translate app would remain primitive and Netflix would have no idea which movies or TV series we like or dislike.

We can even go so far as to say that the new industrial revolution is driven by artificial neural networks and Deep Learning. This is the best and closest approach to true machine intelligence we have so far. The reason is that Deep Learning has two major advantages over Machine Learning.

Why is Deep Learning better than Machine Learning?

The first advantage is the needlessness of Feature Extraction. What do I mean by this?

Well if you want to use a Machine Learning model to determine whether a given picture shows a car or not, we as humans, must first program the unique features of a car (shape, size, windows, wheels etc.) into the algorithm. This way the algorithm would know what to look after in the given pictures.

In the case of a Deep Learning model, is step is completely unnecessary. The model would recognize all the unique characteristics of a car by itself and make correct predictions.

In fact, the needlessness of feature extraction applies to any other task for a deep learning model. You simply give the neural network the raw data, the rest is done by the model. While for a machine learning model, you would need to perform additional steps, such as the already mentioned extraction of the features of the given data.

The second huge advantage of Deep Learning and a key part in understanding why it’s becoming so popular is that it’s powered by massive amounts of data. The “Big Data Era” of technology will provide huge amounts of opportunities for new innovations in deep learning. To quote Andrew Ng, the chief scientist of China’s major search engine Baidu and one of the leaders of the Google Brain Project:

The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.

Deep Learning models tend to increase their accuracy with the increasing amount of training data, where’s traditional machine learning models such as SVM and Naive Bayes classifier stop improving after a saturation point.

How to Perform Face Detection with Deep Learning in Keras

How to Perform Face Detection with Deep Learning in Keras

Perform Face Detection with Deep Learning in Keras .If you’re a regular user of Google Photos, you may have noticed how the application automatically extracts and groups faces of people from the photos that you back up to the cloud.

A photo application such as Google’s achieves this through the detection of faces of humans (and pets too!) in your photos and by then grouping similar faces together. Detection and then classification of faces in images is a common task in deep learning with neural networks.

In the first step of this tutorial, we’ll use a pre-trained MTCNN model in Keras to detect faces in images. Once we’ve extracted the faces from an image, we’ll compute a similarity score between these faces to find if they belong to the same person.


Before you start with detecting and recognizing faces, you need to set up your development environment. First, you need to “read” images through Python before doing any processing on them. We’ll use the plotting library matplotlib to read and manipulate images. Install the latest version through the installer pip:

pip3 install matplotlib

To use any implementation of a CNN algorithm, you need to install keras. Download and install the latest version using the command below:

pip3 install keras

The algorithm that we’ll use for face detection is MTCNN (Multi-Task Convoluted Neural Networks), based on the paper Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (Zhang et al., 2016). An implementation of the MTCNN algorithm for TensorFlow in Python3.4 is available as a package. Run the following command to install the package through pip:

pip3 install mtcnn

To compare faces after extracting them from images, we’ll use the VGGFace2 algorithm developed by the Visual Geometry Group at the University of Oxford. A TensorFlow-based Keras implementation of the VGG algorithm is available as a package for you to install:

pip3 install keras_vggface

While you may feel the need to build and train your own model, you’d need a huge training dataset and vast processing power. Since this tutorial focuses on the utility of these models, it uses existing, trained models by experts in the field.

Now that you’ve successfully installed the prerequisites, let’s jump right into the tutorial!

Step 1: Face Detection with the MTCNN Model

The objectives in this step are as follows:

  • retrieve images hosted externally to a local server
  • read images through matplotlib‘s imread() function
  • detect and explore faces through the MTCNN algorithm
  • extract faces from an image.

1.1 Store External Images

You may often be doing an analysis from images hosted on external servers. For this example, we’ll use two images of Lee Iacocca, the father of the Mustang, hosted on the BBC and The Detroit News sites.

To temporarily store the images locally for our analysis, we’ll retrieve each from its URL and write it to a local file. Let’s define a function store_image for this purpose:

import urllib.request

def store_image(url, local_file_name):
  with urllib.request.urlopen(url) as resource:
    with open(local_file_name, 'wb') as f:

You can now simply call the function with the URL and the local file in which you’d like to store the image:


After successfully retrieving the images, let’s detect faces in them.

1.2 Detect Faces in an Image

For this purpose, we’ll make two imports — matplotlib for reading images, and mtcnn for detecting faces within the images:

from matplotlib import pyplot as plt
from mtcnn.mtcnn import MTCNN

Use the imread() function to read an image:

image = plt.imread('iacocca_1.jpg')

Next, initialize an MTCNN() object into the detector variable and use the .detect_faces() method to detect the faces in an image. Let’s see what it returns:

detector = MTCNN()

faces = detector.detect_faces(image)
for face in faces:

For every face, a Python dictionary is returned, which contains three keys. The box key contains the boundary of the face within the image. It has four values: x- and y- coordinates of the top left vertex, width, and height of the rectangle containing the face. The other keys are confidence and keypoints. The keypoints key contains a dictionary containing the features of a face that were detected, along with their coordinates:

{'box': [160, 40, 35, 44], 'confidence': 0.9999798536300659, 'keypoints': {'left_eye': (172, 57), 'right_eye': (188, 57), 'nose': (182, 64), 'mouth_left': (173, 73), 'mouth_right': (187, 73)}}

1.3 Highlight Faces in an Image

Now that we’ve successfully detected a face, let’s draw a rectangle over it to highlight the face within the image to verify if the detection was correct.

To draw a rectangle, import the Rectangle object from matplotlib.patches:

from matplotlib.patches import Rectangle

Let’s define a function highlight_faces to first display the image and then draw rectangles over faces that were detected. First, read the image through imread() and plot it through imshow(). For each face that was detected, draw a rectangle using the Rectangle() class.

Finally, display the image and the rectangles using the .show() method. If you’re using Jupyter notebooks, you may use the %matplotlib inline magic command to show plots inline:

def highlight_faces(image_path, faces):
  # display image
    image = plt.imread(image_path)

    ax = plt.gca()

    # for each face, draw a rectangle based on coordinates
    for face in faces:
        x, y, width, height = face['box']
        face_border = Rectangle((x, y), width, height,
                          fill=False, color='red')

Let’s now display the image and the detected face using the highlight_faces() function:

highlight_faces('iacocca_1.jpg', faces)

![Detected face in an image of Lee Iacocca]( face in an image of Lee Iacocca. Source: [BBC](

Let’s display the second image and the face(s) detected in it:

image = plt.imread('iacocca_2.jpg')
faces = detector.detect_faces(image)

highlight_faces('iacocca_2.jpg', faces)

In these two images, you can see that the MTCNN algorithm correctly detects faces. Let’s now extract this face from the image to perform further analysis on it.

1.4 Extract Face for Further Analysis

At this point, you know the coordinates of the faces from the detector. Extracting the faces is a fairly easy task using list indices. However, the VGGFace2 algorithm that we use needs the faces to be resized to 224 x 224 pixels. We’ll use the PIL library to resize the images.

The function extract_face_from_image() extracts all faces from an image:

from numpy import asarray
from PIL import Image

def extract_face_from_image(image_path, required_size=(224, 224)):
  # load image and detect faces
    image = plt.imread(image_path)
    detector = MTCNN()
    faces = detector.detect_faces(image)

    face_images = []

    for face in faces:
        # extract the bounding box from the requested face
        x1, y1, width, height = face['box']
        x2, y2 = x1 + width, y1 + height

        # extract the face
        face_boundary = image[y1:y2, x1:x2]

        # resize pixels to the model size
        face_image = Image.fromarray(face_boundary)
        face_image = face_image.resize(required_size)
        face_array = asarray(face_image)

    return face_images

extracted_face = extract_face_from_image('iacocca_1.jpg')

# Display the first face from the extracted faces

Here is how the extracted face looks from the first image.

Step 2: Face Recognition with VGGFace2 Model

In this section, let’s first test the model on the two images of Lee Iacocca that we’ve retrieved. Then, we’ll move on to compare faces from images of the starting eleven of the Chelsea football team in 2018 and 2019. You’ll then be able to assess if the algorithm identifies faces of common players between the images.

2.1 Compare Two Faces

In this section, you need to import three modules: VGGFace to prepare the extracted faces to be used in the face recognition models, and the cosine function from SciPy to compute the distance between two faces:

from keras_vggface.utils import preprocess_input
from keras_vggface.vggface import VGGFace
from scipy.spatial.distance import cosine

Let’s define a function that takes the extracted faces as inputs and returns the computed model scores. The model returns a vector, which represents the features of a face:

def get_model_scores(faces):
    samples = asarray(faces, 'float32')

    # prepare the data for the model
    samples = preprocess_input(samples, version=2)

    # create a vggface model object
    model = VGGFace(model='resnet50',
      input_shape=(224, 224, 3),

    # perform prediction
    return model.predict(samples)

faces = [extract_face_from_image(image_path)
         for image_path in ['iacocca_1.jpg', 'iacocca_2.jpg']]

model_scores = get_model_scores(faces)

Since the model scores for each face are vectors, we need to find the similarity between the scores of two faces. We can typically use a Euclidean or Cosine function to calculate the similarity.

Vector representation of faces are suited to the cosine similarity. Here’s a detailed comparison between cosine and Euclidean distances with an example.

The cosine() function computes the cosine distance between two vectors. The lower this number, the better match your faces are. In our case, we’ll put the threshold at 0.4 distance. This threshold is debatable and would vary with your use case. You should set this threshold based on case studies on your dataset:

if cosine(model_scores[0], model_scores[1]) <= 0.4:
  print("Faces Matched")

In this case, the two faces of Lee Iacocca matched.

Faces Matched

2.2 Compare Multiple Faces in Two Images

Let’s put the model to good use in this section of the tutorial. We’ll compare the faces in two images of starting elevens of the Chelsea Football Club in a Europa League match vs Slavia Prague in the 2018–19 season and the UEFA Super Cup match vs Liverpool in the 2019–20 season. While many of the players feature in both match day squads, let’s see if the algorithm is able to detect all common players.

First, let’s retrieve the resources from the URLs, detect the faces in each image and highlight them:


image = plt.imread('chelsea_1.jpg')
faces_staring_xi = detector.detect_faces(image)

highlight_faces('chelsea_1.jpg', faces_staring_xi)


image = plt.imread('chelsea_2.jpg')
faces = detector.detect_faces(image)

highlight_faces('chelsea_2.jpg', faces)

Before we proceed further, here are the starting elevens from both matches:

We have eight players who are common to both starting XIs and who ideally should be matched by the algorithm.

Let’s first compute scores:

slavia_faces = extract_face_from_image('chelsea_1.jpg')
liverpool_faces = extract_face_from_image('chelsea_2.jpg')

model_scores_starting_xi_slavia = get_model_scores(slavia_faces)
model_scores_starting_xi_liverpool = get_model_scores(liverpool_faces)
for idx, face_score_1 in enumerate(model_scores_starting_xi_slavia):
  for idy, face_score_2 in enumerate(model_scores_starting_xi_liverpool):
    score = cosine(face_score_1, face_score_2)
    if score <= 0.4:
      # Printing the IDs of faces and score
      print(idx, idy, score)
      # Displaying each matched pair of faces

Here’s the list of pairs of faces that the algorithm matched. Notice that it has been able to match all eight pairs of faces.

While we were successfully able to match each face in our images, I’d like to take a step back to discuss ramifications of the scores. As discussed earlier, there’s no universal threshold that would match two images together. You may need to re-define these thresholds with new data coming into the analysis. For instance, even Google Photos takes your inputs when it’s unable to programmatically determine the best threshold for a pair.

The best way forward is to carefully assess cases when matching different types of faces. Emotions of the faces and their angles play a role in determining the precision too. In our use case, notice how I have deliberately used photos of starting elevens as players are staring right into the camera! You can try matching the starting eleven faces with those of a trophy celebration and I’m pretty sure the accuracy would drop.


In this tutorial, we first detected faces in images using the MTCNN model and highlighted them in the images to determine if the model worked correctly. Next, we used the VGGFace2 algorithm to extract features from faces in the form of a vector and matched different faces to group them together.

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data

Downloadable PDF of Best AI Cheat Sheets in Super High Definition

Let’s begin.

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Data Science in HD

Part 1: Neural Networks Cheat Sheets

Neural Networks Cheat Sheets

Neural Networks Basics

Neural Networks Basics Cheat Sheet

An Artificial Neuron Network (ANN), popularly known as Neural Network is a computational model based on the structure and functions of biological neural networks. It is like an artificial human nervous system for receiving, processing, and transmitting information in terms of Computer Science.

Basically, there are 3 different layers in a neural network :

  1. Input Layer (All the inputs are fed in the model through this layer)
  2. Hidden Layers (There can be more than one hidden layers which are used for processing the inputs received from the input layers)
  3. Output Layer (The data after processing is made available at the output layer)

Neural Networks Graphs

Neural Networks Graphs Cheat Sheet

Graph data can be used with a lot of learning tasks contain a lot rich relation data among elements. For example, modeling physics system, predicting protein interface, and classifying diseases require that a model learns from graph inputs. Graph reasoning models can also be used for learning from non-structural data like texts and images and reasoning on extracted structures.

Part 2: Machine Learning Cheat Sheets

Machine Learning Cheat Sheets

>>> If you like these cheat sheets, you can let me know here.<<<

Machine Learning with Emojis

Machine Learning with Emojis Cheat Sheet

Machine Learning: Scikit Learn Cheat Sheet

Scikit Learn Cheat Sheet

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines is a simple and efficient tools for data mining and data analysis. It’s built on NumPy, SciPy, and matplotlib an open source, commercially usable — BSD license

Scikit-learn Algorithm Cheat Sheet

Scikit-learn algorithm

This machine learning cheat sheet will help you find the right estimator for the job which is the most difficult part. The flowchart will help you check the documentation and rough guide of each estimator that will help you to know more about the problems and how to solve it.

If you like these cheat sheets, you can let me know here.### Machine Learning: Scikit-Learn Algorythm for Azure Machine Learning Studios

Scikit-Learn Algorithm for Azure Machine Learning Studios Cheat Sheet

Part 3: Data Science with Python

Data Science with Python Cheat Sheets

Data Science: TensorFlow Cheat Sheet

TensorFlow Cheat Sheet

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.

If you like these cheat sheets, you can let me know here.### Data Science: Python Basics Cheat Sheet

Python Basics Cheat Sheet

Python is one of the most popular data science tool due to its low and gradual learning curve and the fact that it is a fully fledged programming language.

Data Science: PySpark RDD Basics Cheat Sheet

PySpark RDD Basics Cheat Sheet

“At a high level, every Spark application consists of a driver program that runs the user’s main function and executes various parallel operations on a cluster. The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing Scala collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.” via Spark.Aparche.Org

Data Science: NumPy Basics Cheat Sheet

NumPy Basics Cheat Sheet

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

***If you like these cheat sheets, you can let me know ***here.

Data Science: Bokeh Cheat Sheet

Bokeh Cheat Sheet

“Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.” from

Data Science: Karas Cheat Sheet

Karas Cheat Sheet

Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.

Data Science: Padas Basics Cheat Sheet

Padas Basics Cheat Sheet

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.

If you like these cheat sheets, you can let me know here.### Pandas Cheat Sheet: Data Wrangling in Python

Pandas Cheat Sheet: Data Wrangling in Python

Data Wrangling

The term “data wrangler” is starting to infiltrate pop culture. In the 2017 movie Kong: Skull Island, one of the characters, played by actor Marc Evan Jackson is introduced as “Steve Woodward, our data wrangler”.

Data Science: Data Wrangling with Pandas Cheat Sheet

Data Wrangling with Pandas Cheat Sheet

“Why Use tidyr & dplyr

  • Although many fundamental data processing functions exist in R, they have been a bit convoluted to date and have lacked consistent coding and the ability to easily flow together → leads to difficult-to-read nested functions and/or choppy code.
  • R Studio is driving a lot of new packages to collate data management tasks and better integrate them with other analysis activities → led by Hadley Wickham & the R Studio teamGarrett Grolemund, Winston Chang, Yihui Xie among others.
  • As a result, a lot of data processing tasks are becoming packaged in more cohesive and consistent ways → leads to:
  • More efficient code
  • Easier to remember syntax
  • Easier to read syntax” via Rstudios

Data Science: Data Wrangling with ddyr and tidyr

Data Wrangling with ddyr and tidyr Cheat Sheet

If you like these cheat sheets, you can let me know here.### Data Science: Scipy Linear Algebra

Scipy Linear Algebra Cheat Sheet

SciPy builds on the NumPy array object and is part of the NumPy stack which includes tools like Matplotlib, pandas and SymPy, and an expanding set of scientific computing libraries. This NumPy stack has similar users to other applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy stack.[3]

Data Science: Matplotlib Cheat Sheet

Matplotlib Cheat Sheet

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented APIfor embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of matplotlib.

Pyplot is a matplotlib module which provides a MATLAB-like interface matplotlib is designed to be as usable as MATLAB, with the ability to use Python, with the advantage that it is free.

Data Science: Data Visualization with ggplot2 Cheat Sheet

Data Visualization with ggplot2 Cheat Sheet

>>> If you like these cheat sheets, you can let me know here. <<<

Data Science: Big-O Cheat Sheet

Big-O Cheat Sheet


Special thanks to DataCamp, Asimov Institute, RStudios and the open source community for their content contributions. You can see originals here:

Big-O Algorithm Cheat Sheet:

Bokeh Cheat Sheet:

Data Science Cheat Sheet:

Data Wrangling Cheat Sheet:

Data Wrangling:

Ggplot Cheat Sheet:

Keras Cheat Sheet:


Machine Learning Cheat Sheet:

Machine Learning Cheat Sheet:

ML Cheat Sheet::

Matplotlib Cheat Sheet:


Neural Networks Cheat Sheet:

Neural Networks Graph Cheat Sheet:

Neural Networks:

Numpy Cheat Sheet:


Pandas Cheat Sheet:


Pandas Cheat Sheet:

Pyspark Cheat Sheet:

Scikit Cheat Sheet:


Scikit-learn Cheat Sheet:

Scipy Cheat Sheet:


TesorFlow Cheat Sheet:

Tensor Flow: