How to classify butterflies with deep learning in Keras

How to classify butterflies with deep learning in Keras

A while ago I read an interesting blog post on the website of the Dutch organization Vlinderstichting. Every year they organize a count of butterflies. Volunteers help in determining the different butterfly species in their garden. The Vlinderstichting gathers and analyses the results.

As the determination of the butterfly species is done by the volunteers, inevitably this process is prone to errors. As a result, the Vlinderstichting has*** ***to manually check the submissions, which is time-consuming.

Specifically, there are three butterflies for which the Vlinderstichting receives many wrong determinations. These are

In this article, I will describe the steps to fit a deep learning model that helps to make the distinction between the first two butterflies.

Downloading images with the Flickr API

To train a convolutional neural network I need to find images of butterflies with the correct label. Surely I could take pictures myself of the butterflies that I want to classify. They sometimes fly around in my garden…

Just kidding, that would take ages. For this, I need an automated way to get the images. To do that I use the Flickr API via Python.

Setting up the Flickr API

Firstly, I install the flickrapi package with pip. Then I create the necessary API keys on the Flickr website to connect to the Flickr API.

Besides the flickrapi package, I import the os and urllib packages for downloading the images and setting up the directories.

from flickrapi import FlickrAPI
import urllib
import os
import config

In the config module, I define the public and secret keys for the Flickr API. So this is simply a Python script ( with the code below:

API_KEY = 'XXXXXXXXXXXXXXXXX'  // replace with your key
API_SECRET = 'XXXXXXXXXXXXXXXXX'  // replace with your secret
IMG_FOLDER = 'XXXXXXXXXXXXXXXXX'  // replace with your folder to store the images

I keep these keys in a separate file for security reasons. As a result, you can save the code in a public repository like GitHub or BitBucket and putting the in .gitignore. Consequently, you can share your code with others while not having to worry about someone having access to your credentials.

To extract images of different butterfly species, I wrote a function download_flickr_photos. I will explain this function step by step. In addition, I’ve made the full code available on GitHub.

Input parameters

First of all, I check if the input parameters are of the correct type or values. If not, I raise an error. The explanation of the parameters can be found in the docstring of the function.

if not (isinstance(keywords, str) or isinstance(keywords, list)):
    raise AttributeError('keywords must be a string or a list of strings')
if not (size in ['thumbnail', 'square', 'medium', 'original']):
    raise AttributeError('size must be "thumbnail", "square", "medium" or "original"')
if not (max_nb_img == -1 or (max_nb_img > 0 and isinstance(max_nb_img, int))):
    raise AttributeError('max_nb_img must be an integer greater than zero or equal to -1')

Secondly, I define some of the parameters that will be used in the walk method later on. I create a list for the keywords and determine from which URL the images need to be downloaded.

if isinstance(keywords, str):
    keywords_list = []
    keywords_list = keywords
if size == 'thumbnail':
    size_url = 'url_t'
elif size == 'square':
    size_url = 'url_q'
elif size == 'medium':
    size_url = 'url_c'
elif size == 'original':
    size_url = 'url_o'

Connecting to the Flickr API

Next, I connect to the Flickr API. In the FlickrAPI call I use the API keys defined in the config module.

flickr = FlickrAPI(config.API_KEY, config.API_SECRET)

Creating subfolders per butterfly species

I save the images of each butterfly species in a separate subfolder. The name of each subfolder is the butterfly species’ name, given by the keyword. If the subfolder does not exist yet, I create it.

results_folder = config.IMG_FOLDER + keyword.replace(" ", "_") + "/"
if not os.path.exists(results_folder):

Walking around in the Flickr library
photos = flickr.walk(

I use the walk method of the Flickr API to search for images for the specified keyword. This walk method has the same parameters as the search method in the Flickr API.

In the text parameter***,**** I use the keyword to search for images related to this keyword. Secondly, in the extras parameter**,*** I specify url_m for a small, medium size of the images. More explanation on the image sizes and their respective URL is given in this Flickcurl C library.

Thirdly, in the license parameter***,*** I select images with a non-commercial license. More on the license codes and their meaning can be found on the Flickr API platform. Finally, the per_page parameter specifies how many images I allow per page.

As a result, I have a generator called photos to download the images.

Downloading Flickr images

With the photos generator, I can download all the images found for the search query. First I get the specific URL at which I will download the image. Then I increment the count variable and use this counter to create the image filenames.

With the urlretrieve method, I download the image and save it in the folder for the butterfly species. If an error occurs I print out the error message.

for photo in photos:
        count += 1
        urllib.request.urlretrieve(url,  results_folder + str(count) +".jpg")
    except Exception as e:
        print(e, 'Download failure')

To download multiple butterfly species, I create a list and call the function download_flickr_photos in a for loop. For simplicity, I only download two butterfly species of the three mentioned above.

butterflies = ['meadow brown butterfly', 'gatekeeper butterfly']
for butterfly in butterflies:

Data augmentation of images

Training a convnet on a small number of images will result in overfitting. Consequently, the model will make errors in classifying new, unseen images. Data augmentation can help to avoid this. Luckily Keras has some nice tools to transform images easily.

I’d like to compare it with how my son classifies cars on the road. At the moment he’s only 2 years old and hasn’t seen as many cars as an adult. So you could say his training set of images is rather small. Therefore he’s more likely to misclassify cars. For instance, he sometimes takes an ambulance mistakenly for a police van.

As he will grow older, he will see more ambulances and police vans, with the corresponding label that I will give him. So his training set will become larger and thus he will classify them more correctly.

For that reason, we need to provide the convnet with more butterfly images than we have at the moment. An easy solution for that is data augmentation. In short, this means applying a set of transformations to the Flickr images.

Keras provides a wide range of image transformations. But first, we’ll have to convert the images so that Keras can work with them.

Converting an image to numbers

We start by importing the Keras module. We will demonstrate the image transformations with one example image. For that purpose, we use the load_img method.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
i = load_img('data/train/maniola_jurtina/1.jpg' )
x = img_to_array(i)
x = x.reshape((1,) + x.shape)

The load_img method creates a Python Image Library file. We’ll need to convert this to a Numpy array to use it in the ImageDataGenerator method later on. That’s done with the handy img_to_array method. As a result, we have an array of shape 75x75x3. These dimensions reflect the width, height and RGB values.

In fact, each pixel of the image has 3 RGB values. These range between 0 and 255 and represent the intensity of Red, Green and Blue. A lower value stands for higher intensity and a higher value for lower intensity. For instance, one pixel can be represented as a list of these three values [ 78, 136, 60]. Black would represented as [0, 0, 0].

Finally, we need to add an extra dimension to avoid a ValueError when applying the transformations. This is done with the reshape function.

Alright, now we have something to work with. Let’s continue with the transformations.


By specifying a value between 0 and 180, Keras will randomly choose an angle to rotate the image. It will do this clockwise or counter-clockwise. In our example, the image will be rotated with maximum of 90 degrees.

ImageDataGenerator also has a parameter fill_mode. The default value is ‘nearest’. By rotating the image within the width and height of the original image we end up with “empty” pixels. The fill_mode then uses the nearest pixels to fill this empty space.

imgGen = ImageDataGenerator(rotation_range = 90)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations', save_format='jpeg', save_prefix='trsf'):
    i += 1
    if i > 3:

In the flow method, we specify where to save the transformed images. Make sure this directory exists! We also prefix the newly created images for convenience. The flow method would run infinitely, but for this example, we only generate three images. So when our counter reaches this value, we break the for loop. You can see the result below.

Width shift

In the width_shift_range parameter, you specify the ratio of the original width by which the image can be shifted to the left or right. Again, the fill_mode will fill up the newly created empty pixels. For the remaining examples, I will only show how to instantiate the ImageDataGenerator with the respective parameter. The code to generate the images is the same as in the rotation example.

imgGen = ImageDataGenerator(width_shift_range = 90)

In the transformed images we see that the image is shifted to the right. The empty pixels are filled which gives it a bit of a stretched look.

The same can be done for shifting up or down by specifying a value for the height_shift_range parameter.


Rescaling an image will multiply the RGB values of each pixel by a chosen value before any other preprocessing. In our example, we apply min-max scaling to the values. As a result, these values will range between 0 and 1. This makes the values smaller and easier for the model to process.

imgGen = ImageDataGenerator(rescale = 1./255)


With the shear_range parameter, we can specify how the shearing transformations must be applied. This transformation can produce rather weird images when the value is set too high. So don’t set it too high.

imgGen = ImageDataGenerator(shear_range = 0.2)


This transformation will zoom inside the picture. Just like the shearing parameter, this value should not be exaggerated to keep the images realistic.

imgGen = ImageDataGenerator(zoom_range = 0.2)

Horizontal flip

This transformation flips an image horizontally. Life can be simple sometimes…

imgGen = ImageDataGenerator(horizontal_flip = True)

All transformations combined

Now that we have seen the effect of each transformation separately, we apply all the combinations together.

imgGen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
i = 1
for batch in imgGen.flow(x, batch_size=1, save_to_dir='example_transformations', save_format='jpeg', save_prefix='all'):
    i += 1
    if i > 3:

Setting up the folder structure

We need to store these images in a specific folder structure. As such we can use the method flow_from_directory to augment the images and create the corresponding labels. This folder structure needs to look like this:

  • train
  • maniola_jurtina
  • 0.jpg
  • 1.jpg
  • pyronia_tithonus
  • 0.jpg
  • 1.jpg
  • validation
  • maniola_jurtina
  • 0.jpg
  • 1.jpg
  • pyronia_tithonus
  • 0.jpg
  • 1.jpg

To create this folder structure I created a gist Feel free to use it in your projects.

Creating the generators

Just as before, we specify the configuration parameters for the training generator. The validation images will not be transformed as the training images. We only divide the RGB values to make them smaller.

The flow_from_directory method takes the images from the train or validation folder and generates batches of 32 transformed images. By setting the class_mode to ‘binary’ a one-dimensional label is created based on the image’s folder name.

train_datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
validation_generator = validation_datagen.flow_from_directory(

What about different image sizes?

The Flickr API lets you download images of specific sizes. However, in real-world applications the image sizes are not always constant. If the aspect ratio of the images is the same, we can simply resize the images. Otherwise, we can crop the images. Unfortunately, it is difficult to crop the image while keeping the object we want to classify intact.

Keras can deal with different image sizes. When configuring the model you can specify None for the width and height in input_shape.

input_shape=(3, None, None)  # Theano
input_shape=(None, None, 3)  # Tensorflow

I wanted to show that it is possible to work with different image sizes, however, it has*** ***some drawbacks.

  • not all layers (e.g. Flatten) will work with None as an input dimension
  • it can be computationally heavy to run
Building the deep learning model

For the remainder of this article, I will discuss the structure of a convolutional neural network, illustrated with some examples for our butterfly project. At the end of this article, we’ll have our first classification results.

What layers does a convolutional neural network consist of?

Of course, you can choose how many layers and their type to add to your convolutional neural network (also called CNN or convnet). In this project we will start with the following structure:

Let’s understand what each layer does and how we create them with Keras.

Input layer

These different versions of the images were modified via several transformations. Then, these images are converted into a numerical representation or a matrix.

The dimensions of this matrix will be width x height x number of (color) channels*.* For RGB images the number of channels will be three. For grayscale images, this is equal to one. Below you can see a numerical representation of a 7×7 RGB image.

As our images are of size 75×75, we need to specify that in the input_shape parameter when adding the first convolutional layer.

cnn = Sequential()
cnn.add(Conv2D(32,(3,3), input_shape = (3 ,75 ,75)))

Convolutional layer

In the first layers, the convolutional neural network will look for lower-level features, like horizontal or vertical edges. The further we go in the network it will look for higher-level features, such as a wing of a butterfly, for example. But how does it detect features when it gets only numbers as input? That’s where filters come in.

Filters (or kernels)

You can think of a filter as a searchlight of a specific size that scans over the image. The filter example below has dimensions of 3x3x3 and contains weights that will detect a vertical edge. For a grayscale image, the dimensions would have been 3x3x1. Usually, a filter has smaller dimensions than the image we want to classify. 3×3, 5×5 or 7×7 are typically used. The third dimension should always be equal to the number of channels.

While scanning the image, the RGB values are transformed. It does this transformation by multiplying the RGB values with the filter’s weights. Finally, the multiplied values are then summed over all channels. In our 7x7x3 image example and the 3x3x3 filter, this would result in a 5x5x1 outcome.

The animation below illustrates this convolutional operation. For simplicity, we only look for a vertical edge in the Red channel. Thus, the weights for the Green and Blue channels are all equal to zero. But you should keep in mind that the multiplication results for these channels are added to the result of the Red channel.

As shown below the convolutional layer will produce numerical outcomes. When you have higher numbers, this means that the filter came across the feature it was looking for. In our example, a vertical edge.

We can specify that we want more than one filter. These filters could have their own feature to look for in an image. Suppose we use 32 filters of size 3x3x3. The result of all filters is stacked and we end up with a 5x5x32 volume in our example. In the code snippet above we added 32 filters of size 3x3x3.


In the example above we saw that the filter moves up one pixel at a time. This is the so-called stride. We could increase the number of pixels the filter moves up. Increasing the stride will reduce the dimensions of the original image much faster. In the example below, you see how the filter moves around with a stride of 2, which would result in a 3x3x1 outcome for a 3x3x3 filter and a 7x7x3 image.


By applying a filter, the dimensions of the original image are quickly reduced. Especially the pixels at the edges of the image are only used once in the convolutional operation. This results in a loss of information. If you want to avoid that, you can specify padding. Padding adds “extra pixels” around the image.

Suppose we add padding of one pixel around the 7x7x3 image. This results in a 9x9x3 image. If we apply a 3x3x3 filter and a stride of 1, we end up with a 7x7x1 outcome. So, in that case, we preserve the dimensions of the original image and the outer pixels are used more than once.

You can calculate the resulting outcome of the convolutional operation with specific padding and stride as follows:

1 + [(original dimension + padding x 2 — filter dimension) / stride size]

For example, suppose we have this set-up of our conv layer:

  • 7x7x3 image
  • 3x3x3 filter
  • padding of 1 pixel
  • stride of 2 pixels

That will give 1 + [(7 + 1 x 2–3) / 2] = 4

Why do we need convolutional layers?

A benefit of using conv layers is that the number of parameters to estimate is much lower. Much lower compared to having a normal hidden layer. Suppose we continue with our example image of 7x7x3 and a filter of 3x3x3 with no padding and stride of 1. The convolutional layer would have 5x5x1 + 1 bias = 26 weights to estimate. In a neural network with 7x7x3 inputs and 5x5x1 neurons in the hidden layer, we would need to estimate 3.675 weights. Imagine what this number is when you have larger images…

ReLu layer

Or Rectified Linear unit layer. This layer adds nonlinearity to the network. The convolutional layer is a linear layer as it sums up the multiplications of the filter weights and RGB values.

The outcome of a ReLu function is equal to zero for all values of x <= 0. Otherwise, it is equal to the value of x. The code in Keras to add a ReLu layer is:



Pooling aggregates the input volume in order to reduce the dimensions further. This speeds up computation time as the number of parameters to be estimated are reduced. Besides that, it helps to avoid overfitting by making the network more robust. Below we illustrate max pooling with a size of 2×2 and stride of 2.

The code in Keras to add pooling with a size of 2×2 is:

cnn.add(MaxPooling2D(pool_size = (2 ,2)))

Fully connected layer

At the end, the convnet is able to detect higher level features in the input images. This can then serve as an input for a fully connected layer. Before we can do that, we will flatten the output of the last ReLu layer. Flattening means we convert it to a vector. The vector values are then connected to all neurons in the fully connected layer. To do that in Python we use the following Keras functions:

cnn.add(Flatten()) cnn.add(Dense(64))


Just like pooling, dropout can help to avoid overfitting. It randomly sets a specified fraction of the inputs to zero, during the training of the model. A dropout rate between 20 and 50% is considered to work well.


Sigmoid activation

Because we want to produce a probability that the image is one of two butterfly species (i.e. binary classification), we can use a sigmoid activation layer.

cnn.add(Activation( 'sigmoid'))

Applying the convolutional neural network on the butterfly images

Now we can define the complete convolutional neural network structure as displayed at the beginning of this post. First, we need to import the necessary Keras modules. Then we can start adding the layers that we explained above.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator
import time
IMG_SIZE = # Replace with the size of your images
NB_CHANNELS = # 3 for RGB images or 1 for grayscale images
BATCH_SIZE = # Typical values are 8, 16 or 32
NB_TRAIN_IMG = # Replace with the total number training images
NB_VALID_IMG = # Replace with the total number validation images

I made some additional parameters explicit for the conv layers. Here is a short explanation:

  • kernel_size specifies the filter size. So for the first conv layer this is size 2×2
  • padding = ‘same’ means applying zero padding as such the original image size is preserved.
  • padding = ‘valid’ means we do not apply any padding.
  • data_format = ‘channels_last’ is just to specify that the number of color channels is specified last in the input_shape argument.
cnn = Sequential()
cnn.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Finally, we compile this network structure and set the loss parameter to binary_crossentropy which is good for binary targets and use accuracy as the evaluation metric.

After having specified the network structure, we create the generators for the training and validation samples. On the training samples, we apply data augmentation as explained above. On the validation samples, we do not apply any augmentation as they are just used to evaluate the model performance.

train_datagen = ImageDataGenerator(
    rotation_range = 40,                  
    width_shift_range = 0.2,                  
    height_shift_range = 0.2,                  
    rescale = 1./255,                  
    shear_range = 0.2,                  
    zoom_range = 0.2,                     
    horizontal_flip = True)
validation_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(
    batch_size = BATCH_SIZE)
validation_generator = validation_datagen.flow_from_directory(
    batch_size = BATCH_SIZE)

With the flow_from_directory* *method on the generators we can easily go through all the images in the specified directories.

Lastly, we can fit the convolutional neural network on the training data and evaluate with the validation data. The resulting weights of the model can be saved and reused later on.

start = time.time()
end = time.time()
print('Processing time:',(end - start)/60)

The number of epochs is arbitrarily set to 50. An epoch is the cycle of forward propagation, checking the error and then adjusting the weights during backpropagation.

The steps_per_epoch is set to the number of training images divided by the batch size (by the way, the double division symbol will make sure the result is an integer and not a float). Specifying a batch size greater than 1 will speed up the process. Idem for the validation_steps parameter.


After running 50 epochs, we have a training accuracy of 0.8091 and validation accuracy of 0.7359. So the convolutional neural network still suffers from quite some overfitting. We also see that the validation accuracy varies quite a lot. This is because we have a small set of validation samples. It would be better to do k-fold cross-validation for each evaluation round. But that would take quite some time.

To address the overfitting we could:

  • increase the dropout rate
  • apply dropout at each layer
  • find more training data

We’ll look into the first two options and monitor the result. The results of our first model will serve as a baseline. After applying an extra dropout layer and increasing the dropout rates, the model is a bit less overfitted.

I hope you’ve all enjoyed reading this post and learned something new. The full code is available on Github. Cheers!

Further reading:

Python for NLP: Developing an Automatic Text Filler using N-Grams

Machine Learning Pipelines: Nonlinear Model Stacking

Build up a Neural Network with python

A Deep Dive into NLP with PyTorch

Maintainable Code in Data Science

Tensorflow Tutorial - Modelling with Tensorflow 2.0

Python Tutorial for Data Science

Learn TensorFlow.js - Deep Learning and Neural Networks with JavaScript

Learn TensorFlow.js - Deep Learning and Neural Networks with JavaScript

This full course introduces the concept of client-side artificial neural networks. We will learn how to deploy and run models along with full deep learning applications in the browser! To implement this cool capability, we’ll be using TensorFlow.js (TFJS), TensorFlow’s JavaScript library.

By the end of this video tutorial, you will have built and deployed a web application that runs a neural network in the browser to classify images! To get there, we'll learn about client-server deep learning architectures, converting Keras models to TFJS models, serving models with Node.js, tensor operations, and more!

⭐️Course Sections⭐️

⌨️ 0:00 - Intro to deep learning with client-side neural networks

⌨️ 6:06 - Convert Keras model to Layers API format

⌨️ 11:16 - Serve deep learning models with Node.js and Express

⌨️ 19:22 - Building UI for neural network web app

⌨️ 27:08 - Loading model into a neural network web app

⌨️ 36:55 - Explore tensor operations with VGG16 preprocessing

⌨️ 45:16 - Examining tensors with the debugger

⌨️ 1:00:37 - Broadcasting with tensors

⌨️ 1:11:30 - Running MobileNet in the browser

Deep Learning Using TensorFlow

Deep Learning Using TensorFlow

Deep Learning Using TensorFlow. In this TensorFlow tutorial for professionals and enthusiasts who are interested in applying Deep Learning Algorithm using TensorFlow to solve various problems.

In this TensorFlow tutorial for professionals and enthusiasts who are interested in applying Deep Learning Algorithm using TensorFlow to solve various problems.

TensorFlow is an open source deep learning library that is based on the concept of data flow graphs for building models. It allows you to create large-scale neural networks with many layers. Learning the use of this library is also a fundamental part of the AI & Deep Learning course curriculum. Following are the topics that will be discussed in this TensorFlow tutorial:

  1. What is TensorFlow
  2. TensorFlow Code Basics
  3. TensorFlow UseCase
What are Tensors?

In this TensorFlow tutorial, before talking about TensorFlow, let us first understand what are tensors. **Tensors **are nothing but a de facto for representing the data in deep learning.

As shown in the image above, tensors are just multidimensional arrays, that allows you to represent data having higher dimensions. In general, Deep Learning you deal with high dimensional data sets where dimensions refer to different features present in the data set. In fact, the name “TensorFlow” has been derived from the operations which neural networks perform on tensors. It’s literally a flow of tensors. Since, you have understood what are tensors, let us move ahead in this **TensorFlow **tutorial and understand – what is TensorFlow?

What is TensorFlow?

**TensorFlow **is a library based on Python that provides different types of functionality for implementing Deep Learning Models. As discussed earlier, the term TensorFlow is made up of two terms – Tensor & Flow:

In TensorFlow, the term tensor refers to the representation of data as multi-dimensional array whereas the term flow refers to the series of operations that one performs on tensors as shown in the above image.

Now we have covered enough background about TensorFlow.

Next up, in this TensorFlow tutorial we will be discussing about TensorFlow code-basics.

TensorFlow Tutorial: Code Basics

Basically, the overall process of writing a TensorFlow program involves two steps:

  1. Building a Computational Graph
  2. Running a Computational Graph

Let me explain you the above two steps one by one:

1. Building a Computational Graph

So, what is a computational graph? Well, a computational graph is a series of TensorFlow operations arranged as nodes in the graph. Each nodes take 0 or more tensors as input and produces a tensor as output. Let me give you an example of a simple computational graph which consists of three nodes – a, b & c as shown below:

Explanation of the Above Computational Graph:

**What is TensorFlowTensorFlow Code BasicsTensorFlow UseCase **
Basically, one can think of a computational graph as an alternative way of conceptualizing mathematical calculations that takes place in a TensorFlow program. The operations assigned to different nodes of a Computational Graph can be performed in parallel, thus, providing a better performance in terms of computations.

Here we just describe the computation, it doesn’t compute anything, it does not hold any values, it just defines the operations specified in your code.

2. Running a Computational Graph

Let us take the previous example of computational graph and understand how to execute it. Following is the code from previous example:

Example 1:

import tensorflow as tf
# Build a graph
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b

Now, in order to get the output of node c, we need to run the computational graph within a session. Session places the graph operations onto Devices, such as CPUs or GPUs, and provides methods to execute them.

A session encapsulates the control and state of the *TensorFlow *runtime i.e. it stores the information about the order in which all the operations will be performed and passes the result of already computed operation to the next operation in the pipeline. Let me show you how to run the above computational graph within a session (Explanation of each line of code has been added as a comment):

# Create the session object
sess = tf.Session()
#Run the graph within a session and store the output to a variable
output_c =
#Print the output of node c
#Close the session to free up some resources

So, this was all about session and running a computational graph within it. Now, let us talk about variables and placeholders that we will be using extensively while building deep learning model using TensorFlow.

Constants, Placeholder and Variables

In TensorFlow, constants, placeholders and variables are used to represent different parameters of a deep learning model. Since, I have already discussed constants earlier, I will start with placeholders.


A TensorFlow constant allows you to store a value but, what if, you want your nodes to take inputs on the run? For this kind of functionality, placeholders are used which allows your graph to take external inputs as parameters. Basically, a placeholder is a promise to provide a value later or during runtime. Let me give you an example to make things simpler:

import tensorflow as tf
# Creating placeholders
a = tf. placeholder(tf.float32)
b = tf. placeholder(tf.float32)
# Assigning multiplication operation w.r.t. a &amp; b to node mul
mul = a*b
# Create session object
sess = tf.Session()
# Executing mul by passing the values [1, 3] [2, 4] for a and b respectively
output =, {a: [1,3], b: [2, 4]})
print('Multiplying a b:', output)
[2. 12.]

Points to Remember about placeholders:

**What is TensorFlowTensorFlow Code BasicsTensorFlow UseCase **
Now, let us move ahead and understand – what are variables?


In deep learning, placeholders are used to take arbitrary inputs in your model or graph. Apart from taking input, you also need to modify the graph such that it can produce new outputs w.r.t. same inputs. For this you will be using variables. In a nutshell, a variable allows you to add such parameters or node to the graph that are trainable i.e. the value can be modified over the period of a time. Variables are defined by providing their initial value and type as shown below:

var = tf.Variable( [0.4], dtype = tf.float32 )

**Note: **
**What is TensorFlowTensorFlow Code BasicsTensorFlow UseCase **
Constants are initialized when you call tf.constant, and their value can never change. On the contrary, variables are not initialized when you call tf.Variable. To initialize all the variables in a TensorFlow program, you must explicitly call a special operation as shown below:

init = tf.global_variables_initializer()

Always remember that a variable must be initialized before a graph is used for the first time.

Note: TensorFlow variables are in-memory buffers that contain tensors, but unlike normal tensors that are only instantiated when a graph is run and are immediately deleted afterwards, variables survive across multiple executions of a graph.

Now that we have covered enough basics of TensorFlow, let us go ahead and understand how to implement a linear regression model using TensorFlow.

Linear Regression Model Using TensorFlow

Linear Regression Model is used for predicting the unknown value of a variable (Dependent Variable) from the known value of another variables (Independent Variable) using linear regression equation as shown below:

Therefore, for creating a linear model, you need:

  1. Building a Computational Graph
  2. Running a Computational Graph

So, let us begin building linear model using TensorFlow:

Copy the code by clicking the button given below:

# Creating variable for parameter slope (W) with initial value as 0.4
W = tf.Variable([.4], tf.float32)
#Creating variable for parameter bias (b) with initial value as -0.4
b = tf.Variable([-0.4], tf.float32)
# Creating placeholders for providing input or independent variable, denoted by x
x = tf.placeholder(tf.float32)
# Equation of Linear Regression
linear_model = W * x + b
# Initializing all the variables
sess = tf.Session()
init = tf.global_variables_initializer()
# Running regression model to calculate the output w.r.t. to provided x values
print( {x: [1, 2, 3, 4]})) 


[ 0.     0.40000001 0.80000007 1.20000005]

The above stated code just represents the basic idea behind the implementation of regression model i.e. how you follow the equation of regression line so as to get output w.r.t. a set of input values. But, there are two more things left to be added in this model to make it a complete regression model:
**What is TensorFlowTensorFlow Code BasicsTensorFlow UseCase **
Now let us understand how can I incorporate the above stated functionalities into my code for regression model.

Loss Function – Model Validation

A loss function measures how far apart the current output of the model is from that of the desired or target output. I’ll use a most commonly used loss function for my linear regression model called as Sum of Squared Error or SSE. SSE calculated w.r.t. model output (represent by linear_model) and desired or target output (y) as:

y = tf.placeholder(tf.float32)
error = linear_model - y
squared_errors = tf.square(error)
loss = tf.reduce_sum(squared_errors)
print(, {x:[1,2,3,4], y:[2, 4, 6, 8]})


As you can see, we are getting a high loss value. Therefore, we need to adjust our weights (W) and bias (b) so as to reduce the error that we are receiving.

tf.train API – Training the Model

TensorFlow provides optimizers that slowly change each variable in order to minimize the loss function or error. The simplest optimizer is gradient descent. It modifies each variable according to the magnitude of the derivative of loss with respect to that variable.

#Creating an instance of gradient descent optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
for i in range(1000):, {x:[1, 2, 3, 4], y:[2, 4, 6, 8]})
print([W, b]))

 [array([ 1.99999964], dtype=float32), array([ 9.86305167e-07], dtype=float32)]

So, this is how you create a linear model using TensorFlow and train it to get the desired output.

Deep Learning from Scratch and Using Tensorflow in Python

Deep Learning from Scratch and Using Tensorflow in Python

In this article, we will learn how deep learning works and get familiar with its terminology — such as backpropagation and batch size

Originally published by Milad Toutounchian at
Deep learning is one of the most popular models currently being used in real-world, Data Science applications. It’s been an effective model in areas that range from image to text to voice/music. With the increase in its use, the ability to quickly and scalably implement deep learning becomes paramount. The rise of deep learning platforms such as Tensorflow, help developers implement what they need to in easier ways.

In this article, we will learn how deep learning works and get familiar with its terminology — such as backpropagation and batch size. We will implement a simple deep learning model — from theory to scratch implementation — for a predefined input and output in Python, and then do the same using deep learning platforms such as Keras and Tensorflow. We have written this simple deep learning model using Keras and Tensorflow version 1.x and version 2.0 with three different levels of complexity and ease of coding.

Deep Learning Implementation from Scratch

Consider a simple multi-layer-perceptron with four input neurons, one hidden layer with three neurons and an output layer with one neuron. We have three data-samples for the input denoted as X, and three data-samples for the desired output denoted as yt. So, each input data-sample has four features.

# Inputs and outputs of the neural net:
import numpy as np

X=np.array([[1.0, 0.0, 1.0, 0.0],[1.0, 0.0, 1.0, 1.0],[0.0, 1.0, 0.0, 1.0]])

The x*(m) in this figure is one-sample of Xh(m) is the output of the hidden layer for input x(m), and Wi* and Wh are the weights.

The goal of a neural net (NN) is to obtain weights and biases such that for a given input, the NN provides the desired output. But, we do not know the appropriate weights and biases in advance, so we update the weights and biases such that the error between the output of NN, yp(m), and desired ones, yt(m), is minimized. This iterative minimization process is called the NN training.

Assume the activation functions for both hidden and output layers are sigmoid functions. Therefore,

The size of weights, biases and the relationships between input and outputs of the neural net

Where activation function is the sigmoid, m is the mth data-sample and yp(m) is the NN output.

The error function, which measures the difference between the output of NN with the desired one, can be expressed mathematically as:

The Error defined for the neural net which is squared error

The pseudocode for the above NN has been summarized below:

pseudocode for the neural net training

From our pseudocode, we realize that the partial derivative of Error (E) with respect to parameters (weights and biases) should be computed. Using the chain rule from calculus we can write:

We have two options here for updating the weights and biases in backward path (backward path means updating weights and biases such that error is minimized):

  1. Use all *N * samples of the training data
  2. Use one sample (or a couple of samples)

For the first one, we say the batch size is N. For the second one, we say batch size is 1, if use one sample to updates the parameters. So batch size means how many data samples are being used for updating the weights and biases.

You can find the implementation of the above neural net, in which the gradient of the error with respect to parameters is calculated Symbolically, with different batch sizes here.

As you can see with the above example, creating a simple deep learning model from scratch involves methods that are very complex. In the next section, we will see how deep learning frameworks can assist in introducing scalability and greater ease of implementation to our model.

Deep Learning implementation using Keras, Tensorflow 1.x and 2.0

In the previous section, we computed the gradient of Error w.r.t. parameters from using the chain rule. We saw first-hand that it is not an easy or scalable approach. Also, keep in mind that we evaluate the partial derivatives at each iteration, and as a result, the Symbolic Gradient is not needed although its value is important. This is where deep-learning frameworks such as Keras and Tensorflow can play their role. The deep-learning frameworks use an AutoDiff method for numerical calculations of partial gradients. If you’re not familiar with AutoDiff, StackExchange has a great example to walk through.

The AutoDiff decomposes the complex expression into a set of primitive ones, i.e. expressions consisting of at most a single function call. As the differentiation rules for each separate expression are already known, the final results can be computed in an efficient way.

We have implemented the NN model with three different levels in Keras, Tensorflow 1.x and Tensorflow 2.0:

1- High-Level (Keras and Tensorflow 2.0): High-Level Tensorflow 2.0 with Batch Size 1

2- Medium-Level (Tensorflow 1.x and 2.0): Medium-Level Tensorflow 1.x with Batch Size 1 , Medium-Level Tensorflow 1.x with Batch Size NMedium-Level Tensorflow 2.0 with Batch Size 1Medium-Level Tensorflow v 2.0 with Batch Size N

3- Low-Level (Tensorflow 1.x): Low-Level Tensorflow 1.x with Batch Size N

Code Snippets:

For the High-Level, we have accomplished the implementation using Keras and Tensorflow v 2.0 with model.train_on_batch:

# High-Level implementation of the neural net in Tensorflow:
model.compile(loss=mse, optimizer=optimizer)
for _ in range(2000):
    for step, (x, y) in enumerate(zip(X_data, y_data)):
        model.train_on_batch(np.array([x]), np.array([y]))

In the Medium-Level using Tensorflow 1.x, we have defined:

E = tf.reduce_sum(tf.pow(ypred - Y, 2))
optimizer = tf.train.GradientDescentOptimizer(0.1)
grads = optimizer.compute_gradients(E, [W_h, b_h, W_o, b_o])
updates = optimizer.apply_gradients(grads)

This ensures that in the for loop, the updates variable will be updated. For Medium-Level, the gradients and their updates are defined outside the for_loop and inside the for_loop updates is iteratively updated. In the Medium-Level using Tensorflow v 2.x, we have used:

# Medium-Level implementation of the neural net in Tensorflow

# In for_loop
with tf.GradientTape() as tape:
   x = tf.convert_to_tensor(np.array([x]), dtype=tf.float64)
   y = tf.convert_to_tensor(np.array([y]), dtype=tf.float64)
   ypred = model(x)
   loss = mse(y, ypred)
gradients = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))

In Low-Level implementation, each weight and bias is updated separately. In the Low-Level using Tensorflow v 1.x, we have defined:

# Low-Level implementation of the neural net in Tensorflow:
E = tf.reduce_sum(tf.pow(ypred - Y, 2))
dE_dW_h = tf.gradients(E, [W_h])[0]
dE_db_h = tf.gradients(E, [b_h])[0]
dE_dW_o = tf.gradients(E, [W_o])[0]
dE_db_o = tf.gradients(E, [b_o])[0]
# In for_loop:
evaluated_dE_dW_h =,
                                     feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
        W_h_i = W_h_i - 0.1 * evaluated_dE_dW_h
        evaluated_dE_db_h =,
                                     feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
        b_h_i = b_h_i - 0.1 * evaluated_dE_db_h
        evaluated_dE_dW_o =,
                                     feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
        W_o_i = W_o_i - 0.1 * evaluated_dE_dW_o
        evaluated_dE_db_o =,
                                     feed_dict={W_h: W_h_i, b_h: b_h_i, W_o: W_o_i, b_o: b_o_i, X: X_data.T, Y: y_data.T})
        b_o_i = b_o_i - 0.1 * evaluated_dE_db_o

As you can see with the above low level implementation, the developer has more control over every single step of numerical operations and calculations.


We have now shown that implementing from scratch even a simple deep learning model by using Symbolic gradient computation for weight and bias updates is not an easy or scalable approach. Using deep learning frameworks accelerates this process as a result of using AutoDiff, which is basically a stable numerical gradient computation for updating weights and biases.

Thanks for reading

If you liked this post, share it with all of your programming buddies!

Follow us on Facebook | Twitter

Further reading

Machine Learning A-Z™: Hands-On Python & R In Data Science

Python for Data Science and Machine Learning Bootcamp

Machine Learning, Data Science and Deep Learning with Python

Deep Learning A-Z™: Hands-On Artificial Neural Networks

Artificial Intelligence A-Z™: Learn How To Build An AI

A Complete Machine Learning Project Walk-Through in Python

Machine Learning: how to go from Zero to Hero

Top 18 Machine Learning Platforms For Developers

10 Amazing Articles On Python Programming And Machine Learning

100+ Basic Machine Learning Interview Questions and Answers