In this video, we are going to build a convolutional neural network using the CIFAR10 dataset in Python.

We will discuss the following in this video:
🕕 (0:00:31) Introduction
🕕 (0:01:36) Building a CNN using CIFAR10 Dataset

PyTorch For Deep Learning — Convolutional Neural Networks ( Fashion-MNIST )

Fashion MNIST

Fashion-MNIST is a dataset created by Zalando, the fashion retailer, to replace the original MNIST while increasing the difficulty.

This blog post shows how to create a model that classifies Fashion-MNIST images and how to implement convolutional layers in the network.

Let’s look at the code

Build your own Neural Network for CIFAR-10 using PyTorch

In 6 simple steps

Neural networks seem like a black box to many of us. What happens inside one, how does it happen, and how do you build your own neural network to classify the images in datasets like MNIST and CIFAR-10? These questions keep popping up. Let’s briefly go over what a neural network is and then build one for the CIFAR-10 dataset. By the end of this article you will have answers to:

  1. What are neural networks?
  2. How to build a neural network model for the CIFAR-10 dataset using PyTorch?

What are neural networks?

Neural networks (NNs) are inspired by the human brain. An individual neuron in the brain stays at rest while it collects signals from other neurons through structures called dendrites. When the excitation it receives is sufficiently high, the neuron fires (gets activated) and passes the information on. Artificial neural networks (ANNs) are made up of interconnected artificial neurons (known as perceptrons) that take many weighted inputs, add them up and pass the sum through a non-linearity to produce an output. Sounds simple!
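To make the "weighted inputs, sum, non-linearity" idea concrete, here is a minimal sketch of a single artificial neuron. The sigmoid non-linearity and the example numbers are assumptions chosen for illustration (the classic perceptron uses a step function instead).

```python
import math


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs and weights, chosen only for illustration.
output = perceptron(inputs=[1.0, 0.5, -1.0], weights=[0.4, -0.2, 0.1], bias=0.0)
print(output)
```

The output always lies between 0 and 1, which is why a sigmoid is a convenient choice when the activation should behave like a soft "fired / not fired" signal.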

Convolutional Neural Networks (CNNs): A practical perspective

Hello everyone,

Hoping that everyone is safe and doing well. In this blog, we will look into some concepts of CNNs for image classification that are often missed or misunderstood by beginners (including me until some time back). This blog requires the reader to have a basic idea of how CNNs work. However, we will cover the important aspects of CNNs before getting deeper into advanced topics.

After this, we will look at a machine learning technique called transfer learning and how it helps when training a deep learning model with limited data. We will train an image classification model on top of the ResNet34 architecture using data that contains digitally recorded human heartbeats in the form of audio (.wav) files. In the process, we will convert each of these audio files into an image by turning it into a spectrogram with a popular Python audio library called Librosa. In the end, we will examine the model with popular error metrics and check its performance.

CNNs for image classification:

Neural networks that use one or more convolution operations are called convolutional neural networks (CNNs). The input of a CNN is an image with a numerical value in each pixel, arranged spatially along the width, height and depth (channels). The goal of the overall architecture is to produce a probability score for the image belonging to a certain class by learning from these spatially arranged values. In the process, we perform operations like pooling and convolution on these values to squeeze them along the width and height and stretch them along the depth.

An image typically contains three channels, namely RGB (Red, Green, Blue).

Image by Purit Punyawiwat, Source : Datawow

Main operations in CNNs

Convolution operation

The convolution operation is (w.x + b) applied to all the different spatial localities in the input volume. Using more convolution operations helps the network learn a particular shape even if its location in the image changes.

Example: Clouds are generally at the top of a landscape image. If an inverted image is fed into a CNN, a larger number of convolution operations helps the model identify the cloud portion even though it is inverted.

Mathematical expression: x_new = w.x + b, where w is the filter/kernel, b is the bias and x is part of a hidden layer output. Both w and b are different for every convolution operation applied on different hidden layers.
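The expression x_new = w.x + b can be sketched in NumPy by sliding one kernel over a single-channel input. This is only an illustration: stride 1 and no padding are assumptions, the input and kernel values are made up, and, as in most deep learning frameworks, the kernel is not flipped.

```python
import numpy as np


def conv2d_single(x, w, b):
    """Apply w.x + b to every spatial location of a 2-D input (stride 1, no padding)."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]       # the local region seen by the kernel
            out[i, j] = np.sum(w * patch) + b   # w.x + b for this location
    return out


# Hypothetical 4x4 input and 2x2 kernel.
x = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[1.0, 0.0],
              [0.0, -1.0]])
x_new = conv2d_single(x, w, b=0.5)
print(x_new.shape)  # (3, 3)
```

The same w and b are reused at every location, which is what lets the operation respond to a shape wherever it appears in the image.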

Convolution Operations(Source: Lecture 22-EECS251,


Pooling reduces the spatial dimensions of each activation map (output after convolution operation) while aggregating the localised spatial information. Pooling helps to squeeze the output from hidden layers along height and width. If we consider maximum value within the non-overlapping sub-regions then it is called Max-pooling. Max-pooling also adds non-linearity to the model.
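A minimal NumPy sketch of max-pooling over non-overlapping sub-regions follows. The 2x2 window and the example activation map are assumptions made for illustration.

```python
import numpy as np


def max_pool2d(x, size=2):
    """Max-pool a 2-D activation map over non-overlapping size x size windows."""
    h_out, w_out = x.shape[0] // size, x.shape[1] // size
    x = x[:h_out * size, :w_out * size]          # drop rows/columns that do not fill a window
    blocks = x.reshape(h_out, size, w_out, size)  # split into (row block, row, col block, col)
    return blocks.max(axis=(1, 3))                # maximum inside each window


# Hypothetical 4x4 activation map.
activation_map = np.array([[1, 3, 2, 0],
                           [4, 6, 1, 1],
                           [0, 2, 9, 8],
                           [1, 1, 7, 5]])
pooled = max_pool2d(activation_map)
print(pooled)
```

Here the 4x4 map shrinks to 2x2: height and width are halved while the strongest response in each window is kept.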

Convolutional Neural Networks-An Intuitive approach

A simple yet comprehensive approach to the concepts

Convolutional Neural Networks

Artificial intelligence has seen tremendous growth over the last few years, and the gap between machines and humans is slowly but steadily narrowing. One important difference between humans and machines is (or rather was!) in the perception of images and sound. How do we train a machine to recognize images and sound as we do?

At this point we can ask ourselves a few questions!!!

How would machines perceive images and sound?

How would machines differentiate between different images, for example between a cat and a dog?

Can machines identify and differentiate between different human beings, for example differentiate a male from a female, or identify Leonardo DiCaprio or Brad Pitt, just from their images?

Let’s attempt to find out!!!

The Colour coding system:

Let’s get a basic idea of the colour coding system that machines use.

RGB decimal system: A colour is denoted as, for example, rgb(255, 0, 0). It consists of three channels representing red, green and blue respectively. Each channel is a decimal value between 0, which means none of that colour, and 255, the highest possible intensity of that colour. So the example rgb(255, 0, 0) gives a very bright red. For pure green, the value would be rgb(0, 255, 0), and for pure blue, rgb(0, 0, 255). Since every colour can be obtained as a combination of red, green and blue, we can write the code for any colour we want.

Gray scale: Gray scale consists of just 1 channel (0 to 255), with 0 representing black and 255 representing white. The values in between represent the different shades of gray.

Computers ‘see’ in a different way than we do. Their world consists of only numbers.

Every image can be represented as a 2-dimensional array of numbers, known as pixels (one such array per channel).

But the fact that they perceive images in a different way doesn’t mean we can’t train them to recognize patterns like we do. We just have to think of what an image is in a different way.

Now that we have a basic idea of how images can be represented, let us try to understand the architecture of a CNN.
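As a small illustration of images as arrays of numbers, the sketch below builds a gray scale image and an RGB image with NumPy. The tiny 2x3 size and the pixel values are made up for the example.

```python
import numpy as np

# A 2x3 gray scale image: one channel, values from 0 (black) to 255 (white).
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# A 2x3 RGB image: height x width x 3 channels, filled with pure red, rgb(255, 0, 0).
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[..., 0] = 255  # channel 0 is red; green and blue stay at 0

print(gray.shape)  # (2, 3)
print(rgb.shape)   # (2, 3, 3)
```

The only difference between the two is the extra depth axis that holds one value per colour channel.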

CNN architecture

Convolutional Neural Networks have a different architecture than regular Neural Networks. Regular Neural Networks transform an input by putting it through a series of hidden layers. Every layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the layer before. Finally, there is a last fully-connected layer, the output layer, that represents the predictions.

Convolutional Neural Networks are a bit different. First of all, the layers are organised in 3 dimensions: width, height and depth. Further, the neurons in one layer do not connect to all the neurons in the next layer but only to a small region of it. Lastly, the final output will be reduced to a single vector of probability scores, organized along the depth dimension.
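A quick back-of-the-envelope calculation shows what connecting to "a small region" means in practice. The 32x32x3 input matches a CIFAR-10 image; the 5x5 region size is an assumption picked for illustration.

```python
def dense_connections(height, width, channels):
    """Inputs seen by one neuron that is fully connected to the whole image."""
    return height * width * channels


def local_connections(region, channels):
    """Inputs seen by one neuron that looks at a region x region patch across all channels."""
    return region * region * channels


fully_connected = dense_connections(32, 32, 3)  # whole 32x32 RGB image
locally_connected = local_connections(5, 3)     # hypothetical 5x5 patch

print(fully_connected, locally_connected)
```

Each locally connected neuron handles a small fraction of the inputs a fully connected neuron would, which is one reason convolutional layers scale to images.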

A typical CNN architecture

As can be seen above CNNs have two components:

  • The Hidden layers/Feature extraction part

In this part, the network will perform a series of **convolutions** and pooling operations during which the features are detected. If you had a picture of a tiger, this is the part where the network would recognize the stripes, 4 legs, 2 eyes, one nose, distinctive orange colour etc.

  • The Classification part

Here, the fully connected layers will serve as a classifier on top of these extracted features. They will assign a **probability** that the object in the image is what the algorithm predicts it is.

Before we proceed any further we need to understand what “convolution” is; we will come back to the architecture later:
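One common way to turn the classifier's raw scores into probabilities is the softmax function. The text does not name a specific function, so treat softmax, the class names and the scores below as assumptions for illustration.

```python
import numpy as np


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


class_names = ["tiger", "cat", "dog"]  # hypothetical classes
scores = np.array([2.0, 1.0, 0.1])     # hypothetical output of the last fully connected layer

probabilities = softmax(scores)
predicted = class_names[int(np.argmax(probabilities))]
print(predicted, probabilities)
```

The class with the highest score also gets the highest probability, so the prediction is simply the arg-max of the probability vector.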

What do we mean by the “convolution” in Convolutional Neural Networks?

Let us decode!!!

A Comparative Analysis of Recurrent Neural Networks

Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. RNN models are mostly used in the fields of natural language processing and speech recognition.

The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. They happen because the gradient is a product of many factors, one per layer (or time step), so it can decrease or increase exponentially with the number of layers, which makes it difficult to capture long-term dependencies.
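The exponential effect is easy to see with plain arithmetic. The sketch below multiplies a gradient by the same factor at every step; the factors 0.9 and 1.1 and the 100 steps are assumptions, and a real RNN multiplies by Jacobian matrices rather than a single scalar.

```python
def repeated_product(factor, steps):
    """Multiply a starting gradient of 1.0 by the same factor at every step."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad


vanishing = repeated_product(0.9, 100)  # factor slightly below 1
exploding = repeated_product(1.1, 100)  # factor slightly above 1

print(vanishing, exploding)
```

A factor only slightly below 1 drives the gradient towards zero, and a factor only slightly above 1 makes it blow up.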

Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU.

A 1D convolution layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. It is very effective for deriving features from a fixed-length segment of the overall dataset. A 1D CNN works well for natural language processing (NLP).
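As a rough sketch of what a 1D convolution computes, the NumPy function below slides a kernel along a sequence. Stride 1, no padding and the example values are assumptions; as in deep learning frameworks, the kernel is not flipped.

```python
import numpy as np


def conv1d(sequence, kernel):
    """Slide the kernel along the sequence (stride 1, no padding)."""
    k = len(kernel)
    return np.array([np.dot(sequence[i:i + k], kernel)
                     for i in range(len(sequence) - k + 1)])


# Hypothetical signal and a kernel that measures the local change.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])
features = conv1d(signal, kernel)
print(features)
```

Each output value summarises one fixed-length window of the input, which is what makes the layer useful on token or time-step sequences.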

DATASET: IMDb Movie Review

TensorFlow Datasets is a collection of datasets ready to use, with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed as tf.data.Datasets, enabling easy-to-use and high-performance input pipelines.


This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.

Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Load the Dataset

import tensorflow as tf
import tensorflow_datasets as tfds

imdb, info = tfds.load("imdb_reviews", with_info=True, as_supervised=True)

Training and Testing Data

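train_data, test_data = imdb['train'], imdb['test']

# The loop bodies are missing in the source; collecting each decoded review and its label is an assumption.
training_sentences, training_labels = [], []
testing_sentences, testing_labels = [], []
for s, l in train_data:
    training_sentences.append(s.numpy().decode('utf8'))
    training_labels.append(l.numpy())
for s, l in test_data:
    testing_sentences.append(s.numpy().decode('utf8'))
    testing_labels.append(l.numpy())

Tokenization and Padding

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# These hyperparameters are not given in the source; the values below are placeholders.
vocab_size, max_length, trunc_type, oov_tok = 10000, 120, 'post', '<OOV>'
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)  # build the vocabulary from the training reviews only
sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding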

Multi-layer Bidirectional LSTM

