Convolutional Neural Networks for Computer Vision: MIT 6.S191

MIT Introduction to Deep Learning 6.S191: Lecture 3
Convolutional Neural Networks for Computer Vision
Lecturer: Alexander Amini
January 2021

For all lectures, slides, and lab materials:

Lecture Outline

  • 0:00 - Introduction
  • 2:47 - Amazing applications of vision
  • 7:56 - What computers “see”
  • 14:02 - Learning visual features
  • 18:50 - Feature extraction and convolution
  • 22:20 - The convolution operation
  • 27:27 - Convolution neural networks
  • 34:05 - Non-linearity and pooling
  • 38:59 - End-to-end code example
  • 40:25 - Applications
  • 42:02 - Object detection
  • 50:52 - End-to-end self driving cars
  • 54:00 - Summary

#data-science #machine-learning #deep-learning

What is GEEK

Buddha Community

Convolutional Neural Networks for Computer Vision: MIT 6.S191

Convolutional Neural Networks-An Intuitive approach

A simple yet comprehensive approach to the concepts

Image for post

Convolutional Neural Networks

Artificial intelligence has seen a tremendous growth over the last few years, The gap between machines and humans is slowly but steadily decreasing. One important difference between humans and machines is (or rather was!) with regards to human’s perception of images and sound.How do we train a machine to recognize images and sound as we do?

At this point we can ask ourselves a few questions!!!

How would the machines perceive images and sound ?

How would the machines be able to differentiate between different images for example say between a cat and a dog?

Can machines identify and differentiate between different human beings for example lets say differentiate a male from a female or identify Leonardo Di Caprio or Brad Pitt by just feeding their images to it?

Let’s attempt to find out!!!

The Colour coding system:

Lets get a basic idea of what the colour coding system for machines is

RGB decimal system: It is denoted as rgb(255, 0, 0). It consists of three channels representing RED , BLUE and GREEN respectively . RGB defines how much red, green or blue value you’d like to have displayed in a decimal value somewhere between 0, which is no representation of the color, and 255, the highest possible concentration of the color. So, in the example rgb(255, 0, 0), we’d get a very bright red. If we wanted all green, our RGB would be rgb(0, 255, 0). For a simple blue, it would be rgb(0, 0, 255).As we know all colours can be obtained as a combination of Red , Green and Blue , we can obtain the coding for any colour we want.

Gray scale: Gray scale consists of just 1 channel (0 to 255)with 0 representing black and 255 representing white. The colors in between represent the different shades of Gray.

Computers ‘see’ in a different way than we do. Their world consists of only numbers.

Every image can be represented as 2-dimensional arrays of numbers, known as pixels.

But the fact that they perceive images in a different way, doesn’t mean we can’t train them to recognize patterns, like we do. We just have to think of what an image is in a different way.

Image for post

Now that we have a basic idea of how images can be represented , let us try and understand The architecture of a CNN

CNN architecture

Convolutional Neural Networks have a different architecture than regular Neural Networks. Regular Neural Networks transform an input by putting it through a series of hidden layers. Every layer is made up of a set of neurons, where each layer is fully connected to all neurons in the layer before. Finally, there is a last fully-connected layer — the output layer — that represent the predictions.

Convolutional Neural Networks are a bit different. First of all, the layers are organised in 3 dimensions: width, height and depth. Further, the neurons in one layer do not connect to all the neurons in the next layer but only to a small region of it. Lastly, the final output will be reduced to a single vector of probability scores, organized along the depth dimension

Image for post

Image for post

A typical CNN architecture

As can be seen above CNNs have two components:

  • The Hidden layers/Feature extraction part

In this part, the network will perform a series of **convolutions **and pooling operations during which the features are detected. If you had a picture of a tiger , this is the part where the network would recognize the stripes , 4 legs , 2 eyes , one nose , distinctive orange colour etc.

  • The Classification part

Here, the fully connected layers will serve as a classifier on top of these extracted features. They will assign a** probability** for the object on the image being what the algorithm predicts it is.

Before we proceed any further we need to understand what is “convolution”, we will come back to the architecture later:

What do we mean by the “convolution” in Convolutional Neural Networks?

Let us decode!!!

#convolutional-neural-net #convolution #computer-vision #neural networks

CNN Series Part 1: How do computers see images?

In this article, we will learn about how computers see images & the issues faced while performing a computer vision task. We will see how deep learning comes into the picture & how with the power of neural networks, we can build a powerful computer vision system capable of solving extraordinary problems.

Image for post

One example of how deep learning is transforming computer vision is facial recognition or face detection. On the top left, you can see the icon of the human eye which visually represents vision coming into the deep neural network in the form of images, pixels, videos & on the output on the bottom you can see a depiction of the human face or detection of the human face or this could also be recognizing different human faces or emotions on the face and also the key facial features, etc.

#convolution-neural-net #computer-vision #neural-networks #cnn #convolutional-network #series

Convolutional Neural Network (CNN)

Image for post

Hello fellow people, It is instructive for instance to trace the computer industry’s to decline in vision, idealism, creativity, romance and sheer fun as it becomes more important and prosperous.

Here, Lets look into computational neural network architecture and constructing a cnn model for detection of ship using satellite imagery.

What is CNN ?

1. They are extensively used in computer vision problems. They differ from Multi-Layer Perceptron in manner and relatively cheap computing.

2. They are mainly used to classify, detect or recognize objects from image or video data.


Image for post

  1. Input Layer: Read the required input data, scale it to pixel dimension between 0–255 and specify the range of bandwidth either grayscale or RGB
  2. Convolution Layer:

Image for post

Extract the features of the object present on the image by detecting specific patterns within the picture. The computer will scan a part of the image, usually with a matrix dimension known as Filter i.e.,3x3 matrix. The output of the convolution layer is called a Feature map.

Strides: The number of pixels shifts over the input matrix. When the stride is 1 then we move the filters to 1 pixel at a time. When the stride is 2 then we move the filters to 2 pixels at a time and so on.

3.Pooling Layer:

Image for post

Pooling is done to reduce the dimensionality of the input image.

Eg: View the diagram. “pooling” will screen a 4x4 feature map and return the maximum value. The pooling takes the maximum value of a 2x2 array and then move this windows by two pixels.

4. Fully Connected Layer: Fully Connected layers in a neural networks are those layerswhere all the inputs from one layer are connected to every activation unit of the next layer.

5. Dense Layer or Output Layer: It takes the input and return the output using appropriate activation function.

#image-recognition #computer-vision #data-science #convolutional-network #neural network

Convolutional Neural Networks, Explained

A Convolutional Neural Network, also known as CNN or ConvNet, is a class of neural networks that specializes in processing data that has a grid-like topology, such as an image. A digital image is a binary representation of visual data. It contains a series of pixels arranged in a grid-like fashion that contains pixel values to denote how bright and what color each pixel should be.

Image for post

Figure 1: Representation of image as a grid of pixels (Source)

The human brain processes a huge amount of information the second we see an image. Each neuron works in its own receptive field and is connected to other neurons in a way that they cover the entire visual field. Just as each neuron responds to stimuli only in the restricted region of the visual field called the receptive field in the biological vision system, each neuron in a CNN processes data only in its receptive field as well. The layers are arranged in such a way so that they detect simpler patterns first (lines, curves, etc.) and more complex patterns (faces, objects, etc.) further along. By using a CNN, one can enable sight to computers.

Convolutional Neural Network Architecture

A CNN typically has three layers: a convolutional layer, a pooling layer, and a fully connected layer.

Image for post

Figure 2: Architecture of a CNN (Source)

Convolution Layer

The convolution layer is the core building block of the CNN. It carries the main portion of the network’s computational load.

This layer performs a dot product between two matrices, where one matrix is the set of learnable parameters otherwise known as a kernel, and the other matrix is the restricted portion of the receptive field. The kernel is spatially smaller than an image but is more in-depth. This means that, if the image is composed of three (RGB) channels, the kernel height and width will be spatially small, but the depth extends up to all three channels.

#computer-vision #writing-nn #convolution-network #cnn #fashion-mnist #neural networks

Angela  Dickens

Angela Dickens


Satellite image classification with a convolutional neural network.

My latest project at Flatiron was to use neural networks to classify satellite image tiles. I chose to use a convolutional neural network (CNN) and create a dataset of webscraped images to train the model with. This will just be a quick rundown of what went into the project with additional links to my articles to more of the technical parts. This way, it can help to familiarize you with the topics or help to share more about my work with those who have similar interests in computer vision and machine learning.

I chose to use a CNN because I read some school lessons on computer vision about how a CNN has advantages with image classification. A CNN uses pooling layers that filter through patches of the image pixels, finding common patterns, which develop into more complex patterns in order to help determine image class. I chose to work on a computer vision project with satellite images because there are possible use cases for solutions on Earth as well as use cases on other planets. I’ve read articles about organizations looking at different geological patterns on the Mars surface in search of the possible presence of water or perhaps its pre-existence on the planet. This led me to try and build the model to recognize river delta patterns here on Earth, with the next step being to train the model and locate delta patterns on Mars. The model could also eventually be useful for looking at changes to river deltas on Earth, for possible use in agriculture, climate change or even real estate. For now, the project is ongoing as of my writing this blog post, with training and testing performed on the Earth images. The Mars images will be the next part I’ll begin after graduation.

A land image tile.

A river delta image tile.

A Mars delta image tile.

In order to obtain the images for a dataset, I looked into some different API’s and webscraping with Beautiful Soup. Afterwards, I decided to use Selenium to scrape some images from an image search on Google. This method was able to scroll through the page interactively, which was necessary in order to have access to all of the images. I wrote a separate article about that process here. This method was useful as a starting point, in order to go through the process of building the dataset, creating the model, training, testing and just getting everything to work. The disadvantage was that there were a lot of images that were not clean, contained pieces of text or other image artifacts and overall led to less accurate results. There are some example images below so you can see what I mean. I do not claim copyright to any of the used images as they were used for an educational project for school and will remove them if anyone objects to their display on my article.

#convolutional-network #computer-vision #python #image-classification #machine-learning #neural networks