Image Captioning is the process of generating a textual description of an image, based on the objects and actions it contains. This is a three-part series implementing Image Captioning as presented by Andrej Karpathy in his PhD thesis at Stanford.

Along the way, we will learn the basics of neural networks, create a Convolutional Neural Network (CNN) in Keras (a wrapper around TensorFlow), explore state-of-the-art NLP models (Sequence to Sequence, GloVe, BERT, etc.), and stack the CNN and NLP models together using an LSTM to generate captions for an image.

From there, we will create recommender systems based on the pre-trained image and caption vectors, followed by a live web app as the testing ground for both caption generation and recommendations.

[Image: Recommendations based on the pattern of the first footwear item]

Table of Contents (Part 1):

  • Basics Of Neural Networks
  • Convolutional Neural Network for Image Recognition
  • Setting up a Google Colab Notebook
  • Creating a Neural Network in Keras for Image Classification

Neural Network Basics:

A neural network is a type of machine learning model loosely inspired by the human brain. An algorithm builds this artificial neural network and allows the computer to learn by incorporating new data.

It takes several inputs, processes them through neurons in multiple hidden layers, and returns the result through an output layer. This estimation process is technically known as “forward propagation“.
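As a rough illustration (my own sketch, not code from this series), here is a minimal NumPy forward pass through one hidden layer. The layer sizes, random weights, and sigmoid activation are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(3)        # 3 input features
W1 = rng.random((4, 3))  # weights of a 4-neuron hidden layer
b1 = rng.random(4)       # hidden-layer biases
W2 = rng.random((1, 4))  # weights of a 1-neuron output layer
b2 = rng.random(1)       # output-layer bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation: each layer computes a weighted sum plus bias,
# then passes the result through an activation function.
hidden = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)
print(output)  # the network's (untrained) estimate
```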

Next, we compare the result with the actual output. The task is to make the network's output as close as possible to the actual (desired) output. Each neuron contributes some error to the final output. How do we reduce that error?

We reduce the weights of the neurons that contribute most to the error, and this happens while travelling back through the neurons of the network to find where the error lies. This process is known as “backward propagation“. Backward propagation (BP) updates the weights to minimize the error contributed by each neuron.

To minimize the error in as few iterations as possible, neural networks use a common algorithm known as “gradient descent”, which optimizes the task quickly and efficiently. More about gradient descent here.

The aim of running multiple epochs (rounds of forward and backward propagation) is simply to optimize the weights and biases of the layers so as to minimize the error.
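To make this concrete, here is a toy example (again my own illustration) that trains a single sigmoid neuron: each epoch runs forward propagation, backward propagation to obtain the gradients, and a gradient descent update of the weights. The data, learning rate, and epoch count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((8, 2))                   # 8 training samples, 2 features each
y = (X.sum(axis=1) > 1.0).astype(float)  # a simple rule for the neuron to learn

w = rng.random(2)  # weights to be optimized
b = 0.0            # bias to be optimized
lr = 0.5           # learning rate: step size of each gradient descent update

for epoch in range(200):
    # Forward propagation: estimate the outputs with the current weights.
    pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))

    # Backward propagation: gradient of the squared error with respect
    # to w and b, obtained via the chain rule through the sigmoid.
    grad_z = (pred - y) * pred * (1.0 - pred)
    grad_w = X.T @ grad_z / len(y)
    grad_b = grad_z.mean()

    # Gradient descent: move the weights against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # weights after training
```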

Various Categories of Neural Nets:

  • Convolutional Neural Network(CNN)
  • Recurrent Neural Network(RNN)
  • LSTMs and GRUs

Let's delve deep into CNNs here; I will cover the other categories in subsequent posts.
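As a preview of where Part 1 is headed, here is a minimal Keras sketch of a CNN for image classification. Using MNIST as the dataset, along with these particular layer sizes, the optimizer, and the epoch count, are my assumptions for illustration, not the exact model built later in the series:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load a standard image-classification dataset (28x28 grayscale digits).
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add a channel dimension, scale to [0, 1]

model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # one probability per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Each epoch is one full pass of forward and backward propagation
# over the training data.
model.fit(x_train, y_train, epochs=3, batch_size=128)
```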

#deep-learning #machine-learning #neural-networks #nlp
