CNNs have been used extensively to classify images, but detecting objects in an image and drawing bounding boxes around them is a harder problem. To address it, the R-CNN algorithm was published in 2014. R-CNN was followed by several variants, such as Fast R-CNN, Faster R-CNN and Mask R-CNN, which improved on the original object-detection pipeline. To understand these later variants, it is important to have a clear understanding of R-CNN; once that is in place, the other variations follow easily.

This post assumes that the reader is familiar with SVMs, image classification using CNNs, and linear regression.

Overview

The R-CNN paper [1] was published in 2014. It was the first paper to show that CNNs can lead to high performance in object detection. The algorithm performs object detection as follows:

  1. The method takes an image as input and extracts around 2000 region proposals from it (Step 2 in the image above).
  2. Each region proposal is then warped (reshaped) to a fixed size so it can be fed to a CNN.
  3. The CNN extracts a fixed-length feature vector for each region proposal (Step 3 in the image above).
  4. These feature vectors are used to classify the region proposals with category-specific linear SVMs (Step 4 in the image above).
  5. The bounding boxes are refined using bounding-box regression so that each object is properly captured by its box (a code sketch of the full pipeline follows this list).
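
To make the five steps concrete, here is a minimal Python sketch of R-CNN inference. The names `propose_regions`, `extract_features`, `class_svms` and `class_box_regressors` are hypothetical stand-ins for selective search, the CNN, the per-class linear SVMs and the per-class bounding-box regressors; they are assumptions for illustration, not the paper's actual code or a real library API.

```python
import numpy as np

def warp_to_fixed_size(crop, size=(227, 227)):
    """Naive nearest-neighbour resize, standing in for the anisotropic
    warping used in the paper (any image-resize routine would do)."""
    h, w = crop.shape[:2]
    rows = (np.arange(size[0]) * h // size[0]).clip(0, h - 1)
    cols = (np.arange(size[1]) * w // size[1]).clip(0, w - 1)
    return crop[rows][:, cols]

def rcnn_detect(image, propose_regions, extract_features,
                class_svms, class_box_regressors, score_threshold=0.0):
    """Sketch of R-CNN inference with caller-supplied components:
    propose_regions(image) -> list of (x1, y1, x2, y2) boxes,
    extract_features(crop) -> fixed-length feature vector,
    class_svms[cls](features) -> classification score,
    class_box_regressors[cls](features, box) -> refined box."""
    detections = []
    for box in propose_regions(image):                       # step 1: ~2000 proposals
        x1, y1, x2, y2 = box
        warped = warp_to_fixed_size(image[y1:y2, x1:x2])     # step 2: warp to fixed size
        features = extract_features(warped)                  # step 3: fixed-length features
        for cls, svm in class_svms.items():                  # step 4: score per class
            score = svm(features)
            if score > score_threshold:
                refined = class_box_regressors[cls](features, box)  # step 5: refine box
                detections.append((cls, refined, score))
    # In the full method, per-class non-maximum suppression would follow here.
    return detections
```

The sketch only mirrors the control flow of the pipeline; in practice the proposal generator, CNN, SVMs and regressors are trained separately, as discussed later in the post.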

The rest of the post dives into the details of how the model is trained and how it predicts the bounding boxes.
