Segmentation and Object Detection

Welcome back to deep learning! So today, we want to discuss the single-shot detectors and how we can actually approach real-time object detection.


The general idea of single-shot detectors. Image under CC BY 4.0 from the Deep Learning Lecture.

Okay, this is the fourth part of segmentation and object detection: the single-shot detectors. So, can’t we just use the region proposal network as a detector in a “you only look once” fashion? This is the idea behind YOLO, a single-shot detector. You only look once: you combine the bounding box prediction and the classification into a single network.


The YOLO Algorithm. Image under CC BY 4.0 from the Deep Learning Lecture.

This is done by essentially subdividing the image into S × S cells. For every cell, you compute the class probability map and, in parallel, you produce bounding boxes and confidences. Each cell thus yields B bounding boxes, each with a confidence score, together with the class probabilities, and all of this is produced by a single CNN. So the CNN predicts S × S × (5B + C) values, where C is the number of classes and the 5 values per box are its x and y position, width, height, and confidence. In the end, to produce the final object detection, you compute the overlap of each bounding box with the class probability map. This allows you to average the class probabilities within the bounding box to determine the final class of the respective object. This way, you are able to handle complex scenes like the one shown above, and this really runs in real time.
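The grid bookkeeping above can be sketched in plain Python. This is a minimal, hypothetical sketch: the function names, the nested-list layout of the class probability map, and the normalized (x_min, y_min, x_max, y_max) box format are assumptions for illustration. It computes the S × S × (5B + C) output shape and implements the averaging step as described in the lecture; note that the original YOLO paper scores boxes slightly differently, by multiplying each box’s confidence with the cell’s class probabilities.

```python
import math

def yolo_output_shape(S, B, C):
    # Each of the B boxes per cell has 5 numbers (x, y, w, h, confidence);
    # each cell also predicts C class probabilities.
    return (S, S, 5 * B + C)

def _cell_range(lo, hi, S):
    # First and last grid-cell indices touched by the normalized interval
    # [lo, hi], clamped to the grid.
    first = min(S - 1, max(0, math.floor(lo * S)))
    last = min(S - 1, max(first, math.ceil(hi * S) - 1))
    return first, last

def box_class_by_averaging(class_map, box):
    """Average per-cell class probabilities over the cells a box overlaps.

    class_map: S x S nested list; class_map[row][col] holds C probabilities.
    box: (x_min, y_min, x_max, y_max) in normalized image coordinates.
    Returns (index of the most likely class, list of averaged probabilities).
    """
    S = len(class_map)
    C = len(class_map[0][0])
    x_min, y_min, x_max, y_max = box
    col_first, col_last = _cell_range(x_min, x_max, S)
    row_first, row_last = _cell_range(y_min, y_max, S)
    sums = [0.0] * C
    count = 0
    for row in range(row_first, row_last + 1):
        for col in range(col_first, col_last + 1):
            for k in range(C):
                sums[k] += class_map[row][col][k]
            count += 1
    averages = [s / count for s in sums]
    best = max(range(C), key=lambda k: averages[k])
    return best, averages

# Toy 2 x 2 grid with C = 2: the left column favors class 0, the right column class 1.
class_map = [[[0.9, 0.1], [0.2, 0.8]],
             [[0.7, 0.3], [0.4, 0.6]]]
print(yolo_output_shape(7, 2, 20))                                 # (7, 7, 30)
print(box_class_by_averaging(class_map, (0.0, 0.0, 0.5, 1.0))[0])  # 0
print(box_class_by_averaging(class_map, (0.5, 0.0, 1.0, 1.0))[0])  # 1
```

With the settings used in the original YOLO paper for PASCAL VOC (S = 7, B = 2, C = 20), the network output is a 7 × 7 × 30 tensor.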


Specs of YOLO9000. Image under CC BY 4.0 from the Deep Learning Lecture.

Then there’s YOLO9000, an improved version of YOLO that is advertised as better, faster, and stronger. It’s better because it uses batch normalization. It also uses high-resolution classification to improve the mean average precision by up to 6%. Anchor boxes found by clustering over the training data improve the recall by 7%. Training over multiple scales allows YOLO9000 to detect objects at different resolutions more easily. It’s faster because it uses a different CNN architecture that speeds up the forward pass. Finally, it’s stronger because it performs hierarchical classification over a tree of classes, which allows combining different datasets, such as a detection dataset with a classification dataset. All in all, this allows YOLO9000 to detect more than 9,000 object classes in real time.
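The anchor-box clustering mentioned above can be sketched as k-means over the widths and heights of the training boxes. The YOLO9000 paper clusters with the distance 1 − IoU instead of Euclidean distance; this is a minimal sketch of that idea, assuming boxes are given as (width, height) pairs. The function names, the mean-based centroid update, and the toy data are hypothetical choices for illustration, not the paper’s code.

```python
import random

def iou_wh(a, b):
    # IoU of two boxes described only by (width, height), assuming both are
    # centered at the same point, so only their shapes are compared.
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor shapes with d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assignment step: each box joins the centroid with the smallest 1 - IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            j = min(range(k), key=lambda i: 1.0 - iou_wh(box, centroids[i]))
            clusters[j].append(box)
        # Update step: move each centroid to the mean width/height of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids no longer move
        centroids = new_centroids
    return sorted(centroids)

# Hypothetical training boxes: three small, roughly square ones and three large ones.
boxes = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (10.0, 10.0), (11.0, 9.0), (9.0, 11.0)]
anchors = kmeans_anchors(boxes, k=2)
print([(round(w, 2), round(h, 2)) for w, h in anchors])  # [(1.0, 1.0), (10.0, 10.0)]
```

Using 1 − IoU makes the distance depend on how well box shapes overlap rather than on their absolute size, so a few large boxes do not dominate the clustering the way they would with Euclidean distance.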

YOLO9000 in action. Image created using gifify. Source: YouTube

There is also the single-shot multibox detector (SSD) [24]. It’s a popular alternative to YOLO and, like YOLO, a single-shot detector that needs only one forward pass through the CNN.


towardsdatascience.com
