Getting a free accuracy boost on almost any architecture

[Figure. Source: FPN paper]

I plan to read the major object detection papers. I have skimmed most of them already, but now I will read them in detail, well enough to write a blog about each one. The papers all relate to deep learning-based object detection. Feel free to make suggestions or ask questions; I will try my best to help everyone. Anyone just starting in the field can skip a lot of these papers. I will also rank the papers by priority/importance once I have read them all.

I have written this blog for readers who, like me, are still learning. I will try my best to capture the crux of the paper by studying it in depth through various sources, including blogs, code, and videos. If you find an error, feel free to highlight it or leave a comment on the blog. The list of papers I will be covering is at the end of the blog.

Let’s get started :)


Yes, the subtitle is correct: FPN is a very simple method that can be used with almost any model to improve results. We will jump into the technicalities of the paper soon, but this blog has some prerequisites. You should have a high-level idea of Fast RCNN, Faster RCNN, and anchor boxes; knowledge of SSD will also come in handy. I have blogs on all of these papers as well (links at the end of this blog). FPN is relatively simple if you understand the prerequisites well.

Image pyramids (copies of the same image at multiple scales) are often used at prediction time to improve results. But running a modern deep learning architecture on every scale is expensive in both compute and time.
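To get a rough feel for that cost, here is a back-of-the-envelope sketch. It assumes the cost of one forward pass grows with the number of pixels (scale squared), which is only an approximation, and the scale values are made up for illustration; they are not taken from the paper.

```python
def pyramid_cost(scales):
    """Relative compute of an image pyramid versus one pass at scale 1.0.

    Assumption: cost is proportional to pixel count, i.e. to scale squared.
    """
    return sum(s * s for s in scales)


# Hypothetical pyramid scales, chosen only for illustration.
scales = [0.5, 1.0, 1.5, 2.0]
relative_cost = pyramid_cost(scales)
print(f"{len(scales)} scales cost about {relative_cost:.2f}x a single pass")
```

Under these assumptions, four scales cost about 7.5 times a single forward pass, which is why avoiding a full image pyramid is attractive.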

FPN exploits the inherent multi-scale, pyramidal hierarchy of deep CNNs. The idea is analogous to the difference between RCNN and Fast RCNN. RCNN is a region-based object detector: we first find ROIs using an algorithm such as selective search, then crop these ROIs (around 2000) from the image and feed each one into a CNN to get results. In Fast RCNN, the initial layers of the CNN are shared across the complete image and the ROI cropping is done on the extracted feature map, which saves a lot of time. Similarly, FPN exploits the network's internal multi-scale nature: the image pyramid is effectively built inside the architecture, with the levels sharing most of the network. We will jump into the technical details now.

A CNN has a hierarchical structure: the resolution of the feature map shrinks as we go deeper, but the semantics captured by each deeper layer are stronger than those of the previous one. The semantically stronger features are spatially coarser because of downsampling. FPN creates an architecture in which the semantically stronger features are merged with features from earlier layers (which are subsampled fewer times and thus carry more accurate localization information).
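A tiny sketch of how the resolution shrinks with depth may help here. The input size (224) and the total strides (4, 8, 16, 32) are assumptions typical of a ResNet-style backbone, not values stated in this post.

```python
def feature_map_sizes(input_size, strides):
    """Side length of the feature map at each stage for a square input."""
    return [input_size // s for s in strides]


# Assumed values: a 224x224 input and total strides of 4, 8, 16 and 32.
strides = [4, 8, 16, 32]
sizes = feature_map_sizes(224, strides)
for stride, size in zip(strides, sizes):
    print(f"stride {stride:>2}: {size}x{size} feature map")
```

The deepest stage ends up with a 7x7 map: rich in semantics but coarse in position, which is exactly the trade-off FPN tries to fix.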

The architecture consists of two pathways:

  1. Bottom-up pathway (Normal feed-forward CNN)
  2. Top-down pathway (New architecture used for merging features)
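The two pathways can be sketched on toy data. In the FPN paper, the merge is a nearest-neighbour 2x upsampling of the coarser map followed by element-wise addition with the finer map, which first goes through a 1x1 lateral convolution. The sketch below is a simplification: it uses single-channel maps stored as nested lists and leaves out all convolutions. The names `c2`...`c4` and `p2`...`p4`, and every function name, are illustrative only.

```python
def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge(coarse, fine):
    """Upsample the coarse map and add it element-wise to the finer map."""
    up = upsample_nearest_2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


def top_down(bottom_up_maps):
    """Run the top-down pathway.

    bottom_up_maps: maps from the bottom-up pathway, finest first and
    coarsest last. Returns the merged maps in the same order.
    """
    merged = [bottom_up_maps[-1]]  # start from the coarsest map
    for fine in reversed(bottom_up_maps[:-1]):
        merged.append(merge(merged[-1], fine))
    return merged[::-1]


# Toy stand-ins for bottom-up outputs (values are arbitrary).
c2 = [[1] * 4 for _ in range(4)]    # 4x4, finest
c3 = [[10] * 2 for _ in range(2)]   # 2x2
c4 = [[100]]                        # 1x1, coarsest
p2, p3, p4 = top_down([c2, c3, c4])
print(p3)     # [[110, 110], [110, 110]]
print(p2[0])  # [111, 111, 111, 111]
```

Every cell of the finest output carries a contribution from each coarser level, which is the point of the top-down pathway: strong semantics reach the high-resolution maps.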
