The computer vision community has converged on mean Average Precision (mAP) as the standard metric for comparing the performance of object detection systems. In this post, we will dive into the intuition behind how mAP is calculated and why it has become the preferred metric for object detection.

A Quick Overview of Object Detection

Before we consider how to calculate mean average precision, we will first define the task it measures.

Object detection models seek to identify the presence of relevant objects in images and classify those objects into relevant classes. For example, in medical images, we might want to count the number of red blood cells (RBC), white blood cells (WBC), and platelets in the bloodstream. To do this automatically, we need to train an object detection model to recognize each of those objects and classify them correctly. (I did this in a Colab notebook to compare EfficientDet and YOLOv3, two state-of-the-art models for object detection.)

Example outputs from EfficientDet (green) versus YOLOv3 (yellow) in my notebook

Both models predict bounding boxes that surround the cells in the picture, and then assign a class to each of those boxes. For each box, the model also reports a confidence score for its prediction. You can see here that we have a total of three classes (RBC, WBC, and Platelets).
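To make that output concrete, here is a minimal sketch of what a single detection record might look like. The `Detection` class and its field names are illustrative assumptions for this post, not the actual output schema of EfficientDet or YOLOv3:

```python
from dataclasses import dataclass

# Illustrative shape of one detection: a bounding box, a predicted
# class, and a confidence score. Field names are assumptions, not the
# real output format of either model.
@dataclass
class Detection:
    x_min: float       # left edge of the bounding box, in pixels
    y_min: float       # top edge of the bounding box, in pixels
    x_max: float       # right edge of the bounding box, in pixels
    y_max: float       # bottom edge of the bounding box, in pixels
    label: str         # predicted class: "RBC", "WBC", or "Platelets"
    confidence: float  # model's confidence in this prediction, 0.0 to 1.0

# Hypothetical predictions a model might emit for one image.
predictions = [
    Detection(34.0, 120.0, 98.0, 180.0, "RBC", 0.91),
    Detection(210.0, 45.0, 300.0, 140.0, "WBC", 0.88),
    Detection(150.0, 200.0, 175.0, 225.0, "Platelets", 0.62),
]

for det in predictions:
    print(f"{det.label}: {det.confidence:.2f}")
```

Every metric we build from here operates on lists like this one, compared against the ground-truth boxes for the same image.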

How should we decide which model is better? Looking at the image, EfficientDet (green) appears to have drawn a few too many RBC boxes and missed some cells at the edge of the picture. That is certainly how it feels from a glance, but can we trust a single image and our intuition? If so, by how much is it better? (Hint: it's not; skip to the bottom if you don't believe it.)
