An example of a false positive caused by missing ground truth on the Open Images dataset
As the performance of deep learning models trained on massive datasets continues to advance, large-scale dataset competitions have become the proving ground for the latest and greatest computer vision models. We’ve come a long way as a community from the days when MNIST, a dataset of only 70,000 28x28-pixel images, was the de facto standard. New, larger datasets have arisen out of a desire to train more complex models to solve more challenging tasks: ImageNet, COCO, and Google’s Open Images are among the most popular.
But even on these huge datasets, the performance gap between top models is narrowing. In the 2019 Open Images Detection Challenge, the top five teams were separated by less than 0.06 in mean average precision (mAP); on COCO, the margin is even smaller.
There’s no doubt that our research community is delivering when it comes to developing innovative new techniques to improve model performance, but the model is only half of the picture. Recent findings have made it increasingly clear that the other half, the data, plays at least as critical a role, perhaps an even greater one.
Just this year…
And here’s what two leaders in the field have to say about it:
How many times have you found yourself spending hours, days, or weeks poring over samples in your data? Have you been surprised by how much manual inspection was necessary? Or can you think of a time when you trusted macro statistics more than you should have?
The computer vision community is starting to wake up to the idea that we need to be close to the data. If we want accurate models that behave as expected, it’s not enough to have a large dataset; it needs to contain the right data, and that data needs to be accurately labeled.
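Getting close to the data can be as simple as scrolling through samples and their labels by eye. Here’s a minimal sketch of doing that for Open Images with the open-source FiftyOne library; the zoo dataset name and parameters follow FiftyOne’s dataset zoo conventions, so adjust them for your own environment:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Download a small slice of the Open Images validation split
# with detection labels only
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections"],
    max_samples=100,
)

# Launch the FiftyOne App to scroll through the samples and
# inspect the bounding boxes on each image directly
session = fo.launch_app(dataset)
```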
Every year, researchers battle it out to climb to the top of a leaderboard, with razor-thin margins determining fates. But do we really know what’s going on inside these datasets? Is a 0.01 margin in mAP even meaningful?
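To make that concrete, here’s a minimal sketch, in plain Python with hypothetical detections, of how a single missing ground-truth box can move AP far more than the margins separating leaderboard entries. The `average_precision` helper below uses a simplified, uninterpolated form of AP rather than any challenge’s exact protocol:

```python
def average_precision(tp_flags, num_gt):
    """Simplified, uninterpolated AP from detections sorted by confidence.

    tp_flags: list of bools, True if the detection matched a ground-truth box
    num_gt:   number of ground-truth boxes for the class
    """
    tps, fps, points = 0, 0, []
    for is_tp in tp_flags:
        tps += is_tp
        fps += not is_tp
        points.append((tps / num_gt, tps / (tps + fps)))  # (recall, precision)

    # Accumulate area under the precision-recall points
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Five detections, sorted by confidence, against five ground-truth
# boxes: all five detections are correct
print(average_precision([True] * 5, num_gt=5))  # 1.0

# Now suppose the annotators missed one of those five objects. The same
# correct detection is now scored as a false positive, and AP for this
# class drops from 1.0 to about 0.89
print(average_precision([True, True, False, True, True], num_gt=4))
```

In this toy example, one missing annotation costs roughly 0.11 AP on a single class, nearly twice the 0.06 that separated the top five teams in the 2019 challenge.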
#open-images #fiftyone #visualization #evaluation #machine-learning