State-of-the-Art Image Classification Algorithm: FixEfficientNet-L2. Combining FixRes from the Facebook AI Team and EfficientNet from the Google AI Team.

FixEfficientNet combines two existing techniques: FixRes from the Facebook AI Team [2] and EfficientNet [3], first presented by the Google AI Research Team. FixRes is short for Fix Resolution and tries to keep a fixed size for either the RoC (Region of Classification) used at training time or the crop used at test time. EfficientNet is a compound scaling method that jointly scales the dimensions of a CNN (depth, width, and input resolution), which improves both accuracy and efficiency. This article explains both techniques and why they are state-of-the-art.

FixEfficientNet was first presented in the corresponding paper on 20 April 2020 by the Facebook AI Research Team [1]. The technique is used for image classification and therefore belongs to the field of computer vision. At the time of writing it is the state of the art, with the best results on the ImageNet dataset: 480M parameters, a top-1 accuracy of 88.5%, and a top-5 accuracy of 98.7%.

But let’s dive in a bit deeper to get a better understanding of the combined techniques:

Understanding FixRes

Training Time

Until the Facebook AI Research Team proposed the FixRes technique, the state of the art was to extract a random rectangle of pixels from an image. This region was used as the RoC at training time. (Be aware that this technique artificially increases the amount of training data.) The RoC was then resized to obtain an image of a fixed size (the crop), which was fed to the convolutional neural network [2].

RoC = rectangle/square in input image

crop = pixels of RoC rescaled with a bilinear interpolation to a certain resolution
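To make the two definitions concrete, here is a minimal pure-Python sketch of the training-time pipeline: sample a random RoC, then rescale it to a fixed-size crop with bilinear interpolation. The function names (`sample_roc`, `bilinear_resize`, `random_resized_crop`) are hypothetical, and the `scale` and `ratio` ranges are illustrative assumptions rather than values stated in this article. It works on a grayscale image stored as a list of rows; a real pipeline would rely on an image library instead.

```python
import random


def sample_roc(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Pick a random RoC (top, left, h, w) inside a height x width image.

    The scale (fraction of image area) and ratio (aspect ratio) ranges are
    assumed defaults chosen for illustration.
    """
    area = height * width
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        aspect = rng.uniform(*ratio)
        w = int(round((target_area * aspect) ** 0.5))
        h = int(round((target_area / aspect) ** 0.5))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback after repeated misses: use the whole image as the RoC.
    return 0, 0, height, width


def bilinear_resize(image, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Map the center of each output pixel back to input coordinates.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbouring pixels, first along x, then along y.
            upper = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            lower = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(upper * (1 - wy) + lower * wy)
        out.append(row)
    return out


def random_resized_crop(image, size, rng=random):
    """RoC = random rectangle in the image; crop = RoC rescaled to size x size."""
    top, left, h, w = sample_roc(len(image), len(image[0]), rng=rng)
    roc = [row[left:left + w] for row in image[top:top + h]]
    return bilinear_resize(roc, size, size)


# Usage: a synthetic 32x32 gradient image, cropped to a fixed 8x8 resolution.
image = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
crop = random_resized_crop(image, 8, rng=random.Random(0))
```

Whatever RoC is sampled, the crop always has the same resolution, so the same object can appear at different apparent sizes from one training sample to the next.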

