In recent years we have seen considerable research on image classification using very large datasets such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset [11].

The application of deep CNNs gave a considerable improvement over older models. It started with the deep CNN architecture proposed by Alex Krizhevsky et al., named AlexNet [8]; another architecture of similar depth was then proposed by Zeiler and Fergus [15]. The successful results of these models on the ImageNet dataset paved the way for deeper convolutional neural networks. The Visual Geometry Group at Oxford University proposed deeper architectures, of which VGG16 and VGG19 [12] are publicly available. This increase in image classification accuracy caught the eye of big IT companies like Google and Microsoft. Google invested in research on deeper convolutional nets, and soon the Google team of Szegedy et al. came up with the Inception network [14]; another deep CNN architecture from Google, named Xception [5], was created by François Chollet in 2017. Microsoft came up with its own series of deep networks, the ResNets proposed by He et al. [7]. Since their first proposals, all of these researchers have come up with further enhancements and better architectures to overcome the shortcomings of previous ones and to further improve the accuracy and generalization capabilities of their models.

All these models follow supervised learning methods. In supervised learning we feed the model sample data that contains input features and the corresponding outputs, and the model is supposed to learn a generalized solution to the given data. (The other types of learning are unsupervised learning and reinforcement learning.) These models learn useful representations of the input data, which enables them to solve complex problems like image recognition and classification efficiently [3].

Transfer learning has increased the efficiency of deep learning models manifold thanks to this reuse capability: representations learned on one task can be reused on another.

In a generic setup, a user takes a state-of-the-art pre-trained model and its weights to obtain a useful representation of the input, which is then fed to a user-designed top model. This top model is usually a much smaller model containing one or two hidden layers. Training time is thus reduced dramatically and the problem can be solved much more efficiently. If we pick the right model to obtain the representation, the accuracy of the overall setup is also good. We will see an application of transfer learning to image classification in this study; a minimal sketch of the generic setup follows below.
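Here is a minimal sketch of that setup in Keras. VGG16 stands in for the pre-trained model, and the input shape, class count and training data are illustrative placeholders, not values from this study.

```python
# Minimal transfer-learning sketch: frozen pre-trained base + small top model.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pre-trained base: VGG16 trained on ImageNet, without its classification top.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained weights; only the top model trains

# User-designed top model: one small hidden layer plus a classifier.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # assumed 10 target classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # supply your own data
```

Because only the small top model is trained, each epoch is far cheaper than training the whole network from scratch.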

Representation Learning

Useful representations have long been in use in our daily life and in computer systems alike. Representations determine the performance of many information processing tasks: the right representation makes a problem easy to understand, visualize and solve in less time, with less effort and less space. For example, it takes O(1) time to access a number stored at a known position in an array, but O(n) time to access the same number stored in a stack.
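To make the array-versus-stack example concrete, here is a small illustration (the values are arbitrary):

```python
# The same numbers in two representations, with different access costs.
numbers = [3, 1, 4, 1, 5, 9]

# Array (Python list): any position is reachable in O(1) by index.
x = numbers[1]  # -> 1, one direct lookup

# Stack: only the top is accessible, so reaching an element buried
# k items deep requires popping k items first -> O(n) in the worst case.
stack = list(numbers)   # top of the stack is the last element, 9
for _ in range(4):      # pop 9, 5, 1, 4 to expose the target
    stack.pop()
y = stack.pop()         # -> 1, reached only after O(n) pops
assert x == y
```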

In machine learning and deep learning as well, useful representations make the learning task easy. The choice of a useful representation depends mainly on the problem at hand, i.e. the learning task.

In deep learning, a feed-forward neural network can be viewed as performing representation learning when trained in a supervised manner. The input layer, in combination with all the hidden layers, converts the input into a useful representation. This representation acts as input to the last layer, which is usually a classification layer. This classifier, typically a softmax or logistic regression classifier (or a linear regression layer for regression tasks), uses the representation received from the earlier layers to solve the target problem. For example, in a model trained to detect cats in an image, the representation passed to the last layer is such that it can separate images containing cats from non-cat images. A sketch of this view follows below.
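The sketch below shows this split in Keras: the hidden layers produce the representation, and the final softmax layer classifies it. The layer sizes and the two-class cat/non-cat setup are illustrative assumptions.

```python
# A feed-forward network viewed as representation learning + classifier.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(256, activation="relu", input_shape=(784,)),  # hidden layers
    layers.Dense(64, activation="relu"),                       # learn the representation
    layers.Dense(2, activation="softmax"),                     # last layer classifies cat / non-cat
])

# After training, the output of the penultimate layer is the learned
# representation that the softmax classifier consumes.
representation_model = models.Model(
    inputs=model.input,
    outputs=model.layers[-2].output,
)
```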

You can read more about the concept of representation learning in another article I wrote earlier.

#face-recognition #image-classification #deep-learning
