Recently, machine learning algorithms have contributed greatly to developing very efficient visual search workflows using Convolutional Neural Networks (CNNs). Since data volumes have been increased in recent times, object recognition models are able to recognize the objects and generalize the image features at scale. However, in a challenging classification setting where the number of classes is huge, there are several constraints one needs to address to design effective visual search workflows.

  • An increasing number of object categories also increases the number of weights in the penultimate layer of CNN. It makes it hard to deploy them on the device as they also end up increasing the model size.
  • When there exist only a_ few images per class_ it is difficult to achieve better convergence, making it hard to achieve good performance under various illumination differences, object scale, background, occlusion, etc.
  • In an engineering space, it is often required to design visual search workflows that are adaptive to the product ecosystem that contains non-stationary products or it changes as per seasonal trends or geographic locations. Such circumstances make it tricky to train/finetune model in _recurring _(training after certain time intervals) or online (training for realtime data) fashion.

Deep Metric Learning

Image for post

Fig.1 Given two images of a chair and table, the idea of metric learning is to quantify the image similarity using an appropriate distance metric. When we aim to differentiate the objects and not to recognize them, it improves the model scalability to a great extent as we are no longer dependent on a class given image belongs to.

To alleviate these issues, deep learning and metric learning collectively form the concept of Deep Metric Learning(DML), also known as Distance Metric Learning. It proposes to train a CNN based nonlinear feature extraction module (or an encoder), that embeds the extracted image features (also called embeddings) that are semantically similar, onto nearby locations while pushing dissimilar image features apart using an appropriate distance metric e.g. Euclidean or Cosine distance. Together with the discriminative classification algorithms, such as K-nearest neighbor, support vector machines, and Naïve Bayes, we can perform object recognition tasks using extracted image features without being conditioned about the number of classes. Note that, the discriminative power of such a trained CNN module describes features with both the compact intra-class variations and separable inter-class differences. These features are also generalized enough, even for distinguishing new unseen classes. In the following section, we will formalize the procedure to train and evaluate a CNN for DML using a pair based training paradigm.

#artificial-intelligence #machine-learning #metric-learning #convolutional-network #object-recognition

Learning To Differentiate using Deep Metric Learning
1.80 GEEK