Background and challenges πŸ“‹

One of the major limitations of modern deep learning is its dependence on manual annotation of data. To train a good model, we usually have to prepare a vast amount of labeled data. When the number of classes and samples is small, we can take a model pre-trained on a labeled public dataset and fine-tune only the last few layers on our own data. However, in real life we often face problems where the data is considerably larger and more varied (the products in a store, human faces, …), and a model with just a few trainable layers struggles to learn it. Furthermore, the amount of unlabeled data available (e.g. document text, images on the Internet) is practically unlimited. Labeling all of it for a specific task is almost impossible, but not utilizing it is definitely a waste.

In this situation, training a deep model from scratch on the new dataset is an option, but it takes a lot of time and effort to label the data, while a pre-trained deep model no longer seems helpful. That is why self-supervised learning was born. The idea behind it is simple and involves two main tasks:

  • **Surrogate task:** the deep model learns generalizable representations from unlabeled data without annotations, self-generating a supervisory signal by exploiting implicit information in the data itself.
  • **Downstream task:** the learned representations are fine-tuned for supervised-learning tasks, e.g. classification and image retrieval, using far fewer labeled examples (how many labels you need depends on the performance your application requires). A minimal sketch of both stages follows below.
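
To make the two stages concrete, here is a minimal, self-contained sketch of this recipe in PyTorch, using rotation prediction as an example surrogate task. The tiny networks, the random tensors standing in for data, and the hyperparameters are illustrative assumptions, not code from the papers discussed in this article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(                      # small feature extractor
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# --- Surrogate task: the supervisory signal is generated from the data itself ---
pretext_head = nn.Linear(32, 4)                # predict one of 4 rotations
opt = torch.optim.Adam(list(backbone.parameters()) + list(pretext_head.parameters()))
unlabeled = torch.rand(64, 3, 32, 32)          # stand-in for unlabeled images
for _ in range(5):
    k = torch.randint(0, 4, (unlabeled.size(0),))           # self-generated labels
    rotated = torch.stack([torch.rot90(x, int(r), dims=(1, 2))
                           for x, r in zip(unlabeled, k)])
    loss = F.cross_entropy(pretext_head(backbone(rotated)), k)
    opt.zero_grad(); loss.backward(); opt.step()

# --- Downstream task: reuse the backbone, train a classifier on few labels ---
labeled, labels = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
classifier = nn.Linear(32, 10)
opt = torch.optim.Adam(classifier.parameters())              # backbone kept frozen here
for _ in range(5):
    with torch.no_grad():
        feats = backbone(labeled)
    loss = F.cross_entropy(classifier(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```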

Many different training approaches have been proposed to learn such representations:

  • **Relative position [1]:** the model has to understand the spatial context of objects in order to tell the relative position between image parts.
  • **Jigsaw puzzle [2]:** the model has to place 9 shuffled patches back in their original locations.
  • **Colorization [3]:** the model is trained to color a grayscale input image; more precisely, the task is to map the image to a distribution over quantized color values.
  • **Counting features [4]:** the model learns a feature encoder from the counting relationship between features of input images transformed by _Scaling_ and _Tiling_.
  • **SimCLR [5]:** the model learns representations of visual inputs by maximizing agreement between differently augmented views of the same sample via a contrastive loss in the latent space (sketched after this list).
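
As an illustration of the contrastive idea behind SimCLR [5], the sketch below implements a simplified NT-Xent-style loss: two augmented views of the same image are pulled together in the latent space, while the other images in the batch act as negatives. The temperature value, the embedding size, and the random inputs are illustrative assumptions rather than the authors' settings.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: [N, D] projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # [2N, D], unit norm
    sim = z @ z.t() / temperature                             # pairwise cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))  # drop self-pairs
    # the positive for sample i is its other augmented view (index i+n or i-n)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# usage: z1, z2 would come from an encoder + projection head applied to two augmentations
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```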

However, I would like to introduce one interesting approach that is able to recognize things the way a human does. A key factor in human learning is the acquisition of new knowledge by comparing related and different entities. So, applying a similar mechanism to self-supervised machine learning, via the relational reasoning approach [6], is a non-trivial but appealing idea.

The relational reasoning paradigm is based on a key design principle: a relation network is used as a learnable function on the unlabeled dataset to quantify the relationships between views of the same object (intra-reasoning) and between different objects in different scenes (inter-reasoning). In [6], this mechanism was evaluated on standard datasets (CIFAR-10, CIFAR-100, CIFAR-100-20, STL-10, tiny-ImageNet, SlimageNet), under different learning schedules, and with both shallow and deep backbones. The results show that the relational reasoning approach largely outperforms the best competitor in all conditions, by 14% accuracy on average, and the most recent state-of-the-art method by 3%.
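
The following is a simplified sketch of that training signal in PyTorch, written in the spirit of [6] rather than as the authors' exact implementation: a small relation head scores aggregated pairs of representations, where pairs built from two augmented views of the same image are positives (intra-reasoning) and pairs built from different images are negatives (inter-reasoning). The encoder, the concatenation-based aggregation, and all dimensions are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # f(x) -> [N, 32]
relation_head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                              nn.Linear(64, 1))                   # r(z_i, z_j) -> score

opt = torch.optim.Adam(list(encoder.parameters()) + list(relation_head.parameters()))

view1 = torch.rand(16, 3, 32, 32)            # stand-ins for two augmentations
view2 = torch.rand(16, 3, 32, 32)            # of the same 16 unlabeled images
z1, z2 = encoder(view1), encoder(view2)

# intra-reasoning: same image, different views -> target 1
pos = torch.cat([z1, z2], dim=1)
# inter-reasoning: different images -> target 0 (pairs shifted by one position)
neg = torch.cat([z1, torch.roll(z2, shifts=1, dims=0)], dim=1)

pairs = torch.cat([pos, neg], dim=0)
targets = torch.cat([torch.ones(16, 1), torch.zeros(16, 1)], dim=0)
loss = F.binary_cross_entropy_with_logits(relation_head(pairs), targets)
opt.zero_grad(); loss.backward(); opt.step()
```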

