Transfer learning makes use of the knowledge gained while solving one problem and applies it to a different but related problem.
The project is divided into seven steps, and I am going to walk you through each of them.
Dog Data Set
The dog data set is loaded and divided into train, validation and test groups. There are a total of 8,351 images, each belonging to one of 133 dog breeds. The train, validation and test sets are split in a ratio of 80:10:10.
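As a sketch of how such a split can be produced (the variable names and the use of scikit-learn here are my assumptions, not the project's actual loading code):

```python
from sklearn.model_selection import train_test_split

# hypothetical parallel lists of image files and breed labels
files = [f"img_{i}.jpg" for i in range(8351)]
labels = [i % 133 for i in range(8351)]

# first carve off 80% for training, then split the remainder 50/50
train_f, rest_f, train_y, rest_y = train_test_split(files, labels, test_size=0.2, random_state=42)
valid_f, test_f, valid_y, test_y = train_test_split(rest_f, rest_y, test_size=0.5, random_state=42)

print(len(train_f), len(valid_f), len(test_f))  # roughly 80:10:10 of 8351
```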
Human Data Set
There are a total of 13233 human images loaded into an array.
In this step we make use of Haar feature-based cascade classifiers to detect human faces in the images, using a pre-trained model from OpenCV's haarcascades.
A sample output of the face detection model used in this project
We then assess this face detector by passing in images of humans and dogs. The model detects human faces in 100% of the human images; however, when we pass in dog images, it also (incorrectly) detects a face in 11% of them.
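The assessment itself boils down to counting detections over each sample set; a minimal sketch (with a stubbed-in detector for illustration, since in the actual project `face_detector` wraps the Haar cascade classifier):

```python
# fraction of images (as a percentage) in which a face is detected
def detection_rate(detector, image_paths):
    hits = sum(1 for p in image_paths if detector(p))
    return 100.0 * hits / len(image_paths)

# stand-in detector for illustration: "detects" a face only in human files
fake_detector = lambda path: path.startswith("human")
humans = [f"human_{i}.jpg" for i in range(100)]
dogs = [f"dog_{i}.jpg" for i in range(100)]

print(detection_rate(fake_detector, humans))  # 100.0
print(detection_rate(fake_detector, dogs))    # 0.0
```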
#data-science #data analysis
Recently, researchers from Google addressed a fundamental question in the machine learning community — what is actually being transferred in Transfer Learning? They presented various tools and analyses to answer it.
The ability to transfer the domain knowledge a machine has acquired on one task to another task where data is usually scarce is one of the most desired capabilities for machines. Researchers around the globe have been using transfer learning in various deep learning applications, including object detection, image classification and medical imaging tasks, among others.
#developers corner #learn transfer learning #machine learning #transfer learning #transfer learning methods #transfer learning resources
Project walkthrough on convolutional neural networks using transfer learning
Over the two years of my master’s degree, I found that the best way to learn concepts is by doing projects. Let’s start implementing, or in other words, learning.
Take an image as input and return a corresponding dog breed from 133 dog breed categories. If a dog is detected in the image, it will provide an estimate of the dog’s breed. If a human is detected, it will give an estimate of the dog breed that most resembles the human face. If neither a human nor a dog is present in the image, we simply print an error.
Let’s break this problem into steps
For all these steps, we use pre-trained models.
Pre-trained models are saved models that were previously trained on a huge image-classification task such as ImageNet. If the original dataset is large and general enough, the saved weights can be reused for multiple image detection tasks to reach high accuracy quickly.
For detecting humans, OpenCV provides many pre-trained face detectors. We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images.
```python
import cv2

# extract the pre-trained face detector (the file path here assumes the
# downloaded OpenCV haarcascades model)
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# returns "True" if a face is detected in the image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0
```
For detecting dogs, we use a pre-trained ResNet-50 model to detect dogs in images, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks.
```python
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing import image

# define the ResNet-50 model with weights pre-trained on ImageNet
ResNet50_model_detector = ResNet50(weights='imagenet')

# returns the ImageNet class index predicted for the image at img_path
def ResNet50_predict_labels(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return np.argmax(ResNet50_model_detector.predict(x))

# returns "True" if a dog is detected
# (in ImageNet, the dog classes occupy indices 151 to 268)
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return (prediction <= 268) & (prediction >= 151)
```
For classifying dog breeds, we use transfer learning.
Transfer learning involves taking a pre-trained neural network and adapting the neural network to a new, different data set.
To illustrate the power of transfer learning, we will initially train a simple CNN with the following architecture:
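The original architecture figure is not reproduced here; a hypothetical small CNN along these lines (the exact layer sizes are my assumption) could look like:

```python
from keras.models import Sequential
from keras.layers import Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

# a minimal from-scratch CNN: a few conv/pool stages, then a 133-way softmax
model = Sequential([
    Input(shape=(224, 224, 3)),
    Conv2D(16, 2, activation='relu'),
    MaxPooling2D(),
    Conv2D(32, 2, activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 2, activation='relu'),
    MaxPooling2D(),
    GlobalAveragePooling2D(),
    Dense(133, activation='softmax'),  # one output per dog breed
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```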
Trained for 20 epochs, it achieves a test accuracy of just 3%, which is still better than a random guess across 133 categories (about 0.75%). Training for more epochs would increase the accuracy, but it takes up a lot of training time.
To reduce training time without sacrificing accuracy, we will train the CNN model using transfer learning.
#data-science #transfer-learning #project-based-learning #cnn #deep-learning #deep learning
According to dogtime.com, there are 266 different breeds of dogs, and just thinking about that number makes distinguishing them a frightening prospect. Most people only know about 5–10 breeds, because you don’t see a chapter called “266 Different Dog Breeds” in a Bachelor’s curriculum.
The main aim of this project is to build an algorithm to classify the different Dog Breeds from the dataset.
This seems like a simple task, but from a machine learning point of view it is not! The images are in random order, the dogs appear at random positions within them, the shots were taken under different lighting conditions, and no preprocessing has been done on the data; it is just a dataset of plain dog pictures.
So, the first step is to give the dataset a look.
The Dataset used for this project is Stanford Dogs Dataset. The Dataset contains a total of 20,580 images of 120 different dog breeds.
The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization.
```python
import os
import sys
import keras
import tarfile
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.engine.training import Model
from sklearn.preprocessing import LabelBinarizer
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Add, Dropout, Flatten, Dense, Activation
```
I found 5 directories to be unusable and hence didn’t use them. So, I imported a total of 115 breeds.
```python
import cv2

BASEPATH = './Images'
LABELS = set()
paths = []

for d in os.listdir(BASEPATH):
    LABELS.add(d)
    paths.append((BASEPATH + '/' + d, d))

## resizing and converting to RGB
def load_and_preprocess_image(path):
    image = cv2.imread(path)
    image = cv2.resize(image, (224, 224))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image

X, y = [], []
i = 0
for path, label in paths:
    i += 1
    ## skip the faulty directories
    if i in (18, 23, 41, 49, 90):
        continue
    if path == "./Images/.DS_Store":
        continue
    for image_path in os.listdir(path):
        image = load_and_preprocess_image(path + "/" + image_path)
        X.append(image)
        y.append(label)
```
Now, the names of the folders follow the pattern ‘n8725563753-Husky’, so we need to clean them up to be left with just the ‘Husky’ part of the name.
```python
Y = []
## cleaning the names of the directories/targets:
## keep only the breed name after the dash
for i in y:
    Y.append(i.split('-')[1])
```
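Since `LabelBinarizer` is already imported above, a natural next step (my sketch, not necessarily the author's exact code) is to one-hot encode the cleaned breed names before training:

```python
from sklearn.preprocessing import LabelBinarizer

# one-hot encode string labels: one column per distinct breed
encoder = LabelBinarizer()
labels = ['Husky', 'Beagle', 'Husky', 'Poodle']
one_hot = encoder.fit_transform(labels)

print(one_hot.shape)  # (4, 3): 4 samples, 3 distinct breeds
```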
#transfer-learning #machine-learning #classification #deep-learning #convolutional-network
The term Transfer Learning refers to the leverage of knowledge gained by a Neural Network trained on a certain (usually large) available dataset for solving new tasks for which few training examples are available, integrating the existing knowledge with the new one learned from the few examples of the task-specific dataset. Transfer Learning is thus commonly used, often together with other techniques such as Data Augmentation, in order to address the problem of lack of training data.
But, in practice, how much can Transfer Learning actually help, and how many training examples do we really need in order for it to be effective?
In this story, I try to answer these questions by applying two Transfer Learning techniques (namely Feature Extraction and Fine-Tuning) to an Image Classification task, varying the number of examples on which the models are trained in order to see how the lack of data affects the effectiveness of the adopted approaches.
The task chosen for experimenting with Transfer Learning consists of the classification of flower images into 102 different categories. The choice of this task is mainly due to the easy availability of a flowers dataset, as well as to the domain of the problem, which is generic enough to be suitable for effectively applying Transfer Learning with neural networks pre-trained on the well-known ImageNet dataset.
The adopted dataset is the 102 Category Flower Dataset created by M. Nilsback and A. Zisserman, which is a collection of 8,189 labelled flower images belonging to 102 different classes. Each class contains between 40 and 258 instances, and all the dataset images show significant scale, pose and light variations. The detailed list of the 102 categories together with the respective number of instances is available here.
Figure 1: Examples of images extracted from the 102 Category Dataset.
In order to create training datasets of different sizes and evaluate how they affect the performance of the trained networks, the original set of flowers images is split into training, validation and test sets several times, each time adopting different split percentages. Specifically, three different training sets are created (that from now on will be referred to as the Large, Medium and Small training sets) using the percentages shown in the table below.
Table 1: number of examples and split percentages (referred to the complete unpartitioned flowers dataset) of the datasets used to perform the experiments.
All the splits are performed adopting stratified sampling, in order to avoid introducing sampling biases and ensuring in this way that all the obtained training, validation and test subsets are representative of the whole initial set of images.
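As a sketch, one such stratified split can be obtained with scikit-learn's `train_test_split` (the file names and percentages here are illustrative assumptions):

```python
from sklearn.model_selection import train_test_split

# hypothetical parallel lists of flower image paths and class labels
paths = [f"flower_{i}.jpg" for i in range(8189)]
labels = [i % 102 for i in range(8189)]

# stratify=labels keeps the class proportions identical in both subsets,
# so even a small training set remains representative of all 102 classes
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, train_size=0.1, stratify=labels, random_state=0)
```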
The image classification task described above is addressed by adopting the two popular techniques that are commonly used when applying Transfer Learning with pre-trained CNNs, namely Feature Extraction and Fine-Tuning.
Feature Extraction basically consists of taking the convolutional base of a previously trained network, running the target data through it and training a new classifier on top of the output, as summarized in the figure below.
Figure 2: Feature Extraction applied to a convolutional neural network: the classifiers are swapped while the same convolutional base is kept. “Frozen” means that the weights are not updated during training.
The classifier stacked on top of the convolutional base can either be a stack of fully-connected layers or just a single Global Pooling layer, in both cases followed by a Dense layer with a softmax activation function. There is no specific rule regarding which kind of classifier should be adopted but, as described by Lin et al., using just a single Global Pooling layer generally leads to less overfitting, since this layer has no parameters to optimize.
Consequently, since the training sets used in the experiments are relatively small, the chosen classifier consists only of a single Global Average Pooling layer, whose output is fed directly into a softmax-activated layer that outputs the probabilities for each of the 102 flower categories.
During the training, only the weights of the top classifiers are updated, while the weights of the convolutional base are “frozen” and thus kept unchanged.
In this way, the shallow classifier learns how to classify the flower images into the possible 102 categories from the off-the-shelf representations previously learned by the source model for its domain. If the source and the target domains are similar, then these representations are likely to be useful to the classifier and the transferred knowledge can thus bring an improvement to its performance once it is trained.
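A sketch of this Feature Extraction setup in Keras (VGG16 is used here as an example base; the article's specific choice of pre-trained network may differ):

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense

# ImageNet pre-trained convolutional base, without its original classifier
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # "frozen": weights are not updated during training

# the shallow classifier: a single Global Average Pooling layer feeding
# directly into a 102-way softmax
x = GlobalAveragePooling2D()(base.output)
out = Dense(102, activation='softmax')(x)
model = Model(base.input, out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```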
Fine-Tuning can be seen as a step beyond Feature Extraction that consists of selectively retraining some of the top layers of the convolutional base previously used for extracting features. In this way, the more abstract representations of the source model learned by its last layers are slightly adjusted to make them more relevant for the target problem.
This can be achieved by unfreezing some of the top layers of the convolutional base, keeping frozen all its other layers and jointly training the convolutional base with the same classifier previously used for Feature Extraction, as represented in the figure below.
Figure 3: Feature Extraction compared to Fine-Tuning.
It is important to point out that, according to F. Chollet, the top layers of a pre-trained convolutional base can be fine-tuned only if the classifier on top of it has already been previously trained. The reason is that if the classifier was not already trained, then its weights would be randomly initialized. As a consequence, the error signal propagating through the network during training would be too large and the unfrozen weights would be updated disrupting the abstract representations previously learned by the convolutional base.
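In Keras, this selective unfreezing can be sketched as follows (again assuming a VGG16 base; in practice the classifier weights would already have been trained in the Feature Extraction phase before this step, per F. Chollet's caveat above):

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense
from keras.optimizers import Adam

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# unfreeze only the last convolutional block ('block5'); keep the rest frozen
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')

x = GlobalAveragePooling2D()(base.output)
out = Dense(102, activation='softmax')(x)
model = Model(base.input, out)

# a low learning rate keeps the updates small, so the pre-trained
# representations are adjusted rather than disrupted
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
```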
#deep-learning #machine-learning #artificial-intelligence #image-classification #transfer-learning #deep learning
This is the third part of a series of posts showing the improvements in NLP modeling approaches. We have seen the use of traditional techniques like Bag of Words, TF-IDF, then moved on to RNNs and LSTMs. This time we’ll look into one of the pivotal shifts in approaching NLP Tasks — Transfer Learning!
The complete code for this tutorial is available at this Kaggle Kernel
The idea of using Transfer Learning is quite new in NLP tasks, while it has been used prominently in Computer Vision for some time! This new way of looking at NLP was first proposed by Jeremy Howard, and it has transformed the way we approach text data!
The core idea is two-fold: a generatively pre-trained Language Model plus task-specific fine-tuning. It was first explored in ULMFiT (Howard & Ruder, 2018), directly motivated by the success of using ImageNet pre-training for computer vision tasks. The base model is AWD-LSTM.
A Language Model is exactly what it sounds like: a model trained to predict the next word of a sentence. The goal is to have a model that can understand the semantics, grammar, and unique structure of a language.
ULMFiT follows three steps to achieve good transfer learning results on downstream language classification tasks:
1. General-domain language model pre-training on a large corpus.
2. Fine-tuning the language model on the target task’s data.
3. Fine-tuning a classifier on the target task.
fast.ai’s motto — Making Neural Networks Uncool Again — tells you a lot about their approach ;) Implementation of these models is remarkably simple and intuitive, and with good documentation, you can easily find a solution if you get stuck anywhere. For these reasons, along with a few others I elaborate on below, I decided to try out the fast.ai library, which is built on top of PyTorch rather than Keras. Despite being used to working in Keras, I didn’t find it difficult to navigate fast.ai, and the learning curve to implement even advanced things is quite gentle!
In addition to its simplicity, there are some advantages of using fast.ai’s implementation -
Weight update for Stochastic Gradient Descent (SGD): ∇θ(ℓ)J(θ) is the gradient of the loss function J with respect to the weights θ(ℓ) of the ℓ-th layer, and η(ℓ) is the learning rate of the ℓ-th layer.
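Written out, the update rule this caption describes is the standard SGD step, applied per layer with its own learning rate (this is the basis of ULMFiT's discriminative fine-tuning):

```latex
\theta_t^{(\ell)} = \theta_{t-1}^{(\ell)} - \eta^{(\ell)} \cdot \nabla_{\theta^{(\ell)}} J(\theta)
```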
#nlp #machine-learning #transfer-learning #deep-learning #sentiment-classification #deep learning