Data Augmentation in Deep Learning

Whenever you build and train a model for a machine learning task, whether classification or regression, your final goal is to make reliable predictions on new, never-before-seen input data. In other words, you want your model to generalize well to new data.

To achieve this goal, you have to prevent your model from either fitting the training data too closely (overfitting) or failing to capture the patterns in the data at all (underfitting).

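Note that the concepts of overfitting and underfitting are closely related to the notion of the bias-variance trade-off: an overfitted model has low bias but high variance, while an underfitted model has high bias but low variance.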
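As a reminder of what that trade-off refers to, here is the standard decomposition of the expected squared error. The notation is my own assumption and does not come from a specific source: the data follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Overfitting corresponds to the variance term dominating, underfitting to the bias term dominating.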

In this article, I’m going to dwell on the problem of overfitting and how to deal with it.

Understanding the cause and a possible remedy of overfitting

One of the reasons why overfitting might occur is a lack of data. Indeed, if you train your model on too little data, it will push its feature extraction too far, with the risk of identifying patterns that do not exist.
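To make this concrete, here is a minimal toy sketch of a model that memorizes a tiny training set. Everything in it is an assumption chosen for illustration: the true relation y = 2x, the four noisy samples, and the use of a 1-nearest-neighbour predictor as a stand-in for a very flexible model. It is not the CNN discussed later.

```python
# Four noisy training samples of the hypothetical true relation y = 2 * x.
# The true values would be 0, 2, 4, 6; each label is off by +/- 0.5.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.5, 1.5, 4.5, 5.5]


def predict(x):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(xs, ys):
    """Mean squared error of the predictor on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Unseen inputs, labelled with the noise-free true relation.
test_x = [0.4, 1.4, 2.6]
test_y = [2 * x for x in test_x]

train_error = mse(train_x, train_y)  # the model reproduces every training label
test_error = mse(test_x, test_y)     # but the memorized noise hurts on new inputs
print(train_error, test_error)
```

The training error is exactly zero because every training point is its own nearest neighbour, yet the error on the unseen points is clearly larger: the model has treated the noise in its four samples as if it were signal.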

However, it often happens that only a small amount of data is available and that is all we can get. For instance, imagine a manufacturing company that wants to examine snapshots of its machinery with the goal of classifying them as “healthy” or “at risk of breakdown”. To train its algorithm (let’s say, a convolutional neural network, CNN), the company will need a set of pre-labeled images. Collecting the data will take time, but what if the company wants to speed up the process by starting from a small sample of images? Well, rather than waiting for new images to come in, the company could take the available data and derive new images from them, in such a way that each “new image” is consistent with the existing ones.

This process is called data augmentation, and it can be very effective at improving the accuracy of the model. In the next sections, we are going to see different types of data augmentation for image data, plus their implementation with Keras.
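Before turning to Keras, here is a minimal, dependency-free sketch of what "deriving new images from existing ones" can mean. It is only an illustration under my own assumptions: the image is a tiny grid of numbers stored as nested lists, and the helper names (`horizontal_flip`, `vertical_flip`, `shift_right`, `augment`) are hypothetical, not part of Keras or any other library.

```python
# A tiny 3x4 grayscale "image" stored as a list of rows (hypothetical data).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]


def horizontal_flip(img):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Mirror the image top to bottom by reversing the row order."""
    return [row[:] for row in img[::-1]]


def shift_right(img, pixels, fill=0):
    """Shift the content right by `pixels` columns, padding the left with `fill`."""
    width = len(img[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[: width - pixels] for row in img]


def augment(img):
    """Return the original image plus three derived variants."""
    return [img, horizontal_flip(img), vertical_flip(img), shift_right(img, 1)]


augmented = augment(image)
print(len(augmented))  # one original image has become four training samples
```

Each variant keeps the label of the image it came from, which is what makes the new samples consistent with the existing ones. Whether a given transformation is appropriate depends on the task: a vertical flip, for example, may produce pictures that would never occur in practice.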

My inspirational muse for this activity will be a majestic golden retriever:

[Image: the golden retriever used as the example picture]
