In machine learning (ML), the situation in which a model does not generalize well from the training data to unseen data is called overfitting. As you might know, it is one of the trickiest obstacles in applied machine learning.
The first step in tackling this problem is to actually know that your model is **overfitting**. That is where proper cross-validation comes in.
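To make the idea concrete, here is a minimal sketch of how cross-validation can expose overfitting. Everything in it is an illustrative assumption rather than a recipe from this article: the synthetic one-dimensional dataset with 30% label noise, the 1-nearest-neighbor "model" (chosen because it memorizes its training set), and the helper names `one_nn_predict`, `accuracy`, and `k_fold_accuracy`.

```python
import random

def one_nn_predict(train, x):
    """Predict the label of x as the label of its nearest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test points whose label is predicted correctly."""
    hits = sum(one_nn_predict(train, x) == y for x, y in test)
    return hits / len(test)

def k_fold_accuracy(data, k=5):
    """Average held-out accuracy over k folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(train, held_out))
    return sum(scores) / k

# Hypothetical toy data: the label depends on x, but 30% of labels are flipped.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.3:
        label = 1 - label
    data.append((x, label))

train_acc = accuracy(data, data)  # scored on the points the model memorized
cv_acc = k_fold_accuracy(data)    # scored on points the model never saw
print(f"train accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```

A large gap between the two numbers is the warning sign: the model fits the training points perfectly, noise included, but does noticeably worse on held-out folds.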
After identifying the problem, you can prevent it by applying regularization or training with more data. Still, sometimes you have no additional data to add to your initial dataset. Acquiring and labeling additional data points may also be the wrong path: in many cases it will deliver better results, but it is often time-consuming and expensive.
That is where Data Augmentation (DA) comes in.
In this article we will cover:
Data Augmentation is a technique that artificially expands the size of a training set by creating modified copies of the existing data. It is good practice to use DA if you want to prevent overfitting, if the initial dataset is too small to train on, or even if you just want to squeeze better performance from your model.
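As a minimal sketch of that definition, the snippet below doubles a tiny dataset by adding a horizontally flipped copy of every image. The helper names `flip_horizontal` and `augment`, the toy pixel values, and the labels are all hypothetical. Images are plain nested lists here only to keep the example dependency-free. The sketch also assumes that flipping does not change the label, which holds for many natural images but not, for example, for digits or text.

```python
def flip_horizontal(image):
    """Mirror a 2-D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def augment(dataset):
    """Return the original (image, label) pairs plus one flipped copy of each."""
    augmented = list(dataset)
    for image, label in dataset:
        # Assumption: a mirrored image keeps the same label.
        augmented.append((flip_horizontal(image), label))
    return augmented

# Hypothetical 2x3 grayscale "images" with made-up labels.
tiny_dataset = [
    ([[0, 1, 2], [3, 4, 5]], "cat"),
    ([[9, 8, 7], [6, 5, 4]], "dog"),
]
augmented_dataset = augment(tiny_dataset)
print(len(tiny_dataset), "->", len(augmented_dataset))  # 2 -> 4
```

The original samples are kept untouched, so the augmented set is a superset of the initial one; the model simply sees more variations of the same underlying data.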
Let’s be clear: Data Augmentation is not only used to prevent overfitting. In general, having a large dataset is crucial for the performance of both ML and Deep Learning (DL) models. When collecting more data is not an option, we can still improve the model’s performance by augmenting the data we already have. In other words, Data Augmentation is also a way to enhance the model’s performance.
DA is most frequently used when building a DL model. That is why throughout this article we will mostly talk about performing Data Augmentation with various DL frameworks. Still, keep in mind that you can augment data for ML problems as well.
You can augment:
We will focus on image augmentations as those are the most popular ones. Nevertheless, augmenting other types of data is just as efficient and easy. That is why it’s good to remember some common techniques that can be used to augment the data.
We can apply various changes to the initial data. For example, for images we can use:
For text there are: