Image augmentation is a technique for artificially expanding an image data set by creating modified copies of the images it already contains. It is mostly used to add variety to the data set so that models don’t overfit.

Some of the most common augmentation methods are flipping, rotating, and tweaking image properties like contrast, brightness, and color.
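As a rough preview, the methods listed above can be sketched with PIL on a tiny synthetic image. The image, the variable names, and the enhancement factor of 1.5 are illustrative assumptions, not values taken from the data set or from the rest of this walkthrough.

```python
from PIL import Image, ImageEnhance, ImageOps

# Synthetic 4x2 RGB image standing in for a leaf photo (hypothetical data).
img = Image.new("RGB", (4, 2), color=(100, 150, 50))
img.putpixel((0, 0), (255, 0, 0))  # mark the top-left pixel so the flip is visible

# Geometric augmentations
flipped = ImageOps.mirror(img)         # horizontal (left-right) flip
rotated = img.rotate(90, expand=True)  # 90 degrees counter-clockwise, canvas resized to fit

# Property tweaks: a factor of 1.0 returns the original, larger values increase the effect
brighter = ImageEnhance.Brightness(img).enhance(1.5)
higher_contrast = ImageEnhance.Contrast(img).enhance(1.5)
more_saturated = ImageEnhance.Color(img).enhance(1.5)

augmented = [flipped, rotated, brighter, higher_contrast, more_saturated]
print(len(augmented), "augmented variants created from one image")
```

Each call returns a new image and leaves the original untouched, which is why one source photo can yield several training samples.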

We shall apply all these techniques to a sample of leaf images from the Plant-Village Data Set. The data set is available here. Open the link while you are signed in to your Google account so that it’s available in the “Shared with me” folder of your Google Drive.

Setting Up the Workspace

To make the setup easy and fast, we’ll be hosting our notebook on Google Colab. For those new to Colab, it is a free development environment offered by Google which lets you use GPUs or TPUs for your modeling needs. It also comes with the Python Imaging Library (PIL) and OpenCV preinstalled, which saves us the trouble of setting them up ourselves (OpenCV, for instance, is not part of the default Anaconda distribution of Python). Using Colab also lets you use Google Drive to host your data, so you don’t have to download 4,000+ photos of leaves to your local machine.

Once you open Colab, choose “New Notebook” to open a blank Jupyter Notebook. Then click on “Connect” at the top-right. Once you’ve connected, you’ll be able to see a green tick with the RAM and the disk utilization where “Connect” was earlier. Once this is done, go to “Runtime” from the menu bar, select “Change runtime type”, and then choose “GPU” from the Hardware accelerator drop-down. You can also uncheck the “Omit code cell output when saving this notebook” option if you want to save your notebook with outputs. Click on “Save”, and your GPU-hosted Jupyter Notebook is now ready!
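Because the images live in Google Drive, the notebook needs access to it. A minimal sketch of one way to do this is below; `mount_drive` is a hypothetical helper, and `/content/drive` is the conventional Colab mount point, assumed here rather than taken from this article. The import guard only exists so the cell fails gracefully outside Colab.

```python
import os

DRIVE_ROOT = "/content/drive"  # conventional Colab mount point (assumption)


def mount_drive(mount_point=DRIVE_ROOT):
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False  # not running in Colab, nothing to mount
    drive.mount(mount_point)  # asks for authorization the first time
    return os.path.isdir(mount_point)


mounted = mount_drive()
print("Drive mounted:", mounted)
```

Once mounted, your own files typically appear under a `MyDrive` folder inside the mount point. Items that only sit in “Shared with me” may need a shortcut added to your Drive before they show up there.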

We start by importing the libraries we will use:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
import glob
import datetime
import random
from tqdm.notebook import tqdm
from PIL import Image
from PIL import ImageEnhance
np.random.seed(1)  # seed NumPy to have reproducible results
random.seed(1)  # also seed Python's built-in random module, which is imported above
pd.set_option('display.max_colwidth', None)


Introduction to Image Augmentation in Python