A generative adversarial network (GAN) is a powerful approach to machine learning (ML). At a high level, a GAN is simply two neural networks that feed into each other: one produces increasingly realistic data, while the other gradually improves its ability to distinguish that data from real-world data.

In this blog we’ll dive a bit deeper into how this mechanism works, where you might use it, and the potential ramifications of using a GAN in the real world.

A GAN in a Nutshell

The first neural network in a GAN is called the generator. It starts with random input (noise) and repeatedly generates data that approaches the quality of real-world data. It does this by sending its output to a second neural network, the discriminator, which gradually improves its ability to distinguish that output from real training data, and feeds its classification back to the generator. From an implementation standpoint, the generator and discriminator each have their own loss function, and both losses are computed from the discriminator's classification: during training, the generator's weights are updated through backpropagation so that its output becomes more likely to be classified as real.
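To make those paired loss functions concrete, here is a minimal sketch of the classic GAN losses for a single real sample and a single generated sample. The function names are illustrative, and `d_real`/`d_fake` stand for the discriminator's probability scores (its belief that a sample is real):

```python
import math

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy: the discriminator wants d_real (its score
    # for real data) near 1 and d_fake (its score for generated data) near 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # The generator wants the discriminator fooled, i.e. d_fake near 1.
    return -math.log(d_fake)

# Early in training the discriminator is confident:
# its own loss is low while the generator's loss is high.
print(round(discriminator_loss(0.9, 0.1), 3))  # → 0.211
print(round(generator_loss(0.1), 3))           # → 2.303
```

Note how the two losses pull in opposite directions through the shared score `d_fake` — this is the adversarial coupling described above.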

This design is illustrated in the following flow chart:

[Flow chart: the GAN's generator and discriminator feeding into each other.]

The adversarial aspect of a GAN is that the discriminator’s results may be fed back into itself for self-improvement, and/or back into the generator to improve the generator’s output. In this sense, the generator’s ability to produce convincing output competes with the discriminator’s ability to classify data as training progresses. Moreover, because the discriminator automatically supervises the training of a generative model (i.e., the generator), a GAN effectively turns an otherwise unsupervised ML problem into an automated, supervised one.

To support such functionality, the generator is commonly built using transposed convolutional layers (sometimes called an inverse convolutional or deconvolutional network), because of their ability to generate data (e.g., upsampling feature maps to create new images). The discriminator is often built using a regular CNN because of its ability to break data (e.g., images) down into feature maps and ultimately classify it (e.g., to determine whether an image is real or fabricated). Note that GANs can also be built using other types of neural network layers.
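One way to see why transposed convolutions suit the generator and regular convolutions suit the discriminator is to track spatial sizes through the layers. The arithmetic below is a minimal sketch; the layer hyperparameters (kernel 4, stride 2, padding 1) are illustrative choices, not taken from any particular GAN:

```python
def conv_transpose_out(size, kernel, stride, padding):
    # Output spatial size of a 2-D transposed convolution: it upsamples.
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel, stride, padding):
    # Output spatial size of a regular convolution: it downsamples.
    return (size - kernel + 2 * padding) // stride + 1

# A generator can grow a 7x7 feature map into a 28x28 image
# with two transposed-conv layers (kernel 4, stride 2, padding 1)...
print(conv_transpose_out(7, 4, 2, 1))    # → 14
print(conv_transpose_out(14, 4, 2, 1))   # → 28
# ...while the discriminator shrinks the image back down
# on its way to a real-vs-fake verdict.
print(conv_out(28, 4, 2, 1))             # → 14
```

The two operations are mirror images: stacking a few of each gives the generator an upsampling pipeline and the discriminator a downsampling one.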

At the end of the training process, an ML practitioner might use the fully-trained generator, discriminator, or both components for inference, depending on what real-world problem they are trying to solve.

The “Hello World” of GANs

In the context of GANs, a good “hello world” project can be created around the MNIST dataset, a library of images of handwritten digits ranging from 0 through 9. Users who are learning neural networks for the first time often use this dataset as input to tackle the problem of classifying the digits represented in those images.

This classification problem can be extended into a starting point for learning about GANs. Here, the goal is to gradually generate new images of handwritten digits that approach or even match the quality and style of those in the MNIST dataset, while also improving the ability to classify whether a given image was generated by the GAN or is, in fact, a real-world image. The GAN for such a problem would look as follows:

[Flow chart: random noise feeds the generator; the generated image and real MNIST images feed the discriminator, which classifies each image as real or fake.]

The generator is seeded with random noise (data) and generates an image of a handwritten digit. At this point the output is probably pretty bad, since the random noise likely doesn’t reflect a handwritten digit very well. This output is then fed to the discriminator along with images from the MNIST dataset (the training data). The discriminator in this example is a binary classifier, labeling a given image as either a real-world image or a fake (generated) one.
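One common way to set up that binary classification step is to stack a real batch and a generated batch into one training batch, labeled 1 ("real") and 0 ("fake") respectively. This NumPy sketch uses random arrays as stand-ins for MNIST images and generator output, so it assumes nothing about the actual model code:

```python
import numpy as np

def make_discriminator_batch(real, fake):
    # Combine real and generated images into one training batch;
    # label real images 1 ("real") and generated images 0 ("fake").
    images = np.concatenate([real, fake])
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    return images, labels

rng = np.random.default_rng(0)
real = rng.random((8, 28, 28))   # stand-in for a batch of 28x28 MNIST images
fake = rng.random((8, 28, 28))   # stand-in for a batch of generator output
images, labels = make_discriminator_batch(real, fake)
print(images.shape, labels.sum())  # → (16, 28, 28) 8.0
```

In a full training loop, the discriminator would fit on `(images, labels)` each step, then the generator would be updated against the discriminator's resulting scores.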

The generator’s output, along with the discriminator’s classification, is then recursively fed back into the generator to repeat the process and, hopefully, improve the generator’s next output. At this point the discriminator may also feed its output back into itself, along with more training data, to further improve its ability to classify images.

Training a GAN can take a long time — on the order of a few hours to even days — depending on the data, the compute resources available, and the level of accuracy the ML practitioner is trying to achieve. An idealized stopping point is when the discriminator misclassifies generated images around 50% of the time; at that point, ML practitioners often assume the generator is outputting plausible data. In practice, however, practitioners may train to different levels of accuracy depending on their needs.
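That 50% heuristic can be expressed directly as a check on the discriminator's scores for generated samples. The function names, threshold, and scores below are illustrative, not a prescribed stopping rule:

```python
def fooled_rate(d_fake_scores, threshold=0.5):
    # Fraction of generated samples the discriminator scores as "real".
    fooled = sum(1 for s in d_fake_scores if s >= threshold)
    return fooled / len(d_fake_scores)

def plausibly_trained(d_fake_scores, tolerance=0.05):
    # Idealized stopping test: the discriminator is fooled ~50% of the
    # time, i.e. its judgments on generated data approach a coin flip.
    return abs(fooled_rate(d_fake_scores) - 0.5) <= tolerance

# Early in training the discriminator is rarely fooled...
print(fooled_rate([0.1, 0.2, 0.1, 0.3]))        # → 0.0
# ...later, half the generated samples pass as real.
print(plausibly_trained([0.6, 0.4, 0.7, 0.3]))  # → True
```

In practice this check would be run on a held-out batch of generated images each epoch, alongside whatever accuracy target the practitioner has chosen.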

#machine-learning #ai #algorithms

Exploring Generative Adversarial Networks (GANs)