Generative Adversarial Networks

What Are GANs?

Generative adversarial networks (GANs) are a type of deep neural network used to generate synthetic data, most often images. The architecture comprises two deep neural networks, a generator and a discriminator, which work against each other (hence “adversarial”). The generator creates new data instances, while the discriminator evaluates each instance and decides whether it is “real” (from the training dataset) or “fake” (from the generator).

The generator and discriminator are trained together, each working against the other, until the generator creates synthetic data realistic enough that the discriminator can no longer tell it apart from real data. After successful training, the generator can be used on its own to produce new synthetic data, for example as input to other deep neural networks.

GANs are versatile: they can learn to generate new instances of many types of data, such as synthetic images of faces, new songs in a certain style, or text of a specific genre.

Training a GAN

Using an example of creating synthetic images of money, let’s walk through the specific parts and functions of a GAN architecture.

Random noise is fed into the generator. Because the generator has not been trained yet, its output initially looks like noise as well.

Figure: GAN architecture, with the input and output of the untrained GAN.
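To make the first step concrete, here is a minimal sketch in plain Python (not the MATLAB workflow the article refers to). The generator is a hypothetical one-layer network, out = tanh(W z + b), with random untrained weights; the latent size of 100 and the 4x4 output "image" are illustrative assumptions. Because nothing has been learned yet, the output is just a random transformation of the input noise.

```python
import math
import random

def make_generator(latent_dim, out_dim, seed=0):
    """Hypothetical one-layer generator, out = tanh(W z + b), with random untrained weights."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def generate(z):
        # One output value (e.g. one pixel) per row of the weight matrix, squashed into [-1, 1].
        return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
                for row, b in zip(weights, bias)]
    return generate

def sample_noise(latent_dim, rng):
    """Latent input: a vector of independent standard normal values."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

rng = random.Random(42)
generator = make_generator(latent_dim=100, out_dim=16)
fake_image = generator(sample_noise(100, rng))  # a flattened 4x4 "image"
print([round(v, 2) for v in fake_image])
```

In a real GAN the single layer would be replaced by a deep network (often with transposed convolutions for images), but the interface is the same: noise in, candidate sample out.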

Training data and the output of the generator are both sent to the discriminator, which is trained in parallel to distinguish real images from fake ones. At first, the discriminator's output is not very accurate, because this part of the network is also still being trained; its accuracy improves over time.

Figure: GAN architecture, with the input and output of the GAN during training.
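A common way to train the discriminator is as a binary classifier: real samples get label 1, generated samples get label 0, and the loss is binary cross-entropy. The sketch below assumes a hypothetical logistic discriminator on single numbers (D(x) = sigmoid(w*x + b)) rather than a deep network on images, and the sample values are made up for illustration. An untrained discriminator that outputs 0.5 for everything has a higher loss than one that separates the two groups.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def discriminator(x, w, b):
    """Hypothetical logistic discriminator: estimated probability that scalar x is real."""
    return sigmoid(w * x + b)

def discriminator_loss(real_batch, fake_batch, w, b, eps=1e-12):
    """Binary cross-entropy with real samples labeled 1 and generated samples labeled 0."""
    loss_real = -sum(math.log(discriminator(x, w, b) + eps) for x in real_batch) / len(real_batch)
    loss_fake = -sum(math.log(1.0 - discriminator(x, w, b) + eps) for x in fake_batch) / len(fake_batch)
    return loss_real + loss_fake

real = [3.8, 4.1, 4.0]   # stand-ins for training data
fake = [0.2, -0.1, 0.4]  # stand-ins for generator output
untrained = discriminator_loss(real, fake, w=0.0, b=0.0)  # D = 0.5 for every input
better = discriminator_loss(real, fake, w=2.0, b=-4.0)    # separates the two groups
print(untrained, better)
```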

Feedback: The discriminator's output is fed back to both the generator and the discriminator, which use this information to update their parameters: the discriminator becomes better at telling real from fake, and the generator becomes better at producing convincing fakes.

Figure: GAN architecture with feedback, and input and output of the GAN during training.
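The three steps (noise in, discriminator scoring, feedback) can be combined into a toy training loop. This is a minimal sketch in plain Python under strong simplifying assumptions, not the MATLAB implementation: the "images" are single numbers drawn from a normal distribution with mean 2, the hypothetical generator only learns a shift θ added to standard normal noise, the discriminator is logistic regression, and gradients are written out by hand. The learning rate, batch size, and step count are illustrative choices. Each iteration first updates the discriminator, then updates the generator using feedback from the updated discriminator (the non-saturating generator loss, -log D(G(z))).

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, batch=64, lr=0.05, real_mean=2.0, seed=0):
    """Alternating GAN updates on 1-D data. Generator: G(z) = z + theta.
    Discriminator: D(x) = sigmoid(w * x + b). All settings are illustrative."""
    rng = random.Random(seed)
    theta, w, b = 0.0, 0.0, 0.0
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]

        # Discriminator update: gradient of binary cross-entropy
        # (real labeled 1, fake labeled 0) with respect to w and b.
        grad_w = grad_b = 0.0
        for x in real:
            d = sigmoid(w * x + b)
            grad_w += (d - 1.0) * x
            grad_b += d - 1.0
        for x in fake:
            d = sigmoid(w * x + b)
            grad_w += d * x
            grad_b += d
        w -= lr * grad_w / batch
        b -= lr * grad_b / batch

        # Generator update: feedback from the updated discriminator.
        # Loss -log D(z + theta); its derivative with respect to theta is (D - 1) * w.
        grad_theta = sum((sigmoid(w * x + b) - 1.0) * w for x in fake)
        theta -= lr * grad_theta / batch
    return theta, w, b

theta, w, b = train_toy_gan()
print(f"generator shift {theta:.2f}, D(2.0) = {sigmoid(w * 2.0 + b):.2f}")
```

With these settings θ should end close to 2, where the discriminator's output is near 0.5 for every input: it can no longer tell real from fake. Real GANs replace both functions with deep networks and use automatic differentiation instead of hand-derived gradients.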

When shown an instance from the true dataset, the discriminator's goal is to recognize it as authentic. Meanwhile, the generator creates new, synthetic images and passes them to the discriminator, hoping that they too will be deemed authentic even though they are fake. The generator's goal is to produce passable images: to lie without being caught. The discriminator's goal is to identify the images coming from the generator as fake.
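These competing goals are commonly written as a single minimax objective, the standard formulation from the original GAN work (the article itself does not state a loss function). Here D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for noise z, p_data is the training-data distribution, and p_z is the noise distribution. The discriminator tries to maximize V, the generator tries to minimize it; in practice the generator is often trained to minimize -log D(G(z)) instead, which gives stronger gradients early in training.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```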

MATLAB® and Deep Learning Toolbox™ let you build GAN architectures using automatic differentiation, custom training loops, and shared weights.

Applications of Generative Adversarial Networks

Handwriting generation: As in the image example, GANs can create synthetic data to supplement small datasets that lack enough examples to train accurate deep learning models. Handwriting detection is one example: training a deep neural network on handwriting requires thousands of training samples, and collecting them manually can be time-consuming.

Figure: Synthetic handwriting generation using GANs: handwritten digits from 0 to 9, generated by a GAN.

Scene generation: Conditional GANs are a type of GAN that takes advantage of labels, whereas the original GAN does not assume that labels are present. Conditional GANs can be used in applications such as scene generation, where the information must follow a certain organization. Take scene generation for automated driving: the road and sidewalk must be located below the buildings and sky. A synthetic image that places the road in the wrong location would immediately be identified as fake and would be unusable in an automated driving application.

Figure: Image-to-image translation (pix2pix) of road and sidewalk for automated driving using a conditional GAN.
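One common way to supply labels to a conditional GAN is to append a one-hot encoding of the class label to the generator's noise input and to the discriminator's input. The sketch below assumes that scheme, with 10 classes (for example the digits 0 to 9) and a latent size of 100; the function names are hypothetical. Note that pix2pix, shown above, conditions on a whole input image such as a label map rather than on a single class index, but the idea of giving both networks the same condition is the same.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(label, num_classes, latent_dim, rng):
    """Noise vector with the one-hot label appended; the label steers what is generated."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return noise + one_hot(label, num_classes)

def discriminator_input(sample, label, num_classes):
    """The discriminator sees the same label, so it can reject samples that do not match it."""
    return list(sample) + one_hot(label, num_classes)

rng = random.Random(0)
z = generator_input(label=7, num_classes=10, latent_dim=100, rng=rng)  # e.g. "draw a 7"
print(len(z), z[100:])
```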

Audio and speech applications: GANs are also used for text-to-speech synthesis, voice conversion, and speech enhancement. They offer a significant advantage over traditional audio and speech approaches because they can generate new samples rather than only augment existing signals. One example is sound synthesis, where a GAN creates synthetic versions of drum sounds (see the Train Generative Adversarial Network (GAN) for Sound Synthesis example).

Note: GANs can be powerful for generating new synthetic data for many applications, but training them is often difficult, because several failure modes can prevent the generator from producing accurate results. MATLAB lets you monitor GAN training progress and identify common failure modes.
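Monitoring usually means tracking, over training, how the discriminator scores real and generated data and how varied the generated samples are. The function below is a hypothetical sketch of two such checks, not the checks MATLAB performs, and its thresholds are illustrative rather than tuned: if the discriminator separates real from fake almost perfectly, the generator may have stopped learning (a convergence failure); if generated samples barely vary, the generator may be producing the same output over and over (mode collapse).

```python
from statistics import mean, pstdev

def check_training_health(real_scores, fake_scores, fake_samples,
                          gap_threshold=0.9, diversity_threshold=1e-3):
    """Hypothetical heuristics on discriminator scores (probabilities) and scalar samples.
    Thresholds are illustrative assumptions, not recommended values."""
    warnings = []
    gap = mean(real_scores) - mean(fake_scores)
    if gap > gap_threshold:
        warnings.append("possible convergence failure: discriminator separates real and fake almost perfectly")
    if pstdev(fake_samples) < diversity_threshold:
        warnings.append("possible mode collapse: generated samples have almost no variety")
    return warnings

print(check_training_health([0.55, 0.5, 0.6], [0.45, 0.5, 0.4], [1.0, 2.5, -0.3]))
print(check_training_health([0.99, 0.98], [0.01, 0.02], [3.0, 3.0, 3.0]))
```

In practice these quantities are tracked per iteration and judged as trends, since a single batch can be noisy.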

Examples and How To

Train Generative Adversarial Network (GAN) - Example
Train Conditional Generative Adversarial Network (CGAN) - Example
Generate Synthetic Signals Using CGAN - Example
Train GAN for Sound Synthesis - Example

Software Reference

Monitor GAN Training Progress and Identify Common Failure Modes - Documentation

Generative Adversarial Network (GAN) FAQs

GANs are a type of deep neural network used to generate synthetic data, comprising two networks—a generator and a discriminator—that work against each other in an adversarial process.

The generator creates new synthetic data instances while the discriminator evaluates whether each instance is real or fake. They train against each other until the generator produces realistic data that the discriminator can no longer identify as fake.

GANs can learn to generate new instances of any datatype, including synthetic images of faces, new songs in a certain style, or text of a specific genre.

"Adversarial" refers to the competitive relationship in which the generator tries to create convincing fake data without being caught, while the discriminator works to identify which images are fake.

Conditional GANs are a type of GAN that uses labels to control the generated output. For example, in scene generation for automated driving, they can constrain the layout so that the road appears below the buildings and sky.

GANs are used for handwriting generation to supplement training datasets, scene generation for automated driving, and audio applications like text-to-speech synthesis, voice conversion, and speech enhancement.

GANs can generate thousands of synthetic data samples to supplement smaller datasets, reducing the need for time-intensive manual collection of training data for deep learning models.

GAN training can fail to produce accurate results because of several common failure modes, so training progress needs to be monitored carefully.
