Do you see those pictures? They seem real but they aren’t. They have been generated by a computer. Let me explain how.
Generative modeling algorithms perform the impressive task of generating new samples from the distribution underlying a given dataset. It is like giving a database of sunset photographs to an algorithm and getting back a new sunset photograph that is different from every photograph in the database.
A generative adversarial network, or GAN, is a generative model composed of two neural networks: a generator that learns to generate new plausible samples from a given distribution, and a discriminator that aims to tell generated samples apart from real ones.
The generator and the discriminator play a game against each other. They are adversaries (hence the term “adversarial”): the generator tries to fool the discriminator, while the discriminator is given both real and generated samples and tries to distinguish them.
Based on those results, the generator adjusts its parameters to create new images. And so it goes, until the discriminator can no longer tell what’s genuine and what’s bogus.
In this article, we will review the architecture of a basic GAN, then give an overview of the elegant mathematical theory behind its success, before diving into some truly astonishing real-world applications of GANs.
GANs, a revolution
One night in 2014, Ian Goodfellow went drinking to celebrate with a fellow doctoral student who had just graduated. Some friends asked for his help with a thorny project. They were working on a computer that could create photos by itself. The plan Goodfellow’s friends were proposing was to use a complex statistical analysis of the elements that make up a photograph. This would have required a massive amount of number-crunching, and Goodfellow told them it simply wasn’t going to work.
As he pondered the problem over his beer, he hit on an idea. What if you pitted two neural networks against each other? His friends were skeptical, but he decided to give it a try. Goodfellow coded into the early hours and then tested his software. It worked the first time.
This taxonomy gives an overview of the methods for generating new samples from a dataset. A quick look at the diagram shows in what ways GANs are revolutionary:
- No need for Markov chains to do the estimation, unlike Boltzmann machines or generative stochastic networks (GSNs).
- Asymptotically consistent, unlike variational autoencoders.
- No need for the transformation to be invertible, unlike the change-of-variables method.
Let’s go deeper into the subject and talk about the architecture of GANs.
Architecture of GANs
The general architecture of a GAN is simple to understand. As described at the beginning of the article, a generative adversarial network is composed of a generator and a discriminator. The generator takes a random latent vector Z as input and outputs a generated sample G(Z). We denote by X a sample from the real dataset and assume that X follows a probability distribution p_x, the distribution of the samples in the dataset.
The goal of the GAN is to approximate the distribution p_x with p_g, the probability distribution of the generated samples G(Z). Both X and G(Z) are given to the discriminator, which tries to predict whether each sample is real or fake.
To better understand how the discriminator works, let Y be a random variable that takes the value 1 if the image is real and 0 if it is fake. The discriminator takes an element v as input and outputs the probability that Y=1 given v, so D(v) = P(Y=1|v). In other words, the discriminator estimates the probability that a given input comes from the real dataset.
From now on, we will use the following notation:
- D(X) is the prediction of the discriminator for a real sample.
- D(G(Z)) is the prediction of the discriminator for a generated sample.
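To make this notation concrete, here is a minimal sketch in plain Python. Everything in it is a toy assumption made for illustration: the samples are single numbers rather than images, the `generator` is a hypothetical affine map, the `discriminator` is a hypothetical logistic regression with hand-picked weights, and the real distribution p_x is assumed to be a Gaussian.

```python
import math
import random


def generator(z, scale=2.0, shift=1.0):
    """Hypothetical toy generator G: an affine map of the latent value z."""
    return scale * z + shift


def discriminator(v, w=1.5, b=-4.0):
    """Hypothetical toy discriminator D: logistic regression returning P(Y=1 | v)."""
    return 1.0 / (1.0 + math.exp(-(w * v + b)))


rng = random.Random(0)
x = rng.gauss(4.0, 0.5)                # a real sample X from the assumed p_x
z = rng.gauss(0.0, 1.0)                # a latent value Z
d_real = discriminator(x)              # D(X)
d_fake = discriminator(generator(z))   # D(G(Z))
print(f"D(X) = {d_real:.3f}, D(G(Z)) = {d_fake:.3f}")
```

Both outputs are probabilities between 0 and 1. A real GAN replaces these two functions with neural networks and learns their weights instead of fixing them by hand.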
Before going further into the architecture of the generator and the discriminator, let’s talk about the loss function of the model.
The loss function of the generative adversarial network is given in the original paper by Ian Goodfellow et al.
The problem is viewed as a minimax game whose solution is a Nash equilibrium.
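For reference, the value function of this game, as given in the original paper and rewritten here in the notation of this article, is shown below. The symbol p_z, the prior distribution of the latent vector Z, is not introduced elsewhere in this article and is named here only for the formula.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_x}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D tries to make V as large as possible, while the generator G tries to make it as small as possible.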
The generator in a GAN is a neural network that creates the fake data on which the discriminator is trained. It learns to generate plausible data, and its generated samples become negative training examples for the discriminator. It takes a fixed-length random noise vector as input and generates a sample.
The main aim of the generator is to make the discriminator classify its output as real. When D(G(z)) is close to 1, the generator has reached its goal. The generator therefore tries to maximize D(G(z)), which is the same as minimizing 1 - D(G(z)).
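The generator's objective can be sketched as a loss computed from the discriminator's predictions on a batch of generated samples. The function names below are hypothetical. The first function is the log form of the term the generator minimizes in the minimax game; the second is the alternative that the original paper suggests in practice (maximizing log D(G(z))), which gives stronger gradients early in training. The sketch assumes that every prediction is strictly between 0 and 1.

```python
import math


def generator_loss_saturating(d_fake):
    """Mean of log(1 - D(G(z))): the term the generator minimizes in the minimax game."""
    return sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)


def generator_loss_non_saturating(d_fake):
    """Mean of -log D(G(z)): the alternative suggested in the original paper."""
    return -sum(math.log(d) for d in d_fake) / len(d_fake)


fooled = [0.9, 0.8]   # discriminator believes the fakes are real
caught = [0.1, 0.2]   # discriminator spots the fakes
print(generator_loss_saturating(fooled), generator_loss_saturating(caught))
print(generator_loss_non_saturating(fooled), generator_loss_non_saturating(caught))
```

With either form, the loss is lower when the discriminator is fooled, which is the direction the generator is pushed in.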
Backpropagation is used to compute the gradient of the generator loss with respect to each generator weight, that is, each weight’s impact on the output. These gradients are then used to adjust the generator weights in the right direction.
The discriminator is a neural network that distinguishes real data from the fake data created by the generator.
During training, the discriminator classifies both real data and fake data from the generator. The discriminator loss penalizes it for misclassifying a real data instance as fake or a fake data instance as real.
The discriminator therefore tries to maximize both D(x) and 1 - D(G(z)): since x comes from the real data distribution, D(x) should be close to 1, and since G(z) is a generated sample, D(G(z)) should be close to 0.
The discriminator updates its weights through backpropagation from the discriminator loss through the discriminator network.
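The discriminator's objective described above corresponds to a binary cross-entropy loss, sketched below in plain Python. The function name is hypothetical, and the sketch assumes that every prediction is strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples are labeled 1, generated samples 0."""
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)        # small when D(x) is near 1
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)  # small when D(G(z)) is near 0
    return real_term + fake_term


good = discriminator_loss([0.9, 0.95], [0.1, 0.05])     # confident and correct
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])   # cannot tell real from fake
print(good, confused)
```

Minimizing this loss is the same as maximizing log D(x) + log(1 - D(G(z))). A discriminator that outputs 0.5 everywhere gets a loss of 2 log 2, which is what one expects to see once it can no longer tell real from fake.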
As a result, training the model consists of alternately training the generator and the discriminator, updating their parameters, and repeating the process until the model converges.
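The alternating procedure can be sketched on a deliberately tiny problem, using only the Python standard library. All of the following are assumptions made for illustration: the real data is a one-dimensional Gaussian with mean 2, the hypothetical generator is G(z) = z + b with a single learnable shift b, the hypothetical discriminator is a logistic regression D(v) = sigmoid(w * v + c), the gradients are written out by hand, and the generator uses the non-saturating loss -log D(G(z)).

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=3000, batch=64, lr=0.05, real_mean=2.0, seed=0):
    rng = random.Random(seed)
    b = 0.0          # generator parameter: G(z) = z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(v) = sigmoid(w * v + c)
    for _ in range(steps):
        # Discriminator step: minimize -[log D(x) + log(1 - D(G(z)))].
        grad_w = grad_c = 0.0
        for _ in range(batch):
            x = rng.gauss(real_mean, 1.0)   # real sample
            g = rng.gauss(0.0, 1.0) + b     # generated sample
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            grad_w += (d_real - 1.0) * x + d_fake * g
            grad_c += (d_real - 1.0) + d_fake
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch
        # Generator step: minimize -log D(G(z)), with the discriminator held fixed.
        grad_b = 0.0
        for _ in range(batch):
            g = rng.gauss(0.0, 1.0) + b
            grad_b += (sigmoid(w * g + c) - 1.0) * w
        b -= lr * grad_b / batch
    return b, w, c


b, w, c = train_toy_gan()
print(f"learned shift b = {b:.2f} (real mean is 2.0)")
```

In this toy setting the shift b should drift toward the real mean, so that generated and real samples become hard to tell apart. Real GANs follow the same alternating pattern with neural networks and automatic differentiation, and their training is not guaranteed to converge this smoothly.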
Unlike other deep learning models, which are trained by minimizing a single loss function until convergence, a GAN has no single objective loss function, since the generator and the discriminator each have their own loss during training. In addition, the equilibrium is neither a minimum nor a maximum; it is a Nash equilibrium. There is no way to objectively assess the progress of training, or the relative or absolute quality of the model, from the loss alone.
In the case of image generation, models must instead be evaluated using the quality of the generated synthetic images.
Quantitative measures, such as the Inception Score and the Fréchet Inception Distance, can be combined with qualitative assessment to provide a robust assessment of GAN models.
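To give an idea of what the Fréchet Inception Distance measures: it fits a Gaussian to feature vectors extracted from real images and another to features from generated images, then computes the Fréchet distance between the two Gaussians. The sketch below is a simplification that only covers the one-dimensional case, where the distance reduces to (mu_r - mu_f)^2 + (sigma_r - sigma_f)^2. The function name and the feature values are hypothetical; the actual metric works on multivariate features from a pretrained Inception network.

```python
import statistics


def frechet_distance_1d(real_features, fake_features):
    """Squared Fréchet distance between 1-D Gaussians fitted to two samples."""
    mu_r = statistics.fmean(real_features)
    mu_f = statistics.fmean(fake_features)
    sd_r = statistics.stdev(real_features)
    sd_f = statistics.stdev(fake_features)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2


real = [0.0, 1.0, 2.0, 3.0]      # hypothetical feature values for real images
shifted = [1.0, 2.0, 3.0, 4.0]   # same spread, mean shifted by 1
print(frechet_distance_1d(real, real))
print(frechet_distance_1d(real, shifted))
```

A lower distance means the generated features are statistically closer to the real ones, and the distance is zero when the two fitted Gaussians coincide.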
With the development of deep learning, data augmentation has become a major issue in machine learning, since deep learning models need a lot of training data. Generating new plausible samples was the application described in the original paper by Ian Goodfellow et al. in 2014. However, GANs can also be applied to data other than images, such as text, sound, voice, music, and structured data like drug molecules.
You may have heard of the human face generator, which produces a random face that doesn’t exist in a single click. Tero Karras et al., in their 2017 paper titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation”, demonstrate the generation of plausible, realistic photographs of human faces. The face generator was trained on celebrity examples, meaning that there are elements of existing celebrities in the generated faces, making them seem familiar, but not quite.
GANs can also be used for purposes other than image synthesis; in some applications, synthesis is not the main goal.
This is the case for image-to-image translation, a bit of a catch-all category for papers presenting GANs that can perform many image translation tasks.
Phillip Isola et al., in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks”, demonstrate GANs, specifically their pix2pix approach, for many image-to-image translation tasks. Examples include:
- Translation of semantic images to photographs of cityscapes and buildings.
- Translation of satellite photographs to Google Maps.
- Translation of photos from day to night.
- Translation of black and white photographs to color.
- Translation of sketches to color photographs.
Since their introduction in 2014 in the paper by Ian Goodfellow et al., GANs have seen rapid growth in popularity within the machine learning community. They have a wide range of applications that extends beyond data augmentation to fields as diverse as high-resolution image generation, image inpainting, image super-resolution, visual manipulation, text-to-image synthesis, asset pricing, market simulation, and so on.
If you are interested in artificial intelligence applications, take a look at our blog to learn more about Natural Language Processing.
Avijeet Biswal, “The Best Introduction to What Generative Adversarial Networks (GANs”, September 18, 2021. https://www.simplilearn.com/tutorials/deep-learning-tutorial/generative-adversarial-networks-gans.
Eliot Brion, “Understanding the GAN cost function”, 2018. https://eliottbrion.github.io/2018-06-13/understanding-the-GAN-value-function.
Jason Brownlee, “18 Impressive Applications of Generative Adversarial Networks (GANs)”, June 14, 2019, sect. Generative Adversarial Networks. https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/.
Ian Goodfellow, “Generative Adversarial Network”, August 2016. https://www.iangoodfellow.com/slides/2016-08-31-Berkeley.pdf.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, “Generative Adversarial Nets”, June 10, 2014.
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen, “Progressive Growing of GANs for Improved Quality, Stability, and Variation”, October 2017. https://arxiv.org/pdf/1710.10196.pdf.