Generative Adversarial Networks for Images and NLP



Introduction to Generative Adversarial Networks

Proposed by Ian Goodfellow in 2014, Generative Adversarial Networks (GANs) are a deep learning framework that makes use of Artificial Neural Networks (ANNs) to learn the underlying distribution of a given dataset. From this distribution, the Generative Adversarial Network can then generate an unlimited number of new samples which are conceptually indistinguishable from the samples present in the training dataset. In this respect, Generative Adversarial Networks are often used to solve tasks within the field of generative modelling. In general, this task can be subdivided into the following two sub-tasks:

  1. Probability density function (pdf) estimation: given a dataset containing a large collection of data samples, determine the probability density function that is associated with this dataset.
  2. Sample generation: given the probability density function of the dataset, generate new data samples by sampling from this distribution.
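As a minimal illustration of these two sub-tasks, the sketch below fits a kernel density estimate to a toy one-dimensional dataset and then draws new samples from it. The toy dataset, the use of scikit-learn's KernelDensity, and the bandwidth value are illustrative assumptions and not part of the original post; note that GANs perform both steps implicitly rather than estimating an explicit density.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Sub-task 1: estimate the probability density function of a toy 1-D dataset.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal(3.0, 1.0, 500)]).reshape(-1, 1)
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(data)

# Sub-task 2: generate new data points by sampling from the estimated density.
new_samples = kde.sample(10, random_state=0)
print(new_samples.ravel())
```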

Generative Adversarial Networks are able to perform generative modelling tasks by making two agents — which are usually Deep Artificial Neural Networks — compete in what is called a zero-sum game: a term commonly used within economics to describe a process in which one participant's gains are exactly balanced by the other participants' losses.

The objectives of these two agents are opposite to each other. The first Artificial Neural Network — usually referred to as the Generative Model G — aims at learning the underlying data distribution of the given dataset and, from this distribution, generating new samples which are indistinguishable from those of the real dataset. The second Artificial Neural Network — often referred to as the Discriminative Model D — aims at detecting imperfections in these generated samples, with the goal of classifying samples as either coming from the real dataset or being generated by the generative model G. This process is visualized schematically in Figure 1.


Figure 1: Schematic overview of the computational procedure used in Generative Adversarial Networks
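To make this two-player setup concrete, the sketch below defines a minimal generator and discriminator as small fully connected networks in PyTorch. The layer sizes, the flattened 28x28 image shape, and the 100-dimensional noise vector are illustrative assumptions rather than choices made in the original work.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100      # size of the noise vector z (assumption)
DATA_DIM = 28 * 28    # flattened image size (assumption)

class Generator(nn.Module):
    """Maps random noise z to a sample in the data space, G(z; theta_g)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, DATA_DIM), nn.Tanh(),   # outputs scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a sample x to the probability that it comes from the real dataset, D(x; theta_d)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),       # scalar probability
        )

    def forward(self, x):
        return self.net(x)

# A single forward pass: noise -> generated sample -> discriminator score.
G, D = Generator(), Discriminator()
z = torch.randn(16, LATENT_DIM)
fake = G(z)
print(D(fake).shape)  # torch.Size([16, 1])
```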

Training Generative Adversarial Networks: A Theoretical Framework

In order to train a Generative Adversarial Network, one first needs to define the data distributions pz and pg. Here, pz represents the distribution of the noise input, and pg represents the generator's distribution over the data space in which the real samples x live. The Generator G acts as a mapping function G(z; ϴg), which transforms random noise z to the data space — that is, to samples which resemble the ones from the training set. Since the Generator G is represented by an Artificial Neural Network, ϴg represents the network's optimizable parameters. Similarly, the Discriminator D can be considered as a mapping function D(x; ϴd), which transforms a sample from the data space to a scalar value. Here, the scalar value represents the probability that the input sample x originates from the real dataset, rather than from the Generative network G.

During a GAN's training process, the Discriminator aims at maximizing the probability of correctly assigning the output label — i.e., originating from the real dataset x or originating from the Generator distribution pg — to the inputted samples. The Generator, on the other hand, aims at fooling the Discriminator by creating increasingly better samples, which can be done by minimizing log(1 − D(G(z))). Mathematically, this allows one to represent the entire training procedure of a Generative Adversarial Network as a minimax game between the Generator G and the Discriminator D in the following way:

min_G max_D V(D, G) = E_{x∼px}[ log D(x) ] + E_{z∼pz}[ log(1 − D(G(z))) ]
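A minimal sketch of how this minimax objective can be optimized in practice is shown below, alternating a discriminator update (ascending on the value function) with a generator update (descending on log(1 − D(G(z)))). The tiny networks, the synthetic "real" data, and the optimizer settings are illustrative assumptions only.

```python
import torch
import torch.nn as nn

LATENT_DIM, DATA_DIM = 100, 2   # toy sizes (illustrative assumptions)

# Compact stand-ins for the Generator G and Discriminator D.
G = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, DATA_DIM))
D = nn.Sequential(nn.Linear(DATA_DIM, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    # Toy "real" samples x ~ px: a 2-D Gaussian centred at (2, 2).
    real = torch.randn(64, DATA_DIM) + 2.0
    z = torch.randn(64, LATENT_DIM)
    fake = G(z)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))),
    # implemented here as minimizing the equivalent binary cross-entropy.
    loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: minimize log(1 - D(G(z))), as in the minimax objective.
    # (In practice the non-saturating variant, maximizing log D(G(z)), is often used instead.)
    loss_G = torch.log(1.0 - D(G(z)) + 1e-8).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```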

In essence, this results in the loss function of a minimax GAN (also referred to as a Vanilla GAN) being, up to constants, equivalent to the Jensen-Shannon Divergence (JS-divergence), which is a commonly used measure for quantifying the similarity between two probability density functions. By minimizing the JS-divergence, one minimizes the difference between the probability density function of the real dataset x and the Generator distribution pg. While regular minimax Generative Adversarial Networks have proven to be a powerful tool for multiple applications, their training procedure is widely considered to be unstable and unreliable. This training difficulty can be attributed to two problems:

1. Vanishing Gradients Problem

In general, optimizing the discriminator part of a GAN is relatively easy, as this comes down to training a simple binary classification Artificial Neural Network. This leaves one to optimize the generator part of the network, which comes down to minimizing the JS-divergence between px and pg — as discussed above. This can be solved as a simple gradient-descent problem whenever the distributions px and pg partly overlap. However, when the samples created by the generator exhibit a distribution that is far from the real data distribution, the JS-divergence saturates at its maximum value (log 2), causing its gradient to become zero. This leaves the generator untrainable, preventing the GAN from further improving its generated samples.
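This saturation can be illustrated numerically: for two distributions with disjoint support, the JS-divergence stays pinned at log 2 no matter how far apart they are moved, so it carries no gradient signal about their distance. The toy point-mass histograms below are purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Two point-mass distributions on a 1-D grid, separated by an increasing gap.
grid_size = 100
for gap in (5, 20, 80):
    p = np.zeros(grid_size); p[10] = 1.0          # "real" distribution px
    q = np.zeros(grid_size); q[10 + gap] = 1.0    # "generated" distribution pg
    # scipy returns the JS *distance* (square root of the divergence), so square it.
    js_div = jensenshannon(p, q, base=np.e) ** 2
    print(f"gap={gap:3d}  JS divergence={js_div:.4f}  (log 2 = {np.log(2):.4f})")
```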

2. Mode Collapse

Another common problem that occurs during the training of Generative Adversarial Networks is mode collapse. The prime objective of training a Generative Adversarial Network is to enable it to generate a wide variety of samples which resemble those from the original dataset. As discussed in the previous section, this is done by training the generator in such a way that it learns to fool the discriminator network. However, when the discriminator has not yet been trained properly, the generator may find what is called an 'optimal sample': a sample that fully fools the discriminator network. This will cause the generator to keep producing that same sample over and over, causing the training procedure of both the generator and discriminator to stall.

One of the solutions proposed to alleviate the vanishing gradient problem in regular minimax GANs is to add random noise to the generated samples before they are passed to the discriminator. This causes the distribution pg to spread out, increasing the probability of overlap with px and thereby reducing the vanishing gradient problem. However, this technique often results in the added noise still being visible in newly generated samples (e.g., noisy pixels in image data), thereby reducing the quality of the GAN's output.

Alternatively, researchers have proposed a different way of training Generative Adversarial Networks by making use of the Wasserstein distance. Their proposed solution — referred to as a Wasserstein Generative Adversarial Network (WGAN) — rejects the use of the JS-divergence because of the commonly occurring vanishing gradient problem. Instead, it uses the Wasserstein distance — commonly referred to as the earth mover's (EM) distance — to determine the distance between px and pg:

W(px, pg) = inf_{γ ∈ Π(px, pg)} E_{(x, y)∼γ}[ ‖x − y‖ ]

where Π(px, pg) denotes the set of all joint distributions whose marginals are px and pg.

The Wasserstein distance W between two probability density functions represents the minimum cost of transforming one of the probability density functions into the shape of the other under an optimal transport plan. Therefore, the Wasserstein distance can be considered as a metric for determining the difference between two probability density functions.
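As a rough numerical illustration, reusing the same kind of toy point-mass distributions as above (an assumption on my part), the sketch below shows that the Wasserstein distance keeps growing with the gap between the two distributions while the JS-divergence stays saturated at log 2.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

positions = np.arange(100)
for gap in (5, 20, 80):
    p = np.zeros(100); p[10] = 1.0          # "real" distribution px
    q = np.zeros(100); q[10 + gap] = 1.0    # "generated" distribution pg
    w = wasserstein_distance(positions, positions, u_weights=p, v_weights=q)
    js = jensenshannon(p, q, base=np.e) ** 2
    print(f"gap={gap:3d}  Wasserstein={w:6.1f}  JS divergence={js:.4f}")
```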

Replacing the JS-divergence with the Wasserstein distance has multiple benefits. First, the Wasserstein distance has a smooth gradient compared to the JS-divergence, which drastically reduces the vanishing gradients problem. This allows the generator network to learn during each iteration, regardless of how far its current distribution is from the real one. Second, due to this smooth gradient, a Wasserstein GAN does not require additional noise to be added to the generated samples. As a result, the newly generated samples are free from such noise and perturbations.
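The sketch below shows what a WGAN-style training step can look like: the critic has no sigmoid output, its loss is the difference between its mean scores on real and generated samples, and its weights are clipped to roughly enforce the Lipschitz constraint used in the original WGAN formulation. The network sizes, toy data, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

LATENT_DIM, DATA_DIM = 100, 2   # toy sizes (assumptions)

G = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, DATA_DIM))
# The critic outputs an unbounded score instead of a probability (no Sigmoid).
C = nn.Sequential(nn.Linear(DATA_DIM, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

opt_C = torch.optim.RMSprop(C.parameters(), lr=5e-5)
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
CLIP = 0.01  # weight-clipping range used in the original WGAN paper

for step in range(1000):
    # Several critic updates per generator update.
    for _ in range(5):
        real = torch.randn(64, DATA_DIM) + 2.0        # toy "real" samples
        fake = G(torch.randn(64, LATENT_DIM)).detach()
        # Critic loss: -(E[C(real)] - E[C(fake)]), i.e. maximize the Wasserstein estimate.
        loss_C = -(C(real).mean() - C(fake).mean())
        opt_C.zero_grad(); loss_C.backward(); opt_C.step()
        # Clip weights to keep the critic (approximately) 1-Lipschitz.
        for p in C.parameters():
            p.data.clamp_(-CLIP, CLIP)

    # Generator loss: minimize -E[C(G(z))].
    loss_G = -C(G(torch.randn(64, LATENT_DIM))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```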

Generative Adversarial Networks for Natural Language Processing

Generative Adversarial Networks have been particularly popular within the computer vision community for replicating the outputs of so-called creative procedures such as drawings, photographs, and cartoons. Recently, the technology has also attracted widespread interest from other fields, including Natural Language Processing.

However, it soon became clear that using Generative Adversarial Networks within the scope of Natural Language Processing is considerably more difficult than using them for Computer Vision applications. The main reason for this difficulty is the nature of the data used in both fields. Whereas imagery data — being matrices of pixel intensity and color values — can be considered continuous, text data is discrete. In particular, sampling discrete tokens from the generator's output is not a differentiable operation, so the discriminator's gradient cannot flow back through generated text the way it flows back through generated pixels. This makes it much more difficult to train Generative Adversarial Networks in a stable way, usually resulting in poor performance or low-quality outputs. Nevertheless, even though Generative Adversarial Networks are hard to train for Natural Language Processing applications, the technology has recently shown promising results.
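The sketch below illustrates why discreteness is a problem for gradient-based training: a continuous output lets gradients flow back to the generator's parameters, whereas picking a discrete token index (here with argmax) produces an integer tensor through which no gradient can flow. The vocabulary size and tensors are purely illustrative.

```python
import torch

vocab_size = 5
logits = torch.randn(1, vocab_size, requires_grad=True)  # generator's output scores

# Continuous case (images): the output itself feeds the discriminator,
# so gradients flow straight back to the generator's parameters.
continuous_out = torch.tanh(logits)
continuous_out.sum().backward()
print(logits.grad)            # well-defined gradients

# Discrete case (text): picking a token index breaks the gradient path.
logits.grad = None
token_id = torch.argmax(logits, dim=-1)   # a hard, non-differentiable choice
print(token_id.requires_grad)             # False: no gradient can flow back
```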

For example, researchers from Peking University have trained a Generative Adversarial Network to automatically generate text with a certain sentiment attached to it. Their GAN — called SentiGAN — uses multiple generator networks (instead of one) and a multi-class discriminator. This setup results in better output quality and alleviates the problem of mode collapse, which has been discussed extensively in the previous sections. Indeed, the results of this research show that SentiGAN was able to outperform most state-of-the-art generic text generation methods, such as Variational Autoencoders and Recurrent Neural Network Language Modelling (RNNLM).

Conclusion

Yann LeCun — Director of Artificial Intelligence Research at Facebook — has called Generative Adversarial Networks the most interesting machine learning idea of the last decade. Indeed, while being relatively new within the field of Artificial Intelligence and Machine Learning, Generative Adversarial Networks have sparked the interest of computer scientists all over the world. Although they were initially developed for generating imagery data, the technology has recently started to find applications in other fields of computer science as well, including the field of Natural Language Processing. With increasing interest from researchers and the continuous discovery of new applications in different fields, Generative Adversarial Networks — and generative modelling in general — still have a lot in store for data scientists and AI practitioners in the upcoming years.
