Generative deep learning (part 2/2)
This post goes through the last two examples of DLP Ch 8. They demonstrate how image generation works in DL; the same approach can be extended to sound and text. The main idea is to learn a low-dimensional latent space of representations in which a DL model can take any point as input and output a new image. This type of DL model is called a decoder or a generator.
Example 4 uses a variational autoencoder (VAE), whose decoder plays this role. Example 5 uses a generative adversarial network (GAN), whose generator plays this role. A VAE is good at learning continuous, well-structured latent spaces where every direction encodes a meaningful line of variation in the data. A GAN is good at creating highly realistic-looking images, but its latent space is less continuous and less structured.
Ex 4: Variational Autoencoder (VAE)
This example trains a VAE on the MNIST dataset and creates new images that are variations of the original images. Below is a flow chart showing how the VAE algorithm works. Python code and output can be found here.
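The sampling step at the heart of a VAE can be sketched in a few lines of NumPy. This is a minimal illustration rather than the posted code: it assumes the encoder outputs a mean and a log-variance for each latent dimension, and the names `z_mean`, `z_log_var`, `sample_latent` and `kl_divergence` are chosen here for illustration.

```python
import numpy as np


def sample_latent(z_mean, z_log_var, rng):
    """Draw z = mean + sigma * epsilon, with epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon


def kl_divergence(z_mean, z_log_var):
    """KL(N(mean, sigma^2) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(
        1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1
    )


rng = np.random.default_rng(0)

# Assumed encoder output for a batch of 4 images and a 2-D latent space.
z_mean = np.zeros((4, 2))
z_log_var = np.zeros((4, 2))

z = sample_latent(z_mean, z_log_var, rng)  # points that would be fed to the decoder
kl = kl_divergence(z_mean, z_log_var)      # regularization term, one value per image
```

The KL term pulls the encoded distributions toward a standard normal, which is what tends to keep the latent space continuous. It is added to a reconstruction loss that compares the decoded image with the input.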
A sample of our output images shows a continuous distribution of different digits gradually transforming into each other. Also, each direction encodes a meaningful variation (e.g., one-ness or nine-ness).
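A grid of output images like this can be produced by decoding evenly spaced points in the latent space. The sketch below assumes a 2-D latent space and 28x28 images; `toy_decoder` is a hypothetical stand-in for a trained decoder (it only draws a blob whose position follows the latent point), so the code runs without a trained model.

```python
import numpy as np


def latent_grid(n, low=-2.0, high=2.0):
    """Return n * n evenly spaced points covering a square of a 2-D latent space."""
    xs = np.linspace(low, high, n)
    ys = np.linspace(low, high, n)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


def toy_decoder(z, size=28):
    """Hypothetical decoder: maps each 2-D point to a size x size image."""
    rows, cols = np.mgrid[0:size, 0:size]
    cx = (z[:, 0, None, None] + 2.0) / 4.0 * (size - 1)
    cy = (z[:, 1, None, None] + 2.0) / 4.0 * (size - 1)
    return np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2) / 20.0)


n = 5
points = latent_grid(n)       # shape (n * n, 2)
images = toy_decoder(points)  # shape (n * n, 28, 28)

# Tile the decoded images into one big picture, one row of the grid per row of tiles.
figure = images.reshape(n, n, 28, 28).transpose(0, 2, 1, 3).reshape(n * 28, n * 28)
```

With a trained VAE decoder in place of `toy_decoder`, neighboring tiles would show digits that change gradually from one to the next.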
Ex 5: Generative Adversarial Network (GAN)
A GAN consists of two parts: a generator and a discriminator. The generator takes a random point in the latent space and creates a new output image. The discriminator takes an image (either a generated output or an original training image) and classifies it as coming from the training set or as created by the generator.
The generator is trained to produce images that can fool the discriminator while the discriminator constantly adapts to the improving generator. This is a dynamic model where the goal is no longer to minimize a specific loss, but to reach an equilibrium between the two parts. To an economist, this sounds all too familiar.
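The alternating updates can be sketched with a deliberately tiny GAN in NumPy. This is not the DCGAN used in this post: it assumes 1-D "real" data drawn from N(4, 1), a linear generator G(z) = a * z + b, a logistic discriminator D(x) = sigmoid(w * x + c), hand-derived gradients, and the common non-saturating generator loss. All names and hyperparameters are illustrative.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def train_toy_gan(steps=200, batch_size=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a * z + b
    w, c = 0.1, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        x_real = rng.normal(4.0, 1.0, batch_size)
        z = rng.standard_normal(batch_size)
        x_fake = a * z + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: the discriminator is held fixed and the generator
        # is nudged so that D(fake) moves toward 1 (it tries to fool D).
        z = rng.standard_normal(batch_size)
        d_fake = sigmoid(w * (a * z + b) + c)
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        a -= lr * grad_a
        b -= lr * grad_b

    return a, b, w, c


a, b, w, c = train_toy_gan()
```

Neither loss is simply driven to a minimum: each step changes the problem the other player faces, which is why GAN training is usually described as a search for an equilibrium and can be unstable in practice.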
In this particular example, we train a GAN on CIFAR10 frog images and try to create a fake frog image that looks real. A deep convolutional GAN (DCGAN) is used, meaning both the generator and the discriminator are deep convnets. Python code can be found here.
The posted code only runs the model for 1,000 iterations to test it out. I also ran the model for 10,000 iterations and below is one of the fake frogs created by the model (on the right). For comparison, a real frog is shown on the left.
There are 5 heuristic tricks listed in the book that can help train a GAN:
- Use tanh as the last activation in the generator
- Sample points from the latent space using a normal distribution
- Introduce randomness (e.g., use dropout in the discriminator and add random noise to the labels it is trained on)
- Avoid sparse gradients (e.g., use strided convolutions and LeakyReLU instead of max pooling and ReLU)
- Use a kernel size that’s divisible by the stride size whenever applicable in both generator and discriminator
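
Several of these tricks are easy to illustrate outside of a full model. The NumPy sketch below is only illustrative: the latent dimension, batch size, noise scale, LeakyReLU slope and the helper `kernel_fits_stride` are assumptions made here, and a random matrix stands in for the generator's last layer.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 32  # assumed size of the latent space
batch_size = 20  # assumed batch size

# Sample latent points from a normal (Gaussian) distribution.
latent_points = rng.standard_normal((batch_size, latent_dim))

# tanh as the last activation keeps generator outputs in [-1, 1].
# The random weight matrix is a stand-in for a real generator layer.
fake_pixels = np.tanh(latent_points @ rng.standard_normal((latent_dim, 8)))

# Add a little random noise to the labels fed to the discriminator.
labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
noisy_labels = labels + 0.05 * rng.random(labels.shape)


def leaky_relu(x, alpha=0.2):
    """LeakyReLU keeps a small gradient for negative inputs, unlike ReLU."""
    return np.where(x > 0, x, alpha * x)


def kernel_fits_stride(kernel_size, stride):
    """True when the kernel size is divisible by the stride."""
    return kernel_size % stride == 0
```

For example, `kernel_fits_stride(4, 2)` is true while `kernel_fits_stride(3, 2)` is not, so a kernel size of 4 would be preferred with a stride of 2.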