Generative Adversarial Networks

Hi there, it's been a while since I last posted something.

During the last Easter holiday I was playing with Generative Adversarial Networks (GANs), a type of neural network used to generate new data.

Basically, a GAN is composed of two networks: a generator and a discriminator.

Generator

The generator network takes as input a randomly initialized vector, often called the latent vector, and produces an image of shape (h, w, c), where h and w are the height and width and c is the number of channels. For a 64x64 RGB image this would be (64, 64, 3).
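
To make the shapes concrete, here is a toy numpy stand-in for a generator: a single random linear projection followed by tanh, reshaped into images. It is untrained and only illustrates the mapping from a batch of latent vectors of shape (batch, latent_dim) to images of shape (batch, 64, 64, 3). The function name `toy_generator` is hypothetical; the actual generator in this post is the convolutional Keras model further down.

```python
import numpy as np

LATENT_DIM = 100
HEIGHT, WIDTH, CHANNELS = 64, 64, 3


def toy_generator(z: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a generator: one linear layer + tanh.

    z has shape (batch, LATENT_DIM); the result has shape
    (batch, HEIGHT, WIDTH, CHANNELS) with values in [-1, 1].
    """
    flat = np.tanh(z @ weights)  # (batch, HEIGHT * WIDTH * CHANNELS)
    return flat.reshape(-1, HEIGHT, WIDTH, CHANNELS)


rng = np.random.default_rng(0)
# random (untrained) weights, scaled down so tanh is not saturated
weights = rng.normal(scale=0.01, size=(LATENT_DIM, HEIGHT * WIDTH * CHANNELS))
# a batch of 4 latent vectors sampled from a standard normal distribution
z = rng.normal(size=(4, LATENT_DIM))
images = toy_generator(z, weights)
print(images.shape)  # (4, 64, 64, 3)
```

A real generator replaces the single projection with learned dense and convolutional layers, but the input and output shapes are the same.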

Discriminator

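The discriminator, or critic, is another neural network that takes an image as input (here with the same shape as the images produced by the generator) and returns a score for how fake the image is. With a sigmoid output, this score is the probability that the image was generated rather than real.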
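
Since "fakeness" is a probability in [0, 1], the discriminator is trained with binary cross-entropy. The numpy sketch below mirrors the labelling used in the code later in this post (generated images labelled 1, real images 0); the `binary_crossentropy` helper and the example scores are illustrative and are not Keras' own implementation.

```python
import numpy as np


def binary_crossentropy(labels: np.ndarray, preds: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; preds are probabilities that an image is fake."""
    preds = np.clip(preds, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(labels * np.log(preds) + (1.0 - labels) * np.log(1.0 - preds))))


# labels: 1 = generated (fake), 0 = real
labels = np.array([1.0, 1.0, 0.0, 0.0])

good_critic = np.array([0.9, 0.8, 0.1, 0.2])    # mostly right
fooled_critic = np.array([0.1, 0.2, 0.9, 0.8])  # mostly wrong

print(binary_crossentropy(labels, good_critic))    # small loss
print(binary_crossentropy(labels, fooled_critic))  # large loss
```

A discriminator that separates fake from real gets a low loss; one that is fooled by the generator gets a high loss, which is exactly what the generator is aiming for.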

In other words, the two networks play a minimax game: the discriminator tries to tell generated images apart from real ones, while the generator tries to produce images realistic enough that the discriminator classifies them as real, that is, it tries to minimize the discriminator's accuracy.
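
For reference, this is the minimax objective of the original GAN formulation. There, D(x) is the probability that x is real; the code in this post flips the labels (fake = 1, real = 0), so its discriminator outputs fakeness instead, which is equivalent up to replacing D with 1 - D.

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

In practice the generator is usually trained to maximize log D(G(z)) rather than to minimize log(1 - D(G(z))), because this gives stronger gradients early in training. Training the stacked model on generated images labelled as real, as the code below does, corresponds to this variant.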

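To train the two networks we stack them together (generator followed by discriminator) and use two different optimizers: one for the discriminator and one for the stacked model. Inside the stacked model the discriminator's weights are frozen, so the second optimizer only updates the generator.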

The training loop then repeats these steps:

sample a batch of random noise (latent vectors)
feed the noise to the generator
get the generated images
mix the generated images with real images and label them (fake = 1, real = 0)
train the discriminator on the labelled images
train the generator through the stacked model, using fresh noise labelled as real (0)
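
The steps above can be sketched in numpy to show how one discriminator batch and the generator's targets are assembled. `fake_generator` is a hypothetical placeholder that just returns random values in [-1, 1] (in the real code this role is played by `generator.predict`), and the tiny 8x8 image size only keeps the example light.

```python
import numpy as np

rng = np.random.default_rng(42)
batch_size, latent_dim = 8, 100
img_shape = (8, 8, 3)  # small illustrative size instead of 64x64x3


def fake_generator(z: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for generator.predict: random images in [-1, 1]."""
    return rng.uniform(-1.0, 1.0, size=(z.shape[0],) + img_shape)


# sample noise, feed it to the generator, get the generated images
z = rng.normal(size=(batch_size, latent_dim))
generated = fake_generator(z)

# mix generated and real images and label them (1 = fake, 0 = real)
real = rng.uniform(-1.0, 1.0, size=(batch_size,) + img_shape)  # stand-in for a slice of x_train
combined = np.concatenate([generated, real])
labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
labels += 0.05 * rng.random(labels.shape)  # small label noise, as in the post's code

# `combined` and `labels` would go to discriminator.train_on_batch; then the
# stacked model is trained on fresh noise with "real" targets (0), so only the
# generator's weights move while the discriminator is frozen
z_g = rng.normal(size=(batch_size, latent_dim))
misleading_targets = np.zeros((batch_size, 1))
print(combined.shape, labels.shape, misleading_targets.shape)
```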

The following is the code I used to generate images of anime faces.

A Colab notebook is available at
https://colab.research.google.com/drive/1a5VfzxBqBPW5tF5YKFqQeWwHWdL8v38l

import os

import keras
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import clear_output
from keras import layers
from skimage.io import imread
from skimage.transform import resize
from tqdm import tqdm_notebook

latent_dim = 100
height = 64
width = 64
channels = 3
# the generator upsamples once (x2), so it starts from half the target size
dimh = height // 2
dimw = width // 2

## generator: latent vector of shape (latent_dim,) -> image of shape (height, width, channels)
generator_input = keras.Input(shape=(latent_dim,))

x = layers.Dense(128 * dimh * dimw)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((dimh, dimw, 128))(x)

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# upsample from (dimh, dimw) to (height, width)
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

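# tanh squashes pixel values into [-1, 1]; note that the training images loaded below are floats in [0, 1]
x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)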
generator = keras.models.Model(generator_input, x)
generator.summary()

## discriminator here

discriminator_input = layers.Input(shape=(height, width, channels))
x = layers.GaussianNoise(0.01)(discriminator_input)
x = layers.Conv2D(128, 3)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)

x = layers.Dropout(0.4)(x)

x = layers.Dense(1, activation='sigmoid')(x)

discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()

## stacked gan

discriminator_optimizer = keras.optimizers.RMSprop(learning_rate=0.00005)
discriminator.compile(
    optimizer=discriminator_optimizer, 
    loss="binary_crossentropy",
    metrics=["accuracy"])

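# freeze the discriminator inside the stacked model, so gan.train_on_batch only updates the generator
discriminator.trainable = False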

gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)

gan_optimizer = keras.optimizers.RMSprop(learning_rate=0.00005)
gan.compile(
    optimizer=gan_optimizer, 
    loss="binary_crossentropy",
    metrics=["accuracy"])

gan.summary()


## data processing: read all the .jpg pictures in the ./images dir and resize them to the size expected by the networks

data_train_gan = []
list_file = [os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk('./images') for filename in filenames if filename.endswith('.jpg')]
for file_name in list_file:
    img = imread(file_name)
    img = resize(img, (height, width))  # returns floats in [0, 1]
    data_train_gan.append(img)
# stack into a single array of shape (n_images, height, width, channels)
x_train = np.array(data_train_gan, dtype='float32')

## configure training parameters

iterations = 20000
batch_size = 100
save_dir = '.'
start = 0

# load the models here if you have pre-trained models

# gan = keras.models.load_model('animev2-gan.h5')
# generator = keras.models.load_model('animev2-gen.h5')
# discriminator = keras.models.load_model('animev2-disc.h5')

for step in tqdm_notebook(range(iterations)):
    random_latent_vectors = np.random.normal(size = (batch_size, latent_dim))
    generated_images = generator.predict(random_latent_vectors)
    stop = start + batch_size
    real_images = x_train[start: stop]
    combined_images = np.concatenate([generated_images, real_images])
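    # generated images are labelled 1 (fake), real images 0
    labels = np.concatenate([np.ones((batch_size, 1)),
                             np.zeros((batch_size, 1))])
    # add a little random noise to the labels (a common GAN training trick)
    labels += 0.05 * np.random.random(labels.shape)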

    d_loss = discriminator.train_on_batch(combined_images, labels)

    random_latent_vectors = np.random.normal(size=(batch_size, 
                                                 latent_dim))
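    # label these generated images as real (0): training the stacked model on these
    # targets pushes the generator towards images that fool the frozen discriminator
    misleading_targets = np.zeros((batch_size, 1))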
    a_loss = gan.train_on_batch(random_latent_vectors, 
                              misleading_targets)
    start += batch_size

    if start > len(x_train) - batch_size:
        start = 0
 
    if step % 10 == 0:
        # every 10 steps show the generated images
        clear_output(wait=True)
        plt.imshow(real_images[0])
        plt.show()
        print("Step: %s" % step)
        print('discriminator loss:', d_loss)
        print('adversarial loss:', a_loss)
        fig, axes = plt.subplots(2, 2)
        fig.set_size_inches(10,10)
        count = 0
        for i in range(2):
            for j in range(2):
                axes[i, j].imshow(generated_images[count])
                axes[i, j].axis('off')
                count += 1
        plt.show()
    
    if step % 100 == 0:
        # Save the models here 
        gan.save('animev3-gan.h5')
        generator.save('animev3-gen.h5')
        discriminator.save('animev3-disc.h5')

        print('discriminator loss:', d_loss)
        print('adversarial loss:', a_loss)
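
The generator ends with a tanh activation, whose outputs lie in [-1, 1], while skimage's `resize` returns floats in [0, 1] for 8-bit input images. A common convention (not what the code above does) is to rescale real images into the generator's range before training and map images back to [0, 1] before calling `plt.imshow`. A minimal sketch, with hypothetical helper names:

```python
import numpy as np


def to_tanh_range(images: np.ndarray) -> np.ndarray:
    """Map float images from [0, 1] to [-1, 1] (hypothetical helper)."""
    return images * 2.0 - 1.0


def to_display_range(images: np.ndarray) -> np.ndarray:
    """Map images from [-1, 1] back to [0, 1] for plt.imshow, clipping outliers."""
    return np.clip((images + 1.0) / 2.0, 0.0, 1.0)


batch = np.array([[0.0, 0.5, 1.0]])
scaled = to_tanh_range(batch)        # [[-1.0, 0.0, 1.0]]
restored = to_display_range(scaled)  # [[0.0, 0.5, 1.0]]
```

With these, the loading loop would append `to_tanh_range(resize(img, (height, width)))`, and the plotting code would show `to_display_range(generated_images[count])`, so real and generated images live in the same range.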

You can download the dataset from https://www.kaggle.com/soumikrakshit/anime-faces and extract the images into a folder called images.
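
The loading code above only picks up files ending in `.jpg`. If your copy of the dataset uses another extension (for example `.png`), or contains grayscale images or images with an alpha channel, a slightly more tolerant loader could look like the sketch below. The helper names and the extension list are assumptions, not part of the original code.

```python
import os

import numpy as np

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of extensions


def list_images(root: str) -> list:
    """Recursively collect image paths under `root`, matching extensions case-insensitively."""
    return sorted(
        os.path.join(dirpath, filename)
        for dirpath, _, filenames in os.walk(root)
        for filename in filenames
        if filename.lower().endswith(IMAGE_EXTENSIONS)
    )


def ensure_rgb(img: np.ndarray) -> np.ndarray:
    """Return an (h, w, 3) array: grayscale is repeated, an alpha channel is dropped."""
    if img.ndim == 2:
        return np.stack([img] * 3, axis=-1)
    return img[..., :3]
```

These would slot into the loading loop as `for file_name in list_images('./images'):` followed by `img = ensure_rgb(imread(file_name))`.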

Here are some images generated by the network above after 10 hours of training.

Using the latent space learned by the GAN, you can then create nice interpolation animations like the one below.
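
An interpolation animation is a sequence of latent vectors between two endpoints, each decoded by the generator into one frame. Below is a sketch of linear interpolation and of spherical interpolation (slerp), which is often preferred for Gaussian latent vectors because it keeps the intermediate vectors at a norm similar to the endpoints. The function names are hypothetical; each row of the result would be passed to `generator.predict` to render a frame.

```python
import numpy as np


def lerp(z0: np.ndarray, z1: np.ndarray, steps: int) -> np.ndarray:
    """Linearly interpolate between two latent vectors; returns (steps, latent_dim)."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z0 + t * z1


def slerp(z0: np.ndarray, z1: np.ndarray, steps: int) -> np.ndarray:
    """Spherical interpolation along the arc between z0 and z1."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.sin(omega) < 1e-6:
        return lerp(z0, z1, steps)  # (anti)parallel vectors: fall back to lerp
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)


rng = np.random.default_rng(7)
z_start, z_end = rng.normal(size=(2, 100))
frames = slerp(z_start, z_end, steps=30)  # shape (30, 100)
# images = generator.predict(frames)  # one generated image per animation frame
```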

This technique has a huge variety of applications, for example data augmentation: generating new data, as in the example below, where it was used to generate new faces from the Labeled Faces in the Wild dataset.

It can also be used to generate new design options. Once the latent space has been learned, we can apply mathematical operations to the latent vectors to modify the produced images; for example, we could generate a new car design by mixing the latent vector of a sedan with that of a truck.
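
In code, "mixing" two designs is arithmetic on their latent vectors before decoding them with the generator. The sketch below assumes you already have latent vectors that produce, say, sedans and trucks; the names `z_sedan` and `z_truck` are hypothetical and are just random here. Averaging several vectors of the same kind is one way to get a less sample-specific "concept" vector, and a weighted blend then moves from one concept to the other.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim = 100

# hypothetical latent vectors; in practice these would be vectors whose
# generated images you have inspected and labelled as "sedan" or "truck"
z_sedan = rng.normal(size=(5, latent_dim))
z_truck = rng.normal(size=(5, latent_dim))


def blend(z_a: np.ndarray, z_b: np.ndarray, alpha: float) -> np.ndarray:
    """Weighted mix of two latent vectors: alpha=0 gives z_a, alpha=1 gives z_b."""
    return (1.0 - alpha) * z_a + alpha * z_b


# average each group to get a "concept" vector
sedan_mean = z_sedan.mean(axis=0)
truck_mean = z_truck.mean(axis=0)

hybrid = blend(sedan_mean, truck_mean, alpha=0.5)  # halfway between the two designs
# new_design = generator.predict(hybrid[None, :])  # decode to an image
```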

In the next post I will talk about how to analyze the latent space to find directions we can use to manipulate the latent vectors, and how to produce mesmerizing interpolation videos.
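
One simple way to obtain such a direction, sketched here as an assumption rather than as the method of the next post: label a set of generated images by some attribute, take the difference between the mean latent vectors of the two groups, normalize it, and move a latent vector along it. `with_attr` and `without_attr` are hypothetical arrays of latent vectors (random here, with shifted means to stand in for labelled data).

```python
import numpy as np

rng = np.random.default_rng(3)
latent_dim = 100

# hypothetical latent vectors whose generated images show / do not show an attribute
with_attr = rng.normal(loc=0.5, size=(50, latent_dim))
without_attr = rng.normal(loc=-0.5, size=(50, latent_dim))


def attribute_direction(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean of `neg` towards the mean of `pos`."""
    d = pos.mean(axis=0) - neg.mean(axis=0)
    return d / np.linalg.norm(d)


direction = attribute_direction(with_attr, without_attr)

z = rng.normal(size=latent_dim)
# walk along the direction; each row could be decoded with generator.predict
walk = np.stack([z + s * direction for s in np.linspace(-3.0, 3.0, 7)])
```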

Byeeee!