Hi there, it's been a while since the last time I posted something.

Recently, during the last Easter holiday, I was playing with Generative Adversarial Networks (GANs), a type of neural network used to produce new data.

Basically, a GAN is composed of two networks: a generator and a discriminator.


The generator network takes as input a randomly initialized vector, often called the latent vector, and produces an image of shape (w,h,c), where w and h are the dimensions and c is the number of channels; for a 64x64 RGB image it would be (64,64,3).


The discriminator, or critic, is another NN that takes an image as input (with the same shape as the image produced by the generator) and returns the fakeness of the image, that is, a score for how likely the image is to be generated rather than real.
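To make the shapes concrete, here is a small sketch that traces the spatial size of a 64x64 image through the layers used in the code later in this post. It assumes the Keras defaults ('valid' padding when no padding is given) and the 'same' padding set on the generator layers. The helper names `conv_valid` and `conv_transpose_same` are hypothetical and not part of Keras.

```python
def conv_valid(size, kernel, stride=1):
    """Output size of a convolution with 'valid' padding (the Keras default)."""
    return (size - kernel) // stride + 1


def conv_transpose_same(size, stride):
    """Output size of a transposed convolution with 'same' padding."""
    return size * stride


# Generator: the Dense output is reshaped to (32, 32, 128), then upsampled once.
gen_size = 64 // 2
gen_size = conv_transpose_same(gen_size, stride=2)  # Conv2DTranspose(256, 4, strides=2)
# The 'same'-padded Conv2D layers keep the spatial size, the last one outputs 3 channels.
generator_output_shape = (gen_size, gen_size, 3)

# Discriminator: four unpadded convolutions shrink the image before Flatten.
disc_size = 64
for kernel, stride in [(3, 1), (4, 2), (4, 2), (4, 2)]:
    disc_size = conv_valid(disc_size, kernel, stride)
flattened_features = disc_size * disc_size * 128  # 128 filters in the last Conv2D

print(generator_output_shape)         # (64, 64, 3)
print(disc_size, flattened_features)  # 6 4608
```

So the generator output matches the (64,64,3) input the discriminator expects, and the final Dense layer of the discriminator sees 4608 features per image.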

The two networks play a minimax game: the generator tries to make its images look as real as possible, which means minimizing how often the discriminator classifies them correctly, while the discriminator tries to maximize that accuracy.
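For reference, this game is usually written as the value function below. It follows the common convention in which D(x) is the probability that x is real; the code in this post labels images the other way round (1 for generated, 0 for real), so read the formula as the standard formulation rather than a line-by-line description of that code. Here p_data is the distribution of real images and p_z the distribution of latent vectors.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```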

To train the two networks we stack them together and use two separate optimizers, one for the generator and one for the discriminator.

For the training process we will:

  • create random noise
  • feed the noise to the generator
  • get the generated images
  • mix the generated images with real images, and label them
  • feed the images to the discriminator
  • get the result and apply the gradients to G and D
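The mixing and labeling step from the list above can be sketched with NumPy alone. This is a minimal illustration, not the training code itself: `build_discriminator_batch` is a hypothetical helper, and random arrays stand in for the generator output and the dataset images. The labels follow the convention of the training code in this post (1 for generated, 0 for real, plus a little noise).

```python
import numpy as np


def build_discriminator_batch(generated_images, real_images, label_noise=0.05, rng=None):
    """Stack generated and real images and build their slightly noisy labels.

    Generated images get label 1 and real images get label 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    combined = np.concatenate([generated_images, real_images])
    labels = np.concatenate([
        np.ones((len(generated_images), 1)),
        np.zeros((len(real_images), 1)),
    ])
    # Small random noise on the labels, as in the training loop of this post.
    labels += label_noise * rng.random(labels.shape)
    return combined, labels


# Stand-in data: random arrays instead of real generator output and dataset images.
rng = np.random.default_rng(0)
fake = rng.random((4, 64, 64, 3))
real = rng.random((4, 64, 64, 3))
images, labels = build_discriminator_batch(fake, real, rng=rng)
print(images.shape, labels.shape)  # (8, 64, 64, 3) (8, 1)
```

The combined batch and its labels are what the discriminator is trained on in a single step.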

The following is the code I used to produce images of manga faces.

colab available at

import os
from shutil import copyfile

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import keras
from keras import backend, layers
from keras.layers import BatchNormalization
from keras.optimizers import Adam, RMSprop
from keras.preprocessing import image
from keras.preprocessing.image import load_img, img_to_array
from keras.utils.vis_utils import plot_model
from skimage.io import imread
from skimage.transform import resize
from IPython.display import clear_output
from tqdm import tqdm_notebook

latent_dim = 100
height = 64
width = 64
channels = 3
# spatial size before the stride-2 upsampling layer in the generator
dimh = height // 2
dimw = width // 2

## generator here
generator_input = keras.Input(shape=(latent_dim,))

x = layers.Dense(128 * dimh * dimw)(generator_input)  
x = layers.LeakyReLU()(x)  
x = layers.Reshape((dimh, dimw, 128))(x)

x = layers.Conv2D(256, 5, padding='same')(x)  
x = layers.LeakyReLU()(x)

x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)  
x = layers.LeakyReLU()(x)

x = layers.Conv2D(256, 5, padding='same')(x)  
x = layers.LeakyReLU()(x)  
x = layers.Conv2D(256, 5, padding='same')(x)  
x = layers.LeakyReLU()(x)

x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)  
generator = keras.models.Model(generator_input, x)  

## discriminator here

discriminator_input = layers.Input(shape=(height, width, channels))  
x = layers.GaussianNoise(0.01)(discriminator_input)  
x = layers.Conv2D(128, 3)(x)  
x = layers.LeakyReLU()(x)  
x = layers.Conv2D(128, 4, strides=2)(x)  
x = layers.LeakyReLU()(x)  
x = layers.Conv2D(128, 4, strides=2)(x)  
x = layers.LeakyReLU()(x)  
x = layers.Conv2D(128, 4, strides=2)(x)  
x = layers.LeakyReLU()(x)  
x = layers.Flatten()(x)

x = layers.Dropout(0.4)(x)

x = layers.Dense(1, activation='sigmoid')(x)

discriminator = keras.models.Model(discriminator_input, x)  

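## stacked gan

discriminator_optimizer = keras.optimizers.RMSprop(lr=0.00005)
# train_on_batch needs a compiled model; the compile calls are missing from the listing,
# so binary cross-entropy (the usual match for a sigmoid output) is an assumption here
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')

# freeze the discriminator inside the stacked model, so training the gan only updates the generator
discriminator.trainable = False

gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)

gan_optimizer = keras.optimizers.RMSprop(lr=0.00005)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')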


## data processing: read all the pictures in the ./images dir and resize them to the input size of the NN

data_train_gan = []
list_file = [os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk('./images') for filename in filenames if filename.endswith('.jpg')]
for file_name in list_file:
    img = imread(file_name)
    # resize also converts the pixels to floats in [0, 1]
    img = resize(img, (height, width))
    data_train_gan.append(img)
x_train = np.array(data_train_gan)

## configure training parameters

iterations = 20000  
batch_size = 100  
save_dir = '.'  
start = 0

# load the models here if you have pre-trained models

# gan = keras.models.load_model('animev2-gan.h5')
# generator = keras.models.load_model('animev2-gen.h5')
# discriminator = keras.models.load_model('animev2-disc.h5')

for step in tqdm_notebook(range(iterations)):
    # sample random points in the latent space and decode them to fake images
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    generated_images = generator.predict(random_latent_vectors)

    # take the next batch of real images
    stop = start + batch_size
    real_images = x_train[start:stop]

    # label generated images 1 and real images 0, then add a little noise to the labels
    combined_images = np.concatenate([generated_images, real_images])
    labels = np.concatenate([np.ones((batch_size, 1)),
                             np.zeros((batch_size, 1))])
    labels += 0.05 * np.random.random(labels.shape)

    d_loss = discriminator.train_on_batch(combined_images, labels)

    # train the generator through the stacked model, with targets that say "these are real"
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    misleading_targets = np.zeros((batch_size, 1))
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)
    start += batch_size

    if start > len(x_train) - batch_size:
        start = 0

    if step % 10 == 0:
        # every 10 steps show the generated images
        print("Step: %s" % step)
        print('discriminator loss:', d_loss)
        print('adversarial loss:', a_loss)
        fig, axes = plt.subplots(2, 2)
        count = 0
        for i in range(2):
            for j in range(2):
                axes[i, j].imshow(generated_images[count])
                axes[i, j].axis('off')
                count += 1
        plt.show()

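    if step % 100 == 0:
        # save the models every 100 steps; the file names mirror the commented-out load calls above
        gan.save(os.path.join(save_dir, 'animev2-gan.h5'))
        generator.save(os.path.join(save_dir, 'animev2-gen.h5'))
        discriminator.save(os.path.join(save_dir, 'animev2-disc.h5'))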

You can download this dataset from https://www.kaggle.com/soumikrakshit/anime-faces and extract the images into a folder called images.

Here are some images generated by the NN above after 10 hours of training.

Using the latent space learnt by the GAN, you can then create nice interpolation animations like the one below.
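A minimal sketch of how such an interpolation could be set up: pick two latent vectors and walk along the straight line between them. Linear interpolation is an assumption here (other schemes exist), `interpolate_latents` is a hypothetical helper, and the two end points are simply random vectors.

```python
import numpy as np


def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps).reshape(-1, 1)
    return (1.0 - t) * z_start + t * z_end


latent_dim = 100
rng = np.random.default_rng(0)
z_a = rng.normal(size=latent_dim)
z_b = rng.normal(size=latent_dim)

frames_z = interpolate_latents(z_a, z_b, steps=30)
print(frames_z.shape)  # (30, 100)
```

Each row of `frames_z` can then be passed to the trained generator, for example with `generator.predict(frames_z)`, to get one frame of the animation.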

This technique has a huge variety of applications, such as data augmentation: generating new data, as in the example below, where it was used to generate new faces from the Labeled Faces in the Wild dataset.

It can also be used to generate new design options. Once the vector space has been learned, we can apply mathematical operations to the latent vectors to modify the produced images. As an example, we could generate a new car design by mixing a sedan with a truck.
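As a rough sketch of what "mathematical operations on the latent vectors" could look like, the snippet below blends two latent vectors and shifts one of them along the difference between the two. The names `z_sedan` and `z_truck` are hypothetical placeholders filled with random values; in practice they would have to be latent vectors that a trained generator decodes to a sedan and a truck, and how sensible the decoded blend looks depends on the trained model.

```python
import numpy as np


def mix_latents(z_first, z_second, weight=0.5):
    """Blend two latent vectors; weight=0 returns z_first, weight=1 returns z_second."""
    return (1.0 - weight) * z_first + weight * z_second


def shift_latent(z, direction, amount):
    """Move a latent vector along a direction by the given amount."""
    return z + amount * direction


latent_dim = 100
rng = np.random.default_rng(0)
z_sedan = rng.normal(size=latent_dim)  # placeholder for a "sedan" latent vector
z_truck = rng.normal(size=latent_dim)  # placeholder for a "truck" latent vector

# Halfway blend of the two designs.
z_mixed = mix_latents(z_sedan, z_truck, weight=0.5)

# Nudge the sedan a quarter of the way towards the truck.
truck_direction = z_truck - z_sedan
z_more_truck = shift_latent(z_sedan, truck_direction, amount=0.25)

print(z_mixed.shape, z_more_truck.shape)  # (100,) (100,)
```

The resulting vectors are fed to the generator like any other latent vector.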

In the next post I will talk about how to analyze the vector space to find directions that we can use to manipulate the latent vectors, and how to produce mesmerizing interpolation videos.

