Everyone loves art, and yet not everyone has the talent to create it. If you understand deep learning, however, you don't need to know how to paint like Picasso. You can simply use deep learning to convert any image into a Picasso-style image.
This guide series will make that possible! You will learn how a convolutional neural network (CNN) can map the style of one image onto another. You will use an image with a particular artistic style to give a Picasso look to the photograph of your choice.
This is the first guide in a two-part series. This guide will cover the pre-processing of the image, along with an explanation of the VGG model, intermediate layers, and the cost function.
The second guide will talk about style loss and content loss. It will also give a brief explanation of how variation loss functions and optimization help to generate AI art.
Let's dig into what exactly the VGG model is first.
Very deep neural networks, with dozens or even hundreds of layers, are hard to train: gradients can vanish as they are propagated back through the layers, and accuracy can degrade as more layers are added. Well-studied architectures such as VGG-Net, ResNet, and DenseNet were introduced to make deep networks practical, and they have been quite successful. It is difficult and very time consuming to build a deep neural network from scratch for every problem, so it is common to reuse one of these proven networks instead.
This guide uses a pre-trained VGG-19 model. Pre-trained models are the basis of transfer learning, where a model takes the knowledge gained from solving one problem (e.g., recognizing a boat) and applies it to a different but related problem (e.g., recognizing a ship).
The VGG-Net model can recognize low-level features using shallow (earlier) layers and high-level features using deeper layers. The images below show the layer structure of the VGG-19 Network:
Now you know the building blocks of the VGG-19 model. The architecture of its variants is described in the image below.
Instead of large filters with large strides, this model uses small 3x3 filters throughout. For example, a stack of three 3x3 convolution layers covers the same 7x7 receptive field as a single 7x7 convolution layer but incorporates three non-linear rectification layers instead of one. This makes the decision function more discriminative. A stack of 3x3 layers also needs fewer weights than one 7x7 layer, making the model less prone to overfitting.
Now, move on to TensorFlow 2.0. Before you input an image into the VGG model, it is important to pre-process it. Images are usually handled as NumPy arrays, while the model accepts them in the form of tensors. To go the other way and view a tensor as a picture, the PIL library will do the job.
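The weight saving can be checked with some quick arithmetic. The sketch below assumes every layer has the same number of input and output channels (an illustrative `channels` value of 64) and ignores biases; the helper names `conv_weights` and `stacked_receptive_field` are made up for this example.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in a stack of square conv layers with `channels` in and out (biases ignored)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of stacked stride-1 convolutions that share one kernel size."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # assumed channel count, for illustration only

stack_3x3 = conv_weights(3, channels, num_layers=3)  # 27 * channels^2
single_7x7 = conv_weights(7, channels)               # 49 * channels^2

print(stacked_receptive_field(3, 3))     # 7: the same view as one 7x7 filter
print(stack_3x3, single_7x7)             # 110592 200704
print(round(stack_3x3 / single_7x7, 2))  # 0.55
```

Under these assumptions the 3x3 stack needs roughly 55% of the weights of the single 7x7 layer, whatever the channel count, because the ratio is always 27/49.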
import tensorflow as tf
import IPython.display as display

import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12, 12)
mpl.rcParams['axes.grid'] = False

import numpy as np
import PIL.Image
import time
import functools

def tensor_to_image(tf_input):
    # Scale from [0, 1] floats to [0, 255] bytes
    tf_input = tf_input * 255
    tf_input = np.array(tf_input, dtype=np.uint8)
    # Drop the leading batch dimension if there is one
    if np.ndim(tf_input) > 3:
        assert tf_input.shape[0] == 1
        tf_input = tf_input[0]
    return PIL.Image.fromarray(tf_input)

# Input image of your choice
content_path = 'houses.jpg'

style_path = 'Picasso.png'

In artistic neural style transfer, we use three images: the content image (the photograph to repaint), the style image (the artwork whose style is borrowed), and the generated image (the output).
Feel free to provide the path for your content and style image in the above code snippet.
After selecting the images, pre-process them in terms of size, shape, and dimensions.
def load_img(image_path):
    max_dim = 512
    img = tf.io.read_file(image_path)
    # Detect the image format and decode it into a 3-channel tensor
    img = tf.image.decode_image(img, channels=3)
    # Convert pixel values to float32 in the range [0, 1]
    img = tf.image.convert_image_dtype(img, tf.float32)

    # Height and width as float32, without the channel dimension
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)

    # Scale so that the longer side becomes max_dim
    long_dim = max(shape)
    scale = max_dim / long_dim

    new_shape = tf.cast(shape * scale, tf.int32)

    img = tf.image.resize(img, new_shape)

    # Add a batch dimension
    return img[tf.newaxis, :]
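The resizing step in `load_img` scales the image so that its longer side becomes 512 pixels while the aspect ratio is preserved. The plain-Python sketch below reproduces only that arithmetic; `scaled_shape` is a hypothetical helper, and it works in Python floats rather than float32, so results could differ by a pixel from TensorFlow for sizes that do not divide evenly.

```python
def scaled_shape(height, width, max_dim=512):
    """Scale (height, width) so the longer side equals max_dim, keeping the aspect ratio."""
    scale = max_dim / max(height, width)
    # int() truncates toward zero, mirroring the tf.int32 cast in load_img
    return int(height * scale), int(width * scale)


print(scaled_shape(768, 1024))   # (384, 512)
print(scaled_shape(2048, 1024))  # (512, 256)
```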
def imshow(image, title=None):
    # Remove the batch dimension before plotting
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)

    plt.imshow(image)
    if title:
        plt.title(title)
content_image = load_img(content_path)
style_image = load_img(style_path)

plt.subplot(1, 2, 1)
imshow(content_image, 'Content-Image')

plt.subplot(1, 2, 2)
imshow(style_image, 'Style-Image')

In this case, the generated image will keep the elements of the content image, i.e., it will have houses and a lake, but "painted" in the style of the style reference image.
Intermediate layers are necessary to define the representation of content and style from the images. For an input image, try to match the corresponding style and content target representations at these intermediate layers.
# Run the full VGG-19 classifier on the content image
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape
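The call to `preprocess_input` expects pixel values in the 0 to 255 range, which is why the image is multiplied by 255 first. As the Keras documentation describes it, the VGG-19 version reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means without any further scaling. The sketch below shows that arithmetic for a single pixel in plain Python; `vgg_preprocess_pixel` is a made-up name for illustration, not a Keras function.

```python
# ImageNet channel means in BGR order, as used by the Keras VGG preprocessing
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)


def vgg_preprocess_pixel(rgb):
    """Apply VGG-style preprocessing to one (R, G, B) pixel with values in [0, 255]."""
    r, g, b = rgb
    bgr = (b, g, r)  # reorder channels from RGB to BGR
    # Zero-center each channel around its ImageNet mean
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEAN))


# A pure red pixel: the red value ends up last and the result is zero-centered
print(vgg_preprocess_pixel((255.0, 0.0, 0.0)))
```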
# Load VGG-19 without the classification head and list its layer names
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')

print()
for layer in vgg.layers:
    print(layer.name)

The convolution layers in VGG make it possible to separate the style and the content of an image. You will use different intermediate layers to extract content and style information.
For the content layer, the second convolutional layer in block 5, block5_conv2, is used. Deeper layers in the network capture complex, high-level features such as the objects in the input image and their arrangement, so a layer near the end of the CNN gives the most useful representation of content.
For the style layers, use the first convolutional layer in each block of layers, that is, block1_conv1 up to block5_conv1. A CNN detects different patterns at different depths: the earliest layers respond to simple features such as edges and diagonal lines, later layers to textures and more complex patterns, and so on.
content_layers = ['block5_conv2']

style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

Load the pre-trained VGG network, trained on ImageNet data, and build a model that returns the outputs of the chosen layers.
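To see where these names come from without downloading the weights, the convolution layer names can be generated from the block structure. This sketch relies on the Keras naming scheme (`block<N>_conv<M>`) and on VGG-19 having 2, 2, 4, 4, and 4 convolution layers in its five blocks; `vgg19_conv_layer_names` is a hypothetical helper.

```python
# Number of convolution layers in each of the five VGG-19 blocks
VGG19_CONVS_PER_BLOCK = (2, 2, 4, 4, 4)


def vgg19_conv_layer_names():
    """List the convolution layer names of VGG-19 in Keras' naming scheme."""
    return [
        f"block{block}_conv{conv}"
        for block, count in enumerate(VGG19_CONVS_PER_BLOCK, start=1)
        for conv in range(1, count + 1)
    ]


conv_names = vgg19_conv_layer_names()
chosen = ["block5_conv2"] + [f"block{block}_conv1" for block in range(1, 6)]

print(len(conv_names))                            # 16
print(conv_names[-1])                             # block5_conv4
print(all(name in conv_names for name in chosen)) # True
```

The 16 convolution layers plus the three fully connected layers of the full classifier give the 19 weight layers in the model's name.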
def vgg_layers(layer_names):
    # Load VGG-19 without the classification head and freeze its weights
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False

    # Collect the outputs of the requested intermediate layers
    tf_outs = [vgg.get_layer(layer).output for layer in layer_names]

    model = tf.keras.Model([vgg.input], tf_outs)
    return model
style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)

# Look at the statistics of each layer's output
for name, tf_out in zip(style_layers, style_outputs):
    print(name)
    print(" shape: ", tf_out.numpy().shape)
    print(" min: ", tf_out.numpy().min())
    print(" max: ", tf_out.numpy().max())
    print(" mean: ", tf_out.numpy().mean())
    print()

The model is now ready to apply the brushstrokes of an artist (style) to your selected image (content). It does so by calculating the cost function and reducing the losses.
In style transfer, a neural network is not trained. Instead, its weights and biases are kept constant, and an image is updated by modifying its pixel values until the cost function is minimized. This ensures that the "content" in the content image and the "style" in the style image are both present in the generated image.
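The shapes printed by this loop follow a simple pattern. Assuming the standard VGG-19 layout (convolutions that preserve the spatial size, a 2x2 max pooling with stride 2 after each block, and 64, 128, 256, 512, and 512 channels per block), they can be predicted from the input size alone. The helper `style_layer_shapes` and the 336x512 input are illustrative choices, not values taken from this guide's images.

```python
# Output channels of the convolution layers in each VGG-19 block
VGG19_BLOCK_CHANNELS = (64, 128, 256, 512, 512)


def style_layer_shapes(height, width):
    """Predict the output shape of the first conv layer in each VGG-19 block."""
    shapes = {}
    for block, channels in enumerate(VGG19_BLOCK_CHANNELS, start=1):
        shapes[f"block{block}_conv1"] = (1, height, width, channels)
        # 2x2 max pooling with stride 2 halves each spatial dimension, rounding down
        height, width = height // 2, width // 2
    return shapes


for name, shape in style_layer_shapes(336, 512).items():
    print(name, shape)
```

The spatial resolution drops and the channel count grows with depth, which is one way to see why early layers hold fine local detail and deep layers hold coarser, more abstract features.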
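The idea of updating pixels while the network stays fixed can be illustrated with a deliberately tiny example. In the sketch below, the "network" is a single linear feature with constant weights and the "image" is three pixel values; both are stand-ins invented for this illustration and have nothing to do with VGG-19 or with the real losses covered in Part 2. Gradient descent moves only the pixels until the feature matches a target value.

```python
# Fixed "network": one linear feature whose weights are never updated
WEIGHTS = (0.5, -1.0, 2.0)


def feature(pixels):
    """Feature computed by the fixed network for a given image."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels))


def optimize_pixels(pixels, target, learning_rate=0.05, steps=200):
    """Minimize (feature(pixels) - target)**2 by gradient descent on the pixels only."""
    pixels = list(pixels)
    for _ in range(steps):
        error = feature(pixels) - target
        # The gradient of error**2 with respect to pixel i is 2 * error * w_i
        pixels = [p - learning_rate * 2 * error * w for p, w in zip(pixels, WEIGHTS)]
    return pixels


generated = optimize_pixels([0.0, 0.0, 0.0], target=3.0)
print(abs(feature(generated) - 3.0) < 1e-6)  # True: the loss has been driven to ~0
print(WEIGHTS)                               # (0.5, -1.0, 2.0): unchanged
```

In actual style transfer the same pattern applies, with the generated image as the only variable being optimized and the frozen VGG-19 supplying the features.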
This is the end of Part 1. The next guide will discuss the losses (the components of the cost function) in depth and the creation of the artistic style image using the neural network.
You now have an understanding of the VGG-19 network and the cost function used to generate the artistic image. You also know how and why intermediate layers are used to extract content and style from an image.
Continue with Part 2 of the series here.
For more details, reach out to me here.