Some photos in your albums make you think, "This picture of Grandma is about 50 years old. If only this lovely picture were clearer and more colorful." Digital photography has advanced remarkably. Black and white photos and videos can now be colorized, and distorted images can be restored, which can provide handy evidence for forensic purposes. Computer vision and deep learning techniques add to these capabilities. Neural networks, and convolutional neural networks (CNNs) in particular, are well known for their ability to model data.
This guide explains how autoencoders help reduce noise in an image. It uses the Keras module and the Fashion MNIST data. You can download it here. By the end of this guide, you will understand how autoencoders reconstruct a clean image from a noisy one.
Keras is a powerful Python-based neural network API that runs on top of TensorFlow. I encourage you to look at this guide to get familiar with the components of a CNN and how it manipulates images to perform complex computer vision tasks.
Image data is made up of pixels. In a grayscale (black and white) image, each pixel holds a single number ranging from 0 to 255. In a color image, each pixel combines three values, red (R), green (G), and blue (B), each ranging from 0 to 255. If an image has a resolution of 748 x 1005, it is a grid with 748 columns and 1005 rows, which gives 748 * 1005 = 751,740 pixels, or about 0.75 megapixels.
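The pixel arithmetic above can be checked with a short NumPy sketch. The array names are illustrative only, and the zero-filled arrays are placeholders rather than real images.

```python
import numpy as np

width, height = 748, 1005  # resolution from the example above

# NumPy stores images as (rows, columns), so the height comes first.
gray = np.zeros((height, width), dtype=np.uint8)    # one value per pixel
rgb = np.zeros((height, width, 3), dtype=np.uint8)  # R, G, B per pixel

megapixels = gray.size / 1_000_000
print(gray.shape, rgb.shape)
print(f"{gray.size} pixels = {megapixels:.2f} megapixels")
print("largest uint8 value:", np.iinfo(np.uint8).max)
```

The `uint8` type is a natural fit here because it holds exactly the integers 0 to 255.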
Autoencoders are usually classified as self-supervised learning. Some call them unsupervised because they do not depend on labeled responses the way classification does. Neural networks use them to perform representation learning. In the image below, the autoencoder contains a bottleneck that forces a compressed knowledge representation of the input. To get the most out of an autoencoder, make sure that it reconstructs its observations accurately while also learning encoding and decoding methods that generalize beyond the training data. In autoencoders, the middle (hidden core) layers are of more value than the output layer.
If the middle layer has fewer neurons than the input layer, the network is forced to extract only the most useful information. The middle layer has no option but to learn the most important image patterns and ignore the noise. If the middle layer has more neurons, the network has a higher capacity to learn the pattern, which makes it lazy: it can simply copy the input values to the output, learn the noise, and extract no features.
Hence, the bottleneck model is essential.
The guides Encoders and Decoders for Neural Machine Translation and NMT: Encoder and Decoder with Keras discuss how encoder and decoder models work hand in hand to produce a giant model used for machine translation. Here, in image denoising, the encoder network compresses the input values, and its result feeds the middle (bottleneck) layer. The decoder network's job is to reconstruct the information and provide the result. Most computer vision engineers follow a symmetric (mirror) arrangement for the hidden layers, meaning that the encoder and decoder networks have the same number of hidden layers and neurons.
To remove noise from an image, it helps to reduce its dimensionality. Principal Component Analysis (PCA) is often used for this task. But PCA has limitations: it applies only linear transformations and is sensitive to outliers. Autoencoders, on the other hand, introduce non-linearity through their non-linear activation functions and stacks of multiple layers. Outliers can also be detected with this kind of network, since samples that do not fit the learned representation tend to reconstruct poorly.
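To make the PCA comparison concrete, here is a minimal NumPy sketch of PCA used as a linear "encoder" and "decoder". It is an illustration under stated assumptions, not part of the guide's pipeline: the helper name `pca_denoise` is hypothetical, and the data is synthetic (points on a 3-dimensional linear subspace plus Gaussian noise) rather than Fashion MNIST.

```python
import numpy as np

def pca_denoise(x, n_components):
    """Project the rows of x onto the top principal components and back."""
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    codes = centered @ components.T    # linear "encoder"
    return codes @ components + mean   # linear "decoder"

rng = np.random.default_rng(42)

# Synthetic stand-in for image data: 200 samples that lie on a
# 3-dimensional linear subspace of a 50-dimensional space.
latent = rng.normal(size=(200, 3))
basis = rng.normal(size=(3, 50))
clean = latent @ basis
noisy = clean + 0.5 * rng.normal(size=clean.shape)

denoised = pca_denoise(noisy, n_components=3)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE before: {mse_noisy:.3f}, after PCA: {mse_denoised:.3f}")
```

PCA does well here because the synthetic data really is linear. Image data rarely is, which is where the non-linear layers of an autoencoder can help.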
The example in this guide follows a Keras implementation for modeling Fashion MNIST images. This guide runs on a Google Colab GPU. I strongly recommend using a GPU, as it reduces training time drastically. Go to Edit > Notebook Settings, select GPU as the hardware accelerator, and save.
Skip this part if you are working in a different IDE or already know how Google Colab handles data.
If you load your data into a normal folder in Colab, it is stored only temporarily and is lost when the runtime is recycled. Before starting, mount your Google Drive in Colab.
    from google.colab import drive
    drive.mount('/content/drive')
Copy and paste the authentication code and press enter.
You are all set! Import the important libraries and modules.
    import seaborn as sns
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.utils import to_categorical

    %matplotlib inline
    sns.set(style = 'white', context = 'notebook', palette = 'deep')
    np.random.seed(42)
To read the CSV data, you will need pandas.read_csv, which reads the data into a Pandas data frame. Alternatively, you can import fashion_mnist from keras.datasets and call fashion_mnist.load_data() to get the dataset.
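If you take the Keras route instead of the CSV files, the images arrive as 28x28 arrays rather than rows of 784 pixel columns. The sketch below is one possible way to bring them into the flattened, 0-1 scaled form used in the rest of this guide. Both helper names are hypothetical, and the scaling is applied up front here, whereas the CSV route does it in a later step.

```python
import numpy as np

def flatten_and_scale(images):
    """Turn (n, 28, 28) uint8 images into (n, 784) floats in [0, 1]."""
    images = np.asarray(images)
    return images.reshape(len(images), -1).astype("float32") / 255.0

def load_fashion_mnist_flat():
    """Assumed alternative to the CSV route; downloads the data on first use."""
    # Imported lazily so flatten_and_scale works without TensorFlow installed.
    from tensorflow.keras.datasets import fashion_mnist
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    return (flatten_and_scale(x_train), y_train), (flatten_and_scale(x_test), y_test)

# Quick check on a fake batch of two all-white 28x28 images.
fake_batch = np.full((2, 28, 28), 255, dtype=np.uint8)
flat = flatten_and_scale(fake_batch)
print(flat.shape, flat.min(), flat.max())
```

Calling `load_fashion_mnist_flat()` returns training and test pairs that you can use in place of the data frames, skipping the label-column handling.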
    # Load each CSV into the matching variable (train file into train, test file into test).
    train = pd.read_csv("/content/drive/My Drive/fashion-mnist_train.csv")
    test = pd.read_csv("/content/drive/My Drive/fashion-mnist_test.csv")
Check what the pixels look like in the data frame. train.head() will show the first five rows of the data frame.
Each row holds 784 pixel values, which is a 28x28 image flattened into a 1D array. A grayscale image is naturally a 2D array.
The data contains grayscale images stored as unsigned integers in the range 0 to 255.
The images therefore need scaling. Normalize the pixel values by rescaling them to the range 0-1. The first step is to convert the data frame and series to NumPy arrays.
    y_train = train["label"]
    x_train = train.drop(labels = ["label"], axis = 1)

    print(type(x_train))
    print(type(y_train))
    x_train = x_train.to_numpy()
    y_train = y_train.to_numpy()

    print(type(x_train))
    print(type(y_train))

    x_train = x_train.astype('float')/255.
Now, using the holdout method, split the data into training and validation sets in an 80:20 ratio.
    x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size = 0.2, random_state = 42)
Check the number of samples you got.
    x_train_size = len(x_train)
    x_val_size = len(x_val)

    print(x_train_size)
    print(x_val_size)
To develop a generalized model, a bit of noise is added to the input data to corrupt it, while the uncorrupted data is kept as the target output. Because the input and the target differ, the model cannot simply memorize the training data and copy the input to the output. This forces the model to map the input data to a lower-dimensional manifold (a region where the input data concentrates). Consider an example where the data consists of car images; all images that look like cars would be part of one manifold. If this manifold is learned accurately, the added noise is left out of the reconstruction. You can refer to this paper to gain more knowledge.
Add synthetic noise by adding random values to the image data. The noisy images must stay in the normalized range too. To achieve that, multiply the random noise by 0.9 and clip the result to the range 0 to 1. You can also try a Gaussian noise matrix and compare the results.
    # Method 1: uniform random noise
    x_train_noisy = x_train + np.random.rand(x_train_size, 784) * 0.9
    x_val_noisy = x_val + np.random.rand(x_val_size, 784) * 0.9

    # Method 2: Gaussian noise
    # x_train_noisy = x_train + 0.75 * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
    # x_val_noisy = x_val + 0.75 * np.random.normal(loc=0.0, scale=1.0, size=x_val.shape)

    x_train_noisy = np.clip(x_train_noisy, 0., 1.)
    x_val_noisy = np.clip(x_val_noisy, 0., 1.)
For visualization purposes only, each image is reshaped from a 1D array to a 2D array, from 784 to (28,28).
    def plot(x, p, labels = False):
        plt.figure(figsize = (20,2))
        for i in range(10):
            plt.subplot(1, 10, i+1)
            plt.imshow(x[i].reshape(28,28), cmap = 'binary')
            # Pass empty lists to hide the axis ticks.
            plt.xticks([])
            plt.yticks([])
            if labels:
                plt.xlabel(np.argmax(p[i]))
        plt.show()
        return

    plot(x_train, None)
The input is a 1D array of 784 values. Notice that the Dense layer with 64 units produces the bottleneck. The final layer at the decoder end outputs 784 units, one per pixel. Its sigmoid activation produces values between 0 and 1, matching the normalized pixel range, so each output is the reconstructed intensity of one pixel.
    input_image = Input(shape = (784, ))

    encoded = Dense(512, activation = 'relu')(input_image)
    encoded = Dense(512, activation = 'relu')(encoded)
    encoded = Dense(256, activation = 'relu')(encoded)
    encoded = Dense(256, activation = 'relu')(encoded)
    encoded = Dense(64, activation = 'relu')(encoded)

    decoded = Dense(512, activation = 'relu')(encoded)
    decoded = Dense(784, activation = 'sigmoid')(decoded)

    autoencoder = Model(input_image, decoded)
    autoencoder.compile(loss = 'binary_crossentropy', optimizer = 'adam')
    autoencoder.summary()
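To see where the model's capacity sits, you can work out the parameter count of each Dense layer by hand and compare it with what autoencoder.summary() prints. The sketch below is plain Python. The layer widths are copied from the model above, and the helper name `dense_params` is illustrative.

```python
# Layer widths copied from the model above: input, hidden layers, output.
layer_sizes = [784, 512, 512, 256, 256, 64, 512, 784]

def dense_params(n_in, n_out):
    """A Dense layer has one weight per input-output pair plus one bias per unit."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
compression = layer_sizes[0] / min(layer_sizes)

print(per_layer)
print(f"total trainable parameters: {total_params:,}")
print(f"bottleneck compression: {compression:.2f}x")
```

The 64-unit bottleneck squeezes each 784-pixel image by a factor of 12.25, which is what pushes the network to keep structure and drop noise.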
Train the autoencoder with the noisy images as the input and the clean images as the target. The validation data follows the same pairing.
    import tensorflow as tf

    history = autoencoder.fit(x_train_noisy, x_train,
                              epochs=100, batch_size=128, shuffle=True,
                              validation_data=(x_val_noisy, x_val))
Below you can see how well the denoised images were produced from the noisy versions of x_val. There are three outputs: the original test image, the noisy test image, and the denoised test image from the autoencoder.
    preds = autoencoder.predict(x_val_noisy)

    print("Test Image")
    plot(x_val, None)

    print("Noisy Image")
    plot(x_val_noisy, None)

    print("Denoised Image")
    plot(preds, None)
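Visual inspection can be backed up with a number. One common choice, not used in the original guide, is the peak signal-to-noise ratio (PSNR) between the clean images and a noisy or denoised version. The sketch below uses small random stand-in arrays so it runs on its own. The helper names are illustrative, and in the notebook you would pass x_val, x_val_noisy, and preds instead.

```python
import numpy as np

def mse(a, b):
    """Mean squared error over all pixels."""
    return float(np.mean((a - b) ** 2))

def psnr(clean, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the clean image."""
    error = mse(clean, estimate)
    if error == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))

# Stand-in data: five flattened "images" with pixel values in [0, 1].
rng = np.random.default_rng(0)
clean = rng.random((5, 784))
noisy = np.clip(clean + 0.9 * rng.random(clean.shape), 0.0, 1.0)

print(f"PSNR of noisy input: {psnr(clean, noisy):.2f} dB")
```

If the autoencoder is doing its job, psnr(x_val, preds) should come out higher than psnr(x_val, x_val_noisy).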
Plot the loss.
    def plot_loss(history, x = 'loss', y = 'val_loss'):
        fig, ax = plt.subplots(figsize=(20,10))
        ax.plot(history.history[x])
        ax.plot(history.history[y])
        plt.title('Model Loss')
        # The y-axis is the loss value and the x-axis is the epoch index.
        plt.ylabel('Loss')
        plt.xlabel('Epoch')
        plt.legend(['Train', 'Val'], loc='upper left')
        ax.grid(color='black')
        plt.show()

    plot_loss(history)
The result is good, but there is still scope for narrowing the gap between the two curves. Play with the hyperparameters. You can leverage the TensorBoard HParams feature, which helps you track and visualize progress. Adjust the number of epochs. Increase or decrease the number of layers in the model and check the results each time.
But what if you have a huge amount of image data? Training will take a lot of time. Keras and PyTorch both offer many pre-trained CNNs, including ResNet, VGG, DenseNet, and MobileNet. These models are trained on a large image database. ImageNet is one such database that you can download for your research and also contribute to.
Further, you may combine the noise reduction model with a classification model, so that the autoencoder enhances each image before it is classified. If you have any questions, feel free to reach out to me at CodeAlphabet.