TensorFlow is a machine learning library used both in research and in production. In this guide, we'll explore how to perform simple image classification with Keras running on the TensorFlow backend. Image classification is the task of assigning a single label to an input image from a predefined set of labels, also called classes or categories. After reading this guide, you will have a clear understanding of how image classification works. Once an image is classified, you can go a step further and draw a bounding box around the object in the image; this is called image localization. Image classification is a building block for applications such as self-driving cars, and the same convolutional techniques underpin generative networks, which can help industries with decision-making tasks.
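This task can be divided into the following subtasks: setting up the environment, downloading and reading the data, preprocessing, building the model, training, evaluating, and predicting on the test set.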
Google Colab is used for this demo. It provides free GPU and TPU access for ML/AI tasks. The entire code of this guide can be found in mnist. First, you need to create a new API token from your Kaggle account. Then, upload the API token file, kaggle.json, to Colab using the following code:
from google.colab import files
files.upload()
The next step is to mount Google Drive and change the working directory to the desired folder in Google Drive.
from google.colab import drive
import os

drive.mount("/content/gdrive")
os.chdir("/content/gdrive/My Drive/<path/of/google drive folder/>")  # change the working directory
Next, install the Kaggle Python library with pip. Then, create a directory named .kaggle in the home directory, copy the API token into it, and restrict the file's permissions.
!pip install -q kaggle
!mkdir -p ~/.kaggle
# kaggle.json was uploaded to /content, Colab's default working directory, before the chdir above
!cp /content/kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json  # owner read/write only
Then, download MNIST Digit Recognizer data using the below command:
!kaggle competitions download -c digit-recognizer
If the download is successful, three files will be found in the present working directory: "train.csv", "test.csv", and "sample_submission.csv". Depending on the version of the Kaggle CLI, they may arrive inside a single zip archive that has to be extracted first. The data can also be downloaded directly from the Kaggle website and used for training in a different environment.
MNIST is a database of handwritten digit images. Each image is 28x28 pixels, or 784 pixels in total. The Digit Recognizer data has 42,000 training images and 28,000 test images. The data is stored in CSV format, where each row represents one image. In the training file, the first column is the label and the remaining 784 columns are pixel values. The test file has only the 784 pixel columns, with no label. The task is to predict a label, a digit from 0 to 9, for each of the 28,000 test images.
Read the image data stored in CSV format. The pandas read_csv() function is used to read each CSV file into a DataFrame.
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
Then, prepare the data for training by separating the label column from the pixel columns. After the label is dropped, X_train contains only pixel values, and Y_train holds the labels.
X_train = train.drop(["label"], axis=1)
Y_train = train["label"]
X_test = test
After reading the data, check its quality. How are the 10 classes distributed in the training images? How many missing values are present? The code below plots and counts how many samples are present for each class.
import seaborn as sns

g = sns.countplot(x=Y_train)  # bar chart of the number of samples per digit
Y_train.value_counts()
Next, we will calculate the number of null values in the training and test data. This will tell us if there are any corrupted images in the data. In this case, there are no null values, so the data quality is good.
X_train.isnull().any().describe()
X_test.isnull().any().describe()
The images are grayscale, with pixel intensities ranging from 0 to 255. To bring the intensities into the range 0-1, we'll divide all pixel values by 255. Keeping the inputs in a small, consistent range generally helps the network train faster and more stably.
X_train = X_train / 255.0
X_test = X_test / 255.0
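As a quick illustration of the scaling step, the sketch below applies the same division to a small array of pixel intensities. The values are made up for the example and are not taken from the dataset.

```python
import numpy as np

# Made-up 8-bit pixel intensities (illustrative only, not from the dataset)
pixels = np.array([0, 51, 102, 255], dtype=np.uint8)

# Dividing by 255.0 promotes the array to floating point and maps it into [0, 1]
scaled = pixels / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

The darkest pixel maps to 0 and the brightest to 1, while the relative differences between pixels are preserved.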
The Conv2D layers in Keras expect three dimensions per image (height, width, and channels), so a batch of images is a 4D tensor. With the default channels-last format, the dimensions are the number of samples, height, width, and the number of channels; grayscale images have a single channel. Syntax: reshape(nb_samples, height, width, nb_channels)
X_train = X_train.values.reshape(len(X_train), 28, 28, 1)
X_test = X_test.values.reshape(len(X_test), 28, 28, 1)
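To make the reshape concrete, here is a minimal sketch on three fake "images" stored as flat rows of 784 values, the same layout as the CSV rows. The array contents are arbitrary placeholders.

```python
import numpy as np

# Three fake images, each stored as a flat row of 784 values (as in the CSV)
flat = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

# Reshape to (samples, height, width, channels); 1 channel because the images are grayscale
images = flat.reshape(len(flat), 28, 28, 1)
print(images.shape)  # (3, 28, 28, 1)

# Pixel k of a flat row lands at row k // 28, column k % 28 of the image
k = 100
same_pixel = images[0, k // 28, k % 28, 0] == flat[0, k]
print(same_pixel)
```

The reshape only changes how the values are indexed; no pixel value is modified.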
Next, convert the labels into a one-hot encoding.
The Keras function to_categorical() takes the integer labels 0-9 as input and converts each one into a 10-element vector with a 1 at the index of the label and 0 everywhere else.
from tensorflow.keras.utils import to_categorical

Y_train = to_categorical(Y_train, num_classes=10)
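To see what a one-hot encoding looks like, the following sketch builds one with plain NumPy. The helper name one_hot is hypothetical and only mimics the result described above; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded[0])  # the single 1 sits at index 3

# np.argmax reverses the encoding and recovers the integer labels
recovered = np.argmax(encoded, axis=1)
print(recovered)
```

The same np.argmax trick is used later in the guide to turn predicted probabilities back into digit labels.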
The training data is split into a training set and a validation set. The validation set is used to evaluate the performance of the model before applying it to unseen data. The code below randomly moves 10% of the training data into the validation set. We fix the random seed to 3 so that the random generator picks the same validation samples on every run.
from sklearn.model_selection import train_test_split

# Set the random seed
random_seed = 3
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=0.1, random_state=random_seed)
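Conceptually, a seeded random split shuffles the sample indices and carves off a fraction for validation. The sketch below shows that idea with NumPy only. The function holdout_split is a hypothetical helper, not scikit-learn's implementation, and it will not select the same rows as train_test_split.

```python
import numpy as np

def holdout_split(X, y, test_size=0.1, seed=3):
    """Shuffle indices with a seeded generator and carve off a validation slice."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_val = int(round(len(X) * test_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Toy data: 50 samples with 2 features each; labels equal the sample index
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_tr, X_va, y_tr, y_va = holdout_split(X, y)
print(len(X_tr), len(X_va))  # 45 5
```

Because the generator is seeded, calling the function again with the same seed returns the same split, which is what makes experiments repeatable.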
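Convolution is used to extract features from input images. In the figure below, the image size is 5x5 and the kernel size is 3x3. The kernel slides over the image, and at each position the elementwise products between the kernel and the image patch beneath it are summed into one value of the feature map. With a stride of 1 and no padding, a 3x3 kernel on a 5x5 image yields a 3x3 feature map. Because each output value depends only on a local patch, the feature map preserves the spatial structure of the image.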
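The sliding-window operation can be sketched in a few lines of NumPy. This is an illustrative, unoptimized version (strictly a cross-correlation, which is what deep learning libraries call convolution); the function name conv2d_valid and the averaging kernel are assumptions made for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 toy image
kernel = np.ones((3, 3)) / 9.0                    # averaging kernel (an arbitrary choice)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

In a trained network the kernel values are learned rather than fixed, and each Conv2D layer applies many kernels in parallel, one per output channel.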
Batch normalization brings the activations of a hidden layer onto a common scale. When classifying, say, oranges versus lemons, each mini-batch contains a different set of images, so the distribution of activations changes from batch to batch. Batch normalization reduces this dependence by normalizing the activations within each mini-batch to roughly zero mean and unit variance, then applying a learned scale and shift.
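The normalization itself is simple arithmetic. The sketch below shows only the training-time computation on a made-up batch; the Keras layer additionally learns the scale (gamma) and shift (beta) and keeps moving averages for use at inference time. The function name batch_norm is hypothetical.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Made-up activations: two features on very different scales
batch = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0]])

normalized = batch_norm(batch)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

After normalization both features have a mean close to 0 and a standard deviation close to 1, regardless of their original scale.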
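Max pooling keeps the strongest responses produced by the convolutions and is typically applied after a few convolution layers. A 2x2 max pooling looks at each 2x2 window of the feature map and returns the highest value in it. This shrinks the feature map, which in turn reduces the amount of computation and the number of parameters in later layers. Note that the model defined below does not contain a max pooling layer; its feature maps shrink only because the convolutions are unpadded.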
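A 2x2 max pooling with stride 2 can be sketched as follows. The helper max_pool_2x2 and the input values are made up for illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 window (stride 2)."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, then group the pixels into 2x2 blocks
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 9, 8],
               [2, 6, 7, 3]])

pooled = max_pool_2x2(fm)
print(pooled)
# [[4 5]
#  [6 9]]
```

The 4x4 input becomes a 2x2 output, so the number of values passed to the next layer drops by a factor of four.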
Global average pooling reduces each feature map to a single number. For an incoming 11x11x10 tensor, for example, it takes the average of each 11x11 slice, which gives a 10-dimensional vector. In the model below, this vector has one entry per class and is passed directly to the softmax activation, so no fully connected layer is needed.
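The averaging step amounts to a mean over the two spatial axes. Here is a minimal NumPy sketch on a random 11x11x10 tensor; the random values stand in for real feature maps.

```python
import numpy as np

# A made-up stack of 10 feature maps, each 11x11, in (height, width, channels) order
rng = np.random.default_rng(0)
feature_maps = rng.random((11, 11, 10))

# Average over the two spatial axes; one number per channel remains
pooled = feature_maps.mean(axis=(0, 1))
print(pooled.shape)  # (10,)
```

Each of the 10 outputs summarizes one whole feature map, which is why the result can be read as one score per class.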
The ReLU activation function carries all positive values forward to the next layer unchanged and sets negative values to zero: ReLU(x) = max(0, x).
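As a small sketch, ReLU is a one-line elementwise operation in NumPy. The input values are arbitrary.

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x): negatives become 0, everything else passes through."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(values)
print(activated)
```

The two negative inputs are replaced by zero, while zero and the positive inputs are unchanged.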
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     GlobalAveragePooling2D)

# Each unpadded 3x3 convolution trims 2 pixels from the height and width;
# the trailing comments give the resulting feature-map size.
model = Sequential()
model.add(Conv2D(128, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # 26x26
model.add(BatchNormalization())

model.add(Conv2D(64, (3, 3), activation='relu'))  # 24x24
model.add(BatchNormalization())

model.add(Conv2D(50, (3, 3), activation='relu'))  # 22x22
model.add(BatchNormalization())

model.add(Conv2D(52, (3, 3), activation='relu'))  # 20x20
model.add(BatchNormalization())

model.add(Conv2D(64, (3, 3), activation='relu'))  # 18x18
model.add(BatchNormalization())

model.add(Conv2D(32, (3, 3), activation='relu'))  # 16x16
model.add(BatchNormalization())

model.add(Conv2D(27, (3, 3), activation='relu'))  # 14x14
model.add(BatchNormalization())

model.add(Conv2D(15, (3, 3), activation='relu'))  # 12x12
model.add(BatchNormalization())

model.add(Conv2D(10, (3, 3), activation='relu'))  # 10x10, one feature map per class
model.add(BatchNormalization())

# Average each feature map down to a single value per class
model.add(GlobalAveragePooling2D())

# Turn the 10 values into class probabilities
model.add(Activation('softmax'))
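The spatial size after each layer follows from the standard convolution size formula. The sketch below traces it through the nine 3x3 convolutions of the model, assuming the Keras defaults of stride 1 and no padding. The helper conv_output_size is a hypothetical name.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
sizes = []
for _ in range(9):  # the model stacks nine 3x3 convolutions
    size = conv_output_size(size)
    sizes.append(size)

print(sizes)  # [26, 24, 22, 20, 18, 16, 14, 12, 10]
```

The last convolution therefore produces a 10x10 map for each of its 10 filters, and global average pooling reduces that 10x10x10 tensor to the 10 values fed to softmax.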
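A hyperparameter is a parameter whose value is set before the learning process begins. The hyperparameters of the CNN in this guide include the number of convolution layers, the number of filters and the kernel size in each layer, the optimizer, the batch size, and the number of epochs.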
The optimizer updates the model's weights to minimize the loss function. Adam stands for Adaptive Moment Estimation; it's chosen here because it usually converges quickly.
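For intuition about what Adam does on each step, here is a minimal sketch of the update rule applied to a one-dimensional toy problem. The function adam_step, the learning rate of 0.1, and the toy loss are assumptions chosen for the example; Keras uses its own implementation and a smaller default learning rate.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weight and the updated moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t)

print(w)  # ends up close to the minimum at w = 3
```

Dividing by the square root of the second moment gives each weight its own effective step size, which is the "adaptive" part of the name.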
Compiling the model takes three arguments: the loss, the optimizer, and the metrics.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
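The categorical cross-entropy loss named in the compile step can be sketched in NumPy as follows. The function and the example probabilities are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); y_true is one-hot."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes (made-up numbers)
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
unsure = np.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is low when the model assigns high probability to the correct class and grows as that probability shrinks, which is the signal the optimizer uses to adjust the weights.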
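Train the model on the training set and monitor its performance on the validation set.
%%time
# batch_size is not defined in the original code; 32 (the Keras default) is assumed here
batch_size = 32
history = model.fit(X_train, Y_train, epochs=40, verbose=1, validation_data=(X_val, Y_val), batch_size=batch_size)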
Now find the validation loss and validation accuracy of the model.
val_loss, val_acc = model.evaluate(X_val, Y_val, verbose=0)
print("Validation Accuracy:", val_acc)
The validation accuracy is 99.19 %.
# Predict class probabilities for the validation dataset
Y_pred = model.predict(X_val)
A confusion matrix is a table that compares the actual classes (rows) with the predicted classes (columns); each cell counts how many samples of a given actual class received a given prediction.
import itertools

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    Plot the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # Normalize before drawing so that colors and cell text agree
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Use white text on dark cells and black text on light cells
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')


# Convert predicted probabilities to class indices
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot validation labels back to class indices
Y_true = np.argmax(Y_val, axis=1)
# Compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# Plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(10))
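To show how the counts in such a matrix arise, the sketch below builds one by hand for a made-up three-class problem. The helper confusion_counts is hypothetical and serves only to illustrate the bookkeeping.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels and predictions for six samples
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_counts(y_true, y_pred, num_classes=3)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Correct predictions sit on the diagonal, so accuracy is trace / total
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```

Off-diagonal cells show which pairs of classes the model confuses, something a single accuracy number cannot reveal.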
Finally, we'll predict classes for the unseen test images. These images were not used in training or validation.
# Predict class probabilities for the test images
results = model.predict(X_test)
# Select the index with the maximum probability
results = np.argmax(results, axis=1)
results = pd.Series(results, name="Label")
The predicted labels are stored in a CSV file using the pandas to_csv function.
submit = pd.concat([pd.Series(range(1, 28001), name="ImageId"), results], axis=1)
submit.to_csv("cnn_mnist_predictions.csv", index=False)
In this guide, we have learned how to load image data from CSV files downloaded from Kaggle, how to preprocess the images, how to train a convolutional network, how to validate the model's performance, and how to predict classes for unseen images. You can change the model hyperparameters, train it again, and see whether the performance can be improved further.
To learn more about image classification techniques, you can read about the different types of convolutional architectures. DenseNet and its variants are widely used for image classification, image segmentation, and image localization tasks.