
Vaibhav Sharma

Image Classification Using Tensorflow


  • Sep 16, 2019
  • 15 Min read
  • 287 Views
Data
Tensorflow

Introduction

Image classification is a classic problem that is well suited to neural networks. It belongs to the category of perceptual problems, for which it is difficult to write explicit rules for why a given image belongs to one category and not another. The human brain performs this kind of perceptual task with ease, but it is very hard for traditional, rule-based computer algorithms. For example, a two-year-old child can tell a dog from a cat, yet this is a daunting task for traditional computing approaches. Machine learning, however, has made great strides in this direction. In this guide, we will train a Convolutional Neural Network (CNN) on images of cats and dogs.

Data Preparation and Reading the Data from the Directory

Every machine learning application needs data. In traditional ML (machine learning) tasks that deal with records, rows, or tuples, users can read the data directly into a NumPy array or a Pandas DataFrame (this applies to the Python ecosystem; other languages such as R have their own tools). Image files, however, cannot be read that way: they must first be decoded, resized, and converted into numeric arrays before they can be fed to a network as tensors. Keras provides built-in utilities that do this easily. The following code reads the image data from the training and validation directories.

from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training images: rescale pixels to [0, 1] and apply random augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.15,
    height_shift_range=0.15,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
    fill_mode='nearest')

# Validation images: rescale only, no augmentation
test_datagen = ImageDataGenerator(rescale=1./255)

# Each directory must contain one sub-folder per class (for example cats/ and dogs/)
train_generator = train_datagen.flow_from_directory(
    "D:\\dogs-vs-cats_train", target_size=(150, 150), batch_size=20, class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    "D:\\dogs-vs-cats_validation", target_size=(150, 150), batch_size=20, class_mode='binary')

The snippet above reads the images from the training directory (dogs-vs-cats_train) and the validation directory (dogs-vs-cats_validation) and rescales the pixel values from the range 0 to 255 down to 0 to 1 by dividing by 255. It also resizes every image to 150x150 pixels, irrespective of its original size. Note that train_datagen takes more arguments than test_datagen; the extra arguments perform data augmentation. Data augmentation creates additional training examples from the existing ones: here, the images are randomly rotated, shifted, sheared, zoomed, and flipped, which exposes the model to more variation and helps reduce over-fitting. The validation data is not augmented, so that it measures performance on unaltered images.
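To make the augmentation step more concrete, here is a minimal NumPy sketch of three of the operations that ImageDataGenerator applies: rescaling, horizontal flipping, and a width shift whose empty columns are filled with the nearest edge pixels. This is a simplified illustration, not Keras's implementation: the helper names (rescale, horizontal_flip, width_shift, random_augment) are hypothetical, shifts are whole pixels rather than interpolated, and rotation, shear, and zoom are omitted.

```python
import numpy as np

def rescale(img):
    """Map 0-255 pixel values to floats in [0, 1], like rescale=1./255."""
    return img.astype(np.float32) * (1.0 / 255)

def horizontal_flip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]

def width_shift(img, pixels):
    """Shift an (H, W, C) image right by `pixels` columns (left if negative).
    Exposed columns are filled with the nearest edge column, a simplified
    stand-in for fill_mode='nearest'."""
    out = np.empty_like(img)
    if pixels > 0:
        out[:, pixels:] = img[:, :-pixels]
        out[:, :pixels] = img[:, :1]      # repeat the left edge
    elif pixels < 0:
        out[:, :pixels] = img[:, -pixels:]
        out[:, pixels:] = img[:, -1:]     # repeat the right edge
    else:
        out[:] = img
    return out

def random_augment(img, rng, shift_range=0.15):
    """Hypothetical helper: random shift of up to 15% of the width,
    a 50% chance of a horizontal flip, then rescaling."""
    max_shift = int(shift_range * img.shape[1])
    img = width_shift(img, int(rng.integers(-max_shift, max_shift + 1)))
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return rescale(img)

rng = np.random.default_rng(0)
toy = rng.integers(0, 256, size=(150, 150, 3)).astype(np.uint8)  # random stand-in image
augmented = random_augment(toy, rng)
print(augmented.shape, augmented.dtype)
```

Each call produces a slightly different version of the same picture, which is why augmentation behaves like extra training data.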

Convolutional Neural Networks (CNNs)

Convolutional neural networks differ from traditional dense networks in ways that make them extremely effective for image recognition and computer vision. A dense layer learns global patterns over its entire input, whereas a convolutional layer learns local patterns in small windows of the image. Because the same filter is slid over every position, a pattern learned in one part of the image can be recognized anywhere else, which is in stark contrast to dense networks. For example, a CNN that learns what a cat's ear looks like in the top-right corner of the training images can recognize it in any other part of a new image. CNNs can also learn hierarchies of patterns: the first layer may learn edges, the next layer textures built from those edges, and so on. Finally, the user decides how many filters each convolutional layer has. Each filter can be thought of as an individual concept the network learns; in deeper layers these can be high-level concepts such as eyes, ears, or legs.

model = keras.Sequential([
    # Feature extraction: four convolution + max-pooling blocks
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    # Classifier: flatten the feature maps, regularize with dropout, then dense layers
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])  # single probability for binary output

model.summary()

The first Conv2D layer slides 32 filters, each 3x3 pixels, over the input image and produces 32 feature maps. Similarly, the second Conv2D layer learns 64 filters, and the third and fourth Conv2D layers learn 128 filters each.
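What a single filter does can be sketched in plain NumPy. The example below slides one hand-made 3x3 vertical-edge filter over a toy 150x150 grayscale image with no padding and stride 1, which is the arithmetic behind Conv2D's default "valid" padding. The filter values and the toy image are assumptions for illustration; in the real model the filter weights are learned, each filter also spans all three color channels, and a bias is added. It also shows the property described above: the same filter finds the edge wherever it appears.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one 2-D kernel over a 2-D image, no padding, stride 1.
    Like Keras Conv2D, this is cross-correlation (the kernel is not flipped)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-made vertical-edge detector (assumed values, not learned weights)
edge = np.array([[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]], dtype=float)

# Toy image: dark left half, bright right half (edge at column 75)
img = np.zeros((150, 150))
img[:, 75:] = 1.0

# Same picture with the edge moved to column 100
shifted = np.zeros((150, 150))
shifted[:, 100:] = 1.0

response = conv2d_valid(img, edge)
print(response.shape)  # (148, 148): a 3x3 window fits 148 times along each axis
print(np.argmax(response[0]), np.argmax(conv2d_valid(shifted, edge)[0]))  # 73 98
```

The 148x148 output matches the first row of the model summary, and the peak response moves with the edge, so one set of filter weights detects the pattern at any position.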

Maxpooling Operation

CNNs are usually accompanied by a max-pooling operation, whose primary purpose is to downsample the feature maps. A max-pooling layer typically follows each convolutional layer; with a 2x2 window it keeps the largest value in each 2x2 patch and halves the height and width of the feature maps. It is not strictly required, but it reduces the number of parameters in the later dense layers, which makes the model more efficient, and it lets subsequent convolutions look at larger regions of the original image. The table below summarizes the model created so far.

Output

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 128)         0
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0
_________________________________________________________________
dropout (Dropout)            (None, 6272)              0
_________________________________________________________________
dense (Dense)                (None, 512)               3211776
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
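The shapes and parameter counts in this summary follow from simple arithmetic, sketched below in Python. The assumptions match the model code: 3x3 kernels with "valid" padding and stride 1, 2x2 max pooling with stride 2 (which drops a leftover odd row and column, so 15 becomes 7), and one bias per filter or dense unit. The helper names are hypothetical, and max_pool_2x2 is a NumPy illustration of what each MaxPooling2D layer does to a single feature map.

```python
import numpy as np

def conv_params(k, c_in, filters):
    """Each filter has k*k*c_in weights plus one bias."""
    return k * k * c_in * filters + filters

def conv_out(size, k):
    """Output width of a 'valid' convolution with stride 1."""
    return size - k + 1

def pool_out(size, p=2):
    """Output width of p x p max pooling with stride p."""
    return size // p

def max_pool_2x2(fmap):
    """Keep the largest value in each 2x2 patch of an (H, W) map;
    an odd trailing row or column is dropped."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

size, channels, total = 150, 3, 0
for filters in (32, 64, 128, 128):
    size = conv_out(size, 3)
    params = conv_params(3, channels, filters)
    total += params
    print(f"Conv2D   -> ({size}, {size}, {filters})  params={params}")
    size = pool_out(size)
    print(f"MaxPool  -> ({size}, {size}, {filters})")
    channels = filters

flat = size * size * channels   # Flatten: 7 * 7 * 128
dense = flat * 512 + 512        # Dense(512)
out = 512 * 1 + 1               # Dense(1)
total += dense + out
print(flat, dense, out, total)  # 6272 3211776 513 3453121

print(max_pool_2x2(np.arange(16).reshape(4, 4)))  # [[ 5  7] [13 15]]
```

Note that the single Dense(512) layer holds about 93% of all parameters; without the four pooling layers, the flattened vector and therefore that layer would be far larger.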

Model Compilation, Fitting, Plotting, and Saving the Model

Once the model is defined, we can compile it and fit it to the data. The metrics computed at each epoch are stored in the returned History object, which is later used to plot accuracy against epochs. This plot shows how well the model performs and helps make sure that it is not over-fitting.

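model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])

# 100 steps x 20 images = 2,000 training images per epoch;
# 50 validation steps x 20 images = 1,000 validation images.
# From TensorFlow 2.1, model.fit accepts generators directly (fit_generator is deprecated).
history = model.fit(train_generator, steps_per_epoch=100, epochs=80,
                    validation_data=validation_generator, validation_steps=50)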

Output

Epoch 1/80
50/50 [==============================] - 25s 503ms/step - loss: 0.6924 - acc: 0.5100
101/101 [==============================] - 156s 2s/step - loss: 0.7046 - acc: 0.5020 - val_loss: 0.6924 - val_acc: 0.5100
Epoch 2/80
50/50 [==============================] - 21s 414ms/step - loss: 0.6750 - acc: 0.5500
101/101 [==============================] - 147s 1s/step - loss: 0.6915 - acc: 0.5325 - val_loss: 0.6750 - val_acc: 0.5500
Epoch 3/80
50/50 [==============================] - 21s 413ms/step - loss: 0.6895 - acc: 0.5000
101/101 [==============================] - 145s 1s/step - loss: 0.6792 - acc: 0.5500 - val_loss: 0.6895 - val_acc: 0.5000
Epoch 4/80
50/50 [==============================] - 22s 444ms/step - loss: 0.6767 - acc: 0.5940
101/101 [==============================] - 155s 2s/step - loss: 0.6896 - acc: 0.5260 - val_loss: 0.6767 - val_acc: 0.5940
Epoch 5/80
50/50 [==============================] - 19s 388ms/step - loss: 0.6656 - acc: 0.6200
101/101 [==============================] - 154s 2s/step - loss: 0.6709 - acc: 0.5759 - val_loss: 0.6656 - val_acc: 0.6200

The output above is truncated to the first five epochs. The full training run is 80 epochs and may take a while to complete, depending on the speed of the machine.
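The loss and accuracy columns in this log can be reproduced by hand. The sketch below computes binary cross-entropy and binary accuracy (the metric Keras uses for 'acc' when the loss is binary cross-entropy) in plain Python. The function names and the four labels and predictions are hypothetical, and the clipping constant only approximates Keras's backend epsilon. It explains why the first epochs report a loss near 0.69: that is ln 2, the loss of a model that outputs 0.5 for every image, i.e. one that is still guessing.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1
    to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int((p > threshold) == bool(t)) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

labels = [0, 1, 1, 0]  # hypothetical labels: 0 = cat, 1 = dog
print(round(binary_crossentropy(labels, [0.5] * 4), 4))  # 0.6931, i.e. ln 2
print(binary_accuracy(labels, [0.2, 0.9, 0.4, 0.1]))     # 0.75
```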

The following code was used to create the graph of Training and Validation accuracy vs. epochs.

import matplotlib.pyplot as plt

acc_train = history.history['acc']
acc_val = history.history['val_acc']
epochs = range(1, len(acc_train) + 1)  # one point per completed epoch
plt.plot(epochs, acc_train, 'g', label='training accuracy')
plt.plot(epochs, acc_val, 'b', label='validation accuracy')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

[Figure: Training and validation accuracy vs. epochs]
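Besides reading the plot, the same check for over-fitting can be done numerically on history.history. The sketch below is a hypothetical helper, summarize_history, that finds the epoch with the best validation accuracy and the gap between training and validation accuracy at that epoch. It assumes a dict with 'acc' and 'val_acc' lists, as produced by compiling with metrics=['acc'], and it is fed the five epochs from the training log shown earlier rather than a full 80-epoch run.

```python
def summarize_history(history):
    """Return the epoch (1-based) with the best validation accuracy and the
    training-minus-validation accuracy gap at that epoch."""
    acc, val_acc = history["acc"], history["val_acc"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {"best_epoch": best + 1,
            "best_val_acc": val_acc[best],
            "train_minus_val": acc[best] - val_acc[best]}

# First five epochs of the training log above
history = {"acc": [0.5020, 0.5325, 0.5500, 0.5260, 0.5759],
           "val_acc": [0.5100, 0.5500, 0.5000, 0.5940, 0.6200]}
print(summarize_history(history))
```

In a full run, a gap that keeps growing while validation accuracy stalls is the usual sign of over-fitting; in these early epochs the gap is still negative, partly because the training accuracy is measured on augmented images and with dropout active.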

Because retraining the model every time is impractical, it is best to save a well-performing model once you have one and spare yourself the trouble of training it again. This can be done with the following code.

model.save('cats_and_dogs_small_1.h5')

For reference, the complete code is as follows.

from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# Data: rescale all images, augment only the training set
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.15,
    height_shift_range=0.15,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
    fill_mode='nearest')

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    "D:\\dogs-vs-cats_train", target_size=(150, 150), batch_size=20, class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    "D:\\dogs-vs-cats_validation", target_size=(150, 150), batch_size=20, class_mode='binary')

# Model: four convolution + max-pooling blocks followed by a dense classifier
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])

model.summary()

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])

history = model.fit(train_generator, steps_per_epoch=100, epochs=80,
                    validation_data=validation_generator, validation_steps=50)

# Save the architecture and weights so the model can be reused without retraining
model.save('cats_and_dogs_small_1.h5')

# Plot training and validation accuracy per epoch
acc_train = history.history['acc']
acc_val = history.history['val_acc']
epochs = range(1, len(acc_train) + 1)
plt.plot(epochs, acc_train, 'g', label='training accuracy')
plt.plot(epochs, acc_val, 'b', label='validation accuracy')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Loading the Model and Making Predictions

After training, the following program rebuilds the same architecture, loads the previously saved weights, and uses them to classify each image in a test folder as either a cat or a dog.

from os import listdir
from os.path import isfile, join

from tensorflow import keras
from tensorflow.keras.preprocessing import image

mypath = 'D:\\ml\\test'

# Rebuild exactly the architecture that was trained, then load the saved weights
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])

model.load_weights("D:\\ml\\cats_and_dogs_small_1.h5")

onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

for image_file in onlyfiles:
    img = image.load_img(join(mypath, image_file), target_size=(150, 150))
    x = image.img_to_array(img)
    x = x / 255.0                  # same 1/255 rescaling as the training generators
    x = x.reshape((1,) + x.shape)  # add a batch dimension: (1, 150, 150, 3)
    prediction = model.predict(x)  # predict once and reuse the result
    print(prediction)
    p = prediction[0][0]           # sigmoid output: near 0 = cat, near 1 = dog
    if p < 0.3:
        print(image_file + ": Must be a cat")
    elif p > 0.7:
        print(image_file + ": Must be a dog")
    else:
        print(image_file + ": Not sure if it's a cat or a dog")

Output

[[1.]]
100.jpg: Must be a dog
[[1.]]
101.jpg: Must be a dog
[[1.]]
102.jpg: Must be a dog
[[0.]]
106.jpg: Must be a cat
[[1.]]
107.jpg: Must be a dog
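Two details make these predictions meaningful, and both are sketched below in NumPy. First, when no class list is given, flow_from_directory infers the classes from the sub-folder names and assigns label indices in alphanumeric order; assuming the folders are named cats and dogs, cats become 0 and dogs 1, so the sigmoid output estimates the probability of "dog". Second, a new image must be preprocessed exactly like the training images (rescaled by 1/255 and given a batch dimension). The helper names prepare_batch and label_for are hypothetical; the 0.3 and 0.7 thresholds are the ones used in the prediction code.

```python
import numpy as np

# Mimics flow_from_directory's alphanumeric ordering of class sub-folders,
# assuming folders named "cats" and "dogs"
class_indices = {name: i for i, name in enumerate(sorted(["dogs", "cats"]))}

def prepare_batch(pixels):
    """Turn one (150, 150, 3) array of 0-255 pixels into a (1, 150, 150, 3)
    float batch, rescaled by 1/255 as during training."""
    x = np.asarray(pixels, dtype=np.float32) / 255.0
    return x[np.newaxis, ...]

def label_for(p_dog, low=0.3, high=0.7):
    """Map a sigmoid output to one of three verdicts."""
    if p_dog < low:
        return "cat"
    if p_dog > high:
        return "dog"
    return "unsure"

print(class_indices)  # {'cats': 0, 'dogs': 1}
batch = prepare_batch(np.full((150, 150, 3), 255, dtype=np.uint8))
print(batch.shape, batch.max())  # (1, 150, 150, 3) 1.0
print([label_for(p) for p in (0.05, 0.5, 0.95)])  # ['cat', 'unsure', 'dog']
```

Skipping the rescaling step feeds the network pixel values up to 255 instead of 1, which tends to push the sigmoid to its extremes and makes the outputs unreliable.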

Conclusion

Image classification is a flagship example of what deep learning can do. Only a few years ago, results like this were out of reach even for machine learning, and deep learning continues to make big strides on problems once considered intractable. Note, however, that a neural network is largely a black box, and building one is as much an art as a science: a well-optimized model comes from trial and error and from informed guesses about the hyperparameters and the number of epochs to run. Preparing the training data is also one of the most fundamental factors in the overall success of the model.

Appendix

I have compiled a few examples of the dataset, which can be found on my GitHub.
