Convolutional neural networks (CNNs) are similar to ordinary, fully connected neural networks in that both are made up of neurons whose weights and biases need to be optimized. The main difference between the two is that CNNs make the explicit assumption that the inputs are images, which allows us to build certain properties, such as small filters whose weights are shared across the whole image, into the architecture. These properties make the forward propagation step much more efficient and reduce the number of parameters needed in the network. This makes CNNs a natural choice for solving problems related to image recognition, object detection, and other computer vision applications.
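To make the parameter saving concrete, here is a rough back-of-the-envelope sketch. The layer sizes (32 units, a 5x5 kernel, a 28x28 grayscale input) are assumptions picked for illustration. The sketch compares the weight count of a fully connected layer with that of a convolutional layer on the same input.

```python
# Illustrative sizes only; they are assumptions, not requirements.
height, width, channels = 28, 28, 1   # one grayscale 28x28 image
units = 32                            # dense units, or convolution filters
kernel = 5                            # side length of a square kernel

# Fully connected layer: every unit is wired to every pixel, plus one bias per unit.
dense_params = (height * width * channels) * units + units

# Convolutional layer: each filter reuses one small kernel at every image position.
conv_params = (kernel * kernel * channels) * units + units

print(dense_params)  # 25120
print(conv_params)   # 832
```

The convolutional layer needs far fewer weights because its kernel size, not the image size, determines the count.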
In this guide, you will learn how to build CNNs using the keras library. Let's start by loading the required libraries and packages.
    import keras
    import matplotlib.pyplot as plt
    from keras.datasets import mnist
    from keras.models import Sequential
    # Conv2D and MaxPooling2D are exposed directly from keras.layers
    from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
    from keras.utils import to_categorical
    from keras.utils import np_utils  # legacy module, present only in older Keras releases
We will use the popular MNIST dataset in this guide. Each image in the dataset has dimensions of 28x28 pixels and contains a centered, grayscale digit. The model will take the image as input, and it will output one of the ten possible digits (0 through 9). There are 70,000 images in the data, of which 60,000 will be used for training the model and the remaining 10,000 for validating the model.
The first line of code below loads the MNIST dataset and creates the training and test arrays. The second line checks the shape of the second image in the training set, which is 28x28 pixels as expected, and the last two lines display that image.
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    print(X_train[1].shape)   # shape of the second training image
    plt.imshow(X_train[1])
    plt.show()
CNNs exploit the fact that neighboring pixels in an image are often related. However, before training the model, we need to prepare the data. The first step is to reshape the inputs, X_train and X_test, as done in the first two lines of code below. The reshape function performs this task, taking the target shape as its argument. The first dimension is the number of images, given by X_train.shape[0]. The second and third dimensions are the height and width of each image (28x28), while the last dimension is 1 because the images are grayscale and therefore have a single channel. The third and fourth lines scale the pixel values from the 0 to 255 range down to the 0 to 1 range.
Finally, we perform the one-hot encoding of the target variable. This is done in the fifth and sixth lines of code below. The remaining lines store the number of classes in the target variable and print it along with the shapes of the training and test arrays.
    # Lines 1 and 2
    X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)).astype('float32')
    X_test = X_test.reshape((X_test.shape[0], 28, 28, 1)).astype('float32')

    # Lines 3 and 4
    X_train = X_train / 255
    X_test = X_test / 255

    # Lines 5 and 6
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    num_classes = y_test.shape[1]

    print(X_train.shape); print(X_test.shape); print(num_classes)
    (60000, 28, 28, 1)
    (10000, 28, 28, 1)
    10
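The same three preparation steps can be tried without downloading MNIST. The sketch below uses only NumPy and a handful of synthetic images. The array names, the random data, and the use of np.eye for one-hot encoding are assumptions made for illustration, not what Keras does internally.

```python
import numpy as np

# Synthetic stand-in for MNIST: six random 28x28 images with made-up labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 1, 3, 7])

# Steps 1 and 2: add a trailing channel axis, then scale pixels into [0, 1].
x = images.reshape((images.shape[0], 28, 28, 1)).astype("float32") / 255

# Step 3: one-hot encode. Row i of the identity matrix encodes class i.
num_classes = 10
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(x.shape)        # (6, 28, 28, 1)
print(one_hot.shape)  # (6, 10)
```

Note that the sketch fixes the number of classes at 10 by hand, which is safe here because the digit labels are known to run from 0 through 9.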
We will create a function, cnn_model, that builds and compiles the CNN model. The first line of the function body calls the Sequential constructor because the model we are building is a linear stack of layers. From the second line of code onwards, we use the add() function to add layers to the model.
The first layer is a Conv2D layer that will deal with the input images, represented as two-dimensional matrices. This layer has 32 filters, each with a kernel size of 5x5, and the activation function is relu, the Rectified Linear Unit.
ReLU is one of the most widely used activation functions in deep neural networks because it is nonlinear and because it outputs zero for every negative input, so not all neurons are activated at the same time. In simple terms, only some of the neurons are active for a given input, which makes the activations sparse and the network cheap to compute.
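To see what a single filter followed by ReLU does, here is a deliberately slow NumPy sketch. The helper names conv2d_valid and relu, the random image, and the random kernel are all hypothetical. A real Conv2D layer learns 32 such kernels and uses optimized routines rather than Python loops.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive values and replace everything else with zero."""
    return np.maximum(x, 0)

rng = np.random.default_rng(0)
image = rng.random((28, 28)).astype(np.float32)            # synthetic image
kernel = rng.standard_normal((5, 5)).astype(np.float32)    # one random filter

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map.shape)  # (24, 24)
```

With a 5x5 kernel and no padding, each side shrinks from 28 to 28 - 5 + 1 = 24, and every negative response is clipped to zero by ReLU.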
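The next step is to add a pooling layer, MaxPooling2D, followed by a regularization layer called Dropout. The pooling layer downsamples each feature map, and Dropout(0.2) randomly sets 20 percent of its inputs to zero during training to reduce overfitting. Between the dropout and the dense layers, there is the Flatten layer, which converts the stack of 2D feature maps to a vector. This in turn allows the output to be processed by standard, fully connected layers.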
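Pooling and flattening are easy to try on a tiny array. The following NumPy sketch assumes a 2x2 window with stride 2, which is the default for MaxPooling2D, and uses a hypothetical helper named max_pool_2x2 on a 4x4 input.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even side lengths."""
    h, w = feature_map.shape
    # Split rows and columns into pairs, then take the max within each 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(feature_map)   # keeps 5, 7, 13, 15
flat = pooled.reshape(-1)            # what Flatten does to one sample

print(pooled.shape)  # (2, 2)
print(flat.shape)    # (4,)
```

Each 2x2 block is reduced to its largest value, so the map shrinks by half along each side before being unrolled into a vector.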
The next step is to add the fully connected dense layer with 128 neurons and the rectifier activation function. Next, we add the output layer, which has 10 neurons for the 10 classes and a softmax activation function. This activation function generates probability-like predictions for each class.
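As a small illustration of why softmax outputs read like probabilities, the sketch below applies it to three made-up scores using NumPy. The softmax helper and the input values are assumptions for the example.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into positive values that sum to 1."""
    shifted = logits - np.max(logits)   # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])      # made-up scores for three classes
probs = softmax(logits)

print(probs)           # largest score gets the largest share
print(probs.argmax())  # 0
```

The predicted class is simply the index with the highest value, which is how the ten outputs of the final layer map to a digit.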
The final step is to compile the model, which takes three parameters: optimizer, loss, and metrics. The optimizer controls how the weights are updated during training, and we will use the adam optimizer. The main advantage of the adam optimizer is that it adapts the step size for each weight as training proceeds, so its default learning rate usually works reasonably well and we are spared much of the learning-rate tuning that plain gradient descent requires. We will use the categorical_crossentropy loss function, which is the common choice for multi-class classification problems with one-hot encoded targets. In simple terms, the lower the loss, the better the model. The evaluation metric we will use to validate the model performance on the test data is the accuracy metric. The higher the accuracy score, the better the model performance.
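The loss and the metric can both be computed by hand on a toy batch. The following NumPy sketch uses hypothetical helper names and made-up predictions, and it is not Keras's own implementation. It shows that confident, correct predictions give a lower cross-entropy and a higher accuracy.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(true * log(predicted)) over the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Share of samples whose highest-probability class is the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Two samples, three classes, one-hot targets (made-up values).
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.40, 0.30], [0.40, 0.30, 0.30]])

print(categorical_crossentropy(y_true, confident), accuracy(y_true, confident))
print(categorical_crossentropy(y_true, unsure), accuracy(y_true, unsure))
```

The first line prints a loss of roughly 0.16 with accuracy 1.0, and the second a loss of roughly 1.06 with accuracy 0.5.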
The function below creates and compiles the CNN model as discussed above.
    def cnn_model():
        # create model
        model = Sequential()
        model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
        model.add(MaxPooling2D())
        model.add(Dropout(0.2))
        model.add(Flatten())
        model.add(Dense(128, activation='relu'))
        model.add(Dense(num_classes, activation='softmax'))
        # Compile model
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    model = cnn_model()
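The layer shapes and parameter counts of the model above can be traced by hand. The arithmetic sketch below assumes the Keras defaults for the arguments the function does not set: stride 1 and no padding for Conv2D, and a 2x2 window for MaxPooling2D.

```python
# Conv2D: 32 filters, 5x5 kernel, on a 28x28x1 input (stride 1, no padding).
conv_side = 28 - 5 + 1                      # 24, so the output is 24x24x32
conv_params = (5 * 5 * 1) * 32 + 32         # 832

# MaxPooling2D with a 2x2 window halves each spatial side.
pool_side = conv_side // 2                  # 12, so the output is 12x12x32

# Dropout has no weights and keeps the shape. Flatten unrolls 12x12x32.
flat_size = pool_side * pool_side * 32      # 4608

# Dense layers: one weight per input per unit, plus one bias per unit.
dense_params = flat_size * 128 + 128        # 589952
output_params = 128 * 10 + 10               # 1290

total_params = conv_params + dense_params + output_params
print(total_params)  # 592074
```

Almost all of the weights sit in the first dense layer, which is why pooling before flattening matters for the size of the model.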
The first line of code below fits the model on the training data and uses the test set as validation data. The epochs argument sets the number of complete passes over the training data. We use 5 epochs and a batch size of 150. The second line uses the model.evaluate() function to evaluate the model on the test data, which returns the loss followed by the accuracy, while the third line prints the error rate, that is, 100 percent minus the accuracy.
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=150)
    scores = model.evaluate(X_test, y_test, verbose=0)   # [loss, accuracy]
    print("CNN Error: %.2f%%" % (100 - scores[1] * 100))
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/5
    60000/60000 [==============================] - 53s 891us/step - loss: 0.2253 - acc: 0.9347 - val_loss: 0.0688 - val_acc: 0.9798
    Epoch 2/5
    60000/60000 [==============================] - 55s 909us/step - loss: 0.0687 - acc: 0.9790 - val_loss: 0.0459 - val_acc: 0.9846
    Epoch 3/5
    60000/60000 [==============================] - 56s 931us/step - loss: 0.0491 - acc: 0.9852 - val_loss: 0.0417 - val_acc: 0.9860
    Epoch 4/5
    60000/60000 [==============================] - 56s 927us/step - loss: 0.0401 - acc: 0.9876 - val_loss: 0.0412 - val_acc: 0.9867
    Epoch 5/5
    60000/60000 [==============================] - 58s 961us/step - loss: 0.0336 - acc: 0.9896 - val_loss: 0.0386 - val_acc: 0.9873
    CNN Error: 1.27%
The above output shows that with only five epochs, we have achieved accuracy of 98.73 percent on our validation data set, which is very good performance.
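For a sense of scale, the short sketch below works out how many weight updates the run above performs and how the printed error follows from the final validation accuracy. It is plain arithmetic on the numbers already shown, and the variable names are chosen for illustration.

```python
import math

train_samples = 60000
batch_size = 150
epochs = 5

# One weight update per batch, so each epoch performs this many updates.
steps_per_epoch = math.ceil(train_samples / batch_size)   # 400
total_updates = steps_per_epoch * epochs                  # 2000

# Error rate is 100 percent minus the accuracy on the validation data.
val_acc = 0.9873   # final val_acc from the training log above
error_pct = 100 - val_acc * 100
print("CNN Error: %.2f%%" % error_pct)   # CNN Error: 1.27%
```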
In this guide, you have learned how to build a simple convolutional neural network using the high-performing deep learning library keras. You also learned about the different parameters that can be tuned depending on the problem statement and the data.
To learn more about building deep learning algorithms using the keras library, please refer to the following guides: