Consider the following sequence of numbers. How difficult are they for you to comprehend?
Figuring out this sequence of numbers is easy, even though the resolution is low and the shapes of the digits are irregular, thanks to our brains, which make this process feel natural. We should, of course, also give ourselves some credit, having spent years learning and using these digits in our day-to-day lives.
In the 21st century, is it possible to imitate the human brain? Yes, with artificial intelligence, or more specifically deep learning algorithms, which allow computers to learn and perform tasks that feel natural and effortless to human brains—tasks like handwritten digit recognition.
In this guide, we will use Neural Networks (NN) to develop a handwritten digit classifier from scratch using PyTorch.
In PyTorch, a matrix (array) is called a tensor. Tensors are arrays of numbers that obey definite transformation rules. PyTorch tensors are much like NumPy arrays: they are just n-dimensional arrays built for numeric computation, which by themselves know nothing about deep learning, gradients, or computational graphs. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (an RGB color image, for example).
The main difference between a NumPy array and a PyTorch tensor is that a PyTorch tensor can run on either the CPU or the GPU, while a NumPy array has no GPU backend. To work on the GPU, we need to move our tensor to the CUDA device.
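For instance, here is a minimal sketch (the values are arbitrary placeholders) of tensors of different ranks, with one moved to the GPU only when CUDA is available:

import torch

vector = torch.tensor([1.0, 2.0, 3.0])  # 1-dimensional tensor (vector)
matrix = torch.rand(3, 4)               # 2-dimensional tensor (matrix)
image = torch.rand(3, 28, 28)           # 3-dimensional tensor (e.g., an RGB image)

# move a tensor to the CUDA device only when a GPU is present
device = 'cuda' if torch.cuda.is_available() else 'cpu'
matrix = matrix.to(device)
print(vector.dim(), matrix.device, image.dim())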
Data can tell you a lot if you ask the right questions.
To understand data, data scientists spend most of their time gathering and preprocessing datasets. The tasks that follow are comparatively easy.
In this guide, we will use the MNIST database, a collection of 70,000 handwritten digits split into 60,000 training images and 10,000 testing images. We will use PyTorch, as it provides clean, prepared data ready for use with minimal lines of code.
So let's begin by making the following imports.
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim
Our neural network requires inputs of a fixed size and scale. Hence, we need to define transformations to apply to our data before feeding it into the pipeline.
We do this using the torchvision library from PyTorch. It helps us transform our data (using torchvision.transforms) and load our dataset (using torchvision.datasets).
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                               ])
transforms.Compose(): This chains together all the transforms provided to it and applies them to the input one by one.
transforms.ToTensor(): This converts images into numbers that our system can understand. It takes a PIL.Image (RGB) or a numpy.ndarray of shape (H x W x C) in the range [0, 255] and converts it to a torch tensor of shape (C x H x W) in the range [0, 1].
transforms.Normalize(): This normalizes the tensor with the given mean and standard deviation. The values (0.5, 0.5) are estimates of the pixel mean and standard deviation; normalizing with them maps the range [0, 1] to [-1, 1].
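To make those ranges concrete, here is a small sketch using a synthetic grayscale image (the random array below is just an illustration, not part of MNIST):

import numpy as np
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# synthetic 28x28 grayscale image with pixel values in [0, 255]
fake_img = Image.fromarray(np.random.randint(0, 256, (28, 28), dtype=np.uint8))
tensor = transform(fake_img)
print(tensor.shape)                # torch.Size([1, 28, 28]), i.e., (C x H x W)
print(tensor.min(), tensor.max())  # roughly -1.0 and 1.0 after normalization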
Create a folder in the directory where you want to download the data. After downloading the data, load it into a DataLoader. At each epoch, the DataLoader shuffles the images, which makes the model more robust by presenting the data in a new order. batch_size denotes the number of samples contained in each generated batch.
We will do some data analysis to check if our image and label shapes match each other.
trainset = datasets.MNIST(r'..\input\MNIST', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = datasets.MNIST(r'..\input\MNIST', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
dataiter = iter(trainloader)     # creating an iterator
images, labels = next(dataiter)  # images holds the image batch, labels the digits (0 to 9)

print(images.shape)
print(labels.shape)
torch.Size([64, 1, 28, 28]): This shows that there are 64 images in each batch, each with 1 color channel and 28x28 pixel dimensions.
torch.Size([64]): The 64 images have 64 labels associated with them.
Let's display one random image from the training set.
plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');
That's great! Now let's display a grid of random images in the data set. You will get a glimpse of what the dataset will look like before being fed to the neural network.
figure = plt.figure()
num_of_images = 60
for index in range(1, num_of_images + 1):
    plt.subplot(6, 10, index)
    plt.axis('off')
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')
A neural network is simply a function that fits some data; it is built from units called neurons. Each neuron has some number of weighted inputs. These weighted inputs are summed together (a linear combination) and then passed through an activation function to get the neuron's output. A single neuron has no advantage in deep learning. Hence, multiple neurons are combined to form a neural network, each neuron computing:

y = f(w1*x1 + w2*x2 + ... + wn*xn + b)

where the xi are the inputs, the wi are the weights, b is the bias, and f is the activation function.
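As a sketch, a single neuron's forward pass can be written in a few lines of plain torch (the weights and bias below are random placeholders, not learned values):

import torch

x = torch.rand(784)      # a flattened 28x28 input image
w = torch.randn(784)     # one weight per input (random placeholder)
b = torch.randn(1)       # bias term (random placeholder)

z = torch.dot(w, x) + b  # the weighted sum: a linear combination of the inputs
y = torch.relu(z)        # the activation function applied to that sum
print(y)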
PyTorch provides an easy way to build networks like this. By using nn.Sequential, a tensor is passed sequentially through the listed operations.
It wraps the network up into three linear layers with ReLU and Tanh activation functions. softmax often produces probabilities extremely close to 0 or 1, which floating-point numbers cannot represent accurately. Hence, it's more convenient to build the model with a log-softmax output using nn.LogSoftmax.
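A tiny sketch shows why (the scores below are made up to force the extreme case): taking the log of a plain softmax can underflow to negative infinity, while log-softmax, computed in log space, stays finite.

import torch
import torch.nn.functional as F

scores = torch.tensor([[200.0, 0.0]])            # made-up extreme raw scores

naive = torch.log(torch.softmax(scores, dim=1))  # softmax underflows to 0, so log gives -inf
stable = F.log_softmax(scores, dim=1)            # computed in log space, stays finite

print(naive)   # tensor([[0., -inf]])
print(stable)  # tensor([[0., -200.]])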
# Model creation with neural net Sequential model
model = nn.Sequential(nn.Linear(784, 128),   # layer 1: 784 inputs, 128 outputs
                      nn.ReLU(),             # rectified linear unit as activation
                      nn.Linear(128, 64),    # layer 2: 128 inputs, 64 outputs
                      nn.Tanh(),             # hyperbolic tangent as activation
                      nn.Linear(64, 10),     # layer 3: 64 inputs, 10 outputs (digits 0-9)
                      nn.LogSoftmax(dim=1))  # log-softmax to get log-probabilities for the output units

print(model)
To calculate losses in PyTorch, we will use the nn module and define the negative log-likelihood loss. Likelihood refers to the chance of certain calculated parameters producing certain known data. Note that nn.CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() in one single class; its input is the raw scores (logits) of each class. Since our model already ends in nn.LogSoftmax, we use nn.NLLLoss() as the criterion. If your network instead outputs raw scores followed by a softmax, you will likely use cross-entropy loss.
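As a quick sketch of that equivalence (the scores and labels below are made up), cross-entropy on raw scores matches negative log-likelihood on log-softmax outputs:

import torch
from torch import nn

scores = torch.randn(4, 10)          # made-up raw outputs for a batch of 4 images
labels = torch.tensor([3, 7, 0, 9])  # made-up target digits

loss_ce = nn.CrossEntropyLoss()(scores, labels)
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(scores), labels)

print(loss_ce, loss_nll)             # the two values match (up to floating-point rounding)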
To calculate the loss, first define the criterion, then pass the output of your network with the correct labels.
# defining the negative log-likelihood loss for calculating loss
criterion = nn.NLLLoss()
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)

logps = model(images)            # log probabilities
loss = criterion(logps, labels)  # calculate the NLL loss
The autograd module automatically calculates gradients of tensors. The gradient of a weight measures how a small change in that weight changes the loss, and it is calculated using backpropagation. The gradient is then used, scaled by a learning rate, to update the weight, reducing the overall loss and training the neural net.
Before the backward pass, a parameter's .grad attribute holds None. It continues to hold None until the .backward() function is called from some downstream node (here, the loss). That call calculates the gradient of the loss with respect to every weight.
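Here is a minimal sketch of this behavior on a single scalar (the function y = x**2 is just an illustration):

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2

print(x.grad)  # None: no backward pass has run yet
y.backward()   # compute dy/dx by backpropagation
print(x.grad)  # tensor(4.), since dy/dx = 2x = 4 at x = 2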
print('Before backward pass: \n', model[0].weight.grad)
loss.backward()  # calculate gradients of the parameters
print('After backward pass: \n', model[0].weight.grad)
The optimizer updates the parameters based on the gradients computed during the backward pass. The result is visible as a gradual decrease in training loss with each epoch.
# defining the optimizer with stochastic gradient descent and default parameters
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

print('Initial weights - ', model[0].weight)

images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)

# Clear the gradients, because gradients are accumulated across backward passes
optimizer.zero_grad()

# Forward pass
output = model(images)
loss = criterion(output, labels)

# Backward pass to compute the gradients
loss.backward()
print('Gradient -', model[0].weight.grad)
time0 = time()
epochs = 15  # total number of passes over the training set
running_loss_list = []
epochs_list = []

for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flattening MNIST images to shape [batch_size, 784]
        images = images.view(images.shape[0], -1)

        # resetting the accumulated gradients to 0 for each batch
        optimizer.zero_grad()

        # forward pass for each image batch
        output = model(images)

        # calculating the loss
        loss = criterion(output, labels)

        # This is where the model learns, by backpropagating
        loss.backward()

        # And optimizes its weights here
        optimizer.step()

        # accumulating the loss
        running_loss += loss.item()

    else:
        print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))

print("\nTraining Time (in minutes) =", (time()-time0)/60)
We are almost there. We have calculated the loss, done the backward pass, and updated the weights, and the training loss is decreasing nicely. Before we put the model through the loop for all the images, let's validate our result using one image.
The function classify displays the image and the predicted class probabilities in the form of a bar graph.
def classify(img, ps):
    '''
    Function for viewing an image and its predicted classes.
    '''
    ps = ps.detach().numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6, 9), ncols=2)
    ax1.imshow(img.view(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()
images, labels = next(iter(testloader))
# replace testloader with trainloader to check training accuracy

img = images[0].view(1, 784)
# Turn off gradients to speed up this part
with torch.no_grad():
    logpb = model(img)

# The network outputs log-probabilities; take the exponential for probabilities
pb = torch.exp(logpb)
probab = list(pb.numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
classify(img.view(1, 28, 28), pb)
Our model is working! Now let's iterate through the validation set using the loop to calculate the total number of correct predictions and accuracy of the model.
correct_count, all_count = 0, 0
for images, labels in testloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784)

        with torch.no_grad():
            logps = model(img)

        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if true_label == pred_label:
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
torch.save(model, 'path/to/save/my_mnist_model.pt')  # or use the .pth extension
HURRAY! We have over 97.2% accuracy. We don't need to train the model every time: PyTorch lets us save the model so that, in the future, we can load it and use it directly.
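As a minimal sketch of reloading it (the path is the same placeholder used above; on recent PyTorch versions, loading a fully pickled model like this may also require passing weights_only=False to torch.load):

# load the saved model back and switch it to evaluation mode
model = torch.load('path/to/save/my_mnist_model.pt')
model.eval()

with torch.no_grad():
    logps = model(img.view(1, 784))         # img: any flattened 28x28 image tensor, as earlier
    print(torch.exp(logps).argmax(dim=1))   # the predicted digit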
The volume of training data made it possible for our model to read even barely recognizable digits. With PyTorch, we were able to concentrate more on developing our model than on cleaning the data. The field is now yours. Experiment more on the MNIST dataset by adding hidden layers to the network, applying a different combination of activation functions, or increasing the number of epochs, and see how it affects the accuracy of the test data.
For any questions regarding this guide, feel free to reach out to me at CodeAlphabet.