Introduction


Consider the following sequence of numbers. How difficult are they for you to comprehend?

Figuring out this sequence of numbers is easy, even though the resolution is distorted and the shape of the digits is irregular. We can thank our brains, which make this process feel natural. We should, of course, also thank ourselves for the years spent learning and applying numbers in our day-to-day lives.

In the 21st century, is it possible to imitate the human brain? Yes, with artificial intelligence, or deep learning algorithms, which allow computers to learn and perform tasks that seem quite natural and repetitive to human brains—tasks like handwritten digit recognition.

In this guide, we will use *Neural Networks (NN)* to develop a handwritten digit classifier from scratch using PyTorch.

In PyTorch, a multi-dimensional array is called a *tensor*. PyTorch tensors are like NumPy arrays: n-dimensional arrays of numbers built for numeric computation. Unlike NumPy arrays, which know nothing about deep learning, gradients, or computational graphs, tensors can also record the operations applied to them so that gradients can be computed automatically. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (an RGB color image, for example).

A key practical difference between a NumPy array and a PyTorch tensor is that a tensor can live on either the CPU or the GPU, whereas NumPy has no GPU backend. To work on the GPU, we need to move our tensor to a CUDA device.
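As a minimal sketch of these ideas, the snippet below creates tensors of different dimensions, converts between NumPy and PyTorch, and moves a tensor to the GPU only if one is available. The variable names are illustrative and not part of the guide's later code.

```python
import numpy as np
import torch

# A 1-D tensor (vector), a 2-D tensor (matrix), and a 3-D tensor (a C x H x W image).
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.zeros(2, 3)
image = torch.rand(3, 28, 28)
print(vector.ndim, matrix.ndim, image.ndim)

# Converting between NumPy arrays and tensors.
array = np.arange(6, dtype=np.float32).reshape(2, 3)
from_numpy = torch.from_numpy(array)   # shares memory with `array`
back_to_numpy = from_numpy.numpy()

# Use the GPU only if one is available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = from_numpy.to(device)
print(on_device.device)
```

On a machine without a GPU, the last line simply reports `cpu`, so the same code runs everywhere.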

Data can tell you a lot if you ask the right questions.

To understand data, data scientists spend most of their time gathering datasets and preprocessing them. Further tasks are comparatively easy.

In this guide, we will use the MNIST database, a collection of 70,000 handwritten digits split into 60,000 training images and 10,000 testing images. We will use PyTorch because its torchvision package provides the data cleaned and prepared, ready to use with minimal lines of code.

So let's begin by making the following imports.

```python
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim
```

Neural networks require their inputs to be numeric tensors of a fixed size. MNIST images all share the same 28x28 size, but the dataset yields them as PIL images rather than as normalized tensors. Hence, we need to define transformations for our data before feeding it into the pipeline.

We do this using the torchvision library that accompanies PyTorch. It helps us transform (using `torchvision.transforms`) and load (using `torchvision.datasets`) our dataset.

```python
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
```

`transforms.Compose()`: This chains all the transforms provided to it. The transforms are applied to each input one after another, in order.

`transforms.ToTensor()`: This converts images into numbers that our system can work with. It takes a `PIL.Image` or a `numpy.ndarray` of shape (H x W x C) with values in the range `[0, 255]` and returns a tensor of shape (C x H x W) with values in the range `[0, 1]`. Images are now converted to *Torch Tensors*.

`transforms.Normalize()`: This subtracts a mean from each channel and divides by a standard deviation. The values passed here, 0.5 and 0.5, are rough estimates of the mean and standard deviation, one per channel. They map pixel values from `[0, 1]` to `[-1, 1]`.
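To make the arithmetic concrete, here is a small sketch that reproduces the two steps by hand on a made-up 2x2 grayscale image, using plain tensor operations rather than torchvision itself. It assumes the usual behavior for 8-bit images: scaling by 255, then computing `(x - mean) / std`.

```python
import torch

# Hypothetical 8-bit grayscale pixel values (H x W = 2 x 2).
raw_pixels = torch.tensor([[0, 64], [128, 255]], dtype=torch.uint8)

# Step 1 (what ToTensor does for 8-bit images): add a channel axis, scale to [0, 1].
scaled = raw_pixels.unsqueeze(0).float() / 255.0   # shape (1, 2, 2)

# Step 2 (what Normalize((0.5,), (0.5,)) does): (x - mean) / std per channel.
mean, std = 0.5, 0.5
normalized = (scaled - mean) / std

print(scaled.shape)
print(normalized.min().item(), normalized.max().item())
```

The darkest pixel (0) ends up at -1 and the brightest (255) at 1, which centers the inputs around zero.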

Create a folder in the directory where you want to download the data. After downloading the data, load it into a `DataLoader`. With `shuffle=True`, the `DataLoader` reshuffles the images at every epoch, so the model sees them in a new order each time, which makes training more robust.

`batch_size` denotes the number of samples contained in each generated batch.
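To see how `batch_size` divides a dataset without downloading anything, we can sketch the same idea with randomly generated stand-in data. The `TensorDataset` of 100 fake images below is purely illustrative and is not the MNIST data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 100 fake 1 x 28 x 28 "images" with labels 0-9.
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Collect the size of every batch produced in one epoch.
batch_sizes = [images.shape[0] for images, labels in loader]
print(len(loader), batch_sizes)
```

With 100 samples and a batch size of 32, one epoch yields three full batches and a final, smaller batch of 4.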

We will do some data analysis to check if our image and label shapes match each other.

```python
trainset = datasets.MNIST(r'..\input\MNIST', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = datasets.MNIST(r'..\input\MNIST', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
```

```python
dataiter = iter(trainloader)      # create an iterator over the batches
images, labels = next(dataiter)   # images: pixel data; labels: digit classes (0 to 9)
print(images.shape)
print(labels.shape)
```

`torch.Size([64, 1, 28, 28])`: This shows that there are 64 images in each batch, and each image has 1 color channel and 28x28 pixels.

`torch.Size([64])`: The 64 images have 64 labels associated with them.

Let's display one random image from the training set.

```python
plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');
```

That's great! Now let's display a grid of random images in the data set. You will get a glimpse of what the dataset will look like before being fed to the neural network.

```python
figure = plt.figure()
num_of_images = 60
for index in range(1, num_of_images + 1):
    plt.subplot(6, 10, index)
    plt.axis('off')
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')
```

A *neural network* is simply a function that fits some data. It is built from units typically called *neurons*. Each neuron has some number of weighted inputs. These weighted inputs are summed together (a linear combination) and then passed through an activation function to get the unit's output. A single neuron on its own has little expressive power, so multiple neurons are combined to form a neural network. Each neuron computes an equation of this form, where $x_i$ are the inputs, $w_i$ the weights, $b$ a bias term, and $f$ the activation function:

$$y = f\left(\sum_i w_i x_i + b\right)$$
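The computation of a single neuron can be sketched in a few lines. The input values, weights, and the bias term `b` below are made up for illustration, and ReLU is just one possible choice of activation function.

```python
import torch

# Hypothetical inputs, weights, and bias for one neuron with three inputs.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 0.25])
b = torch.tensor(0.1)

z = torch.dot(w, x) + b   # weighted sum: 0.5 - 2.0 + 0.75 + 0.1 = -0.65
y = torch.relu(z)         # ReLU replaces a negative sum with 0

print(z.item(), y.item())
```

A layer such as `nn.Linear` performs this weighted sum for many neurons at once, as a single matrix multiplication.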

PyTorch provides an easy way to build networks like this. With `nn.Sequential`, a tensor is passed sequentially through a list of operations.

The model below stacks three linear layers, with `ReLU` and `Tanh` activation functions between them. Very often, `softmax` produces probabilities extremely close to 0 and 1, and floating-point numbers cannot represent such values accurately. Hence, it is more convenient to build the model with a `log-softmax` output using `nn.LogSoftmax`.
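The precision problem is easy to reproduce. In this sketch, the logits are deliberately extreme, made-up values: taking the log of the softmax output underflows, while computing the log-softmax directly stays finite.

```python
import torch

# Hypothetical raw scores (logits) with one class far ahead of the others.
logits = torch.tensor([[0.0, 200.0, -200.0]])

probs = torch.softmax(logits, dim=1)
naive_log = torch.log(probs)                    # log of values that underflowed to 0
stable_log = torch.log_softmax(logits, dim=1)   # computed directly in log space

print(probs)
print(naive_log)
print(stable_log)
```

The naive version contains `-inf` entries, which would break the loss, whereas the log-softmax version keeps every entry finite.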

```python
# Build the network as a Sequential model
model = nn.Sequential(
    nn.Linear(784, 128),   # layer 1: 784 inputs, 128 outputs
    nn.ReLU(),             # rectified linear unit activation
    nn.Linear(128, 64),    # layer 2: 128 inputs, 64 outputs
    nn.Tanh(),             # hyperbolic tangent activation
    nn.Linear(64, 10),     # layer 3: 64 inputs, 10 outputs (digits 0-9)
    nn.LogSoftmax(dim=1),  # log-probabilities over the 10 classes
)
print(model)
```

To calculate losses in PyTorch, we will use the `torch.nn` module and define the negative log-likelihood loss. *Likelihood* refers to the chance of certain calculated parameters producing certain known data. Note that `nn.NLLLoss()` expects log-probabilities as input, which is why the model ends with `nn.LogSoftmax`. If the model instead returned the raw scores of each class, you would use `nn.CrossEntropyLoss()`, which combines `nn.LogSoftmax()` and `nn.NLLLoss()` in one single class.
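The relationship between these loss functions can be checked numerically. The sketch below uses random, made-up scores and labels: applying `nn.NLLLoss` to log-softmax outputs gives the same value as applying `nn.CrossEntropyLoss` to the raw scores.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # hypothetical true labels

# Route 1: log-softmax followed by negative log-likelihood loss.
nll_on_log_probs = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Route 2: cross-entropy loss applied directly to the raw scores.
cross_entropy = nn.CrossEntropyLoss()(logits, targets)

print(nll_on_log_probs.item(), cross_entropy.item())
```

Either route works, as long as the model's last layer matches the loss: log-probabilities for `nn.NLLLoss`, raw scores for `nn.CrossEntropyLoss`.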

To calculate the loss, first define the criterion, then pass the output of your network with the correct labels.

```python
# Define the negative log-likelihood loss
criterion = nn.NLLLoss()
```

```python
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)   # flatten each image to 784 values

logps = model(images)             # log-probabilities
loss = criterion(logps, labels)   # NLL loss
```

The autograd module *auto*matically calculates the *grad*ients of tensors. The *gradient* of a weight describes how much the loss changes for a small change in that weight, and it is calculated using *backpropagation*. The gradient is then used to update the weight, scaled by a learning rate. Repeating this reduces the overall loss and trains the neural net.

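Before the backward pass, the `.grad` attribute of a weight holds `None`. It continues to hold `None` until the `.backward()` function is called on a tensor computed from that weight, such as the loss. That call calculates the gradient of the loss with respect to the weight. For a tensor with `requires_grad=False`, `.grad` stays `None` because no gradient is tracked.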
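A tiny example with hand-checkable numbers shows this behavior. The tensors `w` and `x` and the toy loss are made up for illustration and are unrelated to the model's weights.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])   # requires_grad defaults to False

loss = ((w * x).sum()) ** 2    # (1*3 + 2*4)^2 = 121
grad_before = w.grad           # still None: no backward pass yet

loss.backward()
# d(loss)/dw = 2 * (w . x) * x = 2 * 11 * [3, 4] = [66, 88]
print(grad_before, w.grad, x.grad)
```

After the backward pass, `w.grad` holds the gradient, while `x.grad` remains `None` because `x` was never tracked.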

```python
print('Before backward pass: \n', model[0].weight.grad)
loss.backward()   # calculate the gradients of the parameters
print('After backward pass: \n', model[0].weight.grad)
```

The optimizer updates the parameters based on the gradients that backpropagation computed. The result is visible as a gradual decrease in training loss with each epoch.
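To see what a single update does, the sketch below compares `optim.SGD` against the update rule written out by hand. It deliberately uses plain SGD without momentum and a made-up two-element weight tensor so that the arithmetic stays easy to follow. This is a simplification of the optimizer configured in this guide.

```python
import torch
from torch import optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)      # no momentum, to keep the arithmetic simple

loss = (w ** 2).sum()                   # gradient is 2 * w = [2, -4]
optimizer.zero_grad()
loss.backward()

# Plain SGD rule: new_w = w - lr * gradient = [1 - 0.2, -2 + 0.4]
expected = w.detach() - 0.1 * w.grad
optimizer.step()

print(w.detach(), expected)
```

With momentum enabled, the optimizer additionally blends in a running average of past gradients, so the numbers would differ after the first step.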

```python
# Define the optimizer: stochastic gradient descent with momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
print('Initial weights - ', model[0].weight)

images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)   # flatten to [64, 784]

# Clear the gradients, because they accumulate across backward passes
optimizer.zero_grad()

# Forward pass
output = model(images)
loss = criterion(output, labels)

# Backward pass: compute the gradients
loss.backward()
print('Gradient -', model[0].weight.grad)
```

```python
time0 = time()
epochs = 15   # number of passes over the training set
running_loss_list = []
epochs_list = []

for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images to shape [batch_size, 784]
        images = images.view(images.shape[0], -1)

        # Reset the gradients for each batch
        optimizer.zero_grad()

        # Forward pass for the batch
        output = model(images)

        # Calculate the loss
        loss = criterion(output, labels)

        # This is where the model learns, by backpropagating
        loss.backward()

        # And optimizes its weights here
        optimizer.step()

        # Accumulate the loss
        running_loss += loss.item()
    else:
        # Runs once the inner loop has gone through every batch
        print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))

print("\nTraining Time (in minutes) =", (time()-time0)/60)
```

We are almost there. We have calculated the loss, done the backward pass, and updated the weights, and the training loss looks good. Before we loop over all the test images, let's validate our result using one image.

The function `classify` displays the image and the predicted probabilities in the form of a bar graph.

```python
def classify(img, ps):
    '''Function for viewing an image and its predicted classes.'''
    ps = ps.data.numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6, 9), ncols=2)
    ax1.imshow(img.view(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()
```

```python
images, labels = next(iter(testloader))   # use trainloader instead to check a training image
img = images[0].view(1, 784)

# Turn off gradients to speed up this part
with torch.no_grad():
    logpb = model(img)

# The network outputs log-probabilities; take the exponential to get probabilities
pb = torch.exp(logpb)
probab = list(pb.numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
classify(img.view(1, 28, 28), pb)
```

Our model is working! Now let's iterate through the test set in a loop to calculate the total number of correct predictions and the accuracy of the model.

```python
correct_count, all_count = 0, 0
for images, labels in testloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784)
        with torch.no_grad():
            logps = model(img)

        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if true_label == pred_label:
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
```
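The same count can also be computed a whole batch at a time with `argmax`, which avoids the inner per-image loop. The helper `batch_accuracy` and the three-class sample values below are hypothetical, chosen only so the result is easy to verify by eye.

```python
import torch

def batch_accuracy(log_probs, labels):
    """Return (number correct, batch size) for one batch of log-probabilities."""
    predictions = log_probs.argmax(dim=1)   # index of the most likely class per row
    correct = (predictions == labels).sum().item()
    return correct, labels.shape[0]

# Hypothetical log-probabilities for 3 samples and 3 classes.
log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.8, 0.1],
                                    [0.3, 0.3, 0.4]]))
labels = torch.tensor([0, 1, 0])

correct, total = batch_accuracy(log_probs, labels)
print(correct, total)
```

Because the logarithm preserves ordering, taking `argmax` of the log-probabilities gives the same prediction as taking it of the probabilities, so the `torch.exp` step is not needed for this purpose.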

HURRAY! We have over 97.2% accuracy. We don't need to train the model every time. PyTorch can save our model so that, in the future, we can load it and use it directly.

```python
torch.save(model, 'path/to/save/my_mnist_model.pt')   # or use a .pth extension
```
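A commonly used alternative is to save only the model's parameters (its `state_dict`) and load them into a freshly built network. The sketch below illustrates that round trip. The helper `build_model`, the temporary file location, and the untrained weights are all assumptions made for the example.

```python
import os
import tempfile

import torch
from torch import nn

def build_model():
    # Same layer sizes as the guide's network, rebuilt so weights can be loaded into it.
    return nn.Sequential(
        nn.Linear(784, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.Tanh(),
        nn.Linear(64, 10), nn.LogSoftmax(dim=1),
    )

fresh_model = build_model()   # stands in for a trained model
path = os.path.join(tempfile.mkdtemp(), "mnist_state.pt")   # hypothetical location
torch.save(fresh_model.state_dict(), path)

restored = build_model()
restored.load_state_dict(torch.load(path))
restored.eval()   # switch to inference mode

sample = torch.rand(1, 784)
with torch.no_grad():
    same_output = torch.allclose(fresh_model(sample), restored(sample))
print(same_output)
```

Saving the `state_dict` keeps the file independent of how the model object was defined, at the cost of having to rebuild the architecture in code before loading.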

The volume of data made it easy for our model to read even barely recognizable digits. With PyTorch, we were able to concentrate more on developing our model than on cleaning the data. The field is now yours. Experiment more on the MNIST dataset by adding hidden layers to the network, applying a different combination of activation functions, or increasing the number of epochs, and see how it affects the accuracy on the test data.

For any questions regarding this guide, feel free to reach out to me at CodeAlphabet.
