
Building Your First PyTorch Solution

Gaurav Singhal

  • Feb 5, 2020
  • 14 Min read
  • 10,422 Views

Introduction

Consider the following sequence of numbers. How difficult are they for you to comprehend?

[Image: a sequence of distorted, irregularly shaped handwritten digits]

Figuring out this sequence of numbers is easy, even though the resolution is distorted and the shapes of the digits are irregular. Thanks to our brains, this process feels natural. We should, of course, also give ourselves credit, having spent years learning and using numbers in our day-to-day lives.

In the 21st century, is it possible to imitate the human brain? Yes, with artificial intelligence and deep learning algorithms, which allow computers to learn and perform tasks that feel natural and repetitive to human brains, such as handwritten digit recognition.

PyTorch

In this guide, we will use Neural Networks (NN) to develop a handwritten digit classifier from scratch using PyTorch.

In PyTorch, a matrix (array) is called a tensor. Tensors are arrays of numbers or functions that obey definite transformation rules. PyTorch tensors are similar to NumPy arrays: they are just n-dimensional arrays for numeric computation that, by themselves, know nothing about deep learning, gradients, or computational graphs. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices, such as an RGB color image, is a 3-dimensional tensor.

A key difference between a NumPy array and a PyTorch tensor is that a PyTorch tensor can run on a CPU or a GPU, while a NumPy array has no GPU backend. To work on a GPU, we need to move the tensor to a CUDA device.
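
For example, here is a minimal sketch of creating a tensor and moving it to the GPU when one is available. The device check and variable names are illustrative, not part of the guide's training code.

import torch

# create a 2-dimensional tensor (a matrix)
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# move it to the GPU if CUDA is available, otherwise keep it on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = x.to(device)

print(x.device) # 'cuda:0' on a GPU machine, 'cpu' otherwise
python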

Know Your Data

Data can tell you a lot if you ask the right questions.

To understand data, data scientists spend most of their time gathering and preprocessing datasets. The tasks that follow are comparatively easy.

In this guide, we will use the MNIST database, a collection of 70,000 handwritten digits split into 60,000 training images and 10,000 testing images. We will load it through PyTorch, which provides the data clean, prepared, and ready for use with minimal lines of code.

So let's begin by making the following imports.

import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim
python

We have a common problem here: raw image data is not in a form a network can consume directly. Neural networks require fixed-size, numeric, normalized inputs. Hence, we need to define transformations for our data before feeding it into the pipeline.

We do this using the torchvision library from torch. It will help us transform the data (using torchvision.transforms) and load our dataset (using torchvision.datasets).

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                                ])
python

transforms.Compose(): This chains together all the transforms provided to it. The composed transforms are applied to the input one by one.

transforms.ToTensor(): This converts images into numbers that our system can understand. It takes a PIL.Image (RGB) or numpy.ndarray of shape (H x W x C) with values in the range [0, 255] and returns a torch tensor of shape (C x H x W) with values scaled to the range [0, 1].

transforms.Normalize(): This normalizes each channel using the given mean and standard deviation. Here, (0.5,) and (0.5,) are estimates for MNIST's single grayscale channel, which map pixel values from [0, 1] to [-1, 1].
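
To see what this does, note that each pixel x is mapped to (x - mean) / std. Here is a small sketch with a made-up 2x2 "image" (the values are illustrative):

import torch
from torchvision import transforms

normalize = transforms.Normalize((0.5,), (0.5,))

# a fake 1x2x2 grayscale image, already scaled to [0, 1] as ToTensor would produce
x = torch.tensor([[[0.0, 0.25], [0.5, 1.0]]])

# (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
print(normalize(x)) # tensor([[[-1.0, -0.5], [0.0, 1.0]]])
python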

Explore Your Data

Create a folder in the directory where you want to download the data. After downloading the data, load it into a DataLoader. At each epoch, the DataLoader shuffles the images, making the model more robust by giving it a new order of exploration.

batch_size denotes the number of samples contained in each generated batch.

We will do some data analysis to check if our image and label shapes match each other.

trainset = datasets.MNIST(r'..\input\MNIST', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = datasets.MNIST(r'..\input\MNIST', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
python
dataiter = iter(trainloader) # create an iterator over batches
images, labels = next(dataiter) # images holds one image batch, labels the digit (0 to 9) for each image

print(images.shape)
print(labels.shape)
python

[Output: torch.Size([64, 1, 28, 28]) and torch.Size([64])]

torch.Size([64, 1, 28, 28]): This shows that there are 64 images in each batch, each with 1 color channel and 28x28 pixel dimensions.

torch.Size([64]): 64 images should have 64 labels associated with them.

Let's display one random image from the training set.

plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');
python

[Image: a single handwritten digit from the training set]

That's great! Now let's display a grid of random images in the data set. You will get a glimpse of what the dataset will look like before being fed to the neural network.

figure = plt.figure()
num_of_images = 60
for index in range(1, num_of_images + 1):
    plt.subplot(6, 10, index)
    plt.axis('off')
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')
python

[Image: a 6x10 grid of sample handwritten digits from the dataset]

Building a Neural Network

A neural network is simply a function that fits some data. It is built from units called neurons. Each neuron has some number of weighted inputs. These weighted inputs are summed together (a linear combination) and then passed through an activation function to get the unit's output. A single neuron has little power on its own in deep learning, so multiple neurons are combined to form a neural network using this equation:

[Equation: y = f(w1*x1 + w2*x2 + ... + wn*xn + b), a weighted sum of the inputs passed through an activation function f]
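
To make this concrete, here is a minimal sketch of a single neuron computed by hand in PyTorch. The weights, bias, and inputs are made up for illustration:

import torch

x = torch.tensor([0.2, 0.7, 0.1])  # three inputs
w = torch.tensor([0.4, -0.3, 0.9]) # one weight per input
b = torch.tensor(0.1)              # bias term

# linear combination of the weighted inputs, then an activation function
z = torch.dot(w, x) + b
y = torch.relu(z) # ReLU activation, as used in our network below

print(y)
python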

PyTorch provides an easy way to build networks like this. By using nn.Sequential, a tensor is passed sequentially through operations.

The network wraps up three linear layers with ReLU and Tanh activation functions. Very often, softmax produces probabilities extremely close to 0 or 1, and floating-point numbers cannot accurately represent values that close to 0 and 1. Hence, it's more convenient to build the model with a log-softmax output using nn.LogSoftmax.

[Image: the network architecture, with 784 inputs, hidden layers of 128 and 64 units, and 10 output units]

# Model creation with neural net Sequential model
model = nn.Sequential(nn.Linear(784, 128),  # layer 1: 784 inputs, 128 outputs
                      nn.ReLU(),            # rectified linear unit activation
                      nn.Linear(128, 64),   # layer 2: 128 inputs, 64 outputs
                      nn.Tanh(),            # hyperbolic tangent activation
                      nn.Linear(64, 10),    # layer 3: 64 inputs, 10 outputs (digits 0-9)
                      nn.LogSoftmax(dim=1)  # log-softmax to get log-probabilities for the output units
                     )

print(model)
python

[Output: the printed Sequential model, listing its six layers in order]

Loss in PyTorch

To calculate losses in PyTorch, we will use the nn module and define the negative log-likelihood loss. Likelihood refers to the chance of certain calculated parameters producing certain known data. Note that nn.CrossEntropyLoss is the criterion that combines nn.LogSoftmax() and nn.NLLLoss() in one single class, taking the raw scores of each class as input. Because our model already ends with nn.LogSoftmax, we use nn.NLLLoss here; with a plain softmax output, you would typically use cross-entropy loss instead.
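
To make the relationship concrete, here is a small sketch (using random scores) checking that nn.NLLLoss applied to log-softmax outputs matches nn.CrossEntropyLoss applied to the raw scores:

import torch
from torch import nn

logits = torch.randn(4, 10)          # raw scores for a batch of 4 samples, 10 classes
targets = torch.tensor([3, 7, 0, 1]) # the correct class for each sample

# path 1: log-softmax followed by negative log-likelihood (as in our model)
log_probs = nn.LogSoftmax(dim=1)(logits)
loss1 = nn.NLLLoss()(log_probs, targets)

# path 2: cross-entropy applied directly to the raw scores
loss2 = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(loss1, loss2)) # True
python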

To calculate the loss, first define the criterion, then pass the output of your network with the correct labels.

# defining the negative log-likelihood loss for calculating loss
criterion = nn.NLLLoss()
python
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)

logps = model(images) # log probabilities
loss = criterion(logps, labels) # calculate the NLL loss
python

Autograd and Weights

The autograd module automatically calculates gradients of tensors. The gradient of a weight is how much the loss changes for a small change in that weight, and it is calculated using backpropagation. The gradient is then used, together with a learning rate, to update the weights. This reduces the overall loss and trains the neural net.
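
Here is a minimal sketch of autograd at work on a single value (the function and numbers are illustrative):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 # y = x^2, so dy/dx = 2x

y.backward()  # backpropagate to compute the gradient
print(x.grad) # tensor(6.), i.e., 2 * 3.0
python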

Before the backward pass, a parameter's .grad attribute holds None. It continues to hold None until the .backward() function is called on the loss, which calculates the gradient of the loss with respect to each weight.

print('Before backward pass: \n', model[0].weight.grad)
loss.backward() # calculate gradients of the parameters
print('After backward pass: \n', model[0].weight.grad)
python

[Output: None before the backward pass, and a gradient tensor after it]

Training the Neural Network (Learning)

The optimizer will update the parameters based on the gradients computed during backpropagation. The result is visible as a gradual decrease in training loss with each epoch.
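
Under the hood, plain SGD updates every parameter with the rule weight = weight - learning_rate * gradient. As a rough sketch of what optimizer.step() does (assuming the model and gradients from above, and ignoring momentum):

learning_rate = 0.01
# hand-rolled version of a plain SGD step
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= learning_rate * param.grad
python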

# defining the optimizer with stochastic gradient descent and default parameters
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

print('Initial weights - ', model[0].weight)

images, labels = next(iter(trainloader))
images.resize_(64, 784)

# clear the gradients, because gradients are accumulated across backward passes
optimizer.zero_grad()

# forward pass
output = model(images)
loss = criterion(output, labels)

# backward pass to compute the gradients
loss.backward()
print('Gradient -', model[0].weight.grad)
python

[Output: the initial weights and their gradients]

time0 = time()
epochs = 15 # total number of passes over the training data
running_loss_list = []
epochs_list = []

for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # flatten MNIST images into a [64, 784] tensor
        images = images.view(images.shape[0], -1)

        # reset the gradients to 0 before each batch
        optimizer.zero_grad()

        # forward pass for the image batch
        output = model(images)

        # calculating the loss
        loss = criterion(output, labels)

        # this is where the model learns, by backpropagating
        loss.backward()

        # and optimizes its weights here
        optimizer.step()

        # accumulating the running loss
        running_loss += loss.item()
    else:
        # runs once the inner loop finishes: record and report the epoch's average loss
        running_loss_list.append(running_loss/len(trainloader))
        epochs_list.append(e)
        print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))

print("\nTraining Time (in minutes) =", (time()-time0)/60)
python

[Output: the training loss printed for each of the 15 epochs, decreasing steadily, followed by the total training time]
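
Since the loop records the average loss of each epoch in running_loss_list, we can also plot the learning curve. This sketch assumes the training loop above has already run:

plt.plot(epochs_list, running_loss_list)
plt.xlabel('Epoch')
plt.ylabel('Average training loss')
plt.title('Training loss per epoch')
plt.show()
python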

Testing and Model Evaluation (Applying)

We are almost there. We have calculated the loss, run the backward pass, and updated the weights, and the training loss looks excellent. Before we put the algorithm through a loop over all the test images, let's validate our result using one image.

The function classify displays the image and the predicted probability in the form of a bar graph.

def classify(img, ps):
    '''
    Function for viewing an image and its predicted classes.
    '''
    ps = ps.data.numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()
python
images, labels = next(iter(testloader))
# use trainloader instead to check training accuracy

img = images[0].view(1, 784)
# turn off gradients to speed up this part
with torch.no_grad():
    logpb = model(img)

# the network outputs log-probabilities; take the exponential for probabilities
pb = torch.exp(logpb)
probab = list(pb.numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
classify(img.view(1, 28, 28), pb)
python

[Image: the test digit next to a bar chart of its class probabilities]

Our model is working! Now let's iterate through the validation set in a loop to count the total number of correct predictions and calculate the accuracy of the model.

correct_count, all_count = 0, 0
for images, labels in testloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784)

        with torch.no_grad():
            logps = model(img)

        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if true_label == pred_label:
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
python

[Output: the number of images tested and the model accuracy]

torch.save(model, 'path/to/save/my_mnist_model.pt') # or use the .pth extension
python

Hurray! We have over 97.2% accuracy. We also don't need to train the model every time: PyTorch can save our model so that we can load it in the future and use it directly.
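
Loading the saved model back is a single call. This sketch reuses the illustrative path from above; note that recent PyTorch versions may require torch.load(..., weights_only=False) to unpickle a full model object:

# load the model saved above and put it in evaluation mode for inference
model = torch.load('path/to/save/my_mnist_model.pt')
model.eval()
python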

Conclusion

The volume of data made it easy for our model to read even the most unrecognizable digits. With PyTorch, we were able to concentrate more on developing our model than on cleaning the data. The field is now yours. Experiment further with the MNIST dataset by adding hidden layers to the network, applying a different combination of activation functions, or increasing the number of epochs, and see how it affects the accuracy on the test data.

For any questions regarding this guide, feel free to reach out to me at CodeAlphabet.