
Building Deep Learning Networks with PyTorch

Mar 11, 2020 • 11 Minute Read

Introduction

Deep learning is one of the most popular topics in data science and artificial intelligence today. It is a sub-field of machine learning, comprising a set of algorithms based on learning representations of data. Deep learning has been applied in some of the most exciting technological innovations today, such as robotics, autonomous vehicles, computer vision, natural language processing, image recognition, and many more.

There are many deep learning libraries out there, but the most popular are TensorFlow, Keras, and PyTorch. We will be focusing on PyTorch, which is based on the Torch library. It is an open-source machine learning library primarily developed by Facebook's AI Research lab (FAIR). In this guide, you will learn to build a deep learning neural network with PyTorch.

Understanding Deep Neural Networks

Neural networks form the basis of deep learning, with algorithms inspired by the architecture of the human brain. Neural networks are made up of layers of neurons, which are the core processing units of the network. In simple terms, a neuron can be considered a mathematical approximation of a biological neuron.

The basic architecture of a deep learning neural network consists of three main components, illustrated with a short code sketch after this list.

  1. Input Layer: This is where the training observations are fed.

  2. Hidden Layers: These are the intermediate layers between the input and output layers. This is where the deep neural network learns the relationships in the data.

  3. Output Layer: This is the layer where the final output is extracted from what’s happening in the previous two layers. In the case of classification problems, the output layer emits one of the target classes as the prediction.
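
To make these three components concrete, here is a minimal sketch using PyTorch's nn.Sequential container. The layer sizes (784 inputs, 32 hidden units, 10 output classes) are illustrative choices for this sketch, not values used later in the guide:

import torch
import torch.nn as nn

# A minimal three-component network: input layer -> hidden layer -> output layer.
model = nn.Sequential(
    nn.Linear(784, 32),  # input layer: 784 features in, 32 out
    nn.ReLU(),           # nonlinearity applied to the hidden layer
    nn.Linear(32, 10)    # output layer: one unit per target class
)

x = torch.randn(1, 784)  # one fake observation
print(model(x).shape)    # torch.Size([1, 10])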

Setup

Let’s start by loading the required libraries.

import torch
import torchvision
from torchvision import transforms, datasets

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
    

Data

We will use the popular MNIST dataset in this guide. The MNIST dataset (Modified National Institute of Standards and Technology) is a large database of handwritten digits created by re-mixing samples from NIST's original datasets. It contains 60,000 training images and 10,000 test images and is a standard benchmark for image classification.

Each image in the dataset has dimensions of 28 by 28 pixels and contains a centered, grayscale digit. The model will take the image as input, and it will output one of the ten possible digits (0 through 9).

In PyTorch, the MNIST data is available through the torchvision library that was imported above. The first two statements of the code below prepare the datasets, while the last two use the torch.utils.data.DataLoader() function to prepare the data loaders for the training and testing datasets.

The argument batch_size=10 ensures that only 10 images are processed at a time. We are keeping the number small to reduce the processing time, but it can be increased. The num_workers argument specifies how many worker subprocesses are used to fetch the data.

train = torchvision.datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

test = torchvision.datasets.MNIST('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True, num_workers=2)

testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False, num_workers=2)
    

Having loaded the data into the environment and created the training and test sets, let us look at their shapes using the code below.

trainset_shape = trainset.dataset.data.shape  # .data replaces the deprecated train_data
testset_shape = testset.dataset.data.shape    # .data replaces the deprecated test_data

print(trainset_shape, testset_shape)
    

Output:

torch.Size([60000, 28, 28]) torch.Size([10000, 28, 28])
    

There are 70,000 images in the MNIST data, of which 60,000 are used for training the model and the remaining 10,000 for validating it, as the above output shows. The dimensions 28, 28 are the height and width of each image in pixels; because there is no channel dimension in these raw tensors, the images are grayscale (black and white).
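
As an optional sanity check (this snippet is an addition, not part of the original walkthrough), you can pull a single batch from the training loader and confirm that it matches the batch_size of 10 and the 28-by-28 image dimensions:

# Fetch one batch from the training loader and inspect its shapes.
X, y = next(iter(trainset))
print(X.shape)  # torch.Size([10, 1, 28, 28]): 10 single-channel 28x28 images
print(y.shape)  # torch.Size([10]): one digit label per image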

Model Training

We will now build the model. To do so, we'll create a class, Net, which inherits from the nn.Module class. The next step is to define the layers of our deep neural network. We start by defining the parameters of the fully connected layers in the __init__() method.

In our case, we have four layers. The first layer takes the flattened image as input, so its input size is 28 * 28 = 784, and it produces 64 outputs, which become the input of the second layer. The second and third layers each map 64 inputs to 64 outputs. The only change in the fourth layer is that the output is 10 neurons, representing the ten classes of images.

We have defined the layers, but we also need to define how they interact with each other. This is done with the def forward(self, x) method below. We have built a fully connected, feed-forward neural network, which means data moves from input to output in a single forward direction. In the forward pass, each of the first three layers is followed by the activation function ReLU, the Rectified Linear Unit.

ReLU is the most widely used activation function in deep neural networks because it is nonlinear and does not activate all of the neurons at the same time: it outputs zero for any negative input, so at any given time only some neurons are active, making the network sparse and efficient.

For the output layer, we'll use the log_softmax function, the logarithm of the softmax function that is often used for multi-class classification problems.
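
As a quick illustration (this standalone snippet is not part of the network itself), here is how the two functions behave on a small tensor:

import torch
import torch.nn.functional as F

t = torch.tensor([[-2.0, 0.5, 3.0]])

# ReLU zeroes out negative values and passes positives through unchanged.
print(F.relu(t))  # tensor([[0.0000, 0.5000, 3.0000]])

# log_softmax returns log-probabilities; exponentiating them recovers
# probabilities that sum to 1 across the class dimension.
log_probs = F.log_softmax(t, dim=1)
print(log_probs.exp().sum(dim=1))  # the probabilities sum to 1

With the layers and activations in mind, the full network class looks like this: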

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)
    

Having defined the model, let us instantiate it and have a look at its structure with the code below.

net = Net()
print(net)
    

Output:

Net(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)
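
Before training, we can optionally verify that a forward pass produces the expected output shape. The random tensor below is just a stand-in for a flattened batch of images:

# Pass a fake batch of 4 flattened images through the untrained network.
X = torch.randn(4, 28 * 28)
output = net(X)
print(output.shape)  # torch.Size([4, 10]): one log-probability per class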
    

We instantiated the fully connected neural network (called net) in the previous step, and now we'll train it to predict the classes of the digits. We'll use the Adam optimizer to optimize the network, and since the network's forward() method already returns log-probabilities via log_softmax, we'll use the negative log-likelihood loss, nn.NLLLoss(), as the loss function; combined with log_softmax, this is equivalent to cross-entropy. This is done using the lines of code below. The lr argument specifies the learning rate of the optimizer.

loss_criterion = nn.NLLLoss()  # pairs with the log_softmax output of the network
optimizer = optim.Adam(net.parameters(), lr=0.005)
    

The next step is to train the network by completing forward passes on the input data and updating the weights. We'll make five full passes (epochs) over the training data.

The function net.zero_grad() sets the gradients to zero before each loss calculation. The call net(X.view(-1, 784)) passes in the batch reshaped so that each 28-by-28 image becomes a flat vector of 784 values.
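
To see what .view(-1, 784) does in isolation (a standalone illustration, not part of the training loop):

# .view(-1, 784) flattens each 28x28 image; the -1 tells PyTorch to infer
# the batch dimension from the tensor's total number of elements.
batch = torch.randn(10, 1, 28, 28)
print(batch.view(-1, 784).shape)  # torch.Size([10, 784])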

The loss_criterion(output, y) call calculates the loss value. The next steps compute the gradients of the weights using backpropagation (loss.backward()) and then update the weights with the Adam optimizer (optimizer.step()). The last line of the code prints the loss of the final batch at the end of each of the five passes.

for epoch in range(5):
    for data in trainset:
        X, y = data                       # a batch of images and their labels
        net.zero_grad()                   # reset gradients from the previous step
        output = net(X.view(-1, 784))     # forward pass on the flattened batch
        loss = loss_criterion(output, y)  # compute the loss
        loss.backward()                   # backpropagate the gradients
        optimizer.step()                  # update the weights
    print(loss)                           # loss of the last batch in this epoch
    

Output:

tensor(0.1360, grad_fn=<NllLossBackward>)
tensor(0.1455, grad_fn=<NllLossBackward>)
tensor(0.1776, grad_fn=<NllLossBackward>)
tensor(0.2263, grad_fn=<NllLossBackward>)
tensor(0.0365, grad_fn=<NllLossBackward>)
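
Note that the value printed above is only the loss of the last batch in each epoch, which is why it fluctuates. A common variation (a sketch added here, not part of the original guide) is to average the loss over all batches in an epoch:

# Variation: track the mean loss over every batch in each epoch.
for epoch in range(5):
    epoch_loss = 0.0
    for data in trainset:
        X, y = data
        net.zero_grad()
        output = net(X.view(-1, 784))
        loss = loss_criterion(output, y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"epoch {epoch}: mean loss {epoch_loss / len(trainset):.4f}")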
    

Model Evaluation

We have trained the network, and the next step is to evaluate the model on the test data set. This is done using the code below.

correct = 0
total = 0

# Disable gradient tracking during evaluation.
with torch.no_grad():
    for data in testset:
        X, y = data
        output = net(X.view(-1, 784))

        # The predicted class is the index of the highest log-probability.
        for idx, i in enumerate(output):
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct / total, 2))
    

Output:

Accuracy:  0.95
    

The above output shows that with only five passes over the training data, we have achieved an accuracy of 95 percent on our test dataset, which is good performance. We can further tune the hyperparameters, such as the learning rate or batch size, to improve the model's performance.

Conclusion