Expediting Deep Learning with Transfer Learning: PyTorch Playbook

Gaurav Singhal

  • Jul 16, 2020
  • 11 Min read

Data Analytics
Machine Learning



Yes! It is easier to learn to play the electric guitar if you already know how to play the acoustic guitar. You don't have to learn the basics of the electric guitar from the beginning. We humans can intelligently apply knowledge learned previously in one task or domain and use it to solve new problems effectively.

Can a machine imitate this knowledge transferring power of the human brain? Yes, thanks to transfer learning (TL).

This guide will cover the motivation and types of TL. For a brief introduction to pre-trained models and a hands-on example, check out this Kaggle competition, Dogs vs. Cats. Since it involves two classes, in machine learning terms it is known as a binary classification problem.


Transfer learning has an emphasis on storing the knowledge gained while solving one task and applying it to different but related tasks. A basic learning process is shown below.

Basic learning process

Traditionally, CNNs and deep learning algorithms are used to solve specific tasks. Once the feature-space distribution changes, the model needs to be rebuilt from scratch. The initial layers of a convolutional network detect low-level features such as intensities, colors, and edges. Whether you are detecting a car, a human, or an animal, these layers are common. The deeper layers detect more complex features such as shapes, faces, and patterns.

The bigger the training dataset, the better the prediction accuracy. Working on complex problems therefore demands humongous amounts of training images, which usually means stacking more and more layers to make the network deeper. But stacking, say, 1,000 neural network (NN) layers may not yield good results; it can even make the situation worse.

Transfer learning to the rescue!

Instead of creating the whole network from scratch, the model can learn the features of one task and apply them to another. This is commonly done with a pre-trained model. Most commonly, these models are trained on ImageNet: 1.2 million images across 1,000 categories.

Remember to change your classification layer (FC) to output the same number of classes that you need to predict.

Types of Transfer Learning Techniques

There are three types of TL techniques: Inductive, Transductive, and Unsupervised. Below is an overview of different settings to transfer.

Transfer Learning Techniques

Their definitions and differences are given below.

definition and difference

The image below shows the different approaches to implementing TL from the source domain to the target domain.

types of transfer learning

different approaches used in different settings

Applying Transfer Learning


Understand the common knowledge between source and target domain/task to improve the performance of the target task.


It is suggested not to use TL if your target dataset is very different from the training dataset (ImageNet), which is generally not the case considering the variety of images it covers. Transfer that hurts performance on the target task is known as negative transfer. ImageNet weights will not help if your target images are out of its scope, for example, medical images or images from a telescope.


When the source and target domains and tasks are related, identify the different transfer learning techniques.

Follow the table below and match your requirements.

relationship between traditional machine learning and various transfer learning settings

Now you'll learn how to apply these TL techniques to Deep Learning.

Apply Transfer Learning in Deep Learning

Pre-trained Models

Deep learning requires a good amount of training time and data compared to classical machine learning (e.g., in computer vision). You can save time by using pre-trained models to extract features, fine-tuning their weights, saving them, and making them available for others to use. This is also known as deep transfer learning.

Below are some well-known pre-trained models available to download through the PyTorch (torchvision) API.

  1. ResNet
  2. DenseNet
  3. VGG-16
  4. MobileNet

Pre-trained models will give the benefits of high accuracy and speed, saving you from weeks of work to train and create these models from scratch.


The deeper layers of pre-trained models are used for learning features and are fine-tuned. To implement transfer learning with fine-tuning, the last layers are replaced with new trainable ones.


The earlier layers are more generalized, so they transfer well even when the new dataset is small. The results will come out fine even if you freeze the initial layers and retrain only the rest. For larger datasets, you may retrain the complete network, initialized with the pre-trained weights.

This guide will use a DenseNet121 pre-trained model as a feature extractor. The data has the constraint of few training samples per category. Even if the input images are new and never appeared in ImageNet, the model has to extract appropriate features and predict the results.

Implementation in Python

Import the important libraries.

import torchvision
import torch.nn as nn
import torch
import torch.nn.functional as F
from torchvision import transforms, models, datasets
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
from torch import optim

Load the images and transform them using the transforms.Compose function.

train_data_dir = '/input/cat-and-dog/training_set/training_set'
transform = transforms.Compose([transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor()])
dataset = torchvision.datasets.ImageFolder(train_data_dir, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=400, shuffle=True)

# The same transform is reused for the test set
test_data_dir = '/input/cat-and-dog/test_set/test_set'
test_dataset = torchvision.datasets.ImageFolder(test_data_dir, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=400, shuffle=True)
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    plt.figure(figsize=(20, 150))
    plt.imshow(inp)

inputs, classes = next(iter(train_loader))

# Make a grid from batch and display it
out = torchvision.utils.make_grid(inputs, scale_each=True)
imshow(out)


Download the pre-trained model.

model = models.densenet121(pretrained=True)

model output 1

model output 2

Freeze the pre-trained parameters and replace the classifier with a new output layer ending in a LogSoftmax() activation.

for params in model.parameters():
    params.requires_grad = False

from collections import OrderedDict

classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('Output', nn.LogSoftmax(dim=1))
]))

model.classifier = classifier

Train the model.

model = model.cuda()
optimizer = optim.Adam(model.classifier.parameters())
criterion = nn.NLLLoss()

list_train_loss = []
list_test_loss = []

for epoch in range(10):
    train_loss = 0
    test_loss = 0
    for bat, (img, label) in enumerate(train_loader):
        # moving batch and labels to gpu
        img = img.to('cuda:0')
        label = label.to('cuda:0')
        model.train()
        optimizer.zero_grad()

        output = model(img)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        train_loss = train_loss + loss.item()

    accuracy = 0
    for bat, (img, label) in enumerate(test_loader):
        img = img.to('cuda:0')
        label = label.to('cuda:0')
        model.eval()
        logps = model(img)
        loss = criterion(logps, label)
        test_loss += loss.item()
        ps = torch.exp(logps)
        top_ps, top_class = ps.topk(1, dim=1)
        equality = top_class == label.view(*top_class.shape)
        accuracy += torch.mean(equality.type(torch.FloatTensor)).item()

    list_train_loss.append(train_loss / 20)
    list_test_loss.append(test_loss / 20)
    print('epoch: ', epoch, ' train_loss: ', train_loss / 20,
          ' test_loss: ', test_loss / 20, ' accuracy: ', accuracy / len(test_loader))


samples, _ = next(iter(test_loader))
samples = samples.to('cuda:0')
fig = plt.figure(figsize=(24, 16))

output = model(samples[:24])
pred = torch.argmax(output, dim=1)
pred = [p.item() for p in pred]
ad = {0: 'I guess it\'s a cat', 1: 'I guess it\'s a dog'}
for num, sample in enumerate(samples[:24]):
    plt.subplot(4, 6, num + 1)
    plt.title(ad[pred[num]])
    plt.axis('off')
    sample = sample.cpu().numpy()
    plt.imshow(np.transpose(sample, (1, 2, 0)))


import matplotlib.pyplot as plt
figs, ax = plt.subplots(1, 2, figsize=(20, 5))
ax[0].plot(list_train_loss, label='train loss')
ax[1].plot(list_test_loss, label='test loss')

out graph

Well done! The accuracy is 99%.


The objective of this guide is to give a brief introduction to transfer learning and its types and approaches, as well as how it can be applied in deep learning. Check out the links provided in this guide for the different pre-trained models for deep learning, and try executing them on the cat and dog dataset. I encourage you to apply this model in your own dataset, but make sure you change the classification layer as per the problem statement.

I hope you learned something new today. Happy learning!