Introduction to DenseNet with TensorFlow

By Gaurav Singhal

May 6, 2020 • 10 Minute Read

Introduction

DenseNet is one of the newer architectures for visual object recognition. It is quite similar to ResNet, with one fundamental difference: ResNet merges the output of a previous layer (the identity) with a later layer by element-wise addition (+), whereas DenseNet concatenates (.) the outputs of previous layers with the later layer. Get in-depth knowledge of ResNet in this guide.
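The difference between the two merge strategies shows up directly in the tensor shapes. The following is a minimal NumPy sketch, not part of the model built later; the arrays `identity` and `new_features` are made-up stand-ins for feature maps. Addition keeps the channel count fixed, while concatenation along the channel axis grows it.

```python
import numpy as np

# Two hypothetical feature maps in (height, width, channels) layout.
identity = np.ones((8, 8, 16), dtype="float32")
new_features = np.full((8, 8, 16), 2.0, dtype="float32")

# ResNet-style merge: element-wise addition, channel count is unchanged.
resnet_merge = identity + new_features

# DenseNet-style merge: concatenation along the channel axis.
densenet_merge = np.concatenate([identity, new_features], axis=-1)

print(resnet_merge.shape)    # spatial size and channels stay the same
print(densenet_merge.shape)  # channels of both inputs are stacked
```

Because concatenation keeps every earlier feature map intact, later layers can reuse them directly instead of receiving only a summed version.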

Why Do We Need DenseNet?

DenseNet was developed specifically to counter the loss of accuracy caused by the vanishing gradient in deep neural networks. In simpler terms, because of the long path between the input layer and the output layer, information vanishes before reaching its destination.

The primary purpose of this guide is to give insights on DenseNet and implement DenseNet121 using TensorFlow 2.0 (TF 2.0) and Keras.

In this guide, you will work with a data set called Natural Images that can be downloaded from Kaggle.

DenseNet Architecture

DenseNet Structure

DenseNet falls in the category of classic networks.

This image shows a 5-layer dense block with a growth rate of k = 4 and the standard ResNet structure.

The output of each layer acts as an input to every subsequent layer through a composite function operation. In the original DenseNet paper, this composite operation consists of batch normalization, a non-linear activation (ReLU), and a convolution layer; pooling is applied in the transition layers between blocks.

These connections mean that the network has L(L+1)/2 direct connections. L is the number of layers in the architecture.
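As a quick check of the formula, the sketch below counts the connections by brute force and compares the result with L(L+1)/2. The helper names are illustrative only. The count assumes that layer l receives l inputs: the block input plus the outputs of the l-1 layers before it.

```python
def direct_connections(num_layers: int) -> int:
    """Closed form: L(L+1)/2 direct connections for L densely connected layers."""
    return num_layers * (num_layers + 1) // 2


def count_by_enumeration(num_layers: int) -> int:
    """Layer l (1-indexed) takes l inputs: the block input plus l-1 earlier outputs."""
    return sum(layer for layer in range(1, num_layers + 1))


for L in (5, 12, 24):
    # A plain sequential network with L layers has only L connections.
    print(L, "layers:", L, "sequential vs", direct_connections(L), "dense")
```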

DenseNet comes in different versions, such as DenseNet-121, DenseNet-169, and DenseNet-201. The numbers denote the number of layers in the neural network. The number 121 is computed as follows:
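One way to arrive at 121 is to count only the layers that carry weights (convolutions and the final fully connected layer), leaving out pooling and batch normalization. The sketch below assumes the DenseNet-121 layout from the architecture table of the DenseNet paper: four dense blocks with 6, 12, 24, and 16 dense layers, each dense layer holding a 1x1 and a 3x3 convolution.

```python
# Assumed DenseNet-121 layout (taken from the DenseNet paper's architecture table).
block_sizes = (6, 12, 24, 16)   # dense layers in each of the four dense blocks
convs_per_dense_layer = 2       # a 1x1 bottleneck conv followed by a 3x3 conv

initial_conv = 1                                        # the 7x7 convolution at the input
dense_convs = convs_per_dense_layer * sum(block_sizes)  # 2 * 58 = 116
transition_convs = len(block_sizes) - 1                 # one 1x1 conv between consecutive blocks
classifier = 1                                          # the final fully connected layer

depth = initial_conv + dense_convs + transition_convs + classifier
print(depth)  # 121
```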

DenseBlocks and Layers

Whether adding or concatenating, layers can only be merged if their feature maps have the same dimensions. What if the dimensions differ? DenseNet is divided into DenseBlocks: the number of filters changes between blocks, but the feature-map dimensions within a block stay the same. Between blocks, a transition layer applies batch normalization and a 1x1 convolution and then downsamples with pooling; downsampling is an essential step in a CNN.
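The shape constraint can be seen with a small NumPy sketch. It covers only the pooling part of a transition layer (the batch normalization and 1x1 convolution are left out), and `avg_pool_2x2` is a hypothetical helper written for this illustration, not a Keras function.

```python
import numpy as np


def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    # Group pixels into 2x2 windows, then average over each window.
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


a = np.random.rand(8, 8, 4)  # feature map inside one dense block
b = np.random.rand(4, 4, 4)  # feature map from a later, downsampled block

# Concatenating along channels fails when the spatial sizes differ.
try:
    np.concatenate([a, b], axis=-1)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# After pooling, the spatial sizes match and the merge works.
pooled = avg_pool_2x2(a)
merged = np.concatenate([pooled, b], axis=-1)
print(mismatch_rejected, pooled.shape, merged.shape)
```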

Let's see what's inside the DenseBlock and transition layer:

This is the full architecture in abstract form:

Source: Pablo R

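The number of filters changes between the DenseBlocks, increasing the channel dimension. The growth rate (k) determines how many feature maps each layer adds: the l-th layer of a block receives k0 + k × (l - 1) input feature maps, where k0 is the number of channels entering the block. In other words, it controls the amount of information each layer contributes.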
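To make the effect of the growth rate concrete, the sketch below tracks channel counts through dense blocks. The values used for DenseNet-121 (64 channels after the initial convolution, growth rate 32, and transition layers that halve the channel count) are assumptions taken from the DenseNet paper and the Keras implementation. The helper name and the `k0=8` input width are illustrative.

```python
def dense_block_channels(k0: int, growth_rate: int, num_layers: int) -> list:
    """Input channels seen by each layer of a block, followed by the block's output channels."""
    return [k0 + growth_rate * i for i in range(num_layers + 1)]


# A 5-layer block with k = 4; k0 = 8 is an arbitrary input width for illustration.
print(dense_block_channels(k0=8, growth_rate=4, num_layers=5))

# Channel trace through DenseNet-121 under the assumptions stated above.
channels = 64
block_sizes = (6, 12, 24, 16)
for index, num_layers in enumerate(block_sizes):
    channels += 32 * num_layers          # every dense layer adds k = 32 feature maps
    if index < len(block_sizes) - 1:
        channels //= 2                   # transition layer halves the channel count
print(channels)
```

The final value is the number of features that reach the global average pooling layer placed on top of the base network.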

Implementing the Code

Before starting, it is essential to import all the relevant libraries. The main drivers here are tensorflow.keras.applications to import DenseNet121 and tensorflow.keras.layers to import layers involved in building the network.

import tensorflow

import pandas as pd
import numpy as np
import os
import keras
import random
import cv2
import math
import seaborn as sns

from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt

from tensorflow.keras.layers import Dense,GlobalAveragePooling2D,Convolution2D,BatchNormalization
from tensorflow.keras.layers import Flatten,MaxPooling2D,Dropout

from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.densenet import preprocess_input

from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator,img_to_array

from tensorflow.keras.models import Model

from tensorflow.keras.optimizers import Adam

from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

import warnings
warnings.filterwarnings("ignore")
    

print("Tensorflow-version:", tensorflow.__version__)

Output: Tensorflow-version: 2.0.0

model_d=DenseNet121(weights='imagenet',include_top=False, input_shape=(128, 128, 3))

x=model_d.output

x= GlobalAveragePooling2D()(x)
x= BatchNormalization()(x)
x= Dropout(0.5)(x)
x= Dense(1024,activation='relu')(x) 
x= Dense(512,activation='relu')(x) 
x= BatchNormalization()(x)
x= Dropout(0.5)(x)

preds=Dense(8,activation='softmax')(x) #FC-layer
    

# The base network was created as model_d above, so its input is used here
model=Model(inputs=model_d.input,outputs=preds)
model.summary()
    

To avoid overfitting, avoid training the entire network. Setting layer.trainable=False freezes all layers except the last eight (the new fully connected head). The frozen layers keep the pretrained ImageNet weights, which already detect generic features such as edges and blobs. Once the model fits well, it can be fine-tuned by setting layer.trainable=True on more layers.

for layer in model.layers[:-8]:
    layer.trainable=False
    
for layer in model.layers[-8:]:
    layer.trainable=True
    

model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()
    

Notice the drop in the number of trainable parameters.

data=[]
labels=[]
random.seed(42)
imagePaths = sorted(list(os.listdir("../input/natural-images/")))
random.shuffle(imagePaths)
print(imagePaths)

for img in imagePaths:
    path=sorted(list(os.listdir("../input/natural-images/"+img)))
    for i in path:
        image = cv2.imread("../input/natural-images/"+img+'/'+i)
        image = cv2.resize(image, (128,128))
        image = img_to_array(image)
        data.append(image)
        labels.append(img)  # the folder name is the class label
    

data = np.array(data, dtype="float32") / 255.0
labels = np.array(labels)
mlb = LabelBinarizer()
labels = mlb.fit_transform(labels)
print(labels[0])
    

(xtrain,xtest,ytrain,ytest)=train_test_split(data,labels,test_size=0.4,random_state=42)
print(xtrain.shape, xtest.shape)
    

If the monitored metric (validation accuracy in the code below) stops improving, the ReduceLROnPlateau callback reduces the learning rate, which often benefits the model. ImageDataGenerator performs real-time data augmentation, generating batches of augmented tensor image data in a loop.
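The idea behind reducing the learning rate on a plateau can be sketched in plain Python. This is a simplified simulation, not the Keras implementation (it ignores options such as cooldown and minimum improvement), and the accuracy values are made up for illustration.

```python
def simulate_reduce_on_plateau(val_accuracy, lr=1e-3, factor=0.5, patience=5, min_lr=1e-5):
    """Return the learning rate in effect after each epoch."""
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_accuracy:
        if acc > best:
            # The metric improved: remember it and reset the patience counter.
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Made-up validation accuracies: three improving epochs, then a ten-epoch plateau.
accs = [0.70, 0.80, 0.85] + [0.85] * 10
rates = simulate_reduce_on_plateau(accs)
print(rates)
```

In this sketch, a `min_lr` equal to the starting learning rate leaves the rate unchanged no matter how long the plateau lasts, so the floor needs to sit below the initial rate for the reduction to have any effect.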

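# min_lr must be below the starting learning rate (Adam defaults to 1e-3) for any reduction to happen; 1e-5 is an illustrative choice
anne = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=5, verbose=1, min_lr=1e-5)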
checkpoint = ModelCheckpoint('model.h5', verbose=1, save_best_only=True)

datagen = ImageDataGenerator(zoom_range = 0.2, horizontal_flip=True, shear_range=0.2)


datagen.fit(xtrain)
# Fit the model (fit_generator matches TF 2.0; later releases deprecate it in favor of model.fit)
history = model.fit_generator(datagen.flow(xtrain, ytrain, batch_size=128),
               steps_per_epoch=xtrain.shape[0] //128,
               epochs=50,
               verbose=2,
               callbacks=[anne, checkpoint],
               validation_data=(xtest, ytest))  # validate on held-out data rather than the training set
    

ypred = model.predict(xtest)

total = 0
accurate = 0
accurateindex = []
wrongindex = []

for i in range(len(ypred)):
    if np.argmax(ypred[i]) == np.argmax(ytest[i]):
        accurate += 1
        accurateindex.append(i)
    else:
        wrongindex.append(i)
        
    total += 1
    
print('Total-test-data:', total, '\taccurately-predicted-data:', accurate, '\t wrongly-predicted-data: ', total - accurate)
print('Accuracy:', round(accurate/total*100, 3), '%')
    

# Use the class order that LabelBinarizer actually encoded (sorted alphabetically), so indices map to the right names
label = list(mlb.classes_)
imidx = random.sample(accurateindex, k=9)# replace with 'wrongindex'

nrows = 3
ncols = 3
fig, ax = plt.subplots(nrows,ncols,sharex=True,sharey=True,figsize=(15, 12))

n = 0
for row in range(nrows):
    for col in range(ncols):
        # cv2 loads images as BGR, so reverse the channel axis for matplotlib
        ax[row,col].imshow(xtest[imidx[n]][..., ::-1])
        ax[row,col].set_title("Predicted label :{}\nTrue label :{}".format(label[np.argmax(ypred[imidx[n]])], label[np.argmax(ytest[imidx[n]])]))
        n += 1

plt.show()
    

Ypred = model.predict(xtest)

Ypred = np.argmax(Ypred, axis=1)
Ytrue = np.argmax(ytest, axis=1)

cm = confusion_matrix(Ytrue, Ypred)
plt.figure(figsize=(12, 12))
ax = sns.heatmap(cm, cmap="rocket_r", fmt="d",annot_kws={'size':16}, annot=True, square=True, xticklabels=label, yticklabels=label)  # fmt="d" prints the counts as integers
ax.set_ylabel('Actual', fontsize=20)
ax.set_xlabel('Predicted', fontsize=20)
    

Conclusion

You have built a DenseNet model with ~98% accuracy. DenseNet mitigates the vanishing gradient problem and requires fewer parameters to train. Its dense connections strengthen feature propagation and encourage feature reuse, which keeps information flowing through the network.

This guide gives the basic knowledge on building the DenseNet-121, its architecture, its advantages, and how it is different from ResNet. From the heat map, we can see that 44 dogs are misclassified as cats, possibly because the misclassified dog pictures have traits similar to the cats. Results can be improved by fine-tuning the model. Try adding or removing more dense blocks and layers, finding the frequency of data in each class, and augmenting the images.

Deep neural networks are a vast field. Research continues to make them simpler to learn and better at solving complex real-world problems. If you need any help with your projects in Deep Learning, contact me at CodeAlphabet.

References

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” 2018.

Gaurav S.

Gaurav is a Data Scientist with a strong background in computer science and mathematics. He has extensive research experience in data structures, statistical data analysis, and mathematical modeling. With a solid background in Web development, he works with Python, Java, Django, HTML, Struts, Hibernate, Vaadin, Web Scraping, Angular, and React. His data science skills include Python, Matplotlib, TensorFlow, Pandas, NumPy, Keras, CNN, ANN, NLP, Recommenders, and Predictive analysis. He has built systems that use both basic machine learning algorithms and complex deep neural networks. He has worked on many data science projects, including product recommendation, user sentiments, Twitter bots, information retrieval, predictive analysis, data mining, image segmentation, SVMs, and RandomForest.
