Deep Learning is one of the hottest topics in data science today. This is not surprising given the tremendous number of fascinating applications being developed using deep learning, such as self-driving cars, color restoration, natural language processing, automatic machine translation, image classification, and many more.
There are many deep learning libraries out there, but the most popular ones are TensorFlow, Keras, and PyTorch. Although TensorFlow and PyTorch are immensely popular, they are not easy to use and have a steep learning curve. So, for many practitioners, Keras is the preferred choice.
The Keras library is a high-level API for building deep learning models that has gained favor for its ease of use and simplicity facilitating fast development. Often, building a very complex deep learning network with Keras can be achieved with only a few lines of code.
In this guide, we will focus on how to use the Keras library to build regression models.
Regression is a supervised machine learning task in which the goal is to predict a continuous label. The aim is to produce a model that represents the ‘best fit’ to some observed data, according to an evaluation criterion.
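As a minimal illustration of what 'best fit' means, the sketch below fits a straight line to a few points by least squares with NumPy's `polyfit`. The data values are made up for illustration and are not taken from the unemployment dataset, and squared error is assumed as the evaluation criterion.

```python
import numpy as np

# Made-up observations (hypothetical, for illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Fit y ~ slope * x + intercept by minimizing the sum of squared errors
slope, intercept = np.polyfit(x, y, deg=1)

# Predictions from the fitted line and the resulting mean squared error
y_hat = slope * x + intercept
mse = np.mean((y - y_hat) ** 2)
print(slope, intercept, mse)
```

Any other straight line through these points gives a larger mean squared error, which is the sense in which the fitted line is the 'best' one under this criterion.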
The basic architecture of the deep learning neural network, which we will be following, consists of three main components.
1) Input Layer: This is where the training observations are fed. The number of neurons in this layer equals the number of predictor variables.
2) Hidden Layers: These are the intermediate layers between the input and output layers. The deep neural network learns about the relationships involved in data in this component.
3) Output Layer: This is the layer where the final output is extracted from what’s happening in the previous two layers. In case of regression problems, the output layer will have one neuron.
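The three components above can be sketched with plain NumPy as a single forward pass. This is only an illustration of how data flows through the layers, assuming one hidden layer of 8 neurons with random, untrained weights; it is not the Keras model built later in this guide.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 4   # input layer: one value per predictor
n_hidden = 8     # hidden layer width (arbitrary choice for this sketch)

# Randomly initialized, untrained weights and zero biases
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Pass a batch of observations through input -> hidden -> output."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # hidden layer with ReLU activation
    return hidden @ W2 + b2                # output layer: one neuron, no activation

X = rng.random(size=(3, n_features))  # three made-up observations
print(forward(X).shape)  # (3, 1): one prediction per observation
```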
Unemployment is a major socio-economic and political issue for any country and, hence, managing it is a chief task for any government. But to manage unemployment within an economy, it is imperative to predict it as well. This is what this guide will aim to achieve. The guide will be building a deep learning regression model using Keras to predict unemployment.
The data used in this project was produced from US economic time series data available from http://research.stlouisfed.org/fred2. The data contains 574 rows and 5 variables, as described below:
1- psavert - personal savings rate.
2- pce - personal consumption expenditures, in billions of dollars.
3- uempmed - median duration of unemployment, in weeks.
4- pop - total population, in thousands.
5- unemploy - number of unemployed, in thousands (dependent variable).
We will evaluate the performance of the model using Root Mean Squared Error (RMSE), a commonly used metric for regression problems. In simple terms, RMSE measures the average magnitude of the residuals or error. Mathematically, it is computed as the square root of the average of squared differences between predicted and actual values.
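The RMSE computation described above can be worked through by hand on a tiny example. The three actual and predicted values below are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])   # made-up actual values
y_pred = np.array([2.0, 5.0, 9.0])   # made-up predictions

residuals = y_pred - y_true          # [-1, 0, 2]
mse = np.mean(residuals ** 2)        # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)                  # square root of the mean squared error
print(round(rmse, 4))  # 1.291
```

Because the errors are squared before averaging, the single error of 2 contributes four times as much as the error of 1, so RMSE penalizes large misses more heavily than small ones.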
The following steps are commonly followed when implementing regression models with Keras.
Step 1 - Loading the required libraries and modules.
Step 2 - Loading the data and performing basic data checks.
Step 3 - Creating arrays for the features and the response variable.
Step 4 - Creating the training and test datasets.
Step 5 - Defining, compiling, and fitting the Keras regression model.
Step 6 - Predicting on the test data and computing evaluation metrics.
The following sections will cover these steps.
    # Import required libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import sklearn

    # Import necessary modules
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error
    from math import sqrt

    # Keras specific
    import keras
    from keras.models import Sequential
    from keras.layers import Dense
The first line of code reads in the data as a pandas DataFrame, while the second line prints the shape: 574 observations of 5 variables. The third line gives summary statistics of the numerical variables. All the variables have a 'count' of 574, which equals the number of records in the dataset, so there are no missing values.
    df = pd.read_csv('regressionexample.csv')
    print(df.shape)
    df.describe()
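The 'count' check described above can be made explicit. The sketch below uses a small made-up DataFrame (not the guide's CSV file) with one deliberately missing value, and shows two equivalent ways to spot it.

```python
import numpy as np
import pandas as pd

# Small made-up frame with one missing value (hypothetical data)
toy = pd.DataFrame({
    "psavert": [10.0, 11.0, np.nan, 12.0],
    "unemploy": [100, 200, 300, 400],
})

# Direct count of missing values per column
missing = toy.isnull().sum()
print(missing)

# describe() counts only non-missing values, so a 'count' smaller than
# the number of rows signals missing data in that column
counts = toy.describe().loc["count"]
print(counts)
```

On the real dataset, the same `isnull().sum()` call on `df` would be expected to report zero for every column, given that every 'count' equals 574.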
The first line of code creates an object of the target variable, while the second line of code gives us the list of all the features, excluding the target variable 'unemploy'.
The third line normalizes the predictors. This is important because the units of the variables differ significantly and may influence the modeling process. To prevent this, we divide each predictor by its maximum value, which scales the (non-negative) predictors to lie between 0 and 1.
The fourth line displays the summary of the normalized data. We can see that all the independent variables have now been scaled between 0 and 1, each with a maximum of 1. The target variable remains unchanged.
    target_column = ['unemploy']
    predictors = [col for col in df.columns if col not in target_column]
    df[predictors] = df[predictors]/df[predictors].max()
    df.describe()
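To see what dividing by the maximum does, the sketch below applies it to a small made-up DataFrame and compares it with min-max scaling. Min-max scaling is shown only as a common alternative and is not what this guide uses; the column values are invented for illustration.

```python
import pandas as pd

# Made-up values on very different scales (hypothetical data)
toy = pd.DataFrame({
    "pce": [500.0, 1000.0, 2000.0],
    "pop": [150000.0, 225000.0, 300000.0],
})

# Divide-by-maximum scaling, as used in this guide: the maximum becomes 1
scaled_by_max = toy / toy.max()

# Min-max scaling (alternative): the range becomes exactly [0, 1]
min_max = (toy - toy.min()) / (toy.max() - toy.min())

print(scaled_by_max.min().tolist())  # [0.25, 0.5]
print(scaled_by_max.max().tolist())  # [1.0, 1.0]
print(min_max.min().tolist())        # [0.0, 0.0]
```

With divide-by-maximum scaling, the smallest value of a column is generally above 0, so the columns share an upper bound of 1 but not necessarily a lower bound of 0.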
The first two lines create arrays of the independent (X) and dependent (y) variables, respectively. The third line splits the data into training and test datasets, while the fourth line prints the shape of the training set (401 observations of 4 variables) and the test set (173 observations of 4 variables).
    X = df[predictors].values
    y = df[target_column].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
    print(X_train.shape); print(X_test.shape)

    (401, 4)
    (173, 4)
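The 401 and 173 row counts follow from the 30 percent test fraction applied to 574 rows. The short check below reproduces them, assuming the test count is rounded up and the remainder goes to training, which matches the printed shapes.

```python
import math

n_samples = 574
test_size = 0.30

# Assumption: the test count is rounded up, the rest is used for training
n_test = math.ceil(test_size * n_samples)   # ceil(172.2)
n_train = n_samples - n_test
print(n_train, n_test)  # 401 173
```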
We will build a regression model using deep learning in Keras. To begin with, we will define the model. The first line of code below calls the Sequential constructor. We use the Sequential model because our network consists of a linear stack of layers. The second line of code adds the first hidden layer, specifying the number of neurons, the activation function, and the number of input dimensions, which in our case is 4 predictors. We repeat the same process in the third and fourth lines of code for the remaining hidden layers, this time without the input_dim parameter. The last line of code creates the output layer with one node, which outputs the number of unemployed in thousands.
The activation function used in the hidden layers is a rectified linear unit, or ReLU. It is one of the most widely used activation functions because it is nonlinear and because it outputs zero for negative inputs, so not all the neurons are activated at the same time. In simple terms, this means that for a given input only some of the neurons are active, making the network sparse and efficient to compute.
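The behavior of ReLU can be shown in a few lines of NumPy. The input values below are made up; the point is that negative pre-activations are mapped to zero, so only some neurons produce a nonzero output.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: keep positive values, replace the rest with 0."""
    return np.maximum(0.0, z)

# Made-up pre-activation values for five neurons
pre_activations = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activations = relu(pre_activations)

print(activations.tolist())       # [0.0, 0.0, 0.0, 1.5, 3.0]
print(np.mean(activations > 0))   # 0.4, i.e. 2 of the 5 neurons are active
```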
    # Define model
    model = Sequential()
    model.add(Dense(500, input_dim=4, activation="relu"))
    model.add(Dense(100, activation="relu"))
    model.add(Dense(50, activation="relu"))
    model.add(Dense(1))
    #model.summary() #Print model Summary
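To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per neuron. The sketch below does this arithmetic in plain Python for the layer widths used above; it does not call Keras.

```python
# Layer widths: 4 inputs, followed by the four Dense layers of the model
layer_sizes = [4, 500, 100, 50, 1]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out   # weights plus one bias per neuron
    print(f"{n_in} -> {n_out}: {params} parameters")
    total += params

print("Total:", total)  # Total: 57701
```

With only 401 training observations, a model with tens of thousands of parameters has ample capacity, which is one reason to compare train and test error after fitting.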
The next step is to define an optimizer and the loss measure for training. The mean squared error is our loss measure and the "adam" optimizer is our minimization algorithm. The main advantage of the "adam" optimizer is that it adapts the step size for each parameter during training, so its default learning rate often works well without manual tuning, unlike plain gradient descent, where the learning rate usually has to be tuned by hand. We achieve this task with the first line of the code below.
The second line of code fits the model on the training dataset. We also provide the argument, epochs, which sets the number of complete passes over the training data. We have taken 20 epochs.
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mean_squared_error"])
    model.fit(X_train, y_train, epochs=20)
    Epoch 1/20
    401/401 [==============================] - 0s 1ms/step - loss: 68136318.3441 - mean_squared_error: 68136318.3441
    Epoch 2/20
    401/401 [==============================] - 0s 133us/step - loss: 68101432.0698 - mean_squared_error: 68101432.0698
    Epoch 3/20
    401/401 [==============================] - 0s 125us/step - loss: 67985495.1022 - mean_squared_error: 67985495.1022
    Epoch 4/20
    401/401 [==============================] - 0s 134us/step - loss: 67665023.0524 - mean_squared_error: 67665023.0524
    Epoch 5/20
    401/401 [==============================] - 0s 127us/step - loss: 66899397.2868 - mean_squared_error: 66899397.2868
    Epoch 6/20
    401/401 [==============================] - 0s 107us/step - loss: 65355226.3042 - mean_squared_error: 65355226.3042
    Epoch 7/20
    401/401 [==============================] - 0s 120us/step - loss: 62432633.3566 - mean_squared_error: 62432633.3566
    Epoch 8/20
    401/401 [==============================] - 0s 128us/step - loss: 57537882.0549 - mean_squared_error: 57537882.0549
    Epoch 9/20
    401/401 [==============================] - 0s 150us/step - loss: 50086165.6958 - mean_squared_error: 50086165.6958
    Epoch 10/20
    401/401 [==============================] - 0s 119us/step - loss: 39984370.9975 - mean_squared_error: 39984370.9975
    Epoch 11/20
    401/401 [==============================] - 0s 97us/step - loss: 28126145.2868 - mean_squared_error: 28126145.2868
    Epoch 12/20
    401/401 [==============================] - 0s 110us/step - loss: 16095036.0499 - mean_squared_error: 16095036.0499
    Epoch 13/20
    401/401 [==============================] - 0s 126us/step - loss: 7629222.0150 - mean_squared_error: 7629222.0150
    Epoch 14/20
    401/401 [==============================] - 0s 107us/step - loss: 4147607.1696 - mean_squared_error: 4147607.1696
    Epoch 15/20
    401/401 [==============================] - 0s 107us/step - loss: 3668975.7581 - mean_squared_error: 3668975.7581
    Epoch 16/20
    401/401 [==============================] - 0s 111us/step - loss: 3646548.0898 - mean_squared_error: 3646548.0898
    Epoch 17/20
    401/401 [==============================] - 0s 126us/step - loss: 3563563.1328 - mean_squared_error: 3563563.1328
    Epoch 18/20
    401/401 [==============================] - 0s 117us/step - loss: 3533091.9377 - mean_squared_error: 3533091.9377
    Epoch 19/20
    401/401 [==============================] - 0s 123us/step - loss: 3496560.1110 - mean_squared_error: 3496560.1110
    Epoch 20/20
    401/401 [==============================] - 0s 132us/step - loss: 3467280.0112 - mean_squared_error: 3467280.0112
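To make the role of the "adam" optimizer used in training more concrete, here is a minimal NumPy sketch of its update rule applied to a toy one-variable problem. The function name `adam_minimize`, the toy objective, and the hyperparameter values are assumptions chosen for illustration; this is not how Keras implements the optimizer internally.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a function of one variable, given its gradient, with Adam updates."""
    x, m, v = float(x0), 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step scaled by gradient history
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # close to the true minimum at x = 3
```

The division by the square root of the running squared-gradient average is what adapts the effective step size, which is why the default learning rate is often a reasonable starting point.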
The first line of code predicts on the train data, while the second line prints the RMSE value on the train data. The same is repeated in the third and fourth lines of code, which predict and print the RMSE value on the test data.
    pred_train = model.predict(X_train)
    print(np.sqrt(mean_squared_error(y_train, pred_train)))

    pred = model.predict(X_test)
    print(np.sqrt(mean_squared_error(y_test, pred)))
The output of the code above shows that the RMSE, which is our evaluation metric, was 1856 thousand for the train data and 1825 thousand for the test data. The lower the RMSE value, the better the model performance. However, unlike accuracy, RMSE is expressed in the units of the target variable, so it has to be interpreted against the scale of the target, which in our case is thousands of people.
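As a rough cross-check, the loss reported in the training log can be converted to an RMSE by taking its square root. The sketch below uses the epoch 20 loss from the log; the variable names are chosen for illustration.

```python
import math

final_train_mse = 3467280.0112   # loss reported for epoch 20 in the training log

rmse_thousands = math.sqrt(final_train_mse)
rmse_people = rmse_thousands * 1000   # the target is recorded in thousands of people

print(round(rmse_thousands, 2))  # 1862.06
print(round(rmse_people))
```

This is slightly higher than the train RMSE computed after training, most likely because the logged loss is averaged over the batches of the final epoch while the weights were still being updated.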
In this guide, we have built a regression model using the deep learning framework Keras. The guide used the US economic time series data and built a deep learning regression model to predict the number of unemployed people, in thousands.
Our model achieves stable performance, with little difference between the train and test set RMSE. The ideal result would be an RMSE value of zero, but that's almost impossible in real economic datasets. Also, because the target variable is measured in thousands, the RMSE is on that scale too, which makes its raw value look large.
Further experiments, such as changing the number of neurons, adding more hidden layers, or increasing the number of epochs, can be tried out to see the impact on model performance.
This regression problem could also be modeled using other algorithms such as Decision Tree, Random Forest, Gradient Boosting, or Support Vector Machines. However, that is outside the scope of this guide, which is aimed at enabling individuals to solve regression problems using the Keras deep learning library.