Introduction

Deep learning is one of the hottest topics in data science today. This is not surprising given the large number of fascinating applications being developed with it, such as self-driving cars, color restoration, natural language processing, automatic machine translation, image classification, and many more.

There are many deep learning libraries out there, but the most popular ones are TensorFlow, Keras, and PyTorch. Although TensorFlow and PyTorch are immensely popular, their lower-level APIs have a steeper learning curve. So, for many practitioners, Keras is the preferred choice.

The Keras library is a high-level API for building deep learning models that has gained favor for its ease of use and simplicity, both of which facilitate fast development. Often, even a fairly complex deep learning network can be built with Keras in only a few lines of code.

In this guide, we will focus on how to use the Keras library to build regression models.

Regression is a supervised machine learning task in which a model predicts a continuous label. The goal is to produce a model that represents the ‘best fit’ to some observed data, according to an evaluation criterion.
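As a minimal illustration of what "best fit" means, the sketch below fits a straight line to a handful of made-up points by ordinary least squares, where the evaluation criterion is the sum of squared errors. The data values and the helper name `fit_line` are invented for illustration and are unrelated to the dataset used later in this guide.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single predictor
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that lie exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

A neural network pursues the same goal, but with a far more flexible function than a straight line and with an iterative optimizer instead of a closed-form solution.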

The basic architecture of the deep learning neural network, which we will be following, consists of three main components.

1) Input Layer: This is where the training observations are fed in. The number of neurons in this layer matches the number of predictor variables.

2) Hidden Layers: These are the intermediate layers between the input and output layers. This is where the deep neural network learns the relationships present in the data.

3) Output Layer: This is the layer where the final output is extracted from what’s happening in the previous two layers. For a regression problem with a single target, as in this guide, the output layer has one neuron.
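To make the three components concrete, here is a minimal pure-Python sketch of a forward pass through a tiny network with 4 inputs, one hidden layer of 3 neurons, and a single output neuron. The weights, the input values, and the helper names `dense` and `relu` are arbitrary assumptions chosen for illustration; a real network learns its weights during training.

```python
def relu(values):
    """Apply the ReLU activation element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Input layer: one observation with 4 predictor values (made up)
x = [0.4, 0.8, 0.5, 0.3]

# Hidden layer: 3 neurons, each with 4 weights and 1 bias (arbitrary values)
hidden_w = [[0.5, -0.2, 0.1, 0.0],
            [0.3, 0.8, -0.5, 0.2],
            [-0.6, 0.1, 0.4, 0.9]]
hidden_b = [0.0, 0.1, -0.1]

# Output layer: 1 neuron with 3 weights and 1 bias, no activation (regression)
out_w = [[1.0, -1.5, 2.0]]
out_b = [0.5]

hidden = relu(dense(x, hidden_w, hidden_b))
output = dense(hidden, out_w, out_b)
print(len(hidden), len(output))
```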

Unemployment is a major socio-economic and political issue for any country, and managing it is a chief task for any government. To manage unemployment within an economy, however, it helps to be able to predict it. That is the aim of this guide: we will build a deep learning regression model using Keras to predict unemployment.

The data used in this project was produced from US economic time series data available from http://research.stlouisfed.org/fred2. The data contains 574 rows and 5 variables, as described below:

- psavert: personal savings rate.
- pce: personal consumption expenditures, in billions of dollars.
- uempmed: median duration of unemployment, in weeks.
- pop: total population, in thousands.
- unemploy: number of unemployed, in thousands (dependent variable).

We will evaluate the performance of the model using Root Mean Squared Error (RMSE), a commonly used metric for regression problems. In simple terms, RMSE measures the average magnitude of the residuals or error. Mathematically, it is computed as the square root of the average of squared differences between predicted and actual values.
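The definition translates directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `rmse` and the sample values are made up for illustration, and later in the guide the same quantity is computed with scikit-learn and NumPy instead.

```python
from math import sqrt


def rmse(actual, predicted):
    """Square root of the mean of squared differences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the errors are -1, 1, -2, 2, so the mean squared error is 2.5
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 16.0, 14.0]

print(rmse(actual, predicted))
```

Because the errors are squared before averaging, large misses weigh more heavily than small ones, and the result is expressed in the same unit as the target.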

The following steps are commonly used when implementing regression models with Keras.

*Step 1 - Loading the required libraries and modules.*

*Step 2 - Loading the data and performing basic data checks.*

*Step 3 - Creating arrays for the features and the response variable.*

*Step 4 - Creating the training and test datasets.*

*Step 5 - Defining, compiling, and fitting the Keras regression model.*

*Step 6 - Predicting on the test data and computing evaluation metrics.*

The following sections will cover these steps.

```python
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn

# Import necessary modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt

# Keras specific
import keras
from keras.models import Sequential
from keras.layers import Dense
```

The first line of code reads in the data as a pandas DataFrame, while the second line prints its shape: 574 observations of 5 variables. The third line gives summary statistics of the numerical variables. All the variables have a 'count' of 574, which equals the number of records in the dataset, so there are no missing values.

```python
df = pd.read_csv('regressionexample.csv')
print(df.shape)
df.describe()
```

```
(574, 5)
```

| | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|
| count | 574 | 574 | 574 | 574 | 574 |
| mean | 4,844 | 257,189 | 8 | 9 | 7,772 |
| std | 3,579 | 36,731 | 3 | 4 | 2,642 |
| min | 507 | 198,712 | 2 | 4 | 2,685 |
| 25% | 1,582 | 224,896 | 6 | 6 | 6,284 |
| 50% | 3,954 | 253,060 | 8 | 8 | 7,494 |
| 75% | 7,667 | 290,291 | 11 | 9 | 8,691 |
| max | 12,162 | 320,887 | 17 | 25 | 15,352 |

The first line of code stores the name of the target variable in a list, while the second line gives us the list of all the features, excluding the target variable 'unemploy'.

The third line normalizes the predictors. This is important because the units of the variables differ significantly, and predictors on larger scales can otherwise dominate the modeling process. To prevent this, we divide each predictor by its maximum value, which puts all predictors on a comparable scale with a maximum of 1.

The fourth line displays the summary of the normalized data. We can see that all the independent variables now lie between 0 and 1, each with a maximum of exactly 1. The target variable remains unchanged.

```python
target_column = ['unemploy']
predictors = list(set(list(df.columns)) - set(target_column))
df[predictors] = df[predictors] / df[predictors].max()
df.describe()
```

| | pce | pop | psavert | uempmed | unemploy |
|---|---|---|---|---|---|
| count | 574 | 574 | 574 | 574 | 574 |
| mean | 0.40 | 0.80 | 0.47 | 0.34 | 7,772 |
| std | 0.29 | 0.11 | 0.18 | 0.16 | 2,642 |
| min | 0.04 | 0.62 | 0.11 | 0.16 | 2,685 |
| 25% | 0.13 | 0.70 | 0.32 | 0.24 | 6,284 |
| 50% | 0.33 | 0.79 | 0.45 | 0.30 | 7,494 |
| 75% | 0.63 | 0.90 | 0.62 | 0.36 | 8,691 |
| max | 1 | 1 | 1 | 1 | 15,352 |
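The effect of dividing by the maximum can be checked by hand. The sketch below takes the minimum and maximum of `pop` from the first summary table and compares max scaling, which is what the code above does, with min-max scaling, which is shown only for contrast and is not used in this guide. The function names are illustrative.

```python
def max_scale(value, col_max):
    """Scaling used in this guide: divide by the column maximum."""
    return value / col_max


def min_max_scale(value, col_min, col_max):
    """Alternative shown for contrast: maps the column onto exactly [0, 1]."""
    return (value - col_min) / (col_max - col_min)


# Minimum and maximum of 'pop' before normalization
pop_min, pop_max = 198712, 320887

print(round(max_scale(pop_min, pop_max), 2))      # minimum after max scaling
print(max_scale(pop_max, pop_max))                # maximum after max scaling
print(min_max_scale(pop_min, pop_min, pop_max))   # minimum after min-max scaling
```

With max scaling, the smallest `pop` value maps to about 0.62 rather than 0, which is why the `min` row of the normalized table is not all zeros.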

The first two lines create arrays of the independent (X) and dependent (y) variables, respectively. The third line splits the data into training and test datasets, holding out 30% of the observations for testing, while the fourth line prints the shape of the training set (401 observations of 4 variables) and the test set (173 observations of 4 variables).

```python
X = df[predictors].values
y = df[target_column].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
print(X_train.shape); print(X_test.shape)
```

```
(401, 4)
(173, 4)
```
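The 401/173 split follows from simple arithmetic. To the best of our understanding, when `test_size` is a fraction, `train_test_split` rounds the number of test samples up and assigns the remainder to the training set. The sketch below reproduces that calculation without scikit-learn; the variable names are illustrative.

```python
from math import ceil

n_samples = 574
test_size = 0.30

# Test set: fraction of the data, rounded up to a whole observation
n_test = ceil(test_size * n_samples)
# Training set: whatever is left
n_train = n_samples - n_test

print(n_train, n_test)
```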

We will build a regression model using deep learning in Keras. To begin with, we will define the model. The first line of code below calls the Sequential constructor. We use the Sequential model because our network consists of a linear stack of layers. The second line adds the first dense layer with 500 neurons, specifying the activation function and the number of input dimensions, which in our case is 4 predictors. The third and fourth lines add two further hidden layers, with 100 and 50 neurons, in the same way, this time without the input_dim parameter because Keras infers the input size from the previous layer. The last line of code creates the output layer with one node, which outputs the number of unemployed in thousands.

The activation function used in the hidden layers is the rectified linear unit, or ReLU, which returns its input when the input is positive and zero otherwise. It is one of the most widely used activation functions because it is nonlinear and because it does not activate all the neurons at the same time. In simple terms, this means that for a given input only some of the neurons produce a non-zero output, which makes the activations sparse and cheap to compute.
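A small sketch shows this behavior on a few hand-picked values. The input numbers are made up, and the function is a plain-Python stand-in for what Keras applies when `activation="relu"` is set.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to zero."""
    return max(0.0, x)


# Made-up weighted sums arriving at five neurons
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(v) for v in pre_activations]
print(activations)

# Share of neurons that are active (non-zero) for this input
active_share = sum(a > 0 for a in activations) / len(activations)
print(active_share)
```

Only the two neurons with a positive input stay active here; the other three contribute nothing to the next layer for this observation.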

```python
# Define model
model = Sequential()
model.add(Dense(500, input_dim=4, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(1))
#model.summary() #Print model Summary
```
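As a sanity check on the architecture, the number of trainable parameters can be computed by hand: a dense layer with n inputs and m neurons has n × m weights plus m biases. The sketch below does this for the layer sizes used above. `dense_params` is an illustrative helper, not a Keras function, and the figures can be compared against what `model.summary()` prints.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 4 inputs -> 500 -> 100 -> 50 -> 1 output
layer_sizes = [4, 500, 100, 50, 1]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)
print(total_params)
```

Most of the parameters sit between the 500-neuron and 100-neuron layers, which is worth keeping in mind given that the training set has only 401 observations.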

The next step is to define an optimizer and the loss measure for training. The mean squared error is our loss measure and the "adam" optimizer is our minimization algorithm. The main advantage of the "adam" optimizer is that it adapts the step size for each parameter during training, so its default learning rate often works reasonably well without the manual tuning that plain gradient descent usually requires. The learning rate can still be adjusted if needed. We achieve this task with the first line of the code below.

The second line of code fits the model on the training dataset. We also provide the argument, epochs, which represents the number of complete passes over the training data. We have taken 20 epochs.

```python
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mean_squared_error"])
model.fit(X_train, y_train, epochs=20)
```
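To give a feel for what the "adam" optimizer does internally, here is a minimal pure-Python sketch of the Adam update rule applied to a one-parameter toy problem, minimizing f(w) = (w - 3)^2. This is not Keras code: the function name `adam_minimize` is hypothetical, the decay rates and epsilon are the commonly cited defaults, and the learning rate is set to 0.1 (an assumption made so the toy problem converges in a few hundred steps).

```python
from math import sqrt


def adam_minimize(grad, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-parameter function given its gradient, using Adam."""
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the recent gradient magnitude
        w -= lr * m_hat / (sqrt(v_hat) + eps)
    return w


# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3
w_final = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w_final)
```

The division by the square root of the second moment is what makes the effective step size adapt to the scale of the gradients.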

```
Epoch 1/20
401/401 [==============================] - 0s 1ms/step - loss: 68136318.3441 - mean_squared_error: 68136318.3441
Epoch 2/20
401/401 [==============================] - 0s 133us/step - loss: 68101432.0698 - mean_squared_error: 68101432.0698
Epoch 3/20
401/401 [==============================] - 0s 125us/step - loss: 67985495.1022 - mean_squared_error: 67985495.1022
Epoch 4/20
401/401 [==============================] - 0s 134us/step - loss: 67665023.0524 - mean_squared_error: 67665023.0524
Epoch 5/20
401/401 [==============================] - 0s 127us/step - loss: 66899397.2868 - mean_squared_error: 66899397.2868
Epoch 6/20
401/401 [==============================] - 0s 107us/step - loss: 65355226.3042 - mean_squared_error: 65355226.3042
Epoch 7/20
401/401 [==============================] - 0s 120us/step - loss: 62432633.3566 - mean_squared_error: 62432633.3566
Epoch 8/20
401/401 [==============================] - 0s 128us/step - loss: 57537882.0549 - mean_squared_error: 57537882.0549
Epoch 9/20
401/401 [==============================] - 0s 150us/step - loss: 50086165.6958 - mean_squared_error: 50086165.6958
Epoch 10/20
401/401 [==============================] - 0s 119us/step - loss: 39984370.9975 - mean_squared_error: 39984370.9975
Epoch 11/20
401/401 [==============================] - 0s 97us/step - loss: 28126145.2868 - mean_squared_error: 28126145.2868
Epoch 12/20
401/401 [==============================] - 0s 110us/step - loss: 16095036.0499 - mean_squared_error: 16095036.0499
Epoch 13/20
401/401 [==============================] - 0s 126us/step - loss: 7629222.0150 - mean_squared_error: 7629222.0150
Epoch 14/20
401/401 [==============================] - 0s 107us/step - loss: 4147607.1696 - mean_squared_error: 4147607.1696
Epoch 15/20
401/401 [==============================] - 0s 107us/step - loss: 3668975.7581 - mean_squared_error: 3668975.7581
Epoch 16/20
401/401 [==============================] - 0s 111us/step - loss: 3646548.0898 - mean_squared_error: 3646548.0898
Epoch 17/20
401/401 [==============================] - 0s 126us/step - loss: 3563563.1328 - mean_squared_error: 3563563.1328
Epoch 18/20
401/401 [==============================] - 0s 117us/step - loss: 3533091.9377 - mean_squared_error: 3533091.9377
Epoch 19/20
401/401 [==============================] - 0s 123us/step - loss: 3496560.1110 - mean_squared_error: 3496560.1110
Epoch 20/20
401/401 [==============================] - 0s 132us/step - loss: 3467280.0112 - mean_squared_error: 3467280.0112
```
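An epoch is one full pass over the training data, but the weights are updated once per mini-batch rather than once per epoch. Because `fit` was called without a `batch_size` argument, Keras falls back to its default of 32 samples per batch, so the number of weight updates behind the log above can be worked out as in this sketch (the variable names are illustrative).

```python
from math import ceil

n_train = 401
batch_size = 32   # Keras default when batch_size is not passed to fit()
epochs = 20

# The last batch of each epoch is smaller, hence the rounding up
updates_per_epoch = ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```

This is one reason the loss barely moves in the first few epochs: each epoch contributes only a small number of updates.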

The first line of code predicts on the training data, while the second line prints the RMSE value on the training data. The same is repeated in the third and fourth lines of code, which predict on the test data and print its RMSE value.

```python
pred_train = model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train, pred_train)))

pred = model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test, pred)))
```

```
1856.4850642445354
1825.5904063232729
```

The output above shows that the RMSE, which is our evaluation metric, was about 1,856 thousand on the training data and about 1,826 thousand on the test data. The lower the RMSE value, the better the model performance. However, unlike accuracy, RMSE has no fixed scale: it is expressed in the unit of the target variable, which in our case is thousands of people, so it has to be judged relative to the values of the target (the mean of 'unemploy' is about 7,772 thousand, with a standard deviation of about 2,642 thousand).
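One way to put the RMSE in context is to compare it with the scale of the target. The sketch below divides the test RMSE by the mean and by the standard deviation of 'unemploy' reported in the summary table. The numbers are copied from the outputs above, and these two ratios are just one possible yardstick, not a standard prescribed by this guide.

```python
# Values copied from the outputs earlier in this guide
test_rmse = 1825.5904063232729
target_mean = 7772   # mean of 'unemploy', in thousands
target_std = 2642    # standard deviation of 'unemploy', in thousands

rmse_vs_mean = test_rmse / target_mean
rmse_vs_std = test_rmse / target_std

print(round(rmse_vs_mean, 3), round(rmse_vs_std, 3))
```

A model that always predicted the mean would have an RMSE close to the standard deviation of the target, so a ratio below 1 in the second figure indicates that the network improves on that naive baseline. Note that the mean and standard deviation here describe the full dataset rather than the test set alone, so the comparison is approximate.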

In this guide, we have built a regression model using the deep learning framework Keras. The guide used the US economic time series data and built a deep learning regression model to predict the number of unemployed people, in thousands.

Our model performs consistently, with little difference between the training and test set RMSE, which suggests it is not overfitting. The ideal result would be an RMSE value of zero, but that's almost impossible in real economic datasets. Also, because the target variable is measured in thousands, the RMSE is on that scale too.

Further experiments, such as changing the number of neurons, adding more hidden layers, or increasing the number of epochs, can be tried out to see their impact on model performance.

This regression problem could also be modeled using other algorithms such as Decision Tree, Random Forest, Gradient Boosting, or Support Vector Machines. However, that is outside the scope of this guide, which is aimed at enabling individuals to solve regression problems using the deep learning library Keras.
