Regression with Keras
Mar 20, 2019 • 14 Minute Read
Introduction
Deep learning is one of the hottest topics in data science today. This is not surprising given the number of fascinating applications being developed with it, such as self-driving cars, color restoration, natural language processing, automatic machine translation, image classification, and many more.
There are many deep learning libraries, but the most popular are TensorFlow, Keras, and PyTorch. Although TensorFlow and PyTorch are immensely popular, their lower-level APIs can have a steep learning curve. So, for many practitioners, Keras is the preferred choice.
The Keras library is a high-level API for building deep learning models that has gained favor for its ease of use and simplicity, which facilitate fast development. Often, even a fairly complex deep learning network can be built with Keras in only a few lines of code.
In this guide, we will focus on how to use the Keras library to build regression models.
Regression with Keras
Regression is a supervised machine learning task in which a model predicts a continuous label. The goal is to produce a model that represents the ‘best fit’ to some observed data, according to an evaluation criterion.
The basic architecture of the deep learning neural network, which we will be following, consists of three main components.

Input Layer: This is where the training observations are fed. The number of neurons in this layer matches the number of predictor variables.

Hidden Layers: These are the intermediate layers between the input and output layers. It is in these layers that the deep neural network learns the relationships in the data.

Output Layer: This is the layer where the final output is extracted from what’s happening in the previous two layers. In regression problems, the output layer has one neuron.
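To make the three components concrete, here is a minimal sketch of a single forward pass through a tiny network, written in plain Python with no dependencies. The weights, biases, and input values are made up for illustration and are not taken from the model trained later in this guide; the helper names `relu` and `dense` are likewise hypothetical.

```python
def relu(values):
    """Rectified linear unit: negative values become 0, others pass through."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus a bias for each neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Input layer: one observation with 4 (already scaled) predictor values.
observation = [0.4, 0.8, 0.5, 0.3]

# Hidden layer with 2 neurons (illustrative weights, not learned ones).
hidden_weights = [[1.0, -1.0, 0.5, 0.0],
                  [0.5, 0.5, 0.5, 0.5]]
hidden_biases = [0.0, 0.0]

# Output layer: a single neuron with no activation, as used for regression.
output_weights = [[2.0, 3.0]]
output_biases = [1.0]

hidden = relu(dense(observation, hidden_weights, hidden_biases))
prediction = dense(hidden, output_weights, output_biases)[0]

print(hidden)
print(prediction)
```

In this toy example the first hidden neuron receives a negative weighted sum, so ReLU sets it to zero and only the second neuron contributes to the prediction.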
Problem Statement
Unemployment is a major socioeconomic and political issue for any country and, hence, managing it is a chief task for any government. But to manage unemployment within an economy, it is imperative to predict it as well. This is what this guide will aim to achieve. The guide will be building a deep learning regression model using Keras to predict unemployment.
The data used in this project was produced from US economic time series data available from https://research.stlouisfed.org/fred2. The data contains 574 rows and 5 variables, as described below:
 psavert - personal savings rate.
 pce - personal consumption expenditures, in billions of dollars.
 uempmed - median duration of unemployment, in weeks.
 pop - total population, in thousands.
 unemploy - number of unemployed, in thousands (dependent variable).
Evaluation Metrics
We will evaluate the performance of the model using Root Mean Squared Error (RMSE), a commonly used metric for regression problems. In simple terms, RMSE measures the average magnitude of the residuals or error. Mathematically, it is computed as the square root of the average of squared differences between predicted and actual values.
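As a small illustration of the definition above, the sketch below computes RMSE by hand using only the standard library. The function name `rmse` and the toy values are assumptions made for this example; the guide itself computes the metric with scikit-learn in Step 6.

```python
from math import sqrt


def rmse(actual, predicted):
    """Square root of the mean of squared differences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy values in thousands (illustrative, not taken from the dataset).
actual = [7000, 8000, 9000]
predicted = [7100, 7800, 9200]

print(rmse(actual, predicted))
```

Because the errors are squared before averaging, the two errors of 200 weigh more heavily than the error of 100, and the result is expressed in the same unit as the target.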
Steps
The following steps are commonly used when implementing regression models with Keras.
Step 1 - Loading the required libraries and modules.
Step 2 - Loading the data and performing basic data checks.
Step 3 - Creating arrays for the features and the response variable.
Step 4 - Creating the training and test datasets.
Step 5 - Defining, compiling, and fitting the Keras regression model.
Step 6 - Predicting on the test data and computing evaluation metrics.
The following sections will cover these steps.
Step 1 - Loading the Required Libraries and Modules
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
# Import necessary modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt
# Keras specific
import keras
from keras.models import Sequential
from keras.layers import Dense
Step 2 - Reading the Data and Performing Basic Data Checks
The first line of code reads in the data as a pandas dataframe, while the second line prints its shape: 574 observations of 5 variables. The third line gives summary statistics of the numerical variables. Every variable has a 'count' of 574, which equals the number of records in the dataset, so there are no missing values.
df = pd.read_csv('regressionexample.csv')
print(df.shape)
df.describe()
(574, 5)
          pce      pop  psavert  uempmed  unemploy
count     574      574      574      574       574
mean    4,844  257,189        8        9     7,772
std     3,579   36,731        3        4     2,642
min       507  198,712        2        4     2,685
25%     1,582  224,896        6        6     6,284
50%     3,954  253,060        8        8     7,494
75%     7,667  290,291       11        9     8,691
max    12,162  320,887       17       25    15,352
Step 3 - Creating Arrays for the Features and the Response Variable
The first line of code stores the name of the target variable in a list, while the second line gives us the list of all the features, excluding the target variable 'unemploy'.
The third line normalizes the predictors by dividing each one by its maximum value. This is important because the units of the variables differ significantly and may influence the modeling process. Since all values are positive, dividing by the maximum places every predictor between 0 and 1.
The fourth line displays the summary of the normalized data. We can see that all the independent variables now have a maximum of 1, although their minimums are above 0. The target variable remains unchanged.
target_column = ['unemploy']
predictors = list(set(list(df.columns)) - set(target_column))
df[predictors] = df[predictors]/df[predictors].max()
df.describe()
        pce   pop  psavert  uempmed  unemploy
count   574   574      574      574       574
mean   0.40  0.80     0.47     0.34     7,772
std    0.29  0.11     0.18     0.16     2,642
min    0.04  0.62     0.11     0.16     2,685
25%    0.13  0.70     0.32     0.24     6,284
50%    0.33  0.79     0.45     0.30     7,494
75%    0.63  0.90     0.62     0.36     8,691
max       1     1        1        1    15,352
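The scaling used in this step divides by the maximum, which differs slightly from min-max scaling. The sketch below contrasts the two in plain Python, using the minimum, median, and maximum of 'pop' from the Step 2 summary. The function names are hypothetical, and min-max scaling is shown only for comparison; it is not what the guide's code does.

```python
def scale_by_max(values):
    """Divide by the column maximum, as done in this step."""
    largest = max(values)
    return [v / largest for v in values]


def scale_min_max(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    smallest, largest = min(values), max(values)
    return [(v - smallest) / (largest - smallest) for v in values]


# Minimum, median, and maximum of 'pop' from the Step 2 summary table.
pop = [198712, 253060, 320887]

print([round(v, 2) for v in scale_by_max(pop)])   # [0.62, 0.79, 1.0]
print([round(v, 2) for v in scale_min_max(pop)])  # [0.0, 0.44, 1.0]
```

Dividing by the maximum reproduces the 0.62 minimum shown for 'pop' in the table above, whereas min-max scaling would stretch the same column over the full 0 to 1 range.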
Step 4 - Creating the Training and Test Datasets
The first two lines create arrays of the independent (X) and dependent (y) variables, respectively. The third line splits the data into training and test datasets, while the fourth line prints the shape of the training set (401 observations of 4 variables) and the test set (173 observations of 4 variables).
X = df[predictors].values
y = df[target_column].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
print(X_train.shape); print(X_test.shape)
(401, 4)
(173, 4)
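The shapes above follow from the 30% test fraction applied to 574 rows. The sketch below reproduces the arithmetic, assuming the test share is rounded up and the remainder goes to training; that rounding rule is an assumption about how `train_test_split` behaves, chosen because it is consistent with the printed shapes. The helper `split_sizes` is hypothetical.

```python
from math import ceil


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


print(split_sizes(574, 0.30))  # (401, 173)
```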
Step 5 - Building the Deep Learning Regression Model
We will build a regression model using deep learning in Keras. To begin with, we will define the model. The first line of code below calls the Sequential constructor. We use the Sequential model because our network consists of a linear stack of layers. The second line of code adds the first hidden layer with 500 neurons, and specifies the activation function and the number of input dimensions, which in our case is 4 predictors. The third and fourth lines of code add two more hidden layers in the same way, this time without the input_dim parameter. The last line of code creates the output layer with one node, which outputs the predicted number of unemployed in thousands.
The activation function used in the hidden layers is the rectified linear unit, or ReLU, which returns the input when it is positive and zero otherwise. It is one of the most widely used activation functions because it is nonlinear and cheap to compute, and because it does not activate all the neurons at the same time. In simple terms, any neuron whose weighted input is negative outputs zero for that observation, which makes the activations sparse and the network efficient.
# Define model
model = Sequential()
model.add(Dense(500, input_dim=4, activation= "relu"))
model.add(Dense(100, activation= "relu"))
model.add(Dense(50, activation= "relu"))
model.add(Dense(1))
#model.summary() #Print model Summary
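To get a feel for the size of this network without running Keras, the sketch below counts its trainable parameters by hand. Each Dense layer has one weight per input-neuron pair plus one bias per neuron. The helper `dense_param_count` is a hypothetical name used only for this illustration; uncommenting `model.summary()` above should report the same kind of per-layer counts.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 4 predictors, three hidden layers (500, 100, 50), and 1 output neuron.
layer_sizes = [4, 500, 100, 50, 1]

counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [2500, 50100, 5050, 51]
print(sum(counts))  # 57701
```

With 401 training observations and roughly 57,700 parameters, the network is heavily overparameterized, which is worth keeping in mind when experimenting with layer sizes.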
The next step is to define an optimizer and the loss measure for training. The mean squared error is our loss measure and the "adam" optimizer is our minimization algorithm. A practical advantage of the "adam" optimizer is that it adapts the step size for each parameter during training, and its default learning rate often works reasonably well. This lets us start without hand-tuning the learning rate, as is usually needed with plain gradient descent, although the learning rate remains a hyperparameter that can be tuned. We achieve this task with the first line of the code below.
The second line of code fits the model on the training dataset. We also provide the argument, epochs, which represents the number of complete passes over the training data. We have taken 20 epochs.
model.compile(loss= "mean_squared_error" , optimizer="adam", metrics=["mean_squared_error"])
model.fit(X_train, y_train, epochs=20)
Epoch 1/20
401/401 [==============================]  0s 1ms/step  loss: 68136318.3441  mean_squared_error: 68136318.3441
Epoch 2/20
401/401 [==============================]  0s 133us/step  loss: 68101432.0698  mean_squared_error: 68101432.0698
Epoch 3/20
401/401 [==============================]  0s 125us/step  loss: 67985495.1022  mean_squared_error: 67985495.1022
Epoch 4/20
401/401 [==============================]  0s 134us/step  loss: 67665023.0524  mean_squared_error: 67665023.0524
Epoch 5/20
401/401 [==============================]  0s 127us/step  loss: 66899397.2868  mean_squared_error: 66899397.2868
Epoch 6/20
401/401 [==============================]  0s 107us/step  loss: 65355226.3042  mean_squared_error: 65355226.3042
Epoch 7/20
401/401 [==============================]  0s 120us/step  loss: 62432633.3566  mean_squared_error: 62432633.3566
Epoch 8/20
401/401 [==============================]  0s 128us/step  loss: 57537882.0549  mean_squared_error: 57537882.0549
Epoch 9/20
401/401 [==============================]  0s 150us/step  loss: 50086165.6958  mean_squared_error: 50086165.6958
Epoch 10/20
401/401 [==============================]  0s 119us/step  loss: 39984370.9975  mean_squared_error: 39984370.9975
Epoch 11/20
401/401 [==============================]  0s 97us/step  loss: 28126145.2868  mean_squared_error: 28126145.2868
Epoch 12/20
401/401 [==============================]  0s 110us/step  loss: 16095036.0499  mean_squared_error: 16095036.0499
Epoch 13/20
401/401 [==============================]  0s 126us/step  loss: 7629222.0150  mean_squared_error: 7629222.0150
Epoch 14/20
401/401 [==============================]  0s 107us/step  loss: 4147607.1696  mean_squared_error: 4147607.1696
Epoch 15/20
401/401 [==============================]  0s 107us/step  loss: 3668975.7581  mean_squared_error: 3668975.7581
Epoch 16/20
401/401 [==============================]  0s 111us/step  loss: 3646548.0898  mean_squared_error: 3646548.0898
Epoch 17/20
401/401 [==============================]  0s 126us/step  loss: 3563563.1328  mean_squared_error: 3563563.1328
Epoch 18/20
401/401 [==============================]  0s 117us/step  loss: 3533091.9377  mean_squared_error: 3533091.9377
Epoch 19/20
401/401 [==============================]  0s 123us/step  loss: 3496560.1110  mean_squared_error: 3496560.1110
Epoch 20/20
401/401 [==============================]  0s 132us/step  loss: 3467280.0112  mean_squared_error: 3467280.0112
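For readers curious about what the "adam" optimizer does during those epochs, the following is a minimal one-dimensional sketch of the Adam update rule in plain Python. It is an illustration of the published algorithm on a toy loss, not the Keras implementation; the function name `adam_minimize`, the learning rate of 0.1, and the toy loss are assumptions chosen for the example.

```python
from math import sqrt


def adam_minimize(gradient, start, learning_rate=0.1, steps=500,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Minimize a one-dimensional function with the Adam update rule."""
    x = start
    m = 0.0  # running average of gradients
    v = 0.0  # running average of squared gradients
    for t in range(1, steps + 1):
        g = gradient(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected averages
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the recent gradient magnitude.
        x -= learning_rate * m_hat / (sqrt(v_hat) + epsilon)
    return x


# Toy loss (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
result = adam_minimize(lambda x: 2 * (x - 3), start=0.0)
print(result)
```

Dividing by the running root mean square of the gradients is what makes the effective step size adapt during training, which may help explain why the loss in the log above changes slowly at first and then drops quickly.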
Step 6 - Predict on the Test Data and Compute Evaluation Metrics
The first line of code predicts on the training data, while the second line prints the RMSE value on the training data. The third and fourth lines of code do the same for the test data.
pred_train= model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train,pred_train)))
pred= model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test,pred)))
1856.4850642445354
1825.5904063232729
Evaluation of the Model Performance
The output above shows that the RMSE, which is our evaluation metric, was about 1,856 thousand for the training data and 1,826 thousand for the test data. The lower the RMSE value, the better the model performance. However, unlike accuracy, RMSE has no fixed scale: it is expressed in the unit of the target, which in our case is thousands of people, so it has to be judged against the scale of the target itself. For reference, the summary in Step 2 shows that 'unemploy' has a mean of 7,772 and a standard deviation of 2,642.
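One rough way to put the test RMSE in context is to express it relative to the mean and standard deviation of the target. The sketch below uses the values printed in this guide; note that the mean and standard deviation come from the rounded Step 2 summary of the full dataset rather than from the test set alone, so the ratios are only indicative.

```python
# Values reported in this guide (the target is in thousands of people).
rmse_test = 1825.5904063232729
target_mean = 7772
target_std = 2642

relative_to_mean = rmse_test / target_mean
relative_to_std = rmse_test / target_std

print(f"RMSE as a share of the mean: {relative_to_mean:.2f}")               # 0.23
print(f"RMSE as a share of the standard deviation: {relative_to_std:.2f}")  # 0.69
```

A ratio below 1 against the standard deviation suggests the model does better than always predicting the mean, but it still leaves considerable room for improvement.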
Conclusion
In this guide, we have built a regression model using the deep learning framework Keras. The guide used the US economic time series data to build a deep learning regression model that predicts the number of unemployed, in thousands.
Our model achieves stable performance, with little difference between the training and test RMSE. The ideal result would be an RMSE of zero, but that is almost impossible with real economic data. Also, because the target variable is measured in thousands, the RMSE is expressed in thousands as well.
Further experiments, such as changing the number of neurons, adding more hidden layers, or increasing the number of epochs, can be tried to see their impact on model performance.
This regression problem could also be modeled using other algorithms such as Decision Tree, Random Forest, Gradient Boosting, or Support Vector Machines. However, that is outside the scope of this guide, which is aimed at enabling individuals to solve regression problems using the deep learning library Keras.