Introduction

Time series algorithms are used extensively for analyzing and forecasting time-based data. These algorithms rest on underlying statistical assumptions. In this guide, you will learn those assumptions, a few basic time series algorithms, and how to implement them in Python.

Let's begin by understanding the data.

In this guide, you will use the fictitious monthly sales data of a supermarket chain containing 564 observations and three variables, as described below:

- `date`: the first date of every month
- `sales`: daily sales, in millions of dollars
- `Class`: the variable denoting the training and test data set partition

The lines of code below import the required libraries and the data.

```python
import pandas as pd
import numpy as np

# Read the monthly sales data
df = pd.read_csv("timeseries2.csv")
print(df.shape)
print(df.info())
```

Output:

```
(564, 3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 564 entries, 0 to 563
Data columns (total 3 columns):
date 564 non-null object
sales 564 non-null float64
Class 564 non-null object
dtypes: float64(1), object(2)
memory usage: 13.3+ KB
None
```
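The `info()` output shows that `date` is stored as a generic `object` column. A minimal sketch of converting it to a datetime index is shown below. The small frame is made-up stand-in data (the start date and values are assumptions); only the column names come from this guide.

```python
import pandas as pd

# Hypothetical stand-in for the guide's data: same column names, made-up values.
sample = pd.DataFrame({
    "date": ["2000-01-01", "2000-02-01", "2000-03-01"],
    "sales": [5.0, 6.1, 5.7],
    "Class": ["Train", "Train", "Test"],
})

# Parse the date strings into timestamps and use them as the index.
sample["date"] = pd.to_datetime(sample["date"])
sample = sample.set_index("date")

print(sample.index)
```

A datetime index is optional for the steps that follow, but it makes plots and date-based slicing easier to read.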

The next step is to create the training and test datasets for model validation. You will also create the training array that will be used for statistical tests.

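```python
# Split on the partition column; .copy() makes independent frames so that
# forecast columns can be added to `test` later without a pandas warning.
train = df[df["Class"] == "Train"].copy()
test = df[df["Class"] == "Test"].copy()
print(train.shape)
print(test.shape)

# Training values used for the statistical tests
train_array = train["sales"]
print(train_array.shape)
```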

Output:

```
(552, 3)
(12, 3)

(552,)
```
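If a data set has no ready-made partition column like `Class`, a common alternative is to hold out the final observations in time order. The sketch below assumes a 12-observation holdout (matching the test size above) and uses a made-up frame rather than the sales data.

```python
import numpy as np
import pandas as pd

# Made-up series with the same length as the guide's data (564 rows).
demo = pd.DataFrame({"sales": np.linspace(2.0, 12.0, 564)})

holdout = 12  # assumed test size, matching the 12 test rows above
train_demo = demo.iloc[:-holdout]   # everything except the last 12 rows
test_demo = demo.iloc[-holdout:]    # the last 12 rows, in time order

print(train_demo.shape, test_demo.shape)   # (552, 1) (12, 1)
```

The split is positional rather than random because shuffling would let the model see observations that come after the ones it is asked to forecast.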

With the data prepared, you are ready to move to the forecasting techniques in the subsequent sections. However, before moving to forecasting it's important to understand the statistical concepts of *white noise* and *stationarity* in time series.

A *white noise* series is a purely random time series whose observations are independent and identically distributed with a mean of zero. This means that all observations have the same variance and there is no autocorrelation between them.
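To make the definition concrete, the sketch below simulates Gaussian white noise with NumPy and checks its mean and standard deviation. The seed, sample size, and `sigma` value are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed for repeatability

# Gaussian white noise: independent draws with mean 0 and a constant spread.
sigma = 2.5   # assumed value; any constant standard deviation works
noise = rng.normal(loc=0.0, scale=sigma, size=10_000)

print(round(noise.mean(), 3))   # close to 0
print(round(noise.std(), 3))    # close to sigma
```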

A simple first check is to look at the summary statistics, which can be done with the code below. The output shows a mean of about 6.2, which is far from zero, so the series as recorded is not white noise. The standard deviation by itself is not decisive, because white noise only requires a constant variance, not a variance of one.

```python
print(train_array.describe())
```

Output:

```
count 552.000000
mean 6.221014
std 2.105854
min 2.100000
25% 5.000000
50% 6.000000
75% 6.900000
max 12.300000
Name: sales, dtype: float64
```

Visualizing the series is the next step, and can be done using the code below.

```python
import matplotlib.pyplot as plt

# Line plot of the training series
train_array.plot()
plt.show()
```

The visualization shows that the data is not a purely random series. You can also create a histogram to inspect the distribution, for example to see whether it is centered on zero and roughly bell-shaped, as Gaussian white noise would be.

```python
# Histogram of the training series
train_array.hist()
plt.show()
```

Both plots suggest that the series is not white noise. Another check is autocorrelation, which for white noise is expected to be close to zero at every non-zero lag. This can be visualized with the code below.

```python
from pandas.plotting import autocorrelation_plot

# Autocorrelation of the training series across lags
autocorrelation_plot(train_array)
plt.show()
```

The output above shows a significant autocorrelation pattern. All of the above analysis suggests that this is not a white noise series.
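The same idea can be checked numerically. The sketch below computes a sample autocorrelation by hand and compares it with the approximate 95% band that white noise would stay within. The helper name `sample_autocorr` and the trending input series are illustrative assumptions, not part of the guide's data.

```python
import numpy as np

def sample_autocorr(values, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    # Covariance between the series and its lagged copy, scaled by total variance.
    return np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2)

# Made-up trending series: strongly autocorrelated, so clearly not white noise.
trend_series = np.arange(100, dtype=float)
band = 1.96 / np.sqrt(len(trend_series))   # approximate 95% band under white noise

r1 = sample_autocorr(trend_series, lag=1)
print(r1 > band)   # True
```

Values that fall well outside the band at many lags, as in the plot above, point to structure that a forecasting model can exploit.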

One of the popular time series algorithms is the Auto Regressive Integrated Moving Average (ARIMA) model, whose autoregressive and moving-average components assume a stationary series. A stationary series is one whose statistical properties, such as the mean, variance, and autocorrelation, do not change over time. There are several methods to check the stationarity of a series. The one you’ll use here is the Augmented Dickey-Fuller test.
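The "Integrated" part of ARIMA refers to differencing, a standard way to turn a wandering series into a stationary one. A minimal sketch on a made-up random walk (not the sales data, with an arbitrary seed) is shown below.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed for repeatability
shocks = rng.normal(size=200)

# A random walk is the running sum of random shocks; its level wanders over time.
walk = np.cumsum(shocks)

# First differencing (y_t - y_{t-1}) recovers the stationary shocks.
diffed = np.diff(walk)

print(np.allclose(diffed, shocks[1:]))   # True
```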

The Augmented Dickey-Fuller test is a type of statistical unit root test. The test uses an autoregressive model and optimizes an information criterion across multiple different lag values.

The null hypothesis of the test is that the time series has a unit root and is therefore not stationary, whereas the alternative hypothesis is that the time series is stationary.

The first step is to import the `adfuller` function from the `statsmodels` package. This is done in the first line of code below. The second line creates a series of values from the `sales` variable of the training data set. The remaining lines conduct the test and print the results.

```python
from statsmodels.tsa.stattools import adfuller

X = train.sales

# result[0] is the test statistic, result[1] the p-value,
# and result[4] a dict of critical values
result = adfuller(X)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
```

Output:

```
ADF Statistic: -2.960313
p-value: 0.038771
Critical Values:
	1%: -3.442
	5%: -2.867
	10%: -2.570
```

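The output above shows that the p-value (0.039) is slightly below the threshold of 0.05, so you reject the null hypothesis at the 5% level. Consistent with this, the ADF statistic of -2.960 is below the 5% critical value of -2.867, though not below the 1% critical value of -3.442. The series therefore appears to be roughly stationary. Having understood the basic statistical concepts of time series, you will now build time series forecasting models.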
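The decision rule can be written out explicitly. The helper below is a hypothetical convenience function (it is not part of `statsmodels`) that takes the numbers printed above and reports at which significance levels the null hypothesis is rejected.

```python
def interpret_adf(statistic, p_value, critical_values, alpha=0.05):
    """Summarize an ADF result; a hypothetical helper, not part of statsmodels."""
    # Levels at which the statistic is more negative than the critical value.
    rejected_at = [level for level, crit in critical_values.items()
                   if statistic < crit]
    return {
        "reject_null": p_value < alpha,
        "rejected_at": rejected_at,
    }

# Values copied from the output above.
summary = interpret_adf(
    statistic=-2.960313,
    p_value=0.038771,
    critical_values={"1%": -3.442, "5%": -2.867, "10%": -2.570},
)
print(summary)   # {'reject_null': True, 'rejected_at': ['5%', '10%']}
```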

One last step before building the model is to create a utility function that will be used as an evaluation metric. The code below creates the function for calculating the *mean absolute percentage error* (MAPE), which is the metric to be used. The lower the MAPE value, the better the forecasting model performance.

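```python
def mean_absolute_percentage_error(y_true, y_pred):
    # Average absolute error relative to the actual values, as a percentage.
    # Undefined if y_true contains zeros.
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```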
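A small worked example shows how the metric behaves. The numbers below are made up, and the formula is repeated under the shorter name `mape` only so that the snippet runs on its own.

```python
import numpy as np

def mape(y_true, y_pred):
    # Same formula as the utility function above, repeated for a standalone run.
    y_true = np.array(y_true, dtype=float)
    y_pred = np.array(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Made-up values: errors of 10%, 10%, and 0% average to about 6.67%.
actual = [100, 200, 400]
predicted = [110, 180, 400]
print(round(mape(actual, predicted), 2))   # 6.67
```

Because each error is divided by the actual value, the metric is undefined when an actual value is zero and can be inflated by actual values close to zero.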

In the exponential smoothing method, forecasts are produced using weighted averages of past observations, with the weights decaying exponentially as the observations get older. The smoothing parameter for the level is set through the `smoothing_level` argument.
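The recursion behind simple exponential smoothing is short enough to write by hand. The sketch below is an illustration rather than the library's implementation: it assumes the level is initialized at the first observation, and the input values are made up.

```python
def simple_exp_smoothing(values, alpha):
    """Return the final smoothed level, starting from the first value."""
    level = values[0]   # assumed initialization; libraries may choose differently
    for y in values[1:]:
        # New level = weighted average of the newest observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return level

alpha = 0.7
level = simple_exp_smoothing([10.0, 12.0, 11.0], alpha)
print(round(level, 2))   # 11.12

# Weights on successively older observations decay as alpha * (1 - alpha) ** k.
weights = [round(alpha * (1 - alpha) ** k, 4) for k in range(4)]
print(weights)   # [0.7, 0.21, 0.063, 0.0189]
```

Every forecast from this method equals the final level, so the forecast line is flat. A larger `alpha` puts more weight on recent observations.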

The first two lines of code below import the required libraries and modules. The third line fits the simple exponential smoothing model, while the fourth line generates the forecast for the test period. Finally, the `mean_absolute_percentage_error()` function is used to compute the MAPE on the test data, which comes out to be about 10%.

```python
import statsmodels.api as sm
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt

# Fit with a fixed smoothing level instead of letting the optimizer choose one
model1 = SimpleExpSmoothing(np.asarray(train['sales'])).fit(smoothing_level=0.7, optimized=False)

# Forecast one value per row of the test set
test['SimpleExp'] = model1.forecast(len(test))
mean_absolute_percentage_error(test.sales, test.SimpleExp)
```

Output:

`10.05`

Holt's linear trend method is an extension of the simple exponential smoothing method that takes into account the trend component while generating forecasts. This method involves two smoothing equations, one for the level and one for the trend component.
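The two smoothing equations can be sketched directly. The function below is an illustration with an assumed, simple initialization (first value for the level, first difference for the trend), and the input series is made up.

```python
def holt_linear_forecast(values, alpha, beta, horizon):
    """Holt's linear trend method with a simple, assumed initialization."""
    level = values[0]
    trend = values[1] - values[0]   # assumed initial trend: first difference
    for y in values[1:]:
        previous_level = level
        # Level equation: blend the new observation with the prior one-step forecast.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Trend equation: blend the latest change in level with the prior trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

forecast = holt_linear_forecast([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, beta=0.1, horizon=3)
print([round(f, 6) for f in forecast])   # [6.0, 7.0, 8.0]
```

On a perfectly linear series the method continues the line exactly, which is the behavior the trend equation is designed to capture.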

The lines of code below create the model on the training data, generate predictions on the test data, and evaluate the model performance using the utility function.

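```python
# Fixed smoothing parameters for the level and the trend.
# Note: recent statsmodels releases name the trend parameter `smoothing_trend`;
# older releases used `smoothing_slope`.
fit_holt = Holt(np.asarray(train['sales'])).fit(smoothing_level=0.5, smoothing_trend=0.1)

test['Holt_linear_model'] = fit_holt.forecast(len(test))

mean_absolute_percentage_error(test.sales, test.Holt_linear_model)
```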

Output:

`2.15`

The output above shows that the MAPE for the test data is 2.1%.

The Holt-Winters method is an extension of the Holt linear model that takes into account both the trend and seasonality components while generating forecasts.
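In the additive form, each forecast is the last level, plus the trend extended over the forecast horizon, plus the matching seasonal effect. The helper below is a hypothetical sketch of that forecast step only (it does not estimate the components), and the state values are made up.

```python
def holt_winters_additive_forecast(level, trend, seasonals, horizon):
    """Forecast from a final additive Holt-Winters state (hypothetical helper).

    `seasonals` holds the most recent seasonal effects, ordered so that
    seasonals[0] applies to the first forecast step.
    """
    m = len(seasonals)   # length of one seasonal cycle
    return [
        level + h * trend + seasonals[(h - 1) % m]
        for h in range(1, horizon + 1)
    ]

# Made-up state: level 10, trend +1 per period, a two-period seasonal pattern.
print(holt_winters_additive_forecast(10.0, 1.0, [1.0, -1.0], horizon=4))
# [12.0, 11.0, 14.0, 13.0]
```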

The lines of code below create the model on the training data, generate predictions on the test data, and evaluate the model performance using the utility function.

```python
# Additive trend and additive seasonality with a seasonal cycle of 6 periods
fit_holt_winter = ExponentialSmoothing(
    np.asarray(train['sales']),
    seasonal_periods=6,
    trend='add',
    seasonal='add',
).fit()

test['Holt_Winter'] = fit_holt_winter.forecast(len(test))

mean_absolute_percentage_error(test.sales, test.Holt_Winter)
```

Output:

`6.837`

The output above shows that the MAPE for the test data is 6.8%.

In this guide, you learned about the underlying statistical concepts of white noise and stationarity in time series data. You also learned how to implement basic time series forecasting models using Python.

The performance of the models on the test data is summarized below:

Simple Exponential Smoothing: MAPE of 10%

Holt Linear Trend Model: MAPE of 2.1%

Holt-Winters Method: MAPE of 6.8%

The simple exponential smoothing model achieved a MAPE of about 10%, but the other two models outperformed it. The Holt Linear Trend model performed best, with the lowest MAPE of 2.1%.
