Implementing Marketing Analytics in R: Part 2

Deepika Singh

  • Feb 15, 2020
  • 16 Min read
  • 2,429 Views

Introduction

Marketing and customer-related decisions are a top priority for every business. With the help of statistical modeling and analytics, it's possible to support decision makers and help them make strategic decisions based on data, not just instincts. The integration of statistical modeling with marketing strategy can also be referred to as marketing analytics.

In this series of two guides, we explore techniques for implementing marketing analytics in R.

In the previous guide, Part 1, we covered:

  • Customer Churn Prediction

  • RFM Analysis

In this guide, Part 2, we will cover:

  • Clustering

  • Sales Forecasting

  • Other Areas

Let's start by loading the required libraries.

```{r}
library(plyr)
library(readr)
library(dplyr)
library(caret)
library(ggplot2)
library(repr)
```

Segmentation

Segmentation is the marketing strategy of dividing customers into groups in order to build and deploy effective marketing plans for each group. The basis of segmentation can be demographic, geographic, or any other metric. The statistical technique used to divide customers into homogeneous segments is called clustering. Let's start by loading the data.

Data

For this clustering exercise, we'll use a fictitious dataset of retail store customers containing 150 observations and 6 variables, as described below:

  1. mean_HH_income: The mean household income of the customer.

  2. ABV: Average bill value per purchase by a customer.

  3. No_bills_month: Average number of bills in a month from the customer.

  4. satisfaction_score: The customer's satisfaction score. The higher the satisfaction score, the more satisfied the customer is with the store.

  5. family_size: Number of members in the customer's family.

  6. Gender: Whether the customer is female ('1') or male ('0').

Let's load the data and look at the structure.

```{r}
df_seg = read_csv("segdata.csv")
glimpse(df_seg)
```

Output:

```
Observations: 150
Variables: 6
$ mean_HH_income     <int> 88006, 88005, 88004, 88003, 88002, 88001, 88000, 64...
$ ABV                <int> 355, 354, 353, 352, 351, 350, 349, 348, 347, 346, 3...
$ No_bills_month     <int> 41, 21, 21, 53, 21, 44, 49, 22, 10, 58, 25, 52, 41,...
$ satisfaction_score <int> 1, 3, 4, 4, 2, 1, 3, 4, 4, 3, 5, 1, 5, 3, 3, 5, 4, ...
$ family_size        <int> 4, 5, 3, 7, 6, 9, 6, 5, 8, 10, 3, 3, 3, 8, 7, 3, 3,...
$ Gender             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
```

The next step is to look at the summary of the numerical variables.

```{r}
summary(df_seg)
```

Output:

```
 mean_HH_income       ABV         No_bills_month  satisfaction_score
 Min.   :45000   Min.   : 26.00   Min.   : 5.00   Min.   :1.000     
 1st Qu.:52349   1st Qu.: 62.25   1st Qu.:16.00   1st Qu.:2.000     
 Median :64277   Median :286.50   Median :26.50   Median :3.000     
 Mean   :59873   Mean   :215.39   Mean   :30.20   Mean   :2.873     
 3rd Qu.:64314   3rd Qu.:323.75   3rd Qu.:44.75   3rd Qu.:4.000     
 Max.   :88006   Max.   :355.00   Max.   :60.00   Max.   :5.000     
  family_size         Gender      
 Min.   : 1.000   Min.   :0.0000  
 1st Qu.: 3.000   1st Qu.:0.0000  
 Median : 6.000   Median :0.0000  
 Mean   : 5.513   Mean   :0.3333  
 3rd Qu.: 8.000   3rd Qu.:1.0000  
 Max.   :10.000   Max.   :1.0000 
```

From the output above, we can infer that the variables have different scales, which will adversely affect the distance calculations used in clustering. To prevent this, we'll normalize the data with the preProcess function from the caret package, which by default centers and scales every variable.

The first line of code below creates the pre-processing object, while the second line applies it to perform the normalization. If we look at the summary of the segNorm data, we'll see that all the variables now have a mean of zero (and a standard deviation of one).

```{r}
p1_object = preProcess(df_seg)
segNorm = predict(p1_object, df_seg)
summary(segNorm)
```

Output:

```
 mean_HH_income         ABV          No_bills_month    satisfaction_score
 Min.   :-1.6354   Min.   :-1.5124   Min.   :-1.4713   Min.   :-1.31892  
 1st Qu.:-0.8274   1st Qu.:-1.2229   1st Qu.:-0.8291   1st Qu.:-0.61487  
 Median : 0.4842   Median : 0.5679   Median :-0.2160   Median : 0.08918  
 Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000  
 3rd Qu.: 0.4883   3rd Qu.: 0.8654   3rd Qu.: 0.8495   3rd Qu.: 0.79323  
 Max.   : 3.0935   Max.   : 1.1149   Max.   : 1.7399   Max.   : 1.49728  
  family_size          Gender       
 Min.   :-1.5167   Min.   :-0.7047  
 1st Qu.:-0.8446   1st Qu.:-0.7047  
 Median : 0.1635   Median :-0.7047  
 Mean   : 0.0000   Mean   : 0.0000  
 3rd Qu.: 0.8356   3rd Qu.: 1.4095  
 Max.   : 1.5077   Max.   : 1.4095 
```

After normalizing the data, the next step is to run the clustering algorithm. We'll use the popular kmeans algorithm because it is powerful and interpretable. The number of clusters to create is five, as given by the argument centers=5 in the first line of code below. Note that kmeans starts from randomly chosen centers, so setting a seed with set.seed() before running it makes the results reproducible. The second line displays the number of observations in each cluster.

```{r}
kmeans_Clust = kmeans(segNorm, centers=5, iter.max=1000)
table(kmeans_Clust$cluster)
```

Output:

```
 1  2  3  4  5 
31 44 13 21 41 
```

From the output above, we can infer that the smallest cluster, 3, has 13 observations, while the largest cluster, 2, has 44 observations. If we want to understand the clusters, we can do that with the command below.

The first line of code below adds the cluster assignments as a new variable in the original un-normalized data, df_seg. The second line uses split() and lapply() with colMeans() to print the mean of each variable within each cluster.

```{r}
df_seg$kmeansseg = kmeans_Clust$cluster
lapply(split(df_seg, df_seg$kmeansseg), colMeans)
```

Output:

```
$`1`
    mean_HH_income                ABV     No_bills_month satisfaction_score 
      51832.129032          46.645161          22.709677           2.258065 
       family_size             Gender          kmeansseg 
          4.354839           1.000000           1.000000 

$`2`
    mean_HH_income                ABV     No_bills_month satisfaction_score 
      67507.954545         311.727273          36.818182           1.909091 
       family_size             Gender          kmeansseg 
          5.204545           0.000000           2.000000 

$`3`
    mean_HH_income                ABV     No_bills_month satisfaction_score 
      51437.230769         129.076923          28.692308           2.153846 
       family_size             Gender          kmeansseg 
          7.384615           0.000000           3.000000 

$`4`
    mean_HH_income                ABV     No_bills_month satisfaction_score 
      5.182395e+04       1.424762e+02       4.519048e+01       3.619048e+00 
       family_size             Gender          kmeansseg 
      6.142857e+00       9.047619e-01       4.000000e+00 

$`5`
    mean_HH_income                ABV     No_bills_month satisfaction_score 
      64556.170732         304.292683          21.560976           4.219512 
       family_size             Gender          kmeansseg 
          5.804878           0.000000           5.000000 
```
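In this example we fixed the number of clusters at five. In practice, a common way to choose the number of clusters is the elbow method: run kmeans for a range of values of k and plot the total within-cluster sum of squares, looking for the point where the curve flattens out. A minimal sketch (reusing the normalized segNorm data from above; the range of k values is arbitrary) is shown below.

```{r}
# Elbow plot: total within-cluster sum of squares for k = 1 to 10
wss <- sapply(1:10, function(k) {
  kmeans(segNorm, centers = k, iter.max = 1000, nstart = 25)$tot.withinss
})

plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```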

Sales Forecasting

Forecasting needs no introduction; it is an important activity for marketing teams and is most commonly used to forecast sales. Let's start by loading the data.

Data

For this forecasting exercise, we'll use fictitious monthly sales data containing 81 observations, and the task will be to create a forecasting model to predict sales over the next 12 months.

```{r}
library(forecast)
library(fpp2)
library(TTR)
library(dplyr)

dat <- read_csv("salesforecasting.csv")
glimpse(dat)
```

Output:

```
Observations: 81
Variables: 3
$ Date  <chr> "01-Mar-13", "01-Apr-13", "01-May-13", "01-Jun-13", "01-Jul-13",...
$ Sales <int> 192, 198, 197, 213, 202, 186, 215, 213, 213, 220, 210, 219, 233,...
$ Class <chr> "Train", "Train", "Train", "Train", "Train", "Train", "Train", "...
```

The next step is to create training and test sets for model validation. The first couple of lines of code below perform this task by using the Class variable, which already has labels Train and Test. The third line of code prints the number of rows in the training and the test data. Note that the test data set has 12 rows because we will be building the model to forecast for 12 months ahead.

```{r}
dat_train = subset(dat, Class == 'Train')
dat_test = subset(dat, Class == 'Test')

nrow(dat_train); nrow(dat_test)
```

Output:

```
[1] 69
[1] 12
```

Preparing the Time Series Object

To run forecasting models in R, we need to convert the data into a time series object, which is done in the first line of code below. The start and end arguments specify the times of the first and last observations, respectively, and the frequency argument specifies the number of observations per unit of time.

We will also create a utility function for calculating the Mean Absolute Percentage Error (MAPE), which will be used to evaluate the performance of the forecasting models. The lower the MAPE value, the better the forecasting model. This is done in the remaining lines of code.

```{r}
dat_ts <- ts(dat_train[, 2], start = c(2013, 3), end = c(2018, 11), frequency = 12)

# Utility function to compute the Mean Absolute Percentage Error
mape <- function(actual, pred){
  mape <- mean(abs((actual - pred)/actual))*100
  return (mape)
}
```
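For reference, the function above implements the standard MAPE formula, where $A_t$ denotes the actual value, $F_t$ the forecast at time $t$, and $n$ the number of observations:

$$\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$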

With the data and the mape function prepared, we move to the forecasting technique to be used in this case.

Holt's Trend Method

We'll use Holt's Trend forecasting method, a traditional statistical algorithm, which considers the trend component while generating forecasts. This method involves two smoothing equations, one for the level and one for the trend component.
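For reference, Holt's method can be written with a level equation, a trend equation, and an h-step-ahead forecast equation (standard notation; $\alpha$ and $\beta^{*}$ are the smoothing parameters that will appear in the model summary below):

$$\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})$$
$$b_t = \beta^{*}(\ell_t - \ell_{t-1}) + (1-\beta^{*})\,b_{t-1}$$
$$\hat{y}_{t+h|t} = \ell_t + h\,b_t$$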

The first line of code below creates the Holt's trend model and stores it in an object called holt_model. The second line prints the model summary as well as the forecasts for the next 12 months.

```{r}
holt_model <- holt(dat_ts, h = 12)
summary(holt_model)
```

Output:

```
Forecast method: Holt's method

Model Information:
Holt's method 

Call:
 holt(y = dat_ts, h = 12) 

  Smoothing parameters:
    alpha = 0.8342 
    beta  = 1e-04 

    Initial states:
    l = 193.7099 
    b = 3.4287 

  sigma:  16.2391

     AIC     AICc      BIC 
682.6969 683.6492 693.8674 

Error measures:
                     ME     RMSE     MAE        MPE     MAPE      MASE        ACF1
Training set -0.8870433 15.76137 11.8633 -0.4614957 4.084311 0.2366019 0.004106845

Forecasts:
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Dec 2018       382.4585 361.6473 403.2698 350.6305 414.2866
Jan 2019       385.8799 358.7765 412.9833 344.4288 427.3309
Feb 2019       389.3012 357.1116 421.4908 340.0715 438.5309
Mar 2019       392.7225 356.1461 429.2989 336.7838 448.6612
Apr 2019       396.1438 355.6521 436.6355 334.2171 458.0706
May 2019       399.5651 355.5036 443.6267 332.1789 466.9514
Jun 2019       402.9865 355.6225 450.3504 330.5496 475.4233
Jul 2019       406.4078 355.9563 456.8593 329.2489 483.5667
Aug 2019       409.8291 356.4676 463.1906 328.2197 491.4385
Sep 2019       413.2504 357.1288 469.3721 327.4198 499.0811
Oct 2019       416.6717 357.9187 475.4248 326.8168 506.5267
Nov 2019       420.0931 358.8209 481.3652 326.3854 513.8008
```

The output above shows that the MAPE for the training data is 4.1 percent. Let's now evaluate the model performance on the test data, which is done in the lines of code below. The MAPE error on the test data comes out to be 2.7 percent, which is an improvement over the training data result.

```{r}
df_holt = as.data.frame(holt_model)
dat_test$holt = df_holt$`Point Forecast`
mape(dat_test$Sales, dat_test$holt)
```

Output:

```
[1] 2.735082
```
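Before settling on a single model, it can help to benchmark an alternative on the same train/test split. The sketch below is an optional addition (the object names arima_model, fore_arima, and df_arima are ours): it fits an ARIMA model on the training series with auto.arima() and computes its test MAPE in exactly the same way, so the two models can be compared directly.

```{r}
# Optional benchmark: automatically selected ARIMA model on the training series
arima_model <- auto.arima(dat_ts)
fore_arima <- forecast::forecast(arima_model, h = 12)

# Test-set MAPE, computed the same way as for Holt's method
df_arima <- as.data.frame(fore_arima)
dat_test$arima <- df_arima$`Point Forecast`
mape(dat_test$Sales, dat_test$arima)
```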

Overall Forecasting for Next 12 Months

The accuracy number we got is respectable, so we can create the forecast for the next 12 months using the lines of code below. The first line builds a time series object from the complete data, the second fits an ARIMA model to it with the auto.arima() function, and the remaining lines generate and print the 12-months-ahead forecast.

```{r}
df_forecast <- ts(dat[, 2], start = c(2013, 3), end = c(2019, 11), frequency = 12)
arima_model_ahead <- auto.arima(df_forecast)
forecast_12 = forecast::forecast(arima_model_ahead, h=12)
forecast_12
```

Output:

```
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Dec 2019       438.1473 419.0833 457.2113 408.9915 467.3031
Jan 2020       441.9584 417.3683 466.5485 404.3510 479.5658
Feb 2020       446.0598 416.9752 475.1444 401.5787 490.5409
Mar 2020       442.3232 409.3512 475.2952 391.8968 492.7496
Apr 2020       443.8119 407.3648 480.2591 388.0708 499.5530
May 2020       445.3007 405.6821 484.9193 384.7092 505.8921
Jun 2020       449.9827 407.4283 492.5370 384.9014 515.0640
Jul 2020       459.3094 414.0091 504.6097 390.0286 528.5902
Aug 2020       463.9914 416.1024 511.8804 390.7515 537.2313
Sep 2020       462.8675 412.5227 513.2122 385.8718 539.8631
Oct 2020       462.3241 409.6379 515.0103 381.7476 542.9007
Nov 2020       463.5226 408.5947 518.4505 379.5176 547.5275
```
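To communicate the forecast, it usually helps to plot it together with the historical data and the prediction intervals. A minimal sketch using the autoplot() method from the forecast package (ggplot2 is already loaded) is shown below.

```{r}
# Plot the historical series, the 12-month forecast, and the 80%/95% intervals
autoplot(forecast_12) +
  labs(x = "Year", y = "Sales")
```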

Other Areas

Marketing analytics is a vast area, and it is not possible to cover all of its concepts and techniques in one guide, or for that matter even in one book. We have covered the most widely used marketing analytics techniques above. Some other popular ones are mentioned below.

  1. Customer Lifetime Value (or CLV): In this technique, you model each customer's lifetime value to decide which customers are most valuable to the business.

  2. Natural Language Processing (or NLP): NLP is a domain in itself and is increasingly being used for sentiment analysis, topic modeling, and text classification.

  3. Dimensionality Reduction: This technique is often applied to survey and CRM data, which can contain a large number of variables.

  4. Affinity Analysis: In this technique, you can identify which items have a higher likelihood of being bought together; a brief sketch follows this list.

  5. Marketing Campaign Optimization: In this technique, you can model the effectiveness of a marketing campaign and optimize your marketing budget.
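To give a flavor of affinity analysis, the sketch below uses the arules package and its built-in Groceries dataset (neither is used elsewhere in this guide, and the support and confidence thresholds are purely illustrative) to mine association rules between items.

```{r}
# Market basket (affinity) analysis sketch with the arules package
library(arules)
data(Groceries)

# Mine association rules with illustrative support and confidence thresholds
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5))

# Show the five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```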