Marketing and customer-related decisions are a top priority for every business. Statistical modeling and analytics can support decision makers by grounding strategic decisions in data rather than instinct alone. This integration of statistical modeling with marketing strategy is often referred to as marketing analytics.
In this series of two guides, we explore techniques for implementing marketing analytics in R.
In the previous guide, Part 1, we covered:
Customer Churn Prediction
In this guide, Part 2, we will cover:
Clustering
Sales Forecasting
Let's start by loading the required libraries.
library(plyr)
library(readr)
library(dplyr)
library(caret)
library(ggplot2)
library(repr)
Segmentation is the marketing strategy of dividing customers into groups in order to build and deploy effective marketing plans. The basis for segmentation can be demographic, geographic, or any other metric. The statistical technique of dividing customers into homogeneous segments is called clustering.
For this clustering exercise, we'll use a fictitious dataset of retail store customers containing 150 observations and 6 variables, as described below:
mean_HH_income
: The mean household income of the customer.
ABV
: Average bill value per purchase by a customer.
No_bills_month
: Average number of bills in a month from the customer.
satisfaction_score
: The customer's satisfaction score. The higher the satisfaction score, the more satisfied the customer is with the store.
family_size
: Number of members in the customer's family.
Gender
: Whether the customer is female ('1') or male ('0').
Let's load the data and look at the structure.
df_seg = read_csv("segdata.csv")
glimpse(df_seg)
Output:
Observations: 150
Variables: 6
$ mean_HH_income     <int> 88006, 88005, 88004, 88003, 88002, 88001, 88000, 64...
$ ABV                <int> 355, 354, 353, 352, 351, 350, 349, 348, 347, 346, 3...
$ No_bills_month     <int> 41, 21, 21, 53, 21, 44, 49, 22, 10, 58, 25, 52, 41,...
$ satisfaction_score <int> 1, 3, 4, 4, 2, 1, 3, 4, 4, 3, 5, 1, 5, 3, 3, 5, 4, ...
$ family_size        <int> 4, 5, 3, 7, 6, 9, 6, 5, 8, 10, 3, 3, 3, 8, 7, 3, 3,...
$ Gender             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
The next step is to look at the summary of the numerical variables.
summary(df_seg)
Output:
 mean_HH_income       ABV         No_bills_month  satisfaction_score
 Min.   :45000   Min.   : 26.00   Min.   : 5.00   Min.   :1.000
 1st Qu.:52349   1st Qu.: 62.25   1st Qu.:16.00   1st Qu.:2.000
 Median :64277   Median :286.50   Median :26.50   Median :3.000
 Mean   :59873   Mean   :215.39   Mean   :30.20   Mean   :2.873
 3rd Qu.:64314   3rd Qu.:323.75   3rd Qu.:44.75   3rd Qu.:4.000
 Max.   :88006   Max.   :355.00   Max.   :60.00   Max.   :5.000
  family_size         Gender
 Min.   : 1.000   Min.   :0.0000
 1st Qu.: 3.000   1st Qu.:0.0000
 Median : 6.000   Median :0.0000
 Mean   : 5.513   Mean   :0.3333
 3rd Qu.: 8.000   3rd Qu.:1.0000
 Max.   :10.000   Max.   :1.0000
From the output above, we can infer that the variables are on very different scales, which will adversely affect the distance calculations used in clustering. To prevent this, we'll normalize the data with the preProcess function from the caret package.
The first line of code below creates the pre-processing object, while the second line performs the normalization. If we look at the summary of the segNorm data frame, we'll see that all the variables now have a mean of zero.
p1_object = preProcess(df_seg)

segNorm = predict(p1_object, df_seg)

summary(segNorm)
Output:
 mean_HH_income        ABV           No_bills_month    satisfaction_score
 Min.   :-1.6354   Min.   :-1.5124   Min.   :-1.4713   Min.   :-1.31892
 1st Qu.:-0.8274   1st Qu.:-1.2229   1st Qu.:-0.8291   1st Qu.:-0.61487
 Median : 0.4842   Median : 0.5679   Median :-0.2160   Median : 0.08918
 Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000
 3rd Qu.: 0.4883   3rd Qu.: 0.8654   3rd Qu.: 0.8495   3rd Qu.: 0.79323
 Max.   : 3.0935   Max.   : 1.1149   Max.   : 1.7399   Max.   : 1.49728
  family_size         Gender
 Min.   :-1.5167   Min.   :-0.7047
 1st Qu.:-0.8446   1st Qu.:-0.7047
 Median : 0.1635   Median :-0.7047
 Mean   : 0.0000   Mean   : 0.0000
 3rd Qu.: 0.8356   3rd Qu.: 1.4095
 Max.   : 1.5077   Max.   : 1.4095
After normalizing the data, the next step is to run the clustering algorithm. We'll use the popular kmeans algorithm because it is powerful and interpretable. The number of clusters we will create is five, as given by the argument centers=5 in the first line of code below. The second line displays the number of observations in each cluster.
kmeans_Clust = kmeans(segNorm, centers=5, iter.max=1000)

table(kmeans_Clust$cluster)
Output:
 1  2  3  4  5
31 44 13 21 41
From the output above, we can infer that the smallest cluster, 3, has 13 observations, while the largest cluster, 2, has 44 observations. If we want to understand the clusters, we can do that with the code below.
The first line of code below adds a new variable containing the cluster assignments to the original, un-normalized data frame df_seg. The second line uses the lapply() function to compute and print the column means for each cluster.
df_seg$kmeansseg = kmeans_Clust$cluster

lapply(split(df_seg, df_seg$kmeansseg), colMeans)
Output:
$`1`
    mean_HH_income                ABV     No_bills_month satisfaction_score
      51832.129032          46.645161          22.709677           2.258065
       family_size             Gender          kmeansseg
          4.354839           1.000000           1.000000

$`2`
    mean_HH_income                ABV     No_bills_month satisfaction_score
      67507.954545         311.727273          36.818182           1.909091
       family_size             Gender          kmeansseg
          5.204545           0.000000           2.000000

$`3`
    mean_HH_income                ABV     No_bills_month satisfaction_score
      51437.230769         129.076923          28.692308           2.153846
       family_size             Gender          kmeansseg
          7.384615           0.000000           3.000000

$`4`
    mean_HH_income                ABV     No_bills_month satisfaction_score
      5.182395e+04       1.424762e+02       4.519048e+01       3.619048e+00
       family_size             Gender          kmeansseg
      6.142857e+00       9.047619e-01       4.000000e+00

$`5`
    mean_HH_income                ABV     No_bills_month satisfaction_score
      64556.170732         304.292683          21.560976           4.219512
       family_size             Gender          kmeansseg
          5.804878           0.000000           5.000000
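We fixed the number of clusters at five above. If you want to sanity-check that choice, a common approach is the elbow method: run kmeans for a range of k values and look for the point where the total within-cluster sum of squares stops dropping sharply. The code below is a minimal sketch of that idea and is not part of the original analysis; the seed value is arbitrary.

# Elbow-method sketch: total within-cluster sum of squares for k = 1 to 10
set.seed(100)  # kmeans starts from random centers, so fix the seed for reproducibility
wss <- sapply(1:10, function(k) kmeans(segNorm, centers = k, iter.max = 1000)$tot.withinss)
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")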
Forecasting needs no introduction. It is an important tool for marketing teams and is most often used to project future sales. Let's start by loading the data.
For this forecasting exercise, we'll use fictitious monthly sales data containing 81 observations, and the task will be to create a forecasting model to predict sales over the next 12 months.
library(forecast)
library(fpp2)
library(TTR)
library(dplyr)

dat <- read_csv("salesforecasting.csv")
glimpse(dat)
Output:
Observations: 81
Variables: 3
$ Date  <chr> "01-Mar-13", "01-Apr-13", "01-May-13", "01-Jun-13", "01-Jul-13",...
$ Sales <int> 192, 198, 197, 213, 202, 186, 215, 213, 213, 220, 210, 219, 233,...
$ Class <chr> "Train", "Train", "Train", "Train", "Train", "Train", "Train", "...
The next step is to create training and test sets for model validation. The first two lines of code below perform this task using the Class variable, which already contains the labels 'Train' and 'Test'. The third line prints the number of rows in the training and test data. Note that the test set has 12 rows because we will build the model to forecast 12 months ahead.
dat_train = subset(dat, Class == 'Train')
dat_test = subset(dat, Class == 'Test')

nrow(dat_train); nrow(dat_test)
Output:
[1] 69
[1] 12
To run forecasting models in R, we need to convert the data into a time series object, which is done in the first line of code below. The start and end arguments specify the times of the first and last observations, respectively, while the frequency argument specifies the number of observations per unit of time.
We will also create a utility function for calculating the mean absolute percentage error (MAPE), which will be used to evaluate the performance of the forecasting models. The lower the MAPE, the better the forecast. The remaining lines of code below define this function.
dat_ts <- ts(dat_train[, 2], start = c(2013, 3), end = c(2018, 11), frequency = 12)

# utility function: mean absolute percentage error
mape <- function(actual, pred){
  mape <- mean(abs((actual - pred)/actual))*100
  return (mape)
}
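As a quick sanity check of the mape() helper, here it is applied to made-up numbers (not taken from the dataset): errors of 10 percent and 5 percent average to a MAPE of 7.5.

# actuals 100 and 200 vs. predictions 110 and 190 -> percentage errors 10 and 5
mape(actual = c(100, 200), pred = c(110, 190))
# [1] 7.5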
With the data and the mape() function prepared, we can move on to the forecasting technique used in this case.
We'll use Holt's trend method, a traditional statistical algorithm that accounts for the trend component while generating forecasts. The method involves two smoothing equations: one for the level and one for the trend.
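For reference, in the standard textbook form of Holt's method the level l(t) and trend b(t) are updated as

l(t) = alpha * y(t) + (1 - alpha) * (l(t-1) + b(t-1))
b(t) = beta * (l(t) - l(t-1)) + (1 - beta) * b(t-1)

and the h-step-ahead forecast is l(t) + h * b(t), where alpha and beta are the smoothing parameters reported in the model summary below.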
The first line of code below builds the Holt's trend model and stores it in the object holt_model. The second line prints the model summary along with the forecasts for the next 12 months.
holt_model <- holt(dat_ts, h = 12)
summary(holt_model)
Output:
Forecast method: Holt's method

Model Information:
Holt's method

Call:
 holt(y = dat_ts, h = 12)

  Smoothing parameters:
    alpha = 0.8342
    beta  = 1e-04

  Initial states:
    l = 193.7099
    b = 3.4287

  sigma:  16.2391

     AIC     AICc      BIC
682.6969 683.6492 693.8674

Error measures:
                     ME     RMSE     MAE        MPE     MAPE      MASE        ACF1
Training set -0.8870433 15.76137 11.8633 -0.4614957 4.084311 0.2366019 0.004106845

Forecasts:
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Dec 2018       382.4585 361.6473 403.2698 350.6305 414.2866
Jan 2019       385.8799 358.7765 412.9833 344.4288 427.3309
Feb 2019       389.3012 357.1116 421.4908 340.0715 438.5309
Mar 2019       392.7225 356.1461 429.2989 336.7838 448.6612
Apr 2019       396.1438 355.6521 436.6355 334.2171 458.0706
May 2019       399.5651 355.5036 443.6267 332.1789 466.9514
Jun 2019       402.9865 355.6225 450.3504 330.5496 475.4233
Jul 2019       406.4078 355.9563 456.8593 329.2489 483.5667
Aug 2019       409.8291 356.4676 463.1906 328.2197 491.4385
Sep 2019       413.2504 357.1288 469.3721 327.4198 499.0811
Oct 2019       416.6717 357.9187 475.4248 326.8168 506.5267
Nov 2019       420.0931 358.8209 481.3652 326.3854 513.8008
The output above shows that the MAPE for the training data is 4.1 percent. Let's now evaluate the model performance on the test data, which is done in the lines of code below. The MAPE error on the test data comes out to be 2.7 percent, which is an improvement over the training data result.
df_holt = as.data.frame(holt_model)
dat_test$holt = df_holt$`Point Forecast`
mape(dat_test$Sales, dat_test$holt)
Output:
[1] 2.735082
The accuracy we obtained is respectable, so we can go ahead and generate the forecast for the next 12 months. The lines of code below convert the full data into a time series object, fit an auto.arima model to it, and produce the 12-month-ahead forecast.
df_forecast <- ts(dat[, 2], start = c(2013, 3), end = c(2019, 11), frequency = 12)
arima_model_ahead <- auto.arima(df_forecast)
forecast_12 = forecast::forecast(arima_model_ahead, h=12)
forecast_12
Output:
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Dec 2019       438.1473 419.0833 457.2113 408.9915 467.3031
Jan 2020       441.9584 417.3683 466.5485 404.3510 479.5658
Feb 2020       446.0598 416.9752 475.1444 401.5787 490.5409
Mar 2020       442.3232 409.3512 475.2952 391.8968 492.7496
Apr 2020       443.8119 407.3648 480.2591 388.0708 499.5530
May 2020       445.3007 405.6821 484.9193 384.7092 505.8921
Jun 2020       449.9827 407.4283 492.5370 384.9014 515.0640
Jul 2020       459.3094 414.0091 504.6097 390.0286 528.5902
Aug 2020       463.9914 416.1024 511.8804 390.7515 537.2313
Sep 2020       462.8675 412.5227 513.2122 385.8718 539.8631
Oct 2020       462.3241 409.6379 515.0103 381.7476 542.9007
Nov 2020       463.5226 408.5947 518.4505 379.5176 547.5275
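If you want to inspect the forecast visually before sharing it, a one-line plot of the forecast object is usually enough. The line below is a minimal sketch using the forecast package's autoplot() (base plot(forecast_12) works as well); it is not part of the original guide.

# Plot the historical series together with the 12-month forecast and its intervals
autoplot(forecast_12)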
Marketing analytics is a vast area, and it is not possible to cover all the concepts and techniques in one guide, or for that matter even one book. We have covered some of the most widely used marketing analytics techniques above. A few other popular ones are mentioned below.
Customer Lifetime Value (or CLV): In this technique, you model each customer's lifetime value to identify which customers are most valuable to the business.
Natural Language Processing (or NLP): NLP is a domain in itself and is increasingly being used for sentiment analysis, topic modeling, and text classification.
Dimensionality Reduction: This technique is often applied to survey and CRM data, which can contain a large number of variables.
Affinity Analysis: In this technique, you identify which items have a higher likelihood of being bought together.
In this series of guides, you learned about the most popular use cases of marketing analytics, covering both the concepts and their implementation in R.
To learn more about data science with R, please refer to the following guides: