Title: | Probabilistic Models for Assessing and Predicting your Customer Base |
---|---|
Description: | Provides advanced statistical methods to describe and predict customers' purchase behavior in a non-contractual setting. It uses historic transaction records to fit a probabilistic model, which then allows one to compute quantities of managerial interest on a cohort as well as on a customer level (Customer Lifetime Value, Customer Equity, P(alive), etc.). This package complements the BTYD package by providing several additional buy-till-you-die models that have been published in the marketing literature, but whose implementations are complex and non-trivial. These models are: NBD [Ehrenberg (1959) <doi:10.2307/2985810>], MBG/NBD [Batislam et al (2007) <doi:10.1016/j.ijresmar.2006.12.005>], (M)BG/CNBD-k [Reutterer et al (2020) <doi:10.1016/j.ijresmar.2020.09.002>], Pareto/NBD (HB) [Abe (2009) <doi:10.1287/mksc.1090.0502>] and Pareto/GGG [Platzer and Reutterer (2016) <doi:10.1287/mksc.2015.0963>]. |
Authors: | Michael Platzer [aut, cre] |
Maintainer: | Michael Platzer |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2024-11-04 04:50:44 UTC |
Source: | https://github.com/mplatzer/btydplus |
Simulate data according to Pareto/NBD (Abe) model assumptions
abe.GenerateData( n, T.cal, T.star, params, date.zero = "2000-01-01", covariates = NULL )
n |
Number of customers. |
T.cal |
Length of calibration period. If a vector is provided, then it
is assumed that customers have different 'birth' dates, i.e.
|
T.star |
Length of holdout period. This may be a vector. |
params |
A list of model parameters: |
date.zero |
Initial date for cohort start. Can be of class character, Date or POSIXt. |
covariates |
Provide matrix of customer covariates. If NULL then random covariate values between [-1,1] are drawn. |
List of length 2:
cbs |
A data.frame with a row for each customer and the summary statistic as columns. |
elog |
A data.frame with a row for each transaction, and columns |
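To make the generative assumptions behind abe.GenerateData more concrete, the following base-R sketch simulates calibration-period summaries for a Pareto/NBD (Abe) cohort. It is not the package's implementation, and the function name abe_sim_sketch is hypothetical. It assumes that the first row of beta holds intercepts and each further row one covariate, that the two columns of beta map to log(lambda) and log(mu), that gamma is the covariance matrix of (log(lambda), log(mu)), and that every customer's first purchase happens at time 0.

```r
# Hypothetical sketch (not BTYDplus code) of the Pareto/NBD (Abe) data-generating
# process. ASSUMPTIONS: beta = intercept row + one row per covariate, columns map
# to (log lambda, log mu); gamma = 2x2 covariance of (log lambda, log mu);
# each customer's first purchase is at time 0.
abe_sim_sketch <- function(n, T.cal, beta, gamma, covariates = NULL) {
  K <- nrow(beta) - 1
  if (is.null(covariates)) {
    # random covariate values in [-1, 1], mirroring the documented default
    covariates <- matrix(runif(n * K, -1, 1), ncol = K)
  }
  design <- cbind(1, as.matrix(covariates))         # prepend intercept column
  # factor gamma via eigen() so that singular (PSD) covariances also work
  e <- eigen(gamma, symmetric = TRUE)
  A <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow = 2)
  noise <- matrix(rnorm(n * 2), ncol = 2) %*% t(A)  # rows ~ N(0, gamma)
  log.lm <- design %*% beta + noise                 # n x 2: log lambda, log mu
  lambda <- exp(log.lm[, 1])                        # purchase rates
  mu <- exp(log.lm[, 2])                            # dropout rates
  tau <- rexp(n, rate = mu)                         # unobserved lifetimes
  sim_one <- function(i) {
    end <- min(tau[i], T.cal)                       # purchases only while alive
    now <- 0
    times <- numeric(0)
    repeat {
      now <- now + rexp(1, rate = lambda[i])        # exponential waiting times
      if (now > end) break
      times <- c(times, now)
    }
    c(x = length(times), t.x = if (length(times) > 0) max(times) else 0)
  }
  stats <- t(vapply(seq_len(n), sim_one, numeric(2)))
  data.frame(cust = seq_len(n), stats, T.cal = T.cal, lambda = lambda, mu = mu)
}

beta.par <- matrix(c(0.18, -2.5, 0.5, -0.3, -0.2, 0.8), byrow = TRUE, ncol = 2)
gamma.par <- matrix(c(0.05, 0.1, 0.1, 0.2), ncol = 2)
set.seed(1)
sim <- abe_sim_sketch(n = 200, T.cal = 32, beta = beta.par, gamma = gamma.par)
head(sim)
```

The covariance values are taken from the example that follows. That matrix is singular (0.05 * 0.2 - 0.1^2 = 0), which is why the sketch factors it with eigen() rather than chol().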
# generate artificial Pareto/NBD (Abe) with 2 covariates
params <- list()
params$beta <- matrix(c(0.18, -2.5, 0.5, -0.3, -0.2, 0.8), byrow = TRUE, ncol = 2)
params$gamma <- matrix(c(0.05, 0.1, 0.1, 0.2), ncol = 2)
data <- abe.GenerateData(n = 200, T.cal = 32, T.star = 32, params)
cbs <- data$cbs    # customer by sufficient summary statistic - one row per customer
elog <- data$elog  # Event log - one row per event/purchase
Returns draws from the posterior distributions of the Pareto/NBD (Abe) parameters, on cohort as well as on customer level.
abe.mcmc.DrawParameters( cal.cbs, covariates = c(), mcmc = 2500, burnin = 500, thin = 50, chains = 2, mc.cores = NULL, trace = 100 )
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS)
data.frame. It must contain a row for each customer, and columns |
covariates |
A vector of columns of |
mcmc |
Number of MCMC steps. |
burnin |
Number of initial MCMC steps which are discarded. |
thin |
Only every |
chains |
Number of MCMC chains to be run. |
mc.cores |
Number of cores to use in parallel (Unix only). Defaults to |
trace |
Print logging statement every |
See demo('pareto-abe') for how to apply this model.
List of length 2:
level_1 |
list of |
level_2 |
|
Abe, M. (2009). "Counting your customers" one by one: A hierarchical Bayes extension to the Pareto/NBD model. Marketing Science, 28(3), 541-553. doi:10.1287/mksc.1090.0502
abe.GenerateData
mcmc.PAlive
mcmc.DrawFutureTransactions
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
cbs$cov1 <- as.integer(cbs$cust) %% 2  # create dummy covariate
param.draws <- abe.mcmc.DrawParameters(cbs, c("cov1"),
  mcmc = 100, burnin = 50, thin = 10, chains = 1)  # short MCMC to run demo fast
# cohort-level parameter draws
as.matrix(param.draws$level_2)
# customer-level parameter draws for customer with ID '4'
as.matrix(param.draws$level_1[["4"]])
# estimate future transactions
xstar.draws <- mcmc.DrawFutureTransactions(cbs, param.draws, cbs$T.star)
xstar.est <- apply(xstar.draws, 2, mean)
head(xstar.est)
Efficient implementation for the conversion of an event log into a customer-by-sufficient-statistic (CBS) data.frame, with a row for each customer, which is the required data format for estimating model parameters.
elog2cbs(elog, units = "week", T.cal = NULL, T.tot = NULL)
elog |
Event log, a |
units |
Time unit, either |
T.cal |
End date of calibration period. Defaults to
|
T.tot |
End date of the observation period. Defaults to
|
The time unit for expressing t.x, T.cal and litt is determined via the argument units, which is passed forward to method difftime, and defaults to weeks.
Argument T.tot allows one to specify the end of the observation period, i.e. the last possible date of an event to still be included in the event log. If T.tot is not provided, then the date of the last recorded event will be assumed to coincide with the end of the observation period. If T.tot is provided, then any event that occurs after that date is discarded.
Argument T.cal allows one to split the summary statistics into a calibration and a holdout period. This can be useful for evaluating forecasting accuracy for a given dataset. If T.cal is not provided, then the whole observation period is considered, and is then subsequently used for estimating model parameters. If it is provided, then the returned data.frame contains two additional fields, with x.star representing the number of repeat transactions during the holdout period of length T.star. Moreover, only those customers who had at least one event during the calibration period are included.
Transactions with identical cust and date fields are treated as a single transaction, with sales being summed up.
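The core summary statistics can be illustrated with a minimal base-R sketch for a calibration period measured in weeks, following the field definitions of the returned data.frame: x counts repeat events, t.x is the time from the first to the last event, litt sums the logarithmic intertransaction times, and T.cal is the time from the first event to the end of the calibration period. This is a simplification, not the package's elog2cbs: it ignores sales, T.tot and the holdout fields, and the helper name cbs_sketch is hypothetical.

```r
# Hypothetical simplified version of the CBS computation (weeks only, no sales,
# no holdout period). Not the BTYDplus implementation.
cbs_sketch <- function(elog, T.cal) {
  T.cal <- as.Date(T.cal)
  # keep calibration-period events; identical cust/date pairs count once
  elog <- unique(elog[elog$date <= T.cal, c("cust", "date")])
  by.cust <- split(elog$date, elog$cust, drop = TRUE)
  per_cust <- function(f) unname(vapply(by.cust, f, numeric(1)))
  weeks <- function(d) as.numeric(d, units = "weeks")  # difftime -> weeks
  data.frame(
    cust  = names(by.cust),
    x     = per_cust(function(d) length(d) - 1),                     # repeat events
    t.x   = per_cust(function(d) weeks(max(d) - min(d))),            # recency
    litt  = per_cust(function(d) sum(log(weeks(diff(sort(d)))))),   # 0 if x = 0
    T.cal = per_cust(function(d) weeks(T.cal - min(d))),
    row.names = NULL
  )
}

elog <- data.frame(
  cust = c("a", "a", "a", "b"),
  date = as.Date(c("2006-01-02", "2006-01-09", "2006-01-23", "2006-01-05"))
)
cbs_sketch(elog, T.cal = "2006-01-30")
```

For customer "a" the intertransaction times are 1 and 2 weeks, so litt equals log(2) and t.x equals 3 weeks.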
data.frame with fields:
cust |
Customer id (unique key). |
x |
Number of recurring events in calibration period. |
t.x |
Time between first and last event in calibration period. |
litt |
Sum of logarithmic intertransaction timings during calibration period. |
sales |
Sum of sales in calibration period, incl. initial transaction. Only if |
sales.x |
Sum of sales in calibration period, excl. initial transaction. Only if |
first |
Date of first transaction in calibration period. |
T.cal |
Time between first event and end of calibration period. |
T.star |
Length of holdout period. Only if |
x.star |
Number of events within holdout period. Only if |
sales.star |
Sum of sales within holdout period. Only if |
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31", T.tot = "2007-12-30")
head(cbs)
Aggregates an event log into either the incremental or the cumulative number of transactions. If first = TRUE, then the initial transaction of each customer is included in the count as well.
elog2cum(elog, by = 7, first = FALSE)
elog2inc(elog, by = 7, first = FALSE)
elog |
Event log, a |
by |
Only return every |
first |
If TRUE, then the first transaction of each customer is counted as well |
Duplicate transactions with identical cust and date (or t) fields are counted only once.
Numeric vector of transaction counts.
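The aggregation can be sketched in base R as follows. The sketch assumes that counts are read off at the end of every by-th day, counted from the earliest event date in the log; that alignment is an assumption and not necessarily where elog2cum and elog2inc place their cut points. The helper names cum_sketch and inc_sketch are hypothetical.

```r
# Hypothetical sketch: cumulative transaction counts at the end of every
# `by`-th day, counted from the earliest date in the event log.
cum_sketch <- function(elog, by = 7, first = FALSE) {
  elog <- unique(elog[, c("cust", "date")])          # identical cust/date count once
  elog <- elog[order(elog$cust, elog$date), ]
  origin <- min(elog$date)
  n.days <- as.integer(max(elog$date) - origin) + 1
  if (!first) elog <- elog[duplicated(elog$cust), ]  # drop each customer's first event
  day <- as.integer(elog$date - origin) + 1          # day index 1..n.days
  daily <- tabulate(day, nbins = n.days)             # transactions per day
  cumsum(daily)[seq_len(n.days %/% by) * by]         # every by-th day
}

# incremental counts are the first differences of the cumulative counts
inc_sketch <- function(elog, by = 7, first = FALSE) {
  diff(c(0, cum_sketch(elog, by = by, first = first)))
}

elog <- data.frame(
  cust = c("a", "a", "a", "b", "b", "b"),
  date = as.Date(c("2006-01-01", "2006-01-03", "2006-01-10",
                   "2006-01-02", "2006-01-02", "2006-01-14"))
)
cum_sketch(elog)
inc_sketch(elog)
```

The duplicated event of customer "b" on 2006-01-02 is counted once, in line with the deduplication rule above.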
data("groceryElog")
cum <- elog2cum(groceryElog)
plot(cum, type = "l", frame = FALSE)
inc <- elog2inc(groceryElog)
plot(inc, type = "l", frame = FALSE)
The models (M)BG/CNBD-k and Pareto/GGG are capable of leveraging regularity within transaction timings for improving forecast accuracy. This method provides a quick check for the degree of regularity in the event timings. A return value close to 1 supports the assumption of exponentially distributed intertransaction times, whereas values significantly larger than 1 reveal the presence of regularity.
estimateRegularity( elog, method = "wheat", plot = FALSE, title = "", min = NULL )
elog |
Event log, a |
method |
Either |
plot |
If |
title |
Plot title. |
min |
Minimum number of intertransaction times per customer. Customers
with less than |
Estimation is done either by 1) assuming the same degree of regularity across all customers (Wheat & Morrison (1990) via method = "wheat"), or 2) by estimating regularity for each customer separately, as the shape parameter of a fitted gamma distribution, and then returning the median across estimates. The latter methods, though, require sufficient (>= min) transactions per customer.
Wheat & Morrison (1990)'s method calculates for each customer a statistic M based on her last two intertransaction times as M = ITT_1 / (ITT_1 + ITT_2). That measure is known to follow a Beta(k, k) distribution, whose variance is Var(M) = 1 / (4 * (2k + 1)); solving for k gives the estimator k = (1 - 4*Var(M)) / (8*Var(M)). The corresponding diagnostic plot (plot = TRUE) shows the actual distribution of M vs. the theoretical distribution for k = 1 and k = 2.
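The Wheat & Morrison estimator can be sketched in a few lines of base R. The helper wheat_k is hypothetical and takes the two intertransaction times per customer directly; extracting each customer's last two intertransaction times from an event log is left out. The simulation assumes gamma-distributed intertransaction times with a customer-specific rate, which does not affect the distribution of M because M is invariant to rescaling both times.

```r
# Hypothetical helper: Wheat & Morrison (1990) moment estimator of k.
wheat_k <- function(itt1, itt2) {
  M <- itt1 / (itt1 + itt2)   # ~ Beta(k, k) if both ITTs are Gamma(k, rate)
  v <- var(M)
  (1 - 4 * v) / (8 * v)       # solves Var(M) = 1 / (4 * (2k + 1)) for k
}

set.seed(2024)
n <- 10000
rate <- rgamma(n, shape = 1, rate = 1)         # heterogeneous purchase rates
itt1 <- rgamma(n, shape = 2, rate = 2 * rate)  # Erlang-2 timings (regular)
itt2 <- rgamma(n, shape = 2, rate = 2 * rate)
wheat_k(itt1, itt2)                            # should be close to 2
e1 <- rexp(n, rate = rate)                     # exponential timings (k = 1)
e2 <- rexp(n, rate = rate)
wheat_k(e1, e2)                                # should be close to 1
```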
Estimated real-valued regularity parameter.
Wheat, Rita D., and Donald G. Morrison. "Estimating purchase regularity with two interpurchase times." Journal of Marketing Research (1990): 87-93.
Dunn, Richard, Steven Reader, and Neil Wrigley. "An investigation of the assumptions of the NBD model." Applied Statistics (1983): 249-259.
Wu, Couchen, and H-L. Chen. "A consumer purchasing model with learning and departure behaviour." Journal of the Operational Research Society (2000): 583-591.
https://tminka.github.io/papers/minka-gamma.pdf
data("groceryElog")
estimateRegularity(groceryElog, plot = TRUE, method = 'wheat')
estimateRegularity(groceryElog, plot = TRUE, method = 'mle-minka')
estimateRegularity(groceryElog, plot = TRUE, method = 'mle-thom')
estimateRegularity(groceryElog, plot = TRUE, method = 'cv')
These data came from an online retailer offering a broad range of grocery categories. The original data set spans four years, but lacked the customers' acquisition dates. Therefore, we constructed a quasi-cohort by limiting the analysis to those customers who had not purchased at all in the first two years, and who made their first purchase in the first quarter of 2006. This resulted in 10483 transactions being recorded for 1525 customers during a period of two years (2006-2007).
groceryElog
A data frame with 10483 rows and 2 variables:
cust |
customer ID, factor vector |
date |
transaction date, Date vector |
Thomas Reutterer
Platzer, M., & Reutterer, T. (2016). Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Science, 35(5), 779-799. doi:10.1287/mksc.2015.0963
Calculates the log-likelihood of the (M)BG/CNBD-k model.
mbgcnbd.cbs.LL(params, cal.cbs)
mbgcnbd.LL(params, x, t.x, T.cal, litt)
bgcnbd.cbs.LL(params, cal.cbs)
bgcnbd.LL(params, x, t.x, T.cal, litt)
params |
A vector with model parameters |
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS)
data.frame. It must contain a row for each customer, and columns |
x |
frequency, i.e. number of re-purchases |
t.x |
recency, i.e. time elapsed from first purchase to last purchase |
T.cal |
total time of observation period |
litt |
sum of logarithmic interpurchase times |
For bgcnbd.cbs.LL, the total log-likelihood of the provided data. For bgcnbd.LL, a vector of log-likelihoods as long as the longest input vector (x, t.x, or T.cal).
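For intuition, the sketch below evaluates the individual-level log-likelihood in the special case k = 1 of the BG variant, where BG/CNBD-k reduces to the BG/NBD model of Fader, Hardie & Lee (2005) and the likelihood has a closed form. It is not the package's bgcnbd.LL: it reads r, alpha, a and b by name (a k entry, if present, is ignored), expects x and t.x of equal length, and does not cover k > 1 or the MBG variant. The name bgnbd_LL_sketch is hypothetical.

```r
# Hypothetical sketch: BG/NBD log-likelihood, i.e. the BG/CNBD-k case k = 1.
# Reads r, alpha, a, b by name; a 'k' entry, if present, is ignored.
bgnbd_LL_sketch <- function(params, x, t.x, T.cal) {
  r <- params[["r"]]; alpha <- params[["alpha"]]
  a <- params[["a"]]; b <- params[["b"]]
  # log of Gamma(r + x) * alpha^r / Gamma(r), shared by both terms
  lc <- lgamma(r + x) - lgamma(r) + r * log(alpha)
  # term 1: customer still alive at the end of the calibration period
  ll <- lbeta(a, b + x) - lbeta(a, b) + lc - (r + x) * log(alpha + T.cal)
  # term 2 (x > 0 only): customer dropped out right after the last purchase
  pos <- x > 0
  l2 <- lbeta(a + 1, b + x[pos] - 1) - lbeta(a, b) + lc[pos] -
    (r + x[pos]) * log(alpha + t.x[pos])
  m <- pmax(ll[pos], l2)                          # log-sum-exp for stability
  ll[pos] <- m + log(exp(ll[pos] - m) + exp(l2 - m))
  ll
}

params <- c(k = 1, r = 0.25, alpha = 4, a = 0.8, b = 2.5)
ll <- bgnbd_LL_sketch(params, x = c(0, 2, 5), t.x = c(0, 10, 30), T.cal = 40)
sum(ll)  # total log-likelihood, analogous to what bgcnbd.cbs.LL reports
```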
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
Uses (M)BG/CNBD-k model parameters and a customer's past transaction behavior to return the number of transactions they are expected to make in a given time period.
mbgcnbd.ConditionalExpectedTransactions(params, T.star, x, t.x, T.cal)
bgcnbd.ConditionalExpectedTransactions(params, T.star, x, t.x, T.cal)
params |
A vector with model parameters |
T.star |
Length of time for which we are calculating the expected number of transactions. |
x |
Number of repeat transactions in the calibration period T.cal, or a vector of calibration period frequencies. |
t.x |
Recency, i.e. time elapsed between the first and the last transaction during the calibration period. |
T.cal |
Length of calibration period, or a vector of calibration period lengths. |
Number of transactions a customer is expected to make in a time period of length T.star, conditional on their past behavior. If any of the input parameters has a length greater than 1, this will be a vector of expected numbers of transactions.
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs, k = 2)
# estimate transactions for next 12 weeks
xstar.est <- mbgcnbd.ConditionalExpectedTransactions(params, T.star = 12,
  cbs$x, cbs$t.x, cbs$T.cal)
head(xstar.est)  # expected number of transactions for first 6 customers
sum(xstar.est)   # expected total number of transactions during the next 12 weeks
## End(Not run)
Estimates parameters for the (M)BG/CNBD-k model via Maximum Likelihood Estimation.
mbgcnbd.EstimateParameters( cal.cbs, k = NULL, par.start = c(1, 3, 1, 3), max.param.value = 10000, trace = 0 )
bgcnbd.EstimateParameters( cal.cbs, k = NULL, par.start = c(1, 3, 1, 3), max.param.value = 10000, trace = 0 )
mbgnbd.EstimateParameters( cal.cbs, par.start = c(1, 3, 1, 3), max.param.value = 10000, trace = 0 )
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS)
data.frame. It must contain a row for each customer, and columns |
k |
Integer-valued degree of regularity for Erlang-k distributed
interpurchase times. By default this |
par.start |
Initial (M)BG/CNBD-k parameters. A vector with |
max.param.value |
Upper bound on parameters. |
trace |
If larger than 0, then the parameter values are printed every
|
A vector of estimated parameters.
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
Batislam, E. P., Denizel, M., & Filiztekin, A. (2007). Empirical validation and comparison of models for customer base analysis. International Journal of Research in Marketing, 24(3), 201-209. doi:10.1016/j.ijresmar.2006.12.005
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
(params <- mbgcnbd.EstimateParameters(cbs))
## End(Not run)
Returns the number of repeat transactions that a randomly chosen customer
(for whom we have no prior information) is expected to make in a given time
period, i.e. .
mbgcnbd.Expectation(params, t)
bgcnbd.Expectation(params, t)
params |
A vector with model parameters |
t |
Length of time for which we are calculating the expected number of repeat transactions. |
Note: Computational time increases with the number of unique values of t.
Number of repeat transactions a customer is expected to make in a time period of length t.
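In the special case k = 1 of the BG variant (the BG/NBD model of Fader, Hardie & Lee, 2005), the expectation has the closed form E[X(t)] = (a + b - 1) / (a - 1) * [1 - (alpha / (alpha + t))^r * 2F1(r, b; a + b - 1; t / (alpha + t))], valid for a > 1. The sketch below evaluates it with a plain power series for the Gauss hypergeometric function 2F1. It is a hypothetical illustration for k = 1 only, not bgcnbd.Expectation, and it does not cover the MBG variant; the names h2f1 and bgnbd_expectation_sketch are hypothetical.

```r
# Gauss hypergeometric function 2F1(a, b; c; z) via its power series (|z| < 1).
h2f1 <- function(a, b, c, z, tol = 1e-12, max.iter = 1e5) {
  term <- 1
  total <- 1
  j <- 0
  while (abs(term) > tol * abs(total) && j < max.iter) {
    term <- term * (a + j) * (b + j) / ((c + j) * (j + 1)) * z
    total <- total + term
    j <- j + 1
  }
  total
}

# Hypothetical sketch: E[X(t)] for BG/NBD (BG/CNBD-k with k = 1); needs a > 1.
bgnbd_expectation_sketch <- function(params, t) {
  r <- params[["r"]]; alpha <- params[["alpha"]]
  a <- params[["a"]]; b <- params[["b"]]
  stopifnot(a > 1)
  vapply(t, function(tt) {
    z <- tt / (alpha + tt)
    (a + b - 1) / (a - 1) *
      (1 - (alpha / (alpha + tt))^r * h2f1(r, b, a + b - 1, z))
  }, numeric(1))
}

params <- c(k = 1, r = 0.5, alpha = 5, a = 1.5, b = 3)
bgnbd_expectation_sketch(params, t = c(26, 52))
```

As a sanity check, the slope of E[X(t)] at t = 0 equals the mean purchase rate r / alpha, because under BG/NBD no customer can drop out before the first repeat purchase.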
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs)
mbgcnbd.Expectation(params, t = c(26, 52))
## End(Not run)
Calculates the expected cumulative total repeat transactions by all customers for the calibration and holdout periods.
mbgcnbd.ExpectedCumulativeTransactions(params, T.cal, T.tot, n.periods.final)
bgcnbd.ExpectedCumulativeTransactions(params, T.cal, T.tot, n.periods.final)
params |
A vector with model parameters |
T.cal |
A vector to represent customers' calibration period lengths. |
T.tot |
End of holdout period. Must be a single value, not a vector. |
n.periods.final |
Number of time periods in the calibration and holdout periods. |
Note: Computational time increases with the number of unique values of T.cal.
Vector of length n.periods.final with expected cumulative total repeat transactions by all customers.
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs, k = 2)
# Returns a vector containing expected cumulative repeat transactions for 104
# weeks, with every eighth week being reported.
mbgcnbd.ExpectedCumulativeTransactions(params, T.cal = cbs$T.cal,
  T.tot = 104, n.periods.final = 104 / 8)
## End(Not run)
Simulate data according to (M)BG/CNBD-k model assumptions
mbgcnbd.GenerateData(n, T.cal, T.star = NULL, params, date.zero = "2000-01-01")
bgcnbd.GenerateData(n, T.cal, T.star = NULL, params, date.zero = "2000-01-01")
n |
Number of customers. |
T.cal |
Length of calibration period. If a vector is provided, then it
is assumed that customers have different 'birth' dates, i.e.
|
T.star |
Length of holdout period. This may be a vector. |
params |
A vector with model parameters |
date.zero |
Initial date for cohort start. Can be of class character, Date or POSIXt. |
List of length 2:
cbs |
A data.frame with a row for each customer and the summary statistic as columns. |
elog |
A data.frame with a row for each transaction, and columns |
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
params <- c(k = 3, r = 0.85, alpha = 1.45, a = 0.79, b = 2.42)
data <- mbgcnbd.GenerateData(n = 200, T.cal = 24, T.star = 32, params)
# customer by sufficient summary statistic - one row per customer
head(data$cbs)
# event log - one row per event/transaction
head(data$elog)
Uses (M)BG/CNBD-k model parameters and a customer's past transaction behavior to return the probability that they are still alive at the end of the calibration period.
mbgcnbd.PAlive(params, x, t.x, T.cal)
bgcnbd.PAlive(params, x, t.x, T.cal)
params |
A vector with model parameters |
x |
Number of repeat transactions in the calibration period T.cal, or a vector of calibration period frequencies. |
t.x |
Recency, i.e. time elapsed between the first and the last transaction during the calibration period. |
T.cal |
Length of calibration period, or a vector of calibration period lengths. |
Probability that the customer is still alive at the end of the calibration period.
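For k = 1 the model reduces to BG/NBD, or to MBG/NBD with the M modification (Batislam et al., 2007), and P(alive) has a closed form: 1 / (1 + a / (b + x - 1) * ((alpha + T.cal) / (alpha + t.x))^(r + x)) for x > 0 in the BG case (and exactly 1 for x = 0), and 1 / (1 + a / (b + x) * ((alpha + T.cal) / (alpha + t.x))^(r + x)) in the MBG case. The base-R sketch below evaluates both. It is a hypothetical illustration for k = 1 only, not the package's mbgcnbd.PAlive or bgcnbd.PAlive, and the name palive_k1_sketch is hypothetical.

```r
# Hypothetical sketch: P(alive) for the k = 1 special cases BG/NBD (mbg = FALSE)
# and MBG/NBD (mbg = TRUE). Reads r, alpha, a, b by name; 'k' is ignored.
palive_k1_sketch <- function(params, x, t.x, T.cal, mbg = FALSE) {
  r <- params[["r"]]; alpha <- params[["alpha"]]
  a <- params[["a"]]; b <- params[["b"]]
  ratio <- ((alpha + T.cal) / (alpha + t.x))^(r + x)
  odds <- if (mbg) {
    a / (b + x) * ratio                        # dropout also possible at time 0
  } else {
    ifelse(x > 0, a / (b + x - 1) * ratio, 0)  # no dropout before 1st repeat
  }
  1 / (1 + odds)
}

params <- c(k = 1, r = 0.25, alpha = 4, a = 0.8, b = 2.5)
palive_k1_sketch(params, x = c(0, 3, 3), t.x = c(0, 20, 50), T.cal = 52)
palive_k1_sketch(params, x = c(0, 3, 3), t.x = c(0, 20, 50), T.cal = 52, mbg = TRUE)
```

The difference at x = 0 illustrates the M modification: a BG customer without repeat purchases is always considered alive, whereas the MBG variant allows dropout right after the initial purchase.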
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs)
palive <- mbgcnbd.PAlive(params, cbs$x, cbs$t.x, cbs$T.cal)
head(palive)  # Probability of being alive for first 6 customers
mean(palive)  # Estimated share of customers to be still alive
## End(Not run)
Plots a histogram and returns a matrix comparing the actual and expected number of customers who made a certain number of repeat transactions in the calibration period, binned according to calibration period frequencies.
mbgcnbd.PlotFrequencyInCalibration( params, cal.cbs, censor = 7, xlab = "Calibration period transactions", ylab = "Customers", title = "Frequency of Repeat Transactions" )
bgcnbd.PlotFrequencyInCalibration( params, cal.cbs, censor = 7, xlab = "Calibration period transactions", ylab = "Customers", title = "Frequency of Repeat Transactions" )
params |
A vector with model parameters |
cal.cbs |
Calibration period CBS (customer by sufficient statistic). It must contain columns for frequency ('x') and total time observed ('T.cal'). |
censor |
Cutoff point for number of transactions in plot. |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
title |
Title placed on the top-center of the plot. |
Calibration period repeat transaction frequency comparison matrix (actual vs. expected).
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs)
mbgcnbd.PlotFrequencyInCalibration(params, cbs)
## End(Not run)
Plots the actual and conditional expected number of transactions made by customers in the holdout period, binned according to calibration period frequencies, and returns this comparison in a matrix.
mbgcnbd.PlotFreqVsConditionalExpectedFrequency( params, T.star, cal.cbs, x.star, censor, xlab = "Calibration period transactions", ylab = "Holdout period transactions", xticklab = NULL, title = "Conditional Expectation" )
bgcnbd.PlotFreqVsConditionalExpectedFrequency( params, T.star, cal.cbs, x.star, censor, xlab = "Calibration period transactions", ylab = "Holdout period transactions", xticklab = NULL, title = "Conditional Expectation" )
params |
A vector with model parameters |
T.star |
Length of the holdout period. |
cal.cbs |
Calibration period CBS (customer by sufficient statistic). It must contain columns for frequency ('x'), recency ('t.x') and total time observed ('T.cal'). |
x.star |
Vector of transactions made by each customer in the holdout period. |
censor |
Cutoff point for number of transactions in plot. |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
xticklab |
A vector containing a label for each tick mark on the x axis. |
title |
Title placed on the top-center of the plot. |
Holdout period transaction frequency comparison matrix (actual vs. expected).
bgcnbd.PlotFreqVsConditionalExpectedFrequency
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-09-30")
params <- mbgcnbd.EstimateParameters(cbs, k = 2)
mbgcnbd.PlotFreqVsConditionalExpectedFrequency(params, T.star = 52, cbs, cbs$x.star, censor = 7)
## End(Not run)
Plots the actual and conditional expected number of transactions made by customers in the holdout period, binned according to calibration period recencies, and returns this comparison in a matrix.
mbgcnbd.PlotRecVsConditionalExpectedFrequency( params, cal.cbs, T.star, x.star, xlab = "Calibration period recency", ylab = "Holdout period transactions", xticklab = NULL, title = "Actual vs. Conditional Expected Transactions by Recency" )
bgcnbd.PlotRecVsConditionalExpectedFrequency( params, cal.cbs, T.star, x.star, xlab = "Calibration period recency", ylab = "Holdout period transactions", xticklab = NULL, title = "Actual vs. Conditional Expected Transactions by Recency" )
params |
A vector with model parameters |
cal.cbs |
Calibration period CBS (customer by sufficient statistic). It must contain columns for frequency ('x'), recency ('t.x') and total time observed ('T.cal'). |
T.star |
Length of the holdout period. |
x.star |
Vector of transactions made by each customer in the holdout period. |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
xticklab |
A vector containing a label for each tick mark on the x axis. |
title |
Title placed on the top-center of the plot. |
Matrix comparing actual and conditional expected transactions in the holdout period.
bgcnbd.PlotFreqVsConditionalExpectedFrequency
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-09-30")
params <- mbgcnbd.EstimateParameters(cbs, k = 2)
mbgcnbd.PlotRecVsConditionalExpectedFrequency(params, cbs, T.star = 52, cbs$x.star)
## End(Not run)
Plots the actual and expected cumulative total repeat transactions by all customers for the calibration and holdout periods, and returns this comparison in a matrix.
mbgcnbd.PlotTrackingCum( params, T.cal, T.tot, actual.cu.tracking.data, xlab = "Week", ylab = "Cumulative Transactions", xticklab = NULL, title = "Tracking Cumulative Transactions", ymax = NULL, legend = c("Actual", "Model") )
bgcnbd.PlotTrackingCum( params, T.cal, T.tot, actual.cu.tracking.data, xlab = "Week", ylab = "Cumulative Transactions", xticklab = NULL, title = "Tracking Cumulative Transactions", ymax = NULL, legend = c("Actual", "Model") )
params |
A vector with model parameters |
T.cal |
A vector to represent customers' calibration period lengths. |
T.tot |
End of holdout period. Must be a single value, not a vector. |
actual.cu.tracking.data |
A vector containing the cumulative number of repeat transactions made by customers for each period in the total time period (both calibration and holdout periods). |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
xticklab |
A vector containing a label for each tick mark on the x axis. |
title |
Title placed on the top-center of the plot. |
ymax |
Upper boundary for y axis. |
legend |
plot legend, defaults to 'Actual' and 'Model'. |
Note: Computational time increases with the number of unique values of T.cal.
Matrix containing actual and expected cumulative repeat transactions.
mbgcnbd.ExpectedCumulativeTransactions
## Not run:
data("groceryElog")
groceryElog <- groceryElog[groceryElog$date < "2006-06-30", ]
cbs <- elog2cbs(groceryElog, T.cal = "2006-04-30")
cum <- elog2cum(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs, k = 2)
mbgcnbd.PlotTrackingCum(params, cbs$T.cal, T.tot = max(cbs$T.cal + cbs$T.star), cum)
## End(Not run)
Plots the actual and expected incremental total repeat transactions by all customers for the calibration and holdout periods, and returns this comparison in a matrix.
mbgcnbd.PlotTrackingInc(params, T.cal, T.tot, actual.inc.tracking.data, xlab = "Week", ylab = "Transactions", xticklab = NULL, title = "Tracking Weekly Transactions", ymax = NULL, legend = c("Actual", "Model"))
bgcnbd.PlotTrackingInc(params, T.cal, T.tot, actual.inc.tracking.data, xlab = "Week", ylab = "Transactions", xticklab = NULL, title = "Tracking Weekly Transactions", ymax = NULL, legend = c("Actual", "Model"))
params |
A vector with model parameters |
T.cal |
A vector to represent customers' calibration period lengths. |
T.tot |
End of holdout period. Must be a single value, not a vector. |
actual.inc.tracking.data |
A vector containing the incremental number of repeat transactions made by customers for each period in the total time period (both calibration and holdout periods). |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
xticklab |
A vector containing a label for each tick mark on the x axis. |
title |
Title placed on the top-center of the plot. |
ymax |
Upper boundary for y axis. |
legend |
Plot legend; defaults to 'Actual' and 'Model'. |
Note: Computational time increases with the number of unique values of T.cal.
Matrix containing actual and expected incremental repeat transactions.
mbgcnbd.ExpectedCumulativeTransactions
## Not run:
data("groceryElog")
groceryElog <- groceryElog[groceryElog$date < "2006-06-30", ]
cbs <- elog2cbs(groceryElog, T.cal = "2006-04-30")
inc <- elog2inc(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs, k = 2)
mbgcnbd.PlotTrackingInc(params, cbs$T.cal, T.tot = max(cbs$T.cal + cbs$T.star), inc)
## End(Not run)
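The tracking functions above take the actual repeat transactions either as a cumulative vector (actual.cu.tracking.data) or as an incremental one (actual.inc.tracking.data). As a minimal base R sketch, using a hypothetical vector of weekly counts instead of output from elog2cum or elog2inc, the two forms convert into each other via a running sum and first differences:

```r
# Hypothetical weekly repeat-transaction counts (incremental tracking data)
inc <- c(12, 15, 9, 20, 18, 11)

# Cumulative tracking data is the running sum of the incremental counts
cum <- cumsum(inc)

# And back again: first differences of the cumulative series, starting from 0
inc_again <- diff(c(0, cum))

print(cum)        # [1] 12 27 36 56 74 85
print(inc_again)  # [1] 12 15  9 20 18 11
```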
Uses (M)BG/CNBD-k model parameters to return the probability distribution of purchase frequencies for a random customer in a given time period, i.e. $P(X(t) = x)$.
mbgcnbd.pmf(params, t, x)
bgcnbd.pmf(params, t, x)
params |
A vector with model parameters |
t |
Length of the time period for which the probability is computed. May also be a vector. |
x |
Number of repeat transactions for which probability is calculated. May also be a vector. |
$P(X(t) = x)$. If either t or x is a vector, then the output will be a vector as well. If both are vectors, the output will be a matrix.
(M)BG/CNBD-k: Reutterer, T., Platzer, M., & Schroeder, N. (2020). Leveraging purchase regularity for predicting customer behavior the easy way. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2020.09.002
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- mbgcnbd.EstimateParameters(cbs)
mbgcnbd.pmf(params, t = 52, x = 0:6)
mbgcnbd.pmf(params, t = c(26, 52), x = 0:6)
## End(Not run)
For each customer and each provided MCMC parameter draw, this method samples the number of transactions during the holdout period T.star. If the argument sample_size is provided, it instead returns that number of draws; in this case, for each customer and each draw, it first samples one of the parameter draws.
mcmc.DrawFutureTransactions( cal.cbs, draws, T.star = cal.cbs$T.star, sample_size = NULL )
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS) data.frame. |
draws |
MCMC draws as returned by |
T.star |
Length of period for which future transactions are counted. |
sample_size |
Number of samples to draw. Defaults to the same number of
parameter draws that are passed to |
2-dim matrix [draw x customer] with sampled future transactions.
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
xstar.draws <- mcmc.DrawFutureTransactions(cbs, param.draws)
cbs$xstar.est <- apply(xstar.draws, 2, mean)
cbs$pactive <- mcmc.PActive(xstar.draws)
head(cbs)
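The return value is a [draw x customer] matrix, so each column holds the sampled holdout transactions of one customer. The base R sketch below summarizes such a matrix per customer with a posterior mean (as in the example above) and an 80% interval from empirical quantiles; the Poisson-simulated matrix is a hypothetical stand-in for actual output of mcmc.DrawFutureTransactions.

```r
set.seed(1)
# Hypothetical stand-in for mcmc.DrawFutureTransactions() output:
# 200 draws (rows) for 3 customers (columns)
xstar.draws <- cbind(
  rpois(200, lambda = 0.2),
  rpois(200, lambda = 1.5),
  rpois(200, lambda = 4.0)
)

# Point estimate per customer: the mean over all draws
# (same result as apply(xstar.draws, 2, mean))
xstar.est <- colMeans(xstar.draws)

# 80% interval per customer from the empirical 10% and 90% quantiles
xstar.ci <- apply(xstar.draws, 2, quantile, probs = c(0.1, 0.9))

print(round(xstar.est, 2))
print(xstar.ci)
```

Intervals like these make the sampling uncertainty per customer visible, which a single mean hides.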
Uses model parameter draws to return the expected number of repeat transactions that a randomly chosen customer (for whom we have no prior information) is expected to make in a given time period.
mcmc.Expectation(draws, t, sample_size = 10000)
draws |
MCMC draws as returned by |
t |
Length of time for which we are calculating the expected number of transactions. May also be a vector. |
sample_size |
Sample size for estimating the probability distribution. |
Because the expected transactions are estimated by sampling, the result varies from one call to another. Larger values of sample_size generate more stable results.
Number of repeat transactions a customer is expected to make in a time period of length t.
data("groceryElog")
cbs <- elog2cbs(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
mcmc.Expectation(param.draws, t = c(26, 52))
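To illustrate why sampling makes the result vary between calls, the sketch below estimates E(X(t)) by simulation for the plain NBD model, where the exact value r t / alpha is known. It is an analogue under stated assumptions, not the package's code: the parameter values are hypothetical, and mcmc.Expectation works with MCMC parameter draws (such as those from pnbd.mcmc.DrawParameters) rather than fixed NBD parameters.

```r
# Hypothetical NBD parameters and horizon (illustration only)
r <- 0.85; alpha <- 4.45; t <- 52

# Monte Carlo estimate of E(X(t)): simulate sample_size random customers
# and average their transaction counts
mc_expectation <- function(sample_size) {
  lambda <- rgamma(sample_size, shape = r, rate = alpha)  # individual purchase rates
  x <- rpois(sample_size, lambda * t)                      # transactions in (0, t]
  mean(x)
}

exact <- r * t / alpha  # closed-form NBD expectation
set.seed(42)
small <- replicate(20, mc_expectation(100))
large <- replicate(20, mc_expectation(10000))

# Repeated calls scatter around the exact value; larger samples scatter less
print(c(exact = exact, sd_small = sd(small), sd_large = sd(large)))
```

The spread of the estimates shrinks roughly with the square root of the sample size, which is why larger sample_size values give more stable results at a higher computational cost.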
Uses model parameter draws to return the expected cumulative total number of repeat transactions made by all customers over the calibration and holdout periods.
mcmc.ExpectedCumulativeTransactions( draws, T.cal, T.tot, n.periods.final, sample_size = 10000, covariates = NULL )
draws |
MCMC draws as returned by |
T.cal |
A vector to represent customers' calibration period lengths (in
other words, the |
T.tot |
End of holdout period. Must be a single value, not a vector. |
n.periods.final |
Number of time periods in the calibration and holdout periods. |
sample_size |
Sample size for estimating the probability distribution. |
covariates |
(optional) Matrix of covariates, for Pareto/NBD (Abe)
model, passed to |
Because the expected transactions are estimated by sampling, the result varies from one call to another. Larger values of sample_size generate more stable results.
Numeric vector of expected cumulative total repeat transactions by all customers.
data("groceryElog")
cbs <- elog2cbs(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
# Returns a vector containing expected cumulative repeat transactions for 104
# weeks, with every eighth week being reported.
mcmc.ExpectedCumulativeTransactions(param.draws, T.cal = cbs$T.cal, T.tot = 104, n.periods.final = 104/8, sample_size = 1000)
Calculates P(active) based on drawn future transactions.
mcmc.PActive(xstar)
xstar |
Future transaction draws as returned by
|
Numeric vector with the customers' probabilities of being active during the holdout period.
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
xstar.draws <- mcmc.DrawFutureTransactions(cbs, param.draws)
cbs$pactive <- mcmc.PActive(xstar.draws)
head(cbs)
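A natural reading of P(active) here is the share of sampled futures in which a customer makes at least one holdout transaction. The sketch below computes that share from a hypothetical [draw x customer] matrix; treat it as an assumption about how mcmc.PActive works, not as its source code.

```r
set.seed(7)
# Hypothetical [draw x customer] matrix of future transaction draws:
# 100 draws (rows) for 4 customers with increasing purchase intensity
xstar <- matrix(rpois(4 * 100, lambda = c(0.05, 0.5, 2, 6)),
                nrow = 100, ncol = 4, byrow = TRUE)

# P(active): share of draws in which a customer makes at least one
# transaction during the holdout period
pactive <- colMeans(xstar > 0)
print(round(pactive, 2))
```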
Calculates P(alive) based on MCMC parameter draws
mcmc.PAlive(draws)
draws |
MCMC draws as returned by |
Numeric vector with the customers' probabilities of still being alive at the end of the calibration period.
data("groceryElog")
cbs <- elog2cbs(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
palive <- mcmc.PAlive(param.draws)
head(palive)
mean(palive)
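The customer-level draws include the alive indicator z (see the value section of pnbd.mcmc.DrawParameters). Assuming P(alive) is estimated as the share of draws in which z equals 1, the computation reduces to a column mean; the matrix below is a hypothetical stand-in for the per-customer draws.

```r
set.seed(3)
# Hypothetical draws of the alive indicator z for 3 customers:
# rows are MCMC draws, columns are customers (1 = alive, 0 = churned)
z_draws <- cbind(
  rbinom(500, size = 1, prob = 0.95),
  rbinom(500, size = 1, prob = 0.50),
  rbinom(500, size = 1, prob = 0.05)
)

# P(alive) at the end of the calibration period: share of draws with z == 1
palive <- colMeans(z_draws)
print(palive)
mean(palive)  # cohort average, analogous to mean(palive) in the example above
```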
Plots a histogram and returns a matrix comparing the actual and expected number of customers who made a certain number of repeat transactions in the calibration period, binned according to calibration period frequencies.
mcmc.PlotFrequencyInCalibration( draws, cal.cbs, censor = 7, xlab = "Calibration period transactions", ylab = "Customers", title = "Frequency of Repeat Transactions", sample_size = 1000 )
draws |
MCMC draws as returned by |
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS) data.frame. It must contain columns for frequency ('x') and total time observed ('T.cal'). |
censor |
Cutoff point for number of transactions in plot. |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
title |
Title placed on the top-center of the plot. |
sample_size |
Sample size for estimating the probability distribution.
See |
The method mcmc.pmf is called to calculate the expected numbers based on the corresponding model.
Calibration period repeat transaction frequency comparison matrix (actual vs. expected).
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
mcmc.PlotFrequencyInCalibration(param.draws, cbs, sample_size = 100)
## End(Not run)
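On the actual side of the comparison, customers are binned by their calibration-period repeat transactions, with all counts at or above censor merged into one bin. A minimal base R sketch of that binning, using simulated counts in place of cal.cbs$x (the "7+" label for the last bin is an assumption, not necessarily the package's label):

```r
set.seed(11)
# Hypothetical calibration-period repeat transactions (stand-in for cal.cbs$x)
x <- rnbinom(500, size = 0.8, mu = 3)
censor <- 7

# Merge all counts >= censor into a single "7+" bin
binned <- factor(pmin(x, censor), levels = 0:censor,
                 labels = c(0:(censor - 1), paste0(censor, "+")))
actual <- table(binned)
print(actual)
```

Declaring all levels up front keeps empty bins in the table, so the actual counts line up with the expected counts per bin.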
Draw diagnostic plot to inspect error in P(active).
mcmc.plotPActiveDiagnostic(cbs, xstar, title = "Diagnostic Plot for P(active)")
cbs |
A data.frame with column |
xstar |
Future transaction draws as returned by
|
title |
Plot title. |
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
xstar.draws <- mcmc.DrawFutureTransactions(cbs, param.draws)
mcmc.plotPActiveDiagnostic(cbs, xstar.draws)
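One common way to inspect errors in P(active) is to group customers by their predicted probability and compare the average prediction with the observed share of customers who were active in the holdout. The sketch below does this with simulated data; it illustrates the idea of such a diagnostic and is not necessarily what mcmc.plotPActiveDiagnostic draws.

```r
set.seed(5)
n <- 1000
# Hypothetical predicted P(active) and simulated holdout activity
pactive <- runif(n)
active <- rbinom(n, size = 1, prob = pactive)  # 1 = purchased in holdout

# Bin customers by predicted P(active) and compare the mean prediction with
# the observed share of active customers in each bin
bins <- cut(pactive, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
calib <- data.frame(
  bin       = levels(bins),
  predicted = as.vector(tapply(pactive, bins, mean)),
  actual    = as.vector(tapply(active, bins, mean))
)
print(calib, digits = 2)
```

For well-calibrated predictions the two columns stay close; systematic gaps point to over- or under-estimated P(active).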
Plots the actual and expected cumulative total repeat transactions by all customers for the calibration and holdout periods, and returns this comparison in a matrix.
mcmc.PlotTrackingCum( draws, T.cal, T.tot, actual.cu.tracking.data, xlab = "Week", ylab = "Cumulative Transactions", xticklab = NULL, title = "Tracking Cumulative Transactions", ymax = NULL, sample_size = 10000, covariates = NULL, legend = c("Actual", "Model") )
draws |
MCMC draws as returned by |
T.cal |
A vector to represent customers' calibration period lengths (in
other words, the |
T.tot |
End of holdout period. Must be a single value, not a vector. |
actual.cu.tracking.data |
A vector containing the cumulative number of repeat transactions made by customers for each period in the total time period (both calibration and holdout periods). |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
xticklab |
A vector containing a label for each tick mark on the x axis. |
title |
Title placed on the top-center of the plot. |
ymax |
Upper boundary for y axis. |
sample_size |
Sample size for estimating the probability distribution.
See |
covariates |
(optional) Matrix of covariates, for Pareto/NBD (Abe)
model, passed to |
legend |
Plot legend; defaults to 'Actual' and 'Model'. |
Because the expected transactions are estimated by sampling, the result varies from one call to another. Larger values of sample_size generate more stable results.
Matrix containing actual and expected cumulative repeat transactions.
mcmc.PlotTrackingInc
mcmc.ExpectedCumulativeTransactions
elog2cum
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
cum <- elog2cum(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs)
mat <- mcmc.PlotTrackingCum(param.draws, T.cal = cbs$T.cal, T.tot = max(cbs$T.cal + cbs$T.star), actual.cu.tracking.data = cum)
## End(Not run)
Plots the actual and expected incremental total repeat transactions by all customers for the calibration and holdout periods, and returns this comparison in a matrix.
mcmc.PlotTrackingInc( draws, T.cal, T.tot, actual.inc.tracking.data, xlab = "Week", ylab = "Transactions", xticklab = NULL, title = "Tracking Weekly Transactions", ymax = NULL, sample_size = 10000, covariates = NULL, legend = c("Actual", "Model") )
draws |
MCMC draws as returned by |
T.cal |
A vector to represent customers' calibration period lengths (in
other words, the |
T.tot |
End of holdout period. Must be a single value, not a vector. |
actual.inc.tracking.data |
A vector containing the incremental number of repeat transactions made by customers for each period in the total time period (both calibration and holdout periods). |
xlab |
Descriptive label for the x axis. |
ylab |
Descriptive label for the y axis. |
xticklab |
A vector containing a label for each tick mark on the x axis. |
title |
Title placed on the top-center of the plot. |
ymax |
Upper boundary for y axis. |
sample_size |
Sample size for estimating the probability distribution.
See |
covariates |
(optional) Matrix of covariates, for Pareto/NBD (Abe)
model, passed to |
legend |
Plot legend; defaults to 'Actual' and 'Model'. |
Because the expected transactions are estimated by sampling, the result varies from one call to another. Larger values of sample_size generate more stable results.
Matrix containing actual and expected incremental repeat transactions.
mcmc.PlotTrackingCum
mcmc.ExpectedCumulativeTransactions
elog2inc
## Not run:
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
inc <- elog2inc(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs)
mat <- mcmc.PlotTrackingInc(param.draws, T.cal = cbs$T.cal, T.tot = max(cbs$T.cal + cbs$T.star), actual.inc.tracking.data = inc)
## End(Not run)
Return the probability distribution of purchase frequencies for a random customer in a given time period, i.e. $P(X(t) = x)$. This is estimated by generating sample_size random customers that follow the provided parameter draws. Due to this sampling, the result varies from one call to another.
mcmc.pmf(draws, t, x, sample_size = 10000, covariates = NULL)
draws |
MCMC draws as returned by |
t |
Length of time for which we are calculating the expected number of transactions. May also be a vector. |
x |
Number of transactions for which probability is calculated. May also be a vector. |
sample_size |
Sample size for estimating the probability distribution. |
covariates |
(optional) Matrix of covariates, for Pareto/NBD (Abe)
model, passed to |
$P(X(t) = x)$. If either t or x is a vector, then the output will be a vector as well. If both are vectors, the output will be a matrix.
data("groceryElog")
cbs <- elog2cbs(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
mcmc.pmf(param.draws, t = c(26, 52), x = 0:6)
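The sampling approach can be checked against a case with a closed form. For the plain NBD model (gamma-distributed purchase rates, Poisson counts), $P(X(t) = x)$ is a negative binomial probability with size r and success probability alpha / (alpha + t). The sketch below compares a Monte Carlo estimate from sample_size simulated customers with that closed form; the parameter values are hypothetical, and the NBD stands in for the MCMC-based models.

```r
# Hypothetical NBD parameters (illustration only)
r <- 0.85; alpha <- 4.45; t <- 26
x <- 0:6

set.seed(2024)
sample_size <- 10000
lambda <- rgamma(sample_size, shape = r, rate = alpha)  # individual purchase rates
counts <- rpois(sample_size, lambda * t)                 # simulated counts in (0, t]

# Monte Carlo estimate of P(X(t) = x): share of simulated customers with x transactions
pmf_mc <- sapply(x, function(k) mean(counts == k))

# Closed form for the gamma-Poisson mixture: negative binomial with
# size = r and success probability alpha / (alpha + t)
pmf_exact <- dnbinom(x, size = r, prob = alpha / (alpha + t))

print(round(rbind(mc = pmf_mc, exact = pmf_exact), 3))
```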
(Re-)set burnin of MCMC chains.
mcmc.setBurnin(draws, burnin)
draws |
MCMC draws as returned by |
burnin |
New start index. |
2-element list with MCMC draws
data("groceryElog")
cbs <- elog2cbs(groceryElog)
param.draws <- pnbd.mcmc.DrawParameters(cbs, mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
param.draws.stable <- mcmc.setBurnin(param.draws, burnin = 80)
Calculate the log-likelihood of the NBD model
nbd.cbs.LL(params, cal.cbs)
params |
NBD parameters - a vector with r and alpha, in that order. |
cal.cbs |
Calibration period CBS. It must contain columns for frequency
|
The total log-likelihood for the provided data.
data("groceryElog")
cbs <- elog2cbs(groceryElog)
params <- nbd.EstimateParameters(cbs)
nbd.cbs.LL(params, cbs)
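Because the NBD model mixes Poisson counts over gamma-distributed purchase rates, the likelihood of x repeat transactions in a calibration period of length T.cal is a negative binomial probability. The sketch below computes per-customer and total log-likelihoods for hypothetical data. It may differ from nbd.LL and nbd.cbs.LL by terms that do not depend on r and alpha, so compare likelihood differences rather than absolute values.

```r
# Count-based NBD log-likelihood for customers with x repeat transactions
# over individual calibration periods T.cal (a sketch, not the package's code)
nbd_ll <- function(params, x, T.cal) {
  r <- params[1]; alpha <- params[2]
  dnbinom(x, size = r, prob = alpha / (alpha + T.cal), log = TRUE)
}

# Hypothetical CBS-like data
x <- c(0, 2, 5, 1, 0, 8)
T.cal <- c(30, 30, 25, 40, 10, 52)

ll <- nbd_ll(c(0.85, 4.45), x, T.cal)
print(ll)       # one log-likelihood per customer (cf. nbd.LL)
print(sum(ll))  # total log-likelihood (cf. nbd.cbs.LL)
```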
Uses NBD model parameters and a customer's past transaction behavior to return the number of transactions they are expected to make in a given time period.
nbd.ConditionalExpectedTransactions(params, T.star, x, T.cal)
params |
NBD parameters - a vector with |
T.star |
Length of time for which we are calculating the expected number of transactions. |
x |
Number of repeat transactions in the calibration period |
T.cal |
Length of calibration period, or a vector of calibration period lengths. |
Number of transactions a customer is expected to make in a time period of length T.star, conditional on their past behavior. If any of the input parameters has a length greater than 1, this will be a vector of expected numbers of transactions.
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
params <- nbd.EstimateParameters(cbs)
xstar.est <- nbd.ConditionalExpectedTransactions(params, cbs$T.star, cbs$x, cbs$T.cal)
sum(xstar.est) # expected total number of transactions during holdout
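Under the NBD model, observing x transactions in T.cal updates the gamma distribution of a customer's purchase rate to Gamma(r + x, alpha + T.cal), so the conditional expectation for the next T.star periods is T.star (r + x) / (alpha + T.cal). A minimal sketch of that formula; the helper name nbd_cond_exp and the parameter values are hypothetical:

```r
# Conditional expectation under the NBD: the posterior mean rate
# (r + x) / (alpha + T.cal) times the length of the holdout period
nbd_cond_exp <- function(params, T.star, x, T.cal) {
  r <- params[1]; alpha <- params[2]
  T.star * (r + x) / (alpha + T.cal)
}

params <- c(r = 0.85, alpha = 4.45)  # hypothetical values
res <- nbd_cond_exp(params, T.star = 52, x = c(0, 3, 10), T.cal = 52)
print(round(res, 2))
```

Since the NBD has no dropout process, the expectation depends only on frequency and calibration length, not on the recency of the last purchase.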
Estimates parameters for the NBD model via Maximum Likelihood Estimation.
nbd.EstimateParameters(cal.cbs, par.start = c(1, 1), max.param.value = 10000)
cal.cbs |
Calibration period CBS. It must contain columns for frequency
|
par.start |
Initial NBD parameters - a vector with |
max.param.value |
Upper bound on parameters. |
List of estimated parameters.
Ehrenberg, A. S. (1959). The pattern of consumer purchases. Journal of the Royal Statistical Society: Series C (Applied Statistics), 8(1), 26-41. doi:10.2307/2985810
data("groceryElog")
cbs <- elog2cbs(groceryElog)
nbd.EstimateParameters(cbs)
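Maximum likelihood estimation for the NBD can be sketched with optim by minimizing the negative total log-likelihood. The sketch below optimizes on the log scale to keep r and alpha positive and fits simulated data; the package's own implementation, which bounds parameters by max.param.value, may use a different optimizer and parameterization.

```r
set.seed(99)
# Simulate hypothetical NBD data: 1000 customers, 52 periods each
n <- 1000; T.cal <- rep(52, n)
lambda <- rgamma(n, shape = 0.85, rate = 4.45)
x <- rpois(n, lambda * T.cal)

# Negative total log-likelihood, parameterized on the log scale so that
# exp() keeps r and alpha strictly positive
neg_ll <- function(log_params) {
  r <- exp(log_params[1]); alpha <- exp(log_params[2])
  -sum(dnbinom(x, size = r, prob = alpha / (alpha + T.cal), log = TRUE))
}

fit <- optim(par = log(c(1, 1)), fn = neg_ll)  # start at r = alpha = 1
est <- setNames(exp(fit$par), c("r", "alpha"))
print(round(est, 2))  # should land near the simulated r = 0.85, alpha = 4.45
```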
Simulate data according to NBD model assumptions
nbd.GenerateData(n, T.cal, T.star, params, date.zero = "2000-01-01")
n |
Number of customers. |
T.cal |
Length of calibration period. |
T.star |
Length of holdout period. This may be a vector. |
params |
NBD parameters - a vector with |
date.zero |
Initial date for cohort start. Can be of class character, Date or POSIXt. |
List of length 2:
cbs |
A data.frame with a row for each customer and the summary statistic as columns. |
elog |
A data.frame with a row for each transaction, and columns |
n <- 200 # no. of customers
T.cal <- 32 # length of calibration period
T.star <- 32 # length of hold-out period
params <- c(r = 0.85, alpha = 4.45) # purchase frequency lambda_i ~ Gamma(r, alpha)
data <- nbd.GenerateData(n, T.cal, T.star, params)
cbs <- data$cbs # customer by sufficient summary statistic - one row per customer
elog <- data$elog # Event log - one row per event/purchase
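The example above only shows the calling convention. To illustrate what such a simulation involves, the base R sketch below draws an individual purchase rate per customer, generates purchase times as a Poisson process over the calibration and holdout periods, and derives the x and x.star counts. The helper simulate_times and the convention that each customer starts with a purchase at time 0 are assumptions for illustration.

```r
set.seed(10)
n <- 5; T.cal <- 32; T.star <- 32
r <- 0.85; alpha <- 4.45  # same values as the example above

# Individual purchase rates lambda_i ~ Gamma(r, alpha)
lambda <- rgamma(n, shape = r, rate = alpha)

# Hypothetical helper: purchase times of a Poisson process with rate lam,
# starting with an initial purchase at time 0 and stopping at t_end
simulate_times <- function(lam, t_end) {
  times <- 0
  repeat {
    nxt <- times[length(times)] + rexp(1, rate = lam)
    if (nxt > t_end) break
    times <- c(times, nxt)
  }
  times
}

# Event log: one row per purchase
elog <- do.call(rbind, lapply(seq_len(n), function(i) {
  data.frame(cust = i, t = simulate_times(lambda[i], T.cal + T.star))
}))

# Repeat transactions during calibration (x) and holdout (x.star)
x      <- tapply(elog$t, elog$cust, function(tt) sum(tt > 0 & tt <= T.cal))
x.star <- tapply(elog$t, elog$cust, function(tt) sum(tt > T.cal))
print(cbind(x, x.star))
```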
Calculate the log-likelihood of the NBD model
nbd.LL(params, x, T.cal)
params |
NBD parameters - a vector with |
x |
Frequency, i.e. number of re-purchases. |
T.cal |
Total time of observation period. |
A numeric vector of log-likelihoods.
Simulate data according to Pareto/GGG model assumptions
pggg.GenerateData(n, T.cal, T.star, params, date.zero = "2000-01-01")
n |
Number of customers. |
T.cal |
Length of calibration period. If a vector is provided, then it
is assumed that customers have different 'birth' dates, i.e.
|
T.star |
Length of holdout period. This may be a vector. |
params |
A list of model parameters |
date.zero |
Initial date for cohort start. Can be of class character, Date or POSIXt. |
List of length 2:
cbs |
A data.frame with a row for each customer and the summary statistic as columns. |
elog |
A data.frame with a row for each transaction, and columns |
Platzer, M., & Reutterer, T. (2016). Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Science, 35(5), 779-799. doi:10.1287/mksc.2015.0963
params <- list(t = 4.5, gamma = 1.5, r = 5, alpha = 10, s = 0.8, beta = 12)
data <- pggg.GenerateData(n = 200, T.cal = 32, T.star = 32, params)
cbs <- data$cbs # customer by sufficient summary statistic - one row per customer
elog <- data$elog # Event log - one row per event/purchase
Returns draws from the posterior distributions of the Pareto/GGG parameters, on cohort as well as on customer level.
pggg.mcmc.DrawParameters( cal.cbs, mcmc = 2500, burnin = 500, thin = 50, chains = 2, mc.cores = NULL, param_init = NULL, trace = 100 )
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS)
data.frame. It must contain a row for each customer, and columns |
mcmc |
Number of MCMC steps. |
burnin |
Number of initial MCMC steps which are discarded. |
thin |
Only every |
chains |
Number of MCMC chains to be run. |
mc.cores |
Number of cores to use in parallel (Unix only). Defaults to |
param_init |
List of start values for cohort-level parameters. |
trace |
Print logging statement every |
See demo('pareto-ggg') for how to apply this model.
List of length 2:
level_1 |
list of |
level_2 |
|
Platzer, M., & Reutterer, T. (2016). Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Science, 35(5), 779-799. doi:10.1287/mksc.2015.0963
pggg.GenerateData
mcmc.PAlive
mcmc.DrawFutureTransactions
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pggg.mcmc.DrawParameters(cbs, mcmc = 20, burnin = 10, thin = 2, chains = 1) # short MCMC to run demo fast
# cohort-level parameter draws
as.matrix(param.draws$level_2)
# customer-level parameter draws for customer with ID '4'
as.matrix(param.draws$level_1[["4"]])
# estimate future transactions
xstar.draws <- mcmc.DrawFutureTransactions(cbs, param.draws, cbs$T.star)
xstar.est <- apply(xstar.draws, 2, mean)
head(xstar.est)
Plots and returns the estimated gamma distribution of k (customers' regularity in interpurchase times).
pggg.plotRegularityRateHeterogeneity( draws, xmax = NULL, fn = NULL, title = "Distribution of Regularity Rate k" )
draws |
MCMC draws as returned by |
xmax |
Upper bound for x-scale. |
fn |
Optional function to summarize individual-level draws for k, e.g. 'mean'. |
title |
Plot title. |
Platzer, M., & Reutterer, T. (2016). Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Science, 35(5), 779-799. doi:10.1287/mksc.2015.0963
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pggg.mcmc.DrawParameters(cbs, mcmc = 20, burnin = 10, thin = 2, chains = 1) # short MCMC to run demo fast
pggg.plotRegularityRateHeterogeneity(param.draws)
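To see what the regularity rate k expresses, the sketch below assumes, as in the Pareto/GGG setup, that a customer's inter-purchase times follow a gamma distribution with shape k and rate k times lambda. Under that assumption the mean stays 1/lambda while the coefficient of variation shrinks as 1/sqrt(k), and k = 1 recovers the exponential (Poisson) case. The purchase rate and the k values are hypothetical.

```r
set.seed(8)
lambda <- 0.5  # hypothetical purchase rate: one purchase every 2 periods on average

# Inter-purchase times ~ Gamma(shape = k, rate = k * lambda):
# mean 1 / lambda, coefficient of variation 1 / sqrt(k)
itt_summary <- function(k, n = 10000) {
  itt <- rgamma(n, shape = k, rate = k * lambda)
  c(k = k, mean = mean(itt), cv = sd(itt) / mean(itt))
}

itt_stats <- t(sapply(c(1, 4, 16), itt_summary))
print(round(itt_stats, 2))
```

Larger k means more clock-like purchasing, which is why the estimated distribution of k summarizes how regular a cohort's purchase timing is.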
Plot timing patterns of sampled customers
plotTimingPatterns( elog, n = 40, T.cal = NULL, T.tot = NULL, title = "Sampled Timing Patterns", headers = NULL )
elog |
Event log, a |
n |
Number of sampled customers. |
T.cal |
End of calibration period, which is visualized as a vertical line. |
T.tot |
End of observation period |
title |
Plot title. |
headers |
Vector of length 2 for adding headers to the plot, e.g.
|
data("groceryElog")
plotTimingPatterns(groceryElog, T.tot = "2008-12-31")
plotTimingPatterns(groceryElog, T.cal = "2006-12-31", headers = c("Calibration", "Holdout"))
Simulate data according to Pareto/NBD model assumptions
pnbd.GenerateData(n, T.cal, T.star, params, date.zero = "2000-01-01")
n |
Number of customers. |
T.cal |
Length of calibration period. If a vector is provided, then it
is assumed that customers have different 'birth' dates, i.e.
|
T.star |
Length of holdout period. This may be a vector. |
params |
A list of model parameters |
date.zero |
Initial date for cohort start. Can be of class character, Date or POSIXt. |
List of length 2:
cbs |
A data.frame with a row for each customer and the summary statistic as columns. |
elog |
A data.frame with a row for each transaction, and columns |
params <- list(r = 5, alpha = 10, s = 0.8, beta = 12)
data <- pnbd.GenerateData(n = 200, T.cal = 32, T.star = 32, params)
cbs <- data$cbs # customer by sufficient summary statistic - one row per customer
elog <- data$elog # Event log - one row per event/purchase
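The Pareto/NBD assumptions behind this simulation can be sketched in a few lines of base R: gamma-distributed purchase and dropout rates (assuming the usual rate parameterization for alpha and beta), an exponential lifetime per customer, and Poisson repeat purchases while alive. The sketch only simulates calibration-period counts and compares the simulated share of customers alive at T.cal with the closed form (beta / (beta + T.cal))^s; it is not the package's implementation.

```r
set.seed(12)
n <- 1000; T.cal <- 32
params <- list(r = 5, alpha = 10, s = 0.8, beta = 12)  # as in the example above

# Heterogeneity: purchase rates lambda_i ~ Gamma(r, alpha) and
# dropout rates mu_i ~ Gamma(s, beta), both with rate parameterization
lambda <- rgamma(n, shape = params$r, rate = params$alpha)
mu     <- rgamma(n, shape = params$s, rate = params$beta)

# Exponential lifetime tau_i; while alive, Poisson purchases at rate lambda_i
tau <- rexp(n, rate = mu)
x <- rpois(n, lambda * pmin(tau, T.cal))  # repeat transactions in (0, T.cal]

# Share alive at T.cal vs. the closed form E[exp(-mu * T.cal)]
p_alive_sim   <- mean(tau > T.cal)
p_alive_exact <- (params$beta / (params$beta + T.cal))^params$s
print(c(simulated = p_alive_sim, exact = p_alive_exact, mean_x = mean(x)))
```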
Returns draws from the posterior distributions of the Pareto/NBD (HB) parameters, on cohort as well as on customer level.
pnbd.mcmc.DrawParameters( cal.cbs, mcmc = 2500, burnin = 500, thin = 50, chains = 2, mc.cores = NULL, use_data_augmentation = TRUE, param_init = NULL, trace = 100 )
cal.cbs |
Calibration period customer-by-sufficient-statistic (CBS)
data.frame. It must contain a row for each customer, and columns |
mcmc |
Number of MCMC steps. |
burnin |
Number of initial MCMC steps which are discarded. |
thin |
Only every |
chains |
Number of MCMC chains to be run. |
mc.cores |
Number of cores to use in parallel (Unix only). Defaults to |
use_data_augmentation |
Deprecated. |
param_init |
List of start values for cohort-level parameters. |
trace |
Print logging statement every |
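The burnin, mcmc and thin arguments follow standard MCMC practice: early states are discarded while the chain is still moving towards the posterior, and only every k-th remaining state is kept to reduce autocorrelation and storage. The base R sketch below shows this bookkeeping on a toy autocorrelated chain. It assumes each chain runs for burnin + mcmc steps, which is an interpretation of the argument descriptions rather than documented behavior; under that assumption the defaults would keep 2500 / 50 = 50 draws per chain. thin_chain is a hypothetical helper.

```r
# Hypothetical helper (not part of BTYDplus): apply burn-in and thinning
# to one chain stored as a vector of successive MCMC states.
thin_chain <- function(chain, burnin, thin) {
  # drop the first `burnin` states
  kept <- if (burnin > 0) chain[-seq_len(burnin)] else chain
  if (length(kept) == 0) return(kept)
  # keep every `thin`-th remaining state, starting with the first
  kept[seq(1, length(kept), by = thin)]
}

# Toy AR(1) chain standing in for one parameter's trace,
# assuming the chain runs for burnin + mcmc steps
set.seed(3)
burnin <- 500; mcmc <- 2500; thin <- 50
chain <- as.numeric(arima.sim(list(ar = 0.9), n = burnin + mcmc))
draws <- thin_chain(chain, burnin, thin)
length(draws)  # 2500 / 50 = 50 draws kept from this chain
```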
See demo('pareto-ggg') for how to apply this model.
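The cal.cbs argument summarizes each customer's calibration history. As a rough base R illustration of what such a summary contains, the sketch below condenses a toy event log into frequency, recency and observation length. The column names x, t.x and T.cal follow the usual BTYD conventions and are an assumption here, as is measuring time in a single numeric unit. In practice elog2cbs, used in the example, builds this table directly; this sketch neither merges same-time transactions nor handles dates.

```r
# Toy event log: customer id and transaction time (e.g. weeks since cohort start)
elog <- data.frame(
  cust = c("a", "a", "a", "b", "c", "c"),
  t    = c(0, 3, 10, 2, 5, 20)
)
T.end <- 26  # hypothetical end of calibration, same unit as t

# per-customer first purchase, last purchase and number of purchases
first <- tapply(elog$t, elog$cust, min)
last  <- tapply(elog$t, elog$cust, max)
n_tx  <- tapply(elog$t, elog$cust, length)

cbs <- data.frame(
  cust  = names(first),
  x     = as.vector(n_tx) - 1,        # repeat transactions (first one excluded)
  t.x   = as.vector(last - first),    # recency, measured from the first purchase
  T.cal = T.end - as.vector(first)    # length of each customer's observation window
)
cbs
```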
2-element list:
level_1 |
list of mcmc.list objects, one for each customer, with draws for customer-level parameters lambda, tau, z, mu |
level_2 |
mcmc.list, with draws for cohort-level parameters r, alpha, s, beta |
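Each element of level_1 can be converted with as.matrix, as in the example, into a matrix with one row per retained draw. A minimal base R sketch of summarizing such a matrix follows. The draws are simulated stand-ins, only the lambda, mu and z columns are used, and z is assumed to be the draw-wise indicator of whether the customer is still alive at the end of the calibration period (as in Abe, 2009), so its mean is a Monte Carlo estimate of P(alive).

```r
# Simulated stand-in for as.matrix(param.draws$level_1[["4"]]);
# the values are random and only illustrate the shape of the data.
set.seed(42)
n_draws <- 200
draws <- cbind(
  lambda = rgamma(n_draws, shape = 5, rate = 10),
  mu     = rgamma(n_draws, shape = 0.8, rate = 12),
  z      = rbinom(n_draws, size = 1, prob = 0.7)
)

post_mean <- colMeans(draws)       # posterior mean of each parameter
p_alive   <- mean(draws[, "z"])    # share of draws in which the customer is alive
ci_lambda <- quantile(draws[, "lambda"], probs = c(0.05, 0.95))  # 90% credible interval
post_mean
p_alive
ci_lambda
```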
Ma, S. H., & Liu, J. L. (2007, August). The MCMC approach for solving the Pareto/NBD model and possible extensions. In Third international conference on natural computation (ICNC 2007) (Vol. 2, pp. 505-512). IEEE. doi:10.1109/ICNC.2007.728
Abe, M. (2009). "Counting your customers" one by one: A hierarchical Bayes extension to the Pareto/NBD model. Marketing Science, 28(3), 541-553. doi:10.1287/mksc.1090.0502
pnbd.GenerateData
mcmc.DrawFutureTransactions
mcmc.PAlive
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")
param.draws <- pnbd.mcmc.DrawParameters(cbs,
  mcmc = 100, burnin = 50, thin = 10, chains = 1) # short MCMC to run demo fast
# cohort-level parameter draws
as.matrix(param.draws$level_2)
# customer-level parameter draws for customer with ID '4'
as.matrix(param.draws$level_1[["4"]])
# estimate future transactions
xstar.draws <- mcmc.DrawFutureTransactions(cbs, param.draws, cbs$T.star)
xstar.est <- apply(xstar.draws, 2, mean)
head(xstar.est)
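The forecasting step in the example turns parameter draws into draws of future transactions. A minimal base R sketch of that idea for a single customer follows, assuming Pareto/NBD behavior: exponential lifetimes are memoryless, so a customer alive at the end of calibration (z = 1) has a fresh Exp(mu) residual lifetime, and purchases while active are Poisson with rate lambda. draw_xstar and the simulated draws are hypothetical; this is not the code of mcmc.DrawFutureTransactions.

```r
# Hypothetical sketch: one simulated holdout count per posterior draw
# of (lambda, mu, z) for a single customer.
draw_xstar <- function(lambda, mu, z, T.star) {
  n <- length(lambda)
  # memoryless lifetime: alive customers survive a further Exp(mu) period,
  # customers who already churned (z == 0) stay inactive
  remaining <- ifelse(z == 1, rexp(n, rate = mu), 0)
  active <- pmin(remaining, T.star)
  # given lambda, purchases during the active period are Poisson distributed
  rpois(n, lambda = lambda * active)
}

set.seed(7)
n_draws <- 1000
lambda <- rgamma(n_draws, shape = 5, rate = 10)
mu <- rgamma(n_draws, shape = 0.8, rate = 12)
z <- rbinom(n_draws, size = 1, prob = 0.7)
xstar <- draw_xstar(lambda, mu, z, T.star = 32)
mean(xstar)  # Monte Carlo estimate of expected holdout transactions
```

Averaging over draws, as apply(xstar.draws, 2, mean) does per customer in the example, integrates out parameter uncertainty instead of plugging in point estimates.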