Probability Models in Marketing • Marketing models attempt to describe or predict behaviour • Usually include a random element to allow for imperfect knowledge • We will develop probability models that specify a random model for individual behaviour • Sum this across individuals to get a model of aggregate measures • May need to incorporate differences between individuals into the model
Uses of Probability Models • Understand and profile individual behaviour • Understand market-level patterns, and their origin in individual behaviours • Provide norms or benchmarks for comparison • Ehrenberg: Understanding Buyer Behaviour; and Repeat-Buying (1988) • Latter book available free online at http://www.empgens.com/ehrenberg.html#repeat • Prediction or forecasting of: • Aggregate results beyond current observation period • Individual behaviour, given knowledge of past actions
Product Trial Example • Have a newly launched product • Multi-pack juice drink, aimed at children • Launched in test market • May be rolled out nationally if successful • Measure trial over time • Based on household scanner panel data, e.g. ACNielsen’s HomeScan • Have data from first 13 weeks • Want to predict trial 13 weeks later
Develop Probability Model • Variable of interest (for individual households) • When did they first try the product? • Treat time of first purchase T as a random variable • Assume this has an exponential distribution, with trial rate λ • Probability of trial by time t for each household is $F(t) = P(T \le t) = 1 - e^{-\lambda t}$ • If every household had the same λ, averaging this across all households would give the same curve at the market level, but this would not be realistic. Why?
Market Level Model • Assume there are two groups of consumers • One group may try the product (λ > 0) • The other group will never try the product (λ = 0) • In proportions p and 1 − p respectively • “Exponential with never-triers” model: $F(t) = P(T \le t) = p\,(1 - e^{-\lambda t})$ • Note: technically this is not a cdf, as it tends to p rather than 1 as t approaches infinity, but as we are only dealing with relatively small values of t this approximation is valid.
Estimate Parameters • Model has parameters p and λ • Estimate these parameters using maximum likelihood • The likelihood function is the probability that this dataset would be observed • Viewed as a function of the parameters • Assumes the model holds • L(parameters) = P(this data observed|parameters) • The maximum likelihood estimates (MLEs) of the parameters are the values that maximise L(.), for the given dataset • Can equivalently maximise l(.), the log-likelihood
Implementing MLE • The maximum likelihood method can be implemented relatively easily in many software environments • E.g. R, SAS, Excel • It may already be implemented if the model is commonly used • R code for exponential w. never-triers model:
trial <- c(8,14,16,32,40,47,50,52,57,60,65,67,68)  # cumulative number of triers by weeks 1, ..., 13
Trial1 <- trial - c(0, trial[1:12])                 # new triers in each week
F <- function(t, p, lambda) { p*(1-exp(-lambda*t)) }
R Code (continued)
l <- function(p, lambda, data) {
  week <- 1:13
  if ((p >= 0) && (p <= 1)) {
    # weekly trial increments, plus the 1499 - sum(data) panel households with no trial by week 13
    sum(data*log(F(week,p,lambda) - F(week-1,p,lambda))) +
      (1499-sum(data))*log(1-F(13,p,lambda))
  } else {NaN}
}
optim(c(.2,.2), function(param) {-l(param[1], param[2], Trial1)})
• Result: maximum value of log-likelihood is -445.84, which is achieved at p=0.060 and λ=0.109 • Complications due to sample design and weighting have been ignored
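Forecasting • Can use fitted model to forecast trial • Let N(t) be a random variable, being the number of households in the panel that have tried the product by time t • Forecast trial as: $E[N(t)] = 1499\,\hat{F}(t) = 1499\,\hat{p}\,(1 - e^{-\hat{\lambda} t})$, where 1499 is the number of households in the panel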
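As a minimal sketch of the forecast in R, the code below plugs the rounded MLEs (p = 0.060, λ = 0.109) into the exponential with never-triers model and scales by the 1499 panel households assumed in the likelihood code. The function name `forecast_trial` is hypothetical, and rounded estimates give only approximate forecasts.

```r
# Hypothetical helper: expected cumulative number of triers in a panel of
# n_households, under the exponential with never-triers model
forecast_trial <- function(t, p, lambda, n_households = 1499) {
  n_households * p * (1 - exp(-lambda * t))
}

# Rounded MLEs reported for this model
p_hat <- 0.060
lambda_hat <- 0.109

weeks <- 1:26
forecast <- forecast_trial(weeks, p_hat, lambda_hat)
round(forecast[c(13, 26)], 1)   # forecasts at the end of weeks 13 and 26
```

Because F(t) tends to p, the forecast levels off at about 1499 × 0.060 ≈ 90 households, however far ahead we look.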
Model Extensions • Current model assumes the same trial rate for all households, except never-triers • May be overly simplistic • Can allow for multiple segments of households, each with a different underlying trial rate
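One way to sketch a multi-segment extension in R, assuming two trying segments plus never-triers, is to reuse the weekly trial counts and the 1499-household panel from the earlier code. The names `F2seg` and `l2seg` and the starting values are illustrative choices, not taken from the source.

```r
# Hypothetical two-segment extension of the exponential with never-triers
# model: a share p1 of households tries at rate lambda1, a share p2 at
# rate lambda2, and the remaining 1 - p1 - p2 never try
F2seg <- function(t, p1, lambda1, p2, lambda2) {
  p1 * (1 - exp(-lambda1 * t)) + p2 * (1 - exp(-lambda2 * t))
}

# Log-likelihood for weekly counts of new triers, in the same form as l()
l2seg <- function(param, data, n_households = 1499) {
  p1 <- param[1]; lambda1 <- param[2]; p2 <- param[3]; lambda2 <- param[4]
  # Return NaN outside the valid region; optim()'s Nelder-Mead moves away from it
  if (p1 < 0 || p2 < 0 || p1 + p2 > 1 || lambda1 <= 0 || lambda2 <= 0) return(NaN)
  week <- seq_along(data)
  last <- length(data)
  sum(data * log(F2seg(week, p1, lambda1, p2, lambda2) -
                 F2seg(week - 1, p1, lambda1, p2, lambda2))) +
    (n_households - sum(data)) * log(1 - F2seg(last, p1, lambda1, p2, lambda2))
}

trial <- c(8,14,16,32,40,47,50,52,57,60,65,67,68)
Trial1 <- diff(c(0, trial))   # new triers in each week
fit <- optim(c(0.03, 0.5, 0.05, 0.05), function(param) -l2seg(param, Trial1))
fit$par
```

No fitted values are claimed here. With four parameters and only 13 weekly counts, different starting values can lead `optim()` to different solutions, so it is worth trying several.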
Model Extensions • Finite mixture models can be hard to fit • Local optima of the likelihood are common • Another alternative that allows for consumer heterogeneity is a continuous mixture model • Assume trial rates are distributed with pdf g(λ) • The discrete mixture model can be thought of as an approximation to the underlying continuous distribution of trial rates
Gamma Trial Rate Distribution • Assume trial rates are distributed according to a gamma distribution, $g(\lambda) = \frac{\beta^{\alpha} \lambda^{\alpha - 1} e^{-\beta \lambda}}{\Gamma(\alpha)}$ for λ > 0, where α is a shape parameter and β is an inverse scale (rate) parameter • The gamma distribution is a flexible, unimodal, mathematically tractable distribution
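Market-Level Model • The resulting cumulative distribution of first trial times, at an overall market level, is $F(t) = \int_0^\infty (1 - e^{-\lambda t})\, g(\lambda)\, d\lambda = 1 - \left(\frac{\beta}{\beta + t}\right)^{\alpha}$ • This is called an exponential-gamma model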
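The closed form can be checked numerically. The sketch below averages the individual exponential CDF over a gamma density with `integrate()` and compares it with 1 − (β/(β+t))^α, the same expression as `Fg` in the estimation code. The values α = 2 and β = 5 are arbitrary illustrations; a shape parameter below 1, such as the fitted α = 0.0416, makes the gamma density unbounded at zero, which can make numerical integration less reliable.

```r
# Closed-form exponential-gamma CDF (same expression as Fg)
F_eg <- function(t, alpha, beta) 1 - (beta / (beta + t))^alpha

# The same quantity obtained by averaging the individual-level exponential
# CDF, 1 - exp(-lambda * t), over a gamma(shape = alpha, rate = beta) density
F_eg_numeric <- function(t, alpha, beta) {
  integrand <- function(lambda) {
    (1 - exp(-lambda * t)) * dgamma(lambda, shape = alpha, rate = beta)
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Illustrative parameter values (assumed, not fitted)
alpha <- 2
beta <- 5
c(closed_form = F_eg(13, alpha, beta), numeric = F_eg_numeric(13, alpha, beta))
```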
Estimating Parameters • R Code for finding MLEs:
Fg <- function(t, alpha, beta) { 1 - (beta/(beta+t))^alpha }
lg <- function(alpha, beta, data) {
  week <- 1:13
  sum(data*log(Fg(week,alpha,beta) - Fg(week-1,alpha,beta))) +
    (1499-sum(data))*log(1-Fg(13,alpha,beta))
}
optim(c(1,1), function(param) {-lg(param[1], param[2], Trial1)})
• Result: maximum value of log-likelihood is -446.64, which is achieved at α=0.0416 and β=6.32
Further Extensions • Could add a “never try” component into the exponential-gamma model • Could incorporate the effects of marketing covariates • E.g. advertising weight over time • Could incorporate the effects of household covariates • E.g. presence of children
Building a Probability Model: General Approach • Determine the marketing problem or information needed • Identify the behaviour of interest at the individual level • Make sure this is observable; denote by x • Choose an appropriate probability distribution f(x|θ) • The parameters θ of this distribution can be thought of as latent traits of each individual • Latent or underlying traits; not observed directly but affect x
General Approach (continued) • Specify a distribution for the latent traits across the population • Denote this by g(θ) • Called the mixing distribution • Can be discrete, continuous or a combination • Obtain the resulting aggregate market-level distribution (if this is observed or of interest) by integrating with respect to θ
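In symbols, the aggregation step averages the individual-level distribution f(x|θ) over the mixing distribution. In the discrete case, the segment weights π_k and segment parameters θ_k are notation introduced here for illustration; the exponential with never-triers model is the case K = 2 with one segment at λ = 0, while the exponential-gamma model and the NBD are continuous cases.

```latex
f(x) = \sum_{k=1}^{K} \pi_k \, f(x \mid \theta_k) \qquad \text{(discrete mixing distribution)}

f(x) = \int f(x \mid \theta) \, g(\theta) \, d\theta \qquad \text{(continuous mixing distribution)}
```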
General Approach (continued) • Estimate the parameters of the mixing distribution • Usually done using maximum likelihood • Check model fit, graphically if possible • Use the fitted model to solve the marketing problem or to obtain the required information
Outdoor Advertising Example • Advertisers can buy a “monthly showing” on a set of specific billboards • Effectiveness of the showing is primarily evaluated through three measures • Reach, frequency and gross ratings points (GRPs) • Measures derived from daily travel maps filled in by a sample of people • An “exposure” is counted when a respondent goes past one of the billboards, while facing the billboard • Have data from each person for one week • Want to project from this data to get measures for the relevant month (or four weeks)
Measures of Advertising Exposure • Three measures are commonly used • Reach is the proportion of people exposed to the advertising at least once during the month • Frequency is the number of times each person is exposed to the advertising message • Usually summarised as the average frequency, which is the average number of exposures experienced among those who were exposed • Gross rating points (GRPs) is the mean number of exposures per 100 people • This is just the product of the reach (expressed as a percentage) with the average frequency
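The three measures can be computed directly from any exposure distribution. A minimal sketch in R, with a hypothetical helper name `exposure_measures`, applied to the one-week exposure counts that the estimation code later calls `expodist` (number of people with 0, 1, ..., 23 exposures):

```r
# Hypothetical helper: summary measures from an exposure distribution,
# given the number of people with 0, 1, 2, ... exposures
exposure_measures <- function(counts) {
  exposures <- seq_along(counts) - 1        # 0, 1, 2, ...
  n <- sum(counts)
  reach <- 1 - counts[1] / n                # share exposed at least once
  mean_exposures <- sum(exposures * counts) / n
  avg_frequency <- mean_exposures / reach   # average among those exposed
  grps <- 100 * mean_exposures              # exposures per 100 people
  c(reach = reach, avg_frequency = avg_frequency, grps = grps)
}

# One-week exposure counts for 0-23 exposures
expodist <- c(48,37,30,24,20,16,13,11,9,7,6,5,5,3,3,2,2,2,1,1,2,1,1,1)
round(exposure_measures(expodist), 3)
```

For these 250 respondents the one-week reach is 80.8%, the average frequency is about 5.51 and GRPs are 445.6, which equals 80.8 × 5.51 up to rounding.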
Model: Aim and Approach • Goal: Develop a model that uses one week of data to provide an estimate of the monthly performance measures • Approach • Model the weekly exposure distribution • Derive the monthly exposure distribution under this model, and estimate summary statistics for the month
Probability Model • Let X denote the number of billboard exposures during one week • For each person, X is assumed to have a Poisson distribution with rate parameter λ • We assume that the exposure rates λ have a gamma distribution
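Probability Model • Aggregating across the population (i.e. integrating with respect to λ) gives $P(X = x) = \int_0^\infty \frac{\lambda^{x} e^{-\lambda}}{x!}\, g(\lambda)\, d\lambda = \frac{\Gamma(\alpha + x)}{\Gamma(\alpha)\, x!} \left(\frac{\beta}{\beta + 1}\right)^{\alpha} \left(\frac{1}{\beta + 1}\right)^{x}$, for x = 0, 1, 2, \ldots • This Poisson-gamma distribution is also known as the negative binomial distribution, or NBD • It has mean α/β and variance α(β+1)/β²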
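A quick numerical check of the Poisson-gamma result, sketched in R with illustrative (not fitted) values α = 1.5 and β = 0.4: integrating the Poisson probabilities over the gamma density should reproduce `dnbinom()` with `size = alpha` and `prob = beta/(beta+1)`, and the resulting distribution should have mean α/β and variance α(β+1)/β².

```r
# Aggregate probability of x exposures, obtained by integrating the
# Poisson(lambda) probability over a gamma(shape = alpha, rate = beta) density
pg_numeric <- function(x, alpha, beta) {
  integrand <- function(lambda) {
    dpois(x, lambda) * dgamma(lambda, shape = alpha, rate = beta)
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Illustrative parameter values (assumed, not fitted)
alpha <- 1.5
beta <- 0.4
x <- 0:10
by_integration <- sapply(x, pg_numeric, alpha = alpha, beta = beta)
closed_form <- dnbinom(x, size = alpha, prob = beta / (beta + 1))
max(abs(by_integration - closed_form))

# Mean and variance of the NBD, compared with alpha/beta and alpha(beta+1)/beta^2
support <- 0:500
probs <- dnbinom(support, size = alpha, prob = beta / (beta + 1))
nbd_mean <- sum(support * probs)
nbd_var <- sum((support - nbd_mean)^2 * probs)
c(nbd_mean, alpha / beta, nbd_var, alpha * (beta + 1) / beta^2)
```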
Estimating Model Parameters • R Code:
expodist <- c(48,37,30,24,20,16,13,11,9,7,6,5,5,3,3,2,2,2,1,1,2,1,1,1)  # number of people with 0, 1, ..., 23 exposures
lnbd <- function(alpha, beta, data) {
  expos <- 0:23
  prob <- beta/(beta+1)   # dnbinom's prob parameter for the Poisson-gamma NBD
  sum(data*log(dnbinom(expos, alpha, prob)))
}
optim(c(1,1), function(param) {-lnbd(param[1], param[2], expodist)})
• Result: maximum value of log-likelihood is -649.7, which is achieved at α=0.969 and β=0.218
NBD For More Than 1 Week • Let X(t) denote the number of exposures experienced by a person over t weeks • Suppose that over one week, the exposure distribution for that person is Poisson(λ) • Then X(t) is also Poisson, with rate parameter λt
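NBD For More Than 1 Week • The market-level exposure distribution is $P(X(t) = x) = \frac{\Gamma(\alpha + x)}{\Gamma(\alpha)\, x!} \left(\frac{\beta}{\beta + t}\right)^{\alpha} \left(\frac{t}{\beta + t}\right)^{x}$ • This has mean $E[X(t)] = \alpha t / \beta$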
Performance of Monthly Showing • For t=4: • P(X(t)=0) = 0.056 • E[X(t)] = 17.82 • So: • Reach = 1 - P(X(t)=0) = 94.4% • Average Frequency = E[X(t)] / (1 - P(X(t)=0)) = 18.9 • GRPs = 100 × E[X(t)] = 1782
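The monthly figures can be sketched in R from the NBD parameters. This uses the rounded MLEs (α = 0.969, β = 0.218), so the results differ slightly from the slide; for example E[X(4)] = 4 × 0.969 / 0.218 ≈ 17.78 rather than 17.82. Over t weeks, `dnbinom()` takes `prob = beta/(beta + t)`.

```r
# Monthly (4-week) measures under the NBD, using the rounded MLEs
alpha <- 0.969
beta <- 0.218
weeks <- 4

p0 <- dnbinom(0, size = alpha, prob = beta / (beta + weeks))  # P(X(t) = 0)
mean_t <- alpha * weeks / beta                                 # E[X(t)]

reach <- 1 - p0
avg_frequency <- mean_t / reach
grps <- 100 * mean_t
round(c(p0 = p0, mean = mean_t, reach = reach,
        avg_frequency = avg_frequency, grps = grps), 3)
```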
Log-Likelihood Calculation • If data available as counts (for discrete or discretised data), can use • Sum of (count times log probability) • E.g. sum(data*log(dnbinom(expos,alpha,prob))) • Sum of (count times (increase in distribution function)) • E.g. sum(data*log(F(week,p,lambda) - F(week-1,p,lambda))) + (1499-sum(data))*log(1-F(13,p,lambda))
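The second pattern can be wrapped in a generic helper that accepts any cumulative distribution function. The helper name `grouped_loglik` is hypothetical; it assumes, as in the trial example, unit-width intervals starting at time 0 and a right-censored group of units that had not purchased by the end of the data.

```r
# Hypothetical generic helper: log-likelihood for counts of first purchases
# in intervals (0,1], (1,2], ..., (k-1,k], plus a right-censored group of
# n_total - sum(counts) units with no purchase by time k.
# cdf is any function of t returning the model's cumulative probability.
grouped_loglik <- function(cdf, counts, n_total) {
  k <- length(counts)
  increments <- cdf(1:k) - cdf(0:(k - 1))
  sum(counts * log(increments)) + (n_total - sum(counts)) * log(1 - cdf(k))
}

# Example: exponential with never-triers at the rounded MLEs
trial <- c(8,14,16,32,40,47,50,52,57,60,65,67,68)
Trial1 <- diff(c(0, trial))
F_ent <- function(t) 0.060 * (1 - exp(-0.109 * t))
grouped_loglik(F_ent, Trial1, 1499)
```

At the rounded estimates this returns approximately -445.84, the maximised log-likelihood reported for that model. Passing a different `cdf`, such as the exponential-gamma one, reuses the same code.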
Direct Marketing Example • Have customer database containing data on past purchases • 126 segments defined based on purchase histories • We’ll cover segmentation methods later • Believe that some customers are more likely to respond to mail-out than others • Send test mail-out to 3% sample of customers • Analyse response by segment to identify most profitable groups to target
Target Segments • Profitable to send mail-out if it costs less than the profit on resulting sales • i.e. if the expected rate of purchase response (PRR) is above the following cut-off: PRR > cost per letter of mail-out / unit margin • Mail-out cost is 33.43 cents per letter • Unit margin is $161.50 • Cut-off rate is 0.21% • Standard approach • Conduct full mail-out to all segments with test PRR above this cut-off value – 51 segments in this case • There is a problem with this rule – what is it? • Manager chose to mail-out to 47 of these segments, plus another 24 segments
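The cut-off arithmetic, sketched in R. The three segment response rates are made up purely to show how the standard rule would be applied.

```r
# Break-even purchase response rate: mail only if PRR > cost / margin
cost_per_letter <- 0.3343   # dollars (33.43 cents)
unit_margin <- 161.50       # dollars
cutoff <- cost_per_letter / unit_margin
round(100 * cutoff, 2)      # cut-off as a percentage: 0.21

# Hypothetical test-mailing response rates for three segments
test_prr <- c(seg_a = 0.0035, seg_b = 0.0015, seg_c = 0.0021)
names(test_prr)[test_prr > cutoff]   # segments selected by the standard rule
```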
Develop Probability Model • Objective is to enable better decisions based on the test mail-out dataset • Outcome variable is the number of responses for a specified number of letters mailed, by segment • Suggests a binomial distribution
Model Development • Notation: • Ns = size of segment s (for s = 1, 2, …, S) • ms = number of test letters sent to members of segment s • Xs = number of purchases due to responses from segment s • Assume that all members of segment s have the same probability of purchase response ps, and they respond/purchase independently • Then Xs is a binomial random variable
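Under these assumptions, $P(X_s = x) = \binom{m_s}{x} p_s^{x} (1 - p_s)^{m_s - x}$ for x = 0, 1, ..., m_s. A minimal R sketch with made-up numbers for a single segment (m_s = 400 letters, x_s = 2 purchases) evaluates this probability with `dbinom()` and confirms numerically that the binomial log-likelihood is maximised at x_s / m_s.

```r
# Hypothetical test-mailing data for one segment s (illustrative only)
m_s <- 400   # test letters sent to segment s
x_s <- 2     # purchases in response

# Binomial probability of the observed count for a candidate p_s
dbinom(x_s, size = m_s, prob = 0.002)

# Binomial log-likelihood in p_s; its maximum is at x_s / m_s
loglik_s <- function(p) dbinom(x_s, size = m_s, prob = p, log = TRUE)
p_hat <- optimize(loglik_s, interval = c(1e-6, 0.1), maximum = TRUE)$maximum
c(numeric_mle = p_hat, closed_form = x_s / m_s)
```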