Customer-Base Analysis Using Aggregated Data (Or: The Joys of RCSS)
Kinshuk Jerath, Carnegie Mellon University
Peter S. Fader, Wharton/Univ. of Penn
Bruce G. S. Hardie, London Business School
Customer-Base Analysis Faced with a customer transaction database, we may wish to determine • The level of transactions we expect in future periods, both collectively and individually • Key characteristics of the cohort (e.g., degree of heterogeneity in behavior) • Formal financial metrics (such as “customer lifetime value”) to guide resource allocation decisions
Typical Data Structure Models for customer-base analysis typically require access to individual-customer-level data
Barriers to Disaggregate Data Many firms may not (be able to) keep detailed individual-level records: • General weaknesses with the firm’s information systems capabilities • Corporate information silos make data integration difficult • Wariness given high-profile stories on data loss • Data protection laws (with bans on trans-border data flows) “Anonymizing” (and other statistical disclosure control methods) is costly and potentially ineffective
Key Challenges • What data formats are both easy to create/maintain and privacy preserving? • Can we adapt our “tried and true” models to accommodate these data limitations but still work well? • How much do we lose in the process?
How would we proceed if we had disaggregate data?
“Buy Till You Die” Model Transaction Process (“Buy”) • While “alive”, a customer purchases randomly around his mean transaction rate • Transaction rates vary across customers Dropout Process (“Till You Die”) • Each customer has an unobserved “lifetime” • Dropout rates vary across customers
The Pareto/NBD Model (Schmittlein, Morrison, and Colombo 1987) Transaction Process: While active, number of transactions made by a customer follows a Poisson process with transaction rate λ Transaction rates are distributed gamma(r, α) across the population Dropout Process: Each customer has an unobserved lifetime of length τ, which is distributed exponential with dropout rate μ Dropout rates are distributed gamma(s, β) across the population Astonishingly good fit and predictive performance
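To make these assumptions concrete, here is a minimal simulation sketch of the Pareto/NBD data-generating process; the function name, seed, cohort size, and parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_pareto_nbd_cohort(n_customers, n_weeks, r, alpha, s, beta, seed=0):
    """Simulate repeat-transaction times under the Pareto/NBD assumptions:
    Poisson purchasing with gamma(r, alpha) rates while alive, and exponential
    lifetimes whose dropout rates are gamma(s, beta) across the cohort."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)  # transaction rates
    mu = rng.gamma(shape=s, scale=1.0 / beta, size=n_customers)    # dropout rates
    tau = rng.exponential(scale=1.0 / mu)                          # unobserved lifetimes
    histories = []
    for i in range(n_customers):
        end = min(tau[i], n_weeks)        # a customer only buys while "alive"
        t, times = 0.0, []
        while True:
            t += rng.exponential(scale=1.0 / lam[i])  # exponential inter-purchase times
            if t > end:
                break
            times.append(t)
        histories.append(times)
    return histories

# Example: one synthetic cohort of 2500 customers observed for 78 weeks
histories = simulate_pareto_nbd_cohort(2500, 78, r=1.0, alpha=10, s=0.5, beta=10)
```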
The Pareto/NBD works very well… …given individual-level (disaggregate) data.
Pareto/NBD Using RCSS Data • Same assumptions as for the usual Pareto/NBD implementation • Calculate purchase probabilities over discrete intervals: P(X(t, t+1) = x), P(X(t+1, t+2) = x), P(X(t+2, t+3) = x), etc. • Apply these to the RCSS histograms and use standard MLE estimation • Parameter estimation is fast, stable, and robust • All of the usual Pareto/NBD diagnostics (e.g., “P(Alive)”) can be obtained from the parameter estimates
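As a rough, non-authoritative illustration of this estimation strategy, the sketch below approximates the interval probabilities P(X(t_a, t_b) = x) by Monte Carlo simulation instead of the closed-form Pareto/NBD expressions used in the actual implementation, and then maximizes a multinomial log-likelihood over the observed histogram counts. The function names, the 7+ top bucket, the starting values, and the example intervals are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def interval_pmf_by_simulation(params, intervals, max_count, n_sims=20000, seed=1):
    """Approximate P(X(t_a, t_b) = x) under the Pareto/NBD by simulating the
    data-generating process (a stand-in for the closed-form expressions)."""
    r, alpha, s, beta = params
    rng = np.random.default_rng(seed)
    lam = rng.gamma(r, 1.0 / alpha, n_sims)   # transaction rates
    mu = rng.gamma(s, 1.0 / beta, n_sims)     # dropout rates
    tau = rng.exponential(1.0 / mu)           # unobserved lifetimes
    pmfs = []
    for t_a, t_b in intervals:
        # While alive, purchases in (t_a, t_b) are Poisson with mean lam * exposure
        exposure = np.clip(np.minimum(tau, t_b) - t_a, 0.0, None)
        x = np.minimum(rng.poisson(lam * exposure), max_count)  # top-code into the last bucket
        pmfs.append(np.bincount(x, minlength=max_count + 1) / n_sims)
    return pmfs

def rcss_neg_loglik(log_params, rcss_histograms, intervals):
    """Negative multinomial log-likelihood of the observed RCSS histogram counts."""
    params = np.exp(log_params)               # keep r, alpha, s, beta positive
    max_count = len(rcss_histograms[0]) - 1
    pmfs = interval_pmf_by_simulation(params, intervals, max_count)
    ll = 0.0
    for hist, pmf in zip(rcss_histograms, pmfs):
        ll += np.sum(np.asarray(hist) * np.log(np.maximum(pmf, 1e-12)))
    return -ll

# Hypothetical usage: three 26-week histograms counting customers with 0, 1, ..., 7+ purchases
# intervals = [(0, 26), (26, 52), (52, 78)]
# fit = minimize(rcss_neg_loglik, x0=np.log([1.0, 10.0, 1.0, 10.0]),
#                args=(rcss_histograms, intervals), method="Nelder-Mead")
# r_hat, alpha_hat, s_hat, beta_hat = np.exp(fit.x)
```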
Do We Need All Five Years of Data? Calibrate the model on years 1-3 only, predict for years 4 and 5.
Customer-Base Analysis Using Repeated Cross-Sectional Summary (RCSS) Data Under more general conditions, what is the “information loss” from aggregating the data? Under what conditions can a model built using aggregated data accurately mimic its individual-level counterpart? How much aggregated data is required to do this job well?
Research Design • Manipulate the four parameters of the Pareto/NBD • r, s = 0.5, 1.0, 1.5 • α, β = 5, 10, 15 We have 3⁴ = 81 “worlds” • For each “world,” simulate 104 weeks of data for five synthetic panels of 2500 customers (first 78 weeks for calibration, last 26 weeks for holdout) • Fit the Pareto/NBD model to the raw transaction data – obtain disaggregate LL and parameters • “Backward-looking” (“Chopping it up”) analysis • “Forward-looking” (“Build as you go”) analysis
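A small sketch of how the 3⁴ = 81 parameter worlds could be enumerated before simulation (variable names are illustrative):

```python
from itertools import product

# Three candidate levels for each of the four Pareto/NBD parameters
shape_levels = (0.5, 1.0, 1.5)   # values for r and s
scale_levels = (5, 10, 15)       # values for alpha and beta

worlds = list(product(shape_levels, shape_levels, scale_levels, scale_levels))
assert len(worlds) == 81         # 3**4 "worlds" of (r, s, alpha, beta)

# For each world: simulate five panels of 2500 customers for 104 weeks
# (78 calibration + 26 holdout) and fit the Pareto/NBD to the raw data.
```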
“Backward-Looking” Analysis How many cross-sectional summaries should be created? (How to “chop it up?”) • One 78-week histogram? • Two 39-week histograms? • Three 26-week histograms? • … • Six 13-week histograms? For each of the six aggregation conditions, fit the Pareto/NBD to the resulting RCSS data, and: • Compare RCSS parameter estimates to the disaggregate benchmarks • Evaluate the disaggregate LL functions using the RCSS parameter estimates and compare to the disaggregate benchmark LL • Evaluate the fit of the predicted histograms from RCSS and disaggregate parameter estimates to the actual holdout histograms
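A sketch of the “chopping it up” step, assuming each customer’s calibration-period purchase times are available as a list of week stamps (e.g., from the simulation sketch earlier) and that counts are top-coded into a 7+ bucket; the function name and bucket choice are illustrative assumptions.

```python
import numpy as np

def chop_into_histograms(histories, n_slices, horizon=78, max_count=7):
    """Summarize per-customer purchase-time lists as n_slices equal-length
    RCSS histograms over a horizon-week calibration period."""
    edges = np.linspace(0, horizon, n_slices + 1)
    histograms = []
    for t_a, t_b in zip(edges[:-1], edges[1:]):
        x = np.array([sum(t_a < t <= t_b for t in h) for h in histories])
        x = np.minimum(x, max_count)             # right-censor counts into a "7+" bucket
        histograms.append(np.bincount(x, minlength=max_count + 1))
    return histograms

# The six aggregation conditions: one 78-week, two 39-week, ..., six 13-week histograms
# conditions = {k: chop_into_histograms(histories, k) for k in range(1, 7)}
```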
“Forward-Looking” Analysis How many quarterly (13-week) histograms are required? (How many to “build as you go?”) • One (total 13 weeks)? • Two (total 26 weeks)? • Three (total 39 weeks)? • … • Six (total 78 weeks)? For each of the six “number of histogram” conditions, fit the Pareto/NBD to the resulting RCSS data, and: • Compare RCSS parameter estimates to the disaggregate benchmarks • Evaluate the disaggregate LL functions on the full data using the RCSS parameter estimates and compare to the disaggregate benchmark LL • Evaluate the fit of the predicted histograms from RCSS and disaggregate parameter estimates to the actual holdout histograms
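The “build as you go” conditions can be assembled in the same spirit, accumulating one 13-week histogram per quarter; again a sketch with the same assumed inputs and top-coding as above.

```python
import numpy as np

def quarterly_histograms(histories, n_quarters, quarter_len=13, max_count=7):
    """Build the first n_quarters consecutive 13-week RCSS histograms
    from per-customer purchase-time lists."""
    histograms = []
    for q in range(n_quarters):
        t_a, t_b = q * quarter_len, (q + 1) * quarter_len
        x = np.array([sum(t_a < t <= t_b for t in h) for h in histories])
        histograms.append(np.bincount(np.minimum(x, max_count), minlength=max_count + 1))
    return histograms

# Refit the model as each new quarter of summary data arrives:
# for n in range(1, 7):
#     rcss = quarterly_histograms(histories, n)   # 13, 26, ..., 78 weeks of data
```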
Summary of Results • Using three or more quarters always provides the same performance as disaggregate data in terms of: • Parameter recovery • In-sample LL • Out-of-sample predictions
Conclusions • We can estimate the Pareto/NBD using RCSS data; the findings from the Tuscan Lifestyles study are generalizable • Useful/interesting model diagnostics still emerge – even in the absence of any individual-level data • Three cross-sections are generally sufficient
Other Desirable Properties • Just the percentage of total customers in each bucket is sufficient – don’t even need actual numbers • Data can be “aperiodic” (they just have to be “repeated”) • Histograms can be of different time lengths, e.g., 3-month + 6-month + 4-month • Histograms can be missing, e.g., Qtr. 1, –, Qtr. 3, Qtr. 4 • Data management/storage benefits
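One way to see why bucket percentages alone are enough, sketched under the assumption that each cross-sectional histogram contributes a multinomial likelihood over its buckets: with N customers, bucket counts n_x, shares f_x = n_x / N, and model probabilities p_x(θ),

\[
\log L(\theta) = \text{const} + \sum_x n_x \log p_x(\theta) = \text{const} + N \sum_x f_x \log p_x(\theta),
\]

and since N > 0 only rescales the objective, the maximizing parameters depend on the data only through the shares f_x; the same argument applies cross-section by cross-section.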