Consumer Behavior Prediction using Parametric and Nonparametric Methods Elena Eneva CALD Masters Presentation 19 August 2002 Advisors: Alan Montgomery, Rich Caruana, Christos Faloutsos
Outline • Introduction • Data • Economics Overview • Baseline Models • New Hybrid Models • Results • Conclusions and Future Work
Background • Retail chains are aiming to customize prices in individual stores • Pricing strategies should adapt to the neighborhood demand • Stores can increase operating profit margins by 33% to 83%
Price Elasticity • the consumer's response to a price change: E = (%ΔQ) / (%ΔP), where Q is the quantity purchased and P is the price of the product • |E| < 1: inelastic demand; |E| > 1: elastic demand
Assumptions • Independence – from substitute categories (fresh fruit, other juices) and from other stores • Stationarity – demand is stable despite change over time and holidays
"The" Model • a predictor per category ("I know your customers"): the prices of products 1 through N (converted to ln space) map to the quantities bought of products 1 through N (converted back to original space) • need to multiply this across many stores and many categories
Existing Methods • Traditionally – using parametric models (linear regression) • Recently – using non-parametric models (neural networks)
Our Goal • Advantage of LR: known functional form (linear in log space), extrapolation ability – robustness • Advantage of NN: flexibility – accuracy • Take advantage: use the known functional form to bias the NN • Build hybrid models from the baseline models, aiming for a new model with the accuracy of NN and the robustness of LR
Datasets • weekly store-level cash register data at the product level • Chilled Orange Juice category • 2 years • 12 products • 10 randomly selected stores
Evaluation Measure • Root Mean Squared Error (RMS) • the square root of the mean squared deviation between the predicted quantity and the true quantity
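A standard way to write this out (the slide gives only the verbal definition; the indexing over n held-out observations is my notation):

$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{q}_t - q_t\right)^2}$$

where $\hat{q}_t$ is the predicted and $q_t$ the true quantity for observation t.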
Models • Baselines • Linear Regression • Neural Networks • Hybrids • Smart Prior • MultiTask Learning • Jumping Connections • Frozen Jumping Connections
Baselines • Linear Regression • Neural Networks
Linear Regression • q is the quantity demanded • pi is the price of the ith product • K products overall • the coefficients a and bi are chosen to minimize the sum of squared residuals
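The slide's equation appears to have been lost in extraction; given the "linear in log space" and "convert to ln space" remarks elsewhere in the deck, the regression presumably takes the log-log form (a reconstruction, not a quote from the paper):

$$\ln q = a + \sum_{i=1}^{K} b_i \ln p_i$$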
Neural Networks • generic nonlinear function approximators • a collection of basic units (neurons), each computing a (non)linear function of its input • trained with backpropagation
Neural Networks • 1 hidden layer, 100 units, sigmoid activation function
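A minimal sketch of such a baseline, assuming scikit-learn; the library choice, the synthetic stand-in data, and every hyperparameter other than the hidden layer size and activation are my illustrative choices, not the authors':

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the real store data: ln-prices of 12 products
# and a noisy linear ln-quantity response.
rng = np.random.default_rng(0)
log_prices = rng.normal(size=(500, 12))
log_quantity = log_prices @ rng.normal(size=12) + rng.normal(scale=0.1, size=500)

nn = MLPRegressor(hidden_layer_sizes=(100,),  # 1 hidden layer, 100 units
                  activation="logistic",      # sigmoid activation
                  solver="sgd",               # plain backpropagation-style training
                  max_iter=2000, random_state=0)
nn.fit(log_prices, log_quantity)
```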
Hybrids • Smart Prior • MultiTask Learning • Jumping Connections • Frozen Jumping Connections
Smart Prior Idea: start the NN at a “good” set of weights, help it start from a “smart” prior. • Take this prior from the known “linearity” • NN first trained on synthetic data generated by the LR model • NN then trained on the real data
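A sketch of the Smart Prior recipe under the same scikit-learn assumptions; the synthetic-sample count and the uniform sampling of prices within the observed range are illustrative guesses:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def smart_prior_nn(X_real, y_real, n_synth=5000):
    rng = np.random.default_rng(0)
    # 1. Fit the linear (log-log) baseline on the real data.
    lr = LinearRegression().fit(X_real, y_real)
    # 2. Generate synthetic inputs and label them with the LR model.
    X_synth = rng.uniform(X_real.min(0), X_real.max(0),
                          size=(n_synth, X_real.shape[1]))
    y_synth = lr.predict(X_synth)
    # 3. Pretrain the NN on the synthetic (linear) data...
    nn = MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                      warm_start=True,  # later fit() calls reuse these weights
                      max_iter=500, random_state=0)
    nn.fit(X_synth, y_synth)
    # 4. ...then keep training from that "smart" starting point on the real data.
    nn.fit(X_real, y_real)
    return nn
```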
Multitask Learning Idea: learning an additional related task in parallel, using a shared representation • Adding the output of the LR model (built over the same inputs) as an extra output to the NN • Make the net share its hidden nodes between both tasks • Custom halting function • Custom RMS function
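A sketch of the shared-representation idea, this time assuming PyTorch since the two-output architecture needs explicit wiring; the layer sizes, the 0.5 auxiliary weight, and all names are mine, and the custom halting and RMS functions from the slide are not reproduced here:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        # Hidden nodes are shared between both tasks.
        self.hidden = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        self.main_head = nn.Linear(n_hidden, 1)  # predicts the true quantity
        self.aux_head = nn.Linear(n_hidden, 1)   # predicts the LR model's output

    def forward(self, x):
        h = self.hidden(x)
        return self.main_head(h), self.aux_head(h)

def mtl_loss(main_pred, aux_pred, y_true, y_lr, aux_weight=0.5):
    # Train on both tasks; only the main prediction matters at test time.
    mse = nn.functional.mse_loss
    return mse(main_pred, y_true) + aux_weight * mse(aux_pred, y_lr)
```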
Jumping Connections Idea: fusing LR and NN • change the architecture: add connections which "jump" over the hidden layer • gives the effect of simulating an LR and an NN together, as in the sketch below
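A sketch of the architecture change, again assuming PyTorch; `JumpingNet` and its sizes are hypothetical names and choices:

```python
import torch
import torch.nn as nn

class JumpingNet(nn.Module):
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        self.nonlinear_out = nn.Linear(n_hidden, 1)
        # Direct input-to-output weights that "jump" over the hidden layer:
        # on their own they are exactly a linear regression in the inputs.
        self.jump = nn.Linear(n_inputs, 1)

    def forward(self, x):
        return self.nonlinear_out(self.hidden(x)) + self.jump(x)
```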
Frozen Jumping Connections Idea: you have the linearity, now use it! • same architecture as Jumping Connections, plus really emphasizing the linearity • freeze the weights of the jumping layer, so the network can’t “forget” about the linearity
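A sketch of the freezing step, reusing the hypothetical `JumpingNet` above and the fitted `lr` model from the Smart Prior sketch; initializing the jump weights from the LR coefficients is my reading of "you have the linearity, now use it":

```python
import torch

net = JumpingNet(n_inputs=12)
with torch.no_grad():
    # Set the jump layer to the fitted log-log LR coefficients...
    net.jump.weight.copy_(torch.as_tensor(lr.coef_, dtype=torch.float32).view(1, -1))
    net.jump.bias.fill_(float(lr.intercept_))
# ...then freeze it so training can't "forget" the linearity.
net.jump.weight.requires_grad_(False)
net.jump.bias.requires_grad_(False)
# Build the optimizer only over the parameters that remain trainable.
optim = torch.optim.SGD((p for p in net.parameters() if p.requires_grad), lr=0.01)
```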
Models • Baselines: • Linear Regression • Neural Networks • Hybrids • Smart Prior • MultiTask Learning • Jumping Connections • Frozen Jumping Connections • Combinations • Voting • Weighted Average
Combining Models Idea: Ensemble Learning • Committee Voting – equal weights for each model's prediction • Weighted Average – optimal weights determined by a linear regression model • ensembles are built from the 2 baseline models and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)
Committee Voting • Average the predictions of the models
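In symbols (my notation; M = 5 models here):

$$\hat{q}_{\text{vote}} = \frac{1}{M}\sum_{m=1}^{M}\hat{q}_m$$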
Weighted Average – Model Regression • Linear regression on baselines and hybrid models to determine vote weights
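A sketch of both combination schemes, assuming scikit-learn; `preds` is a hypothetical (n_samples × n_models) array of the five models' predictions, and fitting the vote weights on held-out data rather than the training set is an assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def committee_vote(preds):
    # Equal weight for every model's prediction.
    return preds.mean(axis=1)

def weighted_average(preds, y):
    # Regress the true quantities on the models' predictions;
    # the fitted coefficients are the vote weights.
    stacker = LinearRegression().fit(preds, y)
    return stacker.predict(preds), stacker.coef_
```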
Normalized RMS Error • Compare model performance across stores • Stores of different sizes, ages, locations, etc • Need to normalize • Compare to baselines • Take the error of the LR benchmark as unit error
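That is, in my notation:

$$\text{NRMS}_{\text{model}} = \frac{\text{RMS}_{\text{model}}}{\text{RMS}_{\text{LR}}}$$

so values below 1 beat the LR benchmark.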
Conclusions • Clearly improved models for consumer choice prediction • will allow stores to price products more strategically and optimize profits • maintain better inventories • understand product interactions
Future Work Ideas • analyze Weighted Average model • compare extrapolation ability of new models • use other domain knowledge • shrinkage model – a “super” store model with data pooled across all stores
Acknowledgements I would like to thank my advisors and my CALDling friends and colleagues
The Most Important Slide • for this presentation and the paper: www.cs.cmu.edu/~eneva/research.htm • eneva@cs.cmu.edu
References • Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data. • West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice. • Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data. • Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.