Welcome to the PMSA 2007 Conference Marriott Harbor Beach, Ft Lauderdale, FL. Modeling Promotional Response with Kernel Methods. Paul DuBose, Ph.D. VP Analytics, Principled Strategies paul.dubose@principledstrategies.com. Outline. Introduction to Promotional Response Background
Welcome to the PMSA 2007 Conference, Marriott Harbor Beach, Ft Lauderdale, FL
Modeling Promotional Response with Kernel Methods Paul DuBose, Ph.D. VP Analytics, Principled Strategies paul.dubose@principledstrategies.com
Outline • Introduction to Promotional Response • Background • Primary issues • Modeling Promotional Response • Modeling technologies • Kernel methods
Promotional Response: Background • To understand the response to any promotional component • Need to model all major promotions simultaneously • Allows isolating the impact of specific marketing or sales activities • The major sales / marketing activities • Details • Samples • Professional Meetings • Professional advertising • Direct to consumer
Promotional Response: Background (continued) • Key HCP attributes to model promotional response • Rx related variables • Number Rx in therapeutic class • TRx / NRx ratio • Rx distribution for competing products – information entropy • Payer access variables • % third party, % government and other summary stats • Census data • Median income in zip code, median rental cost in zip code • Segmentation information • Specialty
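One way to turn "Rx distribution for competing products" into a single model variable is Shannon entropy. The sketch below is illustrative only: the function name `rx_entropy` and the example counts are hypothetical, and the slides do not say which logarithm base or normalization was used (base 2 is assumed here).

```python
import math

def rx_entropy(rx_counts):
    """Shannon entropy (in bits) of one HCP's Rx counts across competing products."""
    total = sum(rx_counts)
    if total == 0:
        return 0.0  # no prescriptions: treat the distribution as carrying no information
    entropy = 0.0
    for count in rx_counts:
        if count > 0:  # a product with zero Rx contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical HCPs: one writes a single product, one spreads Rx evenly over four
loyal = rx_entropy([40, 0, 0, 0])
spread = rx_entropy([10, 10, 10, 10])
print(loyal, spread)
```

Low entropy indicates an HCP concentrated on one product; high entropy indicates an HCP who spreads prescriptions across the class.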
Issues: Correlation • How do you assign a value for the independent influence of two activities, e.g. samples and details? [Figure: Rx, Samples, Details]
Issues: Correlation (continued) • Problem is most severe in standard linear regression • High correlation leads to estimates with large confidence intervals • Symptom – model coefficients have the wrong sign • Model appears to fit the data but inaccurately estimates the result • Compromise: allow some bias in order to decrease the variance • Criterion is mean square error (squared bias plus variance) • Approach is called “regularization” in kernel methods and “ridge regression” in linear regression [Figure: unbiased estimate with large variance versus biased estimate with small variance]
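The wrong-sign symptom and the ridge compromise can be reproduced on a toy problem. This is a minimal sketch under stated assumptions: the data are made up (two nearly identical, centered predictors, a true response of `x1 + x2`, and a small perturbation), and `ridge_two_predictors` is a hypothetical helper that solves the two-variable ridge normal equations directly.

```python
def ridge_two_predictors(x1, x2, y, k):
    """Solve (X'X + k*I) b = X'y for two centered predictors and no intercept."""
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2

# Hypothetical, highly correlated predictors (already centered at zero)
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
# True response is x1 + x2; the perturbation is small relative to y
noise = [-0.3, 0.3, 0.3, -0.3, 0.0]
y = [a + b + e for a, b, e in zip(x1, x2, noise)]

ols = ridge_two_predictors(x1, x2, y, k=0.0)    # ordinary least squares
ridge = ridge_two_predictors(x1, x2, y, k=1.0)  # biased, but lower variance
print("k=0:", ols)
print("k=1:", ridge)
```

With `k = 0` the first coefficient comes out negative even though both true effects are positive; with `k = 1` both coefficients are positive and close to the true values. The value `k = 1` is arbitrary here; in practice it would be chosen by cross-validation.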
Issues: Outliers • Outliers can heavily influence curve-fitting algorithms • With samples, group practice effects show a number of high-writing HCPs with zero samples [Figure: Rx versus Samples]
Issues: Sampling • Two features of sampling activity require it to be considered carefully • Sampling has a different legal status • Excessive sampling is “buying business” and out of compliance • Sampling in excess can cause a decrease in Rx • Excessive detailing, DTC, and Professional Meetings do not cause a fall in Rx • Over-sampling causes an HCP to use samples in place of an Rx • This loss of Rx is called cannibalization
Kernel Methods • Characteristics • Uses “kernels” to create high dimensional and non-linear feature space (derived variables) • Training incorporates generalization derived from statistical learning theory • Sufficiently rich complexity to solve very difficult problems • Solution is computationally efficient • Power of this approach • Provides strong generalization properties • Significant improvement over linear regression confidence limits which depend on one pre-specified hypothesis • Searches an entire hypothesis space • A modern, powerful method that outperforms most other systems in a wide variety of applications
Kernel Methods: Intuition • Nonlinear pattern may appear linear in Feature Space • Not all input vectors are needed to support the final shape [Figure: two classes of points (x and o) shown in Data Space and in Feature Space, with a Support Vector labeled]
Kernel Methods: Example 1 • Distinguish between two spirals: blue versus red • Circles are training data • Plus signs are test data • Kernel method classification accuracy • 100% on training data • 100% on test data • Linear regression accuracy • For y = f(1, x, y, x^2, y^2, x*y) • 49% on training data • 48% on test data • Guessing would give an expected accuracy of 50% [Figure: Spiral Model Results]
Kernel Methods: Example 2 • Create a test case promotional response model • Compare performance of KM and Linear Regression • New Rx = Details Rx + Samples Rx; no noise
Kernel Methods: Example 2 (continued) • Details and Samples from bivariate normal • Correlation ( Details, Samples ) = 0.8
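The test inputs can be simulated along these lines. This is a sketch, not the presenter's setup: the slides give only the correlation of 0.8, so standard normal margins (zero mean, unit variance), the sample size, the seed, and the function names are all assumptions.

```python
import math
import random

def correlated_pairs(n, rho, seed=0):
    """Draw n (details, samples) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        details = z1
        # Mix in an independent draw so that corr(details, samples) = rho
        samples = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((details, samples))
    return pairs

def sample_correlation(pairs):
    """Pearson correlation of the two coordinates."""
    n = len(pairs)
    mean_d = sum(d for d, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    cov = sum((d - mean_d) * (s - mean_s) for d, s in pairs)
    var_d = sum((d - mean_d) ** 2 for d, _ in pairs)
    var_s = sum((s - mean_s) ** 2 for _, s in pairs)
    return cov / math.sqrt(var_d * var_s)

pairs = correlated_pairs(20000, rho=0.8)
r = sample_correlation(pairs)
print(round(r, 3))
```

To mimic real activity counts, the standardized draws would be shifted and scaled to plausible detail and sample volumes; that step is left out because the slides do not state the means or variances.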
Kernel Methods: Example 2 (continued) • Linear Regression Model 1 Results • Correlation ( Predicted Rx, Actual Rx ) = 0.994; mean abs error = 0.333 • Prediction of sample and detail response has room for improvement • Predict for details when samples = 0; for samples when details = 0 • Model: Rx = f(S, S^2, D, D^2, S*D, D^0.33)
Kernel Methods: Example 2 (continued) • Linear Regression Model 2 – Simplify model and explore Ridge Regression • Collinearity between samples and details may create a problem • Notice that several model coefficients change when the ridge parameter “k” changes • Using cross-validation, the best predictive model occurs when k = 0.00 • So no need to use ridge regression in this particular case • Note that variables S*D and D^2 are removed from the model • Model: Rx = f(S, S^2, D, D^0.33)
Kernel Methods: Example 2 (continued) • Linear Regression Model 2 Results • Correlation ( Predicted Rx, Actual Rx ) = 0.971; mean abs error = 0.453 • Prediction of samples improves, but prediction of details is not as good • Predict for details when samples = 0; for samples when details = 0 • Model: Rx = f(S, S^2, D, D^0.33)
Kernel Methods: Example 2 (continued) • Linear Regression Model 3 Results • Correlation ( Predicted Rx, Actual Rx ) = 0.9997; mean abs error = 0.178 • Prediction of samples improves, prediction of details good from 5 to 12 details • Ridge regression analysis showed no benefit for non-zero k parameter • Predict for details when samples = 0; for samples when details = 0 • Model: Rx = f(S, S^2, D, D^2, D^0.33)
Kernel Methods: Example 2 (continued) • Kernel Methods model • Correlation ( Predicted Rx, Actual Rx ) = 0.9998; mean abs error = 0.062 • Prediction of response to samples and details close to actual data • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Kernel Methods model 2 – After tuning model parameters • Correlation ( Predicted Rx, Actual Rx ) = 1.0000; mean abs error = 0.003 • Prediction of response to samples and details a bit closer to actual data • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Linear Regression versus Kernel Methods model
Kernel Methods Explained • Support Vector Machine (SVM) • Most common kernel method • Provides a non-linear regression method • Select relevant input variables; select specific kernel • Kernel creates a very high dimensional feature space with non-linear transformations of the raw input data • The Gaussian kernel, also called the radial basis function kernel, is the most frequently used kernel for numerical data with SVMs • The Kernel Matrix is of dimension n by n, where n is the number of observations, and the (i, j)th element for a Gaussian kernel is of the form • K(X_i, X_j) = exp(-sigma * ||X_i - X_j||^2) • Solve dual of Lagrangian for the regression • Because of convexity, a solution is guaranteed!
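The n by n Gaussian kernel matrix described on this slide can be built directly from its definition. A minimal sketch: the three observations and the value `sigma = 0.5` are made up for illustration, and the function names are hypothetical.

```python
import math

def gaussian_kernel(xi, xj, sigma):
    """K(X_i, X_j) = exp(-sigma * ||X_i - X_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sigma * sq_dist)

def kernel_matrix(X, sigma):
    """n x n matrix whose (i, j)th entry is the kernel between observations i and j."""
    return [[gaussian_kernel(xi, xj, sigma) for xj in X] for xi in X]

# Three hypothetical observations with two input variables each
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = kernel_matrix(X, sigma=0.5)
for row in K:
    print([round(v, 4) for v in row])
```

The diagonal is always 1 (each observation is at distance zero from itself), the matrix is symmetric, and entries shrink toward 0 as observations move apart. Because the matrix grows as n squared, memory becomes the practical constraint for large n.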
Kernel Methods Explained (continued) • Dual Lagrangian formulated: • Prediction can be made as:
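The slide's own equations are not reproduced in this transcript. For orientation, one standard textbook form is given below, assuming ε-insensitive support vector regression; the symbols α_i, α_i^*, ε, C, and b are the usual textbook notation and are assumptions here, so the presenter's exact formulation may differ.

```latex
% Dual problem (epsilon-insensitive SVR), maximized over alpha and alpha^*
\max_{\alpha,\,\alpha^*}\;
  -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
     (\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(X_i,X_j)
  \;-\;\varepsilon\sum_{i=1}^{n}(\alpha_i+\alpha_i^*)
  \;+\;\sum_{i=1}^{n} y_i\,(\alpha_i-\alpha_i^*)

\text{subject to}\quad
  \sum_{i=1}^{n}(\alpha_i-\alpha_i^*) = 0,
  \qquad 0 \le \alpha_i,\ \alpha_i^* \le C

% Prediction for a new observation x
f(x) \;=\; \sum_{i=1}^{n}(\alpha_i-\alpha_i^*)\,K(X_i,x) \;+\; b
```

Only observations with a non-zero coefficient difference contribute to the prediction; these are the support vectors. The objective is a concave quadratic in the coefficients with linear constraints, which is the convexity property that guarantees a solution.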
Kernel Methods Modularity • Modular stages of kernel methods: Data → Kernel Method K(X, Z) → Pattern Algorithm → Pattern Function f(x) = Σ α_i K(x_i, x) • Kernel choices: Polynomial, Gaussian • Pattern algorithm choices: Support Vector Machine, Principal Components
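The modular stages can be illustrated by keeping the pattern algorithm fixed and swapping the kernel. The sketch below substitutes kernel ridge regression for the SVM, because its coefficients come from one linear solve, (K + lam*I) alpha = y; it is not the SVM used in the presentation. All names, the toy data, and the value of `lam` are assumptions, and the linear solver is written out in plain Python only to keep the example dependency-free.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    return math.exp(-sigma * sum((a - b) ** 2 for a, b in zip(x, z)))

def polynomial_kernel(x, z, degree=2):
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** degree

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_kernel_ridge(X, y, kernel, lam=1e-3):
    """Pattern algorithm: return alpha solving (K + lam*I) alpha = y."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)

def predict(X_train, alpha, kernel, x):
    """Pattern function: f(x) = sum_i alpha_i * K(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))

# Toy data: a curved response observed at seven points
X = [(float(i),) for i in range(-3, 4)]
y = [x[0] ** 2 for x in X]

alpha_gauss = fit_kernel_ridge(X, y, gaussian_kernel)
alpha_poly = fit_kernel_ridge(X, y, polynomial_kernel)
pred_gauss = predict(X, alpha_gauss, gaussian_kernel, (2.0,))
pred_poly = predict(X, alpha_poly, polynomial_kernel, (1.5,))
print(round(pred_gauss, 3), round(pred_poly, 3))
```

Only the `kernel` argument changes between the two fits; the fitting and prediction code is identical, which is the modularity the slide describes.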
Resources for Kernel Methods • Software • MATLAB® is a good environment for kernel methods • A number of free machine learning software libraries are available, including “Spider” • Weka is a good program to gain experience • Free program, go to http://www.cs.waikato.ac.nz/~ml/weka/ • Primarily useful for small data sets • Internet • www.kernel-machines.org • Book • Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini, ISBN 0 521 81397 2 Hardback, 2004