Welcome to the PMSA 2007 Conference Marriott Harbor Beach, Ft Lauderdale, FL

Presentation Transcript


  1. Welcome to the PMSA 2007 Conference Marriott Harbor Beach, Ft Lauderdale, FL

  2. Modeling Promotional Response with Kernel Methods • Paul DuBose, Ph.D. • VP Analytics, Principled Strategies • paul.dubose@principledstrategies.com

  3. Outline • Introduction to Promotional Response • Background • Primary issues • Modeling Promotional Response • Modeling technologies • Kernel methods

  4. Promotional Response: Background • To understand the response to any promotional component • Need to model all major promotions simultaneously • Allows isolating the impact of specific marketing or sales activities • The major sales / marketing activities • Details • Samples • Professional Meetings • Professional advertising • Direct to consumer

  5. Promotional Response: Background (continued) • Key HCP attributes to model promotional response • Rx related variables • Number Rx in therapeutic class • TRx / NRx ratio • Rx distribution for competing products – information entropy • Payer access variables • % third party, % government and other summary stats • Census data • Median income in zip code, median rental cost in zip code • Segmentation information • Specialty
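The "information entropy" attribute on the slide above can be sketched as follows. This is a minimal illustration, assuming Shannon entropy in bits over an HCP's Rx counts across competing products; the function name `rx_entropy` and the sample counts are hypothetical and do not come from the presentation.

```python
import math

def rx_entropy(rx_counts):
    """Shannon entropy (bits) of an HCP's Rx distribution across products."""
    total = sum(rx_counts)
    if total == 0:
        return 0.0  # no prescriptions: treated as zero entropy (assumption)
    shares = [c / total for c in rx_counts if c > 0]
    return sum(-p * math.log2(p) for p in shares)

# Hypothetical Rx counts for four competing products
loyal_hcp = [40, 0, 0, 0]      # writes a single product
split_hcp = [10, 10, 10, 10]   # spreads Rx evenly across products

print("loyal HCP entropy:", rx_entropy(loyal_hcp))
print("split HCP entropy:", rx_entropy(split_hcp))
```

Low entropy indicates an HCP concentrated on one product; high entropy indicates Rx spread across competitors.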

  6. Issues: Correlation • How do you assign a value for the independent influence of two activities, e.g. samples and details? [Figure labels: Rx, Samples, Details]

  7. Issues: Correlation (continued) • Problem is most severe in standard linear regression • High correlation leads to estimates with large confidence intervals • Symptom: model coefficients have the wrong sign • Model appears to fit the data but estimates the result inaccurately • Compromise: allow some bias in order to decrease the variance • Goal is a lower mean square error (squared bias plus variance) • Approach is called “regularization” in kernel methods and “ridge regression” in linear regression [Figure labels: “Unbiased with large variance”, “Biased but small variance”]
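The bias-versus-variance compromise can be illustrated with a small simulation. This is a sketch under stated assumptions, not the presenter's analysis: two predictors with correlation of about 0.95, true coefficients of 1, unit noise, and a ridge parameter of 5 are all hypothetical choices, and `ridge_2d` and `simulate` are illustrative names.

```python
import random

def ridge_2d(x1, x2, y, k):
    """Ridge estimate for y ~ b1*x1 + b2*x2 (no intercept).

    Solves (X'X + k*I) b = X'y in closed form; k = 0 is ordinary least squares.
    """
    a11 = sum(u * u for u in x1) + k
    a22 = sum(v * v for v in x2) + k
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((c1 * a22 - c2 * a12) / det, (c2 * a11 - c1 * a12) / det)

def simulate(k, n_obs=30, n_rep=200, rho=0.95, seed=0):
    """Mean and variance of the first coefficient over repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        x1 = [rng.gauss(0, 1) for _ in range(n_obs)]
        # x2 is built to correlate with x1 at roughly rho
        x2 = [rho * u + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for u in x1]
        # true coefficients are b1 = b2 = 1, plus unit-variance noise
        y = [u + v + rng.gauss(0, 1) for u, v in zip(x1, x2)]
        estimates.append(ridge_2d(x1, x2, y, k)[0])
    mean = sum(estimates) / n_rep
    var = sum((e - mean) ** 2 for e in estimates) / (n_rep - 1)
    return mean, var

ols_mean, ols_var = simulate(k=0.0)
ridge_mean, ridge_var = simulate(k=5.0)
print(f"OLS   (k=0): mean b1 = {ols_mean:.3f}, variance = {ols_var:.3f}")
print(f"Ridge (k=5): mean b1 = {ridge_mean:.3f}, variance = {ridge_var:.3f}")
```

The ridge estimate is pulled away from the true value on average, but it varies much less from sample to sample.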

  8. Issues: Outliers • Outliers can heavily influence curve-fitting algorithms • With samples, group practice effects show a number of high-writing HCPs with zero samples [Figure axes: Rx, Samples]

  9. Issues: Sampling • There are two features of sampling activity that require it to be considered carefully • Sampling has a different legal status • Excessive sampling is “buying business” and out of compliance • Sampling in excess can cause a decrease in Rx • Excessive detailing, DTC, and Professional Meetings do not cause a fall in Rx • Over-sampling causes an HCP to use samples in place of an Rx • Loss of Rx is called cannibalization

  10. Modeling Technologies

  11. Kernel Methods • Characteristics • Uses “kernels” to create high dimensional and non-linear feature space (derived variables) • Training incorporates generalization derived from statistical learning theory • Sufficiently rich complexity to solve very difficult problems • Solution is computationally efficient • Power of this approach • Provides strong generalization properties • Significant improvement over linear regression confidence limits which depend on one pre-specified hypothesis • Searches an entire hypothesis space • A modern, powerful method that outperforms most other systems in a wide variety of applications

  12. Kernel Methods: Intuition • Nonlinear pattern may appear linear in Feature Space • Not all input vectors are needed to support the final shape [Figure: x and o points plotted in Data Space and in Feature Space, with a Support Vector labeled]
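The idea that a nonlinear pattern may appear linear in feature space can be sketched with a toy case. The concentric rings, the explicit map (x, y) → (x², y²), and the threshold of 4 are all hypothetical choices for illustration; a kernel method would work with such a feature space implicitly through the kernel rather than through an explicit map.

```python
import math

def feature_map(point):
    """Map a 2-D point (x, y) to the feature space (x^2, y^2)."""
    x, y = point
    return (x * x, y * y)

def ring(radius, n=12):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

inner = ring(1.0)   # class "o" (hypothetical data)
outer = ring(3.0)   # class "x" (hypothetical data)

def linear_rule(z, threshold=4.0):
    """A straight line z1 + z2 = threshold in feature space."""
    return "o" if z[0] + z[1] < threshold else "x"

# No straight line separates the rings in data space, but one does in feature space.
print(all(linear_rule(feature_map(p)) == "o" for p in inner))
print(all(linear_rule(feature_map(p)) == "x" for p in outer))
```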

  13. Kernel Methods: Example 1 • Distinguish between two spirals: blue versus red • Circles are training data • Plus signs are test data • Kernel method classification accuracy • 100% on training data • 100% on test data • Linear regression accuracy • For $y = f(1, x, y, x^2, y^2, x \cdot y)$ • 49% on training data • 48% on test data • Guessing would give an expected accuracy of 50% [Figure: Spiral Model Results]
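A two-spiral experiment of this kind can be sketched in plain Python. The slide does not say which kernel algorithm or data were used, so everything here is an assumption: the spiral parameterization, a Gaussian kernel with gamma = 1, and a kernel ridge classifier (swapped in for simplicity; it is not necessarily the method behind the slide's figures).

```python
import math

def two_spirals(n_per_class, offset=0.0):
    """Points on two interleaved spirals; labels are +1 and -1."""
    points, labels = [], []
    for i in range(n_per_class):
        t = 1.0 + (i + offset) * (3 * math.pi - 1.0) / n_per_class
        x, y = t * math.cos(t), t * math.sin(t)
        points += [(x, y), (-x, -y)]   # second spiral is the first rotated 180 degrees
        labels += [1.0, -1.0]
    return points, labels

def gaussian_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit(points, labels, ridge=1e-3):
    """Kernel ridge classifier: solve (K + ridge*I) alpha = labels."""
    n = len(points)
    k = [[gaussian_kernel(points[i], points[j]) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(k, labels)

def predict(alpha, train_points, x):
    score = sum(a * gaussian_kernel(p, x) for a, p in zip(alpha, train_points))
    return 1.0 if score >= 0 else -1.0

def accuracy(alpha, train_points, points, labels):
    hits = sum(predict(alpha, train_points, p) == y for p, y in zip(points, labels))
    return hits / len(points)

train_x, train_y = two_spirals(50)
test_x, test_y = two_spirals(50, offset=0.5)   # points between the training points
alpha = fit(train_x, train_y)
print("train accuracy:", accuracy(alpha, train_x, train_x, train_y))
print("test accuracy:", accuracy(alpha, train_x, test_x, test_y))
```

The kernel classifier can follow the spiral arms because its decision function is a sum of local bumps centered on the training points, whereas a low-order polynomial in x and y cannot wind around the origin.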

  14. Kernel Methods: Example 2 • Create a test case promotional response model • Compare performance of KM and Linear Regression • New Rx = Details Rx + Samples Rx; no noise

  15. Kernel Methods: Example 2 (continued) • Details and Samples from bivariate normal • Correlation ( Details, Samples ) = 0.8
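Correlated inputs like these can be generated as sketched below. The slide gives only the correlation of 0.8, so the standard-normal scale, sample size, and seed are assumptions; real detail and sample counts would need shifting and scaling to realistic non-negative ranges.

```python
import random

def correlated_normals(n, rho, seed=0):
    """Draw n pairs of standard normals with population correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Mixing z1 into the second variable induces correlation rho
        pairs.append((z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2))
    return pairs

def sample_correlation(pairs):
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5

# Each pair stands for standardized (details, samples) for one HCP
pairs = correlated_normals(5000, rho=0.8)
print("sample correlation:", round(sample_correlation(pairs), 3))
```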

  16. Kernel Methods: Example 2 (continued) • Linear Regression Model 1 Results • Correlation ( Predicted Rx, Actual Rx ) = 0.994; mean abs error = 0.333 • Prediction of sample and detail response has room for improvement • Predict for details when samples = 0; for samples when details = 0 • Model: $Rx = f(S, S^2, D, D^2, S \cdot D, D^{0.33})$
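The two fit statistics reported for each model in this example can be computed as sketched here. The function names and the four data values are hypothetical; the toy numbers only illustrate that a high correlation can coexist with a noticeable mean absolute error, which is presumably why both are reported.

```python
def correlation(predicted, actual):
    """Pearson correlation between predicted and actual values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / (var_p * var_a) ** 0.5

def mean_abs_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical Rx values, not taken from the presentation
actual_rx = [1.0, 2.0, 3.0, 4.0]
predicted_rx = [1.5, 1.5, 3.5, 3.5]

print("correlation:", round(correlation(predicted_rx, actual_rx), 3))
print("mean abs error:", mean_abs_error(predicted_rx, actual_rx))
```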

  17. Kernel Methods: Example 2 (continued) • Linear Regression Model 2 – Simplify model and explore Ridge Regression • Collinearity between samples and details may create a problem • Notice that several model coefficients change when “k”, the ridge parameter, changes • Using cross-validation, the best predictive model occurs when k = 0.00 • So there is no need to use ridge regression in this particular case • Note that variables $S \cdot D$ and $D^2$ are removed from the model • Model: $Rx = f(S, S^2, D, D^{0.33})$
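Choosing the ridge parameter by cross-validation can be sketched as below. This is a simplified stand-in, not the slide's model: it uses only two linear terms (S and D) with hypothetical coefficients, noise-free data, five folds, and an arbitrary candidate grid for k.

```python
import random

def ridge_2d(s, d, y, k):
    """Ridge fit of y ~ b_s*s + b_d*d (no intercept) with ridge parameter k."""
    a11 = sum(u * u for u in s) + k
    a22 = sum(v * v for v in d) + k
    a12 = sum(u * v for u, v in zip(s, d))
    c1 = sum(u * t for u, t in zip(s, y))
    c2 = sum(v * t for v, t in zip(d, y))
    det = a11 * a22 - a12 * a12
    return ((c1 * a22 - c2 * a12) / det, (c2 * a11 - c1 * a12) / det)

def cv_error(s, d, y, k, folds=5):
    """Mean absolute held-out prediction error for ridge parameter k."""
    n = len(y)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        b_s, b_d = ridge_2d([s[i] for i in train], [d[i] for i in train],
                            [y[i] for i in train], k)
        total += sum(abs(y[i] - (b_s * s[i] + b_d * d[i])) for i in test)
    return total / n

rng = random.Random(1)
d = [rng.gauss(0, 1) for _ in range(100)]
s = [0.8 * v + 0.6 * rng.gauss(0, 1) for v in d]   # corr(S, D) is about 0.8
y = [2.0 * a + 3.0 * b for a, b in zip(s, d)]       # noise-free toy response (hypothetical)

candidates = [0.0, 0.01, 0.1, 1.0, 10.0]
best_k = min(candidates, key=lambda k: cv_error(s, d, y, k))
print("best k:", best_k)
```

With noise-free data and a correctly specified model, any k above zero only adds bias, so the search settles on k = 0, which mirrors the slide's finding.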

  18. Kernel Methods: Example 2 (continued) • Linear Regression Model 2 Results • Correlation ( Predicted Rx, Actual Rx ) = 0.971; mean abs error = 0.453 • Prediction of samples improves, but prediction of details is not as good • Predict for details when samples = 0; for samples when details = 0 • Model: $Rx = f(S, S^2, D, D^{0.33})$

  19. Kernel Methods: Example 2 (continued) • Linear Regression Model 3 Results • Correlation ( Predicted Rx, Actual Rx ) = 0.9997; mean abs error = 0.178 • Prediction of samples improves, prediction of details good from 5 to 12 details • Ridge regression analysis showed no benefit for non-zero k parameter • Predict for details when samples = 0; for samples when details = 0 • Model: $Rx = f(S, S^2, D, D^2, D^{0.33})$

  20. Kernel Methods: Example 2 (continued) • Kernel Methods model • Correlation ( Predicted Rx, Actual Rx ) = 0.9998; mean abs error = 0.062 • Prediction of response to samples and details close to actual data • Predict for details when samples = 0; for samples when details = 0

  21. Kernel Methods: Example 2 (continued) • Kernel Methods model 2 – After tuning model parameters • Correlation ( Predicted Rx, Actual Rx ) = 1.0000; mean abs error = 0.003 • Prediction of response to samples and details a bit closer to actual data • Predict for details when samples = 0; for samples when details = 0

  22. Kernel Methods: Example 2 (continued) • Linear Regression versus Kernel Methods model

  23. Kernel Methods Explained • Support Vector Machine (SVM) • Most common kernel method • Provides a non-linear regression method • Select relevant input variables; select specific kernel • Kernel creates a very high dimensional feature space with non-linear transformations of the raw input data • The Gaussian kernel, also named radial basis function kernel, is the most frequently used kernel for numerical data with SVMs • The Kernel Matrix is of dimension n by n, where n is the number of observations, and the (i, j)th element for a Gaussian kernel is of the form • $K(X_i, X_j) = \exp(-\sigma \lVert X_i - X_j \rVert^2)$ • Solve dual of Lagrangian for the regression • Because of convexity, a solution is guaranteed!
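The n-by-n Gaussian kernel matrix described above can be built directly from its definition. This is a minimal sketch; the three observations and the value sigma = 0.5 are hypothetical.

```python
import math

def gaussian_kernel_matrix(rows, sigma):
    """n-by-n matrix with entries exp(-sigma * ||X_i - X_j||^2)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sigma * sq_dist(a, b)) for b in rows] for a in rows]

# Hypothetical observations: (details, samples) for three HCPs
observations = [(2.0, 5.0), (3.0, 5.0), (8.0, 20.0)]
K = gaussian_kernel_matrix(observations, sigma=0.5)
for row in K:
    print([round(v, 4) for v in row])
```

The diagonal is always 1, the matrix is symmetric, and entries fall toward 0 as two observations move apart, so sigma controls how local the notion of similarity is.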

  24. Kernel Methods Explained (continued) • Dual Lagrangian formulated: • Prediction can be made as:
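The equations on this slide did not survive in the transcript. For orientation only, and not as a reconstruction of the slide, the standard textbook form of the ε-insensitive support vector regression dual with kernel $K$ is given below. The symbols $C$, $\varepsilon$, $\alpha_i$, $\alpha_i^*$, and $b$ are the usual textbook names and are assumptions here; the presenter's notation may have differed.

```latex
\begin{aligned}
\max_{\alpha,\,\alpha^*}\quad
  & -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
      (\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(X_i,X_j)
    \;-\;\varepsilon\sum_{i=1}^{n}(\alpha_i+\alpha_i^*)
    \;+\;\sum_{i=1}^{n} y_i\,(\alpha_i-\alpha_i^*) \\
\text{subject to}\quad
  & \sum_{i=1}^{n}(\alpha_i-\alpha_i^*) = 0,
    \qquad 0 \le \alpha_i,\ \alpha_i^* \le C \\[6pt]
\text{prediction:}\quad
  & f(x) = \sum_{i=1}^{n}(\alpha_i-\alpha_i^*)\,K(X_i,x) + b
\end{aligned}
```

Only observations with a non-zero difference $\alpha_i - \alpha_i^*$ contribute to the prediction; these are the support vectors.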

  25. Kernel Methods Modularity • Modular stages of kernel methods: Data → Kernel Method $K(X, Z)$ → Pattern Algorithm → Pattern Function $f(x) = \sum_i \alpha_i K(x_i, x)$ • Kernel options: Polynomial, Gaussian • Pattern algorithm options: Support Vector Machine, Principal Components
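The modularity can be sketched by keeping the kernel and the pattern function as separate, swappable pieces. The kernel forms (a polynomial kernel written as (1 + x·z)^degree and a Gaussian kernel), the training points, and the weights are all hypothetical; in practice a pattern algorithm such as an SVM would produce the weights.

```python
import math

def polynomial_kernel(x, z, degree=2):
    """One common polynomial kernel form: (1 + x.z)^degree."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel: exp(-sigma * ||x - z||^2)."""
    return math.exp(-sigma * sum((a - b) ** 2 for a, b in zip(x, z)))

def pattern_function(alphas, train_points, kernel):
    """Return f(x) = sum_i alpha_i * K(x_i, x) for the chosen kernel."""
    def f(x):
        return sum(a * kernel(p, x) for a, p in zip(alphas, train_points))
    return f

# Hypothetical training points and weights for illustration only
train_points = [(0.0, 0.0), (1.0, 0.0)]
alphas = [0.5, -0.25]

# The same pattern-function code works with either kernel
f_poly = pattern_function(alphas, train_points, polynomial_kernel)
f_gauss = pattern_function(alphas, train_points, gaussian_kernel)
print("polynomial kernel:", f_poly((1.0, 1.0)))
print("gaussian kernel:", f_gauss((0.0, 0.0)))
```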

  26. Resources for Kernel Methods • Software • MATLAB® is a good environment for kernel methods • A number of free machine learning software libraries are available, including “Spider” • Weka is a good program for gaining experience • Free program, go to http://www.cs.waikato.ac.nz/~ml/weka/ • Primarily useful for small data sets • Internet • www.kernel-machines.org • Book • Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini, ISBN 0 521 81397 2 Hardback, 2004
