Welcome to the PMSA 2007 Special Tutorial Radisson Philadelphia, PA
Analyzing Impact of Sampling on Prescription Behavior Dr. KANNAN SRINIVASAN Tepper School of Business, Carnegie Mellon University Dr. PAUL DuBOSE Principled Strategies
Q & A • Contact 412 260 2411 (kannans@cmu.edu)
Tutorial: Sampling Optimization Paul DuBose, Ph.D. VP Analytics, Principled Strategies paul.dubose@principledstrategies.com
Tutorial Outline • Introduction to Sampling • Sampling goals and impact • Sampling economics • Modeling Sampling • Input data, group practices • Survey of modeling choices • Validating Models • General concepts • Example of validating a sampling model • Applying the Model • Developing optimal plan • Practical considerations in applying plan
Section One • Introduction to Sampling • Sampling goals and impact • Sampling economics
Sampling theory – first principles • Sampling has influence on HCP prescribing behavior • Reduces HCP risk of prescribing product • Reduces patient risk of starting new therapy • Low sample inventory results in fewer Rx • The influence of sampling on prescriptions generally diminishes with exposure • HCPs have less use for the next sample than the previous sample • Too many samples may induce an HCP to distribute samples as a substitute for prescribing the product – cannibalization • If the above first principles are true, this implies: • The true relationship of Rx versus samples is smooth and doesn’t change direction except at cannibalization • Would it make sense for an HCP to respond positively to 4, negatively to 6, and then positively again to 8 additional samples?
Plausible relationships between samples and Rx [Figure: four panels titled Quadratic, Cubic, Downward sloping, and Upward sloping]
Differences between samples and other field activities • Two features of sampling activity require it to be considered carefully • Sampling is not considered a promotional activity in the same sense as detailing or journal advertising • Sampling in excess can cause a decrease in Rx • Over-promoting in other arenas such as detailing, DTC, and Promotional Meetings does not cause a fall in Rx • This happens because over-sampling can lead an HCP to use samples in place of an Rx
Over-Supply and Compliance • Sampling cannot be used to buy business • Too many samples may be considered a “gift” • Going significantly beyond maximum Rx may be considered out of compliance • For controlled substances this is an issue of considerable importance • It is prudent to understand the appropriate supply level and to encourage the sales force to adhere to it at the HCP level
How do we find this shape for individual HCPs • Multiple methods exist for estimating the shape of any HCP sample response curve • Four major methods • Multiple Linear Regression • k-Nearest Neighbors • Kernel methods • Statistical inventory control • All methods when properly employed can find the response relationship • Max cumulative margin and max Rx • Smooth forms with appropriate shapes
Model Economics • To apply a sampling model, we must consider the underlying economics of sampling • Cost of a sample • Sample cost + package cost + shipping cost + % of detail cost • Value of an Rx • Impact of over/under supply of samples • There are two logical points around which to frame a sampling program • Provide samples that may maximize Rx • No concern for sample cost • Provide samples that may maximize cumulative marginal utility (until adding samples decreases cumulative utility) • Concern for cost
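The two framing points can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's model: the quadratic response curve, its coefficients, the dollar values, and the function names (`cost_per_sample`, `rx_response`, `sampling_plan_points`) are all hypothetical assumptions chosen only to show how the two points differ.

```python
def cost_per_sample(unit, package, shipping, detail_cost, detail_share):
    """Sample cost + package cost + shipping cost + a share of the detail cost."""
    return unit + package + shipping + detail_share * detail_cost


def rx_response(samples, b0=10.0, b1=1.2, b2=-0.03):
    """Hypothetical quadratic response: expected Rx for a given sample count."""
    return b0 + b1 * samples + b2 * samples ** 2


def sampling_plan_points(max_samples, rx_value, sample_cost):
    """Return (samples that maximize Rx, samples that maximize cumulative margin)."""
    candidates = range(max_samples + 1)

    # Point 1: no concern for cost, pick the sample level with the highest Rx.
    best_rx = max(candidates, key=rx_response)

    # Point 2: cumulative margin = value of incremental Rx minus total sample cost.
    def cumulative_margin(s):
        return rx_value * (rx_response(s) - rx_response(0)) - sample_cost * s

    best_margin = max(candidates, key=cumulative_margin)
    return best_rx, best_margin


# Illustrative numbers only (assumptions, not figures from the tutorial).
cost = cost_per_sample(unit=5.0, package=1.0, shipping=0.5,
                       detail_cost=150.0, detail_share=0.01)
max_rx_samples, max_margin_samples = sampling_plan_points(
    max_samples=60, rx_value=50.0, sample_cost=cost)
print(max_rx_samples, max_margin_samples)
```

With any concave response curve and a positive sample cost, the margin-maximizing level is at or below the Rx-maximizing level, which is why the two points lead to different plans.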
Economics of sampling [Figure: three panels plotted against samples: Rx (Max Rx marked), Per Sample Margin (Max Margin marked), and Cumulative Margin (Max Cumulative Margin marked)]
Establishing relationship between samples & Rx • Several challenges exist in establishing a relationship between samples and Rx • Sampling activity is highly correlated with other marketing and Sales Professional activities • Current effect is confounded with the effects of previous activities • There is an indeterminate time period between the delivery of a sample and its ultimate use
The problem of correlation • How do you assign a value for the independent influence of both samples and details? [Figure: Rx, samples, and details]
Section Two • Modeling Sampling • Input data, group practices • Survey of modeling choices
Topics Common to All Models • Outliers and Group Practices • Outliers or data anomalies can be a problem for any modeling technique • Some methodologies are less sensitive to outliers • In a group practice, one HCP may sign for samples while another may use the samples • Often an HCP with ostensibly 0 samples will be responsible for significant Rx • Selection of Explanatory Variables • HCP practice variables • Pharmaceutical company activities • Demographics and Census data
Common Topics – Group Practices • Group Practice has a strong impact on sample response modeling • Without knowledge of group practices • Input for modeling may be highly misleading • Sampling adherence estimates likely to be highly misleading • What to do about group practices? • Nothing – this is often done but will create misleading results • Infer group practices • Which potential HCPs can be eliminated from data to minimize bias? • In the absence of 3rd party Group Practice data, what steps can be taken to specifically identify likely group practice HCPs? • Purchase data to identify group practice HCPs
Group Practices – Dropping HCPs from modeling data to minimize group practice bias • HCPs with Zero Samples • Review of data shows numerous high writers with zero samples • Reasonable to assume a non-trivial proportion are in a group practice using samples that were signed for by another HCP • The zero sample point is only one point on the X axis • If there are many HCPs with zero samples (and there will be in many cases) then a highly weighted model value will have a biased Rx and create problems with model estimation • The response curve for non-zero samples is of primary interest • Consider eliminating HCPs with zero samples from universe of modeled HCPs • Can extrapolate to expected Rx when HCP is given 0 samples
HCPs with Zero Samples • The Rx for HCPs with zero samples offers at best little information about HCP response to samples and at worst misleading information [Figure: axes: Rx vs. samples]
Common Topics – Data Frequency • Data Frequency: Bi-weekly, Monthly, Quarterly? • Not all data available on bi-weekly basis so very difficult to combine data sources for a bi-weekly model • Many / most models are produced using monthly data for details, samples, Rx and other data • For lower decile HCPs even this data frequency is problematic • High variation between months • For nearest neighbor models where the goal is to find similar HCPs it makes sense to consider grouping data into 3 month bins
Common Topics – Selection of Key Explanatory Variables • HCP Practice • Number of market Rx • Information on payer access such as % third party, % government • Information on relative share between competitors – for example entropy • TRx / NRx ratio • Pharmaceutical Company Activities • Details, both primary and secondary • Meeting Events • Demographics • Specialty, State • Census Data • Average income and average rental cost in HCP zip code • Concentration: Population per zip code divided by the number of prescribing HCPs in that zip code
Common Topics – Entropy • Information Entropy - Concept • Complicated word, but easy to compute and provides a very insightful HCP metric • Measures distribution of market share between competitive products • For example, with 5 competing products • If HCP gives 20% of business to each product then HCP has no brand loyalty and entropy has a maximum value • If HCP gives 100% of business to one product then HCP has complete brand loyalty and entropy has a minimum value • Information Entropy - Formula • Assume N products, each with a proportional share of business $P_i$ • Entropy $= -\sum_{i=1}^{N} P_i \log P_i$ • If $P_i = 0$, then $P_i \log P_i$ is defined as 0
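The entropy formula is short enough to sketch directly. The function name `share_entropy` and the use of the natural logarithm are assumptions (the slide does not state a log base); shares are normalized first so raw Rx counts can be passed in as well.

```python
import math


def share_entropy(shares):
    """Information entropy of an HCP's product shares (natural log assumed)."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    entropy = 0.0
    for s in shares:
        p = s / total  # normalize so the proportions sum to 1
        if p > 0:      # p * log(p) is defined as 0 when p == 0
            entropy -= p * math.log(p)
    return entropy


no_loyalty = share_entropy([0.2] * 5)           # equal split across 5 products
full_loyalty = share_entropy([1, 0, 0, 0, 0])   # all business to one product
print(no_loyalty, full_loyalty)
```

The equal split gives the maximum value for five products, log(5), and the single-product case gives the minimum, 0.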
Common Topics – Parametric and Non-Parametric Models • Parametric Model • Analyst provides a formula that specifies the exact relationship between explanatory variables and response • Linear Regression is best known example of parametric model • Often an analyst will try many alternative models to find the best possible fit for the specific data • All assumptions regarding confidence intervals are violated when this happens • Each coefficient is assumed to be the same for all cases • It can be difficult to have sufficient information so that a model works well across a wide range of data • Outliers affect prediction model equally for all cases
Parametric Model • Polynomial regression fits data points [Figure: axes: Rx vs. samples]
Common Topics – Parametric and Non-Parametric Models • Non-Parametric Model • Analyst provides the input variables and some model-specific options • k-Nearest Neighbors and Kernel Methods are non-parametric • Most non-parametric models don’t have confidence intervals • However, Kernel Methods have a very strong method for confidence limits • Impact of variables depends on other similar cases • Easier to fit a wide range of data • Outliers have minimal impact on predictions when they are not close to the prediction region • Models can only be interpreted empirically, not by coefficients • View graphs and tables of outputs versus inputs
Non-Parametric Model • Local smoothing technique used to find shape of distribution [Figure: axes: Rx vs. samples]
#1: Linear Regression Introduction • Presentation assumes audience has basic familiarity with Linear Regression • Method takes skill to use successfully for sampling response models • Primary issues • Selection of explanatory variables and data pre-processing • Non-linear response to explanatory variables • Interaction or synergy between explanatory variables • Correlation between explanatory variables • Carryover of impact from historical sampling, details and other explanatory variables • Primary strengths • Software widely available • Many analysts have high level of expertise • Management familiar with technology
Linear Regression - Introduction • In matrix format, create a linear model of form • $y = Xb$ • $y$ is an n by 1 vector of responses where n is the number of observations • $X$ is an n by m matrix where m is number of explanatory variables in model • $b$ is an m by 1 vector • An example of a formula in non-matrix format is: • $\mathrm{TRx} = b_0 + b_1\,\mathrm{samples}_1 + b_2\,\mathrm{samples}_1^2 + b_3\,\mathrm{samples}_1 \cdot \mathrm{details}_1 + b_4\,\mathrm{samples}_2 + \dots$ • $\mathrm{samples}_1$ are samples in the most recent time period • $\mathrm{samples}_2$ are samples in the previous time period • Solving the linear model • Best linear unbiased estimator: $b = (X'X)^{-1} X'y$ • Ridge regression: $b = (X'X + \lambda I)^{-1} X'y$ • $I$ is the identity matrix and $\lambda$ (lambda) is a user-specified parameter
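A minimal NumPy sketch of the matrix form follows. The column layout mirrors the example formula (intercept, samples in the current period, its square, a samples-by-details interaction, and lagged samples), but the helper names, the synthetic data, and the "true" coefficients are assumptions made up for illustration.

```python
import numpy as np


def build_design_matrix(samples1, samples2, details1):
    """Columns: intercept, samples1, samples1^2, samples1*details1, samples2."""
    samples1 = np.asarray(samples1, dtype=float)
    samples2 = np.asarray(samples2, dtype=float)
    details1 = np.asarray(details1, dtype=float)
    return np.column_stack([
        np.ones_like(samples1),   # intercept (b0)
        samples1,                 # current-period samples (b1)
        samples1 ** 2,            # quadratic term (b2)
        samples1 * details1,      # interaction with details (b3)
        samples2,                 # previous-period samples (b4)
    ])


def solve_linear_model(X, y, lam=0.0):
    """b = (X'X + lam*I)^-1 X'y; lam = 0 gives the ordinary least squares solution."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)


# Synthetic data: every number below is an assumption for illustration.
rng = np.random.default_rng(0)
n = 200
samples1 = rng.integers(0, 30, n)
samples2 = rng.integers(0, 30, n)
details1 = rng.integers(0, 6, n)
true_b = np.array([5.0, 1.0, -0.02, 0.05, 0.3])

X = build_design_matrix(samples1, samples2, details1)
y = X @ true_b + rng.normal(0.0, 0.5, n)
b_ols = solve_linear_model(X, y)
print(np.round(b_ols, 3))
```

`np.linalg.solve` is used instead of forming the inverse explicitly; it computes the same estimator with better numerical behavior.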
Linear Regression – Process • Resolve outliers and group practice effect • Derive a rich set of input variables from the data • Interactions, square terms, and lag terms • Indicator variables for geographic-oriented promotions, specialties, etc. • Draw a data sample and select best combination of input variables • Find subsets of input variables with minimal correlation to find maximum number of moderately correlated explanatory variables • Use ridge regression to minimize distortion from correlation • Create variance regularization so small changes in input values will not create large changes in predicted values • A parameter lambda controls the tradeoff between model “complexity” and model accuracy
Linear Regression – Non-linear response to explanatory variables • Models must have the flexibility to capture real-world non-linearity in the data • Saturation • Points of inflection • Tipping-points • Models must include non-linear terms • A quadratic term for sampling is necessary to provide information on HCP saturation • Cubic terms can be useful but can also over emphasize outliers which the model must strain to fit – losing precision • Promotional events such as details and DTC advertising should also include non-linearity to show market saturation
Linear Regression – Interaction or synergy between explanatory variables • Interactions allow models to capture synergies between sampling activity and promotional variables • Sampling may be more effective due to detailing, Physician Meetings, or DTC advertising • Some DTC advertising recommends patients ask their HCP for a sample • Unique interactions may be helpful • Sampling and the lagged dependent variable • Interactions for both samples and a quadratic term of samples • Interactions between the lagged effects of dynamic variables • Use cross products of variables to model interactions • Interaction between X1 and X2 is modeled as X1 * X2
Linear Regression – Correlation between explanatory variables • Problem is most severe in standard linear regression • Including highly correlated variables leads to estimates that are unbiased but have large confidence intervals • Symptom – model coefficients are of wrong sign • Model appears to fit the data but inaccurately estimates the result of changes in the explanatory variable – defeating the model purpose • Adjust the objective function to accept a little bias in exchange for lower variance, reducing mean square error [Figure: two estimator distributions, one unbiased with large variance and one biased with small variance]
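The trade-off described here rests on the standard decomposition of mean square error for an estimator. For a single coefficient $\beta$ with estimate $\hat{\beta}$ (notation assumed here for illustration):

```latex
\mathrm{MSE}(\hat{\beta})
  = E\big[(\hat{\beta} - \beta)^2\big]
  = \underbrace{\big(E[\hat{\beta}] - \beta\big)^2}_{\text{bias}^2}
  + \underbrace{E\big[(\hat{\beta} - E[\hat{\beta}])^2\big]}_{\text{variance}}
```

An unbiased estimator has zero in the first term but may have a very large second term when explanatory variables are highly correlated; a slightly biased estimator can have a smaller sum.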
Linear Regression – Correlation between explanatory variables • Use ridge regression to minimize collinearity impact • Matrix Formula for Linear Regression • $Y = X\beta$ • Standard matrix solution to linear regression • $\beta = (X'X)^{-1} X'Y$ • Ridge regression solution to linear regression • $\beta = (X'X + \lambda I)^{-1} X'Y$, where $I$ is the identity matrix (diagonal elements all ones, other elements are zeros) • Moderates impact of unusual values; introduces bias but decreases variance of estimate • In practice, a number of small $\lambda$ values are considered and impact on results observed • When $R^2$ begins to drop, $\lambda$ is beginning to impact result • Do coefficient values change markedly or coefficient signs change?
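The practice of trying several small λ values and watching R² and the coefficients can be sketched as below. The data are synthetic and deliberately collinear (samples nearly equal to details); the function names, the λ grid, and all numbers are assumptions, not values from the tutorial.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge solution beta = (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def r_squared(X, y, beta):
    """In-sample R^2 of the fitted values X @ beta."""
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


# Synthetic, highly correlated explanatory variables (illustrative assumption).
rng = np.random.default_rng(1)
n = 300
details = rng.normal(size=n)
samples = details + 0.05 * rng.normal(size=n)
X = np.column_stack([samples, details])
y = samples + details + rng.normal(scale=0.5, size=n)
y = y - y.mean()  # centered response, so no intercept column is needed

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
results = []
for lam in lambdas:
    beta = ridge_fit(X, y, lam)
    results.append((lam, beta, r_squared(X, y, beta)))
    print(lam, np.round(beta, 3), round(results[-1][2], 4))
```

Reading the printed rows top to bottom shows the two diagnostics named on the slide: whether coefficients move markedly or change sign, and the point at which R² starts to fall.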
Linear Regression – Carryover of impact from historical sampling, details, etc. • Samples can have influence over multiple time periods • HCPs can hold samples in their closet for many months/quarters • Some approaches use a rigid lag distribution scheme • Influence falls over time but by how much – each HCP varies • Geometrically distributed lags • Assumed weights such as 100% for the current quarter, 75% for the last quarter, 25% for observations two quarters back, etc. • Other approaches allow the model to determine the lag structure • Include additional month/quarter lags based on a performance measure such as the adjusted $R^2$
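A rigid lag scheme amounts to a weighted sum of current and past activity. The sketch below shows both a geometric weighting and the fixed 100% / 75% / 25% weights mentioned on the slide; the function names, the decay value of 0.5, and the quarterly sample counts are hypothetical.

```python
def geometric_lag_weights(decay, n_lags):
    """Weights decay**0, decay**1, ... for the current and earlier periods."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    return [decay ** k for k in range(n_lags)]


def carryover(series, weights):
    """Weighted sum of current and lagged values for each period.

    series is ordered oldest to newest; periods before the start count as zero.
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, w in enumerate(weights):
            if t - k >= 0:
                total += w * series[t - k]
        out.append(total)
    return out


quarterly_samples = [10, 0, 0, 20]  # illustrative values
fixed = carryover(quarterly_samples, [1.0, 0.75, 0.25])
geometric = carryover(quarterly_samples, geometric_lag_weights(0.5, 3))
print(fixed)
print(geometric)
```

The resulting series can be used as an explanatory variable in place of raw samples; letting the model determine the lag structure instead means entering each lag as its own column.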
Linear Regression Result [Figure: actual HCP TRx sample curve from the linear regression model]
#2: K-Nearest Neighbors Introduction • Presentation assumes audience has minimal familiarity with k-Nearest Neighbors, a non-parametric approach • Basic Method • For each HCP, find the most similar HCPs • Find relationship between samples and Rx for the similar HCPs • With similar practices, small sample inventories reduce Rx opportunities, large inventories create cannibalization • For each HCP, develop a response curve that fits the relationship between samples and Rx • Use response curve to optimally sample each HCP • Basic concept • Create a natural experiment that isolates the variables of interest • Interpolate based on similar cases
K-Nearest Neighbors - Introduction • Differences between k-NN and Clusters • Clustering divides entire population into N similar groups using a similarity measure • N may be in range of 30 to 60, then clusters may be further combined for marketing purposes • So if there are 100,000 HCPs and 25 clusters, then there will be an average of 4,000 similar HCPs per cluster • k-NN finds the k closest neighbors for each HCP • Typically k is between 50 and 150 for large populations with many variables • So if there are 100,000 HCPs, then k-NN must be applied 100,000 times to get info for 100,000 response models
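The per-HCP neighbor search can be sketched as a brute-force loop. This is an illustration under stated assumptions: `k_nearest` is a hypothetical helper, the feature rows are made up and already on a common scale, and plain Euclidean distance is used; a production run over 100,000 HCPs would normally use an indexed search structure instead.

```python
import math


def k_nearest(target_index, features, k):
    """Return indices of the k rows closest (Euclidean) to features[target_index]."""
    target = features[target_index]
    distances = []
    for i, row in enumerate(features):
        if i == target_index:
            continue  # an HCP is not counted as its own neighbor
        distances.append((math.dist(target, row), i))
    distances.sort()
    return [i for _, i in distances[:k]]


# One row per HCP, one column per similarity variable (illustrative values).
features = [
    [0.10, 0.20],
    [0.12, 0.22],
    [0.90, 0.80],
    [0.11, 0.18],
    [0.85, 0.75],
]

# k-NN is applied once per HCP, so every HCP gets its own neighbor set.
neighbors = {i: k_nearest(i, features, k=2) for i in range(len(features))}
print(neighbors)
```

Unlike clustering, the neighbor sets overlap: each HCP sits at the center of its own group.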
k-Nearest Neighbors - Introduction [Figure: Nearest Neighbors Rx Response to Samples (KBN)]
K-Nearest Neighbors - Introduction • Strengths and Weaknesses almost inverse of linear regression • Primary issues • Algorithms not available in some popular software packages • Many analysts do not have experience with methodology • Managers are not familiar with methodology • Primary strengths • Works well for data with non-linear variables and synergistic variables • Works well for data with correlated variables • Can find and remove outlier data, where the outlying data is found in a highly dimensional space without the need for explicit models • Provides a natural upper limit on samples for each HCP • Easy to explain results, minimal black box
Example dimensions of similarity In measuring similarity all variables are given the same scale; for example 0 to 1
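Putting every variable on a 0 to 1 scale can be done with min-max scaling, sketched below. The slide names the target scale but not the method, so min-max scaling and the function name `min_max_scale` are assumptions, and the input values are illustrative.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no similarity information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


market_rx = [50, 200, 125]   # illustrative values
details = [0, 4, 8]
print(min_max_scale(market_rx))
print(min_max_scale(details))
```

Without a common scale, a variable measured in large units (such as market Rx) would dominate the distance calculation over one measured in small units (such as details).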
Measuring Similarity • Measure distance between two arrays of real numbers • Let X and Z be arrays of real numbers representing values for Dr. “X” and Dr. “Z” • Euclidean Distance: • $D^2 = (X-Z)'(X-Z)$ • Square the difference between each pair of values, then sum the squares; the square root of the sum is the distance $D$ • Mahalanobis Distance – Reduce impact of collinearity • Let $\Sigma$ be the covariance matrix of the similarity variables • $D^2 = (X-Z)'\,\Sigma^{-1}\,(X-Z)$ • Points at equal distance now lie on an ellipsoid • If two variables are independent their individual distances are added • If two variables are highly correlated the second individual distance has a minimal increment over the first distance
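Both squared distances can be computed directly from the formulas. The two-variable vectors and the covariance matrices below are made-up assumptions chosen to show the last two bullets: with independent variables the individual squared distances add, and with highly correlated variables the second adds very little.

```python
import numpy as np


def euclidean_sq(x, z):
    """D^2 = (X - Z)'(X - Z)."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(d @ d)


def mahalanobis_sq(x, z, cov):
    """D^2 = (X - Z)' Sigma^-1 (X - Z), using a solve instead of an explicit inverse."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(d @ np.linalg.solve(cov, d))


hcp_x = [0.0, 0.0]
hcp_z = [1.0, 1.0]  # differs from hcp_x by 1 on both variables

independent = np.eye(2)                          # unit variances, no correlation
correlated = np.array([[1.0, 0.9], [0.9, 1.0]])  # unit variances, correlation 0.9

print(euclidean_sq(hcp_x, hcp_z))
print(mahalanobis_sq(hcp_x, hcp_z, independent))
print(mahalanobis_sq(hcp_x, hcp_z, correlated))
```

With the identity covariance the Mahalanobis result equals the Euclidean one; with correlation 0.9 it is only slightly above 1, the squared distance contributed by a single variable.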
Mahalanobis Distance • Mahalanobis Distance (x, y) ≈ Mahalanobis Distance (x, z) • Some values of vector y that differ from vector x are highly correlated • If using Euclidean distance, E. Distance (x, y) > E. Distance (x, z) [Figure: vectors x, y, and z]
Response of TRx to Samples for nearest neighbors of an HCP [Figure: axes: Rx vs. samples]
Impact of Outliers • Outliers, especially those on the Y-axis (representing HCPs not sampled), can heavily influence curve-fitting algorithms [Figure: axes: Rx vs. samples]
Kernel smoothed response curve after outliers removed • Observations close to the focus point are weighted heavily; the weight decreases triangularly as the observation moves away from the focus point [Figure: focus point marked; axes: Rx vs. samples]
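Triangular weighting around a focus point can be sketched as a locally weighted average. The slide specifies the triangular weights but not the exact smoother, so the weighted-average form (Nadaraya-Watson style), the bandwidth, the function names, and the data are all assumptions.

```python
def triangular_weight(distance, bandwidth):
    """Weight 1 at the focus point, falling linearly to 0 at the bandwidth."""
    return max(0.0, 1.0 - abs(distance) / bandwidth)


def kernel_smooth(samples, rx, focus_points, bandwidth):
    """Triangular-kernel weighted average of Rx around each focus point."""
    smoothed = []
    for focus in focus_points:
        weights = [triangular_weight(s - focus, bandwidth) for s in samples]
        total = sum(weights)
        if total == 0:
            smoothed.append(None)  # no observations within the bandwidth
        else:
            smoothed.append(sum(w * r for w, r in zip(weights, rx)) / total)
    return smoothed


# Neighbor observations after outlier removal (illustrative values).
samples = [0, 2, 4, 6, 8, 10]
rx = [5, 9, 12, 13, 12, 10]
curve = kernel_smooth(samples, rx, focus_points=[4, 5], bandwidth=4)
print(curve)
```

A wider bandwidth gives a smoother curve at the cost of blurring real changes in slope; a narrower one follows the data more closely but is noisier.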
Kernel Smoothed Response Curve • A locally smoothed curve fits the distribution and minimizes the impact of outliers [Figure: smoothed curve with a local minimum marked; axes: Rx vs. samples]
Polynomial fitted line through kernel smoothed response curve • We fit a polynomial line through the smoothed curve to remove any local minima or maxima [Figure: fitted polynomial with the optimal TRx point marked; axes: Rx vs. samples]
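The final fitting step can be sketched with `numpy.polyfit`. The slide does not state the polynomial degree, so the quadratic default is an assumption, as are the helper name `fit_peak` and the smoothed values (which include a small dip at 15 samples to stand in for a local minimum).

```python
import numpy as np


def fit_peak(samples, smoothed_rx, degree=2):
    """Fit a polynomial to the smoothed curve and locate its highest fitted Rx.

    Returns (coefficients, sample level), searching only within the observed
    sample range so the result is never an extrapolation.
    """
    coeffs = np.polyfit(samples, smoothed_rx, deg=degree)
    grid = np.linspace(min(samples), max(samples), 301)
    fitted = np.polyval(coeffs, grid)
    return coeffs, float(grid[np.argmax(fitted)])


# Kernel-smoothed values with a small local dip at 15 samples (illustrative).
samples = [0, 5, 10, 15, 20, 25, 30]
smoothed_rx = [10.0, 15.5, 18.6, 18.2, 21.8, 21.4, 19.0]
coeffs, optimal_samples = fit_peak(samples, smoothed_rx)
print(np.round(coeffs, 4), optimal_samples)
```

A low-degree polynomial cannot reproduce the dip, so the fitted curve has a single turning point, consistent with the first-principles shape described earlier in the tutorial.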
#3: Kernel Methods • Characteristics • Hypothesis space of linear functions in a high dimensional and non-linear feature space • Trained using optimization theory that incorporates generalization derived from statistical learning theory • Sufficiently rich complexity to solve very difficult problems • Solution is computationally efficient • Power of this approach • Provides strong generalization properties • Significant improvement over confidence limits used in linear regression which depend on one pre-specified hypothesis • Search an entire hypothesis space • A modern, powerful method that outperforms most other systems in a wide variety of applications