What’s in your wallet? Opportunity modeling approaches and applications


Presentation Transcript


  1. What’s in your wallet? Opportunity modeling approaches and applications Claudia Perlich Chief Scientist Formerly: IBM Research Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.

  2. Publications & Recognition
  • 2009 Finalist in the INFORMS Edelman competition
  • 2007 Data Mining Practice Prize at KDD 2007, “Predictive modeling for marketing”, Runner Up
  • 2007 IBM Outstanding Technical Award, “Opportunity models and validation for the Market Alignment Program (MAP)”
  • 2005 IBM Research Award for contributions to the Market Alignment Program (MAP)
  • “Operations Research Improves Sales Force Productivity at IBM”, R. Lawrence, C. Perlich, S. Rosset, et al. Forthcoming, INFORMS Journal on Computing
  • “Analytics-driven solutions for customer targeting and sales force allocation”, J. Arroyo, M. Callahan, M. Collins, A. Ershov, I. Khabibrakhmanov, R. Lawrence, S. Mahatma, M. Niemaszyk, C. Perlich, S. Rosset, S. Weiss. IBM Systems Journal 46 (4) (2007)
  • “A Data Mining Case Study: Analytics-driven solutions for customer targeting and sales force allocation”, R. Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss. Second Workshop on Data Mining Case Studies and Practice Prize at SIGKDD 2007
  • “High Quantile Modeling for Customer Wallet Estimation with Other Applications”, C. Perlich, S. Rosset, R. Lawrence, B. Zadrozny. 13th SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
  • “Quantile Modeling for Marketing”, C. Perlich, S. Rosset, B. Zadrozny. Workshop on Data Mining for Business Applications at the 12th SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
  • “A New Multi-View Regression Approach with an Application to Customer Wallet Estimation”, S. Merugu, S. Rosset, C. Perlich. 12th SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
  • “Wallet Estimation Models”, S. Rosset, C. Perlich, B. Zadrozny, S. Merugu, S. Weiss, R. Lawrence. International Workshop on Customer Relationship Management: Data Mining Meets Marketing, NYU, 2005
  • “Modeling Quantiles”, C. Perlich, S. Rosset, B. Zadrozny. In Encyclopedia of Data Warehousing and Mining, Second Edition

  3. Presentation Outline • Wallet Definitions and Business Considerations • Modeling Approaches • Evaluation of Wallet Models • Business Impact – Market Alignment Project (MAP)

  4. What is Wallet/Opportunity? (Figure: nested view of Company Revenue, IT Wallet, IBM Sales) • Total amount of money that the customer (company) can spend in a certain product category in a given period • IBM sales ≤ IT wallet ≤ Company revenue

  5. Why Are We Interested in Wallet? • Customer targeting • Focus on acquiring customers with high wallet • Evaluate customers’ growth potential by combining wallet estimates and sales history • For existing customers, focus on high-wallet, low share-of-wallet customers • Sales force management • Make resource assignment decisions • Concentrate resources on untapped wallet • Evaluate the success of sales personnel and sales channels by the share-of-wallet they attain
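As a small illustration of the targeting logic above, here is a hedged sketch of share-of-wallet based prioritization. It assumes wallet estimates are already available; the column names (`ibm_sales`, `wallet_estimate`) and the ranking rule are hypothetical placeholders, not part of the original system.

```python
import pandas as pd

def prioritize_accounts(df: pd.DataFrame) -> pd.DataFrame:
    """Rank existing customers for targeting: high estimated wallet but
    low share-of-wallet (sales / wallet) indicates untapped potential.
    Column names are illustrative, not from the original system."""
    out = df.copy()
    out["share_of_wallet"] = out["ibm_sales"] / out["wallet_estimate"]
    out["untapped"] = out["wallet_estimate"] - out["ibm_sales"]
    # accounts with the largest untapped opportunity come first
    return out.sort_values("untapped", ascending=False)
```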

  6. Wallet Modeling Challenge • The customer wallet is never observed • There is no observed target to “fit a model” to • Even if you have a model, how do you evaluate it? • Need a predictive approach based on available data • Firmographics (sales, industry, employees) • IBM sales and transaction history

  7. Existing Approaches to Wallet Modeling • Bottom up: learn a model for individual companies • Get “true” wallet values through surveys • Very expensive • Small, typically not representative sample • Unreliable because the quantity is ill defined • Coarse level of IT categories • Top down: this approach was used by IBM Market Intelligence in North America (called ITEM) • Use econometric models to assign total “opportunity” to a segment (e.g., industry × geography) • Assign to companies in the segment proportionally to their size • Completely ad hoc, without any validation

  8. Multiple Wallet Definitions (Figure: nested view of Company Revenue, TOTAL, SERVED, REALISTIC, IBM Sales) • TOTAL: Total customer available budget in the relevant area (e.g., total IT) • Can we really hope to attain all of it? • SERVED: Total customer spending on IT products covered by IBM • Better definition for our marketing purposes • REALISTIC: IBM spending of the “best similar customers” • REALISTIC ≤ SERVED ≤ TOTAL

  9. We formulate the problem as Quantile Estimation • Imagine 1,000 customers with identical customer features • Consider the distribution of the IBM sales to these customers (Figure: histogram of IBM sales; the best customers sit in the upper tail, so the opportunity is a high quantile)

  10. Formally: a Percentile of the Conditional Distribution (Figure: conditional density of sales with E(s|x) and the REALISTIC wallet marked) • Distribution of IBM sales s to the customer given customer attributes x: s|x ~ f_s|x • Two obvious ways to get at the pth percentile: • Estimate the conditional distribution by integrating over a neighborhood of similar customers, then take the pth percentile of spending in the neighborhood • Create a global model for the pth percentile, e.g., build a global quantile regression model (next slides)

  11. Overview of analytical approaches (diagram) • “Ad hoc” kNN: industry, size • General kNN: k, distance, features • Quantile regression (optimization): model form, i.e., linear, decision tree, quanting • Decomposition: linear model, adjustment • Evaluation and validation: quantile loss, MAP feedback

  12. K-Nearest Neighbor (Q-kNN) • Distance metric: industry match, Euclidean distance on firmographics and past IBM sales; scaling issues • Neighborhood size (k): has a significant effect on prediction quality • Prediction: quantile of IBM sales over the firms in the neighborhood (Figure: universe of IBM customers with D&B information plotted by revenue and employees; the neighborhood of the target company yields a histogram of IBM sales from which the wallet estimate is read)
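A minimal Q-kNN sketch in Python, under stated assumptions: a pandas DataFrame with hypothetical columns `industry`, `revenue`, `employees`, and `ibm_sales`. The neighborhood is formed among same-industry firms by scaled Euclidean distance, and the wallet estimate is a high quantile of the neighbors' sales. This is an illustration of the approach, not the original IBM implementation.

```python
import numpy as np
import pandas as pd

def qknn_wallet(df, target, k=50, q=0.8, features=("revenue", "employees")):
    """Q-kNN wallet estimate: the q-th quantile of IBM sales among the
    k firmographically closest firms in the same industry.
    Column names are illustrative, not from the original system."""
    # restrict the candidate neighborhood to firms in the same industry
    pool = df[df["industry"] == target["industry"]]
    # scale features so revenue and employee counts are comparable
    X = pool[list(features)].to_numpy(dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    Xs = (X - mu) / sigma
    t = (np.asarray([target[f] for f in features], dtype=float) - mu) / sigma
    # Euclidean distance to the target company, then take the k closest
    dist = np.sqrt(((Xs - t) ** 2).sum(axis=1))
    neighbors = pool.iloc[np.argsort(dist)[:k]]
    # the wallet estimate is a high quantile of the neighbors' sales
    return neighbors["ibm_sales"].quantile(q)

# usage: estimate = qknn_wallet(df, df.iloc[0], k=50, q=0.8)
```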

  13. Global Estimation: the Quantile Loss Function (Figure: loss curves for p=0.5, i.e., absolute loss, and p=0.8) • The mean minimizes a sum of squared residuals: sum_i (y_i − ŷ_i)² • The median minimizes a sum of absolute residuals: sum_i |y_i − ŷ_i| • The p-th quantile minimizes an asymmetrically weighted sum of absolute residuals: L_p(y, ŷ) = p·(y − ŷ) if y ≥ ŷ, and (1 − p)·(ŷ − y) otherwise
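The asymmetric (pinball) loss above is easy to write down directly; a minimal sketch, with the function name and p=0.8 default chosen for illustration only:

```python
import numpy as np

def quantile_loss(y_true, y_pred, p=0.8):
    """Asymmetric (pinball) loss: under-predictions of the p-th quantile
    are penalized with weight p, over-predictions with weight 1 - p.
    At p = 0.5 this reduces to (half of) the absolute loss."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.where(r >= 0, p * r, (p - 1) * r))

# at p = 0.8, a wallet estimate that is too low costs four times as much
# per dollar of error as one that is too high
```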

  14. Quantile Regression • Traditional regression: estimate the conditional expected value by minimizing the sum of squared residuals • Quantile regression: minimize the quantile loss instead • Implementation: assume a linear function ŷ = βᵀx; the quantile-loss minimization can be solved with linear programming (Figure: quantile regression loss function)
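A minimal sketch of linear quantile regression using statsmodels' QuantReg (one off-the-shelf implementation, not the code used at IBM); the toy data with skewed residuals is invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

# toy data: one firmographic feature and observed IBM sales with skewed noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2.0 * x + rng.exponential(scale=3.0, size=500)

X = sm.add_constant(x)            # intercept + feature
model = sm.QuantReg(y, X)         # linear quantile regression
fit_80 = model.fit(q=0.8)         # 80th-percentile ("wallet") line
fit_50 = model.fit(q=0.5)         # median fit, for comparison

print(fit_80.params)              # intercept and slope of the 0.8-quantile line
print(fit_50.params)
```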

  15. Linear Quantile Regression (Koenker)

  16. Quantile Regression Tree • Motivation: • Identify a locally optimal definition of neighborhood • Inherently nonlinear • Adjustments of M5/CART for Quantile prediction: • Predict the percentile rather than the mean of the leaf • Splitting/pruning criteria: Quantile or squared error loss?
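The leaf-level adjustment described above can be sketched in a few lines with scikit-learn: grow an ordinary squared-error tree, then predict a quantile of the training targets in each leaf rather than the leaf mean. This is an illustrative reconstruction, not the M5/CART code used in the project.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class QuantileTree:
    """Sketch of a quantile regression tree: squared-error splits,
    quantile predictions in the leaves."""

    def __init__(self, q=0.8, **tree_kwargs):
        self.q = q
        self.tree = DecisionTreeRegressor(**tree_kwargs)

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)               # leaf id per training row
        self.leaf_quantiles_ = {
            leaf: np.quantile(y[leaves == leaf], self.q)
            for leaf in np.unique(leaves)
        }
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        return np.array([self.leaf_quantiles_[l] for l in leaves])

# usage: QuantileTree(q=0.8, min_samples_leaf=50).fit(X_train, y_train).predict(X_test)
```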

  17. Quanting (Table: outputs of threshold classifiers C1000–C6000 and the resulting prediction; e.g., 1 1 0 0 0 0 → 250, 1 1 1 0 0 0 → 350) • Transform the quantile regression into a series of classification problems • Non-linearity, if non-linear classifiers are used • Theoretical guarantee: if the classifiers minimize the expected classification error, the quanting algorithm minimizes the quantile loss • Training • Each classifier is trained to decide whether or not the conditional quantile is above a threshold T • Original observations are re-labeled and re-weighted to train each classifier appropriately, analogous to the quantile loss • Prediction • Find the threshold where the classifier predictions switch from one to zero
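A rough sketch of the quanting reduction paraphrased above. The re-weighting shown (weight q for examples above the threshold, 1 − q below) and the decision-tree base classifier are my illustrative choices; the prediction rule simply returns the largest threshold the classifiers still vote "above".

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_quanting(X, y, thresholds, q=0.8):
    """One binary classifier per threshold t: label = 1 if y > t,
    weighted q for positives and 1 - q for negatives, so the
    Bayes-optimal classifier fires exactly when the conditional
    q-quantile exceeds t (illustrative weighting)."""
    y = np.asarray(y, dtype=float)
    classifiers = []
    for t in thresholds:
        labels = (y > t).astype(int)
        weights = np.where(labels == 1, q, 1.0 - q)
        clf = DecisionTreeClassifier(max_depth=4)
        clf.fit(X, labels, sample_weight=weights)
        classifiers.append(clf)
    return classifiers

def predict_quanting(classifiers, thresholds, X):
    """Quantile estimate: the largest threshold whose classifier still
    predicts 1, i.e., the point where the votes switch from 1 to 0."""
    thresholds = np.asarray(thresholds, dtype=float)
    votes = np.column_stack([clf.predict(X) for clf in classifiers])
    preds = []
    for row in votes:
        above = np.where(row == 1)[0]
        # if no classifier fires, the quantile is below the lowest threshold
        preds.append(float(thresholds[above.max()]) if len(above) else 0.0)
    return np.array(preds)
```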

  18. (Graphical model approach to SERVED Wallets) (Figure: graphical model with nodes for company firmographics, historical relationship with IBM, the latent SERVED wallet, and IT spend with IBM) • The wallet is unobserved; all other variables are observed • The two families of variables, firmographics and IBM relationship, are conditionally independent given the wallet • We develop inference procedures and demonstrate them • Theoretically attractive, practically questionable

  19. Empirical Evaluation of Quantile Estimation • Setup • Four domains with relevant quantile modeling problems • Performance on the test set in terms of 0.9 quantile loss • Approaches: Linear quantile regression, Q-kNN, Quantile trees, Bagged quantile trees, Quanting • Baselines • Best constant • Traditional regression models for expected values, adjusted under a Gaussian assumption (+1.28 residual standard deviations, the 0.9 quantile of a normal)
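The Gaussian-adjusted baseline mentioned above is essentially a one-liner: fit an ordinary regression, then shift the prediction up by z_0.9 ≈ 1.28 residual standard deviations. A sketch, assuming the residual spread is estimated on the training set:

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

def gaussian_quantile_baseline(X_train, y_train, X_test, q=0.9):
    """Baseline: OLS estimate of E[y|x], shifted upward by z_q residual
    standard deviations under a (usually wrong) Gaussian residual
    assumption; z_0.9 is approximately 1.28."""
    reg = LinearRegression().fit(X_train, y_train)
    resid_std = np.std(y_train - reg.predict(X_train))
    return reg.predict(X_test) + norm.ppf(q) * resid_std
```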

  20. Performance on Quantile Loss (Results table not reproduced; best result in bold, variance in parentheses) • Observations • Regression + 1.28 is not competitive (because the residuals are not normal) • The splitting criterion is irrelevant • Q-kNN is not competitive • Quanting (using decision trees) and bagged quantile trees perform comparably

  21. Additional Insights • Irrelevance of the splitting criterion • Good news! Because squared error is much more efficient • Reason: • SSE measures the decrease of the conditional variance • SSE measures the ‘goodness’ of the local neighborhood • A good estimate of the conditional distribution yields a good quantile • Linear model does well on IBM and KDD-CUP98 • Match of model bias • Both domains have strong autocorrelation • Last year’s donation/revenue is a great predictor of this year’s • Hard for tree-based models to express linear relationships

  22. Evaluating REALISTIC Wallet • We still don’t know the truth • Quantile loss only evaluates the ability to predict a quantile – but is a quantile a good wallet? • Which quantile: 80%, 90%, 99%? • The distribution is highly skewed • Most error measures are very sensitive to outliers • What is the right scale? Log? • Even good survey data is not the truth • Not available on an IBM product level • Probably irrelevant for the REALISTIC wallet

  23. MAP: Market Alignment Program, re-deploying IBM sales resources • Old sales process (focused on existing relationship): use prior-year revenue as a proxy for future revenue generation; assign quota based largely on recent revenue history • New sales process using MAP (focused on future opportunity): use OR models to develop a forward-looking view of opportunity by client; assign quota based on future opportunity and productivity

  24. The MAP process and components (Figure: process flow in which integrated data feeds the MAP models to produce modeled opportunity; model estimates are reviewed via the MAP web interface in MAP workshops and IBM sales team interviews; the resulting expert feedback yields the validated opportunity used to realign sales resources)

  25. Explanatory features are extracted from multiple sources • Sources: Dun & Bradstreet (D&B) data and IBM client transactions, combined via entity matching and feature extraction • D&B features: Industry • Revenue (Rank) • Employees • State • D&B Structure Code • … • IBM transactional features: Prior-year revenue in other product brands • Long-term revenue in other product brands • … • Train the model against current-year revenue based on previous-year features • Apply the model by rolling forward to the current year and predicting future opportunity
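The train-on-last-year, score-forward setup described above can be sketched as follows. The column names, the feature list, and the choice of scikit-learn's QuantileRegressor are illustrative assumptions, not the MAP codebase; the key point is only the temporal split between the fitting and scoring years.

```python
import pandas as pd
from sklearn.linear_model import QuantileRegressor

# hypothetical yearly table: one row per customer per year, containing
# D&B firmographics, lagged IBM revenue by brand, and current-year revenue
FEATURES = ["revenue_rank", "employees", "brand_a_rev_prev", "brand_b_rev_prev"]

def fit_and_roll_forward(history: pd.DataFrame, q: float = 0.8) -> pd.Series:
    """Train on all but the latest year (previous-year features vs.
    current-year revenue), then score the latest year's features to
    produce next year's opportunity estimates."""
    latest = history["year"].max()
    train = history[history["year"] < latest]
    score = history[history["year"] == latest]
    model = QuantileRegressor(quantile=q, alpha=0.0)
    model.fit(train[FEATURES], train["current_year_revenue"])
    return pd.Series(model.predict(score[FEATURES]), index=score["customer_id"])
```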

  26. MAP Validation and Expert Feedback (Figure: scatter plot of expert-validated opportunity vs. model opportunity, both on log scale)

  27. Observations • Many accounts are set to zero for external reasons • Exclude them from evaluation, since no model can predict the competitive environment • Exponential distribution of opportunities • Evaluation on the original (non-log) scale suffers from huge outliers • Experts seem to make percentage adjustments • Consider log-scale evaluation in addition to the original scale, with the square-root scale as an intermediate • Suspect a strong “anchoring” bias: 45% of opportunities were not touched

  28. Evaluation Measures • Different scales to avoid outlier artifacts • Original: e = model − expert • Root: e = sqrt(model) − sqrt(expert) • Log: e = log(model) − log(expert) • Statistics on the distribution of the errors • Mean of e² • Mean of |e| • Total of 6 criteria
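A compact sketch of the six criteria above (two error statistics on each of three scales), assuming model and expert values are non-negative arrays; log1p is used here only to tolerate zero opportunities, whereas the slide simply uses log.

```python
import numpy as np

def evaluation_criteria(model, expert):
    """Six criteria: mean squared and mean absolute model-vs-expert error
    on the original, square-root, and log scales."""
    model = np.asarray(model, dtype=float)
    expert = np.asarray(expert, dtype=float)
    scales = {
        "original": (model, expert),
        "root": (np.sqrt(model), np.sqrt(expert)),
        "log": (np.log1p(model), np.log1p(expert)),  # log1p guards against zeros
    }
    results = {}
    for name, (m, e) in scales.items():
        err = m - e
        results[f"{name}_mean_sq"] = float(np.mean(err ** 2))
        results[f"{name}_mean_abs"] = float(np.mean(np.abs(err)))
    return results
```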

  29. Model Comparison Results • We count how often a model scores within the top 10 and top 20 for each of the 6 measures (results table not reproduced; annotations mark the anchoring baseline and the best model)

  30. MAP Experiments Conclusions • Q-kNN performs very well after flooring but is typically inferior prior to flooring • 80th-percentile linear quantile regression performs consistently well (flooring has a minor effect) • Experts are strongly influenced by the displayed opportunity (and the displayed revenue of previous years) • Models without last year’s revenue don’t perform well • Decision: use linear quantile regression with q=0.8 in MAP 06

  31. Scope and some of the tedious details • 3 million customers • 20 brands (product categories) • 4 markets • Annual model refresh • The quantile is chosen for each brand and market separately, based on market insights on IBM market share • A whitespace model for customers with no prior IBM revenue is built using the same methodology, but only D&B features • Entity matching between IBM customer records and the D&B hierarchy is HARD • Evaluation remains somewhat subjective, and we collect feedback

  32. (Figure: MAP rollout timeline, 2005–2008) In 2008 MAP covered 50+ countries and ~100% of IBM revenue and opportunity • Resources shifted to high-growth markets and accounts • Shifted resources performed >10 pts better

  33. MAP output drives account segmentation and resource allocation decisions (Figure: accounts plotted by validated revenue opportunity vs. prior-year actual revenue, segmented into Invest: high growth potential; Core Growth: modest growth potential; Opportunistic: small accounts; Core Optimize: flat or declining; sellers shifted toward the high-opportunity segments) • Resource implications • Shift resources to Core Growth and Invest accounts • Reduce resource overlap • 8,000 sellers shifted (2006–2009)
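A toy illustration of the segmentation logic just described. The cutoff values and the quadrant orientation (which combination of high/low opportunity and revenue maps to which segment) are my reading of the chart and are hypothetical placeholders, not IBM's actual rules.

```python
def segment_account(validated_opportunity, prior_year_revenue,
                    opp_cut=1_000_000, rev_cut=1_000_000):
    """Hypothetical 2x2 segmentation on validated opportunity vs.
    prior-year revenue; cutoffs and quadrant assignment are placeholders."""
    if validated_opportunity >= opp_cut:
        # high future opportunity: grow new relationships or expand existing ones
        return "Invest" if prior_year_revenue < rev_cut else "Core Growth"
    # low future opportunity: small accounts vs. flat-or-declining large ones
    return "Opportunistic" if prior_year_revenue < rev_cut else "Core Optimize"
```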

  34. MAP drove significant revenue impact in 2008 (Figure: segmentation chart of validated revenue opportunity vs. prior-year actual revenue; the Invest and Core Growth segments account for $53B of revenue and received 3,000 shifted sellers in 2008, out of 30,000 sellers; Core Optimize and Opportunistic account for $9B of revenue) • [3,000 sellers] x [$2M revenue / seller] x [10% performance improvement] = $600M (2008 revenue impact)

  35. MAP Takeaways • Interesting predictive modeling task that calls for an unorthodox loss function • Combination of data mining AND expert feedback • Integration into the annual sales management cycle • Significant effort on data collection and preparation • Many additional analytical tools were built on top of MAP • Territory definition and assignment • Quota assignment • Substantial impact on the bottom line

  36. Questions?
