Analytical Model Development & Implementation Experience from the Field. Bhavani Raskutti. Topics to be covered. Model development & implementation process Case Study 1: Corporate Customer Modelling at Telcos Case Study 2: Sales Opportunities for wholesalers Take-Home Points.
Analytical Model Development & Implementation: Experience from the Field
Bhavani Raskutti
Topics to be covered
• Model development & implementation process
• Case Study 1: Corporate Customer Modelling at Telcos
• Case Study 2: Sales Opportunities for wholesalers
• Take-Home Points
Model Development & Implementation Process
Solution enabling business to make strategic & operational decisions
• Business Problem
• Model Development (iterative; 90% DAP)
  - Analytical Problem Definition (APD)
  - Data Acquisition & Preparation (DAP) → Data Matrix
  - Mathematical Modelling (Algorithms) (MM)
  - Model Validation (MV)
• Presentation (P)
  - Insights via GUI
  - Decision-making by users
• Deployment (D)
  - Automation
  - Training
  - Documentation
  - IT Support
Case Study 1: Corporate Customer Modelling at Telcos
Business Problem
• Large drops in margins & revenue in corporate customer base
• Partial churn of some corporate customers to other telcos
• Lack of understanding of customer’s needs
• Project will target revenue improvement opportunities with an indicative $15 million in sales by:
  - undertaking a rapid analysis of Customer data from core systems, including front of house, customer satisfaction and marketing for customers with a spend greater than $100k, excluding state and local government
  - outcomes are to be validated using artificial intelligence tools and rigorous methodology by …
  (Verbatim from client’s presentation to stakeholders)
• Using data analysis, increase revenue from corporate customers whose spend is > $100k
1. Analytical Problem Definition
• Using data analysis, increase revenue from corporate customers whose spend is > $100k
• Increase revenue from corporate customers by
  - Win-back (database look-up)?
  - Churn reduction?
  - Up-sell/cross-sell to an existing customer?
• Customer data
  - Relationship with customer
    - Customer satisfaction survey data
    - Service assurance data (customer complaints)
  - Demographic information about business customer
    - Industry segment information
    - Number of sites
  - Revenue from customer
    - Quarterly revenue from different products
• Create models to predict up-sell based on revenue data
2. Data Acquisition & Processing
• Using revenue data, create models to predict customers likely to take up a specific product
• Population:
  - Customers in a segment who currently do not have the product being modelled
• Target or positive case definition:
  - Customers in the segment who take up the product within a time period
• Predictors for modelling
Population and Target Definition
• Let $r_{iP}$ be the revenue from a customer on product P in billing period i
• Population in period i includes all customers with $r_{(i-1)P} = 0$
• Target or product take-up in period i iff $r_{(i-1)P} = 0$ and $r_{iP} > TU_{MIN}$
• $TU_{MIN} > 0$ is the minimum take-up amount determined by the business
[Figure: timeline. TRAIN: customers with $r_{(i-1)P} = 0$, predictors from period i-1, labels from period i. PREDICT: customers with $r_{iP} = 0$, predictors from period i, predictions for period i+1.]
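The population and target rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the customer names, the revenue figures, the value of `TU_MIN` and the dictionary layout are all made up, and the project itself was implemented in Matlab & C.

```python
# Hypothetical minimum take-up amount; the real value is set by the business.
TU_MIN = 100.0

def population(revenue, i):
    """Customers with zero revenue on product P in period i-1."""
    return [c for c, r in revenue.items() if r[i - 1] == 0]

def take_up_labels(revenue, i, tu_min=TU_MIN):
    """Label 1 if a customer in the population spends more than tu_min in period i."""
    return {c: int(revenue[c][i] > tu_min) for c in population(revenue, i)}

# revenue[customer][period] for a single product P (made-up numbers).
revenue = {
    "A": [0, 0, 250.0],          # takes up the product in period 2
    "B": [0, 0, 40.0],           # spends, but below TU_MIN
    "C": [500.0, 480.0, 510.0],  # already has the product: not in the population
    "D": [0, 0, 0],              # never takes up the product
}

labels = take_up_labels(revenue, i=2)
print(labels)
```

Customer C is excluded from the population rather than labelled 0, which keeps existing users of the product out of both training and prediction.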
Low take-up rates: not enough targets
• Average number of take-ups for any product in any period is small
  - Large businesses
    - Fewer than 20 take-ups in a period for 70 of the 100+ products
    - Fewer than 10 take-ups for 45 products
  - Medium businesses
    - Fewer than 20 take-ups for 71 products
    - Fewer than 10 take-ups for 60 products
• Reasons
  - “Niche” products
  - Saturated products
Low take-up rates (cont’d)
• Aggregate data over multiple billing periods k
• Product take-up in periods i to i+k-1 iff $r_{(i-j)P} = 0$ for $j = 1..k$ and $\sum_{j=0}^{k-1} r_{(i+j)P} > k \cdot TU_{MIN}$
• k=2 is useful
[Figure: "Impact of data aggregation"; axis: minimum take-ups (n) for modelling.]
[Figure: timeline for k=2. TRAIN: predictors from periods i-3 and i-2, labels from periods i-1 and i. PREDICT: predictors from periods i-1 and i, predictions for periods i+1 and i+2.]
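A minimal sketch of the aggregated target, assuming one list of per-period revenues per customer for product P. The function name, the example series and the `TU_MIN` value are hypothetical.

```python
def aggregated_take_up(r, i, k, tu_min):
    """Aggregated take-up label for one customer on product P.

    r is a list of per-period revenues. Returns None when the customer is
    outside the population (non-zero revenue in any of the k periods before i),
    otherwise 1 if revenue summed over periods i..i+k-1 exceeds k * tu_min.
    """
    if i - k < 0 or i + k > len(r):
        raise ValueError("not enough billing periods around i")
    if any(r[i - j] != 0 for j in range(1, k + 1)):
        return None
    return int(sum(r[i + j] for j in range(k)) > k * tu_min)

TU_MIN = 100.0                      # hypothetical threshold
slow_starter = [0, 0, 60.0, 160.0]  # ramps up over two periods
existing = [0, 50.0, 300.0, 300.0]  # already spending before period 2

single = aggregated_take_up(slow_starter, i=2, k=1, tu_min=TU_MIN)
double = aggregated_take_up(slow_starter, i=2, k=2, tu_min=TU_MIN)
outside = aggregated_take_up(existing, i=2, k=2, tu_min=TU_MIN)
print(single, double, outside)
```

With k=1 the slow starter is a negative example because period 2 alone is below the threshold; with k=2 the same customer becomes a positive example, which is how aggregation raises the number of targets.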
Low take-up rates (cont’d)
• Use of time interleaving
  - Aggregate data with k=2
  - Generate 3 sets of data moved forward by a period
  - Concatenate the 3 sets to get 3 times as much training data as for data aggregation with k=2
• Time interleaving enormously enhances modellability
[Figure: "Impact of time interleaving". TRAIN: three overlapping windows (i-5 to i-2, i-4 to i-1, i-3 to i), each with predictors from its first two periods and labels from its last two. PREDICT: predictors from periods i-1 and i, predictions for periods i+1 and i+2.]
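The interleaving step can be sketched as below, assuming each customer has a per-period revenue series for the modelled product (`"P"`) and a second series standing in for the predictors (`"Q"`). The data layout, names and numbers are illustrative assumptions, not the project's actual pipeline.

```python
def interleaved_training_set(customers, i, k=2, shifts=3, tu_min=100.0):
    """Concatenate training rows from `shifts` windows, each moved by one period.

    Each window spans 2k periods ending at period i - shift: the first k
    periods supply predictors, the last k supply the aggregated label.
    """
    rows = []
    for shift in range(shifts):
        end = i - shift
        start = end - 2 * k + 1
        if start < 0:
            continue  # not enough history for this window
        for name, series in customers.items():
            p, q = series["P"], series["Q"]
            # Population: no revenue on P during the predictor periods.
            if any(p[t] != 0 for t in range(start, start + k)):
                continue
            label = int(sum(p[start + k:end + 1]) > k * tu_min)
            rows.append((name, end, q[start:start + k], label))
    return rows

# Six billing periods (indices 0..5) of made-up revenue.
customers = {
    "A": {"P": [0, 0, 0, 0, 150.0, 120.0], "Q": [10, 12, 11, 13, 12, 14]},
    "B": {"P": [0, 0, 300.0, 250.0, 260.0, 240.0], "Q": [80, 85, 90, 88, 91, 95]},
}

one_window = interleaved_training_set(customers, i=5, shifts=1)
interleaved = interleaved_training_set(customers, i=5, shifts=3)
print(len(one_window), len(interleaved))
```

In this toy data the single window yields one training row, while three interleaved windows yield four rows and pick up customer B's earlier take-up, which would otherwise be invisible to the model.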
Predictors for Modelling
• Revenue predictors used
  - $r_{(i-3)Q}$: revenue for all products in billing period i-3
  - Change in revenue from period i-3 to i-2, $r_{(i-3)Q} - r_{(i-2)Q}$
  - Projected revenue for period i-1, $2r_{(i-3)Q} - r_{(i-2)Q}$
  - All revenue predictors used both as raw values, and normalised by total customer revenue
• Binary predictors indicating churn/take-up in period i-2
• All continuous predictors converted to binary using 10 equisize bins
  - Overcomes the negative impact of large variance in revenues
  - Allows generation of non-linear models using linear techniques
[Figure: timeline over periods i-3, i-2, i-1, i showing which periods supply predictors and which supply labels for training.]
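The binning step could look like the sketch below. It assumes that "equisize" means bins holding roughly equal numbers of examples (equal-frequency bins); the slide does not say so explicitly, and the helper names and revenue values are invented for illustration.

```python
from bisect import bisect_right

def equal_frequency_edges(values, n_bins=10):
    """Cut points that split the sorted values into n_bins roughly equal groups."""
    s = sorted(values)
    n = len(s)
    return [s[(b * n) // n_bins] for b in range(1, n_bins)]

def to_binary(value, edges):
    """One binary indicator per bin; exactly one indicator is set."""
    idx = bisect_right(edges, value)
    return [1 if j == idx else 0 for j in range(len(edges) + 1)]

# Made-up revenues with one extreme value to mimic the large variance.
revenues = [float(v) for v in range(1, 100)] + [1_000_000.0]
edges = equal_frequency_edges(revenues)

print(to_binary(5.0, edges))
print(to_binary(95.0, edges))
print(to_binary(1_000_000.0, edges))
```

The million-dollar customer falls into the same top bin as a customer with revenue 95, so a single extreme value cannot dominate a linear model. Because each bin gets its own coefficient, the linear model can also fit a non-linear response to revenue.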
3. Mathematical Modelling
• Imbalance in class sizes
  - Large businesses
    - 51 products with < 0.5% take-up on average
    - 25 products with < 0.1% take-up
  - Medium businesses
    - 74 products with < 0.5% take-up on average
    - 54 products with < 0.1% take-up
• Maximisation of total take-up revenue
  - Identifying new high value customers is a priority
• Extent of variance
  - Take-up amounts range from $TU_{MIN}$ to over a million dollars
  - Take-up amounts are not correlated with total revenue in previous billing periods
Imbalance in class sizes
• m+ and m-: number of +ve and -ve examples
• C+ and C-: weight of +ve and -ve examples
• Use of Support Vector Machines (SVMs) instead of decision trees, neural nets or logistic regression
  - Based on Vapnik’s statistical learning theory
  - Maximises the margin of separation between two classes
• Two different SVM implementations
  - SVMstd: equal weight to all training examples
  - SVMbal: class-dependent weights so all take-ups have a higher weight than all non-take-ups
Identifying high value take-up
• m+ and m-: number of +ve and -ve examples
• C-: weight of -ve examples
• TU(i): take-up amount of the ith +ve example
• C+(i): weight of the ith +ve example
• SVMval: SVM with different weights for different positive (take-up) training examples
  - All take-up examples have a higher weight than all the non-take-up examples (as for SVMbal)
  - Each take-up training example has a weight proportional to the amount of take-up
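The exact weight formulas did not survive in this transcript, so the sketch below shows only one plausible weighting that satisfies the two stated properties. The choices C- = 1, C+ = m-/m+ for SVMbal, and C+(i) = (m-/m+) · TU(i) / min TU for SVMval, are assumptions and not the author's formulas. The code computes per-example weights only; it does not train an SVM.

```python
def svm_bal_weights(labels):
    """Class-dependent weights: every positive gets m-/m+, every negative gets 1."""
    m_pos = sum(labels)
    m_neg = len(labels) - m_pos
    c_pos = m_neg / m_pos
    return [c_pos if y == 1 else 1.0 for y in labels]

def svm_val_weights(labels, take_up):
    """Value-dependent weights: positives scaled in proportion to take-up amount.

    Dividing by the smallest take-up keeps every positive weight at or above
    the SVMbal weight, so positives outweigh negatives whenever m- > m+.
    """
    m_pos = sum(labels)
    m_neg = len(labels) - m_pos
    c_pos = m_neg / m_pos
    smallest = min(tu for y, tu in zip(labels, take_up) if y == 1)
    return [c_pos * tu / smallest if y == 1 else 1.0
            for y, tu in zip(labels, take_up)]

# Made-up training set: 2 take-ups among 10 customers.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
take_up = [200.0, 0, 0, 0, 0, 1000.0, 0, 0, 0, 0]

bal = svm_bal_weights(labels)
val = svm_val_weights(labels, take_up)
print(bal)
print(val)
```

Weights like these can be supplied as per-example weights to an SVM trainer that supports them, for instance through the `sample_weight` argument of scikit-learn's `SVC.fit`. The project itself used its own Matlab & C implementation.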
4. Model Validation
• Model assessment
  - Two tests for assessing quality of models (~4,000 models)
    - 10-fold cross validation tests to determine the best of the 3 SVMs
    - Tests in production setting to evaluate time interleaving
  - All tests on 30 product take-up prediction problems in 4 segments
• Performance measures on unseen test set
  - Area under receiver operating characteristic curve (AUC)
    - Measures quality of sorting
    - Decision threshold independent metric
  - Value weighted AUC (VAUC)
    - Indicates potential revenue from the sorting
• SVMval with time interleaved data is used for generating models
  - SVMval significantly more accurate than the other two
  - Time interleaving produces more stable models
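The two measures can be illustrated with a pairwise-ranking computation. AUC below is the standard fraction of (positive, negative) pairs ranked correctly. The VAUC definition is not given on the slide, so the version here, which weights each positive by its take-up amount, is an assumed form chosen only to show the idea. Scores and amounts are made up.

```python
def pair_score(pos_score, neg_score):
    """1 if the positive is ranked above the negative, 0.5 on ties, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0

def auc(scores, labels):
    """Fraction of (positive, negative) pairs that the scores order correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(pair_score(p, n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def value_weighted_auc(scores, labels, values):
    """Assumed VAUC: each positive's pairs count in proportion to its take-up value."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    num = den = 0.0
    for s, y, v in zip(scores, labels, values):
        if y == 1:
            num += v * sum(pair_score(s, n) for n in neg)
            den += v * len(neg)
    return num / den

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
high_value_first = [1000.0, 0, 100.0, 0, 0]
high_value_last = [100.0, 0, 1000.0, 0, 0]

plain = auc(scores, labels)
vauc_good = value_weighted_auc(scores, labels, high_value_first)
vauc_poor = value_weighted_auc(scores, labels, high_value_last)
print(plain, vauc_good, vauc_poor)
```

Both value assignments give the same plain AUC, but the value-weighted figure is higher when the large take-up is ranked at the top, which is the behaviour wanted when identifying high value customers is the priority.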
Model Validation by Business
• Predictive models identify more sales opportunities than were identified manually
  - 3 times as many in large businesses segment
  - 5 times as many in medium businesses segment
• Results for 2 different regions in medium businesses
  - Region 1: Predictions for just 5 products generated 9 new opportunities with an increase in revenue of ~A$400K
  - Region 2: Predictions identified opportunities that were already being processed by sales consultants
• Predictive modelling spreads the techniques of good sales teams across the whole organisation
5. Presentation
• Output in Excel spreadsheet automatically generated
• One customer list per segment with:
  - Take-up likelihood for all modelled products
  - Last quarter revenue for all products
6. Deployment
• Implementation in Matlab & C with output in Excel
• Automatic quarterly updates of model after consolidated revenue figures are available
  - Models for ~50 products for each of the 4 business segments
• Output delivered to business analytics group
  - Different cut-offs for different products/regions
  - Superimposition of other data for filtering/sorting
• Use of output by sales consultants for renegotiating contracts with customers
Project Timeline
• Initial approach to data availability for pilot: 12 weeks
• Data to pilot: 6 weeks
• Model validation by business: 12 weeks
• Pilot deployment (5 products, 1 segment): 6 weeks
• Acceptance by business teams: over 9 months
• Final deployment: 4 weeks
• In operation for more than 8 years!!
Key Success Factors
• Willingness of stakeholders to try non-standard solutions
• Innovative solution: Paper published in KDD 2005
  - Target definition using multiple overlapping time periods to boost the number of rare events for modelling
  - Use of support vector machines for customer analytics
• Being lazy
  - Scope change from 4 to 50 products
  - Scope change from 2 to 4 segments
  - Development of ~200 predictive models in one shot
  - No stale models in production
• Working with business analysts to instigate change:
  - Product-centric modelling to customer-centric product packaging
Case Study 2: Sales Opportunities for wholesalers
Case Study: Wholesale Sales
• Business Problem: Increase wholesale sales into major retailers
• APD: Perform comparisons & find anomalies with stock issues
  - Similar products @ similar outlets have similar demand to sales relationship
  - Anomaly may be due to lack of stock
• DAP
  - Weekly SOH (stock on hand) & sales for each store & SKU
  - SKU master
  - Store master
  - Sales demand
  - Quantify demand
  - Define normalised sell-rate
  - Define a long term in-stock measure
  - Define products & outlets that are similar
• MM: Simple univariate regression in SQL
• MV
  - Absolute error
  - Validate with retail
• P
  - Self-serve report for each sales rep
  - Presents list of products with sales opportunities
  - Click thru’ to detailed graphs
Case Study: Wholesale Sales (Cont’d)
• Sell rate vs consumer demand plot
  - Each point is a store
  - R1 & R2 are comparable retailers
  - Values for the same product
• Possible reasons for difference
  - Competing product at R2
  - Pricing at R2 vs R1
  - Lack of stock at R2
[Figure: scatter plots for retailers R1 and R2; axes: Sell Rate, Demand, In-stock %.]
Case Study: Wholesale Sales (Cont’d)
• Business Problem: Increase wholesale sales into major retailers
• APD: Perform comparisons & find anomalies with stock issues
  - Similar products @ similar outlets have similar demand to sales relationship
  - Anomaly may be due to lack of stock
• DAP
  - Weekly SOH (stock on hand) & sales for each store & SKU
  - SKU master
  - Store master
  - Sales demand
  - Quantify demand
  - Define normalised sell-rate
  - Define a long term in-stock measure
  - Define products & outlets that are similar
• MM: Simple univariate regression in SQL
• MV
  - Absolute error
  - Validate with retail
• P
  - Self-serve report for each sales rep
  - Presents list of products with sales opportunities
  - Click thru’ to detailed graphs
• D
  - SQL & Cognos
  - Automatic weekly updates
  - Training by corporate training team
  - Support from IT helpdesk
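The modelling idea can be sketched as a one-variable least-squares fit followed by a residual check. The case study did this in SQL; Python is used here only for illustration, and the store names, numbers and both thresholds are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def stock_anomalies(stores, max_shortfall, min_in_stock):
    """Stores selling well below the fitted sell rate while often out of stock."""
    slope, intercept = fit_line([s["demand"] for s in stores],
                                [s["sell_rate"] for s in stores])
    flagged = []
    for s in stores:
        expected = intercept + slope * s["demand"]
        shortfall = expected - s["sell_rate"]
        if shortfall > max_shortfall and s["in_stock"] < min_in_stock:
            flagged.append(s["store"])
    return flagged

# Comparable stores for one product (made-up demand, sell rate, in-stock fraction).
stores = [
    {"store": "S1", "demand": 10, "sell_rate": 5.0, "in_stock": 0.95},
    {"store": "S2", "demand": 20, "sell_rate": 10.0, "in_stock": 0.97},
    {"store": "S3", "demand": 30, "sell_rate": 6.0, "in_stock": 0.60},
    {"store": "S4", "demand": 40, "sell_rate": 20.0, "in_stock": 0.96},
    {"store": "S5", "demand": 50, "sell_rate": 25.0, "in_stock": 0.98},
]

flagged = stock_anomalies(stores, max_shortfall=3.0, min_in_stock=0.8)
print(flagged)
```

Requiring a low in-stock measure in addition to the sales shortfall separates a likely stock problem from the other explanations listed for the R1 vs R2 difference, such as pricing or a competing product.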
Take-home points
• Data acquisition & processing phase forms 80-90% of any analytics project
• Business users are tool agnostic
  - R, SAS, Matlab, SPSS, … for statistical analysis
  - Tableau, Cognos, Excel, VB, … for presentation
• Business adoption of analytics driven by
  - Utility of application
  - Validation of results by using real-life cases
  - Ease of decision-making from insights
  - Ability to explain insights