Data Mining in Industry: Putting Theory into Practice. Bhavani Raskutti. Agenda. What do analysts in industry actually do? Who are our customers & colleagues? What resources do we use? Who uses analytics in Australian Industry? Case studies. Take-home Points.
Data Mining in Industry: Putting Theory into Practice
Bhavani Raskutti
Agenda • What do analysts in industry actually do? • Who are our customers & colleagues? • What resources do we use? • Who uses analytics in Australian Industry? • Case studies • Take-home Points
What do analysts in industry actually do?
• Goal: business understanding of complex trends, to make strategic & operational decisions
• Business Problem
• Problem Definition (PD)
• Data Acquisition & Preparation (DAP): produces the Data Matrix
• Mathematical Modelling (MM): algorithms
• Presentation (P): insights via GUI, decision-making by users
• Deployment (D): automation, training, documentation, IT support
• Initial development: iterative, 90% DAP
Who are our customers & colleagues?
• Customers of analytics: Marketing, Design, Business/Corporate, Information Technology, Sales, Supply Chain, Senior Management
• Analytics colleagues
  - Market Research: behavioural analysis; psych/mktg/SocSc graduates
  - Business Intelligence: historical reporting; CS/IT graduates
  - Data Mining: statistical analysis, machine learning; Maths/Stats/Science graduates
What resources do we use?
• Data Extraction
  - SQL: from databases such as Oracle, DB2, MySQL, …
• Exploratory/Visualisation
  - Tableau: Multi-dimensional visual analysis, with the ability to publish and connectivity to most databases
  - QlikView: Very similar to Tableau, a later entrant into Australia
  - Excel: Great for exploration, although many businesses use it as their only analysis tool
• Statistical Modelling
  - Expensive commercial tools, used in the financial & telecommunications industries:
    SAS: Industry leader with a broad statistical service offering, but the license is expensive
    KXEN: Recent entrant, but innovative, with a particular focus on large datasets & automation
    Salford Systems: Well-established leader with a focus on regression trees and explainable models
    SPSS, Statistica, Matlab: Niche players appealing to certain communities
  - Open source or low-priced data mining tools:
    Weka is open source software issued under the GNU General Public License
    RapidMiner is available under a dual license: a GNU license or a proprietary license
    R is a free software environment for statistical computing and graphics. Needs compilation.
• Presentation
  - Cognos, Business Objects, Tableau, …
Who uses analytics in Australian industry? • Government, Utilities, Pharmaceuticals, Manufacturing, Web service providers, … • Consulting firms, Data mining vendors, Market research firms, …
Case Study: Wholesale Industry
• Business problem: Increase wholesale sales into major retailers
• Problem Definition (PD): Perform comparisons & find anomalies with stock issues
  - Sales demand
  - Similar products at similar outlets have a similar demand-to-sales relationship
  - An anomaly may be due to lack of stock
• Data Acquisition & Preparation (DAP)
  - Data: weekly SOH (stock on hand) & sales for each store & SKU; SKU master; store master
  - Quantify demand
  - Define normalised sell-rate
  - Define a long-term in-stock measure
  - Define products & outlets that are similar
• Mathematical Modelling (MM)
  - Simple univariate regression in SQL
• Presentation (P)
  - Self-serve report in Cognos for each sales rep
  - Presents a list of products with opportunities
  - Opportunities click through to detailed graphs showing demand, sales & stock position of the two products compared
Case Study: Wholesale Industry (Cont’d)
[Figure: charts comparing R1 and R2; axes: sell rate vs. demand, and in-stock % vs. demand]
Case Study: Wholesale Industry (Cont’d)
• Deployment (D)
  - Implementation in SQL & Cognos
  - Data marts for reports updated weekly
  - Documentation on intranet wiki
  - Training by corporate training team
  - Support from IT helpdesk
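As a rough illustration of the modelling step in this case study, the sketch below fits a univariate least-squares line of sell rate against demand across a group of similar product/outlet pairs, then flags pairs that sell well below the fitted line while also having a low in-stock percentage. The slides say the real implementation was in SQL; Python is used here only for readability. The `Pair` fields, the two thresholds and the way shortfall is measured are all hypothetical, not taken from the talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pair:
    """One product/outlet pair. All fields are illustrative assumptions."""
    name: str
    demand: float      # quantified demand (hypothetical units)
    sell_rate: float   # normalised sell rate
    in_stock: float    # long-term in-stock fraction, 0..1


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def find_opportunities(pairs, max_in_stock=0.8, min_shortfall=0.2):
    """Return names of pairs selling below the fitted line with low in-stock."""
    a, b = fit_line([p.demand for p in pairs], [p.sell_rate for p in pairs])
    flagged = []
    for p in pairs:
        expected = a + b * p.demand
        if expected <= 0:
            continue  # relative shortfall is undefined here
        shortfall = (expected - p.sell_rate) / expected
        if shortfall >= min_shortfall and p.in_stock <= max_in_stock:
            flagged.append(p.name)
    return flagged
```

Note that the anomalous pair is itself part of the fit in this sketch, so a very small group can mask its own outliers; a more careful version might fit on well-stocked pairs only.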
Case Study: Telecommunications Industry
• Business problem: Increase revenue from business customers
  - Win-back? Winning back customers is hard
  - Stop churn? Churn is hard to identify and harder to prevent
  - Upsell? Upsell to existing customers increases retention & revenue
• Problem Definition (PD): Create models to predict customers likely to take up a product soon
• Data Acquisition & Preparation (DAP)
  - Data: satisfaction survey; service assurance; demographics; quarterly revenue from different products for each customer
  - Imbalanced data: too few examples of take-up for most products
  - Data aggregation & interleaving
  - Comparable predictors from revenue: raw, change from previous, projected
  - Use values as is & normalised
  - Binarise using 10 equi-size bins
  [Figure: interleaving of quarters into predictors and labels. Training rows use the quarter windows (i-5, i-4, i-3, i-2), (i-4, i-3, i-2, i-1) and (i-3, i-2, i-1, i); the prediction row uses (i-1, i, i+1, i+2).]
• Mathematical Modelling (MM)
  - SVMs to score with likelihood of take-up
  - Weighting by value of take-up to find high-value take-up
• Presentation (P): Excel spreadsheet with potential customer list
  - Take-up likelihood for all modelled products
  - Last quarter revenue for all products
• Deployment (D)
  - Implementation in Matlab & C
  - Different predictive models for over 50 products in 4 segments
  - Automatic updates every quarter
  - Used by sales consultants to re-negotiate contracts
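The data preparation steps in this case study can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the implementation described in the talk (which was in Matlab & C): "equi-size bins" is read here as equal-frequency bins, "projected" revenue is taken to be a simple linear extrapolation of the last change, and the function names are hypothetical. How each window of quarters is split into predictors and labels is not recoverable from the slide, so the sketch only generates the windows.

```python
def revenue_predictors(prev, curr):
    """Derive comparable predictors from two consecutive quarterly revenues.

    The 'projected' value is an assumed linear extrapolation.
    """
    change = curr - prev
    return {"raw": curr, "change": change, "projected": curr + change}


def equal_frequency_bins(values, n_bins=10):
    """Give each value a bin index so bins hold (nearly) equal counts.

    Binning is by rank, so tied values may fall into different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def one_hot(bin_index, n_bins=10):
    """Binarise a bin index into n_bins indicator features."""
    return [1 if k == bin_index else 0 for k in range(n_bins)]


def sliding_windows(quarters, width=4):
    """Interleave history into overlapping windows of consecutive quarters."""
    return [quarters[s:s + width] for s in range(len(quarters) - width + 1)]
```

For example, `sliding_windows(["i-5", "i-4", "i-3", "i-2", "i-1", "i"])` yields the three training windows listed on the slide, which is one way a single customer history can contribute several training examples when take-up examples are scarce.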
Case Study: Telecommunications Industry (Cont’d)
• Evaluation: Piloted predictive modelling in 2 different regions
  - Region 1: 9 new opportunities from just 5 products, with an increase in revenue of ~A$400K
  - Region 2: Opportunities identified were already being processed by sales consultants
• Conclusion: Predictive modelling is better than the previous manual process
  - Identifies more opportunities
  - Spreads techniques of good sales teams across the whole organisation
• Deployed in 2004 & still operational
• For more details, refer to “Predicting Product Purchase Patterns for Corporate Customers” by Bhavani Raskutti & Alan Herschtal in Proceedings of KDD’05, Chicago, Illinois, USA
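The value weighting used in this case study to surface high-value opportunities might look something like the sketch below. It assumes each model score has already been mapped to a take-up likelihood between 0 and 1 and that the weighting is a plain product of likelihood and estimated value; neither detail is stated in the slides, and the function and field names are hypothetical.

```python
def rank_opportunities(candidates):
    """Rank (customer, product, likelihood, value) tuples by weighted score.

    The score is likelihood * value, an assumed form of the weighting.
    """
    scored = [
        (customer, product, likelihood * value)
        for customer, product, likelihood, value in candidates
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)
```

Under this assumption, a less likely but much larger take-up can outrank a near-certain small one, which is the usual reason to weight by value rather than rank on likelihood alone.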
Take-home points
• The data acquisition & processing phase forms 80-90% of any analytics project
• Business users are tool agnostic
  - R, SAS, Matlab, SPSS, … for statistical analysis
  - Tableau, Cognos, Excel, VB, … for presentation
• Business adoption of analytics is driven by
  - Utility of application
  - Ease of decision-making from insights
  - Ability to explain insights