Science in Business Data Mining?

Presentation Transcript


  1. Science in Business Data Mining?
   • Background: support managerial decision making
   • Is there a science to data mining (with CI-methods, i.e. computational-intelligence methods)? YES, but it depends (and it may be empirical wizardry driven by efficiency rather than effectiveness!)
   Outline
   • Data mining in business & management
   • Rules established in business practices vs. data mining?
   • Statistics vs. data-driven modelling
   • A personal view
   • How to develop meta-knowledge
   Sven F. Crone, Lancaster University Management School, Research Centre for Forecasting

  2. Business Data Mining?
   • Main areas for data mining:
      • Finance: credit risk (personal & corporate)
      • Marketing: Customer Relationship Management (= direct marketing, database marketing)
   [Figure: application areas, including churn prediction, direct marketing and credit scoring; adapted from Berry and Linoff (2004) and Olafson et al. (2006)]

  3. Practitioners & consultants use statistics
   Best practices
   • Credit scoring: large & imbalanced samples
      • Use large sample sizes
      • Keep the original (imbalanced) class distribution
      • …
   • Cross-selling: small & balanced classes
      • Use 2000 cases of the minority class
      • Use undersampling
   • Discretise all (!) variables
   • Binary dummies / WOE (weight of evidence) to capture non-linearity
   • Use logistic regression
   • Extensive use of expert domain knowledge → efficient solution ≠ best solution
   GAP
   A personal view:
   • Data selection works best when guided by prior domain knowledge (use filters)
   • Pre-processing matters more than the choice of method [Crone et al, 2006; Keogh 2002]
   • (Balanced) sampling & pre-processing are method-dependent
   • Best practices exist & are domain-dependent (e.g. homogeneous datasets in credit scoring)
   • Flat maximum effect [Lovie & Lovie, 1986]: many different models or weightings reach almost the same predictive accuracy
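The cross-selling practice of undersampling the majority class down to the minority class, and the weight-of-evidence (WOE) coding of discretised variables used before logistic regression, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names `undersample` and `woe_table`, the toy records and the 0.5 smoothing constant are hypothetical and not taken from the slides. WOE is computed as ln(share of goods in bin / share of bads in bin), which is one common convention; some texts invert the ratio.

```python
import math
import random
from collections import Counter


def undersample(records, label_key="y", minority_n=2000, seed=0):
    """Keep up to `minority_n` cases of the rarer class and draw the same
    number of cases from the other class at random (hypothetical helper)."""
    counts = Counter(r[label_key] for r in records)
    minority_label = min(counts, key=counts.get)
    minority = [r for r in records if r[label_key] == minority_label]
    majority = [r for r in records if r[label_key] != minority_label]
    rng = random.Random(seed)
    minority = rng.sample(minority, min(minority_n, len(minority)))
    majority = rng.sample(majority, min(len(minority), len(majority)))
    balanced = minority + majority
    rng.shuffle(balanced)
    return balanced


def woe_table(records, feature, label_key="y", smoothing=0.5):
    """Weight of evidence for each bin of an already discretised feature,
    using WOE = ln(share of goods in bin / share of bads in bin),
    with label 0 = good and 1 = bad. The additive `smoothing` term only
    guards against log(0) in empty cells (an assumption of this sketch)."""
    goods = Counter(r[feature] for r in records if r[label_key] == 0)
    bads = Counter(r[feature] for r in records if r[label_key] == 1)
    bins = sorted(set(goods) | set(bads))
    total_good = sum(goods.values()) + smoothing * len(bins)
    total_bad = sum(bads.values()) + smoothing * len(bins)
    return {
        b: math.log(((goods[b] + smoothing) / total_good)
                    / ((bads[b] + smoothing) / total_bad))
        for b in bins
    }


# Toy data (made up for illustration): an "age band" variable, 50 bads in 1000.
data = ([{"age": "18-25", "y": 1}] * 30 + [{"age": "18-25", "y": 0}] * 170
        + [{"age": "26-60", "y": 1}] * 20 + [{"age": "26-60", "y": 0}] * 780)

balanced = undersample(data)          # all 50 bads plus 50 randomly drawn goods
print(Counter(r["y"] for r in balanced))
print(woe_table(data, "age"))         # negative WOE marks the riskier band
```

In a scorecard workflow the WOE value of each bin would then replace the raw category as the input to logistic regression; binary dummies per bin are the alternative named on the slide.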

  4. How to derive (meta-)knowledge?
   • Lessons from other disciplines: time series forecasting
      • More "evidence-based methods" [Armstrong 2000]
   • Empirical evidence
      • Conditions under which methods perform well (multiple hypotheses)
      • Domain-specific competitions (valid & reliable)
      • Multiple out-of-sample evaluations (≠ single fold, one origin)
      • Multiple homogeneous datasets from one domain
      • Use of valid benchmark methods & unbiased error measures
      • Honour the domain & decision context (active learning, cost-sensitive)
   • Replications
      • Studies must allow replication: document all steps / parameters
   • STOP FINE-TUNING / MARGINALLY EXTENDING A SINGLE METHOD ON A SINGLE TOY DATASET
   • Develop solutions for a domain (why make life harder?)
   • Where to start? → Follow a high-impact approach!
      • Identify the most prominent application domains (e.g. credit risk)
      • Select promising application domains for CI-methods
      • Get a corporate sponsor & run a competition
      • Analyse conditions (!) using meta-studies!
      • Embed findings as methodology in SOFTWARE
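The call for multiple out-of-sample evaluations from many forecast origins, measured against a valid benchmark, can be sketched as a rolling-origin evaluation. This is a minimal sketch under assumptions that do not come from the slides: simple exponential smoothing (with a fixed, hypothetical `alpha = 0.3`) stands in for the method under test, the random-walk (naive) forecast is the benchmark, relative MAE is used as one possible scale-free error measure, and the toy series is made up.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Random-walk benchmark: repeat the last observation."""
    return [history[-1]] * horizon


def ses_forecast(history, horizon, alpha=0.3):
    """Simple exponential smoothing with a fixed smoothing weight `alpha`."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def rolling_origin_errors(series, method, first_origin, horizon=1):
    """One MAE per forecast origin: fit on series[:origin], forecast the next
    `horizon` values, then roll the origin forward by one period."""
    errors = []
    for origin in range(first_origin, len(series) - horizon + 1):
        history, actual = series[:origin], series[origin:origin + horizon]
        forecast = method(history, horizon)
        errors.append(mean(abs(a - f) for a, f in zip(actual, forecast)))
    return errors


def relative_mae(series, method, benchmark, first_origin, horizon=1):
    """MAE of `method` over all origins divided by that of `benchmark`;
    values below 1 mean the method beats the benchmark."""
    return (mean(rolling_origin_errors(series, method, first_origin, horizon))
            / mean(rolling_origin_errors(series, benchmark, first_origin, horizon)))


series = [10, 12] * 6                       # toy series, made up for illustration
per_origin = rolling_origin_errors(series, ses_forecast, first_origin=4)
print(len(per_origin), per_origin)          # 8 origins; the error varies by origin
print(relative_mae(series, ses_forecast, naive_forecast, first_origin=4))
```

Reporting the full list of per-origin errors, rather than the result of a single split, shows how much a single-origin comparison could depend on where the hold-out period happens to start.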

  5. Literature
   • Ayres (2007) Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart, Bantam
   • Davenport, Harris (2007) Competing on Analytics: The New Science of Winning, Harvard Business School Press
   • Fildes, Nikolopoulos, Crone, Syntetos (2009) Forecasting and Operational Research: a Review, JORS, forthcoming
   • Finlay, Crone (under review) Sampling issues in Credit Scoring: the effect of sample size and sample distribution on predictive accuracy, EJOR
   • Keogh, Kasetty (2002, 2004) On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration, SIGKDD'02 & Data Mining Journal
