Explore key success factors, illustrations, and steps in the Data Mining process within the Business Intelligence context. Learn about data extraction, storage, visualization, and reporting for effective decision-making.
Data Mining Process, Key Success Factors, Illustrations
Data Mining in the BI Context • Data Extraction: Collecting / Transforming • Data Storage: Storing / Aggregating / Historising • Business Intelligence: • Visualization: Reporting / EIS / MIS • Exploration: OLAP / Data Analysis • Discovery: Data Mining
What Is Data Mining? Business Definition • Deployment of business processes, supported by adequate analytical techniques, to: • Take further advantage of data • Discover relevant knowledge • Act on the results
CRISP-DM: phases, generic tasks, and their outputs
Business Understanding • Determine Business Objectives: Background; Business Objectives; Business Success Criteria • Assess Situation: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits • Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria • Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
Data Understanding • Collect Initial Data: Initial Data Collection Report • Describe Data: Data Description Report • Explore Data: Data Exploration Report • Verify Data Quality: Data Quality Report
Data Preparation (outputs: Data Set; Data Set Description) • Select Data: Rationale for Inclusion / Exclusion • Clean Data: Data Cleaning Report • Construct Data: Derived Attributes; Generated Records • Integrate Data: Merged Data • Format Data: Reformatted Data
Modeling • Select Modeling Technique: Modeling Technique; Modeling Assumptions • Generate Test Design: Test Design • Build Model: Parameter Settings; Models; Model Description • Assess Model: Model Assessment; Revised Parameter Settings
Evaluation • Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models • Review Process: Review of Process • Determine Next Steps: List of Possible Actions; Decision
Deployment • Plan Deployment: Deployment Plan • Plan Monitoring and Maintenance: Monitoring and Maintenance Plan • Produce Final Report: Final Report; Final Presentation • Review Project: Experience Documentation
DOCUMENT EVERYTHING!
Data Mining Tasks • Summarization • Classification / Prediction: classification, concept learning, regression • Clustering • Dependency modeling • Anomaly detection • Link analysis
How Well Do They Do? • 0-13,136: Poor (21) • 13,136-19,453: Fair (91) • 19,453-25,769: Good (90) • 25,769-32,086: Excellent (39) • 32,086+: Outstanding (15)
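The chart above sorts some quantity (the slide does not give its unit) into five rating bands and reports a count per band. A minimal sketch that assigns a value to its band and turns the counts into shares; treating a value that falls exactly on a boundary as part of the higher band is an assumption, since the slide's ranges share their end points.

```python
from bisect import bisect_right

# Band boundaries, labels and counts copied from the slide; the measured quantity is unknown.
BOUNDS = [13136, 19453, 25769, 32086]
LABELS = ["Poor", "Fair", "Good", "Excellent", "Outstanding"]
COUNTS = {"Poor": 21, "Fair": 91, "Good": 90, "Excellent": 39, "Outstanding": 15}

def rating(value: float) -> str:
    """Map a value to its band; a value on a boundary goes to the higher band (assumption)."""
    return LABELS[bisect_right(BOUNDS, value)]

def band_shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of all counted items that falls into each band."""
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}

if __name__ == "__main__":
    print(rating(20000))  # Good
    for label, share in band_shares(COUNTS).items():
        print(f"{label:12s} {COUNTS[label]:4d} {share:6.1%}")
```

`bisect_right` keeps the lookup to a binary search over the sorted boundaries, so adding or moving a band only means editing the two lists.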
Situation & Goal • Poor understanding of customers and their behavior • Short audit: • Sound data warehouse (DWH), only 2 years old and not fully populated • Limited data on purchases and subscriptions • Potential goals: • Associations of products that sell together • Segmentation of customers
Summarization / Aggregation • Revenue distribution: • 80% of revenue generated by 41.5% of subscribers • 60% of revenue generated by 18.3% of subscribers • 42.9% of revenue generated by the top 5 products • Simple customer classes: • Subscribers over 65 are the most profitable • Subscribers under 16 are the least profitable • Birthdate filled in for only about 10% of subscribers!
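The revenue-distribution figures come from a cumulative (Pareto) analysis: rank subscribers by revenue, accumulate, and find the smallest top fraction that reaches a target share of the total. A minimal sketch over a plain list of per-subscriber revenues; the toy values are invented for illustration and are not the case-study data.

```python
def share_of_customers_for(revenues: list[float], target: float) -> float:
    """Smallest fraction of customers, taken from the highest revenue down,
    whose combined revenue reaches `target` (e.g. 0.8) of total revenue."""
    ordered = sorted(revenues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for i, revenue in enumerate(ordered, start=1):
        running += revenue
        if running >= target * total:
            return i / len(ordered)
    return 1.0

def top_k_share(revenues: list[float], k: int) -> float:
    """Share of total revenue generated by the k largest contributors (e.g. top 5 products)."""
    ordered = sorted(revenues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical per-subscriber revenues, invented for illustration.
revenues = [50, 20, 10, 10, 5, 5]

if __name__ == "__main__":
    print(share_of_customers_for(revenues, 0.8))  # 0.5
    print(top_k_share(revenues, 2))  # 0.7
```

The same `top_k_share` applies unchanged to per-product revenues, which is how a figure like "42.9% generated by top 5 products" is obtained.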
Product Association • About 21% of subscribers buy P4, P7 and P9 • P4 is the most profitable product • P7 ranks 6th • P9 ranks 15th, with only 2% of revenue • Several possible actions: • Offer these products as a bundle • Cross-sell from P9 to P4 • Resist the temptation to drop P9
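"About 21% of subscribers buy P4, P7 and P9" is a support figure, and the cross-sell idea corresponds to an association rule P9 → P4, usually judged by its confidence and lift. A minimal sketch of these standard measures; the baskets are invented toy data, and `rule_lift` is a different quantity from the targeting lift Lift(c) = CR(c) / c used for response models.

```python
def support(baskets: list[set[str]], items: set[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(baskets, antecedent | consequent) / support(baskets, antecedent)

def rule_lift(baskets: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Confidence relative to the consequent's baseline support;
    > 1 means the items co-occur more often than if they were independent."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)

# Hypothetical subscriber baskets (one set of products per subscriber).
baskets = [
    {"P4", "P7", "P9"}, {"P4", "P7", "P9"}, {"P4", "P9"}, {"P4"},
    {"P7"}, {"P1", "P2"}, {"P1", "P4", "P9"}, {"P2"},
]

if __name__ == "__main__":
    print(support(baskets, {"P4", "P7", "P9"}))  # 0.25
    print(confidence(baskets, {"P9"}, {"P4"}))  # 1.0
    print(rule_lift(baskets, {"P9"}, {"P4"}))  # 1.6
```

A rule with high confidence from a low-revenue product (P9) to a high-revenue one (P4) is exactly why dropping the low-revenue product can be a mistake.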
Clustering • A segment of 30% of customers who buy only a single yearly product!
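The slide reports a single clustering result and does not say which algorithm or features were used. As a stand-in, here is plain k-means (Lloyd's algorithm) on two hypothetical features per customer: distinct products bought and purchases per year. The toy data are invented so the result is easy to check by hand.

```python
import math

Point = tuple[float, ...]

def kmeans(points: list[Point], k: int, iters: int = 20) -> list[int]:
    """Lloyd's k-means. Centroids start at the first k points, which keeps the
    sketch deterministic; real use would prefer k-means++ or several restarts."""
    centroids = [list(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels

# Hypothetical toy data: (distinct products bought, purchases per year) per customer.
customers: list[Point] = [
    (1, 1), (4, 12), (2, 4),
    (1, 1), (1, 1),
    (5, 10), (3, 11),
    (3, 5), (2, 3), (3, 4),
]

if __name__ == "__main__":
    labels = kmeans(customers, k=3)
    single_product = labels[0]  # cluster of the first (1, 1) customer
    share = labels.count(single_product) / len(customers)
    print(f"single-product segment: {share:.0%} of customers")  # 30%
```

With real data the features should be standardized first, since k-means uses raw Euclidean distance and a feature such as revenue would otherwise dominate a product count.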
Summary of Findings • Data Mining found: • A small percentage of the customers is responsible for a large share of the sales • Several groups of « strongly-connected » articles • A sizeable group of subscribers who buy a single article • Lessons learned: • First 2 findings: « we knew that! » (BUT: scientific confirmation of business observation) • 3rd finding: « we could target these customers with a special offer! » • Lack of relevant data: the structure is in place but not being used systematically
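Lift • Lift(c) = CR(c) / c, where c is the fraction of prospects selected (highest model scores first) and CR(c) is the fraction of all responders found in that selection • Example: Lift(25%) = CR(25%) / 25% = 62% / 25% = 2.48 ≈ 2.5 • If we mail the 25% of prospects ranked highest by the model, they are about 2.5 times as likely to respond as a randomly selected 25%.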
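The lift formula can be computed directly from model scores and observed responses. A minimal sketch, assuming one score and one yes/no response per prospect; the eight-prospect data set is invented for illustration. The slide's own example works the same way: CR(25%) = 62% gives 0.62 / 0.25 = 2.48.

```python
def lift_at(scores: list[float], responded: list[bool], c: float) -> float:
    """Lift(c) = CR(c) / c: the share of all responders captured in the top
    fraction c of prospects (ranked by model score), divided by c."""
    if not 0 < c <= 1:
        raise ValueError("c must be in (0, 1]")
    total_responders = sum(responded)
    if total_responders == 0:
        raise ValueError("need at least one responder")
    # Rank prospects from highest to lowest score.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    # Select whole prospects; the achieved fraction n_top / n is used in place
    # of c so that rounding does not distort the ratio.
    n_top = max(1, round(c * len(ranked)))
    captured = sum(hit for _, hit in ranked[:n_top])
    cumulative_response = captured / total_responders
    return cumulative_response / (n_top / len(ranked))

# Hypothetical scored prospects: 3 responders out of 8.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
responded = [True, True, False, True, False, False, False, False]

if __name__ == "__main__":
    print(round(lift_at(scores, responded, 0.25), 2))  # 2.67
```

Lift at c = 100% is always 1, since mailing everyone captures every responder; the useful information is how far above 1 the curve sits for small c.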
Expected ROI • Assume: • 200 seminars per year • €0.41 per stamp • €200 per seminar • With the model: send half as many mailings for the same response (response rate rises from 0.1% to 0.2%)
Approach & Cost • Fixed price: €5,000 • Decision: No!?
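The ROI slide does not state how many prospects each seminar mailing goes to, so the return cannot be computed outright. What the slides' own numbers do give is a break-even point. A minimal sketch, assuming savings come only from postage (€0.41 per stamp, half as many mailings at equal response); the €200 per seminar figure is left out because the slide does not say what it refers to, and `mailings_per_seminar` is a hypothetical input.

```python
import math

# Figures from the "Expected ROI" and "Approach & Cost" slides.
SEMINARS_PER_YEAR = 200
STAMP_COST_EUR = 0.41
MAILING_REDUCTION = 0.5  # "send half as many, same response"
FIXED_PRICE_EUR = 5_000

def annual_postage_savings(mailings_per_seminar: int) -> float:
    """Postage saved per year if every seminar mailing is halved with no loss of response.
    `mailings_per_seminar` (original list size) is hypothetical: the slide omits it."""
    return SEMINARS_PER_YEAR * mailings_per_seminar * MAILING_REDUCTION * STAMP_COST_EUR

def break_even_mailings() -> int:
    """Smallest original mailing size per seminar for which one year of
    postage savings covers the fixed price of the data mining project."""
    savings_per_prospect = SEMINARS_PER_YEAR * MAILING_REDUCTION * STAMP_COST_EUR  # about €41 per year
    return math.ceil(FIXED_PRICE_EUR / savings_per_prospect)

if __name__ == "__main__":
    print(break_even_mailings())  # 122
```

Under these assumptions every prospect on a seminar's original list is worth about €41 a year in saved postage, so any mailing of 122 prospects or more pays back the €5,000 within the first year.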
Eight Laws (I) • Business/domain objectives are the origin of every data mining solution • Business/domain knowledge is central to every step of the data mining process • Data preparation is more than half of every data mining process • The right model for a given application can only be discovered by experiment
Eight Laws (II) • There are always patterns • Data mining amplifies perception in the domain • The value of data mining results is not determined by the accuracy or stability of predictive models • All patterns are subject to change
The Right Expectation • Data Mining is unlikely to produce surprising results that will utterly transform a business. Rather: • Early results: insights about data and scientific confirmation of human experience/intuition • Beyond: steady improvement to an already successful organization • Occasionally: discovery of one rare/highly valuable piece of knowledge
The Right Organization • Data Mining is not sophisticated enough to be substituted for domain knowledge or for experience in analysis and model building. • Rather: • Data Mining is a joint venture • “… put teams together that have a variety of skills (e.g., statistics, business and IT skills), are creative and are close to the business thinking.”
Key Success Factors • Have a clearly articulated business problem that needs to be solved and for which Data Mining is the adequate technology • Ensure that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity • Recognize that Data Mining is a process with many components and dependencies • Plan to learn from the Data Mining process whatever the outcome
Tips (I) • Don’t wait to get started – the competition is only a mouse click away • Begin with the end in mind • It’s the decision maker, stupid! • Unless there’s a method, there’s madness • Better data means better results
Tips (II) • Twyman’s law: any statistic that appears interesting is almost certainly a mistake (double-check all findings) • Avoid the OLAP trap • Deployment is the key to data mining ROI • Champions train so they can win the race
Crawl, Walk, Run • Exploratory Workshop / Brainstorm • Identify potential profitable applications • Data Audit • Assess data quality and relevance • Identify shortcomings • Suggest ways to enrich data (internal and external) • Domain-relevant Case Studies (start small) • Refine list of applications to produce well-defined, actionable, domain-relevant case studies • Select 1 or more case studies as « pilots » • Scale-up