The CRISP Data Mining Process

Presentation Transcript


  1. The CRISP Data Mining Process

  2. The Data Mining Process [Diagram: cycle of phases around a central Data store: Business understanding, Data evaluation, Data preparation, Modeling, Evaluation, Deployment] Data Mining

  3. Business Understanding • Project objectives • Project requirements • DM Problem Formulation • Preliminary Plan

  4. Case Study • Data mining project done for a large insurance company • Consider the use of data mining to improve understanding of customer databases • Led by the data warehousing team, which also wanted to improve its expertise

  5. Business Objectives • Understand what coverage packages are of interest to a customer group • Targeting of new customers • Cross-selling opportunities to existing customers • Understand why a customer group terminates coverage • Know in advance what groups are likely to terminate • Understand what factors influence termination

  6. What are the Goals? • The business goals • Improve customer retention • Increase cross-selling • Success criteria • Customer turnover rate • Amount of cross-selling
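
The two success criteria on the slide above could be tracked with something like the following. This is a minimal Python sketch under stated assumptions: the slides do not define either measure, so the definitions here (terminations divided by customers at the start of a period, and coverages held beyond the first) and the function names are hypothetical.

```python
def turnover_rate(customers_at_start, terminations):
    """Fraction of customers that terminated during a period (assumed definition)."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return terminations / customers_at_start


def cross_sell_amount(coverages_per_customer):
    """Count coverages held beyond the first, summed over existing customers
    (assumed definition of 'amount of cross-selling')."""
    return sum(max(n - 1, 0) for n in coverages_per_customer)


# Illustrative numbers only, not taken from the case study.
example_rate = turnover_rate(customers_at_start=200, terminations=30)
example_cross_sell = cross_sell_amount([1, 3, 2])
```

Comparing these values before and after deployment would show whether retention and cross-selling moved in the intended direction.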

  7. Data Mining Problems • Classify new and existing customers as either interested or not interested in a particular coverage • Classify existing customers as either likely or unlikely to terminate coverage

  8. The Data Mining Process [Diagram: cycle of phases around a central Data store: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment]

  9. Data Evaluation (data warehousing team) • Initial data collection • Data quality • Initial insights • Interesting subsets

  10. Case Study: Data Evaluation • Data was extracted from select customer databases by company personnel • Coverage programs with few customers selected for pilot project • Five separate files extracted for five coverage programs

  11. The Data Mining Process [Diagram: cycle of phases around a central Data store: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment]

  12. Data Preparation (Raw Data → Finished Data Set) • Technical tasks: • Data selection • Attribute selection • Data cleaning

  13. Case Study: Data Preparation • Some initial formatting of data in MS Excel • Cleaning of data file • Combine headers/instances • Add a new attribute: interest (yes/no) • Must create the no-interest cases • End up with a CSV-formatted file
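
The labeling step on the slide above (adding an interest attribute and creating the no-interest cases) can be sketched roughly as follows. This is an illustration only: the column names `group_id` and `zip_code` are hypothetical, and the rule that every customer group not holding the coverage becomes a "no" case is an assumption, since the slides do not say how the negative cases were built.

```python
import csv
import io


def label_interest(all_groups, purchasers):
    """Return one row per customer group with a yes/no 'interest' attribute.

    all_groups: list of dicts, one per customer group (hypothetical schema)
    purchasers: set of group_id values that hold the coverage
    """
    rows = []
    for group in all_groups:
        row = dict(group)  # copy so the input rows are left untouched
        row["interest"] = "yes" if group["group_id"] in purchasers else "no"
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative data, not from the case study.
groups = [
    {"group_id": "G1", "zip_code": "46202"},
    {"group_id": "G2", "zip_code": "47906"},
]
labeled = label_interest(groups, purchasers={"G1"})
csv_text = to_csv(labeled)
```

The resulting text has one header row followed by one instance per line, which is the shape Weka's CSV loader expects.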

  14. Weka Data Mining Software • Data in CSV format loaded into Weka: • Data preprocessing • Attribute selection • Modeling • Classification • Clustering • Association rule mining • Visualization

  15. Data Preprocessing in Weka • Initial data inspection • Missing values • Useless attributes • Numeric attributes as nominal • Some helpful Weka filters • RemoveUseless • ReplaceMissingValues
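
To make the two filters named above concrete, here is a rough Python analogue of what they do. It is a sketch, not Weka's implementation: it assumes missing values are represented as `None`, fills numeric columns with the mean and nominal columns with the mode, and treats a column as useless when it is constant or when it is nominal with almost all values distinct. The 0.99 threshold and the function names are assumptions for illustration.

```python
from collections import Counter
from statistics import mean

MISSING = None  # stand-in for a missing-value marker


def replace_missing(column):
    """Fill missing entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in column if v is not MISSING]
    if not present:
        return list(column)  # nothing to estimate a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is MISSING else v for v in column]


def is_useless(column, max_distinct_fraction=0.99):
    """Flag constant columns, and nominal columns where nearly every value is distinct."""
    present = [v for v in column if v is not MISSING]
    distinct = set(present)
    if len(distinct) <= 1:
        return True
    is_nominal = all(isinstance(v, str) for v in present)
    return is_nominal and len(distinct) / len(present) > max_distinct_fraction
```

An identifier-like nominal attribute (a different value on every row) is flagged by the second check, while a numeric attribute with many distinct values is kept.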

  16. Data Preprocessing in Weka • Data reduction: • Instance dimension • RemovePercentage and Resample filters • Attribute dimension • Remove redundant attributes • Remove irrelevant attributes • Identify most important attributes
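
Reduction along the instance dimension can be pictured with the sketch below. It is a loose Python analogue of the two filters named above, not their actual code: it assumes that removing a percentage drops instances from the front of the data set and that resampling draws with replacement, and the function names and seed are hypothetical.

```python
import random


def remove_percentage(instances, percentage):
    """Drop the first `percentage` percent of instances and keep the rest."""
    cut = round(len(instances) * percentage / 100)
    return instances[cut:]


def resample(instances, sample_percent, seed=1):
    """Draw a random subsample with replacement, sized as a percent of the input."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    size = round(len(instances) * sample_percent / 100)
    return [rng.choice(instances) for _ in range(size)]


data = list(range(10))
kept = remove_percentage(data, 30)
sample = resample(data, 50)
```

Either approach shortens model induction time on large extracts, at the cost of discarding some instances.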

  17. Attribute Selection Methods • Three main methods used: • InfoGain • ChiSquared • Relief • Combined results from complementary methods • Final pruning of attribute list to twenty attributes
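
As an illustration of the first method listed above, the sketch below ranks nominal attributes by information gain, i.e. the reduction in class entropy obtained by splitting on an attribute. It is a minimal Python version written for this transcript, not the Weka evaluator; the attribute names and rows in the example are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """Class entropy minus the weighted entropy after splitting on an attribute."""
    total = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(sub) / total * entropy(sub) for sub in by_value.values())
    return entropy(labels) - remainder


def rank_attributes(table, label_key):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    labels = [row[label_key] for row in table]
    attrs = [k for k in table[0] if k != label_key]
    scores = {a: info_gain([row[a] for row in table], labels) for a in attrs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative rows only, not case-study data.
table = [
    {"small_group_flag": "Y", "state": "IN", "interest": "yes"},
    {"small_group_flag": "Y", "state": "OH", "interest": "yes"},
    {"small_group_flag": "N", "state": "IN", "interest": "no"},
    {"small_group_flag": "N", "state": "OH", "interest": "no"},
]
ranked = rank_attributes(table, "interest")
```

In this toy table the flag separates the classes perfectly and ranks first, while the state attribute carries no information about interest. Rankings from several such scores can then be merged before pruning the list.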

  18. Selected Attributes • Location • Tax State • Contract State • State Code • Zip Code

  19. Selected Attributes • Size • Case Size Range • Industry • Industry Classification • Industry Classification Name • SIC Code

  20. Selected Attributes • Timing • New Sale Flag • Decision Maker Effective Month • Decision Maker Effective Year • Next Renewal Month • Next Renewal Year

  21. Selected Attributes • Internal • Agency Number • Office Name • Pricing Category Code • Product Line Name • Small Group Flag

  22. Relevance of Attribute Selection • Improved modeling • Faster model induction • Higher accuracy • Easier to interpret models • Structural knowledge gained from the selection of attributes

  23. Most Important Attributes • What attributes affect the purchasing decision of a customer group? • E.g., the five most important factors that determine whether a customer group purchases a particular insurance coverage • Agency Number • Small Group Flag • Zip Code • Decision Maker Effective Year • Next Renewal Month

  24. Customer Segmentation • Unique groups of customers • Similar characteristics • Similar behavior in terms of interest in coverage • For example, separate predictive models for customer segments for a particular type of insurance

  25. Customer Segments Used for Modeling • Results • Three segments for one database • Two segments for two databases • One segment for two databases • Continue modeling for each segment independently
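
Modeling each segment independently, as described above, amounts to partitioning the instances by segment and fitting one model per partition. The Python sketch below shows that structure only. The per-segment "model" is a majority-class baseline used as a stand-in (the deck's actual models are decision trees), and the `segment` and `interest` keys are hypothetical.

```python
from collections import Counter, defaultdict


def split_by_segment(rows, segment_key):
    """Partition rows into lists keyed by their segment value."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[segment_key]].append(row)
    return dict(segments)


def fit_majority_baseline(rows, label_key):
    """Stand-in model: always predict the most common label in the segment."""
    return Counter(row[label_key] for row in rows).most_common(1)[0][0]


def fit_per_segment(rows, segment_key, label_key):
    """Fit one stand-in model per segment and return them keyed by segment."""
    return {
        seg: fit_majority_baseline(seg_rows, label_key)
        for seg, seg_rows in split_by_segment(rows, segment_key).items()
    }


# Illustrative rows only.
rows = [
    {"segment": "A", "interest": "yes"},
    {"segment": "A", "interest": "yes"},
    {"segment": "A", "interest": "no"},
    {"segment": "B", "interest": "no"},
]
models = fit_per_segment(rows, "segment", "interest")
```

Swapping `fit_majority_baseline` for a real learner keeps the same per-segment structure.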

  26. The Data Mining Process [Diagram: cycle of phases around a central Data store: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment]

  27. Modeling • Select modeling technique(s) • Calibrate modeling techniques • Make adjustments to data

  28. Modeling • Mathematical models for predicting if a customer is interested in a coverage • Understand why a customer is interested • For example: If a customer’s state is Indiana and the office is Indianapolis_Office1, then the customer is interested in Coverage_3
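
The example rule on the slide above translates directly into a predicate. The Python sketch below assumes a customer is represented as a dict with hypothetical keys `state` and `office`; only the rule's values (Indiana, Indianapolis_Office1, Coverage_3) come from the slide.

```python
def interested_in_coverage_3(customer):
    """Apply the slide's example rule: Indiana + Indianapolis_Office1 -> interested."""
    return (
        customer.get("state") == "Indiana"
        and customer.get("office") == "Indianapolis_Office1"
    )


# Illustrative customers only.
match = interested_in_coverage_3({"state": "Indiana", "office": "Indianapolis_Office1"})
no_match = interested_in_coverage_3({"state": "Indiana", "office": "Other_Office"})
```

A rule in this form is what makes the model easy to explain: each condition names an attribute a business user already recognizes.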

  29. Modeling Techniques • Three modeling techniques tried for predicting customer interest: • Decision trees • Artificial neural networks (ANN) • Support vector machines (SVM) • Decision trees have the advantage of transparency • ANN and SVM did not have significantly better prediction accuracy

  30. Insurance Coverage Interest (Type 6) [Figure: decision tree with splits on Small Group Flag (Y / N) and Product Line Name (Group_1 / Group_2); leaves: Yes / No]

  31. Insurance Coverage Interest (Type 7) [Figure: decision tree with splits on Pricing Category Code (A2, A4, Others; some branches omitted), Industry Classification Name (Transportation_and_Public_Utilities, Legal_Services), Next Renewal Year (> 2002 / <= 2002 and > 2000 / <= 2000), and Agency Number (<= 430 / > 430); Group_1 and Group_2 also appear as branch values; leaves: Yes / No]

  32. Accuracy of Predicting Customer Interest (Coverage: Accuracy) • Type 1: 84.0% • Type 2: 97.2% • Type 3: 98.3% • Type 4: 99.5% • Type 5: 88.4% • Type 6: 100% • Type 7: 76.3% • Type 8: 85.0% • Type 9: 94.8%
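
The accuracy figures above are the share of customers whose predicted label matches the actual one. A minimal Python sketch of that calculation follows. The labels in the example are invented, and the slides do not say whether accuracy was measured on held-out data or by cross-validation.

```python
def accuracy(predicted, actual):
    """Fraction of instances where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only: three of four predictions are correct.
example_accuracy = accuracy(
    predicted=["yes", "no", "yes", "no"],
    actual=["yes", "no", "no", "no"],
)
```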

  33. Modeling • Mathematical models for predicting if a customer will terminate coverage • Why do customers terminate a specific type of coverage? • What are the important factors in a customer’s decision to terminate coverage?

  34. Who Terminates Type 3 Coverage? • Correct for 95% of customers [Figure: decision tree with splits on Customer Effective Year, Coverage Effective Year, and Next Renewal Month; leaves: Terminated / Active]

  35. Who Terminates Type 1 Coverage? • Decision tree based on: • Distribution number • Underwriting department number • Price category • Rate type • Rate Plan Year • Predicts 96.3% of terminations correctly

  36. Accuracy of Predicting Termination (Model: Accuracy) • Type 1: 96.3% • Type 2: 96.5% • Type 3: 95.3% • Type 4: 88.9% • Type 5: 88.3%

  37. The Data Mining Process [Diagram: cycle of phases around a central Data store: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment]

  38. Evaluation • Data analysis results in a good model • Are business objectives being achieved? • Is there an important business issue that has not been considered? • Should the results be used?

  39. The Data Mining Process [Diagram: cycle of phases around a central Data store: Business objectives, Data evaluation, Data preparation, Modeling, Evaluation, Deployment]

  40. Deployment • Incorporate the results in the organization’s decision-making process • Report • Decision support system • Personalization of web pages • Repeatable data mining process
