Data Mining

Data Mining. David L. Olson, James & H.K. Stuart Professor in MIS, University of Nebraska Lincoln. Definition. DATA MINING: exploration & analysis by automatic means of large quantities of data to discover actionable patterns & rules.

Presentation Transcript


  1. Data Mining • David L. Olson, James & H.K. Stuart Professor in MIS, University of Nebraska Lincoln

  2. Definition • DATA MINING: exploration & analysis • by automatic means • of large quantities of data • to discover actionable patterns & rules • Data mining is a way to utilize the massive quantities of data that businesses generate

  3. Political Data Mining (Grossman et al., 10/18/2004, Time, 38) • 2004 Election • Republicans: VoterVault • From mid-1990s • About 165 million voters • Massive get-out-the-vote drive for those expected to vote Republican • Democrats: Demzilla • Also about 165 million voters • Names typically have 200 to 400 information items

  4. Medical Diagnosis (J. Morris, Health Management Technology, Nov 2004, 20, 22-24) • Electronic Medical Records • Associated Cardiovascular Consultants • 31 physicians • 40,000 patients per year, southern NJ • Data mined to identify efficient medical practice • Enhance patient outcomes • Reduced medical liability insurance

  5. Mayo Clinic (Swartz, Information Management Journal, Nov/Dec 2004, 8) • IBM developed EMR program • Complete records on almost 4.4 million patients • Doctors can ask how the last 100 Mayo patients with the same gender, age, and medical history responded to particular treatments

  6. Retail Outlets • Bar coding & Scanning generate masses of data • customer service • inventory control • MICROMARKETING • CUSTOMER PROFITABILITY ANALYSIS • MARKET BASKET ANALYSIS

  7. FINGERHUT • Founded 1948 • today sends out 130 different catalogs • to over 65 million customers • 6 terabyte data warehouse • 3000 variables of 12 million most active customers • over 300 predictive models • Focused marketing

  8. Fingerhut • Purchased by Federated Department Stores for $1.7 billion in 1999 (for database) • Fingerhut had $1.6 to $2 billion business per year, targeted at lower-income households • Can mail 400,000 packages per day • Each product line has its own catalog

  9. Fingerhut • Uses segmentation, decision tree, regression, neural network tools from SAS and SPSS • Segmentation - combines order & demographic data with product offerings • can target mailings to greatest payoff • customers who had recently moved tripled their purchasing 12 weeks after the move • send furniture, telephone, decoration catalogs

  10. Data for SEGMENTATION

      cluster indices

      subj  age  income  marital  grocery  dine out  savings
      1001  53   80000   wife     180      90        30000
      1002  48   120000  husband  120      110       20000
      1003  32   90000   single   30       160       5000
      1004  26   40000   wife     80       40        0
      1005  51   90000   wife     110      90        20000
      1006  59   150000  wife     160      120       30000
      1007  43   120000  husband  140      110       10000
      1008  38   160000  wife     80       130       15000
      1009  35   70000   single   40       170       5000
      1010  27   50000   wife     130      80        0

  11. Initial Look at Data • Want to know features of those who spend a lot dining out • INCLUDE AS MANY ACTIONABLE VARIABLES AS POSSIBLE • things you can identify • Manipulate data • sort on most likely indicator (dine out)

  12. Sorted by Dine Out

      cluster indices

      subject  age  income  marital  grocery  dine out  savings
      1004     26   40000   wife     80       40        0
      1010     27   50000   wife     130      80        0
      1001     53   80000   wife     180      90        30000
      1005     51   90000   wife     110      90        20000
      1002     48   120000  husband  120      110       20000
      1007     43   120000  husband  140      110       10000
      1006     59   150000  wife     160      120       30000
      1008     38   160000  wife     80       130       15000
      1003     32   90000   single   30       160       5000
      1009     35   70000   single   40       170       5000

  13. Analysis • Best indicators • marital status • groceries • Available • marital status might be easier to get
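The sort and the comparison of indicators on slides 11 to 13 can be sketched in a few lines of Python. This is a minimal illustration using the ten rows from slide 10; the tuple layout and the helper name `mean_dine_out_by` are assumptions made for this sketch and do not come from the presentation.

```python
from collections import defaultdict

# Rows from slide 10: (subject, age, income, marital, grocery, dine_out, savings)
ROWS = [
    (1001, 53, 80000, "wife", 180, 90, 30000),
    (1002, 48, 120000, "husband", 120, 110, 20000),
    (1003, 32, 90000, "single", 30, 160, 5000),
    (1004, 26, 40000, "wife", 80, 40, 0),
    (1005, 51, 90000, "wife", 110, 90, 20000),
    (1006, 59, 150000, "wife", 160, 120, 30000),
    (1007, 43, 120000, "husband", 140, 110, 10000),
    (1008, 38, 160000, "wife", 80, 130, 15000),
    (1009, 35, 70000, "single", 40, 170, 5000),
    (1010, 27, 50000, "wife", 130, 80, 0),
]

DINE_OUT = 5  # column index of the dine-out value
MARITAL = 3   # column index of marital status

# Sort ascending on the most likely indicator (dine out), as on slide 12.
# sorted() is stable, so tied rows keep their original subject order.
by_dine_out = sorted(ROWS, key=lambda row: row[DINE_OUT])


def mean_dine_out_by(rows, column):
    """Average dine-out value per distinct value of `column` (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[DINE_OUT])
    return {key: sum(values) / len(values) for key, values in groups.items()}


by_marital = mean_dine_out_by(ROWS, MARITAL)

if __name__ == "__main__":
    for row in by_dine_out:
        print(row)
    print(by_marital)
```

On these ten rows the two single customers have the highest average dine-out value, which is consistent with slide 13 listing marital status among the best indicators. Ten rows are far too few for a real conclusion; the sketch only shows the mechanics.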

  14. Fingerhut • Mailstream optimization • which customers most likely to respond to existing catalog mailings • saves nearly $3 million per year • reversed trend of catalog sales industry in 1998 • reduced mailings by 20% while increasing net earnings to over $37 million

  15. Banking • Among first users of data mining • Used to find out what motivates their customers (reduce churn) • Loan applications • Target marketing • Norwest: 3% of customers provided 44% of profits • Bank of America: program cultivating top 10% of customers

  16. CREDIT SCORING: Bank Loan Applications

      Age  Income  Assets  Debts   Want   On-time
      24   55557   27040   48191   1500   1
      20   17152   11090   20455   400    1
      20   85104   0       14361   4500   1
      33   40921   91111   90076   2900   1
      30   76183   101162  114601  1000   1
      55   80149   511937  21923   1000   1
      28   26169   47355   49341   3100   0
      20   34843   0       21031   2100   1
      20   52623   0       23054   15900  0
      39   59006   195759  161750  600    1

  17. Characteristics of Not On-time

      Age  Income  Assets  Debts   Want   On-time
      28   26169   47355   49341   3100   0
      20   52623   0       23054   15900  0

      Here: • Debts exceed Assets • Age: Young • Income: Low
      BETTER: • Base on statistics, large sample • supplement data with other relevant variables
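The "debts exceed assets" observation on slide 17 can be checked against all ten applications from slide 16. The following is a minimal sketch; the tuple layout and the function name `debts_exceed_assets` are assumptions made for illustration, and the rule is only the slide's informal observation, not a real scoring model.

```python
# Rows from slide 16: (age, income, assets, debts, want, on_time)
APPLICATIONS = [
    (24, 55557, 27040, 48191, 1500, 1),
    (20, 17152, 11090, 20455, 400, 1),
    (20, 85104, 0, 14361, 4500, 1),
    (33, 40921, 91111, 90076, 2900, 1),
    (30, 76183, 101162, 114601, 1000, 1),
    (55, 80149, 511937, 21923, 1000, 1),
    (28, 26169, 47355, 49341, 3100, 0),
    (20, 34843, 0, 21031, 2100, 1),
    (20, 52623, 0, 23054, 15900, 0),
    (39, 59006, 195759, 161750, 600, 1),
]


def debts_exceed_assets(application):
    """Informal rule from slide 17 (hypothetical helper): debts larger than assets."""
    _, _, assets, debts, _, _ = application
    return debts > assets


# Applicants the rule would flag, and how many of them actually paid late.
flagged = [a for a in APPLICATIONS if debts_exceed_assets(a)]
late_flagged = [a for a in flagged if a[5] == 0]
late_total = [a for a in APPLICATIONS if a[5] == 0]

if __name__ == "__main__":
    print(len(flagged), "flagged,", len(late_flagged), "of them late,",
          len(late_total), "late overall")
```

On this sample the rule catches both late payers but also flags five applicants who paid on time, which illustrates why the slide recommends basing rules on statistics from a large sample and on additional variables.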

  18. CHURN • Customer turnover • critical to: • telecommunications • banks • human resource management • retailers

  19. Identify characteristics of those who leave

      Age      Time-job  Time-town  min bal  checking / savings /
      (years)  (months)  (months)   ($)      card / loan
      27       12        12         549      x x
      41       18        41         3259     x x x
      28       9         15         286      x x
      55       301       5          2854     x x x
      43       18        18         1112     x x x
      29       6         3          0        x
      38       55        20         321      x x x
      63       185       3          2175     x x x
      26       15        15         386      x x
      46       13        12         1187     x x x
      37       32        25         1865     x x x

  20. Analysis • What are the characteristics of those who leave? • Correlation analysis • Which customers do you want to keep? • Customer value - net present value of customer to the firm

  21. Correlation

                 Age  Time-Job  Time-Town  Min-Bal  Check  Saving  Card  Loan
      Age        1.0       0.6        0.4     -0.4    0.0     0.4   0.2   0.3
      Job                  1.0        0.9     -0.6    0.1     0.6   0.9  -0.2
      Town                            1.0     -0.5   -0.1     0.3   0.5   0.4
      Min-Bal                                  1.0   -0.2     0.3   0.6  -0.1
      Check                                           1.0     0.5   0.2   0.2
      Saving                                                  1.0   0.9   0.3
      Card                                                          1.0   0.5
      Loan                                                                1.0
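Each entry in a correlation matrix like the one on slide 21 is a pairwise correlation coefficient. The sketch below assumes the Pearson coefficient (the slide does not name the measure) and applies it to the numeric columns of slide 19. The slide does not say which records produced its matrix, so the values computed here need not match it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Numeric columns from slide 19 (the account-holding marks are left out).
AGE = [27, 41, 28, 55, 43, 29, 38, 63, 26, 46, 37]
TIME_JOB = [12, 18, 9, 301, 18, 6, 55, 185, 15, 13, 32]
TIME_TOWN = [12, 41, 15, 5, 18, 3, 20, 3, 15, 12, 25]
MIN_BAL = [549, 3259, 286, 2854, 1112, 0, 321, 2175, 386, 1187, 1865]

if __name__ == "__main__":
    columns = {"Age": AGE, "Job": TIME_JOB, "Town": TIME_TOWN, "Min-Bal": MIN_BAL}
    names = list(columns)
    # Print the upper triangle, mirroring the layout of slide 21.
    for i, row_name in enumerate(names):
        for col_name in names[i:]:
            r = pearson(columns[row_name], columns[col_name])
            print(f"{row_name:8s} {col_name:8s} {r:5.2f}")
```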

  22. Mortgage Market • Early 1990s - massive refinancing • need to keep customers happy to retain • contact current customers who have rates significantly higher than market • a major change in practice • data mining & telemarketing increased Crestar Mortgage’s retention rate from 8% to over 20%

  23. Banking • Fleet Financial Group • $30 million data warehouse • hired 60 database marketers, statistical/quantitative analysts & DSS specialists • expect to add $100 million in profit by 2001

  24. Banking • First Union • concentrated on contact-point • previously had very focused product groups, little coordination • Developed offers for customers

  25. CREDIT SCORING • Data warehouse including demand deposits, savings, loans, credit cards, insurance, annuities, retirement programs, securities underwriting, other • Statistical & mathematical models (regression) to predict repayment

  26. CUSTOMER RELATIONSHIP MANAGEMENT (CRM) • understanding value customer provides to firm • Kathleen Khirallah - The Tower Group • Banks will spend $9 billion on CRM by end of 1999 • Deloitte • only 31% of senior bank executives confident that their current distribution mix anticipated customer needs

  27. Customer Value

      Middle aged (41-55), 3-9 years on job, 3-9 years in town, savings account

      year  annual purchases  profit  discounted (1.3 rate)  net
      1     1000              200     153                    153
      2     1000              200     118                    272
      3     1000              200     91                     363
      4     1000              200     70                     433
      5     1000              200     53                     487
      6     1000              200     41                     528
      7     1000              200     31                     560
      8     1000              200     24                     584
      9     1000              200     18                     603
      10    1000              200     14                     618

  28. Younger Customer

      Young (21-29), 0-2 years on job, 0-2 years in town, no savings account

      year  annual purchases  profit  discounted (1.3 rate)  net
      1     300               60      46                     46
      2     360               72      43                     89
      3     432               86      39                     128
      4     518               104     36                     164
      5     622               124     34                     198
      6     746               149     31                     229
      7     896               179     29                     257
      8     1075              215     26                     284
      9     1290              258     24                     308
      10    1548              310     22                     331
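The "discounted" and "net" columns on slides 27 and 28 are consistent with dividing each year's profit by 1.3 raised to the year number and keeping a running total. The sketch below reproduces that arithmetic. The function name `cumulative_npv` is hypothetical, and the 20% profit margin and 20% yearly purchase growth for the younger customer are inferred from the table values rather than stated on the slides.

```python
def cumulative_npv(profits, discount_factor=1.3):
    """Running net present value of yearly profits (hypothetical helper).

    Year t's profit is divided by discount_factor ** t, with t starting at 1.
    """
    running_total = 0.0
    totals = []
    for year, profit in enumerate(profits, start=1):
        running_total += profit / discount_factor ** year
        totals.append(running_total)
    return totals


# Slide 27: constant purchases of 1000 per year, profit of 200 per year.
MIDDLE_AGED_PROFITS = [200.0] * 10

# Slide 28: purchases start at 300 and grow about 20% per year;
# profit is about 20% of purchases (both rates inferred from the table).
YOUNG_PROFITS = [0.2 * 300 * 1.2 ** (year - 1) for year in range(1, 11)]

if __name__ == "__main__":
    middle = cumulative_npv(MIDDLE_AGED_PROFITS)
    young = cumulative_npv(YOUNG_PROFITS)
    for year, (m, y) in enumerate(zip(middle, young), start=1):
        print(f"year {year:2d}  middle-aged {m:7.1f}  young {y:7.1f}")
```

The ten-year totals come out at roughly 618 for the middle-aged customer and roughly 331 for the younger one, in line with the last rows of the two tables; intermediate values can differ by one from the slides because the slides show whole numbers.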

  29. Credit Card Management • Very profitable industry • Card surfing - pay old balance with new card • promotions typically generate 1000 responses, about 1% • in early 1990s, almost all mass-marketing • data mining improves (lift)

  30. LIFT • LIFT = probability in class by sample divided by probability in class by population • if population probability is 20% and sample probability is 30%, LIFT = 0.3/0.2 = 1.5 • best lift not necessarily best • need sufficient sample size • as confidence increases, longer list but lower lift
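The lift ratio defined on slide 30 is a single division. A minimal sketch follows; the function name `lift` and its argument names are chosen here for illustration.

```python
def lift(sample_probability, population_probability):
    """Lift = probability of the class in the sample / probability in the population."""
    if population_probability <= 0:
        raise ValueError("population probability must be positive")
    return sample_probability / population_probability


if __name__ == "__main__":
    # Slide 30 example: 30% in the selected sample versus 20% in the population.
    print(round(lift(0.3, 0.2), 2))
```

A lift above 1 means the selected sample contains the target class more densely than the population as a whole.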

  31. Lift Example • Product to be promoted • Sampled over 10 identifiable segments of potential buying population • Profit $50 per item sold • Mailing cost $1 • Sorted by Estimated response rates
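The setup on slide 31 implies a break-even response rate: with $50 profit per item sold and $1 per mailing, a segment is worth mailing only if its response rate exceeds 1/50 = 2%. The sketch below shows how cumulative profit could be computed over segments sorted by estimated response rate. The segment sizes and response rates are HYPOTHETICAL, because the transcript does not include the slide's data, and the sketch assumes the $50 profit is counted before the mailing cost.

```python
PROFIT_PER_SALE = 50.0  # from slide 31
MAILING_COST = 1.0      # from slide 31

# HYPOTHETICAL data: (segment size, estimated response rate), sorted by rate.
# The actual figures from the presentation are not in the transcript.
SEGMENTS = [
    (1000, 0.10), (1000, 0.08), (1000, 0.06), (1000, 0.05), (1000, 0.04),
    (1000, 0.03), (1000, 0.025), (1000, 0.015), (1000, 0.01), (1000, 0.005),
]


def segment_profit(size, response_rate):
    """Expected profit from mailing every member of one segment."""
    return size * (response_rate * PROFIT_PER_SALE - MAILING_COST)


def cumulative_profits(segments):
    """Expected profit from mailing the first k segments, for k = 1..len(segments)."""
    total = 0.0
    totals = []
    for size, rate in segments:
        total += segment_profit(size, rate)
        totals.append(total)
    return totals


def best_cutoff(segments):
    """Number of leading segments to mail for the highest expected profit."""
    totals = cumulative_profits(segments)
    return totals.index(max(totals)) + 1


BREAK_EVEN_RATE = MAILING_COST / PROFIT_PER_SALE

if __name__ == "__main__":
    print("break-even response rate:", BREAK_EVEN_RATE)
    print("mail the first", best_cutoff(SEGMENTS), "segments")
```

With these made-up numbers, profit peaks after the last segment whose response rate is above the 2% break-even rate; mailing further segments lengthens the list but lowers total profit.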

  32. Lift Data

  33. Lift Chart

  34. Profit Impact

  35. INSURANCE • Marketing, as in retailing & banking • Special: • Farmers Insurance Group - underwriting system generating $ millions in higher revenues, lower claims • 7 databases, 35 million records • better understanding of market niches • lower rates on sports cars, increasing business

  36. Insurance Fraud • Specialist criminals - multiple personas • InfoGlide specializes in fraud detection products • similarity search engine • link names, telephone numbers, streets, birthdays, variations • identify 7 times more fraud than exact-match systems

  37. Insurance Fraud - Link Analysis

      claim type  amount  physician  attorney
      back        50000   Welby      McBeal
      neck        80000   Frank      Jones
      arm         40000   Barnard    Fraser
      neck        80000   Frank      Jones
      leg         30000   Schmidt    Mason
      multiple    120000  Heinrich   Feiffer
      neck        80000   Frank      Jones
      back        60000   Schwartz   Nixon
      arm         30000   Templer    White
      internal    180000  Weiss      Richards
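The pattern in the slide 37 table (the same physician and attorney appearing together on several claims) can be surfaced by counting repeated pairs. The sketch below uses exact matching with `collections.Counter`, which is a deliberate simplification: the commercial tools named on slide 36 are described as similarity search that also links name variations, and nothing here reflects how those products work. The function name `repeated_pairs` is hypothetical.

```python
from collections import Counter

# Rows from slide 37: (claim type, amount, physician, attorney)
CLAIMS = [
    ("back", 50000, "Welby", "McBeal"),
    ("neck", 80000, "Frank", "Jones"),
    ("arm", 40000, "Barnard", "Fraser"),
    ("neck", 80000, "Frank", "Jones"),
    ("leg", 30000, "Schmidt", "Mason"),
    ("multiple", 120000, "Heinrich", "Feiffer"),
    ("neck", 80000, "Frank", "Jones"),
    ("back", 60000, "Schwartz", "Nixon"),
    ("arm", 30000, "Templer", "White"),
    ("internal", 180000, "Weiss", "Richards"),
]


def repeated_pairs(claims, min_count=2):
    """Physician/attorney pairs that appear together on at least `min_count` claims."""
    counts = Counter((physician, attorney) for _, _, physician, attorney in claims)
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    for (physician, attorney), n in repeated_pairs(CLAIMS).items():
        print(f"{physician} / {attorney}: {n} claims")
```

In this table the only repeated pair is Frank and Jones, on three neck claims of the same amount. A repeated link is a reason to look closer, not proof of fraud.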

  38. Insurance Fraud • Analytics’ NetMap for Claims • uses industry-wide database • creates data mart of internal, external data • unusual activity for specific chiropractors, attorneys • HNC Insurance Solutions • workers compensation fraud • VeriComp - predictive software (neural nets) • saved Utah over $2 million

  39. TELECOMMUNICATIONS • Deregulation - widespread competition • churn • 1/3rd poor call quality, 1/2 poor equipment • wireless performance monitor tracking • reduced churn about 61%, $580,000/year • cellular fraud prevention • spot problems when cell phones begin to go bad

  40. Telecommunications • Metapath’s Communications Enterprise Operating System • help identify telephone customer problems • dropped calls, mobility patterns, demographics • to target specific customers • reduce subscription fraud • $1.1 billion • reduce cloning fraud • cost $650 million in 1996

  41. Telecommunications • Churn Prophet, ChurnAlert • data mining to predict subscribers who cancel • Arbor/Mobile • set of products, including churn analysis

  42. TELEMARKETING • MCI uses data marts to extract data on prospective customers • typically a 2 month program • 20% improvement in sales leads • multimillion investment in data marts & hardware • staff of 45 • trend spotting (which approaches specific customers like)

  43. Telemarketing • Australian Tourist Commission • maintained database since 1992 • responses to travel inquiries on tours, hotels, airlines, travel agents, consumers • data mine to identify travel agents & consumers responding to various media • sales closure rate at 10% and up • lead lists faxed weekly to productive travel agents

  44. Telemarketing • Segmentation • which customers respond to new promotions, to discounts, to new product offers • Determine who • to offer new service to • those most likely to commit fraud

  45. Human Resource Management • Identify individuals liable to leave company without additional compensation or benefits • Firm may already know 20% use 80% of offered services • don’t know which 20% • data mining (business intelligence) can identify • Use most talented people in highest priority (or most profitable) business units

  46. Human Resource Management • Downsizing • identify right people, treat them well • track key performance indicators • data on talents, company needs, competitor requirements • State of Mississippi’s MERLIN network • 30 databases (finance, payroll, personnel, capital projects) • Cognos Impromptu system - 230 users

  47. CASINOS • Casino gaming is one of the richest data sets known • Harrah’s - incentive programs • about 8 million customers hold Total Gold cards, used whenever the customer spends money in the casino • comprehensive data collection • Trump’s Taj Card similar

  48. Casinos • Bellagio & Mandalay Bay • strategy of luxury visits • child entertainment • change from old strategy - cheap food • Identify high rollers - cultivate • identify those to discourage from play • estimate lifetime value of players

  49. ARTS • computerized box offices lead to high volumes of data • Identify potential consumers for shows • software to manage shows • similar to airline seating chart software
