Data Mining Research

David L. Olson, University of Nebraska

Presentation Transcript


  1. Data Mining Research. David L. Olson, University of Nebraska

  2. Data Mining Research
     • Business Applications
        • Credit scoring
        • Customer classification
        • Fraud detection
        • Human resource management
     • Algorithms
     • Database related
        • Data warehouse products claim internal data mining
     • Text mining
     • Data Mining Process

  3. Personal (with others)
     • Business Applications
        • Introduction to Business Data Mining, with Yong Shi [2006]
        • Qing Cao – RFM
     • Algorithms
        • Advanced Data Mining Techniques, with Dursun Delen [2008]
        • Moshkovich & Mechitov – Ordinal scales in trees
        • Data set balancing
     • Database related
        • Encyclopedia
     • Text mining
        • Web log ethics
     • Data Mining Process
        • Ton Stam, Dursun Delen

  4. RFM, with Qing Cao, Ching Gu, Donhee Lee
     • Recency
        • Time since the customer's last purchase
     • Frequency
        • Number of purchases the customer made over the time frame
     • Monetary
        • Average purchase amount (or total)
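The three RFM measures can be computed directly from order records. The sketch below is a minimal illustration, assuming hypothetical `(customer_id, order_date, amount)` tuples and a chosen reference date `as_of`; recency is counted in days (the slide does not fix a unit), and Monetary is taken as the average order amount, although the slide allows the total instead.

```python
from datetime import date

# Hypothetical order records: (customer_id, order_date, order_amount).
orders = [
    ("C1", date(2002, 3, 1), 40.0),
    ("C1", date(2002, 11, 20), 60.0),
    ("C2", date(2001, 6, 15), 120.0),
]

def rfm(orders, as_of):
    """Return {customer: (recency_days, frequency, average_amount)}."""
    summary = {}  # customer -> (latest order date, order count, total amount)
    for cust, when, amount in orders:
        last, count, total = summary.get(cust, (None, 0, 0.0))
        if last is None or when > last:
            last = when
        summary[cust] = (last, count + 1, total + amount)
    # Recency = days between the latest order and the reference date.
    return {
        cust: ((as_of - last).days, count, total / count)
        for cust, (last, count, total) in summary.items()
    }

scores = rfm(orders, as_of=date(2002, 12, 31))
print(scores)
```

Replacing `total / count` with `total` gives the "total" variant of Monetary.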

  5. Variants
     • F & M highly correlated
        • Bult & Wansbeek (1995), Marketing Science
     • Value = M/R
        • Yang (2004), Journal of Targeting, Measurement and Analysis for Marketing
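Yang's value function collapses the RFM scores into a single ratio, V = M/R, so recent, high-spending customers rank first. A small sketch, assuming recency in days and made-up customer data; flooring a zero recency at one day is an assumption added here only to avoid division by zero.

```python
def value_score(recency_days, monetary):
    """Value function V = M / R: higher means more valuable."""
    # Assumption: recency of 0 days is treated as 1 day to avoid dividing by zero.
    return monetary / max(recency_days, 1)

# Hypothetical customers: id -> (recency_days, monetary)
customers = {"C1": (41, 50.0), "C2": (564, 120.0), "C3": (0, 30.0)}

# Rank customers from highest to lowest V.
ranked = sorted(customers, key=lambda c: value_score(*customers[c]), reverse=True)
print(ranked)
```

Dropping F entirely is the design choice here: because F and M move together, M/R keeps most of the information with one fewer dimension.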

  6. Limitations
     • Other attributes may be important
        • Product variation
        • Customer age
        • Customer income
        • Customer lifestyle
     • Still, RFM is widely used
     • Works well if response rate is high

  7. Data
     • Meat retailer in Nebraska
     • 64,180 purchase orders (mail)
     • 10,000 individual customers
     • Oct 11, 1998 to Oct 3, 2003
     • ORDER DATE
     • ORDER AMOUNT
     • PRESENCE OF PROMOTION

  8. Data
     • Nebraska food products firm
     • 64,180 individual purchase orders (by mail)
     • 10,000 individual customers
     • 11 Oct 1998 to 3 Oct 2003
     • Data:
        • Order date
        • Order amount (price)
        • Whether or not promotion involved

  9. Treatment
     • Used 5,000 observations to build the model
        • To the end of 2002
     • Used another 5,000 for testing
        • 2003
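One common way to set up a time-based holdout for RFM response modeling is to split orders at a date cutoff and label a customer as a responder if they ordered again after it. The slide does not say exactly how the two sets of 5,000 observations were formed, so the sketch below is only an assumed reading: the rows, the `CUTOFF` constant, and the "ordered again in 2003" response label are all hypothetical.

```python
from datetime import date

# Hypothetical order rows: (customer_id, order_date, amount).
orders = [
    ("C1", date(2002, 5, 2), 40.0),
    ("C1", date(2003, 2, 9), 55.0),
    ("C2", date(2001, 8, 30), 90.0),
    ("C3", date(2003, 7, 14), 25.0),
]

# Assumption: the model-building period ends on the last day of 2002.
CUTOFF = date(2002, 12, 31)

build = [row for row in orders if row[1] <= CUTOFF]  # history used for RFM features
test = [row for row in orders if row[1].year == 2003]  # later period

# Hypothetical response label: did a customer seen before the cutoff order in 2003?
buyers_2003 = {cust for cust, _, _ in test}
labels = {cust: cust in buyers_2003 for cust, _, _ in build}
print(labels)
```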

  10. Correlations (* 0.01 significance; ** 0.05 significance; *** 0.001 significance)
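The correlation table itself did not survive extraction. The sketch below shows how a Pearson correlation between two RFM measures, such as F and M, can be computed; the numbers are invented, and the significance levels in the legend (which need a t-test) are not computed here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-customer frequency and monetary values.
frequency = [1, 2, 3, 4, 5]
monetary = [20.0, 45.0, 55.0, 80.0, 100.0]
print(round(pearson(frequency, monetary), 3))
```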

  11. Data

  12. Count by RFM Cell
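A 5 x 5 x 5 RFM matrix is usually built by giving each customer a score of 1 to 5 on each measure and concatenating the three digits into a cell code. The sketch assumes conventional rank-based quintile coding (5 = best fifth, with low recency counted as good); the slides do not state the exact coding used, and the data are made up.

```python
from collections import Counter

def quintile_scores(values, higher_is_better=True):
    """Score each value 1..5 by rank; 5 = best fifth. Ties are broken by position."""
    n = len(values)
    # Worst values first, so they receive the lowest scores.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // n + 1
    return scores

def rfm_cells(recency, frequency, monetary):
    """Combine R, F, M quintile scores into a three-digit cell code such as 543."""
    r = quintile_scores(recency, higher_is_better=False)  # recent = low recency = good
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return [100 * a + 10 * b + c for a, b, c in zip(r, f, m)]

recency = [5, 40, 120, 300, 700, 15, 60, 200, 400, 900]
frequency = [9, 6, 3, 2, 1, 8, 5, 4, 2, 1]
monetary = [300, 150, 80, 60, 20, 250, 120, 90, 50, 10]

cells = rfm_cells(recency, frequency, monetary)
counts = Counter(cells)  # customers per RFM cell
print(counts)
```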

  13. Basic Model Coincidence Matrix (Correct: 0.6076)
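A coincidence (confusion) matrix cross-tabulates actual against predicted classes, and "Correct" is the share of cases on its diagonal. A minimal sketch with invented binary outcomes (1 = responded):

```python
def coincidence_matrix(actual, predicted, labels=(0, 1)):
    """Rows = actual class, columns = predicted class."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of cases on the diagonal (actual == predicted)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[k][k] for k in matrix)
    return correct / total

actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
m = coincidence_matrix(actual, predicted)
print(m, accuracy(m))
```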

  14. BALANCE CELLS
     • Adjusted boundaries of the 5 x 5 x 5 matrix
     • Can't get all cells to equal the average of 8
     • Lumpy (due to ties)
     • Cell counts ranged from 4 to 11

  15. Balanced Cell Densities (Correct: 0.8380)

  16. Alternatives
     • LIFT
        • Sort groups by best response
        • Apply your marketing budget to the most profitable groups (until the budget runs out)
        • LIFT is the gain obtained above par (random selection)
     • VALUE FUNCTION (Yang, 2004)
        • Throw out F (correlated with M)
        • Use the ratio M/R
     • Logistic Regression
     • Decision Tree
     • Neural Network
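The lift procedure above (sort groups by response, then work down the list until the budget runs out) can be sketched as follows. Group lift is a group's response rate divided by the overall rate, which is what random selection would achieve; cumulative lift shows the gain if every group down to that row is contacted. The group names and counts are invented.

```python
def lift_table(groups):
    """Rank groups by response rate and report per-group and cumulative lift.

    groups: list of (name, customers, responders) tuples.
    Lift = response rate / overall (random-selection) response rate.
    """
    total_customers = sum(c for _, c, _ in groups)
    total_responders = sum(r for _, _, r in groups)
    base_rate = total_responders / total_customers
    # Best-responding groups first.
    ranked = sorted(groups, key=lambda g: g[2] / g[1], reverse=True)
    rows = []
    cum_customers = cum_responders = 0
    for name, customers, responders in ranked:
        cum_customers += customers
        cum_responders += responders
        group_lift = (responders / customers) / base_rate
        cumulative_lift = (cum_responders / cum_customers) / base_rate
        rows.append((name, group_lift, cumulative_lift))
    return rows

groups = [("A", 100, 10), ("B", 100, 40), ("C", 100, 25)]
for name, group_lift, cumulative_lift in lift_table(groups):
    print(name, round(group_lift, 2), round(cumulative_lift, 2))
```

With these counts, group B is contacted first with a lift of 1.6, and cumulative lift falls to 1.0 once every group is included, which is the random baseline.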

  17. LIFT: Equal Groups

  18. V Value by Cell

  19. V Model Lift

  20. Models
     • Regression: -0.4775 + 0.00853 R + 0.1675 F + 0.00213 M
        • Test data: Correct 0.8230
     • Decision Tree
        • IF R ≤ 32: YES (1567 right, 198 wrong)
        • IF 32 < R ≤ 82 AND F ≤ 3 AND M ≤ 296: NO (285 right, 91 wrong)
        • IF 32 < R ≤ 82 AND F ≤ 3 AND M > 296: YES (28 right, 9 wrong)
        • IF 32 < R ≤ 82 AND F > 3: YES (729 right, 110 wrong)
        • IF R > 82: YES (2391 right, 3 wrong)
        • Test data: Correct 0.8678
     • Neural Network
        • Test data: Correct 0.8674
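The regression equation and decision-tree rules can be applied to new customers directly. In this sketch the coefficients and tree splits are transcribed from the slide; the 0.5 cutoff that turns the regression score into a YES/NO prediction is an assumption, since the slide does not state one, and True stands for YES.

```python
def regression_score(r, f, m):
    """Linear score using the slide's regression coefficients."""
    return -0.4775 + 0.00853 * r + 0.1675 * f + 0.00213 * m

def regression_predict(r, f, m, threshold=0.5):
    # Assumption: 0.5 cutoff chosen for illustration; not given on the slide.
    return regression_score(r, f, m) >= threshold

def tree_predict(r, f, m):
    """Decision-tree rules transcribed from the slide (True = YES)."""
    if r > 82:
        return True      # YES (2391 right, 3 wrong)
    if r <= 32:
        return True      # YES (1567 right, 198 wrong)
    if f > 3:
        return True      # 32 < R <= 82, F > 3: YES (729 right, 110 wrong)
    return m > 296       # F <= 3: NO if M <= 296 (285/91), YES if M > 296 (28/9)

print(regression_score(10, 2, 100), regression_predict(10, 2, 100))
print(tree_predict(50, 2, 100), tree_predict(50, 2, 400))
```

Only one leaf of the tree predicts NO, so the tree effectively flags mid-range recency, low-frequency, low-spending customers as non-responders.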

  21. COMPARISONS
