Data Mining Research David L. Olson University of Nebraska
Data Mining Research • Business Applications • Credit scoring • Customer classification • Fraud detection • Human resource management • Algorithms • Database related • Data warehouse products claim internal data mining • Text mining • Data Mining Process
Personal (with others) • Business Applications • Introduction to Business Data Mining with Yong Shi [2006] • Qing Cao - RFM • Algorithms • Advanced Data Mining Techniques with Dursun Delen [2008] • Moshkovich & Mechitov – Ordinal scales in trees • Data set balancing • Database related • Encyclopedia • Text mining • Web log ethics • Data Mining Process • Ton Stam, Dursun Delen
RFM with Qing Cao, Ching Gu, Donhee Lee • Recency • Time since customer made last purchase • Frequency • Number of purchases this customer made over time frame • Monetary • Average purchase amount (or total)
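The three RFM measures defined above can be sketched in a few lines of Python. This is only an illustration: the `rfm` function, the `(customer_id, order_date, amount)` tuple layout, and the sample orders are hypothetical, and measuring recency in days is an assumption the slides do not state.

```python
from collections import defaultdict
from datetime import date


def rfm(orders, as_of):
    """Compute R, F and M per customer.

    orders: iterable of (customer_id, order_date, amount) tuples (assumed layout).
    as_of:  reference date from which recency is measured.
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in orders:
        by_customer[customer_id].append((order_date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(d for d, _ in rows)
        total = sum(a for _, a in rows)
        result[customer_id] = {
            "recency_days": (as_of - last_purchase).days,  # time since last purchase
            "frequency": len(rows),                        # purchases in the time frame
            "monetary_avg": total / len(rows),             # average purchase amount
            "monetary_total": total,                       # or total, as the slide allows
        }
    return result


# Made-up sample orders, for illustration only.
orders = [
    ("A", date(2002, 1, 5), 100.0),
    ("A", date(2002, 6, 1), 50.0),
    ("B", date(2001, 3, 10), 300.0),
]
scores = rfm(orders, as_of=date(2002, 12, 31))
```

Both the average and the total are returned for M because the slide allows either definition.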
Variants • F & M highly correlated • Bult & Wansbeek [1995] Journal of Marketing Science • Value = M/R • Yang (2004) Journal of Targeting, Measurement and Analysis for Marketing
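The Value = M/R variant can be illustrated as below. The helper name `value_score`, the sample customers, and the choice to floor recency at 1 (so a same-day purchase does not divide by zero) are assumptions made for this sketch, not details taken from Yang (2004).

```python
def value_score(monetary, recency):
    """Value = M / R, dropping F because it is highly correlated with M."""
    # Assumption: floor recency at 1 so that R = 0 does not divide by zero.
    return monetary / max(recency, 1)


# Hypothetical customers as (M, R) pairs.
customers = {"A": (75.0, 213), "B": (300.0, 661)}

# Rank customers from highest to lowest value.
ranked = sorted(customers, key=lambda c: value_score(*customers[c]), reverse=True)
```

A larger M or a smaller R both raise the score, so recent, high-spending customers rank first.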
Limitations • Other attributes may be important • Product variation • Customer age • Customer income • Customer lifestyle • Still, RFM widely used • Works well if response rate is high
Data • Meat retailer in Nebraska • 64,180 purchase orders (mail) • 10,000 individual customers • Oct 11, 1998 to Oct 3, 2003 • ORDER DATE • ORDER AMOUNT • PRESENCE OF PROMOTION
Data • Nebraska food products firm • 64,180 individual purchase orders (by mail) • 10,000 individual customers • 11 Oct 1998 to 3 Oct 2003 • Data: • Order date • Order amount (price) • Whether or not promotion involved
Treatment • Used 5,000 observations to build model • To the end of 2002 • Used another 5,000 for testing • 2003
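A time-based split of this kind might look like the following sketch. The record layout (dicts with a `date` key), the function name `split_by_time`, and the exact cutoff of 31 December 2002 are assumptions; the slide only says the model was built on data to the end of 2002 and tested on 2003.

```python
from datetime import date

CUTOFF = date(2002, 12, 31)  # assumed reading of "to the end of 2002"


def split_by_time(observations, cutoff=CUTOFF):
    """Put observations dated on or before the cutoff in train, later ones in test."""
    train = [o for o in observations if o["date"] <= cutoff]
    test = [o for o in observations if o["date"] > cutoff]
    return train, test


# Made-up observations, for illustration only.
observations = [
    {"customer": "A", "date": date(2001, 5, 1)},
    {"customer": "B", "date": date(2002, 12, 31)},
    {"customer": "C", "date": date(2003, 2, 14)},
]
train, test = split_by_time(observations)
```

Splitting on time rather than at random keeps the test period strictly after the training period, which mirrors how a model would be used on future mailings.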
Correlations • * - 0.01 significance; ** - 0.05 significance; *** - 0.001 significance
BALANCE CELLS • Adjusted boundaries of 5 x 5 x 5 matrix • Can’t get all to equal average of 8 • Lumpy (due to ties) • Ranged from 4 to 11
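The lumpiness caused by ties can be seen in a small sketch of equal-count binning on a single dimension. The function `quintile_bins` and the sample frequency values are hypothetical; this is one simple way to cut quintiles, not necessarily the boundary adjustment used in the study.

```python
from bisect import bisect_left
from collections import Counter


def quintile_bins(values, bins=5):
    """Assign each value a bin from 1 to `bins` using equal-count cut points.

    Tied values always land in the same bin, so bin sizes can be uneven.
    Assumes len(values) >= bins.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Upper boundary of bins 1..bins-1, taken at equal-count positions.
    cuts = [ordered[(n * k) // bins - 1] for k in range(1, bins)]
    return [bisect_left(cuts, v) + 1 for v in values]


# Made-up purchase frequencies with many ties at 1.
frequency = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5]
assigned = quintile_bins(frequency)
cell_sizes = Counter(assigned)
```

With six customers tied at a frequency of 1, the first bin holds six of the ten values and bins 2 and 3 stay empty. Applying the same binning to R, F and M and combining the three bin numbers gives a cell in the 5 x 5 x 5 matrix.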
Alternatives • LIFT • Sort groups by best response • Apply your marketing budget to the most profitable (until you run out of budget) • LIFT is the gain obtained above par (random) • VALUE FUNCTION • (Yang, 2004) • Throw out F (correlated with M) • Use ratio of M/R • Logistic Regression • Decision Tree • Neural Network
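The LIFT idea above, sorting groups by response and measuring the gain over random selection, can be sketched as follows. The function `cumulative_gains`, the `(name, customers, responders)` layout, and the sample numbers are hypothetical, and expressing the gain as captured share minus contacted share is one common convention assumed here.

```python
def cumulative_gains(groups):
    """Rank groups by response rate and report the gain over random targeting.

    groups: list of (name, customers, responders) tuples (assumed layout).
    Returns rows of (name, share_contacted, share_captured, gain_over_random).
    """
    ranked = sorted(groups, key=lambda g: g[2] / g[1], reverse=True)
    total_customers = sum(g[1] for g in ranked)
    total_responders = sum(g[2] for g in ranked)

    rows, seen_customers, seen_responders = [], 0, 0
    for name, customers, responders in ranked:
        seen_customers += customers
        seen_responders += responders
        share_contacted = seen_customers / total_customers
        share_captured = seen_responders / total_responders
        # Random targeting would capture responders in proportion to contacts.
        rows.append((name, share_contacted, share_captured,
                     share_captured - share_contacted))
    return rows


# Made-up cells, for illustration only.
groups = [("cell_a", 100, 10), ("cell_b", 100, 40), ("cell_c", 200, 10)]
rows = cumulative_gains(groups)
```

Reading the rows from the top shows how much response is captured as the marketing budget is extended to each successive group; the gain falls to zero once everyone is contacted.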
Models
• Regression: -0.4775 + 0.00853 R + 0.1675 F + 0.00213 M
  Test data: Correct 0.8230
• Decision Tree
  IF R ≤ 82
    AND R ≤ 32: YES (1567 right, 198 wrong)
    ELSE R > 32
      AND F ≤ 3
        AND M ≤ 296: NO (285 right, 91 wrong)
        ELSE M > 296: YES (28 right, 9 wrong)
      ELSE F > 3: YES (729 right, 110 wrong)
  ELSE R > 82: YES (2391 right, 3 wrong)
  Test data: Correct 0.8678
• Neural Network
  Test data: Correct 0.8674
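The regression and decision tree models can be written out as small functions. This is a sketch under stated assumptions: the nesting of the tree rules is one reading of the flattened slide text, the slide does not say whether the regression output is passed through a logistic link or what cutoff is used (so only the raw score is computed), and the names `tree_predict`, `regression_score` and `accuracy` are hypothetical.

```python
def tree_predict(r, f, m):
    """Decision tree rules as read from the slide (nesting is an assumption)."""
    if r > 82:
        return True   # R > 82: YES
    if r <= 32:
        return True   # R <= 32: YES
    if f > 3:
        return True   # 32 < R <= 82 and F > 3: YES
    return m > 296    # 32 < R <= 82 and F <= 3: YES only if M > 296


def regression_score(r, f, m):
    """Raw regression score with the coefficients from the slide.

    The link function and the classification cutoff are not given in the source.
    """
    return -0.4775 + 0.00853 * r + 0.1675 * f + 0.00213 * m


def accuracy(predictions, actuals):
    """Share of predictions that match the actual outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Made-up customers as (R, F, M) with made-up outcomes, for illustration only.
customers = [(10, 1, 10.0), (50, 2, 100.0), (50, 2, 300.0), (100, 1, 10.0)]
actuals = [True, False, True, False]
predictions = [tree_predict(*c) for c in customers]
tree_accuracy = accuracy(predictions, actuals)
```

The same `accuracy` helper can be applied to any of the three models on the 2003 test data to reproduce a "Correct" figure of the kind reported on the slide.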