Comparison of Classification Methods for Customer Attrition Analysis Xiaohua Hu, Ph.D. Drexel University Philadelphia, PA, 19104 thu@cis.drexel.edu
Outline • Introduction of the Business Problem • Data Selection and Data Processing • Data Mining Model Development Process • Data Mining Findings • Q & A
Data Mining for Customer Attrition Analysis In the financial industry, data mining has been applied successfully to: • Run target-oriented campaigns • Identify and understand customer segments: attriters vs. loyal customers, profitable vs. regular customers • Identify cross-sell and up-sell opportunities to increase the wallet share of customers • Analyze risk for loan applications and detect credit fraud • Support financial planning and asset evaluation
Customer Attrition Analysis The goal of attrition analysis is to identify a group of customers who have a high probability of attriting, so that the company can run marketing campaigns to change their behavior and reduce the attrition rate.
Business Problem • Our client is one of the largest banks in the world • This attrition analysis project concerns one type of credit loan service • Over 750,000 customers currently use this service, with $1.5 billion in outstanding balances • Every month, about 5,700 customers close their accounts or transfer to other banks, mostly due to rate, credit line, and fees
Problem Definition • Slow attriters: Customers who slowly pay down their outstanding balance until they become inactive. • Fast attriters: Customers who quickly pay down their balance and either let the account lapse or close it by phone call or write-in.
Data Mining Tasks • Using data on accounts that remained continuously open over the last 4 months, predict, with 60 days advance notice, the likelihood that a particular customer will voluntarily close his/her account, either by phone or write-in. • Using data on accounts that remained continuously open over the last 4 months, predict, with 60 days advance notice, the likelihood that a particular customer will have his/her account transferred to a competing institution. The account may or may not remain open.
Challenging Issues in Our Project • Highly skewed data: 3% attriters vs. 97% regular customers • Time-series data: our data warehouse holds the past 12 months of credit loan service information • High dimensionality: 850 attributes for each customer • Many dirty records and missing values
Data Mining Process for Customer Attrition Analysis • Problem definition: formulation of business problems in the area of customer retention • Data review and initial selection • Problem formulation in terms of existing data • Data gathering, cataloging and formatting • Data processing: (a) data cleansing, data unfolding, time-sensitive variable definition and target variable definition, (b) statistical analysis, (c) sensitivity analysis, (d) feature selection, (e) leaker detection • Data modeling via classification models: Decision Trees, Neural Networks, Bayesian Networks, and an ensemble of classifiers • Result review and analysis: use the data mining model to predict the likely attriters among the current customers • Result deployment: target the likely attriters (called rollout)
Data Source • Data Warehouse: Credit Card Data Warehouse containing about 200 product-specific fields • Third Party Data: A set of account-related demographic and credit bureau information • Segmentation Files: A set of account-related segmentation values based on our client's segmentation scheme, which combines Risk, Profitability and External potential • Payment Database: A database that stores all checks processed and can categorize the source of checks
Data Processing Goals • Reflects data changes over time. • Recognizes and removes statistically insignificant fields • Defines and introduces the "target" field • Allows for second stage preprocessing and statistical analysis.
Data Processing Steps • Time series "unrolling" • Target value definition • First stage statistical analysis • Field sensitivity analysis and field reduction • Files set generation
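The slides do not spell out how the time series "unrolling" is done. One common reading, sketched below purely as an assumption, is to pivot one-row-per-month records into a single wide record per account and then derive a time-sensitive variable from it. The field names (`account_id`, `month`, `balance`) and the derived `balance_change_m1_m3` variable are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def unroll(monthly_rows, fields):
    """Pivot per-month rows into one wide record per account.

    Assumed input: dicts with 'account_id', 'month' and the listed fields.
    Output columns are named '<field>_m<month>' (a hypothetical convention).
    """
    wide = defaultdict(dict)
    for row in monthly_rows:
        for field in fields:
            wide[row["account_id"]][f"{field}_m{row['month']}"] = row[field]
    return dict(wide)

# Hypothetical monthly balances for a single account.
rows = [
    {"account_id": "A1", "month": 1, "balance": 1200.0},
    {"account_id": "A1", "month": 2, "balance": 900.0},
    {"account_id": "A1", "month": 3, "balance": 300.0},
]

wide = unroll(rows, ["balance"])
record = wide["A1"]

# Example of a time-sensitive derived variable: change in balance
# between the first and the last observed month.
record["balance_change_m1_m3"] = record["balance_m3"] - record["balance_m1"]
print(record)
```

A steadily negative change of this kind is the sort of signal that would distinguish the slow and fast attriters defined earlier, although the variables actually used in the project are not listed in the slides.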
Data Mining Algorithms for Attrition Analysis • Boosted Naïve Bayesian (BNB) • NeuralWare Predict (a commercial neural network from NeuralWare Inc) • Decision Tree (based on C4.5 with some modification) • Selective Naïve Bayesian (SNB) • An ensemble of classifiers combining the above four methods
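The slides do not say how the ensemble combines the four classifiers. As a minimal sketch under that caveat, the example below assumes each model outputs an attrition score per customer and simply averages them; the averaging rule, the model keys and the score values are all illustrative assumptions, not the method used in the project.

```python
from statistics import mean

def ensemble_scores(model_scores):
    """Average per-customer scores across models.

    model_scores maps a model name to a list of scores, one per customer,
    with customers in the same order for every model.
    """
    return [mean(per_customer) for per_customer in zip(*model_scores.values())]

# Hypothetical scores for three customers from the four methods.
scores = {
    "bnb": [0.9, 0.2, 0.4],
    "neural_net": [0.7, 0.1, 0.6],
    "decision_tree": [0.8, 0.3, 0.5],
    "snb": [0.6, 0.2, 0.5],
}

combined = ensemble_scores(scores)

# Rank customers from most to least likely to attrite.
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(combined, ranking)
```

Other combination rules, such as majority voting or weighted averaging, would fit the same interface.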
Classification accuracy is not a proper measure for attrition analysis • The goal of attrition analysis is not to predict the behavior of every customer, but to find a good subset of customers in which the percentage of attriters is high • Classification errors (false positives, false negatives) have different economic consequences in attrition analysis and need to be treated differently
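A small illustration of why accuracy misleads here, using the 3% vs. 97% class split reported earlier in the deck. The sample size of 10,000 and the trivial "nobody attrites" model are assumptions chosen only to make the arithmetic visible.

```python
# Assumed sample: 10,000 customers with the 3% / 97% split from the slides.
n_customers = 10_000
labels = [1] * 300 + [0] * 9_700      # 1 = attriter, 0 = regular customer

# A trivial model that predicts "no attrition" for everyone.
predictions = [0] * n_customers

accuracy = sum(p == y for p, y in zip(predictions, labels)) / n_customers
attriters_found = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)

print(f"accuracy = {accuracy:.2%}, attriters found = {attriters_found}")
```

The trivial model scores 97% accuracy while identifying no attriters at all, which is why a ranking-based measure is more useful for this problem.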
Criterion for Attrition Analysis: Lift • Lift, rather than classification accuracy, is a better measure for attrition analysis; lift reflects the redistribution of responders in the testing set after the testing examples are ranked • Lift can be calculated as the cumulative percentage of all targets captured in the top p% of the ranked list, divided by p%. For example, if the top 10% of the sorted list contains 35% of the likely attriters, the model has a lift of 35/10 = 3.5.
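The lift calculation described on this slide can be sketched as follows. The function name `lift_at` and the synthetic data are assumptions; the data are constructed so that the top 10% of the ranked list holds 7 of 20 attriters (35%), reproducing the 3.5 example.

```python
def lift_at(scores, labels, p):
    """Lift in the top fraction p (0 < p <= 1) of examples ranked by score.

    labels are 1 for attriters (targets) and 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * p))
    captured = sum(label for _, label in ranked[:cutoff])
    share_of_targets = captured / sum(labels)      # e.g. 35% of all attriters
    share_of_list = cutoff / len(ranked)           # e.g. top 10% of the list
    return share_of_targets / share_of_list

# Synthetic example: 100 customers, 20 attriters, 7 of them in the top 10.
scores = list(range(100, 0, -1))                   # already in descending order
labels = [1] * 7 + [0] * 3 + [1] * 13 + [0] * 77

print(round(lift_at(scores, labels, 0.10), 2))     # lift in the top 10%
```

A lift of 1.0 at p = 100% is the expected baseline, since the whole list always contains all targets.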
Field Test The field test aims to verify two points: • The top percentage of the ranked customer attrition list does contain a high concentration of attriters • The data mining based marketing approach is effective for attrition analysis purposes
Field Test Results • Top 5% of 750,000 customers = 37,500 (taken from the data mining prediction list sorted by score) • Two groups of 10,000 customers each were created by random sampling from these 37,500 top customers • Group 1: the marketing department contacted each customer and offered incentive packages to encourage them to stay with the company • Group 2: no action • Two months later, the customers in Group 1 and Group 2 were examined: Group 1 had an attrition rate of 0.8%, while Group 2 had 10.6% (the average attrition rate is 2.2%) • Lift is 10.6/2.2 ≈ 4.8
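The figures on this slide can be checked with a few lines of arithmetic. All inputs come from the slide; the variable names and the per-group attriter counts derived from the rates are illustrative, and comparing the two groups directly assumes the random sampling made them comparable.

```python
customers = 750_000
top_fraction = 0.05
group_size = 10_000

treated_rate = 0.008    # Group 1: contacted with incentive packages
control_rate = 0.106    # Group 2: no action
average_rate = 0.022    # average attrition rate across customers

top_n = round(customers * top_fraction)

# Lift of the top-scored segment, measured on the untouched control group.
lift = control_rate / average_rate

# Attriter counts implied by the reported rates.
treated_attriters = round(treated_rate * group_size)
control_attriters = round(control_rate * group_size)

print(top_n, round(lift, 1), treated_attriters, control_attriters)
```

Under that comparability assumption, the gap between the two groups (about 980 fewer attriters per 10,000 contacted customers) is the effect attributable to the campaign.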
Q & A ?