Data Mining for Customer Relationship Management Qiang Yang Hong Kong University of Science and Technology Hong Kong
CRM Customer Relationship Management: focus on customer satisfaction to improve profit Two kinds of CRM • Enabling CRM: Infrastructure, multiple touch point management, data integration and management, … • Oracle, IBM, PeopleSoft, Siebel Systems, SAS… • Intelligent CRM: data mining and analysis, customer marketing, customization, employee analysis • Vendors/products (see later) • Services!!
The Business Problem: Marketing • Improve customer relationships • Actions (promotion, communication) drive changes • What actions should your enterprise take to move customers from an undesired status to a desired one? • How to cross-sell? • How to segment the customer base? • How to formulate direct marketing plans? • Data Mining can help!
The Data Mining Process (figure): raw data → target data → preprocessed data → model → knowledge, via selection, preprocessing, transformation, mining, and analysis. Customer segmentation (figure): all customers → small/medium customers and large customers; large customers → individual large customers and corporate large customers.
Main Data Mining Algorithms • Classification: e.g., young customers with under three years on the job are likely to churn • Clustering: e.g., Cluster 1 = {young, under three years on the job, unmarried}; Cluster 2 = {young, under three years on the job, married} • Association Rules • Regression • Decision Trees • Neural Networks
Data Mining Themes across the Customer Lifecycle (figure): customer entry (inquiry, application, credit check), customer management (usage, service, up-selling, value growth, credit rating, promotion, retention, complaints, anomaly detection), customer exit (churn, competition); life events that trigger change: new home, new job, marriage, childbirth, promotion, retirement.
Customer Attrition • Customer churn • Voluntary churn: the customer leaves due to dissatisfaction with service quality or other reasons • Involuntary churn: the customer is dropped for credit reasons • External churn: the customer defects to a competitor • Modeling goals • Predict the probability that a customer churns within a given prediction window • Characterize the customer segments with high churn rates in that window • Give marketing and decision makers a basis for retention strategies aimed at likely churners (especially large customers)
Direct Marketing • Two approaches to promotion: • Mass marketing • Uses mass media to broadcast a message to the public without discrimination • Has become less effective due to low response rates • Direct marketing Ling, C.X. and Li, C. Data Mining for Direct Marketing: Problems and Solutions (KDD 1998)
Direct Marketing • Direct marketing • A process of identifying likely buyers of certain products and promoting the products accordingly • Studies customers' characteristics • Selects certain customers as the target • Data mining provides an effective tool for direct marketing
Case Study 1: Attrition/Churn in the Mobile Phone Industry • Each year, an average of 27% of customers churn in the US. • Over time, 90% of cell-phone customers have churned at least once in every five-year period. • It costs $300 to $600 to acquire a new customer in this industry. • Example: if we reduce churn by 5%, we can save the company $5,000,000 per year!
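The arithmetic behind such a claim can be sketched as follows; the subscriber count, churn reduction, and acquisition cost below are illustrative assumptions, not figures from the case study:

```python
# Hypothetical illustration of churn-reduction savings: every retained
# customer saves the cost of acquiring a replacement.
def churn_savings(subscribers, churn_reduction, acquisition_cost):
    """Annual savings = customers retained * cost to acquire a replacement."""
    retained = subscribers * churn_reduction
    return retained * acquisition_cost

# E.g. 25,000 subscribers, churn cut by 5 percentage points,
# $400 average acquisition cost -> 1,250 retained * $400 = $500,000.
print(churn_savings(25_000, 0.05, 400))  # 500000.0
```

Plugging in a carrier's real subscriber base and acquisition cost gives the kind of multi-million-dollar figure quoted on the slide.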
The CART Algorithm • CART trees work on the following assumptions: • The attributes are continuous; nominal attributes can be converted to binary continuous attributes. • The tree is binary: each node splits a continuous scale into two ranges. • CART can combine decision trees with linear regression models at the leaf nodes. • An advantage of CART is that the trees can easily be transformed into rules that people in marketing understand.
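A single CART-style binary split can be sketched as follows: try each candidate threshold on a continuous attribute and keep the one that minimizes weighted Gini impurity. The data here are toy values, not the Lightbridge model:

```python
# Minimal sketch of one CART-style binary split on a continuous attribute.
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)          # fraction of churners
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (weighted Gini, threshold) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for k in range(1, len(pairs)):
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        if score < best[0]:
            best = (score, threshold)
    return best

# Months of service vs. churned (1) / stayed (0):
months = [2, 5, 8, 11, 14, 20, 26, 30]
churned = [1, 1, 1, 1, 0, 0, 0, 0]
score, threshold = best_split(months, churned)
print(threshold)  # 12.5 -- cleanly separates churners from loyal customers
```

A full CART tree applies this search recursively to each child node, which is what produces the marketing-readable segments mentioned above.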
Lightbridge • Lightbridge, a US mobile-phone company, applied the CART algorithm to their customer database • to identify a segment of their customer base that held 10% of the customers, • but with a 50% churn rate. • This segment is highly predictive in terms of customer attrition. • This segment is then said to have a lift of five.
The Lightbridge Experience From the CART tree, it was found that • Subscribers who call customer service are more loyal to the company and less likely to churn! • The first-year anniversary is a very vulnerable time for customers. • Once customers enter their second year, they do not churn.
Case Study 2: A UK Telecom • At a UK company, • the CART model was applied to 260,000 customers • to study why the churn rate (40%) was so high. • Method • Use March 1998 data as the training set and • April 1998 data as the test set • CART generated 29 segments, shown below.
UK Customer Churn Model (CART tree excerpt): the tree splits on Contract Type (N vs. D), Length of Service (thresholds around 9.22, 14.93, and 23.02), and Tariff (e.g., X39), yielding segments such as Segment 24, 28, and 29.
Case Study 3: A Bank in Canada • The bank wants to sell a mutual fund • The database contains two types of customers • After a mass marketing campaign, • Group 1 bought the fund • Group 2 did not buy the fund • Often, |Group 1| << |Group 2| • Group 1 is usually about 1% • Question: what are the patterns of Group 1? • How to select a subgroup from Group 2 that is likely to buy the mutual fund?
Workflow of Case 1: • Get the database of customers (buyers ≈ 1%) • Data cleaning: transform addresses and area codes, deal with missing values, etc. • Split the database into a training set and a testing set • Apply data mining algorithms to the training set • Evaluate the patterns found on the testing set • Use the patterns to predict likely buyers among the current non-buyers • Promote to the likely buyers (rollout plan)
Specific problems • Extremely imbalanced class distribution • E.g., only 1% are positive (buyers) and the rest are negative (non-buyers) • Evaluation criterion for the data mining process • Predictive accuracy is no longer suitable • The training set, with a large number of variables, can be very large • Efficient learning algorithms are required
Solutions • Rank training and testing examples • We require learning algorithms to produce a probability estimate or confidence factor • Use lift as the evaluation criterion • Lift reflects the redistribution of responders in the testing set after ranking the testing examples
Solution I: Learning algorithms • Naïve Bayes algorithm • Produces probabilities that can rank the testing examples • Is efficient and performs well • Decision tree with certainty factor (CF) • Modify C4.5 to produce a CF
Solution I: Learning algorithms (cont.) • Ada-boosting • 1. Initialize the weights over the training set to be uniform. • 2. Select a training set by sampling according to these weights and train a component classifier on it. • 3. Increase the weights of examples the component classifier misclassified and decrease the weights of examples it classified correctly. • 4. If more rounds remain, go back to step 2.
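The boosting loop above can be sketched in code. This toy version uses one-dimensional threshold "stumps" as the component classifiers, an illustrative choice; the case study boosts Naïve Bayes and C4.5 instead:

```python
import math

def stump_train(xs, ys, weights):
    """Pick the (threshold, polarity) with the lowest weighted error."""
    best = (float("inf"), None, None)
    for t in set(xs):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (1 if pol * x > pol * t else -1) != y)
            if err < best[0]:
                best = (err, t, pol)
    return best[1], best[2]

def stump_predict(x, t, pol):
    return 1 if pol * x > pol * t else -1

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    weights = [1.0 / n] * n                      # step 1: uniform weights
    ensemble = []
    for _ in range(rounds):
        t, pol = stump_train(xs, ys, weights)    # step 2: train on weights
        preds = [stump_predict(x, t, pol) for x in xs]
        err = max(sum(w for w, p, y in zip(weights, preds, ys) if p != y),
                  1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # step 3: up-weight mistakes, down-weight correct predictions
        weights = [w * math.exp(-alpha * p * y)
                   for w, p, y in zip(weights, preds, ys)]
        z = sum(weights)
        weights = [w / z for w in weights]       # renormalize
        ensemble.append((alpha, t, pol))         # step 4: next round
    return ensemble

def predict(ensemble, x):
    s = sum(a * stump_predict(x, t, pol) for a, t, pol in ensemble)
    return 1 if s >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])  # [-1, -1, -1, 1, 1, 1]
```

The weighted ensemble vote also yields a real-valued score, which is what lets the boosted classifiers rank customers as the solution requires.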
Solution II: lift index for evaluation • A typical lift table • Use a weighted sum of the items in the lift table over the total sum: the lift index • Definition: rank the testing examples, then weight each responder by its position in the ranked list, with weights decreasing linearly from 1 at the top to 0 at the bottom; the lift index is the weighted sum of responders divided by the total number of responders • E.g., if all responders land at the very top of the list, the lift index approaches 100%
Solution II: lift index for evaluation (cont.) • The lift index is independent of the number of responders • 50% for a random ranking • Above 50% for better than random • Below 50% for worse than random
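A sketch of a lift-index computation with these properties; the linear weighting below is one reasonable choice that yields the 50% random baseline, not necessarily the exact decile weights used in the paper:

```python
# Lift index: responders near the top of the ranked list get weights
# near 1, those near the bottom near 0; the index is the weighted
# fraction of responders.
def lift_index(ranked_responses):
    """ranked_responses: 1/0 response flags in ranked order (best first)."""
    n = len(ranked_responses)
    weights = [1 - (i + 0.5) / n for i in range(n)]   # 1 -> 0 down the list
    total = sum(ranked_responses)
    if total == 0:
        return 0.0
    return sum(w * r for w, r in zip(weights, ranked_responses)) / total

# Perfect ranking: all 3 responders at the top of 10 customers.
print(lift_index([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))  # 0.85
# Worst ranking: all responders at the bottom.
print(lift_index([0, 0, 0, 0, 0, 0, 0, 1, 1, 1]))  # 0.15
# Responders spread uniformly: the random baseline of 0.5.
print(lift_index([1] * 10))                        # 0.5
```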
Solutions: summary • Two algorithms: • Ada-boosted Naïve Bayes • Ada-boosted C4.5 with CF • Three datasets: • Bank • Life insurance • Bonus program • Training and testing sets of equal size
Solutions: summary (cont.) • Procedure: • Train the classifiers on the training set • Rank the testing examples with the learned models • Calculate the lift index and compare classifiers • Repeat 10 times for each dataset to obtain an average lift index
Results • Average lift index on three datasets using boosted Naïve Bayes
Comparison of Results • The mailing cost is reduced • while the response rate is improved • The net profit is increased dramatically
Results (cont.) • Net profit in direct marketing
Improvement • Probability estimation model • Rank customers by the estimated probability of response and mail to the top portion of the list • Drawback of the probability model • The actual value of individual customers is ignored in the ranking • There is often an inverse correlation between the likelihood to buy and the dollar amount spent
Improvement* (cont.) • The goal of direct marketing • To maximize (actual profit − mailing cost) over the contacted customers • Idea: Push algorithm • Replace probability estimation with profit estimation * Ke Wang, Senqiang Zhou, et al. Mining Customer Value: From Association Rules to Direct Marketing (2002)
Challenges • The inverse correlation often occurs: • the customers most likely to buy are not the ones who spend the most money • The high dimensionality of the dataset • A "transparent" prediction model is desirable: • we wish to devise campaign strategies based on the characteristics of generous spenders
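The contrast between probability-based and profit-based ranking can be illustrated with toy numbers (all hypothetical):

```python
# With an inverse correlation between likelihood to buy and amount
# spent, ranking by response probability and ranking by expected
# profit (probability * amount) select different customers first.
customers = [
    # (name, P(respond), amount spent if responding, in $)
    ("A", 0.30, 10),    # expected profit  3
    ("B", 0.10, 50),    # expected profit  5
    ("C", 0.05, 200),   # expected profit 10
]

by_probability = sorted(customers, key=lambda c: -c[1])
by_profit = sorted(customers, key=lambda c: -(c[1] * c[2]))

print([c[0] for c in by_probability])  # ['A', 'B', 'C']
print([c[0] for c in by_profit])       # ['C', 'B', 'A']
```

If the mailing budget covers only one contact, the probability model mails A while the profit model mails C, whose expected return is more than three times higher.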
Case Study 4: Direct Marketing for Charity in USA • KDD Cup 98 Dataset • 191,779 records in the database • Each record is described by 479 non-target variables and two target variables • The class: “response” or “not response” • The actual donation in dollars • The dataset was split in half, one for training and one for validation
Push Algorithm: Wang, Zhou et al., ICDE 2003 The algorithm outline
Step 1: Rule Generation • Objective: find all focused association rules (FARs) that capture features of responders. • A FAR is a respond rule that satisfies a specified minimum R-support and maximum N-support. • R-support of a respond rule: the percentage of respond records that contain both sides of the rule. • N-support of a respond rule: the largest N-support among the data items in the rule. • N-support of a data item: the percentage of non-respond records that contain the item.
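Under these definitions, R-support and N-support reduce to simple set operations. The records below are hypothetical:

```python
# Records are sets of items; responders and non-responders are separated.
def r_support(body, responders):
    """Fraction of respond records containing all items in the rule body."""
    return sum(1 for rec in responders if body <= rec) / len(responders)

def n_support_item(item, non_responders):
    """Fraction of non-respond records containing the item."""
    return sum(1 for rec in non_responders if item in rec) / len(non_responders)

def n_support(body, non_responders):
    """Largest N-support among the items in the rule body."""
    return max(n_support_item(it, non_responders) for it in body)

responders = [{"young", "urban"}, {"young", "rural"},
              {"young", "urban"}, {"old"}]
non_responders = [{"old", "urban"}, {"old", "rural"},
                  {"young", "urban"}, {"old"}]

body = {"young"}
print(r_support(body, responders))      # 0.75  (3 of 4 respond records)
print(n_support(body, non_responders))  # 0.25  (1 of 4 non-respond records)
```

A candidate body qualifies as a FAR body when its R-support meets the minimum and its N-support stays below the maximum, i.e., it is frequent among responders but rare among non-responders.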
Step 2: Model building • Compute the observed average profit for each rule. • Build the prediction model: assign to each customer record the matching rule of the highest possible rank. Given a record t, a rule r is the prediction rule of t if r matches t and has the highest possible rank.
Step 3: Model pruning • Build a prediction tree from the prediction rules. • Simplify the tree by pruning overfitting rules that do not generalize to the whole population.
The prediction • A customer with prediction rule r will be contacted if and only if r is a respond rule and the estimated average profit of r exceeds the cost of contacting the customer.
Validation • Comparison with the top 5 contestants of KDD-CUP-98 • The approach generates 67% more total profit and 242% more average profit per mail than the winner of the competition.
Cross-Selling with Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee
Motivation • Question: • A user has already bought some products • What other products should we recommend to this user? • Collaborative Filtering (CF) • Automates the "circle of advisors".
Collaborative Filtering "..people collaborate to help one another perform filtering by recording their reactions..." (Tapestry) • Finds users whose taste is similar to yours and uses them to make recommendations. • Complementary to IR/IF: • IR/IF finds similar documents; CF finds similar users.
Example • Which movie would Sammy watch next? • Ratings 1-5 • If we just use the average rating of the other users who voted on these movies, we get • Matrix = 3; Titanic = 14/4 = 3.5 • Recommend Titanic! • But is this reasonable?
Types of Collaborative Filtering Algorithms • Collaborative Filters • Statistical Collaborative Filters • Probabilistic Collaborative Filters [PHL00] • Bayesian Filters [BP99][BHK98] • Association Rules [Agrawal, Han] • Open Problems • Sparsity, First Rater, Scalability
Statistical Collaborative Filters • Users annotate items with numeric ratings. • Users who rate items “similarly” become mutual advisors. • Recommendation computed by taking a weighted aggregate of advisor ratings.
Basic Idea • Nearest Neighbor Algorithm • Given a user a and an item i • First, find the users most similar to a; call this set Y • Second, find how these users (Y) rated i • Then calculate a predicted rating of a on i based on some average over the users in Y • How to calculate the similarity and the average?
Statistical Filters • GroupLens [Resnick et al 94, MIT] • Filters UseNet News postings • Similarity: Pearson correlation • Prediction: Weighted deviation from mean
Pearson Correlation • Weight between users a and u • Compute a similarity matrix between users • Use the Pearson correlation, which ranges from −1 to 1: w(a,u) = Σᵢ (r(a,i) − r̄ₐ)(r(u,i) − r̄ᵤ) / √( Σᵢ (r(a,i) − r̄ₐ)² · Σᵢ (r(u,i) − r̄ᵤ)² ) • where the sums run over the items i that both users have rated
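A minimal sketch of this scheme: Pearson similarity over co-rated items, then a prediction as the active user's mean plus a weighted deviation from each advisor's mean. The ratings below are toy values (0 means "not rated"), not Sammy's actual table:

```python
import math

def pearson(ra, ru):
    """Pearson correlation over the items both users rated (0 = unrated)."""
    common = [i for i in range(len(ra)) if ra[i] and ru[i]]
    if not common:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mu = sum(ru[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (ru[i] - mu) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((ru[i] - mu) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, others, item):
    """GroupLens-style weighted deviation from each advisor's mean."""
    rated = [r for r in active if r]
    mean_a = sum(rated) / len(rated)
    num = den = 0.0
    for ru in others:
        if not ru[item]:
            continue
        w = pearson(active, ru)
        common = [i for i in range(len(ru)) if ru[i] and active[i]]
        mean_u = sum(ru[i] for i in common) / len(common)
        num += w * (ru[item] - mean_u)
        den += abs(w)
    return mean_a + num / den if den else mean_a

# Ratings 1-5 over four movies; predict the active user's rating on item 3.
active = [5, 4, 0, 0]
users = [[5, 4, 3, 4], [1, 2, 5, 2], [4, 5, 2, 5]]
print(predict(active, users, 3))  # 4.0
```

Note how the negatively correlated advisors still contribute: a rating above their own mean from a dissimilar user pushes the prediction down, which plain averaging cannot capture.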