This research explores cost-sensitive decision-making when costs and probabilities are unknown. It introduces a testbed using the KDD'98 charitable donations dataset and discusses probability estimation methods. Experimental results are presented.
Learning and Making Decisions When Costs and Probabilities are Both Unknown Authors: Bianca Zadrozny, Charles Elkan Advisor: Dr. Hsu Graduate: Yu-Wei Su IDSL, Intelligent Database System Lab
Outline • Motivation • Objective • Introduction • MetaCost vs. direct cost-sensitive decision-making • A testbed: the KDD’98 charitable donations dataset • Probability estimation methods • Estimating donation amounts • Experimental results • Conclusion • Opinion IDSL, Intelligent Database System Lab
Motivation • Misclassification costs differ from example to example, just as probabilities do • Real-world datasets are often highly unbalanced IDSL, Intelligent Database System Lab
Objective • To make optimal decisions given costs and probabilities • To correct sample selection bias using the method of Nobel prize-winning economist James Heckman IDSL, Intelligent Database System Lab
Introduction • Most supervised learning algorithms assume that all errors (incorrect predictions) are equally costly, which is not true in practice • Cost-sensitive learning aims at the decisions with the lowest expected cost • Non-cost-sensitive learning aims only at accurate classification • The paper presents an alternative method called direct cost-sensitive decision-making IDSL, Intelligent Database System Lab
MetaCost vs. direct cost-sensitive decision-making • MetaCost • Each example x is associated with a cost C(i,j,x) of predicting class i for x when the true class of x is j • The optimal decision concerning x is the class i that leads to the lowest expected cost IDSL, Intelligent Database System Lab
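Both approaches share this expected-cost decision rule: predict the class i that minimizes the sum over j of P(j|x)·C(i,j,x). Below is a minimal illustrative sketch of that rule; the function and variable names are mine, not from the paper.

```python
# Minimal sketch of the shared decision rule: predict the class i that
# minimizes the expected cost sum_j P(j|x) * C(i, j, x).
# Names (predict_min_cost, prob, cost) are illustrative, not from the paper.

def predict_min_cost(prob, cost, classes):
    """prob[j]  : estimated P(j|x) for each class j
       cost(i,j): example-specific cost C(i, j, x) of predicting i when the truth is j"""
    def expected_cost(i):
        return sum(prob[j] * cost(i, j) for j in classes)
    return min(classes, key=expected_cost)

# Example: two classes, a false negative costing much more than a false positive.
classes = [0, 1]
prob = {0: 0.9, 1: 0.1}
cost = lambda i, j: 0.0 if i == j else (10.0 if j == 1 else 1.0)
print(predict_min_cost(prob, cost, classes))  # predicts 1 even though P(1|x) = 0.1
```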
MetaCost vs. direct cost-sensitive decision-making (cont.) • Direct cost-sensitive decision-making has the same central idea, but differs in two ways • MetaCost assumes that costs are known in advance and are the same for all examples • The direct method does not estimate probabilities by bagging; it uses simpler methods based on a single decision tree IDSL, Intelligent Database System Lab
A testbed: the KDD’98 charitable donations dataset • The training set consists of 95,412 records with known classes; the test set consists of 96,367 records without known classes • The overall percentage of donors in the population is about 5% • The donation amount for persons who respond varies from $1 to $200 IDSL, Intelligent Database System Lab
A testbed: the KDD’98 charitable donations dataset (cont.) • In the donation domain it is easier to talk consistently about benefit than about cost • The optimal predicted label for example x is the class i that maximizes the expected benefit, the sum over j of P(j|x)·B(i,j,x), where j=1 means the person does donate and j=0 means the person does not IDSL, Intelligent Database System Lab
A testbed: the KDD’98 charitable donations dataset (cont.) • The optimal policy IDSL, Intelligent Database System Lab
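In the paper's setup, as I understand it, soliciting person x yields benefit y(x) - $0.68 if the person donates and -$0.68 otherwise (y(x) is the estimated donation amount and $0.68 the cost of one mailing in the KDD'98 task), while not soliciting yields zero benefit, so maximizing expected benefit reduces to: solicit if and only if P(donate|x)·y(x) > $0.68. A minimal sketch of that rule, assuming the probability and amount estimates are already available:

```python
MAILING_COST = 0.68  # assumed per-solicitation cost in dollars for the KDD'98 task

def solicit(p_donate, est_amount):
    """Solicit iff the expected donation exceeds the mailing cost:
       P(donate|x) * y(x) > $0.68."""
    return p_donate * est_amount > MAILING_COST

# Example: a 5% likely donor with an estimated $15 gift is worth mailing
# (0.05 * 15 = 0.75 > 0.68).
print(solicit(0.05, 15.0))  # True
```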
Probability estimation methods • Deficiencies of decision tree methods • Smoothing • Curtailment • Calibrating naive Bayes classifier scores • Averaging probability estimates IDSL, Intelligent Database System Lab
Deficiencies of decision tree methods • Standard decision tree methods assign by default the raw training frequency p=k/n at each leaf • These raw frequencies are not accurate conditional probability estimates for at least two reasons • High bias: tree growing tries to make leaves pure, so estimates are pushed toward 0 or 1 • High variance: many leaves contain few training examples, so k/n is unreliable • Pruning can alleviate this, but it is not suitable for unbalanced datasets IDSL, Intelligent Database System Lab
Deficiencies of decision tree methods (cont.) • The solution: use C4.5 without pruning and without node collapsing to obtain raw scores, then transform these scores into accurate class membership probabilities IDSL, Intelligent Database System Lab
Smoothing • Using the Laplace correction method • For a two-class problem, it replaces the conditional probability estimate p=k/n by p’=(k+1)/(n+2), which pulls the estimates closer to ½ • For the donation domain, the probability p=k/n is instead replaced by p’=(k+bm)/(n+m), where b is the base rate of the positive class and m is a smoothing parameter IDSL, Intelligent Database System Lab
Smoothing (cont.) • For example, if a leaf contains four examples, one of which is positive, the raw C4.5 score of this leaf is 0.25 • The smoothed score with m=200 and b=0.05 is p’=(1+0.05·200)/(4+200)=11/204 ≈ 0.054 IDSL, Intelligent Database System Lab
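A small sketch of this smoothing rule (the function name is mine); with the slide's numbers k=1, n=4, b=0.05, m=200 it reproduces 11/204 ≈ 0.054, and with b=0.5, m=2 it reduces to the Laplace correction.

```python
def smooth(k, n, b, m):
    """m-estimate smoothing of a leaf's raw frequency k/n toward the base rate b."""
    return (k + b * m) / (n + m)

print(smooth(1, 4, 0.5, 2))     # Laplace correction: (1+1)/(4+2) = 0.333...
print(smooth(1, 4, 0.05, 200))  # donation setting:   11/204    ~= 0.0539
```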
Smoothing (cont.) IDSL, Intelligent Database System Lab
Curtailment • To overcome the problem of overfitting in small leaves (see the sketch below) • Curtailment is not equivalent to any type of pruning IDSL, Intelligent Database System Lab
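The slide does not spell out the mechanism; as I understand the paper, curtailment means not trusting nodes that contain fewer than some threshold v of training examples, and reading the (smoothed) frequency off the closest ancestor that still has at least v examples. A minimal sketch under that reading, with an illustrative Node class:

```python
class Node:
    """Toy decision-tree node: n = number of training examples reaching the node,
       k = number of positives among them, children selected by a test function."""
    def __init__(self, n, k, test=None, children=None):
        self.n, self.k, self.test, self.children = n, k, test, children

def curtailed_estimate(root, x, v, b=0.05, m=200):
    """Follow x's path down the tree, but stop as soon as the next node would
       contain fewer than v training examples; smooth the frequency where we stop."""
    node = root
    while node.children is not None:
        child = node.children[node.test(x)]
        if child.n < v:
            break                # curtail: keep the current, better-supported node
        node = child
    return (node.k + b * m) / (node.n + m)
```

Unlike pruning, the tree itself is never modified; only the depth at which the frequency is read off depends on v, which is why curtailment is not equivalent to any type of pruning.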
Curtailment (cont.) IDSL, Intelligent Database System Lab
Curtailment (cont.) IDSL, Intelligent Database System Lab
Calibrating naive Bayes classifier scores • Use a histogram method to obtain calibrated probability estimates from a naive Bayesian classifier • Sort the training examples according to their scores and divide the sorted set into b equal-size bins • Given a test example x, place it in a bin according to its score n(x) and use that bin's observed fraction of positives as the corrected probability IDSL, Intelligent Database System Lab
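A minimal sketch of the binning procedure described above (function names are mine): sort the training examples by naive Bayes score, cut them into b equal-size bins, and return the observed positive fraction of whichever bin a test score n(x) falls into.

```python
import numpy as np

def fit_bins(scores, labels, b=10):
    """Sort training examples by score and cut them into b equal-size bins.
       Returns the bin boundaries and the fraction of positives in each bin."""
    order = np.argsort(scores)
    score_bins = np.array_split(np.asarray(scores)[order], b)
    label_bins = np.array_split(np.asarray(labels)[order], b)
    boundaries = [sb[-1] for sb in score_bins[:-1]]   # upper edge of all bins but the last
    positives = [lb.mean() for lb in label_bins]      # calibrated P(j=1 | bin)
    return boundaries, positives

def calibrated_probability(score, boundaries, positives):
    """Place a test example's score n(x) into its bin and return that bin's positive rate."""
    return positives[np.searchsorted(boundaries, score, side='left')]
```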
Averaging probability estimates • Combining the probability estimates given by different classifiers through averaging can reduce the variance of the probability estimates [Tumer and Ghosh, 1995]: σ²_avg = ((1 + δ(N-1))/N)·σ², where σ² is the variance of each original classifier, N is the number of classifiers, and δ is the correlation factor among all classifiers IDSL, Intelligent Database System Lab
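A tiny sketch of the variance-reduction factor above (the function name is mine): two uncorrelated classifiers halve the variance of each estimate, while two perfectly correlated ones gain nothing.

```python
def variance_reduction(N, delta):
    """Factor by which averaging N equally correlated, equal-variance estimators
       reduces the variance of each individual estimate."""
    return (1 + delta * (N - 1)) / N

print(variance_reduction(2, 0.0))  # 0.5 : uncorrelated classifiers halve the variance
print(variance_reduction(2, 1.0))  # 1.0 : perfectly correlated classifiers gain nothing
```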
Estimating donation amounts • For non-donors in the training set it may seem natural to impute a donation amount of zero, since their actual donation amount is zero, but the amount estimate y(x) must be conditional on the person donating, just as the donation probability is estimated separately • It is also wrong to use the same donation estimate for all test examples, because then the decision about whom to solicit depends only on the probability IDSL, Intelligent Database System Lab
Estimating donation amounts (cont.) • These costs or benefits must be estimated separately for each example • Least-squares multiple linear regression (MLR) is used to estimate donation amounts, with two features • Lastgift: dollar amount of the most recent gift • Ampergift: average gift amount in responses to the last 22 promotions IDSL, Intelligent Database System Lab
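A minimal sketch of this regression step; the two feature names come from the slide, the donors-only restriction follows the later slides, and everything else (function names, use of numpy least squares) is illustrative.

```python
import numpy as np

def fit_donation_regression(lastgift, ampergift, amount):
    """Least-squares fit of amount ~ a*lastgift + b*ampergift + c,
       intended to be fit only on training examples that actually donated."""
    X = np.column_stack([lastgift, ampergift, np.ones(len(amount))])
    coef, *_ = np.linalg.lstsq(X, amount, rcond=None)
    return coef

def predict_amount(coef, lastgift, ampergift):
    """Estimated donation amount y(x) for a new example."""
    return coef[0] * lastgift + coef[1] * ampergift + coef[2]
```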
Estimating donation amounts (cont.) • The problem of sample selection bias • Donation amounts estimated by the regression equation tend to be too low for test examples that have a low probability of donation IDSL, Intelligent Database System Lab
Estimating donation amounts (cont.) • Heckman correction • First step: learn a probit linear model to estimate the conditional probabilities P(j=1|x) • Second step: estimate y(x) by linear regression using only the training examples x for which j(x)=1, but including the value of P(j=1|x) as an additional variable • In this paper the probability estimates used in the second step are obtained from a decision tree or a naive Bayes classifier instead of the probit model IDSL, Intelligent Database System Lab
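A minimal sketch of the corrected second step, extending the earlier regression sketch with the estimated donation probability as one extra column (the probabilities would come from the curtailed decision tree or the calibrated naive Bayes model; names are illustrative):

```python
import numpy as np

def fit_corrected_regression(lastgift, ampergift, p_donate, amount):
    """Donors-only least-squares fit of
       amount ~ a*lastgift + b*ampergift + c*P(j=1|x) + d,
       so the amount model is no longer blind to how likely each donor was to respond."""
    X = np.column_stack([lastgift, ampergift, p_donate, np.ones(len(amount))])
    coef, *_ = np.linalg.lstsq(X, amount, rcond=None)
    return coef
```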
Experimental results IDSL, Intelligent Database System Lab
Conclusion • The proposed method of cost-sensitive learning performs systematically better than MetaCost in experiments • It provides a solution to the fundamental problem that costs are different for different examples • It identifies and solves the problem of sample selection bias IDSL, Intelligent Database System Lab
Opinion • Raw frequency is not the only metric for estimating probabilities • Positive and negative classes are not simply a 1-or-0 question IDSL, Intelligent Database System Lab