Considering Cost Asymmetry in Learning Classifiers, by Bach, Heckerman and Horvitz. Presented by Chunping Wang, Machine Learning Group, Duke University, May 21, 2007.
Outline • Introduction • SVM with Asymmetric Cost • SVM Regularization Path (Hastie et al., 2005) • Path with Cost Asymmetry • Results • Conclusions
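Introduction (1)
Binary classification: real-valued predictors $x \in \mathbb{R}^d$, binary response $y \in \{-1, 1\}$.
A classifier can be defined as $\operatorname{sign}(f(x))$, based on a linear decision function $f(x) = w^\top x + b$.
Parameters: $(w, b)$.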
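Introduction (2)
• Two types of misclassification:
• false negative: cost $C_+$
• false positive: cost $C_-$
Expected cost: $R(C_+, C_-, w, b) = C_+ \, P(w^\top x + b < 0,\ y = 1) + C_- \, P(w^\top x + b > 0,\ y = -1)$
In terms of the 0-1 loss function $\phi_{0\text{-}1}(u) = 1_{u < 0}$: $R(C_+, C_-, w, b) = C_+ \, E[1_{y=1}\, \phi_{0\text{-}1}(w^\top x + b)] + C_- \, E[1_{y=-1}\, \phi_{0\text{-}1}(-w^\top x - b)]$
The 0-1 loss is the real loss function, but it is non-convex and non-differentiable.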
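The asymmetric cost on this slide can be illustrated on a finite sample. The sketch below is only an illustration, not code from the paper: the function name `empirical_cost` and the parameters `c_pos` (cost of a false negative) and `c_neg` (cost of a false positive) are hypothetical, and labels are assumed to be coded as +1 and -1.

```python
def empirical_cost(y_true, y_pred, c_pos, c_neg):
    """Average asymmetric 0-1 cost over a labeled sample.

    c_pos is charged per false negative, c_neg per false positive
    (hypothetical names; labels are assumed to be +1 / -1).
    """
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return (c_pos * false_neg + c_neg * false_pos) / len(y_true)


# Toy sample: two false negatives and one false positive out of five points.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, 1, -1]
cost = empirical_cost(y_true, y_pred, c_pos=3.0, c_neg=1.0)
```

With a false negative three times as costly as a false positive, the two missed positives dominate the total.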
Introduction (3) Convex loss functions – surrogates for the 0-1 loss function (used for training purposes)
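To make the idea of a surrogate concrete, here is a small sketch of the 0-1 loss next to three common convex surrogates. Which surrogates the slide actually plotted is an assumption; hinge, square and logistic are used here only as familiar examples, and the argument `u` stands for the signed margin (label times decision value).

```python
import math


def zero_one(u):
    # 0-1 loss on the signed margin u: an error is counted when u < 0.
    return 1.0 if u < 0 else 0.0


def hinge(u):
    # Hinge loss, the surrogate used by the SVM.
    return max(1.0 - u, 0.0)


def square(u):
    # Square loss.
    return (1.0 - u) ** 2


def logistic(u):
    # Logistic loss with natural logarithm.
    return math.log(1.0 + math.exp(-u))


# Evaluate all four losses on a grid of margins from -2 to 2.
margins = [k / 4.0 for k in range(-8, 9)]
table = [(u, zero_one(u), hinge(u), square(u), logistic(u)) for u in margins]
```

On this grid the hinge and square losses never fall below the 0-1 loss, and all three surrogates are convex in `u`, which is what makes training tractable.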
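Introduction (4)
Empirical cost: given n labeled data points $(x_i, y_i)$, the expected cost is replaced by its average over the sample.
Objective function: the empirical cost under a convex surrogate loss, whose false-negative and false-positive weights set the asymmetry, plus a regularization term on $w$.
Because training uses a convex surrogate of the 0-1 loss, the best training asymmetry need not match the testing asymmetry.
Motivation: efficiently explore many training asymmetries, even when the testing asymmetry is given.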
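SVM with Asymmetric Cost (1)
Hinge loss: $\phi_{hi}(u) = \max(1 - u,\ 0)$
SVM with asymmetric cost:
$\min_{w, b, \xi}\ \sum_i C_i \xi_i + \tfrac{1}{2}\|w\|^2$ subject to $\xi_i \ge 0$ and $\xi_i \ge 1 - y_i (w^\top x_i + b)$ for all $i$,
where $C_i = C_+$ (the false-negative cost) if $y_i = 1$ and $C_i = C_-$ (the false-positive cost) if $y_i = -1$.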
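A minimal sketch of how class-dependent costs enter a hinge-loss SVM objective is given below. It assumes a linear decision function and an unnormalized objective (weighted hinge losses plus half the squared norm of the weights); the names `asymmetric_svm_objective`, `c_pos` and `c_neg` are hypothetical, and the scaling may differ from the one used on the slide.

```python
def asymmetric_svm_objective(w, b, X, y, c_pos, c_neg):
    """Weighted hinge loss plus 0.5 * ||w||^2 (assumed, unnormalized form).

    Positive examples are weighted by c_pos, negative ones by c_neg.
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        weight = c_pos if yi == 1 else c_neg
        loss += weight * max(0.0, 1.0 - yi * score)
    return loss + reg


# Toy data: one positive and one negative point violate the margin.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.2, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

symmetric = asymmetric_svm_objective(w, b, X, y, 1.0, 1.0)
favor_pos = asymmetric_svm_objective(w, b, X, y, 2.0, 1.0)
favor_neg = asymmetric_svm_objective(w, b, X, y, 1.0, 2.0)
```

For the same `(w, b)`, raising one class's cost increases only the penalty of that class's margin violations, which is how the asymmetry steers the optimum.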
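SVM with Asymmetric Cost (2)
The Lagrangian of the primal problem (slack variables $\xi_i$, per-point costs $C_i$), with dual variables $\alpha_i \ge 0$ and $\beta_i \ge 0$:
$L = \sum_i C_i \xi_i + \tfrac{1}{2}\|w\|^2 + \sum_i \alpha_i \left(1 - y_i (w^\top x_i + b) - \xi_i\right) - \sum_i \beta_i \xi_i$
Setting the derivatives with respect to $w$, $b$ and $\xi_i$ to zero:
(1) $w = \sum_i \alpha_i y_i x_i$
(2) $\sum_i \alpha_i y_i = 0$
(3) $C_i = \alpha_i + \beta_i$
Karush-Kuhn-Tucker (KKT) conditions: $\alpha_i \left(1 - y_i (w^\top x_i + b) - \xi_i\right) = 0$ and $\beta_i \xi_i = 0$ for all $i$.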
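SVM with Asymmetric Cost (3)
The dual problem:
$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K_{ij}$ subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C_i$ for all $i$,
where $K_{ij} = x_i^\top x_j$.
This is a quadratic optimization problem for a given cost structure. Solving it separately for every cost structure in the whole space is intractable.
Following the SVM regularization path algorithm (Hastie et al., 2005), the authors work with (1)-(3) and the KKT conditions instead of the dual problem.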
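SVM Regularization Path (1)
• Define active sets of data points:
• Margin: $M = \{i : y_i f(x_i) = 1\}$
• Left of margin: $L = \{i : y_i f(x_i) < 1\}$
• Right of margin: $R = \{i : y_i f(x_i) > 1\}$
KKT conditions: for $i \in L$, $\alpha_i$ is at its upper bound; for $i \in R$, $\alpha_i = 0$; for $i \in M$, $\alpha_i$ lies between 0 and its upper bound.
SVM regularization path: the cost is symmetric, so the search runs along a single axis.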
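The three active sets can be computed directly from the decision values. The sketch below assumes the usual convention that a point is on the margin when its label times its decision value equals 1 (up to a numerical tolerance); the function name `active_sets` and the tolerance `tol` are hypothetical.

```python
def active_sets(scores, y, tol=1e-9):
    """Split point indices into (left, margin, right) by the value of y_i * f(x_i).

    left:   y_i * f(x_i) < 1  (inside the margin or misclassified)
    margin: y_i * f(x_i) = 1  (within tol)
    right:  y_i * f(x_i) > 1  (correctly classified, outside the margin)
    """
    left, margin, right = [], [], []
    for i, (fi, yi) in enumerate(zip(scores, y)):
        m = yi * fi
        if abs(m - 1.0) <= tol:
            margin.append(i)
        elif m < 1.0:
            left.append(i)
        else:
            right.append(i)
    return left, margin, right


# Decision values f(x_i) and labels for six toy points.
scores = [2.0, 1.0, 0.3, -1.0, -0.5, 1.5]
y = [1, 1, 1, -1, -1, -1]
left, margin, right = active_sets(scores, y)
```

A path algorithm only needs to track when an index moves between these three lists, since the solution changes smoothly in between.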
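SVM Regularization Path (2)
Initialization (equal numbers of positive and negative examples, $n_+ = n_-$)
Consider a sufficiently large regularization parameter $\lambda$ (that is, $C$ is very small): all the points are in L, with each $\alpha_i$ at its upper bound.
As $\lambda$ decreases, the $\alpha_i$ remain unchanged until one or more positive and negative examples hit the margin simultaneously.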
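SVM Regularization Path (3)
Initialization ($n_+ = n_-$, continued)
Define the initial solution obtained with all the points in L. The critical condition is the value of the regularization parameter at which the first two points, one positive and one negative, hit the margin.
For $n_+ \neq n_-$, this initial condition stays the same except for the definition of the initial solution.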
SVM Regularization Path (4)
• The path: as $\lambda$ decreases, $\alpha_i$ changes only for $i \in M$, until one of the following events happens:
• A point from L or R has entered M;
• A point in M has left the set to join either R or L.
Between two events, consider only the points on the margin, for which $y_i f(x_i) = 1$. These equalities form a linear system in the $\alpha_i$, $i \in M$. Therefore, the $\alpha_i$ for points on the margin proceed linearly in $\lambda$, and the function $f(x)$ changes in a piecewise-inverse manner in $\lambda$.
SVM Regularization Path (5)
• Update regularization
• Update active sets and solutions
• Stopping condition
• In the separable case, we terminate when L becomes empty;
• In the non-separable case, we terminate when the next value of the regularization parameter is no longer positive for all the possible events.
Path with Cost Asymmetry (1)
Exploration in the 2-d space of the two misclassification costs.
Path initialization: start at situations where all points are in L.
Follow the updating procedure of the 1-d case along a line: the regularization changes while the cost asymmetry stays fixed.
Among all the classifiers, find the best one, given the user's cost function.
[Figure: paths starting from the initialization points]
Path with Cost Asymmetry (2)
Produce ROC: collecting the classifiers along R lines, each in a different direction of the 2-d cost space, we can build three ROC curves.
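As a point of reference, the simplest of these ROC curves, the one obtained by varying only the intercept of a fixed classifier, can be sketched as follows. This is not the authors' path algorithm: it only thresholds a fixed list of decision values, and the function name `roc_points` is hypothetical.

```python
def roc_points(scores, y):
    """ROC points (false positive rate, true positive rate) for a fixed scorer.

    Each distinct score is used as a threshold, which is equivalent to
    shifting the intercept of the decision function. Labels are +1 / -1.
    """
    n_pos = sum(1 for yi in y if yi == 1)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, yi in zip(scores, y) if s >= t and yi == 1)
        fp = sum(1 for s, yi in zip(scores, y) if s >= t and yi == -1)
        points.append((fp / n_neg, tp / n_pos))
    return points


scores = [0.9, 0.8, 0.4, 0.3]
y = [1, -1, 1, -1]
points = roc_points(scores, y)
```

Varying the training asymmetry as well yields additional classifiers, each contributing its own points, so the combined curve can only match or improve on this one.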
Results (1)
• For 1000 testing asymmetries, three methods are compared:
• “one” – take the testing asymmetry as the training cost asymmetry;
• “int” – vary the intercept of “one” and build an ROC, then select the optimal classifier;
• “all” – select the optimal classifier from the ROC obtained by varying both the training asymmetry and the intercept.
• Use a nested cross-validation:
• The outer cross-validation: produce overall accuracy estimates for the classifier;
• The inner cross-validation: select optimal classifier parameters (training asymmetry and/or intercept).
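Selecting the optimal classifier from an ROC for a given testing asymmetry amounts to picking the operating point with the lowest expected cost. The sketch below assumes the expected cost is the false-negative cost times the false-negative probability plus the false-positive cost times the false-positive probability; the names `select_operating_point`, `c_pos`, `c_neg` and `p_pos` (the prior of the positive class) are hypothetical, and the ROC points are made up for illustration.

```python
def select_operating_point(points, c_pos, c_neg, p_pos):
    """Return the (fpr, tpr) point with the lowest expected asymmetric cost.

    Expected cost = c_pos * P(y=+1) * (1 - tpr) + c_neg * P(y=-1) * fpr.
    """
    def expected_cost(point):
        fpr, tpr = point
        return c_pos * p_pos * (1.0 - tpr) + c_neg * (1.0 - p_pos) * fpr

    return min(points, key=expected_cost)


# Illustrative ROC points (false positive rate, true positive rate).
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

# False negatives five times as costly: accept more false positives.
best_when_fn_costly = select_operating_point(points, c_pos=5.0, c_neg=1.0, p_pos=0.5)
# False positives five times as costly: stay conservative.
best_when_fp_costly = select_operating_point(points, c_pos=1.0, c_neg=5.0, p_pos=0.5)
```

In a nested cross-validation, this selection would run inside the inner loop on held-out folds, with the outer loop reserved for estimating the cost of the selected classifier.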
Conclusions
• An efficient algorithm is presented to build ROC curves by varying the training cost asymmetries for SVMs.
• The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane.
• Because a convex surrogate is used, training with the testing asymmetry leads to a non-optimal classifier.
• Results show the advantages of considering more training asymmetries.