Optimizing F-Measure with Support Vector Machines David R. Musicant Vipin Kumar Aysel Ozgur FLAIRS 2003 Tuesday, May 13, 2003 Carleton College
Overview • Classification algorithms often evaluated by test set accuracy • Test set accuracy can be a poor measure when one of the classes is rare • Support Vector Machines (SVMs) are designed to optimize test set accuracy • SVMs have been used in an ad-hoc manner on datasets with rare classes • Our new results: current ad-hoc heuristic techniques can be theoretically justified.
Roadmap • The Traditional SVM and variants • Precision, Recall, and F-measure metrics • The F-measure Maximizing SVM • Equivalence of traditional SVM and F-measure SVM (for the right parameters) • Implications and Conclusions
The Classification Problem • [Figure: points of the classes A+ and A- separated by two bounding planes; the gap between the planes is the "margin", and the separating surface lies midway between them]
The Classification Problem • Given m points in the n-dimensional space R^n • Each point represented as x_i • Membership of each point A_i in the classes A+ or A- is specified by y_i = ±1 • Separate by two bounding planes such that each class falls on its own side (constraints reconstructed below) • More succinctly: y_i(x_i'w − γ) ≥ 1 for i = 1, 2, …, m
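The slide's constraint equations did not survive extraction; the LaTeX below is a reconstruction of the standard bounding-plane constraints implied by the surrounding text, where w is the plane normal and γ the offset.

```latex
% Reconstructed bounding-plane constraints (standard form; the slide's own
% equations were lost in extraction). w is the plane normal, gamma the offset.
\begin{align*}
  x_i' w &\ge \gamma + 1 && \text{for } y_i = +1,\\
  x_i' w &\le \gamma - 1 && \text{for } y_i = -1,\\
\intertext{or, more succinctly, for $i = 1, 2, \dots, m$:}
  y_i\,(x_i' w - \gamma) &\ge 1.
\end{align*}
```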
Misclassification Count SVM • (·)_* is the step function (1 if > 0, 0 otherwise) • "Push the planes apart, and minimize the number of misclassified points." (formulation reconstructed below) • C balances the two competing objectives • Minimizing w'w pushes the planes apart • The problem is NP-complete, and the objective is non-differentiable
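A reconstruction of the misclassification-count formulation described above (the slide's formula itself was lost). Here e denotes a vector of ones, so e'(ξ)_* counts the nonzero slacks, i.e. the misclassified points.

```latex
% Misclassification-count SVM (reconstruction): count the errors via the
% step function while pushing the planes apart through the w'w term.
\begin{align*}
  \min_{w,\,\gamma,\,\xi}\quad & C\, e'(\xi)_* + \tfrac{1}{2}\, w' w\\
  \text{s.t.}\quad & y_i\,(x_i' w - \gamma) + \xi_i \ge 1,\quad \xi_i \ge 0,\quad i = 1,\dots,m.
\end{align*}
```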
Approx. Misclassification Count SVM • The step function (·)_* is replaced by a differentiable approximation, such as the sigmoid 1/(1 + e^(−αt)) • α > 0 is an arbitrary fixed constant that determines the closeness of the approximation • This is still difficult to solve
Standard “Soft Margin” SVM • “Push the planes apart, and minimize the distance of misclassified points.” • We minimize the total distance from misclassified points to the bounding planes, not the actual number of them (formulation below) • Much more tractable; does quite well at optimizing accuracy • Does poorly when one class is rare
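For comparison, the standard soft-margin formulation the bullet describes, written out (again a reconstruction, since the slide's formula was lost): the step function is dropped and the slacks ξ_i enter linearly, which is what makes the problem tractable.

```latex
% Standard soft-margin SVM (reconstruction): misclassified points are
% penalized by their distance to the bounding plane, not merely counted.
\begin{align*}
  \min_{w,\,\gamma,\,\xi}\quad & C\, e'\xi + \tfrac{1}{2}\, w' w\\
  \text{s.t.}\quad & y_i\,(x_i' w - \gamma) + \xi_i \ge 1,\quad \xi_i \ge 0,\quad i = 1,\dots,m.
\end{align*}
```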
Weighted Standard SVM • “Push the planes apart, and minimize the weighted distance of misclassified points.” • Allows one to choose different C values (C+ and C-) for the two classes • Often used to weight the rare class more heavily • How do we measure success when one class is rare? (In what follows, assume A+ is the rare class…)
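As a practical illustration of per-class C values (not part of the talk), the sketch below uses scikit-learn's SVC, whose class_weight argument multiplies C separately for each class; the synthetic data and the weight of 10 for the rare class are arbitrary choices.

```python
# Illustrative only: a weighted SVM via scikit-learn's per-class C weighting.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic imbalanced data; class 1 plays the role of the rare class A+.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Effective penalties: C- = 1.0 for class 0, C+ = 10.0 for class 1.
clf = SVC(kernel="linear", C=1.0, class_weight={0: 1.0, 1: 10.0})
clf.fit(X, y)
```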
Measures of success • Precision and Recall are better descriptors than accuracy when one class is rare • Precision P = TP / (TP + FP): the fraction of points predicted to be in A+ that actually belong to A+ • Recall R = TP / (TP + FN): the fraction of points in A+ that are correctly identified
F-measure • F-measure: commonly used “average” (harmonic mean) of precision and recall, F = 2PR / (P + R) • Can C+ and C- in the weighted SVM be balanced to optimize F-measure? • Can we start over and invent an SVM to optimize F-measure?
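A small worked example (illustrative, not from the talk) computing precision, recall, and F-measure from the error counts used on these slides:

```python
# Precision, recall, and F-measure from TP/FP/FN counts for the rare class A+.
def f_measure(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 40 true positives, 10 false positives, 20 false negatives.
print(f_measure(40, 10, 20))  # 0.727..., equivalently 2*TP / (2*TP + FP + FN)
```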
Constructing an F-measure SVM • How do we appropriately represent F-measure in an SVM? • Substitute P and R into F: F = 2TP / (2TP + FP + FN) (derivation below) • Thus, to maximize F-measure, we minimize (FP + FN) / TP
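The substitution behind the last two bullets, written out (a reconstruction of the algebra rather than the slide's exact equations):

```latex
% Substituting P = TP/(TP+FP) and R = TP/(TP+FN) into F = 2PR/(P+R):
\begin{align*}
  F &= \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN},
  &
  \frac{1}{F} &= 1 + \frac{FP + FN}{2\,TP},
\end{align*}
so maximizing $F$ is equivalent to minimizing $(FP + FN)/TP$.
```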
Constructing an F-measure SVM • Want to minimize (FP + FN) / TP • FP = # misclassified A-, FN = # misclassified A+, so TP = #A+ − FN • New F-measure maximizing SVM: express FP and FN through step-function counts of the misclassified points (as in the misclassification count SVM) and minimize (FP + FN) / (#A+ − FN) subject to the same bounding-plane constraints
The F-measure Maximizing SVM • As before, approximate the step function with the sigmoid 1/(1 + e^(−αt)) • Can we connect this with the standard SVM?
Weighted misclassification count SVM vs. F-measure maximizing SVM • How do these two formulations relate? • We show: • Pick a parameter C. • Find the classifier that optimizes the F-measure SVM. • There exist parameters C+ and C- such that the misclassification counting SVM has the same solution. • Proof and formulas to obtain C+ and C- are in the paper.
Implications of result • Since there exist C+, C- yielding the same solution as the F-measure maximizing SVM, finding the best C+ and C- for the weighted standard SVM is “the right thing to do” (modulo approximations) • In practice, a common trick is to choose C+, C- so that the two classes contribute equally, i.e. C+ · (#A+) = C- · (#A-). This heuristic seems reasonable but is not optimal. (A good first guess?)
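A hedged sketch of what searching for the best C+ and C- can look like in practice (scikit-learn again; the data, grid, and split are all illustrative and not the paper's experimental setup):

```python
# Illustrative grid search over per-class weights, scored by F-measure on a
# held-out split; class 1 is the rare class A+.
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

best_f, best_w = -1.0, None
for c_minus in (0.1, 1.0, 10.0):
    for c_plus in (0.1, 1.0, 10.0, 100.0):
        clf = SVC(kernel="linear", class_weight={0: c_minus, 1: c_plus})
        clf.fit(X_tr, y_tr)
        f = f1_score(y_val, clf.predict(X_val))  # F-measure for the rare class
        if f > best_f:
            best_f, best_w = f, {0: c_minus, 1: c_plus}

print("best F-measure:", best_f, "with weights:", best_w)
```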
Implications of result • Suppose that SVM fails to provide good F-measure for a given problem, for a wide range of C+ and C- values. • Q: Is there another SVM formulation that would yield better F-measure?A: Our evidence suggests not. • Q: Is there another SVM formulation that would find best possible F-measure more directly?A: Yes, the F-measure maximizing SVM.
Conclusions / Summary • We provide theoretical evidence that standard heuristic practices in using SVMs for optimizing F-measure are reasonable. • We provide a framework for continued research in F-measure maximizing SVMs. • All our results apply directly to SVMs with kernels (see paper). • Future work: attacking F-measure maximizing SVM directly to find faster algorithms.