Optimizing F-Measure with Support Vector Machines
David R. Musicant, Vipin Kumar, Aysel Ozgur
FLAIRS 2003, Tuesday, May 13, 2003
Carleton College
Overview
• Classification algorithms are often evaluated by test set accuracy
• Test set accuracy can be a poor measure when one of the classes is rare
• Support Vector Machines (SVMs) are designed to optimize test set accuracy
• SVMs have been used in an ad-hoc manner on datasets with rare classes
• Our new results: current ad-hoc heuristic techniques can be theoretically justified
Roadmap
• The traditional SVM and variants
• Precision, Recall, and F-measure metrics
• The F-measure maximizing SVM
• Equivalence of the traditional SVM and the F-measure SVM (for the right parameters)
• Implications and conclusions
The Classification Problem
[Figure: points of the two classes A+ and A− separated by two parallel bounding planes; the distance between the planes is the "margin", and the separating surface lies between them.]
The Classification Problem
• Given m points in the n-dimensional space R^n
• Each point is represented as x_i
• Membership of each point A_i in the classes A+ or A− is specified by y_i = ±1
• Separate by two bounding planes such that x_i^T w ≥ γ + 1 when y_i = +1, and x_i^T w ≤ γ − 1 when y_i = −1
• More succinctly: y_i (x_i^T w − γ) ≥ 1 for i = 1, 2, …, m
Misclassification Count SVM
• (·)_* is the step function (1 if the argument is > 0, 0 otherwise)
• "Push the planes apart, and minimize the number of misclassified points."
• C balances two competing objectives
• Minimizing w^T w pushes the planes apart
• The problem is NP-complete, and the objective is non-differentiable
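As an illustration of this objective (my own sketch, not code from the paper; the variable names and the exact scaling of C are assumptions), in Python/NumPy:

```python
import numpy as np

def misclassification_count_objective(w, gamma, X, y, C):
    """Exact, non-differentiable objective: C * (# bounding-plane violations) + 0.5 * w^T w."""
    # A point violates its bounding-plane constraint when 1 - y_i * (x_i^T w - gamma) > 0.
    violations = 1.0 - y * (X @ w - gamma)
    step = (violations > 0).astype(float)   # the step function (.)_*
    return C * step.sum() + 0.5 * float(w @ w)
```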
Approximate Misclassification Count SVM
• Replace the step function with a differentiable approximation, such as a sigmoid
• An arbitrary fixed constant > 0 determines the closeness of the approximation
• This is still difficult to solve
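A minimal sketch of smoothing the step function with a sigmoid, assuming a steepness parameter alpha (my notation; the paper's exact approximation and constant may differ):

```python
import numpy as np

def sigmoid_step(t, alpha=5.0):
    # Differentiable stand-in for the step function; larger alpha -> closer to the true step.
    return 1.0 / (1.0 + np.exp(-alpha * t))

def approx_misclassification_objective(w, gamma, X, y, C, alpha=5.0):
    # Same objective as before, with the step-function count replaced by the sigmoid.
    violations = 1.0 - y * (X @ w - gamma)
    return C * sigmoid_step(violations, alpha).sum() + 0.5 * float(w @ w)
```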
Standard "Soft Margin" SVM
• "Push the planes apart, and minimize the distance of misclassified points."
• We minimize the total distance from misclassified points to their bounding planes, not the actual number of them.
• Much more tractable; does quite well at optimizing accuracy
• Does poorly when one class is rare
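This soft-margin formulation is what off-the-shelf SVM solvers optimize; a hedged scikit-learn sketch on synthetic imbalanced data (assuming scikit-learn is available; the dataset is invented purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data where the positive class (label 1) is rare (~5% of the points).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95, 0.05], random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # standard soft-margin SVM: one C for all points
print("training accuracy:", clf.score(X, y))  # accuracy can look good even if the rare class is largely missed
```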
Weighted Standard SVM
• "Push the planes apart, and minimize the weighted distance of misclassified points."
• Allows one to choose different C values, C+ and C−, for the two classes.
• Often used to weight the rare class more heavily.
• How do we measure success when one class is rare? Assume that A+ is the rare class…
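One common way to realize the weighted SVM in practice is scikit-learn's class_weight argument, which scales C per class; the specific weights below are illustrative only (this reuses X, y from the previous sketch):

```python
from sklearn.svm import SVC

# class_weight scales C per class, giving separate effective C+ and C- values.
# Here the rare class (label 1) is penalized 10x more heavily -- an illustrative choice, not a recommendation.
weighted_clf = SVC(kernel="linear", C=1.0, class_weight={0: 1.0, 1: 10.0}).fit(X, y)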
Measures of success
• Precision and Recall are better descriptors than accuracy when one class is rare.
• Precision P = TP / (TP + FP): of the points predicted to be in A+, the fraction that truly are.
• Recall R = TP / (TP + FN): of the points truly in A+, the fraction that are predicted to be.
• (TP, FP, FN denote true positives, false positives, and false negatives with respect to the rare class A+.)
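Continuing the same sketch, both metrics can be computed directly (training-set evaluation here is purely to illustrate the calls):

```python
from sklearn.metrics import precision_score, recall_score

y_pred = weighted_clf.predict(X)
print("precision:", precision_score(y, y_pred, pos_label=1))
print("recall:   ", recall_score(y, y_pred, pos_label=1))
```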
F-measure
• F-measure: the commonly used "average" (harmonic mean) of precision and recall, F = 2PR / (P + R)
• Can C+ and C− in the weighted SVM be balanced to optimize F-measure?
• Can we start over and invent an SVM that optimizes F-measure directly?
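The first question can be probed empirically; a hedged sketch (my own loop, not the paper's procedure) that searches over C+/C− ratios and keeps the one with the best cross-validated F-measure:

```python
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

best_f1, best_ratio = -1.0, None
for ratio in (1, 2, 5, 10, 20, 50):            # candidate C+ / C- ratios to try
    clf = SVC(kernel="linear", C=1.0, class_weight={0: 1.0, 1: float(ratio)})
    y_pred = cross_val_predict(clf, X, y, cv=5)
    f1 = f1_score(y, y_pred, pos_label=1)
    if f1 > best_f1:
        best_f1, best_ratio = f1, ratio
print("best C+/C- ratio:", best_ratio, "F-measure:", round(best_f1, 3))
```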
Constructing an F-measure SVM
• How do we appropriately represent F-measure in an SVM?
• Substituting P = TP/(TP + FP) and R = TP/(TP + FN) into F gives F = 2TP / (2TP + FP + FN)
• Thus, to maximize F-measure, we minimize (FP + FN) / TP
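Spelling out that substitution (standard algebra, in my notation rather than the paper's):

```latex
F \;=\; \frac{2PR}{P+R}
  \;=\; \frac{2\,\frac{TP}{TP+FP}\cdot\frac{TP}{TP+FN}}
             {\frac{TP}{TP+FP}+\frac{TP}{TP+FN}}
  \;=\; \frac{2\,TP}{2\,TP + FP + FN},
\qquad
\frac{1}{F} \;=\; 1 + \frac{FP+FN}{2\,TP}.
```

So maximizing F is equivalent to minimizing (FP + FN) / TP.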
Constructing an F-measure SVM
• We want to minimize (FP + FN) / TP
• FP = # of misclassified A− points; FN = # of misclassified A+ points
• The new F-measure maximizing SVM minimizes this quantity, with FP and FN written as step-function counts of bounding-plane violations (as in the misclassification count SVM); the full formulation is in the paper.
The F-measure Maximizing SVM
• Approximate the step functions with the sigmoid, as before.
• Can we connect this with the standard SVM?
Weighted misclassification count SVM vs. F-measure maximizing SVM
• How do these two formulations relate?
• We show:
  • Pick a parameter C.
  • Find the classifier that optimizes the F-measure SVM.
  • There exist parameters C+ and C− such that the misclassification counting SVM has the same solution.
• The proof and the formulas to obtain C+ and C− are in the paper.
Implications of result
• Since there exist C+ and C− that yield the same solution as the F-measure maximizing SVM, finding the best C+ and C− for the weighted standard SVM is "the right thing to do" (modulo the approximations).
• In practice, a common trick is to choose C+ and C− inversely proportional to the class sizes, so that C+ · |A+| = C− · |A−|. This heuristic seems reasonable but is not optimal. (Good first guess?)
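In scikit-learn terms, this inverse-class-frequency first guess corresponds to class_weight="balanced" (assuming the heuristic as stated above; it is a starting point, not an optimum):

```python
from sklearn.svm import SVC

# 'balanced' sets each class weight to n_samples / (n_classes * n_in_class),
# i.e. C+ and C- inversely proportional to the class sizes -- the first-guess heuristic above.
heuristic_clf = SVC(kernel="linear", C=1.0, class_weight="balanced").fit(X, y)
```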
Implications of result
• Suppose that the SVM fails to provide a good F-measure for a given problem, over a wide range of C+ and C− values.
• Q: Is there another SVM formulation that would yield a better F-measure?
  A: Our evidence suggests not.
• Q: Is there another SVM formulation that would find the best possible F-measure more directly?
  A: Yes, the F-measure maximizing SVM.
Conclusions / Summary
• We provide theoretical evidence that standard heuristic practices in using SVMs for optimizing F-measure are reasonable.
• We provide a framework for continued research in F-measure maximizing SVMs.
• All our results apply directly to SVMs with kernels (see paper).
• Future work: attacking the F-measure maximizing SVM directly to find faster algorithms.