Concave Minimization for Support Vector Machine Classifiers
Unlabeled Data Classification & Data Selection
Glenn Fung, O. L. Mangasarian
Part 1: Unlabeled Data Classification
• Given a large unlabeled dataset.
• Use a k-Median clustering algorithm to select a small (5% to 10%) representative sample.
• The representative sample is labeled by an expert or oracle.
• The combined labeled-unlabeled dataset is classified by a Semi-supervised Support Vector Machine (S3VM).
• Test set correctness is within 5.2% of that of a linear support vector machine trained on the entire dataset labeled by an expert.
Part 2: Data Selection for Support Vector Machine Classifiers
• Extract a minimal set of data points from a given dataset.
• The minimal set is used to generate a Minimal Support Vector Machine (MSVM) classifier.
• The MSVM classifier is as good as or better than one obtained by training on the entire dataset.
• Feature selection is incorporated into the procedure to obtain a minimal set of input features.
• Data reduction was as high as 81% and averaged 66% over seven public datasets.
Unlabeled Data Classification
• Given a large, completely unlabeled dataset.
• It is costly to have points labeled by an expert or an oracle.
• Two questions arise:
  • How to choose a small subset for labeling?
  • How to combine labeled and unlabeled data?
• Answers:
  • Use k-median clustering to select "representative" points to be labeled.
  • Use a semi-supervised SVM to obtain a classifier based on both labeled and unlabeled data.
Unlabeled Data Classification
Unlabeled Data Set → k-Median clustering → Chosen Data → Expert → Labeled Data
Labeled Data + Remaining (unlabeled) Data → Semi-supervised SVM → Separating Plane
k-Median Clustering Algorithm
• Given m data points, find k clusters of these points such that the sum of the 1-norm distances from each point to its closest cluster center is minimized.
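The alternating assign/re-center scheme below is a minimal sketch of this step, assuming a NumPy array X of shape (m, n); the function name k_median and its defaults are illustrative, not the authors' code. The coordinatewise median is the 1-norm analogue of the mean used by k-means.

```python
import numpy as np

def k_median(X, k, n_iter=100, seed=0):
    """Sketch of k-median clustering minimizing summed 1-norm distances."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its closest center in the 1-norm.
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Re-center: the coordinatewise median minimizes 1-norm distance.
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

The points handed to the expert for labeling can then be taken as, for example, the data point nearest each cluster center.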
[Figure: iterations of the k-Median clustering algorithm; * marks the cluster centers]
Semi-supervised SVM (S3VM)
• Given a dataset consisting of:
  • labeled (+1, −1) points, represented by the rows of a matrix $A \in \mathbb{R}^{m \times n}$, with labels given by a diagonal matrix $D$ ($D_{ii} = \pm 1$)
  • unlabeled points, represented by the rows of a matrix $B \in \mathbb{R}^{p \times n}$
• Classify the data into two classes as follows:
  • Assign each unlabeled point in $B$ to a class (+1 or −1) so as to maximize the distance between the bounding planes obtained by a linear 1-norm SVM applied to the entire dataset.
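Up to notation, the resulting optimization problem can be sketched as follows; the weights $\nu, \mu$, the slack vectors $y, r, s$, and $e$ (a vector of ones) are assumptions reconstructed from the standard semi-supervised SVM formulation, not copied from the slides:

$$
\begin{aligned}
\min_{w,\gamma,y,r,s}\quad & \nu\, e^{\top} y \;+\; \mu \sum_{j=1}^{p} \min\{r_j,\, s_j\} \;+\; \|w\|_1\\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \quad y \ge 0\\
& \;\;\;(Bw - e\gamma) + r \ge e, \quad r \ge 0\\
& -(Bw - e\gamma) + s \ge e, \quad s \ge 0.
\end{aligned}
$$

For each unlabeled point, $r_j$ is its error if assigned to class +1 and $s_j$ its error if assigned to class −1, so the objective only charges the cheaper of the two assignments.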
S3VM: A Concave Approach
• The term $\sum_{j} \min\{r_j, s_j\}$ in the objective function is concave, because the pointwise minimum of two linear functions is concave.
• A local solution to this problem is obtained by solving a succession of linear programs (typically 4 to 7).
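The sketch below illustrates (but does not reproduce) this successive-LP idea: fixing the concave $\min\{r_j, s_j\}$ term amounts to assigning each unlabeled point its currently cheaper label, after which the problem is an ordinary linear 1-norm SVM solvable as an LP. The helper names (svm_1norm, s3vm_slp), the single weight nu, and the use of scipy.optimize.linprog are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def svm_1norm(A, d, nu=1.0):
    """Linear 1-norm SVM: min nu*sum(y) + ||w||_1
    s.t. d_i (A_i w - gamma) + y_i >= 1, y >= 0."""
    m, n = A.shape
    # Variables: [w_plus (n), w_minus (n), gamma (1), y (m)]; w = w_plus - w_minus.
    c = np.concatenate([np.ones(2 * n), [0.0], nu * np.ones(m)])
    D = d[:, None]
    # Rewrite the constraint as -d_i*A_i*w + d_i*gamma - y_i <= -1.
    A_ub = np.hstack([-D * A, D * A, D, -np.eye(m)])
    b_ub = -np.ones(m)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    x = res.x
    return x[:n] - x[n:2 * n], x[2 * n]

def s3vm_slp(A, d, B, nu=1.0, n_iter=7):
    """Successive LPs: re-label unlabeled points B by the current plane, re-solve."""
    w, gamma = svm_1norm(A, d, nu)
    for _ in range(n_iter):
        # min{r_j, s_j} is attained by the class on whose side the point falls.
        d_B = np.where(B @ w - gamma >= 0, 1.0, -1.0)
        w, gamma = svm_1norm(np.vstack([A, B]), np.concatenate([d, d_B]), nu)
    return w, gamma
```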
S3VM: Graphical Example — Separating Triangles & Circles
Hollow shapes represent labeled data; solid shapes represent unlabeled data.
[Figure: separating planes obtained by SVM vs. S3VM]
Part 2: Data Selection for Support Vector Machine Classifiers
Labeled dataset → 1-norm SVM feature selection → Smaller-dimension dataset → Support vector suppression (MSVM) → Separating surface
Feature Selection using a 1-norm Linear SVM

$$
\min_{w,\gamma,y}\; \nu\, e^{\top} y + \|w\|_1 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\; y \ge 0 \qquad (\nu \text{ small}).
$$

• Minimizing the 1-norm of $w$ drives many of its components to zero.
• Input features whose weights are zero are discarded, yielding a smaller-dimension dataset.
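As a usage sketch (reusing the hypothetical svm_1norm helper from the S3VM example above, with a NumPy data matrix A and label vector d assumed), feature selection then amounts to keeping only the columns with numerically nonzero weights:

```python
import numpy as np

# A small nu puts more weight on ||w||_1 and suppresses more features.
w, gamma = svm_1norm(A, d, nu=0.1)
keep = np.flatnonzero(np.abs(w) > 1e-6)  # indices of surviving features
A_reduced = A[:, keep]
```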
Motivation for the Minimal Support Vector Machine (MSVM)
• Suppression of the error term y:
  • Minimizes the number of misclassified points rather than their total error.
  • Works remarkably well computationally.
  • Reduces the number of positive components of the dual multiplier u, and hence the number of support vectors.
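Up to notation, the first bullet can be written by replacing the usual error term $e^{\top}y$ with $e^{\top}y_{*}$, where $(\cdot)_{*}$ is the step function counting nonzero components; the concave-exponential smoothing shown here is the device used in related Mangasarian work and is an assumption, not quoted from the slides:

$$
\min_{w,\gamma,y}\; \nu\, e^{\top} y_{*} + \|w\|_1 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\; y \ge 0,
\qquad (t)_{*} = \begin{cases} 1 & t > 0 \\ 0 & t \le 0. \end{cases}
$$

Approximating $t_{*} \approx 1 - \varepsilon^{-\alpha t}$ for some $\alpha > 0$ (with $\varepsilon$ the base of natural logarithms, since $e$ denotes the ones vector) makes the objective concave, so it can again be minimized by a finite succession of linear programs.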
Conclusions
• Unlabeled data classification:
  • A fast, finite, linear-programming-based approach for Semi-supervised Support Vector Machines was proposed for classifying large datasets that are mostly unlabeled.
  • Totally unlabeled datasets were classified by:
    • Having an expert label a small percentage of points chosen by clustering.
    • Classifying the combined data with a semi-supervised SVM.
  • Test set correctness was within 5.2% of a linear SVM trained on the entire dataset labeled by an expert.
Conclusions
• Data selection for SVM classifiers:
  • The Minimal SVM (MSVM) extracts a minimal subset of data points that suffices to classify the entire dataset.
  • MSVM maintains or improves generalization relative to classifiers trained on the entire dataset.
  • Data reduction was as high as 81% and averaged 66% over seven public datasets.
• Future work:
  • MSVM is a promising tool for incremental algorithms.
  • Improve chunking algorithms with MSVM.
  • Nonlinear MSVM has strong potential for time and storage reduction.