This research paper focuses on the problem of designing efficient semi-supervised learning algorithms. The authors propose a method based on estimating label means, which allows for reducing the number of constraints in the optimization problem while achieving competitive performance with state-of-the-art methods. The proposed algorithms are based on convex relaxation and alternating optimization. Experimental results show the effectiveness of the approach.
Semi-Supervised Learning Using Label Mean Yu-Feng Li (1), James T. Kwok (2), Zhi-Hua Zhou (1) (1) LAMDA Group, Nanjing University, China {liyf, zhouzh}@lamda.nju.edu.cn (2) Dept. of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong jamesk@cse.ust.hk
The Problem Many SVM algorithms for supervised learning are efficient. Existing S3VMs (Semi-Supervised SVMs) are not so efficient. What’s the major obstacle to designing efficient S3VMs? How to design an efficient S3VM?
Outline • Introduction • Our Methods • Experiments • Conclusion
Introduction Semi-Supervised Learning (SSL) Optimal Hyperplane The goal of SSL is to improve the performance of supervised learning by utilizing unlabeled data
Introduction SSL Applications • Text categorization [Joachims, ICML’99] • Hand-written digit classification [Zhu et al., ICML’03; Zhu et al., ICML’05] • Medical image segmentation [Grady & Funka-Lea, ECCV’04] • Image retrieval [He et al., ACM Multimedia’04] • Word sense disambiguation [Niu et al., ACL’04; Yarowsky et al., ACL’95; CUONG, Thesis07] • Object detection [Rosenberg et al., WACV’05] • … …
Introduction Many SSL Algorithms • Generative methods [Miller & Uyar, NIPS’96; Nigam et al., MLJ’00; Fujino et al., AAAI’05; etc.] • Disagreement-based methods [Blum & Mitchell, COLT’98; Mitchell, ICCS’99; Nigam & Ghani, CIKM’00; Zhou & Li, TKDE’05] • Graph-based methods [Zhou et al., NIPS’02; Zhu et al., ICML’03; Belkin et al., JMLR’06] • … … • Recent surveys of SSL literature: • Chapelle et al., eds., Semi-Supervised Learning, MIT Press, 2006 • Zhu, Semi-Supervised Learning Literature Survey, 2007 • Zhou & Li, Semi-supervised learning by disagreement, KAIS, 2009
Introduction S3VMs • Semi-supervised Support Vector Machine [Bennett & Demiriz, NIPS’99] • Transductive SVM [Joachims, ICML’99] • Laplacian SVM [Belkin et al., JMLR’06] • SDP relaxations [De Bie & Cristianini, NIPS’04; De Bie & Cristianini, JMLR’06] • Many optimization algorithms for S3VM [Chapelle et al., JMLR’08] • … …
Introduction S3VMs Optimal Hyperplane Low-Density Assumption & Cluster Assumption [Chapelle et al., ICML’05]
Introduction S3VMs formulations Margin; loss on labeled data, e.g., hinge loss; loss on unlabeled data, e.g., symmetric hinge loss; balance constraint. The effect of each part of the S3VM objective has been well studied in [Chapelle et al., JMLR’08].
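The four annotated parts can be put together in one objective. The following is a sketch assuming the standard S3VM form from the literature, not the slide's exact notation: the trade-off weights $C_1$, $C_2$, the counts $l$ (labeled) and $u$ (unlabeled), the feature map $\phi$, and the target positive fraction $r$ are all assumed symbols.

```latex
\min_{w,\,b}\;
\underbrace{\tfrac{1}{2}\lVert w\rVert^{2}}_{\text{margin}}
\;+\; C_1 \sum_{i=1}^{l}
\underbrace{\max\bigl(0,\; 1 - y_i f(x_i)\bigr)}_{\text{hinge loss on labeled data}}
\;+\; C_2 \sum_{j=l+1}^{l+u}
\underbrace{\max\bigl(0,\; 1 - \lvert f(x_j)\rvert\bigr)}_{\text{symmetric hinge loss on unlabeled data}}
\qquad \text{s.t.} \qquad
\underbrace{\frac{1}{u}\sum_{j=l+1}^{l+u} \mathbb{1}\bigl[f(x_j) > 0\bigr] = r}_{\text{balance constraint}},
\qquad f(x) = w^{\top}\phi(x) + b .
```

The symmetric hinge term penalizes unlabeled points that fall inside the margin on either side, which is what pushes the hyperplane toward low-density regions.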
Introduction Efficiency of existing S3VMs • [Bennett & Demiriz, NIPS’99] formulated S3VM as a mixed-integer programming problem, so it is computationally intractable in general • Transductive SVM [Joachims, ICML’99] iteratively solves standard supervised SVM problems; however, the number of iterations may be quite large in practice • Laplacian SVM [Belkin et al., JMLR’06] solves a small SVM with labeled data only, but it needs to compute the inverse of an n×n matrix (O(n^3) time and O(n^2) memory) Existing S3VMs are inefficient
Introduction Analysis • Our main observation: • Most S3VM algorithms aim at estimating the correct label of each unlabeled instance • The number of constraints in the optimization problem is therefore as large as the number of unlabeled samples Can we use simpler statistics instead of the labels to reduce the number of constraints while still achieving performance competitive with state-of-the-art SSL methods? Our answer: the label means.
Our Methods Usefulness of the Label Mean We consider the following optimization problem, in which the class means of the unlabeled instances are given as estimates of the label means.
Our Methods Usefulness of the Label Mean (cont.) MeanS3VM A difference exists only when the samples are non-separable. This analysis suggests that, if an S3VM “knows” the label means of the unlabeled instances, it can closely approximate an SVM that “knows” all the labels of the unlabeled instances! This motivates us to first estimate the label means of the unlabeled instances.
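To make the statistic concrete, here is a minimal sketch of what the label means of the unlabeled instances are when the labels are known: two vectors, one per class, however many unlabeled instances there are. The function name `label_means` and the toy data are illustrative assumptions, not taken from the slides.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def label_means(X: Sequence[Sequence[float]],
                y: Sequence[int]) -> Tuple[Vector, Vector]:
    """Return (mean of instances labeled +1, mean of instances labeled -1)."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    def mean(rows: Sequence[Sequence[float]]) -> Vector:
        # Average each feature (column) over the given rows.
        return [sum(col) / len(rows) for col in zip(*rows)]

    return mean(pos), mean(neg)


# Toy data: four "unlabeled" points whose hidden labels we pretend to know.
X_unlabeled = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]]
hidden_labels = [-1, -1, 1, 1]
m_pos, m_neg = label_means(X_unlabeled, hidden_labels)
print(m_pos, m_neg)
```

An S3VM normally has to reason about all four hidden labels here, whereas the label-mean view only needs `m_pos` and `m_neg`.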
Our Methods Estimate the label mean Maximal margin approach We propose two algorithms to solve it: one is based on convex relaxation, the other on alternating optimization. • Note that it has far fewer constraints than S3VM, which greatly reduces the time complexity of the optimization. • It can also be explained in terms of maximum mean discrepancy (MMD) [Gretton et al., NIPS’06], which aims to separate the distributions of the different classes with a large margin.
Our Methods Convex relaxation approach Consider the dual. Consider the minimax relaxation [Li et al., AISTATS’09]: a multiple kernel learning (MKL) problem.
Our Methods Convex relaxation approach (cont.) There is an exponential number of base kernels, which is too expensive to handle directly. Remedy: a cutting plane algorithm with an adaptive SimpleMKL. How?
Our Methods Find the most violated d To find the most violated d, we need to solve the following maximization problem. Rewritten, it is a concave QP (with one term not related to d) and cannot be solved efficiently. However, the cutting plane method only requires adding a violated constraint at each iteration. Hence, we propose a simple and efficient method for finding a good approximation of the most violated d: a linear problem, which can be solved by sorting.
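Why a linear problem over d can be solved by sorting is easy to see in code. The sketch below assumes d is a 0/1 indicator vector over the unlabeled instances with a fixed number of ones (`n_pos`), and that `scores` holds the linear coefficients; both the constraint form and the names are assumptions for illustration, since the slide's exact problem is not reproduced in the text.

```python
from typing import List, Sequence


def assign_by_sorting(scores: Sequence[float], n_pos: int) -> List[int]:
    """Maximize sum(d[j] * scores[j]) over d in {0,1}^u with sum(d) == n_pos.

    The objective is linear in d, so the optimum simply switches on the
    n_pos entries with the largest scores: O(u log u) instead of
    enumerating all assignments.
    """
    if not 0 <= n_pos <= len(scores):
        raise ValueError("n_pos must lie between 0 and len(scores)")
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    d = [0] * len(scores)
    for j in order[:n_pos]:
        d[j] = 1
    return d


scores = [0.3, -1.2, 2.5, 0.9, -0.4]
d = assign_by_sorting(scores, n_pos=2)
print(d)
```

Any other feasible d swaps a selected entry for an unselected one with a score that is no larger, so it cannot increase the objective.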
Our Methods Alternating Optimization Iterate until convergence: with d fixed, solve for the dual variables (a standard SVM); with the dual variables fixed, solve for d (this can still be done by sorting).
Our Methods Comparison and meanS3vm implementation The convex relaxation approach performs global optimization. The alternating optimization approach may get stuck in a local solution, but it is simple and empirically faster. We use the d obtained from each of these two approaches, together with the labels of the labeled data, to train a final SVM. We denote the convex relaxation approach as meanS3vm-mkl and the alternating optimization approach as meanS3vm-iter.
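The shape of the alternating loop can be sketched as follows. This is not the authors' implementation: to keep the example dependency-free, the "solve a standard SVM" step is replaced by a much simpler stand-in (a hyperplane halfway between the two current class means), and d is assumed to be a 0/1 vector with a fixed number of positives `n_pos`. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_mean_classifier(m_pos: Vector, m_neg: Vector) -> Tuple[Vector, float]:
    """Stand-in for the SVM step: hyperplane halfway between the two means."""
    w = [p - q for p, q in zip(m_pos, m_neg)]
    b = -sum(wi * (p + q) / 2.0 for wi, p, q in zip(w, m_pos, m_neg))
    return w, b


def alternate(X_lab, y_lab, X_unl, n_pos: int, max_iter: int = 20):
    pos_lab = [x for x, y in zip(X_lab, y_lab) if y == 1]
    neg_lab = [x for x, y in zip(X_lab, y_lab) if y == -1]
    # Start from a classifier fitted on the labeled data only.
    w, b = fit_mean_classifier(mean(pos_lab), mean(neg_lab))
    d = None
    for _ in range(max_iter):
        # Step 1: classifier fixed, update d by sorting the decision values.
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X_unl]
        order = sorted(range(len(X_unl)), key=lambda j: scores[j], reverse=True)
        new_d = [0] * len(X_unl)
        for j in order[:n_pos]:
            new_d[j] = 1
        if new_d == d:  # assignment unchanged: converged
            break
        d = new_d
        # Step 2: d fixed, refit the classifier (the SVM solve in the slides).
        pos = pos_lab + [x for x, dj in zip(X_unl, d) if dj == 1]
        neg = neg_lab + [x for x, dj in zip(X_unl, d) if dj == 0]
        w, b = fit_mean_classifier(mean(pos), mean(neg))
    return d, w, b


X_lab, y_lab = [[0.0, 0.0], [4.0, 4.0]], [-1, 1]
X_unl = [[0.5, 0.2], [3.5, 4.2], [1.0, 0.5], [4.5, 3.8]]
d, w, b = alternate(X_lab, y_lab, X_unl, n_pos=2)
print(d)
```

As with any alternating scheme, the result depends on the starting point, which matches the caveat that this approach may stop at a local solution.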
Experiments Four Kinds of Tasks • Benchmark tasks • UCI data sets • Text categorization • Speed
Experiments Benchmark Tasks The setup follows that of S3VM. meanS3vms achieve highly competitive performance.
Experiments UCI datasets 9 data sets, 10 labeled examples, 50% train / 50% test, 20 runs. Wins per method, in table column order: 0, 1, 0, 0, 3, 2, 4. meanS3vms achieve highly competitive performance on all data sets. In particular, they achieve the best performance in 6 of 9 tasks.
Experiments Text Categorization 10 binary tasks, 2 labeled examples, 50% train / 50% test, 20 runs. Wins per method, in table column order: 0, 2, 0, 0, 4, 4. meanS3vms achieve highly competitive performance on all data sets. They achieve the best performance in 8 of 10 tasks.
Experiments Speed On large data sets (with more than 1,000 instances), meanS3vm-mkl is much faster than Laplacian SVM. meanS3vm-iter is almost the fastest method. On large data sets, meanS3vm-iter is 10 times faster than Laplacian SVM and 100 times faster than TSVM.
Conclusion Main contribution: • S3VM + label means ~ SVM with full labels • Two efficient and effective SSL methods Future work: • Theoretical study on the effect of label means • Other approaches to estimating label means Thanks!