Tighter and Convex Maximum Margin Clustering
Yu-Feng Li (LAMDA, Nanjing University, China) (liyf@lamda.nju.edu.cn)
Ivor W. Tsang (NTU, Singapore) (IvorTsang@ntu.edu.sg)
James T. Kwok (HKUST, Hong Kong) (jamesk@cse.ust.hk)
Zhi-Hua Zhou (LAMDA, Nanjing University, China) (zhouzh@lamda.nju.edu.cn)
Summary • Maximum Margin Clustering (MMC) [Xu et al., NIPS05] • inspired by the success of the large-margin criterion in SVMs • achieves state-of-the-art performance on many clustering problems • The problem with existing methods: • SDP relaxation: global but not scalable • local search: efficient but non-convex • We propose LG-MMC, a convex method that also scales to large datasets via a label-generation strategy
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Maximum Margin Clustering [Xu et al., NIPS05] • Perform clustering (i.e., determine the unknown label vector y) by simultaneously finding the maximum margin hyperplane in the data • Setting: given a set of unlabeled patterns $\{x_i\}_{i=1}^n$ • Goal: learn a decision function $f(x) = w^\top\varphi(x)$ and a label vector $y \in \{\pm 1\}^n$:

$$\min_{y\in\{\pm1\}^n}\ \min_{w,\,\xi}\ \underbrace{\frac{1}{2}\|w\|^2}_{\text{margin}} + \underbrace{C\sum_{i=1}^n \xi_i}_{\text{error}} \quad \text{s.t.}\ \ y_i\,w^\top\varphi(x_i) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i;\qquad \underbrace{-\beta \le \textstyle\sum_{i=1}^n y_i \le \beta}_{\text{balance constraint}}$$

• The balance constraint avoids the trivially "large margin" solution that assigns all patterns to one cluster
Maximum Margin Clustering [Xu et al., NIPS05] • The dual problem:

$$\min_{y\in\mathcal{B}}\ \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot yy^\top\big)\alpha$$

where $\mathcal{A} = \{\alpha \mid \mathbf{0} \le \alpha \le C\mathbf{1}\}$, $\mathcal{B} = \{y \in \{\pm1\}^n \mid -\beta \le \mathbf{1}^\top y \le \beta\}$, $K$ is the kernel matrix, and $\odot$ is the elementwise product • This is a mixed integer program, intractable for large-scale datasets • Key: some kind of relaxation may be helpful
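To see why it is intractable, note that for each fixed labeling y the inner problem is just a standard SVM, but there are exponentially many balanced labelings. Below is a minimal brute-force sketch of this view; the toy data, helper names, and the use of scikit-learn's SVC (which includes a bias term the formulation above omits) are illustrative assumptions, not the paper's method.

```python
# Brute-force MMC on a toy dataset: enumerate every balanced labeling y,
# solve a standard SVM for it, and keep the labeling with the smallest
# SVM objective. O(2^n) -- only feasible for tiny n, which is the point.
from itertools import product
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(4, 2) - 2, rng.randn(4, 2) + 2])  # n = 8 points
n, C, beta = len(X), 1.0, 2

best_obj, best_y = np.inf, None
for y in product([-1, 1], repeat=n):
    y = np.array(y)
    if abs(y.sum()) > beta:            # class-balance constraint
        continue
    svm = SVC(kernel="linear", C=C).fit(X, y)
    w = svm.coef_.ravel()
    xi = np.maximum(0, 1 - y * svm.decision_function(X))  # hinge slacks
    obj = 0.5 * w @ w + C * xi.sum()   # SVM primal objective for this y
    if obj < best_obj:
        best_obj, best_y = obj, y

print("best labeling:", best_y, "objective:", best_obj)
```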
Related work • MMC with SDP relaxation [Xu et al., NIPS05] • convex, state-of-the-art performance • expensive: worst-case O(n^6.5) time complexity • Generalized MMC (GMMC) [Valizadegan & Jin, NIPS07] • a smaller SDP problem that speeds up MMC by 100 times • still expensive: cannot handle even medium-sized datasets • Some efficient algorithms [Zhang et al., ICML07] [Zhao et al., SDM08] • much more scalable than the global methods • non-convex: may get stuck in local minima • Goal: a convex method that is also scalable to large datasets
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Intuition • For any fixed labeling, MMC reduces to a standard SVM, which is efficient to solve; the hard part is the combinatorial search over labelings • [Figure: candidate ±1 labelings of the data, each inducing an SVM; combining candidate labelings through label-kernels yy′ leads to multiple label-kernel learning]
Flow Chart of LG-MMC • LG-MMC: transform the MMC problem into multiple label-kernel learning via a minmax relaxation • Cutting plane algorithm: • multiple label-kernel learning • finding the most violated y • LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS05]
LG-MMC: Minmax relaxation of the MMC problem • Consider interchanging the order of $\min_{y\in\mathcal{B}}$ and $\max_{\alpha\in\mathcal{A}}$, leading to:

$$\max_{\alpha\in\mathcal{A}}\ \min_{y\in\mathcal{B}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot yy^\top\big)\alpha$$

• By the minimax inequality, the optimal objective of LG-MMC is a lower bound of that of the MMC problem
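The bound direction is just the standard minimax (weak-duality) argument. Writing $f(\alpha, y)$ for the objective above, for any fixed $\bar\alpha \in \mathcal{A}$ and $\bar y \in \mathcal{B}$:

$$\min_{y\in\mathcal{B}} f(\bar\alpha, y) \;\le\; f(\bar\alpha, \bar y) \;\le\; \max_{\alpha\in\mathcal{A}} f(\alpha, \bar y) \quad\Longrightarrow\quad \max_{\alpha\in\mathcal{A}}\,\min_{y\in\mathcal{B}} f(\alpha, y) \;\le\; \min_{y\in\mathcal{B}}\,\max_{\alpha\in\mathcal{A}} f(\alpha, y)$$

Taking the max over $\bar\alpha$ on the left and the min over $\bar y$ on the right gives the implication.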
LG-MMC: multiple label-kernel learning • First, LG-MMC can be rewritten as:

$$\max_{\alpha\in\mathcal{A},\,\theta}\ \theta \qquad \text{s.t.}\quad \theta \le \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot y_t y_t^\top\big)\alpha \quad \forall\, y_t \in \mathcal{B}$$

• For the inner optimization subproblem (over θ, with α fixed), let $\mu_t \ge 0$ be the dual variable for each constraint. Its Lagrangian can be obtained as:

$$\mathcal{L}(\theta,\mu) = \theta + \sum_t \mu_t\Big(\mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot y_t y_t^\top\big)\alpha - \theta\Big)$$
LG-MMC: multiple label-kernel learning (cont.) • Setting its derivative w.r.t. θ to zero, we have $\sum_t \mu_t = 1$ • Let $\mathcal{M} = \{\mu \mid \mu_t \ge 0,\ \sum_t \mu_t = 1\}$ be the simplex • Replacing the inner subproblem with its dual, one has:

$$\max_{\alpha\in\mathcal{A}}\ \min_{\mu\in\mathcal{M}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\Big(K \odot \sum_t \mu_t\,y_t y_t^\top\Big)\alpha$$

• Similar to single-label learning (which uses the kernel $K \odot yy^\top$), the above formulation can be regarded as multiple label-kernel learning with base kernels $K \odot y_t y_t^\top$
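A minimal numpy sketch of the combined label-kernel $K \odot \sum_t \mu_t\, y_t y_t^\top$; the function name and toy data are illustrative assumptions.

```python
# Combine base label-kernels K ⊙ (y_t y_t^T) with simplex weights mu.
import numpy as np

def combined_label_kernel(K, labelings, mu):
    """K: (n, n) kernel matrix; labelings: list of (n,) vectors in {-1, +1};
    mu: simplex weights (mu >= 0, sum(mu) == 1)."""
    M = sum(m * np.outer(y, y) for m, y in zip(mu, labelings))
    return K * M                    # elementwise product K ⊙ Σ_t mu_t y_t y_t^T

n = 6
rng = np.random.RandomState(0)
X = rng.randn(n, 2)
K = X @ X.T                         # linear kernel for illustration
ys = [rng.choice([-1, 1], size=n) for _ in range(3)]
mu = np.ones(3) / 3
Kc = combined_label_kernel(K, ys, mu)
print(Kc.shape)                     # (6, 6); feed to any kernel SVM solver
```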
Cutting Plane Algorithm • Problem: exponential number of possible label assignments • the set of base kernels is also exponential in size • direct multiple kernel learning (MKL) is computationally intractable • Observation: only a subset of these constraints is active at optimality • cutting-plane method
Cutting Plane Algorithm
1. Initialize: find the most violated y and set the working set 𝒞 = {y, −y} (𝒞 is the subset of constraints).
2. Run MKL for the subset of kernel matrices selected in 𝒞.
3. Find the most violated y and set 𝒞 = 𝒞 ∪ {y}.
4. Repeat steps 2–3 until convergence.
(How to carry out steps 2 and 3 is detailed on the next slides; a sketch of the whole loop follows below.)
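A compact sketch of the loop under a linear kernel, assuming two hypothetical helpers: `solve_mkl` (Step 2) and `most_violated_y` (Step 3), both sketched after the following slides. The convergence test here is one reasonable choice, not necessarily the paper's exact criterion.

```python
# Cutting-plane loop for LG-MMC (sketch; linear kernel for simplicity).
import numpy as np

def lg_mmc(X, C, beta, tol=1e-3, max_iter=50):
    K = X @ X.T
    n = K.shape[0]
    alpha = np.full(n, min(1.0, C))              # any feasible starting alpha
    y0 = most_violated_y(X, alpha, beta)         # Step 1: seed the working set
    working_set = [y0, -y0]
    for _ in range(max_iter):
        alpha, mu = solve_mkl(K, working_set, C)     # Step 2: MKL on working set
        y = most_violated_y(X, alpha, beta)          # Step 3: new candidate labeling
        new_v = 0.5 * (alpha * y) @ K @ (alpha * y)  # = 0.5 a'(K ⊙ yy')a
        cur_v = max(0.5 * (alpha * yt) @ K @ (alpha * yt) for yt in working_set)
        if new_v <= cur_v + tol:                     # no sufficiently violated y left
            break
        working_set.append(y)
    return alpha, mu, working_set
```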
Cutting Plane Algorithm — Step 2: Multiple Label-Kernel Learning • Suppose the current working set is $\mathcal{C} = \{y_1, \ldots, y_T\}$ • The feature map for the base kernel matrix $K \odot y_t y_t^\top$ is $\varphi_t(x_i) = y_{t,i}\,\varphi(x_i)$ • SimpleMKL: 1. Fix μ and solve the SVM dual on the combined kernel 2. Fix α and use a gradient method to update μ 3. Iterate until convergence
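One possible realization of `solve_mkl` from the loop sketch above, in the SimpleMKL style [Rakotomamonjy et al.]: with μ fixed the problem is a standard SVM on the combined kernel; with α fixed, $\partial J/\partial \mu_t = -\tfrac{1}{2}\alpha^\top(K \odot y_t y_t^\top)\alpha$, and μ is updated by a projected-gradient step onto the simplex. The bias-free dual from the earlier slides is solved with a generic box-constrained solver; the step size and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(Q, C):
    """max_a 1'a - 0.5 a'Qa  s.t. 0 <= a <= C (bias-free SVM dual),
    solved as a box-constrained minimization with L-BFGS-B."""
    n = Q.shape[0]
    fun = lambda a: 0.5 * a @ Q @ a - a.sum()
    jac = lambda a: Q @ a - 1.0
    res = minimize(fun, np.full(n, 0.5 * min(1.0, C)), jac=jac,
                   bounds=[(0.0, C)] * n, method="L-BFGS-B")
    return res.x

def project_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def solve_mkl(K, working_set, C, iters=20, lr=0.1):
    """SimpleMKL-style alternation over base label-kernels K ⊙ y_t y_t'."""
    Ks = [K * np.outer(y, y) for y in working_set]
    mu = np.ones(len(Ks)) / len(Ks)
    for _ in range(iters):
        alpha = svm_dual(sum(m * Kt for m, Kt in zip(mu, Ks)), C)  # fix mu
        grad = np.array([-0.5 * alpha @ Kt @ alpha for Kt in Ks])  # dJ/dmu_t
        mu = project_simplex(mu - lr * grad)                       # fix alpha
    return alpha, mu
```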
Cutting Plane Algorithm — Step 3: Finding the most violated y • Find the most violated y:

$$\arg\max_{y\in\mathcal{B}}\ \frac{1}{2}\Big\|\sum_{i=1}^n \alpha_i y_i\,\varphi(x_i)\Big\|_2^2 \;=\; \arg\max_{y\in\mathcal{B}}\ \frac{1}{2}\,\alpha^\top\big(K \odot yy^\top\big)\alpha$$

• Problem: this is a concave QP (maximizing a convex quadratic over a discrete set), hard in general • Observation: the cutting plane algorithm only requires the addition of a violated constraint at each iteration, not necessarily the most violated one • So: replace the L2-norm above with the infinity-norm
Cutting Plane Algorithm — Step 3: Finding the most violated y • With the infinity-norm, the problem decomposes into one subproblem per feature coordinate, each of the form:

$$\max_{y\in\mathcal{B}}\ \Big|\sum_{i=1}^n c_i\,y_i\Big|$$

• Sort the $c_i$'s: for a fixed number of +1 labels, the optimum assigns +1 to the largest $c_i$'s • The balance constraint restricts the admissible numbers of +1 labels, so only those label counts need to be scanned (see the sketch below)
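A sketch of the sort-based subproblem solver: for a fixed number p of +1 labels, $c^\top y$ is maximized by putting +1 on the p largest $c_i$'s, and we scan p over the balance-feasible range; the absolute value is handled by also solving with −c. The wrapper `most_violated_y` (linear kernel, one subproblem per feature coordinate) matches the hypothetical helper used in the earlier loop sketch; all names are illustrative.

```python
import numpy as np

def max_cy_balanced(c, beta):
    """max c'y over y in {-1,+1}^n with |sum(y)| <= beta, by sorting."""
    n = len(c)
    order = np.argsort(c)[::-1]          # indices of c, largest first
    best_v, best_y = -np.inf, None
    for p in range(n + 1):               # p = number of +1 labels
        if abs(2 * p - n) > beta:        # sum(y) = p - (n - p) = 2p - n
            continue
        y = -np.ones(n)
        y[order[:p]] = 1.0               # +1 on the p largest c_i
        if c @ y > best_v:
            best_v, best_y = c @ y, y
    return best_v, best_y

def best_abs_cy(c, beta):
    """Handle |c'y| by solving for c and for -c and keeping the better."""
    v1, y1 = max_cy_balanced(c, beta)
    v2, y2 = max_cy_balanced(-c, beta)
    return y1 if v1 >= v2 else y2

def most_violated_y(X, alpha, beta):
    """Infinity-norm heuristic (linear kernel): one sort-based subproblem
    per coordinate d with c_i = alpha_i * x_{i,d}; return the labeling
    with the largest violation 0.5 (alpha*y)' K (alpha*y)."""
    K = X @ X.T
    best_v, best_y = -np.inf, None
    for d in range(X.shape[1]):
        y = best_abs_cy(alpha * X[:, d], beta)
        v = 0.5 * (alpha * y) @ K @ (alpha * y)
        if v > best_v:
            best_v, best_y = v, y
    return best_y
```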
LG-MMC achieves a tighter relaxation • Consider the set of all feasible label matrices $\mathcal{M}_0 = \{\,yy^\top \mid y \in \mathcal{B}\,\}$ and two relaxations of it: its convex hull $\mathcal{M}_1 = \mathrm{conv}(\mathcal{M}_0)$, and the SDP feasible set $\mathcal{M}_2 = \{\,M \mid M \succeq 0,\ \mathrm{diag}(M) = \mathbf{1}\,\}$
LG-MMC achieves a tighter relaxation (cont.) • Define $g(M) = \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot M\big)\alpha$ • One can find that: • maximum margin clustering is the same as $\min_{M\in\mathcal{M}_0} g(M)$ • the LG-MMC problem is the same as $\min_{M\in\mathcal{M}_1} g(M)$ • the SDP-based MMC problem is the same as $\min_{M\in\mathcal{M}_2} g(M)$
LG-MMC achieves a tighter relaxation (cont.) • $\mathcal{M}_1$ is the convex hull of $\mathcal{M}_0$, i.e., the smallest convex set containing $\mathcal{M}_0$ • Hence LG-MMC gives the tightest convex relaxation of this form • It can be shown that $\mathcal{M}_2$ is more relaxed than $\mathcal{M}_1$ (i.e., $\mathcal{M}_1 \subseteq \mathcal{M}_2$) • So SDP-based MMC is a looser relaxation than the proposed formulation
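Putting the pieces together: since minimizing over a larger feasible set can only decrease the optimum,

$$\mathcal{M}_0 \subseteq \mathcal{M}_1 \subseteq \mathcal{M}_2 \quad\Longrightarrow\quad \min_{M\in\mathcal{M}_2} g(M) \;\le\; \min_{M\in\mathcal{M}_1} g(M) \;\le\; \min_{M\in\mathcal{M}_0} g(M)$$

i.e., SDP-MMC ≤ LG-MMC ≤ MMC, so LG-MMC is the tighter lower bound on the MMC optimum.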
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Experiments • Data sets: 17 UCI datasets and the MNIST dataset • Implementation: Matlab 7.6 • Evaluation: misclassification error
Compared Methods • k-means • one of the most mature baseline methods • Normalized Cut [Shi & Malik, PAMI00] • a pioneering spectral clustering method • GMMC [Valizadegan & Jin, NIPS07] • one of the most efficient global methods for MMC • IterSVR [Zhang et al., ICML07] • an efficient algorithm for MMC • CPMMC [Zhao et al., SDM08] • another state-of-the-art efficient method for MMC
Win/tie/loss comparison • Global methods vs. local methods: the global methods perform better than the local methods • LG-MMC vs. GMMC: LG-MMC is competitive with GMMC
Speed • LG-MMC is about 10 times faster than GMMC • However, in general, the local methods are still faster than the global methods
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Conclusion • Main contribution • We propose a scalable and global optimization method for maximum margin clustering • To the best of our knowledge, this is the first use of a label-generation strategy for clustering; the strategy might also be useful in other domains • Future work • We will extend the proposed approach to semi-supervised learning Thank you