Tighter and Convex Maximum Margin Clustering
Yu-Feng Li (LAMDA, Nanjing University, China) (liyf@lamda.nju.edu.cn)
Ivor W. Tsang (NTU, Singapore) (IvorTsang@ntu.edu.sg)
James T. Kwok (HKUST, Hong Kong) (jamesk@cse.ust.hk)
Zhi-Hua Zhou (LAMDA, Nanjing University, China) (zhouzh@lamda.nju.edu.cn)
Summary • Maximum Margin Clustering (MMC) [Xu et al., NIPS05] • inspired by the success of the large-margin criterion in SVMs • achieves state-of-the-art performance on many clustering problems • The problem with existing methods: • SDP relaxation: global but not scalable • local search: efficient but non-convex • We propose LG-MMC, a convex method that also scales to large datasets via a label-generation strategy
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Maximum Margin Clustering [Xu et al., NIPS05] • Perform clustering (i.e., determine the unknown label vector y) by simultaneously finding the maximum margin hyperplane in the data • Setting: given a set of unlabeled patterns $\{x_i\}_{i=1}^n$ • Goal: learn a decision function $f(x) = w^\top\varphi(x)$ and a label vector $y \in \{\pm 1\}^n$:

$$\min_{y\in\{\pm1\}^n}\ \min_{w,\,\xi}\ \underbrace{\frac{1}{2}\|w\|^2}_{\text{margin}} + \underbrace{C\sum_{i=1}^n \xi_i}_{\text{error}} \quad \text{s.t.}\ \ y_i\,w^\top\varphi(x_i) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i;\qquad \underbrace{-\beta \le \textstyle\sum_{i=1}^n y_i \le \beta}_{\text{balance constraint}}$$

• The balance constraint avoids the trivially "large margin" solution that assigns all patterns to one cluster
Maximum Margin Clustering [Xu et al., NIPS05] • The dual problem:

$$\min_{y\in\mathcal{B}}\ \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot yy^\top\big)\alpha$$

where $\mathcal{A} = \{\alpha \mid \mathbf{0} \le \alpha \le C\mathbf{1}\}$, $\mathcal{B} = \{y \in \{\pm1\}^n \mid -\beta \le \mathbf{1}^\top y \le \beta\}$, $K$ is the kernel matrix, and $\odot$ is the elementwise product • This is a mixed integer program, intractable for large-scale datasets • Key: some kind of relaxation may be helpful
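To see why it is intractable, note that for each fixed labeling y the inner problem is just a standard SVM, but there are exponentially many balanced labelings. Below is a minimal brute-force sketch of this view; the toy data, helper names, and the use of scikit-learn's SVC (which includes a bias term the formulation above omits) are illustrative assumptions, not the paper's method.

```python
# Brute-force MMC on a toy dataset: enumerate every balanced labeling y,
# solve a standard SVM for it, and keep the labeling with the smallest
# SVM objective. O(2^n) -- only feasible for tiny n, which is the point.
from itertools import product
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(4, 2) - 2, rng.randn(4, 2) + 2])  # n = 8 points
n, C, beta = len(X), 1.0, 2

best_obj, best_y = np.inf, None
for y in product([-1, 1], repeat=n):
    y = np.array(y)
    if abs(y.sum()) > beta:            # class-balance constraint
        continue
    svm = SVC(kernel="linear", C=C).fit(X, y)
    w = svm.coef_.ravel()
    xi = np.maximum(0, 1 - y * svm.decision_function(X))  # hinge slacks
    obj = 0.5 * w @ w + C * xi.sum()   # SVM primal objective for this y
    if obj < best_obj:
        best_obj, best_y = obj, y

print("best labeling:", best_y, "objective:", best_obj)
```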
Related work • MMC with SDP relaxation [Xu et al., NIPS05] • convex, state-of-the-art performance • expensive: worst-case O(n^6.5) time complexity • Generalized MMC (GMMC) [Valizadegan & Jin, NIPS07] • a smaller SDP problem that speeds up MMC by 100 times • still expensive: cannot handle even medium-sized datasets • Some efficient algorithms [Zhang et al., ICML07] [Zhao et al., SDM08] • much more scalable than the global methods • non-convex: may get stuck in local minima • Goal: a convex method that is also scalable to large datasets
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Intuition • For any fixed labeling, MMC reduces to a standard SVM, which is efficient to solve; the hard part is the combinatorial search over labelings • [Figure: candidate ±1 labelings of the data, each inducing an SVM; combining candidate labelings through label-kernels yy′ leads to multiple label-kernel learning]
Flow Chart of LG-MMC • LG-MMC: transform the MMC problem into multiple label-kernel learning via a minmax relaxation • Cutting plane algorithm: • multiple label-kernel learning • finding the most violated y • LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS05]
LG-MMC: Minmax relaxation of the MMC problem • Consider interchanging the order of $\min_{y\in\mathcal{B}}$ and $\max_{\alpha\in\mathcal{A}}$, leading to:

$$\max_{\alpha\in\mathcal{A}}\ \min_{y\in\mathcal{B}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot yy^\top\big)\alpha$$

• By the minimax inequality, the optimal objective of LG-MMC is a lower bound of that of the MMC problem
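The bound direction is just the standard minimax (weak-duality) argument. Writing $f(\alpha, y)$ for the objective above, for any fixed $\bar\alpha \in \mathcal{A}$ and $\bar y \in \mathcal{B}$:

$$\min_{y\in\mathcal{B}} f(\bar\alpha, y) \;\le\; f(\bar\alpha, \bar y) \;\le\; \max_{\alpha\in\mathcal{A}} f(\alpha, \bar y) \quad\Longrightarrow\quad \max_{\alpha\in\mathcal{A}}\,\min_{y\in\mathcal{B}} f(\alpha, y) \;\le\; \min_{y\in\mathcal{B}}\,\max_{\alpha\in\mathcal{A}} f(\alpha, y)$$

Taking the max over $\bar\alpha$ on the left and the min over $\bar y$ on the right gives the implication.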
LG-MMC: multiple label-kernel learning • First, LG-MMC can be rewritten as:

$$\max_{\alpha\in\mathcal{A},\,\theta}\ \theta \qquad \text{s.t.}\quad \theta \le \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot y_t y_t^\top\big)\alpha \quad \forall\, y_t \in \mathcal{B}$$

• For the inner optimization subproblem (over θ, with α fixed), let $\mu_t \ge 0$ be the dual variable for each constraint. Its Lagrangian can be obtained as:

$$\mathcal{L}(\theta,\mu) = \theta + \sum_t \mu_t\Big(\mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot y_t y_t^\top\big)\alpha - \theta\Big)$$
LG-MMC: multiple label-kernel learning (cont.) • Setting its derivative w.r.t. θ to zero, we have $\sum_t \mu_t = 1$ • Let $\mathcal{M} = \{\mu \mid \mu_t \ge 0,\ \sum_t \mu_t = 1\}$ be the simplex • Replacing the inner subproblem with its dual, one has:

$$\max_{\alpha\in\mathcal{A}}\ \min_{\mu\in\mathcal{M}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\Big(K \odot \sum_t \mu_t\,y_t y_t^\top\Big)\alpha$$

• Similar to single-label learning (which uses the kernel $K \odot yy^\top$), the above formulation can be regarded as multiple label-kernel learning with base kernels $K \odot y_t y_t^\top$
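A minimal numpy sketch of the combined label-kernel $K \odot \sum_t \mu_t\, y_t y_t^\top$; the function name and toy data are illustrative assumptions.

```python
# Combine base label-kernels K ⊙ (y_t y_t^T) with simplex weights mu.
import numpy as np

def combined_label_kernel(K, labelings, mu):
    """K: (n, n) kernel matrix; labelings: list of (n,) vectors in {-1, +1};
    mu: simplex weights (mu >= 0, sum(mu) == 1)."""
    M = sum(m * np.outer(y, y) for m, y in zip(mu, labelings))
    return K * M                    # elementwise product K ⊙ Σ_t mu_t y_t y_t^T

n = 6
rng = np.random.RandomState(0)
X = rng.randn(n, 2)
K = X @ X.T                         # linear kernel for illustration
ys = [rng.choice([-1, 1], size=n) for _ in range(3)]
mu = np.ones(3) / 3
Kc = combined_label_kernel(K, ys, mu)
print(Kc.shape)                     # (6, 6); feed to any kernel SVM solver
```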
Cutting Plane Algorithm • Problem: exponential number of possible label assignments • the set of base kernels is also exponential in size • direct multiple kernel learning (MKL) is computationally intractable • Observation: only a subset of these constraints is active at optimality • cutting-plane method
Cutting Plane Algorithm
1. Initialize: find the most violated y and set the working set 𝒞 = {y, −y} (𝒞 is the subset of constraints).
2. Run MKL for the subset of kernel matrices selected in 𝒞.
3. Find the most violated y and set 𝒞 = 𝒞 ∪ {y}.
4. Repeat steps 2–3 until convergence.
(How to carry out steps 2 and 3 is detailed on the next slides; a sketch of the whole loop follows below.)
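A compact sketch of the loop under a linear kernel, assuming two hypothetical helpers: `solve_mkl` (Step 2) and `most_violated_y` (Step 3), both sketched after the following slides. The convergence test here is one reasonable choice, not necessarily the paper's exact criterion.

```python
# Cutting-plane loop for LG-MMC (sketch; linear kernel for simplicity).
import numpy as np

def lg_mmc(X, C, beta, tol=1e-3, max_iter=50):
    K = X @ X.T
    n = K.shape[0]
    alpha = np.full(n, min(1.0, C))              # any feasible starting alpha
    y0 = most_violated_y(X, alpha, beta)         # Step 1: seed the working set
    working_set = [y0, -y0]
    for _ in range(max_iter):
        alpha, mu = solve_mkl(K, working_set, C)     # Step 2: MKL on working set
        y = most_violated_y(X, alpha, beta)          # Step 3: new candidate labeling
        new_v = 0.5 * (alpha * y) @ K @ (alpha * y)  # = 0.5 a'(K ⊙ yy')a
        cur_v = max(0.5 * (alpha * yt) @ K @ (alpha * yt) for yt in working_set)
        if new_v <= cur_v + tol:                     # no sufficiently violated y left
            break
        working_set.append(y)
    return alpha, mu, working_set
```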
Cutting Plane Algorithm — Step 2: Multiple Label-Kernel Learning • Suppose the current working set is $\mathcal{C} = \{y_1, \ldots, y_T\}$ • The feature map for the base kernel matrix $K \odot y_t y_t^\top$ is $\varphi_t(x_i) = y_{t,i}\,\varphi(x_i)$ • SimpleMKL: 1. Fix μ and solve the SVM dual on the combined kernel 2. Fix α and use a gradient method to update μ 3. Iterate until convergence
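One possible realization of `solve_mkl` from the loop sketch above, in the SimpleMKL style [Rakotomamonjy et al.]: with μ fixed the problem is a standard SVM on the combined kernel; with α fixed, $\partial J/\partial \mu_t = -\tfrac{1}{2}\alpha^\top(K \odot y_t y_t^\top)\alpha$, and μ is updated by a projected-gradient step onto the simplex. The bias-free dual from the earlier slides is solved with a generic box-constrained solver; the step size and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(Q, C):
    """max_a 1'a - 0.5 a'Qa  s.t. 0 <= a <= C (bias-free SVM dual),
    solved as a box-constrained minimization with L-BFGS-B."""
    n = Q.shape[0]
    fun = lambda a: 0.5 * a @ Q @ a - a.sum()
    jac = lambda a: Q @ a - 1.0
    res = minimize(fun, np.full(n, 0.5 * min(1.0, C)), jac=jac,
                   bounds=[(0.0, C)] * n, method="L-BFGS-B")
    return res.x

def project_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def solve_mkl(K, working_set, C, iters=20, lr=0.1):
    """SimpleMKL-style alternation over base label-kernels K ⊙ y_t y_t'."""
    Ks = [K * np.outer(y, y) for y in working_set]
    mu = np.ones(len(Ks)) / len(Ks)
    for _ in range(iters):
        alpha = svm_dual(sum(m * Kt for m, Kt in zip(mu, Ks)), C)  # fix mu
        grad = np.array([-0.5 * alpha @ Kt @ alpha for Kt in Ks])  # dJ/dmu_t
        mu = project_simplex(mu - lr * grad)                       # fix alpha
    return alpha, mu
```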
Cutting Plane Algorithm — Step 3: Finding the most violated y • Find the most violated y:

$$\arg\max_{y\in\mathcal{B}}\ \frac{1}{2}\Big\|\sum_{i=1}^n \alpha_i y_i\,\varphi(x_i)\Big\|_2^2 \;=\; \arg\max_{y\in\mathcal{B}}\ \frac{1}{2}\,\alpha^\top\big(K \odot yy^\top\big)\alpha$$

• Problem: this is a concave QP (maximizing a convex quadratic over a discrete set), hard in general • Observation: the cutting plane algorithm only requires the addition of a violated constraint at each iteration, not necessarily the most violated one • So: replace the L2-norm above with the infinity-norm
Cutting Plane Algorithm — Step 3: Finding the most violated y • With the infinity-norm, the problem decomposes into one subproblem per feature coordinate, each of the form:

$$\max_{y\in\mathcal{B}}\ \Big|\sum_{i=1}^n c_i\,y_i\Big|$$

• Sort the $c_i$'s: for a fixed number of +1 labels, the optimum assigns +1 to the largest $c_i$'s • The balance constraint restricts the admissible numbers of +1 labels, so only those label counts need to be scanned (see the sketch below)
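A sketch of the sort-based subproblem solver: for a fixed number p of +1 labels, $c^\top y$ is maximized by putting +1 on the p largest $c_i$'s, and we scan p over the balance-feasible range; the absolute value is handled by also solving with −c. The wrapper `most_violated_y` (linear kernel, one subproblem per feature coordinate) matches the hypothetical helper used in the earlier loop sketch; all names are illustrative.

```python
import numpy as np

def max_cy_balanced(c, beta):
    """max c'y over y in {-1,+1}^n with |sum(y)| <= beta, by sorting."""
    n = len(c)
    order = np.argsort(c)[::-1]          # indices of c, largest first
    best_v, best_y = -np.inf, None
    for p in range(n + 1):               # p = number of +1 labels
        if abs(2 * p - n) > beta:        # sum(y) = p - (n - p) = 2p - n
            continue
        y = -np.ones(n)
        y[order[:p]] = 1.0               # +1 on the p largest c_i
        if c @ y > best_v:
            best_v, best_y = c @ y, y
    return best_v, best_y

def best_abs_cy(c, beta):
    """Handle |c'y| by solving for c and for -c and keeping the better."""
    v1, y1 = max_cy_balanced(c, beta)
    v2, y2 = max_cy_balanced(-c, beta)
    return y1 if v1 >= v2 else y2

def most_violated_y(X, alpha, beta):
    """Infinity-norm heuristic (linear kernel): one sort-based subproblem
    per coordinate d with c_i = alpha_i * x_{i,d}; return the labeling
    with the largest violation 0.5 (alpha*y)' K (alpha*y)."""
    K = X @ X.T
    best_v, best_y = -np.inf, None
    for d in range(X.shape[1]):
        y = best_abs_cy(alpha * X[:, d], beta)
        v = 0.5 * (alpha * y) @ K @ (alpha * y)
        if v > best_v:
            best_v, best_y = v, y
    return best_y
```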
LG-MMC achieves a tighter relaxation • Consider the set of all feasible label matrices $\mathcal{M}_0 = \{\,yy^\top \mid y \in \mathcal{B}\,\}$ and two relaxations of it: its convex hull $\mathcal{M}_1 = \mathrm{conv}(\mathcal{M}_0)$, and the SDP feasible set $\mathcal{M}_2 = \{\,M \mid M \succeq 0,\ \mathrm{diag}(M) = \mathbf{1}\,\}$
LG-MMC achieves a tighter relaxation (cont.) • Define $g(M) = \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top\big(K \odot M\big)\alpha$ • One can find that: • maximum margin clustering is the same as $\min_{M\in\mathcal{M}_0} g(M)$ • the LG-MMC problem is the same as $\min_{M\in\mathcal{M}_1} g(M)$ • the SDP-based MMC problem is the same as $\min_{M\in\mathcal{M}_2} g(M)$
LG-MMC achieves a tighter relaxation (cont.) • $\mathcal{M}_1$ is the convex hull of $\mathcal{M}_0$, i.e., the smallest convex set containing $\mathcal{M}_0$ • Hence LG-MMC gives the tightest convex relaxation of this form • It can be shown that $\mathcal{M}_2$ is more relaxed than $\mathcal{M}_1$ (i.e., $\mathcal{M}_1 \subseteq \mathcal{M}_2$) • So SDP-based MMC is a looser relaxation than the proposed formulation
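Putting the pieces together: since minimizing over a larger feasible set can only decrease the optimum,

$$\mathcal{M}_0 \subseteq \mathcal{M}_1 \subseteq \mathcal{M}_2 \quad\Longrightarrow\quad \min_{M\in\mathcal{M}_2} g(M) \;\le\; \min_{M\in\mathcal{M}_1} g(M) \;\le\; \min_{M\in\mathcal{M}_0} g(M)$$

i.e., SDP-MMC ≤ LG-MMC ≤ MMC, so LG-MMC is the tighter lower bound on the MMC optimum.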
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Experiments • Data sets: 17 UCI datasets and the MNIST dataset • Implementation: Matlab 7.6 • Evaluation: misclassification error
Compared Methods • k-means • one of the most mature baseline methods • Normalized Cut [Shi & Malik, PAMI00] • a pioneering spectral clustering method • GMMC [Valizadegan & Jin, NIPS07] • one of the most efficient global methods for MMC • IterSVR [Zhang et al., ICML07] • an efficient algorithm for MMC • CPMMC [Zhao et al., SDM08] • another state-of-the-art efficient method for MMC
Win/tie/loss comparison • Global methods vs. local methods: the global methods perform better than the local methods • LG-MMC vs. GMMC: LG-MMC is competitive with GMMC
Speed • LG-MMC is about 10 times faster than GMMC • However, in general, the local methods are still faster than the global methods
Outline • Introduction • The Proposed LG-MMC Method • Experimental Results • Conclusion
Conclusion • Main contribution • We propose a scalable and global optimization method for maximum margin clustering • To the best of our knowledge, this is the first use of a label-generation strategy for clustering; the strategy might also be useful in other domains • Future work • We will extend the proposed approach to semi-supervised learning Thank you