An Extended Level Method for Multiple Kernel Learning
Zenglin Xu 1, Rong Jin 2, Irwin King 1, Michael R. Lyu 1

1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. {zlxu, king, lyu}@cse.cuhk.edu.hk
2 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824. rongjin@cse.msu.edu

1. Multiple Kernel Learning (MKL)
Given a list of base kernel functions/matrices Ki, i = 1, ..., m, MKL searches for the linear combination of the base kernels that maximizes a generalized performance measure (a concrete min-max formulation is sketched at the end of this summary).

2. Optimization Methods
Small or medium scale:
• Semi-Definite Programming (SDP) [Lanckriet et al., 2004]
• Second-Order Cone Programming (SOCP) [Bach et al., 2004]
Large scale:
• Semi-Infinite Linear Programming (SILP) [Sonnenburg et al., 2006]
• Subgradient Descent (SD) [Rakotomamonjy et al., 2008]

3. Level Method for MKL
3.1 Motivation
Question: how can the MKL optimization problem be solved efficiently? The existing large-scale solvers each have a weakness: SILP's intermediate solutions oscillate, and SD spends most of its running time on SVM calls for the line search. The level method addresses both issues by combining a cutting-plane model, which exploits the gradients of all past solutions, with a projection onto level sets, which stabilizes the sequence of solutions.

3.2 Algorithm
At each iteration: (i) build the cutting-plane model from the objective values and gradients of all past solutions; (ii) minimize the model over the simplex to obtain a lower bound, and take the best objective value so far as an upper bound; (iii) set the level between the two bounds; (iv) project the current solution onto the corresponding level set (a schematic sketch is given at the end of this summary). The gap between the bounds certifies convergence.

3.3 Level Method Illustration
[Figure: geometric illustration of the level method.]

4. Experiments
4.1 Settings
• Kernel combination parameters
• SVM parameters

4.2 Results
[Figures: evolution of objective values on the breast data set for SILP, SD, and the level method; evolution of the kernel weights; time-saving ratio; properties of the gap; properties of the bounds; numbers of steps of SILP and SD.]

4.3 Discussion
Pros and cons of the three methods:
• SILP: oscillation of solutions from iteration to iteration.
• SD: a large number of calls to the SVM solver is required to compute the optimal step size via a line search; e.g., on "iono", SD needs 1231 SVM calls, while the level method needs 47.
• Level method: the cutting-plane model together with the projection onto level sets ensures the stability of the solutions.

5. Conclusion
An efficient method for large-scale multiple kernel learning that:
• utilizes the gradients of all past solutions, and
• introduces a projection step for regularization.
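Formulation sketch. The poster states the MKL goal only in words; for concreteness, the standard SVM-based formulation from the cited literature (e.g., Lanckriet et al., 2004; Sonnenburg et al., 2006) is the min-max problem below. The notation is the common one, not copied from the poster, which shows no formula.

```latex
\min_{p \in \Delta}\;\max_{\alpha \in Q}\;
  \mathbf{1}^{\top}\alpha
  \;-\;\tfrac{1}{2}\,(\alpha \circ y)^{\top}
      \Big(\sum_{i=1}^{m} p_i K_i\Big)(\alpha \circ y)
```

Here Δ = {p : p_i ≥ 0, Σ_i p_i = 1} holds the kernel weights, Q = {α : 0 ≤ α_j ≤ C, Σ_j α_j y_j = 0} is the SVM dual feasible set, y is the label vector, and ∘ is the element-wise product. The outer objective f(p) = max_{α∈Q}(...) is convex in p, which is the property the level method exploits.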
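Algorithm sketch. To make the steps in Section 3.2 concrete, here is a minimal Python sketch of a level-method loop on the probability simplex. It assumes a generic convex objective with a (sub)gradient oracle; in real MKL each oracle call would solve an SVM on the combined kernel Σ_i p_i K_i, but a toy quadratic stands in for it here, and the function name level_method and all toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog, minimize

def level_method(f, grad, m, lam=0.9, tol=1e-4, max_iter=100):
    """Minimize a convex f over the simplex {p >= 0, sum(p) = 1} by the level method."""
    p = np.full(m, 1.0 / m)              # start from uniform weights
    pts, vals, grads = [], [], []
    for _ in range(max_iter):
        pts.append(p.copy())
        vals.append(f(p))
        grads.append(grad(p))
        n = len(pts)
        G = np.asarray(grads)                                  # (n, m) past gradients
        c = np.array([vals[j] - G[j] @ pts[j] for j in range(n)])
        # Cutting-plane model built from ALL past solutions:
        #   h(q) = max_j ( c[j] + G[j] . q )  <=  f(q) for convex f

        # Lower bound L = min_{q in simplex} h(q), solved as an LP in (q, z):
        #   min z   s.t.   G[j] . q - z <= -c[j],   sum(q) = 1,   q >= 0
        res = linprog(np.r_[np.zeros(m), 1.0],
                      A_ub=np.c_[G, -np.ones(n)], b_ub=-c,
                      A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
        L, U = res.fun, min(vals)        # U = best objective value so far
        if U - L <= tol:                 # the gap bounds the suboptimality
            break
        level = L + lam * (U - L)        # level threshold between the two bounds

        # Regularization step: project the current point onto the level set
        #   {q in simplex : h(q) <= level}
        cons = [{'type': 'eq', 'fun': lambda q: q.sum() - 1.0}]
        cons += [{'type': 'ineq', 'fun': lambda q, j=j: level - c[j] - G[j] @ q}
                 for j in range(n)]
        proj = minimize(lambda q: np.sum((q - p) ** 2), p, method='SLSQP',
                        bounds=[(0, 1)] * m, constraints=cons)
        p = proj.x
    return p

# Toy stand-in for the MKL objective: a convex quadratic over m = 3 "kernels".
A = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
b = np.array([0.5, 1.0, 0.8])
print(level_method(lambda p: 0.5 * p @ A @ p - b @ p, lambda p: A @ p - b, m=3))
```

The projection step is what distinguishes the level method from plain cutting planes as in SILP: instead of jumping to the minimizer of the model, the iterate moves only as far as the level set allows, which is the stability of solutions the poster refers to.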