AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach Ziming Zhang*, Ze-Nian Li, Mark Drew School of Computing Science Simon Fraser University Vancouver, Canada {zza27, li, mark}@cs.sfu.ca * This work was done when the author was at SFU.
Outline • Introduction • Adaptive Multiple Kernel Learning • Experiments • Conclusion
Introduction • Kernel • Given a set of data X = {x_1, …, x_n} and a feature mapping function φ, a kernel matrix K can be defined by the inner products of each pair of mapped samples: K_ij = ⟨φ(x_i), φ(x_j)⟩. • Multiple Kernel Learning • Aims to learn an optimal kernel, together with the support vectors, by combining a set of base kernels linearly, K = Σ_m θ_m K_m, where the θ_m are the kernel coefficients.
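The kernel-matrix definition above can be sketched directly in NumPy; the quadratic feature map `phi` used here is a hypothetical example, not one from the slides:

```python
import numpy as np

def kernel_matrix(X, phi):
    """Kernel matrix K[i, j] = <phi(x_i), phi(x_j)> for data X."""
    F = np.array([phi(x) for x in X])  # one feature vector per sample
    return F @ F.T                     # all pairwise inner products

# Hypothetical feature map on 1-D inputs: x -> (x, x^2).
phi = lambda x: np.array([x, x ** 2])
X = np.array([1.0, 2.0, 3.0])
K = kernel_matrix(X, phi)
# K is symmetric and positive semidefinite by construction.
```

Any kernel built this way is a valid Gram matrix, which is what MKL combines linearly.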
Introduction • Multiple Kernel Learning
Introduction • Example: Lp-norm Multiple Kernel Learning [1] • Learning alternates two steps: • train a traditional SVM with the kernel coefficients fixed (a convex problem under the traditional SVM constraints); • learn the kernel coefficients with w fixed (subject to the kernel coefficient constraints). [1] M. Kloft, et al. Efficient and accurate Lp-norm multiple kernel learning. In NIPS'09, 2009.
Introduction • Motivation • The Lp-norm constraint on the kernel coefficients makes learning them difficult, especially when p > 1. • Intuition • Solve the MKL problem without considering the kernel coefficient constraints explicitly. • Contributions • Propose a family of biconvex optimization formulations for MKL • Can handle arbitrary norms of the kernel coefficients • Easy and fast to optimize
Adaptive Multiple Kernel Learning • [objective formula omitted; its terms are annotated "weighting" and "biconvex function"]
Adaptive Multiple Kernel Learning • Biconvex functions • f(x, y) is biconvex if, for every fixed y, f_y(x) = f(x, y) is convex in x, and, for every fixed x, f_x(y) = f(x, y) is convex in y. • Example: f(x, y) = x² + y² − 3xy is biconvex but not jointly convex. • Biconvex optimization • At least one function among the objective functions and constraints is biconvex, and the others are convex. • In general, only local optima can be guaranteed.
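The slide's example can be verified numerically: f(x, y) = x² + y² − 3xy has a constant Hessian whose diagonal entries are positive (so f is convex in each variable separately) but which is indefinite (so f is not jointly convex):

```python
import numpy as np

# Constant Hessian of f(x, y) = x^2 + y^2 - 3xy.
H = np.array([[2.0, -3.0],
              [-3.0, 2.0]])

# Convex in x alone and in y alone: both second partials are positive.
assert H[0, 0] > 0 and H[1, 1] > 0

# Not jointly convex: the full Hessian has a negative eigenvalue.
eigvals = np.linalg.eigvalsh(H)  # ascending order
assert eigvals[0] < 0 < eigvals[1]
```

This is exactly why AdaMKL's formulation is biconvex rather than convex, and why it yields local rather than global optima.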
Adaptive Multiple Kernel Learning • Adaptive Multiple Kernel Learning (AdaMKL) • Aims to simplify the MKL learning process while retaining discriminative power similar to MKL, using biconvex optimization. • Binary classification setting
Adaptive Multiple Kernel Learning • Objective function:
Adaptive Multiple Kernel Learning • Optimization alternates two steps: • learn w with θ fixed, using the Np(θ) norm; • learn θ with w fixed, using the L1 or L2 norm of θ; • repeat the two steps until convergence.
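The alternating scheme above can be sketched as a generic loop. The subproblem solvers are placeholders (the exact AdaMKL updates come from the dual problems on the following slides); here the loop is demonstrated on a toy biconvex function, f(w, θ) = (wθ − 1)², which is convex in each variable separately:

```python
import numpy as np

def adamkl_sketch(solve_w, solve_theta, objective, theta0,
                  tol=1e-6, max_iter=100):
    """Alternate the two convex subproblems until the objective
    stops decreasing (monotone decrease => local minimum)."""
    theta, prev = theta0, np.inf
    for _ in range(max_iter):
        w = solve_w(theta)       # step 1: learn w with theta fixed
        theta = solve_theta(w)   # step 2: learn theta with w fixed
        obj = objective(w, theta)
        if prev - obj < tol:
            break
        prev = obj
    return w, theta

# Toy biconvex instance (illustrative, not the AdaMKL objective).
f = lambda w, theta: (w * theta - 1.0) ** 2
w, theta = adamkl_sketch(
    solve_w=lambda th: 1.0 / th,    # argmin over w for fixed theta
    solve_theta=lambda w: 1.0 / w,  # argmin over theta for fixed w
    objective=f,
    theta0=2.0,
)
print(f(w, theta))  # reaches the minimum, 0
```

In AdaMKL itself, step 1 is a quadratic program (an SVM with the combined kernel) and step 2 has the kernel coefficient constraints hidden in the dual.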
Adaptive Multiple Kernel Learning • Learning w (Dual)
Adaptive Multiple Kernel Learning • Learning θ
Adaptive Multiple Kernel Learning • Computational complexity • The same as quadratic programming. • Convergence • If the hard-margin case (C = +∞) can be solved at the initialization stage, then AdaMKL will converge to a local minimum. • If the objective function converges at either step, then AdaMKL has converged to a local minimum.
Adaptive Multiple Kernel Learning • Lp-norm MKL: convex; explicit kernel coefficient norm condition; solved by gradient search, semi-infinite programming (SIP), etc. • AdaMKL: biconvex; kernel coefficient conditions hidden in the dual; solved by quadratic programming.
Experiments • 4 specific AdaMKL variants: N0L1, N1L1, N1L2, N2L2, where "N" and "L" denote the types of norm used for learning w and θ, respectively. • 2 experiments • Toy example: C = 10^5 without tuning, 10 Gaussian kernels, data randomly sampled from 2-D Gaussian distributions • Positive samples: mean [0 0], covariance [0.3 0; 0 0.3], 100 samples • Negative samples: means [-1 -1] and [1 1], covariances [0.1 0; 0 0.1] and [0.2 0; 0 0.2], 100 samples, respectively.
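The toy data above can be generated with NumPy as follows; the random seed, and the reading that each negative blob contributes 100 samples, are assumptions not fixed by the slide:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# Positive class: one 2-D Gaussian blob, 100 samples.
pos = rng.multivariate_normal([0, 0], [[0.3, 0], [0, 0.3]], size=100)

# Negative class: two 2-D Gaussian blobs, 100 samples each
# (the per-blob count is an assumed reading of the slide).
neg1 = rng.multivariate_normal([-1, -1], [[0.1, 0], [0, 0.1]], size=100)
neg2 = rng.multivariate_normal([1, 1], [[0.2, 0], [0, 0.2]], size=100)

X = np.vstack([pos, neg1, neg2])
y = np.concatenate([np.ones(100), -np.ones(200)])  # +1 / -1 labels
```

Since the two negative modes surround the positive mode, no single linear kernel separates the classes, which is what makes this a reasonable multiple-kernel test case.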
Experiments (2) • 4 benchmark datasets: breast-cancer, heart, thyroid, and titanic (downloaded from http://ida.first.fraunhofer.de/projects/bench/) • Gaussian kernels + polynomial kernels • 100, 140, 60, and 40 kernels for the corresponding datasets, respectively • Compared with convex-optimization-based MKL: GMKL [2] and SMKL [3] [2] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML'07, 2007. [3] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. JMLR, 9:2491–2521, 2008.
Experiments - Benchmark datasets (accuracy ranges, %) • (a) Breast-Cancer: [69.64 ~ 75.23] • (b) Heart: [79.71 ~ 84.05] • (c) Thyroid: [95.20 ~ 95.80] • (d) Titanic: [76.02 ~ 77.58]
Experiments - Benchmark datasets • (a) Breast-Cancer • (b) Heart • (c) Thyroid • (d) Titanic
Conclusion • Biconvex optimization for MKL • Hides the kernel coefficient constraints (non-negativity and Lp norm, p ≥ 1) in the dual, without considering them explicitly. • Easy to optimize, fast to converge, lower computational cost, with performance similar to traditional convex-optimization-based MKL.