AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach Ziming Zhang*, Ze-Nian Li, Mark Drew School of Computing Science Simon Fraser University Vancouver, Canada {zza27, li, mark}@cs.sfu.ca * This work was done when the author was at SFU.
Outline • Introduction • Adaptive Multiple Kernel Learning • Experiments • Conclusion
Introduction • Kernel • Given a set of data X = {x_1, …, x_n} and a feature mapping function φ, a kernel matrix K can be defined by the inner products of each pair of mapped samples: K_ij = ⟨φ(x_i), φ(x_j)⟩. • Multiple Kernel Learning • Aims to learn an optimal kernel, together with the support vectors, by combining a set of base kernels linearly, K = Σ_m θ_m K_m, where the θ_m are the kernel coefficients.
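The kernel-matrix definition above can be sketched directly in NumPy; the quadratic feature map `phi` used here is a hypothetical example, not one from the slides:

```python
import numpy as np

def kernel_matrix(X, phi):
    """Kernel matrix K[i, j] = <phi(x_i), phi(x_j)> for data X."""
    F = np.array([phi(x) for x in X])  # one feature vector per sample
    return F @ F.T                     # all pairwise inner products

# Hypothetical feature map on 1-D inputs: x -> (x, x^2).
phi = lambda x: np.array([x, x ** 2])
X = np.array([1.0, 2.0, 3.0])
K = kernel_matrix(X, phi)
# K is symmetric and positive semidefinite by construction.
```

Any kernel built this way is a valid Gram matrix, which is what MKL combines linearly.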
Introduction • Multiple Kernel Learning
Introduction • Example: Lp-norm Multiple Kernel Learning [1] • Learning alternates two steps: • train a traditional SVM with the kernel coefficients fixed (a convex problem under the traditional SVM constraints); • learn the kernel coefficients with w fixed (subject to the kernel coefficient constraints). [1] M. Kloft, et al. Efficient and accurate Lp-norm multiple kernel learning. In NIPS'09, 2009.
Introduction • Motivation • The Lp-norm constraint on the kernel coefficients makes learning them difficult, especially when p > 1. • Intuition • Solve the MKL problem without considering the kernel coefficient constraints explicitly. • Contributions • Propose a family of biconvex optimization formulations for MKL • Can handle arbitrary norms of the kernel coefficients • Easy and fast to optimize
Adaptive Multiple Kernel Learning • [objective formula omitted; its terms are annotated "weighting" and "biconvex function"]
Adaptive Multiple Kernel Learning • Biconvex functions • f(x, y) is biconvex if, for every fixed y, f_y(x) = f(x, y) is convex in x, and, for every fixed x, f_x(y) = f(x, y) is convex in y. • Example: f(x, y) = x² + y² − 3xy is biconvex but not jointly convex. • Biconvex optimization • At least one function among the objective functions and constraints is biconvex, and the others are convex. • In general, only local optima can be guaranteed.
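The slide's example can be verified numerically: f(x, y) = x² + y² − 3xy has a constant Hessian whose diagonal entries are positive (so f is convex in each variable separately) but which is indefinite (so f is not jointly convex):

```python
import numpy as np

# Constant Hessian of f(x, y) = x^2 + y^2 - 3xy.
H = np.array([[2.0, -3.0],
              [-3.0, 2.0]])

# Convex in x alone and in y alone: both second partials are positive.
assert H[0, 0] > 0 and H[1, 1] > 0

# Not jointly convex: the full Hessian has a negative eigenvalue.
eigvals = np.linalg.eigvalsh(H)  # ascending order
assert eigvals[0] < 0 < eigvals[1]
```

This is exactly why AdaMKL's formulation is biconvex rather than convex, and why it yields local rather than global optima.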
Adaptive Multiple Kernel Learning • Adaptive Multiple Kernel Learning (AdaMKL) • Aims to simplify the MKL learning process while retaining discriminative power similar to MKL, using biconvex optimization. • Binary classification setting
Adaptive Multiple Kernel Learning • Objective function:
Adaptive Multiple Kernel Learning • Optimization alternates two steps: • learn w with θ fixed, using the Np(θ) norm; • learn θ with w fixed, using the L1 or L2 norm of θ; • repeat the two steps until convergence.
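The alternating scheme above can be sketched as a generic loop. The subproblem solvers are placeholders (the exact AdaMKL updates come from the dual problems on the following slides); here the loop is demonstrated on a toy biconvex function, f(w, θ) = (wθ − 1)², which is convex in each variable separately:

```python
import numpy as np

def adamkl_sketch(solve_w, solve_theta, objective, theta0,
                  tol=1e-6, max_iter=100):
    """Alternate the two convex subproblems until the objective
    stops decreasing (monotone decrease => local minimum)."""
    theta, prev = theta0, np.inf
    for _ in range(max_iter):
        w = solve_w(theta)       # step 1: learn w with theta fixed
        theta = solve_theta(w)   # step 2: learn theta with w fixed
        obj = objective(w, theta)
        if prev - obj < tol:
            break
        prev = obj
    return w, theta

# Toy biconvex instance (illustrative, not the AdaMKL objective).
f = lambda w, theta: (w * theta - 1.0) ** 2
w, theta = adamkl_sketch(
    solve_w=lambda th: 1.0 / th,    # argmin over w for fixed theta
    solve_theta=lambda w: 1.0 / w,  # argmin over theta for fixed w
    objective=f,
    theta0=2.0,
)
print(f(w, theta))  # reaches the minimum, 0
```

In AdaMKL itself, step 1 is a quadratic program (an SVM with the combined kernel) and step 2 has the kernel coefficient constraints hidden in the dual.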
Adaptive Multiple Kernel Learning • Learning w (Dual)
Adaptive Multiple Kernel Learning • Learning θ
Adaptive Multiple Kernel Learning • Computational complexity • The same as quadratic programming. • Convergence • If the hard-margin case (C = +∞) can be solved at the initialization stage, then AdaMKL will converge to a local minimum. • If the objective function converges at either step, then AdaMKL has converged to a local minimum.
Adaptive Multiple Kernel Learning • Lp-norm MKL: convex; explicit kernel coefficient norm condition; solved by gradient search, semi-infinite programming (SIP), etc. • AdaMKL: biconvex; kernel coefficient conditions hidden in the dual; solved by quadratic programming.
Experiments • 4 specific AdaMKL variants: N0L1, N1L1, N1L2, N2L2, where "N" and "L" denote the types of norm used for learning w and θ, respectively. • 2 experiments • Toy example: C = 10^5 without tuning, 10 Gaussian kernels, data randomly sampled from 2-D Gaussian distributions • Positive samples: mean [0 0], covariance [0.3 0; 0 0.3], 100 samples • Negative samples: means [-1 -1] and [1 1], covariances [0.1 0; 0 0.1] and [0.2 0; 0 0.2], 100 samples, respectively.
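The toy data above can be generated with NumPy as follows; the random seed, and the reading that each negative blob contributes 100 samples, are assumptions not fixed by the slide:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# Positive class: one 2-D Gaussian blob, 100 samples.
pos = rng.multivariate_normal([0, 0], [[0.3, 0], [0, 0.3]], size=100)

# Negative class: two 2-D Gaussian blobs, 100 samples each
# (the per-blob count is an assumed reading of the slide).
neg1 = rng.multivariate_normal([-1, -1], [[0.1, 0], [0, 0.1]], size=100)
neg2 = rng.multivariate_normal([1, 1], [[0.2, 0], [0, 0.2]], size=100)

X = np.vstack([pos, neg1, neg2])
y = np.concatenate([np.ones(100), -np.ones(200)])  # +1 / -1 labels
```

Since the two negative modes surround the positive mode, no single linear kernel separates the classes, which is what makes this a reasonable multiple-kernel test case.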
Experiments (2) • 4 benchmark datasets: breast-cancer, heart, thyroid, and titanic (downloaded from http://ida.first.fraunhofer.de/projects/bench/) • Gaussian kernels + polynomial kernels • 100, 140, 60, and 40 kernels for the corresponding datasets, respectively • Compared with convex-optimization-based MKL: GMKL [2] and SMKL [3] [2] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML'07, 2007. [3] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. JMLR, 9:2491–2521, 2008.
Experiments - Benchmark datasets (accuracy ranges, %) • (a) Breast-Cancer: [69.64 ~ 75.23] • (b) Heart: [79.71 ~ 84.05] • (c) Thyroid: [95.20 ~ 95.80] • (d) Titanic: [76.02 ~ 77.58]
Experiments - Benchmark datasets • (a) Breast-Cancer • (b) Heart • (c) Thyroid • (d) Titanic
Conclusion • Biconvex optimization for MKL • Hides the kernel coefficient constraints (non-negativity and Lp norm, p ≥ 1) in the dual, without considering them explicitly. • Easy to optimize, fast to converge, lower computational cost, with performance similar to traditional convex-optimization-based MKL.