
Multiple Kernel Learning



Presentation Transcript


  1. Multiple Kernel Learning • Marius Kloft • Technische Universität Berlin / Korea University

  2. Machine Learning • Example: object detection in images • Aim: learning the relation between two random quantities X and Y from observations (x_1, y_1), …, (x_n, y_n) • Kernel-based learning: predictions of the form f(x) = Σ_i α_i k(x_i, x) + b (a minimal sketch follows below)
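A minimal sketch of the kernel-based learning setup. The kernel (RBF) and solver (scikit-learn's SVC) are illustrative choices not fixed by the slide:

```python
import numpy as np
from sklearn.svm import SVC

# Toy observations (x_i, y_i): two noisy classes in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.repeat([-1, 1], 50)

# Kernel-based learner: predictions have the form
#   f(x) = sum_i alpha_i * k(x_i, x) + b
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)
print(clf.predict([[0.8, 1.2]]))
```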

  3. Multiple Views / Kernels (Lanckriet et al., 2004) • Multiple views of the same data (e.g., space, shape, color), each inducing its own kernel • How to combine the views? Via weightings: k = θ_1 k_1 + … + θ_M k_M (see the sketch below)
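A minimal sketch of combining per-view kernels by a weighting; the Gram matrices here are random PSD stand-ins, since the slide does not specify the kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Random PSD matrix, standing in for a per-view Gram matrix."""
    A = rng.normal(size=(n, n))
    return A @ A.T

# One Gram matrix per view (e.g., space, shape, color).
kernels = [random_psd(100) for _ in range(3)]
theta = np.array([0.5, 0.3, 0.2])  # view weights

# Weighted combination k = sum_m theta_m * k_m -- itself a valid kernel,
# since nonnegative combinations of PSD matrices stay PSD.
K = sum(t * Km for t, Km in zip(theta, kernels))
```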

  4. Computation of Weights? • State of the art: sparse weights (Bach, 2008) • Kernels / views are completely discarded • But why discard information?

  5. From Vision to Reality? • State of the art: the sparse method is empirically ineffective • My dissertation: a new methodology, now established as a standard (Gehler et al., Noble et al., Shawe-Taylor et al., NIPS 2008) • More efficient and effective in applications and in practice • Breakthrough in learning bounds: O(M/n)

  6. Presentation of Methodology Non-sparse Multiple Kernel Learning

  7. New Methodology (Kloft et al., ECML 2010, JMLR 2011) • Model: combined kernel k = Σ_m θ_m k_m • Generalized formulation: arbitrary loss, arbitrary norms on the weights • e.g., lp-norms; the 1-norm leads to sparsity • Mathematical program: a convex problem, jointly optimized over the weights θ and the model parameters w (a reconstruction follows below)
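A reconstruction of the lp-norm MKL primal from the cited JMLR 2011 paper, with w_m the per-kernel weight vectors, θ the kernel weights, and ℓ the (arbitrary) loss:

```latex
\min_{\mathbf{w},\, b,\; \boldsymbol{\theta} \ge 0,\; \|\boldsymbol{\theta}\|_p \le 1}
\;\; \frac{1}{2} \sum_{m=1}^{M} \frac{\|\mathbf{w}_m\|_2^2}{\theta_m}
\; + \; C \sum_{i=1}^{n} \ell\!\Big( \sum_{m=1}^{M} \langle \mathbf{w}_m, \phi_m(x_i) \rangle + b,\; y_i \Big)
```

For p = 1 the norm constraint drives θ to sparsity; for p > 1 all kernels retain nonzero weight.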

  8. Theoretical Analysis • Theoretical foundations are an active research topic (NIPS workshop 2010) • We show: Theorem (Kloft & Blanchard). The local Rademacher complexity of lp-norm MKL is bounded in terms of the truncated spectra of the kernels • Corollaries (learning bounds): an upper bound whose rate improves on the best previously known rate (Cortes et al., ICML 2010) • For certain values of p, an improvement of two orders of magnitude • (Kloft & Blanchard, NIPS 2011, JMLR 2012)

  9. Proof (Sketch) • Relate the original class to the centered class • Bound the complexity of the centered class via the Khintchine–Kahane and Rosenthal inequalities • Bound the complexity of the original class • Relate the bound to the truncation of the spectra of the kernels

  10. Optimization (Kloft et al., JMLR 2011) • Algorithms: Newton method; sequential quadratically constrained programming with level-set projections; block-coordinate descent • Block-coordinate descent: alternate between solving (P) w.r.t. w (a standard SVM problem) and solving (P) w.r.t. θ (analytical update), until convergence (convergence proved); see the sketch below • Implementation: in C++ ("SHOGUN Toolbox") with Matlab/Octave/Python/R support • Runtime: ~1–2 orders of magnitude faster
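A minimal sketch of the block-coordinate descent, assuming precomputed kernel matrices and using scikit-learn's SVC as an illustrative stand-in for the SHOGUN w-step solver. The analytical θ-step is the closed-form update from the JMLR 2011 paper:

```python
import numpy as np
from sklearn.svm import SVC

def lp_mkl(kernels, y, p=2.0, C=1.0, n_iter=20):
    """Sketch of block-coordinate descent for lp-norm MKL.

    kernels: array of shape (M, n, n) -- precomputed kernel matrices.
    Alternates an SVM solve (w-step) with the analytical theta update.
    """
    M, n, _ = kernels.shape
    theta = np.full(M, M ** (-1.0 / p))          # uniform start, ||theta||_p = 1
    for _ in range(n_iter):
        K = np.tensordot(theta, kernels, axes=1)  # combined kernel sum_m theta_m K_m
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        beta = np.zeros(n)
        beta[svm.support_] = svm.dual_coef_.ravel()  # signed dual coefficients
        # Squared block norms: ||w_m||^2 = theta_m^2 * beta' K_m beta.
        w2 = theta ** 2 * np.einsum("i,mij,j->m", beta, kernels, beta)
        # Analytical theta-step: theta_m ∝ ||w_m||^{2/(p+1)}, renormalized
        # so that ||theta||_p = 1.
        theta = w2 ** (1.0 / (p + 1))
        theta /= np.linalg.norm(theta, ord=p)
    return theta, svm
```

The θ-step is the minimizer of Σ_m ||w_m||² / θ_m over the lp-ball, obtained via Hölder's inequality, which is what makes this step analytical rather than requiring an inner solver.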

  11. Toy Experiment • Design: two 50-dimensional Gaussians; mean μ1 on the simplex; zero-mean features irrelevant; 50 training examples; one linear kernel per feature; six scenarios varying the percentage of irrelevant features (0%, 44%, 64%, 82%, 92%, 98% sparsity), with the variance chosen so that the Bayes error stays constant (a data-generation sketch follows below) • Results (test error): the choice of p is crucial; the optimal p depends on the true sparsity; the SVM fails in the sparse scenarios; the 1-norm is best only in the sparsest scenario; p-norm MKL proves robust in all scenarios • Bounds: can be minimized w.r.t. p; the bounds reflect the empirical results well • [Figure: test error vs. sparsity for the six scenarios]
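A hedged sketch of the toy design described above; the exact means, variances, and scaling of the original experiment are not given in the transcript, so the constants here are illustrative:

```python
import numpy as np

def toy_data(n=50, d=50, frac_irrelevant=0.64, sep=1.0, seed=0):
    """Two d-dimensional Gaussians; only a fraction of features carry signal.

    Irrelevant features have zero mean in both classes; the relevant part of
    the class mean mu1 lies on the (scaled) simplex: its entries sum to sep.
    """
    rng = np.random.default_rng(seed)
    k = int(round(d * (1 - frac_irrelevant)))   # number of relevant features
    mu1 = np.zeros(d)
    mu1[:k] = sep / k                           # simplex-style mean
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(size=(n, d)) + np.outer(y, mu1)
    return X, y

# One linear kernel per feature: K_j = x_{:,j} x_{:,j}^T.
X, y = toy_data()
kernels = np.stack([np.outer(X[:, j], X[:, j]) for j in range(X.shape[1])])
```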

  12. Applications • Lesson learned: the optimality of a method depends on the true underlying sparsity of the problem; the applications below exhibit non-sparsity • Applications studied: • Computer vision: visual object recognition • Bioinformatics: subcellular localization; protein fold prediction; DNA transcription start site detection; metabolic network reconstruction

  13. Application Domain: Computer Vision • Visual object recognition • Aim: annotation of visual media (e.g., images) • Motivation: content-based image retrieval • [Example categories: aeroplane, bicycle, bird]

  14. Application Domain: Computer Vision • Visual object recognition • Aim: annotation of visual media (e.g., images) • Motivation: content-based image retrieval • Multiple kernels based on: color histograms; shapes (gradients); local features (SIFT words); spatial features (see the histogram-kernel sketch below)
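A minimal sketch of one such kernel: the χ²-kernel over color histograms, a common choice in visual object recognition. The slide does not fix the kernel function, so this particular form is an assumption:

```python
import numpy as np

def chi2_kernel(H1, H2, gamma=1.0):
    """Chi-squared kernel between rows of two histogram matrices.

    k(h, h') = exp(-gamma * sum_i (h_i - h'_i)^2 / (h_i + h'_i)),
    a standard kernel for comparing (e.g., color) histograms.
    """
    d = H1[:, None, :] - H2[None, :, :]
    s = H1[:, None, :] + H2[None, :, :] + 1e-12   # avoid division by zero
    return np.exp(-gamma * np.sum(d * d / s, axis=-1))

# Example: 8-bin color histograms of 5 images, L1-normalized.
rng = np.random.default_rng(0)
H = rng.random((5, 8))
H /= H.sum(axis=1, keepdims=True)
K = chi2_kernel(H, H)
```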

  15. Application Domain: Computer Vision • Datasets: • 1. VOC 2009 challenge: 7054 train / 6925 test images; 20 object categories (aeroplane, bicycle, …) • 2. ImageCLEF 2010 challenge: 8000 train / 10000 test images, taken from Flickr; 93 concept categories (partylife, architecture, skateboard, …) • 32 kernels of 4 types, varied over color-channel combinations and spatial tilings (levels of a spatial pyramid) • One one-vs.-rest classifier per visual concept

  16. Application Domain: Computer Vision • Challenge results: employed our approach in the ImageCLEF 2011 Photo Annotation challenge and achieved the winning entries in 3 categories! • Preliminary results: using BoW-S alone gives worse results → BoW-S alone is not sufficient • (Binder et al., 2010, 2011)

  17. Application Domain: Computer Vision • Experiment: disagreement of single-kernel classifiers • The BoW-S kernels induce more or less the same predictions • Why can MKL still help? Some images are better captured by certain kernels, and different images may be captured well by different kernels

  18. Application Domain: Genetics (Kloft et al., NIPS 2009, JMLR 2011) • Detection of transcription start sites by means of kernels based on: sequence alignments; the distribution of nucleotides (downstream, upstream); folding properties; binding energies and angles (a sequence-kernel sketch follows below) • Empirical analysis, detection accuracy (AUC): higher accuracies than sparse MKL and ARTS • ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)
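A minimal sketch of one family of sequence kernels, the k-mer spectrum kernel. The slide names alignment- and nucleotide-distribution-based kernels without giving formulas, so this specific kernel is an illustrative assumption:

```python
from collections import Counter

def spectrum_kernel(s1, s2, k=3):
    """k-mer spectrum kernel: inner product of k-mer count vectors.

    Counts how often each length-k substring occurs in each DNA sequence
    and returns the dot product of the two count vectors.
    """
    c1 = Counter(s1[i:i + k] for i in range(len(s1) - k + 1))
    c2 = Counter(s2[i:i + k] for i in range(len(s2) - k + 1))
    return sum(c1[kmer] * c2[kmer] for kmer in c1.keys() & c2.keys())

print(spectrum_kernel("ACGTACGT", "CGTACG", k=3))  # -> 6
```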

  19. Application Domain: Genetics (Kloft et al., NIPS 2009, JMLR 2011) • Theoretical analysis: impact of the lp-norm on the bound • Confirms the experimental results: stronger theoretical guarantees for the proposed approach (p > 1); empirical and theoretical results approximately agree • Empirical analysis, detection accuracy (AUC): higher accuracies than sparse MKL and ARTS • ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)

  20. Application Domain: Pharmacology • Protein fold prediction: predicting the fold class of a protein • The fold class relates to the protein's function, e.g., important for drug design • Data set and kernels from Y. Ying: 27 fold classes; fixed train and test sets; 12 biologically inspired kernels (e.g., hydrophobicity, polarity, van der Waals volume) • Results (accuracy): 1-norm MKL and the SVM are on par; p-norm MKL performs best, with 6% higher accuracy than the baselines

  21. Further Applications • Computer vision: visual object recognition • Bioinformatics: subcellular localization; protein fold prediction; DNA transcription start site detection; metabolic network reconstruction (all exhibiting non-sparsity)

  22. Conclusion: Non-sparse Multiple Kernel Learning • Visual object recognition: established standard; winner of the ImageCLEF 2011 challenge • Computational biology: a more accurate gene detector than the winner of an international comparison • Applications: training with >100,000 data points and >1,000 kernels • Sharp learning bounds

  23. Thank you for your attention. • I will be pleased to answer any additional questions.

  24. References • Abeel, Van de Peer, Saeys (2009). Toward a gold standard for promoter prediction evaluation. Bioinformatics. • Bach (2008). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR). • Kloft, Brefeld, Laskov, and Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning. • Kloft, Brefeld, Sonnenburg, Laskov, Müller, and Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009). • Kloft, Rückert, and Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML. • Kloft and Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011). • Kloft, Brefeld, Sonnenburg, and Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953–997. • Kloft and Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465–2502. • Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).
