1 / 24

Multiple Kernel Learning

Multiple Kernel Learning. Marius Kloft Technische Universität Berlin (now: Courant / Sloan-Kettering Institute). TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A. Machine Learning. Example Object detection in images. Aim

armand-ryan
Download Presentation

Multiple Kernel Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multiple Kernel Learning Marius Kloft Technische Universität Berlin (now: Courant / Sloan-Kettering Institute) TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA

  2. Machine Learning • Example • Object detection in images • Aim • Learning the relation of of two random quantities and • from observations • Kernel-based learning:

  3. Multiple Views / Kernels • (Lanckriet, 2004) Space Shape How to combine the views? Weightings. Color

  4. Computation of Weights? • State of the art • Sparse weights • Kernels / views are completely discarded • But why discard information? • (Lanckriet et al., Bach et al., 2004)

  5. From Vision to Reality? • State of the art: sparse method • empirically ineffective • New methodology • (Gehler et al., Noble et al., Cortes et al., Shawe-Taylor et al., etc., NIPS WS 2008, ICML, ICCV, ...) • More effective in applications • More efficient and effective in practice • learning bounds of rate up to O(M/n)

  6. Presentation of Methodology Non-sparse Multiple Kernel Learning

  7. New Methodology • (Kloft et al., NIPS 2009, • ECML 2009/10, JMLR 2011) • Generalized formulation • arbitrary loss • arbitrary norms • e.g. lp-norms: • 1-norm leads to sparsity: • Computation of weights? • Model • Kernel • Mathematical program Convex problem. Optimization over weights

  8. Theoretical Analysis • Theoretical foundations • Active research topic • NIPS workshop 2009 • We show: • Theorem (Kloft & Blanchard). The localRademacher com-plexity of MKL is bounded by1: • Corollaries (Learning Bounds) • Bound: • depends on kernel • best known rate: • (Cortes, Mohri, Rostamizadeh, ICML 2010) • (Kloft & Blanchard, NIPS 2011, JMLR 2012; • Kloft, Rückert, Bartlett, ECML 2010)

  9. Proof (Sketch) • Relating the original class with the centered class • Bounding the complexity of the centered class • Khintchine-Kahane’s and Rosenthal’s inequalities • Bounding the complexity of the original class • Relating the bound to the truncation of the spectra of the kernels

  10. Optimization • (Kloft et al.,JMLR 2011) • Implementation • In C++ (“SHOGUN Toolbox”) • Matlab/Octave/Python/R support • Runtime: ~ 1-2 orders of magnitude faster • Algorithms • Newton method • sequential, quadratically constrained programming with level set projections • block-coordinate descent alg. • Alternate • solve (P) w.r.t. w • solve (P) w.r.t. : • Until convergence (proved) • (Sketch) analytical

  11. Toy Experiment • Choice of p crucial • Optimality of p depends on true sparsity • SVM fails in the sparse scenarios • 1-norm best in the sparsest scenario only • p-norm MKL proves robust in all scenarios • Bounds • Can minimize bounds w.r.t. p • Bounds well reflect empirical results • Design • Two 50-dimensional Gaussians • mean μ1 := binary vector; • Zero-mean features irrelevant • 50 training examples • One linear kernel per feature • Six scenarios • Vary % of irrelevant features • Variance so that Bayes error constant • Results test error • 50% • 0% • 0% 44% 64% 82% 92% 98% sparsity • 0% 44% 64% 82% 92% 98% sparsity Sparsity

  12. Applications • Lesson learned: • Optimality of a method depends on true underlying sparsity of problem • Applications studied: • Computer vision • Visual object recognition • Bioinformatics • Subcellular localization • Protein fold prediction • DNA Transcription start site detection • Metabolic network reconstruction non-sparsity

  13. Application Domain: Computer Vision • Visual object recognition • Aim: annotation of visual media (e.g., images) • Motivation: •  content-based image retrieval • aeroplane bicycle bird

  14. Application Domain: Computer Vision • Visual object recognition • Aim: annotation of visual media (e.g., images) • Motivation: •  content-based image retrieval • Multiple kernels • based on • Color histograms • shapes (gradients) • local features (SIFT words) • spatial features

  15. Application Domain: Computer Vision • Datasets • 1. VOC 2009 challenge • 7054 train / 6925 test images • 20 object categories • aeroplane, bicycle,… • 2. ImageCLEF2010 Challenge • 8000 train / 10000 test images • taken from Flickr • 93 concept categories • partylife, architecture, skateboard, … • 32 Kernels • 4 types: • varied over color channel combinations and spatial tilings(levels of a spatial pyramid) • one-vs.-rest classifier for each visual concept

  16. Application Domain: Computer Vision • Challenge results • Employed our approach for ImageCLEF2011 Photo Annotation challenge • achieved the winning entries in 3 categories • Preliminary results: • using BoW-S only gives worse results → BoW-S alone is not sufficient • (Binder, Kloft, et al., 2010, 2011)

  17. Application Domain: Computer Vision • Experiment • Disagreement of single-kernel classifiers • BoW-S kernels induce more or less the same predictions • Why can MKL help? • some images better captured by certain kernels • different images may have different kernels that capture them well

  18. Application Domain: Genetics • (Kloft et al., NIPS 2009, JMLR 2011) • Detection of • transcription start sites: • by means of kernels based on: • sequence alignments • distribution of nukleotides • downstream, upstream • folding properties • bindingenergies and angles • Empirical analysis • detection accuracy (AUC): • higher accuracies than sparse MKL and ARTS (small n) • ARTS = winner of international comparison of 19 models • (Abeel et al., 2009)

  19. Application Domain: Genetics • (Kloft et al., NIPS 2009, JMLR 2011) • Theoretical analysis • impact of lp-Norm on bound • confirms experimental results: • stronger theoretical guarantees for proposed approach (p>1) • empirical and theoretical results approximately equal for • Empirical analysis • detection accuracy (AUC): • higher accuracies than sparse MKL and ARTS (small n) • ARTS = winner of international comparison of 19 models • (Abeel et al., 2009)

  20. Application Domain: Pharmacology • Results • Accuracy: • 1-norm MKL and SVM on par • p-norm MKL performs best • 6% higher accuracy than baselines • Protein Fold Prediction • Prediction of fold class of protein • Fold class related to protein’s function • e.g., important for drug design • Data set and kernels from Y. Ying • 27 fold classes • Fixed train and test sets • 12 biologically inspired kernels • e.g. hydrophobicity, polarity, van-der-Waals volume

  21. Further Applications • Computer vision • Visual object recognition • Bioinformatics • Subcellular localization • Protein fold prediction • DNA Transcription splice site detection • Metabolic network reconstruction non-sparsity

  22. Conclusion: Non-sparse Multiple Kernel Learning Visual Object Recognition winner of ImageCLEF 2011 Photo-annotation Challenge Computational Biology gene detector competitive to winner of int. comparison Appli- cations Training with >100,000 data points and >1000 Kernels Sharp learning bounds

  23. Thank you for your attention • I will be happy to answer any additional questions

  24. References • Abeel, Van de Peer, Saeys (2009).Toward a gold standard for promoter prediction evaluation. Bioinformatics. • Bach (2008).Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR). • Kloft, Brefeld, Laskov, and Sonnenburg (2008).Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning. • Kloft, Brefeld, Sonnenburg, Laskov, Müller, and Zien (2009).Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009). • Kloft, Rückert, and Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML. • Kloft, Blanchard (2011).The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011). • Kloft, Brefeld, Sonnenburg, and Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR),12(Mar):953-997. • Kloftand Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502. • Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).

More Related