
Explicit Feature Methods for Accelerated Kernel Learning

Presentation Transcript


  1. Explicit Feature Methods for Accelerated Kernel Learning Purushottam Kar

  2. Quick Motivation
  • Kernel algorithms (SVM, SVR, KPCA) have outputs of the form $f(\mathbf{x}) = \sum_{i=1}^{n_{SV}} \alpha_i\, k(\mathbf{x}, \mathbf{x}_i)$
  • The number of “support vectors” $n_{SV}$ is typically large
  • Provably a constant fraction of the training set size*
  • Prediction time is $O(n_{SV}\, d)$, where $d$ is the input dimension
  • Slow for real-time applications
  *[Steinwart NIPS 03, Steinwart-Christmann NIPS 08]

  3. The General Idea
  • Approximate the kernel using explicit feature maps: find $\Phi : \mathbb{R}^d \to \mathbb{R}^D$ such that $k(\mathbf{x}, \mathbf{y}) \approx \langle \Phi(\mathbf{x}), \Phi(\mathbf{y}) \rangle$
  • Speeds up prediction time to $O(Dd)$ (compute $\Phi(\mathbf{x})$, then a single dot product), independent of the number of support vectors
  • Speeds up training time as well, since fast linear solvers can be used on the mapped data
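
A minimal sketch of why this collapse helps (the toy feature map, names, and dimensions below are illustrative, not a construction from the slides): once an explicit map is available, the sum over all support vectors is folded into a single weight vector, and prediction is one dot product.

```python
import numpy as np

def kernel_predict(x, support_vectors, alphas, kernel):
    """Classic kernel prediction: O(n_sv * d) work per query point."""
    return sum(a * kernel(x, sv) for a, sv in zip(alphas, support_vectors))

def linear_predict(x, w, phi):
    """Explicit-feature prediction: O(D) once w = sum_i alpha_i * phi(x_i)."""
    return float(np.dot(w, phi(x)))

# Toy exact feature map for the kernel k(x, y) = <x, y>^2 (D = d^2 features)
phi = lambda x: np.outer(x, x).ravel()
kernel = lambda x, y: float(np.dot(x, y)) ** 2

rng = np.random.default_rng(0)
support_vectors = rng.normal(size=(5, 3))
alphas = rng.normal(size=5)
w = sum(a * phi(sv) for a, sv in zip(alphas, support_vectors))

x = rng.normal(size=3)
print(kernel_predict(x, support_vectors, alphas, kernel))  # kernel expansion
print(linear_predict(x, w, phi))                           # same value, O(D) work
```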

  4. Why Should Such Maps Exist?
  • Mercer’s theorem*: every PSD kernel has the following expansion: $k(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{\infty} \lambda_i\, \phi_i(\mathbf{x})\, \phi_i(\mathbf{y})$ with $\lambda_i \ge 0$
  • The series converges uniformly to the kernel
  • For every $\epsilon > 0$ there exists a $D$ such that if we construct the map $\Phi(\mathbf{x}) = \big(\sqrt{\lambda_1}\,\phi_1(\mathbf{x}), \ldots, \sqrt{\lambda_D}\,\phi_D(\mathbf{x})\big)$
  • then $\big|\, k(\mathbf{x}, \mathbf{y}) - \langle \Phi(\mathbf{x}), \Phi(\mathbf{y}) \rangle \,\big| \le \epsilon$ for all $\mathbf{x}, \mathbf{y}$
  • Call such maps uniformly $\epsilon$-approximate
  *[Mercer 09]
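
A concrete instance of such a truncation (not on the slide, added for illustration): the exponential dot-product kernel on the unit ball $\|\mathbf{x}\|, \|\mathbf{y}\| \le 1$ admits the expansion

$$
e^{\langle \mathbf{x}, \mathbf{y} \rangle} \;=\; \sum_{n=0}^{\infty} \frac{\langle \mathbf{x}, \mathbf{y} \rangle^{n}}{n!},
\qquad
\Big|\, e^{\langle \mathbf{x}, \mathbf{y} \rangle} - \sum_{n=0}^{N} \frac{\langle \mathbf{x}, \mathbf{y} \rangle^{n}}{n!} \,\Big|
\;\le\; \sum_{n=N+1}^{\infty} \frac{1}{n!}
\;\le\; \frac{2}{(N+1)!}.
$$

Each term $\langle \mathbf{x}, \mathbf{y} \rangle^{n}/n!$ is an inner product of finitely many (scaled) monomial features, so truncating at the first $N$ with $2/(N+1)! \le \epsilon$ yields a finite, uniformly $\epsilon$-approximate map whose dimension depends only on $\epsilon$ and $d$.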

  5. Today’s Agenda
  • Some explicit feature map constructions
  • Randomized feature maps, e.g. translation invariant, rotation invariant
  • Deterministic feature maps, e.g. intersection, scale invariant
  • Some “fast” random feature constructions
  • Translation invariant, dot product
  • The BIG picture?

  6. Random Feature Maps: Approximate recovery of kernel values with high confidence

  7. Translation Invariant Kernels*
  • Kernels of the form $k(\mathbf{x}, \mathbf{y}) = k(\mathbf{x} - \mathbf{y})$
  • Gaussian kernel, Laplacian kernel
  • Bochner’s Theorem**: for every such (properly normalized) kernel there exists a positive function $p(\boldsymbol{\omega})$, a probability density, such that $k(\mathbf{x} - \mathbf{y}) = \int p(\boldsymbol{\omega})\, e^{i \boldsymbol{\omega}^{\top}(\mathbf{x} - \mathbf{y})}\, d\boldsymbol{\omega}$
  • Finding $p$: take the inverse Fourier transform of $k$
  • Select $\boldsymbol{\omega}_1, \ldots, \boldsymbol{\omega}_D \sim p$ and set $\Phi(\mathbf{x}) = \frac{1}{\sqrt{D}} \big( \cos(\boldsymbol{\omega}_j^{\top}\mathbf{x}), \sin(\boldsymbol{\omega}_j^{\top}\mathbf{x}) \big)_{j=1}^{D}$
  *[Rahimi-Recht NIPS 07], **special case of [Bochner 33]
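
A minimal sketch of the recipe above for the Gaussian (RBF) kernel, using the standard fact that the Fourier transform of $\exp(-\|\boldsymbol{\delta}\|^2 / 2\sigma^2)$ is a Gaussian density with covariance $\sigma^{-2} I$; the function and parameter names are my own.

```python
import numpy as np

def rff_map(X, D, sigma=1.0, seed=None):
    """Random Fourier features for k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Bochner: p(omega) is N(0, sigma^{-2} I).  Sample D frequencies and use
    paired cos/sin features so that <Phi(x), Phi(y)> approximates k(x, y).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    omega = rng.normal(scale=1.0 / sigma, size=(d, D))   # omega_j ~ p
    proj = X @ omega                                      # (n, D) projections
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

# Sanity check: feature dot products approximate the exact kernel matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
Phi = rff_map(X, D=2000, sigma=1.0, seed=1)
approx = Phi @ Phi.T
exact = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1) / 2.0)
print(np.abs(approx - exact).max())   # small; shrinks roughly as 1/sqrt(D)
```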

  8. Translation Invariant Kernels
  • Empirical averages approximate expectations: $\frac{1}{D} \sum_{j=1}^{D} \cos\big(\boldsymbol{\omega}_j^{\top}(\mathbf{x} - \mathbf{y})\big) \approx \mathbb{E}_{\boldsymbol{\omega} \sim p}\big[\cos\big(\boldsymbol{\omega}^{\top}(\mathbf{x} - \mathbf{y})\big)\big] = k(\mathbf{x} - \mathbf{y})$
  • Let $\epsilon > 0$ be the desired uniform accuracy
  • Let us assume the points lie in a compact set of diameter $R$; then we require $D = O\!\big(\frac{d}{\epsilon^2} \log \frac{\sigma_p R}{\epsilon}\big)$, where $\sigma_p^2 = \mathbb{E}_{p}\|\boldsymbol{\omega}\|^2$ depends on the spectrum of the kernel
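
The pointwise half of this claim is a one-line Hoeffding argument (an intermediate step the slide leaves implicit): each sampled feature gives a bounded, unbiased estimate of the kernel value, so

$$
\Pr\!\left[\, \Big| \frac{1}{D} \sum_{j=1}^{D} \cos\big(\boldsymbol{\omega}_j^{\top}(\mathbf{x} - \mathbf{y})\big) - k(\mathbf{x} - \mathbf{y}) \Big| \ge \epsilon \right]
\;\le\; 2 \exp\!\left( -\frac{D \epsilon^2}{2} \right),
$$

since each summand lies in $[-1, 1]$ and has expectation $k(\mathbf{x} - \mathbf{y})$. The extra $d \log(\sigma_p R / \epsilon)$ factor in the requirement on $D$ comes from turning this pointwise bound into a uniform one via a union bound over an $\epsilon$-net of the (compact) domain.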

  9. Translation Invariant Kernels
  • For the RBF kernel $k(\mathbf{x}, \mathbf{y}) = \exp\!\big(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\big)$, the sampling distribution is itself Gaussian: $\boldsymbol{\omega} \sim \mathcal{N}(0, \sigma^{-2} I)$
  • If the kernel offers a margin, then we can require fewer features: the bound on $D$ can be stated in terms of the margin rather than the uniform accuracy $\epsilon$

  10. Rotation Invariant Kernels*
  • Kernels of the form $k(\mathbf{x}, \mathbf{y}) = f(\langle \mathbf{x}, \mathbf{y} \rangle)$
  • Polynomial kernels, exponential kernel
  • Schoenberg’s theorem**: such kernels are exactly those with a Maclaurin expansion having non-negative coefficients, $f(t) = \sum_{n \ge 0} a_n t^n$, $a_n \ge 0$
  • Select a random degree $N$ (with probability tied to the coefficient weighting) for each feature
  • Approx. of $\langle \mathbf{x}, \mathbf{y} \rangle^{N}$: select Rademacher vectors $\mathbf{w}_1, \ldots, \mathbf{w}_N \in \{-1, +1\}^d$ and use the (reweighted) product feature $\prod_{j=1}^{N} \langle \mathbf{w}_j, \mathbf{x} \rangle$, which is unbiased since $\mathbb{E}\big[\langle \mathbf{w}, \mathbf{x} \rangle \langle \mathbf{w}, \mathbf{y} \rangle\big] = \langle \mathbf{x}, \mathbf{y} \rangle$
  • Similar approximation guarantees as earlier
  *[K.-Karnick AISTATS 12], **[Schoenberg 42]
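
A minimal sketch of this Random Maclaurin construction for the polynomial kernel $k(\mathbf{x}, \mathbf{y}) = (1 + \langle \mathbf{x}, \mathbf{y} \rangle)^2$, whose Maclaurin coefficients are $(1, 2, 1)$. The geometric degree distribution $P(N = n) = 2^{-(n+1)}$ and the reweighting below follow the general Kar-Karnick scheme, but treat the specific constants and names as illustrative.

```python
import numpy as np

def random_maclaurin_map(X, D, coeffs, seed=None):
    """Random features for a dot-product kernel f(<x, y>) = sum_n coeffs[n] t^n.

    Each output feature samples a degree N with P(N = n) = 2^{-(n+1)}, draws N
    Rademacher vectors w_1..w_N, and emits sqrt(coeffs[N] * 2^{N+1}) *
    prod_j <w_j, x>, so that E[Phi_k(x) Phi_k(y)] = f(<x, y>).
    """
    rng = np.random.default_rng(seed)
    n_pts, d = X.shape
    Phi = np.zeros((n_pts, D))
    for k in range(D):
        N = rng.geometric(0.5) - 1        # P(N = n) = 2^{-(n+1)}, n = 0, 1, 2, ...
        if N >= len(coeffs):              # coefficient a_N = 0, feature stays zero
            continue
        feat = np.full(n_pts, np.sqrt(coeffs[N] * 2.0 ** (N + 1)))
        for _ in range(N):
            w = rng.choice([-1.0, 1.0], size=d)
            feat *= X @ w                 # one Rademacher projection per factor
        Phi[:, k] = feat
    return Phi / np.sqrt(D)

# Sanity check against k(x, y) = (1 + <x, y>)^2 with coefficients (1, 2, 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5)) / np.sqrt(5)
Phi = random_maclaurin_map(X, D=20000, coeffs=[1.0, 2.0, 1.0], seed=1)
print(np.abs(Phi @ Phi.T - (1 + X @ X.T) ** 2).max())   # shrinks as D grows
```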

  11. Deterministic Feature Maps: Exact/approximate recovery of kernel values with certainty

  12. Intersection Kernel*
  • Kernel of the form $k(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} \min(x_i, y_i)$
  • Exploit additive separability of the kernel: $f(\mathbf{s}) = \sum_{l} \alpha_l\, k(\mathbf{s}, \mathbf{x}_l) = \sum_{i=1}^{d} h_i(s_i)$ with $h_i(s) = \sum_{l} \alpha_l \min(s, x_{l,i})$
  • Each $h_i(s_i)$ can be calculated in $O(\log n)$ time!
  • Requires $O(n \log n)$ preprocessing time per dimension
  • Prediction time almost independent of $n$
  • However, a deterministic and exact method: no $\epsilon$ or $\delta$
  *[Maji-Berg-Malik CVPR 08]
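
A minimal sketch of the per-dimension trick above (the sorted-array and partial-sum layout here is one straightforward way to get the stated $O(\log n)$ lookups, not necessarily the exact data structure of Maji-Berg-Malik): sort the support-vector values in each dimension once, then answer $h_i(s)$ with a binary search and two precomputed partial sums.

```python
import bisect
import numpy as np

class IntersectionKernelPredictor:
    """Evaluates f(s) = sum_l alpha_l * sum_i min(s_i, x_{l,i}) in O(d log n)."""

    def __init__(self, support_vectors, alphas):
        X = np.asarray(support_vectors, dtype=float)   # shape (n, d)
        a = np.asarray(alphas, dtype=float)
        self.dims = []
        for i in range(X.shape[1]):
            order = np.argsort(X[:, i])
            xs, a_sorted = X[order, i], a[order]
            # prefix[j] = sum of alpha_l * x_{l,i} over the j smallest values;
            # suffix[j] = sum of alpha_l over the remaining (larger) values.
            prefix = np.concatenate([[0.0], np.cumsum(a_sorted * xs)])
            suffix = np.concatenate([np.cumsum(a_sorted[::-1])[::-1], [0.0]])
            self.dims.append((xs, prefix, suffix))

    def predict(self, s):
        total = 0.0
        for s_i, (xs, prefix, suffix) in zip(s, self.dims):
            j = bisect.bisect_left(xs, s_i)        # entries xs[:j] are < s_i
            total += prefix[j] + s_i * suffix[j]   # = h_i(s_i)
        return total

# Sanity check against the naive O(n d) expansion
rng = np.random.default_rng(0)
X, a = rng.random((50, 8)), rng.normal(size=50)
model = IntersectionKernelPredictor(X, a)
s = rng.random(8)
naive = float(np.sum(a * np.minimum(X, s).sum(axis=1)))
print(abs(model.predict(s) - naive))   # ~0, up to floating point error
```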

  13. Scale Invariant Kernels*
  • Kernels of the form $k(cx, cy) = c\, k(x, y)$ for all $c > 0$, applied coordinate-wise to additive kernels such as $\chi^2$
  • Bochner’s theorem still applies**
  • Involves working with the logarithms of the inputs: such a kernel can be written as $k(x, y) = \sqrt{xy}\; \mathcal{K}(\log x - \log y)$ for a translation-invariant signature $\mathcal{K}$
  • Restrict the domain so that we have a Fourier series rather than a Fourier transform
  • Use only the lower frequencies
  • Deterministic-approximate maps
  *[Vedaldi-Zisserman CVPR 10], **[K. 12]
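
One way to instantiate such a map concretely (an illustration in the spirit of the Vedaldi-Zisserman construction, not a transcription of the slide): for the 1-homogeneous $\chi^2$ kernel $k(x, y) = 2xy/(x + y)$ the signature is $\mathcal{K}(t) = \mathrm{sech}(t/2)$ with spectrum $\kappa(\lambda) = \mathrm{sech}(\pi\lambda)$, and keeping only a few low, regularly spaced frequencies already gives a small deterministic feature map. The spacing `L` and order `n` below are illustrative choices.

```python
import numpy as np

def chi2_feature_map(x, n=8, L=0.4):
    """Deterministic feature map for the chi-squared kernel k(x, y) = 2xy/(x+y).

    Uses the decomposition k(x, y) = sqrt(xy) * K(log x - log y) with spectrum
    kappa(lambda) = sech(pi * lambda), sampled at the 2n+1 frequencies
    0, +-L, ..., +-nL.  <Psi(x), Psi(y)> then approximates k(x, y) for x, y > 0.
    """
    x = np.asarray(x, dtype=float)
    kappa = lambda lam: 1.0 / np.cosh(np.pi * lam)
    feats = [np.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        lam = j * L
        amp = np.sqrt(2.0 * x * L * kappa(lam))
        feats.append(amp * np.cos(lam * np.log(x)))
        feats.append(amp * np.sin(lam * np.log(x)))
    return np.stack(feats, axis=-1)       # 2n+1 features per input value

# Per-coordinate check; the additive chi-squared kernel just sums this over dims
x, y = 0.7, 0.2
px, py = chi2_feature_map(x), chi2_feature_map(y)
print(float(px @ py), 2 * x * y / (x + y))   # close, with no randomness involved
```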

  14. Fast Feature Maps: Accelerated Random Feature Constructions

  15. Fast Fourier Features
  • Special case of random Fourier features for the Gaussian (RBF) kernel
  • Old method: multiply by a dense Gaussian matrix, $O(Dd)$ time
  • Instead use $V\mathbf{x}$ where $V = S\, H\, G\, \Pi\, H\, B$
  • $H$ is the Hadamard transform, $\Pi$ is a random permutation, $S$ a random diagonal scaling, and $G$, $B$ diagonal Gaussian and sign matrices
  • Prediction time $O(D \log d)$, storage $O(D)$
  • Rows of $V$ are (non-independent) Gaussian vectors
  • Correlations are sufficiently low
  • However, exponential convergence (for now) only in a restricted setting
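
A minimal sketch of one block of such a structured projection (the scaling constants are handled loosely here and the names are mine; a faithful implementation would follow the published normalisation exactly). The point is that every factor is either diagonal, a permutation, or a Hadamard transform, so the product is applied in $O(d \log d)$ time and $O(d)$ memory rather than the $O(d^2)$ cost of a dense Gaussian matrix.

```python
import numpy as np

def fwht(v):
    """Fast Walsh-Hadamard transform; len(v) must be a power of two.  O(d log d)."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        v = v.reshape(-1, 2, h)
        v = np.stack([v[:, 0] + v[:, 1], v[:, 0] - v[:, 1]], axis=1).reshape(-1)
        h *= 2
    return v

def structured_projection(x, seed=None):
    """One S H G Pi H B block: d pseudo-Gaussian projections of x in O(d log d)."""
    rng = np.random.default_rng(seed)
    d = len(x)
    B = rng.choice([-1.0, 1.0], size=d)        # random sign flips
    Pi = rng.permutation(d)                    # random permutation
    G = rng.normal(size=d)                     # diagonal Gaussian
    S = np.sqrt(d) / np.linalg.norm(G)         # crude global rescaling; the real
                                               # method uses a random diagonal S
    v = fwht(B * x) / np.sqrt(d)
    v = fwht(G * v[Pi]) / np.sqrt(d)
    return S * v

# The resulting projections feed the usual cos/sin random Fourier features
x = np.random.default_rng(0).normal(size=16)   # dimension padded to a power of two
print(structured_projection(x, seed=1)[:4])
```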

  16. Fast Taylor Features
  • Setting: polynomial kernels $k(\mathbf{x}, \mathbf{y}) = \langle \mathbf{x}, \mathbf{y} \rangle^{p}$
  • Earlier method (Random Maclaurin) takes $O(p\, d\, D)$ time per point
  • New method* takes $O\big(p\,(d + D \log D)\big)$ time per point
  • Earlier method works (a bit) better in some regimes
  • Should be possible to improve the new method as well
  • Crucial idea: Count Sketch**, a randomized map $C$ such that $\langle C(\mathbf{x}), C(\mathbf{y}) \rangle \approx \langle \mathbf{x}, \mathbf{y} \rangle$
  • Create a sketch of the tensor $\mathbf{x}^{\otimes p}$
  • Create $p$ independent count sketches $C_1(\mathbf{x}), \ldots, C_p(\mathbf{x})$
  • Can show that the sketch of $\mathbf{x}^{\otimes p}$ equals the circular convolution $C_1(\mathbf{x}) * \cdots * C_p(\mathbf{x})$
  • Can be done in $O\big(p\,(d + D \log D)\big)$ time using the FFT
  *[Pham-Pagh KDD 13], **[Charikar et al 02]
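
A minimal sketch of the count-sketch-plus-FFT idea (hash and variable names are mine; a production implementation would reuse the same hashes for all points, as done below, but be more careful numerically):

```python
import numpy as np

def tensor_sketch(X, p, D, seed=None):
    """Tensor Sketch features approximating k(x, y) = <x, y>^p.

    Each of the p factors has a Count Sketch defined by a hash h: [d] -> [D] and
    signs s: [d] -> {+-1}, with C(x)[j] = sum_{i: h(i) = j} s(i) x_i.  The sketch
    of the rank-one tensor x^(tensor p) is the circular convolution of the p
    individual sketches, computed via FFT in O(p (d + D log D)) per point.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    hashes = rng.integers(0, D, size=(p, d))         # h_t(i)
    signs = rng.choice([-1.0, 1.0], size=(p, d))     # s_t(i)

    prod = np.ones((n, D), dtype=complex)
    for t in range(p):
        C = np.zeros((n, D))
        for r in range(n):                           # count sketch of each row
            np.add.at(C[r], hashes[t], signs[t] * X[r])
        prod *= np.fft.fft(C, axis=1)                # convolution theorem
    return np.real(np.fft.ifft(prod, axis=1))

# Sanity check: feature dot products approximate <x, y>^2
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10)) / np.sqrt(10)
Phi = tensor_sketch(X, p=2, D=4096, seed=1)
print(np.abs(Phi @ Phi.T - (X @ X.T) ** 2).max())    # shrinks as D grows
```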

  17. The BIG Picture: An Overview of Explicit Feature Methods

  18. Other Feature Construction Methods
  • Efficiently evaluable maps for efficient prediction
  • Fidelity to a particular kernel is not an objective
  • Hard(er?) to give generalization guarantees
  • Local Deep Kernel Learning (LDKL)*
  • Sparse features speed up evaluation time
  • Training phase more involved
  • Pairwise Piecewise Linear Embedding (PL2)**
  • Encodes (discretizations of) individual and pairs of features
  • Constructs a high-dimensional feature map
  • Features are sparse
  *[Jose et al ICML 13], **[Pele et al ICML 13]

  19. A Taxonomy of Feature Methods (axes: data dependence and kernel dependence)

  20. Discussion: The next big thing in accelerated kernel learning?
