CS 489/698: Intro to ML Lecture 08: Kernels Yao-Liang Yu
Announcement • Final exam: Dec 15, 2017, Friday, 12:30–3:00 pm • Room TBA • Project proposal due tonight • ICLR challenge Yao-Liang Yu
Outline • Feature map • Kernels • The Kernel Trick • Advanced Yao-Liang Yu
XOR revisited • The XOR pattern: no linear classifier in R^2 can separate it Yao-Liang Yu
Quadratic classifier • ŷ = sign(xᵀAx + bᵀx + c) • Weights (to be learned): A ∈ R^{d×d}, b ∈ R^d, c ∈ R Yao-Liang Yu
The power of lifting • Feature map ϕ : R^d → R^{d·d+d+1}, ϕ(x) = (vec(xxᵀ), x, 1) • The quadratic classifier becomes linear in the lifted space: xᵀAx + bᵀx + c = ⟨w, ϕ(x)⟩ with w = (vec(A), b, c) Yao-Liang Yu
Example Yao-Liang Yu
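As a worked instance for d = 2 (an illustration, assuming the ±1 encoding of XOR):

```latex
\phi(x_1, x_2) \;=\; (x_1^2,\; x_1 x_2,\; x_2 x_1,\; x_2^2,\; x_1,\; x_2,\; 1) \;\in\; \mathbb{R}^{2\cdot 2 + 2 + 1} = \mathbb{R}^{7}
```

On XOR the label equals the sign of the single coordinate x1x2, so a linear classifier in the lifted space separates data that no linear classifier in R^2 can.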
Does it work? Yao-Liang Yu
Curse of dimensionality? • Computation now happens in R^{d·d+d+1}, so the explicit lift costs O(d²) • But all we need is the dot product: ⟨ϕ(x), ϕ(z)⟩ = (xᵀz)² + xᵀz + 1 • This is still computable in O(d)! Yao-Liang Yu
Feature transform • NN: learn ϕ simultaneously with w • Here: choose a nonlinear ϕ so that ⟨ϕ(x), ϕ(z)⟩ = f(xᵀz) for some f : R → R, which saves computation Yao-Liang Yu
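A quick numerical check of this shortcut, as a minimal NumPy sketch (the feature map and kernel are the ones from the lifting above):

```python
import numpy as np

def phi(x):
    # explicit lifting (vec(x x'), x, 1) into R^{d*d + d + 1}
    return np.concatenate([np.outer(x, x).ravel(), x, [1.0]])

def k(x, z):
    # the same inner product via the kernel shortcut, in O(d)
    s = x @ z
    return s**2 + s + 1.0

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)
print(np.isclose(phi(x) @ phi(z), k(x, z)))  # True
```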
Outline • Feature map • Kernels • The Kernel Trick • Advanced Yao-Liang Yu
Reverse engineering • Start with some function k : R^d × R^d → R such that there exists a feature transform ϕ with k(x, z) = ⟨ϕ(x), ϕ(z)⟩ • As long as k is efficiently computable, we do not care about the dimension of ϕ (it could be infinite!) • Such a k is called a (reproducing) kernel. Yao-Liang Yu
Examples • Polynomial kernel: k(x, z) = (xᵀz + c)^p with c ≥ 0 • Gaussian kernel: k(x, z) = exp(−‖x − z‖² / (2σ²)) • Laplace kernel: k(x, z) = exp(−‖x − z‖ / σ) • Matérn kernel: a family interpolating between the Laplace (ν = 1/2) and Gaussian (ν → ∞) kernels Yao-Liang Yu
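The first three are one-liners, sketched below with illustrative hyperparameters c, p, σ (the Matérn kernel additionally needs a Bessel function, e.g. scipy.special.kv, and is omitted):

```python
import numpy as np
from scipy.spatial.distance import cdist

def polynomial(X, Z, c=1.0, p=2):
    return (X @ Z.T + c) ** p

def gaussian(X, Z, sigma=1.0):
    return np.exp(-cdist(X, Z, 'sqeuclidean') / (2 * sigma**2))

def laplace(X, Z, sigma=1.0):
    return np.exp(-cdist(X, Z, 'euclidean') / sigma)
```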
Verifying a kernel • For any n and any x1, x2, …, xn, the kernel matrix K with Kij = k(xi, xj) is symmetric and positive semidefinite (K ⪰ 0) • Symmetric: Kij = Kji • Positive semidefinite (PSD): αᵀKα ≥ 0 for all α ∈ R^n Yao-Liang Yu
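A spot-check of this condition in NumPy (a sketch: random draws cannot prove PSD-ness for all n and all points, but they quickly expose non-kernels):

```python
import numpy as np

def looks_like_kernel(k, d=3, n=50, trials=10, tol=-1e-8):
    """Spot-check symmetry and PSD-ness of kernel matrices on random points."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        K = np.array([[k(x, z) for z in X] for x in X])
        if not np.allclose(K, K.T):
            return False
        if np.linalg.eigvalsh(K).min() < tol:  # smallest eigenvalue must be >= 0
            return False
    return True

print(looks_like_kernel(lambda x, z: (x @ z + 1) ** 2))       # True
print(looks_like_kernel(lambda x, z: np.linalg.norm(x - z)))  # False: a distance is not a kernel
```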
Kernel calculus • If k is a kernel, so is λk for any λ ≥ 0 • If k1 and k2 are kernels, so is k1+k2 • If k1 and k2 are kernels, so is k1k2 Yao-Liang Yu
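The same eigenvalue check illustrates the closure rules on concrete kernel matrices (a sketch; the product rule is the Schur product theorem in disguise):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
K1 = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2)  # Gaussian kernel matrix
K2 = (X @ X.T + 1) ** 2                                     # polynomial kernel matrix

# scaling by lambda >= 0, sum, and elementwise (Schur) product all stay PSD
for K in (3.0 * K1, K1 + K2, K1 * K2):
    print(np.linalg.eigvalsh(K).min() >= -1e-8)  # True for all three
```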
Outline • Feature map • Kernels • The Kernel Trick • Advanced Yao-Liang Yu
Kernel SVM (dual) • maxα Σi αi − ½ Σi,j αi αj yi yj k(xi, xj), subject to 0 ≤ αi ≤ C and Σi αi yi = 0 • The problem involves only α and the kernel values k(xi, xj) = ⟨ϕ(xi), ϕ(xj)⟩, so ϕ stays implicit Yao-Liang Yu
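In practice this dual is solved by a library; e.g. scikit-learn's SVC takes a kernel name, as in this minimal sketch on the ±1-encoded XOR data:

```python
import numpy as np
from sklearn.svm import SVC

# XOR with the +/-1 encoding: not linearly separable
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
y = np.array([1, -1, -1, 1])

# quadratic kernel (gamma * x'z + coef0)^2, solved in the dual by libsvm
clf = SVC(kernel='poly', degree=2, coef0=1.0, C=10.0).fit(X, y)
print(clf.predict(X))  # all four points classified correctly
```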
Does it work? Yao-Liang Yu
Testing • Given a test sample x', how to perform testing? No explicit access to ϕ, again! • Predict through the kernel: ŷ(x') = sign( Σi αi yi k(xi, x') + b ), where the αi are the dual variables, k is the kernel, and the sum runs over the training set Yao-Liang Yu
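The same formula can be checked against a library solver; this sketch recomputes scikit-learn's decision function from its stored dual variables and support vectors (the data and kernel choice are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = np.sign(X[:, 0] * X[:, 1])          # XOR-like labels in {-1, +1}

clf = SVC(kernel='rbf', gamma=1.0).fit(X, y)

# manual prediction: sum_i (alpha_i y_i) k(xi, x') + b over the support vectors
x_new = np.array([[0.5, -0.7]])
k_vals = np.exp(-1.0 * ((clf.support_vectors_ - x_new) ** 2).sum(axis=1))
manual = clf.dual_coef_ @ k_vals + clf.intercept_
print(np.sign(manual), clf.predict(x_new))  # the two agree
```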
Tradeoff • Previously: training O(nd), testing O(d) • Kernelized: training O(n²d), testing O(nd) • Nice to avoid explicit dependence on the feature dimension h (could be infinite) • But if n is also large… (maybe later) Yao-Liang Yu
Learning the kernel (Lanckriet et al.'04) • Use a nonnegative combination k = Σj ζj kj of t pre-selected kernels k1, …, kt • The coefficients ζj ≥ 0 are learned simultaneously with the classifier Yao-Liang Yu
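Lanckriet et al. learn ζ by solving a convex program jointly with the classifier; as a much cruder illustration (a sketch, not their method), one can grid-search the combination weight of two fixed kernel matrices on held-out data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.sign(X[:, 0] * X[:, 1])

K1 = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))  # Gaussian kernel matrix
K2 = X @ X.T                                            # linear kernel matrix

tr, te = np.arange(150), np.arange(150, 200)            # train / held-out split
scores = []
for z in np.linspace(0, 1, 11):
    K = z * K1 + (1 - z) * K2                           # nonnegative combination
    clf = SVC(kernel='precomputed').fit(K[np.ix_(tr, tr)], y[tr])
    scores.append((clf.score(K[np.ix_(te, tr)], y[te]), z))
print('best held-out accuracy %.2f at zeta %.1f' % max(scores))
```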
Logistic regression revisited • Representer Theorem (Wahba, Schölkopf, Herbrich, Smola, Dinuzzo, …). The optimal w has the form w = Σi αi ϕ(xi) • Hence ⟨w, ϕ(x)⟩ = Σi αi k(xi, x), and the model can be kernelized Yao-Liang Yu
Kernel Logistic Regression (primal) • minα Σi log(1 + exp(−yi (Kα)i)) + (λ/2) αᵀKα • In the gradient, each example is weighted by its predicted probability of carrying the wrong label: uncertain predictions get bigger weight Yao-Liang Yu
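A minimal gradient-descent sketch of this primal (hyperparameters λ, step size, and data are illustrative; labels are in {−1, +1}):

```python
import numpy as np
from scipy.special import expit

def kernel_logreg(K, y, lam=0.1, lr=0.5, steps=2000):
    """Gradient descent on sum_i log(1 + exp(-yi (K a)i)) + (lam/2) a'Ka."""
    a = np.zeros(len(y))
    for _ in range(steps):
        p = expit(-y * (K @ a))                      # P(wrong label): uncertain points weigh more
        a -= lr / len(y) * (K @ (lam * a - y * p))   # gradient step (K is symmetric)
    return a

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)); y = np.sign(X[:, 0] * X[:, 1])
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))  # Gaussian kernel matrix
a = kernel_logreg(K, y)
print((np.sign(K @ a) == y).mean())                  # training accuracy (high)
```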
Outline • Feature map • Kernels • The Kernel Trick • Advanced Yao-Liang Yu
Universal approximation (Micchelli, Xu, Zhang'06) • Universal kernel. For any compact set Z, for any continuous function f : Z → R, for any ε > 0, there exist x1, x2, …, xn in Z and α1, α2, …, αn in R such that sup over z ∈ Z of |f(z) − Σi αi k(xi, z)| ≤ ε • Consequently, kernel methods can realize arbitrarily complex decision boundaries • Example. The Gaussian kernel is universal. Yao-Liang Yu
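Kernel ridge regression fits exactly an expansion Σi αi k(xi, ·), so it can illustrate the approximation claim (a sketch; the target function and bandwidth are illustrative):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

Z = np.linspace(-3, 3, 200)[:, None]       # a compact set, sampled
f = np.sin(2 * Z).ravel()                  # a continuous target on Z

# fit f(z) ~ sum_i alpha_i k(xi, z) with the (universal) Gaussian kernel
model = KernelRidge(kernel='rbf', gamma=1.0, alpha=1e-6).fit(Z, f)
print(np.abs(model.predict(Z) - f).max())  # worst-case error on the sample: tiny
```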
Kernel mean embedding (Smola, Song, Gretton, Schölkopf, …) • Map a distribution P to μP = E over X∼P of ϕ(X), the mean of the feature map ϕ of some kernel • Characteristic kernel: the above mapping P ↦ μP is 1-1 • Completely preserves the information in the distribution P • Lots of applications (e.g., two-sample and independence testing) Yao-Liang Yu
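For example, the (biased) maximum mean discrepancy ‖μP − μQ‖ between two samples reduces entirely to kernel evaluations, as in this Gaussian-kernel sketch:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of ||mu_P - mu_Q||^2 under a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None] - B[None, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)))
diff = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)) + 1.0)
print(same < diff)  # True: the embedding separates different distributions
```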
Questions? Yao-Liang Yu