Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
Mark Hasegawa-Johnson, jhasegaw@uiuc.edu
University of Illinois at Urbana-Champaign, USA
Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers
• Definition: Hyperplane Classifier
• Minimum Classification Error Training Methods
  • Empirical risk
  • Differentiable estimates of the 0-1 loss function
  • Error backpropagation
• Kernel Methods
  • Nonparametric expression of a hyperplane
  • Mathematical properties of a dot product
  • Kernel-based classifier
  • The implied high-dimensional space
  • Error backpropagation for a kernel-based classifier
• Useful kernels
  • Polynomial kernel
  • RBF kernel
Hyperplane Classifier
[Figure: points from two classes on either side of the class boundary ("separatrix"), the plane wᵀx = b; w is the normal vector, and b is the distance from the origin (x = 0) to the plane.]
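As a minimal sketch of this decision rule (the weight vector, offset, and test points below are made-up values, not from the lecture):

```python
import numpy as np

# Hyperplane classifier: the label is the side of the separatrix
# (the plane w.x = b) on which the point falls, i.e., sign(w.x - b).
w = np.array([1.0, 2.0])   # normal vector (made-up values)
b = 0.5                    # offset from the origin (made-up value)

def classify(x):
    """Return +1 or -1 according to the side of the plane w.x = b."""
    return 1 if np.dot(w, x) - b > 0 else -1

print(classify(np.array([1.0, 1.0])))    # +1
print(classify(np.array([-1.0, -1.0])))  # -1
```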
Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
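In symbols: with the 0-1 loss ℓ(h(x), y) = 1 when h(x) ≠ y and 0 otherwise, the empirical risk is R_emp = (1/n) Σᵢ ℓ(h(xᵢ), yᵢ), i.e., the fraction of training tokens the classifier gets wrong. A minimal sketch with toy data (made-up values):

```python
import numpy as np

def empirical_risk_01(h, X, y):
    """Average 0-1 loss of hypothesis h on training pairs (X, y):
    the fraction of training tokens that h labels incorrectly."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Toy data: the hyperplane classifier from the previous sketch.
w, b = np.array([1.0, 2.0]), 0.5
h = lambda x: 1 if np.dot(w, x) - b > 0 else -1
X = np.array([[1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]])
y = np.array([1, -1, 1])
print(empirical_risk_01(h, X, y))  # 1/3: one of three tokens is wrong
```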
Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
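As a minimal sketch of the hinge loss and how it upper-bounds the 0-1 loss (here y is the ±1 label and g = wᵀx − b is the hyperplane output; the values are illustrative):

```python
def loss_01(y, g):
    """0-1 loss: 1 if the sign of g disagrees with the label y (+1/-1)."""
    return float(y * g <= 0)

def hinge_loss(y, g):
    """Hinge loss max(0, 1 - y*g): zero only when the point lies on the
    correct side of the plane with margin at least 1; otherwise it grows
    linearly, so it has a usable (sub)gradient for training."""
    return max(0.0, 1.0 - y * g)

for g in [-2.0, -0.5, 0.5, 2.0]:
    print(g, loss_01(+1, g), hinge_loss(+1, g))
# As g crosses 0 the 0-1 loss drops abruptly from 1 to 0, while the
# hinge loss decreases linearly and reaches 0 at g = +1.
```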
Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss
Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries
[Figure: the same scatter of points, but the hard boundary is replaced by a gradient; the class score fades smoothly from "more red" through "less red" and "less blue" to "more blue" across the boundary.]
Error Backpropagation: Sigmoidal Classifier with Absolute Loss
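As a minimal sketch of error backpropagation for the single-layer sigmoidal classifier: gradient descent on the absolute loss |y − h(x)| from this slide, with the gradient passed back through the sigmoid by the chain rule. The training data, learning rate, and epoch count are made-up values:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

# Toy 2-D training data with targets y in {0, 1} (made-up values).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)   # connection weights
b = 0.0           # offset
eta = 0.5         # learning rate (made-up value)

for epoch in range(100):
    for x_i, y_i in zip(X, y):
        g = np.dot(w, x_i) - b     # forward pass: sigmoid input g(x)
        h = sigmoid(g)             # forward pass: hypothesis h(x)
        # Backward pass (chain rule):
        # d|y-h|/dh = -sign(y-h);  dh/dg = h(1-h);  dg/dw = x;  dg/db = -1
        dL_dg = -np.sign(y_i - h) * h * (1.0 - h)
        w -= eta * dL_dg * x_i
        b -= eta * dL_dg * (-1.0)

print(w, b)
# Outputs should approach the targets 1, 1, 0, 0:
print([round(sigmoid(np.dot(w, x) - b), 2) for x in X])
```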
Sigmoidal Classifier: Signal Flow Diagram
[Figure: signal flow diagram. The input x = (x1, x2, x3) is weighted by the connection weights w = (w1, w2, w3) and summed to give the sigmoid input g(x); the sigmoid's output is the hypothesis h(x).]
Multilayer Perceptron
[Figure: two-layer signal flow diagram. The input h0(x) ≡ x = (x1, x2, x3) feeds a first layer of connection weights and biases b11, b12, b13, giving sigmoid inputs g1(x) and sigmoid outputs h1(x); these feed a second layer of connection weights and bias b21, giving the sigmoid input g2(x) and the hypothesis h2(x).]
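As a minimal sketch of the forward pass this diagram depicts, with the same shapes as the figure (three inputs, three first-layer sigmoid units, one output); all weight, bias, and input values are made up:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

# Layer 1: W1[j, k] connects input k to hidden unit j (made-up values).
W1 = np.array([[ 0.5, -0.2,  0.1],
               [ 0.3,  0.8, -0.5],
               [-0.6,  0.4,  0.7]])
b1 = np.array([0.1, -0.2, 0.0])   # biases b11, b12, b13 (made-up)

# Layer 2: one output unit fed by the three hidden outputs.
W2 = np.array([1.0, -1.5, 0.5])   # made-up connection weights
b2 = 0.2                          # bias b21 (made-up)

def mlp(x):
    h0 = x                 # input layer: h0(x) ≡ x
    g1 = W1 @ h0 + b1      # sigmoid inputs g1(x)
    h1 = sigmoid(g1)       # sigmoid outputs h1(x)
    g2 = W2 @ h1 + b2      # sigmoid input g2(x)
    return sigmoid(g2)     # hypothesis h2(x)

print(mlp(np.array([1.0, 2.0, 3.0])))
```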
Output of Multilayer Perceptron is an Approximation of Posterior Probability
When the MLP is trained with 0/1 targets (e.g., by minimizing mean-squared error against the labels), its output h(x) approximates the posterior probability p(y|x).
Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface
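As a minimal sketch of a kernel-based classifier with the polynomial kernel K(x, z) = (1 + xᵀz)^d: the hyperplane's normal vector is expressed nonparametrically as a weighted sum over training points, so h(x) = Σᵢ aᵢ yᵢ K(xᵢ, x) − b, and the separatrix h(x) = 0 is a polynomial surface of order d. The dual coefficients below are made-up values, not a trained model:

```python
import numpy as np

def poly_kernel(x, z, d=2):
    """Polynomial kernel of order d: K(x, z) = (1 + x.z)^d."""
    return (1.0 + np.dot(x, z)) ** d

# Nonparametric hyperplane: h(x) = sum_i a_i y_i K(x_i, x) - b.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y_train = np.array([1, 1, -1, -1])
alpha = np.array([0.5, 0.5, 0.5, 0.5])   # made-up dual coefficients
b = 0.0

def classify(x, d=2):
    s = sum(a * yi * poly_kernel(xi, x, d)
            for a, yi, xi in zip(alpha, y_train, X_train))
    return np.sign(s - b)

print(classify(np.array([0.5, 0.5])))   # +1: nearer the positive points
```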
Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
RBF Classifier Can Represent Any Classifier Boundary (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
[Figure: two RBF decision boundaries. One panel shows more training corpus errors but a smoother boundary; the other shows fewer training corpus errors but a wigglier boundary.]
In these figures, C was adjusted, not g, but a similar effect can be achieved by setting N << M and adjusting g.
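As a minimal sketch of the RBF kernel K(x, z) = exp(−g·‖x − z‖²) and the effect described above: a small g makes each center's influence broad (smoother separatrix), while a large g makes it local (wigglier separatrix). The values below are made up:

```python
import numpy as np

def rbf_kernel(x, z, g):
    """RBF kernel: K(x, z) = exp(-g * ||x - z||^2)."""
    return np.exp(-g * np.sum((x - z) ** 2))

center = np.array([0.0, 0.0])
for g in [0.1, 1.0, 10.0]:
    # Small g: the kernel decays slowly with distance (smooth boundary).
    # Large g: nearly zero away from the center (local influence, so a
    # classifier built from N < M such centers can be made wigglier).
    print(g, [round(rbf_kernel(center, np.array([d, 0.0]), g), 3)
              for d in [0.5, 1.0, 2.0]])
```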
Summary
• Classifier definitions
  • Classifier = a function from x into y
  • Loss = the cost of a mistake
  • Risk = the expected loss
  • Empirical Risk = the average loss on training data
• Multilayer Perceptrons
  • Sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
  • Train using error backpropagation
  • With two hidden layers, an MLP can model any boundary (it is a "universal approximator")
  • MLP output is an estimate of p(y|x)
• Kernel Classifiers
  • Equivalent to: (1) project into f(x), (2) apply a hyperplane classifier
  • Polynomial kernel: separatrix is a polynomial surface of order d
  • RBF kernel: separatrix can be any surface (the RBF classifier is also a "universal approximator")
  • RBF kernel: if N < M, g can adjust the "wiggliness" of the separatrix