Final Exam Review CS479/679 Pattern Recognition, Dr. George Bebis
Final Exam May 13, 2019 (12:10pm – 2:10pm) • Comprehensive • Everything covered before the Midterm exam • Linear Discriminant Functions • Support Vector Machines • Expectation-Maximization Algorithm • Case studies are included in the final exam (i.e., focus on main ideas/results, not details) • No need to memorize complicated/long equations (e.g., decision boundary equations for Gaussian distributions, Chernoff bound, etc.); they will be provided to you during the exam if needed. • Simpler/shorter equations (e.g., Bayes rule, ML/BE equations, PCA/LDA equations, linear discriminant, gradient descent, etc.) should be memorized.
Linear Discriminant Functions • General form of a linear discriminant (restated below) • What is the form of the decision boundary? • The decision boundary is a hyperplane • What is the meaning of w and w0? • The orientation of the hyperplane is determined by w and its location by w0.
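A reminder of the standard form (not copied from the slides, but consistent with the course notation):

```latex
g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0,
\qquad
\text{decision boundary: } \; g(\mathbf{x}) = 0 \;\; \text{(a hyperplane)} .
```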
Linear Discriminant Functions • What is the geometric interpretation of g(x)? • Distance of x from the decision boundary (hyperplane) – know how to prove it.
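A sketch of the proof (the standard argument): decompose x into its projection x_p onto the hyperplane plus a step of length r along the unit normal w/||w||.

```latex
\mathbf{x} = \mathbf{x}_p + r\,\frac{\mathbf{w}}{\|\mathbf{w}\|}
\;\;\Rightarrow\;\;
g(\mathbf{x}) = \underbrace{g(\mathbf{x}_p)}_{=0} + r\,\|\mathbf{w}\|
\;\;\Rightarrow\;\;
r = \frac{g(\mathbf{x})}{\|\mathbf{w}\|} .
```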
Linear Discriminant Functions • How do we estimate w and w0? • Apply learning using a set of labeled training examples • What is the effect of each training example? • Places a constraint on the solution [Figure: solution region in parameter space (α1, α2) and the corresponding training samples in feature space (y1, y2)]
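In the usual formulation (assuming augmented, "normalized" samples, i.e., samples from the second class are negated), each training sample y_i contributes the constraint

```latex
\mathbf{a}^T \mathbf{y}_i > 0 ,
```

i.e., a half-space in parameter space; the solution region is the intersection of all such half-spaces.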
How do we “learn” the parameters? • Iterative optimization – what is the main idea? • Minimize some error function J(α) iteratively • How are the parameters updated? Each update moves along a search direction, scaled by the learning rate (see the rule below).
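The generic update rule has the form below, where p_k is the search direction and η(k) the learning rate:

```latex
\boldsymbol{\alpha}(k+1) = \boldsymbol{\alpha}(k) + \eta(k)\,\mathbf{p}_k .
```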
Methods • Gradient descent – search direction? • Newton – search direction? • Perceptron rule – error function?
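For reference, the standard answers (stated here as review, not verbatim from the slides):

```latex
\text{Gradient descent: } \mathbf{p}_k = -\nabla J(\boldsymbol{\alpha}(k)),
\qquad
\text{Newton: } \mathbf{p}_k = -H^{-1}\,\nabla J(\boldsymbol{\alpha}(k)),

\text{Perceptron criterion: } J_p(\boldsymbol{\alpha}) = \sum_{\mathbf{y}\ \text{misclassified}} \left(-\boldsymbol{\alpha}^T \mathbf{y}\right) .
```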
Methods (cont’d) • Gradient descent • Effect of parameter initialization • Effect of learning rate • Newton • Computational requirements • Convergence • Perceptron rule • Batch vs single-sample Perceptron • Perceptron Convergence Theorem
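A minimal sketch in Python (NumPy) contrasting the batch and single-sample Perceptron rules; the function names and the assumption that the samples are already augmented and "normalized" are mine, not from the slides:

```python
import numpy as np

def batch_perceptron(Y, eta=1.0, max_iter=1000):
    """Batch Perceptron: update with the sum of all misclassified samples.
    Y is an (n, d) array of augmented, normalized samples (class-2 samples
    negated), so a correct solution satisfies Y @ a > 0 for every row."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        mis = Y[Y @ a <= 0]                # currently misclassified samples
        if len(mis) == 0:
            break                          # converged: all samples correct
        a += eta * mis.sum(axis=0)         # batch update
    return a

def single_sample_perceptron(Y, eta=1.0, max_iter=1000):
    """Single-sample (fixed-increment) Perceptron: update after each error."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        errors = 0
        for y in Y:
            if a @ y <= 0:                 # misclassified sample
                a += eta * y               # immediate update
                errors += 1
        if errors == 0:
            break
    return a
```

If the samples are linearly separable, both variants terminate after a finite number of updates (Perceptron Convergence Theorem).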
Support Vector Machines • What is the capacity of a classifier? • How is the VC dimension related to the capacity? • What is structural risk minimization? • Find solutions that: (1) minimize the empirical risk and (2) have low VC dimension
Support Vector Machines • What is the margin of separation? How is it defined? • What is the relationship between VC dimension and margin of separation? • VC dimension is minimized by maximizing the margin of separation. [Figure: separating hyperplane with maximum margin; the samples lying on the margin are the support vectors]
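For the canonical hyperplane, the relationship usually quoted is:

```latex
y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 \;\; \forall i
\qquad\Rightarrow\qquad
\text{margin} = \frac{2}{\|\mathbf{w}\|},
```

so maximizing the margin amounts to minimizing ||w||, which is how the VC dimension is kept low.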
Support Vector Machines • SVM optimization problem (a common form is restated below): what is the role of each term?
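A common statement of the problem (soft-margin form; the hard-margin version drops the slack variables ξ_i and the C term):

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i
\quad \text{s.t.} \quad
y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 .
```

The first term maximizes the margin; the second penalizes margin violations, with C controlling the trade-off.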
Support Vector Machines (cont’d) • SVM solution: • Are all λk non-zero? • Soft margin classifier – tolerate “outliers” • What is the effect of C on the solution?
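The solution has the standard form below; only the λk of the support vectors are non-zero, and in the soft-margin case the multipliers are bounded by C (a small C tolerates more outliers, a large C penalizes them heavily):

```latex
\mathbf{w} = \sum_{k} \lambda_k\, y_k\, \mathbf{x}_k,
\qquad
g(\mathbf{x}) = \sum_{k \in \mathrm{SV}} \lambda_k\, y_k\, (\mathbf{x}_k \cdot \mathbf{x}) + b,
\qquad
0 \le \lambda_k \le C .
```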
Support Vector Machines (cont’d) • Non-linear SVM – what is the main idea? • Map the data to a higher-dimensional space (of dimensionality h) • Use a linear classifier in the new space • Computational complexities?
Support Vector Machines (cont’d) • What is the kernel trick? • Compute dot products in the mapped space using a kernel function, e.g., the polynomial kernel K(x, y) = (x · y)^d
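A small numerical sanity check of the kernel trick for the degree-2 polynomial kernel in 2-D; the explicit feature map phi below is the standard one, and the particular vectors are only illustrative:

```python
import numpy as np

def phi(x):
    """Explicit feature map whose dot product equals the degree-2 polynomial kernel."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

lhs = np.dot(x, y) ** 2          # kernel evaluation: K(x, y) = (x . y)^2
rhs = np.dot(phi(x), phi(y))     # dot product in the mapped space
print(lhs, rhs)                  # both print 16.0
```

The point is that the kernel evaluates the dot product in the mapped space without ever constructing phi(x) explicitly.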
Support Vector Machines • SVM is based on exact optimization (i.e., no local optima). • Complexity depends on the number of support vectors, not on the dimensionality of the transformed space. • Performance depends on the choice of the kernel and its parameters.
Expectation-Maximization (EM) • What is the EM algorithm? • An iterative method to perform ML estimation, i.e., maximize p(D | θ) • When is EM useful? • Most useful for problems where the data is incomplete or can be thought of as incomplete.
Expectation-Maximization (EM) • What are the steps of the EM algorithm? • Initialization: θ0 • Expectation step • Maximization step • Test for convergence (the standard form of each step is restated below) • Convergence properties of EM? • Solution depends on the initial estimate θ0 • No guarantee of finding the global maximum, but convergence is stable (i.e., no oscillations)
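The two steps being asked about are usually written as:

```latex
\text{E-step: } \; Q(\theta;\,\theta^{t}) = E\!\left[\,\ln p(D_{\text{complete}} \mid \theta)\;\middle|\;D,\,\theta^{t}\right]

\text{M-step: } \; \theta^{t+1} = \arg\max_{\theta}\, Q(\theta;\,\theta^{t}),
\qquad
\text{stop when } \|\theta^{t+1} - \theta^{t}\| < \epsilon .
```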
Expectation-Maximization (EM) • What is a Mixture of Gaussians (MoG)? • How are the MoG parameters estimated? • Introduce “hidden” variables • Use EM algorithm to estimate E[zi]
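A compact EM sketch for a 1-D mixture of Gaussians (assuming NumPy; the variable names and the simple fixed-iteration loop are mine, not from the course notes):

```python
import numpy as np

def em_mog_1d(x, k=2, iters=100, seed=0):
    """EM for a 1-D Mixture of Gaussians (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    pi = np.full(k, 1.0 / k)                 # mixing weights
    mu = rng.choice(x, k, replace=False)     # initial means picked from the data
    var = np.full(k, np.var(x))              # initial variances
    for _ in range(iters):
        # E-step: responsibilities E[z_ij] = P(component j | x_i)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        nj = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nj
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nj
        pi = nj / n
    return pi, mu, var
```

For example, running this on data drawn from two well-separated Gaussians should recover means close to the true component means.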
Expectation-Maximization (EM) • Can you interpret the EM steps for MoGs?
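The interpretation usually expected: the E-step computes each sample's expected membership in each component, and the M-step re-estimates each component as a responsibility-weighted mean, covariance, and mixing weight (standard MoG updates, restated here):

```latex
\text{E-step: } \; E[z_{ij}] = \frac{\pi_j\, p(\mathbf{x}_i \mid \mu_j, \Sigma_j)}{\sum_{l} \pi_l\, p(\mathbf{x}_i \mid \mu_l, \Sigma_l)}

\text{M-step: } \;
\mu_j = \frac{\sum_i E[z_{ij}]\,\mathbf{x}_i}{\sum_i E[z_{ij}]},\qquad
\Sigma_j = \frac{\sum_i E[z_{ij}]\,(\mathbf{x}_i - \mu_j)(\mathbf{x}_i - \mu_j)^T}{\sum_i E[z_{ij}]},\qquad
\pi_j = \frac{1}{n}\sum_i E[z_{ij}] .
```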