This lecture covers the topics of Gaussian process regression, learning hyperparameters, automatic relevance determination, Gaussian processes for classification, and support vector machines. The lecture also discusses the course projects and paper presentations.
CS 59000 Statistical Machine Learning, Lecture 16. Yuan (Alan) Qi, Purdue CS, Oct. 23, 2008
Outline Information about paper presentations and course projects Review of Gaussian process regression Learning hyperparameters Automatic Relevance Determination GP for classification Support Vector Machines
Course Projects Second-to-last week: paper presentations. Last week: project presentations. 21 registered students, 6 groups of 3-4 people each; 20 minutes per group, plus 5 minutes for questions.
Paper Presentation Each group presents one recent paper from a top conference or journal on machine learning, bioinformatics, or computer vision, e.g., NIPS, ICML, UAI, RECOMB, ISMB, JMLR. The choice of paper is up to you. Format: define the problem to solve and describe its challenges; give the algorithm/model in a nutshell, highlighting its essence; present the results; discuss the strengths and weaknesses of the paper.
Project Topics Anything related to the course material: new methods, theoretical proofs, or applications. Novelty is appreciated: new algorithms or applications, or proofs of open questions.
Review: Gaussian Process for Regression Likelihood: p(t_n | y_n) = N(t_n | y_n, β^{-1}), with noise precision β. Prior over function values: p(y) = N(y | 0, K), where K is the Gram matrix with K_nm = k(x_n, x_m). Marginal distribution: p(t) = N(t | 0, C), with C = K + β^{-1} I.
Predictive Distribution p(t_{N+1} | t) is a Gaussian distribution with mean m(x_{N+1}) = k^T C_N^{-1} t and variance σ²(x_{N+1}) = c − k^T C_N^{-1} k, where the vector k has elements k_n = k(x_n, x_{N+1}) and c = k(x_{N+1}, x_{N+1}) + β^{-1}.
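As a concrete illustration (not from the slides), here is a minimal numpy sketch of these two formulas, assuming a squared-exponential kernel; the names rbf_kernel and gp_predict and the default parameter values are illustrative:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-sq / (2 * length_scale**2))

def gp_predict(X, t, x_new, beta=25.0, length_scale=1.0):
    """Predictive mean and variance at a single test point x_new."""
    N = X.shape[0]
    C = rbf_kernel(X, X, length_scale) + np.eye(N) / beta    # C = K + beta^{-1} I
    k = rbf_kernel(X, x_new[None, :], length_scale).ravel()  # k_n = k(x_n, x_new)
    c = 1.0 + 1.0 / beta                                     # k(x_new, x_new) + beta^{-1}
    L = np.linalg.cholesky(C)                                # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))      # C^{-1} t
    v = np.linalg.solve(L, k)
    mean = k @ alpha                                         # k^T C^{-1} t
    var = c - v @ v                                          # c - k^T C^{-1} k
    return mean, var
```

The Cholesky factorization of C is the O(N^3) step referred to on the next slide.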
Computational Complexity GP prediction for a new data point: GP: O(N^3), where N is the number of data points. Basis function model: O(M^3), where M is the dimension of the feature expansion. When N is large, this is computationally expensive. Sparsification: make predictions based on only a few data points (essentially making N small).
Learning Hyperparameters Empirical Bayes methods: maximize the log marginal likelihood ln p(t | θ) = −(1/2) ln|C| − (1/2) t^T C^{-1} t − (N/2) ln 2π with respect to the kernel hyperparameters θ.
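A sketch of this type-II maximum likelihood procedure, reusing the illustrative rbf_kernel above and treating the length scale and the noise precision β as the hyperparameters; scipy's general-purpose optimizer stands in for whatever optimizer the lecture used:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, t):
    """Negative ln p(t | theta); hyperparameters are passed on a log scale
    so the optimizer works on an unconstrained space."""
    length_scale, beta = np.exp(log_params)
    N = X.shape[0]
    C = rbf_kernel(X, X, length_scale) + np.eye(N) / beta
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return (0.5 * t @ alpha                     # (1/2) t^T C^{-1} t
            + np.sum(np.log(np.diag(L)))        # (1/2) ln |C|
            + 0.5 * N * np.log(2 * np.pi))

# Empirical Bayes: fit the hyperparameters by maximizing the marginal likelihood.
# res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 25.0]), args=(X, t))
# length_scale, beta = np.exp(res.x)
```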
Automatic Relevance Determination Consider a two-dimensional problem with a separate kernel hyperparameter η_i for each input dimension, e.g. k(x, x') = θ_0 exp(−(1/2) Σ_i η_i (x_i − x'_i)²). Maximizing the marginal likelihood will make certain η_i small, reducing the corresponding input's relevance to prediction.
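A minimal sketch of such an ARD kernel (the name ard_kernel is illustrative):

```python
import numpy as np

def ard_kernel(X1, X2, eta):
    """ARD kernel k(x, x') = exp(-0.5 * sum_i eta_i (x_i - x'_i)^2).
    One precision eta_i per input dimension: dimensions whose eta_i is
    driven toward 0 by marginal-likelihood maximization stop affecting
    the covariance, i.e. they become irrelevant to prediction."""
    diff = X1[:, None, :] - X2[None, :, :]                 # shape (N1, N2, D)
    return np.exp(-0.5 * np.einsum('ijd,d->ij', diff**2, eta))
```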
Gaussian Processes for Classification Likelihood: p(t_n | a_n) = σ(a_n)^{t_n} (1 − σ(a_n))^{1−t_n}, with targets t_n ∈ {0, 1}, latent function values a_n, and logistic sigmoid σ. GP prior: p(a) = N(a | 0, C_N). Covariance function: C(x_n, x_m) = k(x_n, x_m) + ν δ_{nm}, where the small constant ν keeps C_N well conditioned.
Predictive Distribution p(t_{N+1} = 1 | t) = ∫ σ(a_{N+1}) p(a_{N+1} | t) da_{N+1}. No analytical solution. Approximate this integration with: Laplace's method, variational Bayes, or expectation propagation.
Laplace’s method for GP Classification (2) Taylor expansion: expand Ψ(a) = ln p(a) + ln p(t | a) to second order around its mode, using the gradient ∇Ψ(a) = t − σ − C_N^{-1} a and the negative Hessian W + C_N^{-1}, where σ is the vector of σ(a_n) and W = diag(σ(a_n)(1 − σ(a_n))).
Laplace’s method for GP Classification (3) Newton-Raphson update: a^{new} = C_N (I + W C_N)^{-1} (t − σ + W a), iterated until convergence to the mode a*.
Laplace’s method for GP Classification (4) Gaussian approximation: q(a) = N(a | a*, H^{-1}), with H = W + C_N^{-1} evaluated at the mode a*. This yields an approximate predictive distribution p(a_{N+1} | t) with mean k^T (t − σ) and variance c − k^T (W^{-1} + C_N)^{-1} k.
Laplace’s method for GP Classification (5) Question: How to get the mean and the variance above?
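A compact numpy sketch of the whole Laplace pipeline, implementing the Newton-Raphson update and the predictive mean/variance formulas from the slides above; the function names are illustrative, C is the N×N prior covariance, and k and c are defined as in the regression case:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(C, t, n_iter=20):
    """Newton-Raphson iteration for the mode a* of Psi(a) = ln p(a) + ln p(t|a):
    a_new = C (I + W C)^{-1} (t - sigma + W a), with t the 0/1 labels."""
    N = len(t)
    a = np.zeros(N)
    I = np.eye(N)
    for _ in range(n_iter):
        s = sigmoid(a)
        W = np.diag(s * (1.0 - s))          # W_nn = sigma(a_n)(1 - sigma(a_n))
        a = C @ np.linalg.solve(I + W @ C, t - s + W @ a)
    return a

def laplace_predict(C, t, k, c):
    """Mean and variance of p(a_{N+1} | t) under the Gaussian approximation:
    mean = k^T (t - sigma(a*)),  var = c - k^T (W^{-1} + C)^{-1} k."""
    a_star = laplace_mode(C, t)
    s = sigmoid(a_star)
    W_inv = np.diag(1.0 / (s * (1.0 - s)))
    mean = k @ (t - s)
    var = c - k @ np.linalg.solve(W_inv + C, k)
    return mean, var
```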
Support Vector Machines Support vector machines are maximum margin classifiers, motivated by statistical learning theory. Margin: the smallest distance between the decision boundary and any of the samples.
Distance of Data Point to Hyperplane Consider data points that are correctly classified, so that t_n y(x_n) > 0 with y(x) = w^T φ(x) + b. The distance of a data point to the hyperplane is t_n y(x_n) / ||w||.
Maximizing Margin The maximum margin solution maximizes (1/||w||) min_n [t_n (w^T φ(x_n) + b)]. Since scaling w and b together will not change the above ratio, we set t_n (w^T φ(x_n) + b) = 1 for the point closest to the boundary, so that all points satisfy t_n (w^T φ(x_n) + b) ≥ 1. For the data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.
Reformulating Optimization Problem Quadratic programming: minimize (1/2) ||w||² subject to t_n (w^T φ(x_n) + b) ≥ 1, n = 1, …, N.
Dual Variables Lagrangian: L(w, b, a) = (1/2) ||w||² − Σ_n a_n [t_n (w^T φ(x_n) + b) − 1], with multipliers a_n ≥ 0. Setting derivatives of L over w and b to zero gives w = Σ_n a_n t_n φ(x_n) and Σ_n a_n t_n = 0, leading to the dual problem: maximize L~(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m) subject to a_n ≥ 0 and Σ_n a_n t_n = 0.
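A small-scale sketch of solving this dual directly, assuming labels t_n ∈ {−1, +1} and a kernel with the same signature as the illustrative rbf_kernel above; a real implementation would use a dedicated QP or SMO solver rather than the generic SLSQP used here:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, t, kernel):
    """Hard-margin dual: max sum(a) - 0.5 sum_nm a_n a_m t_n t_m k(x_n, x_m)
    subject to a_n >= 0 and sum_n a_n t_n = 0."""
    N = len(t)
    K = kernel(X, X)
    P = (t[:, None] * t[None, :]) * K               # P_nm = t_n t_m k(x_n, x_m)

    def neg_dual(a):
        return 0.5 * a @ P @ a - np.sum(a)

    res = minimize(neg_dual, np.zeros(N), method='SLSQP',
                   bounds=[(0, None)] * N,
                   constraints={'type': 'eq', 'fun': lambda a: a @ t})
    a = res.x
    sv = a > 1e-6                                   # support vectors: a_n > 0
    # b from any active constraint: t_m (sum_n a_n t_n k(x_n, x_m) + b) = 1
    b = np.mean(t[sv] - (a * t) @ K[:, sv])
    return a, b
```

Only the support vectors (points with a_n > 0, i.e. active constraints) contribute to the prediction y(x) = Σ_n a_n t_n k(x_n, x) + b.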
Computational Complexity Quadratic programming: the primal has M variables (the feature dimension) while the dual has N (the number of data points), so when M < N, solving the dual problem is more costly. The dual representation, however, allows the use of kernels.