This lecture covers the topics of Gaussian process regression, learning hyperparameters, automatic relevance determination, Gaussian processes for classification, and support vector machines. The lecture also discusses the course projects and paper presentations.
CS 59000 Statistical Machine Learning, Lecture 16. Yuan (Alan) Qi, Purdue CS, Oct. 23, 2008
Outline Information about paper presentations and course projects Review of Gaussian process regression Learning hyperparameters Automatic Relevance Determination GP for classification Support Vector Machines
Course Projects Second-to-last week: paper presentations. Last week: project presentations. 21 registered students, 6 groups of 3-4 people each; 20 minutes per group, plus 5 minutes for questions.
Paper Presentation Each group presents one recent paper from a top conference or journal on machine learning, bioinformatics, or computer vision, e.g., NIPS, ICML, UAI, RECOMB, ISMB, JMLR. The choice of paper is up to you. Format: define the problem to solve and describe its challenges; give the algorithm/model in a nutshell, highlighting its essence; present the results; discuss the strengths and weaknesses of the paper.
Project Topics Anything related to the course material: new methods, theoretical proofs, or applications. Novelty is appreciated: new algorithms or applications, or proofs of open questions.
Review: Gaussian Process for Regression Likelihood: p(t_n | y_n) = N(t_n | y_n, β^{-1}), with noise precision β. Prior over function values: p(y) = N(y | 0, K), where K is the Gram matrix with K_nm = k(x_n, x_m). Marginal distribution: p(t) = N(t | 0, C), with C = K + β^{-1} I.
Predictive Distribution p(t_{N+1} | t) is a Gaussian distribution with mean m(x_{N+1}) = k^T C_N^{-1} t and variance σ²(x_{N+1}) = c − k^T C_N^{-1} k, where the vector k has elements k_n = k(x_n, x_{N+1}) and c = k(x_{N+1}, x_{N+1}) + β^{-1}.
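As a concrete illustration (not from the slides), here is a minimal numpy sketch of these two formulas, assuming a squared-exponential kernel; the names rbf_kernel and gp_predict and the default parameter values are illustrative:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-sq / (2 * length_scale**2))

def gp_predict(X, t, x_new, beta=25.0, length_scale=1.0):
    """Predictive mean and variance at a single test point x_new."""
    N = X.shape[0]
    C = rbf_kernel(X, X, length_scale) + np.eye(N) / beta    # C = K + beta^{-1} I
    k = rbf_kernel(X, x_new[None, :], length_scale).ravel()  # k_n = k(x_n, x_new)
    c = 1.0 + 1.0 / beta                                     # k(x_new, x_new) + beta^{-1}
    L = np.linalg.cholesky(C)                                # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))      # C^{-1} t
    v = np.linalg.solve(L, k)
    mean = k @ alpha                                         # k^T C^{-1} t
    var = c - v @ v                                          # c - k^T C^{-1} k
    return mean, var
```

The Cholesky factorization of C is the O(N^3) step referred to on the next slide.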
Computational Complexity GP prediction for a new data point: GP: O(N^3), where N is the number of data points. Basis function model: O(M^3), where M is the dimension of the feature expansion. When N is large, this is computationally expensive. Sparsification: make predictions based on only a few data points (essentially making N small).
Learning Hyperparameters Empirical Bayes methods: maximize the log marginal likelihood ln p(t | θ) = −(1/2) ln|C| − (1/2) t^T C^{-1} t − (N/2) ln 2π with respect to the kernel hyperparameters θ.
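A sketch of this type-II maximum likelihood procedure, reusing the illustrative rbf_kernel above and treating the length scale and the noise precision β as the hyperparameters; scipy's general-purpose optimizer stands in for whatever optimizer the lecture used:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, t):
    """Negative ln p(t | theta); hyperparameters are passed on a log scale
    so the optimizer works on an unconstrained space."""
    length_scale, beta = np.exp(log_params)
    N = X.shape[0]
    C = rbf_kernel(X, X, length_scale) + np.eye(N) / beta
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return (0.5 * t @ alpha                     # (1/2) t^T C^{-1} t
            + np.sum(np.log(np.diag(L)))        # (1/2) ln |C|
            + 0.5 * N * np.log(2 * np.pi))

# Empirical Bayes: fit the hyperparameters by maximizing the marginal likelihood.
# res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 25.0]), args=(X, t))
# length_scale, beta = np.exp(res.x)
```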
Automatic Relevance Determination Consider a two-dimensional problem with a separate kernel hyperparameter η_i for each input dimension, e.g. k(x, x') = θ_0 exp(−(1/2) Σ_i η_i (x_i − x'_i)²). Maximizing the marginal likelihood will make certain η_i small, reducing the corresponding input's relevance to prediction.
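A minimal sketch of such an ARD kernel (the name ard_kernel is illustrative):

```python
import numpy as np

def ard_kernel(X1, X2, eta):
    """ARD kernel k(x, x') = exp(-0.5 * sum_i eta_i (x_i - x'_i)^2).
    One precision eta_i per input dimension: dimensions whose eta_i is
    driven toward 0 by marginal-likelihood maximization stop affecting
    the covariance, i.e. they become irrelevant to prediction."""
    diff = X1[:, None, :] - X2[None, :, :]                 # shape (N1, N2, D)
    return np.exp(-0.5 * np.einsum('ijd,d->ij', diff**2, eta))
```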
Gaussian Processes for Classification Likelihood: p(t_n | a_n) = σ(a_n)^{t_n} (1 − σ(a_n))^{1−t_n}, with targets t_n ∈ {0, 1}, latent function values a_n, and logistic sigmoid σ. GP prior: p(a) = N(a | 0, C_N). Covariance function: C(x_n, x_m) = k(x_n, x_m) + ν δ_{nm}, where the small constant ν keeps C_N well conditioned.
Predictive Distribution p(t_{N+1} = 1 | t) = ∫ σ(a_{N+1}) p(a_{N+1} | t) da_{N+1}. No analytical solution. Approximate this integration with: Laplace's method, variational Bayes, or expectation propagation.
Laplace’s method for GP Classification (2) Taylor expansion: expand Ψ(a) = ln p(a) + ln p(t | a) to second order around its mode, using the gradient ∇Ψ(a) = t − σ − C_N^{-1} a and the negative Hessian W + C_N^{-1}, where σ is the vector of σ(a_n) and W = diag(σ(a_n)(1 − σ(a_n))).
Laplace’s method for GP Classification (3) Newton-Raphson update: a^{new} = C_N (I + W C_N)^{-1} (t − σ + W a), iterated until convergence to the mode a*.
Laplace’s method for GP Classification (4) Gaussian approximation: q(a) = N(a | a*, H^{-1}), with H = W + C_N^{-1} evaluated at the mode a*. This yields an approximate predictive distribution p(a_{N+1} | t) with mean k^T (t − σ) and variance c − k^T (W^{-1} + C_N)^{-1} k.
Laplace’s method for GP Classification (5) Question: How to get the mean and the variance above?
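A compact numpy sketch of the whole Laplace pipeline, implementing the Newton-Raphson update and the predictive mean/variance formulas from the slides above; the function names are illustrative, C is the N×N prior covariance, and k and c are defined as in the regression case:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(C, t, n_iter=20):
    """Newton-Raphson iteration for the mode a* of Psi(a) = ln p(a) + ln p(t|a):
    a_new = C (I + W C)^{-1} (t - sigma + W a), with t the 0/1 labels."""
    N = len(t)
    a = np.zeros(N)
    I = np.eye(N)
    for _ in range(n_iter):
        s = sigmoid(a)
        W = np.diag(s * (1.0 - s))          # W_nn = sigma(a_n)(1 - sigma(a_n))
        a = C @ np.linalg.solve(I + W @ C, t - s + W @ a)
    return a

def laplace_predict(C, t, k, c):
    """Mean and variance of p(a_{N+1} | t) under the Gaussian approximation:
    mean = k^T (t - sigma(a*)),  var = c - k^T (W^{-1} + C)^{-1} k."""
    a_star = laplace_mode(C, t)
    s = sigmoid(a_star)
    W_inv = np.diag(1.0 / (s * (1.0 - s)))
    mean = k @ (t - s)
    var = c - k @ np.linalg.solve(W_inv + C, k)
    return mean, var
```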
Support Vector Machines Support vector machines are maximum margin classifiers, motivated by statistical learning theory. Margin: the smallest distance between the decision boundary and any of the samples.
Distance of Data Point to Hyperplane Consider data points that are correctly classified, so that t_n y(x_n) > 0 with y(x) = w^T φ(x) + b. The distance of a data point to the hyperplane is t_n y(x_n) / ||w||.
Maximizing Margin The maximum margin solution maximizes (1/||w||) min_n [t_n (w^T φ(x_n) + b)]. Since scaling w and b together will not change the above ratio, we set t_n (w^T φ(x_n) + b) = 1 for the point closest to the boundary, so that all points satisfy t_n (w^T φ(x_n) + b) ≥ 1. For the data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.
Reformulating Optimization Problem Quadratic programming: minimize (1/2) ||w||² subject to t_n (w^T φ(x_n) + b) ≥ 1, n = 1, …, N.
Dual Variables Lagrangian: L(w, b, a) = (1/2) ||w||² − Σ_n a_n [t_n (w^T φ(x_n) + b) − 1], with multipliers a_n ≥ 0. Setting derivatives of L over w and b to zero gives w = Σ_n a_n t_n φ(x_n) and Σ_n a_n t_n = 0, leading to the dual problem: maximize L~(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m) subject to a_n ≥ 0 and Σ_n a_n t_n = 0.
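A small-scale sketch of solving this dual directly, assuming labels t_n ∈ {−1, +1} and a kernel with the same signature as the illustrative rbf_kernel above; a real implementation would use a dedicated QP or SMO solver rather than the generic SLSQP used here:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, t, kernel):
    """Hard-margin dual: max sum(a) - 0.5 sum_nm a_n a_m t_n t_m k(x_n, x_m)
    subject to a_n >= 0 and sum_n a_n t_n = 0."""
    N = len(t)
    K = kernel(X, X)
    P = (t[:, None] * t[None, :]) * K               # P_nm = t_n t_m k(x_n, x_m)

    def neg_dual(a):
        return 0.5 * a @ P @ a - np.sum(a)

    res = minimize(neg_dual, np.zeros(N), method='SLSQP',
                   bounds=[(0, None)] * N,
                   constraints={'type': 'eq', 'fun': lambda a: a @ t})
    a = res.x
    sv = a > 1e-6                                   # support vectors: a_n > 0
    # b from any active constraint: t_m (sum_n a_n t_n k(x_n, x_m) + b) = 1
    b = np.mean(t[sv] - (a * t) @ K[:, sv])
    return a, b
```

Only the support vectors (points with a_n > 0, i.e. active constraints) contribute to the prediction y(x) = Σ_n a_n t_n k(x_n, x) + b.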
Computational Complexity Quadratic programming: the primal has M variables (the feature dimension) while the dual has N (the number of data points), so when M < N, solving the dual problem is more costly. The dual representation, however, allows the use of kernels.