9/10/07 Tutorial on Gaussian Processes DAGS ’07 Jonathan Laserson and Ben Packer
Outline • Linear Regression • Bayesian Inference Solution • Gaussian Processes • Gaussian Process Solution • Kernels • Implications
Linear Regression • Task: Predict y given x
Linear Regression • Predicting y given x
L2-Regularized Linear Regression • Predicting y given x
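The closed-form solutions behind these two slides are not reproduced in the extracted text; as a reference, here is a minimal NumPy sketch of ordinary and L2-regularized least squares. The toy data, the regularization strength, and the variable names are illustrative choices, not taken from the slides.

```python
import numpy as np

# Toy 1-D data: y = 2x + noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = 2.0 * X[:, 0] + 0.3 * rng.standard_normal(50)

# Ordinary least squares: w = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# L2-regularized (ridge): w = (X^T X + lam I)^{-1} X^T y
lam = 1.0  # illustrative value
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Predict at a new input x*
x_star = np.array([[1.5]])
print(x_star @ w_ols, x_star @ w_ridge)
```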
Bayesian Instead of MAP • Instead of using wMAP = argmaxw P(y,w|X) to predict y*, why not use the entire distribution P(y,w|X) to estimate P(y*|X,y,x*)? • We have P(y|w,X) and P(w) • Combine these to get P(y,w|X) • Marginalize out w to get P(y|X) • Same as P(y,y*|X,x*) • Joint Gaussian->Conditional Gaussian to get P(y*|y,X,x*)
Bayesian Inference • We have P(y|w,X) and P(w) • Combine these to get P(y,w|X) • Marginalize out w to get P(y|X) • Same as P(y,y*|X,x*) • Joint Gaussian->Conditional Gaussian • Error bars!
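The "Joint Gaussian -> Conditional Gaussian" step uses the standard conditioning identity for jointly Gaussian variables; it is written out here for reference (it is not spelled out on the slide):

```latex
\begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix}
\sim \mathcal{N}\!\left(
\begin{bmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{bmatrix},
\begin{bmatrix} A & C \\ C^\top & B \end{bmatrix}\right)
\;\Rightarrow\;
\mathbf{b}\mid\mathbf{a} \sim
\mathcal{N}\!\left(\boldsymbol{\mu}_b + C^\top A^{-1}(\mathbf{a}-\boldsymbol{\mu}_a),\;
B - C^\top A^{-1} C\right)
```

Applying this with a = y and b = y* gives both the predictive mean for y* and its variance, which is where the error bars come from.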
Gaussian Process • We saw a distribution over Y directly • Why not start from here? • Instead of choosing a prior over w and defining fw(x), put your prior over f directly • Since y = f(x) + noise, this induces a prior over y • Next… How to put a prior on f(x)
What is a random process? • It’s a prior over functions • A stochastic process is a collection of random variables, f(x), indexed by x • It is specified by giving the joint probability of every finite subset of variables f(x1), f(x2), …, f(xk) • In a consistent way!
What is a Gaussian process? • It’s a prior over functions • A stochastic process is a collection of random variables, f(x), indexed by x • It is specified by giving the joint probability of every finite subset of variables f(x1), f(x2), …, f(xk) • In a consistent way! • The joint probability of f(x1), f(x2), …, f(xk) is a multivariate Gaussian
What is a Gaussian Process? • It is specified by giving the joint probability of every finite subset of variables f(x1), f(x2), …, f(xk) • In a consistent way! • The joint probability of f(x1), f(x2), …, f(xk) is a multivariate Gaussian • Enough to specify mean and covariance functions • μ(x) = E[f(x)] • C(x,x’) = E[(f(x) − μ(x))(f(x’) − μ(x’))] • f(x1), …, f(xk) ~ N([μ(x1), …, μ(xk)], K), where Ki,j = C(xi, xj) • For simplicity, we’ll assume μ(x) = 0.
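As an illustration of "specify μ and C, then draw a joint Gaussian", here is a minimal sketch that samples f(x1), …, f(xk) from a zero-mean GP prior at a finite set of inputs. The particular covariance function (the Gaussian/RBF kernel that appears later on the Kernels slides) and the diagonal jitter are illustrative choices, not part of the slide.

```python
import numpy as np

def C(x, xp):
    """Covariance function; here the Gaussian (RBF) kernel from the Kernels slides."""
    return np.exp(-0.5 * (x - xp) ** 2)

# Finite set of inputs x1, ..., xk
xs = np.linspace(-5, 5, 100)

# Build K with K[i, j] = C(xi, xj); the mean function is assumed to be zero
K = C(xs[:, None], xs[None, :])

# Draw sample functions f(x1), ..., f(xk) ~ N(0, K)
# (a small jitter on the diagonal keeps K numerically positive definite)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(xs)), K + 1e-8 * np.eye(len(xs)), size=3)
```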
Back to Linear Regression • Recall: Want to put a prior directly on f • Can use a Gaussian Process to do this • How do we choose μ and C? • Use knowledge of the prior over w • w ~ N(0, σ2I) • μ(x) = E[f(x)] = E[wTx] = E[wT]x = 0 • C(x,x’) = E[(f(x) − μ(x))(f(x’) − μ(x’))] = E[f(x)f(x’)] = xTE[wwT]x’ = xT(σ2I)x’ = σ2xTx’ • Can also have f(x) = wTΦ(x), in which case C(x,x’) = σ2Φ(x)TΦ(x’)
Back to Linear Regression • μ(x) = 0 • C(x,x’) = σ2xTx’ • f ~ GP(μ,C) • It follows that • f(x1),f(x2),…,f(xk) ~ N(0, K) • y1,y2,…,yk ~ N(0, ν2I + K) • K = σ2XXT, i.e. Ki,j = σ2xiTxj • Same as the L2-regularized least-squares solution! • If we use a different C, we’ll have a different K
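To make the equivalence with regularized least squares concrete, here is a hedged NumPy sketch comparing the GP predictive mean under the linear covariance C(x,x’) = σ2xTx’ with the ridge-regression prediction at the same test point; the data, σ2, and ν2 values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 2
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(n)

sigma2, nu2 = 1.0, 0.01          # prior variance on w, observation noise (illustrative values)
x_star = np.array([0.5, 0.5])    # test input

# Function-space (GP) view: K = sigma^2 X X^T, k* = sigma^2 X x*,
# predictive mean = k*^T (K + nu^2 I)^{-1} y
K = sigma2 * X @ X.T
k_star = sigma2 * X @ x_star
gp_mean = k_star @ np.linalg.solve(K + nu2 * np.eye(n), y)

# Weight-space view: ridge regression with lambda = nu^2 / sigma^2
lam = nu2 / sigma2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
ridge_pred = x_star @ w_ridge

print(gp_mean, ridge_pred)       # the two predictions agree
```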
Kernels • If we use a different C, we’ll have a different K • What do these look like? • Linear: C(x,x’) = σ2xTx’ • Poly • Gaussian
Kernels • If we use a different C, we’ll have a different K • What do these look like? • Linear • Poly: C(x,x’) = (1 + xTx’)2 • Gaussian
Kernels • If we use a different C, we’ll have a different K • What do these look like? • Linear • Poly • Gaussian: C(x,x’) = exp{-0.5(x - x’)2}
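The three covariance functions from these slides, written as code. This is a small sketch: the formulas match the slides, the multivariate Gaussian kernel uses the squared Euclidean distance, and the parameter defaults are illustrative.

```python
import numpy as np

def linear_kernel(x, xp, sigma2=1.0):
    return sigma2 * np.dot(x, xp)                 # C(x,x') = sigma^2 x^T x'

def poly_kernel(x, xp):
    return (1.0 + np.dot(x, xp)) ** 2             # C(x,x') = (1 + x^T x')^2

def gaussian_kernel(x, xp):
    return np.exp(-0.5 * np.sum((np.asarray(x) - np.asarray(xp)) ** 2))  # exp{-0.5 ||x - x'||^2}
```

Swapping one of these in for C changes K, and with it the class of functions the GP prefers (straight lines, quadratics, or smooth wiggly functions).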
Learning a kernel • Parameterize a family of kernel functions Cθ with hyperparameters θ • Learn θ (and hence K) by gradient ascent on the log marginal likelihood of the data (formulas below)
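The "gradient of likelihood" refers to the standard GP log marginal likelihood and its derivative with respect to each hyperparameter (see the Rasmussen reference below); with Kθ denoting the covariance matrix of the noisy observations:

```latex
\log p(\mathbf{y}\mid X,\theta)
  = -\tfrac{1}{2}\,\mathbf{y}^\top K_\theta^{-1}\mathbf{y}
    - \tfrac{1}{2}\log\lvert K_\theta\rvert
    - \tfrac{n}{2}\log 2\pi,
\qquad
\frac{\partial}{\partial \theta_j}\log p(\mathbf{y}\mid X,\theta)
  = \tfrac{1}{2}\,\mathbf{y}^\top K_\theta^{-1}
      \frac{\partial K_\theta}{\partial \theta_j} K_\theta^{-1}\mathbf{y}
    - \tfrac{1}{2}\,\mathrm{tr}\!\left(K_\theta^{-1}
      \frac{\partial K_\theta}{\partial \theta_j}\right)
```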
Starting point • For details, see • Rasmussen’s NIPS 2006 Tutorial • http://www.kyb.mpg.de/bs/people/carl/gpnt06.pdf • Williams’s Gaussian Processes paper • http://www.dai.ed.ac.uk/homes/ckiw/postscript/hbtnn.ps.gz • GPs for classification (approximation) • Sparse methods • Connection to SVMs