Bayesian Learning & Estimation Theory
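Maximum likelihood estimation
• Objective of regression: minimize the error $E(\mathbf{w}) = \tfrac{1}{2}\sum_n \big(t_n - y(x_n,\mathbf{w})\big)^2$
• Example: for a Gaussian likelihood $P(x|\theta) = \mathcal{N}(x|\mu,\sigma^2)$, L = [equation missing in source]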
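A probabilistic view of linear regression
• Precision: $\beta = 1/\sigma^2$
• Compare to the error function: $E(\mathbf{w}) = \tfrac{1}{2}\sum_n \big(t_n - y(x_n,\mathbf{w})\big)^2$
• Since $\arg\min_{\mathbf{w}} E(\mathbf{w}) = \arg\max_{\mathbf{w}} p(\mathbf{t}|\mathbf{x},\mathbf{w})$, regression is equivalent to ML estimation of $\mathbf{w}$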
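The equivalence between least squares and ML estimation can be checked numerically. The sketch below is illustrative only: it assumes a straight-line model $y = w_0 + w_1 x$, a known noise precision `beta`, and made-up toy data, none of which come from the slides. It evaluates both criteria on the same grid of candidate weights.

```python
import math

def sum_sq_error(w, xs, ts):
    """E(w) = 1/2 * sum_n (t_n - y(x_n, w))^2 for the line y = w0 + w1*x."""
    return 0.5 * sum((t - (w[0] + w[1] * x)) ** 2 for x, t in zip(xs, ts))

def log_likelihood(w, xs, ts, beta):
    """ln p(t | x, w) under Gaussian noise with precision beta."""
    n = len(xs)
    return (-beta * sum_sq_error(w, xs, ts)
            + 0.5 * n * math.log(beta)
            - 0.5 * n * math.log(2.0 * math.pi))

# Hypothetical toy data and noise precision, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [0.1, 0.9, 2.2, 2.9]
beta = 4.0

# Coarse grid of candidate weights (w0, w1).
grid = [(i / 10.0, j / 10.0) for i in range(-10, 11) for j in range(0, 21)]

w_min_error = min(grid, key=lambda w: sum_sq_error(w, xs, ts))
w_max_like = max(grid, key=lambda w: log_likelihood(w, xs, ts, beta))
print("argmin E(w):       ", w_min_error)
print("argmax likelihood: ", w_max_like)
```

The log-likelihood is $-\beta E(\mathbf{w})$ plus terms that do not depend on $\mathbf{w}$, so both searches select the same grid point.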
Bayesian learning
• View the data $D$ and parameter $\theta$ as random variables (for regression, $D = (\mathbf{x}, \mathbf{t})$ and $\theta = \mathbf{w}$)
• The data induces a distribution over the parameter: $P(\theta|D) = P(D,\theta) / P(D) \propto P(D,\theta)$
• Substituting $P(D,\theta) = P(D|\theta)\,P(\theta)$, we obtain Bayes' theorem: $P(\theta|D) \propto P(D|\theta)\,P(\theta)$
• Posterior $\propto$ Likelihood × Prior
Bayesian prediction
• Predictions (e.g., predict $t$ from $x$ using data $D$) are mediated through the parameter: $P(\text{prediction}|D) = \int_\theta P(\text{prediction}|\theta)\,P(\theta|D)\,d\theta$
• Maximum a posteriori (MAP) estimation: $\theta_{MAP} = \arg\max_\theta P(\theta|D)$, so that $P(\text{prediction}|D) \approx P(\text{prediction}|\theta_{MAP})$
• Accurate when $P(\theta|D)$ is concentrated on $\theta_{MAP}$
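To see when the MAP plug-in approximates the full Bayesian prediction, here is a minimal sketch on a model that is not in the slides: a coin with unknown head probability $\theta$, a uniform prior, and the integral over $\theta$ replaced by a sum over a grid. The function name and the data counts are hypothetical.

```python
def predict_next_head(heads, tails, grid_size=1000):
    """Return (Bayesian prediction, MAP plug-in prediction) for the next flip."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    # Unnormalized posterior: likelihood x uniform prior, evaluated on the grid.
    unnorm = [th ** heads * (1.0 - th) ** tails for th in grid]
    z = sum(unnorm)
    post = [u / z for u in unnorm]
    # Full Bayesian prediction: average P(head | theta) = theta over the posterior.
    p_bayes = sum(th * p for th, p in zip(grid, post))
    # MAP plug-in: use only the single most probable theta.
    theta_map = grid[max(range(len(grid)), key=lambda i: post[i])]
    return p_bayes, theta_map

# Little data: the posterior is broad, so the two predictions differ.
p_bayes_small, p_map_small = predict_next_head(heads=2, tails=0)
# More data: the posterior concentrates, so the two predictions nearly agree.
p_bayes_large, p_map_large = predict_next_head(heads=200, tails=100)
print(p_bayes_small, p_map_small)
print(p_bayes_large, p_map_large)
```

With two heads and no tails the MAP plug-in predicts heads with certainty, while averaging over the posterior gives a more cautious answer. With 300 flips the posterior is sharply peaked and the two predictions are close.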
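A probabilistic view of regularized regression
• $E(\mathbf{w}) = \tfrac{1}{2}\sum_n \big(t_n - y(x_n,\mathbf{w})\big)^2 + \tfrac{\lambda}{2}\sum_m w_m^2$
• The first term corresponds to $-\ln p(\mathbf{t}|\mathbf{x},\mathbf{w})$ and the second to $-\ln p(\mathbf{w})$, up to additive constants
• Prior: the $w_m$ are IID Gaussian, $p(\mathbf{w}) = \prod_m \frac{1}{\sqrt{2\pi\lambda^{-1}}} \exp\{-\lambda w_m^2 / 2\}$
• Since $\arg\min_{\mathbf{w}} E(\mathbf{w}) = \arg\max_{\mathbf{w}} p(\mathbf{t}|\mathbf{x},\mathbf{w})\,p(\mathbf{w})$, regularized regression is equivalent to MAP estimation of $\mathbf{w}$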
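The link between the regularized error and the posterior can be illustrated with a one-weight model. The sketch below assumes $y(x, w) = w x$, unit noise precision, a Gaussian prior with precision `lam`, and made-up data; these choices are illustrative and not taken from the slides.

```python
import math

def regularized_error(w, xs, ts, lam):
    """E(w) = 1/2 * sum_n (t_n - w*x_n)^2 + lam/2 * w^2."""
    data_term = 0.5 * sum((t - w * x) ** 2 for x, t in zip(xs, ts))
    return data_term + 0.5 * lam * w ** 2

def log_joint(w, xs, ts, lam):
    """ln p(t|x,w) + ln p(w): unit-precision Gaussian noise, prior N(w | 0, 1/lam)."""
    log_lik = sum(-0.5 * (t - w * x) ** 2 - 0.5 * math.log(2.0 * math.pi)
                  for x, t in zip(xs, ts))
    log_prior = -0.5 * lam * w ** 2 + 0.5 * math.log(lam / (2.0 * math.pi))
    return log_lik + log_prior

def ridge_weight(xs, ts, lam):
    """Closed-form minimizer of E(w) for the one-weight model."""
    return sum(x * t for x, t in zip(xs, ts)) / (sum(x * x for x in xs) + lam)

# Hypothetical toy data and regularization strength.
xs = [1.0, 2.0, 3.0]
ts = [1.2, 1.9, 3.2]
lam = 2.0

w_map = ridge_weight(xs, ts, lam)
print("w minimizing E(w):", w_map)
print("E(w) + ln p(t|x,w)p(w) at two weights:",
      regularized_error(w_map, xs, ts, lam) + log_joint(w_map, xs, ts, lam),
      regularized_error(0.0, xs, ts, lam) + log_joint(0.0, xs, ts, lam))
```

The sum of the error and the log joint is the same for every weight, so minimizing one is the same as maximizing the other.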
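Bayesian linear regression
• Likelihood: $p(\mathbf{t}|\mathbf{x},\mathbf{w},\beta) = \prod_n \mathcal{N}\big(t_n \,|\, y(x_n,\mathbf{w}), \beta^{-1}\big)$
• $\beta$ specifies the precision of the data noise
• Prior: $p(\mathbf{w}|\alpha) = \prod_{m=0}^{M} \mathcal{N}(w_m \,|\, 0, \alpha^{-1})$
• $\alpha$ specifies the precision of the weights
• Posterior: $p(\mathbf{w}|\mathbf{x},\mathbf{t},\alpha,\beta) \propto p(\mathbf{t}|\mathbf{x},\mathbf{w},\beta)\,p(\mathbf{w}|\alpha)$
• This is an (M+1)-dimensional Gaussian density, computed using linear algebra (see textbook)
• Prediction: $p(t|x,\mathbf{x},\mathbf{t}) = \int p(t|x,\mathbf{w},\beta)\,p(\mathbf{w}|\mathbf{x},\mathbf{t},\alpha,\beta)\,d\mathbf{w}$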
Example: $y(x) = w_0 + w_1 x$
[Figure: columns show the likelihood, the prior/posterior, and $y(x)$ sampled from the posterior together with the data; rows show no data, the 1st point, the 2nd point, ..., the 20th point]
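For the straight-line example, the Gaussian posterior over $(w_0, w_1)$ can be written out by hand. The sketch below uses the standard closed form for a zero-mean isotropic Gaussian prior with precision `alpha` and Gaussian noise with precision `beta`: posterior precision $\alpha I + \beta \sum_n \phi_n \phi_n^T$ with $\phi_n = (1, x_n)$. The values of `alpha` and `beta` and the data points are assumptions made for illustration, and the 2×2 inverse is coded explicitly to keep the example free of dependencies.

```python
def posterior(xs, ts, alpha, beta):
    """Posterior mean and covariance of (w0, w1) for y = w0 + w1*x."""
    # Posterior precision matrix A = alpha*I + beta * sum_n phi_n phi_n^T.
    a00 = alpha + beta * len(xs)
    a01 = beta * sum(xs)
    a11 = alpha + beta * sum(x * x for x in xs)
    det = a00 * a11 - a01 * a01
    # Covariance S = A^{-1}, using the explicit 2x2 inverse.
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    # Mean m = beta * S * sum_n phi_n t_n.
    b0 = beta * sum(ts)
    b1 = beta * sum(x * t for x, t in zip(xs, ts))
    mean = (s00 * b0 + s01 * b1, s01 * b0 + s11 * b1)
    return mean, ((s00, s01), (s01, s11))

def predictive(x, mean, cov, beta):
    """Mean and variance of the predictive distribution p(t | x, data)."""
    mu = mean[0] + mean[1] * x
    var = 1.0 / beta + cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1]
    return mu, var

# Hypothetical hyperparameters and noise-free points on the line t = -0.3 + x.
alpha, beta = 2.0, 25.0
xs = [-0.5, 0.0, 0.5, 1.0]
ts = [-0.8, -0.3, 0.2, 0.7]

# Add the points one at a time and watch the predictive variance at x = 0.25.
for n in range(len(xs) + 1):
    mean, cov = posterior(xs[:n], ts[:n], alpha, beta)
    mu, var = predictive(0.25, mean, cov, beta)
    print(n, "points: mean =", mean, " predictive variance at 0.25 =", var)
```

With no data the posterior equals the prior. As points are added, the posterior mean moves toward the generating line and the predictive variance shrinks toward the noise floor $1/\beta$.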
Example: $y(x) = w_0 + w_1 x + \dots + w_M x^M$
• $M = 9$, $\alpha = 5\times10^{-3}$: gives a reasonable range of functions
• $\beta = 11.1$: known precision of noise
[Figure: mean and one standard deviation of the predictive distribution]
Example: $y(x) = w_0 + w_1\phi_1(x) + \dots + w_M\phi_M(x)$
[Figure: Gaussian basis functions $\phi_m(x)$, horizontal axis from 0 to 1]
How are we doing on the pass sequence?
• Least squares regression…
• Choosing a particular M and w seems wrong: we should hedge our bets
• Cross validation reduced the training data, so the red line isn’t as accurate as it should be
• The red line doesn’t reveal different levels of uncertainty in predictions
[Figure: least squares regression fit (red line) and Bayesian regression fit; axis: hand-labeled horizontal coordinate, t]
Estimation theory
• Provided with a predictive distribution $p(t|x)$, how do we estimate a single value for $t$?
• Example: In the pass sequence, Cupid must aim at and hit the man in the white shirt, without hitting the man in the striped shirt
• Define $L(t,t^*)$ as the loss incurred by estimating $t^*$ when the true value is $t$
• Assuming $p(t|x)$ is correct, the expected loss is $E[L] = \int_t L(t,t^*)\,p(t|x)\,dt$
• The minimum loss estimate is found by minimizing $E[L]$ w.r.t. $t^*$
Squared loss
• A common choice: $L(t,t^*) = (t - t^*)^2$, so $E[L] = \int_t (t - t^*)^2\,p(t|x)\,dt$
• Not appropriate for Cupid’s problem
• To minimize $E[L]$, set its derivative to zero: $dE[L]/dt^* = -2\int_t (t - t^*)\,p(t|x)\,dt = 0$
• Since $\int_t p(t|x)\,dt = 1$, this gives $-\int_t t\,p(t|x)\,dt + t^* = 0$
• Minimum mean squared error (MMSE) estimate: $t^* = E[t|x] = \int_t t\,p(t|x)\,dt$
• For regression: $t^* = y(x,\mathbf{w})$
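The MMSE result can be verified on a small discrete distribution, where the integral becomes a sum. The values and probabilities below are made up for illustration.

```python
def expected_squared_loss(t_star, values, probs):
    """E[L] = sum_t (t - t*)^2 p(t|x) for a discrete predictive distribution."""
    return sum(p * (t - t_star) ** 2 for t, p in zip(values, probs))

def loss_derivative(t_star, values, probs):
    """dE[L]/dt* = -2 * sum_t (t - t*) p(t|x)."""
    return -2.0 * sum(p * (t - t_star) for t, p in zip(values, probs))

# Hypothetical predictive distribution p(t|x) over four values of t.
values = [0.0, 1.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]

# MMSE estimate: the conditional mean E[t|x].
t_mmse = sum(p * t for t, p in zip(values, probs))
print("t* = E[t|x] =", t_mmse)
print("derivative at t*:", loss_derivative(t_mmse, values, probs))
for t_star in (t_mmse - 0.1, t_mmse, t_mmse + 0.1):
    print(t_star, expected_squared_loss(t_star, values, probs))
```

The derivative vanishes at the mean, and moving the estimate in either direction increases the expected loss.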
Other loss functions
[Figure: absolute loss and squared loss]
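Absolute loss
[Figure: points $t_1, \dots, t_7$ on the $t$ axis with the estimate $t^*$ and a shift $\varepsilon$ marked; the mean and the median are indicated]
• $L = |t^*-t_1| + |t^*-t_2| + |t^*-t_3| + |t^*-t_4| + |t^*-t_5| + |t^*-t_6| + |t^*-t_7|$
• Consider moving $t^*$ to the left by $\varepsilon$
• With six points to the left of $t^*$ and one to the right, $L$ decreases by $6\varepsilon$ and increases by $\varepsilon$
• Changes in $L$ are balanced when $t^* = t_4$
• The median of $t$ under $p(t|x)$ minimizes absolute loss
• Important: The median is invariant to monotonic transformations of $t$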
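Both claims about the median can be checked with seven sample points. The values below are hypothetical, and `math.exp` stands in for an arbitrary monotonic transformation.

```python
import math
import statistics

def total_absolute_loss(t_star, ts):
    """L = sum_i |t* - t_i|."""
    return sum(abs(t_star - t) for t in ts)

# Seven hypothetical points, mirroring t1..t7 on the slide.
ts = [1.0, 2.0, 2.5, 4.0, 5.5, 7.0, 10.0]

t_median = statistics.median(ts)  # the middle point, t4
t_mean = statistics.mean(ts)
for t_star in (t_median - 0.5, t_median, t_median + 0.5, t_mean):
    print(t_star, total_absolute_loss(t_star, ts))

# Apply a monotonic transformation and compare the two estimators.
transformed = [math.exp(t) for t in ts]
print("median commutes with exp:",
      statistics.median(transformed) == math.exp(t_median))
print("mean commutes with exp:  ",
      statistics.mean(transformed) == math.exp(t_mean))
```

The loss is smallest at the median, and the median of the transformed points is the transform of the median, which does not hold for the mean.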
D-dimensional estimation
• Suppose $\mathbf{t}$ is D-dimensional, $\mathbf{t} = (t_1,\dots,t_D)$
• Example: 2-dimensional tracking
• Approach 1: Minimum marginal loss estimation
• Find $t_d^*$ that minimizes $\int L(t_d,t_d^*)\,p(t_d|x)\,dt_d$
• Approach 2: Minimum joint loss estimation
• Define joint loss $L(\mathbf{t},\mathbf{t}^*)$
• Find $\mathbf{t}^*$ that minimizes $\int L(\mathbf{t},\mathbf{t}^*)\,p(\mathbf{t}|x)\,d\mathbf{t}$
How are we doing on the pass sequence?
• Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data
• Can we track the man in the white shirt? Not very well. Regression fails to identify that there really are two classes of solution
[Figure: fraction of pixels in each column with intensity > 0.9, plotted against horizontal location (0 to 320); computing the 1st moment gives the feature x = 224, with hand-labeled horizontal coordinate t = 290; the man in the white shirt is occluded]
[Figure: hand-labeled horizontal coordinate, t, against feature, x]