MSc Methods XX: YY Dr. Mathias (Mat) Disney UCL Geography Office: 113, Pearson Building Tel: 7670 0592 Email: mdisney@ucl.geog.ac.uk www.geog.ucl.ac.uk/~mdisney
Lecture outline
• Two parameter estimation
• Uncertainty & linear approximations
 – parameter estimation, uncertainty
 – Practical: basic Bayesian estimation
• Linear models
 – parameter estimation, uncertainty
 – Practical: basic Bayesian estimation
Parameter estimation continued
• Example: signal in the presence of background noise
• Very common problem, e.g. the peak of a lidar return from a forest canopy, the presence of a star against a background, a transiting planet
[Figure: Gaussian peak of amplitude A above a flat background B, plotted against position x]
See pp. 35-60 in Sivia & Skilling
Gaussian peak + background
• Data are e.g. photon counts in a particular channel, so we expect the count Nk in the kth channel to scale with A and B, where A and B are the signal and background amplitudes
• Assume the peak is Gaussian (for now), of width w and centred on x0, so the ideal datum Dk is given by

D_k = n_0 [ A exp( -(x_k - x_0)² / 2w² ) + B ]

• where n0 is a constant (proportional to integration time). Unlike Nk, Dk is not a whole number, so the actual datum is some integer close to Dk
• The Poisson distribution is the pdf which represents this property, i.e.

P(N_k | D_k) = D_k^{N_k} e^{-D_k} / N_k!
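As a concrete illustration, a minimal Python sketch of the ideal-datum model above; the function name, bin grid and default parameter values are illustrative assumptions, not the practical's actual code:

```python
import numpy as np

def ideal_datum(xk, A, B, x0=0.0, w=1.0, n0=100.0):
    """Expected count in bin k: D_k = n0 * (A * exp(-(x_k - x0)^2 / (2 w^2)) + B)."""
    return n0 * (A * np.exp(-(xk - x0) ** 2 / (2.0 * w ** 2)) + B)

# Expected (noise-free) counts across 15 bins spanning the peak
xk = np.linspace(-5.0, 5.0, 15)
Dk = ideal_datum(xk, A=1.0, B=2.0)
```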
Aside: Poisson distribution
• Poisson: the probability of N events occurring over some fixed interval, if events occur at a known average rate, independently of the time since the previous event
• If the expected number over a given interval is D, the probability of exactly N events is

P(N | D) = D^N e^{-D} / N!

• Used in discrete counting experiments, particularly where there is a large number of possible outcomes, each of which is rare (the law of rare events), e.g.
 – nuclear decay
 – no. of calls arriving at a call centre per minute: a large number arrive, BUT calling is rare from the point of view of the general population
See the practical page for poisson_plot.py
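The practical page provides poisson_plot.py; here is a hedged sketch of what such a script might look like, using scipy's Poisson pmf (the value of D and the plot range are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

D = 5.0                # assumed expected number of events over the interval
N = np.arange(0, 16)   # possible observed counts

# P(N | D) = D^N exp(-D) / N!
plt.bar(N, poisson.pmf(N, D))
plt.xlabel('N (observed count)')
plt.ylabel('P(N | D)')
plt.title('Poisson pmf, D = 5')
plt.show()
```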
Gaussian peak + background
• So the likelihood for datum Nk is

P(N_k | A, B, I) = D_k^{N_k} e^{-D_k} / N_k!

• where I includes the relation between the expected counts Dk and A, B, i.e. for our Gaussian model x0, w and n0 are given (as is xk)
• IF the data are independent, then the likelihood over all M data is just the product of the probabilities of the individual measurements, i.e.

P({N_k} | A, B, I) = ∏_{k=1}^{M} P(N_k | A, B, I)

• As usual, we want the posterior pdf of A, B given {Nk}, I
Gaussian peak + background
• Prior? Neither A nor B can be negative, so the most naive prior pdf is

P(A, B | I) = constant for 0 ≤ A ≤ A_max and 0 ≤ B ≤ B_max (and 0 otherwise)

• To calculate the constant we need A_max, B_max, but we may assume they are large enough not to cut off the posterior pdf, i.e. it is effectively 0 by then
• So the log of the posterior is

L = ln[ P(A, B | {N_k}, I) ] = constant + ∑_{k=1}^{M} [ N_k ln(D_k) - D_k ]

• And, as before, we want the A, B that maximise L (see the sketch below)
• Reliability is given by the width of the posterior about that point
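A minimal sketch of this log posterior in Python; the names are illustrative, the additive constant is dropped since it does not affect the location of the maximum, and the uniform prior simply restricts evaluation to A, B ≥ 0:

```python
import numpy as np

def log_posterior(A, B, xk, Nk, x0=0.0, w=1.0, n0=100.0):
    """L = constant + sum_k [ N_k ln(D_k) - D_k ], constant omitted."""
    Dk = n0 * (A * np.exp(-(xk - x0) ** 2 / (2.0 * w ** 2)) + B)
    return np.sum(Nk * np.log(Dk) - Dk)
```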
Gaussian peak + background
• 'Generate' experimental data (see the practical, and the sketch below)
• n0 is chosen to give a maximum expectation D_k = 100. Why do some N_k exceed 100? Because each N_k is a Poisson draw scattered about its expectation D_k
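A hedged sketch of how such data might be generated; the bin positions, width and random seed are assumptions, and the true A, B follow the values quoted on the next slide:

```python
import numpy as np

rng = np.random.default_rng(42)

A_true, B_true, x0, w = 1.09, 1.93, 0.0, 1.0
n0 = 100.0 / (A_true + B_true)   # chosen so the maximum expectation D_k ~ 100
xk = np.linspace(-5.0, 5.0, 15)
Dk = n0 * (A_true * np.exp(-(xk - x0) ** 2 / (2.0 * w ** 2)) + B_true)

# Observed data: one Poisson draw per bin. Draws scatter about D_k,
# which is why some N_k come out above the maximum expectation of ~100
Nk = rng.poisson(Dk)
```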
Gaussian peak + background
• The posterior pdf is now 2D (a surface over A, B)
• Maximum of L at A = 1.11, B = 1.89 (true values 1.09, 1.93)
Gaussian peak + background
• Changing the experimental setup?
• E.g. reducing the counts per bin (lower SNR), because of a shorter integration time, lower signal threshold etc.
• Same signal, but the data look much noisier, so the posterior pdf is broader
• Truncated at 0: the prior is important here
Gaussian peak + background
• Changing the experimental setup?
• Increasing the number of bins (same count rate, but spread out over twice the measurement range)
• Much narrower posterior pdf, BUT the reduction is mostly in B
Gaussian peak + background
• More data, so why is the uncertainty in A, B not reduced equally?
• Data far from the origin only tell you about the background
• Conversely, if we restrict the range of x over which data are collected (fewer bins), it is hard to distinguish A from B (signal from noise)
• The posterior is skewed, with high correlation between A and B
Marginal distribution
• If we are only interested in A, then according to the marginalisation rule we integrate the joint posterior pdf w.r.t. B, i.e.

P(A | {N_k}, I) = ∫₀^∞ P(A, B | {N_k}, I) dB

• This gives a 1D marginal pdf for A (see the sketch below). Compare the previous experimental cases:
[Figure: marginal pdfs for A; panel 1: 15 bins, ~100 counts maximum; panel 2: 15 bins, ~10 counts maximum]
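A minimal numerical version of this marginalisation, assuming the joint posterior has already been evaluated on a regular (A, B) grid; the array names, grid ranges and placeholder posterior are illustrative:

```python
import numpy as np

# Suppose post[i, j] holds the joint posterior P(A_i, B_j | {N_k}, I),
# e.g. np.exp(L - L.max()) built from the log-posterior sketch earlier
A_grid = np.linspace(0.01, 3.0, 200)
B_grid = np.linspace(0.01, 3.0, 200)
post = np.ones((A_grid.size, B_grid.size))   # placeholder joint pdf

# Marginal for A: integrate over B (trapezoidal rule), then normalise
marg_A = np.trapz(post, B_grid, axis=1)
marg_A /= np.trapz(marg_A, A_grid)

# For comparison, the conditional pdf for A at a known B is just a slice:
j = np.argmin(np.abs(B_grid - 1.93))
cond_A = post[:, j] / np.trapz(post[:, j], A_grid)
```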
Marginal vs. conditional
• Marginal pdf: takes into account prior ignorance of B
• Conditional pdf: assumes we know B, e.g. via calibration, i.e. a slice of the joint posterior at the known value of B
• Least difference between the two when measurements extend far from the peak (panel 3); most when the data are confined close to the peak (panel 4)
[Figure: marginal vs. conditional pdfs for A; panel 3: 31 bins, ~100 counts maximum; panel 4: 7 bins, ~100 counts maximum]
Uncertainty
• The posterior pdf shows the reliability of the parameters, and we want the optimal values
• For parameters {X_j}, with posterior P = P({X_j} | {data}, I), write L = ln(P)
• The optimal {X0_j} is given by the set of simultaneous equations

∂L/∂X_i = 0, evaluated at {X0_j}, for i = 1, 2, …, N_params

• So for 2 parameters (X, Y) we want

∂L/∂X = 0 and ∂L/∂Y = 0, evaluated at (X0, Y0)

Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• To estimate the reliability of the best estimate we want the spread of P about (X0, Y0)
• Do this using a Taylor expansion of L about the maximum, i.e.

L(X, Y) = L(X0, Y0) + ∂L/∂X (X - X0) + ∂L/∂Y (Y - Y0)
 + ½ [ ∂²L/∂X² (X - X0)² + ∂²L/∂Y² (Y - Y0)² + 2 ∂²L/∂X∂Y (X - X0)(Y - Y0) ] + …

• where all derivatives are evaluated at (X0, Y0)
• Keeping the first three terms (to quadratic order), we can ignore the (X - X0) and (Y - Y0) terms, as the expansion is about the maximum, where the first derivatives vanish
Sivia & Skilling (2006) Chapter 3, pp. 35-51
http://en.wikipedia.org/wiki/Taylor_series
Uncertainty
• So we are mainly concerned with the quadratic terms. Rephrase via matrices: the quadratic term Q is

Q = (X - X0, Y - Y0) [ A C ; C B ] (X - X0, Y - Y0)ᵀ, so that L ≈ L(X0, Y0) + Q/2

• where

A = ∂²L/∂X², B = ∂²L/∂Y², C = ∂²L/∂X∂Y, all evaluated at (X0, Y0)

• A contour of Q in the X-Y plane, Q = k, is a line of constant L: an ellipse centred on (X0, Y0)
• Its orientation and eccentricity are determined by A, B, C
• The principal directions e1 and e2 are the eigenvectors of the matrix of 2nd derivatives
[Figure: ellipse Q = k in the X-Y plane, centred on (X0, Y0), with principal axes e1 and e2]
Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• The (x, y) components of e1 and e2 are given by the solutions of

[ A C ; C B ] (x, y)ᵀ = λ (x, y)ᵀ

• where the eigenvalues λ1 and λ2 scale as 1/k², with k1, k2 the widths of the ellipse along the principal directions (see the sketch below)
• If (X0, Y0) is a maximum, then λ1 < 0 and λ2 < 0
• So A < 0, B < 0 and AB > C²
• If C ≠ 0 then the ellipse is not aligned to the axes, so how do we estimate error bars on X0, Y0?
• We can get rid of parameters we don't want (Y, for example) by integrating, i.e.

P(X | {data}, I) = ∫ P(X, Y | {data}, I) dY

Sivia & Skilling (2006) Chapter 3, pp. 35-51
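A short sketch of extracting the principal directions and widths numerically; the values of A, B, C are illustrative assumptions chosen to satisfy the maximum conditions above:

```python
import numpy as np

A, B, C = -2.0, -1.0, 0.5            # illustrative: A, B < 0 and AB > C^2
H = np.array([[A, C],
              [C, B]])               # 2nd-derivative matrix at the maximum

lam, evecs = np.linalg.eigh(H)       # eigenvalues lam, eigenvector columns
widths = 1.0 / np.sqrt(-lam)         # ellipse widths k scale as 1/sqrt(|lam|)
# evecs[:, 0] and evecs[:, 1] are the principal directions e1, e2
```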
Uncertainty
• Then use Taylor again: integrating the quadratic approximation over Y gives (see S&S p. 46 & Appendix)

P(X | {data}, I) ∝ exp[ (AB - C²)(X - X0)² / 2B ]

• So the marginal distribution for X is just a Gaussian, with best estimate (mean) X0 and uncertainty (standard deviation)

σ_X = √( -B / (AB - C²) )

• So all is fine and we can always calculate the uncertainty……right?
Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• Note that AB - C² is the determinant of [ A C ; C B ] and equals λ1 × λ2
• So if λ1 or λ2 → 0 then AB - C² → 0 and σ_X, σ_Y → ∞
• Oh dear……
• So consider the variance of the posterior

σ² = ⟨(X - μ)²⟩ = ∫ (X - μ)² P(X | {data}, I) dX, where μ is the mean

• For a 1D normal distribution of width σ, this integral gives exactly σ²
• For the 2D case (X, Y) here,

⟨(X - X0)²⟩ = ∫∫ (X - X0)² P(X, Y | {data}, I) dX dY = -B / (AB - C²)

• which we have from before. The same follows for Y, so…
Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• Consider the covariance σ²_XY

σ²_XY = ⟨(X - X0)(Y - Y0)⟩

• which describes the correlation between X and Y; if the estimate of X has little or no effect on the estimate of Y, then σ²_XY ≈ 0
• Using Taylor as before,

σ²_XY = C / (AB - C²)

• So in matrix notation (see the sketch below)

[ σ²_X σ²_XY ; σ²_XY σ²_Y ] = -[ A C ; C B ]⁻¹ = 1/(AB - C²) [ -B C ; C -A ]

• where we remember that A, B, C are the 2nd derivatives of L at the maximum
Sivia & Skilling (2006) Chapter 3, pp. 35-51
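Numerically, the whole covariance matrix is one matrix inversion; a sketch with the same illustrative A, B, C as before:

```python
import numpy as np

A, B, C = -2.0, -1.0, 0.5
H = np.array([[A, C],
              [C, B]])

cov = -np.linalg.inv(H)                 # covariance matrix, -H^{-1}
sigma_X = np.sqrt(cov[0, 0])            # = sqrt(-B / (AB - C^2))
sigma_Y = np.sqrt(cov[1, 1])            # = sqrt(-A / (AB - C^2))
rho = cov[0, 1] / (sigma_X * sigma_Y)   # correlation coefficient
```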
Uncertainty
• The covariance (or variance-covariance) matrix describes the covariance of the error terms
• When C = 0, σ²_XY = 0: there is no correlation, and e1 and e2 are aligned with the axes
• If C increases (relative to A, B), the posterior pdf becomes more skewed and elliptical, rotated at an angle ± tan⁻¹(√(A/B))
[Figure: posterior contours for large +ve correlation, large -ve correlation, and C = 0 (X, Y uncorrelated); after Sivia & Skilling (2006) fig 3.7, p. 48]
Uncertainty
• As the correlation grows, if C² → AB then the contours become infinitely wide in one direction (except for the prior bounds)
• In this case σ_X and σ_Y are very large (i.e. very unreliable parameter estimates)
• BUT large off-diagonals in the covariance matrix mean we can still estimate linear combinations of the parameters
• For -ve covariance, the posterior is wide in the direction Y = -mX, where m = √(A/B), but narrow perpendicular to this, along Y + mX = const
• i.e. there is a lot of information about Y + mX but little about Y - X/m
• For +ve correlation there is most information on Y - mX but not Y + X/m
After Sivia & Skilling (2006) fig 3.7, p. 48
Uncertainty
• We have seen the 2-parameter case, so what about the generalisation of the Taylor quadratic approximation to M parameters?
• Remember, we want the {X0_j} that maximise L, the log posterior pdf. Rephrase in matrix form: for i = 1, 2, …, M we want

∇L(X0) = 0, i.e. ∂L/∂X_i = 0 at X = X0 for each i

• The extension of the Taylor expansion to M variables is

L(X) = L(X0) + ½ (X - X0)ᵀ ∇∇L(X0) (X - X0) + …

• So if X is an M × 1 column vector, and ignoring higher-order terms, the exponential of the posterior pdf is

P(X | {data}, I) ∝ exp[ ½ (X - X0)ᵀ ∇∇L(X0) (X - X0) ]

Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• where ∇∇L(X0) is a symmetric M × M matrix of 2nd derivatives, with elements [∇∇L]_ij = ∂²L/∂X_i∂X_j
• and (X - X0)ᵀ is the transpose of (X - X0) (a row vector)
• So this is the generalisation of Q from the 2D case
• The contour map from before is just a 2D slice through our now M-dimensional parameter space
• The constant of proportionality is the multivariate Gaussian normalisation,

√( det(-∇∇L(X0)) ) / (2π)^{M/2}

Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• So what are the implications of all of this?
• The maximum of the M-parameter posterior pdf is at X0, and we know ∇∇L(X0)
• Compare to the 2D case and see that ∇∇L is analogous to -1/σ²
• It can be shown that the generalised covariance matrix σ² is

σ² = -[∇∇L(X0)]⁻¹

• Square roots of the diagonals (i = j) give the marginal error bars, and the off-diagonals (i ≠ j) describe the correlations between parameters
• So the covariance matrix contains most of the information describing the model fit AND the faith we have in the parameter estimates
Sivia & Skilling (2006) Chapter 3, pp. 35-51
Uncertainty
• Sivia & Skilling make the important point (p. 50) that the inverse of the diagonal elements of a matrix ≠ the diagonal of the inverse of the matrix
• i.e. do NOT try to estimate the value/spread of one parameter in the M-dimensional case by holding all the others fixed at their optimal values: that gives an incorrect 'best fit' and underestimates σ_ii
• Need to include marginalisation to get the correct magnitude for the uncertainty (see the sketch below)
• See S&S p. 51 for discussion of multimodal and asymmetric posterior pdfs, for which the Gaussian is not a good approximation
[Figure: 2D posterior contours in (X_i, X_j), contrasting the incorrect 'best fit' and too-small σ_ii obtained by fixing X_j at X0_j with the correct marginal σ_ii; after Sivia & Skilling (2006) p. 50]
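A sketch of the correct procedure (and the trap) for an illustrative 3-parameter case; the Hessian values are assumptions, chosen only to be symmetric and negative definite:

```python
import numpy as np

H = np.array([[-4.0,  1.0,  0.5],
              [ 1.0, -3.0,  1.2],
              [ 0.5,  1.2, -2.0]])   # illustrative nabla-nabla-L at X0

# Correct: marginal error bars from the diagonal of -H^{-1}
cov = -np.linalg.inv(H)
marginal_sigma = np.sqrt(np.diag(cov))

# The trap: inverting the diagonal of H directly, i.e. holding the other
# parameters fixed at their optimal values -- this underestimates the spread
naive_sigma = np.sqrt(-1.0 / np.diag(H))

print(marginal_sigma)   # element-wise >= naive_sigma
print(naive_sigma)
```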
Summary
• We have seen that we can express the condition for the best estimate of a set of M parameters {X_j} very compactly as

∇L(X0) = 0

• where the jth element of ∇L is ∂L/∂X_j (L the log posterior pdf), evaluated at X = X0
• So this is a set of simultaneous equations which, IF they are linear, i.e. of the form

∇L = H X + g = 0, with H a constant matrix and g a constant vector

• can be solved using linear algebra methods, i.e.

X0 = -H⁻¹ g

• This is the power (joy?) of linearity! We will see more on this later (see the sketch below)
• Even if a system is not linear, we can often approximate it as linear over some limited domain, to allow linear methods to be used
• If not, then we have to use other (non-linear) methods…
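A one-step sketch of the linear case; H and g are the illustrative names introduced above for the constant matrix and vector, and the values are assumptions:

```python
import numpy as np

H = np.array([[-4.0,  1.0],
              [ 1.0, -3.0]])   # illustrative constant (Hessian) matrix
g = np.array([2.0, -1.0])      # illustrative constant vector

# grad L = H @ X + g = 0  =>  X0 = -H^{-1} g, via a linear solve
X0 = np.linalg.solve(H, -g)    # no explicit matrix inverse needed
```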
Summary
• Two-parameter example: Gaussian peak + background
• Solved via Bayes' theorem, using a Taylor expansion (to quadratic order) of the log posterior
• Issues over the experimental setup: integration time, number of bins, bin size etc., and their impact on the posterior pdf
• Can use linear methods to derive uncertainty estimates and explore correlation between parameters
• Extends to the multi-dimensional case using the same method
• Be careful when dealing with uncertainty
• KEY: it is not always useful to look for summary statistics. If in doubt, look at the posterior pdf: this gives the full description