Developments in Bayesian Priors Roger Barlow Manchester IoP meeting November 16th 2005
Plan • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier
Probability Probability as the limit of frequency: P(A) = lim N→∞ (N_A / N_total). The usual definition taught to students. Makes sense, and works well most of the time, but not always.
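A minimal sketch of the frequency limit, assuming nothing beyond a fair coin: the observed fraction N_A/N_total drifts towards P(heads) = 0.5 as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=1_000_000)  # 1 = heads, fair coin
for n in (100, 10_000, 1_000_000):
    print(n, flips[:n].mean())  # N_A / N_total converges towards 0.5
```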
Frequentist probability Statements a frequentist cannot make: “It will probably rain tomorrow.” “Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4, with 68% probability.” Valid reformulations: “The statement ‘It will rain tomorrow’ is probably true.” “Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4, at 68% confidence.”
Bayesian Probability P(A) expresses my belief that A is true. Limits: 0 (impossible) and 1 (certain). Calibrated off clear-cut instances (coins, dice, urns).
Frequentist versus Bayesian? Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.) Rivals? Religious differences? (Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.) No: two different tools for practitioners. Important to: • Be aware of the limits and pitfalls of both • Always be aware which you’re using
Bayes Theorem (1763) P(A|B) P(B) = P(A and B) = P(B|A) P(A), so P(A|B) = P(B|A) P(A) / P(B). Frequentist use, e.g. a Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal). Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data).
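A worked instance of the Čerenkov example. The numbers here are purely illustrative assumptions, not from the talk: a beam that is 90% pions and 10% kaons, and a counter that fires on 95% of pions and 5% of kaons.

```python
# Hypothetical inputs: beam composition and counter response.
p_pi, p_K = 0.90, 0.10
p_sig_pi, p_sig_K = 0.95, 0.05

p_sig = p_sig_pi * p_pi + p_sig_K * p_K    # P(signal), by total probability
p_pi_sig = p_sig_pi * p_pi / p_sig         # Bayes: P(pi | signal)
print(f"P(pi | signal) = {p_pi_sig:.4f}")  # ~0.9942
```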
Bayesian Prior P(theory) is the Prior. It expresses prior belief that the theory is true, and can be a function of a parameter: P(Mtop), P(MH), P(α,β,γ). Bayes’ Theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?
Uniform Prior General usage: choose P(a) uniform in a (the principle of insufficient reason). Often ‘improper’: ∫P(a) da = ∞, though the posterior P(a|x) comes out sensible. BUT! If P(a) is uniform, P(a²), P(ln a), P(√a)... are not. Insufficient reason is not valid (unless a is ‘most fundamental’, whatever that means). Statisticians handle this: check results for ‘robustness’ under different priors.
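A quick Monte Carlo check of the non-invariance, assuming only a flat prior on (0,1): the implied priors on a² and √a are visibly not flat.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.0, 1.0, size=1_000_000)   # flat prior in a

# Histogram the implied priors in a**2 and sqrt(a): neither is flat.
for label, t in [("a^2", a**2), ("sqrt(a)", np.sqrt(a))]:
    counts, _ = np.histogram(t, bins=10, range=(0.0, 1.0))
    print(label, counts / counts.sum())   # falling for a^2, rising for sqrt(a)
```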
Example - Le Diberder A sad story: fitting the CKM angle α from B → ππ. 6 observables; 3 amplitudes, giving 6 unknown parameters (magnitudes and phases). α is the fundamentally interesting one.
Results [Plots: frequentist vs Bayesian fits for α.] Bayesian analysis: set one phase to zero; uniform priors in the other two phases and the 3 magnitudes.
More Results [Plots: two further Bayesian fits.] Bayesian: parametrise Tree and Penguin amplitudes. Bayesian: 3 amplitudes as 3 real parts and 3 imaginary parts.
Interpretation • B → ρρ shows the same (mis)behaviour • Removing all experimental info gives a similar P(α) • The curse of high dimensions is at work: uniformity in x, y, z makes P(r) peak at large r (see the sketch below) • This result is not robust under changes of prior
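A Monte Carlo sketch of the curse at work, assuming only flat priors on the Cartesian components: the implied prior on the radius rises like r².

```python
import numpy as np

rng = np.random.default_rng(3)
xyz = rng.uniform(-1, 1, size=(2_000_000, 3))   # flat priors in x, y, z
r = np.linalg.norm(xyz, axis=1)
r = r[r < 1]                                    # keep the inscribed sphere

counts, edges = np.histogram(r, bins=10, range=(0.0, 1.0))
print(counts / counts.sum())   # rises like r^2: flat in (x,y,z) is far from flat in r
```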
Example - Heinrich The CDF statistics group looked at the problem of estimating a signal cross section S in the presence of background and efficiency: N = εS + b. Efficiency and background come from separate calibration experiments (sidebands or MC); the scaling factors κ, ω are known. Everything is done using Bayesian methods with uniform priors and the Poisson statistics formula. The calibration experiments use uniform priors for ε and for b, yielding the posteriors used for S: P(N|S) = (1/N!) ∫∫ e^-(εS+b) (εS+b)^N P(ε) P(b) dε db. Check coverage: all fine.
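A rough numerical sketch of this construction, not Heinrich's actual code. The calibration counts and scaling factors below are hypothetical; each calibration posterior is taken as a Gamma distribution (the posterior of a Poisson mean under a uniform prior), and the double integral is done by Monte Carlo.

```python
import numpy as np
from scipy.stats import gamma, poisson

rng = np.random.default_rng(4)

# Hypothetical calibration: m_eps counts with scale kappa for the efficiency,
# m_b counts with scale omega for the background. Uniform prior on a Poisson
# mean with observed count m gives a Gamma(m+1) posterior.
kappa, m_eps = 100.0, 25
omega, m_b = 4.0, 3
eps = gamma.rvs(m_eps + 1, scale=1.0 / kappa, size=50_000, random_state=rng)
b = gamma.rvs(m_b + 1, scale=1.0 / omega, size=50_000, random_state=rng)

def p_N_given_S(N, S):
    """Marginal likelihood P(N|S) = E_{eps,b}[ Poisson(N; eps*S + b) ]."""
    return poisson.pmf(N, eps * S + b).mean()

# Posterior for S with a uniform prior: normalise P(N|S) on a grid in S.
N_obs = 5
S_grid = np.linspace(0.0, 60.0, 601)
dS = S_grid[1] - S_grid[0]
post = np.array([p_N_given_S(N_obs, S) for S in S_grid])
post /= post.sum() * dS
cdf = np.cumsum(post) * dS
print("90% upper limit on S ~", S_grid[np.searchsorted(cdf, 0.90)])
```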
But it all goes pear-shaped... If the particle decays in several channels (H → γγ, H → τ+τ-, H → bb), each channel has its own b and ε: in total 2N+1 parameters and 2N+1 experiments. Heavy undercoverage! E.g. with 4 channels, all ε = 25 ± 10%, b = 0.75 ± 0.25: for S ≈ 10 the ‘90% upper limit’ lies above S in only 80% of cases. [Plot: coverage vs S, falling from near 100% to below 90%.]
The curse strikes again A uniform prior in a single ε: fine. Uniform priors in ε1, ε2 ... εN imply an ε^(N-1) prior in the total ε: a hidden prejudice in favour of high efficiency, so the signal size is downgraded. (See the sketch below.)
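The same Monte Carlo trick as before, assuming only flat priors on each channel efficiency: near zero the density of the total grows like ε^(N-1), so low total efficiency is heavily disfavoured.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4                                          # channels
eps = rng.uniform(0, 1, size=(1_000_000, N))   # flat prior in each eps_i
tot = eps.sum(axis=1)                          # implied prior on the total

counts, edges = np.histogram(tot, bins=8, range=(0.0, float(N)))
print(counts / counts.sum())
# the low-efficiency bins are strongly suppressed: near zero the
# implied density grows like eps**(N-1)
```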
Happy ending The effect is avoided by using Jeffreys’ priors instead of uniform priors for ε and b: not uniform, but like 1/ε, 1/b. Not entirely realistic, but interesting. A uniform prior in S is not a problem, but maybe one should consider 1/√S? Coverage (a very frequentist concept) is a useful tool for Bayesians.
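A sketch of coverage as a tool, deliberately simplified to a single channel with the background known exactly (so no undercoverage is expected, matching the benign one-channel case above): compute the flat-prior 90% credible upper limit and count how often it covers the truth.

```python
import numpy as np
from scipy.stats import poisson

def bayes_upper_limit(N, b, cl=0.90, s_max=50.0, n_grid=2001):
    """90% credible upper limit on S for N ~ Poisson(S + b),
    flat prior in S, background b known exactly."""
    s = np.linspace(0.0, s_max, n_grid)
    post = poisson.pmf(N, s + b)          # likelihood x flat prior
    cdf = np.cumsum(post) / post.sum()
    return s[np.searchsorted(cdf, cl)]

rng = np.random.default_rng(6)
b = 0.75
for s_true in (1.0, 5.0, 10.0):
    N = rng.poisson(s_true + b, size=20_000)
    uniq = np.unique(N)
    limits = np.array([bayes_upper_limit(n, b) for n in uniq])
    covered = limits[np.searchsorted(uniq, N)] >= s_true
    print(s_true, covered.mean())   # stays >= 0.90 in this benign case
```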
Fisher Information An informative experiment is one for which a measurement of x will give precise information about the parameter a. Quantify: I(a) = -⟨∂² ln L/∂a²⟩ (second derivative: curvature). P(x,a) contains everything: P(x|a) as a function of x is the pdf; as a function of a it is the likelihood L(a).
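A numerical check of the definition for a Poisson measurement, where the analytic answer is I(a) = 1/a: the curvature of ln L = -a + N ln a - ln N! is -N/a², and averaging -(-N/a²) over toys reproduces 1/a.

```python
import numpy as np

rng = np.random.default_rng(7)

def fisher_info_poisson(a, n_samples=1_000_000):
    """I(a) = -E[ d^2 ln L / da^2 ] for N ~ Poisson(a);
    here d^2 lnL / da^2 = -N / a^2."""
    N = rng.poisson(a, size=n_samples)
    return np.mean(N / a**2)

for a in (1.0, 4.0, 9.0):
    print(a, fisher_info_poisson(a), "analytic 1/a =", 1 / a)
```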
Jeffreys’ Prior A prior may be uniform in a, but if I(a) depends on a it is still not ‘flat’: special values of a give better measurements. Transform a → a’ such that I(a’) is constant, then choose a prior uniform in a’. • Location parameter: uniform prior OK • Scale parameter: a’ = ln a, prior 1/a • Poisson mean: prior 1/√a
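A symbolic sketch of the Poisson case: the Fisher information is 1/a, giving a Jeffreys prior ∝ √I = 1/√a, and in the transformed variable a' = √a the information is constant, so a uniform prior there is justified.

```python
import sympy as sp

a, N = sp.symbols("a N", positive=True)
logL = -a + N * sp.log(a)                   # Poisson log-likelihood, dropping ln N!
info = -sp.diff(logL, a, 2).subs(N, a)      # -E[d^2 lnL/da^2], using E[N] = a
print(info)                                 # 1/a  ->  Jeffreys prior ~ 1/sqrt(a)

# transform a -> a' = sqrt(a): I(a') = I(a) * (da/da')^2 = (1/a'^2) * (2a')^2
ap = sp.symbols("ap", positive=True)
print(sp.simplify((1 / ap**2) * (2 * ap)**2))   # 4: constant, so uniform in a' is OK
```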
Objective Prior? Jeffreys called this an ‘objective’ prior, as opposed to ‘subjective’ straight guesswork, but not everyone was convinced. It is equivalent to a prior proportional to √I. For statisticians ‘flat prior’ means the Jeffreys prior; for physicists it means a uniform prior. The prior depends on the likelihood: your ‘prior belief’ P(MH) (or whatever) depends on the analysis.
Reference Priors (Demortier) 4 steps 1) Intrinsic Discrepancy between two PDFs: δ{P1(z),P2(z)} = min{ ∫P1(z) ln(P1(z)/P2(z)) dz , ∫P2(z) ln(P2(z)/P1(z)) dz }. A sensible measure of difference: δ = 0 iff P1(z) and P2(z) are the same, otherwise positive; invariant under all transformations of z.
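A numerical instance for two unit-width Gaussians two sigma apart, where both KL integrals are equal and the analytic answer is (μ1-μ2)²/2 = 2.

```python
import numpy as np
from scipy.stats import norm

# delta{P1,P2} = min( KL(P1||P2), KL(P2||P1) ) on a grid.
z = np.linspace(-20.0, 20.0, 200_001)
dz = z[1] - z[0]
p1 = norm.pdf(z, loc=0.0)
p2 = norm.pdf(z, loc=2.0)

def kl(p, q):
    """KL divergence: integral of p ln(p/q) dz on the grid."""
    return np.sum(p * np.log(p / q)) * dz

print(min(kl(p1, p2), kl(p2, p1)))   # ~2.0 = (mu1-mu2)^2 / 2
```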
Reference Priors (2) 2) Expected Intrinsic Information. Measurement M: x is sampled from p(x|a); the parameter a has a prior p(a). Joint distribution p(x,a) = p(x|a) p(a); marginal distribution p(x) = ∫p(x|a) p(a) da. The expected intrinsic (Shannon) information from measurement M about parameter a is I(p(a),M) = δ{p(x,a), p(x)p(a)}. It depends on (i) the x-a relationship and (ii) the breadth of p(a).
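This δ between the joint and the product of marginals is just the mutual information between a and x. A grid-based sketch for an assumed toy model, x ~ N(a, 1) with prior a ~ N(0, σ²), where the analytic answer is ½ ln(1+σ²):

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
a = np.linspace(-12.0, 12.0, 601)
x = np.linspace(-15.0, 15.0, 751)
da, dx = a[1] - a[0], x[1] - x[0]

prior = norm.pdf(a, scale=sigma)               # p(a)
like = norm.pdf(x[:, None], loc=a[None, :])    # p(x|a), shape (x, a)
joint = like * prior[None, :]                  # p(x,a)
marg = joint.sum(axis=1) * da                  # p(x)

mi = np.sum(joint * np.log(like / marg[:, None])) * da * dx
print(mi, "analytic:", 0.5 * np.log(1 + sigma**2))   # ~0.805
```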
Reference Priors (3) 3) Missing Information. Measurement Mk: k samples of x. Enough measurements fix a completely, so lim(k→∞) I(p(a),Mk) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a: the Missing Information given p(a).
Reference Priors (4) 4) Family of priors P (e.g. Fourier series, polynomials, histograms), p(a) ∈ P. Ignorance principle: choose the least informative (dumbest) prior in the family, the one for which the missing information lim(k→∞) I(p(a),Mk) is largest. There are technical difficulties in taking the k limit and integrating over an infinite range of a. A crude numerical sketch follows.
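A crude numerical sketch of the ignorance principle, under assumptions not in the talk: a Bernoulli parameter, a family of Beta(α,α) priors, and a fixed large k standing in for the k → ∞ limit. For this problem the reference prior is known to be Beta(1/2, 1/2) (Jeffreys), so the α = 0.5 member should score highest.

```python
import numpy as np
from scipy.stats import beta, binom

# Scan Beta(alpha, alpha) priors for a Bernoulli parameter a and pick the one
# with the largest expected information from M_k = k trials.
a = np.linspace(0.001, 0.999, 999)
da = a[1] - a[0]

def expected_information(alpha, k):
    prior = beta.pdf(a, alpha, alpha)
    prior /= prior.sum() * da                        # normalise on the grid
    n = np.arange(k + 1)
    like = binom.pmf(n[:, None], k, a[None, :])      # p(n|a), shape (n, a)
    joint = like * prior[None, :]                    # p(n, a)
    marg = joint.sum(axis=1) * da                    # p(n)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(joint > 0, joint * np.log(like / marg[:, None]), 0.0)
    return terms.sum() * da

for alpha in (0.1, 0.3, 0.5, 1.0, 2.0):
    print(alpha, expected_information(alpha, k=200))  # peak near alpha = 0.5
```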
Family of Priors (Google)
Reference Priors Do not represent subjective belief; in fact the opposite (like a jury selection): they allow the most input to come from the data. A formal consensus that practitioners can use to arrive at a sensible posterior. Depend on the measurement p(x|a), cf. Jeffreys. Also require the family P of possible priors. May be improper, but this doesn’t matter (they do not represent belief). For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually secures) they give the Jeffreys prior. But they can also (unlike Jeffreys) work for several parameters.
Summary • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier