Developments in Bayesian Priors Roger Barlow Manchester IoP meeting November 16th 2005
Plan • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier
Probability Probability as the limit of frequency: P(A) = lim N→∞ (N_A / N_total). The usual definition taught to students. Makes sense, and works well most of the time, but not always.
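A minimal sketch of the frequency limit, assuming nothing beyond a fair coin: the observed fraction N_A/N_total drifts towards P(heads) = 0.5 as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=1_000_000)  # 1 = heads, fair coin
for n in (100, 10_000, 1_000_000):
    print(n, flips[:n].mean())  # N_A / N_total converges towards 0.5
```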
Frequentist probability Statements a frequentist cannot make: “It will probably rain tomorrow.” “Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4, with 68% probability.” Valid reformulations: “The statement ‘It will rain tomorrow’ is probably true.” “Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4, at 68% confidence.”
Bayesian Probability P(A) expresses my belief that A is true. Limits: 0 (impossible) and 1 (certain). Calibrated off clear-cut instances (coins, dice, urns).
Frequentist versus Bayesian? Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.) Rivals? Religious differences? (Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.) No: two different tools for practitioners. Important to: • Be aware of the limits and pitfalls of both • Always be aware which you’re using
Bayes Theorem (1763) P(A|B) P(B) = P(A and B) = P(B|A) P(A), so P(A|B) = P(B|A) P(A) / P(B). Frequentist use, e.g. a Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal). Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data).
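A worked instance of the Čerenkov example. The numbers here are purely illustrative assumptions, not from the talk: a beam that is 90% pions and 10% kaons, and a counter that fires on 95% of pions and 5% of kaons.

```python
# Hypothetical inputs: beam composition and counter response.
p_pi, p_K = 0.90, 0.10
p_sig_pi, p_sig_K = 0.95, 0.05

p_sig = p_sig_pi * p_pi + p_sig_K * p_K    # P(signal), by total probability
p_pi_sig = p_sig_pi * p_pi / p_sig         # Bayes: P(pi | signal)
print(f"P(pi | signal) = {p_pi_sig:.4f}")  # ~0.9942
```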
Bayesian Prior P(theory) is the Prior. It expresses prior belief that the theory is true, and can be a function of a parameter: P(Mtop), P(MH), P(α,β,γ). Bayes’ Theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?
Uniform Prior General usage: choose P(a) uniform in a (the principle of insufficient reason). Often ‘improper’: ∫P(a) da = ∞, though the posterior P(a|x) comes out sensible. BUT! If P(a) is uniform, P(a²), P(ln a), P(√a)... are not. Insufficient reason is not valid (unless a is ‘most fundamental’, whatever that means). Statisticians handle this: check results for ‘robustness’ under different priors.
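A quick Monte Carlo check of the non-invariance, assuming only a flat prior on (0,1): the implied priors on a² and √a are visibly not flat.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.0, 1.0, size=1_000_000)   # flat prior in a

# Histogram the implied priors in a**2 and sqrt(a): neither is flat.
for label, t in [("a^2", a**2), ("sqrt(a)", np.sqrt(a))]:
    counts, _ = np.histogram(t, bins=10, range=(0.0, 1.0))
    print(label, counts / counts.sum())   # falling for a^2, rising for sqrt(a)
```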
Example - Le Diberder A sad story: fitting the CKM angle α from B → ππ. 6 observables; 3 amplitudes, giving 6 unknown parameters (magnitudes and phases). α is the fundamentally interesting one.
Results [Plots: frequentist vs Bayesian fits for α.] Bayesian analysis: set one phase to zero; uniform priors in the other two phases and the 3 magnitudes.
More Results [Plots: two further Bayesian fits.] Bayesian: parametrise Tree and Penguin amplitudes. Bayesian: 3 amplitudes as 3 real parts and 3 imaginary parts.
Interpretation • B → ρρ shows the same (mis)behaviour • Removing all experimental info gives a similar P(α) • The curse of high dimensions is at work: uniformity in x, y, z makes P(r) peak at large r (see the sketch below) • This result is not robust under changes of prior
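A Monte Carlo sketch of the curse at work, assuming only flat priors on the Cartesian components: the implied prior on the radius rises like r².

```python
import numpy as np

rng = np.random.default_rng(3)
xyz = rng.uniform(-1, 1, size=(2_000_000, 3))   # flat priors in x, y, z
r = np.linalg.norm(xyz, axis=1)
r = r[r < 1]                                    # keep the inscribed sphere

counts, edges = np.histogram(r, bins=10, range=(0.0, 1.0))
print(counts / counts.sum())   # rises like r^2: flat in (x,y,z) is far from flat in r
```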
Example - Heinrich The CDF statistics group looked at the problem of estimating a signal cross section S in the presence of background and efficiency: N = εS + b. Efficiency and background come from separate calibration experiments (sidebands or MC); the scaling factors κ, ω are known. Everything is done using Bayesian methods with uniform priors and the Poisson statistics formula. The calibration experiments use uniform priors for ε and for b, yielding the posteriors used for S: P(N|S) = (1/N!) ∫∫ e^-(εS+b) (εS+b)^N P(ε) P(b) dε db. Check coverage: all fine.
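A rough numerical sketch of this construction, not Heinrich's actual code. The calibration counts and scaling factors below are hypothetical; each calibration posterior is taken as a Gamma distribution (the posterior of a Poisson mean under a uniform prior), and the double integral is done by Monte Carlo.

```python
import numpy as np
from scipy.stats import gamma, poisson

rng = np.random.default_rng(4)

# Hypothetical calibration: m_eps counts with scale kappa for the efficiency,
# m_b counts with scale omega for the background. Uniform prior on a Poisson
# mean with observed count m gives a Gamma(m+1) posterior.
kappa, m_eps = 100.0, 25
omega, m_b = 4.0, 3
eps = gamma.rvs(m_eps + 1, scale=1.0 / kappa, size=50_000, random_state=rng)
b = gamma.rvs(m_b + 1, scale=1.0 / omega, size=50_000, random_state=rng)

def p_N_given_S(N, S):
    """Marginal likelihood P(N|S) = E_{eps,b}[ Poisson(N; eps*S + b) ]."""
    return poisson.pmf(N, eps * S + b).mean()

# Posterior for S with a uniform prior: normalise P(N|S) on a grid in S.
N_obs = 5
S_grid = np.linspace(0.0, 60.0, 601)
dS = S_grid[1] - S_grid[0]
post = np.array([p_N_given_S(N_obs, S) for S in S_grid])
post /= post.sum() * dS
cdf = np.cumsum(post) * dS
print("90% upper limit on S ~", S_grid[np.searchsorted(cdf, 0.90)])
```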
But it all goes pear-shaped... If the particle decays in several channels (H → γγ, H → τ+τ-, H → bb), each channel has its own b and ε: in total 2N+1 parameters and 2N+1 experiments. Heavy undercoverage! E.g. with 4 channels, all ε = 25 ± 10%, b = 0.75 ± 0.25: for S ≈ 10 the ‘90% upper limit’ lies above S in only 80% of cases. [Plot: coverage vs S, falling from near 100% to below 90%.]
The curse strikes again A uniform prior in a single ε: fine. Uniform priors in ε1, ε2 ... εN imply an ε^(N-1) prior in the total ε: a hidden prejudice in favour of high efficiency, so the signal size is downgraded. (See the sketch below.)
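The same Monte Carlo trick as before, assuming only flat priors on each channel efficiency: near zero the density of the total grows like ε^(N-1), so low total efficiency is heavily disfavoured.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4                                          # channels
eps = rng.uniform(0, 1, size=(1_000_000, N))   # flat prior in each eps_i
tot = eps.sum(axis=1)                          # implied prior on the total

counts, edges = np.histogram(tot, bins=8, range=(0.0, float(N)))
print(counts / counts.sum())
# the low-efficiency bins are strongly suppressed: near zero the
# implied density grows like eps**(N-1)
```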
Happy ending The effect is avoided by using Jeffreys’ priors instead of uniform priors for ε and b: not uniform, but like 1/ε, 1/b. Not entirely realistic, but interesting. A uniform prior in S is not a problem, but maybe one should consider 1/√S? Coverage (a very frequentist concept) is a useful tool for Bayesians.
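A sketch of coverage as a tool, deliberately simplified to a single channel with the background known exactly (so no undercoverage is expected, matching the benign one-channel case above): compute the flat-prior 90% credible upper limit and count how often it covers the truth.

```python
import numpy as np
from scipy.stats import poisson

def bayes_upper_limit(N, b, cl=0.90, s_max=50.0, n_grid=2001):
    """90% credible upper limit on S for N ~ Poisson(S + b),
    flat prior in S, background b known exactly."""
    s = np.linspace(0.0, s_max, n_grid)
    post = poisson.pmf(N, s + b)          # likelihood x flat prior
    cdf = np.cumsum(post) / post.sum()
    return s[np.searchsorted(cdf, cl)]

rng = np.random.default_rng(6)
b = 0.75
for s_true in (1.0, 5.0, 10.0):
    N = rng.poisson(s_true + b, size=20_000)
    uniq = np.unique(N)
    limits = np.array([bayes_upper_limit(n, b) for n in uniq])
    covered = limits[np.searchsorted(uniq, N)] >= s_true
    print(s_true, covered.mean())   # stays >= 0.90 in this benign case
```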
Fisher Information An informative experiment is one for which a measurement of x will give precise information about the parameter a. Quantify: I(a) = -⟨∂² ln L/∂a²⟩ (second derivative: curvature). P(x,a) contains everything: P(x|a) as a function of x is the pdf; as a function of a it is the likelihood L(a).
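A numerical check of the definition for a Poisson measurement, where the analytic answer is I(a) = 1/a: the curvature of ln L = -a + N ln a - ln N! is -N/a², and averaging -(-N/a²) over toys reproduces 1/a.

```python
import numpy as np

rng = np.random.default_rng(7)

def fisher_info_poisson(a, n_samples=1_000_000):
    """I(a) = -E[ d^2 ln L / da^2 ] for N ~ Poisson(a);
    here d^2 lnL / da^2 = -N / a^2."""
    N = rng.poisson(a, size=n_samples)
    return np.mean(N / a**2)

for a in (1.0, 4.0, 9.0):
    print(a, fisher_info_poisson(a), "analytic 1/a =", 1 / a)
```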
Jeffreys’ Prior A prior may be uniform in a, but if I(a) depends on a it is still not ‘flat’: special values of a give better measurements. Transform a → a’ such that I(a’) is constant, then choose a prior uniform in a’. • Location parameter: uniform prior OK • Scale parameter: a’ = ln a, prior 1/a • Poisson mean: prior 1/√a
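A symbolic sketch of the Poisson case: the Fisher information is 1/a, giving a Jeffreys prior ∝ √I = 1/√a, and in the transformed variable a' = √a the information is constant, so a uniform prior there is justified.

```python
import sympy as sp

a, N = sp.symbols("a N", positive=True)
logL = -a + N * sp.log(a)                   # Poisson log-likelihood, dropping ln N!
info = -sp.diff(logL, a, 2).subs(N, a)      # -E[d^2 lnL/da^2], using E[N] = a
print(info)                                 # 1/a  ->  Jeffreys prior ~ 1/sqrt(a)

# transform a -> a' = sqrt(a): I(a') = I(a) * (da/da')^2 = (1/a'^2) * (2a')^2
ap = sp.symbols("ap", positive=True)
print(sp.simplify((1 / ap**2) * (2 * ap)**2))   # 4: constant, so uniform in a' is OK
```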
Objective Prior? Jeffreys called this an ‘objective’ prior, as opposed to ‘subjective’ straight guesswork, but not everyone was convinced. It is equivalent to a prior proportional to √I. For statisticians ‘flat prior’ means the Jeffreys prior; for physicists it means a uniform prior. The prior depends on the likelihood: your ‘prior belief’ P(MH) (or whatever) depends on the analysis.
Reference Priors (Demortier) 4 steps 1) Intrinsic Discrepancy between two PDFs: δ{P1(z),P2(z)} = min{ ∫P1(z) ln(P1(z)/P2(z)) dz , ∫P2(z) ln(P2(z)/P1(z)) dz }. A sensible measure of difference: δ = 0 iff P1(z) and P2(z) are the same, otherwise positive; invariant under all transformations of z.
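A numerical instance for two unit-width Gaussians two sigma apart, where both KL integrals are equal and the analytic answer is (μ1-μ2)²/2 = 2.

```python
import numpy as np
from scipy.stats import norm

# delta{P1,P2} = min( KL(P1||P2), KL(P2||P1) ) on a grid.
z = np.linspace(-20.0, 20.0, 200_001)
dz = z[1] - z[0]
p1 = norm.pdf(z, loc=0.0)
p2 = norm.pdf(z, loc=2.0)

def kl(p, q):
    """KL divergence: integral of p ln(p/q) dz on the grid."""
    return np.sum(p * np.log(p / q)) * dz

print(min(kl(p1, p2), kl(p2, p1)))   # ~2.0 = (mu1-mu2)^2 / 2
```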
Reference Priors (2) 2) Expected Intrinsic Information. Measurement M: x is sampled from p(x|a); the parameter a has a prior p(a). Joint distribution p(x,a) = p(x|a) p(a); marginal distribution p(x) = ∫p(x|a) p(a) da. The expected intrinsic (Shannon) information from measurement M about parameter a is I(p(a),M) = δ{p(x,a), p(x)p(a)}. It depends on (i) the x-a relationship and (ii) the breadth of p(a).
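This δ between the joint and the product of marginals is just the mutual information between a and x. A grid-based sketch for an assumed toy model, x ~ N(a, 1) with prior a ~ N(0, σ²), where the analytic answer is ½ ln(1+σ²):

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
a = np.linspace(-12.0, 12.0, 601)
x = np.linspace(-15.0, 15.0, 751)
da, dx = a[1] - a[0], x[1] - x[0]

prior = norm.pdf(a, scale=sigma)               # p(a)
like = norm.pdf(x[:, None], loc=a[None, :])    # p(x|a), shape (x, a)
joint = like * prior[None, :]                  # p(x,a)
marg = joint.sum(axis=1) * da                  # p(x)

mi = np.sum(joint * np.log(like / marg[:, None])) * da * dx
print(mi, "analytic:", 0.5 * np.log(1 + sigma**2))   # ~0.805
```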
Reference Priors (3) 3) Missing Information. Measurement Mk: k samples of x. Enough measurements fix a completely, so lim(k→∞) I(p(a),Mk) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a: the Missing Information given p(a).
Reference Priors (4) 4) Family of priors P (e.g. Fourier series, polynomials, histograms), p(a) ∈ P. Ignorance principle: choose the least informative (dumbest) prior in the family, the one for which the missing information lim(k→∞) I(p(a),Mk) is largest. There are technical difficulties in taking the k limit and integrating over an infinite range of a. A crude numerical sketch follows.
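A crude numerical sketch of the ignorance principle, under assumptions not in the talk: a Bernoulli parameter, a family of Beta(α,α) priors, and a fixed large k standing in for the k → ∞ limit. For this problem the reference prior is known to be Beta(1/2, 1/2) (Jeffreys), so the α = 0.5 member should score highest.

```python
import numpy as np
from scipy.stats import beta, binom

# Scan Beta(alpha, alpha) priors for a Bernoulli parameter a and pick the one
# with the largest expected information from M_k = k trials.
a = np.linspace(0.001, 0.999, 999)
da = a[1] - a[0]

def expected_information(alpha, k):
    prior = beta.pdf(a, alpha, alpha)
    prior /= prior.sum() * da                        # normalise on the grid
    n = np.arange(k + 1)
    like = binom.pmf(n[:, None], k, a[None, :])      # p(n|a), shape (n, a)
    joint = like * prior[None, :]                    # p(n, a)
    marg = joint.sum(axis=1) * da                    # p(n)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(joint > 0, joint * np.log(like / marg[:, None]), 0.0)
    return terms.sum() * da

for alpha in (0.1, 0.3, 0.5, 1.0, 2.0):
    print(alpha, expected_information(alpha, k=200))  # peak near alpha = 0.5
```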
Family of Priors (Google)
Reference Priors Do not represent subjective belief; in fact the opposite (like a jury selection): they allow the most input to come from the data. A formal consensus that practitioners can use to arrive at a sensible posterior. Depend on the measurement p(x|a), cf. Jeffreys. Also require the family P of possible priors. May be improper, but this doesn’t matter (they do not represent belief). For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually secures) they give the Jeffreys prior. But they can also (unlike Jeffreys) work for several parameters.
Summary • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier