
Frequentist versus Bayesian



Presentation Transcript


  1. Frequentist versus Bayesian

  2. The Bayesian approach In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value θ. We interpret the probability of θ as a ‘degree of belief’ (subjective). We need to start with a ‘prior pdf’ p(θ), which reflects our degree of belief about θ before doing the experiment. Our experiment has data x → likelihood function L(x|θ). Bayes’ theorem tells us how our beliefs should be updated in light of the data x: p(θ|x) = L(x|θ) p(θ) / ∫ L(x|θ′) p(θ′) dθ′. The posterior pdf p(θ|x) contains all our knowledge about θ. Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester
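The update rule on this slide can be sketched numerically. The following is a minimal illustration with invented numbers (not from the talk), assuming a Gaussian likelihood with known resolution and a Gaussian prior, with the posterior built on a grid:

```python
import math

# Minimal sketch (invented numbers, not from the talk): Gaussian likelihood
# with known resolution sigma, Gaussian prior on theta, posterior on a grid.
sigma = 1.0                        # known measurement resolution
prior_mean, prior_sd = 0.0, 2.0    # prior degree of belief about theta
x_obs = 1.5                        # the observed data

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

step = 0.01
thetas = [i * step for i in range(-500, 501)]   # grid over theta

# Bayes' theorem: posterior ∝ likelihood × prior, normalized on the grid
unnorm = [gauss(x_obs, t, sigma) * gauss(t, prior_mean, prior_sd) for t in thetas]
norm = sum(unnorm) * step
posterior = [u / norm for u in unnorm]

post_mean = sum(t * p for t, p in zip(thetas, posterior)) * step
```

For this conjugate Gaussian case the exact posterior mean is (x_obs/σ² + μ₀/τ²)/(1/σ² + 1/τ²) = 1.2, which the grid calculation reproduces.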

  3. Case #4: Bayesian method We need to associate prior probabilities with θ0 and θ1, e.g., a prior for θ0 that reflects ‘prior ignorance’ (in any case much broader than the prior for θ1, which is based on a previous measurement). Putting this into Bayes’ theorem gives: posterior ∝ likelihood × prior.

  4. Bayesian method (continued) We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x) = ∫ p(θ0, θ1 | x) dθ1. In this example we can do the integral analytically (rare). The ability to marginalize over nuisance parameters is an important feature of Bayesian statistics.
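Marginalization over a nuisance parameter can be sketched the same way. A hypothetical example, assuming a correlated bivariate Gaussian joint posterior and integrating out the nuisance parameter on a grid:

```python
import math

# Hypothetical sketch (not from the talk): take a correlated bivariate Gaussian
# as the joint posterior p(q0, q1 | x) and integrate out the nuisance
# parameter q1 on a grid to obtain the marginal p(q0 | x).
step = 0.05
grid = [i * step for i in range(-100, 101)]   # parameter values from -5 to 5

rho = 0.6   # assumed correlation between q0 and q1

def joint(q0, q1):
    """Standard bivariate normal density with correlation rho."""
    z = (q0 * q0 - 2 * rho * q0 * q1 + q1 * q1) / (1 - rho * rho)
    return math.exp(-0.5 * z) / (2 * math.pi * math.sqrt(1 - rho * rho))

# marginalize: p(q0|x) = ∫ p(q0, q1|x) dq1, here as a Riemann sum
marginal = [sum(joint(q0, q1) for q1 in grid) * step for q0 in grid]
total = sum(marginal) * step   # normalization check, should be ~1
```

For a standard bivariate normal the marginal is again standard normal, so the grid value at q0 = 0 should come out close to 1/√(2π) ≈ 0.399.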

  5. Bayesian Statistics at work: The Troublesome Extraction of the angle α Stéphane T’JAMPENS, LAPP (CNRS/IN2P3 & Université de Savoie) J. Charles, A. Hocker, H. Lacker, F.R. Le Diberder, S. T’Jampens, hep-ph/0607246

  6. Digression: Statistics D.R. Cox, Principles of Statistical Inference, CUP (2006); W.T. Eadie et al., Statistical Methods in Experimental Physics, NHP (1971); www.phystat.org Statistics tries to answer a wide variety of questions → two main, very different frameworks: • Frequentist: probability is about the data (randomness of measurements), given the model: P(data|model). Hypothesis testing: given a model, assess the consistency of the data with a particular parameter value → a 1-CL curve (by varying the parameter value) [only repeatable events (sampling theory)]. • Bayesian: probability is about the model (degree of belief), given the data: P(model|data) ∝ Likelihood(data; model) × Prior(model).

  7. Bayesian Statistics in 1 slide The Bayesian approach is based on the use of inverse probability (the “posterior”): probability about the model (degree of belief), given the data: P(model|data) ∝ Likelihood(data; model) × Prior(model) (Bayes’ rule). Cox, Principles of Statistical Inference (2006): • “it treats information derived from data (‘likelihood’) as on exactly equal footing with probabilities derived from vague and unspecified sources (‘prior’). The assumption that all aspects of uncertainties are directly comparable is often unacceptable.” • “nothing guarantees that my uncertainty assessment is any good for you - I'm just expressing an opinion (degree of belief). To convince you that it's a good uncertainty assessment, I need to show that the statistical model I created makes good predictions in situations where we know what the truth is, and the process of calibrating predictions against reality is inherently frequentist.” (e.g., MC simulations)

  8. Uniform prior: model of ignorance? Cox, Principles of Statistical Inference (2006). A central problem: specifying a prior distribution for a parameter about which nothing is known → a flat prior. Problems: • Not re-parametrization invariant (metric dependent): uniform in θ is not uniform in z = cosθ. • Favors large values too much [the prior probability for the range 0.1 to 1 is 10 times less than for 1 to 10]. • Flat priors in several dimensions may produce clearly unacceptable answers. In simple problems, appropriate* flat priors yield essentially the same answer as non-Bayesian sampling theory. However, in other situations, particularly those involving more than two parameters, ignorance priors lead to different and entirely unacceptable answers. *(uniform prior for a scalar location parameter, Jeffreys’ prior for a scalar scale parameter)
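The first problem above (a prior uniform in θ is not uniform in z = cosθ) is easy to demonstrate by simulation; a minimal sketch:

```python
import math, random

random.seed(1)
# Sketch: sample theta uniformly in [0, pi] and look at z = cos(theta).
# A prior flat in theta is NOT flat in z: the induced density is
# 1/(pi*sqrt(1-z^2)), which piles up near z = ±1.
N = 100_000
thetas = [random.uniform(0.0, math.pi) for _ in range(N)]
zs = [math.cos(t) for t in thetas]

# occupancy of two equal-width windows in z
frac_centre = sum(1 for z in zs if -0.1 < z < 0.1) / N   # around z = 0
frac_edge   = sum(1 for z in zs if 0.8 < z < 1.0) / N    # near z = 1
# frac_edge comes out roughly three times larger than frac_centre
```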

  9. Uniform Prior in Multidimensional Parameter Space Hypersphere in a 6D space: one knows nothing about the individual Cartesian coordinates x, y, z, … What do we know about the radius r = √(x² + y² + …)? One has achieved the remarkable feat of learning something about the radius of the hypersphere, whereas one knew nothing about the Cartesian coordinates, and without making any experiment.
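The hypersphere effect can be checked by simulation. A sketch assuming independent uniform priors on each of six Cartesian coordinates in [-1, 1]:

```python
import math, random

random.seed(2)
# Sketch: put an independent uniform prior on each of six Cartesian
# coordinates in [-1, 1] and look at the implied distribution of the
# radius r = sqrt(x1^2 + ... + x6^2).
N, D = 50_000, 6
radii = []
for _ in range(N):
    point = [random.uniform(-1.0, 1.0) for _ in range(D)]
    radii.append(math.sqrt(sum(c * c for c in point)))

mean_r = sum(radii) / N
sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in radii) / N)
# the radius is strongly peaked (relative spread well under 25%), even though
# "nothing was known" about any individual coordinate
```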

  10. Improper posterior Isospin Analysis: B→hh J. Charles et al., hep-ph/0607246; Gronau/London (1990). MA: Modulus & Argument parametrization; RI: Real & Imaginary parametrization.

  11. Isospin Analysis: removing information from B0→π0π0 No model-independent constraint on α can be inferred in this case → the information extracted on α is introduced by the priors (where else?)

  12. Conclusion PHYSTAT Conferences: http://www.phystat.org Statistics is not a science, it is mathematics (Nature will not decide for us). [You will not learn it in physics books → go to the professional literature!] • Many attempts have been made to define an ‘ignorance’ prior that would ‘let the data speak by themselves’, but none is convincing: priors are informative. • Quite generally, a prior that gives results that are reasonable from various viewpoints for a single parameter will have unappealing features if applied independently to many parameters. In a multiparameter space, Bayesian credible intervals generally under-cover. • If the problem has some invariance properties, then the prior should have the corresponding structure. • The specification of priors is fraught with pitfalls (especially in high dimensions). • Examine the consequences of your assumptions (metric, priors, etc.) and check for robustness: vary your assumptions. • Exploring the frequentist properties of the result should be strongly encouraged.
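The last recommendation above (explore the frequentist properties of a Bayesian result) can be illustrated with a toy coverage check. The setup below is invented for illustration: binomial data with a uniform prior on the efficiency p, central 68% credible intervals from the Beta posterior, and the coverage estimated at one fixed true value:

```python
import random

random.seed(4)
# Toy coverage check (illustrative setup, not from the talk): binomial data
# with n trials at true efficiency p_true; a uniform prior on p gives a
# Beta(k+1, n-k+1) posterior, from which we take a central 68% credible
# interval and then ask how often it covers p_true in repeated experiments.
n, p_true, trials = 20, 0.3, 2000
grid = [i / 1000 for i in range(1001)]   # p values from 0 to 1

def central_interval(k):
    """Central 68% credible interval from the gridded Beta(k+1, n-k+1) posterior."""
    w = [(p ** k) * ((1 - p) ** (n - k)) if 0 < p < 1 else 0.0 for p in grid]
    total = sum(w)
    lo = hi = None
    acc = 0.0
    for p, wi in zip(grid, w):
        acc += wi / total
        if lo is None and acc >= 0.16:
            lo = p
        if hi is None and acc >= 0.84:
            hi = p
    return lo, hi

intervals = {k: central_interval(k) for k in range(n + 1)}

covered = 0
for _ in range(trials):
    k = sum(random.random() < p_true for _ in range(n))   # one toy experiment
    lo, hi = intervals[k]
    covered += lo <= p_true <= hi
coverage = covered / trials   # frequentist coverage of the Bayesian interval
```

For small n the estimated coverage oscillates around the nominal 68% because the data are discrete; checking it, rather than assuming it, is the point of the exercise.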

  13. α[ππ]: B-factories status (LP07)

  14. Isospin analysis: reminder • Neglecting the EW penguin, the amplitudes of the SU(2)-related B→ππ modes are: √2 A+0 = √2 A(Bu→π+π0) = e^{-iα}(T+- + T00) and √2 Ā+0 = e^{+iα}(T+- + T00); A+- = A(Bd→π+π-) = e^{-iα} T+- + P+- and Ā+- = e^{+iα} T+- + P+-; √2 A00 = √2 A(Bd→π0π0) = e^{-iα} T00 - P+- and √2 Ā00 = e^{+iα} T00 - P+-. • SU(2) triangular relation: A+0 = A+-/√2 + A00 (and likewise for the Ā amplitudes); the angle between the B and B̄ triangles is ΔΦ = 2α (2αeff in the presence of penguins). • The observables constrain the construction: B+0 gives |A+0| = |Ā+0|; B+- and C+- give |A+-| and |Ā+-|; S+- gives sin(2αeff), i.e., a 2-fold αeff in [0, π]; B00 and C00 give |A00| and |Ā00|; S00 gives the relative phase between A00 and Ā00. Closing the SU(2) triangles then yields an 8-fold α. • The same holds for B→ρρ decays dominated by the longitudinally polarized ρ (CP-even final state).

  15. Isospin analysis: reminder • sin(2αeff) from B→(π/ρ)+(π/ρ)-: 2 solutions for αeff in [0, π]. • Δα = α - αeff from the SU(2) B/B̄ triangles: 1, 2 or 4 solutions for Δα, depending on the closure of the triangles: ππ (C00 but no S00): 4-fold Δα; ρρ (no C00/S00): 2-fold Δα; ρρ (C00 and S00): 1-fold Δα (‘plateau’ or peak). • Altogether: 2, 4 or 8 solutions for α = αeff + Δα.

  16. Developments in Bayesian Priors Roger Barlow, Manchester IoP meeting, November 16th 2005

  17. Plan • Probability • Frequentist • Bayesian • Bayes Theorem • Priors • Prior pitfalls (1): Le Diberder • Prior pitfalls (2): Heinrich • Jeffreys’ Prior • Fisher Information • Reference Priors: Demortier

  18. Probability Probability as the limit of a frequency: P(A) = lim N_A/N_total as N_total → ∞. This is the usual definition taught to students. It makes sense and works well most of the time, but not always.
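The limiting-frequency definition can be sketched with a quick simulation (a toy illustration, not from the talk):

```python
import random

random.seed(3)
# Sketch of P(A) = lim N_A / N_total: estimate P(heads) for a fair coin
# at increasing numbers of trials.
estimates = {}
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    estimates[n] = heads / n
# the estimates settle down towards 0.5 as N_total grows
```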

  19. Frequentist probability “It will probably rain tomorrow.” “Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4 GeV, with 68% probability.” “The statement ‘It will rain tomorrow.’ is probably true.” “Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4 GeV, at 68% confidence.”

  20. Bayesian Probability P(A) expresses my belief that A is true. Limits: 0 (impossible) and 1 (certain). Calibrated off clear-cut instances (coins, dice, urns).

  21. Frequentist versus Bayesian? Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.) Rivals? Religious differences? Particle physicists tend to be frequentists; cosmologists tend to be Bayesians. No: they are two different tools for practitioners. It is important to: • be aware of the limits and pitfalls of both; • always be aware which one you are using.

  22. Bayes Theorem (1763) P(A|B) P(B) = P(A and B) = P(B|A) P(A), hence P(A|B) = P(B|A) P(A) / P(B). Frequentist use, e.g., a Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal). Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data).
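With made-up numbers for the beam composition and counter response (these are illustrative, not from the slides), the Čerenkov-counter use of Bayes' theorem reads:

```python
# Bayes' theorem with made-up numbers (the beam fractions and counter
# efficiencies below are purely illustrative, not from the slides):
# a beam of pions and kaons, and a Cherenkov counter that fires ("signal").
p_pi = 0.9                # prior: fraction of pions in the beam
p_k = 0.1                 # prior: fraction of kaons
p_sig_given_pi = 0.95     # counter efficiency for a pion
p_sig_given_k = 0.05      # misfire probability for a kaon

# law of total probability: P(signal)
p_sig = p_sig_given_pi * p_pi + p_sig_given_k * p_k
# Bayes: P(pi | signal) = P(signal | pi) P(pi) / P(signal)
p_pi_given_sig = p_sig_given_pi * p_pi / p_sig
```

A fired counter raises the pion probability above the 90% prior, exactly as the formula on the slide prescribes.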

  23. Bayesian Prior P(theory) is the prior: it expresses prior belief that the theory is true. It can be a function of a parameter: P(Mtop), P(MH), P(α, β, γ). Bayes’ theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?

  24. Uniform Prior General usage: choose P(a) uniform in a (principle of insufficient reason). Often ‘improper’: ∫P(a)da = ∞, though the posterior P(a|x) comes out sensible. BUT! If P(a) is uniform, then P(a²), P(ln a), P(√a), … are not. Insufficient reason is not valid (unless a is ‘most fundamental’, whatever that means). Statisticians handle this by checking results for ‘robustness’ under different priors.
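The robustness check mentioned above can be sketched with an invented one-parameter model (not from the talk): a single Gaussian measurement of a, analysed with priors that are flat in a, in ln(a), and in √a:

```python
import math

# Robustness sketch with an invented model (not from the talk): one Gaussian
# measurement x ~ N(a, 1) with x_obs = 2.0 and a restricted to (0, 5).
# Compare the posterior mean of a under three priors: flat in a,
# flat in ln(a), and flat in sqrt(a).
x_obs = 2.0
step = 0.001
grid = [step * i for i in range(1, 5000)]   # a values in (0, 5)

def likelihood(a):
    return math.exp(-0.5 * (x_obs - a) ** 2)

priors = {
    "flat in a":       lambda a: 1.0,
    "flat in ln(a)":   lambda a: 1.0 / a,              # p(a) da = d(ln a)
    "flat in sqrt(a)": lambda a: 0.5 / math.sqrt(a),   # p(a) da = d(sqrt a)
}

post_means = {}
for name, prior in priors.items():
    w = [likelihood(a) * prior(a) for a in grid]
    norm = sum(w)
    post_means[name] = sum(a * wi for a, wi in zip(grid, w)) / norm
# if the posterior means differ appreciably, the answer is prior-dominated
```

The three priors pull the posterior mean apart by a substantial fraction of the measurement uncertainty, which is precisely why the robustness check matters.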
