A Skeptical Bayesian at Durham Jim Linnemann MSU AMST Workshop, Fermilab June 1, 2002
Skeptical Bayesian remarks • Conference highlights • Some things to work on
What do I want? Statistics is not a science; nature won’t tell me the right procedure • Correction of upper limits for systematics • understand limitations of method • Common practice—recommended in RPP “Statistics” section • method sensible enough to sell—let’s try to agree? • Thanks to PDG, Glen Cowan: at last, some advice on systematics! • Default: convolve the likelihood • Mixes Bayes and frequentist (like Cousins-Highland) • Comparison of experiments • Limit gets worse with worse resolution, background • And same answer if inputs the same! • Connection with known limits • Continuity to simple frequentist cases (b = 0, σ = 0) • coverage—if possible • Continuity of limit to error bar? (‘unified method’) • Not obvious: “5 sigma limit” to “1 sigma central” • Is under-coverage of the Bayes lower limit crucial?
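The "convolve the likelihood" default can be sketched numerically for a counting experiment. A minimal toy, with illustrative numbers (3 observed events, background 0.5, 20% efficiency uncertainty); the function names are mine, not from the talk:

```python
import math

def pois(n, mu):
    """Poisson probability P(n | mu)."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

def smeared_likelihood(s, n_obs=3, b=0.5, eff=1.0, d_eff=0.2, steps=400):
    """Likelihood for signal s, with the efficiency nuisance parameter
    integrated out against a (truncated) Gaussian of width d_eff."""
    total, norm = 0.0, 0.0
    lo, hi = eff - 4.0 * d_eff, eff + 4.0 * d_eff
    for i in range(steps):
        e = lo + (hi - lo) * (i + 0.5) / steps
        if e <= 0.0:
            continue  # truncate: negative efficiency is unphysical
        w = math.exp(-0.5 * ((e - eff) / d_eff) ** 2)
        total += w * pois(n_obs, s * e + b)
        norm += w
    return total / norm
```

A quick sanity check: at s = 0 the efficiency drops out, so the smeared and unsmeared likelihoods agree exactly.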
To Use Bayes or not? • Professional statisticians have become much more Bayes-oriented in the last 20 years • Computationally possible • Philosophically coherent • (solipsistic?? Subjective Bayes…) • In HEP: want to publish result, not prior • We want to talk about P(theory|data) • But this requires a prior: P(theory) • Likelihoods we can agree on! • Conclusions should be insensitive to a range of priors • Probably true, with enough data • Search limits DO depend on priors! • Hard to convince anyone of a single objective prior!!! • Unpleasant properties of naïve frequentist limits, too • Feldman-Cousins is the current consensus • Systematic errors are hard in a frequentist framework • PDG currently recommends Bayes “smearing of the likelihood” • close in spirit to Cousins-Highland mixed frequentist-Bayesian
Why Bayesian? • Nuisance parameters are not strictly beyond frequentist methods’ reach but • Neyman construction in n-dimensions rarely used • Bayes: Natural treatment of systematics! • Unify treatment with statistical errors consistently • “degree of belief” needed for many systematics • Coherent point of view
Bayes Theorem • P(θ|x) = P(x|θ) P(θ) / ∫ P(x|θ′) P(θ′) dθ′ • θ is the unknown parameter • P(x|θ) is the likelihood function (a function of θ): p(data|model), varying the model: NOT a pdf in θ • P(θ|x) = pdf of θ after we observe x; describes posterior knowledge of θ • P(θ) = pdf of θ before we observe x; describes prior knowledge of θ • and what might that be????
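For the Poisson counting case the posterior is easy to evaluate numerically. A sketch of a flat-prior credible upper limit (my own illustrative helper, with known background b; not code from the talk):

```python
import math

def posterior_unnorm(s, n_obs, b):
    """Unnormalized posterior for signal s: flat prior times Poisson likelihood."""
    mu = s + b
    return math.exp(-mu) * mu ** n_obs / math.factorial(n_obs)

def bayes_upper_limit(n_obs, b=0.0, cl=0.90, s_max=50.0, steps=40000):
    """Smallest s_up whose posterior integral from 0 reaches credibility cl."""
    ds = s_max / steps
    grid = [(i + 0.5) * ds for i in range(steps)]
    w = [posterior_unnorm(s, n_obs, b) for s in grid]
    norm = sum(w)
    acc = 0.0
    for s, wi in zip(grid, w):
        acc += wi
        if acc >= cl * norm:
            return s
    return s_max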
Bayesians don’t own the theorem • Theorem in probability • any interpretation of probability following axioms can use it • If prior knowledge in frequencies: • use it to update knowledge of a particular measurement • Entirely within frequentist framework
Why Skeptical? • Faustian bargain • Bayes Theorem: updates your prior beliefs on the signal, not just on systematics (nuisance parameters) • Inserts your beliefs where we prefer only data (publication) • Not so bad if you have enough data • Must consider alternative priors to avoid solipsism • Reasonable priors lead to same conclusions • Not great in the case of upper limits • Not really independent of signal prior assumptions! • No universally accepted “objective” priors • Not even Jeffreys’ metric-independent prior! Bernardo in n dimensions? • “flat” is not special, and is metric-dependent: flat in what variable? Cross section, mass, tan β, ln(tan β)? • flat in mass gives much tighter limit than flat in cross section • cross section prior falls rapidly with mass: pulls toward 0!
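The prior dependence of search limits is easy to demonstrate. In this background-free toy (illustrative numbers, my own function names), the same n = 3 likelihood yields noticeably different 90% upper limits under a flat prior versus a 1/√s (Jeffreys-type) prior:

```python
import math

def upper_limit(n_obs, prior, cl=0.90, s_min=1e-4, s_max=50.0, steps=50000):
    """90% credible upper limit on a background-free Poisson mean s,
    for an arbitrary unnormalized prior pi(s)."""
    ds = (s_max - s_min) / steps
    grid = [s_min + (i + 0.5) * ds for i in range(steps)]
    w = [prior(s) * math.exp(-s) * s ** n_obs for s in grid]
    norm = sum(w)
    acc = 0.0
    for s, wi in zip(grid, w):
        acc += wi
        if acc >= cl * norm:
            return s
    return s_max

flat_limit = upper_limit(3, lambda s: 1.0)                 # flat in s
jeff_limit = upper_limit(3, lambda s: 1.0 / math.sqrt(s))  # Jeffreys-type 1/sqrt(s)
```

The flat-prior result reproduces the familiar 6.68 for n = 3; the 1/√s prior pulls the limit lower, exactly the kind of prior sensitivity the slide warns about.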
Some Facts: Nuisance • Corrections for background uncertainty are small: <15% even for extreme δb/b = 1 • Efficiency/luminosity: < 20% if < 30% resolution • At least quadratic • Bayes corrections larger than Cousins-Highland • Probably larger than necessary • Especially as one approaches discovery • Lognormal, beta, gamma agree to 5% (all have P(0) = 0) • With a flat prior, don’t use a truncated Gaussian (P(0) ≠ 0)
Some Facts: Signal • Bayes flat signal prior over-covers upper Poisson limits • But under-covers lower limits • Nobody’s real prior (but probably doesn’t matter!) • 1/√(s+b) (Jeffreys’ form) • Average coverage OK, but can under-cover • Really nobody’s real prior for signal (a function of b!) • More complex than flat (s = σ·L·eff) • Cheat and insert estimates, do δL, δeff separately? • PROBLEM: • Differences between signal priors are of the order of the efficiency corrections that motivated going Bayesian! • Because we don’t have much data
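The over-coverage of the flat-prior upper limit can be checked exactly in the background-free case, where the flat-prior credible limit coincides with the classical Poisson limit. A toy sketch (helper names are mine):

```python
import math

def pois(n, mu):
    return math.exp(-mu) * mu ** n / math.factorial(n)

def flat_prior_upper_limit(n_obs, cl=0.90):
    """Flat-prior Bayes upper limit for a background-free Poisson count:
    solves sum_{k<=n} pois(k, s_up) = 1 - cl, by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        tail = sum(pois(k, mid) for k in range(n_obs + 1))
        if tail > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_coverage(s_true, cl=0.90, n_max=80):
    """Exact probability that the reported limit covers the true s."""
    return sum(pois(n, s_true) for n in range(n_max)
               if flat_prior_upper_limit(n, cl) >= s_true)
```

At, say, s_true = 3 the exact coverage is about 0.95 for a nominal 90% limit: over-coverage, as claimed.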
Other, smaller, worries • HPD (Highest Posterior Density) limits not independent of parameterization (metric): • Stand-in for central limits, for example • P(θ|x) and P(θ²|x) don’t have equal height at equivalent points • Ideology… • “if your experiment is inconclusive, ask more experts to sharpen the prior” (!) • “the creativity is in formulating the prior” • But the result had better be independent of the prior • unless it is expressing a constraint that you’re sure of! • A pain to waste time on such debates
Bayes at Durham (Michael Goldstein) • Vigorously subjective Bayesian • But not abusive, thank goodness! • “Sensitivity analysis is at the heart of scientific Bayesianism” • How skeptical would the community as a whole have to be in order not to be convinced? • What prior gives P(hypothesis) > 0.5? • What prior gives P(hypothesis) > 0.99, etc.? • A modest proposal: • Many big groups have phenomenologists (+ videographers!) • get a statistician as a collaborator • as is common in clinical trials
What to work on now? Education • Look at tutorials from Durham • E.g. Barlow on systematics… • Absorbing experience from LEP • Combining data (Parke) • CLs? (Read)—should at least understand CLs = PV(s+b)/PV(b) (Cowan, PDG Stats); PV = P-value = prob(obs or more extreme), like P(χ²) • interpolating MC samples (Kjaer) • Understand unfolding (Cowan, Blobel) • Quite important for combining data • And pdf fitting! (many talks at Durham)—make it easier, not harder Research • Blind Analyses • Goodness of Fit Tests • χ² is seldom the best frequentist test • Not much available in a Bayesian context—prefer comparison of models • Look at Yabsley’s talk from Belle on problems in B oscillations! • And at Smith and Tovey’s, on dark matter searches: other problems
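The CLs ratio quoted above is simple for a counting experiment, where PV(μ) = P(n ≤ n_obs | μ). A sketch with illustrative numbers (function names mine):

```python
import math

def p_value(n_obs, mu):
    """PV = P(n <= n_obs | mu): chance of an outcome at least this signal-poor."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k)
               for k in range(n_obs + 1))

def cl_s(n_obs, s, b):
    """CLs = PV(s+b) / PV(b).  Dividing by PV(b) protects against excluding
    signals to which the experiment has no real sensitivity."""
    return p_value(n_obs, s + b) / p_value(n_obs, b)
```

With n = 0 and no background, CLs = e^(−s), so the 95% CLs limit is s ≈ 3.0; since PV(b) ≤ 1, CLs is always at least as large as the plain s+b p-value, making CLs limits conservative.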
Belle questions • Unified limits (for rare decays)? • Feldman-Cousins argues for unified limits • Same probability content whether search or measurement • How important to go smoothly from limit to error limits? • the concern is undercoverage of your stated limits • Doesn’t make sense to me! >> .999 for discovery, .68 for measurement! • How to combine statistical and systematic errors • Add linearly or in quadrature • How to deal with analysis? • Interesting hypotheses of different dimensions: • Circle (physical region?), line, point • And data point outside any of them!
Multidimensional Methods • Aspire to full extraction of information (“Bayes discriminant”) • Equivalent to trying to fit (little or nothing Bayesian): P(s|x)/P(b|x) = (P(s)/P(b)) × [P(x|s)/P(x|b)] [= Neyman-Pearson test] unless you know something of P(s)/P(b) vs. x that MC doesn’t! • Practical: multiple backgrounds, may want to fit separately • Less complexity to fit individual shapes than a sum • Issues in • choice of dimensionality (no one tells you how many!) • Almost always dimensionality < {kinematics + all ID variables} • No easy way to tell when you have “all important variables!” (“the limit”) • methods of approximation • control of bias/variance tradeoff • complexity of fit method • Number of free parameters of the method itself • Amount of training data needed • “ease” of interpretation • We are following the field; hope for theory to help • An excellent book: The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
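The discriminant identity above, in a toy one-dimensional problem with unit-width Gaussian signal and background densities (all names and numbers are mine, purely illustrative):

```python
import math

def gauss(x, mu, sigma=1.0):
    """Gaussian pdf."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(x, mu_s=1.0, mu_b=-1.0):
    """Neyman-Pearson test statistic P(x|s) / P(x|b)."""
    return gauss(x, mu_s) / gauss(x, mu_b)

def bayes_discriminant(x, prior_ratio=1.0):
    """P(s|x)/P(b|x) = (P(s)/P(b)) * [P(x|s)/P(x|b)]: the only 'Bayesian'
    ingredient is the prior class ratio P(s)/P(b)."""
    return prior_ratio * likelihood_ratio(x)
```

For equal-width Gaussians the ratio is exp(2x) here, i.e. monotone in x, so cutting on the discriminant is equivalent to cutting on x itself; the prior ratio only rescales it.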
R.K. Bock Interesting idea: Expansion in dimensionality of correlation
Roger Barlow Calculated the σ to use for comparison checks…
Aslan/Zech Goodness of Fit “Energy Test” (electrostatics motivated)
“liberating” Paul Harrison Blind Analysis Cousins: it takes longer, especially first time By the way, no one can read light green print…
Blind Analysis • A called shot • Step towards making 3σ mean 3σ • Many ways to blind • 10% of data; background-only; obscure fit result • Creates a mindset • Avoiding biases and subjectivity
Glen Cowan Unfolding (unsmearing) • Inherently unstable: the measured distribution is smoother than the true one • Un-smoothing enhances noise! • Nice discussion of regularization, biases, uncertainties • See talk and his statistics book • One program: must balance between oscillations and an over-smoothed result = bias-variance tradeoff • Same issues in multidimensional methods
V. Blobel Unfolding: Insight • View as a matrix problem: “ill-posed” = singular • Analyze in terms of eigenvalues/eigenvectors and the condition number • Statistical error: truncate eigenfunctions when below the error bound
Blobel Unfolding Results • Oversmoothed: statistical errors in neighboring bins no longer uncorrelated! • High frequencies not measured: report fewer bins? (or supply from prior??) • Higher modes (noisy) converge slowly: iterate only a few times (d’Agostini)
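The noise amplification behind these slides shows up already in a two-bin toy (illustrative numbers, mine): a response matrix with eigenvalues 1.0 and 0.2 (condition number 5) amplifies a one-event fluctuation five-fold on exact inversion.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Strongly smearing 2-bin response matrix: eigenvalues 1.0 and 0.2.
R = [[0.6, 0.4],
     [0.4, 0.6]]
R_inv = [[3.0, -2.0],
         [-2.0, 3.0]]  # exact inverse of R

true = [10.0, 0.0]
measured = matvec(R, true)                      # smeared: [6.0, 4.0]
noisy = [measured[0] + 1.0, measured[1] - 1.0]  # one-event fluctuation

exact = matvec(R_inv, measured)  # recovers [10.0, 0.0]
shaky = matvec(R_inv, noisy)     # [15.0, -5.0]: unit noise amplified 5x
```

Truncating or damping the small-eigenvalue mode, as in Blobel's prescription, trades this amplification for a controlled bias: the bias-variance tradeoff again.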
Cousins’ Last Words (for now!)from conference summary • The area under the likelihood function is meaningless. • Mode of a probability density is metric-dependent, as are shortest intervals. • A confidence interval is a statement about P(data | parameters), not P(parameters | data) • Don’t confuse confidence intervals (statements about parameter) with goodness of fit (statement about model itself). • P(non-SM physics | data) requires a prior; you won’t get it from frequentist statistics. • The argument for coherence of Bayesian P is based on P = subjective degree of belief.