Statistics for SUSY/Exotics


Statistics for SUSY/Exotics
ATLAS UK SUSY/Exotics, Cambridge, 1 May 2012
Glen Cowan
Physics Department, Royal Holloway, University of London
g.cowan@rhul.ac.uk
www.pp.rhul.ac.uk/~cowan
ATLAS UK Cambridge 1 May 2012 / Statistics for SUSY/Exotics

2. Outline
Some developments for Bayesian methods: Bayesian limits, reference priors, Bayes factors for discovery.
Discovery sensitivity: beyond s/√b.
Some comments on unfolding.

3. Quick review of probability

4. The Bayesian approach to limits
In Bayesian statistics one needs to start with a 'prior pdf' π(θ). This reflects the degree of belief about θ before doing the experiment. Bayes' theorem tells how our beliefs should be updated in light of the data x:
$$p(\theta|x) = \frac{L(x|\theta)\,\pi(\theta)}{\int L(x|\theta')\,\pi(\theta')\,d\theta'} \propto L(x|\theta)\,\pi(\theta).$$
Integrate the posterior pdf p(θ|x) to give an interval with any desired probability content. For, e.g., n ~ Poisson(s + b), the 95% CL upper limit s_up on s is found from $0.95 = \int_{-\infty}^{s_{\rm up}} p(s|n)\,ds$.

5. Bayesian prior for Poisson parameter
Include the knowledge that s ≥ 0 by setting the prior π(s) = 0 for s < 0. One could try to reflect 'prior ignorance' with, e.g., a flat prior: π(s) = 1 for s ≥ 0, and 0 otherwise. This prior is not normalized, but that is OK as long as L(s) dies off for large s. It is not invariant under a change of parameter. If we had instead used a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events. It doesn't really reflect a reasonable degree of belief. It is nevertheless often used as a point of reference, or viewed as a recipe for producing an interval whose frequentist properties can be studied (coverage will depend on the true s).

6. Bayesian interval with flat prior for s
Solve $1 - \mathrm{CL} = \int_{s_{\rm up}}^{\infty} p(s|n)\,ds$ numerically to find the limit s_up. For the special case b = 0, the Bayesian upper limit with a flat prior is numerically the same as the one-sided frequentist limit ('coincidence'). Otherwise the Bayesian limit is everywhere greater than the one-sided frequentist limit, and here (Poisson problem) it coincides with the CLs limit. It never goes negative, and it doesn't depend on b if n = 0.
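With the flat prior, the posterior is p(s|n) ∝ (s + b)^n e^{-(s+b)} for s ≥ 0. Integrating the tail gives 1 − CL = P(N ≤ n | s_up + b) / P(N ≤ n | b), which is the CLs ratio; this is why the two limits coincide. The sketch below solves that equation by bisection using only the standard library. The function names are illustrative.

```python
import math

def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for N ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for m in range(1, n + 1):
        term *= mu / m
        total += term
    return total

def bayes_upper_limit_flat(n: int, b: float, cl: float = 0.95) -> float:
    """Upper limit on s for n ~ Poisson(s + b), flat prior for s >= 0.

    Solves P(N <= n | s_up + b) / P(N <= n | b) = 1 - cl by bisection.
    """
    alpha = 1.0 - cl
    denom = poisson_cdf(n, b)

    def f(s: float) -> float:
        # posterior tail probability above s, minus the target alpha
        return poisson_cdf(n, s + b) / denom - alpha

    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:      # f decreases with s; expand until it changes sign
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(bayes_upper_limit_flat(0, 0.0), bayes_upper_limit_flat(0, 5.0))  # both -ln(0.05)
print(bayes_upper_limit_flat(1, 0.0))
```

For n = 0 the ratio reduces to e^{-s_up}, so the limit is −ln 0.05 ≈ 3.00 for any b. This reproduces the statement that the limit does not depend on b when n = 0.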

7. Priors from formal rules
Because of the difficulties in encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements. These are often called "objective priors" and form the basis of Objective Bayesian Statistics. The priors do not reflect a degree of belief (but might represent possible extreme cases). In Objective Bayesian analysis, one can use the intervals in a frequentist way, i.e., regard Bayes' theorem as a recipe to produce an interval with certain coverage properties.

8. Priors from formal rules (cont.)
Formal priors have not been widely used in HEP, but there is recent interest in this direction, especially in the reference priors of Bernardo and Berger; see, e.g.,
L. Demortier, S. Jain and H. Prosper, Reference priors for high energy physics, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111.
D. Casadei, Reference analysis of the signal + background model in counting experiments, JINST 7 (2012) 01012, arXiv:1108.4270.
Casadei's approach: describe the nuisance parameters with informative gamma priors and marginalize; from this, find the Jeffreys' prior for the (single) parameter of interest.

9. Jeffreys' prior
According to Jeffreys' rule, take the prior according to
$$\pi(\boldsymbol\theta) \propto \sqrt{\det I(\boldsymbol\theta)}, \qquad I_{ij}(\boldsymbol\theta) = -E\left[\frac{\partial^2 \ln L(\mathbf{x}|\boldsymbol\theta)}{\partial\theta_i\,\partial\theta_j}\right],$$
where I(θ) is the Fisher information matrix. One can show that this leads to inference that is invariant under a transformation of parameters. For a Gaussian mean, the Jeffreys' prior is constant; for a Poisson mean μ it is proportional to 1/√μ.

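10. Jeffreys' prior for Poisson mean
Suppose n ~ Poisson(μ). To find the Jeffreys' prior for μ:
$$\ln P(n|\mu) = n\ln\mu - \mu - \ln n!, \qquad \frac{\partial^2 \ln P}{\partial\mu^2} = -\frac{n}{\mu^2},$$
$$I(\mu) = -E\left[\frac{\partial^2 \ln P}{\partial\mu^2}\right] = \frac{E[n]}{\mu^2} = \frac{1}{\mu}, \qquad \pi(\mu) \propto \sqrt{I(\mu)} = \frac{1}{\sqrt{\mu}}.$$
So, e.g., for μ = s + b this means the prior π(s) ∝ 1/√(s + b), which depends on b. Note this is not designed as a degree of belief about s.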
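The derivation above can be checked numerically. The Fisher information equals the expected squared score, E[(∂ ln P/∂μ)²] = Σ_n P(n|μ)(n/μ − 1)², under the usual regularity conditions. The sketch below evaluates that sum with a truncation point chosen by hand, so the tail is negligible. The function names are illustrative.

```python
import math

def fisher_information_poisson(mu, n_max=None):
    """I(mu) = sum_n P(n|mu) * (n/mu - 1)^2, truncated at n_max."""
    if n_max is None:
        n_max = int(mu + 20.0 * math.sqrt(mu) + 20)   # tail beyond this is negligible
    total = 0.0
    for n in range(n_max + 1):
        p = math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))  # Poisson P(n|mu)
        total += p * (n / mu - 1.0) ** 2                          # squared score
    return total

def jeffreys_prior_unnormalized(mu):
    """Jeffreys' prior sqrt(I(mu)), up to a constant factor."""
    return math.sqrt(fisher_information_poisson(mu))

for mu in (0.5, 2.0, 10.0):
    print(mu, fisher_information_poisson(mu), 1.0 / mu)   # the two columns should agree
```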

11. Bayesian limit on s with uncertainty on b
The uncertainty on b goes into the prior, e.g., $\pi(s, b) = \pi_s(s)\,\pi_b(b)$, where π_b(b) encodes what is known about b. Put this into Bayes' theorem, $p(s, b|n) \propto P(n|s, b)\,\pi_s(s)\,\pi_b(b)$, and marginalize over the nuisance parameter b, $p(s|n) = \int p(s, b|n)\,db$. Then use p(s|n) to find intervals for s with any desired probability content.
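For two parameters, the marginalization can be done by brute force on a grid, without MCMC. The sketch below makes two assumptions for illustration: π_s(s) is flat for s ≥ 0, and π_b(b) is a Gaussian with hypothetical mean `b_meas` and width `sigma_b`, truncated at b ≥ 0. The function name and grid settings are likewise hypothetical.

```python
import math

def marginal_upper_limit(n, b_meas, sigma_b, cl=0.95, s_max=50.0, ns=2000, nb=200):
    """Upper limit on s from p(s|n) = integral of p(s, b|n) db, on a grid.

    Assumed priors: flat in s >= 0; Gaussian(b_meas, sigma_b) in b, truncated at b >= 0.
    """
    log_nfact = math.lgamma(n + 1)
    ds = s_max / ns
    b_lo, b_hi = max(0.0, b_meas - 5.0 * sigma_b), b_meas + 5.0 * sigma_b
    db = (b_hi - b_lo) / nb
    b_grid = [b_lo + (j + 0.5) * db for j in range(nb)]          # bin centres, all > 0
    prior_b = [math.exp(-0.5 * ((b - b_meas) / sigma_b) ** 2) for b in b_grid]

    def like(mu):
        # Poisson probability P(n | mu)
        return math.exp(n * math.log(mu) - mu - log_nfact)

    # unnormalized marginal posterior p(s|n) at the centre of each s bin
    post = [sum(pb * like((i + 0.5) * ds + b) for b, pb in zip(b_grid, prior_b))
            for i in range(ns)]
    target, cum = cl * sum(post), 0.0
    for i, p in enumerate(post):
        cum += p
        if cum >= target:
            return (i + 1) * ds          # upper edge of the bin where cl is reached
    return s_max

print(marginal_upper_limit(0, 3.0, 1.0), marginal_upper_limit(5, 3.0, 1.0))
```

For n = 0 the likelihood factorizes as e^{-s} e^{-b}. The b integral then only changes the normalization, and the limit stays near 3.0 whatever the prior for b.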

12. Digression: marginalization with MCMC
Bayesian computations involve integrals like $p(\theta_0|x) = \int p(\theta_0, \theta_1|x)\,d\theta_1$. These are often of high dimensionality and impossible in closed form, and also impossible with 'normal' acceptance-rejection Monte Carlo. Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. MCMC (e.g., the Metropolis-Hastings algorithm) generates a correlated sequence of random numbers. It therefore cannot be used for many applications (e.g., detector MC), and the effective statistical error is greater than if all values were independent. Basic idea: sample the full multidimensional parameter space; look, e.g., only at the distribution of the parameters of interest.

13. Example: posterior pdf from MCMC
Sample the posterior pdf from the previous example with MCMC. Summarize the pdf of the parameter of interest with, e.g., mean, median, standard deviation, etc. Although the numerical values of the answer here are the same as in the frequentist case, the interpretation is different (sometimes unimportant?).

14. Bayesian model selection ('discovery')
The probability of hypothesis H0 (e.g., no Higgs) relative to an alternative H1 (e.g., Higgs) is often given by the posterior odds:
$$\frac{P(H_0|\mathbf{x})}{P(H_1|\mathbf{x})} = B_{01}\,\frac{\pi_0}{\pi_1}, \qquad B_{01} = \frac{P(\mathbf{x}|H_0)}{P(\mathbf{x}|H_1)},$$
i.e., posterior odds = Bayes factor B01 × prior odds π0/π1. The Bayes factor is regarded as measuring the weight of evidence of the data in support of H0 over H1. B10 = 1/B01 is used interchangeably.

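15. Rewriting the Bayes factor
Suppose we have models H_i, i = 0, 1, ..., each with a likelihood $P(\mathbf{x}|\boldsymbol\theta_i, H_i)$ and a prior pdf for its internal parameters $\pi(\boldsymbol\theta_i|H_i)$. The full prior is then $\pi(\boldsymbol\theta_i, H_i) = \pi_i\,\pi(\boldsymbol\theta_i|H_i)$, where π_i = P(H_i) is the overall prior probability for H_i. The Bayes factor comparing H_i and H_j can be written
$$B_{ij} = \frac{P(H_i|\mathbf{x})}{P(H_j|\mathbf{x})} \bigg/ \frac{\pi_i}{\pi_j}.$$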

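16. Bayes factor is independent of P(H_i)
For B_ij we need the posterior probabilities marginalized over all of the internal parameters of the models, $P(H_i|\mathbf{x}) = \int P(H_i, \boldsymbol\theta_i|\mathbf{x})\,d\boldsymbol\theta_i$. Bayes' theorem gives
$$P(H_i, \boldsymbol\theta_i|\mathbf{x}) = \frac{P(\mathbf{x}|\boldsymbol\theta_i, H_i)\,\pi_i\,\pi(\boldsymbol\theta_i|H_i)}{P(\mathbf{x})}.$$
Therefore the Bayes factor is the ratio of marginal likelihoods,
$$B_{ij} = \frac{\int P(\mathbf{x}|\boldsymbol\theta_i, H_i)\,\pi(\boldsymbol\theta_i|H_i)\,d\boldsymbol\theta_i}{\int P(\mathbf{x}|\boldsymbol\theta_j, H_j)\,\pi(\boldsymbol\theta_j|H_j)\,d\boldsymbol\theta_j}.$$
The prior probabilities π_i = P(H_i) cancel.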

17. Numerical determination of Bayes factors
Both the numerator and the denominator of B_ij are of the form 'marginal likelihood', $m = \int P(\mathbf{x}|\boldsymbol\theta)\,\pi(\boldsymbol\theta)\,d\boldsymbol\theta$. There are various ways to compute these, e.g., using sampling of the posterior pdf (which we can do with MCMC): harmonic mean (and improvements), importance sampling, parallel tempering (~thermodynamic integration), nested sampling (MultiNest), ...
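The following sketch checks a numerical marginal likelihood against an exact one for a counting experiment, n ~ Poisson(s + b) with known b. H0 has s = 0. For H1 we assume, purely for illustration, a uniform prior s ~ Uniform(0, s_max). Then ∫₀^{s_max} P(n|s+b) ds = F(n; b) − F(n; b + s_max), where F is the Poisson CDF. The numerical estimate averages the likelihood over draws from the prior. That is the simplest importance-sampling estimator, with the prior as the sampling density, and not one of the refined methods listed above. The names and the prior are hypothetical.

```python
import math
import random

def poisson_pmf(n, mu):
    """P(n | mu) for mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def poisson_cdf(n, mu):
    """P(N <= n | mu)."""
    return sum(poisson_pmf(k, mu) for k in range(n + 1))

def bayes_factor_exact(n, b, s_max):
    """B10 for H1: s ~ Uniform(0, s_max) versus H0: s = 0, with n ~ Poisson(s + b)."""
    marginal_h1 = (poisson_cdf(n, b) - poisson_cdf(n, b + s_max)) / s_max
    return marginal_h1 / poisson_pmf(n, b)   # H0 has no free parameter

def bayes_factor_prior_sampling(n, b, s_max, n_samples=200_000, seed=1):
    """Same B10, with the H1 marginal likelihood estimated as the mean of
    P(n | s + b) over draws of s from its prior."""
    rng = random.Random(seed)
    total = sum(poisson_pmf(n, b + rng.uniform(0.0, s_max)) for _ in range(n_samples))
    return (total / n_samples) / poisson_pmf(n, b)

print(bayes_factor_exact(10, 3.0, 20.0), bayes_factor_prior_sampling(10, 3.0, 20.0))
print(bayes_factor_exact(3, 3.0, 20.0))   # no excess: B10 < 1
```

The result depends on the assumed s_max. A wider prior spreads the H1 probability more thinly and lowers B10.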

18. Assessing Bayes factors
One can use the Bayes factor much like a p-value (or Z value). There is an "established" scale, analogous to HEP's 5σ rule:
B10           Evidence against H0
1 to 3        Not worth more than a bare mention
3 to 20       Positive
20 to 150     Strong
> 150         Very strong
Kass and Raftery, Bayes Factors, J. Am. Stat. Assoc. 90 (1995) 773.
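The scale can be encoded as a small lookup. Two choices here are assumptions not fixed by the table. A value on a boundary goes to the higher category. For B10 < 1 the function simply reports that H0 is favoured, since the same scale would then be applied to B01 = 1/B10.

```python
def kass_raftery(b10: float) -> str:
    """Verbal category for the evidence against H0 on the Kass-Raftery scale."""
    if b10 < 1.0:
        return "Evidence favours H0 (apply the scale to B01 = 1/B10)"
    if b10 < 3.0:
        return "Not worth more than a bare mention"
    if b10 < 20.0:
        return "Positive"
    if b10 < 150.0:
        return "Strong"
    return "Very strong"

for value in (0.5, 2.0, 10.0, 61.5, 400.0):
    print(value, kass_raftery(value))
```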

19. The Look-Elsewhere Effect (Gross and Vitells, EPJC 70:525-530, 2010, arXiv:1005.1891)
Suppose a model for a mass distribution allows for a peak at a mass m with amplitude μ. The data show a bump at a mass m0. How consistent is this with the no-bump (μ = 0) hypothesis?

20. p-value for fixed mass
First, suppose the mass m0 of the peak was specified a priori. Test the consistency of the bump with the no-signal (μ = 0) hypothesis with, e.g., the likelihood ratio
$$t_{\rm fix} = -2\ln\frac{L(0)}{L(\hat\mu)},$$
where "fix" indicates that the mass of the peak is fixed to m0. The resulting p-value gives the probability to find a value of t_fix at least as great as observed at the specific mass m0.

21. p-value for floating mass
But suppose we did not know where in the distribution to expect a peak. What we want is the probability to find a peak at least as significant as the one observed anywhere in the distribution. Include the mass as an adjustable parameter in the fit, and test the significance of the peak using
$$t_{\rm float} = -2\ln\frac{L(0)}{L(\hat\mu, \hat m)}.$$
(Note m does not appear in the μ = 0 model.)

22. Distributions of t_fix, t_float (Gross and Vitells)
For a sufficiently large data sample, t_fix ~ chi-square for 1 degree of freedom (Wilks' theorem). For t_float there are two adjustable parameters, μ and m, and naively Wilks' theorem says t_float ~ chi-square for 2 d.o.f. In fact Wilks' theorem does not hold in the floating-mass case, because one of the parameters (m) is not defined in the μ = 0 model. So getting the t_float distribution is more difficult.

23. Approximate correction for LEE (Gross and Vitells)
We would like to be able to relate the p-values for the fixed- and floating-mass analyses (at least approximately). Gross and Vitells show that the p-values are approximately related by
$$p_{\rm float} \approx p_{\rm fix} + \langle N(c)\rangle .$$
Here ⟨N(c)⟩ is the mean number of "upcrossings" of −2 ln λ(0) in the fit range above a threshold $c = t_{\rm fix} = Z_{\rm fix}^2$, and Z_fix is the significance for the fixed-mass case. So we can either carry out the full floating-mass analysis (e.g., use MC to get the p-value), or do the fixed-mass analysis and apply a correction factor (much faster than MC).

24. Upcrossings of −2 ln λ (Gross and Vitells)
The Gross-Vitells formula for the trials factor requires ⟨N(c)⟩, the mean number of "upcrossings" of −2 ln λ(0) in the fit range above a threshold c = t_fix = Z_fix². ⟨N(c)⟩ can be estimated from MC (or the real data) using a much lower threshold c0:
$$\langle N(c)\rangle \approx \langle N(c_0)\rangle\, e^{-(c - c_0)/2}.$$
In this way ⟨N(c)⟩ can be estimated without the need for large MC samples, even if the threshold c is quite high.
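The sketch below puts slides 23 and 24 together. It counts upcrossings of a low threshold c0 in scans of −2 ln λ(0) versus mass, averages the counts over background-only toys, and extrapolates to c = Z_fix². It then adds the result to p_fix. The toy scans are invented numbers for illustration; a real estimate would use many toys or the data. The sketch also assumes a one-sided p_fix = 1 − Φ(Z_fix). The function names are hypothetical.

```python
import math
from statistics import NormalDist

def p_fix_from_z(z_fix):
    """One-sided local p-value, 1 - Phi(Z_fix)."""
    return 0.5 * math.erfc(z_fix / math.sqrt(2.0))

def count_upcrossings(scan, c0):
    """Number of times a scanned curve crosses the level c0 from below."""
    return sum(1 for a, b in zip(scan, scan[1:]) if a < c0 <= b)

def p_float_approx(z_fix, mean_n_c0, c0):
    """Gross-Vitells: p_float ~ p_fix + <N(c0)> exp(-(c - c0)/2), with c = Z_fix^2."""
    c = z_fix ** 2
    return p_fix_from_z(z_fix) + mean_n_c0 * math.exp(-(c - c0) / 2.0)

# Invented scans of -2 ln lambda(0) versus mass from background-only toys
toy_scans = [[0.1, 0.8, 1.5, 0.3, 0.2, 1.2, 0.4],
             [0.5, 0.2, 1.1, 0.9, 0.1, 0.0, 0.6]]
c0 = 1.0
mean_n_c0 = sum(count_upcrossings(t, c0) for t in toy_scans) / len(toy_scans)

p_glob = p_float_approx(4.0, mean_n_c0, c0)
z_glob = -NormalDist().inv_cdf(p_glob)   # global significance
print(mean_n_c0, p_fix_from_z(4.0), p_glob, z_glob)
```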

25. Multidimensional look-elsewhere effect (Vitells and Gross, Astropart. Phys. 35 (2011) 230-234; arXiv:1105.4355)
Generalization to multiple dimensions: the number of upcrossings is replaced by the expectation of the Euler characteristic of the excursion set above the threshold. Applications: astrophysics (coordinates on the sky), search for a resonance of unknown mass and width, ...

26. Summary on Look-Elsewhere Effect
Remember that the Look-Elsewhere Effect arises when we test a single model (e.g., the SM) with multiple observations, i.e., in multiple places. Note that there is no look-elsewhere effect when considering exclusion limits: there we test specific signal models (typically once) and say whether each is excluded. With exclusion there is, however, the analogous issue of testing many signal models (or parameter values) and thus excluding some even in the absence of signal ("spurious exclusion"). An approximate correction for the LEE should be sufficient, and one should also report the uncorrected significance.
"There's no sense in being precise when you don't even know what you're talking about." (John von Neumann)

27. Discovery significance for n ~ Poisson(s + b)
Consider the frequently seen case where we observe n events, modelled as following a Poisson distribution with mean s + b (assume b is known).
For an observed n, what is the significance Z0 with which we would reject the s = 0 hypothesis?
What is the expected (or, more precisely, median) Z0 if the true value of the signal rate is s?

28. Gaussian approximation for Poisson significance
For large s + b, n → x ~ Gaussian(μ, σ), with μ = s + b and σ = √(s + b). For an observed value x_obs, the p-value of s = 0 is Prob(x > x_obs | s = 0):
$$p_0 = 1 - \Phi\left(\frac{x_{\rm obs} - b}{\sqrt{b}}\right).$$
The significance for rejecting s = 0 is therefore
$$Z_0 = \Phi^{-1}(1 - p_0) = \frac{x_{\rm obs} - b}{\sqrt{b}}.$$
The expected (median) significance assuming signal rate s is
$$\text{median}[Z_0|s] = \frac{s}{\sqrt{b}}.$$

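29. Better approximation for Poisson significance
The likelihood function for the parameter s is
$$L(s) = \frac{(s+b)^n}{n!}\,e^{-(s+b)},$$
or equivalently the log-likelihood is
$$\ln L(s) = n\ln(s+b) - (s+b) - \ln n!.$$
Find the maximum by setting
$$\frac{\partial \ln L}{\partial s} = \frac{n}{s+b} - 1 = 0,$$
which gives the estimator for s: $\hat s = n - b$.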

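30. Approximate Poisson significance (continued)
The likelihood ratio statistic for testing s = 0 is
$$q_0 = -2\ln\frac{L(0)}{L(\hat s)} = 2\left(n\ln\frac{n}{b} + b - n\right) \quad (\hat s > 0), \qquad q_0 = 0 \text{ otherwise}.$$
For sufficiently large s + b (using Wilks' theorem),
$$Z_0 \approx \sqrt{q_0} = \sqrt{2\left(n\ln\frac{n}{b} + b - n\right)}.$$
To find median[Z0|s+b], let n → s + b (i.e., the Asimov data set):
$$\text{median}[Z_0|s+b] \approx \sqrt{2\left((s+b)\ln\left(1 + \frac{s}{b}\right) - s\right)}.$$
This reduces to s/√b for s ≪ b.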
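The three formulas from slides 28 to 30 can be compared directly. The sketch codes the Gaussian approximation (n − b)/√b, the observed √q0, and the Asimov median significance obtained by setting n → s + b. The function names are illustrative.

```python
import math

def z_gauss(n, b):
    """Gaussian approximation: Z0 = (n - b) / sqrt(b)."""
    return (n - b) / math.sqrt(b)

def z0_observed(n, b):
    """Z0 = sqrt(q0), with q0 = 2 (n ln(n/b) + b - n) for n > b, else 0."""
    if n <= b:
        return 0.0
    return math.sqrt(2.0 * (n * math.log(n / b) + b - n))

def z_asimov(s, b):
    """Median Z0 for signal rate s: z0_observed evaluated at n = s + b."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

for s, b in [(5.0, 100.0), (5.0, 1.0), (10.0, 0.5)]:
    print(f"s={s}, b={b}: s/sqrt(b)={s / math.sqrt(b):.3f}, Z_A={z_asimov(s, b):.3f}")
```

For s ≪ b the two agree, e.g. Z_A ≈ 0.496 against s/√b = 0.5 for s = 5, b = 100. When s is not small compared with b, s/√b overstates the sensitivity.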

31. n ~ Poisson(μs + b): median significance, assuming μ = 1, of the hypothesis μ = 0 (CCGV, arXiv:1007.1727)
"Exact" values from MC; the jumps are due to the discrete data. The Asimov √q0,A is a good approximation over a broad range of s, b. s/√b is only good for s ≪ b.

32. Some comments on unfolding
Given an observed distribution subject to statistical fluctuations and event migration between bins, try to estimate the "true" underlying distribution.

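33. The (naive) Maximum Likelihood solution
With the response matrix R and background β, the expected observed counts are $\nu = R\mu + \beta$. Maximizing the likelihood, $\ln L(\mu) = \sum_i (n_i \ln\nu_i - \nu_i)$, gives $\hat\nu = n$, i.e.,
$$\hat\mu = R^{-1}(n - \beta).$$
Catastrophic failure???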
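The sketch below shows why the naive solution fails. It computes μ̂ = R⁻¹(n − β) and propagates the Poisson covariance of n, taken here as V = diag(n) (an estimate), to U = R⁻¹ V (R⁻¹)ᵀ. The 2×2 response matrix is a hypothetical example with strong migration between bins. The function name is illustrative.

```python
import numpy as np

def ml_unfold(n, R, beta):
    """Naive ML unfolding for Poisson data with nu = R mu + beta.

    Returns mu_hat = R^{-1}(n - beta) and its covariance U = R^{-1} V R^{-1 T},
    using V = diag(n) as an estimate of the covariance of n.
    """
    n = np.asarray(n, dtype=float)
    R_inv = np.linalg.inv(R)
    mu_hat = R_inv @ (n - np.asarray(beta, dtype=float))
    U = R_inv @ np.diag(n) @ R_inv.T
    return mu_hat, U

# Hypothetical response: 40% of events migrate to the other bin
R = np.array([[0.6, 0.4],
              [0.4, 0.6]])
mu_hat, U = ml_unfold([100.0, 100.0], R, [0.0, 0.0])
print(mu_hat)
print(U)
```

Here the raw counts have variance 100 per bin, but each unfolded bin has variance 1300 and the two bins are strongly anticorrelated. Migration inflates the errors in exactly this way.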

34. Regularized unfolding

35. Stat. and sys. errors of unfolded solution
In general the statistical covariance matrix of the unfolded estimators is not diagonal; one needs to report the full $U^{\rm stat}_{ij} = \mathrm{cov}[\hat\mu_i, \hat\mu_j]$. But unfolding necessarily introduces biases as well, corresponding to a systematic uncertainty (also correlated between bins). This is more difficult to estimate. Suppose, nevertheless, that we manage to report both U_stat and U_sys. To test a new theory depending on parameters θ, use, e.g.,
$$\chi^2(\theta) = (\hat\mu - \mu(\theta))^T (U_{\rm stat} + U_{\rm sys})^{-1} (\hat\mu - \mu(\theta)).$$
This mixes frequentist and Bayesian elements, and the interpretation of the result can be problematic, especially if U_sys itself has a large uncertainty.

36. Folding
Suppose a theory predicts f(y) → μ (which may depend on parameters θ). Given the response matrix R and expected background β, this predicts the expected numbers of observed events:
$$\nu(\theta) = R\,\mu(\theta) + \beta.$$
From this we can get the likelihood, e.g., for Poisson data,
$$L(\theta) = \prod_i \frac{\nu_i(\theta)^{n_i}}{n_i!}\,e^{-\nu_i(\theta)}.$$
Using this we can fit parameters and/or test, e.g., using the likelihood ratio statistic
$$q_\theta = -2\ln\frac{L(\theta)}{L(\hat\theta)}.$$
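A minimal folding fit might look like the sketch below. It uses a hypothetical one-parameter model μ(θ) = θ · shape, folds it with R and β, and finds θ̂ by a grid scan of the Poisson log-likelihood. It then evaluates q_θ at a test value. The response matrix, background, shape, and grid are invented for illustration. The fit runs on Asimov data generated at θ = 2, so θ̂ should come out near 2.

```python
import numpy as np

def fold(mu, R, beta):
    """Expected observed counts nu = R mu + beta."""
    return R @ np.asarray(mu, dtype=float) + np.asarray(beta, dtype=float)

def log_likelihood(n, nu):
    """Poisson log-likelihood sum_i (n_i ln nu_i - nu_i), dropping ln n_i!."""
    n = np.asarray(n, dtype=float)
    return float(np.sum(n * np.log(nu) - nu))

def fit_and_test(n, R, beta, shape, theta0, grid=np.linspace(0.01, 10.0, 10000)):
    """Grid MLE of theta for mu(theta) = theta * shape, and q = -2 ln L(theta0)/L(theta_hat)."""
    log_l = np.array([log_likelihood(n, fold(th * shape, R, beta)) for th in grid])
    theta_hat = float(grid[np.argmax(log_l)])
    q = -2.0 * (log_likelihood(n, fold(theta0 * shape, R, beta)) - log_l.max())
    return theta_hat, q

R = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])
beta = np.array([1.0, 1.0, 1.0])
shape = np.array([10.0, 20.0, 10.0])
n_asimov = fold(2.0 * shape, R, beta)      # Asimov data for theta = 2

theta_hat, q_true = fit_and_test(n_asimov, R, beta, shape, theta0=2.0)
_, q_one = fit_and_test(n_asimov, R, beta, shape, theta0=1.0)
print(theta_hat, q_true, q_one)
```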

37. Versus unfolding
If we have an unfolded spectrum and the full statistical and systematic covariance matrices, then to compare this to a model μ we compute the likelihood
$$L(\mu) \propto \exp\left[-\tfrac{1}{2}(\hat\mu - \mu)^T U^{-1} (\hat\mu - \mu)\right], \qquad U = U_{\rm stat} + U_{\rm sys}.$$
There are complications because one needs an estimate of the systematic bias, U_sys. If we find a gain in sensitivity from the test using the unfolded distribution, e.g., through a decrease in statistical errors, then we are exploiting information inserted via the regularization (e.g., imposed smoothness).

38. ML solution again
From the standpoint of testing a theory or estimating its parameters, the ML solution, despite its catastrophically large errors, is equivalent to using the uncorrected data (same information content). There is no bias (at least from unfolding), so use
$$\chi^2(\theta) = (\hat\mu - \mu(\theta))^T U^{-1} (\hat\mu - \mu(\theta)),$$
with U the full covariance matrix of the ML estimators. The estimators of θ should have close to optimal properties: zero bias and minimum variance. The corresponding estimators from any unfolded solution cannot in general match this. The crucial point is to use the full covariance, not just the diagonal errors.
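The claim that the ML solution carries the same information as the raw data can be checked numerically. With μ̂ = R⁻¹(n − β) and U = R⁻¹ diag(n) (R⁻¹)ᵀ, the full-covariance χ² in the unfolded space equals Σ_i (n_i − ν_i)²/n_i in the observed space, because U⁻¹ = Rᵀ diag(1/n) R. The sketch computes both. It also shows a diagonal-only χ² for contrast, which in general differs. The numbers and function names are hypothetical, and V = diag(n) is an estimate of the Poisson covariance.

```python
import numpy as np

def ml_unfold(n, R, beta):
    """mu_hat = R^{-1}(n - beta), U = R^{-1} diag(n) R^{-1 T}."""
    n = np.asarray(n, dtype=float)
    R_inv = np.linalg.inv(R)
    return R_inv @ (n - np.asarray(beta, dtype=float)), R_inv @ np.diag(n) @ R_inv.T

def chi2_unfolded(mu_hat, U, mu_theta):
    """Full-covariance chi^2 in the unfolded space."""
    d = mu_hat - np.asarray(mu_theta, dtype=float)
    return float(d @ np.linalg.solve(U, d))

def chi2_folded(n, R, beta, mu_theta):
    """Chi^2 in the observed space, using var(n_i) ~ n_i."""
    n = np.asarray(n, dtype=float)
    d = n - (R @ np.asarray(mu_theta, dtype=float) + np.asarray(beta, dtype=float))
    return float(np.sum(d ** 2 / n))

R = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])
beta = [1.0, 1.0, 1.0]
n = [30.0, 45.0, 28.0]
mu_theta = [20.0, 40.0, 20.0]            # prediction of a hypothetical theory

mu_hat, U = ml_unfold(n, R, beta)
c_unf = chi2_unfolded(mu_hat, U, mu_theta)
c_fold = chi2_folded(n, R, beta, mu_theta)
c_diag = float(np.sum((mu_hat - mu_theta) ** 2 / np.diag(U)))   # ignores correlations
print(c_unf, c_fold, c_diag)
```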

39. Some unfolding references
H. Prosper and L. Lyons (eds.), Proceedings of PHYSTAT 2011, CERN-2011-006; many contributions on unfolding from p. 225.
G. Cowan, A Survey of Unfolding Methods for Particle Physics, www.ippp.dur.ac.uk/old/Workshops/02/statistics/proceedings/cowan.pdf
G. Cowan, 2012 CERN Academic Training Lecture #4, indico.cern.ch/conferenceDisplay.py?confId=173728
G. Cowan, Statistical Data Analysis, OUP (1998), Chapter 11.
V. Blobel, An Unfolding Method for High Energy Physics Experiments, arXiv:hep-ex/0208022v1
G. D'Agostini, Improved iterative Bayesian unfolding, arXiv:1010.0632
G. Choudalakis, Fully Bayesian Unfolding, arXiv:1201.4612
T. Adye, Unfolding algorithms and tests using RooUnfold, arXiv:1105.1160v1

40. Summary
Bayesian methods for limits: beware of issues with flat priors; consider reference priors.
Issues related to discovery: for (frequentist) sensitivity, use the Asimov significance √q0,A rather than s/√b; the Look-Elsewhere Effect is ~solved; for Bayesian discovery, use the Bayes factor.
Unfolding: a big topic. Avoid if possible!

41. Extra slides

42. Why 5 sigma?
Common practice in HEP has been to claim a discovery if the p-value of the no-signal hypothesis is below 2.9 × 10^−7, corresponding to a significance Z = Φ^−1(1 − p) = 5 (a 5σ effect). There are a number of reasons why one may want to require such a high threshold for discovery:
The "cost" of announcing a false discovery is high.
Unsure about systematics.
Unsure about the look-elsewhere effect.
The implied signal may be a priori highly improbable (e.g., violation of Lorentz invariance).
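The conversion between p and Z used throughout can be done with the standard library. p = 1 − Φ(Z) is computed via `math.erfc`. Z = Φ⁻¹(1 − p) is computed as −Φ⁻¹(p) with `statistics.NormalDist`, which avoids rounding 1 − p for tiny p. The function names are illustrative.

```python
import math
from statistics import NormalDist

def p_value(z):
    """One-sided p-value p = 1 - Phi(Z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def significance(p):
    """Z = Phi^{-1}(1 - p), evaluated as -Phi^{-1}(p)."""
    return -NormalDist().inv_cdf(p)

print(p_value(5.0))          # about 2.87e-7, the 5 sigma threshold
print(p_value(3.0))          # about 1.35e-3
print(significance(0.05))    # about 1.645
```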

43. Why 5 sigma (cont.)?
But the primary role of the p-value is to quantify the probability that the background-only model gives a statistical fluctuation as big as the one seen or bigger. It is not intended as a means to protect against hidden systematics, nor to encode the high standard required for a claim of an important discovery. In the process of establishing a discovery there comes a point where it is clear that the observation is not simply a fluctuation but an "effect", and the focus shifts to whether this is new physics or a systematic. Provided the LEE is dealt with, that threshold is probably closer to 3σ than 5σ.

44. Reference priors (J. Bernardo, L. Demortier, M. Pierini, PHYSTAT 2011)
Maximize the expected Kullback–Leibler divergence of the posterior relative to the prior:
$$I[\pi] = \int p(x) \int p(\theta|x) \ln\frac{p(\theta|x)}{\pi(\theta)}\,d\theta\,dx.$$
This maximizes the expected posterior information about θ when the prior density is π(θ). Finding reference priors is "easy" for one parameter.

45. Reference priors (2) (J. Bernardo, L. Demortier, M. Pierini, PHYSTAT 2011)
The actual recipe to find the reference prior is nontrivial; see the references from Bernardo's talk, the website of Berger (www.stat.duke.edu/~berger/papers), and also Demortier, Jain, Prosper, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111. The prior depends on the order of the parameters. (Is order dependence important? Symmetrize? Sample result from different orderings?)

46. MCMC basics: Metropolis-Hastings algorithm
Goal: given an n-dimensional pdf p(θ), generate a sequence of points θ1, θ2, θ3, ....
Proposal density q(θ; θ0), e.g., a Gaussian centred about θ0.
1) Start at some point θ0.
2) Generate θ ~ q(θ; θ0).
3) Form the Hastings test ratio α = min[1, p(θ) q(θ0; θ) / (p(θ0) q(θ; θ0))].
4) Generate u ~ Uniform[0, 1].
5) If u ≤ α, move to the proposed point, θ1 = θ; else the old point is repeated, θ1 = θ0.
6) Iterate.

47. Metropolis-Hastings (continued)
This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but the statistical errors are larger than the naive estimate. The proposal density can be (almost) anything, but it should be chosen so as to minimize the autocorrelation. Often the proposal density is taken to be symmetric, q(θ; θ0) = q(θ0; θ). The test ratio is then (Metropolis-Hastings) α = min[1, p(θ)/p(θ0)]. I.e., if the proposed step is to a point of higher p(θ), take it; if not, only take the step with probability p(θ)/p(θ0). If the proposed step is rejected, hop in place.
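The symmetric-proposal rule above is sketched below for the posterior p(s, b | n) of slide 11. As assumptions for illustration, the prior is flat in s ≥ 0 and a Gaussian in b ≥ 0 with hypothetical mean `b_meas` and width `sigma_b`. The chain records s only, which marginalizes over b. The step size, chain length, burn-in, and seed are arbitrary choices.

```python
import math
import random
import statistics

def log_posterior(s, b, n, b_meas, sigma_b):
    """ln p(s, b | n) up to a constant (Poisson likelihood, assumed priors)."""
    if s < 0.0 or b < 0.0:
        return -math.inf                       # outside prior support
    mu = s + b
    if n > 0 and mu == 0.0:
        return -math.inf
    log_like = (n * math.log(mu) if n > 0 else 0.0) - mu   # ln n! dropped
    log_prior_b = -0.5 * ((b - b_meas) / sigma_b) ** 2
    return log_like + log_prior_b

def metropolis(n, b_meas, sigma_b, n_steps=50_000, step=0.5, seed=1):
    """Metropolis sampler with a symmetric Gaussian proposal; returns the s values."""
    rng = random.Random(seed)
    s, b = 1.0, max(b_meas, 0.1)
    lp = log_posterior(s, b, n, b_meas, sigma_b)
    chain = []
    for _ in range(n_steps):
        s_new = s + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        lp_new = log_posterior(s_new, b_new, n, b_meas, sigma_b)
        # accept with probability min(1, p_new / p_old); otherwise hop in place
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            s, b, lp = s_new, b_new, lp_new
        chain.append(s)
    return chain

chain = metropolis(n=0, b_meas=3.0, sigma_b=1.0)[1000:]     # drop burn-in
print(statistics.mean(chain), statistics.median(chain), statistics.stdev(chain))
```

For n = 0 the marginal posterior of s is exactly e^{-s}, with mean 1 and median ln 2 ≈ 0.69. This gives a check on the chain. Because successive points are correlated, the chain's summaries fluctuate more than those of the same number of independent draws would.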

