This forum highlights burning issues in statistics for the ATLAS experiment at CERN, discussing sensitivity measures, cut optimization, discovery strategies, and incorporating systematic uncertainties. Key topics include optimizing for discovery, significance measures, limit setting, CLs methods, power constrained limits, and displaying Poisson errors in data plots.
Burning Issues for the Statistics Forum
1) News items / organization
2) Highlights from recent hypernews discussions

ATLAS Statistics Forum, CERN, 5 October 2010
Glen Cowan, RHUL
Eilam Gross, Weizmann Institute
https://twiki.cern.ch/twiki/bin/view/AtlasProtected/ATLASStatisticsFAQ
ATLAS Statistics Forum
• Conveners: Glen Cowan + Eilam Gross
• RooStats: Kyle Cranmer
• Experts: Diego Casadei
• Physics Group Representatives:
  – Higgs: Aaron Armbruster
  – SM: Frank Ellinghaus
  – Top: Julien Donini
  – Exotics: Samir Ferrag, Daniel Whiteson
  – SUSY: Vasia Mitsou
  – B: Roger Jones
ANNOUNCING PHYSTAT 2011, 17-19 January 2011
• It will deal with the statistical issues related to discovery claims in search experiments, concentrating on those at the LHC.
• There is no registration fee, but would-be participants should register on the conference website, http://indico.cern.ch/event/phystat2011, where further details are available.
hn-atlas-physics-Statistics
Lots of recent activity; mostly followed up ...
Measures of sensitivity and cut optimization
(Vladimir S., Xavier P., Andrew B., Bill M., Yoram R., Michael K., Johan L., Alex R., GDC, ...)

Recent resurrection of the discussion about what measure of sensitivity to maximize when designing an analysis (setting cuts): s/√b? s/√(s+b)? Other (e.g., including systematics)?

The sensitivity measure depends on the goal of the analysis:
• Measurement? Minimize, e.g., the expected relative uncertainty in the quantity you want to measure. For Poisson data, maximize s/√(s+b) (or εs × purity).
• Discovery? Maximize the probability to make the discovery, assuming something is there to be discovered (see the sketch below).
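As an illustration of the difference between these figures of merit, a minimal Python sketch of a cut scan; the shapes, yields, and cut variable are invented for the example, not taken from the discussion.

```python
# Toy cut-optimization scan: compare s/sqrt(b) and s/sqrt(s+b) as
# figures of merit.  The signal/background shapes and yields below
# are invented purely for illustration.
import numpy as np
from scipy.stats import norm

s_tot, b_tot = 50.0, 1000.0            # assumed total expected yields
cuts = np.linspace(0.0, 3.0, 61)       # scan of a cut on some variable x

# Hypothetical efficiencies: signal ~ Gauss(2,1), background ~ Gauss(0,1)
eff_s = 1.0 - norm.cdf(cuts, loc=2.0, scale=1.0)
eff_b = 1.0 - norm.cdf(cuts, loc=0.0, scale=1.0)
s, b = s_tot * eff_s, b_tot * eff_b

fom1 = s / np.sqrt(b)                  # s/sqrt(b)
fom2 = s / np.sqrt(s + b)              # s/sqrt(s+b)
print(f"best cut (s/sqrt(b)):   x > {cuts[np.argmax(fom1)]:.2f}")
print(f"best cut (s/sqrt(s+b)): x > {cuts[np.argmax(fom2)]:.2f}")
```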
Optimizing for discovery

Using profile likelihood methods we can write down expected sensitivities that incorporate systematic uncertainties through nuisance parameters, e.g., the median discovery significance assuming the s+b hypothesis:

    med[Z0 | s+b] = √(q0,A)

where "A" denotes the Asimov data set (data = expectation) and q0 = −2 ln λ(0), with λ(0) the profile likelihood ratio for a test of μ = 0 (nuisance parameters fitted at their conditional MLEs), and q0 = 0 for μ̂ < 0.

See the Higgs CSC chapter and Cowan, Cranmer, Gross, Vitells, "Asimov" paper, arXiv:1007.1727.
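A small numerical sketch of this recipe, assuming a single-bin "on/off" counting model with invented numbers; this is not the official implementation, just an illustration of the Asimov idea.

```python
# Minimal numeric sketch of the Asimov median discovery significance
# for a single-bin on/off counting experiment: the signal region has
# n ~ Poisson(mu*s + b) and the background is constrained by an
# auxiliary region with m ~ Poisson(tau*b).  The numbers s_true,
# b_true, tau are invented for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

s_true, b_true, tau = 10.0, 20.0, 2.0

def nll(mu, b, n, m):
    """Negative log-likelihood (up to constants) of the on/off model."""
    lam_on, lam_off = mu * s_true + b, tau * b
    return (lam_on - n * np.log(lam_on)) + (lam_off - m * np.log(lam_off))

# Asimov data set: set each observation equal to its expectation
n_A, m_A = s_true + b_true, tau * b_true

# Conditional fit: profile the nuisance parameter b at mu = 0
fit0 = minimize_scalar(lambda b: nll(0.0, b, n_A, m_A),
                       bounds=(1e-6, 10.0 * b_true), method="bounded")
# Global fit: on the Asimov set the MLEs are exactly mu_hat = 1, b_hat = b_true
nll_hat = nll(1.0, b_true, n_A, m_A)

q0_A = 2.0 * (fit0.fun - nll_hat)   # profile likelihood ratio statistic on Asimov data
print(f"median discovery significance Z_A = sqrt(q0_A) = {np.sqrt(q0_A):.2f}")
```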
Measure of significance for Poisson data

For Poisson data and in the absence of systematics, the median discovery significance assuming μ = 1 (s+b) is approximately (CCGV, arXiv:1007.1727):

    med[Z0 | s+b] ≈ √( 2 [ (s+b) ln(1 + s/b) − s ] )

This reduces to s/√b for s « b. Jumps in the exact median are due to the discreteness of the data. The formula was used by some groups, e.g., in the CSC exercise; it should become standard.
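In code the formula is a one-liner; a quick check against the s/√b approximation (example numbers are arbitrary):

```python
# Approximate median discovery significance for a counting experiment
# (CCGV, arXiv:1007.1727), compared with s/sqrt(b).
import numpy as np

def z_discovery(s, b):
    """med[Z0 | s+b] ~ sqrt(2*((s+b)*ln(1 + s/b) - s))."""
    return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))

for s, b in [(5, 100), (10, 20), (5, 1)]:
    print(f"s={s}, b={b}:  Z = {z_discovery(s, b):.2f},  s/sqrt(b) = {s/np.sqrt(b):.2f}")
```

For s « b the two agree (Z ≈ 0.50 vs 0.50 for s=5, b=100), while for small b the s/√b approximation badly overestimates the significance.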
Incorporating systematic uncertainties in a limit
(Philippe M., Nils K., Diego C., Kevin K., ...)

Side issue: what CL should be used for limits? Propose 95% (need to agree with CMS).

The key to incorporating systematics is to associate the uncertainty with nuisance parameters, then use either profile likelihood-based methods or Bayesian methods:
• Frequentist: see, e.g., Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727
• Bayesian: talks today by Kevin, Diego (a minimal sketch follows below)
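As a concrete, purely illustrative example of the Bayesian route: a numerical marginalization over a nuisance parameter b with an assumed Gaussian constraint (all numbers and the flat signal prior are assumptions for the sketch).

```python
# Sketch of a Bayesian 95% upper limit on a signal s for n ~ Poisson(s + b),
# with the systematic on b encoded as a nuisance parameter with a Gaussian
# prior (truncated at b > 0), marginalized numerically.
import numpy as np
from scipy.stats import norm, poisson

n_obs, b0, sigma_b = 5, 4.0, 1.0       # observed count, b estimate, its uncertainty

s_grid = np.linspace(0.0, 20.0, 2001)
b_grid = np.linspace(max(1e-3, b0 - 5 * sigma_b), b0 + 5 * sigma_b, 401)

# Marginal likelihood L(s) = integral of Pois(n | s+b) * Gauss(b | b0, sigma_b) db
L = np.trapz(poisson.pmf(n_obs, s_grid[:, None] + b_grid[None, :])
             * norm.pdf(b_grid, b0, sigma_b)[None, :], b_grid, axis=1)

post = L / np.trapz(L, s_grid)          # flat prior on s >= 0
cdf = np.cumsum(post) * (s_grid[1] - s_grid[0])
s_up = s_grid[np.searchsorted(cdf, 0.95)]
print(f"95% credible upper limit: s < {s_up:.2f}")
```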
CLs discussion
(Marc, Alex, Eilam, ...)

The CLs solution (A. Read et al.) is to base the test not on the usual p-value (CLs+b = ps+b), but rather to divide this by CLb (one minus the p-value of the b-only hypothesis):

    CLs = CLs+b / CLb = ps+b / (1 − pb)

[Figure: distributions f(q|s+b) and f(q|b) versus the test statistic q, with qobs marked; the tail areas give ps+b = CLs+b and pb = 1 − CLb.]

Reject the s+b hypothesis if CLs ≤ α. This reduces the "effective" p-value when the two distributions become close (prevents exclusion if sensitivity is low).
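For a single-bin counting experiment the construction is simple enough to show directly (illustrative numbers; here a small observed count is the exclusion-like direction, so both p-values use the lower tail of n):

```python
# Toy sketch of the CLs construction for a single-bin counting experiment.
from scipy.stats import poisson

s, b, n_obs = 5.0, 3.0, 2

p_sb = poisson.cdf(n_obs, s + b)    # CLs+b = P(n <= n_obs | s+b)
cl_b = poisson.cdf(n_obs, b)        # CLb   = P(n <= n_obs | b) = 1 - pb
cl_s = p_sb / cl_b
print(f"CLs+b = {p_sb:.3f}, CLb = {cl_b:.3f}, CLs = {cl_s:.3f}")
# Reject the s+b hypothesis (exclude this s) if CLs <= 0.05
```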
Power Constrained Limits (PCL)

The statistics community does not like ratios of p-values such as CLs; they would prefer to regard a parameter value θ as excluded if:
(a) the p-value of θ is < 0.05, and
(b) the power of the test of θ with respect to the background-only alternative is above some threshold
(or use, e.g., a Bayes factor).

Requiring (a) alone gives the standard frequentist interval (the CLs+b method), which has the correct coverage. Requiring the ANDed combination of (a) and (b) is more conservative; the end effect is similar to CLs, but it makes more explicit the role of minimum sensitivity (as quantified by power). A toy sketch is given below.

Studies are ongoing (see the StatForum FAQ); need to agree with CMS. We will still want to report CLs-based limits so we can compare with, e.g., the Tevatron.
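A toy sketch of the idea for a counting experiment. Note the exact prescription and power threshold were still under study at this time, so the "-1 sigma expected limit" constraint used below is only an assumed choice for illustration:

```python
# Toy power-constrained limit: the standard frequentist (CLs+b) upper
# limit is not allowed to fall below a minimum-sensitivity threshold,
# here taken as the 16% quantile of the limits expected under the
# background-only hypothesis.  Numbers are illustrative.
import numpy as np
from scipy.stats import poisson

b, n_obs = 4.0, 1

def upper_limit(n, b, cl=0.95):
    """Smallest s with P(n' <= n | s+b) <= 1-cl (CLs+b upper limit)."""
    s_grid = np.linspace(0.0, 50.0, 5001)
    ok = poisson.cdf(n, s_grid + b) <= 1.0 - cl
    return s_grid[ok][0]

obs_limit = upper_limit(n_obs, b)

# Distribution of limits under background-only toys -> sensitivity threshold
rng = np.random.default_rng(1)
cache, toy_limits = {}, []
for n in rng.poisson(b, 1000):
    if n not in cache:
        cache[n] = upper_limit(n, b)
    toy_limits.append(cache[n])
threshold = np.percentile(toy_limits, 16)   # "-1 sigma" expected limit

pcl = max(obs_limit, threshold)
print(f"observed = {obs_limit:.2f}, constraint = {threshold:.2f}, PCL = {pcl:.2f}")
```

With these numbers the observed limit undershoots the sensitivity threshold (a downward fluctuation of the data), so the reported limit is the constraint value, much as CLs would also prevent an overly strong exclusion.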
"Functions for Poisson Errors"
(Diego C., Bruce M., Eric F., Kristin L., Joseph T., GDC, et al.)

We observe n ~ Poisson(ν) (or ni ~ Poisson(νi), i = 1, ..., N). How should this be displayed on a plot? No error bars? With error bars? Computed how?

This is part "statistics issue", part "presentational issue".
(Public) CDF discussion paper
www-cdf.fnal.gov/physics/statistics/notes/pois_eb.txt

ERROR BARS FOR POISSON DATA

"The Statistics Committee had been asked by the Spokespersons to make a recommendation about the magnitude of error bars to be shown on histograms in CDF publications. This produced very animated discussions in the Statistics Committee...

... it was decided that it is simplest to keep to the traditional practice of using sqrt(n) for the error bars, where n is the observed number of events."
Common approach: error bar = √n

Treat the observed n as an estimator for ν: ν̂ = n. This estimator has a variance, which is a function of ν: V[ν̂] = V[n] = ν. Construct an estimator of the variance of the estimator of ν by substituting ν̂ = n, i.e., V̂[ν̂] = n. Its square root is the error bar, i.e., an estimate of the standard deviation of the estimator of ν:

    σ̂ = √n

Simple, and easy to communicate what you did. Can be misleading, e.g., 0 ± 0, 1 ± 1.
No error bars

In principle no error bar is needed: the observed data are "exact". The reader is expected to mentally compute √n (or the relevant p-value) to get an idea of the expected fluctuations. This avoids the problems of 0 ± 0, 1 ± 1, etc., which do not correctly convey the inference we want to make when observing 0, 1, etc. For example, if a model predicts ν = 10^−8 and we observe n = 1, then 1 ± 1 seems to indicate that data and model are compatible; in fact the p-value P(n ≥ nobs | ν) is very small.
Confidence intervals (frequentist)

Construct, e.g., the central 68.3% confidence interval [νlo, νup] for ν (PDG):

    νlo = (1/2) Fχ²⁻¹(αlo; 2n),   νup = (1/2) Fχ²⁻¹(1 − αup; 2(n+1))

with αlo = αup = (1 − 0.683)/2 = 0.158. The intervals are asymmetric, becoming symmetric (≈ ±√n) for large n. Because of the Poisson discreteness, the coverage can be greater than or equal to 68.3%.

Easy to describe: "Error bars represent 68.3% central confidence intervals." (A sketch of the construction follows below.)

For n = 0, νlo = 0; in that case one might want, e.g., a 95% upper limit, but then things get complicated.
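A minimal sketch of this construction using χ² quantiles:

```python
# Central Poisson confidence interval (Garwood / PDG construction)
# for nu given n observed, via chi-square quantiles.
from scipy.stats import chi2

def poisson_interval(n, cl=0.683):
    """Central confidence interval [nu_lo, nu_up] for nu given observed n."""
    alpha = (1.0 - cl) / 2.0
    nu_lo = 0.5 * chi2.ppf(alpha, 2 * n) if n > 0 else 0.0
    nu_up = 0.5 * chi2.ppf(1.0 - alpha, 2 * (n + 1))
    return nu_lo, nu_up

for n in (0, 1, 5, 100):
    lo, up = poisson_interval(n)
    print(f"n = {n:3d}:  [{lo:6.2f}, {up:6.2f}]  (-{n - lo:.2f}, +{up - n:.2f})")
```

For n = 100 the interval is already nearly symmetric (≈ ±10 = ±√n), while for n = 0 and 1 it differs markedly from 0 ± 0 and 1 ± 1.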
Bayesian credible intervals → Diego's talk

After Diego's talk we should discuss and agree on some recommendations. One recommendation could be: do one of
• error bar = √n
• frequentist confidence interval
• Bayesian credible interval
• no error bars on histograms
depending on what the analyst feels best communicates what he/she wants to show, and always say what was done (e.g., in the plot's caption).
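For comparison with the frequentist construction above, a minimal sketch of the Bayesian option; the flat prior on ν (which gives a Gamma posterior) is an assumed choice for the illustration, not a forum recommendation:

```python
# Central credible interval for nu given n ~ Poisson(nu) with a flat
# prior: the posterior is Gamma(n + 1) (unit scale), so the interval
# comes directly from its quantiles.
from scipy.stats import gamma

def credible_interval(n, cl=0.683):
    """Central Bayesian interval for nu, flat prior on nu >= 0."""
    alpha = (1.0 - cl) / 2.0
    return gamma.ppf(alpha, n + 1), gamma.ppf(1.0 - alpha, n + 1)

for n in (0, 1, 5):
    lo, up = credible_interval(n)
    print(f"n = {n}:  [{lo:.2f}, {up:.2f}]")
```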