Advanced Statistical Techniques in Particle Physics Conference Summary (Thanks to Bob Cousins!). Jim Linnemann MSU HEP Seminar 23 April, 2002.
Conference Overview Durham, UK • 5 days, nearly no rain! • Mixture of “theoretical” and practical • Overview/Tutorial talks • Systematic Comparisons of Methods • New Developments • Problems • Visiting (tolerant!) Statisticians: • Michael Goldstein • Wolfgang Rolke • Radical idea: if phenomenologist in collaboration, why not a professional statistician (a la medical research)? http://www.ippp.dur.ac.uk/statistics/
Tutorials, Overviews, Explanations • Fred James: Overview; Goodness of Fit vs. Intervals • Roger Barlow: Systematics: mistakes, effects, errors • Multidimensional: • Sherry Towers: PDE’s; Reducing variables in classification • Harrison Prosper: multidimensional methods • Tony Vaiculis: Support Vector Machines • Niels Kjaer: Monte Carlo Interpolating (+ much else) • Pekka Sinervo: Significance • Berkan Aslan (G. Zech): Goodness of Fit measures • Glen Cowan, Volker Blobel: Unfolding • Paul Harrison: Blind Analysis
Theory, Practice, and Methods • Chris Parkes: Combining LEP W results • Gary Hill, Tyce De Young: Bayes in Amanda tracking • Rudy Bock, Wolfgang Wittek: Multidimensional methods for Gamma/hadron separation • Volker Blobel: Global Alignment Fits • Alex Read: CLS • Dean Karlen: Credibility of Conf Intervals • Raja: Uncertainty of Limits
Problems to Chew On • Nigel Smith and Dan Tovey • Dark Matter Searches • Bruce Yabsley • Statistics in Practice at Belle
Fred James Important not to confuse these problems, e.g., interval estimation and goodness-of-fit testing.
Roger Barlow Calculated the to use for comparison checks…
Multidimensional Methods • Aspire to full extraction of information • Equivalent to trying to fit P(signal)/P(background) (Neyman-Pearson) • Issues in • choice of dimensionality (no one tells you how many!) • methods of approximation • control of bias/variance tradeoff • complexity of fit • Number of free parameters • Amount of training data needed • “ease” of interpretation • We are following the field; hope for theory to help • See: Elements of Statistical Learning, • Hastie, Tibshirani, Friedman
Sherry Towers Wow! Several questions come to my mind… [In the general case, variable deletion is safer than variable addition. –M.G.]
Harrison Prosper • Thumbnail sketch of some methods of interest: • Fisher Linear Discriminant • Principal Components Analysis • Independent Component Analysis • Self-Organizing Map • Grid Search • Probability Density Estimation • Neural Networks • Support Vector Machines • Said these all are attempts to solve the single classification problem whose solution is the Bayes discriminator D(x) = P(S|x)/P(B|x) = (L(S)/L(B)) (P(S)/P(B)) … = Neyman-Pearson when P(S)=P(B) • Multivariate analysis is hard: important to use all the information used by D(x) (which might be lost, e.g., by marginalization). Appears that there is no single optimal approximation.
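The Bayes discriminator on this slide can be made concrete with a toy case. The sketch below assumes one-dimensional Gaussian likelihoods for signal and background; the shapes, means, widths, and function names are illustrative assumptions and do not come from the talk.

```python
import math

def gauss_pdf(x, mean, sigma):
    """Normal density; stands in for the likelihoods L(x|S) and L(x|B)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_discriminant(x, prior_s=0.5, prior_b=0.5):
    """D(x) = [L(x|S) / L(x|B)] * [P(S) / P(B)] for the toy densities."""
    like_s = gauss_pdf(x, mean=1.0, sigma=1.0)   # assumed signal shape
    like_b = gauss_pdf(x, mean=-1.0, sigma=1.0)  # assumed background shape
    return (like_s / like_b) * (prior_s / prior_b)

def posterior_signal(x, prior_s=0.5, prior_b=0.5):
    """P(S|x) = D / (1 + D), since P(S|x) + P(B|x) = 1."""
    d = bayes_discriminant(x, prior_s, prior_b)
    return d / (1.0 + d)
```

With equal priors, D(x) reduces to the likelihood ratio (the Neyman-Pearson case); changing the priors rescales D by a constant and so only moves the cut value, not the ordering of events.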
SVM Vaiculis Ref
Glen Cowan Unfolding (unsmearing) • Inherently unstable • Measured smoother than true • Un-smooth: enhances noise! • Nice discussion of regularization, biases, uncertainties • See talk and his statistics book • One program: must balance between oscillations and over-smoothed result = bias-variance tradeoff • Same issues in multidimensional methods
V. Blobel Unfolding: Insight • View as matrix problem • “Ill-posed” = singular • Analyze in terms of eigenvalues/eigenvectors and condition number • Statistical error: truncate eigenfunctions when below error bound
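The eigenvalue view can be illustrated with the smallest possible case: two bins smeared into each other with probability eps. This is a minimal sketch, not Blobel's program. The response matrix, the Poisson error estimate, and the rule "drop a mode whose measured coefficient is smaller than its statistical error" are assumptions chosen for illustration.

```python
import math

def unfold_two_bins(measured, eps, keep_all_modes=False):
    """Unfold two bins smeared by R = [[1-eps, eps], [eps, 1-eps]].

    R has eigenvectors (1, 1) and (1, -1) with eigenvalues 1 and 1 - 2*eps,
    so the sum of the bins is preserved and the difference is shrunk.
    """
    m1, m2 = measured
    total = m1 + m2                 # coefficient of the eigenvalue-1 mode
    diff = m1 - m2                  # coefficient of the small-eigenvalue mode
    diff_err = math.sqrt(m1 + m2)   # Poisson error on the measured difference
    lam = 1.0 - 2.0 * eps
    if keep_all_modes or abs(diff) > diff_err:
        true_diff = diff / lam      # inversion amplifies noise by 1 / lam
    else:
        true_diff = 0.0             # truncate the insignificant mode
    return ((total + true_diff) / 2.0, (total - true_diff) / 2.0)
```

For eps = 0.4 the small eigenvalue is 0.2, so a statistical fluctuation in the measured difference is magnified five times by plain inversion. Measured counts of (510, 490) unfold to (550, 450) if every mode is kept, but to (500, 500) once the insignificant mode is dropped.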
Blobel Unfolding Results • Oversmoothed • Statistical error in neighboring bins no longer uncorrelated! • High frequencies not measured: report fewer bins? (or supply from prior??) • Higher modes converge slowly: iterate only a few times (d’Agostini)
N.J. Kjaer (I) Delphi MC • Re-interpretation of data to interpolate on physics parameters • Analogy with Stat Mech MC techniques?
Aslan/Zech Goodness of Fit “Energy Test” (electrostatics motivated)
Paul Harrison Blind Analysis • “Liberating” • Cousins: it takes longer, especially first time • By the way, no one can read light green print…
Blind Analysis • A called shot • Step towards making 3σ mean 3σ • Many ways to blind • 10% of data; background-only, obscure fit result • Creates a mindset • Avoiding biases and subjectivity
R.K. Bock Interesting idea: Expansion in dimensionality of correlation
Very Interesting Technique! • Let’s relate it to something we do: say particle ID in a detector: • In hot part of detector near beam: lots of background, we tighten particle-ID cuts • In lower-occupancy part of the detector away from beam, can loosen certain particle-ID cuts without letting in a lot of background • Use our knowledge of position-dependent occupancy rates in Bayes’s Theorem to calculate the probability that a given particle in a given location is the species of interest.
Comments: • If all input P’s are frequentist P’s, the output P(particle type | data) is a frequentist P. • We can use this posterior frequentist P like any other observable for cuts, weights, etc. If we independently calibrate the signal efficiency/background rejection of this use, there is nothing circular about using our knowledge of the input occupancies. • If the input occupancy knowledge is imperfect it will not introduce a bias, but rather make the technique less powerful.
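The particle-ID analogy amounts to one application of Bayes’s Theorem with a location-dependent prior. The sketch below is only an illustration: the function name, the two-species simplification, and all numbers are assumptions, not values from the Amanda analysis.

```python
def species_posterior(like_signal, like_background, signal_fraction):
    """P(signal species | data, location) from Bayes's Theorem.

    like_signal, like_background: P(data | species) from the detector response.
    signal_fraction: P(signal species) at this location, taken from the
    measured position-dependent occupancy (a frequentist input).
    """
    num = like_signal * signal_fraction
    den = num + like_background * (1.0 - signal_fraction)
    return num / den

# Same detector response, two locations (all numbers hypothetical).
near_beam = species_posterior(0.8, 0.2, signal_fraction=0.05)
far_from_beam = species_posterior(0.8, 0.2, signal_fraction=0.50)
```

With identical particle-ID information, the posterior is about 0.17 near the beam and 0.80 away from it, which is the quantitative version of tightening cuts where the background is high.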
Bayes’s Theorem applies to any P satisfying the axioms of probability • Frequentist P: limiting frequency • Theorem not much use if the unknown is a constant of nature: P(unknown) = delta-function at unknown value • Bayesian P: degree of belief • For constant of nature, P(unknown) can be combination of delta-function and continuous function, reflecting degree of belief • Is the Amanda technique “Bayesian”? • Not if “Bayesian” implies “not frequentist”, as I think is common, even though frequency P is emulated in a certain application/limit of degree of belief. • In any case, instructive example!
Chris Parkes Practicalities of Combining Analyses: W Physics Results at LEP • Now the stuff you don’t normally see… • RC: An informative talk about both methodology and sociology! An important reminder: pragmatic considerations (sometimes even irrational) can be as important as principles in order to get out a result.
Cousins: • This talk is not for the squeamish or over-idealistic, but it is a vivid description of the real world in action! • LEP experiments contained a sizable fraction of world HEP community, and reached very mature state of analysis. • We have much to learn from them, both theoretical and practical.
Studies of Intervals • Byron Roe and Michael Woodroofe: Mini-Boone • Jan Conrad: Coverage with Systematics • Rolke and Lopez: Bias correction via double-bootstrap • Giunti and Laveder: the “power” of confidence intervals • Punzi: Strong Confidence Intervals • Giovanni Signorelli et al: Strong C.I. and systematics
Dean Karlen’s Proposal to Evaluate Credibility of Confidence Intervals • Yesterday evening, generally interested-to-favorable reaction • Cousins: I’m outlier: I think it will only encourage unthinking “easy” use of Bayes, with more flat (i.e., not degree of belief) priors. • We evaluate Bayesian intervals with serious frequentist methods. • Why not evaluate confidence intervals with serious Bayesian methods? One metric-dependent prior constituteth not a sensitivity analysis. • Who was it who said “How do you know that the outlier isn’t right?”
Alex Read’s Beautiful Talk on CLs • CLs = PV(s+b)/PV(b) (Cowan, PDG Stats) • PV = P Value = prob(obs), posterior, like P(χ²) • Behavior compared to LR Ordering (F-C) is understood and lucidly explained. Application to neutrino oscillations! • Please see his talk • Cousins comment: The non-standard conditioning (inequality, not ancillary statistic) of Zech and Roe&W and Read leads to problems with lower end of confidence intervals (see Cousins PRD Comment). Alex recognized this. • Therefore, Alex now advocates CLs only for limits, and in case of signal, he now would use LR Ordering.
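For a single-bin counting experiment the CLs ratio can be written out in a few lines. This is a minimal sketch under an assumed reading of the slide: PV(h) is taken to be P(n ≤ n_obs | h) for a Poisson count, and the function names are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cls(n_obs, s, b):
    """CLs = P(n <= n_obs | s+b) / P(n <= n_obs | b)."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def cls_upper_limit(n_obs, b, cl=0.95, tol=1e-6):
    """Smallest signal s with CLs <= 1 - cl, by bisection (CLs falls as s grows)."""
    lo, hi = 0.0, 1.0
    while cls(n_obs, hi, b) > 1.0 - cl:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return hi
```

With zero observed events, CLs equals exp(-s) whatever the background, so the 95% limit stays at -ln(0.05), about 3.0 events. Dividing by the background-only probability is what prevents the limit from shrinking toward zero when the background expectation is large.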
To Use Bayes or not? • Professional Statisticians are much more Bayes-oriented in last 20 years • Computationally possible • Philosophically coherent • (solipsistic?? Subjective Bayes…) • In HEP: want to publish result, not prior • We want to talk about P(theory|data) • But this requires prior: P(theory) • Likelihoods we can agree on! • Conclusions should be insensitive to a range of priors • Probably true, with enough data • Search limits DO depend on priors! • Hard to convince anyone of a single objective prior!!! • Unpleasant properties of naïve frequentist limits, too • Feldman-Cousins is current consensus • Systematic errors hard to treat in frequentist • PDG currently recommends Bayes “smearing of likelihood” • close in spirit to Cousins-Highland mixed Frequentist-Bayesian
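The statement that search limits depend on priors is easy to check numerically. The sketch below computes a Bayesian upper limit on a Poisson mean with no background for any prior passed in; the grid integration, the cutoff mu_max, and the two example priors are assumptions made for illustration.

```python
import math

def bayes_upper_limit(n_obs, prior, cl=0.90, mu_max=50.0, steps=200_000):
    """Upper limit mu_up such that the posterior mass below mu_up equals cl.

    The posterior is proportional to mu**n_obs * exp(-mu) * prior(mu); the
    constant 1/n_obs! cancels in the ratio and is omitted.
    """
    h = mu_max / steps
    weights = []
    for i in range(steps):
        mu = (i + 0.5) * h  # midpoints avoid mu = 0, where some priors diverge
        weights.append(mu ** n_obs * math.exp(-mu) * prior(mu))
    target = cl * sum(weights)
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if running >= target:
            return (i + 1) * h  # upper edge of the cell that crosses the target
    return mu_max

flat_limit = bayes_upper_limit(0, lambda mu: 1.0)             # flat prior
inv_sqrt_limit = bayes_upper_limit(0, lambda mu: mu ** -0.5)  # 1/sqrt(mu) prior
```

For zero observed events the flat prior gives a 90% limit of -ln(0.1), about 2.30, while the 1/sqrt(mu) prior gives about 1.35; the grid reproduces both to within roughly a percent. Same data and same likelihood, different limit.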
Michael Goldstein • A real pleasure to have you here! • Since subjective Bayes is rarely used in HEP, but is “known” to be the “coherent” version, it has been very enlightening: • “Sensitivity Analysis is at the heart of scientific Bayesianism” • How skeptical would the community as a whole have to be in order not to be convinced? • What prior gives P(hypothesis) > 0.5? • What prior gives P(hypothesis) > 0.99, etc.? • There’s a split among Bayesians; M.G. is in the group that sees no virtue in objective (“arbitrary”) priors (except as one of many examples of possible prior beliefs in a sensitivity analysis).
Michael Goldstein (cont.) • Procedures should obey the likelihood principle. Frequentist methods don’t obey it: fundamental flaw. • Bayesian methods are hard to do right, but they are the only way to attack certain hard problems. • Bayes Linear Methodology: addresses expectations rather than whole pdf’s. • HEP problems: appear to map onto a very similar set of abstract problems.
Cousins would add: • (Coherent) Subjective priors behave like real probabilities under transformations, unlike, e.g., flat priors. • M.G. represents only one school of Bayesian stats, but I don’t think you will find a school advocating uniform prior for a Poisson mean. • M.G. portrays Bayesian methods as hard, but worth the effort. This should be stressed in HEP, where the hard part (subjective prior) is dodged, and the math is (indeed) easily cranked out (without backwards thinking) to give an “answer” that I think is without much content unless evaluated by frequentist standards. • I think M.G.’s point about sensitivity analysis has to be taken to heart in HEP, whether one uses objective or subjective priors.
Cousins’ Last Words (for now!) • The area under the likelihood function is meaningless. • Mode of a probability density is metric-dependent, as are shortest intervals. • A confidence interval is a statement about P(data | parameters), not P(parameters | data) • Don’t confuse confidence intervals (statements about parameter) with goodness of fit (statement about model itself). • P(non-SM physics | data) requires a prior; you won’t get it from frequentist statistics. • The argument for coherence of Bayesian P is based on P = subjective degree of belief.
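The metric dependence of the mode can be seen in a short worked example. The toy density below is an assumption chosen for easy algebra; it is not taken from the talk.

```latex
% Toy density (assumed for illustration): p(\mu) = \mu e^{-\mu}, \mu > 0
\begin{align*}
\frac{dp}{d\mu} &= (1-\mu)\,e^{-\mu} = 0 \;\Rightarrow\; \mu_{\text{mode}} = 1, \\
\theta = \ln\mu:\quad p_\theta(\theta) &= p(\mu)\,\frac{d\mu}{d\theta} = \mu^{2} e^{-\mu}, \\
\frac{dp_\theta}{d\theta} &= \mu\,(2\mu-\mu^{2})\,e^{-\mu} = 0 \;\Rightarrow\; \mu_{\text{mode}} = 2 .
\end{align*}
```

The same probability content peaks at μ = 1 when expressed in μ and at μ = 2 when expressed in ln μ, because the Jacobian reshapes the density. Shortest intervals move for the same reason, while the likelihood function carries no Jacobian, which is why its area has no probability meaning.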