Bayesian Methods in Particle Physics: From Small-N to Large
Harrison B. Prosper, Florida State University
SCMA IV, 12-15 June 2006
Outline
• Measuring Zero
• Bayesian Fit
• Finding Needles in Haystacks
• Summary
Measuring Zero – 1
In the mid-1980s, an experiment at the Institut Laue Langevin (Grenoble, France) searched for evidence of neutron-antineutron oscillations, a characteristic prediction of certain Grand Unified Theories.
CRISP Experiment, Institut Laue Langevin
[Figure: neutron gas inside a magnetic shield. Field off: count N. Field on: count B.]
Measuring Zero – 2
Count the number of signal + background events, N. Suppress the putative signal and count the background events, B, independently.
Results: N = 3, B = 7
Measuring Zero – 3
Classic 2-parameter counting experiment:
N ~ Poisson(s + b)
B ~ Poisson(b)
Infer a statement of the form: Pr[s < u(N, B)] ≥ 0.9
Measuring Zero – 4
In 1984, no exact solution existed in the particle physics literature! Moreover, calculating exact confidence intervals is, according to Kendall and Stuart, “a matter of very considerable difficulty”.
Measuring Zero – 5
Exact in what way? Over some ensemble of statements of the form
0 < s < u(N, B)
at least 90% of them should be true, whatever the true values of s and b. Neyman (1937)
Measuring Zero – 6
Tried a Bayesian approach:
f(s, b|N) = f(N|s, b) p(s, b) / f(N) = f(N|s, b) p(b|s) p(s) / f(N)
Step 1. Compute the marginal likelihood: f(N|s) = ∫ f(N|s, b) p(b|s) db
Step 2. Compute the posterior density: f(s|N) = f(N|s) p(s) / ∫ f(N|s) p(s) ds
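A minimal numerical sketch of Steps 1 and 2 for the reported counts N = 3, B = 7. The priors are assumptions, since the slide does not state them: p(b|s) is taken to be the density for b implied by B ~ Poisson(b) with a flat prior on b, and p(s) is taken flat on [0, s_max]. The function name `upper_limit` and the grid settings are illustrative only.

```python
import numpy as np
from math import lgamma

def upper_limit(N, B, cl=0.9, s_max=30.0, b_max=40.0, n=1000):
    """Bayesian upper limit u with Pr(s < u | N, B) = cl, under assumed flat priors."""
    s = np.linspace(0.0, s_max, n)
    db = b_max / n
    b = (np.arange(n) + 0.5) * db  # midpoint grid, so b > 0 everywhere
    # Assumed p(b|s): b^B exp(-b) / B!, i.e. Poisson(B|b) with a flat prior on b
    log_pb = B * np.log(b) - b - lgamma(B + 1)
    # Step 1: marginal likelihood f(N|s) = integral of Poisson(N|s+b) p(b|s) db
    mu = s[:, None] + b[None, :]
    log_like = N * np.log(mu) - mu - lgamma(N + 1)
    f_N_given_s = np.exp(log_like + log_pb).sum(axis=1) * db
    # Step 2: posterior f(s|N) with an assumed flat p(s); CDF by the trapezoid rule
    ds = s[1] - s[0]
    steps = 0.5 * (f_N_given_s[1:] + f_N_given_s[:-1]) * ds
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    return float(np.interp(cl, cdf, s))

u = upper_limit(N=3, B=7)
print(f"90% upper limit on s: {u:.2f}")
```

Other prior choices for s and b would give a different limit; the structure of the two steps stays the same.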
But is there a signal?
1. Hypothesis testing (J. Neyman): H0: s = 0, H1: s > 0
2. p-value (R.A. Fisher): H0: s = 0
3. Decision theory (J.M. Bernardo, 1999): discrepancy, a “distance” between models
Bayesian Fit
Problem: Given counts
Data: N = N1, N2, ..., NM
Signal model: A = A1, A2, ..., AM
Background model: B = B1, B2, ..., BM
where M is the number of bins (or pixels), find the admixture of A and B that best matches the observations N.
Problem (DØ, 2005)
Observations = Background model + Signal model (M)
Bayesian Fit - Details
Assume model of the form
Marginalize over a and b
Bayesian Fit – Pr(Model)
Moreover, one can compute f(N|pa, pb) for different signal models M, in particular for models M that differ by the value of a single parameter. Then compute the probability of model M:
Pr(M|N) = ∫dpa ∫dpb f(N|pa, pb, M) p(pa, pb|M) p(M) / p(N)
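Once the integral over pa and pb has been done for each model, Pr(M|N) follows by normalizing over the set of models considered. The sketch below assumes the log marginal likelihoods log f(N|M) are already available; the mass hypotheses and numbers are made up for illustration, and `model_posterior` is a hypothetical helper name.

```python
import numpy as np

def model_posterior(log_marginal, prior=None):
    """Pr(M|N) from log f(N|M) and p(M), normalized over the listed models."""
    log_marginal = np.asarray(log_marginal, dtype=float)
    if prior is None:
        # Assumption: equal prior probability p(M) for every model
        prior = np.full(log_marginal.shape, 1.0 / log_marginal.size)
    log_post = log_marginal + np.log(prior)
    log_post -= log_post.max()  # subtract the maximum to avoid underflow in exp
    post = np.exp(log_post)
    return post / post.sum()    # the sum plays the role of p(N)

masses = np.array([160.0, 170.0, 180.0])     # hypothetical mass hypotheses (GeV)
log_f = np.array([-105.2, -103.1, -104.0])   # hypothetical values of log f(N|M)
posterior = model_posterior(log_f)
for m, p in zip(masses, posterior):
    print(f"M = {m:.0f} GeV: Pr(M|N) = {p:.3f}")
```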
Bayesian Fit – Results (DØ, 1997)
[Figure: P(M|N) versus top quark mass hypothesis (GeV), from 130 to 230 GeV]
mass = 173.5 ± 4.5 GeV
signal = 33 ± 8
background = 50.8 ± 8.3
The Needles
Single top quark events: 0.88 pb, 1.98 pb
The Haystacks
W boson events: 2700 pb
signal : noise = 1 : 1000
The Needles and the Haystacks
Finding Needles - 1
The optimal solution is to compute
p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]
Every signal/noise discrimination method is ultimately an algorithm to approximate p(S|x), or a function thereof.
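As a small illustration of the formula for p(S|x), the sketch below uses one-dimensional Gaussian densities for p(x|S) and p(x|B). The means, widths, and equal priors are assumptions chosen for the example, not values from the talk.

```python
import numpy as np

def gauss(x, mean, sigma):
    """Normal density, used here as a stand-in for p(x|S) and p(x|B)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def p_signal(x, prior_s=0.5):
    """p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]."""
    px_s = gauss(x, 1.0, 1.0)    # assumed signal density
    px_b = gauss(x, -1.0, 1.0)   # assumed background density
    prior_b = 1.0 - prior_s
    return px_s * prior_s / (px_s * prior_s + px_b * prior_b)

x = np.array([-2.0, 0.0, 2.0])
print(p_signal(x))
```

With these particular densities and equal priors, p(S|x) reduces to the logistic function 1 / (1 + exp(-2x)).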
Finding Needles - 2
Problem: Given D = (x, y), with x = x1, …, xN and y = y1, …, yN, a set of N labeled events, where x are the data and y are the labels.
Find: a function f(x, w), with parameters w, that approximates p(S|x):
p(w|x, y) = p(x, y|w) p(w) / p(x, y)
= p(y|x, w) p(x|w) p(w) / [p(y|x) p(x)]
= p(y|x, w) p(w) / p(y|x), assuming p(x|w) = p(x)
Finding Needles - 3
Likelihood for classification:
p(y|x, w) = ∏_i f(x_i, w)^{y_i} [1 − f(x_i, w)]^{1 − y_i}
where
y = 0 for background events
y = 1 for signal events
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.
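A minimal sketch of maximizing this likelihood, assuming a very simple f(x, w) (a logistic function of one variable) and toy Gaussian data. The data, learning rate, and iteration count are illustrative choices; plain gradient ascent stands in for whatever optimizer one would actually use.

```python
import numpy as np

def f(x, w):
    """Assumed classifier: logistic function with weights w = (w0, w1)."""
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * x)))

def log_likelihood(w, x, y):
    """log p(y|x, w) = sum_i [ y_i log f_i + (1 - y_i) log(1 - f_i) ]."""
    p = np.clip(f(x, w), 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Toy labeled events: signal (y = 1) centered at +1, background (y = 0) at -1
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(1.0, 1.0, 500), rng.normal(-1.0, 1.0, 500)])
y = np.concatenate([np.ones(500), np.zeros(500)])

w = np.zeros(2)
for _ in range(2000):
    p = f(x, w)
    # Gradient of the log-likelihood with respect to (w0, w1)
    grad = np.array([np.sum(y - p), np.sum((y - p) * x)])
    w += 0.5 * grad / len(x)

print("fitted weights:", w)
print("log-likelihood:", log_likelihood(w, x, y))
```

For this toy setup the exact p(S|x) is 1 / (1 + exp(-2x)), so the fitted slope w1 should come out near 2, up to sampling fluctuations.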
Finding Needles - 4
However, in a Bayesian calculation it is more natural to average with respect to the posterior density:
f(x|D) = ∫ f(x, w) p(w|D) dw
Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?
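Given a set of points w drawn from p(w|D), the integral can be estimated by a simple average of f(x, w) over those points. The sketch below assumes a logistic f(x, w) and uses made-up Gaussian draws in place of real posterior samples.

```python
import numpy as np

def f(x, w):
    """Assumed classifier: logistic function with weights w = (w0, w1)."""
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * x)))

def posterior_average(x, w_samples):
    """Monte Carlo estimate of f(x|D): the mean of f(x, w_k) over samples w_k."""
    return np.mean([f(x, w) for w in w_samples], axis=0)

# Hypothetical stand-in for K = 1000 draws from p(w|D)
rng = np.random.default_rng(1)
w_samples = rng.normal(loc=[0.0, 2.0], scale=0.1, size=(1000, 2))

x = np.array([-2.0, 0.0, 2.0])
f_avg = posterior_average(x, w_samples)
print(f_avg)
```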
Answer 1: Yes!
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: R^n -> R.
The parameters w = (u, a, v, b) are called weights.
[Figure: network diagram with inputs x1, x2, weights (u, a) and (v, b), and output f(x, w)]
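The functional form of the network is not preserved in this transcript. The sketch below assumes a common single-hidden-layer form, f(x, w) = sigmoid(b + Σ_j v_j tanh(a_j + Σ_i u_ij x_i)), in which (u, a) feed the hidden layer and (v, b) feed the output; that assignment of the weights is an assumption.

```python
import numpy as np

def network(x, u, a, v, b):
    """Single-hidden-layer network (assumed form), output in (0, 1).

    x: (n_events, n_inputs), u: (n_inputs, n_hidden),
    a: (n_hidden,), v: (n_hidden,), b: scalar.
    """
    hidden = np.tanh(x @ u + a)                       # hidden-layer activations
    return 1.0 / (1.0 + np.exp(-(hidden @ v + b)))    # logistic output

# Random weights, only to show the shapes involved
rng = np.random.default_rng(0)
n_inputs, n_hidden = 2, 5
u = rng.normal(size=(n_inputs, n_hidden))
a = rng.normal(size=n_hidden)
v = rng.normal(size=n_hidden)
b = 0.0

x = rng.normal(size=(4, n_inputs))   # four events with inputs (x1, x2)
out = network(x, u, a, v, b)
print(out)
```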
Answer 2: Yes!
Computational Method
Generate a Markov chain (MC) of K points {w}, whose stationary density is p(w|D), and average over the stationary part of the chain.
Map the problem to that of a “particle” moving in a spatially varying “potential” and use methods of statistical mechanics to generate states (p, w) with probability ~ exp(−H), where H is the “Hamiltonian”
H = p² − log p(w|D),
with “momentum” p.
Hybrid Markov Chain Monte Carlo
Computational Method…
For a fixed H, traverse the space (p, w) using Hamilton’s equations, which guarantees that all points consistent with H will be visited with equal probability.
To allow exploration of states with differing values of H, one periodically introduces random changes to the momentum p.
Software: Flexible Bayesian Modeling by Radford Neal
http://www.cs.utoronto.ca/~radford/fbm.software.html
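A minimal sketch of the hybrid (Hamiltonian) Monte Carlo idea, not the FBM implementation. It assumes a two-dimensional standard normal as a stand-in for p(w|D), takes H = p·p/2 − log p(w|D) (the conventional factor of 1/2 on the kinetic term) so that exp(−H) is proportional to p(w|D) in w, integrates Hamilton's equations with the leapfrog scheme, and applies a Metropolis accept step to correct for integration error. Step size, trajectory length, and burn-in are illustrative.

```python
import numpy as np

def log_post(w):
    """Stand-in for log p(w|D): standard normal, up to a constant."""
    return -0.5 * np.sum(w ** 2)

def grad_log_post(w):
    return -w

def hmc(n_samples, w0, step=0.2, n_leap=20, seed=0):
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    chain = []
    for _ in range(n_samples):
        p = rng.normal(size=w.shape)            # random change of the momentum
        h_old = 0.5 * p @ p - log_post(w)
        w_new, p_new = w.copy(), p.copy()
        # Leapfrog integration of Hamilton's equations
        p_new += 0.5 * step * grad_log_post(w_new)
        for i in range(n_leap):
            w_new += step * p_new
            if i < n_leap - 1:
                p_new += step * grad_log_post(w_new)
        p_new += 0.5 * step * grad_log_post(w_new)
        h_new = 0.5 * p_new @ p_new - log_post(w_new)
        # Metropolis step: accept with probability min(1, exp(h_old - h_new))
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            w = w_new
        chain.append(w.copy())
    return np.array(chain)

chain = hmc(5000, w0=[3.0, -3.0])
stationary = chain[500:]                        # discard the early part of the chain
print("mean:", stationary.mean(axis=0))
print("std: ", stationary.std(axis=0))
```

For a real network, `log_post` and `grad_log_post` would be replaced by the log posterior of the weights and its gradient.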
Example - Finding SUSY!
[Figure: transverse momentum spectra; signal: black curve]
Signal:Noise 1:25,000
Example - Finding SUSY!
[Figure: distribution of f(x|D) beyond 0.9]
Assuming L = 10 fb⁻¹

Cut     S        B        S/√B
0.99    1×10³    2×10⁴    7.0

Signal:Noise 1:20
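The significance and signal-to-noise figures follow directly from the quoted event counts, as this short check shows (variable names are illustrative):

```python
from math import sqrt

S = 1e3   # signal events passing the cut at 0.99
B = 2e4   # background events passing the same cut

significance = S / sqrt(B)
noise_per_signal = B / S

print(f"S/sqrt(B) = {significance:.1f}")
print(f"Signal:Noise = 1:{noise_per_signal:.0f}")
```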
Summary
• Bayesian methods have been at the heart of several important results in particle physics.
• However, there is considerable room for expanding their domain of application.
• A couple of current issues:
  • Is there a signal? Is the Bernardo approach useful in particle physics?
  • Fitting: Is there a practical (Bayesian?) method to test whether or not an N-dimensional function fits an N-dimensional swarm of points?