This presentation discusses the use of Bayesian analysis in particle physics, from "measuring zero" in a classic counting experiment to signal/background discrimination with neural networks. It covers 1-D and 14-D discrimination examples and open issues in the field. The talk highlights the difficulty of calculating exact confidence intervals and the potential of working directly in high-dimensional spaces using computational resources.
Bayesian Within The Gates: A View From Particle Physics
Harrison B. Prosper, Florida State University
SAMSI, 24 January 2006
Bayesian within the Gates Harrison B. Prosper SAMSI, 2006
Outline
• Measuring Zero as Precisely as Possible!
• Signal/Background Discrimination
• 1-D Example
• 14-D Example
• Some Open Issues
• Summary
Measuring Zero!
Diamonds may not be forever: neutron <-> anti-neutron transitions.
CRISP Experiment (1982-1985), Institut Laue-Langevin, Grenoble, France.
Method: fire a gas of cold neutrons onto a graphite foil and look for annihilation of the anti-neutron component.
Measuring Zero!
Count the number of signal + background events, N.
Suppress the putative signal and count the background events, B, independently.
Results: N = 3, B = 7
Measuring Zero!
Classic 2-parameter counting experiment:
$N \sim \text{Poisson}(s + b)$
$B \sim \text{Poisson}(b)$
Wanted: a statement like $s < u(N, B)$ @ 90% CL
Measuring Zero!
In 1984, no exact solution existed in the particle physics literature! But surely it must have been solved by statisticians. Alas, from Kendall and Stuart I learnt that calculating exact confidence intervals is “a matter of very considerable difficulty”.
Measuring Zero!
Exact in what way? Over the ensemble of statements of the form $s \in [0, u)$, at least 90% should be true, whatever the true value of the signal $s$ AND whatever the true value of the background parameter $b$.
Blame… Neyman (1937)
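Neyman's requirement can be checked numerically for any proposed procedure $u(N, B)$: at a fixed true $(s, b)$, the coverage is the Poisson-weighted fraction of outcomes $(N, B)$ for which $s < u(N, B)$. The Python sketch below computes it by direct summation, truncating both sums far into the Poisson tails. The procedure `naive_upper` is a hypothetical Gaussian approximation chosen only for illustration; it is not a method from the talk.

```python
import math
from typing import Callable


def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability of k counts given mean mu (computed in log space)."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def coverage(upper: Callable[[int, int], float], s: float, b: float) -> float:
    """Probability, over repeated experiments with true (s, b), that s < upper(N, B).

    N ~ Poisson(s + b) and B ~ Poisson(b); both sums are truncated far into the tails.
    """
    def kmax(mu: float) -> int:
        return int(mu + 10.0 * math.sqrt(mu) + 20)

    total = 0.0
    for n in range(kmax(s + b) + 1):
        p_n = poisson_pmf(n, s + b)
        for m in range(kmax(b) + 1):
            if s < upper(n, m):
                total += p_n * poisson_pmf(m, b)
    return total


def naive_upper(n: int, m: int, z: float = 1.2816) -> float:
    """Hypothetical procedure: Gaussian approximation with s_hat = N - B, var = N + B.

    z = 1.2816 is the one-sided 90% quantile of the standard normal.
    """
    return max(n - m, 0) + z * math.sqrt(n + m)


if __name__ == "__main__":
    # Exactness demands coverage >= 0.90 at EVERY (s, b), not just on average.
    for s in (0.5, 2.0, 5.0):
        for b in (1.0, 7.0):
            print(f"s={s:4.1f} b={b:4.1f} coverage={coverage(naive_upper, s, b):.3f}")
```

Scanning $(s, b)$ over a grid and taking the minimum coverage shows why an exact construction is hard: the 90% guarantee must hold whatever the value of the unknown nuisance parameter $b$.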
“Keep it simple, but no simpler” Albert Einstein
Bayesian @ the Gate (1984)
Solution:
The likelihood: $p(N, B \mid s, b) = \text{Poisson}(N \mid s + b)\,\text{Poisson}(B \mid b)$
The prior: $p(s, b) = \text{uniform}(s, b)$
Compute the posterior density: $p(s, b \mid N, B) = p(N, B \mid s, b)\, p(s, b) / p(N, B)$
Marginalize over $b$: $p(s \mid N, B) = \int p(s, b \mid N, B)\, db$
This reasoning was compelling to me then, and is much more so now!
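For this model the marginal posterior can be computed in closed form. Expanding $(s + b)^N$ binomially and integrating over $b$ analytically, with flat priors on $s \ge 0$ and $b \ge 0$ (the restriction to non-negative values is an assumption the slide leaves implicit), turns $p(s \mid N, B)$ into a finite mixture of Gamma densities. The Python sketch below uses that result to find the 90% upper limit for the CRISP counts $N = 3$, $B = 7$; the function names are illustrative, and this is not claimed to be the author's 1984 calculation.

```python
import math


def posterior_weights(N: int, B: int) -> list[float]:
    """Mixture weights w_k such that p(s | N, B) = sum_k w_k * s**k * exp(-s) / k!.

    Follows from expanding (s + b)**N binomially and integrating over b >= 0
    analytically, assuming flat priors on s >= 0 and b >= 0.
    """
    # log of  N!/(N-k)! * (N-k+B)! / 2**(N-k+B+1), kept in log space to avoid overflow
    logs = [
        math.lgamma(N + 1) - math.lgamma(N - k + 1)
        + math.lgamma(N - k + B + 1) - (N - k + B + 1) * math.log(2.0)
        for k in range(N + 1)
    ]
    top = max(logs)
    raw = [math.exp(v - top) for v in logs]
    total = sum(raw)
    return [r / total for r in raw]


def gamma_cdf(k: int, u: float) -> float:
    """CDF at u of the Gamma(k + 1, 1) density s**k exp(-s) / k!, for integer k >= 0."""
    term, partial = 1.0, 1.0
    for j in range(1, k + 1):
        term *= u / j
        partial += term
    return 1.0 - math.exp(-u) * partial


def posterior_cdf(u: float, N: int, B: int) -> float:
    """P(s < u | N, B) under the flat-prior model."""
    return sum(w * gamma_cdf(k, u) for k, w in enumerate(posterior_weights(N, B)))


def upper_limit(N: int, B: int, cl: float = 0.90, tol: float = 1e-10) -> float:
    """Bayesian upper limit u with P(s < u | N, B) = cl, found by bisection."""
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi, N, B) < cl:  # expand until the limit is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, N, B) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Counts from the CRISP-like example: N = 3, B = 7
    print(f"s < {upper_limit(3, 7):.2f} @ 90% (flat priors)")
```

With these counts the limit comes out at roughly 3.8 events. Although N < B here, the statement remains well defined, because the posterior for s is confined to s ≥ 0.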
Particle Physics Data
proton + anti-proton -> positron ($e^+$) + neutrino ($\nu$) + Jet1 + Jet2 + Jet3 + Jet4
This event “lives” in 3 + 2 + 3 × 4 = 17 dimensions (3 momentum components for the positron and for each jet, 2 transverse components for the neutrino).
Particle Physics Data
CDF/Dzero discovery of the top quark (1995)
[Figure legend: data in red; signal in green; background in blue and magenta]
Dzero: 17-D -> 2-D
But that was then, and now is now!
Today we have 2 GHz laptops with 2 GB of memory!
It is fun to deploy huge, sometimes unreliable, computational resources (that is, brains) to reduce the dimensionality of data. But perhaps it is now feasible to work directly in the original high-dimensional space, using hardware!
Signal/Background Discrimination
The optimal solution is to compute
$p(S \mid x) = \dfrac{p(x \mid S)\, p(S)}{p(x \mid S)\, p(S) + p(x \mid B)\, p(B)}$
Every signal/background discrimination method is ultimately an algorithm to approximate this solution, or a mapping thereof. Therefore, if a method is already at the Bayes limit, no other method, however sophisticated, can do better!
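As a concrete illustration, suppose the signal and background densities were known exactly. The Python sketch below assumes hypothetical unit-width Gaussians centred at +1 (signal) and -1 (background) and evaluates $p(S \mid x)$ directly; in this toy model, any discrimination method can at best approximate this curve. A small prior $p(S)$ shows how strongly a rare signal is suppressed.

```python
import math


def gauss_pdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Normal density N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_discriminant(x: float, prior_s: float = 0.5) -> float:
    """p(S|x) for hypothetical densities p(x|S) = N(+1, 1) and p(x|B) = N(-1, 1)."""
    num = gauss_pdf(x, 1.0) * prior_s
    den = num + gauss_pdf(x, -1.0) * (1.0 - prior_s)
    return num / den


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        # equal priors: p(S|x) = 1 / (1 + exp(-2x)) for this toy model
        print(x, round(bayes_discriminant(x), 4), round(bayes_discriminant(x, 1e-5), 8))
```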
Signal/Background Discrimination
Given: $D = (x, y)$, with $x = \{x_1, \ldots, x_N\}$ and $y = \{y_1, \ldots, y_N\}$, a set of $N$ training examples.
Infer: a discriminant function $f(x, w)$ with parameters $w$:
$p(w \mid x, y) = p(x, y \mid w)\, p(w) / p(x, y)$
$\quad = p(y \mid x, w)\, p(x \mid w)\, p(w) / [p(y \mid x)\, p(x)]$
$\quad = p(y \mid x, w)\, p(w) / p(y \mid x)$, assuming $p(x \mid w) = p(x)$.
Signal/Background Discrimination
A typical likelihood for classification:
$p(y \mid x, w) = \prod_i f(x_i, w)^{y_i}\,[1 - f(x_i, w)]^{1 - y_i}$
where $y_i = 0$ for background events and $y_i = 1$ for signal events.
If $f(x, w)$ is flexible enough, then maximizing $p(y \mid x, w)$ with respect to $w$ yields $f = p(S \mid x)$, asymptotically.
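The asymptotic claim can be seen in a toy case. With hypothetical Gaussian samples (signal N(+1, 1), background N(-1, 1), equal numbers), the exact $p(S \mid x)$ is $1/(1 + e^{-2x})$, so even the two-parameter logistic family $f(x, w) = 1/(1 + e^{-(w_0 + w_1 x)})$ is flexible enough. Maximizing the likelihood above by Newton's method should then return $w \approx (0, 2)$. This NumPy sketch is an illustration under those assumptions, not the method used in the talk.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 20_000
x = np.concatenate([rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n)])  # signal, then background
y = np.concatenate([np.ones(n), np.zeros(n)])                              # y = 1 signal, y = 0 background
X = np.column_stack([np.ones_like(x), x])                                  # columns: intercept, x


def fit_logistic(X: np.ndarray, y: np.ndarray, iters: int = 25) -> np.ndarray:
    """Maximize sum_i [y_i log f_i + (1 - y_i) log(1 - f_i)] by Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        f = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - f)                   # gradient of the log-likelihood
        hess = -(X.T * (f * (1.0 - f))) @ X    # Hessian (negative definite)
        w = w - np.linalg.solve(hess, grad)    # Newton step uphill
    return w


w_hat = fit_logistic(X, y)

if __name__ == "__main__":
    print("fitted (w0, w1):", w_hat, " exact Bayes discriminant: (0, 2)")
```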
Signal/Background Discrimination
However, in a full Bayesian calculation one usually averages with respect to the posterior density:
$y(x) = \int f(x, w)\, p(w \mid D)\, dw$
Questions:
1. Do suitably flexible functions $f(x, w)$ exist?
2. Is there a feasible way to do the integral?
Answer 1: Hilbert’s 13th Problem!
Prove that the following is impossible:
$y(x, y, z) = F(A(x), B(y), C(z))$
In 1957, Kolmogorov proved the contrary; schematically,
$y(x_1, \ldots, x_n) = F(f_1(x_1), \ldots, f_n(x_n))$
I’ll call such functions, $F$, Kolmogorov functions.
Kolmogorov Functions
[Figure: network with inputs $x_1$, $x_2$, weights $(u, a)$ into the hidden layer, weights $(v, b)$ into the output, and output $n(x, w)$]
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings $f: \mathbb{R}^N \to U$.
The parameters $w = (u, a, v, b)$ are called weights.
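A minimal sketch of a one-hidden-layer network of the kind denoted by the (inputs, hidden, outputs) triples in the examples, assuming tanh hidden units and a logistic output (the activation functions are not spelled out in the talk). Counting the weights $w = (u, a, v, b)$ reproduces the 641-D parameter space quoted for the (14, 40, 1) model.

```python
import numpy as np


def n_params(n_in: int, n_hidden: int, n_out: int = 1) -> int:
    """Number of weights w = (u, a, v, b) in an (n_in, n_hidden, n_out) network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def network(x: np.ndarray, u: np.ndarray, a: np.ndarray, v: np.ndarray, b: float) -> np.ndarray:
    """Single-output n(x, w) = sigmoid(b + sum_j v_j tanh(a_j + sum_i u_ji x_i)).

    x has shape (events, n_in), u (n_hidden, n_in), a and v (n_hidden,).
    The tanh/sigmoid choice is an assumption made for this sketch.
    """
    hidden = np.tanh(x @ u.T + a)                   # shape (events, n_hidden)
    return 1.0 / (1.0 + np.exp(-(hidden @ v + b)))  # shape (events,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hidden = 14, 40
    u = rng.normal(size=(n_hidden, n_in))
    a = rng.normal(size=n_hidden)
    v = rng.normal(size=n_hidden)
    x = rng.normal(size=(5, n_in))
    print(n_params(14, 40), n_params(1, 15))
    print(network(x, u, a, v, 0.0))
```

For the (1, 15, 1) model of the 1-D example the same count gives 46 weights.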
Answer 2: Use Hybrid MCMC
Computational method: generate a Markov chain (MC) of $N$ points $\{w\}$ drawn from the posterior density $p(w \mid D)$ and average over the last $M$ points. Each point corresponds to a network.
Software: Flexible Bayesian Modeling by Radford Neal, http://www.cs.utoronto.ca/~radford/fbm.software.html
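The idea can be sketched compactly. The Python code below is a minimal hybrid (Hamiltonian) Monte Carlo sampler, not Neal's FBM implementation: for brevity it samples the two weights of a logistic $f(x, w)$ on hypothetical Gaussian toy data, with an assumed Gaussian prior of variance 100, instead of a neural network. It then approximates $y(x) = \int f(x, w)\, p(w \mid D)\, dw$ by averaging $f$ over the last $M = 200$ points of the chain.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 200
x = np.concatenate([rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n)])  # toy signal, background
y = np.concatenate([np.ones(n), np.zeros(n)])
X = np.column_stack([np.ones_like(x), x])
PRIOR_VAR = 100.0  # assumed Gaussian prior w ~ N(0, 100 I)


def potential(w: np.ndarray) -> float:
    """U(w) = -log p(y|x, w) - log p(w), up to a constant."""
    z = X @ w
    nll = np.sum(y * np.logaddexp(0.0, -z) + (1.0 - y) * np.logaddexp(0.0, z))
    return float(nll + 0.5 * (w @ w) / PRIOR_VAR)


def grad_potential(w: np.ndarray) -> np.ndarray:
    f = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -X.T @ (y - f) + w / PRIOR_VAR


def hmc_chain(n_points: int, eps: float = 0.05, n_leapfrog: int = 20):
    """Return n_points samples from p(w|D) and the acceptance rate."""
    w = np.zeros(X.shape[1])
    chain, accepted = [], 0
    for _ in range(n_points):
        p = rng.standard_normal(w.shape)                   # fresh Gaussian momentum
        w_new = w.copy()
        p_new = p - 0.5 * eps * grad_potential(w_new)      # leapfrog: half momentum step
        for step in range(n_leapfrog):
            w_new = w_new + eps * p_new                    # full position step
            if step < n_leapfrog - 1:
                p_new = p_new - eps * grad_potential(w_new)
        p_new = p_new - 0.5 * eps * grad_potential(w_new)  # final half momentum step
        h_old = potential(w) + 0.5 * (p @ p)
        h_new = potential(w_new) + 0.5 * (p_new @ p_new)
        if np.log(rng.uniform()) < h_old - h_new:          # Metropolis accept/reject
            w, accepted = w_new, accepted + 1
        chain.append(w.copy())
    return np.array(chain), accepted / n_points


chain, acc_rate = hmc_chain(1000)
last = chain[-200:]  # average over the last M = 200 points ("networks")


def y_bayes(xv: float) -> float:
    """Monte Carlo estimate of the posterior-averaged discriminant y(x)."""
    return float(np.mean(1.0 / (1.0 + np.exp(-(last[:, 0] + last[:, 1] * xv)))))


if __name__ == "__main__":
    print(f"acceptance {acc_rate:.2f}; y(0) = {y_bayes(0.0):.3f}; y(2) = {y_bayes(2.0):.3f}")
```

Hybrid Monte Carlo uses the gradient of $\log p(w \mid D)$ to make long, directed moves through parameter space, which is what makes sampling hundreds of network weights practical.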
A 1-D Example
Signal
• p+pbar -> t q b
Background
• p+pbar -> W b b
NN Model Class
• (1, 15, 1)
MCMC
• 500 tqb + Wbb events
• Use the last 20 networks in a MC chain of 500.
[Figure: distributions of x for Wbb and tqb]
A 1-D Example
Dots: $p(S \mid x) = H_S / (H_S + H_B)$, where $H_S$ and $H_B$ are 1-D histograms.
Curves: individual NNs, $n(x, w_k)$.
Black curve: $\langle n(x, w) \rangle$.
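The histogram estimate behind the dots is simple to reproduce: fill equal-sized signal and background samples into common bins and take $H_S / (H_S + H_B)$ bin by bin, where equal sample sizes play the role of $p(S) = p(B)$. The NumPy sketch below uses hypothetical Gaussian samples rather than the tqb/Wbb events of the talk, so that the estimate can be checked against the known exact answer $1/(1 + e^{-2x})$.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 50_000
x_sig = rng.normal(1.0, 1.0, n)     # hypothetical signal sample
x_bkg = rng.normal(-1.0, 1.0, n)    # hypothetical background sample, same size

edges = np.linspace(-3.0, 3.0, 25)  # 24 common bins of width 0.25
h_s, _ = np.histogram(x_sig, bins=edges)
h_b, _ = np.histogram(x_bkg, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

total = h_s + h_b
# Histogram estimate of p(S|x); bins with no events are left as NaN
p_hist = np.divide(h_s, total, out=np.full(centers.shape, np.nan), where=total > 0)
p_exact = 1.0 / (1.0 + np.exp(-2.0 * centers))  # exact p(S|x) for this toy model

if __name__ == "__main__":
    for c, ph, pe in zip(centers, p_hist, p_exact):
        print(f"x={c:+.3f}  H_S/(H_S+H_B)={ph:.3f}  exact={pe:.3f}")
```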
A 14-D Example (Finding Susy!)
Transverse momentum spectra
Signal: black curve
Signal/Noise 1/100,000
A 14-D Example (Finding Susy!)
Missing transverse momentum spectrum (caused by the escape of neutrinos and Susy particles)
Variable count: $4 \times (E_T, \eta, \phi) + (E_T, \phi) = 14$
A 14-D Example (Finding Susy!)
Signal: 250 p+pbar -> gluino gluino (MC) events
Background: 250 p+pbar -> top + anti-top (MC) events
NN Model Class: (14, 40, 1) (641-D parameter space!)
MCMC: use the last 100 networks in a Markov chain of 10,000, keeping every 20th point.
But does it Work?
Signal to noise can reach 1/1 with an acceptable signal strength.
But does it Work?
Let $d(x) = N\,p(x \mid S) + N\,p(x \mid B)$ be the density of the data, containing $2N$ events, assuming for simplicity that $p(S) = p(B)$.
A properly trained classifier $y(x)$ approximates
$p(S \mid x) = p(x \mid S) / [p(x \mid S) + p(x \mid B)]$
Therefore $y(x)\, d(x) \approx N\, p(x \mid S)$: if the signal and background events are weighted with $y(x)$, we should recover the signal density.
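This argument can be checked numerically in a hypothetical Gaussian toy model: mix $N$ signal and $N$ background events, weight every event by $y(x)$ (here the exact $p(S \mid x)$ stands in for a trained network), and compare the weighted histogram with the histogram of the signal events alone. Because $y(x)\, d(x) = N\, p(x \mid S)$, the two should agree within statistical fluctuations, and the weights should sum to about $N$. The sketch uses NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 50_000
x_sig = rng.normal(1.0, 1.0, n)         # hypothetical signal events
x_bkg = rng.normal(-1.0, 1.0, n)        # hypothetical background events
x_all = np.concatenate([x_sig, x_bkg])  # the "data": 2N events, p(S) = p(B)


def y(x: np.ndarray) -> np.ndarray:
    """Stand-in for a trained classifier: the exact p(S|x) of this toy model."""
    return 1.0 / (1.0 + np.exp(-2.0 * x))


edges = np.linspace(-3.0, 5.0, 33)
weighted, _ = np.histogram(x_all, bins=edges, weights=y(x_all))  # approximates N p(x|S)
signal_only, _ = np.histogram(x_sig, bins=edges)                 # N p(x|S) directly

if __name__ == "__main__":
    print("sum of weights:", y(x_all).sum(), "(expect about", n, ")")
    for lo, w_cnt, s_cnt in zip(edges[:-1], weighted, signal_only):
        print(f"[{lo:+.2f})  weighted={w_cnt:9.1f}  signal={s_cnt:6d}")
```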
But does it Work?
Amazingly well!
Some Open Issues
• Why does this insane function $p(w_1, \ldots, w_{641} \mid x_1, \ldots, x_{500})$ behave so well? 641 parameters > 500 events!
• How should one verify that an n-D ($n \sim 14$) swarm of simulated background events matches the n-D swarm of observed events (in the background region)?
• How should one verify that $y(x)$ is indeed a reasonable approximation to the Bayes discriminant, $p(S \mid x)$?
Summary
• Bayesian methods have been, and are being, used with considerable success by particle physicists. Happily, the frequentist/Bayesian Cold War is abating!
• The application of Bayesian methods to highly flexible functions, e.g., neural networks, is very promising and should be broadly applicable.
• Needed: a powerful way to compare high-dimensional swarms of points. Agree, or not agree, that is the question!