1 / 34

Signal/Background Discrimination in Particle Physics

Signal/Background Discrimination in Particle Physics. Harrison B. Prosper Florida State University SAMSI 8 March, 2006. Outline. Particle Physics Data Signal/Background Discrimination Summary. Particle Physics Data. proton + anti-proton -> positron ( e + ) neutrino ( n ) Jet1

sai
Download Presentation

Signal/Background Discrimination in Particle Physics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Signal/Background Discrimination in Particle Physics Harrison B. Prosper Florida State University SAMSI 8 March, 2006 Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  2. Outline • Particle Physics Data • Signal/Background Discrimination • Summary Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  3. Particle Physics Data proton + anti-proton -> positron (e+) neutrino (n) Jet1 Jet2 Jet3 Jet4 This event is described by (at least) 3 + 2 + 3 x 4 = 17 measured quantities. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  4. Particle Physics Data 106 1 H0 Standard Model H1 Model of the Week Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  5. Signal/Background Discrimination To minimize misclassification probability, compute p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)] Every signal/background discrimination method is ultimately an algorithm to approximate this function, or a mapping thereof. p(s) / p(b) is the prior signal to background ratio, that is, it is S/B before applying a cut to p(S|x). Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  6. Signal/Background Discrimination Given D =x,y x = {x1,…xN}, y = {y1,…yN} of N training examples (events) Infer A discriminant function f(x, w), with parameters w p(w|x, y) = p(x, y|w) p(w) / p(x, y) = p(y|x, w) p(x|w) p(w) / p(y|x) p(x) = p(y|x, w) p(w) / p(y|x) assuming p(x|w) -> p(x) Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  7. Signal/Background Discrimination A typical likelihood for classification: p(y|x, w) = Pi f(xi, w)y [1 – f(xi, w)]1-y where y = 0 for background events y = 1 for signal events If f(x, w) flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  8. Signal/Background Discrimination However, in a Bayesian calculation it is more natural to average y(x) = ∫ f(x, w) p(w|D) dw Questions: 1. Do suitably flexible functions f(x, w) exist? 2. Is there a feasible way to do the integral? Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  9. Answer 1: Yes! Hilbert’s 13th problem: Prove a special case of the conjecture: The following is impossible, in general, f(x1,…,xn) = F( g1(x1),…, gn(xn) ) In 1957, Kolmogorov proved the contrary: A function f:Rn -> R can be represented as follows f(x1,..,xn) = ∑i=12n+1 Qi( ∑j=1n Gij(xj) ) where Gij are independent of f(.) Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  10. u, a x1 v, b n(x,w) x2 Kolmogorov Functions A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f:Rn -> R The parameters w = (u, a, v, b) are called weights Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  11. Answer 2: Yes! Computational Method Generate a Markov chain (MC) of N points {w}, whose stationary density is p(w|D), and average over the last M points. Map problem into that of “particle” moving in a spatially-varying “potential” and use methods of statistical mechanics to generate states (p, w) with probability ~ exp(-b H), where H is the “Hamiltonian” H = log p(w|D) + p2, with “momentum” p. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  12. Hybrid Markov Chain Monte Carlo Computational Method… For a fixed H traverse space (p, w) using Hamilton’s equations, which guarantees that all points consistent with H will be visited with equal probability ~ exp(-bH). To allow exploration of states with differing values of H one introduces, periodically, random changes to the momentum p. Software Flexible Bayesian Modeling by Radford Neal http://www.cs.utoronto.ca/~radford/fbm.software.html Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  13. Example 1

  14. Example 1: 1-D Signal • p+pbar -> t q b Background • p+pbar -> W b b NN Model Class • (1, 15, 1) MCMC • 500 tqb + Wbb events • Use last 20 points in a chain of 10,000, Wbb tqb x skipping every 20th Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  15. Example 1: 1-D Dots p(S|x) = HS/(HS+HB) HS, HB, 1-D histograms Curves Individual NNs n(x, wk) Black curve < n(x, w) > x Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  16. Example 2

  17. Example 2: 14-D (Finding Susy!) Transverse momentum spectra Signal: black curve Signal/Noise 1/25,000 Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  18. Example 2: 14-D (Finding Susy!) Missing transverse momentum spectrum (caused by escape of neutrinos and Susy particles) Measured quantities: 4 x (ET, h, f) + (ET, f) = 14 Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  19. Example 2: 14-D (Finding Susy!) Signal 250 p+pbar -> gluino, gluino (Susy) events Background 250 p+pbar -> top, anti-top events NN Model Class (14, 40, 1) (wє641-D parameter space!) MCMC Use last 100 networks in a Markov chain of 10,000, skipping every 20. Likelihood Prior Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  20. Results Network distribution beyond n(x) > 0.9 Assuming L = 10 fb-1 Cut S B S/√B 0.90 5x103 2x106 3.5 0.95 4x103 7x105 4.7 0.99 1x103 2x104 7.0 Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  21. But Does It Really Work? Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events, assuming, for simplicity, p(S) = p(B). A properly trained classifier y(x) approximates p(S|x) = p(x|S)/[p(x|S) + p(x|B)] Therefore, if the data (signal + background) are weighted with y(x), we should recover the signal density. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  22. But Does It Really Work? It seems to! Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  23. Example 3

  24. Particle Physics Data, Take 2 • Two varieties of jet: • Tagged (Jet 1, Jet 4) • Untagged (Jet 2, Jet 3) • We are often interested in • Pr(Tagged|Jet Variables) Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  25. Example 3: “Tagging” Jets p(x|T) or d(x) p(T|x)= p(x|T) p(T) / d(x) d(x) = p(x|T) p(T) + p(x|U) p(U) x = (PT, h, f) (red curve is d(x)!) Tagged-jet Untagged-jet collision point Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  26. Probability Density Estimation Approximate a density by a sum over kernels K(.), one placed at each of the N points xi of the training sample. h is one or more smoothing parameters adjusted to provide the best approximation to the true density p(x). If h is too small, the model will be very spiky; ifh is too large, features of the density p(x) will be lost. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  27. Probability Density Estimation Why does this work? Consider the limit as N -> ∞ of In the limit N -> ∞, the true density p(x) will be recovered provided that h -> 0 in such a way that Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  28. Probability Density Estimation As long as the kernel behaves sensibly in the N -> ∞ limit any kernel will do. In practice, the most commonly used kernel is the product of 1-D Gaussians, one for each dimension “i”: One advantage of the PDE approximation is that it contains very few adjustable parameters: basically, the smoothing parameters. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  29. Example 3: “Tagging” Jets Projections of estimated p(T|x) (black curve) onto the PT, h and f axes. Blue points: ratio of blue to red histograms (see slide 25) Tagged-jet collision point Untagged-jet Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  30. Example 3: “Tagging” Jets Projections of data weighted by p(T|x). Recovers tagged density p(x|T). Tagged-jet Untagged-jet collision point Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  31. But, How Well Does It Work? How well do the n-D model and the n-D data agree? A thought (JL, HBP): 1. Project the model and the data onto the same set of randomly directed rays through the origin. 2. Compute some measure of discrepancy for each pair of projections. 3. Do something sensible with this set of numbers!! Tagged-jet Untagged-jet collision point Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  32. But, How Well Does It Work? Projections of p(T|x) onto 3 randomly chosen rays through the origin. Tagged-jet Untagged-jet collision point Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  33. But, How Well Does It Work? Projections of weighted tagged + untagged data onto the 3 randomly selected rays. Tagged-jet Untagged-jet collision point Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

  34. Summary • Multivariate methods have been applied with considerable success in particle physics, especially for classification. However, there is considerable room for improving our understanding of them as well as expanding their domain of application. • The main challenge is data/model comparison when each datum is a point in 1…20 dimensions. During the SAMSI workshop we hope to make some progress on the use of projections onto multiple rays. This may be an interesting area for collaboration between physicists and statisticians. Signal/Background Discrimination Harrison B. Prosper SAMSI, March 2006

More Related