Bayesian Neural Networks

  1. Bayesian Neural Networks
     Pushpa Bhat, Fermilab
     Harrison Prosper, Florida State University

  2. Outline
     • Introduction
     • Bayesian Learning
     • Simple Examples
     • Summary
     PHYSTAT05 Oxford Bayesian Neural Networks Bhat/Prosper

  3. Multivariate Methods
     • Since the early 1990s, we have used multivariate methods extensively in particle physics
     • Some examples
        • Particle ID and signal/background discrimination
        • Optimization of cuts for the top quark discovery at DØ
        • Precision measurement of the top quark mass
        • Searches for leptoquarks, technicolor, …
     • Neural network methods have become popular due to their ease of use, their power, and their successful applications

  4. Why Multivariate Methods?
     • Improve several aspects of analysis
        • Event selection
           • Triggering, real-time filters, data streaming
        • Event reconstruction
           • Tracking/vertexing, particle ID
        • Signal/background discrimination
           • Higgs discovery, SUSY discovery, single top, …
        • Functional approximation
           • Jet energy corrections, tag rates, fake rates
        • Parameter estimation
           • Top quark mass, Higgs mass, SUSY model parameters
        • Data exploration
           • Knowledge discovery via data mining
           • Data-driven extraction of information, latent structure analysis

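  5. Multi-Layer Perceptron
     • A popular and powerful neural network model:
       $$y_k(x) = g\Big(\sum_j w_{kj}\, g\Big(\sum_i w_{ji}\, x_i + \theta_j\Big) + \theta_k\Big)$$
       where g is the node activation function (typically the logistic sigmoid, g(a) = 1/(1 + e^{-a}))
     • Need to find the w's and θ's, the free parameters of the model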
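The forward pass of the single-output perceptron can be sketched in a few lines. This is a minimal illustration, assuming the logistic sigmoid for g, one hidden layer, and arbitrary (hypothetical) weight values; the function and argument names are chosen here for illustration only.

```python
import math

def sigmoid(a):
    """Logistic activation g(a) = 1 / (1 + exp(-a)); assumed choice of g."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_output(x, w_hidden, theta_hidden, w_out, theta_out):
    """One-hidden-layer MLP with a single output node k.

    x            : inputs x_i
    w_hidden[j]  : weights w_ji from every input i to hidden node j
    theta_hidden : hidden-node biases theta_j
    w_out        : weights w_kj from hidden node j to the output node
    theta_out    : output bias theta_k
    """
    hidden = [
        sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + theta_j)
        for row, theta_j in zip(w_hidden, theta_hidden)
    ]
    return sigmoid(sum(w_kj * h_j for w_kj, h_j in zip(w_out, hidden)) + theta_out)

# Hypothetical network: 2 inputs, 3 hidden nodes, 1 output.
y = mlp_output(
    x=[0.5, -1.0],
    w_hidden=[[1.0, 0.5], [-0.3, 0.8], [0.2, 0.2]],
    theta_hidden=[0.0, 0.1, -0.1],
    w_out=[1.2, -0.7, 0.4],
    theta_out=0.05,
)
print(y)  # a value strictly between 0 and 1
```

Because the output node also uses a sigmoid, y always lies in (0, 1), which is what allows it to be read as a probability.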

  6. The Bayesian Connection
     • The output of a feed-forward neural network trained with targets t = 1 for signal and t = 0 for background can approximate the posterior probability P(s|x1, x2).
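The quantity the network approximates is the Bayes posterior P(s|x) = p(x|s)P(s) / [p(x|s)P(s) + p(x|b)P(b)]. The sketch below computes it directly for an assumed toy case: a single variable x (rather than x1, x2), Gaussian signal and background densities of equal width, and equal priors. All parameter values are hypothetical.

```python
import math

def gauss(x, mu, sigma):
    """Normal density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_signal(x, prior_s=0.5, mu_s=1.0, mu_b=-1.0, sigma=1.0):
    """P(s|x) from Bayes' theorem for assumed Gaussian class densities."""
    ps = gauss(x, mu_s, sigma) * prior_s
    pb = gauss(x, mu_b, sigma) * (1.0 - prior_s)
    return ps / (ps + pb)

print(posterior_signal(0.0))  # → 0.5 (halfway between the two means)
print(posterior_signal(1.0))  # equals 1 / (1 + exp(-2)) for these parameters
```

For equal-width Gaussians the posterior is itself a sigmoid of x (here 1/(1 + e^{-2x})), which gives some intuition for why a sigmoid-output network, trained with enough data and flexibility, can approach it.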

  7. The Top Quark: Post-Evidence, Pre-Discovery! (P. Bhat, DPF94)
     • Fisher analysis of the tt̄ → eμ channel
        • One candidate event
        • (S/B)(mt = 180 GeV) = 18 w.r.t. Z, 10 w.r.t. WW
     • NN analysis of the tt̄ → e+jets channel
     [Figure: NN output distributions for tt̄, W+jets, tt̄ (mt = 160 GeV), and data]

  8. Measuring the Top Quark Mass
     DØ lepton+jets
     [Figures: discriminant variables; the discriminants]
     • Fit performed in 2-D: (D_LB/NN, m_fit)
     • mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²

  9. Higgs Discovery Reach
     • The challenges are daunting! But using NNs provides the same reach with a factor of 2 less luminosity than a conventional analysis
     • Improved bb̄ mass resolution and b-tag efficiency are crucial
     Run II Higgs study, hep-ph/0010338 (Oct. 2000); P. C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022

  10. Limitations of "Conventional NN"
     • The training yields one set of weights, or network parameters
     • Need to look for the "best" network, but avoid overfitting
     • Heuristic decisions on network architecture
        • Inputs, number of hidden nodes, etc.
     • No direct way to compute uncertainties

  11. Ensembles of Networks
     [Diagram: input X fed to networks NN1, NN2, NN3, …, NNM with outputs y1, y2, y3, …, yM]
     • A decision made by averaging over many networks (a committee of networks) typically has lower error than the individual networks; for a squared-error measure, the committee's error never exceeds the average error of its members.
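The committee claim can be checked numerically with the standard "ambiguity" decomposition for squared error: committee error = mean member error − mean spread of members around the committee output. The outputs and targets below are hypothetical numbers chosen for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def committee(outputs):
    """Average the outputs y_1, ..., y_M of M networks, event by event."""
    return [sum(col) / len(outputs) for col in zip(*outputs)]

targets = [1, 0, 1, 0]
# Hypothetical outputs of M = 3 networks on the same 4 events.
outputs = [
    [0.9, 0.3, 0.6, 0.1],
    [0.7, 0.1, 0.9, 0.4],
    [0.8, 0.2, 0.5, 0.0],
]
y_bar = committee(outputs)
e_committee = mse(y_bar, targets)
e_members = sum(mse(o, targets) for o in outputs) / len(outputs)
# "Ambiguity": average spread of the members around the committee output.
ambiguity = sum(mse(o, y_bar) for o in outputs) / len(outputs)
print(e_committee, e_members, ambiguity)
```

Since the ambiguity term is never negative, the committee can never do worse than the average member; how much it gains depends on how much the members disagree.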

  12. Bayesian Learning
     • The result of Bayesian training is a posterior density of the network weights, P(w|training data)
     • Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points:
       $$\bar{y}(x) = \frac{1}{K}\sum_{k=N-K+1}^{N} y(x, w_k)$$
       where w_1, …, w_N is the generated sequence of weights.
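The averaging step can be sketched as follows. As a simplifying assumption, the "network" here is a single logistic unit with weights w = (bias, slope), and the stored chain of weight vectors is hypothetical; in the real application each w_k would be a full set of network parameters.

```python
import math

def network(x, w):
    """Hypothetical stand-in for y(x, w): one logistic unit, w = (bias, slope)."""
    return 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))

def bayes_average(x, chain, k):
    """Average network(x, w) over the last k weight vectors w_{N-k+1}, ..., w_N."""
    if not 0 < k <= len(chain):
        raise ValueError("k must satisfy 0 < k <= len(chain)")
    return sum(network(x, w) for w in chain[-k:]) / k

# Hypothetical sequence of sampled weight vectors w_1, ..., w_N (N = 5).
chain = [(0.0, 0.5), (0.1, 1.0), (-0.1, 1.5), (0.0, 2.0), (0.05, 1.8)]
print(bayes_average(0.5, chain, k=3))
```

Discarding the early points lets the sequence "forget" its starting value before it contributes to the average.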

  13. Bayesian Learning – 2
     • Advantages
        • Less prone to over-fitting, because of Bayesian averaging.
        • Less need to optimize the size of the network. Can use a large network! Indeed, the number of weights can be greater than the number of training events!
        • In principle, provides the best estimate of p(t|x)
     • Disadvantages
        • Computationally demanding!

  14. Bayesian Learning – 3
     • Computationally demanding because
        • The dimensionality of the parameter space is, typically, large.
        • There could be multiple maxima in the likelihood function p(t|x,w) or, equivalently, multiple minima in the error function E(x,w).
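One simple source of multiple minima is weight-space symmetry. For an assumed toy network y(x) = v·tanh(u·x), flipping the signs of both weights leaves every output unchanged, so each minimum of the error function at (u, v) has an identical twin at (−u, −v). The training points below are hypothetical; a coarse grid search finds the best point in each half of the (u, v) plane.

```python
import math

def error(u, v, data):
    """Sum-of-squares error E(w) for the toy network y(x) = v * tanh(u * x)."""
    return sum((v * math.tanh(u * x) - t) ** 2 for x, t in data)

data = [(-2.0, -0.9), (-0.5, -0.4), (0.5, 0.5), (2.0, 0.95)]  # hypothetical training points

# Coarse grid search for the lowest error in each half of the (u, v) plane.
grid = [i / 10 for i in range(-40, 41)]
best_pos = min((error(u, v, data), u, v) for u in grid for v in grid if u > 0)
best_neg = min((error(u, v, data), u, v) for u in grid for v in grid if u < 0)
print(best_pos)
print(best_neg)  # same error, weights with opposite signs
```

With H hidden nodes there are many such equivalent configurations (sign flips and permutations of hidden nodes), so a sampler that explores the posterior must cope with a multimodal landscape.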

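  15. Bayesian Neural Networks – 1
     • Basic idea
        • Compute the posterior density of the weights:
          $$p(w \mid t, x) = \frac{p(t \mid x, w)\, p(w)}{p(t \mid x)}$$
          where (x, t) is the training data, p(t|x,w) is the likelihood, and p(w) is the prior
        • Then estimate p(t|x_new) by averaging over NNs:
          $$p(t_{\text{new}} = 1 \mid x_{\text{new}}, t, x) = \int y(x_{\text{new}}, w)\, p(w \mid t, x)\, dw$$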

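  16. Bayesian Neural Networks – 2
     • Likelihood, for training data (x, t) = {(x_i, t_i)}:
       $$p(t \mid x, w) = \prod_i y(x_i, w)^{t_i}\,\big[1 - y(x_i, w)\big]^{1 - t_i}$$
       where t_i = 0 or 1 for background/signal
     • Prior: p(w), a density over the network parameters w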
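Taking logarithms turns the likelihood into a sum, and −ln p(t|x,w) is the familiar cross-entropy error E. The sketch below computes the unnormalized log posterior. The Gaussian form of the prior (independent zero-mean Gaussians of width sigma on every weight) is an assumption for illustration, not taken from the slide; the function names are hypothetical.

```python
import math

def log_likelihood(outputs, targets):
    """ln p(t|x,w) = sum_i [t_i ln y_i + (1 - t_i) ln(1 - y_i)], y_i = y(x_i, w), t_i in {0, 1}."""
    return sum(t * math.log(y) + (1 - t) * math.log(1.0 - y) for y, t in zip(outputs, targets))

def log_gaussian_prior(weights, sigma=1.0):
    """ASSUMED prior: independent N(0, sigma^2) on each weight, up to an additive constant."""
    return -sum(w * w for w in weights) / (2.0 * sigma ** 2)

def log_posterior_unnormalized(outputs, targets, weights, sigma=1.0):
    """ln p(t|x,w) + ln p(w); the evidence p(t|x) is a constant and is dropped."""
    return log_likelihood(outputs, targets) + log_gaussian_prior(weights, sigma)

# Hypothetical network outputs y_i for four training events and their targets.
print(log_posterior_unnormalized([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 0], weights=[0.5, -1.0, 0.3]))
```

Only the unnormalized log posterior is needed for Markov chain sampling, because the normalizing constant cancels in acceptance ratios.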

  17. Bayesian Neural Networks – 3
     • Computational method
        • Generate a Markov chain (MC) of N points {w} from the posterior density of the weights given the training data, and average over the last K
        • Markov chain Monte Carlo software by Radford Neal: http://www.cs.toronto.edu/~radford/fbm.software.html
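To show what "generate a Markov chain and average over the last K" means in practice, here is a random-walk Metropolis sampler. This is deliberately simpler than the method used in Neal's software and is not its API; the one-parameter Gaussian "posterior", the step size, and the seed are all assumptions for illustration.

```python
import math
import random

def metropolis(log_post, w0, n_steps, step=0.5, seed=1):
    """Random-walk Metropolis: returns the chain [w_1, ..., w_N]."""
    rng = random.Random(seed)
    w, lp = w0, log_post(w0)
    chain = []
    for _ in range(n_steps):
        proposal = w + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(w)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            w, lp = proposal, lp_new
        chain.append(w)  # a rejected move repeats the current point
    return chain

# Toy one-parameter posterior: a unit-width Gaussian centred at w = 2 (assumption).
chain = metropolis(lambda w: -0.5 * (w - 2.0) ** 2, w0=0.0, n_steps=5000)
K = 2000
print(sum(chain[-K:]) / K)  # should be close to the posterior mean, 2
```

In high dimensions a random walk explores the space slowly, which is one motivation for the Hamiltonian approach on the next slide.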

  18. Bayesian Neural Networks – 4
     • Treat sampling of the posterior density as a problem in Hamiltonian dynamics: the network parameters play the role of the coordinates q, auxiliary momenta p are introduced, and the phase space (p, q) is explored using Markov techniques
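A minimal Hamiltonian (hybrid) Monte Carlo sketch, assuming a one-dimensional Gaussian target and hand-picked step size and trajectory length. The potential energy is U(q) = −ln p(q|data), the kinetic energy is p²/2, trajectories are integrated with the leapfrog scheme, and a Metropolis test on the change in total energy corrects for integration error. Neal's package implements a far more elaborate version; the function below is a hypothetical illustration, not its interface.

```python
import math
import random

def hmc(log_post, grad_log_post, q0, n_samples, eps=0.2, n_leapfrog=10, seed=2):
    """Hamiltonian Monte Carlo with H(q, p) = -log_post(q) + p**2 / 2."""
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # fresh momentum drawn from N(0, 1)
        q_new, p_new = q, p
        # Leapfrog integration of Hamilton's equations (force = grad log_post).
        p_new += 0.5 * eps * grad_log_post(q_new)
        for step in range(n_leapfrog):
            q_new += eps * p_new
            if step < n_leapfrog - 1:
                p_new += eps * grad_log_post(q_new)
        p_new += 0.5 * eps * grad_log_post(q_new)
        # Metropolis accept/reject on the change in total energy.
        h_old = -log_post(q) + 0.5 * p * p
        h_new = -log_post(q_new) + 0.5 * p_new * p_new
        if h_new <= h_old or rng.random() < math.exp(h_old - h_new):
            q = q_new
        samples.append(q)
    return samples

# Toy target: unit-width Gaussian centred at q = 2 (assumption).
samples = hmc(lambda q: -0.5 * (q - 2.0) ** 2, lambda q: -(q - 2.0), q0=0.0, n_samples=2000)
print(sum(samples[-1000:]) / 1000)  # should be close to 2
```

Because the dynamics follow the gradient of the log posterior, each proposal can move far through parameter space while still being accepted with high probability.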

  19. A Simple Example
     • Signal
        • pp̄ → tqb (μ channel)
     • Background
        • pp̄ → Wbb
     • NN model
        • (1, 15, 1)
     • MCMC
        • 5000 tqb + Wbb events
        • Use the last 20 networks in a Markov chain of 500
     [Figure: distributions of HT_AllJets_MinusBestJets (scaled) for Wbb and tqb]

  20. A Simple Example
     Estimate of Prob(s|HT)
     • Blue dots: p(s|HT) = H_tqb/(H_tqb + H_Wbb), where H is the HT histogram of each sample
     • Curves: individual NNs, y(HT, w_n)
     • Black curve: the average ⟨y(HT, w)⟩
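The blue-dot estimate is a bin-by-bin ratio of histograms. A sketch, assuming the signal and background samples are in the proportion of their priors (for example, equal-sized samples with equal priors), since only then does the ratio estimate p(s|HT). The HT values below are hypothetical.

```python
def histogram(values, lo, hi, nbins):
    """Counts of values in nbins equal-width bins covering [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            # min() guards against rounding pushing v just below hi into bin nbins.
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts

def binned_posterior(sig, bkg, lo, hi, nbins):
    """p(s|HT) per bin as H_sig / (H_sig + H_bkg); None where a bin is empty."""
    hs = histogram(sig, lo, hi, nbins)
    hb = histogram(bkg, lo, hi, nbins)
    return [s / (s + b) if s + b > 0 else None for s, b in zip(hs, hb)]

# Hypothetical scaled HT values for the two samples.
print(binned_posterior([0.75, 0.8, 0.25], [0.1, 0.2, 0.3, 0.9], 0.0, 1.0, 2))
```

The binned ratio is noisy where bins are sparsely populated, whereas the averaged network output gives a smooth estimate of the same function.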

  21. Example: Single Top Search
     • Training data
        • 2000 events (1000 tqb-μ + 1000 Wbb-μ)
        • Standard set of 11 variables
     • Network
        • (11, 30, 1) network (391 parameters!)
     • Markov chain Monte Carlo (MCMC)
        • 500 iterations, but use the last 100 iterations
        • 20 MCMC steps per iteration
        • NN parameters stored after each iteration
        • 10,000 steps
        • ~1000 steps/hour (on a 1 GHz Pentium III laptop)
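The parameter counts and the CPU budget follow from simple arithmetic. Assuming a fully connected network with one bias per hidden and output node, an (n_in, n_hidden, n_out) network has (n_in + 1)·n_hidden + (n_hidden + 1)·n_out parameters; the timing uses the rough rate quoted on the slide.

```python
def n_parameters(layers):
    """Weights plus biases of a fully connected feed-forward network, e.g. layers = (11, 30, 1)."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

print(n_parameters((11, 30, 1)))  # → 391
print(n_parameters((1, 15, 1)))   # → 46

iterations, steps_per_iteration, steps_per_hour = 500, 20, 1000
total_steps = iterations * steps_per_iteration
print(total_steps, total_steps / steps_per_hour)  # → 10000 10.0 (about 10 hours)
```

With 391 parameters and only 2000 training events, this is the regime where the Bayesian averaging of slide 13 is meant to guard against over-fitting.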

  22. Signal/Bkgd. Distributions

  24. Weighting with NN output
     • Number of data events:
     • Create weighted histograms of variables
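A sketch of weighting with the network output, assuming y(x_i) approximates P(s|x_i) for each event. Under that assumption the sum of the outputs estimates the number of signal events, a histogram weighted by y estimates the signal distribution, and weights 1 − y give the background counterpart. The event values and outputs are hypothetical, and this is an illustration of the general idea rather than the exact procedure behind the slides.

```python
def weighted_histogram(values, weights, lo, hi, nbins):
    """Histogram of values in [lo, hi) where each entry adds its weight."""
    counts = [0.0] * nbins
    width = (hi - lo) / nbins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            # min() guards against rounding pushing v just below hi into bin nbins.
            counts[min(int((v - lo) / width), nbins - 1)] += w
    return counts

# Hypothetical data: one variable per event and the NN output y(x_i) for that event.
values = [0.1, 0.4, 0.6, 0.9]
nn_out = [0.2, 0.5, 0.7, 0.9]

n_signal_est = sum(nn_out)  # estimated number of signal events in the sample
h_signal = weighted_histogram(values, nn_out, 0.0, 1.0, 2)
h_background = weighted_histogram(values, [1.0 - y for y in nn_out], 0.0, 1.0, 2)
print(n_signal_est, h_signal, h_background)
```

By construction the signal-weighted and background-weighted histograms add up to the unweighted histogram, bin by bin.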

  25. Weighted Distributions
     • Magenta: weighting signal only
     • Blue: weighting signal & background
     • Black: unweighted signal distribution

  26. Summary
     • Bayesian learning of neural networks takes us another step closer to realizing optimal results in classification (or density estimation) problems. It allows a fully probabilistic approach with proper treatment of uncertainties.
     • We have started to explore Bayesian neural networks, and the initial results are promising, though computationally challenging.
