Multivariate Methods in HEP
Pushpa Bhat, Fermilab
Outline • Introduction/History • Physics Analysis Examples • Popular Methods • Likelihood Discriminants • Neural Networks • Bayesian Learning • Decision Trees • Future • Issues and Concerns • Summary
Some History • In 1990 most of the HEP community was skeptical toward the use of multivariate methods, particularly neural networks (NN) • NN seen as a black box • Can’t understand the weights • Nonlinear mapping; higher-order correlations • Though it is a mathematical function, it can’t be explained in terms of physics • Can’t calculate systematic errors reliably • Univariate or “cut-based” analysis was the norm • Some were pursuing applications of neural network methods to HEP around 1990 • Peterson, Lonnblad, Denby, Becks, Seixas, Lindsey, etc. • The first AIHENP (Artificial Intelligence in High Energy & Nuclear Physics) workshop was in 1990 • Organizers included D. Perret-Gallix, K.H. Becks, R. Brun, J. Vermaseren. AIHENP metamorphosed into ACAT ten years later, in 2000 • Multivariate methods such as Fisher discriminants were in limited use • In 1990, I began to pursue the use of multivariate methods, especially NN, in top quark searches at DØ
Mid-1990s • LEP experiments had been using NN and likelihood discriminants for particle-ID applications and eventually for signal searches (Steinberger; tau-ID) • H1 at HERA successfully implemented and used NN for triggering (Kiesling) • A hardware NN was attempted at CDF at Fermilab • The Fermilab Advanced Analysis Methods Group brought CDF and DØ together to discuss these methods and their applications in physics analyses
The Top Quark: Post-Evidence, Pre-Discovery! (P. Bhat, DPF94) • Fisher analysis of the tt̄ eμ channel: one candidate event; (S/B) at mt = 180 GeV: 18 w.r.t. Z, 10 w.r.t. WW • NN analysis of the tt̄ e+jets channel • (Plots: tt̄ (mt = 160) signal, W+jets background, and data)
Cut Optimization for Top Discovery • P. Bhat, H. Prosper, E. Amidi (DØ Top Marathon, Feb. ’95) • Neural network equi-probability contour cuts from a 2-variable analysis, compared with the conventional cuts used in Jan. ’95 (Aspen meeting) and in the observation paper (Mar. ’95 discovery) • (Plot: S/B vs. signal efficiency; contours show possible NN cuts; the 2-variable NN analysis reaches higher S/B at similar efficiency than either conventional cut)
Measurement of the Top Quark Mass • First significant physics result using multivariate methods • DØ lepton+jets channel: discriminant variables and the discriminants • Fit performed in 2-D: (DLB/NN, mfit) • Run I (1996) result with NN and likelihood: mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c² • Recent (CDF+DØ) mt measurement: mt = 171.4 ± 2.1 GeV/c²
Higgs, the Holy Grail of HEP: Discovery Reach at the Tevatron • The challenges are daunting! But using NN provides the same reach with a factor of 2 less luminosity w.r.t. conventional analysis • Improved bb̄ mass resolution & b-tag efficiency are crucial • Run II Higgs study, hep-ph/0010338 (Oct. 2000) • P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022
Then, it got easier • One of the important steps in getting NN accepted at the Tevatron experiments was to make the Bayesian connection • Another important message to drive home was “the maximal use of information in the event” for the job at hand • Developed a random grid search technique that can be used as a baseline for comparison • Neural network methods have now become popular due to their ease of use, power, and many successful applications • Maybe too easy??
Optimal Event Selection • In the feature space (x, y), the contour r(x, y) = constant, where r is the ratio of signal to background densities, defines an optimal decision boundary • Conventional cuts select a rectangular region instead • (Plot: signal S and background B densities in the feature space, with the contour cut and the conventional cuts)
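A minimal sketch of the idea with toy Gaussian densities (the means, widths, and function names are illustrative, not from the analysis). Cutting on r(x, y), or equivalently on the posterior r/(1 + r) for equal priors, selects along the optimal boundary rather than along axis-aligned rectangles:

```python
import math

def gauss2d(x, y, mx, my, s):
    """Isotropic 2-D Gaussian density."""
    return math.exp(-((x - mx)**2 + (y - my)**2) / (2 * s * s)) / (2 * math.pi * s * s)

def density_ratio(x, y):
    """r(x, y) = s(x, y) / b(x, y) for toy signal and background densities."""
    s = gauss2d(x, y, 1.0, 1.0, 1.0)    # signal centered at (1, 1)
    b = gauss2d(x, y, -1.0, -1.0, 1.0)  # background centered at (-1, -1)
    return s / b

def posterior(x, y):
    """P(s|x, y) = r / (1 + r) for equal priors; monotonic in r, so a cut on
    either quantity selects the same contour r(x, y) = constant."""
    r = density_ratio(x, y)
    return r / (1.0 + r)
```

Any monotonic function of r gives the same decision boundary, which is why a well-trained discriminant output can be cut on directly.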
The NN-Bayesian Connection • The output of a feed-forward neural network can approximate the posterior probability P(s|x1, x2)
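This connection can be demonstrated with the simplest possible “network” (one neuron, i.e. logistic regression) on a toy 1-D problem where the exact posterior is known analytically. All names and numbers here are illustrative:

```python
import math, random

random.seed(0)

# Toy training sample: 1-D signal at mean +1, background at mean -1, unit widths.
sample = [(random.gauss(1, 1), 1.0) for _ in range(1000)] + \
         [(random.gauss(-1, 1), 0.0) for _ in range(1000)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train a one-neuron feed-forward network with cross-entropy loss
# (full-batch gradient descent).
w, b = 0.0, 0.0
lr = 0.5
for _ in range(500):
    gw = gb = 0.0
    for x, t in sample:
        d = sigmoid(w * x + b) - t   # derivative of cross-entropy w.r.t. the pre-activation
        gw += d * x
        gb += d
    w -= lr * gw / len(sample)
    b -= lr * gb / len(sample)

def net(x):
    return sigmoid(w * x + b)

# For equal priors and unit-width Gaussians the exact posterior is
# P(s|x) = 1 / (1 + exp(-2x)); the trained network output approaches it.
```

With enough training data, the minimizer of the cross-entropy loss is the posterior probability itself, which is the content of the NN-Bayesian connection.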
Limitations of “Conventional NN” • The training yields one set of weights or network parameters • Need to look for the “best” network, but avoid overfitting • Heuristic decisions on network architecture • Inputs, number of hidden nodes, etc. • No direct way to compute uncertainties
Ensembles of Networks • (Diagram: input X fed to networks NN1 … NNM with outputs y1 … yM) • A decision made by averaging over many networks (a committee of networks) has a lower error than that of any individual network
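The variance-reduction argument behind committees can be sketched with hypothetical predictors (the target value, noise width, and committee size below are arbitrary). The squared error of the averaged prediction is never larger than the average of the individual squared errors, by Jensen’s inequality:

```python
import random

random.seed(42)

def committee_error(target, noise_sigma=0.3, m=25):
    """Compare the average individual squared error of m noisy predictors
    against the squared error of their committee average."""
    preds = [target + random.gauss(0.0, noise_sigma) for _ in range(m)]
    avg_individual = sum((p - target) ** 2 for p in preds) / m
    ensemble = sum(preds) / m
    ensemble_err = (ensemble - target) ** 2
    return avg_individual, ensemble_err
```

When the individual errors are at least partly independent, the committee error is much smaller than the typical individual error, not merely no worse.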
Bayesian Learning • The result of Bayesian training is a posterior density of the network weights, P(w|training data) • Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points of the sequence
Bayesian Learning – 2 • Advantages • Less prone to over-fitting • Less need to optimize the size of the network. Can use a large network! Indeed, the number of weights can be greater than the number of training events! • In principle, provides the best estimate of p(t|x) • Disadvantages • Computationally demanding! • The dimensionality of the parameter space is typically large • There could be multiple maxima in the likelihood function p(t|x,w) or, equivalently, multiple minima in the error function E(x,w)
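A toy sketch of the procedure the two slides above describe, under stated assumptions: a one-parameter-pair “network” (one neuron), a broad Gaussian prior on the weights, and a simple Metropolis sampler standing in for the MCMC used in practice. The data, proposal widths, and step counts are illustrative:

```python
import math, random

random.seed(1)

# Toy data: 1-D signal (mean +1, label 1) vs background (mean -1, label 0).
data = [(random.gauss(1, 1), 1) for _ in range(200)] + \
       [(random.gauss(-1, 1), 0) for _ in range(200)]

def log_posterior(w, b):
    """Log posterior of a one-neuron network: Bernoulli likelihood + Gaussian prior."""
    lp = -0.5 * (w * w + b * b) / 10.0        # broad prior on the weights
    for x, t in data:
        z = w * x + b
        lse = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + e^z)
        lp += t * z - lse
    return lp

# Metropolis sampling generates a sequence of networks (weight vectors).
w, b = 0.0, 0.0
lp = log_posterior(w, b)
samples = []
for _ in range(1500):
    w2, b2 = w + random.gauss(0, 0.1), b + random.gauss(0, 0.1)
    lp2 = log_posterior(w2, b2)
    if math.log(random.random()) < lp2 - lp:   # accept/reject step
        w, b, lp = w2, b2, lp2
    samples.append((w, b))

def predict(x, last_k=500):
    """Approximate the optimal network by averaging the last K sampled networks."""
    ks = samples[-last_k:]
    return sum(1.0 / (1.0 + math.exp(-(wi * x + bi))) for wi, bi in ks) / len(ks)
```

Averaging the outputs of the sampled networks, rather than picking a single best-fit network, is what makes the result less prone to over-fitting.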
Example: Single Top Search • Training data • 2000 events (1000 tqb-m + 1000 Wbb-m) • Standard set of 11 variables • Network • (11, 30, 1) network (391 parameters!) • Markov Chain Monte Carlo (MCMC) • 500 iterations, but use the last 100 • 20 MCMC steps per iteration • NN parameters stored after each iteration • 10,000 steps total • ~1000 steps/hour (on a 1 GHz Pentium III laptop)
Example: Single Top Search • Signal: tqb; background: Wbb • (Plots: distributions)
Decision Trees • DØ single top analysis • Recover events that fail criteria in cut-based analyses • Start at the first “node” with a fraction of the “training sample” • Select the variable and cut with the best separation to produce two “branches” of events: (F)ailed and (P)assed the cut • Repeat recursively on successive nodes • Stop when improvement stops or when too few events are left • A terminal node is called a “leaf”, with purity = Ns/(Ns+Nb) • Run the remaining events and data through the tree to derive results • Boosting DT: • Boosting is a recently developed technique that improves any weak classifier (decision tree, neural network, etc.) • Boosting averages the results of many trees, dilutes the discrete nature of the output, and improves the performance
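The tree-growing and boosting steps above can be sketched with one-node trees (“stumps”) and AdaBoost-style reweighting on a toy 1-D sample. This is a generic illustration, not the DØ implementation; all names and numbers are assumptions:

```python
import math, random

random.seed(7)
# Toy 1-D sample: signal near +1 (label +1), background near -1 (label -1).
points = [(random.gauss(1, 1), 1) for _ in range(100)] + \
         [(random.gauss(-1, 1), -1) for _ in range(100)]

def stump_out(x, cut, sign):
    """A one-node tree: events (P)ass (x > cut) or (F)ail the cut."""
    return sign if x > cut else -sign

def best_stump(pts, wts):
    """Select the cut (and orientation) with the lowest weighted misclassification."""
    best = (None, None, float("inf"))
    for cut in sorted(x for x, _ in pts):
        for sign in (1, -1):
            err = sum(w for (x, y), w in zip(pts, wts)
                      if stump_out(x, cut, sign) != y)
            if err < best[2]:
                best = (cut, sign, err)
    return best

def adaboost(pts, rounds=10):
    """Boosting: reweight misclassified events, then grow the next tree."""
    n = len(pts)
    wts = [1.0 / n] * n
    forest = []
    for _ in range(rounds):
        cut, sign, err = best_stump(pts, wts)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        forest.append((alpha, cut, sign))
        wts = [w * math.exp(-alpha * y * stump_out(x, cut, sign))
               for (x, y), w in zip(pts, wts)]
        norm = sum(wts)
        wts = [w / norm for w in wts]
    return forest

def classify(forest, x):
    """The boosted output is a weighted vote over all trees: a quasi-continuous
    discriminant instead of a single tree's discrete output."""
    vote = sum(a * stump_out(x, cut, sign) for a, cut, sign in forest)
    return 1 if vote > 0 else -1
```

The weighted vote is what “dilutes the discrete nature of the output”: a single tree returns a leaf purity, while the forest returns a smooth score.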
Matrix Element Method: Example, Top mass measurement • Maximal use of the information in each event by calculating event-by-event signal and background probabilities based on the respective matrix elements • x: reconstructed kinematic variables of final-state objects; JES: jet energy scale from the MW constraint • Signal and background probabilities from differential cross sections • Write the combined likelihood for all events • Maximize the likelihood w.r.t. mtop and JES
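The likelihood machinery can be sketched with a toy stand-in for the per-event probability (a Gaussian in place of the differential cross section; the true mass, resolution, and grid are invented for illustration). The combined likelihood is the product over events, maximized over the parameter:

```python
import math, random

random.seed(3)
TRUE_MASS, RES = 172.5, 12.0   # toy values, not a measurement
events = [random.gauss(TRUE_MASS, RES) for _ in range(400)]

def event_prob(x, m):
    """Toy per-event probability P(x|m); the real method integrates the
    differential cross section over unmeasured quantities."""
    return math.exp(-0.5 * ((x - m) / RES) ** 2) / (RES * math.sqrt(2 * math.pi))

def neg_log_likelihood(m):
    """Combined likelihood of all events, as a negative log for minimization."""
    return -sum(math.log(event_prob(x, m)) for x in events)

# Maximize the likelihood over a mass grid (a real analysis would also
# profile over the JES parameter).
best_m = min((m / 10.0 for m in range(1600, 1850)), key=neg_log_likelihood)
```

Because every event contributes its own probability rather than a binned count, this is the sense in which the method makes “maximal use of information in each event.”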
Summary • Multivariate methods are now used extensively in HEP data analysis • Neural networks, because of their ease of use and power, are favorites for particle-ID and signal/background discrimination • Bayesian neural networks take us one step closer to optimization • Likelihood discriminants and decision trees are becoming popular because they are easier to “defend” (no “black-box” stigma) • Many issues remain to be addressed as we get ready to deploy multivariate methods for discoveries in HEP
No amount of experimentation can ever prove me right; a single experiment can prove me wrong. - Albert Einstein • Nothing tends so much to the advancement of knowledge as the application of a new instrument. - Humphry Davy
World’s Highest Energy Laboratory (for now) • (Aerial view of Fermilab: the Tevatron ring with the CDF and DØ detectors, and the Booster)
Our Fancy New Toys: The Large Hadron Collider • Circumference = 27 km • Beam energy = 7 TeV • Luminosity = 1.65x10^34 cm^-2 s^-1 • Startup date: 2007 • (Diagram: LHC ring with transfer lines TI 2 and TI 8 from the SPS and PS rings, CMS at a p-p collision point; photos of an LHC magnet and the LHC tunnel)
LHC Environment • 14 TeV proton-proton colliding beams
CMS Silicon Tracker Challenges
CMS Si Tracker • Outer Barrel (TOB) • Pixels • Inner Barrel & Disks (TIB & TID) • 2.4 m diameter x 5.4 m length
Lots of Silicon • 214 m² of silicon sensors • 11.4 million silicon strips • 66 million pixels!
Si Tracker Challenges • Large and complex system • 77.4 million total channels (out of a total of 78.2 M for the experiment) • Detector monitoring, data organization, data quality monitoring, analysis, visualization, and interpretation are all daunting! • Need to monitor every channel and make sure most of the detector is working at all times (the live fraction of the detector and the efficiencies are bound to decrease with time) • Need to verify data integrity and data quality for physics • Diagnose and fix problems ASAP • Keep calibration and alignment parameters current
Detector/Data Monitoring • Monitor • Environmental variables • Temperatures, coolant flow rates, interlocks, radiation doses • Hardware status • Voltages, currents • Channel data • Readout states, errors, missing data/channels, bad ID for channel/module • Many kinds of problems to be categorized, tracked, and displayed • Should be able to find rare problems/errors (with low occurrence rate) that may corrupt data (rare problems may indicate a developing failure mode or hidden bad behavior) • Correlate problem/noisy channels with history, temperature, currents, etc.
Data Quality Monitoring • Monitor • Raw data • Pedestals, noise, ADC counts, occupancies, efficiencies • Processed high-level objects • Clusters, tracks, etc. • Evaluate thousands of histograms • Can’t visually examine them all • Automatically evaluate histograms by comparing to reference histograms • Adaptive, efficient; find evolving patterns over time • Quantiles? q-q plots/comparisons instead of a KS test? • A variety of 2D “heat” maps • Occupancies, # of bad channels/module, # of errors/module, etc. • Typical occupancy ~2% in the strip tracker • 200,000 channels written out 100 times/sec
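One simple automated histogram-vs-reference check of the kind suggested above is a KS-style maximum distance between the normalized cumulative distributions of the two binned histograms. This is a generic sketch, not the CMS DQM implementation; it assumes the two histograms share the same binning:

```python
def ks_distance(counts_a, counts_b):
    """Maximum distance between the normalized cumulative distributions of two
    histograms with identical binning (a binned analogue of the KS statistic)."""
    tot_a, tot_b = float(sum(counts_a)), float(sum(counts_b))
    cum_a = cum_b = 0.0
    d = 0.0
    for a, b in zip(counts_a, counts_b):
        cum_a += a / tot_a
        cum_b += b / tot_b
        d = max(d, abs(cum_a - cum_b))
    return d
```

A monitoring job could flag any histogram whose distance to its reference exceeds a threshold; quantile or q-q comparisons, as the slide suggests, are alternatives that are less sensitive to overall normalization.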
Module Assembly Precision • (Example of a “heat” map)
Need smart approaches • What are the best techniques for data mining? • To organize data for analysis and data visualization • Complex geometry/addressing makes visualization difficult • For finding problematic channels quickly and efficiently: clustering, exploratory data mining • For finding anomalies, corrupt data, and patterns of behavior: feature-finding algorithms, superposing many events, time evolution, spatial and temporal correlations • Noise correlations • Via correlation coefficients of defined groups • Correlate to history (time variations) and environmental variables
Data Visualization • Based on the hierarchical/geometrical structure of the tracker • Display every channel; attach objects/info at each level: • Sub-structures • Layers/rings • Modules • Readout chips
Multivariate Analysis Issues • Dimensionality Reduction • Choosing variables optimally without losing information • Choosing the right method for the problem • Controlling Model Complexity • Testing Convergence • Validation • Given a limited sample, what is the best way? • Computational Efficiency
Multivariate Analysis Issues • Correctness of modeling • How do we make sure the multivariate modeling is correct? • The data used for training or building PDEs are assumed to represent reality. Is it sufficient to check the modeling in the mapped variable? Pair-wise correlations? Higher-order correlations? • How do we show that the background is modeled well? How do we quantify the correctness of the modeling? • In conventional analyses, we normally look for variables that are well modeled in order to apply cuts • How well is the background modeled in the signal region? • Worries about hidden bias • Worries about underestimating errors
Sociological Issues • We have been conservative in the use of MV methods for discovery • We have been more aggressive in the use of MV methods for setting limits • But discovery is more important and needs all the power you can muster! • This is expected to change at the LHC
Summary • The next generation of experiments will need to adopt advanced data mining and data analysis techniques • Conventional/routine tasks such as alignment, detector performance and data quality monitoring, and data visualization will be challenging and require new approaches • Many issues regarding the use of multivariate methods of data analysis for discoveries and measurements need to be addressed to make optimal use of the data
MV: Where can we use them? • Almost everywhere, since HEP events are multivariate • Improve several aspects of analysis • Event selection • Triggering, real-time filters, data streaming • Event reconstruction • Tracking/vertexing, particle ID • Signal/Background Discrimination • Higgs discovery, SUSY discovery, single top, … • Functional Approximation • Jet energy corrections, tag rates, fake rates • Parameter estimation • Top quark mass, Higgs mass, SUSY model parameters • Data Exploration • Knowledge discovery via data mining • Data-driven extraction of information, latent structure analysis