180 likes | 370 Views
T MVA. Andreas H öcker (CERN). CERN meeting, Oct 6 , 2006. Toolkit for Multivariate Data Analysis. MVA Experience. Many (more and more) HEP data analyses uses multivariate techniques. Often analysts use custom tools, without much comparison
E N D
TMVA Andreas Höcker (CERN) CERN meeting, Oct 6, 2006 Toolkit for Multivariate Data Analysis
MVA Experience • Many (more and more) HEP data analyses uses multivariate techniques • Often analysts use custom tools, without much comparison • MVAs tedious to implement & debug, therefore few comparisons between methods ! • most accepted: cuts • also widely understood and accepted: likelihood (probability density estimators – PDE ) • often disliked, but more and more used (LEP, BABAR): Artificial Neural Networks • much used in BABAR and Belle: Fisher discriminants • introduced by D0 (for electron id): H-Matrix • used by MiniBooNE and recently by BABAR: Boosted Decision Trees • All interesting methods … but how to dissipate the widespread skepticism ? • black boxes ! • what if the training samples incorrectly describe the data ? • how can one evaluate systematics ? Andreas Höcker –
What is TMVA • Toolkit for Multivariate Analysis (TMVA):provides aROOT-integrated environment for the parallel processing and evaluation of MVA techniques to discriminate signal from background samples. Note that TMVA does not seek to promote a particular method ! • TMVA presently includes (ranked by complexity): • Rectangular cut optimisation • Likelihood estimator (PDE approach) • Multi-dimensional likelihood estimator (PDE - range-search approach) • Fisher discriminant • H-Matrix approach (2 estimator) • Multilayer Perceptron Artificial Neural Network (three different implementations) • Boosted Decision Trees • RuleFit • upcoming: Support Vector Machines, “Committee” methods • Common preprocessing of input data: decorrelation, principal-component analysis • The TMVA analysis provides training, testing and evaluation of the MVAs • Each MVA method provides a ranking of the input variables • MVAs produce weight files that are read by reader class for actual MVA analysis Andreas Höcker –
TMVA Technicalities • TMVA is a sourceforge (SF) package to accommodate world-wide access • code can be downloaded as tar file, or via anonymous or developer cvs access • home page …………………………… http://tmva.sourceforge.net/ • SF project page ……………………… http://sourceforge.net/projects/tmva • view CVS …………………………….. http://tmva.cvs.sourceforge.net/tmva/TMVA/ • mailing lists …………………………… http://sourceforge.net/mail/?group_id=152074 • ROOT::TMVA class index…………... http://root.cern.ch/root/htmldoc/TMVA_Index.html • TMVA is written in C++ and is built upon ROOT • Since 5.11/03 integrated and distributed within ROOT • TMVA is modular • training, testing and evaluation factory iterates over all registered methods • each method has specific options that can be conveniently set by the user • ROOT scripts are provided for most important performance assessments • We enthusiastically (!) welcome new users and developers • at present: 18 SF-registered TMVA developers ! (http://sourceforge.net/project/memberlist.php?group_id=152074) Andreas Höcker –
TMVAMethods Andreas Höcker –
Cut Optimisation • Simplest method: cut in rectangular volume using Nvarinput variables • Usually training files in TMVA do not contain realistic signal and background abundance cannot optimize for best significance • scan in signal efficiency [0 1] and maximise background rejection • Technical problem: how to perform maximisation • Minuit fit (SIMPLEX) found to be not reliable enough • Robust, but suboptimal: random sampling • New available techniques: Genetics Algorithm and Simulated Annealing (SA not yet optimal) • Huge speed improvement by sorting training events in Nvar-dim. Binary Trees • for 4 variables: 41 times faster than simple volume cut Andreas Höcker –
automatic, unbiased, but suboptimal difficult to automate Projected Likelihood Estimator (PDE Approach) • Combine probability density distributions to likelihood estimator PDFs discriminating variables Likelihood ratio for event ievent Species: signal, background types • Assumes uncorrelated input variables • optimal MVA approach if true, since containing all the information • performance reduction if not true reason for development of other methods! • Technical problem: how to implement reference PDFs • 3 ways: function fittingparametric fitting (splines, kernel est.)counting easy to automate, can create artefacts; currently in TMVA: Splines0-5, Gaussian KDEs Andreas Höcker –
Preprocessing the Input Variables: Decorrelation • Linear correlations can be removed by diagonalising the input variable space • Determine square-rootC of correlation matrix C, i.e., C = CC • compute C by diagonalising C: • transformation from original(x)in de-correlated variable space(x)by:x = C1x • Various ways to choose diagonalisation (also implemented: principal component analysis) • Note that this “de-correlation” is only complete, if: • input variables are Gaussians • correlations linear only • in practise: gain form de-correlation often rather modest – or even harmful! • Commonly realised for all methods in TMVA (centrally in DataSet class) Andreas Höcker –
Multidimensional Likelihood Estimator • Generalisation of 1D PDE approach to Nvar dimensions • Optimal method – in theory – since full information is used • Practical challenges: • parameterisation of multi-dimensional phase space needs huge training samples • implementation of Nvar-dim. reference PDF with kernel estimates or counting • for kernel estimates: difficult to control fidelity of parameterisation • TMVA implementation following Range-Search method • count number of signal and background events in “vicinity” of data event • “vicinity” defined by fixed or adaptiveNvar-dim. volume size • volumes can be rectangular or elliptical • use multi-D kernels (Gaussians, triangular, …) to weight events within volume • speed up range search by sorting training events in Binary Trees Carli-Koblitz, NIM A501, 576 (2003) Andreas Höcker –
Fisher Discriminant (and H-Matrix) • Well-known, simple and elegant MVA method: event selection is performed in a transformed variable space with zero linear correlations, by distinguishing the mean values of the signal and background distributions • Instead of equations, words: • optimal for linearly correlated Gaussians with equal RMS’ and different means • no separation if equal means and different RMS (shapes) An axis is determined in the (correlated) hyperspace of the input variables such that, when projecting the output classes (signal and background) upon this axis, they are pushed as far as possible away from each other, while events of a same class are confined in a close vicinity. The linearity property of this method is reflected in the metric with which "far apart" and "close vicinity" are determined: the covariance matrix of the discriminant variable space. • Computation of Fisher MVA couldn’t be simpler: “Fisher coefficents” • H-Matrix estimator: correlated 2 – poor man’s variation of Fisher discriminant Andreas Höcker –
1 input layer k hidden layers 1 ouput layer ... 1 1 1 2 output classes (signal and background) . . . . . . . . . Nvar discriminating input variables i j Mk . . . . . . N M1 (“Activation” function) with: Artificial Neural Network (ANN) • ANNs are non-linear discriminants: Fisher= ANN without hidden layer • ANNs seem to be better adapted to realistic use cases than Fisher and Likelihood • TMVA has three different ANN implementations – all are Multilayer Perceptrons • Clermont-Ferrand ANN: used for ALEPH Higgs analysis; translated from FORTRAN • TMultiLayerPerceptron interface: ANN implemented in ROOT • MLP-ANN (new): similar to TMultiLayerPerceptron but faster and more flexible • recent comparison on tau-ID in ATLAS with Stuttgart NN confirmed excellent performance! Multilayer Perceptron Andreas Höcker –
Decision Trees • Decision Trees: a sequential application of “cuts” which splits the data into nodes, and the final nodes (leaf) classifies an event as signal or background Decision tree ROOT-Node Nsignal Nbkg depth: 1 • Pros: • robust out-of-the-box method • immune against addition of weak variables Node Nsignal Nbkg Leaf-Node Nsignal<Nbkg bkg depth: 2 • Contras: • instability: small changes in training sample can give large changes in tree structure • overtraining: requires huge input statistics Leaf-Node Nsignal >Nbkg signal Node Nsignal Nbkg • Boosting (bagging, …), pruning: • combine several decision trees into a forest, where each tree is optimised for different event weights; can also give overall weights to trees in forest • pruning: remove statistically insignificant nodes depth: 3 Leaf-Node Nsignal<Nbkg bkg Leaf-Node Nsignal >Nbkg signal Andreas Höcker –
RuleFit • Ref: Friedman-Popescu: http://www-stat.stanford.edu/~jhf/R-RuleFit.html • Model is a linear combination of rules;a rule is a sequence of cuts: score coefficients rules data • The problem to solve is: • create the rule ensemble created from a set of decision trees • fit the coefficients “Gradient direct regularization” (Friedman et al) • Fast, robust and good performance Test on 4k simulated SUSY + top events with 4 discriminating variables Methods tested: • Multi Layer Perceptron (MLP) • Fisher • RuleFit Andreas Höcker –
TMVA– an example Andreas Höcker –
A Truly Academic Example • Simple toy to illustrate the strength of the de-correlation technique • 4 linearly correlated Gaussians, with equal RMS and shifted means between S and B TMVA output : --- TMVA_Factory: correlation matrix (signal): ---------------------------------------------------------------- --- var1 var2 var3 var4 --- var1: +1.000 +0.336 +0.393 +0.447 --- var2: +0.336 +1.000 +0.613 +0.668 --- var3: +0.393 +0.613 +1.000 +0.907 --- var4: +0.447 +0.668 +0.907 +1.000 --- TMVA_MethodFisher: ranked output (top --- variable is best ranked) ---------------------------------------------------------------- --- Variable : Coefficient: Discr. power: ---------------------------------------------------------------- --- var4 : +8.077 0.3888 --- var3 : 3.417 0.2629 --- var2 : 0.982 0.1394 --- var1 : 0.812 0.0391 Andreas Höcker –
Quality Assurance Likelihood reference PDFs MLP convergence MLP architecture Andreas Höcker –
Fisher, LikelihoodD, ANNs are optimal The Output • TMVA output distributions for Fisher, Likelihood, BDT and MLP Note: About All Realistic Use Cases are Much More Difficult Than This One Andreas Höcker –
Concluding Remarks • TMVA is still a young project ! • TMVA is open source: we want to gather as much development help as possible (from all areas, including real experts from statistics faculties – there is no one so far!) to make it to a reference tool in HEP • New MVA methods are important, but the emphasis lies on the consolidation of the currently implemented methods: • Implement fully independent, phase-space-dependent MVAs • Implement use of event weights throughout all methods (missing: Cuts, Fisher, H-Matr., RuleFit) • Improve kernel estimators • Improve decorrelation techniques • Expect performance gain from combination of MVAs in Committees • Although the number of users rises, still need more real-life experience ! • Need to write a manual ! Andreas Höcker –