TMVA: Toolkit for Multivariate Data Analysis with ROOT
Helge Voss, MPI-K Heidelberg, on behalf of: Andreas Höcker, Fredrik Tegenfeld, Joerg Stelzer*
Supply an environment to easily:
• apply different sophisticated data selection algorithms
• have them all trained, tested and evaluated
• find the best one for your selection problem
And contributors: A. Christov, S. Henrot-Versillé, M. Jachowski, A. Krasznahorkay Jr., Y. Mahalalel, X. Prudent, P. Speckmayer, M. Wolter, A. Zemla
http://tmva.sourceforge.net/
arXiv: physics/0703039
Motivation/Outline
ROOT is the analysis framework used by most HEP physicists.
Idea: rather than just implementing new MVA techniques and making them somehow available in ROOT (as, e.g., TMultiLayerPerceptron does):
• have one common platform/interface for all MVA classifiers
• make it easy to use and compare different MVA classifiers
• train/test on the same data sample and evaluate consistently
Outline:
• introduction
• the MVA classifiers available in TMVA
• demonstration with toy examples
• summary
Multivariate Event Classification
• All multivariate classifiers condense (correlated) multi-variable input information into a single scalar output variable: a mapping $\mathbb{R}^{n} \to \mathbb{R}$ with $y(\text{Bkg}) \to 0$ and $y(\text{Signal}) \to 1$
• One variable to base your decision on
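A decision is then taken by cutting on this single output; as a minimal sketch of the rule (the cut value $y_{\rm cut}$ is a hypothetical tuning parameter, not from the slide):
$$ \text{classify event } i \text{ as signal} \quad \Longleftrightarrow \quad y(\mathbf{x}_i) > y_{\rm cut} $$
Varying $y_{\rm cut}$ trades signal efficiency against background rejection.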
What is in TMVA
TMVA currently includes:
• Rectangular cut optimisation
• Projective and multi-dimensional likelihood estimators
• Fisher discriminant and H-Matrix (χ² estimator)
• Artificial Neural Networks (3 different implementations)
• Boosted/bagged Decision Trees
• Rule Fitting
• Support Vector Machines
All classifiers are highly customizable:
• common pre-processing of the input: de-correlation, principal component analysis
• support of arbitrary pre-selections and individual event weights
• the TMVA package provides training, testing and evaluation of the classifiers
• each classifier provides a ranking of the input variables
• classifiers produce weight files that are read by a Reader class for MVA application
• integrated in ROOT (since release 5.11/03) and very easy to use!
Preprocessing the Input Variables: Decorrelation
• Commonly realised for all methods in TMVA (centrally in the DataSet class): removal of linear correlations by rotating the variables,
• using the square root of the correlation matrix (SQRT decorrelation), or
• using Principal Component Analysis (PCA decorrelation)
(figure: variable distributions for the original sample and after SQRT and PCA decorrelation)
• Note that this "de-correlation" is only complete if:
• the input variables are Gaussian
• the correlations are linear only
• In practice the gain from de-correlation is often rather modest, or even harmful.
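A sketch of the SQRT variant, using the standard matrix-square-root construction (notation ours, not from the slide): diagonalise the correlation matrix as $C = S D S^{T}$, form $C^{1/2} = S \sqrt{D}\, S^{T}$, and transform each input vector by
$$ \mathbf{x} \;\mapsto\; \mathbf{x}' = \bigl(C^{1/2}\bigr)^{-1} \mathbf{x}, $$
after which the components of $\mathbf{x}'$ are linearly uncorrelated.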
Cut Optimisation
• Simplest method: apply cuts in a rectangular volume of the input-variable space
• scan the signal efficiency over [0, 1] and maximise the background rejection at each point
• from this scan, the optimal working point in terms of S and B event numbers can be derived
• Technical problem: how to perform the optimisation
• TMVA uses: random sampling, Simulated Annealing or a Genetic Algorithm
• speed improvement in the volume search: training events are sorted in Binary Search Trees
• this can be done in the normal or the de-correlated variable space
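For illustration, booking the genetic-algorithm variant in the Factory might look as follows. This is a minimal sketch following the TMVA option syntax of that era ("EffSel" requests efficiency computation by event counting); the exact option strings and defaults should be checked against the Users Guide:

   // book rectangular cut optimisation, fitted with a Genetic Algorithm
   factory->BookMethod( TMVA::Types::kCuts, "CutsGA",
                        "!V:FitMethod=GA:EffSel" );

Replacing FitMethod=GA by FitMethod=SA or FitMethod=MC selects Simulated Annealing or random (Monte Carlo) sampling instead.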
Projective Likelihood Estimator (PDE approach)
• Combine the probabilities from the different variables for an event to be signal- or background-like. The likelihood ratio for event $i$ is
$$ y_{\mathcal L}(i) = \frac{\prod_k p^{S}_{k}\bigl(x_k(i)\bigr)}{\prod_k p^{S}_{k}\bigl(x_k(i)\bigr) + \prod_k p^{B}_{k}\bigl(x_k(i)\bigr)}, $$
where the $p^{S,B}_{k}$ are the reference PDFs of the discriminating variables $x_k$ for the two species, signal and background.
• Optimal if there are no correlations and the PDFs are correct (known); since this is usually not true, different methods were developed.
• Technical problem: how to implement the reference PDFs. Three ways:
• counting: automatic and unbiased, but suboptimal
• function fitting: difficult to automate
• fitting with splines or kernel estimators: easy to automate, but can create artefacts
• TMVA uses: splines of degree 0-5, kernel estimators
Multidimensional Likelihood Estimator
• Generalisation of the 1D PDE approach to N_var dimensions
• Optimal method, in theory, if the "true N-dim PDF" were known
• Practical challenge: derive the N-dim PDF from the training sample
• TMVA implementation: range search (PDERS): count the number of signal and background events in the "vicinity" of a test event, in a volume of fixed or adaptive size (the latter giving kNN-type classifiers)
(figure: signal and background training events around a test event in the (x1, x2) plane)
• volumes can be rectangular or spherical
• multi-dimensional kernels (Gaussian, triangular, ...) can be used to weight events within a volume
• the range search is sped up by sorting the training events in binary trees (Carli-Koblitz, NIM A501, 576 (2003))
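In the counting version, the response can be written as (a simplified sketch; TMVA additionally normalises by the relative signal and background sample sizes):
$$ y_{\rm PDERS}(i) = \frac{n_S(V_i)}{n_S(V_i) + n_B(V_i)}, $$
where $n_{S}(V_i)$ and $n_{B}(V_i)$ are the numbers of signal and background training events found in the volume $V_i$ around test event $i$.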
Fisher Discriminant (and H-Matrix)
• Well-known, simple and elegant classifier: determine a linear variable transformation in which:
• linear correlations are removed
• the mean values of signal and background are "pushed" as far apart as possible
• the computation of the Fisher response is then very simple: a linear combination of the event variables with the "Fisher coefficients" $F_k$:
$$ y_{\rm Fi}(i) = \sum_{k} F_k\, x_k(i) $$
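For completeness, the standard Fisher construction (not spelled out on the slide) derives the coefficients from the within-class covariance matrix $W$ and the class means:
$$ F_k \;\propto\; \sum_{l} \bigl(W^{-1}\bigr)_{kl} \left( \bar{x}^{\,S}_{l} - \bar{x}^{\,B}_{l} \right), $$
which maximises the distance between the projected signal and background means relative to the spread within each class.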
Artificial Neural Network (ANN)
• Get a non-linear classifier response by feeding linear combinations of the input variables into nodes with a non-linear "activation" function
• Nodes (or neurons) arranged in layers give feed-forward Multilayer Perceptrons (3 different implementations in TMVA)
• Architecture: 1 input layer with the N_var discriminating input variables, k hidden layers (with M_1, ..., M_k nodes), and 1 output layer with 2 output classes (signal and background)
• Training: adjust the weights using known events such that signal and background are best separated
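Each node computes its output from a weighted sum of the previous layer's outputs; a generic sketch, with the sigmoid as one typical activation choice (weight notation ours):
$$ x^{(\ell)}_{j} = A\!\left( \sum_{i} w^{(\ell)}_{ij}\, x^{(\ell-1)}_{i} \right), \qquad A(t) = \frac{1}{1 + e^{-t}}. $$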
Decision Trees
• Sequential application of "cuts" splits the data into nodes; the final nodes (leaves) classify an event as signal or background
• Training (growing a decision tree):
• start with the root node
• split the training sample according to a cut on the best variable at this node
• splitting criterion: e.g., maximum "Gini index": purity × (1 − purity)
• continue splitting until the minimum number of events or the maximum purity is reached
• classify each leaf node according to the majority of its events, or give it a weight; unknown test events are classified accordingly
• Bottom-up pruning: remove statistically insignificant nodes to avoid overtraining
(figure: the same decision tree before and after pruning)
Boosted Decision Trees
• Decision trees have been well known for a long time but were hardly used in HEP (although very similar to "simple cuts")
• Disadvantage: instability: small changes in the training sample can give large changes in the tree structure
• Boosted Decision Trees (1996) combine several decision trees into a forest:
• the classifier output is the (weighted) majority vote of the individual trees
• the trees are derived from the same training sample with different event weights
• e.g. AdaBoost: wrongly classified training events are given a larger weight
• bagging (re-sampling with replacement) uses random weights instead
• Remark: bagging/boosting create a basis of classifiers; the final classifier is a linear combination of the base classifiers
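The standard AdaBoost prescription, quoted here for completeness (notation ours): a tree with misclassification rate err on the current weighted sample defines a boost weight
$$ \alpha = \frac{1 - \mathrm{err}}{\mathrm{err}}; $$
the weights of misclassified events are multiplied by $\alpha$ (and the sample renormalised) before the next tree is grown, and the forest response is the weighted vote
$$ y_{\rm boost}(\mathbf{x}) = \sum_{m=1}^{N_{\rm trees}} \ln(\alpha_m)\, h_m(\mathbf{x}), $$
with $h_m(\mathbf{x}) = \pm 1$ the decision of tree $m$.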
Rule Fitting (Predictive Learning via Rule Ensembles)
• Following RuleFit from Friedman-Popescu (Friedman-Popescu, Tech. Rep., Stat. Dept., Stanford U., 2003)
• The classifier is a linear combination of simple base classifiers, called rules, which are sequences of cuts ($r_m = 1$ if all cuts are satisfied, $= 0$ otherwise), plus a linear Fisher term in the normalised discriminating event variables:
$$ y_{\rm RF}(\mathbf{x}) = a_0 + \underbrace{\sum_{m} a_m\, r_m(\mathbf{x})}_{\text{sum of rules}} + \underbrace{\sum_{k} b_k\, x_k}_{\text{linear Fisher term}} $$
• The procedure is:
• create the rule ensemble from a set of decision trees
• fit the coefficients by "gradient directed regularization" (Friedman et al.)
Support Vector Machines
• Find the hyperplane that best separates signal from background
• best separation: maximum distance between the closest events (support vectors) and the hyperplane
• this gives a linear decision boundary
• Non-linear cases:
• transform the variables into a higher-dimensional feature space where a linear boundary (hyperplane) can separate the data
• the transformation is done implicitly using kernel functions, which effectively introduce a metric for the distance measures that "mimics" the transformation
• choose a kernel and fit the hyperplane
• Available kernels: Gaussian, polynomial, sigmoid
(figures: linearly separable data in the (x1, x2) plane, and a non-linear boundary made linear in the (x1, x2, x3) feature space)
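The standard forms of these kernels, quoted for completeness ($\sigma$, $d$, $\theta$ and $\kappa$ are tunable hyperparameters; the exact parametrisation in TMVA may differ):
$$ K_{\rm Gauss}(\mathbf{u},\mathbf{v}) = e^{-|\mathbf{u}-\mathbf{v}|^{2}/2\sigma^{2}}, \qquad K_{\rm poly}(\mathbf{u},\mathbf{v}) = (\mathbf{u}\cdot\mathbf{v} + \theta)^{d}, \qquad K_{\rm sig}(\mathbf{u},\mathbf{v}) = \tanh(\kappa\,\mathbf{u}\cdot\mathbf{v} + \theta). $$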
A Complete Example Analysis
The steps: create the Factory, give it the training/test trees, tell it which variables to use (the example uses variables not directly available in the tree, e.g. "var1+var2"), select the MVA methods, then train, test and evaluate.

void TMVAnalysis( )
{
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );

   // create the Factory
   TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   // give it the training/test trees
   TFile *input = TFile::Open("tmva_example.root");
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
   factory->AddSignalTree    ( signal,     1. );
   factory->AddBackgroundTree( background, 1. );

   // tell it which variables to use (expressions of tree branches are allowed)
   factory->AddVariable("var1+var2", 'F');
   factory->AddVariable("var1-var2", 'F');
   factory->AddVariable("var3", 'F');
   factory->AddVariable("var4", 'F');

   factory->PrepareTrainingAndTestTree("",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   // select the MVA methods
   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   // train, test and evaluate
   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
Example Application
The steps: create the Reader, tell it about the variables, book the selected MVA method, then loop over the events, set the tree variables (again using variables not directly available in the tree) and calculate the MVA response.

void TMVApplication( )
{
   // create the Reader
   TMVA::Reader *reader = new TMVA::Reader("!Color");

   // tell it about the variables
   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3", &var3 );
   reader->AddVariable( "var4", &var4 );

   // book the selected MVA method (weight file written by the training)
   reader->BookMVA( "MLP method", "weights/MVAnalysis_MLP.weights.txt" );

   TFile *input = TFile::Open("tmva_example.root");
   TTree* theTree = (TTree*)input->Get("TreeS");
   Float_t userVar1, userVar2;
   theTree->SetBranchAddress( "var1", &userVar1 );
   theTree->SetBranchAddress( "var2", &userVar2 );
   theTree->SetBranchAddress( "var3", &var3 );
   theTree->SetBranchAddress( "var4", &var4 );

   // event loop: set the derived variables and calculate the MVA response
   for (Long64_t ievt=3000; ievt<theTree->GetEntries(); ievt++) {
      theTree->GetEntry(ievt);
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      cout << reader->EvaluateMVA( "MLP method" ) << endl;
   }
   delete reader;
}
A Purely Academic Toy Example
• Use a data set with 4 linearly correlated, Gaussian distributed variables. The resulting variable ranking:

---------------------------------------
Rank : Variable : Separation
---------------------------------------
   1 : var3     : 3.834e+02
   2 : var2     : 3.062e+02
   3 : var1     : 1.097e+02
   4 : var0     : 5.818e+01
---------------------------------------
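The separation quoted in this ranking is, in the TMVA convention (as given in the Users Guide; the printout above evidently applies a scale factor), the integral
$$ \langle S^{2} \rangle = \frac{1}{2} \int \frac{\bigl(\hat{y}_S(y) - \hat{y}_B(y)\bigr)^{2}}{\hat{y}_S(y) + \hat{y}_B(y)}\, \mathrm{d}y, $$
where $\hat{y}_S$ and $\hat{y}_B$ are the normalised signal and background distributions of the variable; it is zero for identical and unity for non-overlapping distributions.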
Validating the Classifier Training
• The TMVA GUI provides validation output for each classifier: projective likelihood PDFs, MLP training convergence, BDTs, ...
• e.g. average number of decision-tree nodes before/after pruning: 4193 / 968
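These plots are produced by the ROOT macros shipped with TMVA. A typical way to open the GUI on the training output (a sketch: the macro name and location follow the standalone TMVA distribution and may differ in your setup):

   // from the shell, pass the output file to the GUI macro:
   //   root -l TMVAGui.C\(\"TMVA.root\"\)
   // or, inside a ROOT session:
   //   .x TMVAGui.C("TMVA.root")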
The Classifier Output
• TMVA output distributions (signal vs. background) for: Likelihood, Fisher, PDERS, Neural Network, Boosted Decision Trees, Rule Fitting
(figure panel annotations: Fisher: "correlations removed"; Likelihood: "due to correlations")
Evaluation Output
• TMVA output distributions for Fisher, Likelihood, BDT and MLP
• For this case the Fisher discriminant provides the theoretically best possible method; the de-correlated Likelihood performs the same
• Cuts and Likelihood without de-correlation are inferior
• Note: almost all realistic use cases are much more difficult than this one
Evaluation Output (taken from the TMVA printout)

Evaluation results ranked by best signal efficiency and purity (area)
------------------------------------------------------------------------------
MVA           Signal efficiency at bkg eff. (error):       | Sepa-    Signifi-
Methods:      @B=0.01     @B=0.10     @B=0.30     Area     | ration:  cance:
------------------------------------------------------------------------------
Fisher      : 0.268(03)   0.653(03)   0.873(02)   0.882    | 0.444    1.189
MLP         : 0.266(03)   0.656(03)   0.873(02)   0.882    | 0.444    1.260
LikelihoodD : 0.259(03)   0.649(03)   0.871(02)   0.880    | 0.441    1.251
PDERS       : 0.223(03)   0.628(03)   0.861(02)   0.870    | 0.417    1.192
RuleFit     : 0.196(03)   0.607(03)   0.845(02)   0.859    | 0.390    1.092
HMatrix     : 0.058(01)   0.622(03)   0.868(02)   0.855    | 0.410    1.093
BDT         : 0.154(02)   0.594(04)   0.838(03)   0.852    | 0.380    1.099
CutsGA      : 0.109(02)   1.000(00)   0.717(03)   0.784    | 0.000    0.000
Likelihood  : 0.086(02)   0.387(03)   0.677(03)   0.757    | 0.199    0.682
------------------------------------------------------------------------------
(higher rows = better classifier)

Testing efficiency compared to training efficiency (overtraining check)
------------------------------------------------------------------------------
MVA           Signal efficiency: from test sample (from training sample)
Methods:      @B=0.01         @B=0.10         @B=0.30
------------------------------------------------------------------------------
Fisher      : 0.268 (0.275)   0.653 (0.658)   0.873 (0.873)
MLP         : 0.266 (0.278)   0.656 (0.658)   0.873 (0.873)
LikelihoodD : 0.259 (0.273)   0.649 (0.657)   0.871 (0.872)
PDERS       : 0.223 (0.389)   0.628 (0.691)   0.861 (0.881)
RuleFit     : 0.196 (0.198)   0.607 (0.616)   0.845 (0.848)
HMatrix     : 0.058 (0.060)   0.622 (0.623)   0.868 (0.868)
BDT         : 0.154 (0.268)   0.594 (0.736)   0.838 (0.911)
CutsGA      : 0.109 (0.123)   1.000 (0.424)   0.717 (0.715)
Likelihood  : 0.086 (0.092)   0.387 (0.379)   0.677 (0.677)
------------------------------------------------------------------------------
(a large test/training difference, as for BDT or PDERS here, is the check for over-training)
More Toys: Circular Correlations
• Illustrate the behaviour of linear and nonlinear classifiers
• Circular correlations (same for signal and background)
Illustration: Events Weighted by MVA Response
• Example: how do the classifiers deal with the correlation patterns?
• Linear classifiers: Fisher, Likelihood, de-correlated Likelihood
• Non-linear classifiers: Decision Trees, PDERS
(figure: event distributions weighted by each classifier's response)
Final Classifier Performance
• Background rejection versus signal efficiency curves for the circular example
More Toys: "Schachbrett" (Chess Board) Event Distribution
• Performance achieved without parameter adjustments: PDERS and BDT are the best "out of the box" classifiers
• After some parameter tuning, SVM and ANN (MLP) also perform well
(figures: the theoretical maximum, and events weighted by the SVM response)
TMVA Users Guide
We (finally) have a Users Guide! Available from tmva.sf.net: 78 pp., incl. code examples; arXiv: physics/0703039
Summary
• TMVA unifies highly customizable, well-performing multivariate classification algorithms in a single user-friendly framework
• This ensures objective classifier comparisons and simplifies their use
• TMVA is available from tmva.sf.net and in ROOT (since release 5.11/03)
• A typical TMVA analysis requires user interaction with a Factory (for classifier training) and a Reader (for classifier application)
• a set of ROOT macros displays the evaluation results
• We will continue to improve flexibility and to add new classifiers:
• Bayesian classifiers
• "Committee method": combination of different MVA techniques
• C-code output for trained classifiers (for selected methods)
More Toys: Linear, Cross and Circular Correlations
• Illustrate the behaviour of linear and nonlinear classifiers on:
• linear correlations (same for signal and background)
• linear correlations (opposite for signal and background)
• circular correlations (same for signal and background)
Illustration: Events Weighted by MVA Response
• How well do the classifiers resolve the various correlation patterns?
(figures: weighted event distributions for the linear (same), linear (opposite) and circular correlation cases)
Final Classifier Performance
• Background rejection versus signal efficiency curves for the linear, circular and cross examples
Stability with Respect to Irrelevant Variables
• Toy example with 2 discriminating and 4 non-discriminating variables
(figure: classifier performance when using only the two discriminating variables, compared to using all variables)
Using TMVA in Training and Application
• Training and application code can be ROOT scripts, C++ executables or Python scripts (via PyROOT), or any other high-level language that interfaces with ROOT
Introduction: Event Classification
• Suppose signal (S) and background (B) events populate the space of input variables (x1, x2): how do we place the decision boundary? Rectangular cuts? A linear boundary? A nonlinear one?
• Different techniques use different ways of trying to exploit (all) features: compare and choose
• Let the machine learn the boundary from training events