
Xavier Prudent - LAPP BaBar Collaboration Meeting June 2006 - Montreal



  1. Andreas Höcker (ATLAS), Kai Voss (ATLAS), Helge Voss (LHCb), Jörg Stelzer (BaBar), Peter Speckmayer (CERN). Xavier Prudent - LAPP BaBar Collaboration Meeting June 2006 - Montreal

  2. Multi-variable analysis is widely used in HEP (LEP, BaBar, Belle, D0, MiniBooNE, …). Common reproaches to multi-variable methods:
  • In case of correlations the cuts are not transparent anymore ("black box" methods)
  • The training sample may not describe the data correctly: this creates no bias, only bad performance, but a control sample is needed
  • Systematics?
  • Independent & uneasy implementations …
  Hence the need for a global tool that would:
  • provide the most common MV methods
  • do both the training and evaluation of these methods
  • enable easy computation of systematics

  3. "TMVA" means Toolkit for MultiVariate Analysis: a ROOT package written by Andreas Höcker, Kai Voss, Helge Voss, Jörg Stelzer and Peter Speckmayer for the evaluation of MV methods in parallel with an analysis. MV methods available so far:
  • Rectangular cut optimization
  • Correlated likelihood estimator (PDE)
  • Multi-dimensional likelihood estimator (PDE)
  • Fisher & Mahalanobis discriminant
  • H-Matrix (χ² estimator)
  • Neural network (2 different implementations)
  • Boosted decision tree
  TMVA provides training, testing & evaluation of these methods. A dedicated class lets you plug the training results into your favorite analysis.

  4. Cut optimization: scan in signal efficiency for the highest background rejection.
  Correlated & de-correlated likelihood: PDE approach, generalized to a multi-dimensional likelihood; the output is transformed by an inverse Fermi function (less peaked); de-correlation is possible with the square root of the covariance matrix.
  Fisher discriminant and H-matrix: classical definitions.
  Neural network: 2 NNs, both multi-layer perceptrons with stochastic learning: the Clermont-Ferrand ANN (used for the ALEPH Higgs analysis) and TMultiLayerPerceptron (the ANN from ROOT).
  (Boosted) decision trees: inspired by MiniBooNE; sequential application of cuts.

  5. What is a boosted decision tree?
  Training: each event has a weight Wi (= 1 to start).
  • At each node, cut on the variable that optimizes the separation, based on the purity P. Optimization by scanning; a genetic algorithm is coming soon.
  • Split until the minimal number of events or the purity limit is reached. A final node is a "leaf": if P > Pmin it is a "signal leaf", if P < Pmin a "background leaf".
  • Boosting: if a signal event sits on a background leaf, or a background event on a signal leaf, its weight is modified; the training is then re-performed with the new weights (×1000 trees).
  [Slide diagram: root node S/B = 52/48, split by Var1 > x1 / Var1 ≤ x1 into nodes with S/B = 4/37 and 48/11; the latter split by Var2 > x2 / Var2 ≤ x2 into S/B = 2/10 and 46/1]
  Testing: starting from the root node, events go through the 1000 boosted trees. Each time an event ends on a signal or background leaf its weight is modified (↔ neural-net output; smoother than a classical discrete output).

  6. How to get and use TMVA ?

  7. How to download TMVA?
  • Get a tgz file from the TMVA website http://tmva.sourceforge.net (follow the download link)
  • Via cvs : cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/tmva co -P TMVA
  The download automatically creates 6 directories:
  src/ source for the TMVA library
  example/ example of how to use TMVA
  lib/ the TMVA library once compiled
  reader/ all functionalities to apply MV weights
  macros/ ROOT macros to display results
  development/ working & testing directories
  For your own analysis: > cp -r example myTMVA, then modify the makefile for compilation in /myTMVA

  8. Detailed steps for the example: how to compile TMVA? Include TMVA/lib in your PATH.
  /home> cd TMVA
  /home/TMVA> source setup.csh
  /home/TMVA> cd src/
  /home/TMVA/src> make
  This compiles the library → libTMVA.so

  9. How to choose the MV method I want? Go to the examples/ directory and open TMVAnalysis.cpp:
  /home/TMVA/src> cd ../examples
  You will find a list of available methods (Booleans). Switch to 1/0 the methods you want/don't want:
  Bool_t Use_Cuts = 1;
  Bool_t Use_Likelihood = 0;
  Bool_t Use_LikelihoodD = 0;
  Bool_t Use_PDERS = 0;
  Bool_t Use_HMatrix = 0;
  Bool_t Use_Fisher = 1;
  Bool_t Use_CFMlpANN = 1;
  Bool_t Use_TMlpANN = 0;
  Bool_t Use_BDT_GiniIndex = 0;
  Bool_t Use_BDT_CrossEntro = 0;
  Bool_t Use_BDT_SdivStSpB = 0;
  Bool_t Use_BDT_MisClass = 0;
  You just have to switch the Booleans on or off! Here, for instance, I will compare Cuts, Fisher and the Clermont-Ferrand neural net.

  10. How to point TMVA to the training samples & variables? In TMVAnalysis.cpp. Both ASCII and ROOT files can be used as input: create the factory object, then point it to the input ASCII files and to the variables (example with 4 variables). In examples/data: toy_sig.dat, bkg_toy.dat

  11. How to change the training options (training cycles, number of hidden layers, number of neurons per layer, …)? In TMVAnalysis.cpp:
  factory->PrepareTrainingAndTestTree( mycut, 2000, 4000 );
  The two numbers are the numbers of events used for training and testing. Every option is described in the class BookMethod.

  12. How do I run TMVA?
  /home/TMVA/src> cd ../examples
  /home/TMVA/examples> make
  /home/TMVA/examples> TMVAnalysis "myOutput.root"   ← name of the output ROOT file
  What does it create?
  • Weight files for each trained MV method, in weight/
  • A ROOT file in the main directory with the MV outputs and efficiencies
  How to look at the results? Use the nice ROOT macros in the directory macros/:
  /home/TMVA/examples> root -l
  root [0] .L ../macros/efficiencies.C
  root [1] efficiencies("MyOutput.root")
  The plots are created in the directory plots/

  13. Which ROOT macros are available ? (1) variables.C  Distributions of input variables

  14. Which ROOT macros are available ? (2) correlations.C  Colored correlation matrix of input variables Numeric values displayed during TMVA running

  15. Which ROOT macros are available ? (3) mvas.C  Outputs of MV methods

  16. Which ROOT macros are available ? (4) efficiencies.C  Background rejection vs. Signal efficiency Direct comparison of all MV methods !

  17. I have trained the MV method I want and I have the weight files: how do I use this MV method in my analysis? A dedicated class, reader/TMVA_reader.hh, does this; a detailed example is TMVA/reader/TMVApplication.cpp. The next slide shows what must be included in your analysis program. This is work in progress (being implemented in ROOT), so there may be differences with later versions.

  18. [1] Include the reader class; [2] create an array of the input variable names (here 4 variables); [3] create the reader class; [4] read the weights and build the MV tool; [5] create an array with the input variable values; [6] compute the value of the MV: it is the value you will cut on.
  #include "TMVA_reader.h"                 // [1]
  using TMVApp::TMVA_Reader;

  void MyAnalysis() {
    // [2] input variable names
    vector<string> inputVars;
    inputVars.push_back( "var1" );
    inputVars.push_back( "var2" );
    inputVars.push_back( "var3" );
    inputVars.push_back( "var4" );

    // [3] create the reader
    TMVA_Reader *tmva = new TMVA_Reader( inputVars );

    // [4] read the weights and build the MV tool
    tmva->BookMVA( TMVA_Reader::Fisher, "TMVAnalysis_Fisher.weights" );

    // [5] input variable values
    vector<double> varValues;
    varValues.push_back( var1 );
    varValues.push_back( var2 );
    varValues.push_back( var3 );
    varValues.push_back( var4 );

    // [6] compute the MV value: this is the value you will cut on
    double mvaFi = tmva->EvaluateMVA( varValues, TMVA_Reader::Fisher );

    delete tmva;
  }

  19. TMVA is already used by several AWGs in BaBar. Group Dalitz Charmless UK: TMVA Fisher for continuum rejection in the Dalitz-plot analyses of Ks π+π− and K+π−π+ (BADs 1376 and 1512). Use of 11 input variables; pictures taken from BAD 1376.

  20. Group D0h0: TMVA Clermont-Ferrand NN for continuum rejection in the measurement of the BFs of the color-suppressed modes B0 → D0 h0 (h0 = ω, η, η’, ρ, π0) and in the measurement of the CKM angle β. Use of 4 input variables.

  21. Measurement of sin(2α) with B → ρπ: uses the Clermont-Ferrand NN to get rid of combinatorial background. Measurement of the CKM angle γ with the GLW method (Emmanuel Latour - LLR): uses a Fisher discriminant to get rid of combinatorial background. Signal = MC signal B → D*K, D* → D0 π0, D0 → Kπ; background = MC udsc.

  22. What to keep in mind about TMVA?
  • A powerful multivariate toolkit with 12 different methods (more are coming)
  • A user-friendly package, from training to plots!
  • Already used in BaBar
  • Comparison between the different MV methods is possible & easy
  • C++ & ROOT functionality; announced in ROOT version v5-11-06, http://root.cern.ch/
  Have a look at http://tmva.sourceforge.net/ !!
  Talk by Kai Voss at CERN: http://agenda.cern.ch/askArchive.php?base=agenda&categ=a057207&id=a057207s27t6/transparencies
  TMVA tutorial: https://twiki.cern.ch/twiki/bin/view/Atlas/AnalysisTutorial1105#TMVA_Multi_Variate_Data_Analysis
  Physics analysis HN advertisement: http://babar-hn.slac.stanford.edu:5090/HyperNews/get/physAnal/2989.html
  A similar tool has been developed by Ilya Narsky (StatPatternRecognition).

  23. Back Up Slides

  24. Available options for every method in TMVAnalysis.cpp:
  • Rectangular cut optimization
  • Correlated likelihood estimator (PDE)
  • Multi-dimensional likelihood estimator (PDE)
  • Fisher & Mahalanobis discriminant
  • H-Matrix (χ² estimator)
  • Neural network (2 different implementations)
  • Boosted decision tree

  25. Rectangular cuts:
  factory->BookMethod( "MethodCuts", "Method : nBin : OptionVar1 : … : OptionVarn" );
  "MethodCuts" = the TMVA method; nBin = number of bins in the histogram of efficiency S/B.
  Method of cut:
  - "MC" : Monte Carlo optimization (recommended)
  - "FitSel" : Minuit fit, "Fit_Migrad" or "Fit_Simplex"
  - "FitPDF" : PDF-based, only useful for uncorrelated input variables
  Option for each variable:
  - "FMax" : ForceMax (the max cut is fixed to the maximum of variable i)
  - "FMin" : ForceMin (the min cut is fixed to the minimum of variable i)
  - "FSmart" : ForceSmart (the min or max cut is fixed to min/max, based on the mean value)
  - Adding "All" to "option_vari", e.g. "AllFSmart", will use this option for all variables
  - If "option_vari" is empty (== ""), no assumptions on cut min/max are made

  26. Likelihood:
  factory->BookMethod( "MethodLikelihood", "TypeOfSpline : NbSmooth : NbBin : Decorr" );
  TypeOfSpline = which spline is used for smoothing the pdfs, "Splinei" [i = 1, 2, 3, 5];
  NbSmooth = how often the input histograms are smoothed;
  NbBin = average number of events per PDF bin to trigger a warning.
  Option for de-correlation:
  - "NoDecorr" : do not use the square-root matrix to de-correlate the variable space
  - "Decorr" : de-correlate the variable space
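The "Decorr" option transforms the inputs so that their covariance matrix becomes the identity. The slide's method uses the square root of the covariance matrix; the sketch below (illustrative only, not TMVA code) achieves the same effect for two variables with a Cholesky factor C = L Lᵀ, applying L⁻¹ to each input vector.

```cpp
#include <array>
#include <cmath>

// Lower-triangular Cholesky factor of a 2x2 covariance matrix
// C = [[a, c], [c, b]], so that C = L * L^T.
struct Chol2 { double l00, l10, l11; };

Chol2 cholesky(double a, double c, double b) {
    Chol2 L;
    L.l00 = std::sqrt(a);
    L.l10 = c / L.l00;
    L.l11 = std::sqrt(b - L.l10 * L.l10);
    return L;
}

// De-correlate one event: solve L y = x by forward substitution,
// i.e. y = L^{-1} x; the y's have unit covariance by construction.
std::array<double, 2> decorrelate(const Chol2& L, double x0, double x1) {
    double y0 = x0 / L.l00;
    double y1 = (x1 - L.l10 * y0) / L.l11;
    return {y0, y1};
}
```

The de-correlated variables can then be fed to the product of one-dimensional pdfs, which is where the correlated-likelihood method gains over the naive one.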

  27. Fisher discriminant and H-matrix:
  factory->BookMethod( "MethodFisher", "Fisher" );
  The second argument selects the method: "Fisher" or "Mahalanobis" (another definition of distance).
  factory->BookMethod( "MethodHMatrix" );

  28. Artificial neural network:
  factory->BookMethod( "WhichANN", "NbCycles:NeuronsL1:NeuronsL2:…:NeuronsLn" );
  Which type of NN:
  • "MethodCFMlpANN" : the Clermont-Ferrand NN, used for the Higgs search in ALEPH
  • "MethodTMlpANN" : the ROOT NN
  NbCycles = number of training cycles; NeuronsLi = number of neurons in layer i. The first layer necessarily has as many neurons as input variables.
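Both implementations are multi-layer perceptrons: each neuron computes a weighted sum of its inputs and passes it through a sigmoid. A minimal sketch of a single neuron (illustrative only; the actual activation and learning rules are implementation details of the two ANNs):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid activation, the classic MLP non-linearity.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One neuron: output = sigmoid( bias + sum_i w_i * x_i ).
double neuron(const std::vector<double>& w,
              const std::vector<double>& x,
              double bias) {
    double z = bias;
    for (std::size_t i = 0; i < w.size(); ++i) z += w[i] * x[i];
    return sigmoid(z);
}
```

The "NbCycles" option then controls how many times the training sample is passed through when the weights w are adjusted.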

  29. Boosted decision trees:
  factory->BookMethod( "MethodBDT", "nTree : BoostType : SeparationType : nEvtMin : MaxNodePurity : nCuts" );
  nTree = number of trees.
  BoostType, the method of boosting:
  - AdaBoost
  - EpsilonBoost
  SeparationType, the method for evaluating the misclassification:
  • GiniIndex
  • CrossEntropy
  • SdivSqrtSplusB
  • MisClassificationError
  nEvtMin = minimum number of events in a node (leaf criterion);
  MaxNodePurity = upper bound deciding between leaf and intermediate node;
  nCuts = number of steps in the optimization of the cut for a node.
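For reference, the four SeparationType criteria can be written as functions of the node purity p = S/(S+B) (and the raw S, B counts where needed). A self-contained sketch, not TMVA code:

```cpp
#include <algorithm>
#include <cmath>

// Node-separation (impurity) criteria; p = signal purity of the node.
// All are largest near p = 0.5 (worst separation) and smallest near
// p = 0 or p = 1 (purest nodes).
double giniIndex(double p)     { return p * (1.0 - p); }
double crossEntropy(double p)  { return -p * std::log(p) - (1.0 - p) * std::log(1.0 - p); }
double misClassError(double p) { return 1.0 - std::max(p, 1.0 - p); }

// Significance-like criterion from the raw signal/background counts.
double sOverSqrtSB(double s, double b) { return s / std::sqrt(s + b); }
```

The cut on each node is chosen to minimize the selected impurity (or maximize S/√(S+B)) over the nCuts scan points.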
