Multivariate Analysis with TMVA: Basic Concept and New Features
Helge Voss(*) (MPI–K, Heidelberg)
LHCb MVA Workshop, CERN, July 10, 2009
(*) On behalf of the author team: A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. v. Toerne, H. Voss
And the contributors: see acknowledgments on page xx
On the web: http://tmva.sf.net/ (home), https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome (tutorial)
Event Classification
[Figure: three scatter plots of signal (S) and background (B) events in the (x1, x2) plane, showing rectangular, linear, and nonlinear decision boundaries]
• Suppose a data sample with two types of events, carrying the class labels Signal and Background (we restrict ourselves here to the two-class case; many classifiers can in principle be extended to several classes, otherwise analyses can be staged)
• we have discriminating variables x1, x2, …
• how do we set the decision boundary to select events of type S? Rectangular cuts? A linear boundary? A nonlinear one?
• How can we decide?
• Once we have decided on a class of boundaries, how do we find the "optimal" one?
Regression In High Energy Physics
• Most HEP event reconstruction tasks require some sort of "function estimate" for which we do not have an analytical expression:
• energy deposit in the calorimeter: shower profile calibration
• entry location of the particle in the calorimeter
• distance between two overlapping photons in the calorimeter
• … you can certainly think of more …
• Maybe you could even imagine some final physics parameter that needs to be fitted to your event data, where you would rather use a non-analytic fit function from Monte Carlo events than an analytic parametrisation of the theory:
• reweighting method used in the LEP W-mass analysis
• … somewhat wild guessing, might not be reasonable to do …
• Maybe your application can be successfully learned by a machine using Monte Carlo events?
Regression
[Plots: f(x) versus x for a set of "known measurements", fitted with a constant, a linear, and a non-linear function f(x)]
• how do we estimate a "functional behaviour" from a given set of "known measurements"?
• Assume for example D variables that somehow characterize the shower in your calorimeter.
• from Monte Carlo or testbeam measurements you have a data sample of events for which all D variables x are measured, together with the corresponding particle energy
• the particle energy as a function of the D observables x is a surface in the (D+1)-dimensional space given by the D observables + the energy
• seems trivial?
• maybe … the human eye, and the brain behind it, have very good pattern recognition capabilities!
• but what if you have more variables?
Event Classification: Why MVA?
Assume 4 uncorrelated Gaussians: what do you think, is this event more "likely" signal or background?
Event Classification
• Each event, whether Signal or Background, has D measured variables.
• Find a mapping y(x) from the D-dimensional input/observable/"feature" space to a one-dimensional output of class labels: y: R^D → R
• most general form: y = y(x), with x = {x1, …, xD} ∈ R^D the input variables
• If one histograms the resulting y(x) values: …
• Who sees what this would look like for the regression problem?
Event Classification
y(x) > cut: signal; y(x) = cut: decision boundary; y(x) < cut: background
• Each event, whether Signal or Background, has D measured variables.
• Find a mapping y: R^D → R from the D-dimensional input/observable/"feature" space to a one-dimensional output of class labels, such that y(B) → 0 and y(S) → 1
• y(x): "test statistic" in the D-dimensional space of input variables
• distributions of y(x): PDF_S(y) and PDF_B(y)
• used to set the selection cut! → efficiency and purity
• y(x) = const: surface defining the decision boundary
• overlap of PDF_S(y) and PDF_B(y) → limits the separation power and purity
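As a minimal numerical illustration (not TMVA code; the toy y(x) distributions, expected event counts, and the helper name are all invented for this sketch), the signal efficiency and purity of a selection y(x) > cut can be estimated directly from samples of the test statistic:

```python
import numpy as np

# Toy test-statistic values: assume a trained classifier that pushes
# background towards 0 and signal towards 1 (invented distributions).
rng = np.random.default_rng(42)
y_sig = rng.normal(0.7, 0.15, 10000)   # y(x) for true signal events
y_bkg = rng.normal(0.3, 0.15, 10000)   # y(x) for true background events

def efficiency_purity(cut, y_sig, y_bkg, n_sig_exp=1000.0, n_bkg_exp=10000.0):
    """Signal efficiency and purity for the selection y(x) > cut."""
    eff_s = np.mean(y_sig > cut)       # fraction of signal kept
    eff_b = np.mean(y_bkg > cut)       # fraction of background kept
    n_s = eff_s * n_sig_exp            # expected selected signal events
    n_b = eff_b * n_bkg_exp            # expected selected background events
    purity = n_s / (n_s + n_b) if (n_s + n_b) > 0 else 0.0
    return eff_s, purity

print(efficiency_purity(0.5, y_sig, y_bkg))
```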
MVA and Machine Learning
• The previous slide was basically the idea of "Multivariate Analysis" (MVA)
• rem: what about "standard cuts"?
• Finding y(x): R^D → R ("basically" the same for regression and classification)
• given a certain type of model class y(x)
• in an automatic way, using "known" or "previously solved" events
• i.e. learn from known "patterns"
• such that the resulting y(x) has good generalization properties when applied to "unknown" events
• that is what the "machine" is supposed to be doing: supervised machine learning
• Of course … there's no magic, we still need to:
• choose the discriminating variables
• choose the class of models (linear, non-linear, flexible or less flexible)
• tune the "learning parameters" → bias vs. variance trade-off
• check the generalization properties
• consider the trade-off between statistical and systematic uncertainties
Overview/Outline
• Idea: rather than just implementing new MVA techniques and making them available in ROOT (as e.g. TMultiLayerPerceptron does):
• one common platform/interface for all MVA classification and regression algorithms
• easy to use and to switch between classifiers
• (common) data pre-processing capabilities
• train/test all classifiers on the same data sample and evaluate them consistently
• common analysis, application and evaluation framework (ROOT macros, C++ executables, python scripts …)
• application of the trained classifier/regressor using libTMVA.so and the corresponding "weight files", or stand-alone C++ code (the latter is currently disabled in TMVA 4.0.1)
• Outline
• The TMVA project
• Survey of the available classifiers/regressors
• Toy examples
• General considerations and systematics
TMVA Development and Distribution
• TMVA is a sourceforge (SF) package for world-wide access
• Home page ………………. http://tmva.sf.net/
• SF project page …………. http://sf.net/projects/tmva
• View SVN ………………… http://tmva.svn.sourceforge.net/viewvc/tmva/trunc/TMVA/
• Mailing list .……………….. http://sf.net/mail/?group_id=152074
• Tutorial TWiki ……………. https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome
• Active project
• Currently 6 core developers and 42 registered contributors at SF
• >4864 downloads since March 2006 (without CVS/SVN checkouts and ROOT users)
• current version: 4.0.1 (29 Jun 2009)
• we try to respond quickly to user requests, bug fixes, etc.
• Integrated and distributed with ROOT since ROOT v5.11/03 (newest version 4.0.1 in ROOT production release 5.23)
New (and Future) TMVA Developments
• MV regression
• Code re-design allowing for:
• boosting any classifier (not only decision trees)
• combination of any pre-processing algorithms
• classifier training in multiple regions
• multi-class classifiers (will slowly creep into some classifiers …)
• cross-validation
• New MVA technique
• PDE-Foam (classification + regression)
• Data preprocessing
• Gauss transformation
• arbitrary combination of preprocessing steps
• Numerous small add-ons, etc.
The TMVA Classifiers (Regressors)
• Currently implemented classifiers:
• Rectangular cut optimisation
• Projective (1D) likelihood estimator
• Multidimensional likelihood estimators: PDE range search, PDE-Foam, k-Nearest Neighbour (also as Regressor)
• Linear discriminant analysis (Fisher, H-Matrix, general linear (also as Regressor))
• Function Discriminant (also as Regressor)
• Artificial neural networks (3 multilayer perceptron implementations; MLP also as Regressor with multi-dimensional target)
• Boosted/bagged/randomised decision trees (also as Regressor)
• Rule Ensemble Fitting
• Support Vector Machine
• (Plugin for any other MVA classifier, e.g. NeuroBayes)
• Currently implemented data preprocessing:
• Linear decorrelation
• Principal component analysis
• Gauss transformation
Example for Data Preprocessing: Decorrelation
• Commonly realised for all classifiers in TMVA (centrally in the DataSet class)
• Removal of linear correlations by rotating the input variables
• using the "square-root" of the correlation matrix
• using Principal Component Analysis
• Note that the decorrelation is only complete if
• the correlations are linear
• the input variables are Gaussian distributed
[Plots: original, SQRT-decorrelated, and PCA-decorrelated variable distributions]
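A minimal numpy sketch of such a square-root decorrelation, for illustration only: TMVA computes the transformation internally in its DataSet class, so the helper below is an invented stand-in (built here from the covariance matrix):

```python
import numpy as np

def sqrt_decorrelate(X):
    """Rotate the input variables with the inverse square root of their
    covariance matrix, so that the transformed variables are uncorrelated."""
    C = np.cov(X, rowvar=False)            # covariance matrix of the inputs
    evals, evecs = np.linalg.eigh(C)       # symmetric matrix -> eigh
    C_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return (X - X.mean(axis=0)) @ C_inv_sqrt.T

# Quick check: the linear correlation vanishes for correlated Gaussians
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], 5000)
print(np.corrcoef(sqrt_decorrelate(X), rowvar=False).round(2))
```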
“Gaussian-isation”
• Improve the decorrelation by pre-Gaussianisation of the variables
• First: transformation to achieve a uniform (flat) distribution: x̂_k = ∫_{-∞}^{x_k} p̂_k(x') dx' (x_k: measured value of variable k, p̂_k: its PDF, x̂_k: the transformed variable). The integral can be solved in an unbinned way by event counting, or by creating non-parametric PDFs (see the likelihood section later)
• Second: make it Gaussian via the inverse error function: x_k → √2 · erf⁻¹(2 x̂_k − 1)
• Third: decorrelate (and "iterate" this procedure)
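The same recipe in a minimal scipy sketch, with the flattening integral solved by unbinned event counting (a rank transform); the function name and toy data are invented:

```python
import numpy as np
from scipy.special import erfinv
from scipy.stats import rankdata

def gaussianise(x):
    """Rank-transform to a flat distribution in (0, 1) (the unbinned
    'event counting' of the integral), then map to a Gaussian with erf^-1."""
    u = rankdata(x) / (len(x) + 1.0)       # strictly inside (0, 1)
    return np.sqrt(2.0) * erfinv(2.0 * u - 1.0)

rng = np.random.default_rng(1)
x = rng.exponential(2.0, 5000)             # a strongly non-Gaussian variable
g = gaussianise(x)
print(round(g.mean(), 2), round(g.std(), 2))  # ~0 and ~1 after the transform
```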
“Gaussian-isation”
[Plots: original variable distributions; signal-Gaussianised; background-Gaussianised]
We cannot simultaneously Gaussianise both signal and background!
Pre-processing
• The decorrelation/Gauss transformation can be determined from either:
• Signal + Background together
• Signal only
• Background only
• For all but the "Likelihood" methods (1D or multi-dimensional) we need to decide on one of these (TMVA default: signal + background)
• For the "Likelihood" methods, where we test the "Signal" hypothesis against the "Background" hypothesis, BOTH are used
Rectangular Cut Optimisation
• Simplest method: individual cuts on each variable
• cuts out a single rectangular volume from the variable space
• Technical challenge: how to find the optimal cuts?
• MINUIT fails due to the non-unique solution space
• TMVA uses: Monte Carlo sampling, Genetic Algorithm, Simulated Annealing
• Huge speed improvement of the volume search by sorting events in a binary tree
• Cuts usually benefit from prior decorrelation of the cut variables
Projective (1D) Likelihood Estimator (PDE Approach)
• Probability density estimators for each input variable, combined in a likelihood estimator (ignoring correlations)
• Optimal approach if there are zero correlations (or after linear decorrelation)
• Otherwise: significant performance loss
• Technical challenge: how to estimate the PDF shapes
• 3 ways: parametric fitting with a function (difficult to automate for arbitrary PDFs), event counting (automatic and unbiased, but suboptimal), nonparametric fitting (easy to automate, but can create artefacts/suppress information)
• TMVA uses binned shape interpolation with spline functions
• or: unbinned adaptive Gaussian kernel density estimation
• TMVA performs automatic validation of the goodness-of-fit
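A conceptual sketch of such a projective likelihood (this is not TMVA's spline-based implementation; it uses scipy kernel-density estimates and invented toy samples, and deliberately ignores correlations):

```python
import numpy as np
from scipy.stats import gaussian_kde

def projective_likelihood(x, sig_pdfs, bkg_pdfs):
    """Likelihood ratio from per-variable 1D PDFs, correlations ignored:
    y_L(x) = L_S / (L_S + L_B) with L = product of the 1D PDF values."""
    L_s = np.prod([pdf(xi)[0] for pdf, xi in zip(sig_pdfs, x)])
    L_b = np.prod([pdf(xi)[0] for pdf, xi in zip(bkg_pdfs, x)])
    return L_s / (L_s + L_b)

# Per-variable PDFs estimated from toy training samples (kernel density
# here; TMVA itself uses splines or adaptive Gaussian kernels)
rng = np.random.default_rng(2)
sig = rng.normal(+1.0, 1.0, (5000, 2))
bkg = rng.normal(-1.0, 1.0, (5000, 2))
sig_pdfs = [gaussian_kde(sig[:, i]) for i in range(2)]
bkg_pdfs = [gaussian_kde(bkg[:, i]) for i in range(2)]
print(projective_likelihood([0.5, 0.5], sig_pdfs, bkg_pdfs))
```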
Multidimensional PDE Approach
• Use a single PDF per event class (sig, bkg) which spans all Nvar dimensions; 2 different implementations of the same idea:
• PDE Range-Search: count the number of signal and background events in the "vicinity" of the test event; a preset or adaptive volume defines the "vicinity" (Carli-Koblitz, NIM A501, 576 (2003))
• improve simple event counting by using kernel functions to penalise large distances from the test event
• increase the counting speed with a binary search tree
• Genuine k-NN algorithm (author R. Ospanov): intrinsically adaptive approach; very fast search with kd-tree event sorting
• PDE-Foam: self-adaptive phase-space binning; divides the space into hypercubes as "volumes"
[Figure: test event in the (x1, x2) plane with the signal (H1) and background (H0) training events in its vicinity]
• Regression: take the distance-weighted average of the function value over the "volume"
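A conceptual sketch of the k-NN variant using a kd-tree, with invented toy samples (not R. Ospanov's TMVA implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_response(x, sig_events, bkg_events, k=20):
    """k-nearest-neighbour estimate of the signal probability at x:
    the fraction of the k closest training events that are signal."""
    tree = cKDTree(np.vstack([sig_events, bkg_events]))  # fast kd-tree search
    _, idx = tree.query(x, k=k)
    n_sig = np.sum(idx < len(sig_events))  # indices below the split = signal
    return n_sig / float(k)

rng = np.random.default_rng(3)
sig = rng.normal(+1.0, 1.0, (2000, 2))
bkg = rng.normal(-1.0, 1.0, (2000, 2))
print(knn_response(np.array([0.8, 0.8]), sig, bkg))
```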
Fisher’s Linear Discriminant Analysis (LDA)
• Well known, simple and elegant classifier
• LDA determines the axis in the input-variable hyperspace such that a projection of events onto this axis pushes signal and background as far away from each other as possible
• Classifier response couldn't be simpler: y(x) = F_0 + Σ_k F_k x_k, a linear combination of the input variables with the "Fisher coefficients" F_k
• Function discriminant analysis (FDA)
• Fit any user-defined function of the input variables, requiring that signal events return 1 and background events 0
• Parameter fitting: Genetic Algorithm, MINUIT, MC, and combinations thereof
• Easy reproduction of the Fisher result, but can add nonlinearities
• Very transparent discriminator
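A minimal numpy sketch of the Fisher axis under the usual assumptions (names and toy data invented; the within-class matrix and offset conventions may differ in detail from TMVA's):

```python
import numpy as np

def fisher_coefficients(sig, bkg):
    """Fisher axis w = W^-1 (mu_S - mu_B), with W the within-class
    covariance: the projection y = w . x maximises the class separation."""
    mu_s, mu_b = sig.mean(axis=0), bkg.mean(axis=0)
    W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    return np.linalg.solve(W, mu_s - mu_b)

rng = np.random.default_rng(4)
sig = rng.multivariate_normal([+1, +1], [[1, 0.5], [0.5, 1]], 5000)
bkg = rng.multivariate_normal([-1, -1], [[1, 0.5], [0.5, 1]], 5000)
w = fisher_coefficients(sig, bkg)
y_sig, y_bkg = sig @ w, bkg @ w            # the projected classifier response
print(w.round(3), round(y_sig.mean() - y_bkg.mean(), 2))
```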
Nonlinear Analysis: Artificial Neural Networks
[Diagram: feed-forward multilayer perceptron with 1 input layer (the Nvar discriminating input variables), k hidden layers (M_1 … M_k nodes), and 1 output layer with 2 output classes (signal and background); each node applies an "activation" function to the weighted sum of its inputs]
• Achieve a nonlinear classifier response by "activating" the output nodes using nonlinear weights → feed-forward Multilayer Perceptron
• Three different implementations in TMVA (all are Multilayer Perceptrons):
• TMlpANN: interface to ROOT's MLP implementation
• MLP: TMVA's own MLP implementation, for increased speed and flexibility
• CFMlpANN: ALEPH's Higgs-search ANN, translated from FORTRAN
• Regression: uses the real-valued output of ONE node
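A conceptual sketch of the forward pass of such a network with a tanh activation (random, untrained toy weights; not one of the three TMVA implementations):

```python
import numpy as np

def mlp_response(x, weights, biases):
    """Forward pass of a feed-forward MLP: each hidden layer computes
    a = tanh(W a_prev + b); a single linear output node gives y(x)."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)             # nonlinear "activation" function
    return (weights[-1] @ a + biases[-1]).item()

# Toy network: 2 inputs -> 3 hidden nodes -> 1 output, random (untrained)
# weights; training would adjust these to minimise the classification error
rng = np.random.default_rng(5)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=3), rng.normal(size=1)]
print(mlp_response([0.5, -1.2], weights, biases))
```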
Boosted Decision Trees (BDT)
• DT: sequential application of cuts splits the data into nodes, where the final nodes (leaves) classify an event as signal or background
• BDT: combine a forest of DTs, with differently weighted events in each tree (the trees can also be weighted)
• e.g. "AdaBoost": incorrectly classified events receive a larger weight in the next DT
• GradientBoost
• "Bagging": random Poisson event weights (resampling with replacement)
• "Randomised Forest": like bagging, plus the use of random sub-sets of the variables
• Boosting/bagging/randomising create a set of "basis functions": the final classifier is a linear combination of these → improves stability
• Bottom-up "pruning" of a decision tree
• Remove statistically insignificant nodes to reduce tree overtraining
Boosted Decision Trees (BDT)
[Plots: the same decision tree before and after pruning]
• DT: sequential application of cuts splits the data into nodes, where the final nodes (leaves) classify an event as signal or background
• Bottom-up "pruning" of a decision tree
• Remove statistically insignificant nodes to reduce tree overtraining
• Randomised Forests don't need pruning → VERY ROBUST
• Some people even claim that any "boosted" decision trees don't need to be pruned, but in our experience this is not always true
• Regression: take the average of the function value in the leaf node
Learning with Rule Ensembles
• Following the RuleFit approach by Friedman-Popescu (Friedman-Popescu, Tech. Rep., Stat. Dept., Stanford U., 2003)
• Model is a linear combination of rules, where a rule is a sequence of cuts: y_RF(x) = a_0 + Σ_m a_m r_m(x) + Σ_k b_k x_k, i.e. a sum of rules r_m (with r_m = 1 if all cuts in the sequence are satisfied, 0 otherwise) plus a linear Fisher term in the normalised discriminating event variables x_k
• The problem to solve is:
• create the rule ensemble: use a forest of decision trees
• fit the coefficients a_m, b_k: gradient directed regularization, minimising the Risk (Friedman et al.)
• Pruning removes topologically equal rules (same variables in the cut sequence)
[Aside: one of the elementary cellular automaton rules (Wolfram 1983, 2002). It specifies the next color of a cell depending on its own color and those of its immediate neighbors; its rule outcomes are encoded in the binary representation 30 = 00011110₂.]
Boosting
[Diagram: the training sample is used to build classifier C(0)(x); the sample is then re-weighted to form a weighted sample that trains C(1)(x), re-weighted again for C(2)(x), C(3)(x), …, up to C(m)(x)]
Adaptive Boosting (AdaBoost)
[Diagram: as on the previous slide, with the training sample re-weighted after each classifier C(i)(x)]
• AdaBoost re-weights the events misclassified by the previous classifier by the factor (1 − err)/err, with err the misclassification rate of that classifier
• AdaBoost also weights the classifiers themselves by their individual error rates: the combined classifier is C(x) = Σ_i log((1 − err_i)/err_i) · C(i)(x)
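A compact sketch of this AdaBoost recipe with decision-stump weak classifiers; the helper names are invented, and the sketch assumes each stump error stays strictly between 0 and 1 (real implementations guard the edge cases):

```python
import numpy as np

def train_adaboost_stumps(X, t, n_trees=50):
    """AdaBoost with decision stumps (single threshold cuts on one
    variable). t holds +1 (signal) / -1 (background) labels."""
    w = np.full(len(t), 1.0 / len(t))          # event weights
    stumps, alphas = [], []
    for _ in range(n_trees):
        best = None
        for j in range(X.shape[1]):            # scan variables, thresholds
            for thr in np.percentile(X[:, j], np.arange(5, 100, 5)):
                for sign in (+1, -1):          # ... and cut directions
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != t].sum()   # weighted misclassification
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w[pred != t] *= (1.0 - err) / err      # boost misclassified events
        w /= w.sum()
        stumps.append((j, thr, sign))
        alphas.append(np.log((1.0 - err) / err))   # classifier weight
    return stumps, alphas

def adaboost_response(x, stumps, alphas):
    return sum(a * s * (1 if x[j] > thr else -1)
               for (j, thr, s), a in zip(stumps, alphas))
```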
AdaBoost: A simple demonstration
The example (somewhat artificial, but nice for demonstration):
• data sample with three "bumps" (plots a and b)
• weak classifier, i.e. one single simple "cut" ↔ a decision tree stump: var(i) > x → S, var(i) <= x → B
Two reasonable cuts:
a) Var0 > 0.5: εsignal = 66%, εbkg ≈ 0%, 16.5% of the events misclassified in total, or
b) Var0 < -0.5: εsignal = 33%, εbkg ≈ 0%, 33% of the events misclassified in total
→ the training of a single decision tree stump will find "cut a)"
AdaBoost: A simple demonstration
• The first "tree", choosing cut a), will give an error fraction err = 0.165
• before building the next "tree", the wrongly classified training events are re-weighted by (1 − err)/err ≈ 5
• the next "tree" therefore sees essentially a re-weighted data sample, and hence will choose "cut b)": Var0 < -0.5
The combined classifier, Tree1 + Tree2: the (weighted) average of the responses of both trees to a test event is able to separate signal from background as well as one would expect from the most powerful classifier
AdaBoost: A simple demonstration
[Plots: classifier response with only 1 tree "stump"; with only 2 tree "stumps" combined with AdaBoost]
Boosting Fisher Discriminant
• Fisher Discriminant: y(x) = a linear combination of the input variables
• A linear combination of Fisher discriminants is again a Fisher discriminant. Hence "simple boosting doesn't help".
• However: apply a CUT on the "Fisher Discriminant" output in each boosting step (or any other non-linear transformation); then it works!
Support Vector Machine
• There are methods to create linear decision boundaries using only measures of distances → this leads to a quadratic optimisation process
• Using variable transformations, i.e. moving into a higher-dimensional space (e.g. using var1*var1 in a Fisher Discriminant), in principle allows non-linear problems to be separated with linear decision boundaries
• In the end, the decision boundary is defined only by the training events that are closest to the boundary
• → Support Vector Machine
Support Vector Machines
[Figure: separable and non-separable data in the (x1, x2) plane, with the optimal hyperplane, the margin, and the support vectors marked]
• Find the hyperplane that best separates signal from background
• Best separation: maximum distance (margin) between the closest events (support vectors) and the hyperplane
• Linear decision boundary
• If the data are non-separable, add a misclassification cost parameter C·Σᵢξᵢ to the minimisation function
• The largest-margin solution depends only on inner products of the support vectors (distances) → quadratic minimisation problem
Support Vector Machines
[Figure: data that are non-separable in (x1, x2) become linearly separable in a higher-dimensional feature space after a mapping Φ(x1, x2)]
• Non-linear cases:
• transform the variables into a higher-dimensional feature space where again a linear boundary (hyperplane) can separate the data
• the explicit transformation doesn't need to be specified; only the "scalar product" (inner product) is needed: x·x → Φ(x)·Φ(x)
• certain Kernel Functions can be interpreted as scalar products between transformed vectors in the higher-dimensional feature space, e.g. Gaussian, Polynomial, Sigmoid
• choose a Kernel and fit the hyperplane using the linear techniques developed above
• the kernel size parameter typically needs careful tuning! (Overtraining!)
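A minimal sketch of the kernel trick: the decision function below assumes the support vectors, their labels (±1), the Lagrange multipliers alphas, and the offset b have already been obtained from the quadratic minimisation, which is omitted here:

```python
import numpy as np

def gaussian_kernel(x1, x2, gamma=0.5):
    """K(x1, x2) = exp(-gamma |x1 - x2|^2): acts like the inner product
    Phi(x1) . Phi(x2) in a higher-dimensional feature space, without
    ever computing the transformation Phi explicitly."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(d, d))

def svm_response(x, support_vectors, labels, alphas, b, gamma=0.5):
    """SVM decision function: only kernel evaluations against the support
    vectors enter; the sign of the response gives the predicted class."""
    return sum(a * t * gaussian_kernel(sv, x, gamma)
               for sv, t, a in zip(support_vectors, labels, alphas)) + b
```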
Data Preparation and Preprocessing
• Data input format: ROOT TTree or ASCII
• Supports selection of any subset, combination, or function of the available variables
• Supports application of pre-selection cuts (possibly independent for signal and bkg)
• Supports global event weights for the signal or background input files
• Supports the use of any input variable as an individual event weight
• Supports various methods for splitting into training and test samples:
• block-wise
• randomly
• periodically (i.e. periodically 3 testing ev., 2 training ev., 3 testing ev., 2 training ev., …)
• user-defined separate Training and Testing trees
• Preprocessing of input variables (e.g. decorrelation)
Code Flow for Training and Application Phases
[Diagram: training phase (Factory) and application phase (Reader), as in the TMVA tutorial]
The scripts can be ROOT scripts, C++ executables, or python scripts (via PyROOT).
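As an illustration of the two phases, a minimal PyROOT sketch along the lines of the TMVA tutorial macros; the file, tree, and variable names and the option strings are invented placeholders, so consult the Users Guide for the authoritative options:

```python
from array import array
from ROOT import TFile, TCut, TMVA

# --- Training phase: the Factory books, trains, tests and evaluates methods
out = TFile("TMVA.root", "RECREATE")
factory = TMVA.Factory("TMVAClassification", out,
                       "AnalysisType=Classification")
data = TFile.Open("toy_samples.root")           # hypothetical input file
factory.AddVariable("var1", "F")
factory.AddVariable("var2", "F")
factory.AddSignalTree(data.Get("TreeS"), 1.0)
factory.AddBackgroundTree(data.Get("TreeB"), 1.0)
factory.PrepareTrainingAndTestTree(TCut(""), "SplitMode=Random")
factory.BookMethod(TMVA.Types.kBDT, "BDT", "NTrees=400:BoostType=AdaBoost")
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
out.Close()

# --- Application phase: the Reader evaluates the trained classifier per event
var1, var2 = array("f", [0.0]), array("f", [0.0])
reader = TMVA.Reader("!Color:!Silent")
reader.AddVariable("var1", var1)
reader.AddVariable("var2", var2)
reader.BookMVA("BDT", "weights/TMVAClassification_BDT.weights.xml")
var1[0], var2[0] = 0.5, -1.2                    # fill with one event's values
print(reader.EvaluateMVA("BDT"))
```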
MVA Evaluation Framework
• TMVA is not only a collection of classifiers, but an MVA framework
• After training, TMVA provides ROOT evaluation scripts (through a GUI):
• plots of all signal (S) and background (B) input variables, with and without pre-processing
• correlation scatter plots and linear correlation coefficients for S & B
• classifier outputs (S & B) for the test and training samples (to spot overtraining)
• classifier Rarity distribution
• classifier significance with optimal cuts
• B rejection versus S efficiency
• classifier-specific plots: likelihood reference distributions; classifier PDFs (for probability output and Rarity); network architecture, weights, and convergence; Rule Fitting analysis plots; visualisation of decision trees
Evaluating the Classifier Training
[Plots: projective likelihood PDFs, MLP training convergence, BDTs, …; example: average no. of nodes before/after pruning: 4193 / 968]
Evaluating the Classifier Training
• Classifier output distributions for test and training samples …
Evaluating the Classifier Training
• Optimal cut for each classifier …
Using the TMVA graphical output one can determine the optimal cut on a classifier output (the working point in "statistical significance").
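A sketch of such a working-point scan, taking S/√(S+B) as one common significance definition (the expected yields n_sig_exp and n_bkg_exp must be supplied by the analyst; all names here are invented):

```python
import numpy as np

def best_cut(y_sig, y_bkg, n_sig_exp, n_bkg_exp):
    """Scan cuts on the classifier output and maximise the statistical
    significance S / sqrt(S + B) (one common figure of merit)."""
    lo = min(y_sig.min(), y_bkg.min())
    hi = max(y_sig.max(), y_bkg.max())
    best = (0.0, None)
    for c in np.linspace(lo, hi, 200):
        S = n_sig_exp * np.mean(y_sig > c)   # expected signal after the cut
        B = n_bkg_exp * np.mean(y_bkg > c)   # expected background after it
        if S + B > 0 and S / np.sqrt(S + B) > best[0]:
            best = (S / np.sqrt(S + B), c)
    return best                              # (significance, optimal cut)
```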
Evaluating the Classifier Training
• Background rejection versus signal efficiency …
• The best plot to compare classifier performance
• If the background in the data is non-uniform → problem in the training sample
• An elegant variable is the Rarity: it transforms the background to a uniform distribution. The height of the signal peak is then a direct measure of the classifier performance
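A minimal sketch of the Rarity transform, estimating the background CDF by event counting on a toy sample (invented names and distributions):

```python
import numpy as np

def rarity(y, y_bkg_train):
    """Rarity transform: R(y) = cumulative background PDF evaluated at y.
    By construction the background becomes uniform in [0, 1], so any
    non-uniformity of the data's background is immediately visible."""
    y_bkg_sorted = np.sort(y_bkg_train)
    return np.searchsorted(y_bkg_sorted, y) / float(len(y_bkg_sorted))

rng = np.random.default_rng(6)
y_bkg = rng.normal(0.3, 0.1, 10000)             # toy classifier outputs
y_sig = rng.normal(0.7, 0.1, 10000)
print(round(np.mean(rarity(y_bkg, y_bkg)), 2))  # ~0.5: background is flat
print(round(np.mean(rarity(y_sig, y_bkg)), 2))  # near 1: the signal peak
```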
Evaluating the Classifiers (taken from TMVA output …)
• How discriminating is a variable? → Input Variable Ranking (the top variable is the best ranked):
--- Fisher : Ranking result (top variable is best ranked)
--- Fisher : ---------------------------------------------
--- Fisher : Rank : Variable : Discr. power
--- Fisher : ---------------------------------------------
--- Fisher : 1 : var4 : 2.175e-01
--- Fisher : 2 : var3 : 1.718e-01
--- Fisher : 3 : var1 : 9.549e-02
--- Fisher : 4 : var2 : 2.841e-02
--- Fisher : ---------------------------------------------
• Do the classifiers select the same events as signal and background? If not, there is something to gain! → Classifier correlation and overlap:
--- Factory : Inter-MVA overlap matrix (signal):
--- Factory : ------------------------------
--- Factory : Likelihood Fisher
--- Factory : Likelihood: +1.000 +0.667
--- Factory : Fisher: +0.667 +1.000
--- Factory : ------------------------------
Evaluating the Classifiers (taken from TMVA output …)
The better classifiers rank higher in the first table; the second table checks for over-training by comparing the test to the training efficiencies.
Evaluation results ranked by best signal efficiency and purity (area)
------------------------------------------------------------------------------
MVA Signal efficiency at bkg eff. (error): | Sepa- Signifi-
Methods: @B=0.01 @B=0.10 @B=0.30 Area | ration: cance:
------------------------------------------------------------------------------
Fisher : 0.268(03) 0.653(03) 0.873(02) 0.882 | 0.444 1.189
MLP : 0.266(03) 0.656(03) 0.873(02) 0.882 | 0.444 1.260
LikelihoodD : 0.259(03) 0.649(03) 0.871(02) 0.880 | 0.441 1.251
PDERS : 0.223(03) 0.628(03) 0.861(02) 0.870 | 0.417 1.192
RuleFit : 0.196(03) 0.607(03) 0.845(02) 0.859 | 0.390 1.092
HMatrix : 0.058(01) 0.622(03) 0.868(02) 0.855 | 0.410 1.093
BDT : 0.154(02) 0.594(04) 0.838(03) 0.852 | 0.380 1.099
CutsGA : 0.109(02) 1.000(00) 0.717(03) 0.784 | 0.000 0.000
Likelihood : 0.086(02) 0.387(03) 0.677(03) 0.757 | 0.199 0.682
------------------------------------------------------------------------------
Testing efficiency compared to training efficiency (over-training check)
------------------------------------------------------------------------------
MVA Signal efficiency: from test sample (from training sample)
Methods: @B=0.01 @B=0.10 @B=0.30
------------------------------------------------------------------------------
Fisher : 0.268 (0.275) 0.653 (0.658) 0.873 (0.873)
MLP : 0.266 (0.278) 0.656 (0.658) 0.873 (0.873)
LikelihoodD : 0.259 (0.273) 0.649 (0.657) 0.871 (0.872)
PDERS : 0.223 (0.389) 0.628 (0.691) 0.861 (0.881)
RuleFit : 0.196 (0.198) 0.607 (0.616) 0.845 (0.848)
HMatrix : 0.058 (0.060) 0.622 (0.623) 0.868 (0.868)
BDT : 0.154 (0.268) 0.594 (0.736) 0.838 (0.911)
CutsGA : 0.109 (0.123) 1.000 (0.424) 0.717 (0.715)
Likelihood : 0.086 (0.092) 0.387 (0.379) 0.677 (0.677)
------------------------------------------------------------------------------
Subjective Summary of TMVA Classifier Properties
The properties of the Function Discriminant (FDA) depend on the chosen function.
Regression Example
Example: the target, a quadratic function of 2 variables, approximated by:
• a "Linear Regressor"
• a Neural Network
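A minimal PyROOT sketch of how such a regression could be booked, assuming TMVA 4-era option and method names (file, tree, variable, and target names are invented placeholders; check the Users Guide for the exact regression API):

```python
from ROOT import TFile, TCut, TMVA

out = TFile("TMVAReg.root", "RECREATE")
factory = TMVA.Factory("TMVARegression", out, "AnalysisType=Regression")
data = TFile.Open("toy_regression.root")   # hypothetical input file
factory.AddVariable("var1", "F")
factory.AddVariable("var2", "F")
factory.AddTarget("target")                # the quantity to be estimated
factory.AddRegressionTree(data.Get("TreeR"), 1.0)
factory.PrepareTrainingAndTestTree(TCut(""), "SplitMode=Random")
factory.BookMethod(TMVA.Types.kMLP, "MLP", "HiddenLayers=N+5")
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
out.Close()
```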
NEW! TMVA Users Guide for 4.0
• Web page with a quick-reference summary of all training options!
• Available on http://tmva.sf.net
• TMVA Users Guide: 97 pp., incl. code examples, arXiv physics/0703039
Some Toy Examples
The “Schachbrett” Toy (chess board)
[Plots: event distribution; events weighted by the SVM response; achieved performance compared to the theoretical maximum]
• Performance achieved without parameter tuning: PDERS and BDT are the best "out of the box" classifiers
• After some parameter tuning, SVM and ANN (MLP) also perform equally well
More Toys: Linear, Cross, and Circular Correlations
• Illustrate the behaviour of linear and nonlinear classifiers
[Plots: linear correlations (same for signal and background); linear correlations (opposite for signal and background); circular correlations (same for signal and background)]
How does linear decorrelation affect strongly nonlinear cases?
[Plots: original correlations; SQRT decorrelation; PCA decorrelation]
Weight Variables by Classifier Output
• How well do the classifiers resolve the various correlation patterns?
[Plots: events weighted by the output of each classifier (Likelihood, Likelihood-D, PDERS, Fisher, MLP, BDT) for the three patterns: linear correlations (same for signal and background), cross-linear correlations (opposite for signal and background), circular correlations (same for signal and background)]