Experimental Particle Physics
Detector by function
• Position: Beam Tracker, Vertex Telescope, Multi-wire Proportional Chambers (MWPCs)
• Energy: Zero Degree Calorimeter (ZDC)
• Charge: Quartz Blade
• PID: Hadron Absorber and Iron Wall
From position to track
That is the job for… reconstruction:
• Choose start and finish points
• Try to fit the track to the targets
• Add middle points
• Check that all the groups have points in the track
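A toy, straight-line sketch of that loop (the real reconstruction is far more involved; Point, Track and the tolerance are illustrative names, not the actual NA60 code):

#include <cmath>
#include <vector>

struct Point { double x, y, z; };
struct Track { Point start, finish; std::vector<Point> hits; };

// Distance of point p from the straight line through a and b.
double distToLine(const Point& a, const Point& b, const Point& p) {
    double dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    double px = p.x - a.x, py = p.y - a.y, pz = p.z - a.z;
    double len2 = dx*dx + dy*dy + dz*dz;
    double t = (px*dx + py*dy + pz*dz) / len2;   // projection onto the line
    double cx = px - t*dx, cy = py - t*dy, cz = pz - t*dz;
    return std::sqrt(cx*cx + cy*cy + cz*cz);
}

// Choose start and finish points, then add compatible middle points.
Track reconstruct(const Point& start, const Point& finish,
                  const std::vector<Point>& allHits, double tolerance) {
    Track t{start, finish, {}};
    for (const Point& p : allHits)
        if (distToLine(start, finish, p) < tolerance)
            t.hits.push_back(p);
    return t;
}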
Reconstructed event
Experimental Particle Physics
• Choose a particle and a particular decay channel (PDG).
• That choice determines what matters most for you in terms of detectors and tracks.
• For this presentation you're going to see:
Choice of good events
• You need to make sure that all the detectors your study depends on were working correctly at data-taking time.
First mass spectrum
Cuts
This is 90% of the work…
What these cuts are:
• PCA (point of closest approach of the two daughter tracks)
• Track distance
• Δz (distance between the decay vertex and the interaction vertex, IV)
They make sense because:
• Daughter particles of a V0 decay originate at the same point in space
• The particles have a decay length of 2.7 cm (which becomes 72 cm in the laboratory frame)
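A minimal sketch of how such cuts might be applied; the struct fields and the threshold values are illustrative assumptions, not the analysis' actual numbers:

// Illustrative cut selection on V0 candidates.
struct V0Candidate {
    double pca;            // point of closest approach of the two tracks (cm)
    double trackDistance;  // distance between the daughter tracks (cm)
    double deltaZ;         // decay vertex - interaction vertex, along the beam (cm)
    double mass;           // invariant mass of the pair (GeV/c^2)
};

bool passesCuts(const V0Candidate& v0) {
    // Daughters of a V0 originate at the same point, so the two tracks
    // must approach each other closely...
    if (v0.pca > 0.1 || v0.trackDistance > 0.05) return false;
    // ...and the decay length (~2.7 cm proper, ~72 cm in the lab) implies
    // a vertex displaced downstream of the interaction point.
    if (v0.deltaZ < 1.0) return false;
    return true;
}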
After the Cuts
Background Subtraction
This is the other 90% of the work… Two common approaches: combinatorial and fit.
The combinatorial idea:
• Build a "particle" that could be real, but that you are sure is not:
• each track comes from a different collision,
• yet the resulting ditrack's characteristics match those of real pairs.
• Take enough of them.
• Subtract their mass distribution from your histogram.
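A minimal sketch of the subtraction with ROOT histograms, assuming hypothetical hSame/hMixed spectra and a signal-free normalisation window:

#include "TH1D.h"

// hSame:  invariant mass of pairs from the same collision (signal + background)
// hMixed: invariant mass of pairs built from different collisions (background only)
TH1D* subtractBackground(TH1D* hSame, TH1D* hMixed,
                         double normLow, double normHigh) {
    // Normalise the mixed-event spectrum to the same-event one in a
    // sideband region where no signal is expected.
    int b1 = hSame->FindBin(normLow), b2 = hSame->FindBin(normHigh);
    double scale = hSame->Integral(b1, b2) / hMixed->Integral(b1, b2);

    TH1D* hSignal = (TH1D*)hSame->Clone("hSignal");
    hSignal->Add(hMixed, -scale);  // subtract the scaled background
    return hSignal;
}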
Acceptances
• The result you "see" has been biased by the detector and by the analysis steps.
• Now you must "unbias" it so that you can publish a result comparable with other results.
• This is, again… 90% of the work.
• But after this you are done… you just have to write the thesis/article.
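A minimal sketch of an acceptance × efficiency correction, assuming Monte Carlo histograms of generated and reconstructed events (the names are illustrative):

#include "TH1D.h"

TH1D* correct(TH1D* hMeasured, TH1D* hReconstructedMC, TH1D* hGeneratedMC) {
    // Acceptance x efficiency per bin, estimated from simulation.
    TH1D* hAccEff = (TH1D*)hReconstructedMC->Clone("hAccEff");
    hAccEff->Divide(hGeneratedMC);

    // "Unbias" the measurement: divide out the detector/analysis losses.
    TH1D* hCorrected = (TH1D*)hMeasured->Clone("hCorrected");
    hCorrected->Divide(hAccEff);
    return hCorrected;
}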
Pause for questions
Multivariate analysis
Multivariate statistical analysis is a collection of procedures that involve observation and analysis of more than one statistical variable at a time.
Some classification methods:
• Fisher Linear Discriminant
• Gaussian Discriminant
• Random Grid Search
• Naïve Bayes (Likelihood Discriminant)
• Kernel Density Estimation
• Support Vector Machines
• Genetic Algorithms
• Binary Decision Trees
• Neural Networks
Decision Trees
[Figure: schematic decision tree, with internal nodes and terminal leaves]
A decision tree is a sequence of cuts. Choose cuts that partition the data into bins of increasing purity. Key idea: do so recursively. MiniBoone, Byron Roe
TMVA, what is it?
• Toolkit for Multivariate Analysis
• software framework implementing several MVA techniques
• common processing of input data (decorrelation, cuts, ...)
• training, testing and evaluation (plots, log file)
• reusable output of the obtained models (C++ codelets, text files)
Implemented methods
• Rectangular cut optimisation
• Likelihood estimator
• Multi-dimensional likelihood estimator and k-nearest neighbour (kNN)
• Fisher discriminant and H-Matrix
• Artificial Neural Networks (3 different implementations)
• Boosted/bagged Decision Trees
• Rule ensemble
• Support Vector Machine (SVM)
• Function Discriminant Analysis (FDA)
Advantages of TMVA
• Distributed with ROOT
• Several methods under one 'roof':
• easy to systematically compare many classifiers and find the best one for the problem at hand
• common input/output interfaces
• common, objective evaluation of all classifiers
• plug in as many classifiers as possible
• A GUI provides a set of performance plots
• The final model(s) are saved as simple text files, reusable through a reader class
• The models may also be saved as C++ classes (package independent), which can be inserted into any application
• It's easy to use and flexible
• Easy to implement the chosen classifier in user applications
Logical Flow
Correlation Plots
Comparison of all the methods
• This plot shows how well each method performs on our problem.
• The best method seems to be the BDT (boosted decision trees), which essentially extends the usual cut method to more dimensions.
Methods output
All the methods output a number (the classifier output) that represents how well the given event matches the background hypothesis. Here we can see the distributions of this value for two chosen methods (the best, BDT, and the worst, Function Discriminant Analysis). These plots can help us pinpoint the cut value to choose for our study.
Where to cut
• TMVA produces this kind of plot, which is very useful for deciding how pure the selected signal can be.
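The underlying idea can be sketched in a few lines: scan the classifier output for the cut value that maximises the significance S/√(S+B). The histogram and parameter names here are assumptions, not TMVA's internals:

#include "TH1D.h"
#include <cmath>

double bestCut(TH1D* hSig, TH1D* hBkg, double nSigExp, double nBkgExp) {
    double best = 0, bestSignif = 0;
    int nBins = hSig->GetNbinsX();
    for (int i = 1; i <= nBins; ++i) {
        // Efficiency for keeping events above this candidate cut value.
        double effS = hSig->Integral(i, nBins) / hSig->Integral();
        double effB = hBkg->Integral(i, nBins) / hBkg->Integral();
        double s = nSigExp * effS, b = nBkgExp * effB;
        if (s + b <= 0) continue;
        double signif = s / std::sqrt(s + b);  // S / sqrt(S + B)
        if (signif > bestSignif) {
            bestSignif = signif;
            best = hSig->GetBinLowEdge(i);
        }
    }
    return best;
}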
Eye Candy
Eye Candy II
End
Backup
PID in NA60
This is the "muon part" of NA60: after the hadron absorber, only muons survive, and they are tracked back in the MWPCs.
Decision Trees
[Figure: decision-tree partition of the (Energy, PMT Hits) plane into bins labelled f(x)=0 or f(x)=1, each with its S and B counts]
Geometrically, a decision tree is an n-dimensional histogram whose bins are constructed recursively. Each bin is associated with some value of the desired function f(x). MiniBoone, Byron Roe
Decision Trees
For each variable, find the best cut:
decrease in impurity = Impurity(parent) − Impurity(left child) − Impurity(right child)
Then partition using the best of the best.
Decision Trees
A common impurity measure is the Gini index:
Impurity = N · p · (1 − p), where p = S / (S + B) and N = S + B
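As a plain-C++ illustration of these two formulas:

// Gini impurity of a bin holding s signal and b background events.
double gini(double s, double b) {
    double n = s + b;
    if (n <= 0) return 0;
    double p = s / n;            // signal purity of the bin
    return n * p * (1 - p);      // Impurity = N * p * (1 - p)
}

// Decrease in impurity when splitting a parent node into two children.
double impurityDecrease(double sL, double bL, double sR, double bR) {
    return gini(sL + sR, bL + bR) - gini(sL, bL) - gini(sR, bR);
}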
How to use TMVA
Train the methods
• Book a "factory"
TMVA::Factory* factory = new TMVA::Factory("<JobName>", targetFile, "<options>");
• Add trees to the factory
factory->AddSignalTree(sigTree, sigWeight);
factory->AddBackgroundTree(bkgTreeA, bkgWeightA);
• Add variables
factory->AddVariable("<VarName>", 'I');
factory->AddVariable("log(<VarName>)", 'F');
• Book the methods to use
factory->BookMethod(TMVA::Types::<method enum>, "<MethodName>", "<options>");
• Train, test and evaluate the methods
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
Apply the methods
• Book a "reader"
TMVA::Reader* reader = new TMVA::Reader();
• Add the variables, each bound to its own local float
reader->AddVariable("<YourVar1>", &localVar1);
reader->AddVariable("log(<YourVar1>)", &localVar2);
• Book classifiers
reader->BookMVA("<YourClassifierName>", "<WeightFile.weights.txt>");
• Get the classifier output
reader->EvaluateMVA("<YourClassifierName>");
reader->EvaluateMVA("Cuts", signalEfficiency);
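A minimal usage sketch of the reader inside an event loop; myTree, bdtCut and the branch name are hypothetical:

#include <cmath>

Float_t localVar1, localVar2;            // must outlive the reader bindings
myTree->SetBranchAddress("YourVar1", &localVar1);
for (Long64_t i = 0; i < myTree->GetEntries(); ++i) {
    myTree->GetEntry(i);                 // fills localVar1 from the tree
    localVar2 = std::log(localVar1);     // compute the derived variable yourself
    double mva = reader->EvaluateMVA("<YourClassifierName>");
    if (mva > bdtCut) { /* signal-like: keep the event */ }
}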