Data Analysis in Particle Physics (short review)

G. Macharashvili (Gogi Macharashvili), HEPI, Tbilisi State University. Batumi, 2012.

Some basic concepts
- Measured physical parameters (the proton mass, for example) are random variables, so every quoted value carries an uncertainty. Mathematical constants such as π and e, in contrast, can be calculated to arbitrary precision. (Quantities whose values are fixed by definition in the SI, such as the speed of light, are exact by convention.)
- Experimental results are conventionally presented as x ± σ, where x is the parameter value and σ its uncertainty. σ should be regarded as a 'figure of merit' of the experiment. An experimental result cannot be presented without a statistical treatment.
- The term 'measurement' is therefore imperfect. 'Estimate', the term used in statistical analysis, is more suitable. Any estimate implies some degree of confidence.
- Measurement uncertainty is independent of the measured value.
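As a minimal sketch of the x ± σ convention, the snippet below estimates a quantity from repeated readings as the sample mean and its standard error. The readings are hypothetical numbers, and taking the standard error of the mean as σ is an assumption that holds only for independent readings with a common spread.

```python
import math

def estimate(samples):
    """Return (mean, standard error of the mean) for repeated measurements."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(samples) / n
    # Unbiased sample variance (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical repeated readings of the same quantity (arbitrary units).
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
x, sigma = estimate(readings)
print(f"x = {x:.2f} ± {sigma:.2f}")
```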

Stages of an experiment (flow chart)
- Initial idea (from theory, a model, other experiments, or something else)
- LOI (letter of intent)
- Board approval
- Manpower, management, funding
- Experiment project (technical design report)
- Simulation: optimization, software testing, performance estimate, etc.
- Engineering design, engineering software, software development
- Building, testing, assembling
- Running, on-line analysis
- Off-line analysis
- Theoretical interpretation
- Publication

Different types of analysis
- on-line analysis (single pass, in real time)
- off-line analysis (multi-pass, in batch mode)
- simulation

Preliminary data analysis
- Event cleaning: a good hardware trigger removes almost all events of unwanted topology, but some 'bad' events are still accepted by the data acquisition (DAQ). Removing evidently useless events (e.g. those containing 'noise') is the very first stage of the analysis. It matters most in experiments searching for rare processes (e.g. μ → e conversion or the μ → eγ decay).
- calibration
- event reconstruction
- dead-time estimate
- normalization or event weight definition
- luminosity estimate, dead-time correction, overall detection efficiency estimate, etc.

Calibration
All particle detectors, with the exception of Cherenkov light detectors, are sensitive to the ionization losses of charged particles. Even neutral particles are detected via the charged secondaries of their interactions. The front-end electronics convert the electronic signals coming from the detectors (e.g. MWPC, PM) into numbers. Calibration is the procedure that defines an algorithm for converting the numeric value of a signal into the corresponding physical parameter (energy deposit, total energy, momentum, time-of-flight, etc.). Calibration can be done by simulation, by dedicated measurements (e.g. calibrating an electromagnetic calorimeter with an electron beam), or by using kinematical constraints.
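A calibration from dedicated measurements can be sketched as a straight-line fit, assuming the detector response is linear (E = a·S + b). The ADC counts and beam energies below are hypothetical, and the unweighted least-squares fit is only one possible choice of algorithm.

```python
def fit_line(signals, energies):
    """Unweighted least-squares fit of E = a * S + b; returns (a, b)."""
    n = len(signals)
    mean_s = sum(signals) / n
    mean_e = sum(energies) / n
    s_ss = sum((s - mean_s) ** 2 for s in signals)
    s_se = sum((s - mean_s) * (e - mean_e) for s, e in zip(signals, energies))
    a = s_se / s_ss
    b = mean_e - a * mean_s
    return a, b

# Hypothetical calibration points: ADC counts recorded at known beam energies (GeV).
adc_counts = [100.0, 200.0, 300.0, 400.0]
beam_energy = [1.0, 2.0, 3.0, 4.0]
a, b = fit_line(adc_counts, beam_energy)

def to_energy(signal):
    """Convert a raw signal to energy with the fitted calibration constants."""
    return a * signal + b

print(to_energy(250.0))
```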

Event reconstruction
Definition of physical parameters:
- particle momentum (energy) estimate
  - magnetic spectrometer
  - EM or hadronic calorimeter
  - silicon detector (stopping hadrons)
- particle identification (which methods?)
  - by the energy deposit in a detector (silicon detector, calorimeter)
  - using Cherenkov radiation (Cherenkov counters, ...)
- reliability tests

Event selection
- cuts on measured parameters
- cuts on reconstructed parameters (e.g. energy, momentum)
- analysis of some critical parameters (e.g. the Cherenkov signals)
- other selection criteria ...
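A cut-based selection of this kind can be sketched as a list of named conditions applied in sequence, with a 'cut flow' recording how many events survive each step. The event fields, cut names, and cut values are hypothetical and chosen only for illustration.

```python
# Each event is a dict of reconstructed quantities (hypothetical field names).
events = [
    {"momentum": 1.2, "energy_deposit": 2.1, "cherenkov": 0.0},
    {"momentum": 0.3, "energy_deposit": 2.0, "cherenkov": 0.0},
    {"momentum": 1.5, "energy_deposit": 9.5, "cherenkov": 0.0},
    {"momentum": 1.1, "energy_deposit": 1.9, "cherenkov": 4.2},
]

# Hypothetical cuts: one on a reconstructed parameter, one on a measured
# parameter, and one on a critical parameter (the Cherenkov signal).
cuts = {
    "momentum above threshold": lambda ev: ev["momentum"] > 0.5,
    "energy deposit in window": lambda ev: 1.0 < ev["energy_deposit"] < 3.0,
    "no Cherenkov signal": lambda ev: ev["cherenkov"] < 1.0,
}

def apply_cuts(events, cuts):
    """Apply the cuts in order and record how many events survive each one."""
    cut_flow = []
    selected = list(events)
    for name, passes in cuts.items():
        selected = [ev for ev in selected if passes(ev)]
        cut_flow.append((name, len(selected)))
    return selected, cut_flow

selected, cut_flow = apply_cuts(events, cuts)
for name, n_left in cut_flow:
    print(f"{name}: {n_left} events left")
```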

Event selection
Figure: detector output signals expressed as energy deposits (simulation).

Particle identification example
ANKE experiment: the 3-layer silicon vertex detector. Proton/deuteron separation by energy deposit.

Particle identification example
- time-of-flight with known momentum
- energy deposit with known energy
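The time-of-flight case can be sketched as follows: with the momentum p known and the velocity β measured from the flight path and flight time, the mass follows from p = mβγ. The function names, the nearest-mass decision rule, and the restriction to two candidates (proton and deuteron) are assumptions made for illustration; a real analysis would also account for the timing and momentum resolution.

```python
import math

C_LIGHT = 0.299792458  # speed of light in m/ns

def mass_from_tof(momentum, path_length, time_of_flight):
    """Mass in GeV/c^2 from momentum (GeV/c), path (m) and time of flight (ns)."""
    beta = path_length / (C_LIGHT * time_of_flight)
    if not 0.0 < beta < 1.0:
        raise ValueError("unphysical velocity: beta must be between 0 and 1")
    # p = m * beta * gamma  =>  m = p * sqrt(1 / beta^2 - 1)
    return momentum * math.sqrt(1.0 / beta ** 2 - 1.0)

PROTON_MASS = 0.938272    # GeV/c^2
DEUTERON_MASS = 1.875613  # GeV/c^2

def identify(momentum, path_length, time_of_flight):
    """Label the track as the candidate whose mass is closest to the estimate."""
    m = mass_from_tof(momentum, path_length, time_of_flight)
    candidates = {"proton": PROTON_MASS, "deuteron": DEUTERON_MASS}
    return min(candidates, key=lambda name: abs(candidates[name] - m))
```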

Dead-time analysis
The dead time is defined as the fraction of time during which the whole detector is unable to accept an event. Suppose the dead time in an experiment is 10%, no matter where it comes from. Then every kind of event, whichever detector it comes from and whichever trigger selected it, loses 10%, regardless of whether its rate is low or high. Moreover, if the dead time varies in time, every kind of event loses the same fraction at any given moment. The dead time is one of the important parameters of a setup: it has to be corrected for in order to obtain final results (e.g. a cross section) with the right normalization. In some measurements (e.g. asymmetry measurements) the influence of the dead time cancels.
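The normalization and the cancellation in an asymmetry can be illustrated with a few lines of arithmetic. This is a sketch with hypothetical counts; it assumes that both event samples were recorded with the same dead-time fraction, which is the condition under which the correction drops out of the ratio.

```python
def correct_for_dead_time(observed_counts, dead_fraction):
    """Scale observed counts up to the number expected with no dead time."""
    if not 0.0 <= dead_fraction < 1.0:
        raise ValueError("dead_fraction must be in [0, 1)")
    return observed_counts / (1.0 - dead_fraction)

def asymmetry(n_up, n_down):
    """Counting asymmetry (N_up - N_down) / (N_up + N_down)."""
    return (n_up - n_down) / (n_up + n_down)

dead_fraction = 0.10  # the 10% example from the text
n_up_observed, n_down_observed = 5400, 3600  # hypothetical counts

n_up = correct_for_dead_time(n_up_observed, dead_fraction)
n_down = correct_for_dead_time(n_down_observed, dead_fraction)

# The absolute counts change, but the common factor cancels in the asymmetry.
raw = asymmetry(n_up_observed, n_down_observed)
corrected = asymmetry(n_up, n_down)
```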

Dead-time analysis example
Time intervals between consecutive events are exponentially distributed. The example plot (ANKE experiment) shows that below ~80 μs some events are lost. This means that for intervals shorter than about 80 μs the detector is insensitive in most cases. To normalize the number of events one needs to 'fill in' the empty area of the distribution.
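The 'fill in' step can be sketched on toy data: estimate the event rate from the intervals above the threshold, then extrapolate the exponential down to zero. This is a simplified illustration, not the ANKE procedure. It assumes a sharp threshold below which every interval is lost, and the rate, threshold, and sample size are hypothetical.

```python
import math
import random

def fill_in_lost_events(intervals, threshold):
    """Estimate the total number of intervals, including those lost below threshold."""
    kept = [t for t in intervals if t >= threshold]
    excess = sum(t - threshold for t in kept)
    if not kept or excess <= 0.0:
        raise ValueError("need intervals above the threshold")
    # For an exponential distribution the excess (t - threshold) is again
    # exponential with the same rate, so its mean estimates 1 / rate.
    rate = len(kept) / excess
    # Fraction of the full distribution that lies above the threshold.
    fraction_above = math.exp(-rate * threshold)
    return len(kept) / fraction_above, rate

# Toy data: true rate 0.01 per microsecond, intervals below 80 us are lost.
rng = random.Random(1)
true_rate = 0.01
n_true = 100000
all_intervals = [rng.expovariate(true_rate) for _ in range(n_true)]
recorded = [t for t in all_intervals if t >= 80.0]

estimated_total, estimated_rate = fill_in_lost_events(recorded, 80.0)
print(len(recorded), round(estimated_total), estimated_rate)
```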

Kinematical analysis
After the physical parameters of an event are reconstructed (e.g. momentum, energy deposit, time-of-flight, particle identification), kinematical constraints are used to define additional parameters. In general the energy-momentum conservation equation is used:
  P_a + P_b = ∑ P_i
where the P's are the 4-momenta of the initial (beam, target) and final particles, P = (p_x, p_y, p_z, E) and P^2 = E^2 - |p|^2 = m^2 (in units with c = 1).
- missing-mass method (shown below)
- Dalitz plot
- background subtraction (shown below)
After the background is subtracted one no longer has access to individual events, only to the sample (the number of events in the peak) defined by the background subtraction.
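The missing-mass method follows directly from the conservation equation: the 4-momentum of the undetected system is the initial 4-momentum minus the sum of the detected ones, and its invariant mass is the missing mass. The sketch below uses a toy final state of two protons and a neutron. The momenta are hypothetical, and the total initial 4-momentum is simply set to the sum of the final-state vectors so that conservation holds exactly.

```python
import math

def four_vector(px, py, pz, mass):
    """Build an on-shell 4-momentum (px, py, pz, E) in GeV, with c = 1."""
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (px, py, pz, energy)

def add(*vectors):
    """Component-wise sum of 4-vectors."""
    return tuple(sum(components) for components in zip(*vectors))

def invariant_mass_squared(p):
    px, py, pz, e = p
    return e * e - (px * px + py * py + pz * pz)

def missing_mass(initial, detected):
    """Mass of the undetected system: sqrt((sum initial - sum detected)^2)."""
    total = add(*initial)
    seen = add(*detected)
    missing = tuple(t - s for t, s in zip(total, seen))
    m2 = invariant_mass_squared(missing)
    # Resolution effects can push m2 slightly below zero; return a signed root.
    return math.copysign(math.sqrt(abs(m2)), m2)

M_PROTON, M_NEUTRON = 0.938272, 0.939565  # GeV
p1 = four_vector(0.30, 0.10, 1.20, M_PROTON)
p2 = four_vector(-0.20, 0.05, 0.90, M_PROTON)
neutron = four_vector(-0.10, -0.15, 0.40, M_NEUTRON)
total_initial = add(p1, p2, neutron)  # stands in for P_a + P_b

mm = missing_mass([total_initial], [p1, p2])
print(mm)
```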

Fitting experimental data: an example
Figure: normal distribution of some parameter, without background.

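Fitting experimental data
Standard χ² distribution. For a good fit the expected value of χ² per degree of freedom (χ²/ndf) equals 1. The most probable value is 1 - 2/ndf, which approaches 1 for large ndf.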
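As a small illustration of the quantity behind this distribution, the sketch below computes χ² and χ²/ndf for a set of data points compared with a fitted model. The data, the model values (a straight line y = 10x with two fitted parameters), and the uncertainties are hypothetical.

```python
def chi_square(observed, expected, sigmas):
    """Sum of squared, uncertainty-normalized residuals."""
    return sum(((o - e) / s) ** 2 for o, e, s in zip(observed, expected, sigmas))

def reduced_chi_square(observed, expected, sigmas, n_fit_parameters):
    """Chi-square divided by the number of degrees of freedom."""
    ndf = len(observed) - n_fit_parameters
    if ndf <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_square(observed, expected, sigmas) / ndf

# Hypothetical data points, model predictions, and uncertainties.
observed = [10.2, 19.7, 30.5, 39.6, 50.3]
expected = [10.0, 20.0, 30.0, 40.0, 50.0]
sigmas = [0.4] * 5

chi2 = chi_square(observed, expected, sigmas)
chi2_ndf = reduced_chi_square(observed, expected, sigmas, n_fit_parameters=2)
print(chi2, chi2_ndf)
```

A value of χ²/ndf far above 1 suggests a poor model or underestimated uncertainties; a value far below 1 suggests overestimated uncertainties.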

Kinematical analysis: background subtraction (missing-mass fitting)
ANKE experiment: neutron identification in the pd → ppn breakup reaction, when the two protons are detected. After the background is subtracted one no longer has access to individual events, only to the sample (the number of events in the peak) defined by the background subtraction (fitting). The fit is done with the ROOT TMinuit class.
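The point that only the sample size survives the subtraction can be seen in a toy example. The sketch below uses simple sideband subtraction instead of the TMinuit fit named in the text, which is a different and cruder technique. It assumes a background that is linear in the missing mass and sidebands placed symmetrically around the peak window. All numbers are hypothetical.

```python
import math
import random

def count_in(masses, window):
    low, high = window
    return sum(1 for m in masses if low <= m < high)

def sideband_subtract(masses, peak, left, right):
    """Estimate the number of signal events in the peak window."""
    n_peak = count_in(masses, peak)
    n_side = count_in(masses, left) + count_in(masses, right)
    side_width = (left[1] - left[0]) + (right[1] - right[0])
    scale = (peak[1] - peak[0]) / side_width
    signal = n_peak - scale * n_side
    # Poisson uncertainties of both counts, added in quadrature.
    sigma = math.sqrt(n_peak + scale ** 2 * n_side)
    return signal, sigma

# Toy missing-mass spectrum (GeV): flat background plus a Gaussian peak.
rng = random.Random(7)
n_signal_true = 2000
background = [rng.uniform(0.84, 1.04) for _ in range(6000)]
peak_events = [rng.gauss(0.9396, 0.010) for _ in range(n_signal_true)]
masses = background + peak_events

signal, sigma = sideband_subtract(
    masses, peak=(0.91, 0.97), left=(0.85, 0.88), right=(1.00, 1.03)
)
print(f"signal = {signal:.0f} ± {sigma:.0f}")
```

The result is a number of events with an uncertainty. Nothing in it says which individual events in the peak window are signal.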

Efficiency (acceptance) example
PAX project, simulation result. The triangular distribution of the interaction point comes from the density distribution of the polarized cell target. Accepted events are shown in red.
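An acceptance estimate of this kind can be sketched as a toy Monte Carlo: generate interaction points from a triangular distribution, apply an acceptance condition, and take the accepted fraction with its binomial uncertainty. The cell length and the acceptance condition are hypothetical and are not the PAX geometry.

```python
import math
import random

def estimate_acceptance(n_generated, is_accepted, rng):
    """Return (acceptance, binomial uncertainty) from a toy simulation."""
    accepted = 0
    for _ in range(n_generated):
        # Interaction point along the beam axis (cm): triangular density
        # over a 40 cm cell, peaked at the centre (hypothetical geometry).
        z = rng.triangular(-20.0, 20.0, 0.0)
        if is_accepted(z):
            accepted += 1
    efficiency = accepted / n_generated
    sigma = math.sqrt(efficiency * (1.0 - efficiency) / n_generated)
    return efficiency, sigma

def toy_detector(z):
    """Hypothetical acceptance: only vertices within 10 cm of the centre."""
    return abs(z) < 10.0

rng = random.Random(42)
acceptance, acceptance_sigma = estimate_acceptance(100000, toy_detector, rng)
print(f"acceptance = {acceptance:.4f} ± {acceptance_sigma:.4f}")
```

For this toy geometry the exact answer is 0.75, which gives a check on the simulation.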

Some special pdf's
Ep is the most probable energy deposit due to ionization. It equals about 2 MeV for a m.i.p. (e.g. a cosmic muon) in a 1 cm scintillator.

Error propagation
Y is a random function of random variables. Main rule: uncertainties have to be summed quadratically, and correlations have to be taken into account. Simple examples for uncorrelated variables.
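The formulas on the original slide are not recoverable. As a reminder, the standard first-order propagation rule and a few textbook cases for uncorrelated variables can be written as follows. The symbols f, x_i, and σ_i are notation introduced here, and the expressions hold for small relative uncertainties.

```latex
% General first-order propagation for Y = f(x_1, ..., x_n)
\sigma_Y^2 \approx \sum_{i} \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_i^2
  + 2 \sum_{i<j} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j}
    \, \mathrm{cov}(x_i, x_j)

% Uncorrelated variables, cov(x_i, x_j) = 0:
Y = x_1 \pm x_2 : \quad \sigma_Y^2 = \sigma_1^2 + \sigma_2^2

Y = x_1 x_2 \ \text{or} \ Y = x_1 / x_2 : \quad
  \left( \frac{\sigma_Y}{Y} \right)^2
  = \left( \frac{\sigma_1}{x_1} \right)^2 + \left( \frac{\sigma_2}{x_2} \right)^2

Y = x^n : \quad \frac{\sigma_Y}{|Y|} = |n| \, \frac{\sigma_x}{|x|}
```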

Systematic errors and uncertainties
'Systematic effects' is a general category that includes effects such as background, selection bias, efficiency, resolution, variation of efficiency, dead time, etc. The uncertainty in the estimate of such a systematic effect is called a systematic uncertainty.
Reference: National Institute of Standards and Technology, NIST Technical Note 1297, 1994 edition.

Systematic errors (uncertainties)
Examples:
1. The energy measured in a calorimeter is given by E = aS + b, where S is the digitized signal and a and b are calibration parameters. The uncertainty on E has a random part, due to the uncertainty on S, and a systematic part, due to the uncertainties on a and b. Because a and b are used as constants, any discrepancy in them is applied systematically to all events. So the uncertainties on a and b have to be added (quadratically).
2. Measurements are taken with a steel rule that was calibrated at a temperature of 15 °C, while the measurements are made at 20 °C. In this case there is a systematic error (actually a mistake), or bias, which can be corrected using the thermal expansion coefficient. The measurement uncertainty does not change in this case.
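Both examples can be written out numerically. The sketch below splits the uncertainty on E = aS + b into its random and systematic parts, assuming that a and b are uncorrelated (a real calibration fit usually gives them a non-zero covariance). It also applies the thermal correction to a rule reading. All input numbers are hypothetical, and the expansion coefficient is a typical approximate value for steel.

```python
import math

def energy_with_uncertainties(signal, sigma_signal, a, sigma_a, b, sigma_b):
    """Return (E, random uncertainty, systematic uncertainty) for E = a*S + b."""
    energy = a * signal + b
    # Random part: event-by-event fluctuation of the signal S.
    sigma_random = abs(a) * sigma_signal
    # Systematic part: calibration constants, assumed uncorrelated here.
    sigma_systematic = math.sqrt((signal * sigma_a) ** 2 + sigma_b ** 2)
    return energy, sigma_random, sigma_systematic

def correct_rule_reading(reading, alpha, t_calibration, t_measurement):
    """Correct a length read from a rule that has expanded since calibration."""
    # A warmer rule is longer, so its marks are further apart and it reads low.
    return reading * (1.0 + alpha * (t_measurement - t_calibration))

energy, sigma_random, sigma_systematic = energy_with_uncertainties(
    signal=250.0, sigma_signal=5.0, a=0.01, sigma_a=0.0002, b=0.0, sigma_b=0.03
)

ALPHA_STEEL = 1.2e-5  # per degree C, approximate typical value
length = correct_rule_reading(1.0, ALPHA_STEEL, t_calibration=15.0, t_measurement=20.0)
print(energy, sigma_random, sigma_systematic, length)
```

The thermal correction shifts the value (it removes a bias) but leaves the quoted uncertainty of the reading unchanged.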

Multivariate analysis

Multivariate analysis (MVA, ROOT namespace TMVA)
Linear case

Multivariate analysis (MVA, ROOT namespace TMVA)
Nonlinear case: Artificial Neural Network (ANN) or Multilayer Perceptron (MLP)

Multivariate analysis (Neural Network)
ANN structure

Multivariate analysis (Neural Network)
Backpropagation ...

Multivariate analysis (Neural Network)
u = W x is the scaled signal. Changing W takes the output from constant (W = 0) to extremely nonlinear (W >> 1).
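The effect of the weight can be seen with a single neuron. The sketch below assumes a logistic (sigmoid) activation, which is a common choice but is not specified in the text. The weight values are chosen only to show the three regimes.

```python
import math

def neuron_output(x, weight):
    """Logistic activation applied to the scaled signal u = W * x."""
    u = weight * x
    # Numerically stable form of 1 / (1 + exp(-u)).
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

inputs = [-1.0, -0.1, 0.0, 0.1, 1.0]
for weight in (0.0, 0.1, 100.0):
    outputs = [round(neuron_output(x, weight), 3) for x in inputs]
    print(f"W = {weight:5.1f}: {outputs}")
```

With W = 0 the output is 0.5 for every input. With a small W it is nearly linear in x. With W >> 1 it approaches a step function.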

Multivariate analysis (Neural Network)
Neural nets have two different applications: function approximation and classification. A net could be trained, for instance, to estimate the energy of a hadronic shower from the energies deposited in the individual cells of a hadronic calorimeter. A net could also be trained to separate electron showers from hadron showers; it should then produce a number close to 1 for electrons and close to 0 for hadrons. With such a large number of parameters, the solution is evidently not always unique: networks with different parameters can perform the same function within the desired accuracy. A neural net is an approximation with a number of parameters that is not fixed in advance, and these parameters have no physical meaning.

Simulation

Simulation: Geant4
Geant4 features:
- description of the experimental setup geometry (a human body model is available)
- material properties database
- particle database (including heavy ions)
- physics processes and decays assigned to each particle
- several visualization drivers

Geant4 (examples)

Analysis tools
- ROOT (CERN project), which includes Minuit, Pythia, TMVA, MLP, etc.: http://root.cern.ch, http://tmva.sourceforge.net
- Geant4 (CERN-KEK project): http://geant4.cern.ch
- CLHEP (CERN project)
- AIDA (Abstract Interfaces for Data Analysis): http://aida.freehep.org
- ESME, LONG1D, BNL-ORBIT, ACCSYM (studies of accelerator beam phase-space evolution)

Some modern analysis methods and algorithms (ROOT)
- artificial neural networks
- statistical analysis
- principal components method
- approximation, extrapolation, etc.
- vector/matrix algebra
- 3-D geometry and affine transformations
- random event generators
- physical interaction generators, e.g. Pythia6/Pythia8
- ... and many others

Coding languages
- FORTRAN (obsolete)
- C (used in low-level software)
- C++ (e.g. ROOT, Geant4, CLHEP and many more)
- Java (e.g. AIDA)
- shell scripts (depend on the OS type)

Some useful references
Books:
- R. Frühwirth et al., Data Analysis Techniques for High-Energy Physics, Cambridge, 2000
- W.T. Eadie et al., Statistical Methods in Experimental Physics, 1971
- C. Grupen, Particle Detectors
- R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1989
- R. Rojas, Theory of Neural Networks, Springer, 1991
Sites:
- www.itl.nist.gov/div898/handbook (Engineering statistics)
- www.nu.to.infn.it/statistics
- and many others ...
