Statistical Learning in Astrophysics

Jens Zimmermann (zimmerm@mppmu.mpg.de)
Max-Planck-Institut für Physik, München
MPI für extraterrestrische Physik, München
Forschungszentrum Jülich GmbH
• Statistical Learning?
• Three Classes of Learning Methods
• Applications in Physics Analysis
• Training the Learning Methods
• Examples
• Conclusion
Jens Zimmermann, Forschungszentrum Jülich, Astroteilchenschule 10/04

Some Events
[Figure: scatter plot of # formulas versus # slides, both axes from 0 to 6 (×10), with two classes of events: Experimentalists and Theorists]

First Analysis
[Figure: scatter plot of # formulas versus # slides, together with separate plots over # formulas and over # slides, axes from 0 to 6 (×10), for Experimentalists and Theorists]

Decision Trees
• Cut on # formulas: #formulas < 20 → exp; #formulas > 60 → th; 20 < #formulas < 60 → ?
• Cut on # slides within the subset 20 < #formulas < 60: #slides > 40 → exp; #slides < 40 → th
• Resulting tree: all events are split into #formulas > 60 (th), #formulas < 20 (exp) and the rest; the rest (subset 20 < #formulas < 60) is split into #slides < 40 (th) and #slides > 40 (exp)
[Figure: plots over # formulas and over # slides, axes from 0 to 6 (×10)]
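
The cuts on this slide can be written down as a small function. This is only a sketch: the function name `classify_talk` is made up for illustration, and the slide does not say what happens exactly on the boundaries (20, 40, 60), so the handling of those values below is an assumption.

```python
def classify_talk(n_formulas: int, n_slides: int) -> str:
    """Apply the two-level cuts from the slide; returns "th" or "exp"."""
    # First level: cut on the number of formulas.
    if n_formulas > 60:
        return "th"
    if n_formulas < 20:
        return "exp"
    # Second level, for the remaining subset: cut on the number of slides.
    # Assumption: a talk with exactly 40 slides is sent to "exp".
    return "th" if n_slides < 40 else "exp"


# The two variants of this talk quoted on the final slide:
print(classify_talk(25, 19))  # theory included
print(classify_talk(7, 16))   # theory skipped
```

With these cuts the variant with ~25 formulas lands in the "th" branch and the variant with ~7 formulas in the "exp" branch.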

Local Density Estimators
Search for similar events that are already classified and count the members of the two classes (e.g. k-Nearest-Neighbour).
[Figure: two scatter plots of # formulas versus # slides, axes from 0 to 6 (×10)]
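
A minimal k-Nearest-Neighbour sketch of the idea, assuming Euclidean distance in the (# formulas, # slides) plane and a simple majority vote. The labelled events are invented for illustration and are not the data shown in the talk.

```python
import math
from collections import Counter


def knn_classify(query, events, k=3):
    """Majority vote among the k already-classified events closest to `query`.

    events: list of ((n_formulas, n_slides), label) pairs.
    """
    by_distance = sorted(events, key=lambda e: math.dist(query, e[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical already-classified events (invented for illustration).
events = [
    ((5, 45), "exp"), ((10, 50), "exp"), ((15, 40), "exp"),
    ((50, 20), "th"), ((55, 15), "th"), ((65, 25), "th"),
]

print(knn_classify((12, 42), events))
print(knn_classify((60, 18), events))
```

An odd k avoids ties when there are only two classes.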

Methods Based on Linear Separation
Divide the input space into regions separated by one or more hyperplanes. Extrapolation is done!
[Figure: two scatter plots of # formulas versus # slides, axes from 0 to 6 (×10)]
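
As a sketch of what separation by a single hyperplane means: an event is labelled by the sign of w·x + b. The hyperplane below (# formulas = # slides) is a made-up example, not one fitted to the data of the talk. The last call shows the extrapolation: a point far outside any plausible training region still gets a definite label.

```python
def linear_classify(x, w, b):
    """Label an event by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "th" if score > 0 else "exp"


# Hypothetical hyperplane: more formulas than slides -> "th".
w, b = (1.0, -1.0), 0.0

print(linear_classify((50, 20), w, b))
print(linear_classify((10, 45), w, b))
print(linear_classify((500, 100), w, b))  # far from any data, still classified
```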

Neural Networks
• Train NN with two hidden neurons (gradient descent)
• Construct NN with two separating hyperplanes
[Figure: network diagram with inputs # formulas and # slides, carrying the values 1, 0, -1.8, +3.6, +3.6, -50, +20, +1.1, -1.1, +0.1, +0.2; two plots with axes from 0 to 6 (×10)]
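
A sketch of how a network with two hidden neurons can be constructed by hand from two separating hyperplanes. The weights in the slide's diagram cannot be assigned to connections from the text alone, so the weights below are hypothetical: one hidden neuron encodes # formulas > 20, the other # slides < 40, and the output neuron fires only if both do. Inputs are scaled to units of 10, as on the plot axes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network(n_formulas, n_slides):
    """2-2-1 feed-forward network; output near 1 means "th", near 0 means "exp"."""
    x1, x2 = n_formulas / 10.0, n_slides / 10.0
    h1 = sigmoid(10.0 * (x1 - 2.0))  # hyperplane 1: # formulas = 20
    h2 = sigmoid(10.0 * (4.0 - x2))  # hyperplane 2: # slides = 40
    # Output neuron acts as a soft AND of the two hidden neurons.
    return sigmoid(10.0 * (h1 + h2 - 1.5))


print(network(50, 20) > 0.5)
print(network(10, 50) > 0.5)
print(network(50, 50) > 0.5)
```

Training by gradient descent would adjust the same nine weights and biases starting from random values instead of setting them by hand.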

NN Training
8 hidden neurons = 8 separating lines
[Figure: train error and test error versus training epochs; signal and background]
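
One common way to use train-error and test-error curves of this kind is to keep the network from the epoch with the lowest test error. The slide does not state this explicitly, so the following is an illustration under that assumption, with invented error values.

```python
def best_epoch(test_errors):
    """Index of the epoch with the lowest test error (the first one on ties)."""
    return min(range(len(test_errors)), key=lambda i: test_errors[i])


# Hypothetical curves: the train error keeps falling, the test error turns up again.
train_error = [0.40, 0.30, 0.22, 0.17, 0.13, 0.10, 0.08]
test_error = [0.42, 0.33, 0.27, 0.25, 0.26, 0.29, 0.33]

stop = best_epoch(test_error)
print(stop, train_error[stop], test_error[stop])
```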

Training of Statistical Learning Methods
• Statistical learning method: from N examples infer a rule
• Important: generalisation vs. overtraining
• Easily separable but with noise? Without noise and separable by a complicated boundary?
• Too high a degree of polynomial results in interpolation, but too low a degree means bad approximation
[Figure: scatter plot of # formulas versus # slides, axes from 0 to 6 (×10)]
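
The remark about the polynomial degree can be illustrated with a toy regression. The data are invented: six points on the line y = x with alternating noise of ±0.5. A polynomial of degree 5 interpolates all six points (zero training error) but oscillates between them, while a straight line has a non-zero training error and a much smaller error on noise-free test points.

```python
def line_fit(xs, ys):
    """Least-squares straight line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def interpolant(xs, ys):
    """Polynomial of degree len(xs) - 1 through all points (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def mse(f, xs, ys):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented training data: y = x with alternating noise of +0.5 / -0.5.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# Test points between the training points, using the noise-free truth y = x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)

line = line_fit(train_x, train_y)
poly = interpolant(train_x, train_y)
print("train:", mse(line, train_x, train_y), mse(poly, train_x, train_y))
print("test: ", mse(line, test_x, test_y), mse(poly, test_x, test_y))
```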

Applications in Physics Analysis
• Classification, online ("Trigger"): H1 L2NN, Charged Current
• Classification, offline ("Purification"): MAGIC, Gamma vs. Hadron
• Regression: XEUS X-ray CCD (~10 µm, ~300 µm)

Choose raw quantities
Features: PCA, FFT, Symmetry, Fit

Regression of the Incident Position of X-ray Photons (XEUS)
[Figure: pixel sketch with transfer direction, electron potential and dimensions ~10 µm and ~300 µm]
σ of reconstruction in µm (statistical learning and classical methods):
• Neural Networks: 3.6
• k-Nearest-Neighbour: 3.7
• ETA: 3.9
• CCOM: 4.0

Pileup Recognition – Setup
• Pileup vs. single photon
• Photon efficiency [%] / pileup rejection [%]: 99/67 and 99/52
• Reference: classical algorithm "XMM"
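
The two figures of merit can be computed from a labelled sample as follows. The definitions are the usual ones and are assumed here rather than taken from the slide: photon efficiency is the fraction of true single photons that are kept, and pileup rejection is the fraction of true pileup events that are discarded. The sample is invented.

```python
def efficiency_and_rejection(truth, accepted):
    """Return (photon efficiency, pileup rejection) in percent.

    truth:    "photon" or "pileup" for each event
    accepted: True if the event was kept as a single photon
    """
    photons = [a for t, a in zip(truth, accepted) if t == "photon"]
    pileups = [a for t, a in zip(truth, accepted) if t == "pileup"]
    efficiency = 100.0 * sum(photons) / len(photons)
    rejection = 100.0 * sum(not a for a in pileups) / len(pileups)
    return efficiency, rejection


# Hypothetical sample: 4 single photons followed by 4 pileup events.
truth = ["photon"] * 4 + ["pileup"] * 4
accepted = [True, True, True, False, False, False, True, False]

print(efficiency_and_rejection(truth, accepted))
```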

MAGIC - Gamma/Hadron Separation
Observation Mkn421, 22.04.04
• SuperCuts: 39.0 σ
• SuperCuts + Neural Network: 46.8 σ
[Figure: α plots]

Conclusion
• Three classes of statistical learning methods
• Decision Trees (Bagging): Random Forest, C4.5, CART
• Local Density Estimators: k-Nearest-Neighbour, Maximum Likelihood
• Linear Separation: Support Vector Machines, Linear Discriminant Analysis, Neural Networks
• Many applications in current astrophysics experiments and analysis
• Compared to classical methods, usually at least small improvements

Theory of Communication: Minimum Description Length Principle (MDLP)
• Hypothesis H and data D
• Our hypothesis should have the maximum probability given the data: [formula]
• Timeline: Bayes (18th century), Shannon (1948, theory of communication), Rissanen (1990, MDLP)
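
The formula on this slide did not survive extraction. In the usual textbook form, which the slide presumably follows, Bayes' theorem and Shannon's code length of -log2 p bits for an event of probability p connect the most probable hypothesis to the shortest description. The notation H* for the selected hypothesis is an assumption.

```latex
% Bayes' theorem for hypothesis H and data D
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}

% P(D) does not depend on H, so maximising the posterior is the same as
% minimising the total code length: data given H, plus H itself
H^{*} = \arg\max_{H} P(D \mid H)\, P(H)
      = \arg\min_{H} \bigl[ -\log_2 P(D \mid H) - \log_2 P(H) \bigr]
```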

Statistical Learning Theory: Structural Risk Minimization (1996)
• We have N training events with input x_i and correct output y_i
• Empirical risk: [formula]
• Actual risk: [formula]
• Relationship is uniform convergence
• Upper bound for the actual risk (Vapnik): [formula]
• h: VC dimension of the learning method (complexity)
• Create nested subsets of function spaces with rising complexity: h1 < h2 < h3
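
The formulas on this slide did not survive extraction. The standard definitions and Vapnik's bound for classification with a 0/1 loss are given below as a reference. The symbols α (parameters of the learning method), L (loss function) and η (the bound holds with probability 1 - η) are notation assumed here, not taken from the slide.

```latex
% Empirical risk on the N training events
R_{\mathrm{emp}}(\alpha) = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f(x_i, \alpha)\bigr)

% Actual risk: expectation over the unknown distribution P(x, y)
R(\alpha) = \int L\bigl(y, f(x, \alpha)\bigr)\, dP(x, y)

% Upper bound for the actual risk, with VC dimension h
R(\alpha) \le R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h \left( \ln\frac{2N}{h} + 1 \right) - \ln\frac{\eta}{4}}{N}}
```

The second term grows with h, which is why structural risk minimization walks through the nested function spaces and picks the one where the sum of both terms is smallest.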

Support Vector Machines
• Separating hyperplane with maximum distance to each data point: maximum margin classifier
• Found by setting up the condition for correct classification [formula] and minimizing [formula], which leads to the Lagrangian [formula]
• Necessary condition for a minimum is [formula]
• So the output becomes [formula]
• KKT: only support vectors (SV) have [formula]
• Only linear separation? No! Replace dot products: [formula]. The mapping to feature space is hidden in a kernel
• Non-separable case: [formula]
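
The formulas on this slide did not survive extraction. The standard maximum-margin derivation, which the bullet points appear to follow step by step, reads as below. The symbols w, b, α_i, ξ_i, C, Φ and K are the usual textbook notation and are assumed here.

```latex
% Condition for correct classification, with labels y_i = \pm 1
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i

% Maximising the margin means minimising
\tfrac{1}{2} \lVert w \rVert^2

% Lagrangian with multipliers \alpha_i \ge 0
L = \tfrac{1}{2} \lVert w \rVert^2
    - \sum_i \alpha_i \bigl[ y_i \,(w \cdot x_i + b) - 1 \bigr]

% Necessary conditions for a minimum
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i ,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0

% Output of the classifier
f(x) = \operatorname{sgn}\Bigl( \sum_i \alpha_i y_i \,(x_i \cdot x) + b \Bigr)

% KKT condition: only support vectors have \alpha_i \ne 0
\alpha_i \bigl[ y_i \,(w \cdot x_i + b) - 1 \bigr] = 0

% Kernel replaces the dot product in feature space
x_i \cdot x \;\to\; K(x_i, x) = \Phi(x_i) \cdot \Phi(x)

% Non-separable case: slack variables \xi_i \ge 0
y_i \,(w \cdot x_i + b) \ge 1 - \xi_i ,
\qquad
\text{minimise } \tfrac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i
```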

Finally
• Include Statistical Learning Theory: ~25 formulas on 19 slides
• Skip Theory: ~7 formulas on 16 slides
[Figure: two scatter plots of # formulas versus # slides, axes from 0 to 6 (×10)]
