Statistical Learning Methods in HEAP
Jens Zimmermann, Christian Kiesling
Max-Planck-Institut für Physik, München · MPI für extraterrestrische Physik, München · Forschungszentrum Jülich GmbH

Outline: Statistical Learning: Introduction with a simple example · Occam's Razor · Decision Trees · Local Density Estimators · Methods Based on Linear Separation · Examples: Triggers in HEP and Astrophysics · Conclusion

C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec. 2003
Statistical Learning
• Does not use prior knowledge: "no theory required"
• Learns only from examples: "trial and error", "learning by reinforcement"
• Two classes of statistical learning:
  discrete output 0/1: "classification"
  continuous output: "regression"
• Applications in High Energy and Astrophysics:
  background suppression, purification of events
  estimation of parameters not directly measured
A Simple Example: Preparing a Talk
[Scatter plot: number of formulas vs. number of slides, experimentalists vs. theorists]
Data base established by Jens during the Young Scientists Meeting at MPI
Discriminating Theorists from Experimentalists: A First Analysis
[Two scatter plots of # formulas vs. # slides, experimentalists vs. theorists: "first talks handed in" and "talks a week before the meeting"]
First Problems
New talk by Ludger: 28 formulas on 31 slides.
Two possible decision boundaries in the (# formulas, # slides) plane:
• Simple "model", but no complete separation
• Complete separation, but only via a complicated boundary
At this point we cannot know which feature is "real"! Use train/test splitting or cross-validation!
See Overtraining - Want Generalization - Need Regularization
[Plot: error E for training set and test set vs. training epochs, illustrating overtraining]
We want to tune the parameters of the learning algorithm depending on the overtraining seen!
See Overtraining - Want Generalization - Need Regularization
[Plot: error E for training set and test set vs. training epochs]
Regularization limits the complexity of the model and ensures adequate generalization performance (e.g. via VC dimensions).
"Factor 10" rule ("Uncle Bernie's Rule #2"): use roughly ten times more training examples than adjustable parameters.
A code sketch of train/test monitoring follows below.
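As an illustration of the train/test monitoring described above, here is a minimal Python sketch (not part of the original talk): it generates an invented toy data set of "theorist" and "experimentalist" talks, splits it randomly into a training and a test half, fits a simple logistic model by gradient descent, and stops once the test error no longer improves. All numbers (class means, learning rate, patience) are illustrative assumptions.

```python
# Minimal sketch of regularization via train/test monitoring (early stopping).
import numpy as np

rng = np.random.default_rng(0)
# Invented toy data: column 0 = # formulas, column 1 = # slides.
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])               # 1 = theorist

# Random train/test split (here 50/50).
idx = rng.permutation(len(y))
train, test = idx[: len(y) // 2], idx[len(y) // 2:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 1e-3
best_test_err, patience = np.inf, 0
for epoch in range(2000):
    # One gradient-descent step on the logistic loss over the training set.
    p = sigmoid(X[train] @ w + b)
    w -= lr * X[train].T @ (p - y[train]) / len(train)
    b -= lr * np.mean(p - y[train])

    train_err = np.mean((sigmoid(X[train] @ w + b) > 0.5) != y[train])
    test_err = np.mean((sigmoid(X[test] @ w + b) > 0.5) != y[test])
    if test_err < best_test_err:
        best_test_err, patience = test_err, 0
    else:
        patience += 1
    if patience > 50:      # test error no longer improves -> stop training
        print(f"early stop at epoch {epoch}")
        break

print(f"train error {train_err:.2f}, test error {test_err:.2f}")
```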
Philosophy: Occam's Razor (14th century)
• Pluralitas non est ponenda sine necessitate.
• Do not make assumptions unless they are really necessary.
• From theories which describe the same phenomenon equally well, choose the one which contains the least number of assumptions.
First razor: Given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. → Yes! But not of much use.
Second razor: Given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. → No! ("No free lunch" theorem, Wolpert 1996)
Decision Trees
Split the set of all events by successive cuts on single variables:
  all events: #formulas < 20 → exp;  #formulas > 60 → th;  rest (20 < #formulas < 60) → split further
  subset 20 < #formulas < 60: #slides > 40 → exp;  #slides < 40 → th
Classify Ringaile: 31 formulas on 32 slides → 20 < #formulas < 60 and #slides < 40 → th
Regularization: pruning.
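The toy tree on this slide can be transcribed directly into code; the following minimal sketch just hard-codes the cuts shown above (in practice the tree would be grown from the training data and regularized by pruning):

```python
# Direct transcription of the toy decision tree from the slide
# ("exp" = experimentalist, "th" = theorist).
def classify_talk(n_formulas, n_slides):
    if n_formulas < 20:
        return "exp"
    if n_formulas > 60:
        return "th"
    # 20 <= n_formulas <= 60: this subset is split further on the slide count.
    return "exp" if n_slides > 40 else "th"

# Ringaile's talk: 31 formulas on 32 slides -> classified as theorist.
print(classify_talk(31, 32))  # -> "th"
```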
Local Density Estimators
Search for similar, already classified events within a specified region around the new event, and count the members of the two classes in that region.
[Scatter plots: # formulas vs. # slides with the counting region indicated]
Maximum Likelihood
[Projected 1D histograms of # formulas and # slides for the two classes; the new event (31 formulas, 32 slides) is evaluated in each projection]
Regularization: binning.
Correlation gets lost completely by the projection!
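A minimal sketch of such a projected-likelihood classifier, using the same invented toy data as in the earlier sketch (column 0 = # formulas, column 1 = # slides, label 1 = theorist); the bin edges and the evaluation point are illustrative:

```python
# Projected-likelihood ("Maximum Likelihood") sketch: each input variable is
# histogrammed separately per class (the binning is the regularization) and
# the 1D densities are multiplied, so correlations between variables are lost.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # invented "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # invented "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])

bins = np.linspace(0, 70, 8)          # 7 bins of width 10 (illustrative choice)

def binned_density(values, bins):
    counts, _ = np.histogram(values, bins=bins)
    return (counts + 1e-9) / counts.sum()     # normalised, avoid empty bins

def likelihood(x, X_class, bins):
    # Product of the 1D densities of each input variable.
    p = 1.0
    for d in range(X_class.shape[1]):
        dens = binned_density(X_class[:, d], bins)
        i = np.clip(np.digitize(x[d], bins) - 1, 0, len(dens) - 1)
        p *= dens[i]
    return p

x_new = np.array([31.0, 32.0])        # Ringaile: 31 formulas on 32 slides
p_th = likelihood(x_new, X[y == 1], bins)
p_exp = likelihood(x_new, X[y == 0], bins)
print(p_th / (p_th + p_exp))          # output in [0, 1]
```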
k-Nearest-Neighbour
[Scatter plot: the k nearest training events around the new point are counted; the output changes with k = 1 ... 5]
Regularization: the parameter k.
For every evaluation position the distances to all training positions need to be determined!
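A minimal k-NN sketch on the same invented toy data, showing explicitly that the distance to every training point has to be computed for each evaluation point:

```python
# k-nearest-neighbour sketch: the parameter k is the regularization.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # invented "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # invented "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])

def knn_output(x, X_train, y_train, k):
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
    nearest = np.argsort(d)[:k]               # indices of the k closest ones
    return y_train[nearest].mean()            # fraction of "theorists" among them

x_new = np.array([31.0, 32.0])
for k in (1, 2, 3, 4, 5):
    print(k, knn_output(x_new, X, y, k))
```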
Range Search
[k-d tree over the training points (nodes 1-10, split alternately in x and y) and the corresponding partition of the plane]
Small box: only part of the tree is checked (here nodes 1, 2, 4, 9).
Large box: all nodes are checked.
Regularization: the box size.
The tree needs to be traversed only partially if the box size is small enough!
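A minimal range-search sketch on the same invented toy data; for clarity the counting inside the box is done by brute force rather than with a k-d tree, and the box half-widths are illustrative:

```python
# Range-search sketch: count training examples of each class inside a box of
# fixed half-width around the evaluation point; the box size is the
# regularization. A k-d tree would allow traversing the data only partially.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # invented "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # invented "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])

def range_search_output(x, X_train, y_train, half_width):
    inside = np.all(np.abs(X_train - x) <= half_width, axis=1)
    n_th = np.sum(y_train[inside] == 1)
    n_exp = np.sum(y_train[inside] == 0)
    if n_th + n_exp == 0:
        return 0.5                    # no neighbours inside the box: undecided
    return n_th / (n_th + n_exp)

x_new = np.array([31.0, 32.0])
print(range_search_output(x_new, X, y, half_width=5.0))    # small box
print(range_search_output(x_new, X, y, half_width=20.0))   # large box
```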
Methods Based on Linear Separation
Divide the input space into regions separated by one or more hyperplanes.
[Scatter plots: # formulas vs. # slides with linear decision boundaries]
Extrapolation is done!
LDA (Fisher discriminant)
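A minimal LDA sketch on the same invented toy data, here using scikit-learn's LinearDiscriminantAnalysis as one possible implementation of the Fisher discriminant:

```python
# Linear separation with Fisher's linear discriminant (LDA): a single
# hyperplane in the input space, which therefore also extrapolates.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # invented "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # invented "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.coef_, lda.intercept_)                 # hyperplane parameters
print(lda.predict([[31.0, 32.0]]))               # class label for Ringaile's talk
print(lda.predict_proba([[31.0, 32.0]])[:, 1])   # probability of "theorist"
```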
Neural Networks
Network with two hidden neurons, trained by gradient descent.
[Network diagram with example weights and the resulting separation in the (# formulas, # slides) plane; arbitrary numbers of inputs and hidden neurons are possible]
Regularization: number of hidden neurons, weight decay.
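A minimal sketch of such a network on the same invented toy data, here via scikit-learn's MLPClassifier; the hyperparameters (learning rate, weight decay, number of iterations) are illustrative assumptions, not the values used in the talk:

```python
# Feed-forward network with two hidden sigmoid neurons, trained by gradient
# descent; hidden-layer size and weight decay (alpha) act as regularization.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # invented "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # invented "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])

net = MLPClassifier(hidden_layer_sizes=(2,),   # two hidden neurons as on the slide
                    activation="logistic",     # sigmoid units
                    solver="sgd",              # plain gradient descent
                    alpha=1e-3,                # weight decay
                    learning_rate_init=0.01,
                    max_iter=5000,
                    random_state=0)
# Note: in a real application the inputs would be standardized first.
net.fit(X, y)
print(net.predict_proba([[31.0, 32.0]])[:, 1])  # network output for Ringaile's talk
```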
Support Vector Machines
Separating hyperplane with maximum distance to each data point: the maximum margin classifier.
It is found by setting up the condition for correct classification, y_i (w·x_i + b) ≥ 1, and minimizing ||w||²/2, which leads to the Lagrangian
  L = ||w||²/2 - Σ_i α_i [ y_i (w·x_i + b) - 1 ],  α_i ≥ 0.
A necessary condition for a minimum is w = Σ_i α_i y_i x_i with Σ_i α_i y_i = 0.
The output becomes f(x) = sign( Σ_i α_i y_i (x_i·x) + b ).
Only linear separation? No! Replace the dot products x_i·x by a kernel K(x_i, x): the mapping to feature space is hidden in the kernel.
Non-separable case: allow margin violations via slack variables, penalized in the objective.
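A minimal SVM sketch on the same invented toy data; the kernel choice (RBF) and the penalty C for the non-separable case are illustrative assumptions:

```python
# Support-vector-machine sketch: the kernel replaces the dot products, so a
# non-linear boundary in input space is a linear one in feature space.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([45, 25], [10, 8], (200, 2)),   # invented "theorists"
               rng.normal([15, 40], [10, 8], (200, 2))])  # invented "experimentalists"
y = np.hstack([np.ones(200), np.zeros(200)])

svm = SVC(kernel="rbf", C=1.0)        # C penalizes margin violations
svm.fit(X, y)
print(svm.decision_function([[31.0, 32.0]]))   # signed distance to the margin
print(svm.predict([[31.0, 32.0]]))             # class label
print(len(svm.support_))                       # number of support vectors
```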
Physics Applications: The Neural Network Trigger at HERA (H1)
Task: keep the physics events, reject the background.
Trigger for J/ψ Events (H1)
Efficiency at 95% background rejection:
  NN    99.6%
  SVM   98.3%
  k-NN  97.7%
  RS    97.5%
  C4.5  97.5%
  ML    91.2%
  LDA   82%
Triggering Charged Current Events (signal vs. background)
Efficiency at 80% background rejection:
  NN    74%
  SVM   73%
  C4.5  72%
  RS    72%
  k-NN  71%
  LDA   68%
  ML    65%
Astrophysics: MAGIC - Gamma/Hadron Separation
Separate photon-induced from hadron-induced showers. Training with data and MC, evaluation with data.
s = signal (photon) enhancement factor:
  Random Forest  s = 93.3
  Neural Net     s = 96.5
Future Experiment XEUS: Position of X-ray Photons
(Application of statistical learning to regression problems)
[Sketch of the pixel detector: ~10 µm and ~300 µm scales, transfer direction, electron potential]
σ of the position reconstruction in µm:
  NN    3.6
  SVM   3.6
  k-NN  3.7
  RS    3.7
  ETA   3.9
  CCOM  4.0
Conclusion
• Statistical learning theory is full of subtle details (models and statistics)
• Widely used statistical learning methods were studied:
  Decision Trees
  LDE: ML, k-NN, RS
  Linear separation: LDA, Neural Nets, SVMs
• Neural networks were found superior in the HEP and astrophysics applications (classification and regression) studied so far
• Further applications (trigger, offline analyses) are under study
From Classification to Regression
[1D regression example: k-NN, range search (RS), a function fit, a Gaussian fit and a neural network approximating the same training points]
The neural network shown: a = s(-2.1x - 1), b = s(+2.1x - 1), out = s(-12.7a - 12.7b + 9.4)
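The small network written out above can be evaluated directly; this sketch just transcribes the given weights and prints the resulting bump-shaped output as a function of x:

```python
# The regression network from the slide, transcribed directly:
# a = s(-2.1 x - 1), b = s(+2.1 x - 1), out = s(-12.7 a - 12.7 b + 9.4),
# with s the logistic sigmoid.
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))

def net_output(x):
    a = s(-2.1 * x - 1.0)
    b = s(+2.1 * x - 1.0)
    return s(-12.7 * a - 12.7 * b + 9.4)

for x in np.linspace(-3, 3, 7):
    print(f"x = {x:+.1f}  out = {net_output(x):.3f}")
```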