Durham Statistical Techniques Conference Summary II
Harrison B. Prosper, Florida State University
Workshop on Advanced Multivariate & Statistical Techniques, Fermilab, 1 June 2002
Outline
• Talks
  • Probability Density Estimation and Optimizing S/B, Sherry Towers
  • Estimating the Higgs Mass with Neural Networks, Marcin Wolter
  • Support Vector Machines in Top Physics, Tony Vaiciulis
  • NN Optimization with Genetic Algorithms in the Higgs Search, Frantisek Hakl & Elzbieta Richter-Was
  • Event Classification in Gamma Ray Air Showers, Rudy Bock & Wolfgang Wittek
  • Multidimensional Methods, H.B.P.
• Comments
• Summary
Probability Density Estimation, Sherry Towers
• Basic Idea (Parzen, 1960s): approximate the density p(x) by a sum of kernel functions, one centred on each data point.
• Sherry Towers suggests a kernel in which V is a local covariance matrix defined at each point.
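A minimal NumPy sketch of the Parzen idea, assuming Gaussian kernels. For simplicity a single covariance matrix V is used for every kernel; Towers' variant would instead estimate a local V at each training point (e.g. from its nearest neighbours), which this sketch does not attempt.

```python
import numpy as np

def pde_density(x, train, V):
    """Parzen kernel density estimate at point x.

    train : (N, d) array of training points (e.g. signal MC events)
    V     : (d, d) kernel covariance matrix (one global matrix here;
            Towers' method uses a local covariance per training point)
    """
    N, d = train.shape
    Vinv = np.linalg.inv(V)
    norm = 1.0 / (N * np.sqrt((2 * np.pi) ** d * np.linalg.det(V)))
    diff = train - x                                   # (N, d)
    dist2 = np.einsum('ij,jk,ik->i', diff, Vinv, diff)  # per-point quadratic form
    return norm * np.exp(-0.5 * dist2).sum()

def discriminant(x, sig, bkg, Vs, Vb):
    """Illustrative discriminant built from two PDEs: p(x|s) / [p(x|s) + p(x|b)]."""
    ps = pde_density(x, sig, Vs)
    pb = pde_density(x, bkg, Vb)
    return ps / (ps + pb)
```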
(Figure slide: Sherry Towers.)
Comments
• It is hardly surprising that NN and PDE give similar results:
  • NN approximates p(s|x)
  • PDE approximates p(x|s)
• For a NN, the parameters are weights and thresholds that must be tuned.
• For PDE, the parameters are the data points themselves, which are given.
Optimizing S/B Discrimination, Sherry Towers
• To optimize S/B discrimination, reduce the number of variables!
• Add variables in decreasing order of discrimination power.
• "In general case, variable deletion is safer than variable addition" (Michael Goldstein)
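The add-variables-in-order-of-power strategy can be sketched as follows. The ranking measure and classifier used here (validation ROC AUC of a logistic regression) are illustrative assumptions, not the specific figure of merit used in the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def greedy_add(X, y, tol=1e-3):
    """Add variables in decreasing order of single-variable discrimination,
    keeping each one only if it improves the validation AUC."""
    Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.5, random_state=0)

    def auc(cols):
        clf = LogisticRegression(max_iter=1000).fit(Xtr[:, cols], ytr)
        return roc_auc_score(yva, clf.predict_proba(Xva[:, cols])[:, 1])

    # rank variables by their individual discrimination power
    order = sorted(range(X.shape[1]), key=lambda j: auc([j]), reverse=True)

    chosen, best = [], 0.5          # AUC = 0.5 is the no-discrimination baseline
    for j in order:
        trial = auc(chosen + [j])
        if trial > best + tol:      # keep the variable only if it helps
            chosen.append(j)
            best = trial
    return chosen, best
```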
Measuring the Higgs Mass with NN, Marcin Wolter
• Exploit the ability of a NN to encode mappings.
• Study a toy example in which the mapping is known.
• Use as input x = (E1, E2, cos θ12).
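A sketch of the toy setup, assuming the known mapping is the two-body invariant mass of massless decay products, m² = 2 E1 E2 (1 − cos θ12), which is consistent with the stated inputs; the energy ranges and scikit-learn's MLPRegressor are illustrative stand-ins for whatever was actually used.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N = 20000

# toy events: two massless decay products with energies E1, E2 and opening angle theta12
E1, E2 = rng.uniform(20, 200, N), rng.uniform(20, 200, N)
cos12 = rng.uniform(-1, 1, N)
m = np.sqrt(2 * E1 * E2 * (1 - cos12))   # the known mapping m = f(E1, E2, cos theta12)

X = np.column_stack([E1, E2, cos12])
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000).fit(X, m)

# the trained network should reproduce m from (E1, E2, cos theta12)
print(net.predict(X[:5]))
print(m[:5])
```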
NN in a Nutshell
• Minimize the empirical risk function R(w) = (1/N) Σi [ti − n(xi, w)]² with respect to the weights w.
• Solution (for large N): n(x, w*) ≈ E[t|x], the mean of the target given x.
• If t(x) = 1 when x is of class k and 0 otherwise, then n(x, w*) ≈ p(k|x).
D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296-298 (1990)
E.A. Wan, IEEE Trans. Neural Networks 1(4), 303-305 (1990)
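A minimal NumPy illustration of the cited result, on an assumed one-dimensional toy problem rather than a real network: within each bin of x, the least-squares optimal prediction is the mean of the 0/1 targets, i.e. the signal fraction in that bin, which is exactly an estimate of p(s|x).

```python
import numpy as np

rng = np.random.default_rng(1)
# 1-D toy: signal ~ N(+1, 1), background ~ N(-1, 1); targets t = 1 (signal), 0 (background)
x = np.concatenate([rng.normal(+1, 1, 50000), rng.normal(-1, 1, 50000)])
t = np.concatenate([np.ones(50000), np.zeros(50000)])

bins = np.linspace(-4, 4, 41)
idx = np.digitize(x, bins)
# per-bin least-squares minimizer = mean of t in the bin = estimate of p(s|x)
p_hat = np.array([t[idx == i].mean() for i in range(1, len(bins)) if (idx == i).any()])
print(p_hat)   # rises from ~0 to ~1, tracking the true posterior probability
```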
Support Vector Machines, Tony Vaiciulis
• Listen to Tony's talk!
NN Optimized by Genetic Algorithms, Frantisek Hakl et al.
• Basic Idea: evolve a population of neural networks, so that after many generations the "fittest" individuals have the desired properties.
• Both the NN topology and the parameters are evolved.
• The results are good, comparable to those obtained with other algorithms.
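A deliberately minimal sketch of the general idea, evolving only the hidden-layer widths of a scikit-learn MLPClassifier; Hakl et al. evolve the full topology and parameters, which this toy does not attempt.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def fitness(genome, X, y):
    """Genome = tuple of hidden-layer sizes; fitness = cross-validated accuracy."""
    net = MLPClassifier(hidden_layer_sizes=genome, max_iter=500)
    return cross_val_score(net, X, y, cv=3).mean()

def mutate(genome, rng):
    """Randomly perturb the width of one hidden layer."""
    g = list(genome)
    i = rng.integers(len(g))
    g[i] = int(max(2, g[i] + rng.integers(-4, 5)))
    return tuple(g)

def evolve(X, y, pop_size=10, generations=5, seed=0):
    rng = np.random.default_rng(seed)
    pop = [tuple(int(n) for n in rng.integers(2, 20, size=2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, X, y), reverse=True)
        parents = pop[:pop_size // 2]                      # selection: keep the fittest half
        children = [mutate(parents[rng.integers(len(parents))], rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, X, y))
```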
Gamma/Hadron Separation in Atmospheric Cherenkov Telescopes, Rudy Bock
• Multi-wavelength astrophysics
• Imaging atmospheric Cherenkov telescopes (IACTs)
• Image classification
• Methods under study
• Aiming for a rigorous comparison
(Figure: photomontage of the MAGIC telescope on La Palma, 2000.)
(Figure slides from R.K. Bock's talk.)
Multidimensional Methods: A Unified Perspective
• Introduction
• Thumbnail sketch of:
  • Fisher Linear Discriminant (FLD)
  • Principal Component Analysis (PCA)
  • Independent Component Analysis (ICA)
  • Self-Organizing Map (SOM)
  • Random Grid Search (RGS)
  • Probability Density Estimation (PDE)
  • Artificial Neural Network (ANN)
  • Support Vector Machine (SVM)
Introduction – i
• One should distinguish the problem to be solved from the algorithm to solve it.
• Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number, whereas algorithms to solve them are invented every day.
Introduction – iii
• Problems that may benefit from multivariate analysis:
  • Discriminating signal from background
  • Selecting variables
  • Reducing the dimensionality of the feature space
  • Finding regions of interest in the data
  • Simplifying optimization
  • Comparing models
  • Measuring parameters (e.g., tan β in SUSY)
Fisher Linear Discriminant
• Purpose: signal/background discrimination
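A NumPy sketch of the standard construction: project onto w = W⁻¹(μs − μb), where W is the within-class covariance. The pooled covariance used below is an illustrative choice; the resulting linear discriminant is optimal when both classes are Gaussian with a common covariance.

```python
import numpy as np

def fisher_discriminant(sig, bkg):
    """Return the Fisher weight vector w and a projection function.

    sig, bkg : (Ns, d) and (Nb, d) arrays of signal and background events.
    """
    mu_s, mu_b = sig.mean(axis=0), bkg.mean(axis=0)
    # within-class (pooled) covariance matrix
    W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    w = np.linalg.solve(W, mu_s - mu_b)
    return w, lambda x: x @ w       # D(x) = w . x, larger means more signal-like
```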
Principal Component Analysis
• Purpose: reduce the dimensionality of the data
(Figure: scatter of points in the (x1, x2) plane showing the 1st and 2nd principal axes.)
PCA Algorithm in Practice
• Go from X = (x1, ..., xN)^T to U = (u1, ..., uN)^T, in which the lowest-order correlations are absent:
  • Compute Cov(X)
  • Compute its eigenvalues λi and eigenvectors vi
  • Construct the matrix T = Col(vi)^T, whose rows are the eigenvectors
  • Compute U = TX
• Typically, one then eliminates the components ui with the smallest variance.
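The same recipe as a NumPy sketch: the eigenvectors of the covariance matrix, ordered by decreasing eigenvalue, form the rows of T, and the components with the smallest eigenvalues are dropped.

```python
import numpy as np

def pca_transform(X, n_keep):
    """Rotate X (rows = events, columns = variables) onto its principal axes
    and keep the n_keep components with the largest variance."""
    Xc = X - X.mean(axis=0)                       # centre the data
    cov = np.cov(Xc, rowvar=False)                # Cov(X)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # sort descending
    T = eigvecs[:, order].T                       # rows are the eigenvectors v_i^T
    U = Xc @ T.T                                  # U = T X, applied event by event
    return U[:, :n_keep], T
```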
Independent Component Analysis
• Purpose
  • Find statistically independent variables
  • Dimensionality reduction
• Basic Idea
  • Assume X = (x1, ..., xN)^T is a linear sum X = AS of independent sources S = (s1, ..., sN)^T. Both A, the mixing matrix, and S are unknown.
  • Find a de-mixing matrix T such that the components of U = TX are statistically independent.
ICA Algorithm
Given two densities f(U) and g(U), one measure of their "closeness" is the Kullback-Leibler divergence
K(f | g) = ∫ f(U) ln[ f(U) / g(U) ] dU,
which is zero if, and only if, f(U) = g(U). We set g(U) equal to the product of the marginal densities of the components ui, and minimize K(f | g) (now called the mutual information) with respect to the de-mixing matrix T.
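In practice the mutual-information minimization is carried out with an iterative algorithm rather than directly. A minimal sketch using scikit-learn's FastICA, which is an assumption made for illustration, not the specific implementation discussed at Durham:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# two independent, non-Gaussian sources S, mixed by an unknown matrix A: X = A S
S = np.column_stack([rng.laplace(size=5000), rng.uniform(-1, 1, 5000)])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S @ A.T

ica = FastICA(n_components=2, random_state=0)
U = ica.fit_transform(X)   # de-mixed components, independent up to scale and ordering
```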
Comments
• The Fisher linear discriminant (FLD), random grid search (RGS), probability density estimation (PDE), neural network (ANN) and support vector machine (SVM) are simply different algorithms for approximating the Bayes discriminant function D(X) = P(S|X)/P(B|X), or a function thereof.
• It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.
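A one-line application of Bayes' theorem makes the connection explicit (a sketch, writing P(S) and P(B) for the class priors):

```latex
D(X) \;=\; \frac{P(S|X)}{P(B|X)}
     \;=\; \frac{p(X|S)\,P(S)}{p(X|B)\,P(B)},
\qquad
P(S|X) \;=\; \frac{D(X)}{1 + D(X)} .
```

So a network that estimates P(S|X) and a PDE built from p(X|S) and p(X|B) are monotone functions of the same quantity, which is why no method close to the Bayes limit can be beaten by much.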
Summary
• Multivariate methods are useful whenever it is important to extract as much information from the data as possible.
• For classification problems, the common methods provide different approximations to the same mathematical quantity: the Bayes discriminant function.
• No single method appears to be uniformly the most powerful, so it is useful to study and compare a few of them in detail.
• Big need: a way to compare N-dimensional distributions.