Durham Statistical Techniques Conference Summary II
Harrison B. Prosper, Florida State University
Workshop on Advanced Multivariate & Statistical Techniques, Fermilab, 1 June 2002
Outline
• Talks
  • Probability Density Estimation and Optimizing S/B, Sherry Towers
  • Estimating the Higgs Mass with Neural Networks, Marcin Wolter
  • Support Vector Machines in Top Physics, Tony Vaiciulis
  • NN Optimization with Genetic Algorithms in the Higgs Search, Frantisek Hakl & Elzbieta Richter-Was
  • Event Classification in Gamma Ray Air Showers, Rudy Bock & Wolfgang Wittek
  • Multidimensional Methods, H.B.P.
• Comments
• Summary
Probability Density Estimation, Sherry Towers
• Basic Idea (Parzen, 1960s): approximate the density p(x) by a sum of kernel functions, one centred on each data point.
• Sherry Towers suggests a kernel in which V is a local covariance matrix defined at each point.
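A minimal NumPy sketch of the Parzen idea, assuming Gaussian kernels. For simplicity a single covariance matrix V is used for every kernel; Towers' variant would instead estimate a local V at each training point (e.g. from its nearest neighbours), which this sketch does not attempt.

```python
import numpy as np

def pde_density(x, train, V):
    """Parzen kernel density estimate at point x.

    train : (N, d) array of training points (e.g. signal MC events)
    V     : (d, d) kernel covariance matrix (one global matrix here;
            Towers' method uses a local covariance per training point)
    """
    N, d = train.shape
    Vinv = np.linalg.inv(V)
    norm = 1.0 / (N * np.sqrt((2 * np.pi) ** d * np.linalg.det(V)))
    diff = train - x                                   # (N, d)
    dist2 = np.einsum('ij,jk,ik->i', diff, Vinv, diff)  # per-point quadratic form
    return norm * np.exp(-0.5 * dist2).sum()

def discriminant(x, sig, bkg, Vs, Vb):
    """Illustrative discriminant built from two PDEs: p(x|s) / [p(x|s) + p(x|b)]."""
    ps = pde_density(x, sig, Vs)
    pb = pde_density(x, bkg, Vb)
    return ps / (ps + pb)
```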
(Figure slide: Sherry Towers.)
Comments
• It is hardly surprising that NN and PDE give similar results:
  • NN approximates p(s|x)
  • PDE approximates p(x|s)
• For a NN, the parameters are weights and thresholds that must be tuned.
• For PDE, the parameters are the data points themselves, which are given.
Optimizing S/B Discrimination, Sherry Towers
• To optimize S/B discrimination, reduce the number of variables!
• Add variables in decreasing order of discrimination power.
• "In general case, variable deletion is safer than variable addition" (Michael Goldstein)
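The add-variables-in-order-of-power strategy can be sketched as follows. The ranking measure and classifier used here (validation ROC AUC of a logistic regression) are illustrative assumptions, not the specific figure of merit used in the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def greedy_add(X, y, tol=1e-3):
    """Add variables in decreasing order of single-variable discrimination,
    keeping each one only if it improves the validation AUC."""
    Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.5, random_state=0)

    def auc(cols):
        clf = LogisticRegression(max_iter=1000).fit(Xtr[:, cols], ytr)
        return roc_auc_score(yva, clf.predict_proba(Xva[:, cols])[:, 1])

    # rank variables by their individual discrimination power
    order = sorted(range(X.shape[1]), key=lambda j: auc([j]), reverse=True)

    chosen, best = [], 0.5          # AUC = 0.5 is the no-discrimination baseline
    for j in order:
        trial = auc(chosen + [j])
        if trial > best + tol:      # keep the variable only if it helps
            chosen.append(j)
            best = trial
    return chosen, best
```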
Measuring the Higgs Mass with NN, Marcin Wolter
• Exploit the ability of a NN to encode mappings.
• Study a toy example in which the mapping is known.
• Use as input x = (E1, E2, cos θ12).
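A sketch of the toy setup, assuming the known mapping is the two-body invariant mass of massless decay products, m² = 2 E1 E2 (1 − cos θ12), which is consistent with the stated inputs; the energy ranges and scikit-learn's MLPRegressor are illustrative stand-ins for whatever was actually used.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N = 20000

# toy events: two massless decay products with energies E1, E2 and opening angle theta12
E1, E2 = rng.uniform(20, 200, N), rng.uniform(20, 200, N)
cos12 = rng.uniform(-1, 1, N)
m = np.sqrt(2 * E1 * E2 * (1 - cos12))   # the known mapping m = f(E1, E2, cos theta12)

X = np.column_stack([E1, E2, cos12])
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000).fit(X, m)

# the trained network should reproduce m from (E1, E2, cos theta12)
print(net.predict(X[:5]))
print(m[:5])
```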
NN in a Nutshell
• Minimize the empirical risk function R(w) = (1/N) Σi [ti − n(xi, w)]² with respect to the weights w.
• Solution (for large N): n(x, w*) ≈ E[t|x], the mean of the target given x.
• If t(x) = 1 when x is of class k and 0 otherwise, then n(x, w*) ≈ p(k|x).
D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296-298 (1990)
E.A. Wan, IEEE Trans. Neural Networks 1(4), 303-305 (1990)
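A minimal NumPy illustration of the cited result, on an assumed one-dimensional toy problem rather than a real network: within each bin of x, the least-squares optimal prediction is the mean of the 0/1 targets, i.e. the signal fraction in that bin, which is exactly an estimate of p(s|x).

```python
import numpy as np

rng = np.random.default_rng(1)
# 1-D toy: signal ~ N(+1, 1), background ~ N(-1, 1); targets t = 1 (signal), 0 (background)
x = np.concatenate([rng.normal(+1, 1, 50000), rng.normal(-1, 1, 50000)])
t = np.concatenate([np.ones(50000), np.zeros(50000)])

bins = np.linspace(-4, 4, 41)
idx = np.digitize(x, bins)
# per-bin least-squares minimizer = mean of t in the bin = estimate of p(s|x)
p_hat = np.array([t[idx == i].mean() for i in range(1, len(bins)) if (idx == i).any()])
print(p_hat)   # rises from ~0 to ~1, tracking the true posterior probability
```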
Support Vector Machines, Tony Vaiciulis
• Listen to Tony's talk!
NN Optimized by Genetic Algorithms, Frantisek Hakl et al.
• Basic Idea: evolve a population of neural networks, so that after many generations the "fittest" individuals have the desired properties.
• Both the NN topology and the parameters are evolved.
• The results are good, comparable to those obtained with other algorithms.
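A deliberately minimal sketch of the general idea, evolving only the hidden-layer widths of a scikit-learn MLPClassifier; Hakl et al. evolve the full topology and parameters, which this toy does not attempt.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def fitness(genome, X, y):
    """Genome = tuple of hidden-layer sizes; fitness = cross-validated accuracy."""
    net = MLPClassifier(hidden_layer_sizes=genome, max_iter=500)
    return cross_val_score(net, X, y, cv=3).mean()

def mutate(genome, rng):
    """Randomly perturb the width of one hidden layer."""
    g = list(genome)
    i = rng.integers(len(g))
    g[i] = int(max(2, g[i] + rng.integers(-4, 5)))
    return tuple(g)

def evolve(X, y, pop_size=10, generations=5, seed=0):
    rng = np.random.default_rng(seed)
    pop = [tuple(int(n) for n in rng.integers(2, 20, size=2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, X, y), reverse=True)
        parents = pop[:pop_size // 2]                      # selection: keep the fittest half
        children = [mutate(parents[rng.integers(len(parents))], rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, X, y))
```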
Gamma/Hadron Separation in Atmospheric Cherenkov Telescopes, Rudy Bock
• Multi-wavelength astrophysics
• Imaging atmospheric Cherenkov telescopes (IACTs)
• Image classification
• Methods under study
• Aiming for a rigorous comparison
(Figure: photomontage of the MAGIC telescope on La Palma, 2000.)
(Figure slides from R.K. Bock's talk.)
Multidimensional Methods: A Unified Perspective
• Introduction
• Thumbnail sketch of:
  • Fisher Linear Discriminant (FLD)
  • Principal Component Analysis (PCA)
  • Independent Component Analysis (ICA)
  • Self-Organizing Map (SOM)
  • Random Grid Search (RGS)
  • Probability Density Estimation (PDE)
  • Artificial Neural Network (ANN)
  • Support Vector Machine (SVM)
Introduction – i
• One should distinguish the problem to be solved from the algorithm to solve it.
• Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number, whereas algorithms to solve them are invented every day.
Introduction – iii
• Problems that may benefit from multivariate analysis:
  • Discriminating signal from background
  • Selecting variables
  • Reducing the dimensionality of the feature space
  • Finding regions of interest in the data
  • Simplifying optimization
  • Comparing models
  • Measuring parameters (e.g., tan β in SUSY)
Fisher Linear Discriminant
• Purpose: signal/background discrimination
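A NumPy sketch of the standard construction: project onto w = W⁻¹(μs − μb), where W is the within-class covariance. The pooled covariance used below is an illustrative choice; the resulting linear discriminant is optimal when both classes are Gaussian with a common covariance.

```python
import numpy as np

def fisher_discriminant(sig, bkg):
    """Return the Fisher weight vector w and a projection function.

    sig, bkg : (Ns, d) and (Nb, d) arrays of signal and background events.
    """
    mu_s, mu_b = sig.mean(axis=0), bkg.mean(axis=0)
    # within-class (pooled) covariance matrix
    W = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    w = np.linalg.solve(W, mu_s - mu_b)
    return w, lambda x: x @ w       # D(x) = w . x, larger means more signal-like
```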
Principal Component Analysis
• Purpose: reduce the dimensionality of the data
(Figure: scatter of points in the (x1, x2) plane showing the 1st and 2nd principal axes.)
PCA Algorithm in Practice
• Go from X = (x1, ..., xN)^T to U = (u1, ..., uN)^T, in which the lowest-order correlations are absent:
  • Compute Cov(X)
  • Compute its eigenvalues λi and eigenvectors vi
  • Construct the matrix T = Col(vi)^T, whose rows are the eigenvectors
  • Compute U = TX
• Typically, one then eliminates the components ui with the smallest variance.
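The same recipe as a NumPy sketch: the eigenvectors of the covariance matrix, ordered by decreasing eigenvalue, form the rows of T, and the components with the smallest eigenvalues are dropped.

```python
import numpy as np

def pca_transform(X, n_keep):
    """Rotate X (rows = events, columns = variables) onto its principal axes
    and keep the n_keep components with the largest variance."""
    Xc = X - X.mean(axis=0)                       # centre the data
    cov = np.cov(Xc, rowvar=False)                # Cov(X)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # sort descending
    T = eigvecs[:, order].T                       # rows are the eigenvectors v_i^T
    U = Xc @ T.T                                  # U = T X, applied event by event
    return U[:, :n_keep], T
```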
Independent Component Analysis
• Purpose
  • Find statistically independent variables
  • Dimensionality reduction
• Basic Idea
  • Assume X = (x1, ..., xN)^T is a linear sum X = AS of independent sources S = (s1, ..., sN)^T. Both A, the mixing matrix, and S are unknown.
  • Find a de-mixing matrix T such that the components of U = TX are statistically independent.
ICA Algorithm
Given two densities f(U) and g(U), one measure of their "closeness" is the Kullback-Leibler divergence
K(f | g) = ∫ f(U) ln[ f(U) / g(U) ] dU,
which is zero if, and only if, f(U) = g(U). We set g(U) equal to the product of the marginal densities of the components ui, and minimize K(f | g) (now called the mutual information) with respect to the de-mixing matrix T.
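In practice the mutual-information minimization is carried out with an iterative algorithm rather than directly. A minimal sketch using scikit-learn's FastICA, which is an assumption made for illustration, not the specific implementation discussed at Durham:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# two independent, non-Gaussian sources S, mixed by an unknown matrix A: X = A S
S = np.column_stack([rng.laplace(size=5000), rng.uniform(-1, 1, 5000)])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S @ A.T

ica = FastICA(n_components=2, random_state=0)
U = ica.fit_transform(X)   # de-mixed components, independent up to scale and ordering
```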
Comments
• The Fisher linear discriminant (FLD), random grid search (RGS), probability density estimation (PDE), neural network (ANN) and support vector machine (SVM) are simply different algorithms for approximating the Bayes discriminant function D(X) = P(S|X)/P(B|X), or a function thereof.
• It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.
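A one-line application of Bayes' theorem makes the connection explicit (a sketch, writing P(S) and P(B) for the class priors):

```latex
D(X) \;=\; \frac{P(S|X)}{P(B|X)}
     \;=\; \frac{p(X|S)\,P(S)}{p(X|B)\,P(B)},
\qquad
P(S|X) \;=\; \frac{D(X)}{1 + D(X)} .
```

So a network that estimates P(S|X) and a PDE built from p(X|S) and p(X|B) are monotone functions of the same quantity, which is why no method close to the Bayes limit can be beaten by much.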
Summary
• Multivariate methods are useful whenever it is important to extract as much information from the data as possible.
• For classification problems, the common methods provide different approximations to the same mathematical quantity: the Bayes discriminant function.
• No single method appears to be uniformly the most powerful, so it is useful to study and compare a few of them in detail.
• Big need: a way to compare N-dimensional distributions.