Multivariate Analysis Past, Present and Future

Harrison B. Prosper, Florida State University. PHYSTAT 2003, 10 September 2003.

Presentation Transcript


  1. Multivariate Analysis: Past, Present and Future
     Harrison B. Prosper, Florida State University
     PHYSTAT 2003, 10 September 2003

  2. Outline
     • Introduction
     • Historical Note
     • Current Practice
     • Issues
     • Summary

  3. Introduction
     • Data are invariably multivariate
         • Particle physics (η, φ, E, f)
         • Astrophysics (θ, φ, E, t)

  4. Introduction – II: A Textbook Example
     Objects (number of variables):
     • Jet 1 (b): 3
     • Jet 2: 3
     • Jet 3: 3
     • Jet 4 (b): 3
     • Positron: 3
     • Neutrino: 2
     • Total: 17

  5. Introduction – III
     Astrophysics/Particle physics: Similarities
     • Events
     • Interesting events occur at random
     • Poisson processes
     • Backgrounds are important
     • Experimental response functions
     • Huge datasets

  6. Introduction – IV
     Differences
     • In particle physics we control when events occur and under what conditions
     • We have detailed predictions of the relative frequency of various outcomes

  7. Introduction – V: All we do is Count!
     • Our experiments are ideal Bernoulli trials
     • At Fermilab, each collision, that is, each trial, is conducted the same way every 400 ns
     • de Finetti’s analysis of exchangeable trials is an accurate model of what we do
     [Figure axis: Time →]

  8. Introduction – VI
     Typical analysis tasks
     • Data Compression
     • Clustering and cluster characterization
     • Classification/Discrimination
     • Estimation
     • Model selection/Hypothesis testing
     • Optimization

  9. Historical Note
     • Karl Pearson (1857 – 1936)
     • R.A. Fisher (1890 – 1962)
     • P.C. Mahalanobis (1893 – 1972)

  10. Historical Note – Iris Data
      [Images: Iris versicolor, Iris setosa]
      R.A. Fisher, The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, v. 7, p. 179-188 (1936)

  11. Iris Data
      • Variables
          • X1 Sepal length
          • X2 Sepal width
          • X3 Petal length
          • X4 Petal width
      • “What linear function of the four measurements will maximize the ratio of the difference between the specific means to the standard deviations within species?” R.A. Fisher

  12. Fisher Linear Discriminant (1936)
      Solution: [equation not preserved in the transcript]
      Which is the same, within a constant, as: [equation not preserved in the transcript]
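The slide's equations did not survive in the transcript. As a hedged illustration, the textbook two-class form of Fisher's solution takes the projection direction w proportional to S_W^{-1}(m_a − m_b), where m_a and m_b are the class means and S_W is the pooled within-class scatter matrix. For Gaussian classes with a common covariance matrix, w·x agrees with the log-likelihood ratio up to a constant. The sketch below works in two dimensions with invented toy data; the function names and data are assumptions, not taken from the talk.

```python
# Minimal sketch of a two-class Fisher linear discriminant in 2-D.
# Toy data and all names are illustrative assumptions.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    """Within-class scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Return w = S_W^{-1} (m_a - m_b), the Fisher projection direction."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    dm = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]

def project(w, p):
    """Scalar discriminant value w . x for one point."""
    return w[0] * p[0] + w[1] * p[1]

class_a = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5), (5.5, 3.5)]
class_b = [(1.0, 1.0), (2.0, 2.5), (1.5, 2.0), (2.5, 1.5)]
w = fisher_direction(class_a, class_b)
scores_a = [project(w, p) for p in class_a]
scores_b = [project(w, p) for p in class_b]
```

On this toy sample the projected scores of the two classes do not overlap, so a single cut on w·x separates them.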

  13. Current Practice in Particle Physics
      • Reducing number of variables
          • Principal Component Analysis (PCA)
      • Discrimination/Classification
          • Fisher Linear Discriminant (FLD)
          • Random Grid Search (RGS)
          • Feedforward Neural Network (FNN)
          • Kernel Density Estimation (KDE)

  14. Current Practice – II
      • Parameter Estimation
          • Maximum Likelihood (ML)
          • Bayesian (KDE and analytical methods)
          • e.g., see talk by Florencia Canelli (12A)
      • Weighting
          • Usually 0, 1, referred to as “cuts”
          • Sometimes use the R. Barlow method

  15. Cuts (0, 1 weights)
      • Points that lie below the cuts are “cut out”
      • We refer to (x0, y0) as a cut-point
      [Figure: scatter plot with legend for background (B) and signal (S); regions labeled with weights 1 and 0]

  16. Grid Search
      • Apply cuts at each grid point, compute some measure of their effectiveness, and choose the most effective cuts
      • Curse of dimensionality: number of cut-points ~ Nbin^Ndim
      [Figure: scatter plot with legend for background (B) and signal (S)]
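A minimal sketch of the grid search described on this slide, assuming two variables, lower-bound cuts, and s/sqrt(s+b) as the measure of effectiveness. The slide does not name a measure, so that choice, the toy events, and the function names are all assumptions.

```python
from itertools import product

def count_after_cuts(events, cut):
    """Count events whose every variable lies above its cut (weight 1)."""
    return sum(all(v > c for v, c in zip(e, cut)) for e in events)

def grid_search(signal, background, grids):
    """Try every combination of cut values; return (best score, best cut)."""
    best = None
    for cut in product(*grids):
        s = count_after_cuts(signal, cut)
        b = count_after_cuts(background, cut)
        # Assumed figure of merit; any other measure could be swapped in.
        score = s / (s + b) ** 0.5 if s + b > 0 else 0.0
        if best is None or score > best[0]:
            best = (score, cut)
    return best

signal = [(3.0, 3.2), (3.5, 2.8), (4.0, 3.9), (2.9, 3.6)]
background = [(1.0, 1.2), (1.5, 3.0), (3.1, 0.8), (0.7, 0.9)]
grids = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]  # Nbin = 4 per axis, Ndim = 2

best_score, best_cut = grid_search(signal, background, grids)
n_cut_points = len(grids[0]) ** len(grids)  # Nbin ** Ndim
```

With 4 bins in 2 dimensions there are 16 cut-points to evaluate; with the same binning in 17 dimensions there would be 4^17, roughly 1.7 × 10^10, which is the curse of dimensionality the slide refers to.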

  17. Random Grid Search
      • Take each point of the signal class as a cut-point
      • n = # events in sample
      • k = # events after cuts
      • fraction = k/n
      [Figures: cut-points in the (x, y) plane; signal fraction versus background fraction, both axes from 0 to 1]
      H.B.P. et al., Proceedings, CHEP 1995
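A minimal sketch of the random grid search idea, assuming lower-bound cuts in two variables: each signal event supplies a cut-point, and for each cut-point the surviving fraction (events after cuts divided by events in the sample) is computed for both classes. The toy events and function names are assumptions made for illustration.

```python
def fraction_passing(events, cut):
    """Fraction k/n of events with every variable at or above the cut-point."""
    k = sum(all(v >= c for v, c in zip(e, cut)) for e in events)
    return k / len(events)

def random_grid_search(signal, background):
    """Use each signal event as a cut-point.

    Returns a list of (background fraction, signal fraction, cut-point),
    i.e. the points one would plot as signal versus background fraction.
    """
    return [(fraction_passing(background, cut),
             fraction_passing(signal, cut),
             cut)
            for cut in signal]

signal = [(3.0, 3.2), (3.5, 2.8), (4.0, 3.9), (2.9, 3.6)]
background = [(1.0, 1.2), (1.5, 3.0), (3.1, 0.8), (3.2, 3.3)]
results = random_grid_search(signal, background)
```

The number of cut-points now equals the number of signal events rather than growing exponentially with the number of variables, and the cut-points are concentrated where the signal actually lies.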

  18. Example: DØ Top Discovery (1995)

  19. Optimal Discrimination
      • Bayes Discriminant: [equation not preserved in the transcript]
      • r(x,y) = constant defines the optimal decision boundary
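The slide's formula is missing from the transcript. As a hedged reconstruction from standard results, not necessarily the slide's exact notation, the likelihood ratio and the posterior signal probability for classes S (signal) and B (background) can be written as follows. The prior probabilities p(S) and p(B) are symbols introduced here.

```latex
\begin{align}
  r(x,y) &= \frac{p(x,y \mid S)}{p(x,y \mid B)}, \\
  p(S \mid x,y) &= \frac{p(x,y \mid S)\,p(S)}
                        {p(x,y \mid S)\,p(S) + p(x,y \mid B)\,p(B)}
                 = \frac{r(x,y)}{r(x,y) + p(B)/p(S)} .
\end{align}
```

Because p(S | x,y) increases monotonically with r(x,y), a contour of constant r is also a contour of constant posterior probability, so either quantity defines the same decision boundaries.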

  20. FeedForward Neural Networks
      • Applications
          • Discrimination
          • Parameter estimation
          • Function and density estimation
      • Basic Idea
          • Encode a mapping (Kolmogorov, 1950s) using a set of 1-D functions
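A minimal sketch of how a feedforward network builds a multivariate mapping out of 1-D functions, assuming a single hidden layer with tanh units and a sigmoid output. The architecture, the weights, and the function name are illustrative assumptions, not details from the talk.

```python
import math

def feedforward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Single-hidden-layer network: sigmoid(b + sum_j v_j * tanh(a_j + u_j . x)).

    Each hidden unit applies the 1-D function tanh to a linear
    combination of the inputs; the output applies a 1-D sigmoid.
    """
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    a = output_bias + sum(v * h for v, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-a))

# Hand-picked weights, purely for illustration (nothing is trained here).
x = (0.5, -1.0)
out = feedforward(x,
                  hidden_weights=[[1.0, 0.0], [0.0, 1.0]],
                  hidden_biases=[0.0, 0.0],
                  output_weights=[1.0, -1.0],
                  output_bias=0.0)
```

In practice the weights would be found by minimizing a risk function over a training sample rather than set by hand.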

  21. Example: DØ Search for Leptoquarks
      [Figure: diagram with particle labels l, l, q, LQ, q, g]

  22. Issues
      • Method choice
          • Life is short and data finite; so how should one choose a method?
      • Model complexity
          • How to reduce dimensionality of data, while minimizing loss of “information”?
          • How many model parameters?
          • How should one avoid over-fitting?

  23. Issues – II
      • Model robustness
          • Is a cut on a multivariate discriminant necessarily more sensitive to modeling errors than a cut on each of its input variables?
          • What is a practical, but useful, way to assess sensitivity to modeling errors and robustness with respect to assumptions?

  24. Issues – III
      • Accuracy of predictions
          • How should one place “error bars” on multivariate-based results?
          • Is a Bayesian approach useful?
      • Goodness of fit
          • How can this be done in multiple dimensions?

  25. Summary
      • After ~80 years of effort we have many powerful methods of analysis
          • A few of these are now used routinely in physics analyses
      • The most pressing need is to understand some issues better so that when the data tsunami strikes we can respond sensibly

  26. FNN – Probabilistic Interpretation
      • Minimize the empirical risk function with respect to w: [equation not preserved in the transcript]
      • Solution (for large N): [equation not preserved in the transcript]
      • If t(x) = kδ[1 − I(x)], where I(x) = 1 if x is of class k, 0 otherwise: [equation not preserved in the transcript]
      D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296-298 (1990)
      E.A. Wan, IEEE Trans. Neural Networks 1(4), 303-305 (1990)
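The equations on this slide are missing from the transcript. The following is a hedged reconstruction of the standard argument from the cited papers, assuming a quadratic risk function and a network n(x, w) flexible enough to reach the minimum. The notation (E, n, t_i, w*) is chosen here and may differ from the slide, and the target is simplified to t = 1 for class k and t = 0 otherwise.

```latex
\begin{align}
  E(w) &= \frac{1}{N}\sum_{i=1}^{N}\bigl[t_i - n(x_i,w)\bigr]^2
  \;\xrightarrow{\,N\to\infty\,}\;
  \int dx\,p(x)\int dt\,p(t \mid x)\,\bigl[t - n(x,w)\bigr]^2, \\
  n(x,w^{*}) &= \int t\,p(t \mid x)\,dt = \mathrm{E}[t \mid x], \\
  n(x,w^{*}) &= p(k \mid x)
  \quad\text{if } t = 1 \text{ for class } k \text{ and } t = 0 \text{ otherwise.}
\end{align}
```

Under these assumptions the trained network output approximates the posterior probability of class k given x, which is why a cut on the output acts as a cut on the Bayes discriminant.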

  27. Self Organizing Map
      • Basic Idea (Kohonen, 1988)
          • Map each of K feature vectors X = (x1, ..., xN)^T into one of M regions of interest, each defined by a vector wm, so that all X mapped to a given wm are closer to it than to all remaining wm
          • Basically, perform a coarse-graining of the feature space
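A minimal sketch of the coarse-graining step, assuming Euclidean distance and two hand-placed prototype vectors wm. The update rule shown is a simplified single training step: a full Kohonen map also moves the winner's neighbours on the map grid, which is omitted here. All names and data are illustrative assumptions.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_regions(vectors, prototypes):
    """Map each feature vector to the index m of its nearest prototype w_m."""
    return [min(range(len(prototypes)),
                key=lambda m: squared_distance(x, prototypes[m]))
            for x in vectors]

def update(prototypes, x, rate=0.5):
    """Simplified training step: pull the winning prototype toward x.

    Neighbourhood updates of a full self-organizing map are left out.
    """
    m = assign_regions([x], prototypes)[0]
    prototypes[m] = [w + rate * (xi - w) for w, xi in zip(prototypes[m], x)]
    return m

vectors = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (1.0, 1.1)]
prototypes = [[0.0, 0.0], [1.0, 1.0]]  # M = 2 regions
regions = assign_regions(vectors, prototypes)
winner = update(prototypes, (0.4, 0.4))
```

Each feature vector is replaced by the index of its region, which is the sense in which the map coarse-grains the feature space.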

  28. Support Vector Machines
      • Basic Idea
          • Data that are non-separable in N dimensions have a higher chance of being separable if mapped into a space of higher dimension
          • Use a linear discriminant to partition the high-dimensional feature space
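A small illustration of the mapping idea only, not a support vector machine: one-dimensional toy data that no single cut on x can separate become linearly separable after the assumed map x → (x, x²). The feature map, the hand-picked weights, and the data are illustrative assumptions. A real SVM would choose the maximum-margin hyperplane and would usually apply the map implicitly through a kernel.

```python
def feature_map(x):
    """Map a 1-D point into 2-D: x -> (x, x**2). Chosen for illustration."""
    return (x, x * x)

def linear_discriminant(z, w=(0.0, 1.0), b=-1.0):
    """Sign of w . z + b; the weights are picked by hand, not trained."""
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1

def separable_by_threshold(class_a, class_b):
    """True if one cut on x puts each class entirely on its own side."""
    return max(class_a) < min(class_b) or max(class_b) < min(class_a)

inner = [-0.5, 0.0, 0.5]        # class surrounded by the other class
outer = [-2.0, -1.5, 1.5, 2.0]

labels_inner = [linear_discriminant(feature_map(x)) for x in inner]
labels_outer = [linear_discriminant(feature_map(x)) for x in outer]
```

In the mapped space the line x² = 1 separates the two classes, although no threshold on x alone does.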

  29. Independent Component Analysis
      • Basic Idea
          • Assume X = (x1, ..., xN)^T is a linear sum X = AS of independent sources S = (s1, ..., sN)^T. Both A, the mixing matrix, and S are unknown.
          • Find a de-mixing matrix T such that the components of U = TX are statistically independent
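A sketch of the mixing model only, assuming two sources and a mixing matrix A that is known for the sake of the demonstration. In real ICA both A and S are unknown, and T must be estimated from the data by making the components of U as statistically independent as possible; no such estimation is attempted here. The matrices, source values, and helper names are invented.

```python
def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

A = [[2.0, 1.0], [1.0, 1.0]]                     # assumed mixing matrix
sources = [[1.0, -1.0], [0.5, 2.0], [-1.5, 0.25]]  # samples of S
mixed = [matvec(A, s) for s in sources]            # observed X = A S

# With A known, the ideal de-mixing matrix is T = A^{-1}, so U = T X = S.
# An ICA algorithm would have to find such a T without knowing A, and can
# do so only up to a permutation and rescaling of the sources.
T = inverse_2x2(A)
recovered = [matvec(T, x) for x in mixed]
```

The example shows what a successful de-mixing matrix achieves; the hard part of ICA, estimating T from the observations alone, is outside this sketch.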
