
Object Orie’d Data Analysis, Last Time



Presentation Transcript


  1. Object Orie’d Data Analysis, Last Time
  • Mildly Non-Euclidean Spaces
  • Strongly Non-Euclidean Spaces
  • Tree spaces
  • No Tangent Plane
  • Classification - Discrimination
  • Mean Difference (Centroid method)
  • Fisher Linear Discrimination
  • Graphical Viewpoint (nonparametric)
  • Maximum Likelihood Derivation (Gaussian based)

  2. Classification - Discrimination
  Background, two-class (binary) version: using “training data” from Class +1 and Class -1, develop a “rule” for assigning new data to a class.
  Canonical example: disease diagnosis
  • New patients are “Healthy” or “Ill”
  • Determined based on measurements
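
A minimal sketch of the mean difference (centroid) rule recalled on slide 1, applied to this two-class setup. The toy data, names, and numbers below are purely illustrative, not from the slides.

```python
# Centroid (mean-difference) classifier on hypothetical two-class training data.
import numpy as np

rng = np.random.default_rng(0)
X_plus = rng.normal(loc=+1.0, size=(20, 2))   # training data, class +1
X_minus = rng.normal(loc=-1.0, size=(20, 2))  # training data, class -1

mean_plus, mean_minus = X_plus.mean(axis=0), X_minus.mean(axis=0)
w = mean_plus - mean_minus                    # direction joining the two centroids
midpoint = 0.5 * (mean_plus + mean_minus)     # intercept: halfway between centroids

def classify(x):
    """Assign +1 if x falls on the class +1 side of the separating hyperplane."""
    return 1 if w @ (x - midpoint) > 0 else -1

print(classify(np.array([0.8, 0.9])))         # expected: +1 for a point near class +1
```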

  3. Classification - Discrimination
  Important distinction: Classification vs. Clustering
  • Classification: class labels are known; goal: understand differences
  • Clustering: class labels are unknown; goal: find class labels (groups of similar data)
  Both are about clumps of similar data, but with much different goals.

  4. Classification - Discrimination
  Important distinction: Classification vs. Clustering
  Useful terminology:
  • Classification: supervised learning
  • Clustering: unsupervised learning

  5. Fisher Linear Discrimination
  Graphical introduction (non-Gaussian). [Figure not included in transcript.]

  6. Classical Discrimination
  Above derivation of FLD was:
  • Nonstandard
  • Not in any textbooks(?)
  • Nonparametric (don’t need Gaussian data)
  • I.e. used no probability distributions
  • More Machine Learning than Statistics
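
For contrast with the nonstandard derivation above, a sketch of the usual FLD computation: the pooled within-class covariance inverse applied to the mean difference. The pooling convention and the toy data are assumptions for illustration; the slides do not specify them.

```python
# FLD normal vector vs. the plain mean-difference direction, on hypothetical data.
import numpy as np

rng = np.random.default_rng(1)
X_plus = rng.multivariate_normal([2, 0], [[3, 2], [2, 3]], size=50)
X_minus = rng.multivariate_normal([-2, 0], [[3, 2], [2, 3]], size=50)

mean_plus, mean_minus = X_plus.mean(axis=0), X_minus.mean(axis=0)
# One common pooled within-class covariance estimate (assumed convention)
Sw = np.cov(X_plus, rowvar=False) + np.cov(X_minus, rowvar=False)

w_fld = np.linalg.solve(Sw, mean_plus - mean_minus)   # FLD direction: Sw^{-1} (mean difference)
w_md = mean_plus - mean_minus                         # mean-difference (centroid) direction
print(w_fld / np.linalg.norm(w_fld))
print(w_md / np.linalg.norm(w_md))                    # tilted covariance makes these differ
```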

  7. Classical Discrimination
  Summary of FLD vs. GLR:
  • Tilted Point Clouds data: FLD good, GLR good
  • Donut data: FLD bad, GLR good
  • X data: FLD bad, GLR OK, not great
  Classical conclusion: GLR generally better (will see a different answer for HDLSS data)

  8. Classical Discrimination
  FLD Generalization II (Gen. I was GLR): different prior probabilities
  Main idea: give different weights to the 2 classes
  • I.e. assume not a priori equally likely
  • Development is “straightforward”
  • Modified likelihood
  • Changes the intercept in FLD
  • Won’t explore further here
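
A small sketch of the intercept change mentioned above, using the standard Gaussian likelihood algebra: unequal priors shift only the cutoff along the FLD score, not the direction. The exact threshold form below is my assumption; the slide does not spell it out.

```python
# FLD rule with a prior-dependent cutoff (standard Gaussian-LDA form, assumed here).
import numpy as np

def fld_rule(x, w, mean_plus, mean_minus, prior_plus=0.5):
    """Classify x as +1 or -1; unequal priors move only the cutoff, not the direction w."""
    midpoint = 0.5 * (mean_plus + mean_minus)
    score = w @ (x - midpoint)
    threshold = np.log((1.0 - prior_plus) / prior_plus)  # equals 0 for equal priors
    return 1 if score > threshold else -1

# Hypothetical call: a higher prior on class +1 lowers the cutoff toward class +1
w = np.array([1.0, 0.5])
print(fld_rule(np.array([0.2, 0.1]), w,
               np.array([1.0, 0.0]), np.array([-1.0, 0.0]), prior_plus=0.8))
```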

  9. Classical Discrimination
  FLD Generalization III: Principal Discriminant Analysis
  • Idea: FLD-like approach to > two classes
  • Assumption: class covariance matrices are the same (similar) (but not Gaussian, same situation as for FLD)
  • Main idea: quantify “location of classes” by their means

  10. Classical Discrimination
  Principal Discriminant Analysis (cont.)
  Simple way to find “interesting directions” among the means: PCA on the set of class means, i.e. eigen-analysis of the “between class covariance matrix” $\Sigma_B = \frac{1}{k} \sum_{j=1}^{k} (\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^T$, where $\bar{X}_j$ is the mean of class $j$ and $\bar{X}$ is the overall mean.
  Aside: can show the overall covariance decomposes as $\hat{\Sigma} = \hat{\Sigma}_B + \hat{\Sigma}_W$ (between-class plus within-class).
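
A sketch of the “PCA on the set of means” idea with hypothetical class means. The 1/k normalization of the between-class covariance is one common convention, not necessarily the slides’; all numbers are illustrative.

```python
# Eigen-analysis of a between-class covariance built from made-up class means.
import numpy as np

class_means = np.array([[0.0, 0.0, 0.0],
                        [3.0, 1.0, 0.0],
                        [1.0, 4.0, 0.0]])      # k = 3 classes in d = 3 dimensions
overall_mean = class_means.mean(axis=0)
centered = class_means - overall_mean

# Between-class covariance: average outer product of the centered class means
Sigma_B = centered.T @ centered / len(class_means)

eigvals, eigvecs = np.linalg.eigh(Sigma_B)     # ascending order
# Leading eigenvectors give the "interesting directions" among the means;
# at most k - 1 = 2 eigenvalues can be nonzero.
print(eigvals[::-1])
print(eigvecs[:, ::-1])
```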

  11. Classical Discrimination
  Principal Discriminant Analysis (cont.)
  But PCA on the means only works like Mean Difference; expect improvement by taking the (within-class) covariance into account. Blind application of the above ideas suggests eigen-analysis of $\Sigma_W^{-1} \Sigma_B$.

  12. Classical Discrimination
  Principal Discriminant Analysis (cont.)
  There are:
  • smarter ways to compute (“generalized eigenvalue” problem)
  • other representations (this solves optimization problems)
  Special case: 2 classes, reduces to standard FLD
  Good reference for more: Section 3.8 of Duda, Hart & Stork (2001)
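
A sketch of the “smarter” generalized-eigenvalue computation alluded to above: solving $\Sigma_B v = \lambda \Sigma_W v$ directly (equivalent to eigen-analysis of $\Sigma_W^{-1} \Sigma_B$) rather than forming the inverse explicitly. The matrices here are illustrative stand-ins, not from the slides.

```python
# Generalized eigenproblem for principal discriminant directions, via SciPy.
import numpy as np
from scipy.linalg import eigh

Sigma_W = np.array([[2.0, 0.5], [0.5, 1.0]])   # within-class covariance (assumed values)
Sigma_B = np.array([[1.0, 0.8], [0.8, 1.5]])   # between-class covariance (assumed values)

# Solves Sigma_B v = lambda Sigma_W v without inverting Sigma_W explicitly
eigvals, eigvecs = eigh(Sigma_B, Sigma_W)
print(eigvals[::-1])                            # leading discriminant values first
print(eigvecs[:, ::-1])                         # corresponding directions
```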

  13. Classical Discrimination
  Summary of Classical Ideas:
  • Among “Simple Methods”: MD and FLD sometimes similar, sometimes FLD better, so FLD is preferred
  • Among Complicated Methods: GLR is best, so always use that
  • Caution: story changes for HDLSS settings

  14. HDLSS Discrimination
  Recall main HDLSS issues:
  • Sample size n < dimension d
  • Singular covariance matrix, so can’t use matrix inverse
  • I.e. can’t standardize (sphere) the data (requires root inverse covariance)
  • Can’t do classical multivariate analysis
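
A quick numerical illustration of the singularity issue: with n < d the sample covariance has rank at most n - 1, so it cannot be inverted or used to sphere the data. The sizes below are arbitrary illustrative choices.

```python
# Sample covariance is singular when sample size n is below dimension d.
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 50                              # sample size smaller than dimension
X = rng.normal(size=(n, d))
Sigma_hat = np.cov(X, rowvar=False)        # d x d matrix, but rank at most n - 1

print(np.linalg.matrix_rank(Sigma_hat))    # 9: far below d = 50
print(np.linalg.eigvalsh(Sigma_hat)[:3])   # smallest eigenvalues are (numerically) zero
```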

  15. HDLSS Discrimination
  An approach to non-invertible covariances:
  • Replace by generalized inverses (sometimes called pseudo inverses)
  • Note: there are several; here use the Moore-Penrose inverse, as used by Matlab (pinv.m)
  • Often provides useful results (but not always)
  Recall Linear Algebra Review…

  16. Recall Linear Algebra
  Eigenvalue Decomposition: for a (symmetric) square matrix $X$, find a diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ and an orthonormal matrix $B$ (i.e. $B B^T = B^T B = I$) so that $X B = B D$, i.e. $X = B D B^T$.

  17. Recall Linear Algebra (Cont.)
  Eigenvalue decomposition solves matrix problems:
  • Inversion: $X^{-1} = B D^{-1} B^T = B\,\mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_d^{-1})\,B^T$
  • Square root: $X^{1/2} = B D^{1/2} B^T = B\,\mathrm{diag}(\lambda_1^{1/2}, \ldots, \lambda_d^{1/2})\,B^T$
  • $X$ is positive (nonnegative, i.e. semi-) definite if and only if all $\lambda_i > 0$ (respectively $\lambda_i \geq 0$)
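
A short sketch of how the decomposition $X = B D B^T$ delivers the inverse and square root by transforming the eigenvalues; the matrix is an arbitrary positive definite example, not from the slides.

```python
# Inverse and square root of a symmetric matrix via its eigenvalue decomposition.
import numpy as np

X = np.array([[4.0, 1.0], [1.0, 3.0]])          # symmetric, positive definite
eigvals, B = np.linalg.eigh(X)                  # X = B diag(eigvals) B^T

X_inv = B @ np.diag(1.0 / eigvals) @ B.T        # inversion: invert the eigenvalues
X_sqrt = B @ np.diag(np.sqrt(eigvals)) @ B.T    # square root: sqrt of the eigenvalues

print(np.allclose(X_inv, np.linalg.inv(X)))     # True
print(np.allclose(X_sqrt @ X_sqrt, X))          # True
```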

  18. Recall Linear Algebra (Cont.)
  Moore-Penrose Generalized Inverse: for $X = B D B^T$ with $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0)$ and $\lambda_1 \geq \cdots \geq \lambda_r > 0$, define $X^- = B D^- B^T$, where $D^- = \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_r^{-1}, 0, \ldots, 0)$.

  19. Recall Linear Algebra (Cont.)
  Easy to see this satisfies the definition of Generalized (Pseudo) Inverse:
  • $X X^- X = X$
  • $X^- X X^- = X^-$
  • $X X^-$ symmetric
  • $X^- X$ symmetric

  20. Recall Linear Algebra (Cont.)
  Moore-Penrose Generalized Inverse:
  • Idea: matrix inverse on the non-null space of the linear transformation
  • Reduces to the ordinary inverse in the full rank case, i.e. for r = d, so could just always use this
  • Tricky aspect: “> 0 vs. = 0” and floating point arithmetic
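
A sketch of the Moore-Penrose construction described on slides 18–20: keep and invert only the eigenvalues above a tolerance, zero out the rest. The tolerance value is an assumption and is exactly where the “> 0 vs. = 0” floating point issue shows up; the result is checked against NumPy’s pinv and the four conditions from slide 19.

```python
# Moore-Penrose inverse of a symmetric PSD matrix via eigenvalue decomposition.
import numpy as np

def pinv_sym(X, tol=1e-10):
    """Pseudo-inverse of a symmetric positive semi-definite matrix."""
    eigvals, B = np.linalg.eigh(X)
    inv_vals = np.zeros_like(eigvals)
    keep = eigvals > tol                           # the "> 0 vs. = 0" decision, via a tolerance
    inv_vals[keep] = 1.0 / eigvals[keep]
    return B @ np.diag(inv_vals) @ B.T

X = np.outer([1.0, 2.0], [1.0, 2.0])               # rank-1, hence singular
X_pinv = pinv_sym(X)

print(np.allclose(X_pinv, np.linalg.pinv(X)))      # agrees with NumPy's SVD-based pinv
# The four generalized-inverse conditions from slide 19:
print(np.allclose(X @ X_pinv @ X, X))              # X X^- X = X
print(np.allclose(X_pinv @ X @ X_pinv, X_pinv))    # X^- X X^- = X^-
print(np.allclose((X @ X_pinv).T, X @ X_pinv))     # X X^- symmetric
print(np.allclose((X_pinv @ X).T, X_pinv @ X))     # X^- X symmetric
```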

  21. HDLSS Discrimination
  Application of Generalized Inverse to FLD:
  • Direction (normal) vector: $n_{FLD} = \hat{\Sigma}_w^{-} (\bar{X}_{+1} - \bar{X}_{-1})$
  • Intercept: the midpoint $\frac{1}{2}(\bar{X}_{+1} + \bar{X}_{-1})$
  • Have replaced $\hat{\Sigma}_w^{-1}$ by the generalized inverse $\hat{\Sigma}_w^{-}$
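
A sketch of this HDLSS version of FLD on toy data with n < d: the within-class covariance is singular, so its Moore-Penrose inverse replaces the ordinary inverse in the normal vector. Data, sizes, and names are illustrative assumptions.

```python
# FLD direction in an HDLSS setting, using the Moore-Penrose pseudo-inverse.
import numpy as np

rng = np.random.default_rng(3)
n, d = 10, 40                                            # n < d: HDLSS
X_plus = rng.normal(loc=+0.5, size=(n, d))
X_minus = rng.normal(loc=-0.5, size=(n, d))

mean_plus, mean_minus = X_plus.mean(axis=0), X_minus.mean(axis=0)
Sw = np.cov(X_plus, rowvar=False) + np.cov(X_minus, rowvar=False)  # singular: rank <= 2(n - 1)

w_fld = np.linalg.pinv(Sw) @ (mean_plus - mean_minus)    # generalized inverse replaces Sw^{-1}
intercept = 0.5 * (mean_plus + mean_minus)               # classify by the sign of w_fld @ (x - intercept)

print(np.linalg.matrix_rank(Sw))                         # well below d = 40
print(w_fld[:3])                                         # first few coordinates of the normal vector
```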

  22. HDLSS Discrimination Toy Example: Increasing Dimension

  23. Next Topics: • HDLSS Properties of FLD • Generalized inverses??? • Think about HDLSS, as on 04/03/02 • Maximal Data Piling • Embedding and Kernel Spaces
