
Object Orie’d Data Analysis, Last Time


aengstrom

Presentation Transcript


  1. Object Orie’d Data Analysis, Last Time • SiZer Analysis • Statistical Inference for Histograms & S.P.s • Yeast Cell Cycle Data • OODA in Image Analysis • Landmarks, Boundary Rep’ns, Medial Rep’ns • Mildly Non-Euclidean Spaces • M-rep data on manifolds

  2. Mildly Non-Euclidean Spaces Statistical Analysis of M-rep Data Recall: Many direct products of: • Locations • Radii • Angles I.e. points on smooth manifold Data in non-Euclidean Space But only mildly non-Euclidean

  3. Mildly Non-Euclidean Spaces Statistical Analysis of M-rep Data Recall: Many direct products of: • Locations • Radii • Angles Mathematical Summarization: Lie Groups and/or symmetric spaces

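  4. Mildly Non-Euclidean Spaces Fréchet mean of numbers: $\bar{X} = \arg\min_{x} \sum_{i=1}^{n} (X_i - x)^2$ Fréchet mean in Euclidean Space: $\bar{X} = \arg\min_{x} \sum_{i=1}^{n} \|X_i - x\|^2 = \arg\min_{x} \sum_{i=1}^{n} d(X_i, x)^2$ Fréchet mean on a manifold: Replace Euclidean $d$ by Geodesic $d$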
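
The slide's recipe (replace Euclidean distance by geodesic distance) can be sketched on the unit sphere, where the geodesic distance is the great-circle angle. This is a minimal illustration, not the method used for the M-rep data: the sphere, the fixed-point iteration, and the names `sphere_log`, `sphere_exp` and `frechet_mean` are all assumptions made for the example.

```python
import numpy as np

def sphere_log(p, x):
    """Log map on the unit sphere: tangent vector at p whose length is the geodesic distance to x."""
    cos_theta = np.clip(np.dot(p, x), -1.0, 1.0)
    v = x - cos_theta * p            # part of x orthogonal to p
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:               # x coincides with p (antipodal points are not handled)
        return np.zeros_like(p)
    return np.arccos(cos_theta) * v / norm_v

def sphere_exp(p, v):
    """Exp map on the unit sphere: walk from p along the geodesic with initial velocity v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def frechet_mean(points, n_iter=100, tol=1e-12):
    """Fixed-point iteration for argmin_x sum_i d(X_i, x)^2 with geodesic distance d."""
    mu = points.mean(axis=0)
    mu = mu / np.linalg.norm(mu)     # start at the projected Euclidean mean
    for _ in range(n_iter):
        step = np.mean([sphere_log(mu, x) for x in points], axis=0)
        mu = sphere_exp(mu, step)
        if np.linalg.norm(step) < tol:
            break
    return mu

# Three points on the equator at longitudes 0.1, 0.2 and 0.6 radians.
angles = np.array([0.1, 0.2, 0.6])
points = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(3)])
mu = frechet_mean(points)
print(mu)
```

Because the three points lie on one short geodesic arc, the result should sit on the equator at longitude 0.3, the ordinary mean of the angles.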

  5. Mildly Non-Euclidean Spaces Useful View of Manifold Data: Tangent Space Center: Fréchet Mean Reason for terminology “mildly non Euclidean”

  6. Mildly Non-Euclidean Spaces Analog of PCA? Principal geodesics: • Replace line that best fits data • By geodesic that best fits the data • Implemented as PCA in tangent space • But mapped back to surface • Fletcher (2004) Ja-Yeon Jeong will demo in: Bladder – Prostate – Rectum example
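
The "PCA in tangent space, mapped back to surface" idea can be sketched on the unit sphere. This is only an illustration under stated assumptions: the sphere stands in for the M-rep manifold, the helper names (`sphere_log`, `sphere_exp`, `frechet_mean`, `tangent_pca`, `principal_geodesic`) are hypothetical, and the code is not Fletcher's implementation.

```python
import numpy as np

def sphere_log(p, x):
    """Tangent vector at p pointing toward x, with length equal to the geodesic distance."""
    cos_theta = np.clip(np.dot(p, x), -1.0, 1.0)
    v = x - cos_theta * p
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return np.zeros_like(p)
    return np.arccos(cos_theta) * v / norm_v

def sphere_exp(p, v):
    """Point reached from p by following the geodesic with initial velocity v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def frechet_mean(points, n_iter=100, tol=1e-12):
    """Fixed-point iteration for the geodesic Frechet mean."""
    mu = points.mean(axis=0)
    mu = mu / np.linalg.norm(mu)
    for _ in range(n_iter):
        step = np.mean([sphere_log(mu, x) for x in points], axis=0)
        mu = sphere_exp(mu, step)
        if np.linalg.norm(step) < tol:
            break
    return mu

def tangent_pca(points):
    """PCA of the data after mapping it into the tangent plane at the Frechet mean."""
    mu = frechet_mean(points)
    tangent = np.array([sphere_log(mu, x) for x in points])
    # The log-mapped data average to (nearly) zero at the Frechet mean,
    # so an SVD without further centering gives the principal directions.
    _, svals, directions = np.linalg.svd(tangent, full_matrices=False)
    return mu, directions, svals ** 2 / len(points)

def principal_geodesic(mu, direction, t):
    """Map the tangent-space line t * direction back to the surface."""
    return sphere_exp(mu, t * direction)

# Toy data: spread along the equator, with a small alternating wobble in height.
angles = np.linspace(-0.5, 0.5, 11)
heights = np.where(np.arange(11) % 2 == 0, 0.02, -0.02)
raw = np.column_stack([np.cos(angles), np.sin(angles), heights])
points = raw / np.linalg.norm(raw, axis=1, keepdims=True)

mu, directions, variances = tangent_pca(points)
curve = np.array([principal_geodesic(mu, directions[0], t) for t in (-0.3, 0.0, 0.3)])
print(directions[0], variances)
```

For this toy data the first principal direction should point along the equator, and every point of `curve` stays on the sphere, which is what "mapped back to surface" buys over plain tangent-plane PCA.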

  7. Mildly Non-Euclidean Spaces Interesting Open Problems: • Fully geodesic PGA? • E.g. data “just north of equator” on sphere • Gaussian Distribution on Manifold? • Analog of Covariance? • Simulation on Manifold?

  8. Mildly Non-Euclidean Spaces Aside: There is a mathematical statistics literature on “data on manifolds” • Ruymgaart (1989) • Hendriks, Janssen & Ruymgaart (1992) • Lee & Ruymgaart (1996) • Kim (1998) • Bhattacharya & Patrangenaru (2003) …

  9. Strongly Non-Euclidean Spaces Trees as Data Objects From Graph Theory: • Graph is set of nodes and edges • Tree has root and direction Data Objects: set of trees

  10. Strongly Non-Euclidean Spaces Motivating Example: • Blood Vessel Trees in Brains • From Dr. Elizabeth Bullitt • Segmented from MRIs • Very complex structure • Want to study population of trees • Data Objects are trees

  11. Strongly Non-Euclidean Spaces Real blood vessel trees (one person)

  12. Strongly Non-Euclidean Spaces Real blood vessel trees (one person)

  13. Strongly Non-Euclidean Spaces Real blood vessel trees (one person)

  14. Strongly Non-Euclidean Spaces Real blood vessel trees (one person)

  15. Strongly Non-Euclidean Spaces Real blood vessel trees (one person)

  16. Strongly Non-Euclidean Spaces Statistics on Population of Tree-Structured Data Objects? • Mean??? • Analog of PCA??? Strongly non-Euclidean, since: • Space of trees not a linear space • Not even approximately linear (no tangent plane)

  17. Strongly Non-Euclidean Spaces Mean of Population of Tree-Structured Data Objects? Natural approach: Fréchet mean Requires a metric (distance) On tree space
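
To make "Fréchet mean requires only a metric" concrete, here is a small sketch that needs nothing but a distance function. Everything specific in it is an assumption for illustration: trees are encoded as sets of node indices (binary tree, level-order numbering, root = 1), the toy distance counts nodes present in one tree but not the other, and the minimization is restricted to the sample itself. It is a stand-in and should not be read as the metric or the mean defined by Wang and Marron (2004).

```python
def frechet_mean_in_sample(objects, dist):
    """Return the sample member minimizing the sum of squared distances to all members."""
    return min(objects, key=lambda c: sum(dist(c, x) ** 2 for x in objects))

def toy_tree_distance(t1, t2):
    """Hypothetical structure-only distance: size of the symmetric difference of node sets."""
    return len(t1 ^ t2)

# Four small trees; node 6 is a child of node 3 in level-order numbering.
trees = [
    frozenset({1, 2, 3}),
    frozenset({1, 2}),
    frozenset({1, 2, 3, 6}),
    frozenset({1, 3}),
]

center = frechet_mean_in_sample(trees, toy_tree_distance)
print(sorted(center))
```

No addition, scaling, or tangent plane is used anywhere, which is why this construction survives in a strongly non-Euclidean space.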

  18. Strongly Non-Euclidean Spaces Appropriate metrics on tree space: Wang and Marron (2004) • Depends on: • Tree structure • And nodal attributes • Won’t go further here • But gives appropriate Fréchet mean

  19. Strongly Non-Euclidean Spaces PCA on Tree Space? Key Ideas: • Replace 1-d subspace that best approximates data • By 1-d representation that best approximates data Wang and Marron (2004) define notion of Treeline (in structure space)

  20. Strongly Non-Euclidean Spaces PCA on Tree Space? Also useful to consider 1-d representations In the space of nodal attributes. Simple Example: Blood vessel trees • Just 4 nodes & simplified to sticks • For computational tractability

  21. Strongly Non-Euclidean Spaces 4 node Blood vessel trees - Raw Data

  22. Strongly Non-Euclidean Spaces First PC: Note flipping of root Some images were upside down

  23. Strongly Non-Euclidean Spaces First PC projection plot: Shows all data at one end or other, Nobody near the middle, Where tree was degenerate in movie

  24. Strongly Non-Euclidean Spaces Proposed applications in M-rep world: • Multifigural objects with some figures missing • Multi-object images with some objects missing • … Toy Example: hands with missing fingers

  25. Return to Big Picture Main statistical goals of OODA: • Understanding population structure • PCA, PGA, … • Classification (i.e. Discrimination) • Understanding 2+ populations • Time Series of Data Objects • Chemical Spectra, Mortality Data

  26. Classification - Discrimination Background: Two Class (Binary) version: Using “training data” from Class +1 and Class -1 Develop a “rule” for assigning new data to a Class Canonical Example: Disease Diagnosis • New Patients are “Healthy” or “Ill” • Determined based on measurements

  27. Classification - Discrimination Next time: go into Classification vs. Clustering Supervised vs. Un-Supervised Learning As now done on 10/25/05

  28. Classification - Discrimination Terminology: For statisticians, these are synonyms For biologists, classification means: • Constructing taxonomies • And sorting organisms into them (maybe this is why discrimination was used, until politically incorrect…)

  29. Classification (i.e. discrimination) There are a number of: • Approaches • Philosophies • Schools of Thought Too often cast as: Statistics vs. EE - CS

  30. Classification (i.e. discrimination) EE – CS variations: • Pattern Recognition • Artificial Intelligence • Neural Networks • Data Mining • Machine Learning

  31. Classification (i.e. discrimination) Differing Viewpoints: Statistics • Model Classes with Probability Distribut’ns • Use to study class diff’s & find rules EE – CS • Data are just Sets of Numbers • Rules distinguish between these Current thought: combine these

  32. Classification (i.e. discrimination) Important Overview Reference: Duda, Hart and Stork (2001) • Too much about neural nets??? • Pizer disagrees… • Update of Duda & Hart (1973)

  33. Classification Basics Personal Viewpoint: Point Clouds

  34. Classification Basics Simple and Natural Approach: Mean Difference a.k.a. Centroid Method Find “skewer through two meatballs”

  35. Classification Basics For Simple Toy Example: Project On MD & split at center
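
A minimal sketch of the Mean Difference (centroid) rule: project onto the line through the two class means and split at the midpoint. The function names and the toy data are assumptions for the example, and data vectors are stored as rows.

```python
import numpy as np

def fit_mean_difference(X_pos, X_neg):
    """Return the MD direction (the 'skewer') and the midpoint between the class means."""
    mean_pos = X_pos.mean(axis=0)
    mean_neg = X_neg.mean(axis=0)
    return mean_pos - mean_neg, (mean_pos + mean_neg) / 2

def classify(X_new, direction, midpoint):
    """Assign +1 or -1 according to the side of the midpoint along the MD direction."""
    scores = (X_new - midpoint) @ direction
    return np.where(scores >= 0, 1, -1)

# Two well-separated toy clouds.
X_pos = np.array([[2.0, 0.0], [3.0, 1.0], [4.0, -1.0]])
X_neg = np.array([[-2.0, 0.0], [-3.0, 1.0], [-4.0, -1.0]])

direction, midpoint = fit_mean_difference(X_pos, X_neg)
labels = classify(np.array([[1.0, 5.0], [-0.5, -2.0]]), direction, midpoint)
print(direction, midpoint, labels)
```

Here the class means are (3, 0) and (-3, 0), so only the first coordinate of a new point matters.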

  36. Classification Basics Why not use PCA? Reasonable Result? Doesn’t use class labels… • Good? • Bad?

  37. Classification Basics Harder Example (slanted clouds):

  38. Classification Basics PCA for slanted clouds: PC1 terrible PC2 better? Still misses right dir’n Doesn’t use Class Labels

  39. Classification Basics Mean Difference for slanted clouds: A little better? Still misses right dir’n Want to account for covariance

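  40. Classification Basics Mean Difference & Covariance, Simplest Approach: Rescale (standardize) coordinate axes, i.e. replace each entry $x_{ij}$ of the (full) data matrix by $x_{ij} / s_i$, where $s_i$ is the sample standard deviation of coordinate $i$. Then do Mean Difference Called “Naïve Bayes Approach”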
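
A sketch of "standardize, then Mean Difference" next to plain Mean Difference. Assumptions for the example: data vectors are rows, the scale of each coordinate is its standard deviation over the full data ignoring class labels (a within-class standard deviation is a common alternative), and the function names are hypothetical.

```python
import numpy as np

def fit_mean_difference(X_pos, X_neg):
    """MD direction and midpoint between the two class means."""
    mean_pos, mean_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    return mean_pos - mean_neg, (mean_pos + mean_neg) / 2

def predict(X, direction, midpoint):
    """+1 or -1 by the side of the midpoint along the given direction."""
    return np.where((X - midpoint) @ direction >= 0, 1, -1)

# Toy data: coordinate 1 separates the classes, coordinate 2 is mostly noise.
X_pos = np.array([[1.0, 10.0], [1.0, -6.0]])
X_neg = np.array([[-1.0, 6.0], [-1.0, -10.0]])
X_all = np.vstack([X_pos, X_neg])

# Plain Mean Difference on the raw coordinates.
w_raw, mid_raw = fit_mean_difference(X_pos, X_neg)
labels_raw = predict(X_all, w_raw, mid_raw)

# Rescale each coordinate by its standard deviation, then Mean Difference.
scale = X_all.std(axis=0)            # assumption: full-data sd, class labels ignored
w_std, mid_std = fit_mean_difference(X_pos / scale, X_neg / scale)
labels_std = predict(X_all / scale, w_std, mid_std)

print(labels_raw, labels_std)
```

On this toy data the raw rule is dominated by the noisy second coordinate and mislabels two training points, while the rescaled rule recovers all four. Rescaling only changes variances, so it cannot help when the problem is correlation between coordinates.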

  41. Classification Basics Problem with Naïve Bayes: Only adjusts Variances Not Covariances Doesn’t solve this problem

  42. Classification Basics Better Solution: Fisher Linear Discrimination Gets the right dir’n How does it work?

  43. Fisher Linear Discrimination Other common terminology (for FLD): Linear Discriminant Analysis (LDA)

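  44. Fisher Linear Discrimination Careful development: Useful notation (data vectors of length $d$): Class +1: $X_1^{(+1)}, \ldots, X_{n_{+1}}^{(+1)}$ Class -1: $X_1^{(-1)}, \ldots, X_{n_{-1}}^{(-1)}$ Centerpoints: $\bar{X}^{(+1)} = \frac{1}{n_{+1}} \sum_{i=1}^{n_{+1}} X_i^{(+1)}$ and $\bar{X}^{(-1)} = \frac{1}{n_{-1}} \sum_{i=1}^{n_{-1}} X_i^{(-1)}$

  45. Fisher Linear Discrimination Covariances, for $k = +1, -1$ (outer products): $\hat{\Sigma}^{(k)} = \tilde{X}^{(k)} \tilde{X}^{(k)T}$ Based on centered, normalized data matrices: $\tilde{X}^{(k)} = \frac{1}{\sqrt{n_k}} \left( X_1^{(k)} - \bar{X}^{(k)}, \ldots, X_{n_k}^{(k)} - \bar{X}^{(k)} \right)$ Note: use “MLE” version of estimated covariance matrices, for simpler notation

  46. Fisher Linear Discrimination Major Assumption: Class covariances are the same (or “similar”) Like this: Not this:

  47. Fisher Linear Discrimination Good estimate of (common) within class cov? Pooled (weighted average) within class cov: $\hat{\Sigma}^{w} = \frac{n_{+1}}{n} \hat{\Sigma}^{(+1)} + \frac{n_{-1}}{n} \hat{\Sigma}^{(-1)} = \tilde{X} \tilde{X}^{T}$, with $n = n_{+1} + n_{-1}$, based on the combined full data matrix: $\tilde{X} = \frac{1}{\sqrt{n}} \left( X_1^{(+1)} - \bar{X}^{(+1)}, \ldots, X_{n_{+1}}^{(+1)} - \bar{X}^{(+1)}, X_1^{(-1)} - \bar{X}^{(-1)}, \ldots, X_{n_{-1}}^{(-1)} - \bar{X}^{(-1)} \right)$

  48. Fisher Linear Discrimination Note: $\hat{\Sigma}^{w}$ is similar to $\hat{\Sigma}$ from before I.e. covariance matrix ignoring class labels Important Difference: Class by Class Centering Will be important later

  49. Fisher Linear Discrimination Simple way to find “correct cov. adjustment”: Individually transform subpopulations so “spherical” about their means For $k = +1, -1$ define $Y_i^{(k)} = \left( \hat{\Sigma}^{w} \right)^{-1/2} X_i^{(k)}$, $i = 1, \ldots, n_k$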

  50. Fisher Linear Discrimination Then: In Transformed Space, Best separating hyperplane is Perpendicular bisector of line between means
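
The construction on these slides (class-by-class centering, pooled within-class covariance, sphering, then the perpendicular bisector between the transformed means) can be sketched as follows. Assumptions for the example: data vectors are rows, the "MLE" divisor $n$ is used for the pooled covariance, the pooled covariance is invertible, and `fit_fld` and `classify` are hypothetical names. In the original coordinates the sphered mean-difference rule has normal vector $(\hat{\Sigma}^{w})^{-1} (\bar{X}^{(+1)} - \bar{X}^{(-1)})$ and passes through the midpoint of the class means.

```python
import numpy as np

def fit_fld(X_pos, X_neg):
    """Fisher Linear Discrimination via sphering with the pooled within-class covariance."""
    n = len(X_pos) + len(X_neg)
    mean_pos, mean_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Class-by-class centering, then the pooled covariance with divisor n.
    centered = np.vstack([X_pos - mean_pos, X_neg - mean_neg])
    sigma_w = centered.T @ centered / n
    # Symmetric inverse square root from the eigendecomposition (needs sigma_w invertible).
    evals, evecs = np.linalg.eigh(sigma_w)
    root_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Mean difference in the sphered space, mapped back to the original coordinates.
    w = root_inv @ (root_inv @ (mean_pos - mean_neg))
    midpoint = (mean_pos + mean_neg) / 2
    return w, midpoint

def classify(X_new, w, midpoint):
    """+1 or -1 by the side of the separating hyperplane through the midpoint."""
    return np.where((X_new - midpoint) @ w >= 0, 1, -1)

# Slanted toy clouds: same shape, shifted up and down by one unit.
base = np.array([[-3.0, -3.0], [3.0, 3.0], [-0.5, 0.5], [0.5, -0.5]])
X_pos = base + np.array([0.0, 1.0])
X_neg = base - np.array([0.0, 1.0])

w, midpoint = fit_fld(X_pos, X_neg)
labels_fld = classify(np.vstack([X_pos, X_neg]), w, midpoint)

# Plain Mean Difference on the same data, for comparison.
md = X_pos.mean(axis=0) - X_neg.mean(axis=0)
labels_md = classify(np.vstack([X_pos, X_neg]), md, midpoint)
print(w, labels_fld, labels_md)
```

On this toy data the FLD direction is close to (-1, 1), roughly perpendicular to the slant of the clouds, and separates all eight training points, while the Mean Difference direction (0, 2) mislabels the points at the far ends of each cloud.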
