Principal Component Analysis of Tree Topology


Presentation Transcript


  1. Yongdai Kim, Seoul National University, June 5, 2011. Principal Component Analysis of Tree Topology. Presented by J. S. Marron, SAMSI

  2. Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots

  3. Brain Data: Mean − 2σ₁ PC1 • Careful about values < 0
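
Walking two standard deviations down PC1 from the mean can produce negative branch lengths, which are meaningless for trees. A minimal numpy sketch of the issue; the data matrix, the clipping step, and all names here are made up for illustration, not taken from the talk:

```python
import numpy as np

# Hypothetical branch-length data (rows = trees, columns = branches);
# a tiny made-up stand-in, not the brain data from the talk.
X = np.array([[1.0, 0.2, 0.0],
              [0.8, 0.0, 0.1],
              [1.2, 0.4, 0.0],
              [0.9, 0.1, 0.2]])

mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)  # PCA via SVD
pc1 = Vt[0]                          # first PC direction (unit vector)
sigma1 = s[0] / np.sqrt(X.shape[0])  # std. dev. of scores along PC1

proj = mu - 2 * sigma1 * pc1         # "mean - 2 sigma_1 PC1"
print(proj)                          # at least one branch length is negative
clipped = np.maximum(proj, 0)        # crude fix: clip at zero
```

Clipping restores validity but distorts the geometry, which is part of what motivates the alternatives discussed on the following slides.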

  4. Interpretation: Directions Leave Positive Orthant • (pic here)

  5. Visualize Trees Important Note: Tendency Towards Large Flat Spots And Bursts of Nearby Branches

  6. Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model

  7. Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model Discussed by Dan

  9. Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model Discussed by Lingsong

  10. Non-neg’ve Matrix Factorization • Ideas: • Linearly Approx. Data (as in PCA) • But Stay in Positive Orthant
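
The idea on this slide, approximating the data linearly while staying in the positive orthant, can be illustrated with the classic Lee–Seung multiplicative updates. This is a generic NMF sketch on random data, not the method or data from the talk:

```python
import numpy as np

# Minimal NMF sketch: approximate a nonnegative matrix X ~ W @ H while
# every factor entry stays >= 0, so reconstructions never leave the
# positive orthant (unlike PCA projections).
rng = np.random.default_rng(0)
X = rng.random((20, 8))          # hypothetical nonnegative data
r = 3                            # number of components
W = rng.random((20, r)) + 0.1
H = rng.random((r, 8)) + 0.1
eps = 1e-9                       # guards against division by zero

err0 = np.linalg.norm(X - W @ H)
for _ in range(200):             # Lee-Seung multiplicative updates
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
err = np.linalg.norm(X - W @ H)

recon = W @ H                    # nonnegative by construction
```

Because the updates are multiplicative, entries that start nonnegative stay nonnegative, which is exactly the "stay in the positive orthant" property the slide highlights.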

  11. Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model

  12. Contents • Introduction • Proposed Method • Bayesian Factor Model • PCA • Estimation of Projected Trees

  13. Introduction • Given data trees, let x1, …, xn be branch length vectors • Dimension p = # nodes in support (union) tree • For tree ti, define tree topology vector yi, a p-dimensional binary vector where yij = 1 if node j appears in ti and yij = 0 otherwise • Goal: PCA method for y1, …, yn
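
The setup on this slide can be made concrete with a small sketch. The tiny trees and helper below are mine, not from the talk; each tree is reduced to its set of node labels, and the support (union) tree's nodes index a p-dimensional binary topology vector:

```python
# Hypothetical tiny trees, each represented by its set of node labels.
trees = [{"root", "L", "R"},
         {"root", "L", "LL"},
         {"root", "R"}]

# Nodes of the support (union) tree; p = its node count.
support = sorted(set().union(*trees))
p = len(support)

def topology_vector(tree):
    """y_ij = 1 if node j of the support tree appears in this tree."""
    return [1 if node in tree else 0 for node in support]

Y = [topology_vector(t) for t in trees]  # the binary vectors fed to PCA
```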

  14. Visualize Trees Important Note: Tendency Towards Large Flat Spots And Bursts of Nearby Branches

  15. Goal of Bayes Factor Model Model Large Flat Spots as yi = 0

  16. Proposed Method • Gaussian Latent Variable Model • Est. Corr. Matrix: Bayes Factor Model • PCA on Est’ed Correlation Matrix • Interpret in Tree Space

  17. Proposed Method • Gaussian Latent Variable Model
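
As I read this slide, the latent variable model says a binary entry is 1 exactly when an underlying Gaussian coordinate is positive. A simulation sketch of that mechanism; the covariance here is an arbitrary positive-definite stand-in, not the model fitted in the talk:

```python
import numpy as np

# Gaussian latent variable model for binary topology vectors (my notation):
# y_ij = 1 if the latent Gaussian z_ij exceeds 0, else 0.
rng = np.random.default_rng(1)
p = 5
mu = np.zeros(p)
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)   # arbitrary positive-definite covariance

z = rng.multivariate_normal(mu, Sigma, size=100)  # latent Gaussians
y = (z > 0).astype(int)                           # observed binary vectors
```

The correlation structure of z induces the dependence among the binary entries, which is what the Bayesian factor model on the next slides estimates.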

  18. Proposed Method • 2. Estimation of the correlation matrix by Bayesian factor model • Estimate the mean vector and correlation matrix of the latent Gaussian variables by the Bayesian factor model

  19. Proposed Method • 3. PCA with an estimated correlation matrix • Apply PCA to the estimated correlation matrix
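
Once a correlation matrix of the latent Gaussians has been estimated, PCA is just its eigendecomposition. A sketch with a made-up 3×3 correlation matrix standing in for the estimated one:

```python
import numpy as np

# Made-up stand-in for the estimated correlation matrix R.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)       # ascending order
order = np.argsort(eigvals)[::-1]          # reorder: largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()        # scree-plot proportions
pc1 = eigvecs[:, 0]                        # first PC direction
```

The `explained` proportions are what the scree plot on slide 32 displays.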

  20. Proposed Method • 4. Estimation of projected trees • Define projected trees on PCA directions • Estimate the projected trees by an MCMC algorithm
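
The "projected trees" idea can be caricatured by thresholding points along a PC direction back into binary topology vectors. The talk estimates projected trees by an MCMC algorithm; this sketch replaces that with naive thresholding of made-up latent means, purely to show how nodes switch on and off as one moves along PC1 (compare slides 36–45):

```python
import numpy as np

# Hypothetical latent means and PC1 direction; not the talk's estimates.
mu = np.array([2.0, 0.5, -0.3, -1.5])
pc1 = np.array([0.1, 0.7, 0.7, 0.1])

def projected_tree(c):
    """Binary topology vector of the point mu + c * pc1."""
    return (mu + c * pc1 > 0).astype(int)

for c in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(c, projected_tree(c))   # nodes turn on as c increases
```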

  21. Bayesian Factor Model • Model • Priors • MCMC algorithm • Convergence diagnostic

  22. Bayesian Factor Model • Model

  23. Bayesian Factor Model • Prior • This prior was proposed by Ghosh and Dunson (2009)

  24. Bayesian Factor Model • MCMC algorithm • Notation • Step 1. Generate the first block of parameters from its full conditional distribution

  25. Bayesian Factor Model • MCMC algorithm • Step 2. Generate the second block of parameters from its full conditional distribution

  26. Bayesian Factor Model • MCMC algorithm • Step 3. Generate the third block of parameters from its full conditional distribution

  27. Bayesian Factor Model • MCMC algorithm • Step 4. Generate the fourth block of parameters from its full conditional distribution

  28. Bayesian Factor Model • MCMC algorithm • Step 5. Generate the final block of parameters from its full conditional distribution
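
The exact full conditionals on these slides did not survive transcription, but the overall structure — sweep through Steps 1–5 drawing each block from its full conditional, then burn in and thin — is standard Gibbs sampling. A toy two-block Gibbs sampler (bivariate normal target, not the talk's factor model) showing that structure:

```python
import numpy as np

# Toy Gibbs sampler: target is a bivariate normal with correlation rho.
# Each step draws one block from its full conditional, mirroring the
# step-by-step sweeps described on the slides above.
rng = np.random.default_rng(2)
rho = 0.8
x, y = 0.0, 0.0
samples = []

for it in range(11000):
    # Step 1: x | y ~ N(rho * y, 1 - rho^2)
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    # Step 2: y | x ~ N(rho * x, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    if it >= 1000 and it % 10 == 0:   # burn-in, then thin
        samples.append((x, y))

samples = np.array(samples)           # 1000 retained draws
```

Burn-in discards early draws that still depend on the starting point; thinning reduces autocorrelation among the retained samples, which is what the convergence diagnostics on the next slides check.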

  29. Bayesian Factor Model • Convergence diagnostic • 100,000 MCMC iterations after 10,000 burn-in iterations • 1,000 posterior samples, one retained every 100 iterations • Trace plots, ACFs (autocorrelation functions), and histograms of three selected parameters and one additional selected parameter
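
Of the diagnostics listed on this slide, the ACF is the one worth sketching in code. The chain below is a simulated AR(1) stand-in, not the talk's actual posterior draws:

```python
import numpy as np

# Simulated AR(1) chain standing in for a sequence of posterior draws.
rng = np.random.default_rng(3)
chain = np.zeros(1000)
for t in range(1, 1000):
    chain[t] = 0.5 * chain[t - 1] + rng.standard_normal()

def acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

lags = acf(chain, 20)   # lags[0] == 1; decays roughly like 0.5**k here
```

A rapidly decaying ACF, together with stable trace plots and unimodal histograms, is the usual evidence that the chain has mixed well.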

  30. Bayesian Factor Model • Convergence diagnostic: three selected parameters • 100,000 MCMC iterations after 10,000 burn-in iterations • 1,000 posterior samples, one retained every 100 iterations • Trace plots, ACFs, and histograms of the three selected parameters

  31. Bayesian Factor Model • Convergence diagnostic: an additional parameter • 100,000 MCMC iterations after 10,000 burn-in iterations • 1,000 posterior samples, one retained every 100 iterations • Trace plots, ACFs, and histograms of the three selected parameters (25%, 50%, 75%) and the additional parameter

  32. PCA • Scree plot

  33. Visualizing Modes of Variation

  34. Visualizing Modes of Variation

  35. Visualizing Modes of Variation

  36. Center Point, μ

  37. Approximately μ + 0.5 PC1

  38. Approximately μ + 1.0 PC1

  39. Approximately μ + 1.5 PC1

  40. Approximately μ + 2.0 PC1

  41. Center Point, μ

  42. Approximately μ - 0.5 PC1

  43. Approximately μ - 1.0 PC1

  44. Approximately μ - 1.5 PC1

  45. Approximately μ - 2.0 PC1

  46. Visualizing Modes of Variation • Hard to Interpret • Scaling Issues? • Promising and Intuitive • Work in Progress … • Future goals • Improved Notion of PCA • Tune Bayes Approach for Better Interpretation • Integrate with Non-Negative Matrix Factorization • …
