Yongdai Kim, Seoul National University, 2011. 6. 5. Principal Component Analysis of Tree Topology. Presented by J. S. Marron, SAMSI. Dyck Path Challenges: Data trees not like PC projections; Branch Lengths ≥ 0; Big flat spots. Brain Data: Mean – 2 σ1 PC1.
Yongdai Kim Seoul National University 2011. 6. 5 Principal Component Analysis of Tree Topology Presented by J. S. Marron, SAMSI
Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots
Brain Data: Mean – 2 σ1 PC1 Careful about values < 0
Interpret’n: Directions Leave Positive Orthant • (pic here)
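The point of the two slides above can be checked numerically. The sketch below is an illustration only: the toy data matrix is hypothetical (it is not the brain data), and the first PC is found by plain power iteration on the sample covariance. It shows that Mean – 2 σ1 PC1 can have negative coordinates even though every observation is nonnegative.

```python
import math

def first_pc(rows, iters=500):
    """Return (mean, sigma1, pc1) using power iteration on the sample covariance."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - mean[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient is the variance along PC1 (leading eigenvalue).
    eigval = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return mean, math.sqrt(eigval), v

# Hypothetical toy data: nonnegative "branch lengths" with many zeros.
rows = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [6.0, 5.0, 2.0],
    [8.0, 4.0, 3.0],
]

mean, sigma1, pc1 = first_pc(rows)
low = [m - 2 * sigma1 * d for m, d in zip(mean, pc1)]
print("mean - 2*sigma1*PC1 =", [round(x, 3) for x in low])
print("leaves positive orthant:", any(x < 0 for x in low))
```

Because the data pile up at zero, two standard deviations along PC1 is more than the distance from the mean to the boundary of the orthant.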
Visualize Trees Important Note: Tendency Towards Large Flat Spots And Bursts of Nearby Branches
Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model
Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model Discussed by Dan
Dyck Path Challenges • Data trees not like PC projections • Branch Lengths ≥ 0 • Big flat spots • Alternate Approaches: • Branch Length Representation • Tree Pruning • Non-negative Matrix Factorization • Bayesian Factor Model Discussed by Lingsong
Non-neg’ve Matrix Factorization • Ideas: • Linearly Approx. Data (as in PCA) • But Stay in Positive Orthant
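A minimal sketch of the idea, for illustration: approximate a nonnegative data matrix X by a product W H whose factors are themselves nonnegative. The multiplicative update rule used here is the standard Lee and Seung scheme, swapped in as an assumption; the slides do not say which NMF algorithm was used. The toy matrix, the rank, and the function names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(x, w, h):
    """Squared Frobenius norm of X - W H."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Multiplicative updates; W and H stay nonnegative at every step."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(p)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(p)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical nonnegative data with flat (zero) spots.
x = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [2.0, 1.0, 3.0, 1.0],
]

w0, h0 = nmf(x, rank=2, iters=0)  # random starting point, same seed
w, h = nmf(x, rank=2)
print("error at start:", round(frob_error(x, w0, h0), 4))
print("error after updates:", round(frob_error(x, w, h), 4))
```

Each update multiplies the current factor by a ratio of nonnegative quantities, so the approximation is linear as in PCA but never leaves the positive orthant.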
Contents • Introduction • Proposed Method • Bayesian Factor Model • PCA • Estimation of Projected Trees
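Introduction • Given data trees, let each tree have a branch length vector. • Dimension p = # nodes in support (union) tree. • For each tree, define the tree topology vector, a p-dimensional binary vector where each entry indicates whether the corresponding node of the support tree is present in that tree. • Goal: PCA method for the tree topology vectors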
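The construction above can be sketched in a few lines. This is an illustration under stated assumptions: each tree is represented as a dictionary from a node label to a branch length, the node labels are hypothetical, and a node absent from a tree is coded as branch length 0 in the branch length vector.

```python
def support_nodes(trees):
    """Sorted union of node labels over all trees (node set of the support tree)."""
    return sorted(set().union(*(t.keys() for t in trees)))

def branch_length_vector(tree, support):
    """p-dimensional vector of branch lengths; absent nodes are coded as 0."""
    return [tree.get(node, 0.0) for node in support]

def topology_vector(tree, support):
    """p-dimensional binary vector: 1 if the node is in the tree, else 0."""
    return [1 if node in tree else 0 for node in support]

# Hypothetical data trees: node label -> branch length.
trees = [
    {"root": 1.0, "L": 0.5, "LL": 0.2},
    {"root": 1.2, "L": 0.4, "R": 0.7},
    {"root": 0.9},
]

support = support_nodes(trees)
p = len(support)
X = [branch_length_vector(t, support) for t in trees]
Y = [topology_vector(t, support) for t in trees]

print("support nodes:", support, "p =", p)
for x_row, y_row in zip(X, Y):
    print(x_row, y_row)
```

The rows of `Y` are the binary inputs on which a PCA of tree topology would operate.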
Goal of Bayes Factor Model Model Large Flat Spots as yi = 0
Proposed Method • Gaussian Latent Variable Model • Est. Corr. Matrix: Bayes Factor Model • PCA on Est’ed Correlation Matrix • Interpret in Tree Space
Proposed Method 1. Gaussian Latent Variable Model
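The model equations did not survive in this transcript, so the following is only a sketch of one common construction, not necessarily the one used in the talk: each binary entry is 1 when a correlated Gaussian latent variable is positive and 0 otherwise (a probit-type link). The two-dimensional setup, the value of `rho`, and the function name are assumptions for illustration.

```python
import math
import random

def sample_binary_pairs(rho, n, seed=0):
    """Draw n pairs (y1, y2) with y_j = 1 if latent z_j > 0, corr(z1, z2) = rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = e1
        z2 = rho * e1 + math.sqrt(1 - rho * rho) * e2  # correlated latent Gaussian
        out.append((1 if z1 > 0 else 0, 1 if z2 > 0 else 0))
    return out

pairs = sample_binary_pairs(rho=0.9, n=5000)
agree = sum(a == b for a, b in pairs) / len(pairs)
print("fraction of pairs with y1 == y2:", round(agree, 3))
```

Under this link, the correlation matrix of the latent Gaussian vector carries the dependence between nodes, which is why the later steps estimate that matrix and apply PCA to it.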
Proposed Method 2. Estimation of the correlation matrix by Bayesian factor model • Estimate the model parameters by Bayesian factor model
Proposed Method 3. PCA with an estimated correlation matrix • Apply PCA to the estimated correlation matrix
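As a sketch of this step: PCA on a correlation matrix amounts to an eigen-decomposition, with eigenvectors as PC directions and eigenvalues as the variances along them. The 3 x 3 matrix `R_hat` below is a hypothetical stand-in for an estimated correlation matrix, and power iteration with deflation is used only to keep the example dependency-free; any symmetric eigen-solver would do.

```python
import math
import random

def top_eigenpairs(sym, k, iters=1000, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix.

    Power iteration finds the leading pair; deflation removes it and repeats.
    """
    rng = random.Random(seed)
    p = len(sym)
    a = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.5 for _ in range(p)]  # generic start vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflate: subtract the component just found.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs

# Hypothetical estimated correlation matrix (unit diagonal, symmetric).
R_hat = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

pairs = top_eigenpairs(R_hat, k=3)
for lam, vec in pairs:
    print(round(lam, 4), [round(x, 4) for x in vec])
```

The eigenvalues of a correlation matrix sum to its dimension, which gives a quick sanity check on the output.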
Proposed Method 4. Estimation of projected trees • Define projected trees on PCA directions • Estimate the projected trees by MCMC algorithm
Bayesian Factor Model • Model • Priors • MCMC algorithm • Convergence diagnostic
Bayesian Factor Model • Model
Bayesian Factor Model • Prior • This prior was proposed by Ghosh and Dunson (2009)
Bayesian Factor Model • MCMC algorithm • Notation • Step 1. generate
Bayesian Factor Model • MCMC algorithm • Step 2. generate where and
Bayesian Factor Model • MCMC algorithm • Step 3. generate where and
Bayesian Factor Model • MCMC algorithm • Step 4. generate where and
Bayesian Factor Model • MCMC algorithm • Step 5. generate
Bayesian Factor Model • Convergence diagnostic • 100,000 iterations of the MCMC algorithm after 10,000 burn-in iterations • 1,000 posterior samples, one kept every 100 iterations • Trace plots, autocorrelation functions (ACF) and histograms of three selected parameters of one type and one selected parameter of a second type
Bayesian Factor Model • Convergence diagnostic: the three selected parameters • 100,000 iterations of the MCMC algorithm after 10,000 burn-in iterations • 1,000 posterior samples, one kept every 100 iterations • Trace plots, ACFs and histograms of the three selected parameters
Bayesian Factor Model • Convergence diagnostic: the single selected parameter • 100,000 iterations of the MCMC algorithm after 10,000 burn-in iterations • 1,000 posterior samples, one kept every 100 iterations • Trace plots, ACFs and histograms of the three selected parameters (25%, 50%, 75%) and the single selected parameter
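The burn-in and thinning schedule on these slides can be sketched as follows. The chain here is a stand-in (a strongly autocorrelated AR(1) sequence), not the Bayesian factor model sampler, whose update steps are not recoverable from this transcript; only the schedule of 10,000 burn-in iterations, 100,000 further iterations, and one sample kept every 100 is taken from the slides.

```python
import random

def ar1_chain(n_iter, phi=0.9, seed=0):
    """Stand-in for MCMC output: an AR(1) sequence with lag-1 correlation phi."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n_iter):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

def acf(xs, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(xs)
    m = sum(xs) / n
    c0 = sum((x - m) * (x - m) for x in xs)
    return [sum((xs[t] - m) * (xs[t + k] - m) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

BURN_IN, N_ITER, THIN = 10_000, 100_000, 100

chain = ar1_chain(BURN_IN + N_ITER)
post_burn = chain[BURN_IN:]            # drop the burn-in iterations
kept = post_burn[THIN - 1::THIN]       # keep every 100th draw

acf_raw = acf(post_burn, 5)
acf_kept = acf(kept, 5)
print("kept samples:", len(kept))
print("lag-1 ACF before thinning:", round(acf_raw[1], 3))
print("lag-1 ACF after thinning:", round(acf_kept[1], 3))
```

An ACF of the thinned draws that drops to near zero after lag 0 is the pattern one hopes to see in the ACF panels; a slowly decaying ACF would suggest a longer run or heavier thinning.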
PCA • Scree plot
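For reference, a scree plot shows the eigenvalues in decreasing order, usually alongside the share of total variance each component explains. The sketch below computes those quantities and prints a crude text version; the eigenvalues are hypothetical, not those of the talk's estimated correlation matrix.

```python
def scree_table(eigenvalues):
    """Rows of (component, eigenvalue, share of variance, cumulative share)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows, cum = [], 0.0
    for k, lam in enumerate(ordered, start=1):
        share = lam / total
        cum += share
        rows.append((k, lam, share, cum))
    return rows

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to 4).
eigenvalues = [2.0, 0.5, 1.0, 0.5]

table = scree_table(eigenvalues)
for k, lam, share, cum in table:
    bar = "#" * int(round(40 * share))  # text stand-in for the plotted bar
    print(f"PC{k}: {lam:5.2f}  {share:6.1%}  cum {cum:6.1%}  {bar}")
```

The number of components to interpret is usually read off where the curve flattens (the "elbow").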
Visualizing Modes of Variation • Hard to Interpret • Scaling Issues? • Promising and Intuitive • Work in Progress … • Future goals • Improved Notion of PCA • Tune Bayes Approach for Better Interpretation • Integrate with Non-Neg. Matrix Factorization • ……..