Incorporating uncertainty in distance-matrix phylogenetics • Wally Gilks, Leeds University • Tom Nye, Newcastle University • Pietro Liò, Cambridge University • Isaac Newton Institute, December 17, 2007
Distance-based methods • Scale to larger trees • Faster algorithms • Less model-dependent • Applicable to genome-scale evolutionary rearrangements
Agglomerative distance methods • NJ (Saitou and Nei, 1987) • BioNJ (Gascuel, 1997) • Weighbor (Bruno et al., 2000) • MVR (Gascuel, 2000) • FastME (Desper and Gascuel, 2004)
Variance models • Independent distances: Ordinary Least Squares (OLS) or Weighted Least Squares (WLS), as in NJ, Weighbor, FastME • Correlated distances, arising from: shared evolutionary paths (Chakraborty, 1977); computation from shared sequences (BioNJ); the estimation process itself (we show) • Correlated distances call for Generalised Least Squares (GLS): Hasegawa (1985), Bulmer (1991), MVR
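The three variance models differ only in the covariance matrix V they assume for the distances. As a minimal sketch (our own illustration, not StatTree), the function below estimates branch lengths on a hypothetical four-taxon tree by minimising (d - Xb)^T V^{-1} (d - Xb). Setting V = I gives OLS, a diagonal V gives WLS, and a full V gives GLS. The tree, the branch lengths, the noise level and the particular V matrices are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 4-taxon unrooted tree ((A,B),(C,D)): pendant branches a, b, c, d
# and internal branch e. Row i of X marks the branches on the path for PAIRS[i].
PAIRS = ["AB", "AC", "AD", "BC", "BD", "CD"]
X = np.array([
    # a  b  c  d  e
    [1, 1, 0, 0, 0],  # AB
    [1, 0, 1, 0, 1],  # AC
    [1, 0, 0, 1, 1],  # AD
    [0, 1, 1, 0, 1],  # BC
    [0, 1, 0, 1, 1],  # BD
    [0, 0, 1, 1, 0],  # CD
], dtype=float)


def least_squares_branches(d, V):
    """Minimise (d - X b)^T V^{-1} (d - X b) over branch lengths b.

    V = identity gives OLS, a diagonal V gives WLS, a full V gives GLS.
    """
    Vinv_X = np.linalg.solve(V, X)   # V^{-1} X
    Vinv_d = np.linalg.solve(V, d)   # V^{-1} d
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_d)


# Usage: noisy distances around assumed true branch lengths.
true_b = np.array([0.10, 0.20, 0.30, 0.10, 0.05])
rng = np.random.default_rng(1)
d = X @ true_b + rng.normal(0.0, 0.01, size=len(PAIRS))

V_ols = np.eye(len(PAIRS))
V_wls = np.diag(X @ true_b)                                     # variance proportional to distance
V_gls = X @ np.diag(true_b) @ X.T + 0.01 * np.eye(len(PAIRS))   # shared-path covariance + extra noise
for name, V in [("OLS", V_ols), ("WLS", V_wls), ("GLS", V_gls)]:
    print(name, np.round(least_squares_branches(d, V), 3))
```

With noise-free additive distances all three estimators return the true branch lengths; once noise is added their estimates differ, because each weights the distances according to its own assumed V.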
Two types of tree • Ultrametric time tree (axis: time, mya) • Non-ultrametric divergence tree (axis: divergence, from 0 toward more evolution) • Divergence = “true distance” = integrated rate of evolution = path length
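To make “divergence = integrated rate of evolution” concrete, here is a hedged sketch on a hypothetical three-taxon time tree with an assumed constant rate on each branch. Root-to-tip times are all equal (the time tree is ultrametric), but root-to-tip divergences are not, so the corresponding divergence tree is non-ultrametric; the divergence between two tips is the path length joining them. The tree, durations and rates are invented for illustration.

```python
# Hypothetical rooted time tree ((A:2, B:2):3, C:5); durations in mya.
# Each branch is named by its child node: child -> (parent, duration, assumed rate per mya).
BRANCHES = {
    "A": ("AB", 2.0, 0.010),
    "B": ("AB", 2.0, 0.025),
    "AB": ("root", 3.0, 0.015),
    "C": ("root", 5.0, 0.012),
}


def root_path(node):
    """Branches (named by child node) from node up to the root."""
    path = []
    while node != "root":
        path.append(node)
        node = BRANCHES[node][0]
    return path


def branch_divergence(child):
    """Integrated rate over the branch; with a constant rate this is rate * duration."""
    _, duration, rate = BRANCHES[child]
    return rate * duration


def root_to_tip_time(tip):
    return sum(BRANCHES[c][1] for c in root_path(tip))


def divergence(x, y):
    """Divergence ("true distance") = path length between x and y on the divergence tree."""
    return sum(branch_divergence(c) for c in set(root_path(x)) ^ set(root_path(y)))


for tip in "ABC":
    root_to_tip_div = sum(branch_divergence(c) for c in root_path(tip))
    print(tip, "time:", root_to_tip_time(tip), "divergence:", round(root_to_tip_div, 4))
print("divergence(A, B):", round(divergence("A", "B"), 4))
```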
Which tree type to assume? • An ultrametric tree makes stronger assumptions • Each type is estimated by different methods • But both types are, in principle, correct! • Our method integrates both types coherently • It produces a rooted tree, so no outgroup is needed
An agglomerative stage [figure: time tree (axis: time, mya) and divergence tree (axis: divergence), taxa A–E]
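For comparison, one agglomerative stage of classic neighbour joining (Saitou and Nei, 1987) can be sketched as below. This is NJ, not StatTree: it picks the pair minimising the Q-criterion, assigns branch lengths from that pair to a new node, and reduces the matrix by one taxon. The five-taxon distance matrix is a small illustrative example.

```python
import numpy as np


def nj_step(D, labels):
    """One agglomerative stage of neighbour joining (Saitou and Nei, 1987)."""
    n = len(labels)
    r = D.sum(axis=1)                              # row sums
    Q = (n - 2) * D - r[:, None] - r[None, :]      # NJ Q-criterion
    np.fill_diagonal(Q, np.inf)                    # never join a taxon with itself
    i, j = np.unravel_index(np.argmin(Q), Q.shape)
    # Branch lengths from i and j to their new parent node u
    bi = 0.5 * D[i, j] + (r[i] - r[j]) / (2 * (n - 2))
    bj = D[i, j] - bi
    # Distances from u to every remaining taxon: d_uk = (d_ik + d_jk - d_ij) / 2
    keep = [k for k in range(n) if k not in (i, j)]
    du = 0.5 * (D[i, keep] + D[j, keep] - D[i, j])
    m = len(keep)
    new_D = np.zeros((m + 1, m + 1))
    new_D[:m, :m] = D[np.ix_(keep, keep)]
    new_D[m, :m] = du
    new_D[:m, m] = du
    new_labels = [labels[k] for k in keep] + [f"({labels[i]},{labels[j]})"]
    return new_D, new_labels, (labels[i], bi), (labels[j], bj)


labels = ["a", "b", "c", "d", "e"]
D = np.array([
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
], dtype=float)

new_D, new_labels, branch_i, branch_j = nj_step(D, labels)
print(new_labels, branch_i, branch_j)
print(new_D)
```

For this matrix the first stage joins a with b (branch lengths 2 and 3), and the new node lies at distances 7, 7 and 6 from c, d and e.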
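Divergence additivity • Divergence equals path length on the divergence tree, so divergences add along paths; the additivity relations hold for the joined pair and for each X = C, D, … [figure: divergence tree, taxa A–E]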
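As a hedged illustration of what additivity means on a divergence tree, the sketch below builds a hypothetical five-taxon tree in which A and B join at an internal node we label u (the labels u, v, w and all branch lengths are our own assumptions). It checks that the divergence from A or B to any other taxon X splits exactly at u.

```python
# Hypothetical divergence tree on taxa A-E: A and B join at internal node u,
# and u, v, w are internal nodes. Branch lengths are divergences.
EDGES = {
    ("A", "u"): 0.10, ("B", "u"): 0.15, ("u", "v"): 0.05,
    ("C", "v"): 0.20, ("v", "w"): 0.08, ("D", "w"): 0.12, ("E", "w"): 0.09,
}

ADJ = {}
for (x, y), length in EDGES.items():
    ADJ.setdefault(x, []).append((y, length))
    ADJ.setdefault(y, []).append((x, length))


def path_length(src, dst):
    """Divergence between two nodes = sum of branch lengths on the unique path."""
    stack = [(src, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == dst:
            return dist
        for nxt, length in ADJ[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    raise ValueError(f"{dst} is not reachable from {src}")


# Additivity at u: d(A,B) = d(A,u) + d(u,B), and d(T,X) = d(T,u) + d(u,X)
# for T in {A, B} and every other taxon X = C, D, E.
checks = [abs(path_length("A", "B") - path_length("A", "u") - path_length("u", "B"))]
for X in "CDE":
    for T in "AB":
        checks.append(abs(path_length(T, X) - path_length(T, "u") - path_length("u", X)))
print("largest additivity violation:", max(checks))
```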
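Regression model • Distances are estimated divergences: each distance equals its divergence (the parameters) plus an error with mean zero • This holds for the joined pair and for each X = C, D, … [figure: divergence tree, taxa A–E]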
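A minimal sketch of such a regression at one agglomerative stage, under our own hypothetical notation: A and B join at node u, the parameters are the divergences A-u, B-u and u-X for X = C, D, E, and each observed distance is a sum of parameters plus mean-zero error. Ordinary least squares is used here only for simplicity; the slides use GLS. The true divergences and noise level are assumed values.

```python
import numpy as np

# Hypothetical stage: taxa A and B are joined at a new node u; the other taxa are C, D, E.
OTHERS = ["C", "D", "E"]
PARAMS = ["A-u", "B-u"] + [f"u-{x}" for x in OTHERS]


def design_matrix():
    """Rows: which divergence parameters make up each observed distance."""
    rows, names = [], []
    rows.append([1.0, 1.0] + [0.0] * len(OTHERS))     # d_AB = delta_Au + delta_Bu + error
    names.append("AB")
    for k, x in enumerate(OTHERS):
        u_x = [0.0] * len(OTHERS)
        u_x[k] = 1.0
        rows.append([1.0, 0.0] + u_x)                 # d_AX = delta_Au + delta_uX + error
        names.append("A" + x)
        rows.append([0.0, 1.0] + u_x)                 # d_BX = delta_Bu + delta_uX + error
        names.append("B" + x)
    return np.array(rows), names


design, obs_names = design_matrix()
true_delta = np.array([0.10, 0.15, 0.25, 0.25, 0.22])                   # assumed divergences
rng = np.random.default_rng(0)
d = design @ true_delta + rng.normal(0.0, 0.01, size=len(obs_names))    # mean-zero errors

# Ordinary least squares for simplicity; a GLS fit would weight by the error covariance.
delta_hat, *_ = np.linalg.lstsq(design, d, rcond=None)
print(dict(zip(PARAMS, np.round(delta_hat, 3))))
```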
Random effects model • Divergences are distorted times • The distortions are random effects: mean zero, uncorrelated [figure: time tree (axis: time, mya), taxa A–E, with the time parameter marked]
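The random-effects idea can be sketched by simulation. Every specific choice here is hypothetical, not the model from the talk: each branch's divergence is an assumed mean rate MU times its duration plus a Gaussian random effect with mean zero and variance NU × duration, independent across branches (a Gaussian is used for simplicity and can occasionally give negative branch divergences). Tips that share a branch inherit its random effect, so their root-to-tip divergences covary.

```python
import numpy as np

# Hypothetical time tree ((A:2, B:2):3, C:5); durations in mya.
DURATIONS = {"A": 2.0, "B": 2.0, "AB": 3.0, "C": 5.0}
PATHS = {"A": ["A", "AB"], "B": ["B", "AB"], "C": ["C"]}   # branches from tip to root

MU = 0.01     # hypothetical mean divergence per mya
NU = 1e-4     # hypothetical distortion parameter: Var(eta) = NU * duration


def simulate_branch_divergences(n_reps, rng):
    """Branch divergence = MU * t + eta, with eta mean zero and independent across branches."""
    return {br: MU * t + rng.normal(0.0, np.sqrt(NU * t), size=n_reps)
            for br, t in DURATIONS.items()}


rng = np.random.default_rng(42)
delta = simulate_branch_divergences(200_000, rng)
root_to_tip = {tip: sum(delta[br] for br in path) for tip, path in PATHS.items()}

for tip, values in root_to_tip.items():
    print(tip, "mean root-to-tip divergence:", round(values.mean(), 4))  # all near MU * 5
print("cov(A, B):", np.cov(root_to_tip["A"], root_to_tip["B"])[0, 1], "theory:", NU * 3.0)
print("cov(A, C):", np.cov(root_to_tip["A"], root_to_tip["C"])[0, 1], "theory:", 0.0)
```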
Variance assumptions • Two variance parameters: one controls noise, the other controls distortion • The variance depends on clade A structure (clade A size, shared node A) and on elapsed time • Chakraborty (1977), Nei et al. (1985), Bulmer (1991)
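The shared-path source of correlation (Chakraborty, 1977) can be illustrated with an assumed, simplified form: the covariance of two distances is a noise-scale parameter times the total length of the branches their paths share, and a variance is the same quantity with a path shared with itself. The sketch below builds that covariance matrix for a hypothetical four-taxon tree; SIGMA2, the tree and the branch lengths are illustrative assumptions, not the variance formula used by StatTree.

```python
# Hypothetical unrooted tree ((A,B),(C,D)): pendant branches a-d, internal branch e.
LENGTHS = {"a": 0.10, "b": 0.20, "c": 0.30, "d": 0.10, "e": 0.05}
PATH = {                         # branches on the path between each pair of taxa
    ("A", "B"): {"a", "b"},
    ("A", "C"): {"a", "e", "c"},
    ("A", "D"): {"a", "e", "d"},
    ("B", "C"): {"b", "e", "c"},
    ("B", "D"): {"b", "e", "d"},
    ("C", "D"): {"c", "d"},
}
SIGMA2 = 0.01                    # hypothetical noise-scale parameter


def shared_path_cov(p, q):
    """Assumed model: Cov(d_p, d_q) = SIGMA2 * total length of branches on both paths."""
    return SIGMA2 * sum(LENGTHS[br] for br in PATH[p] & PATH[q])


pairs = list(PATH)
V = [[shared_path_cov(p, q) for q in pairs] for p in pairs]
for p, row in zip(pairs, V):
    print("".join(p), [round(v, 4) for v in row])
```

Distances whose paths are disjoint (AB and CD here) are uncorrelated; AC and BD covary only through the internal branch e.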
Estimation • The time tree and divergence tree are estimated simultaneously by GLS (Hasegawa, 1985; Bulmer, 1991) • Always choose the most recent agglomeration • Estimated divergences become the distances for the next stage • The variance formula accommodates estimation-induced correlations
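Why estimation induces correlations can be seen with a short linear-algebra sketch. It assumes, for illustration only, the NJ-style reduction d_uX = (d_AX + d_BX - d_AB) / 2 (not necessarily StatTree's update) and independent, unit-variance input distances. Because every new distance from u reuses d_AB, the new distances are correlated even though the inputs were not.

```python
import numpy as np

ORDER = ["AB", "AC", "AD", "BC", "BD", "CD"]   # distances before joining A and B
NEW = ["uC", "uD", "CD"]                      # distances after joining them at node u


def row(coeffs):
    """Build one row of the linear map from old to new distances."""
    r = np.zeros(len(ORDER))
    for name, c in coeffs.items():
        r[ORDER.index(name)] = c
    return r


# NJ-style reduction: d_uX = (d_AX + d_BX - d_AB) / 2; d_CD is carried over unchanged.
L = np.vstack([
    row({"AC": 0.5, "BC": 0.5, "AB": -0.5}),
    row({"AD": 0.5, "BD": 0.5, "AB": -0.5}),
    row({"CD": 1.0}),
])

V_old = np.eye(len(ORDER))   # assumption: independent, unit-variance input distances
V_new = L @ V_old @ L.T      # covariance of the next-stage distances
print(NEW)
print(np.round(V_new, 3))
```

The off-diagonal entry for (uC, uD) is 0.25, contributed entirely by the shared term d_AB.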
Notes • The variance parameters σ² and ν can be estimated • Computationally efficient algorithm, with the same time complexity as BioNJ • We call it StatTree
Simulations • 16 taxa, unbalanced topology, 100 simulations