Investigating the impact of misspecified substitution models on protein sequence analysis when reconstructing evolutionary history. The study highlights the distortions induced by model errors and by site rates that vary across genes, emphasizing the importance of correct model selection. It explores the interplay of co-evolution, lineage-specific variation, and heterotachy in RNA polymerase evolution, and the challenges and opportunities this raises for accurate reconstruction.
Pete Lockhart, Massey University, Allan Wilson Centre, New Zealand
Can we reconstruct the evolutionary history of ancient divergences from analyses of protein sequences?
If the substitution model is misspecified it can: (1) reduce reconstruction accuracy (and not favour a particular topology)
[Figure: four-taxon trees (A, B, C, D) with an outgroup. Felsenstein (1978); Hendy & Penny (1989)]
If the substitution model is misspecified it can: (2) induce topological distortion
Biosynthesis of chlorophyll and bacteriochlorophyll PNAS (1996) 93, pp. 1930-1934
[Figure: tree relating chlL, bchL, bchX and nifH, with one placement marked "?"]
Asymmetrical rate variation (XTSRV): rRNA, EF-1α, -tubulin, RPB1, actin. E.g. Embley and Hirt (1998) Current Opinion in Genetics and Development 8, 624-629; Philippe et al. (2000) Proc R Soc Lond B 267, 1213-1221; Inagaki et al. (2004) MBE 21, 1340-1349; Guo and Stiller (2005) MBE 22, 2166-2178
Eukaryotic RNA Polymerase II Evolution: core functions, co-evolution, opportunistic interactions. Guo and Stiller (2005) MBE 22, 2166-2178
“in different lineages, co-evolution of proteins canalizes the evolution of a protein in different directions” Lopez et al. 2002 MBE 19, 1-7
“...some of the EF-1α auxiliary functions may have been lost/weakened during the reductive evolution of microsporidia” Inagaki et al. 2004 MBE 21, 1340-1349
Rates Across Sites (Uzzell and Corbin 1971): sites range from fast to slow
[Figure: rate distributions for α = 20, α = 5, α = 1, α = 0.5 and α = 0.1. Yang (1994)]
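As a rough illustration of rates across sites, the sketch below draws per-site relative rates from a gamma distribution with mean 1. It assumes the values on the slide (20, 5, 1, 0.5, 0.1) are gamma shape parameters α. The function names and the 0.1 cut-off for a "slow" site are hypothetical choices made for this example, not part of the original talk.

```python
import random
import statistics


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative site rates from Gamma(shape=alpha, scale=1/alpha), so the mean is 1."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def summarize(alpha, n_sites=20000, slow_cutoff=0.1):
    """Return (mean rate, coefficient of variation, fraction of sites slower than slow_cutoff)."""
    rates = sample_site_rates(alpha, n_sites)
    mean = statistics.fmean(rates)
    cv = statistics.pstdev(rates) / mean
    slow = sum(r < slow_cutoff for r in rates) / n_sites
    return mean, cv, slow


if __name__ == "__main__":
    for alpha in (20, 5, 1, 0.5, 0.1):
        mean, cv, slow = summarize(alpha)
        print(f"alpha={alpha:<4} mean={mean:.2f} cv={cv:.2f} slow_fraction={slow:.2f}")
```

For a gamma distribution with mean 1 the coefficient of variation is 1/sqrt(α), so a small α means most sites are nearly invariable while a few evolve quickly, and a large α means sites evolve at similar rates.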
An alternative model: Fitch and Markowitz (1970). Only ~10% of sites are variable at any given time.
[Figure: tree with plant and animal lineages and nodes N3 and N4]
Covarion: Tuffley and Steel (1998); Huelsenbeck (2002)
[Figure: two-state diagrams with states 0 and 1 (off/on) and switching rates S01 and S10; panels labelled slow, fast and faster]
Covarion: Galtier (2001)
[Figure: diagram with rate classes R1, R2, R3 and R4 and labels S11]
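The on/off switching in a covarion model can be sketched as a two-state continuous-time Markov chain. The example assumes S01 is the rate of switching from off (0) to on (1) and S10 the rate of switching back. The function names are hypothetical, and this simulates only the switching process, not substitutions.

```python
import random


def stationary_on(s01, s10):
    """Long-run proportion of time a site spends in the 'on' (variable) state."""
    return s01 / (s01 + s10)


def fraction_time_on(s01, s10, total_time, start_on=False, seed=0):
    """Simulate off/on switching for total_time and return the fraction of time spent 'on'."""
    rng = random.Random(seed)
    t = 0.0
    on = start_on
    time_on = 0.0
    while True:
        # Waiting time until the next switch is exponential with the current exit rate.
        wait = rng.expovariate(s10 if on else s01)
        if t + wait >= total_time:
            if on:
                time_on += total_time - t
            break
        if on:
            time_on += wait
        t += wait
        on = not on
    return time_on / total_time


if __name__ == "__main__":
    print(stationary_on(1.0, 9.0))
    print(fraction_time_on(1.0, 9.0, total_time=5000.0))
```

With S01 = 1 and S10 = 9 (illustrative values only), a site is expected to be variable about 10% of the time, and the simulated fraction should come out close to that.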
“the number of variable positions can be different between lineages (Germot and Philippe 1999), suggesting that a constant c is a limitation of the covarion model…” Lopez, Casane & Philippe (2002) Mol Biol Evol 19: 1-7
[Figure: four-taxon trees (A, B, C, D) comparing an increased rate in B and C with an increased pvar in B and C] Syst Biol (2005) 54, 948-951
Long branch attraction induced topological distortion. Mixtures can also be used to simulate changing pvar.
[Figure: four-taxon trees (A, B, C, D) with sites switched "on" in some lineages]
Simulation: 0-0.3 invariable sites switch on in B+C (TS98)
• assume ancestral pvar = 0.2
• x = point where we increase pvar in B+C
• simulate with seqgen-cov.exe http://www.liv.ac.uk/~matts/covarion.html
• reconstruct with PAUP* (assuming simple model), report support for each of 3 unrooted trees: AB|CD, AD|BC, AC|BD
[Figure: four-taxon tree (A, B, C, D) with branch-length labels 0.3, 0.3, 0.4, 0.4, 0.1, 0.1, 0.02, 0.02]
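The logic of this simulation can be sketched in a toy form. The code below does not use seqgen-cov or PAUP*. It substitutes a two-state symmetric model on the true tree AB|CD and counts the parsimony-informative site patterns that support each of the three unrooted trees. The branch lengths, the function names, and the choice to let switched-on sites vary along the whole terminal branches to B and C are all assumptions made for illustration.

```python
import math
import random

SPLITS = ("AB|CD", "AC|BD", "AD|BC")

# Hypothetical branch lengths (expected changes per variable site); not taken from the talk.
BRANCHES = {"A": 0.1, "B": 0.4, "C": 0.4, "D": 0.1, "internal": 0.02}


def flip_prob(t):
    """Probability that a two-state symmetric character differs across a branch of length t."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))


def evolve(state, t, rng):
    """Evolve a binary state along a branch of length t."""
    return 1 - state if rng.random() < flip_prob(t) else state


def simulate_site(kind, bl, rng):
    """Return the states (A, B, C, D) for one site on the true tree AB|CD."""
    x = rng.randint(0, 1)  # state at the ancestor of A and B
    if kind == "invariable":
        return (x, x, x, x)
    if kind == "on_in_BC":
        # Assumption: the site is variable only on the terminal branches to B and C.
        return (x, evolve(x, bl["B"], rng), evolve(x, bl["C"], rng), x)
    y = evolve(x, bl["internal"], rng)  # state at the ancestor of C and D
    return (evolve(x, bl["A"], rng), evolve(x, bl["B"], rng),
            evolve(y, bl["C"], rng), evolve(y, bl["D"], rng))


def supported_split(pattern):
    """Return the unrooted tree favoured by a parsimony-informative pattern, else None."""
    a, b, c, d = pattern
    if a == b and c == d and a != c:
        return "AB|CD"
    if a == c and b == d and a != b:
        return "AC|BD"
    if a == d and b == c and a != b:
        return "AD|BC"
    return None


def count_support(n_sites, pvar, delta, bl=BRANCHES, seed=0):
    """Count informative sites per split; delta is the extra proportion switched on in B+C."""
    rng = random.Random(seed)
    counts = dict.fromkeys(SPLITS, 0)
    for _ in range(n_sites):
        u = rng.random()
        kind = "variable" if u < pvar else "on_in_BC" if u < pvar + delta else "invariable"
        split = supported_split(simulate_site(kind, bl, rng))
        if split is not None:
            counts[split] += 1
    return counts


if __name__ == "__main__":
    for delta in (0.0, 0.1, 0.2, 0.3):
        print(delta, count_support(20000, pvar=0.2, delta=delta))
```

In this toy setting, sites that are switched on only in B and C can produce informative patterns for AD|BC alone, so raising delta adds support for the wrong tree without adding any for the true tree AB|CD.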
Summary
• Lineage-specific differences in structural and functional constraint will affect which sites vary and how many of them vary
• Lineage-specific changes in proportions of variable sites motivated the concept of heterotachy
• Simulations suggest that a relatively small increase in the proportion of variable sites in non-adjacent lineages is a problem for reconstruction accuracy
Ellen Nisbet • Chris Howe • Bill Martin • Nicole Gruenheit • Mike Steel • PLG organisers • Microsoft
Heterotachy: Lopez et al. 2002 MBE 19, 1-7
[Figure: sites that are slow in one lineage and fast in another, and vice versa]
Philippe et al. BMC Evolutionary Biology 2005, 5:50