Future Directions in Phylogenetic Methods and Models, 17–21 Dec 07. How many characters are needed to reconstruct the true tree? Mareike Fischer and Mike Steel. The problem: given a sequence of characters (e.g. DNA), reconstruct the ‘true’ tree.
Future Directions in Phylogenetic Methods and Models, 17–21 Dec 07
How many characters are needed to reconstruct the true tree?
Mareike Fischer and Mike Steel
The Problem
• Given: a sequence of characters (e.g. DNA)
• Wanted: reconstruction of the ‘true’ tree
• Solution: Maximum Parsimony, Maximum Likelihood, etc.
• But: is the sequence long enough for a reliable reconstruction?
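For four taxa, Maximum Parsimony can be sketched directly: score each character on each of the three unrooted quartet trees with Fitch's algorithm and keep the tree with the lowest total. The Python below is a minimal illustration under that assumption; the names `QUARTETS`, `fitch_score` and `mp_best` are hypothetical, and ties are reported as `None` rather than resolved.

```python
from typing import Optional, Sequence

# The three unrooted 4-taxon trees, written as their two cherries.
# Taxa are indexed 0..3, so "12|34" pairs taxa 1, 2 against taxa 3, 4.
QUARTETS = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}


def fitch_score(char: Sequence[int], tree: str) -> int:
    """Fitch parsimony score (minimum number of changes) of one character."""
    cost = 0
    sets = []
    for i, j in QUARTETS[tree]:
        left, right = {char[i]}, {char[j]}
        if left & right:
            sets.append(left & right)
        else:
            sets.append(left | right)  # the two leaves differ: one change
            cost += 1
    # Join the two cherries across the interior edge.
    if not (sets[0] & sets[1]):
        cost += 1
    return cost


def mp_best(chars) -> Optional[str]:
    """Return the quartet with the lowest total score, or None on a tie."""
    totals = {t: sum(fitch_score(c, t) for c in chars) for t in QUARTETS}
    best = min(totals.values())
    winners = [t for t, s in totals.items() if s == best]
    return winners[0] if len(winners) == 1 else None


print(mp_best([(0, 0, 1, 1), (1, 1, 0, 0), (0, 1, 0, 1)]))
```

For 2-state characters, only characters that split the taxa 2|2 score differently on the three trees (1 on the matching quartet, 2 on the other two), so MP on four taxa amounts to counting such characters.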
Previous Approaches
1. Churchill, von Haeseler, Navidi (1992)
• 4-taxon scenario
• Observations:
• The probability of reconstructing the true tree increases with the length of the interior edge.
• “Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”
[Figure: reconstruction probability (Rec. Prob.) vs. interior edge length (int. edge); the probability rises with more characters]
Previous Approaches
2. Yang (1998)
• 4-taxon scenario, interior edge ‘fixed’ at 5% of tree length
• 5 different tree shapes were investigated
• Observations: the optimal length for the interior edge ranges between 0.015 and 0.025.
[Figure: reconstruction probability (Rec. Prob.) vs. tree length; ‘Farris Zone’: MP better; ‘Felsenstein Zone’: ML better]
Our Approach
• Limitation: most previous approaches are based on simulations.
• Our approach: mathematical analysis of the influence of branch lengths on tree reconstruction.
• We investigate MP first and consider other methods afterwards.
Already known
Steel and Székely (2002): here, the number k of characters needed to reconstruct the true tree grows at rate … .
[Figure: 4-taxon tree with interior edge x and pendant edges y]
But what happens if we fix the ratio of pendant to interior edge length (y := px) and then take the value of x that minimizes k?
Our Approach
• Setting: 4 taxa, pendant edges of length px (with p > 1), short interior edge of length x
• 2-state symmetric model
[Figure: 4-taxon tree with four pendant edges of length px and an interior edge of length x]
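The setting can be simulated to estimate how often MP recovers the true quartet $12|34$. This Python sketch assumes that edge lengths are expected numbers of substitutions per site, so that a state flips along an edge of length $t$ with probability $(1 - e^{-2t})/2$ (consistent with $\theta = e^{-2x}$ in the proof sketch), and it roots the tree at an internal node, which is allowed because the 2-state symmetric model is reversible with a uniform stationary distribution. The function names, the seed and the rule that ties count as failures are illustrative choices, not taken from the talk.

```python
import math
import random


def flip_prob(t: float) -> float:
    """P(state changes along an edge of length t) = (1 - e^{-2t}) / 2."""
    return -math.expm1(-2.0 * t) / 2.0


def simulate_split_counts(x: float, p: float, k: int, rng: random.Random) -> dict:
    """Simulate k characters on the quartet 12|34 (pendant edges p*x,
    interior edge x) and count the three informative 2|2 splits."""
    s, r = flip_prob(p * x), flip_prob(x)
    counts = {"12|34": 0, "13|24": 0, "14|23": 0}
    for _ in range(k):
        u = rng.randint(0, 1)                 # internal node next to taxa 1, 2
        v = u ^ int(rng.random() < r)         # internal node next to taxa 3, 4
        leaves = [u ^ int(rng.random() < s), u ^ int(rng.random() < s),
                  v ^ int(rng.random() < s), v ^ int(rng.random() < s)]
        if sum(leaves) != 2:
            continue                          # not a 2|2 split: uninformative
        if leaves[0] == leaves[1]:
            counts["12|34"] += 1
        elif leaves[0] == leaves[2]:
            counts["13|24"] += 1
        else:
            counts["14|23"] += 1
    return counts


def mp_success_rate(x: float, p: float, k: int, reps: int = 200, seed: int = 1) -> float:
    """Monte Carlo estimate of P(MP returns the true tree 12|34).
    MP picks the split supported by the most characters; ties count as failures."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        c = simulate_split_counts(x, p, k, rng)
        if c["12|34"] > max(c["13|24"], c["14|23"]):
            hits += 1
    return hits / reps


print(mp_success_rate(0.2, 1.5, 500))
```

For example, `mp_success_rate(0.2, 1.5, 500)` estimates the success probability for $x = 0.2$, $p = 1.5$ and $k = 500$ characters; shrinking $x$ with $px$ held fixed lowers it unless $k$ is increased.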
Main Result
• For ‘reliable’ MP reconstruction, k grows at least at rate $p^2$.
• For the optimal value of x, k grows at rate $p^2$ (so this lower bound is attained).
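Idea of Proof: 1. Applying the CLT
Note that the true tree $T_1$ will be favored over $T_2$ if and only if $Z_k > 0$. Set $X_i = 1$ if character $i$ favors $T_1$ (has a lower parsimony score on $T_1$ than on $T_2$), $X_i = -1$ if it favors $T_2$, and $X_i = 0$ otherwise; the $X_i$ are i.i.d., and $Z_k = \sum_{i=1}^{k} X_i$. Then (by the CLT) $P(Z_k > 0) \approx \Phi(\mu_k / \sigma_k)$, where $\mu_k = \mathbb{E}[Z_k]$, $\sigma_k^2 = \mathrm{Var}(Z_k)$ and $\Phi$ is the standard normal distribution function.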
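The CLT step can be checked numerically: for i.i.d. $X_i \in \{1, -1, 0\}$ with $P(X_1 = 1) = a$ and $P(X_1 = -1) = b$, the exact law of $Z_k$ follows from a $k$-fold convolution, and it can be compared with $\Phi(\mu_k/\sigma_k)$, where $\mu_k = k(a - b)$ and $\sigma_k^2 = k\,(a + b - (a - b)^2)$. The Python sketch below does this; the function names and the values of $a$, $b$ and $k$ in the usage example are arbitrary illustrations, not values from the talk.

```python
import math


def exact_prob_positive(a: float, b: float, k: int) -> float:
    """Exact P(Z_k > 0) for Z_k = X_1 + ... + X_k, with i.i.d. X_i taking
    the values +1, -1, 0 with probabilities a, b, 1 - a - b."""
    c = 1.0 - a - b
    dist = [0.0] * (2 * k + 1)  # dist[j] = P(Z = j - k)
    dist[k] = 1.0
    for _ in range(k):
        new = [0.0] * (2 * k + 1)
        for j, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[j] += c * mass                  # X = 0
            if j + 1 <= 2 * k:
                new[j + 1] += a * mass          # X = +1
            if j - 1 >= 0:
                new[j - 1] += b * mass          # X = -1
        dist = new
    return sum(dist[k + 1:])


def clt_prob_positive(a: float, b: float, k: int) -> float:
    """Normal approximation Phi(mu_k / sigma_k) = Phi(sqrt(k) * mu_1 / sigma_1)."""
    mu1 = a - b
    var1 = a + b - mu1 ** 2
    z = math.sqrt(k) * mu1 / math.sqrt(var1)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(exact_prob_positive(0.11, 0.06, 200), clt_prob_positive(0.11, 0.06, 200))
```

The two numbers should agree to within a few hundredths; the remaining gap comes mostly from the lattice nature of $Z_k$ (the event $Z_k > 0$ is the event $Z_k \ge 1$).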
Idea of Proof: 2. The Hadamard Representation
Since the $X_i$ are i.i.d., $\mu_k$ and $\sigma_k$ depend only on $k$ and the probabilities $P(X_1 = 1)$ and $P(X_1 = -1)$. These probabilities can be calculated using the ‘Hadamard Representation’: … . Note that $P(X_1 = 1)$ and $P(X_1 = -1)$ only depend on $x$ and $p$. Thus, for fixed $p$, the ratio $\mu_k / \sigma_k$ can be used to find a value of $x$ that minimizes $k$. (Here, $\theta = e^{-2x}$.)
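Working the Hadamard representation through for this quartet ($12|34$, interior edge $x$, pendant edges $px$, so that each pendant edge contributes a factor $\theta^{p}$) gives, in our own derivation (not copied from the slide), $P(X_1 = 1) = \tfrac{1}{8}\big[(1 - \theta^{2p})^2 + 4\theta^{2p}(1 - \theta)\big]$ and $P(X_1 = -1) = \tfrac{1}{8}(1 - \theta^{2p})^2$. The Python sketch below checks these against a brute-force sum over the states of the two internal nodes, then minimizes $k$ over $x$ for fixed $p$. It assumes that ‘reliable’ means $\Phi(\mu_k/\sigma_k) \ge 1 - \varepsilon$ with a hypothetical $\varepsilon = 0.05$, and it searches a log-spaced grid of $x$ values instead of solving for the optimum analytically; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def split_probs(x: float, p: float):
    """Closed forms for P(X_1 = 1) (split 12|34) and P(X_1 = -1) (split 13|24),
    written with theta = e^{-2x} and q = theta^(2p) = e^{-4px}."""
    one_minus_theta = -math.expm1(-2.0 * x)        # 1 - theta
    q = math.exp(-4.0 * p * x)
    one_minus_q = -math.expm1(-4.0 * p * x)        # 1 - q, computed stably
    a = (one_minus_q ** 2 + 4.0 * q * one_minus_theta) / 8.0
    b = one_minus_q ** 2 / 8.0
    return a, b


def split_probs_brute(x: float, p: float):
    """The same probabilities by summing over both internal-node states."""
    r = (1.0 - math.exp(-2.0 * x)) / 2.0           # interior flip probability
    s = (1.0 - math.exp(-2.0 * p * x)) / 2.0       # pendant flip probability

    def pattern(leaves):
        total = 0.0
        for u in (0, 1):
            for v in (0, 1):
                pr = 0.5 * (r if u != v else 1.0 - r)
                for leaf, parent in zip(leaves, (u, u, v, v)):
                    pr *= s if leaf != parent else 1.0 - s
                total += pr
        return total

    a = pattern((0, 0, 1, 1)) + pattern((1, 1, 0, 0))
    b = pattern((0, 1, 0, 1)) + pattern((1, 0, 1, 0))
    return a, b


def k_needed(x: float, p: float, eps: float = 0.05) -> float:
    """Smallest k with Phi(sqrt(k) * mu_1 / sigma_1) >= 1 - eps."""
    a, b = split_probs(x, p)
    mu1 = a - b
    if mu1 <= 0.0:
        return math.inf                            # no signal left (underflow)
    var1 = a + b - mu1 ** 2
    z = NormalDist().inv_cdf(1.0 - eps)
    return math.ceil(z * z * var1 / mu1 ** 2)


def k_min(p: float, eps: float = 0.05, n_grid: int = 4000) -> float:
    """Minimize k over interior edge lengths x on a grid from 1e-7 to 1."""
    xs = (10.0 ** (-7.0 + 7.0 * i / n_grid) for i in range(n_grid + 1))
    return min(k_needed(x, p, eps) for x in xs)


print(k_min(100), k_min(200), k_min(200) / k_min(100))
```

If $k$ grows at rate $p^2$, doubling $p$ should multiply the minimal $k$ by roughly 4, so the printed ratio should come out near 4.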
Summary and Extension
• For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate $p^2$.
• Can other methods do better (e.g. rate $p$)?
• No! [This can be shown using the ‘Hellinger distance’.]
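One standard way to see why no method can beat rate $p^2$ (a sketch of the general idea, not necessarily the argument used in the talk): for $k$ i.i.d. characters, the Bhattacharyya affinity between the pattern distributions of $T_1$ and $T_2$ is $(1 - H^2)^k$, where $H^2$ is the squared Hellinger distance for a single character, and the total variation distance is at most $\sqrt{1 - (1 - H^2)^{2k}}$. So unless $k$ is of order $1/H^2$, no method can tell the two trees apart reliably. The Python below computes $H^2$ in this setting from closed-form split probabilities obtained via the Hadamard representation (our own derivation under the 2-state symmetric model: $\tfrac18[(1-\theta^{2p})^2 + 4\theta^{2p}(1-\theta)]$ for the split of the true tree, $\tfrac18(1-\theta^{2p})^2$ for each other 2|2 split) and maximizes it over $x$; the function names are hypothetical.

```python
import math


def split_probs(x: float, p: float):
    """P(split of the true tree) and P(each other 2|2 split) on a quartet with
    interior edge x and pendant edges p*x (theta = e^{-2x}, q = theta^(2p))."""
    one_minus_theta = -math.expm1(-2.0 * x)
    q = math.exp(-4.0 * p * x)
    one_minus_q = -math.expm1(-4.0 * p * x)
    a = (one_minus_q ** 2 + 4.0 * q * one_minus_theta) / 8.0
    b = one_minus_q ** 2 / 8.0
    return a, b


def hellinger_sq(x: float, p: float) -> float:
    """H^2 = 1 - sum sqrt(P * Q) between the pattern distributions of
    T1 = 12|34 and T2 = 13|24 with identical edge lengths."""
    a, b = split_probs(x, p)
    # Only the splits 12|34 and 13|24 swap probabilities between T1 and T2;
    # constant, singleton and 14|23 patterns are equally likely under both.
    return (math.sqrt(a) - math.sqrt(b)) ** 2


def tv_upper_bound(x: float, p: float, k: int) -> float:
    """Upper bound sqrt(1 - BC^2) on the total variation distance between
    the k-character distributions, where BC = (1 - H^2)^k."""
    bc = (1.0 - hellinger_sq(x, p)) ** k
    return math.sqrt(max(0.0, 1.0 - bc * bc))


def max_hellinger_sq(p: float, n_grid: int = 4000) -> float:
    """Largest single-character H^2 over interior lengths x in [1e-7, 1]."""
    xs = (10.0 ** (-7.0 + 7.0 * i / n_grid) for i in range(n_grid + 1))
    return max(hellinger_sq(x, p) for x in xs)


print(max_hellinger_sq(100) / max_hellinger_sq(200))
```

If $\max_x H^2$ shrinks like $1/p^2$, the printed ratio should be close to 4, in line with the ‘No!’ above: even at the best $x$, any method needs on the order of $p^2$ characters.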
Outlook
• Questions for future work:
• What happens when you approach the ‘Felsenstein Zone’?
• What happens in general with different tree shapes or more taxa?
Thanks…
• … to my supervisor Mike Steel,
• … to the Newton Institute for organizing this great conference,
• … to the Allan Wilson Centre for financing my research,
• … to YOU for listening, or at least for waking up early enough to read this message.
The only true tree…
• … is a Christmas tree.
• Merry Christmas!
• (And it does not even require reconstruction!)