1 / 23

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks. Luay Nakhleh Department of Computer Sciences UT Austin. Who’s Involved. UT CS : Tandy Warnow, Luay Nakhleh UT BIO : Randy Linder UNM CS : Bernard Moret. Why Networks?. Lateral gene transfer (LGT)

waite
Download Presentation

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin

  2. Who’s Involved • UT CS: Tandy Warnow, Luay Nakhleh • UT BIO: Randy Linder • UNM CS: Bernard Moret

  3. Why Networks? • Lateral gene transfer (LGT) • Ochman estimated that 755 of 4,288 ORF’s in E.coli were from at least 234 LGT events • Hybridization • Estimates that as many as 30% of all plant lineages are the products of hybridization • Fish • Some frogs

  4. Phylogenetic Networks • Rooted, directed, acyclic graphs that actually model the evolutionary process • “tree” nodes and “network” nodes • Time constraints

  5. Separate Analysis • Analyze individual genes separately • Reconcile the resulting phylogenies • As opposed to combined analysis in which the datasets are combined (via concatenation) and the combined dataset is then analyzed

  6. SPR Distances Among Gene Trees A B C D E SPR Distance 1 A B C D E A B C D E

  7. Maddison’s Method Given two gene datasets • Construct two gene trees T1 and T2 • If SPR(T1,T2)=0 • Return a tree • If SPR(T1,T2)=1 • Return a network with one reticulation event Open problem: extend to reconstructing a network with m reticulation events

  8. Challenges (1) Computational • Computing SPR distances is of unknown computational complexity (probably hard)

  9. Solving the Computational Challenge • Galled-networks: reticulation events are independent • For two gene trees T1 and T2 on n leaves we can • Decide whether SPR(T1,T2)=m in O(mn) time, and • Construct network N from T1 and T2 in O(mn) time

  10. Challenges (2) Systematic • Obtaining the correct gene trees in practice is very hard (due to missing data, inaccuracy of tree reconstruction methods, wrong assumptions, etc.)

  11. Solving the Systematic Challenge: Our MethodSpNet Given the sequences of two genes I & II on a set of species • Run MP or ML on gene I and obtain a set U1 of trees, represented by its consensus tree t1 • Run MP or ML on gene II and obtain a set U2 of trees, represented by its consensus tree t2 • Find binary trees T1 and T2, that refine t1 and t2, respectively, and such that SPR(T1,T2)=1 • Build network N from T1 and T2

  12. SpNet: Running Time • We have a linear-time algorithm for the single hybrid case (implementation and experimental results are available as well) • We are working on the general case of arbitrary number of reticulation events

  13. Experimental Study • Generated random networks on 10 and 20 taxa, with 0, 1, and 2 hybrids • Evolved sequences under the GTR+Gamma model of evolution with invariant sites • We studies the topological accuracy based on the splits defined by the model and inferred network

  14. Evaluation Criteria • Detection Quality • How often did the method infer the correct number of hybrids in the model phylogeny? • Reconstruction Quality • What is the topological accuracy of the inferred phylogeny?

  15. Methods • SpNet(i): Our method where we contract i edges • NNet: The method of Bryant and Moulton • NJ

  16. Detection Quality of SpNetModel Phylogeny: 20-taxon Tree

  17. Detection Quality of SpNetModel Phylogeny: 20-taxon 1-hybrid network

  18. Detection Quality of SpNetModel Phylogeny: 20-taxon 2-hybrid network

  19. Reconstruction QualityModel Phylogeny: 20-taxon tree

  20. Reconstruction QualityModel Phylogeny: 20-taxon 1-hybrid network

  21. Reconstruction QualityModel Phylogeny: 20-taxon 1-hybrid network

  22. Conclusions • Considering a set of “good” trees rather than a single optimal tree is advantageous in network reconstruction • Separate analysis approaches outperform combined analysis approaches

  23. Ongoing research • Using other techniques for obtaining unresolved trees (e.g., Bayesian analyses, bootstrapping, etc.) • Detection vs. reconstruction – visualization and clustering techniques may also be useful (collaboration with St John) • Refining unresolved networks • DCM-like network reconstruction

More Related