
Modified Mincut Supertrees

Modified Mincut Supertrees. Roderic Page University of Glasgow. Tree of Life. About 1.7 million species described. What we have so far: TreeBASE database (15,000 taxa) Ribosomal Database Project (RDP II) (20,000 sequences) The Tree of Life Project (11,000 taxa).


Presentation Transcript


  1. Modified Mincut Supertrees Roderic Page University of Glasgow

  2. Tree of Life About 1.7 million species described. What we have so far: • TreeBASE database (15,000 taxa) • Ribosomal Database Project (RDP II) (20,000 sequences) • The Tree of Life Project (11,000 taxa)

  3. Recent interest in the Tree of Life • NSF-sponsored “Tree of Life” workshops (2000-2001) • US$10 million “to construct a phylogeny for the 1.7 million described species of Life” announced February 15th 2002 • Assembling the Tree of Life: Science, Relevance, and Challenges (AMNH, New York, May 2002) • European initiative (ATOL) under FP6

  4. Problem: how to build the tree of life Solutions: • Find one or more “magic markers” that will allow us to recover the whole tree in one go (problems: combinability and complexity) • Assemble big tree from many smaller trees derived from many kinds of data (supertrees)

  5. Tree terminology [Figure: rooted tree on leaves a, b, c, d with its parts labelled: leaf, edge, internal node, cluster, root. Clusters shown: {a,b}, {a,b,c}, and the root cluster {a,b,c,d}]

  6. Nestings and triplets [Figure: rooted tree T on leaves a, b, c, d] • Nestings: {a,b} <T {a,b,c,d} and {b,c} <T {a,b,c,d} • Triplets: (bc)d, also written bc|d
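To make the triplet notation concrete, here is a minimal sketch that lists every rooted triplet ab|c displayed by a tree. It assumes (this is not from the slides) that trees are written as nested Python tuples of leaf names and that the slide's tree is (((a,b),c),d), which is the shape implied by the clusters on the previous slide. The function names are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Leaf set of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return every triplet (a, b, c), meaning ab|c, displayed by the tree.

    ab|c holds when a and b sit below the same child of some node
    while c sits below a different child of that node.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        child_leaves = [leaves(child) for child in node]
        for i, inside in enumerate(child_leaves):
            outside = set().union(
                *(ls for j, ls in enumerate(child_leaves) if j != i)
            )
            for a, b in combinations(sorted(inside), 2):
                for c in outside:
                    found.add((a, b, c))
        for child in node:
            visit(child)

    visit(tree)
    return found


# Assumed shape of the slide's tree: (((a,b),c),d)
tree = ((("a", "b"), "c"), "d")
result = triplets(tree)
```

For this tree `result` contains four triplets, including `("b", "c", "d")`, the bc|d shown on the slide.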

  7. Supertree [Figure: T1 (leaves a, b, c) + T2 (leaves b, c, d) = supertree (leaves a, b, c, d)]

  8. Some desirable properties of a supertree method (Steel et al., 2000) • The supertree can be computed in polynomial time • A grouping in one or more trees that is not contradicted by any other tree occurs in the supertree

  9. MRP (Matrix Representation Parsimony) [Figure: tree with internal nodes numbered 1, 2, 3; each matrix column codes one node]

                          1  2  3
       Homo sapiens       1  1  1
       Pan paniscus       1  1  1
       Gorilla gorilla    1  1  0
       Pongo pygmaeus     1  0  0
       Hylobates          0  0  0

     • NP-hard • Can generate many solutions
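The matrix coding on this slide can be sketched as follows. This is an illustrative sketch only: the nested-tuple tree encoding and the function names are assumptions, and the tree shape is the one implied by the matrix itself. Each non-root internal node becomes a column, with 1 for taxa inside that node's cluster and 0 otherwise; taxa missing from a tree are conventionally coded "?" in MRP.

```python
def leaves(tree):
    """Leaf set of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def mrp_matrix(tree, all_taxa=None):
    """Code one rooted tree as an MRP matrix: {taxon: [column values]}."""
    clusters = []

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            clusters.append(leaves(node))  # one column per non-root node
        for child in node:
            visit(child, False)

    visit(tree, True)
    in_tree = leaves(tree)
    taxa = sorted(in_tree) if all_taxa is None else list(all_taxa)
    return {
        taxon: ["?" if taxon not in in_tree else int(taxon in cluster)
                for cluster in clusters]
        for taxon in taxa
    }


# Tree shape implied by the slide's matrix (an assumption).
apes = (((("Homo sapiens", "Pan paniscus"), "Gorilla gorilla"),
         "Pongo pygmaeus"), "Hylobates")
matrix = mrp_matrix(apes)
```

With this input, `matrix["Gorilla gorilla"]` is `[1, 1, 0]` and `matrix["Hylobates"]` is `[0, 0, 0]`, matching the rows above. For a supertree analysis the matrices of all input trees would be concatenated column-wise and analysed with parsimony, which is the NP-hard step.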

  10. Aho et al.’s algorithm (OneTree) Aho, A. V., Sagiv, Y., Szymanski, T. G., and Ullman, J. D. 1981. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10: 405-421. Input: set of rooted trees 1. If set is compatible (i.e., will agree on a tree), output that tree. 2. If set is not compatible, stop!

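  11. Aho et al.’s OneTree algorithm [Figure: input trees T1 (leaves a, b, c) and T2 (leaves b, c, d); the leaf set a, b, c, d is split successively into a, b, c and d, then into a, b and c, giving the supertree]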
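A minimal sketch of the OneTree idea follows, working on rooted triplets rather than whole trees (the input trees would first be broken into the triplets they display). The triplet encoding `(a, b, c)` for ab|c, the function name `build`, and the shapes assumed for T1 and T2 are illustrative assumptions, not taken from the slides. At each level, a and b are joined for every triplet ab|c inside the current leaf set; the connected components become the subtrees, and a single component means the input is incompatible.

```python
def build(leaf_set, triplets):
    """Return a nested-tuple tree displaying all triplets, or None."""
    leaf_list = sorted(leaf_set)
    if len(leaf_list) == 1:
        return leaf_list[0]
    if len(leaf_list) == 2:
        return tuple(leaf_list)

    # Union-find over the current leaves.
    parent = {x: x for x in leaf_list}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Only triplets whose three leaves are all present still matter.
    relevant = [t for t in triplets if set(t) <= set(leaf_list)]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    groups = {}
    for x in leaf_list:
        groups.setdefault(find(x), []).append(x)
    if len(groups) == 1:
        return None  # graph is connected: no tree exists, so stop

    subtrees = []
    for group in groups.values():
        subtree = build(group, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


# Assumed shapes: T1 = ((a,b),c) gives ab|c; T2 = ((b,c),d) gives bc|d.
supertree = build("abcd", [("a", "b", "c"), ("b", "c", "d")])
conflict = build("abc", [("a", "b", "c"), ("b", "c", "a")])
```

Here `supertree` is `((("a", "b"), "c"), "d")`, and `conflict` is `None` because ab|c and bc|a cannot both hold in one tree.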

  12. Mincut supertrees Semple, C., and Steel, M. 2000. A supertree method for rooted trees. Discrete Appl. Math. 105: 147-158. • Modifies OneTree by cutting graph • Requires rooted trees (no analogue of OneTree for unrooted trees) • Recursive • Polynomial time

  13. [Figure: input trees T1 and T2 on leaves drawn from a, b, c, d, e, and the graph S{T1,T2} built from them] Semple and Steel (2000)

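  14. Collapsing the graph (Semple and Steel mincut algorithm) [Figure: the graph S{T1,T2} with edge weights, annotated “This edge has maximum weight” (the edge joining a and b, weight 2; the other edges have weight 1), and the graph S{T1,T2}/E^max{T1,T2}, in which that edge is collapsed into the single node a,b]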

  15. Cut the graph to get supertree [Figure: the graph S{T1,T2}/E^max{T1,T2} on nodes a,b, c, d, and e with edges of weight 1, cut to give the supertree on leaves a, b, c, d, e]
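The collapse-and-cut step can be sketched for tiny inputs as below. This is a hedged illustration, not the author's C++ implementation: it handles a single level only (the full method recurses on each component with the trees restricted to it), it finds minimum cuts by brute force over all bipartitions instead of a polynomial-time mincut algorithm, and the two example trees are hypothetical, not the ones in the figures. The nested-tuple encoding and the name `mincut_split` are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def leaves(tree):
    """Leaf set of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def mincut_split(trees):
    """One non-recursive step: partition the leaves by cutting S/E^max."""
    k = len(trees)
    all_leaves = sorted(set().union(*(leaves(t) for t in trees)))

    # S: join a and b if they share a proper cluster in some tree;
    # the weight counts the trees in which they do.
    weight = defaultdict(int)
    for t in trees:
        for child in t:  # each child of the root is a proper cluster
            for a, b in combinations(sorted(leaves(child)), 2):
                weight[(a, b)] += 1

    parent = {x: x for x in all_leaves}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Collapse E^max: edges supported by all k trees.
    for (a, b), w in weight.items():
        if w == k:
            parent[find(a)] = find(b)
    collapsed = defaultdict(int)
    for (a, b), w in weight.items():
        ra, rb = find(a), find(b)
        if ra != rb:
            collapsed[frozenset((ra, rb))] += w

    # Brute force over bipartitions of the collapsed nodes.
    nodes = sorted({find(x) for x in all_leaves})
    best, cut_edges = None, set()
    first, rest = nodes[0], nodes[1:]
    for size in range(len(rest)):  # the other side is never empty
        for extra in combinations(rest, size):
            side = {first, *extra}
            crossing = {e for e in collapsed if len(e & side) == 1}
            total = sum(collapsed[e] for e in crossing)
            if best is None or total < best:
                best, cut_edges = total, set(crossing)
            elif total == best:
                cut_edges |= crossing

    # Delete every edge lying in some minimum cut; read off components.
    for edge in collapsed:
        if edge not in cut_edges:
            a, b = tuple(edge)
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in all_leaves:
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical input trees that disagree about where c belongs.
t1 = ((("a", "b"), "c"), "d")
t2 = (("a", "b"), ("c", "d"))
parts = mincut_split([t1, t2])
```

For these two trees the edge a-b has weight 2 and is collapsed, the cheapest cut removes the weight-1 edge c-d, and `parts` is `[["a", "b", "c"], ["d"]]`.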

  16. My mincut supertree implementation (darwin.zoology.gla.ac.uk/~rpage/supertree) • Written in C++ • Uses GTL (Graph Template Library) to handle graphs (formerly a free alternative to LEDA) • Finds all mincuts of a graph faster than Semple and Steel’s algorithm

  17. A counter example: two input trees... [Figure: two input trees with leaves a, b, c, x1, x2, x3, y1, y2, y3, y4; the two trees place a, b, and c differently]

  18. Mincut gives this (strange) result • Disputed relationships among a, b, and c are resolved • x1, x2, and x3 collapsed into polytomy [Figure: mincut supertree on leaves c, x1, x2, x3, b, a, y1, y2, y3, y4]

  19. Problem: cuts depend on connectivity (in this example it is a function of tree size) [Figure: the graph S{T1,T2} on nodes a, b, c, x1, x2, x3, y1, y2, y3, y4]

  20. So, mincut doesn’t work • But, Semple and Steel said it did • My program seems to work • Argh!!! What is happening….?

  21. What mincut does… …and does not do • Mincut supertree is guaranteed to include any nesting which occurs in all input trees • Makes no claims about nestings which occur in only some of the trees • “Does exactly what it says on the tin™”

  22. Modifying mincut supertree • Can we incorporate more of the information in the input trees? • Three categories of information • Unanimous (all trees have that grouping) • Contradicted (trees explicitly disagree) • Uncontradicted (some trees have information that no other tree disagrees with)

  23. Uncontradicted information (assume we have k input trees) • c = number of trees in which a and b co-occur • n = number of trees in which a and b are nested • c - n = 0 ⇒ uncontradicted (if c = k then unanimous) • c - n > 0 ⇒ contradicted

  24. Uncontradicted information (assume we have k input trees) • c = number of trees in which a and b co-occur • n = number of trees in which a and b are nested • f = number of trees in which a and b are in a fan • c - n - f = 0 ⇒ uncontradicted (if c = k then unanimous) • c - n - f > 0 ⇒ contradicted
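The counting rule on these two slides can be written as a small function. This is a sketch that takes the counts c, n, and f as given; how they are tallied from the input trees is not spelled out on the slides, so that part is left out. The function name and the argument check are assumptions added for illustration.

```python
def classify_pair(c, n, f=0, k=None):
    """Classify the grouping of a and b from the slide's counts.

    c: trees in which a and b co-occur
    n: trees in which a and b are nested
    f: trees in which a and b are in a fan (0 gives the simpler rule)
    k: total number of input trees (needed only to detect "unanimous")
    """
    if n + f > c:
        raise ValueError("n + f cannot exceed c")
    if c - n - f > 0:
        return "contradicted"
    if k is not None and c == k:
        return "unanimous"
    return "uncontradicted"


labels = [
    classify_pair(c=2, n=2, k=2),  # nested in both of two trees
    classify_pair(c=1, n=1, k=2),  # nested in the only tree containing both
    classify_pair(c=2, n=1, k=2),  # one tree co-occurs without nesting
]
```

With k = 2 input trees, `labels` is `["unanimous", "uncontradicted", "contradicted"]`.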

  25. Classifying edges [Figure: the graph S{T1,T2} on nodes a, b, c, x1, x2, x3, y1, y2, y3, y4, with each edge classified as one of: uncontradicted; uncontradicted but adjacent to contradicted; contradicted]

  26. Modified mincut • Species a, b, and c form a polytomy • x1, x2, and x3 resolved as per the input tree [Figure: modified mincut supertree on leaves a, b, c, x1, x2, x3, y1, y2, y3, y4]

  27. If no tree contradicts an item of information, is that information always in the supertree? [Figure: four input trees, each on leaves 1 to 5, labelled (23)5, (12)5, (45)1, and (34)1]

  28. No! (Steel, Dress, & Böcker 2000) [Figure: leaves 1 to 5] • The four trees display (12)5, (23)5, (34)1, and (45)1 • No tree displays (IK)J or (JK)I for any (IJ)K above • Triplets are uncontradicted, but cannot form a tree
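The last point can be checked with the first step of Aho et al.'s algorithm: join I and J for every triplet (IJ)K and count connected components. If all leaves fall into one component, no tree displays every triplet. The sketch below does only this top-level check; the triplet encoding `(i, j, k)` for (ij)k and the function name are assumptions.

```python
def triplet_components(triplets):
    """Connected components of the graph joining i-j for each (ij)k."""
    leaf_list = sorted({x for t in triplets for x in t})
    parent = {x: x for x in leaf_list}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, j, _ in triplets:
        parent[find(i)] = find(j)

    groups = {}
    for x in leaf_list:
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# The four triplets from the slide: (12)5, (23)5, (34)1, (45)1.
four = [(1, 2, 5), (2, 3, 5), (3, 4, 1), (4, 5, 1)]
components = triplet_components(four)
```

The edges 1-2, 2-3, 3-4, and 4-5 chain all five leaves together, so `components` is `[[1, 2, 3, 4, 5]]`: a single component, hence no tree can display all four triplets at once.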

  29. Future directions • Improve handling of uncontradicted information • Add support for constraints • Visualising very big trees • Better integration into phylogeny databases (www.treebase.org) darwin.zoology.gla.ac.uk/~rpage/supertree
