
How to See a Tree for a Forest? Combining Phylogenetic Trees – Reasons, Methods, and Consequences

Tanya Y. Berger-Wolf, DIMACS and UIC


Presentation Transcript


  1. How to See a Tree for a Forest? Combining Phylogenetic Trees – Reasons, Methods, and Consequences
     Tanya Y. Berger-Wolf, DIMACS and UIC
     The affinities of all the beings of the same class have sometimes been represented by a great tree… As buds give rise by growth to fresh buds, and these if vigorous, branch out and overtop on all sides many a feeble branch, so by generation I believe it has been with the great Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever branching and beautiful ramifications.
     Charles Darwin, 1859

  2. Phylogeny Reconstruction
     [Figure: tree with leaves Orangutan, Gorilla, Chimpanzee, Human]

  3. Phylogeny Reconstruction Process
     • Get an estimate of evolutionary distance between species
     • Treat the species as a set of points with pairwise distance measure
     • Find a tree that optimizes {parsimony, likelihood, function of your choice} on that set of points

  4. Phylogeny Reconstruction Problems
     • Get an estimate of evolutionary distance between species
       • DNA not sufficient for deep evolution and too simple
       • Genomes are better but no good distance measures
       • Other types of data are subjective and no good models
       • Constraints on possible topologies
     • Treat the species as a set of points with pairwise distance measure
       • Species are sampled not at the same level and frequency so some points are “more equal than others”
       • Large datasets: efficient storage, query, and representation
     • Find a tree that optimizes {parsimony, likelihood, function of your choice} on that set of points

  5. Computational Pitfalls
     • Resulting optimization problems are hard
     • No good bounds
     • Existing heuristics expensive on large datasets
     • Same score – many topologies
     • True tree is unknown
     ⇓
     When to stop and what to return?

  6. Consensus Methods
     [Figure: two trees on leaves A to E combined (+) into a single consensus tree (=)]
     Consensus is what many people say in chorus but do not believe as individuals
     Abba Eban (1915 - 2002), Israeli diplomat. In "The New Yorker," 23 Apr 1990

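  7. Consensus Methods: Strict
     McMorris et al. (83)
     [Figure: three input trees on leaves A to E and their strict consensus tree]
     Clades of the input trees:
     • Tree 1: AB, CD, ABCD, ABCDE
     • Tree 2: AB, ABC, DE, ABCDE
     • Tree 3: BCD, ABCD, ABCDE
     Strict: contains the clades common to all trees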

  8. Consensus Methods: Majority
     Margush & McMorris (81), McMorris et al. (83), Barthelemy & McMorris (86)
     [Figure: three input trees on leaves A to E and their majority consensus tree]
     Clades of the input trees:
     • Tree 1: AB, CD, ABCD, ABCDE
     • Tree 2: AB, ABC, DE, ABCDE
     • Tree 3: BCD, ABCD, ABCDE
     Majority: contains the clades common to a majority of the trees
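The strict and majority rules from slides 7 and 8 can be sketched as set operations on clades. This is a minimal illustration, not the cited authors' implementation: the names `clades`, `TREES`, `strict_consensus`, `majority_consensus`, and `show` are made up here, and each tree is assumed to be given simply as its set of clades (the three clade lists from the slides).

```python
from collections import Counter


def clades(*names):
    """Build a tree's clade set; each clade is a frozenset of leaf labels."""
    return {frozenset(name) for name in names}


# Assumption: the three input trees are the clade lists shown on the slides.
TREES = [
    clades("AB", "CD", "ABCD", "ABCDE"),
    clades("AB", "ABC", "DE", "ABCDE"),
    clades("BCD", "ABCD", "ABCDE"),
]


def strict_consensus(trees):
    """Clades present in every input tree."""
    return set.intersection(*trees)


def majority_consensus(trees):
    """Clades present in strictly more than half of the input trees."""
    counts = Counter(c for tree in trees for c in tree)
    return {c for c, n in counts.items() if 2 * n > len(trees)}


def show(clade_set):
    """Readable, sorted labels such as 'ABCD'."""
    return sorted("".join(sorted(c)) for c in clade_set)


print(show(strict_consensus(TREES)))    # ['ABCDE']
print(show(majority_consensus(TREES)))  # ['AB', 'ABCD', 'ABCDE']
```

On this input only the root clade ABCDE survives the strict rule, while AB and ABCD each appear in two of the three trees and so enter the majority consensus.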

  9. Stopping Maximum Parsimony
     (joint work with T. Williams, B.M.E. Moret, U. Roshan, T. Warnow)
     If we return the majority consensus of the top-scoring trees, how early can we stop without changing the outcome? What stopping criteria should we use?

  10. Experiment Design
      [Figure: flow diagram; the input is illustrated by the DNA sequence ATTCGGAAGCGATAGCTGAATCGATCGATCGTATTACGTTAGCTAGTATGCAGCGGAG]
      • Biological dataset
      • Run parsimony ratchet (PAUP*): 500 iterations, 5 repetitions; save the tree at each iteration
      • Majority consensus of best and second best so far
      • Majority consensus of optimal trees (PAUP*)
      • Optimal: best-scoring trees in all repetitions
      • Output consensus tree

  11. Results

  12. Results

  13. Online Consensus: Strict
      $C(SC) = \bigcap_{i=1}^{k} C(T_i)$
      $C(SC_i) = \bigcap_{j=1}^{i} C(T_j) = \left( \bigcap_{j=1}^{i-1} C(T_j) \right) \cap C(T_i) = C(SC_{i-1}) \cap C(T_i)$
      Running time for a new tree: $\Theta(n)$, which is optimal
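The recurrence on slide 13 says the strict consensus after i trees is the previous strict consensus intersected with the new tree's clades. A minimal sketch of that update follows; the class name `OnlineStrictConsensus` is hypothetical, and clades are assumed to be frozensets of leaf labels. Hashing whole frozensets costs more than constant time per clade, so this sketch illustrates the recurrence only and does not attempt the Θ(n) bound stated on the slide.

```python
class OnlineStrictConsensus:
    """Maintains C(SC_i) = C(SC_{i-1}) ∩ C(T_i) as trees arrive one at a time."""

    def __init__(self):
        self.clades = None  # no tree seen yet

    def add_tree(self, tree_clades):
        """Fold one more tree (a collection of frozenset clades) into the consensus."""
        if self.clades is None:
            # The strict consensus of a single tree is the tree itself.
            self.clades = set(tree_clades)
        else:
            self.clades.intersection_update(tree_clades)
        return self.clades


def clades(*names):
    """Helper for the example: clade labels to frozensets."""
    return {frozenset(name) for name in names}


online = OnlineStrictConsensus()
online.add_tree(clades("AB", "CD", "ABCD", "ABCDE"))
after_two = set(online.add_tree(clades("AB", "ABC", "DE", "ABCDE")))
after_three = set(online.add_tree(clades("BCD", "ABCD", "ABCDE")))
```

After the second tree the running consensus holds AB and ABCDE; the third tree removes AB, leaving only ABCDE.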

  14. Online Consensus: Majority
      $c \in C(M)$ if and only if $|\{ C(T_i) \text{ s.t. } c \in C(T_i) \}| > \frac{k}{2}$
      $C(M_i) \subseteq C(M_{i-1}) \cup C(T_i)$
      • Maintain the set of clades so far with counters
      • Update counters for the previous majority and the new tree
      • Use good implementation of a dictionary data structure (Amenta et al, 2003)
      Running time for a new tree: $\Theta(n)$, which is optimal
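The counter-based bookkeeping on slide 14 can be sketched with a plain dictionary of clade counts. The class name `OnlineMajorityConsensus` is hypothetical, and clades are again assumed to be frozensets of leaf labels. For simplicity this sketch rescans every counter when asked for the majority, so it does not reach the Θ(n) per-tree bound, which the slide attributes to a good dictionary implementation.

```python
from collections import Counter


class OnlineMajorityConsensus:
    """Keeps one counter per clade seen so far and the number of trees k."""

    def __init__(self):
        self.counts = Counter()
        self.k = 0

    def add_tree(self, tree_clades):
        """Count each clade of the new tree once, then report the majority."""
        self.k += 1
        self.counts.update(set(tree_clades))
        return self.majority()

    def majority(self):
        """Clades contained in strictly more than k/2 of the trees seen."""
        return {c for c, n in self.counts.items() if 2 * n > self.k}


def clades(*names):
    """Helper for the example: clade labels to frozensets."""
    return {frozenset(name) for name in names}


online = OnlineMajorityConsensus()
m1 = online.add_tree(clades("AB", "CD", "ABCD", "ABCDE"))
m2 = online.add_tree(clades("AB", "ABC", "DE", "ABCDE"))
m3 = online.add_tree(clades("BCD", "ABCD", "ABCDE"))
```

Note that ABCD is in the majority after one tree, drops out after two (1 of 2 is not a strict majority), and returns after three, which is why counters for clades outside the current majority still have to be kept.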

  15. Conclusions
      • No need to work hard to get good enough trees?
      • Work to get “good” (?) trees, not optimal
      • Stopping criteria
      • Consensus is not the best representation. What else?
      • This is a wide open research area

  16. Using a Different Path: Heterogeneous Data
      (joint work with Tandy Warnow)

  17. Heterogeneous Data Molecular data: DNA and genomes

  18. Heterogeneous Data Paleontological, morphological, geographical, historical data

  19. Data As Constraints
      Constraints, not distance!
      • Positive: these species are together (phylogenetic trees, presence of a morphological character)
      • Negative: these species are not together (above + geography, fossils)
      • Temporal: these events happened in this order (fossils, history)
      • Frequency: this event happens more often than another (adaptation mechanisms)

  20. Consensus Methods: Greedy
      [Figure: three input trees on leaves A to E and candidate greedy consensus trees]
      Clades of the input trees:
      • Tree 1: AB, CD, ABCD, ABCDE
      • Tree 2: AB, ABC, DE, ABCDE
      • Tree 3: BCD, ABCD, ABCDE
      Greedy: resolves the majority consensus by adding compatible clades
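One common way to realize the greedy rule on slide 20 is to visit clades from most to least frequent and keep each one that is compatible with everything kept so far. The sketch below makes two assumptions that are not on the slide: two clades are treated as compatible when they are disjoint or nested, and ties in frequency are broken alphabetically by clade label. The function names are hypothetical.

```python
from collections import Counter


def clades(*names):
    """Clade labels to frozensets of leaf labels."""
    return {frozenset(name) for name in names}


def compatible(a, b):
    """Two clades fit in one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def label(clade):
    return "".join(sorted(clade))


def greedy_consensus(trees):
    """Add clades by decreasing frequency, skipping any that conflict."""
    counts = Counter(c for tree in trees for c in tree)
    # Assumption: ties between equally frequent clades are broken by label.
    ordered = sorted(counts, key=lambda c: (-counts[c], label(c)))
    chosen = []
    for clade in ordered:
        if all(compatible(clade, kept) for kept in chosen):
            chosen.append(clade)
    return set(chosen)


TREES = [
    clades("AB", "CD", "ABCD", "ABCDE"),
    clades("AB", "ABC", "DE", "ABCDE"),
    clades("BCD", "ABCD", "ABCDE"),
]
result = sorted(label(c) for c in greedy_consensus(TREES))
print(result)  # ['AB', 'ABC', 'ABCD', 'ABCDE']
```

The majority clades AB, ABCD, and ABCDE are taken first. Of the clades that occur once, ABC and CD conflict with each other, so the tie-breaking rule decides which one resolves the tree; with a different tie-break the result would contain CD instead of ABC.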

  21. Consensus Methods: AMT
      Phillips & Warnow (95)
      [Figure: three input trees on leaves A to E and candidate collections of compatible clades]
      Clades of the input trees:
      • Tree 1: AB, CD, ABCD, ABCDE
      • Tree 2: AB, ABC, DE, ABCDE
      • Tree 3: BCD, ABCD, ABCDE
      Asymmetric Median Tree: maximum (weighted) collection of compatible clades
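For an input this small, the "maximum weighted collection of compatible clades" on slide 21 can be found by brute force. The sketch below is an illustration only, not Phillips and Warnow's algorithm. It assumes each clade is weighted by the number of input trees that contain it and that clades are compatible when disjoint or nested; the function names are hypothetical. Enumerating all subsets is exponential, so this is usable only on toy inputs.

```python
from collections import Counter
from itertools import combinations


def clades(*names):
    """Clade labels to frozensets of leaf labels."""
    return {frozenset(name) for name in names}


def compatible(a, b):
    """Two clades fit in one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def max_weight_compatible(trees):
    """Exhaustively search for a heaviest pairwise-compatible clade set."""
    counts = Counter(c for tree in trees for c in tree)
    candidates = list(counts)
    best_weight, best = 0, set()
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                weight = sum(counts[c] for c in subset)
                if weight > best_weight:
                    best_weight, best = weight, set(subset)
    return best_weight, best


TREES = [
    clades("AB", "CD", "ABCD", "ABCDE"),
    clades("AB", "ABC", "DE", "ABCDE"),
    clades("BCD", "ABCD", "ABCDE"),
]
weight, chosen = max_weight_compatible(TREES)
print(weight)  # 8
```

On these trees the optimum weight is 8 and it is not unique: both {AB, CD, ABCD, ABCDE} and {AB, ABC, ABCD, ABCDE} reach it, so which set is returned depends on enumeration order.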

  22. Consensus of Positive Constraints
      Formalize each constraint, go through the existing consensus methods, and see whether each one satisfies the constraint or can be extended to satisfy it.
      Partially from Steel et al. 2000

  23. Consensus of Negative Constraints
      • a and b are separated by C
      • C is closer to a than b – same as positive

  24. More Conclusions
      • Existing methods are insufficient
      • (Consensus with respect to temporal, frequency constraints)
      • Developing new methods that preserve 4 types of constraints
      • Network phylogeny
      • Error measure and evaluation of quality
      • This is a wide open research area

  25. Thank you
      Work was supported by the National Science Foundation postdoctoral fellowship grant EIA 02-03584
      • "A little inaccuracy sometimes saves a ton of explanation." - H. H. Munro (Saki) (1870-1916)
      • "The significant problems we face cannot be solved at the same level of thinking we were at when we created them." - Albert Einstein (1879-1955)

  26. Controlled Breeding
      (joint work with Cris Moore and Jared Saia)
      Given an initial population of animals, design a mating strategy that achieves a breeding goal (within shortest time)

  27. Controlled Breeding: Background
      • Conservation Biology and Agriculture
      • Breeding strategies: designed and evaluated empirically or using stochastic time-step modeling
      • Empirical evaluation – too slow!
      • Stochastic modeling – mathematically and biologically inappropriate.
      • Classic algorithm design problem

  28. Breeding All Possible Animals
      Given k binary strings of length n, design an algorithm that
      • Produces all possible strings
      • With the smallest expected # matings
      Greedy: mate the two animals with the highest probability of producing a new string
      Upper bound: $2.32 \cdot 2^n$

  29. Breeding a Target Animal
      Given k strings of length n, design an algorithm that
      • Produces a target string
      • With the smallest expected # matings
      Alg 1: breed for one trait at a time, $O(n \lg n)$
      Alg 2: breed the animals closest to the target, $O(n^2)$

  30. Algorithm: One Trait at a Time
      AddOneTrait(11…100…0, 00…010…0)
          x = 11…100…0
          y = 00…010…0
          While (y has < i+1 ones) do
              Mate x and y twice
              y = string with 1 in bit (i+1)
          Return y
      The Algorithm(e1, e2, …, en)
          x = e1
          For i = 2..n do
              x = AddOneTrait(x, ei)
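The pseudocode on slide 30 can be simulated once a mating model is fixed. The slides do not state one, so the sketch below assumes uniform crossover: each bit of the offspring is copied from one of the two parents with equal probability. It also assumes that, of the two offspring per round, the one with the most ones among those carrying the new trait replaces y. The names `mate`, `add_one_trait`, and `breed_all_traits` are hypothetical, and bits are 0-indexed here.

```python
import random


def mate(x, y, rng):
    """Assumed mating model: every bit comes from x or y with probability 1/2."""
    return tuple(rng.choice(pair) for pair in zip(x, y))


def add_one_trait(x, y, i, rng):
    """x has ones in bits 0..i-1, y has a one in bit i.

    Repeatedly mates x and y twice until y has ones in all bits 0..i.
    Returns the final y and the number of matings used.
    """
    matings = 0
    while sum(y[: i + 1]) < i + 1:
        children = [mate(x, y, rng) for _ in range(2)]
        matings += 2
        # Keep only offspring that still carry the new trait (bit i).
        carriers = [c for c in children if c[i] == 1]
        if carriers:
            y = max(carriers, key=sum)
    return y, matings


def breed_all_traits(n, rng):
    """Start from unit strings e_1..e_n and accumulate one trait at a time."""
    units = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    x, total = units[0], 0
    for i in range(1, n):
        x, used = add_one_trait(x, units[i], i, rng)
        total += used
    return x, total


animal, matings = breed_all_traits(8, random.Random(0))
print(animal, matings)
```

Under this model a carrier offspring never loses a one that y already had in bits 0..i-1, because x has ones in all of those positions, so y only improves from round to round.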

  31. More Realistic Breeding
      • Gender
      • Variable probability of outcome
      • Deaths
      • Minimize number of generations
      • Goal: maximum diversity
      • On-line: maintain the distribution
