
BNFO 602 Phylogenetics

BNFO 602 Phylogenetics. Usman Roshan. Summary of last time: models of evolution, distance-based tree reconstruction, neighbor joining, UPGMA. Why phylogenetics? Study of evolution; origin and migration of humans; origin and spread of disease; many applications in comparative bioinformatics.


Presentation Transcript


  1. BNFO 602 Phylogenetics Usman Roshan

  2. Summary of last time • Models of evolution • Distance based tree reconstruction • Neighbor joining • UPGMA

  3. Why phylogenetics? • Study of evolution • Origin and migration of humans • Origin and spread of disease • Many applications in comparative bioinformatics • Sequence alignment • Motif detection (phylogenetic motifs, evolutionary trace, phylogenetic footprinting) • Correlated mutation (useful for structural contact prediction) • Protein interaction • Gene networks • Vaccine development • And many more…

  4. Maximum Parsimony • Character-based method • NP-hard (reduction to the Steiner tree problem) • Widely used in phylogenetics • Slower than NJ but more accurate • Faster than ML • Assumes characters (sites) are i.i.d.

  5. Maximum Parsimony • Input: Set S of n aligned sequences of length k • Output: A phylogenetic tree T • leaf-labeled by sequences in S • additional sequences of length k labeling the internal nodes of T such that the sum, over all edges (u, v) of T, of the Hamming distance between the sequences labeling u and v is minimized.

  6. Maximum parsimony (example) • Input: Four sequences • ACT • ACA • GTT • GTA • Question: which of the three trees has the best MP score?

  7. Maximum Parsimony [Figure: the three possible unrooted tree topologies on the leaves ACT, ACA, GTT, GTA]

  8. Maximum Parsimony [Figure: the three trees with labeled internal nodes and edge costs] • First tree: MP score = 7 • Second tree: MP score = 5 • Third tree: internal nodes labeled ACA and GTA, nonzero edge costs 2, 1, 1, MP score = 4 (optimal MP tree)
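The score of a fully labeled tree is the sum of Hamming distances over its edges. The following is a minimal sketch of that computation for the tree with MP score = 4, assuming internal nodes labeled ACA and GTA; the node names (`a`, `b`, `c`, `d`, `u`, `v`) and the function names are hypothetical, chosen only for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_length(edges, labels):
    """Sum of Hamming distances over all edges of a fully labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)


# Leaves a, b, c, d carry the input sequences; u and v are internal nodes.
labels = {
    "a": "ACT", "b": "ACA", "c": "GTT", "d": "GTA",
    "u": "ACA", "v": "GTA",
}
# Unrooted topology: (a, b) on one side of the internal edge, (c, d) on the other.
edges = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]

score = parsimony_length(edges, labels)
print(score)
```

The three nonzero edges contribute 1 (ACT to ACA), 2 (ACA to GTA), and 1 (GTT to GTA).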

  9. Maximum Parsimony: computational complexity • Optimal labeling of a fixed tree can be computed in linear time O(nk) • Finding the optimal MP tree is NP-hard [Figure: the optimal tree from the previous slide, MP score = 4]
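One standard way to obtain the optimal score of a fixed tree in O(nk) time is the bottom-up pass of Fitch's algorithm; the slides do not name the method, so treat this as an illustrative sketch rather than the lecture's own procedure. It assumes the unrooted tree is rooted on an arbitrary edge and written as nested pairs with sequences at the leaves (a hypothetical representation).

```python
def fitch_site(tree, site):
    """Return (candidate state set, minimum changes) for one site of a subtree.

    A tree is either a leaf (a string) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left, right = tree
    left_states, left_changes = fitch_site(left, site)
    right_states, right_changes = fitch_site(right, site)
    common = left_states & right_states
    if common:
        # The children can agree on a state, so no change is needed here.
        return common, left_changes + right_changes
    # The children disagree, so one change is charged at this node.
    return left_states | right_states, left_changes + right_changes + 1


def fitch_score(tree, k):
    """Minimum parsimony length of a fixed topology over k independent sites."""
    return sum(fitch_site(tree, site)[1] for site in range(k))


tree_a = (("ACT", "ACA"), ("GTT", "GTA"))
tree_b = (("ACT", "GTT"), ("ACA", "GTA"))

print(fitch_score(tree_a, 3))
print(fitch_score(tree_b, 3))
```

Each node is visited once per site and the state sets have constant size for DNA, which gives the O(nk) bound.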

  10. Local search strategies [Figure: cost plotted over the space of phylogenetic trees, marking a local optimum and the global optimum]

  11. Local search for MP • Determine a candidate solution s • While s is not a local minimum • Find a neighbor s’ of s such that MP(s’)<MP(s) • If found set s=s’ • Else return s and exit • Time complexity: unknown in general; it could take a very long time or end quickly, depending on the starting tree and the local move • Need to specify how to construct the starting tree and the local move
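The loop above can be sketched generically. In this minimal example the search space is a toy list of integer positions with made-up costs standing in for trees and MP scores; `neighbors` and `cost` are hypothetical callables that a real implementation would replace with a tree move and a parsimony scorer.

```python
def local_search(start, neighbors, cost):
    """First-improvement hill climbing: stop when no neighbor is strictly better."""
    s = start
    while True:
        better = next((t for t in neighbors(s) if cost(t) < cost(s)), None)
        if better is None:
            return s  # s is a local minimum
        s = better


# Toy landscape: position i has cost COSTS[i]; neighbors are adjacent positions.
COSTS = [5, 3, 4, 6, 2, 1, 7, 8, 0, 9]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(COSTS)]


def cost(i):
    return COSTS[i]


from_left = local_search(0, neighbors, cost)
from_right = local_search(9, neighbors, cost)
print(from_left, cost(from_left))
print(from_right, cost(from_right))
```

Starting at position 0 the search stops at a local minimum of cost 3, while starting at position 9 it reaches the global minimum of cost 0, which illustrates the dependence on the starting point.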

  12. Starting tree for MP • Random phylogeny: O(n) time • Greedy-MP

  13. Greedy-MP • Greedy-MP takes O(n^2k^2) time

  14. Local moves for MP: NNI • For each internal edge we get two different topologies • Neighborhood size is 2n-6
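The count 2n-6 follows because an unrooted binary tree on n leaves has n-3 internal edges, and each contributes two alternative topologies. A small sketch, assuming the four subtrees around one internal edge are passed in as `a`, `b`, `c`, `d` (hypothetical names and representation):

```python
def nni_neighbors(a, b, c, d):
    """Two topologies obtained by NNI on the internal edge separating (a, b) from (c, d)."""
    return [((a, c), (b, d)), ((a, d), (b, c))]


def nni_neighborhood_size(n):
    """Number of NNI neighbors of an unrooted binary tree on n leaves."""
    if n < 4:
        raise ValueError("NNI needs at least one internal edge, so n >= 4")
    internal_edges = n - 3
    return 2 * internal_edges


# Four-taxon example: the single internal edge yields the other two topologies.
(a, b), (c, d) = (("ACT", "ACA"), ("GTT", "GTA"))
four_taxon_neighbors = nni_neighbors(a, b, c, d)
print(four_taxon_neighbors)
print(nni_neighborhood_size(4))
```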

  15. Local moves for MP: SPR • Neighborhood size is quadratic in number of taxa • Computing the minimum number of SPR moves between two rooted phylogenies is NP-hard

  16. Local moves for MP: TBR • Neighborhood size is cubic in number of taxa • Computing the minimum number of TBR moves between two rooted phylogenies is NP-hard

  17. Local optima are a problem

  18. Iterated local search: escape local optima by perturbation [Figure: local search reaches a local optimum]

  19. Iterated local search: escape local optima by perturbation [Figure: a perturbation moves the local optimum to a new point, the output of perturbation]

  20. Iterated local search: escape local optima by perturbation [Figure: local search restarts from the output of perturbation]
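The cycle of local search, perturbation, and local search can be sketched as follows. This is a toy illustration, not any of the MP programs named in the lecture: positions in a made-up cost list stand in for trees, and a fixed jump stands in for the perturbation, which in practice is usually randomized. All function names here are hypothetical.

```python
def local_search(start, neighbors, cost):
    """First-improvement hill climbing to a local minimum."""
    s = start
    while True:
        better = next((t for t in neighbors(s) if cost(t) < cost(s)), None)
        if better is None:
            return s
        s = better


def iterated_local_search(start, neighbors, cost, perturb, rounds):
    """Alternate perturbation and local search, keeping the best local minimum found."""
    best = local_search(start, neighbors, cost)
    for _ in range(rounds):
        candidate = local_search(perturb(best), neighbors, cost)
        if cost(candidate) < cost(best):
            best = candidate
    return best


COSTS = [5, 3, 4, 6, 2, 1, 7, 8, 0, 9]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(COSTS)]


def cost(i):
    return COSTS[i]


def perturb(i):
    # Deterministic jump, assumed here only to keep the example reproducible.
    return (i + 3) % len(COSTS)


plain = local_search(0, neighbors, cost)
iterated = iterated_local_search(0, neighbors, cost, perturb, rounds=3)
print(plain, cost(plain))
print(iterated, cost(iterated))
```

From the same start, plain local search stops at cost 3, while three rounds of perturbation escape to the minimum of cost 0.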

  21. ILS for MP • Ratchet • Iterative-DCM3 • TNT
