Department of Computer Science University of Texas at Austin

A heuristic approach to Maximum Weighted Quartet Compatibility. Rezwana Reaz. Department of Computer Science University of Texas at Austin. Gene trees and species tree. Species tree – pattern of branching of species lineages via speciation.

Presentation Transcript


  1. A heuristic approach to Maximum Weighted Quartet Compatibility • Rezwana Reaz • Department of Computer Science, University of Texas at Austin

  2. Gene trees and species tree • Species tree – pattern of branching of species lineages via speciation. • Gene tree – a phylogenetic tree that depicts how a single gene has evolved in a group of related species.

  3. Discordance • Gene trees don’t necessarily show the same branching pattern as their containing species tree. [Figure: species tree and gene tree on taxa A, B, C, D]

  4. Reasons for Discordance • Duplication and loss • Horizontal Gene Transfer • Incomplete Lineage Sorting / Deep Coalescence

  5. Deep Coalescence • Gene copies fail to coalesce at the speciation point: the coalescence of gene copies at a single locus extends deeper than the speciation events. • Coalescent theory visualizes the process as if it operated backwards in time. [Figure: axes labeled Population size and Generation. Courtesy: Shamsuzzoha Bayzid]

  6. Discordance by Deep Coalescence [Figure: tree on taxa A, B, C, D. Courtesy: Shamsuzzoha Bayzid]

  7. Two competing approaches • Estimating Species Trees from Multiple Genes [Figure: gene 1, gene 2, ..., gene k are either analyzed separately and combined by a summary method, or combined by concatenation. Courtesy: Tandy Warnow]

  8. Estimating Species Trees from Multiple Genes • Existing summary methods: MP-EST, MRP, Greedy, etc. • In this project, we have developed a new technique to estimate a species tree from a set of estimated gene trees, using quartet decomposition of the gene trees.

  9. Motivation • Anomalous Gene Tree (Degnan and Rosenberg, 2009): the most likely gene tree topology is different from the species tree topology. • Decompose gene trees into quartets? “when there are only four species, with one lineage sampled from each, the most likely unrooted gene tree topology has the same unrooted topology as the species tree” [Allman et al., 2011].

  10. Estimating Species Tree from True Gene Trees • Input: N genes, with the true gene tree for gene 1, gene 2, …, gene N • Quartet decomposition of each gene tree gives the quartet sets Q1, Q2, …, QN • Estimate the species tree for every 4 species: for every 4 species, take the most frequent gene tree topology as the species tree, giving the quartet set Q • Combine the unrooted 4-taxon species trees into the species tree ST
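
The majority-vote step on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names `quartet`, `taxa_of`, and `dominant_quartets` are hypothetical, each gene tree is assumed to be already decomposed into its induced quartets, and ties between topologies are broken arbitrarily.

```python
from collections import Counter, defaultdict

def quartet(a, b, c, d):
    """Unrooted quartet ab|cd, stored so that pair order does not matter."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})

def taxa_of(q):
    """The four taxa that a quartet is defined on."""
    return frozenset().union(*q)

def dominant_quartets(gene_quartets):
    """gene_quartets: one iterable of quartets per gene tree.
    Returns {four-taxon set: most frequent topology across genes}."""
    counts = defaultdict(Counter)
    for quartets in gene_quartets:
        for q in quartets:
            counts[taxa_of(q)][q] += 1
    return {taxa: c.most_common(1)[0][0] for taxa, c in counts.items()}

# Three hypothetical genes on taxa A, B, C, D: two support AB|CD, one AC|BD.
genes = [
    [quartet("A", "B", "C", "D")],
    [quartet("A", "B", "C", "D")],
    [quartet("A", "C", "B", "D")],
]
Q = dominant_quartets(genes)
print(Q[frozenset("ABCD")] == quartet("A", "B", "C", "D"))  # True
```

Storing a quartet as a set of two pairs makes ((A, B), (C, D)) and ((D, C), (B, A)) compare equal, so counts from different gene trees accumulate on the same key.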

  11. Estimating Species Tree from Estimated Gene Trees • Phase 1: Generate Weighted Quartets • Input: N genes, with bootstrap gene trees for gene 1, gene 2, …, gene N • Quartet decomposition and computation of bootstrap support values give, for each gene i, a set Qi of pairs (qi1, bi1), (qi2, bi2), … • Combine into one set and calculate weights: (q1, w1), (q2, w2), (q3, w3), … • weight = average bootstrap support value over N genes.
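
A minimal Python sketch of the weighting in Phase 1 follows. Two points are assumptions rather than statements from the slides: the bootstrap support of a quartet is taken to be the fraction of a gene's bootstrap trees that induce it, and a quartet that never appears for a gene contributes support 0 to the average. The function names are hypothetical, and quartets are written as plain strings for brevity.

```python
from collections import Counter, defaultdict

def bootstrap_support(replicate_quartets):
    """replicate_quartets: one iterable of quartets per bootstrap tree of a gene.
    Returns {quartet: fraction of replicates that induce it}."""
    counts = Counter(q for rep in replicate_quartets for q in set(rep))
    return {q: n / len(replicate_quartets) for q, n in counts.items()}

def average_weights(per_gene_support):
    """Average each quartet's support over all N genes.
    Assumption: a quartet absent from a gene contributes support 0."""
    totals = defaultdict(float)
    for support in per_gene_support:
        for q, b in support.items():
            totals[q] += b
    n_genes = len(per_gene_support)
    return {q: total / n_genes for q, total in totals.items()}

# Two hypothetical genes with four bootstrap replicates each.
gene1 = bootstrap_support([["AB|CD"], ["AB|CD"], ["AB|CD"], ["AC|BD"]])
gene2 = bootstrap_support([["AB|CD"], ["AB|CD"], ["AD|BC"], ["AD|BC"]])
weights = average_weights([gene1, gene2])
print(weights)  # AB|CD: 0.625, AC|BD: 0.125, AD|BC: 0.25
```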

  12. Estimating Species Tree from Estimated Gene Trees • Phase 2: Supertree Construction • Problem: Maximum Weighted Quartet Compatibility (MQC) • Input: A set Q of quartets q1, q2, …, qk on a set of taxa, with positive weights w1, w2, …, wk respectively. • Output: A tree T on the set of taxa that maximizes the sum of the weights of the quartets satisfied by T. • We proposed a method, WQFM (Weighted Quartet FM). [Figure: input quartets combined into the species tree ST]
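
The objective of the MQC problem can be made concrete with a short Python sketch. It assumes a representation that the slides do not specify: the tree T is given by the bipartitions (splits) induced by its internal edges, and a quartet ((a, b), (c, d)) counts as satisfied when some split puts a and b on one side and c and d on the other. The names `satisfies` and `mqc_score`, the example tree, and the weights are hypothetical.

```python
def satisfies(split, quartet, taxa):
    """True if the bipartition split | (taxa - split) separates the two pairs."""
    (a, b), (c, d) = quartet
    other = taxa - split
    return ({a, b} <= split and {c, d} <= other) or \
           ({a, b} <= other and {c, d} <= split)

def mqc_score(splits, weighted_quartets, taxa):
    """Sum of the weights of the quartets satisfied by the tree."""
    return sum(w for q, w in weighted_quartets
               if any(satisfies(s, q, taxa) for s in splits))

taxa = frozenset({1, 2, 3, 4, 5})
# Hypothetical tree ((1,2),3,(4,5)): its internal edges give two splits.
tree = [frozenset({1, 2}), frozenset({4, 5})]
Q = [(((1, 2), (3, 4)), 0.5),
     (((1, 3), (2, 4)), 0.25),
     (((2, 3), (4, 5)), 0.25)]
print(mqc_score(tree, Q, taxa))  # 0.75
```

Here the first and third quartets are satisfied and the second is not, so the tree scores 0.5 + 0.25. MQC asks for the tree that maximizes this value.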

  13. Experimental Dataset • 37-taxon Mammal Dataset • 200 genes and 500 bp • For each gene, 200 bootstrap replicate trees • Under moderate ILS • Phase 1: Generate Weighted Quartets • Phase 2: Supertree Construction using WQFM

  14. Results • Other results are obtained from the “Statistical Binning” project.

  15. Proposed Method • Input: A set Q of quartets q1, q2, …, qk on a set of taxa, with positive weights w1, w2, …, wk respectively. • Output: A tree T on the set of taxa that maximizes the sum of the weights of the quartets satisfied by T. • A divide and conquer approach • Input quartets Q: q1: ((1, 2), (3, 4)); q2: ((1, 3), (2, 4)); q3: ((2, 3), (4, 5)); q4: ((1, 2), (5, 6)); q5: ((3, 4), (5, 6)); q6: ((1, 3), (5, 6)) • Set of taxa P: {1, 2, 3, 4, 5, 6} • Recursively subdivide P and Q

  16. Proposed Method • A divide and conquer approach • Q: ((1, 2), (3, 4)), ((1, 2), (5, 6)), ((1, 3), (2, 4)), ((3, 4), (5, 6)), ((2, 3), (4, 5)), ((1, 3), (5, 6)) • P: {1, 2, 3, 4, 5, 6} • P1: {1, 2, 3, X} with quartets ((1, 3), (2, X)) and ((1, 2), (3, X)) • P2: {4, 5, 6, X} with quartet ((X, 4), (5, 6)) • Next level: {1, 2, Y} and {3, X, Y}; {4, X, Y} and {5, 6, Y} • We call this method WQFM (Weighted Quartet FM). [Figure: subtrees on the parts, labeled with X and Y]
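
The divide step in this example can be sketched in Python. The rule used below is inferred from the slide's example and should be read as an assumption: a quartet with at least three taxa in one part is passed down to that part, with any taxon from the other part replaced by the new taxon X, and a quartet split two against two is not passed down. The function name `divide` and the unit weights are hypothetical.

```python
def divide(pa, pb, weighted_quartets, dummy="X"):
    """Split the quartets between the two parts of a bipartition (pa, pb).
    Assumes every taxon of every quartet lies in pa or pb."""
    q_a, q_b = [], []
    for (p1, p2), w in weighted_quartets:
        in_a = sum(t in pa for t in (*p1, *p2))
        if in_a >= 3:
            part, out = pa, q_a
        elif in_a <= 1:
            part, out = pb, q_b
        else:
            continue  # two taxa on each side: resolved at this level
        # Replace the taxon that lies outside the part (if any) by the dummy.
        sub = tuple(tuple(t if t in part else dummy for t in pair)
                    for pair in (p1, p2))
        out.append((sub, w))
    return (pa | {dummy}, q_a), (pb | {dummy}, q_b)

# The six quartets of this slide, with unit weights (an assumption).
Q = [(((1, 2), (3, 4)), 1), (((1, 2), (5, 6)), 1), (((1, 3), (2, 4)), 1),
     (((3, 4), (5, 6)), 1), (((2, 3), (4, 5)), 1), (((1, 3), (5, 6)), 1)]
(P1, Q1), (P2, Q2) = divide({1, 2, 3}, {4, 5, 6}, Q)
print(Q1)  # quartets ((1, 2), (3, 'X')) and ((1, 3), (2, 'X'))
print(Q2)  # quartet (('X', 4), (5, 6))
```

With the partition {1, 2, 3} | {4, 5, 6}, this yields the same sub-quartets as the slide: two on {1, 2, 3, X} and one on {4, 5, 6, X}.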

  17. Future Work • To analyze the approach • on various simulated and biological datasets • under different model conditions: varying amount of ILS, varying number of genes, varying sequence length

  18. Acknowledgement • Dr. Shel Swenson, for helping me with generating weighted quartets from a set of bootstrap gene trees. • Md. Shamsuzzoha Bayzid, for helpful discussions, suggestions, and help with setting up the experimental pipeline.

  19. Thanks! Any questions?

  20. Partition Score • Pa = {1, 2}, Pb = {3, 4, 5, 6} • Partition Score = Sum of weights of satisfied – Sum of weights of violated [Figure: quartets q1 to q6, each labeled satisfied, violated, or deferred]
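
The partition score can be illustrated with a short Python sketch. The slides do not spell out the three labels, so the definitions used here are an assumption: a quartet ((a, b), (c, d)) is satisfied when a and b lie in one part and c and d in the other, violated when each pair is split across the parts, and deferred when three or four of its taxa lie in the same part. The quartets are the six listed on slide 16, and the unit weights are also an assumption.

```python
def status(quartet, pa):
    """Classify quartet ((a, b), (c, d)) against the bipartition (pa, rest)."""
    (a, b), (c, d) = quartet
    side = [t in pa for t in (a, b, c, d)]
    if sum(side) != 2:
        return "deferred"            # three or four taxa fall in one part
    return "satisfied" if side[0] == side[1] else "violated"

def partition_score(weighted_quartets, pa):
    """Sum of weights of satisfied minus sum of weights of violated quartets."""
    sign = {"satisfied": 1, "violated": -1, "deferred": 0}
    return sum(sign[status(q, pa)] * w for q, w in weighted_quartets)

Q = [(((1, 2), (3, 4)), 1), (((1, 3), (2, 4)), 1), (((2, 3), (4, 5)), 1),
     (((1, 2), (5, 6)), 1), (((3, 4), (5, 6)), 1), (((1, 3), (5, 6)), 1)]
pa = {1, 2}                          # Pb is then {3, 4, 5, 6}
print([status(q, pa) for q, _ in Q])
print(partition_score(Q, pa))        # 1
```

Under these definitions two quartets are satisfied, one is violated, and three are deferred, so the score is 2 – 1 = 1.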

  21. Gain • Pa = {1, 2}, Pb = {3, 4, 5, 6} • Partition Score = Sum of weights of satisfied – Sum of weights of violated • Gain(3) = Partition Score (after moving 3) – Partition Score (before moving 3) [Figure: quartets q1 to q6, each labeled satisfied or deferred]
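
A minimal Python sketch of the gain computation follows, under the same assumptions as before: satisfied means the two pairs of a quartet fall in opposite parts, violated means both pairs are split, anything else is deferred, and all six quartets of slide 16 have weight 1. The function names are hypothetical.

```python
def score(pa, weighted_quartets):
    """Partition score: weights of satisfied minus weights of violated quartets."""
    total = 0
    for ((a, b), (c, d)), w in weighted_quartets:
        side = [t in pa for t in (a, b, c, d)]
        if sum(side) == 2:           # otherwise deferred: no contribution
            total += w if side[0] == side[1] else -w
    return total

def gain(taxon, pa, weighted_quartets):
    """Change in partition score when taxon moves to the other part."""
    moved = pa ^ {taxon}             # toggle the taxon's membership in Pa
    return score(moved, weighted_quartets) - score(pa, weighted_quartets)

Q = [(((1, 2), (3, 4)), 1), (((1, 3), (2, 4)), 1), (((2, 3), (4, 5)), 1),
     (((1, 2), (5, 6)), 1), (((3, 4), (5, 6)), 1), (((1, 3), (5, 6)), 1)]
pa = {1, 2}
gains = {t: gain(t, pa, Q) for t in range(1, 7)}
print(gains)  # {1: -1, 2: -1, 3: 2, 4: -1, 5: -3, 6: -2}
```

With unit weights, these values agree with the first row of gains shown on slide 22, where moving taxon 3 gives the largest gain.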

  22. Bipartition Method • MFM (Modified FM) Bipartition Algorithm • Gain(1) = -1, Gain(2) = -1, Gain(3) = 2, Gain(4) = -1, Gain(5) = -3, Gain(6) = -2; Max Cumulative Gain = 2 • Gain(1) = -4, Gain(2) = -2, Gain(4) = 0, Gain(5) = -4, Gain(6) = -4, Gain(6) = 2 • Gain(1) = -2, Gain(2) = -2, Gain(5) = -3, Gain(6) = -3, Gain(5) = -1, Gain(6) = -2 • Gain(1) = -1, Gain(5) = -2, Gain(6) = -3 [Figure: successive bipartitions of taxa 1 to 6 after each move]

  23. Bipartition Method • Rollback • Initial partition for the next iteration • Iterations continue until the Maximum Cumulative Gain is zero [Figure: bipartition of taxa 1 to 6]
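
The iteration described on the last two slides (move the taxon with the highest gain, lock it, roll back to the best point, repeat until the maximum cumulative gain is zero) can be sketched in Python as below. This is a hedged illustration, not the authors' MFM code. The assumptions are: the satisfied/violated/deferred definitions used for the partition score, unit weights on the six quartets of slide 16, ties broken by the smallest taxon label, ties in cumulative gain resolved in favor of the earliest move, and no move allowed to empty a part.

```python
def score(pa, weighted_quartets):
    """Partition score: weights of satisfied minus weights of violated quartets."""
    total = 0
    for ((a, b), (c, d)), w in weighted_quartets:
        side = [t in pa for t in (a, b, c, d)]
        if sum(side) == 2:
            total += w if side[0] == side[1] else -w
    return total

def fm_pass(pa, taxa, weighted_quartets):
    """One pass: move every free taxon once, best gain first, then roll back
    to the prefix of moves with the highest cumulative gain."""
    current, free = set(pa), set(taxa)
    best_pa, best_gain, cumulative = set(pa), 0, 0
    while free:
        # Assumption: a move may not leave either part empty.
        legal = [t for t in sorted(free) if 0 < len(current ^ {t}) < len(taxa)]
        if not legal:
            break
        base = score(current, weighted_quartets)
        t = max(legal, key=lambda x: score(current ^ {x}, weighted_quartets) - base)
        cumulative += score(current ^ {t}, weighted_quartets) - base
        current ^= {t}               # move t to the other part
        free.remove(t)               # lock t for the rest of this pass
        if cumulative > best_gain:
            best_gain, best_pa = cumulative, set(current)
    return best_pa, best_gain

def bipartition(pa, taxa, weighted_quartets):
    """Repeat passes until the maximum cumulative gain is no longer positive."""
    pa = set(pa)
    while True:
        new_pa, best_gain = fm_pass(pa, taxa, weighted_quartets)
        if best_gain <= 0:
            return pa, set(taxa) - pa
        pa = new_pa

Q = [(((1, 2), (3, 4)), 1), (((1, 3), (2, 4)), 1), (((2, 3), (4, 5)), 1),
     (((1, 2), (5, 6)), 1), (((3, 4), (5, 6)), 1), (((1, 3), (5, 6)), 1)]
print(bipartition({1, 2}, {1, 2, 3, 4, 5, 6}, Q))  # ({1, 2, 3}, {4, 5, 6})
```

Starting from {1, 2} | {3, 4, 5, 6}, the first pass reaches a maximum cumulative gain of 2 after moving taxon 3 and rolls back to {1, 2, 3} | {4, 5, 6}. The second pass finds no positive cumulative gain, so the iteration stops there.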
