Towards Identifying Lateral Gene Transfer Events L. Addario-Berry, M. Hallett, J. Lagergren Presented By: Jeff Mathew
Roadmap • Key terms • τ-transfer problem • H-moves and I-moves algorithm • Tree generation for simulation • Experimental results • Conclusions and future work
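Lateral transfer scenario • LGT = HGT: lateral gene transfer is also called horizontal gene transfer • The root of the scenario tree must correspond to the root of the gene tree • The scenario tree is connected and respects the direction of evolution implied by the arcs of the gene tree T and the species tree S.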
α-activity • An α-active scenario for a gene tree and a species tree allows at most α copies of a gene to exist simultaneously in the genome of an ancestral taxon. • The authors focus on 1-active scenarios, although intractability results were proved earlier for α ≥ 1.
τ-transfer problem • Input: species tree S, gene tree T, integer τ • Output: a τ*-lateral transfer scenario for S and T with τ* ≤ τ • Intractability result • The decision version of the α-active, τ-transfer problem (does there exist an α-active scenario with cost ≤ τ?) is NP-complete. • τ bounds the number of lateral transfer events used to explain the difference between S and T
Algorithm • Two-phase approach • Phase 1 • While H-fat or I-fat vertices remain • Perform an H-fat move or an I-fat move • At the end of Phase 1, the scenario is guaranteed to be 1-active, but it may still contain cycles. • Phase 2 • Remove the minimum number of LGT events from each candidate to make it acyclic. • Running time: O(2^{4τ} n^2)
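The Phase 2 idea can be illustrated with a brute-force sketch. It assumes, purely for illustration, that a candidate scenario is given as a directed graph with fixed arcs (from the trees) plus removable LGT arcs; the slides do not specify this representation, and the names `is_acyclic` and `min_transfer_removal` are hypothetical. It is not the authors' procedure, only a minimal instance of "remove the fewest LGT arcs so that no cycle remains".

```python
from itertools import combinations

def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indeg = {v: 0 for v in nodes}
    out = {v: [] for v in nodes}
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return visited == len(nodes)

def min_transfer_removal(nodes, fixed_arcs, transfer_arcs):
    """Return a smallest list of transfer arcs whose removal leaves an acyclic graph.

    Assumed representation (not from the slides): arcs are (tail, head) pairs.
    Tries removal sets in order of increasing size, so the first hit is minimum.
    Returns None if the fixed arcs alone already contain a cycle.
    """
    indices = range(len(transfer_arcs))
    for k in range(len(transfer_arcs) + 1):
        for removed in combinations(indices, k):
            kept = [transfer_arcs[i] for i in indices if i not in removed]
            if is_acyclic(nodes, list(fixed_arcs) + kept):
                return [transfer_arcs[i] for i in removed]
    return None

# Hypothetical toy input: one transfer arc closes a cycle with the fixed arcs.
nodes = ["a", "b", "c"]
fixed = [("a", "b"), ("b", "c")]
transfers = [("c", "a"), ("a", "c")]
removed = min_transfer_removal(nodes, fixed, transfers)
```

The search is exponential in the number of transfer arcs, which is tolerable only because τ is assumed to be small.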
Simulating species trees • Create a random species tree S on n leaves with Θ(log n) expected depth • S is supposed to reflect the actual evolutionary relationships between taxa • S is ultrametric, so edge weights correspond to time • Randomly assign weights to every edge such that every root-to-leaf path has weighted sum 1.
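One way to realize the species-tree step is sketched below. The slides do not say how the random topology or the weights were drawn, so both choices here are assumptions: the topology comes from repeatedly joining two random subtrees, and each node gets a time stamp (root at 0, leaves at 1) so that an edge weight is the time difference between child and parent. The function name `random_ultrametric_tree` is hypothetical.

```python
import random

def random_ultrametric_tree(n, seed=None):
    """Return (parent, time) for a random binary tree on n >= 2 leaves.

    Leaves are nodes 0..n-1 at time 1.0 and the root sits at time 0.0.
    The weight of the edge above node v is time[v] - time[parent[v]],
    so every root-to-leaf path has total weight 1.
    """
    if n < 2:
        raise ValueError("need at least two leaves")
    rng = random.Random(seed)
    # Internal-node times, latest first; the final join is the root at time 0.
    times = sorted((rng.random() for _ in range(n - 2)), reverse=True) + [0.0]
    parent = {}
    time = {leaf: 1.0 for leaf in range(n)}
    roots = list(range(n))          # current subtree roots
    next_id = n
    for t in times:
        a, b = rng.sample(roots, 2)  # join two random subtrees
        roots.remove(a)
        roots.remove(b)
        parent[a] = parent[b] = next_id
        time[next_id] = t
        roots.append(next_id)
        next_id += 1
    return parent, time

parent, time = random_ultrametric_tree(8, seed=1)
```

Because later joins always receive earlier times, each parent is older than its children, and path weights telescope to leaf time minus root time, which is 1.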
Simulating gene trees • Begin with the generated ultrametric species tree • Lateral transfer events occur according to a Poisson process with mean rate λ • Moving from the root to the leaves, for each vertex x0 with children x1 and x2, examine both edges • If the Poisson process produces a lateral transfer event along (x0, x1), add it and point it to a randomly chosen edge alive at that point in time • Otherwise, add a speciation event for x1 • Repeat the analysis for (x0, x2)
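The edge-by-edge sampling above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' simulator: the tree is given as hypothetical `parent` and `time` dictionaries (root at time 0, leaves at time 1), at most one transfer is placed per edge (the first Poisson arrival), the rate must be positive, and the code only lists the transfer events without building the resulting gene tree.

```python
import random

def sample_transfers(parent, time, rate, seed=None):
    """Sample LGT events as (time, donor_edge, recipient_edge) triples.

    An edge is named by its child node. Assumes rate > 0.
    """
    rng = random.Random(seed)
    events = []
    for child, par in parent.items():
        start, end = time[par], time[child]
        # Waiting time to the first arrival of a Poisson process is exponential.
        t = start + rng.expovariate(rate)
        if t >= end:
            continue  # no transfer on this edge: ordinary speciation at child
        # Other edges alive at time t are candidate recipients.
        alive = [c for c, p in parent.items()
                 if c != child and time[p] <= t < time[c]]
        if alive:
            events.append((t, child, rng.choice(alive)))
    return sorted(events)

# Hypothetical three-leaf species tree: ((0, 1), 2) with the root labelled 4.
parent = {0: 3, 1: 3, 3: 4, 2: 4}
time = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.6, 4: 0.0}
events = sample_transfers(parent, time, rate=2.0, seed=0)
```

Raising `rate` plays the role of increasing λ: more edges receive a transfer, which is what eventually drives the simulation toward saturation.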
Degenerate cases • The simulation can produce biologically plausible events that the algorithm cannot detect. • Useless transfers: LGTs that don't change the gene tree • Transfer-loss events: one child of a node is an LGT event and the other child is a loss event.
Results • Ω = number of repetitions • τ = true number of LGT events • τ′ = cost of the minimum-cost LGT scenario found by the algorithm • λ = mean rate of LGTs in the Poisson process
Finding the saturation point • The saturation point is where the average τ′ stops increasing. • Random trees from a large pool were chosen as gene trees and species trees • Trials suggest that the saturation point is slightly above n/2, i.e., when τ > n/2, the algorithm stops detecting new LGT events • Thus, if τ′ > n/2, the correspondence between T and S via LGT events is not very meaningful.
Conclusions • Empirically verified the feasibility of the τ-transfer algorithm • Degenerate events, such as transfer-loss events that result in overestimates of transfers, occur with low probability • Achieved near-optimal scenarios when λ is low enough not to cause saturation • The cycle-elimination phase of the algorithm is rarely needed in practice, implying an O(2^{2τ} n^2) running time.
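To get a feel for what skipping cycle elimination saves, one can compare the worst-case bound from the Algorithm slide with the practical bound above. The comparison treats the hidden constants as equal, which is an assumption made only for illustration:

```latex
\[
\frac{2^{4\tau}\, n^{2}}{2^{2\tau}\, n^{2}} = 2^{2\tau},
\qquad \text{e.g. } \tau = 5 \;\Rightarrow\; 2^{10} = 1024 .
\]
```

The gap depends only on τ, not on the number of leaves n.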
Future work and open problems • Use weighted gene trees and species trees • Species trees are nearly ultrametric while gene trees are not • Do fast algorithms exist when the input is a set of gene trees with no species tree? • Tractability on larger phylogenies • Can we consider gene duplications, lateral gene transfers, and other events simultaneously? • Can we use probabilistic models that assign likelihoods to various events and optimize over such models in a tractable manner?