Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination Author: Dan Gusfield Presentation by: C. Badri Narayanan
Agenda • Main Problem – Root-Unknown galled-tree problem • Solving Optimal Root-Unknown Galled-Tree Problem
Root-Unknown Galled-Tree Problem • Given a set M of binary sequences, find a galled-tree for M with the minimum number of recombinations if one exists; otherwise, output that none exists • Let's first look at the approach taken previously
Points Considered in the Theorem(s) • Only single-crossover recombinations are considered • The algorithm will later be extended to multiple-crossover recombinations • Before looking at the approach, let's consider some definitions
Definition of Terms • Trivial component: a node of the incompatibility graph with no incident edges • Component (a.k.a. connected or non-trivial component): a maximal set of nodes, containing at least one edge, in which every pair of nodes is joined by at least one path • Reduced galled-tree: a galled-tree in which no gall contains a site from a trivial component
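The components above can be computed directly from M. The sketch below assumes M is a list of equal-length 0/1 tuples and uses the standard four-gamete test for incompatibility: sites i and j are incompatible when all of 00, 01, 10 and 11 appear at (i, j). The function names are illustrative only.

```python
from itertools import combinations


def incompatible(M, i, j):
    """Four-gamete test: sites i and j are incompatible iff all four
    pairs 00, 01, 10, 11 occur among the rows of M at columns (i, j)."""
    return {(row[i], row[j]) for row in M} == {(0, 0), (0, 1), (1, 0), (1, 1)}


def components(M):
    """Split the sites of M into non-trivial and trivial components of the
    incompatibility graph, using a small union-find over the columns."""
    m = len(M[0])
    parent = list(range(m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(m), 2):
        if incompatible(M, i, j):
            parent[find(i)] = find(j)  # join the two sites' components

    groups = {}
    for s in range(m):
        groups.setdefault(find(s), []).append(s)
    comps = sorted(groups.values())
    # A singleton group is a node with no incident edges: a trivial component.
    nontrivial = [c for c in comps if len(c) > 1]
    trivial = [c for c in comps if len(c) == 1]
    return nontrivial, trivial


M = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(components(M))  # ([[0, 1]], [[2]])
```

Here sites 0 and 1 show all four gametes and form one non-trivial component, while site 2 is compatible with both and is a trivial component.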
Previous Approaches – A Roadmap • To construct a galled-tree for M with a known ancestral sequence (say, A): > Focus on each non-trivial component of the incompatibility graph separately > For each component, determine the arrangement of its sites on a gall > Connect the galls in a tree structure > Place the sites from the trivial components
Difficulties with an Unknown Ancestral Sequence • Choosing different sequences S and S' (in M) as the ancestral sequence may give different conflict graphs • How do we know which (ancestral) sequence will allow a galled-tree?
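A small example of why the choice of ancestral sequence matters. The sketch assumes the root-known notion of conflict: relabel each column so that A's value becomes 0, then sites i and j conflict when the pairs 01, 10 and 11 all appear. The function name `conflict_edges` is illustrative.

```python
from itertools import combinations


def conflict_edges(M, A):
    """Edges of the conflict graph of M relative to ancestral sequence A.

    Each column is relabeled (XOR with A) so that A's value becomes 0;
    sites i and j conflict when (0,1), (1,0) and (1,1) all appear."""
    rel = [tuple(r ^ a for r, a in zip(row, A)) for row in M]
    need = {(0, 1), (1, 0), (1, 1)}
    return {
        (i, j)
        for i, j in combinations(range(len(A)), 2)
        if need <= {(row[i], row[j]) for row in rel}
    }


M = [(0, 1), (1, 0), (1, 1)]
print(conflict_edges(M, (0, 0)))  # {(0, 1)}
print(conflict_edges(M, (0, 1)))  # set()
```

The same matrix has a conflict between its two sites when the ancestral sequence is 00, but none when it is 01, so the galls needed depend on which ancestral sequence is chosen.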
Optimal Galled-Tree • A galled-tree for a set of sequences M that minimizes the number of recombinations over all galled-trees for M and over all choices of ancestral sequence is called an “optimal galled-tree” • The ancestral sequence of an optimal galled-tree is called an “optimal ancestral sequence”
Author’s Approach: Theorem on Galled-Trees (Finding an Ancestral Sequence) • If there is a galled-tree for M with some ancestral sequence, then there is an optimal galled-tree for M whose (optimal) ancestral sequence is one of the sequences in M
Proof of the Theorem • Let T be an optimal galled-tree for M, and let A be the ancestral sequence of T • Every gall in T must have at least three edges branching off of it
Proof continued… • Let P be a path in T from the root to some leaf z that contains no recombination node • Let Zz be the sequence labeling z; Zz is in M • Make Zz the ancestral sequence and reverse the direction of every edge on P
Main Problem contd. • Reversing an edge changes the direction of the mutation on that edge (a 0-to-1 change becomes a 1-to-0 change) • Reversing edges does not change: > the site labels on the edges of T > the recombination node of any gall • Hence the modified tree T' still derives M
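The re-rooting step can be checked on the tree-like part of T (path P contains no recombination node). The sketch below is a simplification that does not model galls: it assumes an undirected edge list in which each edge is labeled by the single site that mutates on it, so crossing an edge in either direction flips that site. Re-rooting at z with Zz as the new root sequence reproduces every node's sequence; Zz itself is A with the sites labeling the edges of P flipped. The names `derive` and the example nodes are hypothetical.

```python
from collections import defaultdict


def derive(edges, root, root_seq):
    """Propagate sequences outward from `root` over an undirected tree.

    edges: list of (u, v, site); crossing an edge flips that site's bit,
    whichever way the edge is traversed (each site mutates once)."""
    adj = defaultdict(list)
    for u, v, site in edges:
        adj[u].append((v, site))
        adj[v].append((u, site))
    seqs = {root: tuple(root_seq)}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, site in adj[u]:
            if v not in seqs:
                s = list(seqs[u])
                s[site] ^= 1  # the mutation on this edge, in either direction
                seqs[v] = tuple(s)
                stack.append(v)
    return seqs


# Path P is r - a - z (sites 0 and 1); b hangs off the root.
edges = [("r", "a", 0), ("a", "z", 1), ("r", "b", 2)]
original = derive(edges, "r", (0, 0, 0))
rerooted = derive(edges, "z", original["z"])  # Zz = (1, 1, 0) is the new root
print(rerooted == original)  # True
```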
Main Problem contd. • The ancestral sequence of T' is Zz, which is a member of M • T' contains the same number of galls (and hence recombinations) as T, so T' is also optimal • Running time: O(n^2 m + n^4), where n is the number of sequences and m is the length of each binary sequence
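By the theorem, the search over ancestral sequences can be restricted to the sequences in M. A minimal sketch of that reduction follows, with the root-known solver left abstract as a hypothetical callable `solve_root_known(M, A)` that returns (number of recombinations, tree) or None. Assuming a root-known galled-tree algorithm that runs in O(nm + n^3) time, at most n such calls would match the bound above.

```python
from typing import Callable, Optional, Sequence, Tuple

Seq = Tuple[int, ...]


def optimal_root_unknown(
    M: Sequence[Seq],
    solve_root_known: Callable[[Sequence[Seq], Seq], Optional[Tuple[int, object]]],
):
    """Try every distinct sequence of M as the ancestral sequence and keep
    the galled-tree with the fewest recombinations.

    solve_root_known is a hypothetical root-known solver: it returns
    (number_of_recombinations, tree), or None if no galled-tree with
    ancestral sequence A exists."""
    best = None
    for A in dict.fromkeys(M):  # distinct rows, in first-seen order
        result = solve_root_known(M, A)
        if result is not None and (best is None or result[0] < best[1][0]):
            best = (A, result)
    return best  # None means no galled-tree exists for M with any root in M
```

Usage with a stand-in solver: `optimal_root_unknown(rows, my_solver)` returns a pair (A, (count, tree)) for the best ancestral sequence found.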
Solving the Optimal Root-Unknown Galled-Tree Problem • Assume M can be derived on a galled-tree, and let T* be an optimal galled-tree for M • Let A* be an optimal ancestral sequence
Connecting Galls of T* • Assumption: every node v on a gall Q in T* is incident with exactly one edge whose other end is off of Q (an “off-edge”) • An off-edge may be directed into or out of a node (say, x)
Connecting Galls of T* • Transform T* into T' (conceptually) as follows: • In the figure, node 00100 (say, x) is incident with two edges directed out of it • A new node (say, y) is introduced, with an edge from x to y • The two original edges that were out of x are reconnected so that they leave y • T' specifies how the galls of T* are connected to each other, but does not show the internal arrangement of the sites on any gall
Connecting Galls of T* • If x is the root of T*, create a new root and connect it to x with an edge • Contract each gall Q in T* to a single node (say, q) and make all edges undirected
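The contraction step can be sketched as follows, assuming T* is given as a list of directed (u, v) edges plus a hypothetical dictionary `gall_of` that maps every node on a gall to an identifier for that gall. Edges inside a gall disappear, and every remaining edge becomes an undirected pair.

```python
def contract_galls(edges, gall_of):
    """Contract each gall to a single node and drop edge directions.

    edges: iterable of directed (u, v) pairs from T*.
    gall_of: hypothetical map from each node on a gall to that gall's id;
    nodes not on any gall keep their own name."""
    contracted = set()
    for u, v in edges:
        cu, cv = gall_of.get(u, u), gall_of.get(v, v)
        if cu != cv:  # edges with both ends on the same gall vanish
            contracted.add(frozenset((cu, cv)))  # undirected edge
    return contracted


# One gall on nodes x, a, b, r (r is the recombination node), each node
# on the gall having exactly one off-edge.
edges = [("root", "x"), ("x", "a"), ("a", "r"), ("x", "b"), ("b", "r"),
         ("a", "l1"), ("b", "l2"), ("r", "l3")]
gall_of = {"x": "q", "a": "q", "b": "q", "r": "q"}
print(sorted(sorted(e) for e in contract_galls(edges, gall_of)))
# [['l1', 'q'], ['l2', 'q'], ['l3', 'q'], ['q', 'root']]
```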
Algorithmic Construction of T' • Find a family of splits SP(T) • Components C1 and C2 are obtained from the incompatibility graph • The leaf nodes of the tree (right side of the figure) are determined by the sites that have a unique combination of characters
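As background for the splits, a binary site divides the rows of M into those with 0 and those with 1. The sketch below computes that bipartition for each given site, stored as an unordered pair so it does not depend on which state is ancestral. Whether SP(T) is built exactly this way in the paper is an assumption here; `site_splits` is an illustrative name.

```python
def site_splits(M, sites):
    """Map each site to the split it induces on the rows of M.

    A split is stored as an unordered pair {rows with 0, rows with 1},
    so swapping the 0/1 labels of a column gives the same split."""
    splits = {}
    for s in sites:
        zeros = frozenset(r for r, row in enumerate(M) if row[s] == 0)
        ones = frozenset(r for r, row in enumerate(M) if row[s] == 1)
        splits[s] = frozenset((zeros, ones))
    return splits


M = [(0, 1, 1), (0, 1, 0), (1, 0, 0)]
sp = site_splits(M, [0, 1, 2])
print(sp[0] == sp[1], sp[0] == sp[2])  # True False
```

Sites 0 and 1 are complementary columns and therefore induce the same split of the rows.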
Extensions to Complex Biological Phenomena & Structured Recombination • Site-arrangement algorithm for the gall Q corresponding to component C • Let M(C) be matrix M restricted to the sites in C
Extensions to Complex Biological Phenomena & Structured Recombination • For each distinct sequence X in M(C): > Let M(C, X) be M(C) after removing all rows with sequence X > If there is an undirected perfect phylogeny T(C) for M(C, X) in which all sites of C lie on one path whose end sequences can be recombined (by a single crossover) to create X, then output the pair (X, T(C))
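The building blocks of this step can be sketched in a few lines, assuming rows are 0/1 tuples. Existence of an undirected perfect phylogeny for binary data is checked with the four-gamete test, and a single crossover is checked by trying every interior crossover point in both orders. The sketch does not build T(C) or check that all sites of C lie on one path; the helper names are hypothetical.

```python
from itertools import combinations


def restrict(M, C):
    """M(C): every row of M restricted to the sites (columns) in C."""
    return [tuple(row[s] for s in C) for row in M]


def remove_rows(MC, X):
    """M(C, X): M(C) with every row equal to X removed."""
    return [row for row in MC if row != X]


def has_undirected_perfect_phylogeny(rows):
    """Four-gamete test: no pair of sites shows all of 00, 01, 10, 11."""
    if not rows:
        return True
    full = {(0, 0), (0, 1), (1, 0), (1, 1)}
    return all({(r[i], r[j]) for r in rows} != full
               for i, j in combinations(range(len(rows[0])), 2))


def single_crossover(S1, S2, X):
    """True if X = prefix of one sequence + suffix of the other, with the
    crossover point strictly inside the sequence (either order)."""
    n = len(X)
    for a, b in ((S1, S2), (S2, S1)):
        if any(a[:k] == X[:k] and b[k:] == X[k:] for k in range(1, n)):
            return True
    return False


MC = [(0, 0), (0, 1), (1, 0), (1, 1)]
X = (1, 1)
print(has_undirected_perfect_phylogeny(remove_rows(MC, X)))  # True
print(single_crossover((1, 0), (0, 1), X))  # True
```

In the example, removing X = 11 leaves a matrix with a perfect phylogeny (the path 01, 00, 10), and the end sequences 10 and 01 recombine by one crossover into 11.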
Extensions to Complex Biological Phenomena & Structured Recombination • Let Su(C) and Sy(C) denote the two end sequences of that path • Step 2 of the above algorithm is modified for multiple-crossover recombination: determine whether X can be created by a multiple-crossover recombination of Su(C) and Sy(C), starting with Su(C)
Extensions to Complex Biological Phenomena & Structured Recombination • Algorithm:
• i = 1; Z = Su(C)
• do {
  • Find the longest substring of Z starting at position i that matches the substring of X starting at position i
  • If none, return no; else
  • Set i to the position just past the right end of those matching substrings
  • If Z = Su(C) then set Z = Sy(C) else set Z = Su(C)
  } while i ≤ length of X
• Return yes
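The greedy loop above translates almost line for line into Python. This sketch uses 0-based positions and, instead of a bare yes/no, returns the list of matched segments (or None for "no"), so the number of crossovers is the number of segments minus one. The function name and segment format are illustrative.

```python
def multi_crossover(Su, Sy, X):
    """Greedy test: can X be built from alternating segments of Su and Sy,
    starting with Su? Returns a list of (source, start, end) segments with
    0-based, end-exclusive positions, or None if it cannot."""
    m = len(X)
    i, Z, name = 0, Su, "Su"
    segments = []
    while i < m:
        j = i
        while j < m and Z[j] == X[j]:  # longest match of Z and X from i
            j += 1
        if j == i:
            return None  # no matching substring starts at i: answer "no"
        segments.append((name, i, j))
        i = j  # continue just past the matched substrings
        Z, name = (Sy, "Sy") if name == "Su" else (Su, "Su")  # switch parent
    return segments


print(multi_crossover("0000", "1111", "0011"))
# [('Su', 0, 2), ('Sy', 2, 4)]
```

Taking the longest match before switching is what makes the greedy choice safe: the scan stops only where the current parent disagrees with X, so the other parent must supply that position.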
Extensions to Complex Biological Phenomena & Structured Recombination • With this modification, the site-arrangement algorithm produces a multiple-crossover galled-tree for M