
Constructing a level-2 phylogenetic network from a dense set of input triplets

Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2), Leen Stougie (1, 2). (1) Technische Universiteit Eindhoven (TU/e); (2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam. Email: S.M.Kelk@cwi.nl



Presentation Transcript


  1. Constructing a level-2 phylogenetic network from a dense set of input triplets. Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2), Leen Stougie (1, 2). (1) Technische Universiteit Eindhoven (TU/e); (2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam. Email: S.M.Kelk@cwi.nl Web: http://homepages.cwi.nl/~kelk

  2. Triplet-based methods (1) • Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y. (Note zw|x = wz|x.) • Find the tree that, by contracting and deleting edges, can give each of the triplet subgraphs as a minor. [Figure: the input triplets, the algorithm, and the solution tree on the leaves w, z, x, y.]

  3. Triplet-based methods (2) • Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y. (Note zw|x = wz|x.) • Find the tree that, by contracting and deleting edges, can give each of the triplet subgraphs as a minor. [Figure: the input triplets, the algorithm, and the solution tree on the leaves w, z, x, y.]

  4. Triplet-based methods (2) • Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y. (Note zw|x = wz|x.) • Find the tree that, by contracting and deleting edges, can give each of the triplet subgraphs as a minor. [Figure: the input triplets, the algorithm, and the solution tree on the leaves w, z, x, y.]
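As a minimal sketch of what it means for a tree to be consistent with a rooted triplet, the Python snippet below tests whether some subtree contains a and b but not c, which is one way to state that ab|c is displayed. The nested-tuple tree encoding, the function names `leaves_of`, `clusters` and `displays`, and the particular tree used in the example are all assumptions made for illustration; they do not come from the slides.

```python
def leaves_of(tree):
    """Return the set of leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves_of(child)
    return result


def clusters(tree):
    """Yield the leaf set below every vertex of the tree (leaves included)."""
    yield leaves_of(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clusters(child)


def displays(tree, a, b, c):
    """True if the rooted tree is consistent with the triplet ab|c.

    ab|c holds exactly when some subtree contains a and b but not c.
    """
    if not {a, b, c} <= leaves_of(tree):
        return False
    return any(a in cl and b in cl and c not in cl for cl in clusters(tree))


# Hypothetical tree for the four example triplets: ((w, z), (x, y)).
tree = (("w", "z"), ("x", "y"))
# A triplet ab|c is encoded as the tuple (a, b, c).
triplets = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(all(displays(tree, *t) for t in triplets))
```

The check is quadratic in the size of the tree per triplet, which is adequate for small illustrations but not tuned for large inputs.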

  5. From trees to networks… • The algorithm of Aho et al. (1981) can be used to construct trees from rooted triplets. • But…what if the algorithm fails? Why might the algorithm fail? • Possible reason 1: The underlying evolution is tree-like, but the input triplets contain errors. • Possible reason 2: The triplets are correct, but the underlying evolution is not tree-like. Biological phenomena such as hybridization, horizontal gene transfer, recombination and gene duplication can lead to evolutionary scenarios that are not tree-like! • Response: try to construct phylogenetic networks rather than phylogenetic trees.
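The tree-building step mentioned above can be sketched as follows. This is a simplified rendering of the recursive idea behind the algorithm of Aho et al., not the authors' implementation: build an auxiliary graph with an edge a-b for every triplet ab|c on the current leaves, split into connected components, and recurse; if only one component remains on two or more leaves, no tree exists. The tuple encoding (a, b, c) for ab|c, the frozenset tree representation, and the name `aho_build` are assumptions for illustration.

```python
def aho_build(leaves, triplets):
    """Return a tree (nested frozensets of leaf labels) consistent with all
    triplets, or None if no such tree exists."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Only triplets lying entirely inside the current leaf set matter.
    relevant = [t for t in triplets if set(t) <= leaves]

    # Union-find over the leaves; a triplet ab|c joins a and b.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _c in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component means the leaves cannot be split below a root.
    if len(components) == 1:
        return None

    subtrees = [aho_build(c, relevant) for c in components.values()]
    if any(s is None for s in subtrees):
        return None
    return frozenset(subtrees)


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(aho_build({"w", "x", "y", "z"}, T))
# The conflicting pair {xy|z, xz|y} admits no tree:
print(aho_build({"x", "y", "z"}, [("x", "y", "z"), ("x", "z", "y")]))
```

The second call returns `None`, which is the situation that motivates moving from trees to networks.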

  6. From trees to networks (2) • For example, suppose the input is {xy|z, xz|y}. (Note that there are cases when, even if there is at most one triplet per 3 species, a tree is not possible.) [Figure: the two triplets and a network on the leaves x, y, z.]

  7. From trees to networks (2) • For example, suppose the input is {xy|z, xz|y}. (Note that there are cases when, even if there is at most one triplet per 3 species, a tree is not possible.) [Figure: the two triplets and a network on the leaves x, y, z.]

  8. From trees to networks (2) • For example, suppose the input is {xy|z, xz|y}. (Note that there are cases when, even if there is at most one triplet per 3 species, a tree is not possible.) [Figure: the two triplets and a network on the leaves x, y, z.]

  9. Level-k phylogenetic networks • A level-k phylogenetic network is a rooted, directed acyclic graph where every biconnected component (in the underlying undirected graph) contains at most k recombination vertices. [Figure: an example network on the leaves x, y, z, with labels for the root (only one!), a split-vertex, a leaf-vertex and a recombination-vertex.]

  10. Level-1 Networks • A set of input triplets is dense iff, for every subset of 3 species, there is at least one triplet corresponding to those 3 species. • Therefore, a dense set of input triplets for n species contains Θ(n^3) triplets (at least one and at most three per subset of 3 species). • Jansson & Sung (2006) showed: Given a dense set of triplets T for a set L of species, it is possible to determine in polynomial time whether a level-1 phylogenetic network N exists such that all the triplets in T are consistent with N. (And if so, to construct such a network.) • They later showed, together with Nguyen, how to do this in time linear in |T|. They also showed that, in the non-dense case, the problem is NP-hard. • But what about level-2 networks, and higher?
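The density condition is easy to test directly. The following sketch, which again assumes the illustrative encoding of a triplet ab|c as the tuple (a, b, c) and a hypothetical function name `is_dense`, checks that every 3-element subset of the leaves is covered by at least one triplet.

```python
from itertools import combinations


def is_dense(leaves, triplets):
    """True if every 3-subset of the leaves carries at least one triplet."""
    # Record which 3-subsets of species are covered by some triplet.
    covered = {frozenset(t) for t in triplets if len(set(t)) == 3}
    return all(frozenset(c) in covered for c in combinations(leaves, 3))


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(is_dense(["w", "x", "y", "z"], T))       # all four 3-subsets covered
print(is_dense(["w", "x", "y", "z"], T[:3]))   # {w, y, z} is missing
```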

  11. Here is an example of a level-2 network. Main result: Given a dense set of triplets T for a set L of species, it is possible to determine in time O(|T|^3) whether a level-2 phylogenetic network N exists such that all the triplets in T are consistent with N. (And if so, to construct such a network.)

  12. Algorithm, basic idea • The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. • For the level-1 and level-2 networks, if there again exists such a clear dichotomy, we iterate on the two subsets. [Figure: a root with two sub-networks hanging below it.]

  13. Algorithm, basic idea • The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. • For the level-1 networks, if there again exists such a clear dichotomy, we iterate on the two subsets. Otherwise there must exist a network of the form shown in the figure. [Figure: a backbone network with five sub-networks hanging from it.]

  14. Algorithm, basic idea • The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex. • For the level-1 networks, if there again exists such a clear dichotomy, we iterate on the two subsets. Otherwise there must exist a network of the form shown in the figure: • Find the partition of the species (leaves) into the sub-networks • Find the blue backbone network • Treat each of the partition elements (sub-networks) as leaves to be hung on the backbone • Recurse on the sub-networks [Figure: a backbone network with five sub-networks hanging from it.]

  15. Algorithm, high-level idea • For level-2 networks the idea is similar: • Find the partition of the species (leaves) into the sub-networks (there is a complication in level-2) • Find the blue backbone network! (There are more level-2 backbone forms) • Treat each of the partition elements (sub-networks) as (meta-)leaves to be hung on the backbone • Recurse on the sub-networks [Figure: a backbone network with five sub-networks hanging from it.]

  16. Definition: inducing new triplet sets from partitions of the leaf set • Suppose I have a partition P = {P1, P2, …, Pt} of the leaf set L. • Suppose I have a dense set of triplets T on the leaf set L. • Let T’ be a new triplet set on leaf set {q1, q2, …, qt} defined as follows: • qiqj|qk is in T’ if and only if i, j, k are pairwise distinct and there exists a triplet xy|z in T such that x is in Pi, y is in Pj and z is in Pk. • Then we say that T’ is the triplet set induced by the partition P of L. • Critically: if T is dense, then T’ is also dense. • In some sense this can be perceived as a ‘coarsening’ of the input set.
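The induced triplet set can be computed in a single pass over T. The sketch below is an illustration under assumed conventions: a triplet ab|c is the tuple (a, b, c), the meta-leaf q_i is represented by the index i of its block in the partition list, and `induce` is a hypothetical name. Because ab|c and ba|c denote the same triplet, the first two indices are stored in sorted order.

```python
def induce(partition, triplets):
    """Return the triplet set induced by a partition of the leaf set.

    partition: list of disjoint sets of leaves covering all leaves in triplets.
    Result: set of tuples (i, j, k) meaning q_i q_j | q_k, with i < j.
    """
    block_of = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for x, y, z in triplets:
        i, j, k = block_of[x], block_of[y], block_of[z]
        # Keep only triplets whose three leaves lie in three different blocks.
        if len({i, j, k}) == 3:
            induced.add((min(i, j), max(i, j), k))
    return induced


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
P = [{"w", "z"}, {"x"}, {"y"}]
print(induce(P, T))
```

In this example the triplets zw|x and wz|y vanish because w and z share a block, while yx|w and xy|z both collapse onto the same induced triplet.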

  17. Definition: simple level-2 networks • Lemma: There are exactly 4 different backbone networks. [Figure: the four backbone structures.] • A simple level-2 network is any network obtained by “hanging leaves” off one of the above structures.

  18. A picture description of the simple level-2 algorithm Here the leaves {a,b,c,d,e,f,g,h} have been ‘hung’ from structure 8a, to yield a simple level-2 network.

  19. Level-2 network algorithm • Assume some oracle gives us the partition of the leaves into sub-networks. • Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: • Guess the right “recombination leaf”. • Remove it and remove the triplets that contain this leaf; one recombination vertex is left, with a caterpillar below it.

  20. Suppose we can correctly ‘guess’ that leaf g hangs directly below a recombination node If we remove g, and all triplets that contain g, then we know that a level-1 network must be possible on this new set of triplets (because now fewer recombination nodes are needed)

  21. Level-2 network algorithm • Assume some oracle gives us the partition of the leaves into sub-networks. • Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: • Guess the right “recombination leaf”. • Remove it and remove the triplets that contain this leaf; one recombination vertex is left, with a caterpillar below it. • Guess the right “caterpillar set”.

  22. Caterpillar set • A caterpillar set with respect to a dense triplet set T is the set of leaves of a caterpillar subgraph of a network consistent with T. • The empty set is also a caterpillar set. [Figure: a caterpillar.]

  23. Suppose we subsequently guess that the caterpillar with h now hangs below a recombination node in the new network. If we remove the h-caterpillar, and all triplets that contain leaves of it, then we know that a level-0 network must be possible on this new set of triplets (because now even fewer recombination nodes are needed.)
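Both removal steps (taking out the guessed recombination leaf, then the guessed caterpillar set) amount to restricting T to the triplets that avoid the removed leaves. A minimal sketch, assuming triplets are tuples (a, b, c) for ab|c and using the hypothetical helper name `remove_leaves`; the leaf names in the example are made up and do not reproduce the network in the figures:

```python
def remove_leaves(triplets, removed):
    """Return the triplets that contain none of the removed leaves."""
    removed = set(removed)
    return [t for t in triplets if removed.isdisjoint(t)]


# Made-up triplets on leaves a, b, g, h for illustration only.
T = [("a", "b", "g"), ("a", "g", "h"), ("a", "b", "h"), ("b", "h", "g")]
without_g = remove_leaves(T, {"g"})            # after guessing leaf g
without_g_h = remove_leaves(without_g, {"h"})  # after guessing caterpillar {h}
print(without_g, without_g_h)
```

If the original set is dense on L, the restricted set is dense on the remaining leaves, since every 3-subset of the remaining leaves keeps its triplets.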

  24. Level-2 network algorithm • Assume some oracle gives us the partition of the leaves into sub-networks. • Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: • Guess the right “recombination leaf”. • Remove it and remove the triplets that contain this leaf; one recombination vertex is left, with a caterpillar below it. • Guess the right “caterpillar set”. • Remove it and remove the triplets that contain any element of this set. • Construct the unique tree for the remaining triplets [Jansson & Sung 2006].

  25. In such a case the resulting tree is UNIQUE (J&S).

  26. So now we have a tree. We are going to guess how to add the h-caterpillar back in, and then guess how to add leaf g back in.

  27. Adding the h-caterpillar back in.

  28. And finally adding leaf g back in.

  29. Level-2 network algorithm • Assume some oracle gives us the partition of the leaves into sub-networks. • Treat each sub-network as a leaf and construct a simple level-2 network. The simple level-2 network algorithm: • Guess the right “recombination leaf”. • Remove it and remove the triplets that contain this leaf; one recombination vertex is left, with a caterpillar below it. • Guess the right “caterpillar set”. • Remove it and remove the triplets that contain any element of this set. • Construct the unique tree for the remaining triplets [Jansson & Sung 2006]. • Insert the caterpillar set and the recombination leaf in the tree in the correct way. For each pair of guesses, try all 4 backbone structures.

  30. Simple level-2 algorithm Theorem: The simple level-2 network algorithm runs in O(|T|^3) time.

  31. SN-sets to partition the set of leaves • Jansson & Sung introduced the SN-set to partition the set of leaves. • SN-sets are special subsets of the leaves L, and are defined w.r.t. T. • All sets containing just a single leaf are SN-sets. • Any other SN-set is a subset of leaves obtained by taking the closure of some subset S of the leaves L w.r.t. the following operation: if x, y ∈ S and xz|y ∈ T or yz|x ∈ T, then z ∈ S. • The SN-set that is equal to the total leaf set L is called the trivial SN-set. • An SN-set that is non-trivial, and is not a strict subset of any other non-trivial SN-set, is called a maximal SN-set. • (If the network is a tree there are 2 maximal SN-sets: the leaves of the subtree to the right of the root and the leaves of the subtree to the left of the root.)
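The closure operation can be sketched as a simple fixed-point loop. In a triplet ab|c (encoded here, by assumption, as the tuple (a, b, c)), the rule fires when the outlier c and one of a, b are already in S, in which case the other one joins S. The function name `sn_closure` is hypothetical, and the loop is a plain illustration rather than an efficient implementation.

```python
def sn_closure(seed, triplets):
    """Close the leaf set `seed` under the SN-set rule:
    if x, y in S and xz|y in T (or yz|x in T), then z joins S."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:
            # Triplet ab|c: c plays the role of y (or x), and a, b are {x, z}.
            if c in s and (a in s or b in s) and not (a in s and b in s):
                s.update((a, b))
                changed = True
    return s


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(sn_closure({"w", "z"}, T))  # already closed
print(sn_closure({"w", "x"}, T))  # grows to the whole leaf set
```

For the example triplets, {w, z} is an SN-set on its own, whereas starting from {w, x} the rule first adds z (via zw|x) and then y (via yx|w).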

  32. Definition: maximal SN-set • Jansson and Sung proved that the set of maximal SN-sets indeed partitions the leaf set L. So no two maximal SN-sets overlap, and they completely cover the set of input leaves. • All SN-sets and all maximal SN-sets can be found in polynomial time. • Jansson & Sung solved the level-1 problem by observing that each maximal SN-set hangs as a ‘meta-leaf’ on the level-1 backbone network; each maximal SN-set can be completely separated from the rest of the network by removing just one edge. • There are maximal SN-sets in level-2 networks that can hang under more than one edge!
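One straightforward polynomial-time way to list the maximal SN-sets is sketched below. It assumes, as an unverified simplification for dense inputs, that it suffices to take the closures of all singletons and all pairs of leaves; the encoding of ab|c as (a, b, c) and the names `sn_closure` and `maximal_sn_sets` are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def sn_closure(seed, triplets):
    """Close `seed` under: x, y in S and xz|y in T (or yz|x in T) => z in S."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:
            if c in s and (a in s or b in s) and not (a in s and b in s):
                s.update((a, b))
                changed = True
    return s


def maximal_sn_sets(leaves, triplets):
    """Return the non-trivial SN-sets not strictly contained in another one.

    Assumption: candidate SN-sets are the closures of singletons and pairs.
    """
    leaves = frozenset(leaves)
    candidates = {frozenset([x]) for x in leaves}
    for a, b in combinations(leaves, 2):
        candidates.add(frozenset(sn_closure({a, b}, triplets)))
    nontrivial = {s for s in candidates if s != leaves}
    return {s for s in nontrivial if not any(s < t for t in nontrivial)}


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(maximal_sn_sets({"w", "x", "y", "z"}, T))
```

On the four example triplets this yields {w, z} and {x, y}, which together partition the leaf set as the slide states.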

  33. Definition: highest cut-edge • In a phylogenetic network N, a cut-edge (x,y) is an edge whose removal disconnects the undirected graph. • A cut-edge (x,y) is said to be a trivial cut-edge iff y is a leaf. • A cut-edge (x,y) is said to be highest iff there is no cut-edge (p,q) such that there is a directed path from q to x in N.
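These definitions translate directly into a brute-force check. The sketch below finds cut-edges by deleting each edge and testing connectivity of the underlying undirected graph, then filters the highest ones using directed reachability (a path of length zero, q = x, counts). The edge-list encoding, the function names, and the small example network are assumptions for illustration; a linear-time bridge-finding algorithm would be preferable on large networks.

```python
def connected(vertices, edges):
    """True if the undirected graph on `vertices` with `edges` is connected."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(vertices)


def cut_edges(edges):
    """Edges whose removal disconnects the underlying undirected graph."""
    vertices = {v for e in edges for v in e}
    return [e for i, e in enumerate(edges)
            if not connected(vertices, edges[:i] + edges[i + 1:])]


def reachable(edges, src):
    """Vertices reachable from src along directed edges (src included)."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for w in children.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def highest_cut_edges(edges):
    """Cut-edges (x, y) with no cut-edge (p, q) having a directed path q -> x."""
    cuts = cut_edges(edges)
    return [(x, y) for (x, y) in cuts
            if not any(x in reachable(edges, q) for (_p, q) in cuts)]


# Made-up network: a cycle r-a-h-b with recombination vertex h,
# leaves x and y, and a small subtree c -> z1, z2 hanging below b.
N = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("a", "x"),
     ("h", "y"), ("b", "c"), ("c", "z1"), ("c", "z2")]
print(cut_edges(N))
print(highest_cut_edges(N))
```

In the example, (c, z1) and (c, z2) are cut-edges but not highest, because the cut-edge (b, c) lies above them.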

  34. Fact. Let (x,y) be a highest cut-edge and let L’ be the set of leaves reachable from y. Let L* be a strict subset of L’. Then L* is not a maximal SN-set. • Proof: the set of leaves reachable from a highest cut-edge (x,y) is itself an SN-set. Clearly, for any two leaves p, q in L’ and any leaf r outside L’, neither pr|q nor qr|p can be present: the edge (x,y) forms a bottleneck. Thus, since T is dense, pq|r must exist. [Figure: a cut-edge (x,y) with leaves p, q in L’ below it and a leaf r outside, next to the triplet pq|r.] • So: each maximal SN-set can be expressed as the union of the leaves reachable from one or more highest cut-edges.

  35. Central Theorem (simplified). Suppose there is a dense triplet set T consistent with some simple level-2 network N. Then there exists a level-2 network N’ (not necessarily simple) such that, with the exception of perhaps one maximal SN-set with respect to T, every maximal SN-set appears below a single cut-edge in N’. The remaining, ‘odd-one-out’ maximal SN-set (if it exists) will be equal to the union of leaves below two cut-edges. In other words: there exists at most one maximal SN-set which is the union of the leaves below two highest cut-edges, whereas all other maximal SN-sets consist of the leaves below one highest cut-edge.

  36. The algorithm • Determine the maximal SN-sets • Guess the right SN-set to be split • Treat the maximal SN-sets and the two split sets as leaves {S1, S2, …, Sq} • Adapt T to a new triplet set T’: SiSk|Sh ∈ T’ if and only if there exist x ∈ Si, y ∈ Sk, z ∈ Sh s.t. xy|z ∈ T • Construct a simple level-2 network for T’ • Recursively find the sub-networks for the sets S1, S2, …, Sq

  37. Conclusions & open problems • So we know how to efficiently construct level-2 networks from dense triplet sets. What’s next? • Applicability: how useful is it? • Initial implementation: programming and fine-tuning • Improving running time: in the spirit of the “SN-tree” of J&S&N • Complexity: what about level-3 and higher? • Bounds: worst-case, best-case scenarios • Building all networks • Properties of output networks as function of input • Different triplet restrictions • Confidence: how good are the solutions? • Exponential-time exact algorithms for NP-hard problems
