Constructing evolutionary trees from rooted triples Bang Ye Wu Dept. of Computer Science and Information Engineering Shu-Te University
An evolutionary tree • A rooted tree. • Each leaf represents one species. • Internal nodes are unlabelled (inferred common ancestors). [Figure: a rooted tree with leaves a, b, c, d, e, f]
A (rooted) triple (triplet) • An evolutionary tree of 3 species. • A constraint in an evolutionary tree construction problem. • (c(ab)): lca(b,c) = lca(c,a), which is a proper ancestor of lca(a,b), where lca denotes the lowest common ancestor. • That is, a and b should be closer to each other than a and c, or b and c. [Figure: the triple (c(ab)) on leaves a, b, c]
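The lca condition can be checked mechanically. The following is a minimal sketch, assuming a tree is stored as nested tuples with strings as leaves and a triple (x(yz)) is written as the tuple (x, y, z); both encodings and the function names are illustrative choices, not taken from the slides.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def satisfies(tree, triple):
    """True if the tree satisfies the triple (x(yz)), given as (x, y, z)."""
    x, y, z = triple
    node = tree
    # Descend to lca(y, z): the deepest node whose leaves include both y and z.
    while not isinstance(node, str):
        below = [child for child in node if {y, z} <= leaves(child)]
        if not below:
            break
        node = below[0]
    # The triple holds iff x is in the tree but outside the subtree at lca(y, z).
    return x in leaves(tree) and x not in leaves(node)


example_tree = ((("a", "b"), "c"), ("d", ("e", "f")))
print(satisfies(example_tree, ("c", "a", "b")))  # (c(ab))
print(satisfies(example_tree, ("a", "b", "c")))  # (a(bc))
```

On this hypothetical six-leaf tree, (c(ab)) holds because c lies outside the subtree rooted at lca(a,b), while (a(bc)) fails because a lies inside the subtree rooted at lca(b,c).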
A tree compatible with triples • Given a set of triples, construct a tree satisfying all the triples. • Deciding whether such a tree exists, and constructing one if it does, takes polynomial time. [Aho et al, 1981]
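A minimal sketch of the recursive idea usually attributed to Aho et al.: join y and z for every triple (x(yz)) inside the current species set, fail if everything ends up in one group, and otherwise recurse on each group. The (x, y, z) tuple encoding, the nested-tuple output, and the name `build` are assumptions made for illustration.

```python
def build(species, triples):
    """Return a nested-tuple tree satisfying all triples, or None if none exists.

    Each triple (x(yz)) is given as (x, y, z).
    """
    species = sorted(species)
    if len(species) == 1:
        return species[0]
    # Only triples whose three species are all present constrain this subtree.
    present = set(species)
    local = [t for t in triples if set(t) <= present]
    # Union-find: y and z of every triple (x(yz)) must stay in the same child.
    parent = {s: s for s in species}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for _, y, z in local:
        parent[find(y)] = find(z)
    groups = {}
    for s in species:
        groups.setdefault(find(s), []).append(s)
    if len(groups) == 1:
        return None  # all species are forced together: the triples conflict
    children = [build(group, local) for group in groups.values()]
    if any(child is None for child in children):
        return None
    return tuple(children)


print(build("abcd", [("c", "a", "b"), ("d", "a", "c")]))
print(build("abc", [("c", "a", "b"), ("a", "b", "c")]))
```

The first call uses two compatible triples and yields a tree; the second uses the conflicting pair (c(ab)), (a(bc)) and yields None.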
Incompatible (conflicting) triples • Two conflicting triples. • Three conflicting triples (pairwise compatible).
Two optimization problems • The maximum consensus tree: • the tree satisfying the maximum number of triples. • NP-hard [Jansson, 2001][Wu, to appear] • A new heuristic algorithm [this paper] • The maximum compatible set: • the compatible species subset of maximum cardinality. • NP-hard [this paper]
Previous heuristic: Best-One-Split-First • If a species x is split from a set V, all triples (x(v1v2)) with v1 and v2 in V will be satisfied. • Repeatedly split one species from the set, choosing the species to split greedily.
[Figure: Best-One-Split-First example on {a,b,c,d}. c is chosen and split first, leaving {a,b,d}, and the two triples are satisfied; b is split next, leaving {a,d}.]
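A minimal sketch of the splitting loop, assuming "greedily" means picking the species x that is the outlier of the most triples (x(v1v2)) lying inside the current set. That scoring rule, the tie-breaking by alphabetical order, the (x, y, z) tuple encoding, and the example triples are assumptions, not details from the slides. The input set is assumed to be non-empty.

```python
def best_one_split_first(species, triples):
    """Greedy caterpillar tree: repeatedly split off the species that is the
    outlier x of the most triples (x(v1v2)) lying inside the current set."""
    remaining = sorted(species)
    order = []  # species in the order they are split off
    while len(remaining) > 2:
        inside = [t for t in triples if set(t) <= set(remaining)]
        best = max(remaining, key=lambda s: sum(1 for t in inside if t[0] == s))
        order.append(best)
        remaining.remove(best)
    tree = tuple(remaining) if len(remaining) > 1 else remaining[0]
    # Re-attach the split species from the last one split to the first.
    for s in reversed(order):
        tree = (s, tree)
    return tree


hypothetical_triples = [("c", "a", "b"), ("c", "a", "d"), ("b", "a", "d")]
print(best_one_split_first("abcd", hypothetical_triples))
```

With these hypothetical triples, c is the outlier of two triples and is split first, then b, which leaves {a,d} as the innermost pair.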
Previous heuristic: Min-Cut-Split-First • Construct an auxiliary graph: • Vertex: species. • Each edge is labeled by a set: for each triple (x(yz)), x is in the label set of edge (y,z).
[Figure: a min-cut in the auxiliary graph; the triple (c(bd)) conflicts with it.] • A bipartition corresponds to a split in the tree. • The labels on the edges cut by the bipartition correspond to the triples that conflict with the split. • Repeatedly find the bipartition with minimum cut.
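A minimal sketch of the auxiliary graph and of the cost of one bipartition. The search below is a brute force over all bipartitions, swapped in for a real minimum-cut algorithm, so it only suits tiny inputs; the function names, the (x, y, z) tuple encoding, and the example triples are assumptions.

```python
from itertools import combinations


def auxiliary_graph(triples):
    """Map each edge {y, z} to its label set {x : (x(yz)) is a given triple}."""
    labels = {}
    for x, y, z in triples:
        labels.setdefault(frozenset((y, z)), set()).add(x)
    return labels


def cut_cost(labels, side):
    """Number of triples (x(yz)) whose edge {y, z} is cut by (side, rest)."""
    side = set(side)
    return sum(len(xs) for edge, xs in labels.items() if len(edge & side) == 1)


def min_cut_bipartition(species, triples):
    """Brute-force search over all bipartitions of at least two species."""
    species = sorted(species)
    relevant = [t for t in triples if set(t) <= set(species)]
    labels = auxiliary_graph(relevant)
    first, rest = species[0], species[1:]
    best = None
    for r in range(len(rest)):  # proper subsets only, so both sides are non-empty
        for extra in combinations(rest, r):
            side = {first, *extra}
            cost = cut_cost(labels, side)
            if best is None or cost < best[0]:
                best = (cost, side, set(species) - side)
    return best


hypothetical_triples = [("c", "a", "b"), ("d", "a", "b"), ("a", "c", "d")]
print(min_cut_bipartition("abcd", hypothetical_triples))
```

Here edge {a,b} carries the labels {c,d} and edge {c,d} carries {a}, so separating {a,b} from {c,d} cuts no edge and conflicts with no triple.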
Previous heuristic: Best-Pair-Merge-First • Instead of top-down splitting, BPMF uses a bottom-up merging strategy. • Starting from singleton sets, we repeatedly merge pairs of sets step by step. • Scoring functions are used to evaluate which pair should be merged in each step.
[Figure: Best-Pair-Merge-First example. {a} {b} {c} {d}, then {a,d} {b} {c}, then {a,d} {b,c}, then {a,d,b,c}.]
An exact algorithm for MCTT • Dynamic programming. • $F(V) = \max\{F(V_1) + F(V_2) + W(V_1, V_2)\}$, taken over all bipartitions $(V_1, V_2)$ of $V$. • $F(V)$: number of satisfied triples over $V$. • $W(V_1, V_2)$: number of triples $(x(v_1 v_2))$ with $x$ not in $V$, $v_1$ in $V_1$, and $v_2$ in $V_2$. • $F$ is computed for subsets in order of increasing cardinality.
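A minimal sketch of this recurrence, assuming species sets are encoded as bitmasks and triples (x(yz)) as (x, y, z) tuples. It uses memoized recursion instead of an explicit small-to-large table, which evaluates the same subproblems; the function name and the example triples are illustrative. The running time is exponential in the number of species.

```python
from functools import lru_cache


def max_consensus_count(species, triples):
    """Maximum number of triples satisfiable by one tree, via
    F(V) = max over bipartitions (V1, V2) of F(V1) + F(V2) + W(V1, V2)."""
    species = sorted(species)
    index = {s: i for i, s in enumerate(species)}
    coded = [(1 << index[x], 1 << index[y], 1 << index[z]) for x, y, z in triples]

    def w(v1, v2):
        # Triples (x(yz)) with x outside V = V1 | V2 and y, z on opposite sides.
        v = v1 | v2
        return sum(1 for x, y, z in coded
                   if not (x & v)
                   and ((y & v1 and z & v2) or (y & v2 and z & v1)))

    @lru_cache(maxsize=None)
    def f(v):
        if (v & (v - 1)) == 0:  # empty or single species: nothing to split
            return 0
        low = v & -v  # pin the lowest species to V1 so each bipartition is seen once
        rest = v ^ low
        best = 0
        sub = (rest - 1) & rest  # largest proper subset of rest, so V2 is non-empty
        while True:
            v1, v2 = low | sub, rest ^ sub
            best = max(best, f(v1) + f(v2) + w(v1, v2))
            if sub == 0:
                break
            sub = (sub - 1) & rest
        return best

    return f((1 << len(species)) - 1)


hypothetical_triples = [("c", "a", "b"), ("a", "b", "c"), ("d", "a", "b")]
print(max_consensus_count("abcd", hypothetical_triples))
```

In this hypothetical input, (c(ab)) and (a(bc)) conflict, so at most two of the three triples can be satisfied by any tree.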
Our new heuristic algorithm (DPWP) • Derived from the exact algorithm. • The number of subsets kept for each cardinality is limited by a parameter K. • When K = ∞, it is exactly the exact algorithm. • Time-quality trade-off. • The time complexity is $O(n^2 K^2 (n^3 + K))$. • Sorry, there is a mistake in the paper.
The MCST problem • Given triples over a species set S, find a subset U of S such that all given triples over U are compatible and |U| is maximum. • We show the problem is NP-hard. • The proof is a reduction from the Feedback Vertex Set problem.
The feedback vertex set problem • Feedback vertex set: a vertex subset containing at least one vertex of each cycle of the given directed graph. • In other words, removing a feedback vertex set results in an acyclic digraph.
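The second formulation gives a direct test: delete the candidate vertices and check that no directed cycle remains. A minimal sketch, assuming the digraph is given as a list of distinct vertices and a list of (u, v) edge pairs; it uses Kahn's topological-sort algorithm for the acyclicity check, and the function names are illustrative. It only verifies a candidate set and does not search for a minimum one.

```python
def is_acyclic(vertices, edges):
    """Kahn's algorithm: True if the digraph has no directed cycle."""
    indegree = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1
    ready = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in out[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # Every vertex gets removed exactly when no cycle blocks the peeling.
    return removed == len(indegree)


def is_feedback_vertex_set(vertices, edges, candidate):
    """True if deleting `candidate` leaves an acyclic digraph."""
    kept = [v for v in vertices if v not in candidate]
    kept_edges = [(u, v) for u, v in edges
                  if u not in candidate and v not in candidate]
    return is_acyclic(kept, kept_edges)


vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
print(is_feedback_vertex_set(vertices, edges, {3}))
print(is_feedback_vertex_set(vertices, edges, {1}))
```

In this hypothetical digraph, vertex 3 lies on both cycles, so {3} is a feedback vertex set, while {1} leaves the cycle between 3 and 4 intact.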
Concluding remarks • What is the approximation ratio? • The Best-One-Split-First algorithm is a 3-approximation algorithm. • A larger K gives a better solution, but the theoretical bound on the ratio is not known.