The (Supertree) of Life: Procedures, Problems, and Prospects Presented by Usman Roshan
Supertree Methods • Input: Set of trees T1, ..., Tk • Output: Tree leaf-labeled by the union of the L(Ti), where L(Ti) is the set of leaves of Ti • Why supertree methods?
Motivation (1) • Supertree methods are used as part of a divide-and-conquer method to solve NP-hard problems on large datasets
Motivation (2) • Supertree methods are used when we have missing data
Types of supertree methods (1) • Direct methods (e.g. strict consensus supertrees, MinCutSupertrees)
Types of supertree methods (2) • Indirect methods (e.g. MRP, average consensus)
Definitions • Contraction: • Restriction: • If then contains
Optimization problems • Subtree Compatibility: Given a set of trees, does there exist a tree T that contains every input tree? • NP-hard (Steel 1992) • Special cases are polynomial-time (rooted trees, DCM) • MRP: also NP-hard
Limitations of supertree methods Three desirable properties: • P1: Method can be applied to any unordered set of input trees • P2: Renaming the species does not change the constructed supertree • P3: If the input trees are compatible, then the output tree is one of the “parent trees”. There is no supertree method that can satisfy P1-P3 when the input trees are unrooted; however, for rooted trees an extension of BUILD satisfies P1-P3.
Rooted subtrees (BUILD) (Aho et al. 1981) • Input: Set of rooted trees • Output: Tree that contains every input tree
BUILD (2) - Definitions • Cluster: Set of taxa in a rooted subtree • A different representation of rooted phylogenetic trees • Let C(T) be the clusters of tree T. For the example tree T = (((1,2),(3,4)),5), C(T) = {{1,2}, {3,4}, {1,2,3,4}, {1,2,3,4,5}} • We write (IJ)K in T if I and J are both in some cluster of T which doesn’t contain K; e.g. (12)3 and (34)5 are in T
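The cluster and (IJ)K definitions can be checked mechanically. The following is a minimal sketch, assuming rooted trees are written as nested Python tuples; the function names `clusters` and `has_triple` are illustrative and do not come from the slides.

```python
def clusters(tree):
    """Return the clusters of a rooted tree given as nested tuples.

    Each cluster is the frozenset of taxa below one internal node.
    Single leaves are not reported, matching the example C(T).
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def has_triple(tree, i, j, k):
    """True if (IJ)K is in the tree: some cluster holds i and j but not k."""
    cs = clusters(tree)
    taxa = frozenset().union(*cs)
    # k must be a leaf of the tree, otherwise the triple is undefined.
    return k in taxa and any(i in c and j in c and k not in c for c in cs)


T = (((1, 2), (3, 4)), 5)
print(sorted(sorted(c) for c in clusters(T)))
print(has_triple(T, 1, 2, 3), has_triple(T, 3, 4, 5), has_triple(T, 1, 3, 2))
```

For the example tree this reproduces the four clusters listed above and confirms that (12)3 and (34)5 are in T while (13)2 is not.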
BUILD (3) - Algorithm • Initialize C as the set of input taxa • If |C| = 1 return C, else compute graph G • Let C’ be the sets of taxa in the connected components of G. If |C’| = 1 then the input is incompatible, else set C = C ∪ C’, and repeat step (2) on each new cluster in C’.
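The recursion above can be sketched in a few lines. The slide does not spell out how the graph is built, so this sketch assumes the usual construction for BUILD: the vertices are the taxa of the current cluster, and I and J are joined whenever some input triple (IJ)K has all three taxa in the current cluster. It also takes the input as a list of triples `(i, j, k)` rather than as trees, and returns a nested tuple instead of accumulating the cluster set C; both are simplifications made for illustration, and the name `build` is hypothetical.

```python
def build(taxa, triples):
    """Return a rooted tree (nested tuples) containing every triple, or None.

    taxa:    iterable of comparable taxon labels
    triples: iterable of (i, j, k), each meaning (IJ)K
    """
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))

    # Only triples that lie entirely inside the current cluster matter.
    relevant = [t for t in triples if set(t) <= taxa]

    # Union-find over the taxa: joining i and j adds the edge {i, j}.
    parent = {x: x for x in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _k in relevant:
        parent[find(i)] = find(j)

    # Connected components of the graph are the candidate child clusters.
    components = {}
    for x in taxa:
        components.setdefault(find(x), set()).add(x)

    if len(components) == 1:
        return None  # a single component means the input is incompatible

    subtrees = []
    for block in sorted(components.values(), key=min):
        sub = build(block, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


print(build({1, 2, 3, 4, 5}, [(1, 2, 3), (3, 4, 1), (1, 3, 5)]))
print(build({1, 2, 3}, [(1, 2, 3), (1, 3, 2)]))
```

The first call recovers the example tree from three of its triples; the second returns `None` because (12)3 and (13)2 cannot both hold in one tree.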
Compatible source trees • For compatible source trees, MRP or BUILD can be used; however, the strict consensus of MRP trees (or the strict consensus supertree) may not be compatible with the input. • BUILD has been extended to output all parent trees; it has also been shown that the source trees have a unique parent tree iff BUILD constructs a binary tree.
Incompatible source trees (1) For incompatible source trees there are two strategies: • Resolve incompatibilities by using quartet methods or removing troublesome taxa. • Use an appropriate algorithm such as MRP or MinCutSupertrees; the latter is an extension of BUILD that always outputs a tree.
Incompatible source trees (2) Desirable property • P1: If at least one source tree contains (IJ)K and no source tree contains (IK)J or (JK)I, then the output tree must contain (IJ)K. No method can satisfy P1; however, the weaker condition can be satisfied: if all source trees contain (IJ)K, then the output must contain (IJ)K.
Supertree criticism • Do not take biomolecular sequences into account • Dataset non-independence • MRP: Favors larger source trees because they contribute more characters; may also favor unbalanced source trees • Direct methods: Cannot incorporate support values in the source trees (except for MinCutSupertrees), and cannot compute support values in the supertree (unlike MRP)
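The point about larger source trees contributing more characters is easier to see from the matrix itself. The slides do not describe the encoding, so the sketch below assumes the standard matrix representation used by MRP: every cluster of a source tree other than its full leaf set becomes one binary character, scored 1 for taxa inside the cluster, 0 for the other taxa of that tree, and ? for taxa missing from that tree. The function names are illustrative, and the parsimony search that MRP then runs on the matrix is not shown.

```python
def leaf_clusters(tree):
    """Return (leaf set, list of clusters below the root) for nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = leaf_clusters(child)
        leaves |= child_leaves
        found += child_found
        if len(child_leaves) > 1:
            found.append(child_leaves)  # one character per internal child
    return leaves, found


def mrp_matrix(trees):
    """Map each taxon to its row of the matrix, one column per cluster."""
    encoded = [leaf_clusters(t) for t in trees]
    all_taxa = sorted(set().union(*(leaves for leaves, _ in encoded)))
    rows = {taxon: "" for taxon in all_taxa}
    for present, found in encoded:
        for cluster in found:
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon] += "?"  # taxon missing from this source tree
                else:
                    rows[taxon] += "1" if taxon in cluster else "0"
    return rows


small = ((1, 2), 3)           # contributes 1 character
large = (((2, 3), 4), 5)      # contributes 2 characters
for taxon, row in mrp_matrix([small, large]).items():
    print(taxon, row)
```

Here the three-taxon tree supplies one column and the four-taxon tree supplies two, so the larger tree carries more weight in the parsimony analysis.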
Applications of supertrees • Systematics – MRP is the standard method used by biologists • Evolutionary models • Rates of cladogenesis • Evolutionary patterns • Biodiversity and conservation
Bright future for supertree construction • Despite the increase in phylogenetic data, species are poorly characterized at the molecular level, which gives rise to problems from taxon sampling (non-random sampling), long branch attraction, and missing data • ML analysis: Genes evolve under different models • Non-molecular data