Phylogenetic Signal with Induction and non-Contradiction: the PhySIC method for building supertrees
http://atgc.lirmm.fr/SuperTree/PhySIC
Vincent Berry (1), V. Ranwez (2), A. Criscuolo (1,2), P.-H. Fabre (2), S. Guillemot (1), C. Scornavacca (1,2), E.J.P. Douzery (2)
Funded by ACI IMPBIO & BIOSTIC LR. CNRS - Université Montpellier 2, France
Introduction: use of supertrees
Supertrees are useful for:
• producing well-resolved large phylogenies that provide a framework for broad comparative studies (Gittleman et al. 2004);
• quantitative studies of input-tree congruence, identifying outlier taxa by tree-supertree distance measures (Wilkinson et al. 2004);
• exploring and identifying agreement and disagreement among sets of input trees. The aim is then to reveal conflicts rather than to resolve them; conflicts are ultimately resolved with additional data or analyses (Wilkinson et al. 2001);
• identifying where limited overlap between the leaf sets of the input trees hinders their amalgamation, thereby guiding further research (Sanderson et al. 1996, Arné et al. 2007).
PhySIC: Phylogenetic Signal with Induction and non-Contradiction
Introduction: dealing with conflicts
[Figure: two source trees on taxa A, B, C and D with conflicting topologies]
Dealing with topological contradictions (“conflicts”) among source trees:
• Voting methods (MRP, MMC, CLANN, …) resolve conflicts through a voting procedure (optimization approach).
• Veto methods (Strict Consensus, BUILD, SMAST) do not favor any resolution in case of conflict (consensus approach).
Veto methods
• Proceed from an axiomatic approach: proposed supertrees satisfy specified theoretical properties.
• Goal: obtain a reliable, if incomplete, picture of how the source trees fit together.
• Motivation:
  • full congruence with the source trees can be necessary for further applications such as phylogeography or divergence time estimation;
  • avoid, as far as possible, inferring novel clades that no source tree supports, unlike some existing voting methods.
Overview
• Some relevant properties for reliable inference
  • Decomposition of a tree into triplets
  • Identifying a tree
  • Property of Induction (PI)
  • Property of non-Contradiction (PC)
• Algorithms (sketch)
  • BUILD (Aho et al. 81)
  • PhySICPC
  • PhySICPI
• Biological case study: Primate supertree
• Conclusion & prospects
Axiomatic approach: important properties
Reliable facts are those that can be induced from testimonies and that are not incompatible with any other.
Decomposition of trees into building blocks
Triplets (rooted triples): subtrees on 3 taxa. xy|z denotes the triplet in which x and y are more closely related to each other than to z.
[Figure: source trees T1 = (((a,b),c),d) and T2 = (((e,b),d),c) with their triplet sets]
• tr(T1) = {ab|c, ab|d, ac|d, bc|d}
• tr(T2) = {eb|c, eb|d, ed|c, bd|c}
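The decomposition into triplets can be sketched in a few lines of Python. This is a minimal illustration, not PhySIC's own code: trees are assumed to be nested tuples with one-character leaf labels, and xy|z is reported whenever some clade contains x and y but not z (pairs are written in alphabetical order, so ed|c appears as de|c).

```python
from itertools import combinations


def leaves(tree):
    """Leaf labels below a node; a leaf is a one-character string."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Leaf sets of all internal nodes, i.e. the clades of the tree."""
    if isinstance(tree, str):
        return []
    return [leaves(tree)] + [c for child in tree for c in clades(child)]


def triplets(tree):
    """tr(T): every rooted triple xy|z displayed by the tree, as 'xy|z' strings."""
    all_clades = clades(tree)
    result = set()
    for x, y, z in combinations(sorted(leaves(tree)), 3):
        # A tree resolves each set of three leaves in at most one of these ways.
        for (p, q), out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any({p, q} <= c and out not in c for c in all_clades):
                result.add(f"{p}{q}|{out}")
    return result


T1 = ((("a", "b"), "c"), "d")
T2 = ((("e", "b"), "d"), "c")
print(sorted(triplets(T1)))  # ['ab|c', 'ab|d', 'ac|d', 'bc|d']
print(sorted(triplets(T2)))  # ['bd|c', 'be|c', 'be|d', 'de|c']
```

The same function also handles multifurcating trees: a node with more than two children simply leaves some three-leaf sets unresolved.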
Properties of interest: identification
• A tree T displays a set R of triplets iff R ⊆ tr(T). In such a case R is said to be compatible: all triplets of R can be combined into a tree T.
• R identifies T iff
  • T displays R, AND
  • every tree T’ displaying R contains all the clades of T.
• Example with T = (((a,b),c),d): R = {ab|c, bc|d} identifies T, whereas R’ = {ab|c, ab|d} does not, since the tree ((a,b),c,d) also displays R’ but lacks the clade {a,b,c}.
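The "displays" relation is a plain subset test on tr(T). The sketch below assumes the same conventions as a toy encoding (nested tuples, one-character labels, triplets written "xy|z" with the pair in alphabetical order); it reproduces the example above, where R’ is displayed both by T and by a tree lacking the clade {a,b,c}. It does not test identification itself, which would require considering every tree that displays R.

```python
from itertools import combinations


def leaves(tree):
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))


def clades(tree):
    if isinstance(tree, str):
        return []
    return [leaves(tree)] + [c for child in tree for c in clades(child)]


def triplets(tree):
    """tr(T) as 'xy|z' strings (one-character labels assumed)."""
    all_clades = clades(tree)
    found = set()
    for x, y, z in combinations(sorted(leaves(tree)), 3):
        for (p, q), out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any({p, q} <= c and out not in c for c in all_clades):
                found.add(f"{p}{q}|{out}")
    return found


def displays(tree, R):
    """T displays R iff R is a subset of tr(T)."""
    return set(R) <= triplets(tree)


T = ((("a", "b"), "c"), "d")
fan = (("a", "b"), "c", "d")  # lacks the clade {a, b, c}
R = {"ab|c", "bc|d"}
R_prime = {"ab|c", "ab|d"}

print(displays(T, R), displays(fan, R))              # True False
print(displays(T, R_prime), displays(fan, R_prime))  # True True
```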
Properties of interest: identification
• R may identify T even though R does not contain all triplets of tr(T): the additional triplets are induced by those present in R.
• Example: R = {bc|d, ab|c} identifies T = (((a,b),c),d); ab|d and ac|d are induced.
Relevant properties: induction (PI)
[Figure: source triplets R = {ab|c, ab|d} and candidate supertrees on a, b, c, d, annotated with resolutions such as ac|d?, bc|d? and cd|b? that R does not induce]
• We want to infer reliable supertrees, without arbitrary inferences: we only accept supertrees T such that every triplet of tr(T) is present in the data R or induced by the hypotheses in R.
• For instance, with R = {ab|c, ab|d}, the tree ((a,b),c,d) satisfies PI, whereas (((a,b),c),d) does not, since R does not induce ac|d.
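On a handful of taxa, induction and PI can be checked by brute force. This sketch assumes that "R induces t" means every tree on the taxa of R (and t) that displays R also displays t; it enumerates binary trees only, which suffices because refining a multifurcation never removes a displayed triplet. The exhaustive enumeration is exponential and is only meant to make the definitions concrete; it is not the polynomial-time procedure used by PhySIC. Encoding assumptions as before (nested tuples, one-character labels, alphabetical pairs).

```python
from itertools import combinations


def leaves(tree):
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))


def clades(tree):
    if isinstance(tree, str):
        return []
    return [leaves(tree)] + [c for child in tree for c in clades(child)]


def triplets(tree):
    """tr(T) as 'xy|z' strings with the pair in alphabetical order."""
    all_clades = clades(tree)
    found = set()
    for x, y, z in combinations(sorted(leaves(tree)), 3):
        for (p, q), out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any({p, q} <= c and out not in c for c in all_clades):
                found.add(f"{p}{q}|{out}")
    return found


def attach_everywhere(tree, leaf):
    """Yield every tree obtained by grafting `leaf` onto one edge of `tree` (root edge included)."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for sub in attach_everywhere(left, leaf):
            yield (sub, right)
        for sub in attach_everywhere(right, leaf):
            yield (left, sub)


def binary_trees(labels):
    """Yield all (2n-3)!! rooted binary trees on `labels` (exponential: small n only)."""
    if len(labels) == 1:
        yield labels[0]
        return
    for tree in binary_trees(labels[:-1]):
        yield from attach_everywhere(tree, labels[-1])


def induces(R, t):
    """True if every binary tree on the taxa of R and t that displays R also displays t."""
    labels = sorted({ch for s in set(R) | {t} for ch in s if ch != "|"})
    return all(t in tr for tr in map(triplets, binary_trees(labels)) if set(R) <= tr)


def satisfies_pi(tree, R):
    """PI: every triplet displayed by the tree is in R or induced by R."""
    return all(t in R or induces(R, t) for t in triplets(tree))


R = {"ab|c", "bc|d"}
print(induces(R, "ab|d"), induces(R, "ac|d"))  # True True

R2 = {"ab|c", "ab|d"}
print(satisfies_pi(((("a", "b"), "c"), "d"), R2))  # False: ac|d is not induced
print(satisfies_pi((("a", "b"), ("c", "d")), R2))  # False: cd|a is not induced
print(satisfies_pi((("a", "b"), "c", "d"), R2))    # True
```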
Focusing on a coherent subset of hypotheses
[Figure: a triplet set R = {ab|c, bc|d, ab|d, ac|d, ad|c, bd|c} submitted to a supertree method: does R identify a tree T?]
• Practical data are very unlikely to exactly identify a (super)tree, because of:
  • lack of overlap between the source trees (missing data);
  • errors due to gene-specific evolution, and systematic errors in source tree inference (long-branch attraction, errors in the estimated model of evolution).
• However, part of the underlying “correct” tree may appear uncorrupted in the data: the goal is to find a subset R’ of R identifying a tree (i.e., a subtree of the underlying tree).
Relevant properties: non-contradiction (PC)
[Figure: R = {ab|c, ab|d, bc|d, ac|d, bd|c, ad|c} and a subset R’ ⊆ R identifying a tree T on a, b, c, d]
• We search for a subset R’ of R identifying a tree T.
• But we want to be reliable, with no clade contradicted by the data: we don’t accept hypotheses that are in direct contradiction with discarded hypotheses, i.e. we reject subsets R’ obtained by keeping xy|z and removing xz|y.
• We focus on R(T), the triplets of R resolved by T.
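Detecting direct contradictions (xy|z and xz|y both in R) reduces to grouping triplets by their three taxa: once the three taxa are fixed, a resolution is determined by its outgroup, so a leaf set with two or more outgroups is contradicted. The sketch below assumes triplets written "xy|z" with one-character labels. It only detects direct conflicts; PC as stated in the method also involves induced triplets, which this does not cover.

```python
from collections import defaultdict


def direct_contradictions(R):
    """Map each 3-taxon set resolved in more than one way (e.g. bc|d and bd|c)
    to the set of outgroups used for it in R."""
    outgroups = defaultdict(set)
    for t in R:
        pair, out = t.split("|")
        outgroups[frozenset(pair + out)].add(out)
    return {leafset: outs for leafset, outs in outgroups.items() if len(outs) > 1}


def uncontradicted(R):
    """Triplets of R whose three taxa are resolved in a single way in R."""
    conflicts = direct_contradictions(R)
    return {t for t in R if frozenset(t.replace("|", "")) not in conflicts}


R = {"ab|c", "ab|d", "bc|d", "ac|d", "bd|c", "ad|c"}
for leafset, outs in direct_contradictions(R).items():
    print(sorted(leafset), "resolved with outgroups", sorted(outs))
print(sorted(uncontradicted(R)))  # ['ab|c', 'ab|d']
```

Keying on the outgroup means that "ab|c" and "ba|c" are recognized as the same hypothesis rather than as a conflict.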
Link between the properties:
• “R(T) identifies T” is equivalent to
  • T satisfies PC (property of non-contradiction): for any triplet ab|c displayed by T, R(T) induces neither bc|a nor ac|b, and
  • T satisfies PI (property of induction): every triplet ab|c displayed by T is induced by R(T).
• Given a supertree T and a collection of source trees, PI and PC can be checked in polynomial time.
• A given supertree can be modified in polynomial time so that it satisfies PI and PC.
• Why not design a supertree method that proposes supertrees satisfying PI and PC from the start? This is the PhySIC method (Phylogenetic Signal with Induction and non-Contradiction).
Overview
• Relevant properties for a veto method (reliable facts)
  • Decomposition of a tree into triplets
  • Tree identification
  • Property of Induction (PI)
  • Property of non-Contradiction (PC)
• Algorithms (sketch)
  • BUILD (Aho et al. 81)
  • PhySICPC
  • PhySICPI
• Biological case study: Primate supertree
• Conclusion & prospects
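Algorithmic ideas: BUILD (Aho et al. 81)
• BUILD links x and y in a graph on the taxa whenever xy|z ∈ R (using only triplets whose three taxa lie in the current set); each connected component becomes a subtree, and the procedure recurses on each component.
[Figure: BUILD on R = {bc|d, ab|c}: {a,b,c,d} splits into {a,b,c} and {d}, then {a,b,c} into {a,b} and {c}, yielding (((a,b),c),d)]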
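A compact sketch of BUILD, assuming triplets written "xy|z" with one-character labels and trees returned as nested tuples; components of the graph are found with a small union-find. This follows the textbook algorithm of Aho et al., not PhySIC's code. The usage lines reproduce the example above and the two failing sets R1 and R2 from the "limits of BUILD" slide.

```python
def build(leaves, R):
    """BUILD (Aho et al. 1981) on triplets 'xy|z' with one-character labels.

    Returns a nested-tuple tree, or None when R is incompatible.
    """
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    # Keep only the triplets whose three taxa all lie in the current set.
    relevant = [t for t in R if set(t.replace("|", "")) <= set(leaves)]
    # Graph: an edge x-y for every xy|z; components found with union-find.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for t in relevant:
        parent[find(t[0])] = find(t[1])
    components = {}
    for x in leaves:
        components.setdefault(find(x), []).append(x)
    if len(components) == 1:
        return None  # a single connected component: no tree displays R
    children = []
    for component in components.values():
        child = build(component, relevant)
        if child is None:
            return None
        children.append(child)
    return tuple(children)


print(build("abcd", {"bc|d", "ab|c"}))  # ((('a', 'b'), 'c'), 'd')
R1 = {"ab|c", "ac|b", "bc|d", "ab|d", "ac|d"}
R2 = {"bc|d", "bd|c", "ac|d", "ad|c", "ab|c", "ab|d"}
print(build("abcd", R1), build("abcd", R2))  # None None
```

With R1 the split {a,b,c} | {d} succeeds, but ab|c and ac|b then keep {a,b,c} connected; with R2 the graph is connected from the first step.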
Algorithmic ideas: limits of BUILD
• BUILD returns a tree only when R is compatible.
[Figure: R1 = {ab|c, ac|b, bc|d, ab|d, ac|d}: {a,b,c} remains a single connected component; R2 = {bc|d, bd|c, ac|d, ad|c, ab|c, ab|d}: the graph on {a,b,c,d} is connected from the start; BUILD fails on both]
Algorithmic ideas: PhySICPC
[Figure: R = {bc|d, bd|c, ac|d, ad|c, ab|c, ab|d} and R’, in which the directly contradicting triplets are set aside]
Idea: temporarily forget the direct contradictions.
• At each iteration, if there is a single connected component:
  • check whether using R’ leads to several connected components;
  • if so, check that the tree will satisfy PC w.r.t. R;
  • otherwise, propose a multifurcation on those taxa.
• We thus obtain a more resolved tree satisfying PC: contradictions affecting basal clades do not always prevent deeper clades from being obtained.
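The first step of this idea can be illustrated by comparing the connected components obtained from R and from R’. In this sketch R’ is assumed to be R minus every triplet whose three taxa are resolved in more than one way; the subsequent PC check against R, which PhySICPC performs before accepting the split, is not modelled. Encoding assumptions as before ("xy|z", one-character labels).

```python
from collections import defaultdict


def aho_components(leaves, R):
    """Connected components of the graph with an edge x-y for each xy|z in R
    whose three taxa all belong to `leaves`."""
    leaf_set = set(leaves)
    adjacency = {x: set() for x in leaf_set}
    for t in R:
        x, y, z = t[0], t[1], t[3]
        if {x, y, z} <= leaf_set:
            adjacency[x].add(y)
            adjacency[y].add(x)
    seen, components = set(), []
    for start in sorted(leaf_set):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components


def forget_direct_contradictions(R):
    """R': drop every triplet whose three taxa are resolved in more than one way in R."""
    outgroups = defaultdict(set)
    for t in R:
        outgroups[frozenset(t.replace("|", ""))].add(t[3])
    return {t for t in R if len(outgroups[frozenset(t.replace("|", ""))]) == 1}


R = {"bc|d", "bd|c", "ac|d", "ad|c", "ab|c", "ab|d"}
R_prime = forget_direct_contradictions(R)
print([sorted(c) for c in aho_components("abcd", R)])        # [['a', 'b', 'c', 'd']]
print([sorted(c) for c in aho_components("abcd", R_prime)])  # [['a', 'b'], ['c'], ['d']]
```

Where BUILD would stop on R, R’ = {ab|c, ab|d} still separates {a,b} from c and d.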
Algorithmic ideas: limits of BUILD (2)
[Figure: R = {ab|c, ef|c}; BUILD splits {a,b,c,e,f} into {a,b}, {c} and {e,f}, giving a tree that displays ef|a, a triplet not induced by R]
• When the graph contains several connected components, it is necessary to check that the triplets we are about to create are really induced by R.
• Branches that create triplets not induced by R are collapsed (using graph algorithms).
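The check on this slide can be made concrete by brute force: list the triplets created when each component becomes a child of one node, then test each for induction. This sketch assumes the same definition of induction as a brute-force illustration (every binary tree on the taxa that displays R also displays t) and the usual toy encoding; it is exponential and only meant for tiny examples. Which branches PhySICPI then collapses is not modelled.

```python
from itertools import combinations


def leaves(tree):
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))


def clades(tree):
    if isinstance(tree, str):
        return []
    return [leaves(tree)] + [c for child in tree for c in clades(child)]


def triplets(tree):
    all_clades = clades(tree)
    found = set()
    for x, y, z in combinations(sorted(leaves(tree)), 3):
        for (p, q), out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any({p, q} <= c and out not in c for c in all_clades):
                found.add(f"{p}{q}|{out}")
    return found


def attach_everywhere(tree, leaf):
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for sub in attach_everywhere(left, leaf):
            yield (sub, right)
        for sub in attach_everywhere(right, leaf):
            yield (left, sub)


def binary_trees(labels):
    if len(labels) == 1:
        yield labels[0]
        return
    for tree in binary_trees(labels[:-1]):
        yield from attach_everywhere(tree, labels[-1])


def induces(R, t):
    labels = sorted({ch for s in set(R) | {t} for ch in s if ch != "|"})
    return all(t in tr for tr in map(triplets, binary_trees(labels)) if set(R) <= tr)


def created_triplets(components):
    """Triplets xy|z created by making each component a child of one node:
    x and y in the same component, z in another one."""
    created = set()
    for comp in components:
        others = set().union(*components) - set(comp)
        for p, q in combinations(sorted(comp), 2):
            created.update(f"{p}{q}|{z}" for z in others)
    return created


R = {"ab|c", "ef|c"}
split = [{"a", "b"}, {"c"}, {"e", "f"}]  # BUILD's components for R
not_induced = sorted(t for t in created_triplets(split) if not induces(R, t))
print(not_induced)  # ['ab|e', 'ab|f', 'ef|a', 'ef|b']
```

Besides ef|a, the split also creates ab|e, ab|f and ef|b, none of which is forced by R: for example, ((((a,b),e),f),c) displays R but neither ef|a nor ef|b.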
Algorithmic ideas: a summary
• A supertree draft satisfying PC is proposed by PhySICPC.
• If a clade is not “strong enough”, the corresponding branch is collapsed by PhySICPI, which also ensures PI.
• PhySIC is a polynomial-time supertree method (k source trees, n taxa):
  • decomposition of the input forest into triplets: O(kn³);
  • creation of a tree satisfying PC: O(n⁴);
  • collapsing edges displaying triplets not induced by the source trees: O(n⁴);
  • overall, the algorithm requires O(kn³ + n⁴) computing time.
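A short count behind the first term, reading k as the number of source trees and n as the number of taxa (our interpretation of the symbols): a rooted tree resolves each set of three taxa in at most one way, so each source tree contributes at most one triplet per 3-taxon set, and the two O(n⁴) phases then add up as follows.

```latex
|\mathrm{tr}(T_i)| \le \binom{n}{3} = \frac{n(n-1)(n-2)}{6} = O(n^3),
\qquad
\sum_{i=1}^{k} |\mathrm{tr}(T_i)| = O(kn^3),
\qquad
O(kn^3) + O(n^4) + O(n^4) = O(kn^3 + n^4).
```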
Overview
• Relevant properties for a veto method
  • Decomposition of a tree into triplets
  • Tree identification
  • Property of Induction (PI)
  • Property of non-Contradiction (PC)
• Algorithms (intuitive presentation)
  • BUILD (Aho et al. 81)
  • PhySICPC
  • PhySICPI
• Biological case study: Primate supertree
• Conclusion & prospects
Primate case study: source trees
• ADRA2B and IRBP study (Poux et al. 04, 06)
• SINEs (Roos et al. 04)
• Branches with bootstrap support < 50% are collapsed.
[Figure: source trees, with the Anthropoids labelled]
Primate case study: PC & PI in action
[Figure: ADRA2B and IRBP source trees, the PhySICPC tree and the final PhySIC tree]
• Platyrrhines are unresolved due to a conflict (PC).
• An arbitrary resolution among Anthropoids is removed (PI).
Labels indicating the source of problems
• PhySIC can report the reason for each multifurcation it proposes:
  • lack of overlap or information in the source trees (i);
  • local contradictions between the source trees (c).
• This guides correction/completion of the source trees and primary data.
Pointing out “problems” in other supertrees
• E.g., MRP is known to have some undesirable features:
  • inferring “novel clades” not supported by any input tree (Bininda-Emonds & Bryant 98, Goloboff & Pol 01, Goloboff 05);
  • a size bias: when two trees conflict on the resolution of a clade, the tree with the smaller local sampling is ignored (Purvis 95, Bininda-Emonds & Bryant 98, Goloboff 05);
  • favoring source trees that are more unbalanced (Wilkinson et al. 01).
• A supertree already built from a collection of source trees by a usual supertree method can be reanalyzed in the light of PI & PC to identify problems at dubious nodes.
Primate case study: MRP tree analyzed
[Figure: ADRA2B and IRBP source trees, the MRP supertree and the MRP supertree filtered according to PC]
Online server: http://atgc.lirmm.fr/SuperTree/PhySIC
Contact: Vincent.Lefort@lirmm.fr
Conclusion & prospects (appearing in the November issue of Syst. Biol.)
• PI and PC properties
• PhySIC method (http://atgc.lirmm.fr/SuperTree/PhySIC):
  • supertrees satisfying PI and PC (exact) and as resolved as possible (heuristics);
  • proposes very reliable supertrees, identified by the data (low type-I error);
  • polynomial-time method;
  • localization of conflicts and of areas with insufficient overlap;
  • makes it possible to check/correct supertrees built by other methods (MRP, …).
• Further developments:
  • producing more resolved trees satisfying PC and PI;
  • filtering triplets based on their frequencies;
  • coupling with a database (TreeBASE, …).
Thanks
Emmanuel Douzery, Vincent Ranwez, Alexis Criscuolo, Sylvain Guillemot, Pierre-Henri Fabre, Celine Scornavacca, Vincent Lefort
Equipe Méth. et Algor. pour la bioinf., LIRMM
Equipe Phylogénie Moléculaire, ISEM