Clustering of Phylogenetic Trees by Clique Partitioning Walker Pett Divya Mistry
Motivation • Problems with classical methods • Produce a single consensus tree based on some scoring metric • Result is poorly resolved or hard to interpret • Problem with current methods (e.g. K-means) • Produce multiple trees, but still rely on the classical single-tree approach
Our Work • Graph-theory approach (Clique Partitioning) • Avoid creating single-tree consensus • Reduce set of trees to their non-trivial bipartitions • Use bipartitions to create compatibility graph • Find all the cliques to identify possible consensus trees • Provide the minimal set of these consensus trees that represent the whole input • Clique Partitioning is NP-Hard
Notations & Definitions • Phylogenetic tree • Leaf-labeled tree in which all internal nodes have degree 3 • Bipartitions of a tree T • Tree T on taxa S = {s1, s2, …, sn} • Removing an edge gives two subtrees • The leaf sets of these subtrees form a bipartition A|B • A tree is uniquely defined by its set of bipartitions S(T) • Example (figure): leaves a, b, c, d with A = {a, b}, B = {c, d}
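The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the adjacency-dict input format and the name `bipartitions` are assumptions made for the example. Each edge is removed implicitly by a depth-first traversal, and only splits with at least two leaves on each side are kept.

```python
def bipartitions(adj, leaves):
    """Return the non-trivial bipartitions of an unrooted tree.

    adj    : dict mapping each node to a list of its neighbours (assumed format)
    leaves : iterable of leaf labels (the taxa)
    Each bipartition A|B is stored as frozenset({A, B}) so that A|B == B|A.
    """
    all_leaves = frozenset(leaves)
    result = set()

    def leaf_set(node, parent):
        # Leaves reachable from `node` without crossing the edge to `parent`.
        side = {node} if node in all_leaves else set()
        for nb in adj[node]:
            if nb != parent:
                side |= leaf_set(nb, node)
        if parent is not None:
            a = frozenset(side)
            b = all_leaves - a
            if len(a) > 1 and len(b) > 1:  # skip trivial splits
                result.add(frozenset((a, b)))
        return side

    leaf_set(next(iter(adj)), None)  # start the traversal at any node
    return result


# Quartet tree from the slide: a and b hang off u, c and d hang off v.
quartet = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["u", "c", "d"],
}
for bp in bipartitions(quartet, "abcd"):
    print(" | ".join("".join(sorted(side)) for side in sorted(bp, key=sorted)))
```

Storing each split as an unordered pair of frozensets makes bipartitions hashable, so duplicates across trees collapse automatically when they are put in a set.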
Compatible Bipartitions • Two bipartitions are compatible if both are in S(T) for some tree T • A|B and C|D are compatible iff at least one of A∩C, A∩D, B∩C, B∩D is empty (Hamel and Steel, 1996) • Trivial Bipartition • For a bipartition A|B, |A| = 1 or |B| = 1 • Every trivial bipartition is compatible with every other bipartition • Non-trivial Bipartition • Any bipartition that is not trivial
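The intersection test above translates directly into code. A minimal sketch, assuming each bipartition is given as a pair of Python sets; the names `compatible` and `split` are illustrative only.

```python
def split(a, b):
    """Helper (hypothetical): build a bipartition A|B from two iterables of taxa."""
    return (frozenset(a), frozenset(b))


def compatible(p, q):
    """True iff at least one of the four pairwise intersections is empty."""
    return any(not (x & y) for x in p for y in q)


print(compatible(split("ab", "cde"), split("abc", "de")))  # ab and de are disjoint
print(compatible(split("ab", "cd"), split("ac", "bd")))    # all four intersections non-empty
print(compatible(split("a", "bcd"), split("ac", "bd")))    # trivial split
```

The third call illustrates why trivial bipartitions can be ignored: a single leaf always misses one side of any other split.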
Set of non-trivial bipartitions of trees • Given a set of trees R = {T1, T2, …, Tm} • BP(R): the unique non-trivial bipartitions of the trees in R • Compatibility graph • Graph with BP(R) as vertex set • An edge joins every pair of compatible vertices • Incompatibility graph • Complement of the compatibility graph
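As a rough sketch of this construction, the snippet below builds both graphs from trees that have already been reduced to sets of non-trivial bipartitions. The adjacency-set representation and the names `build_graphs` and `split` are assumptions for illustration.

```python
from itertools import combinations


def split(a, b):
    """Helper (hypothetical): unordered bipartition A|B."""
    return frozenset((frozenset(a), frozenset(b)))


def compatible(p, q):
    return any(not (x & y) for x in p for y in q)


def build_graphs(trees):
    """trees: list of sets of bipartitions. Returns (compatibility, incompatibility)
    graphs as dicts mapping each vertex of BP(R) to its set of neighbours."""
    bp = set().union(*trees)  # BP(R): duplicates across trees collapse
    comp = {v: set() for v in bp}
    incomp = {v: set() for v in bp}
    for p, q in combinations(bp, 2):
        graph = comp if compatible(p, q) else incomp
        graph[p].add(q)
        graph[q].add(p)
    return comp, incomp


t1 = {split("ab", "cde"), split("abc", "de")}
t2 = {split("ac", "bde"), split("abc", "de")}
comp, incomp = build_graphs([t1, t2])
print(len(comp), "vertices;", sum(len(n) for n in incomp.values()) // 2, "incompatible pair(s)")
```

Here ab|cde and ac|bde conflict, while abc|de is compatible with both, so the incompatibility graph has a single edge.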
Clique • Clique Partitioning Problem (CPP) • Partition the compatibility graph into a minimal number of cliques • Minimal Graph Coloring Problem (Minimal GCP) • Using the fewest colors, assign a color to each vertex s.t. no two adjacent vertices have the same color • We transform CPP into minimal coloring of the incompatibility graph: vertices sharing a color are pairwise non-adjacent there, so they form a clique in the compatibility graph
Figure panels: Incompatibility graph with minimum coloring; Compatibility graph
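The correspondence between colorings and clique partitions can be checked on a toy graph. This is a sketch under assumed names (`complement`, `color_classes`, `is_clique`) and an assumed adjacency-set representation; the four-vertex graph is made up for the example and is not taken from the slides.

```python
from itertools import combinations


def complement(graph):
    """Complement of an undirected graph given as {vertex: set_of_neighbours}."""
    verts = set(graph)
    return {v: verts - graph[v] - {v} for v in graph}


def color_classes(coloring):
    """Group vertices by color: {vertex: color} -> list of vertex sets."""
    classes = {}
    for v, c in coloring.items():
        classes.setdefault(c, set()).add(v)
    return list(classes.values())


def is_clique(graph, group):
    return all(q in graph[p] for p, q in combinations(group, 2))


# Toy compatibility graph: edges 1-2, 2-3, 3-4.
comp = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
incomp = complement(comp)                 # edges 1-3, 1-4, 2-4
coloring = {1: 0, 2: 0, 3: 1, 4: 1}       # a proper 2-coloring of incomp
for group in color_classes(coloring):
    print(sorted(group), "is a clique:", is_clique(comp, group))
```

Because the color classes partition the vertex set, the cliques obtained this way together cover every vertex of the compatibility graph exactly once.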
Goal • Use Minimal coloring to solve CPP using incompatibility graph. • Use this transformation to reduce set R to minimal set of consensus trees C such that BP(R) = BP(C)
Algorithm • Establish an order to the vertices to solve graph coloring • Greedy heuristics • Largest Degree Ordering (LDO) • Order by number of vertices adjacent to a vertex • Saturation Degree Ordering (SDO) • Order by number of differently colored adjacent vertices • Combine LDO/SDO for best results (Al-Omari and Sabri, 2006)
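One plausible way to combine the two orderings is sketched below: pick the uncolored vertex with the highest saturation degree, break ties by largest degree, and give it the smallest color not used by its neighbours. This is an assumption about how LDO and SDO might be combined, not necessarily the exact scheme of Al-Omari and Sabri or of the authors; the function name `greedy_coloring` is illustrative.

```python
def greedy_coloring(graph):
    """Greedy coloring of {vertex: set_of_neighbours}.

    Vertex order: highest saturation degree first (number of distinct colors
    already present among neighbours), ties broken by largest degree.
    Returns {vertex: color} with colors 0, 1, 2, ...
    """
    coloring = {}
    uncolored = set(graph)

    def neighbour_colors(v):
        return {coloring[n] for n in graph[v] if n in coloring}

    while uncolored:
        v = max(uncolored, key=lambda u: (len(neighbour_colors(u)), len(graph[u])))
        used = neighbour_colors(v)
        color = 0
        while color in used:  # smallest color not used by a neighbour
            color += 1
        coloring[v] = color
        uncolored.remove(v)
    return coloring


# 4-cycle 1-2-3-4-1: two colors suffice.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(greedy_coloring(cycle))
```

Being a heuristic, this can use more colors than the optimum on some graphs; it only guarantees that adjacent vertices receive different colors.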
Find all non-trivial bipartitions of tree set R • Can be done in O(n) for one tree • Traversal of the O(n) nodes of the tree (a binary tree on n leaves has fewer than 2n nodes), saving the bipartition at each step • For R = {T1, …, Tm}, finding BP(R) takes O(mn) • Construct incompatibility graph • Compare each bipartition with every other • O(|BP(R)|^2), so O(m^2 n^2) • Coloring of incompatibility graph • O(|BP(R)|^3), so O(m^3 n^3) (Bhaskar and Samad, 2006)
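The step from |BP(R)| to m and n in these bounds can be spelled out as follows, assuming every input tree is an unrooted binary tree on the same n ≥ 3 leaves (so each tree has exactly n − 3 internal edges):

```latex
\begin{align*}
|BP(T_i)| &= n - 3
  && \text{one non-trivial bipartition per internal edge} \\
|BP(R)| &\le \sum_{i=1}^{m} |BP(T_i)| = m(n-3) = O(mn) \\
\binom{|BP(R)|}{2} &= O\!\left(|BP(R)|^{2}\right) = O(m^{2}n^{2})
  && \text{pairwise compatibility tests} \\
O\!\left(|BP(R)|^{3}\right) &= O(m^{3}n^{3})
  && \text{coloring of the incompatibility graph}
\end{align*}
```

The bound on |BP(R)| is tight only when no two trees share a bipartition; in practice shared bipartitions make the graph considerably smaller.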
Results • We assess quality by comparing the percentage of bipartitions accounted for by the best consensus tree from each method • MR = majority rule single-tree consensus method. MR produces a tree from all bipartitions that are present in more than 50% of all trees. • Camp, Caesal, PEVCCA1, PEVCCA2 have been used by Weeks et al. (2001), Moret et al. (2001), Cosner et al. (2000), and Van de Peer et al. (1999) to evaluate tree clustering methods.
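The comparison metric can be illustrated with a small sketch. It assumes the strict-majority convention (a bipartition must occur in more than half of the trees) and reads "percentage of bipartitions accounted for" as the share of BP(R) contained in a consensus tree; both the interpretation and the names `majority_rule` and `coverage` are assumptions, and the input data is made up.

```python
from collections import Counter


def majority_rule(trees):
    """Bipartitions occurring in more than half of the trees.
    trees: list of sets of hashable bipartition labels."""
    counts = Counter(bp for tree in trees for bp in tree)
    return {bp for bp, c in counts.items() if 2 * c > len(trees)}


def coverage(consensus, trees):
    """Percentage of the unique input bipartitions present in `consensus`."""
    all_bp = set().union(*trees)
    return 100.0 * len(consensus & all_bp) / len(all_bp)


# Made-up example: bipartitions written as plain strings for brevity.
trees = [
    {"ab|cde", "abc|de"},
    {"ab|cde", "abd|ce"},
    {"ac|bde", "abc|de"},
]
mr = majority_rule(trees)
print(sorted(mr), coverage(mr, trees))
```

Requiring a strict majority guarantees that the selected bipartitions are pairwise compatible, since any two of them must co-occur in at least one input tree.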
Discussion • For all four datasets, our method produced a consensus tree that was more resolved than the popular Majority Rule consensus • The tree produced by our method is identical to the one produced by Extended Majority Rule • Extended Majority Rule: the Majority Rule consensus plus additional bipartitions compatible with the MR consensus tree • Additionally, our method produced a tree containing information present in the original input but absent from the single-tree consensus
Trees produced using our method are at least as informative as the single-tree consensus, and usually more informative given the previous point • Our method is similar to the Multipolar Consensus method proposed by Bonnard et al. (2006); however, we improve on it by choosing a more sophisticated coloring heuristic.
Conclusion • Efficient method for computing a minimal consensus of a set of trees • Compatibility graph built from unique non-trivial bipartitions • Minimal partition of this graph into cliques • CPP solved by transforming it to GCP and then employing a polynomial-time greedy heuristic • Our method produces a set with the fewest trees that retains all of the information in the original input set • Better than single-tree consensus • Of interest to biologists: • visualize the results of phylogenetic analysis in the simplest and most informative way.
Acknowledgement • Dr. Oliver Eulenstein for support and guidance in introducing the problem and pointing us to some earlier work on CPP.