Maximum Parsimony (MP) Algorithm
MP Algorithm • Character-based algorithm – does not use distances, but utilizes the character information in sequences • A criticism of distance-based methods is that they do not exploit the structure of the sequences (collapse them to a number – the distance) • Main philosophy is “economy of substitutions” – find the tree that requires the fewest mutations (maximum parsimony)
MP Algorithm • The strategy • explore a number of possible trees • report the tree with the smallest score (most parsimonious) • Need to be able to solve two problems • small parsimony problem -- given a candidate tree, compute its parsimony score • large parsimony problem -- efficiently generate viable candidate trees (cannot generate all – tree explosion)
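To give a feel for the "tree explosion": the number of distinct unrooted bifurcating trees on n labelled sequences is (2n-5)!! for n ≥ 3. This is a standard combinatorial result, not something stated on the slides. A minimal Python sketch follows; the function name `count_unrooted_trees` is only illustrative.

```python
def count_unrooted_trees(n_leaves: int) -> int:
    """Number of unrooted bifurcating trees on n labelled leaves, (2n-5)!!."""
    if n_leaves < 3:
        return 1  # one or two leaves admit only a single (trivial) tree
    count = 1
    # A tree with k-1 leaves has 2k-5 edges, and the k-th leaf
    # can be attached to any one of them.
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Already at 10 sequences there are about two million trees, which is why the large parsimony problem cannot be solved by scoring every tree.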
Small Parsimony Problem • Given a candidate tree, compute its parsimony score • Consider a candidate tree for one-site sequences • s1 = A, s2 = T, s3 = T, s4 = G, s5 = A • [Figure: candidate tree for s1–s5 with node labels] • Final Score = 3
Solving Small Parsimony Problem • explore the tree bottom-up (from leaves to interior) • for each internal node one level up • if the “labels” at the two child nodes have no symbols in common, assign as label at this node the union of both labels (all symbols from either child) and penalize the tree one unit • if the “labels” at the two child nodes do have symbols in common, label the node with the common portion; no penalty • the parsimony score of the tree is the total number of penalties • [Figure: example tree with leaf and internal-node labels]
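The bottom-up labelling rule can be sketched in Python for a single site. This is a minimal sketch under stated assumptions: the tree is written as nested 2-tuples of leaf names, the function name `fitch_score` is illustrative, and the topology below is hypothetical, because the tree drawn on the earlier slide is not recoverable from the text. It was picked only because it reproduces that slide's score of 3 for s1..s5.

```python
def fitch_score(tree, leaf_states):
    """Return (label_set, penalty) for a rooted binary tree at one site.

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    leaf_states: dict mapping leaf name -> single character.
    """
    if isinstance(tree, str):
        # A leaf is labelled with its own symbol and costs nothing.
        return {leaf_states[tree]}, 0
    left, right = tree
    left_label, left_cost = fitch_score(left, leaf_states)
    right_label, right_cost = fitch_score(right, leaf_states)
    common = left_label & right_label
    if common:
        # Children share symbols: keep the common portion, no penalty.
        return common, left_cost + right_cost
    # No symbols in common: take the union and penalize one unit.
    return left_label | right_label, left_cost + right_cost + 1


# One-site sequences from the slide; the topology is an assumption.
states = {"s1": "A", "s2": "T", "s3": "T", "s4": "G", "s5": "A"}
tree = ((("s1", "s2"), ("s3", "s4")), "s5")
label, score = fitch_score(tree, states)
print(sorted(label), score)
```

The recursion visits children before their parent, which is the leaves-to-interior order the slide describes.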
Solving Small Parsimony Problem • For n-site sequences run the algorithm in parallel for each site and add up the parsimony scores for all sites • Consider a candidate tree for the following sequences • s1 = ATC, s2 = ACC, s3 = GTA, s4 = GCA • [Figure: candidate tree with leaves ATC, ACC, GTA, GCA and per-site node labels] • Final Score = 4
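The per-site summation can be sketched as follows, again in Python with trees as nested 2-tuples. The names `site_score` and `parsimony_score` are illustrative, and the topology ((s1, s2), (s3, s4)) is an assumption read off the left-to-right leaf order on the slide. With that grouping the sketch gives the slide's score of 4.

```python
def site_score(tree, column):
    """Bottom-up pass for one site: returns (label_set, penalty)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_label, left_cost = site_score(tree[0], column)
    right_label, right_cost = site_score(tree[1], column)
    common = left_label & right_label
    if common:
        return common, left_cost + right_cost
    return left_label | right_label, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the single-site scores over all sites of equal-length sequences."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for i in range(n_sites):
        # Column i holds the i-th symbol of every sequence.
        column = {name: seq[i] for name, seq in sequences.items()}
        total += site_score(tree, column)[1]
    return total


sequences = {"s1": "ATC", "s2": "ACC", "s3": "GTA", "s4": "GCA"}
assumed_tree = (("s1", "s2"), ("s3", "s4"))  # assumed topology
print(parsimony_score(assumed_tree, sequences))
```

Sites are scored independently, so the loop over columns could equally be run in parallel, as the slide suggests.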
Solving Large Parsimony Problem • Efficiently generate viable candidate trees (cannot try all) • Branch-and-bound approach • create a possible tree by some method and calculate its score; this is the current best (the bound) • start building a tree from scratch, one sequence at a time, discarding partial trees that already cost more than the current best (adding sequences can never lower the score)
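A minimal Python sketch of this branch-and-bound idea is below. Several choices are assumptions rather than anything the slides specify: the initial tree is a simple "caterpillar" built in input order, sequences are added in input order, the first sequence is held fixed as an outgroup so each unrooted topology is generated once, and all function names are illustrative.

```python
def site_score(tree, column):
    """Bottom-up pass for one site: returns (label_set, penalty)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_label, left_cost = site_score(tree[0], column)
    right_label, right_cost = site_score(tree[1], column)
    common = left_label & right_label
    if common:
        return common, left_cost + right_cost
    return left_label | right_label, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total score over all sites; only leaves present in tree are used."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        site_score(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )


def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf above some node of tree."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def branch_and_bound(seqs):
    """Return (best_tree, best_score); needs at least two sequences."""
    names = list(seqs)
    # Initial bound: a caterpillar tree in input order (arbitrary choice).
    best_tree = names[0]
    for name in names[1:]:
        best_tree = (best_tree, name)
    best_score = parsimony_score(best_tree, seqs)
    outgroup = names[0]

    def extend(subtree, remaining):
        nonlocal best_tree, best_score
        tree = (outgroup, subtree)
        score = parsimony_score(tree, seqs)
        if score > best_score:
            return  # bound: this partial tree already costs too much
        if not remaining:
            if score < best_score:
                best_tree, best_score = tree, score
            return
        for candidate in insertions(subtree, remaining[0]):
            extend(candidate, remaining[1:])

    extend(names[1], names[2:])
    return best_tree, best_score


seqs = {"s1": "ATC", "s3": "GTA", "s2": "ACC", "s4": "GCA"}
print(branch_and_bound(seqs))
```

With this input order the initial caterpillar scores 5, the partial tree that pairs s2 with s3 is discarded at score 6, and the search settles on a tree of score 4. Practical programs choose a better starting tree and addition order so that more branches are pruned early.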
Solving Large Parsimony Problem • Branch-and-bound approach http://artedi.ebc.uu.se/course/X3-2004/Phylogeny/Phylogeny-TreeSearch/Phylogeny-Search.html
MP Summary • Character-based algorithm – uses the sequence data • Produces unrooted trees • Economy of substitution – the best tree is the one that requires the fewest substitutions • Examines a number of possible trees in search of the best one