Distances

Distances: Correction for multiple changes

Distance Calculations • A 5-page excerpt from the book “Molecular Systematics” is on the course web page, as a PDF file • DNA distances are more easily analyzed • Only a 4-letter alphabet • More directly affected by mutation

Underlying mechanisms • Jukes-Cantor model • Assumes all substitutions are equally probable • Uncorrected distance D = 1 - (proportion of sites unchanged) • Corrected distance $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$ • Felsenstein,
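The two distances on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `p_distance` and `jukes_cantor` are hypothetical, and the sequences are assumed to be aligned, of equal length, and free of gaps or ambiguity codes.

```python
import math

def p_distance(seq1, seq2):
    """Uncorrected distance D: proportion of aligned sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jukes_cantor(D):
    """Jukes-Cantor corrected distance for an uncorrected distance D.

    Assumes all substitutions are equally probable. The correction is
    undefined once D reaches 0.75, the value expected for random sequences.
    """
    if D >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for D >= 0.75")
    return -0.75 * math.log(1 - 4 * D / 3)

D = p_distance("ACGTACGT", "ACGTACGA")   # one difference in eight sites
d = jukes_cantor(D)
```

The corrected value is always at least as large as the uncorrected one, because it accounts for sites that changed more than once.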

Tree Searching Using Optimality Criteria: Maximum Parsimony and Maximum Likelihood Methods

Searching for the Best Tree • A score for each tree can be evaluated using an objective function that uses the multiple alignment as a fixed parameter and varies the tree topology and branch lengths • Goal is to find the tree with the optimum score, which is defined as the “best” tree

Computational Problem • For a small number of taxa (ntaxa), the score can be evaluated for every tree and the best-scoring tree picked • As ntaxa increases, full evaluation rapidly becomes impossible: the number of trees and the complexity of the calculation for each tree both increase, so heuristics must be used
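To make the growth concrete, the number of distinct unrooted bifurcating trees for n taxa is the product of the odd numbers 1 × 3 × 5 × ... × (2n - 5). The sketch below counts them; the function name `num_unrooted_trees` is a hypothetical helper, and the formula is a standard result rather than something stated on the slide.

```python
def num_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Four taxa give 3 trees and ten taxa already give about two million, which is why exhaustive evaluation is only practical for small data sets.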

Distance Based • Can try to minimize the total tree length (Minimum Evolution = ME) by varying the internal branch lengths • This calculation has to be performed for each tree topology; it is not an algorithm for constructing the tree

Maximum Parsimony • Character based, not distance based • On any given tree, the states of each homologous character can be reconstructed with some minimum number of changes • Summing the number of changes over all characters gives the tree length

You want to find the tree with the lowest score, that is, the shortest tree length. This is called a Maximum Parsimony tree because it is based on the idea that the explanation that requires the fewest changes is the best • There is no analytical approach for this process, so you need an algorithm that will a) evaluate tree length as fast as possible and b) search the tree space with a high likelihood of evaluating the shortest tree

Three Options • Exhaustive - simply evaluates the length of every tree, therefore guaranteed to find the shortest tree(s) • Branch and Bound - searches tree space, but stops constructing a family of trees once its length exceeds the current minimum; still guaranteed to find the shortest tree(s)

Heuristic - constructs an approximately shortest tree, then does a series of rearrangements, evaluating length in each case, selecting the shortest tree from among the rearrangements, and iterating until a shorter tree is not found • Usually works well, but for certain data sets the search will stop at a tree that is not the shortest

Informative Sites • Sites at which at least two character states each appear at least twice • Reason - a single appearance of any character state is most parsimoniously explained as one change on a terminal branch, whatever the tree, so it cannot favor one tree over another
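The definition above translates directly into a check on one alignment column. This is a small sketch; `is_informative` is a hypothetical name, and the column is assumed to be a string with one character state per taxon and no gaps.

```python
from collections import Counter

def is_informative(column):
    """True if at least two character states each appear at least twice."""
    counts = Counter(column)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2

columns = ["GAAG", "GAAA", "AAAA", "GACT"]
informative = [col for col in columns if is_informative(col)]
```

Only the first column qualifies here: the others are constant, carry a single unique state, or have every state appearing once.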

Example • consider a four taxon set of data, three possible trees, one character • Taxon 1 = G • Taxon 2 = A • Taxon 3 = A • Taxon 4 = G

Work Through on Board
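One way to check the board example is by brute force. An unrooted four-taxon tree has two internal nodes, so the sketch below tries every pair of internal states and keeps the assignment with the fewest changes. The names `quartet_changes`, `taxa`, and `splits` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

STATES = "ACGT"

def quartet_changes(pair_u, pair_v):
    """Minimum number of changes on an unrooted four-taxon tree.

    pair_u holds the states of the two taxa attached to internal node u,
    pair_v the states of the two taxa attached to internal node v.
    """
    best = None
    for u, v in product(STATES, repeat=2):
        changes = (sum(s != u for s in pair_u)    # two terminal branches at u
                   + sum(s != v for s in pair_v)  # two terminal branches at v
                   + (u != v))                    # the internal branch
        if best is None or changes < best:
            best = changes
    return best

taxa = {1: "G", 2: "A", 3: "A", 4: "G"}
splits = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]
scores = {}
for left, right in splits:
    scores[(left, right)] = quartet_changes([taxa[t] for t in left],
                                            [taxa[t] for t in right])
```

The tree that groups taxa 1 and 4 against taxa 2 and 3 needs one change; the other two trees need two changes each, so the first is the most parsimonious for this character.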

Homoplasy • When you are considering more than one character, they may not all be consistent with the same tree • The principle of maximum parsimony says that you pick the tree with the lowest number of homoplasies, that is, multiple independent origins of a character state

Add to Worked Example

More Complex Trees • there is an algorithm for finding the lowest score attributable to any distribution of character states on any bifurcating tree • trace back from terminal taxa to each node, define the nodal state as the intersection set of the two descendants, unless the intersection is null, in which case, define as the union

Each time a union is required, that adds one to the score, because one descendant of the union must have changed • Repeat the process going from scored nodes to unscored nodes • For each tree, perform the same analysis for all characters and sum the scores; that number is the tree score • The tree with the lowest score is most parsimonious
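The intersection/union procedure above can be sketched as a short recursive function. The representation is an assumption made for illustration: a tree is a nested tuple of taxon names, rooted arbitrarily on one branch (the score does not depend on where the root is placed), and the alignment is a dict of equal-length strings. The names `fitch` and `tree_length` are hypothetical.

```python
def fitch(tree, states):
    """Return (state set, number of changes) for one character on one tree.

    tree   : a taxon name (terminal) or a 2-tuple of subtrees (internal node)
    states : dict mapping taxon name -> character state at this site
    """
    if isinstance(tree, str):                  # terminal taxon
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:                                 # non-empty intersection: no change
        return common, left_changes + right_changes
    # Empty intersection: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum the per-character scores over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total

alignment = {"t1": "GA", "t2": "AA", "t3": "AC", "t4": "GC"}
tree_a = (("t1", "t2"), ("t3", "t4"))
tree_b = (("t1", "t4"), ("t2", "t3"))
tree_c = (("t1", "t3"), ("t2", "t4"))
lengths = [tree_length(t, alignment) for t in (tree_a, tree_b, tree_c)]
```

In this made-up alignment the two characters support different trees, so `tree_a` and `tree_b` tie at length 3 while `tree_c` has length 4; whichever tree is chosen, one of the characters must be explained by homoplasy.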

Heuristic Search • For exhaustive search or branch and bound, the search algorithm covers all possible trees • A heuristic search needs a defined, non-exhaustive search algorithm • The most commonly used is tree bisection-reconnection (TBR)

Can bisect any tree at any of its branches, creating two sub-trees, then reconnect them by joining any pair of branches, one from each sub-tree • If none of the trees generated by a cycle of TBR is shorter than the parent tree, then the parent tree is accepted as the shortest tree • If one of the TBR-generated trees is shorter, then it is taken as the next candidate shortest tree, and is in turn subjected to a round of TBR analysis
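The accept-or-iterate rule described above is a hill-climbing loop, sketched below. It does not implement TBR itself: `neighbors` stands in for a hypothetical function that would return every tree produced by one cycle of bisection and reconnection, and `score` for a tree-length function. Both are assumptions supplied by the caller.

```python
def hill_climb(start, neighbors, score):
    """Repeat rounds of rearrangement until no shorter candidate is found.

    start     : the initial candidate (for example, an approximately shortest tree)
    neighbors : function returning the rearrangements of a candidate
    score     : function returning the length of a candidate (lower is better)
    """
    current, current_score = start, score(start)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s < best_score:                 # strictly shorter than the parent
                best, best_score = candidate, s
        if best is None:                       # no rearrangement is shorter: stop
            return current, current_score
        current, current_score = best, best_score

# Toy stand-in for tree space: candidates are integers 0..3 with made-up lengths.
toy_lengths = {0: 5, 1: 3, 2: 4, 3: 1}
toy_neighbors = lambda x: [y for y in (x - 1, x + 1) if y in toy_lengths]
result = hill_climb(0, toy_neighbors, toy_lengths.get)
```

Starting from candidate 0, the toy search stops at candidate 1 (length 3) even though candidate 3 is shorter, because every step away from 1 is uphill. This is the sense in which a heuristic search can miss the shortest tree.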
