Comparative methods: Using trees to study evolution


Some uses for phylogenies • Character evolution • Ancestral states • Trends and biases • Correlations among characters • Molecular evolution • Evidence of selection • “Key innovations” • Diversification rate

Why reconstruct character evolution? • Can evaluate homology

How do we know that bat and bird wings are not homologous?

Why reconstruct character evolution? • Can evaluate homology • Can determine character-state polarity

Why reconstruct character evolution? • Can evaluate homology • Can determine character-state polarity • Can evaluate the “selective regime” when a character evolved

Was the ancestor bird-pollinated when red flowers evolved? Look at the pollinators. [Figure: bee-to-bird pollination shift; result: adaptation supported]

Alternative result [Figure: bee-to-bird pollination shift; result: not an adaptation]

A third possibility [Figure: bee-to-bird pollination shift; result: consistent with adaptation]

Why reconstruct character evolution? • Can evaluate homology • Can determine character-state polarity • Can evaluate the “selective regime” when a character evolved • Can recreate ancestral genes/proteins

Dinosaur Rhodopsin • Chang et al. (MBE 2002)

Character optimization using parsimony • Pick the reconstruction that minimizes the “cost” • What do you do if there is more than one most-parsimonious reconstruction? • ACCTRAN/DELTRAN • Consider all • What character-state weights should you use?
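Parsimony optimization with unequal gain and loss costs can be sketched with Sankoff's dynamic-programming algorithm (a standard technique for weighted parsimony; the slides do not say which algorithm was used). The tree, tip names, and states below are hypothetical, and the tree is written as nested tuples of tip names.

```python
import math

# Hypothetical rooted tree (nested tuples of tip names) and 0/1 tip states.
TREE = ((("A", "B"), "C"), (("D", "E"), "F"))
STATES = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 1, "F": 0}


def sankoff(node, states, cost):
    """Return [min subtree cost if `node` is in state 0, min cost if in state 1]."""
    if isinstance(node, str):  # tip: only its observed state is allowed
        return [0.0 if s == states[node] else math.inf for s in (0, 1)]
    child_costs = [sankoff(child, states, cost) for child in node]
    return [
        sum(min(cost[s][t] + cc[t] for t in (0, 1)) for cc in child_costs)
        for s in (0, 1)
    ]


def tree_cost(tree, states, gain, loss):
    """Minimum total cost when a gain (0->1) costs `gain` and a loss (1->0) costs `loss`."""
    cost = [[0.0, gain], [loss, 0.0]]  # cost[parent_state][child_state]
    return min(sankoff(tree, states, cost))


for gain, loss in [(1, 1), (2, 1), (1, 2)]:
    print(f"gain:loss = {gain}:{loss} -> minimum cost {tree_cost(TREE, STATES, gain, loss)}")
```

With equal weights this reduces to ordinary (Fitch) parsimony. The minimum cost alone does not say where the changes go, which is why ties among most-parsimonious reconstructions (ACCTRAN/DELTRAN) matter.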

Cost-change graph (Ree and Donoghue 1998: Syst. Biol. 47:582-588)

Stability to gain:loss weights

What gain:loss weight to use? • If you believe gains are more common (and hence weight them less), you will find more gains (and vice versa) • So how can you use a tree to establish whether there is a gain:loss bias?
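The dependence on weights can be seen by tracing back one most-parsimonious reconstruction and counting gains and losses under different gain:loss costs. This is a sketch on a hypothetical six-tip tree using a Sankoff-style pass; ties are broken toward state 0 at the root and toward "no change" on branches, an arbitrary rule that (like ACCTRAN/DELTRAN) picks one reconstruction among several.

```python
import math

# Hypothetical rooted tree (nested tuples of tip names) and 0/1 tip states.
TREE = ((("A", "B"), "C"), (("D", "E"), "F"))
STATES = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 1, "F": 0}


def sankoff(node, states, cost):
    """Minimum subtree cost for each state (0, 1) of `node`."""
    if isinstance(node, str):
        return [0.0 if s == states[node] else math.inf for s in (0, 1)]
    child_costs = [sankoff(child, states, cost) for child in node]
    return [sum(min(cost[s][t] + cc[t] for t in (0, 1)) for cc in child_costs)
            for s in (0, 1)]


def count_changes(node, state, states, cost):
    """Trace back one optimal reconstruction below `node`; return (gains, losses)."""
    if isinstance(node, str):
        return 0, 0
    gains = losses = 0
    for child in node:
        cc = sankoff(child, states, cost)
        # Cheapest child state; on ties, prefer no change along the branch.
        best = min((0, 1), key=lambda t: (cost[state][t] + cc[t], t != state))
        if state == 0 and best == 1:
            gains += 1
        elif state == 1 and best == 0:
            losses += 1
        sub_gains, sub_losses = count_changes(child, best, states, cost)
        gains += sub_gains
        losses += sub_losses
    return gains, losses


def gains_and_losses(tree, states, gain, loss):
    """(gains, losses) in one most-parsimonious reconstruction; root ties go to state 0."""
    cost = [[0.0, gain], [loss, 0.0]]  # cost[parent_state][child_state]
    root_costs = sankoff(tree, states, cost)
    root_state = min((0, 1), key=lambda s: root_costs[s])
    return count_changes(tree, root_state, states, cost)


for gain, loss in [(2, 1), (1, 2)]:
    print(f"gain:loss = {gain}:{loss} -> (gains, losses) =",
          gains_and_losses(TREE, STATES, gain, loss))
```

On this tree, making gains twice as costly as losses yields two losses and no gains, and reversing the weights yields two gains and no losses: the weights you assume largely determine the answer.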

Wing loss and re-evolution? • Whiting et al. (Nature 2003)

A likelihood approach • Developed (in parallel) by Mark Pagel and Brent Milligan in 1994 • Continuous-time Markov model • Select the rate of gains (0->1) and the rate of losses (1->0) that maximize the likelihood of the data given a tree (and its branch lengths)

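Transition rate matrix (rows: "from" state; columns: "to" state), with q1 the rate of gains (0->1) and q2 the rate of losses (1->0):

$$Q = \begin{pmatrix} -q_1 & q_1 \\ q_2 & -q_2 \end{pmatrix}$$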
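For a two-state model, the probability of each state change along a branch of length t is P(t) = exp(Qt), which has a simple closed form. This sketch assumes q1 is the gain rate (0->1) and q2 the loss rate (1->0), with rows indexing the "from" state; the rate values in the usage lines are arbitrary.

```python
import math


def transition_probs(q1, q2, t):
    """P(t) = exp(Q t) for Q = [[-q1, q1], [q2, -q2]]; rows = from, columns = to.

    Assumes q1 (gain, 0->1) and q2 (loss, 1->0) are not both zero.
    """
    total = q1 + q2
    decay = math.exp(-total * t)
    p01 = q1 / total * (1.0 - decay)  # start in 0, end in 1
    p10 = q2 / total * (1.0 - decay)  # start in 1, end in 0
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


print(transition_probs(0.5, 1.0, 0.1))    # short branch: close to the identity
print(transition_probs(0.5, 1.0, 100.0))  # long branch: each row approaches (q2, q1) / (q1 + q2)
```

On long branches the starting state is "forgotten", which is why branch lengths matter to the likelihood method.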

Logic • Calculate the likelihood of the data for given values of q1 and q2 • Modify q1 and q2 to find the pair of values that maximizes the probability of the data
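The two steps can be sketched with Felsenstein's pruning algorithm (the standard way to sum over ancestral states) and a coarse grid search that stands in for a proper numerical optimizer. The tree format, branch lengths, tip states, the grid, and the use of stationary frequencies as the root prior are all illustrative assumptions.

```python
import math

# Hypothetical tree: every node is (tip_name_or_children, branch_length_above_node).
TREE = ([
    ([("A", 0.5), ("B", 0.5)], 1.0),
    ([("C", 1.0), ([("D", 0.3), ("E", 0.3)], 0.7)], 0.5),
], 0.0)
STATES = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1}


def transition_probs(q1, q2, t):
    """exp(Q t) for the two-state model; q1 = gain rate (0->1), q2 = loss rate (1->0)."""
    decay = math.exp(-(q1 + q2) * t)
    p01 = q1 / (q1 + q2) * (1.0 - decay)
    p10 = q2 / (q1 + q2) * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def partial_likelihood(node, states, q1, q2):
    """[P(data below node | node in 0), P(data below node | node in 1)]."""
    content, _ = node
    if isinstance(content, str):  # tip
        return [1.0 if s == states[content] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in content:
        child_lik = partial_likelihood(child, states, q1, q2)
        p = transition_probs(q1, q2, child[1])
        for s in (0, 1):
            result[s] *= p[s][0] * child_lik[0] + p[s][1] * child_lik[1]
    return result


def log_likelihood(tree, states, q1, q2):
    root = partial_likelihood(tree, states, q1, q2)
    prior = (q2 / (q1 + q2), q1 / (q1 + q2))  # stationary frequencies (assumed root prior)
    return math.log(prior[0] * root[0] + prior[1] * root[1])


def fit_rates(tree, states, grid):
    """Grid search for the (q1, q2) pair with the highest log-likelihood."""
    best = max(((q1, q2) for q1 in grid for q2 in grid),
               key=lambda q: log_likelihood(tree, states, *q))
    return best, log_likelihood(tree, states, *best)


GRID = [0.05 * k for k in range(1, 61)]  # candidate rates from 0.05 to 3.0
(q1_hat, q2_hat), lnL = fit_rates(TREE, STATES, GRID)
print(f"q1 = {q1_hat:.2f}, q2 = {q2_hat:.2f}, lnL = {lnL:.3f}")
```

A real analysis would use a continuous optimizer and check for estimates sitting on the edge of the search range.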

Probabilities summed across all possible ancestral states [Figure: tree with tip states 1 1 0 0 0 1 1 1 0 0; internal nodes shown in state 0]
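For a small tree, "summing across all possible ancestral states" can be written out literally: enumerate every assignment of states to the internal nodes and add up the probability of each complete history. The four-tip tree, branch lengths, rates, and root prior (stationary frequencies) below are hypothetical.

```python
import itertools
import math

# Hypothetical rooted tree as (parent, child, branch_length) edges; n1 is the root.
EDGES = [("n1", "n2", 0.5), ("n1", "n3", 0.5),
         ("n2", "A", 1.0), ("n2", "B", 1.0),
         ("n3", "C", 1.0), ("n3", "D", 1.0)]
ROOT = "n1"
INTERNAL = ["n1", "n2", "n3"]


def transition_probs(q1, q2, t):
    """exp(Q t) for the two-state model; q1 = gain rate, q2 = loss rate."""
    decay = math.exp(-(q1 + q2) * t)
    p01 = q1 / (q1 + q2) * (1.0 - decay)
    p10 = q2 / (q1 + q2) * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def joint_probability(assignment, q1, q2):
    """Probability of one complete assignment of states to every node."""
    prior = (q2 / (q1 + q2), q1 / (q1 + q2))  # assumed root prior
    prob = prior[assignment[ROOT]]
    for parent, child, t in EDGES:
        prob *= transition_probs(q1, q2, t)[assignment[parent]][assignment[child]]
    return prob


def likelihood_by_enumeration(tip_states, q1, q2):
    """Sum the joint probability over all 2**3 ancestral-state assignments."""
    total = 0.0
    for combo in itertools.product((0, 1), repeat=len(INTERNAL)):
        assignment = {**tip_states, **dict(zip(INTERNAL, combo))}
        total += joint_probability(assignment, q1, q2)
    return total


tips = {"A": 1, "B": 1, "C": 0, "D": 0}
print(likelihood_by_enumeration(tips, q1=0.5, q2=0.5))
```

The number of terms doubles with every internal node, which is why practical implementations use the pruning algorithm instead; both give the same likelihood.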

How much of the likelihood is contributed by each state at each node
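The share of the likelihood contributed by each state at a node can be sketched by brute-force enumeration: add each complete history's probability to the state it assigns to that node, then divide by the total. Tree, branch lengths, rates, and root prior are hypothetical, as in any small worked example.

```python
import itertools
import math

# Hypothetical rooted tree as (parent, child, branch_length) edges; n1 is the root.
EDGES = [("n1", "n2", 0.5), ("n1", "n3", 0.5),
         ("n2", "A", 1.0), ("n2", "B", 1.0),
         ("n3", "C", 1.0), ("n3", "D", 1.0)]
ROOT = "n1"
INTERNAL = ["n1", "n2", "n3"]


def transition_probs(q1, q2, t):
    """exp(Q t) for the two-state model; q1 = gain rate, q2 = loss rate."""
    decay = math.exp(-(q1 + q2) * t)
    p01 = q1 / (q1 + q2) * (1.0 - decay)
    p10 = q2 / (q1 + q2) * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def joint_probability(assignment, q1, q2):
    prior = (q2 / (q1 + q2), q1 / (q1 + q2))  # assumed root prior
    prob = prior[assignment[ROOT]]
    for parent, child, t in EDGES:
        prob *= transition_probs(q1, q2, t)[assignment[parent]][assignment[child]]
    return prob


def node_state_shares(tip_states, q1, q2):
    """For each internal node, the fraction of the likelihood from state 0 and from state 1."""
    sums = {node: [0.0, 0.0] for node in INTERNAL}
    total = 0.0
    for combo in itertools.product((0, 1), repeat=len(INTERNAL)):
        assignment = {**tip_states, **dict(zip(INTERNAL, combo))}
        prob = joint_probability(assignment, q1, q2)
        total += prob
        for node in INTERNAL:
            sums[node][assignment[node]] += prob
    return {node: [s / total for s in sums[node]] for node in INTERNAL}


shares = node_state_shares({"A": 1, "B": 1, "C": 0, "D": 0}, q1=0.5, q2=0.5)
for node, (p0, p1) in shares.items():
    print(f"{node}: state 0 = {p0:.3f}, state 1 = {p1:.3f}")
```

With equal rates and this symmetric tree, the root is split evenly between the two states, while each subtree's ancestor leans toward the state its tips share.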

Are gain and loss rates different? • Likelihood ratio test • Model 1: gains and losses free to vary independently • Model 2: gains and losses equal • How many degrees of freedom?
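A likelihood ratio test of these nested models can be sketched as follows. Model 1 has two free parameters (q1, q2) and Model 2 one (a single shared rate), so the statistic 2(lnL1 - lnL2) is compared with a chi-square distribution on 1 degree of freedom, whose upper tail equals erfc(sqrt(x/2)). The log-likelihood values are hypothetical.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def likelihood_ratio_test(lnL_free, lnL_equal):
    """Compare gains and losses free to vary (2 rates) with gains = losses (1 rate)."""
    stat = 2.0 * (lnL_free - lnL_equal)
    return stat, chi2_sf_1df(stat)


# Hypothetical maximized log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnL_free=-12.4, lnL_equal=-14.6)
print(f"LR = {stat:.2f}, p = {p:.3f}")
```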

Ree and Donoghue, 1999

The likelihood method • Provides a method for using the data to evaluate gain:loss bias • Takes account of branch lengths • Still sensitive to taxon sampling

Suppose this taxon contains 5000 species [Figure: tree with tip states 1 1 0 0 0 1 1 1 0 0] • Suggests that the rate of losses is low

Suppose this taxon contains 5000 species [Figure: tree with tip states 1 1 0 0 0 1 1 1 0 0] • Suggests that the rate of gains is low

After equalizing the number of species of each type

Correlated evolution • Look at pairs of traits (where one trait can be an environment) • Body size and range size • Warning coloration and gregariousness • Fleshy fruit and dioecy • Do these traits evolve non-independently?

Causes of non-independence • Developmental “connectedness” • Adaptation (Correlated evolution has been claimed to be the best evidence for evolution by natural selection)

Non-phylogenetic (“tip”) method • Count species • Do a chi-square test
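The tip method amounts to a Pearson chi-square test on a 2 x 2 table of species counts. A minimal sketch without continuity correction; the counts are hypothetical, and the p-value uses the 1-degree-of-freedom identity P(X > x) = erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table of species counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 df
    return stat, p


# Hypothetical counts: rows = trait X (0, 1), columns = trait Y (0, 1).
counts = [[30, 10], [10, 30]]
print(chi_square_2x2(counts))
```

The test treats every species as an independent data point, ignoring shared ancestry.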

Hypothetical tree [Figure: six tips scored for Eyes (g, b, g, g, b, b) and Fur (d, d, p, p, d, p); the numbers 150 and 100 also appear on the figure]

Proposed solutions for discrete characters • Do a chi-square test of changes rather than tip states (various approaches: Ridley; Sillen-Tullberg) • Use a Monte Carlo approach to ask whether changes in the dependent variable are biased relative to expectations when changes are placed on the tree at random (W. Maddison)
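The Monte Carlo idea can be illustrated with a simplified sketch: scatter the observed number of gains of the dependent character at random over the branches and count how often at least as many land in the region where the independent character is in the focal state. This is not a reproduction of Maddison's concentrated-changes test, which works with the reconstructed tree; here the branches, the focal-region labels, and the assumptions (one change per branch, all branches equally likely, branch length ignored) are hypothetical.

```python
import math
import random


def concentrated_changes_mc(in_focal_region, n_gains, observed_in_region,
                            n_sims=10_000, seed=1):
    """Monte Carlo p-value: how often do `n_gains` changes placed on random distinct
    branches put at least `observed_in_region` of them on focal-region branches?

    `in_focal_region[b]` is True if the independent character is in the focal
    state on branch b. Simplification: every branch is equally likely to receive
    a change, regardless of its length.
    """
    rng = random.Random(seed)
    branches = range(len(in_focal_region))
    hits = 0
    for _ in range(n_sims):
        chosen = rng.sample(branches, n_gains)
        if sum(in_focal_region[b] for b in chosen) >= observed_in_region:
            hits += 1
    return hits / n_sims


# Hypothetical tree with 20 branches, 5 of them in the focal region;
# all 4 observed gains of the dependent character fall in that region.
region = [True] * 5 + [False] * 15
p_mc = concentrated_changes_mc(region, n_gains=4, observed_in_region=4)
p_exact = math.comb(5, 4) * math.comb(15, 0) / math.comb(20, 4)  # hypergeometric check
print(p_mc, p_exact)
```

Because the simplified null here is just drawing branches without replacement, the hypergeometric probability gives an exact check on the simulation.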

Non-phylogenetic (“tip”) method

Maddison test: the probability that this pattern, or a more extreme one, could arise if fruit type did not affect seed number is ca. 8%.

Problems with the Maddison test • Requires one to define dependent and independent characters • Does not take account of branch lengths • Very sensitive to inclusion/exclusion of species

Maximum likelihood approach (Pagel and Milligan)

Procedure • Estimate the set of rates in the q-matrix that maximize the likelihood of the data and calculate that likelihood • Constrain the matrix so that it represents independence (q12 = q34; q13 = q24; q21 = q43; q31 = q42) and repeat the calculation • Use a likelihood ratio test to evaluate significance
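The independence constraints can be checked by building the four-state matrix for two independently evolving characters from two two-state matrices (a Kronecker sum). The rates and the state numbering (1 = 00, 2 = 01, 3 = 10, 4 = 11, written as (X, Y)) are assumptions; this numbering was chosen because it reproduces the constraints listed on the slide. The unconstrained matrix has eight free rates and the independent one four, so the likelihood ratio test has four degrees of freedom.

```python
import numpy as np


def two_state_q(gain, loss):
    """2 x 2 rate matrix for one binary character (rows = from, columns = to)."""
    return np.array([[-gain, gain], [loss, -loss]], dtype=float)


def independent_q(qx, qy):
    """4 x 4 rate matrix for characters X and Y evolving independently.

    States ordered 1 = (0,0), 2 = (0,1), 3 = (1,0), 4 = (1,1) as (X, Y).
    Simultaneous changes in both characters get rate 0.
    """
    eye = np.eye(2)
    return np.kron(qx, eye) + np.kron(eye, qy)


# Hypothetical gain and loss rates for X and for Y.
Q = independent_q(two_state_q(0.3, 0.1), two_state_q(0.8, 0.4))


def q(i, j):
    """Rate from state i to state j using 1-based state numbers."""
    return Q[i - 1, j - 1]


for a, b in [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((2, 1), (4, 3)), ((3, 1), (4, 2))]:
    print(f"q{a[0]}{a[1]} = {q(*a):.2f}, q{b[0]}{b[1]} = {q(*b):.2f}")
```

In the dependent model these four pairs are free to differ; forcing each pair to be equal is what the constrained fit imposes.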

Issues to consider • Rejection of independence does not tell you what kind of non-independence you have • You need reasonable branch lengths • Sampling matters (though perhaps less than with parsimony)
