Spherical Cows Grazing in Flatland: Constraints to Selection and Adaptation

Mark Blows, University of Queensland
Bruce Walsh (jbwalsh@u.arizona.edu), University of Arizona

Geometry and Biology
Geometry has a long and important history in biology:
• Fisher's (1918) original orthogonal variance decomposition
• D'Arcy Thompson (1917), On Growth and Form
• Fisher's (1930) geometric model for the adaptation of new mutations
• Wright (1932)-Simpson (1944) concept of a phenotypic adaptive topography
• Lande-Arnold (1983) estimation of quadratic fitness surfaces

A “spherical cow”: an overly simplified representation of a complex geometric structure.
When considering adaptation, the appropriate geometry of the multivariate phenotype (and the resulting vector of breeding values) needs to be used; otherwise we are left with a misleading view of both selection and adaptation.

Geometric models for the adaptiveness of new mutations (R. A. Fisher)
One of the first considerations of the role of geometry in evolution is Fisher’s work on the probability that a new mutation is adaptive (has higher fitness than the wildtype from which it is derived).
Fisher (1930) suggested that the number of independent traits under selection has important consequences for adaptation.
Fisher used a fairly simple geometric argument to make this point.

The (2-D) geometry behind Fisher’s model
[Figure: phenotypic space showing the optimal (highest) fitness value θ, the wildtype phenotype z, the fitness contour for the wildtype, and a circle of radius r around z giving the new phenotypes for a random mutation that moves a (random) distance r from the wildtype. Here d is the distance between z and θ.]
The probability that the new mutation is adaptive is simply the fraction of the arc of the circle that lies inside the fitness contour of the starting phenotype. It is a function of r, d, and n.

Fisher asked: if a mutation randomly moves the phenotype a distance r from its current position, what is the chance that it is advantageous (increases fitness)? If there are n traits under selection, Fisher showed that this probability is given by
$$ p_{fav} = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp(-y^2/2)\, dy = 1 - \mathrm{erf}(x), \qquad \text{where } x = \frac{r\sqrt{n}}{2d} $$
Here erf(x) is used for the standard normal cumulative distribution function (the integral is the upper tail of a standard normal), so p_fav = 0.5 at x = 0. Note that p_fav decreases as x increases. Thus, increasing n results in a lower chance of an adaptive mutation.

[Figure: Prob(adaptive mutation), $p_{fav} = 1 - \mathrm{erf}(x) = 1 - \mathrm{erf}\left( r\sqrt{n} / (2d) \right)$, plotted against $x = r n^{1/2} / (2d)$ for x from 0 to 3; the vertical axis runs from 0 to 0.5.]
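The dependence of Fisher's probability on n can be checked numerically. The following is a minimal sketch, not code from the talk: the function name `prob_adaptive` and the values r = 0.1 and d = 1 are illustrative assumptions. It evaluates the upper tail of the standard normal at x = r n^{1/2} / (2d) using the identity 1 - Φ(x) = 0.5 erfc(x / √2).

```python
import math

def prob_adaptive(r: float, d: float, n: int) -> float:
    """Fisher's probability that a random mutation of size r is adaptive."""
    x = r * math.sqrt(n) / (2.0 * d)
    # Upper tail of the standard normal: 1 - Phi(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Hypothetical values: mutation size r = 0.1, distance to the optimum d = 1
for n in (1, 10, 100, 1000):
    print(n, round(prob_adaptive(0.1, 1.0, n), 4))
```

With r and d held fixed, the printed probability starts just below 0.5 and falls as the number of traits n grows.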

Extensions of Fisher’s model
M. Kimura, A. Orr, S. Rice

Kimura and Orr offered an important extension of Fisher’s model. Fisher simply considered the probability that the mutation was favorable. The more relevant issue is the chance that the new mutation is fixed. Larger mutations are less often favorable, but those that are favorable have a higher probability of fixation. For example, as r → 0, Prob(favorable) → 0.5, but s → 0, and the probability of fixation approaches the neutral value, 1/(2N).
Orr showed that the optimal mutation size was x ≈ 0.925, or
$$ r_{opt} \simeq 1.85 \cdot \frac{d}{\sqrt{n}} $$

Orr further showed that there is a considerable cost to complexity (the number of dimensions of selection, n), with the rate of adaptation (favorable mutation rate times fixation probability) declining significantly faster than 1/n. Thus, the constraint on dimensionality may be much more severe than originally suggested by Fisher.

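Two spherical cow assumptions!
Fisher’s model makes simplifying geometric assumptions:
• Equal (and spherical) fitness contours for all traits
• Equal (and spherical) distribution of mutational effects
[Figure: the 2-D geometry of Fisher’s model, showing the optimum θ, the wildtype z at distance d, the phenotype of a mutant at distance r, and the fitness contour for the wildtype.]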

Rice significantly relaxes the assumption of a spherical fitness surface around a single optimal value. The probability of adaptation on these surfaces depends upon their “effective curvature”, roughly the harmonic mean of the individual curvatures. Recalling that the harmonic mean is dominated by small values, it follows that the probability of adaptation is likewise dominated by those fitness surfaces with low curvature (weak selection). However, on such surfaces s is small, and hence the fixation probability is small.
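A short illustration of why the harmonic mean is dominated by small values. The curvature values below are made up for the example and are not taken from Rice's work; the point is only the contrast between the two means.

```python
from statistics import harmonic_mean, mean

# Hypothetical curvatures for four traits: one weakly selected, three strongly selected
curvatures = [0.01, 1.0, 1.0, 1.0]

print(mean(curvatures))           # arithmetic mean, about 0.75
print(harmonic_mean(curvatures))  # harmonic mean, about 0.039
```

A single low-curvature (weakly selected) trait pulls the harmonic mean down by more than an order of magnitude, while the arithmetic mean barely moves.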

Multivariate Phenotypes and Selection Response
Now let’s move from the geometry of adaptive mutations to the evolution of a vector of traits, a multivariate phenotype.
For univariate traits, the classic breeders’ equation $R = h^2 S$ relates the within-generation change S in mean phenotype to the between-generation change R (the response to selection).

The Multivariate Breeders’ Equation (Russ Lande)
Lande (1979) extended the univariate breeders’ equation $R = h^2 S$ to the response for a vector R of traits. Writing the univariate equation as
$$ R = \mathrm{Var}(A)\, \mathrm{Var}^{-1}(P)\, S $$
the multivariate version is
$$ R = G P^{-1} S $$
Defining the selection gradient β by
$$ \beta = P^{-1} S $$
yields the Lande equation
$$ R = G \beta $$

The selection gradient β
Robertson & Price showed that S = Cov(w, z), so that the selection differential S is the covariance between (relative) fitness and phenotypic value.
Since S is the vector of covariances and P the covariance matrix for z, it follows that $\beta = P^{-1} S$ is the vector of regression coefficients for predicting fitness w given phenotypes $z_i$, e.g.,
$$ w = a + \sum_{i=1}^{n} \beta_i z_i + e $$
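The equivalence between P^{-1} S and the multiple-regression slopes of fitness on phenotype can be verified on simulated data. This is a sketch under stated assumptions: the trait data, the mixing matrix, and the "true" gradient are all invented for illustration, and fitness is taken to be linear in the traits plus noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 individuals, 3 correlated traits
n_ind, n_traits = 500, 3
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
z = rng.normal(size=(n_ind, n_traits)) @ mixing

# Hypothetical fitness: linear in the traits plus noise
assumed_beta = np.array([0.3, -0.2, 0.0])
w = 1.0 + z @ assumed_beta + rng.normal(scale=0.5, size=n_ind)

# S = Cov(w, z) and P = Cov(z), both estimated from the sample
P = np.cov(z, rowvar=False)
S = np.array([np.cov(w, z[:, i])[0, 1] for i in range(n_traits)])
beta_from_S = np.linalg.solve(P, S)   # beta = P^{-1} S

# Ordinary least-squares regression of w on z, with an intercept
X = np.column_stack([np.ones(n_ind), z])
coef, *_ = np.linalg.lstsq(X, w, rcond=None)
beta_from_ols = coef[1:]

print(beta_from_S)
print(beta_from_ols)
```

The two printed vectors agree to numerical precision, because the least-squares slopes are exactly the sample covariance matrix inverted against the sample covariances with w.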

G, β, and selective constraints
A non-zero β_i means that selection is acting directly to change the mean of trait i. The selection gradient β measures the direction in which selection is trying to move the population mean to maximally improve fitness.
Multiplying β by G results in a rotation (moving away from the optimal direction) as well as a scaling (reducing the response). Thus, G imposes constraints on the selection response.

Thus G and β both describe something about the geometry of selection.
The vector β is the optimal direction to move to maximally increase fitness.
The covariance matrix G of breeding values describes the space of potential constraints on achieving this optimal response.
Treating this multivariate problem as a series of univariate responses is incredibly misleading.

Edwin Abbott Abbott, writing as A Square, 1884
The problems of working with a lower-dimensional projection from a higher-dimensional space

The misleading univariate world of selection
For a single trait, we can express the breeders’ equation as R = Var(A) β.
Consider two traits, z1 and z2, both heritable and both under direct selection.
Suppose β1 = 2, β2 = -1, Var(A1) = 10, Var(A2) = 40.
One would thus expect that each trait would respond to selection, with R_i = Var(A_i) β_i.

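What is the actual response? There is not enough information to tell: we need the additive genetic covariance Cov(A1, A2). With zero covariance,
$$ R = G\beta = \begin{pmatrix} 10 & 0 \\ 0 & 40 \end{pmatrix} \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 20 \\ -40 \end{pmatrix} $$
However, with a different covariance,
$$ R = G\beta = \begin{pmatrix} 10 & 20 \\ 20 & 40 \end{pmatrix} \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} $$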
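The two responses can be reproduced in a few lines. This is a minimal NumPy sketch using the numbers from the two-trait example; the variable names are my own.

```python
import numpy as np

beta = np.array([2.0, -1.0])

# Same additive variances (10 and 40), different covariance
G_no_cov = np.array([[10.0, 0.0],
                     [0.0, 40.0]])
G_with_cov = np.array([[10.0, 20.0],
                       [20.0, 40.0]])

R_no_cov = G_no_cov @ beta      # response with zero covariance
R_with_cov = G_with_cov @ beta  # response with covariance 20

print(R_no_cov)
print(R_with_cov)
```

The second G has determinant 10 × 40 − 20 × 20 = 0, and β is an eigenvector associated with its zero eigenvalue, which is why there is no response at all despite both traits being heritable.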

The notion of multivariate constraints is not new
• Dickerson (1955): genetic variation in all of the components of a selection index, but no (additive) variation in the index itself.
• Lande also noted the possibility of constraints.
There can be both phenotypic and genetic constraints:
• Singularity of P: selection cannot independently act on all components
• Singularity of G: certain combinations of traits show no additive variance

If the covariance matrix is not singular, how can we best quantify its constraints (if any)?
One simple measure is the angle θ between the vectors of desired (β) and actual (R) responses. Recall that the angle between two vectors x and y is simply given by
$$ \cos(\theta) = \frac{x^T y}{\| x \| \, \| y \|} $$
If the inner product of β and R is zero, θ = 90°, and there is an absolute constraint. If θ = 0°, the response and gradient point in exactly the same direction (β is an eigenvector of G).

The plot is for the first of our examples, where
$$ G = \begin{pmatrix} 10 & 0 \\ 0 & 40 \end{pmatrix}, \qquad \beta = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \qquad \theta = \cos^{-1}\left( \frac{R^T \beta}{\| R \| \, \| \beta \|} \right) $$
Note here that θ = 37°, even though there is no covariance between traits and hence this reduces to two univariate responses. The constraint arises because there is much more genetic variation in trait 2 (the weaker-selected trait).

Constraints and Consequences
Thus, it is theoretically possible to have a very constrained selection response, in the extreme none (G has a zero eigenvalue and β is an associated eigenvector).
This is really an empirical question. At first blush, it would seem incredibly unlikely that β “just happens” to be near a zero eigenvector of G.
However, selection tends to erode away additive variation for a trait under constant selection.
[Figure: Drosophila serrata]

Empirical study from Mark’s lab: Cuticular hydrocarbons and mate choice in Drosophila serrata
Emma Hine, Stephen Chenoweth

Cuticular hydrocarbons • D. serrata

For D. serrata, 8 cuticular hydrocarbons (CHC) were found to be very predictive of mate choice. Laboratory experiments measured both β for this vector of 8 traits as well as the associated G matrix. While all CHC traits had significant heritabilities, the covariance matrix was found to be ill-conditioned, with the first two eigenvectors (g1, g2) accounting for roughly 78% of the total genetic variation. Computing the angle between each of these two eigenvectors and β provides a measure of the constraints in this system.

$$ g_1 = \begin{pmatrix} 0.232 \\ 0.132 \\ 0.255 \\ 0.536 \\ 0.449 \\ 0.363 \\ 0.430 \\ 0.239 \end{pmatrix}, \qquad g_2 = \begin{pmatrix} 0.319 \\ 0.182 \\ 0.213 \\ -0.436 \\ 0.642 \\ -0.362 \\ -0.014 \\ -0.293 \end{pmatrix}, \qquad \beta = \begin{pmatrix} -0.099 \\ -0.055 \\ 0.133 \\ -0.186 \\ -0.133 \\ 0.779 \\ 0.306 \\ -0.465 \end{pmatrix} $$
θ(g1, β) = 81.5°, θ(g2, β) = 99.7°
Thus much (at least 78%) of the usable genetic variation is essentially orthogonal to the direction β that selection is trying to move the population.
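The two reported angles can be recomputed from the vector entries. This sketch assumes the three columns on the slide are g1, g2, and β in that order; `angle_deg` is a hypothetical helper name.

```python
import numpy as np

def angle_deg(x, y):
    """Angle in degrees between vectors x and y."""
    cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Entries read from the slide (assumed column order: g1, g2, beta)
g1 = np.array([0.232, 0.132, 0.255, 0.536, 0.449, 0.363, 0.430, 0.239])
g2 = np.array([0.319, 0.182, 0.213, -0.436, 0.642, -0.362, -0.014, -0.293])
beta = np.array([-0.099, -0.055, 0.133, -0.186, -0.133, 0.779, 0.306, -0.465])

theta1 = angle_deg(g1, beta)
theta2 = angle_deg(g2, beta)
print(round(theta1, 1), round(theta2, 1))
```

Because the sign of an eigenvector is arbitrary, an angle of 99.7° describes the same degree of misalignment as 180° − 99.7° = 80.3°; both eigenvectors are within about 10° of being orthogonal to β.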

Evolution along “Genetic lines of least resistance”
Assuming G remains (relatively) constant, can we relate population divergence to any feature of G?
Schluter (1996) suggested that we can, as he observed that populations tend to diverge along the direction given by the first principal component of G (its leading eigenvector), gmax. Specifically, the angle between the vector of between-population divergence in means and gmax was small. Schluter called this evolution along “genetic lines of least resistance”.

Evolution along gmax
There are two ways to interpret Schluter’s observation:
(i) such lines constrain selection, with departures away from such directions being difficult;
(ii) such lines are also the directions along which maximal genetic drift is expected to occur.
Under a simple Brownian motion model of drift, the vector of means at time t is distributed as
$$ \mu(t) \sim \mathrm{MVN}\left( \mu, \; \frac{t}{2 N_e} \cdot G \right) $$
Maximal directions of change correspond to the leading eigenvectors of G.

Looking at lines of least resistance in the Australian rainbow fish (genus Melanotaenia)

Megan Higgie Katrina McGuigan

Two sibling species were measured, both of which have populations differentially adapted to lake vs. stream hydrodynamic environments. The vector of traits consisted of morphological landmarks associated with overall shape (and hence potential performance in specific hydrodynamic environments). Here, there was no β to estimate; rather, the quantity of interest was the divergence vector d between the mean vectors of two groups (e.g., the two species, the two environments within a species, etc.). To test Schluter’s ideas, the angles between gmax and the different d’s were computed.
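The comparison of a divergence vector with gmax can be sketched as follows. All numbers here are invented for illustration and are not from the rainbow fish study, and `angle_to_gmax_deg` is a hypothetical helper. One design choice is flagged in the code: since the sign of an eigenvector is arbitrary, the angle is folded into the range 0° to 90°.

```python
import numpy as np

def angle_to_gmax_deg(G, d):
    """Angle in degrees (0 to 90) between d and the leading eigenvector of G."""
    eigvals, eigvecs = np.linalg.eigh(G)  # eigh returns eigenvalues in ascending order
    gmax = eigvecs[:, -1]                 # unit-length leading eigenvector
    # The sign of an eigenvector is arbitrary, so use the absolute cosine
    cos_theta = abs(gmax @ d) / np.linalg.norm(d)
    return np.degrees(np.arccos(np.clip(cos_theta, 0.0, 1.0)))

# Hypothetical values, not taken from the study
G = np.array([[4.0, 2.0],
              [2.0, 3.0]])
d_along = np.array([1.0, 0.8])     # divergence roughly along gmax
d_across = np.array([1.0, -1.3])   # divergence roughly across gmax

print(angle_to_gmax_deg(G, d_along))
print(angle_to_gmax_deg(G, d_across))
```

A small angle corresponds to divergence along the line of least resistance; an angle near 90° corresponds to divergence in a direction with little genetic variation.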

Divergence between species, as well as divergence among replicate hydrodynamic populations within each species, followed Schluter's results (small angular departures between the vector d of divergent means and gmax). However, hydrodynamic divergence between lake versus stream populations within each species was along directions that were quite removed from gmax (as well as from the other eigenvectors of G that described most of the genetic variation). Thus, the between- and within-species divergence within the same hydrodynamic environment is consistent with drift, while hydrodynamic divergence within each species had to occur against a gradient of very little genetic variation. One cannot rule out that adaptation to these environments resulted in a depletion of genetic variation along these directions; indeed, this may well be the case.

Beyond gmax: Using Matrix Subspace Projection to Measure Constraints
Schluter’s idea is to examine the angle between the leading eigenvector of G and the vector of divergence.
More generally, one can construct a space spanned by the first k eigenvectors, and examine the angle between β and the projection of β onto this space.
This provides a measure of the constraints imposed by a subset of the usable variation.

An advantage of using a subspace projection is that G is often ill-conditioned, in that λmax / λmin is large. In such cases (as well as others!) estimation of G may result in estimates of eigenvalues that are very close to zero or even negative. Negative estimates arise due to sampling (Hill and Thompson 1978), but values near zero may reflect the true biology in that there is very little variation in certain dimensions.

One can extract (estimate) a subspace of G that accounts for the vast majority of usable genetic variation by, for example, taking the leading k eigenvectors. It is often the case that G contains several eigenvalues whose associated eigenvectors account for almost no variation (i.e., λ_i / tr(G) ≈ 0 for those eigenvalues). In such cases, most of the genetic variation resides on a lower-dimensional subspace.

To do this, first construct the matrix A whose columns are the first k eigenvectors,
$$ A = ( g_1, g_2, \cdots, g_k ) $$
The projection matrix for this subspace is given by
$$ P_{roj} = A \left( A^T A \right)^{-1} A^T $$
Thus, the projection of β into this subspace is given by the vector
$$ p = P_{roj}\, \beta = A \left( A^T A \right)^{-1} A^T \beta $$
Note that this is the generalization of the projection of one vector onto another.
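As a quick check on that last remark, consider the case k = 1, where A is the single column vector g_1. This is a worked special case added for illustration, not a formula from the slides:
$$ P_{roj} = g_1 \left( g_1^T g_1 \right)^{-1} g_1^T = \frac{g_1 g_1^T}{\| g_1 \|^2}, \qquad p = P_{roj}\, \beta = \frac{g_1^T \beta}{\| g_1 \|^2}\, g_1 $$
which is the familiar projection of the vector β onto the vector g_1.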

The constraint imposed within this subspace is given by the angle between β and p, the projection of β into this space.
For the Drosophila serrata CHC traits involved in mate choice, the first two eigenvectors account for roughly 78% of the total variation in G. The angle θ between β and the projection p of β into this subspace of the genetic variance is 77.1°. Thus the direction of optimal response is 77° away from the genetic variation described by this subspace (which spans 78% of the total variance).
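The projection angle for the D. serrata example can be recomputed from the two leading eigenvectors and β. This sketch assumes the vector entries as read from the earlier slide (columns taken to be g1, g2, β in that order) and uses a linear solve in place of an explicit matrix inverse, which is my choice rather than anything specified in the talk.

```python
import numpy as np

# Entries read from the earlier slide (assumed column order: g1, g2, beta)
g1 = np.array([0.232, 0.132, 0.255, 0.536, 0.449, 0.363, 0.430, 0.239])
g2 = np.array([0.319, 0.182, 0.213, -0.436, 0.642, -0.362, -0.014, -0.293])
beta = np.array([-0.099, -0.055, 0.133, -0.186, -0.133, 0.779, 0.306, -0.465])

# A holds the first k = 2 eigenvectors as columns
A = np.column_stack([g1, g2])

# p = A (A^T A)^{-1} A^T beta, computed with a solve rather than an explicit inverse
p = A @ np.linalg.solve(A.T @ A, A.T @ beta)

cos_theta = (p @ beta) / (np.linalg.norm(p) * np.linalg.norm(beta))
theta = np.degrees(np.arccos(cos_theta))
print(round(theta, 1))
```

The result is close to the 77.1° reported, with any small difference attributable to the three-decimal rounding of the vector entries.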

How typical is this amount of constraint?
A second study (Anna Van Homrigh) looked at 9 CHCs involved in mate choice in Drosophila bunnanda. The estimated G for these traits had 98% of the total genetic variation in the first five PCs (the first four had 95% of the total variance). The angle between β and its projection into this 5-dimensional subspace was 88.2°. If only the first four PCs were considered for the subspace, the projection is even more constrained, being 89.1° away from β. When the entire space of G is considered, the resulting angle between R and β is 67°.

Evolution Under Constraints or Evolution of Constraints?
G both constrains selection and also evolves under selection. Over short time scales, if most alleles have modest effects, G changes due to selection generating linkage disequilibrium. The within-generation change in G under the infinitesimal model is
$$ \Delta G = -G \beta \beta^T G = -R R^T $$

Thus, the (within-generation) change in the additive genetic covariance between traits i and j is
$$ \Delta G_{ij} = \Delta \sigma(A_i, A_j) = -R_i R_j $$
The net result is that linkage disequilibrium increases any initial constraints. A simple way to see this is to consider selection on the index $I = \sum_i z_i \beta_i$. Selection on this index (which is the predicted fitness) results in decreased additive variance in this composite trait (Bulmer 1971).
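The effect on the index variance can be illustrated numerically. This is a sketch under stated assumptions: G is borrowed from the earlier two-trait example, while the selection gradient is a deliberately weaker, made-up value so that the change in G is small relative to G itself.

```python
import numpy as np

# G from the earlier two-trait example; beta is a hypothetical, weaker gradient
G = np.array([[10.0, 0.0],
              [0.0, 40.0]])
beta = np.array([0.02, -0.01])

R = G @ beta
delta_G = -np.outer(R, R)             # Delta G = -G beta beta^T G = -R R^T

var_index_before = beta @ G @ beta    # additive variance of I = sum_i z_i beta_i
var_index_change = beta @ delta_G @ beta

print(delta_G)
print(var_index_before, var_index_change)
```

Since β^T ΔG β = -(β^T G β)^2, the change in the additive variance of the index can never be positive, whatever values are chosen for G and β.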

Thus, as pointed out by Shaw et al. (1995), if one estimates G by first having several generations of random mating in the laboratory under little selection, existing linkage disequilibrium decays, and the resulting estimated G matrix may show less of a constraint than the actual G operating in nature (with its inherent linkage disequilibrium).

Why so much variation?
It is certainly not surprising that little usable genetic variation may remain along a direction of persistent directional selection. What is surprising, however, is that considerable genetic variation may exist along other directions. The quandary is not why there is so little usable variation, but rather why there is so much.

Quantitative genetics is in the embarrassing position as a field of having no models that adequately explain one of its central observations -- genetic variation (measured by single-trait heritabilities) is common and typically in the range of 0.2 to 0.4 for a wide variety of traits. As Johnson and Barton (2005) point out, the resolution of these issues likely resides in more detailed considerations of pleiotropy, wherein new mutations influence a number of traits (back to Fisher’s model!)

Once again, it is likely we need to move to a higher dimensional space to reasonably account for observations based on a projection into one dimension (i.e., standing heritability levels for a trait). The final consideration with pleiotropy is not just the higher-dimensional fitness surface for the vector of traits they influence but also the distributional space of pleiotropic mutations themselves.

The “deep” nature of G Is the covariance structure G itself some optimal configuration for certain sets of highly-correlated traits? Has there been selection on developmental processes to facilitate morphological integration (the various units of a complex trait functioning smoothly together), which in turn would result in constraints on the pattern of accessible mutations under pleiotropy (Olson and Miller 1958, Lande 1980)?

Developmental systems are networks

Some apparently general features of biological networks
First, they are small-world graphs, which means that the mean path distance between any two nodes is short. The members live in a small world (Bacon, Erdős numbers).
The second feature that studied regulatory/metabolic networks show is that the degree distribution (the probability that a node is connected to k others) follows a power law, $P(k) \sim k^{-\gamma}$.
Graphs with a power-law distribution of links are called scale-free graphs. Scale-free graphs show the very important feature that they are fairly robust to perturbations: most randomly chosen nodes can be removed with little effect on the system.

Our spherical cow may in reality have a very non-spherical distribution of new mutation phenotypes around a current phenotype.
