
Estimating Divergence Times and Migration Using Polymorphism Data

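This study develops an Approximate Bayesian Computation approach to estimate population parameters, including divergence times and migration, for a simple two-population split model. The approach relaxes two assumptions made by existing methods (no recombination within loci and no migration since the split). On simulated data it gives fairly accurate estimates of population sizes and divergence times and has high power to detect migration. It is illustrated with polymorphism data from two closely related species of Lepidoptera (Papilio glaucus and P. canadensis).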




Estimating divergence times and testing for migration using multi-locus polymorphism data

Céline Becquet (a), Andrea S. Putnam (b), Peter Andolfatto (b), and Molly Przeworski (a)
(a) Dept. of Human Genetics, Chicago, IL, USA, 60637; (b) University of California at San Diego, La Jolla, CA, 92093

Keystone Symposia. Genome Sequence Variation. Jan 08 – Jan 13, 2006

Abstract

Population divergence times are of interest in many contexts, from human genetics to conservation biology. These times can be estimated from polymorphism data. However, existing approaches make a number of assumptions (e.g., no recombination within loci or no migration since the split) that limit their applicability. To overcome these limitations, we developed an Approximate Bayesian Computation approach to estimate population parameters for a simple split model, allowing for migration as well as intralocus recombination. Application to simulated data suggests that the approach provides fairly accurate estimates of population sizes and divergence times and has high power to detect migration since the split. We illustrate the potential of the method by applying it to polymorphism data from five highly recombining loci surveyed in two closely related species of Lepidoptera (Papilio glaucus and P. canadensis).

Background

The demographic events experienced by populations influence their genealogical history and therefore the pattern of neutral polymorphism observable within and between extant populations (see Figure 1). For example, the number of alleles shared between very closely related species depends on the time at which the species split and on whether gene flow has occurred since the split (see Figure 2). Thus, polymorphism data can be used to estimate the demographic parameters describing the history of two incipient species (see Figure 3).

Here, we consider a simple model in which two populations split T generations ago and the number of migrants exchanged between them is M per generation. Na, N1 and N2 are the effective population sizes of the ancestral, first and second descendant populations, respectively. We denote the set of parameters by a. Our goal is to estimate the posterior distribution of a given the data.

[Model schematic; labels: Na, T, N1, N2, M]

Rather than using all the data to estimate these parameters, we summarize the data for each locus by four statistics known to be sensitive to the parameters of interest (see Figure 1 for details). Given a genealogy, the probability of obtaining these statistics can be calculated explicitly. We therefore take the following approach to obtain an estimate of the posterior distribution of the parameters:

[Schematic of the estimation; labels: posterior distribution; prior distributions on parameters; calculated explicitly; estimated from coalescent simulations]

Specifically, we pick a set of parameters independently from prior distributions, then simulate a genealogical history for each locus and calculate u = p(D|G,a), the probability of the data D given the genealogy G and the parameters a. We then weight the values of the parameters by u to obtain an estimate of their posterior probability.

Fig-1. Summary statistics used for estimation

An example of polymorphism data at a locus in three sequences sampled from each of two populations. The horizontal lines represent aligned sequences; the colored squares, disc and ovals stand for segregating sites. We use the following summaries of the polymorphism data at each locus: the number of segregating sites specific to sample one (S1), specific to sample two (S2), shared between samples from both populations (Sshared) and fixed in either population sample (Sfixed). In the example shown, S1 = 1, S2 = 2, Sshared = 1 and Sfixed = 1.

Fig-2. Effects of divergence time and migration on polymorphism data

Examples of genealogical histories for three sequences sampled from each of two closely related populations, under different models. The patterns of polymorphism and divergence expected under each model are indicated below. For simplicity, we present a single genealogy, but for recombining loci, there may be many histories within a single region (i.e., there is an ancestral recombination graph, rather than a tree). The vertical branches represent ancestral lineages for the six sequences; they are colored according to whether a mutation would lead to a fixed, shared or unique polymorphism in the sample (see Figure 1). In c, gene flow occurred (yellow line), so sequence 3 was sampled in population one but its ancestor came from population two.

a. A gene genealogy for a recent divergence time without migration: excess of shared polymorphisms (occurring along the red branch) and few fixed sites (purple branch).
b. A gene genealogy for an old divergence time without migration: few shared polymorphisms (none here) and an excess of fixed sites.
c. A gene genealogy for an old divergence time with migration: excess of shared polymorphisms and few fixed sites.

Fig-3. Performance on a small simulated data set

Mean of the divergence time (a) and of the ratio of ancestral to current population size (b). The estimates are based on polymorphism data from ten simulated loci of 1 kb, generated with a sample size of 20 individuals from each population, population mutation rates θ1 = θ2 = θa = 0.001, T = 5 × 10^4 generations and M = 5. Each vertical line refers to a data set (Y-axis), the red line indicates the true value, and the X-axis range corresponds to the range of the prior distribution. As can be seen, the divergence times tend to be over-estimated, while the ancestral population size estimates are more accurate.

Application to two Papilio species

We applied our method to data from five highly recombining loci sampled in two species of Lepidoptera (Papilio glaucus and P. canadensis). These two species are known to exchange migrants and to experience high levels of recombination. To examine the sensitivity to assumptions about migration, we compared the parameter estimates obtained in models with and without gene flow: the time of divergence appears to be under-estimated and the ancestral population size over-estimated when migration is ignored (see Table 2).

Fig-4. Ranges of P. glaucus and P. canadensis

A narrow hybrid zone forms where the ranges meet. The female mimetic morph of P. glaucus is shown with yellow morphs.

Future directions

Our current method is relatively slow when using data from multiple loci because it searches a huge space of possible histories and parameters. We would like to speed up the method and extend it to more complex models. To do so, we will need to account for two sources of variance: in the genealogies and in the parameters. We therefore plan to generate many genealogies for the same set of parameters in order to improve the accuracy of our estimate of p(D|a), and to use Markov Chain Monte Carlo in order to better explore the parameter space.
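The four per-locus summaries (S1, S2, Sshared, Sfixed) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `summarize_locus`, the `SiteCounts` container and the toy sequences are hypothetical, and each site is classified only by whether it is polymorphic within each sample. The toy alignment is constructed to reproduce the counts given for the poster's example (S1 = 1, S2 = 2, Sshared = 1, Sfixed = 1).

```python
from typing import NamedTuple, Sequence


class SiteCounts(NamedTuple):
    """Per-locus summaries; the field names are illustrative only."""
    s1: int      # sites polymorphic only in sample one
    s2: int      # sites polymorphic only in sample two
    shared: int  # sites polymorphic in both samples
    fixed: int   # sites monomorphic within each sample but differing between them


def summarize_locus(sample1: Sequence[str], sample2: Sequence[str]) -> SiteCounts:
    """Classify every aligned column of one locus and count each class."""
    if not sample1 or not sample2:
        raise ValueError("both samples need at least one sequence")
    length = len(sample1[0])
    if any(len(seq) != length for seq in (*sample1, *sample2)):
        raise ValueError("all sequences must be aligned to the same length")

    s1 = s2 = shared = fixed = 0
    for column in range(length):
        alleles1 = {seq[column] for seq in sample1}
        alleles2 = {seq[column] for seq in sample2}
        poly1 = len(alleles1) > 1
        poly2 = len(alleles2) > 1
        if poly1 and poly2:
            shared += 1
        elif poly1:
            s1 += 1
        elif poly2:
            s2 += 1
        elif alleles1 != alleles2:
            fixed += 1
        # otherwise the site is monomorphic in the whole sample and is ignored
    return SiteCounts(s1, s2, shared, fixed)


if __name__ == "__main__":
    # Hypothetical toy data: three sequences from each population, five sites.
    population_one = ["ACGAT", "TCGCT", "ACGAT"]
    population_two = ["ACGCG", "AGGAG", "ACTAG"]
    print(summarize_locus(population_one, population_two))
```

Reducing each locus to four counts is what makes the probability of the data given a genealogy tractable, at the cost of discarding information such as allele frequencies and linkage between sites.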
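The weighting step (draw parameters from the prior, simulate a genealogy, compute u = p(D|G,a), weight the draw by u) can be sketched on a deliberately simplified problem. The example below is an assumption-laden toy, not the poster's method: it uses a single population with no split, migration or recombination, one parameter θ (the per-locus population mutation rate) with a uniform prior, and a single summary, the number of segregating sites. The "genealogy" is reduced to the total branch length of a standard neutral coalescent tree. All function names are hypothetical.

```python
import math
import random


def simulate_total_branch_length(n: int, rng: random.Random) -> float:
    """Total branch length of a neutral coalescent tree for n sequences,
    in units of 2N generations (toy stand-in for simulating G)."""
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0          # coalescence rate with k lineages
        total += k * rng.expovariate(rate)  # k lineages persist for that time
    return total


def poisson_pmf(count: int, mean: float) -> float:
    """Poisson probability, computed in log space to avoid overflow."""
    if mean <= 0.0:
        return 1.0 if count == 0 else 0.0
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))


def weighted_posterior_mean(s_obs: int, n: int, prior_low: float,
                            prior_high: float, draws: int,
                            rng: random.Random) -> float:
    """Posterior mean of theta, estimated by weighting prior draws by u."""
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(draws):
        theta = rng.uniform(prior_low, prior_high)     # draw from the prior
        length = simulate_total_branch_length(n, rng)  # simulate a genealogy
        # Given the genealogy, mutations are Poisson with mean theta * L / 2,
        # so u = p(D | G, theta) is available in closed form.
        u = poisson_pmf(s_obs, theta * length / 2.0)
        weighted_sum += u * theta
        weight_total += u
    if weight_total == 0.0:
        raise RuntimeError("all weights are zero; widen the prior or add draws")
    return weighted_sum / weight_total


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so repeated runs agree
    estimate = weighted_posterior_mean(s_obs=10, n=20, prior_low=0.1,
                                       prior_high=20.0, draws=20000, rng=rng)
    print(f"weighted posterior mean of theta: {estimate:.2f}")
```

Because each parameter draw is paired with only one simulated genealogy, the weights are noisy; this is the variance in the genealogies that the future directions propose to reduce by simulating many genealogies per parameter set.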
