
Non-stationary population genetic models with selection: Theory and Inference

Scott Williamson and Carlos Bustamante, Cornell University


Presentation Transcript


  1. Non-stationary population genetic models with selection: Theory and Inference. Scott Williamson and Carlos Bustamante, Cornell University

  2. Inferring natural selection from samples • Statistical tests of the neutral theory (lots) • Methods for detecting selective sweeps (lots) • Parametric inference: estimating selection parameters, etc. • Quantification of selective constraint, deleterious mutation

  3. The demography problem • Many existing methods assume random mating, constant population size • These assumptions don’t apply in most natural populations • The effect of demography can mimic the effect of natural selection

  4. Natural selection and population growth • Inferring selection from the frequency spectrum while correcting for demography • The McDonald-Kreitman test: does recent population growth cause you to misidentify negative selection as adaptive evolution?

  5. The frequency spectrum: an example

     Site               7019  4424  4961  5286  1972  2188  3529   163   975
     Sequence 1            A     G     G     C     T     T     A     A     A
     Sequence 2            A     T     G     C     T     C     G     A     A
     Sequence 3            G     T     G     T     T     C     A     C     G
     Sequence 4            A     G     G     C     T     C     A     A     G
     Sequence 5            A     G     A     C     C     C     G     A     A
     Frequency class       1     2     1     1     1     4     2     1     3

     [Figure: the frequency spectrum; x-axis: frequency class; y-axis: count. Alleles in the alignment are marked as ancestral or derived.]
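The tally on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `ancestral` string is an assumption, chosen so that the derived-allele counts match the frequency-class row on the slide, and the function names are hypothetical.

```python
from collections import Counter

# The five sequences from the slide, one character per site.
sequences = ["AGGCTTAAA", "ATGCTCGAA", "GTGTTCACG", "AGGCTCAAG", "AGACCCGAA"]

# ASSUMPTION: ancestral states are not listed on the slide; this string is
# chosen so the derived counts reproduce the slide's frequency-class row.
ancestral = "AGGCTTAAG"


def derived_counts(seqs, anc):
    """Count, for each site, how many sequences carry a non-ancestral allele."""
    return [sum(s[j] != anc[j] for s in seqs) for j in range(len(anc))]


def frequency_spectrum(counts, n):
    """Number of sites in each frequency class i = 1 .. n-1."""
    tally = Counter(counts)
    return [tally.get(i, 0) for i in range(1, n)]


counts = derived_counts(sequences, ancestral)
spectrum = frequency_spectrum(counts, len(sequences))
print(counts)    # derived-allele count at each of the 9 sites
print(spectrum)  # number of sites in frequency classes 1, 2, 3, 4
```

Each entry of `spectrum` is the quantity the later slides call the number of sites in frequency class i.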

  6. Natural selection and the frequency spectrum [Figure: equilibrium neutral and positively selected (2Ns = 2) frequency spectra; x-axis: frequency class; y-axis: count]

  7. Natural selection and the frequency spectrum [Figure: equilibrium neutral and negatively selected (2Ns = -2) frequency spectra; x-axis: frequency class; y-axis: count]
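Equilibrium spectra like those on the two preceding slides can be approximated numerically. The sketch below assumes the stationary Poisson Random Field density in the Sawyer and Hartl form, proportional to (1 - exp(-2γ(1-q))) / ((1 - exp(-2γ)) q(1-q)). How `gamma` maps onto the slides' 2Ns depends on the fitness parameterization, which the transcript does not state, so treat the scaling, the function name, and the midpoint quadrature as assumptions.

```python
import math


def expected_spectrum(n, gamma, theta=1.0, steps=20000):
    """Expected number of sites in frequency classes 1 .. n-1 at equilibrium.

    ASSUMPTION: stationary density theta * w(q) / (q * (1 - q)) with
    w(q) = (1 - exp(-2*gamma*(1-q))) / (1 - exp(-2*gamma)); w(q) = 1 - q
    in the neutral limit gamma = 0.
    """
    h = 1.0 / steps
    spectrum = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            q = (k + 0.5) * h  # midpoint rule on (0, 1)
            if gamma == 0.0:
                w = 1.0 - q
            else:
                w = math.expm1(-2.0 * gamma * (1.0 - q)) / math.expm1(-2.0 * gamma)
            # Binomial sampling term times density; q and (1-q) cancel one power each.
            total += q ** (i - 1) * (1.0 - q) ** (n - i - 1) * w
        spectrum.append(theta * math.comb(n, i) * total * h)
    return spectrum


neutral = expected_spectrum(10, 0.0)
positive = expected_spectrum(10, 1.0)
negative = expected_spectrum(10, -1.0)
```

Under neutrality this reduces to the familiar θ/i. Relative to that baseline, positive selection shifts weight toward high-frequency classes and negative selection toward low-frequency classes.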

  8. Natural selection vs. demography [Figure: non-stationary neutral (population growth) and equilibrium selected (2Ns = -2) frequency spectra; x-axis: frequency class; y-axis: count]

  9. How do we distinguish selection from demography? • McDonald-Kreitman approach: • Use a priori information to classify changes as “neutral” (e.g. synonymous, non-coding) or “potentially selected” (e.g. non-synonymous) • Putatively neutral changes are treated as a standard for patterns of neutral evolution in a particular sample • Potentially selected sites are compared to the neutral standard Can we develop a neutral standard for the frequency spectrum?

  10. Comparing frequency spectra for different classes of mutation • This talk: • Likelihood ratio test of neutrality at potentially selected sites, using information from the neutral sites • Biologically meaningful measure of the difference between the two spectra [Figure: observed frequency spectra for putatively neutral and potentially selected sites; x-axis: frequency class; y-axis: count]

  11. Comparing frequency spectra for different classes of mutation A model-based approach: • Fit a neutral demographic model to estimate demographic parameters • Given those parameter estimates, fit a selective demographic model to estimate selection parameters, test hypotheses [Figure: observed frequency spectra for putatively neutral and potentially selected sites; x-axis: frequency class; y-axis: count]

  12. Comparing frequency spectra for different classes of mutation • Requirements: • Demographic model • Frequency spectrum predictions from the model under neutrality • Frequency spectrum predictions from the model subject to natural selection [Figure: observed frequency spectra for putatively neutral and potentially selected sites; x-axis: frequency class; y-axis: count]

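  13. Theory: population growth model 2-epoch model: the population size changes from N_A (ancestral) to N_C (current) at time τ before now. Model parameters: τ, ν (= N_A/N_C) [Figure: population size against time]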

  14. Theory: predicting the frequency spectrum Definitions: xi Number of sites in frequency class i f(q,t;) Distribution of allele frequency, q, at time t n Sample size Predictions:
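The prediction equation itself did not survive in the transcript. A hedged sketch of the standard Poisson Random Field form follows, using the slide's x_i, n, q, and f. The proportionality constant (which depends on how f is normalized) and the normalized quantity P(i, n) are assumptions rather than the authors' exact expression.

```latex
\mathbb{E}[x_i] \;\propto\; \int_0^1 \binom{n}{i}\, q^{i} (1-q)^{n-i}\, f(q,t)\, dq ,
\qquad
P(i,n) \;=\; \frac{\mathbb{E}[x_i]}{\sum_{j=1}^{n-1} \mathbb{E}[x_j]}
```

The binomial term is the probability that a site at population frequency q shows i derived copies in a sample of n. Integrating over f averages that probability across sites.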

  15. Theory: the distribution of allele frequency Poisson Random Field approach (Sawyer and Hartl 1992): • Use single-locus diffusion theory to predict the distribution of allele frequency • If sites are independent (i.e. in linkage equilibrium) and identically distributed, then the single-locus theory applies across sites To get f, we need to solve the diffusion equation:

  16. Theory: time-dependent solution, neutral case The forward equation under neutrality: Kimura’s (1964) solution, given some initial allele frequency, p:
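The two equations on this slide are missing from the transcript. In their standard textbook form they can be written as below, as a sketch under two assumptions: time t is measured in units of 2N generations, and F denotes the Gauss hypergeometric function. The notation on the original slide may differ.

```latex
% Forward (Kolmogorov) equation under pure drift
\frac{\partial f}{\partial t}
  = \frac{1}{2}\,\frac{\partial^{2}}{\partial q^{2}}\bigl[\,q(1-q)\,f\,\bigr]

% Kimura's series solution for initial allele frequency p
f(q,t \mid p)
  = \sum_{i=1}^{\infty} p(1-p)\, i(i+1)(2i+1)\,
    F(1-i,\, i+2;\, 2;\, p)\; F(1-i,\, i+2;\, 2;\, q)\;
    e^{-i(i+1)\,t/2}
```

The leading term, 6p(1-p)e^{-t}, recovers the familiar result that heterozygosity among segregating sites decays at rate 1/(2N) per generation.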

  17. Theory: time-dependent solution, neutral case Applying Kimura’s solution to the 2-epoch model: ancestral mutations Kimura’s (1964) solution, given some initial allele frequency, p: Distribution of allele frequency:

  18. Theory: time-dependent solution, neutral case [Figure: expected frequency spectrum after a change in population size, with one model parameter fixed at 0.01; x-axis: frequency class (1 to 9); y-axis: predicted proportion P(i, n), scale 0.2 to 0.8]

  19. Theory: time-dependent solution, neutral case Multinomial likelihood: → Maximum likelihood estimates of τ and ν → Likelihood ratio test of population growth
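A minimal sketch of a multinomial likelihood for a frequency spectrum, and of the likelihood ratio statistic built from it. Everything numeric here is illustrative: `observed` is a toy spectrum, and `growth_like` is a hypothetical probability vector, not one computed from the 2-epoch model. The function names are also assumptions.

```python
import math


def multinomial_loglik(counts, probs):
    """Log-likelihood of observed class counts given predicted class proportions."""
    n = sum(counts)
    ll = math.lgamma(n + 1)
    for x, p in zip(counts, probs):
        ll += x * math.log(p) - math.lgamma(x + 1)
    return ll


def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic, 2 * (log L_alt - log L_null)."""
    return 2.0 * (loglik_alt - loglik_null)


observed = [5, 2, 1, 1]  # toy spectrum for a sample of 5 sequences

# Neutral equilibrium proportions: class i has weight 1/i.
weights = [1.0 / i for i in range(1, 5)]
neutral = [w / sum(weights) for w in weights]

# HYPOTHETICAL alternative with an excess of singletons (not a fitted model).
growth_like = [0.60, 0.20, 0.12, 0.08]

ll_null = multinomial_loglik(observed, neutral)
ll_alt = multinomial_loglik(observed, growth_like)
stat = lrt_statistic(ll_null, ll_alt)
```

In a real analysis the alternative proportions would come from maximizing the likelihood over the demographic parameters. The statistic would then be compared with a chi-square distribution whose degrees of freedom equal the number of extra free parameters.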

  20. Comparing frequency spectra for different classes of mutation • Requirements: • Demographic model • Frequency spectrum predictions from the model under neutrality • Frequency spectrum predictions from the model subject to natural selection [Figure: observed frequency spectra for putatively neutral and potentially selected sites; x-axis: frequency class; y-axis: count]

  21. Theory: time-dependent solution, selected case The forward equation with selection: where γ = 2N_C s Initial condition:
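The equation is not preserved in the transcript. A common form of the forward equation with genic selection is sketched below. It assumes time in units of 2N_C generations and a fitness scheme in which the scaled selection term is γ q(1-q). The authors' exact scaling, their treatment of the ancestral epoch, and their initial condition are not recoverable from the slide text.

```latex
\frac{\partial f}{\partial t}
  = \frac{1}{2}\,\frac{\partial^{2}}{\partial q^{2}}\bigl[\,q(1-q)\,f\,\bigr]
  \;-\; \gamma\,\frac{\partial}{\partial q}\bigl[\,q(1-q)\,f\,\bigr],
\qquad \gamma = 2 N_C s
```

Setting γ = 0 recovers the neutral forward equation, which is why the neutral series solution can be used to check a numerical solver.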

  22. Theory: time-dependent solution, selected case • Numerically solve the forward equation using the Crank-Nicolson finite differencing scheme • Use this approximation of f to evaluate the likelihood function: • Fix τ and ν to their MLEs from the neutral data • Optimize the likelihood for γ. Likelihood ratio test of neutrality:
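A minimal Crank-Nicolson sketch for a forward equation of the form df/dt = (1/2) d²/dq²[q(1-q) f] - γ d/dq[q(1-q) f], with time in units of 2N generations. This is an illustration under stated assumptions, not the authors' solver: it uses a uniform grid and central differences, tracks only interior frequencies, and omits the influx of new mutations and any change in population size. All names are hypothetical.

```python
import math


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system (lower[0] and upper[-1] are ignored)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def crank_nicolson(f0, gamma, dt, steps):
    """Advance the density at interior grid points q = 1/J .. (J-1)/J."""
    J = len(f0) + 1
    h = 1.0 / J
    q = [j * h for j in range(J + 1)]
    V = [x * (1.0 - x) for x in q]           # diffusion coefficient
    M = [gamma * x * (1.0 - x) for x in q]   # drift (selection) coefficient
    # Spatial operator L f_j = a_j f_{j-1} + b_j f_j + c_j f_{j+1}.
    a = [0.5 * V[j - 1] / h**2 + M[j - 1] / (2 * h) for j in range(1, J)]
    b = [-V[j] / h**2 for j in range(1, J)]
    c = [0.5 * V[j + 1] / h**2 - M[j + 1] / (2 * h) for j in range(1, J)]
    # Implicit half: (I - dt/2 L) f_new = (I + dt/2 L) f_old.
    lower = [-0.5 * dt * x for x in a]
    diag = [1.0 - 0.5 * dt * x for x in b]
    upper = [-0.5 * dt * x for x in c]
    f, m = list(f0), len(f0)
    for _ in range(steps):
        rhs = []
        for k in range(m):
            left = f[k - 1] if k > 0 else 0.0
            right = f[k + 1] if k < m - 1 else 0.0
            rhs.append(f[k] + 0.5 * dt * (a[k] * left + b[k] * f[k] + c[k] * right))
        f = thomas_solve(lower, diag, upper, rhs)
    return f


def heterozygosity(f):
    """Integral of 2 q (1 - q) f(q) over the interior grid."""
    J = len(f) + 1
    h = 1.0 / J
    return sum(2.0 * (j * h) * (1.0 - j * h) * f[j - 1] for j in range(1, J)) * h


J = 100
f0 = [0.0] * (J - 1)
f0[19] = float(J)  # unit mass concentrated at initial frequency p = 0.2
f1 = crank_nicolson(f0, gamma=0.0, dt=0.01, steps=100)  # one time unit, neutral
```

One simple sanity check in the neutral case is that heterozygosity decays by a factor of about e^{-1} per unit of scaled time. That is the kind of agreement with neutral theory the next slide describes.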

  23. Theory: time-dependent solution, selected case How can we be sure that the numerical solution actually works? • Von Neumann stability analysis: solution is unconditionally stable • Numerical solution converges to the stationary distribution after ~4N_C generations • Comparison with time-dependent neutral predictions: Kimura, Crank, and Nicolson all agree with each other

  24. Human Polymorphism Data • From Stephens et al. (2001) • 80 individuals, geographically diverse ancestry • 313 genes, 720 kb sequenced • ~3000 SNPs (72% non-coding, 13% synonymous, 15% non-synonymous)

  25. Results for non-coding changes, assuming neutrality

  26. Results for non-synonymous changes, categorized by Grantham’s distance

  27. Ongoing work and future directions • Simulate, simulate, simulate • How robust is the method to different types of demographic forces? • How does linkage among some sites affect the analysis? • How does estimation error affect the LRTs? • Numerical solution for different demographic scenarios (e.g. bottleneck, population structure) • Variable selective effects among new mutations

  28. The McDonald-Kreitman test • Sn: number of non-synonymous segregating sites • Dn: number of non-synonymous fixed differences • Ss: number of synonymous segregating sites • Ds: number of synonymous fixed differences • Outcomes: adaptive evolution, negative selection • Extensions: Sawyer and Hartl (1992), Rand and Kann (1996), Smith and Eyre-Walker (2002), Bustamante et al. (2002), others
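A minimal sketch of the McDonald-Kreitman comparison, using the Neutrality Index in its usual form, NI = (Sn/Ss)/(Dn/Ds), and a two-sided Fisher's exact test on the 2x2 table. The counts are hypothetical, and the function names are assumptions introduced for illustration.

```python
from math import comb


def neutrality_index(sn, ss, dn, ds):
    """NI = (Sn/Ss) / (Dn/Ds); values below 1 point toward adaptive evolution,
    values above 1 toward negative selection."""
    return (sn / ss) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# HYPOTHETICAL counts, for illustration only.
Sn, Ss, Dn, Ds = 10, 40, 20, 20

ni = neutrality_index(Sn, Ss, Dn, Ds)
p_value = fisher_exact_two_sided(Sn, Ss, Dn, Ds)
```

With these counts there is a relative excess of non-synonymous fixed differences, so NI falls below 1. That is the pattern usually read as adaptive evolution, and the one the following slides argue can also arise from weak negative selection combined with population growth.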

  29. Demography and the McDonald-Kreitman test • Robust to different demographic scenarios because it implicitly conditions on the underlying genealogy (see Nielsen 2001) • However, under some demographic scenarios it’s possible to misidentify the type of selection • Weak negative selection with population growth • When the population size is small, non-synonymous deleterious mutations might be fixed by drift • Once the population size becomes large, the level of non-synonymous polymorphism would be reduced (relative to the level of synonymous polymorphism)

  30. Demography and the McDonald-Kreitman test • Over what range of parameter values might you misidentify negative selection as adaptive evolution? • How large is the effect? Eyre-Walker (2002): • Addressed these questions, finding that recent population growth or bottlenecks can cause you to misidentify negative selection • Assumed that levels of polymorphism and fixation rates changed instantaneously with population size

  31. Demography and the McDonald-Kreitman test where t_div is the divergence time, measured in 2N_C generations

  32. Demography and the McDonald-Kreitman test [Figure: expected Neutrality Index (NI), from 0.1 to 10, plotted against ν (= N_A/N_C), from 0.01 to 1, in four panels: τ = 0.1, t_div = 4; τ = 0.1, t_div = 10; τ = 1, t_div = 4; τ = 1, t_div = 10]

  33. Demography and the McDonald-Kreitman test: Preliminary results • It is possible to misidentify negative selection for some parameter combinations • But…the parameter range over which this is true is probably smaller than previously thought, as is the magnitude of the effect

  34. Summary • Model-based approach to correcting for demography while inferring selection • Evidence for very recent population growth in humans • Reasonable estimates of selection parameters for classes of non-synonymous changes • McDonald-Kreitman test: negative selection + population growth problem not as severe as previously thought • Numerical methods for solving the diffusion are fast, accurate, and fun!

  35. Acknowledgements • Collaborator: Carlos Bustamante • Data: Genaissance Pharmaceuticals • Helpful discussions: Bret Payseur, Rasmus Nielsen, Matt Dimmic, Jim Crow, Hiroshi Akashi, Graham Coop
