
Coalescent Models for Genetic Demography

Coalescent Models for Genetic Demography: What can the Coalescent do for you? Rosalind Harding, University of Oxford. Who was MtEve? The most recent common ancestor (MRCA) to whom all currently sampled mtDNA haplotype diversity can be traced.

Presentation Transcript


  1. Coalescent Models for Genetic Demography What can the Coalescent do for you? Rosalind Harding University of Oxford

  2. Who was MtEve? • The most recent common ancestor (MRCA) to whom all currently sampled mtDNA haplotype diversity can be traced.

  3. One possibility: first a bottleneck, then multiple lineages are established during expansion phases. [Figure label: MtEve]

  4. But if there wasn’t a bottleneck? • Then our predecessors, collecting data 20,000 years ago, could have identified a different mtEve, an Eve from an earlier generation; • in 20,000 years’ time, a new generation will be likely to find their mtEve to be a grandⁿ-daughter of our mtEve, that is, a descendant many generations removed. • While our mtEve may be special to us, for archaeogeneticists of past and future generations she will have no particular significance!

  5. Insights from coalescent models [Figure: time axis ending at the present, with three candidate positions each labelled "Eve?"]

  6. What is the coalescent? • a simple model which generates a probability distribution for gene genealogies sampled from a population.

  7. Further definitions • simple models: abstractions from complex demographic reality, which preserve key features • population: all individuals within a generation with the potential to contribute to the gene pool (including individuals who are reproductively successful as well as those who are not.) • gene genealogies: lineages of transmission of copies of a gene from parents to offspring • coalescence: where two transmission lineages find a common ancestor, looking backwards in time • probability distribution: a set of probabilities for many possible alternative gene genealogies compatible with the model

  8. Models and data • Interpreting genetic polymorphism data • consider a sample of genes from a contemporary population, with their allelic frequencies and sequence identities determined – these data do not reveal our genetic past directly, they must be interpreted. • Options for model choice • evolution as phylogeny, phylo-geography • evolution as a balance of mutation and genetic drift in a population with a specified demography (population size, mating pattern, offspring distribution)

  9. Characteristics of polymorphism data • For a small proportion of sites in human DNA, a second allele is present in populations due to a relatively recent mutation; this is polymorphism. • Polymorphism constitutes a transient phase in evolution, intermediate between the occurrence of a mutation and the fixation of either allele at 100%. • MtDNA trees may distort frequencies of polymorphisms. They show sets of mutation events as a proxy for fixed differences; it is the new allele that is assumed to fix (attain 100%). • These potential sources of error for time scale estimates may be minor but could be substantial.

  10. Ingman and Gyllensten, 2003, Genome Research 13:1600-1606. Neighbor-joining phylogram of 101 mtDNA coding-region sequences. Is phylogenetic branching the right model? Note the variable branch lengths and endpoints, yet all individuals were sampled in the present!

  11. A phylogenetic model with added genealogical detail and molecular clock

  12. Trajectories for neutral alleles

  13. Understanding genetic drift as genealogy (Ne=10, constant over time) • All of the offspring copies in generation t+x are inherited from just two of the gene copies in generation t. • This is the process of drift that leads eventually to either loss or fixation (100% frequency in the population) of new mutations.
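
The drift process on this slide can be sketched as a small forward simulation. This is a minimal sketch, assuming a Wright-Fisher population in which "Ne=10" is read as 10 gene copies per generation and each offspring copy picks its parent copy at random; the function name, the seed, and the starting count of one new mutant copy are illustrative choices, not taken from the slides.

```python
import random

def wright_fisher_trajectory(n_copies=10, start_count=1, seed=0,
                             max_generations=10_000):
    """Count of a neutral allele per generation, until loss or fixation."""
    rng = random.Random(seed)
    count = start_count
    trajectory = [count]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:
            break  # allele is lost (0) or fixed (n_copies)
        freq = count / n_copies
        # Each offspring copy inherits the allele with probability equal
        # to its current frequency (random choice of parent copy).
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count)
    return trajectory

traj = wright_fisher_trajectory()
print(traj)
```

Running it with different seeds shows that most new mutations are lost within a few generations, while an occasional one drifts to fixation.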

  14. Some advantages of coalescent models over phylogeny for interpreting polymorphism data • they make better use of molecular clocks and do not treat polymorphisms as fixed differences; • as models of populations they clarify the difference between ‘absence of evidence’ (e.g. for Neanderthal ancestry) and ‘evidence of absence’ (any single locus only represents such a small sample of ancestors from >50,000 years ago that with present data we don’t have the statistical power to rule out Neanderthal ancestry); • they incorporate some measure of our uncertainty about the evolution of allele frequencies (a mixed process of mutation and transmission in genealogies).

  15. Assumptions of Kingman’s (1982) coalescent for interpreting polymorphism data (random sample) • Neutrality • All new mutations unique and informative • If individuals are diploid in a population of size N, the model applies to 2N independent, haploid copies of a gene • Random mating within a population • Constant population size, Ne • A very specific probability distribution for transmissions of gene copies to 0, 1, 2 … offspring • Non-overlapping generations

  16. Aims of coalescent modelling: to make inferences from genetic data • to simulate different demographies to see what to expect in polymorphism data; • to estimate parameters under an explicit demographic model, eg Kingman’s coalescent; • to estimate in which generation (and sub-population) particular lineages coalesced or mutations occurred, given explicit demographic assumptions; • to evaluate the uncertainty in our estimates; • to introduce new parameters to improve the model, judging by its fit to data, to learn about demography.

  17. The ancestry of a sample composed of two copies of the gene in generation t0 • Following the ancestry of a sample of two copies of a gene (gene A) backwards (red) from time t0, i.e. the present, we find their most recent common ancestor (MRCA) at generation t8.
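
Tracing two copies backwards can be sketched directly. This is a minimal sketch, assuming a Wright-Fisher population of constant size in which each lineage picks its parent copy uniformly at random each generation; the function name and the population of 20 gene copies are illustrative, not taken from the slides.

```python
import random

def generations_to_mrca(n_copies, rng):
    """Trace two gene copies backwards until they share a parent copy."""
    generations = 0
    while True:
        generations += 1
        # Each lineage independently picks its parent copy at random;
        # they coalesce when both pick the same one.
        if rng.randrange(n_copies) == rng.randrange(n_copies):
            return generations

rng = random.Random(1)
n_copies = 20  # hypothetical population of 20 gene copies
times = [generations_to_mrca(n_copies, rng) for _ in range(20_000)]
mean_time = sum(times) / len(times)
print(mean_time)
```

Individual runs vary widely, but the average waiting time is close to the number of gene copies in the population, which is the 2N-generation expectation for a pair of lineages.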

  18. Expected coalescence times • Expected time to coalescence for n lineages • As the sample size increases towards 2N, E(TMRCA) approaches 4N generations, which equals the expected fixation time for a newly arisen neutral mutation (given that it fixes).
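
The standard expectations under Kingman's coalescent can be sketched as follows: while k lineages remain, the expected waiting time is 4N/(k(k-1)) generations, and these intervals sum to E(TMRCA) = 4N(1 - 1/n). These are approximations for samples that are small relative to the population. The function names and the example value N = 10,000 are illustrative assumptions, not taken from the slides.

```python
def expected_interval(k, two_n):
    """Expected generations during which exactly k lineages remain,
    in a population of two_n gene copies (two_n = 2N for diploids)."""
    return 2 * two_n / (k * (k - 1))

def expected_tmrca(n, two_n):
    """Expected generations back to the MRCA of a sample of n copies."""
    return sum(expected_interval(k, two_n) for k in range(2, n + 1))

N = 10_000          # hypothetical diploid population size
two_n = 2 * N       # number of gene copies
print(expected_interval(2, two_n))   # 2N generations for the last pair
print(expected_interval(5, two_n))   # N/5 generations with five lineages
print(expected_tmrca(5, two_n))      # 4N(1 - 1/5) generations
print(expected_tmrca(1000, two_n))   # close to 4N for a large sample
```

Most of the expected depth of the tree comes from the final interval, when only two lineages remain.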

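  19. Coalescence times under three demographies [Figure panels: constant N; N expanding; N reducing; time axis with sizes N0 and N1] • Constant N, sample of five: E(T2) = 2Ne, E(T5) = Ne/5, E(TMRCA) = 4Ne(1 − 1/5) • Thanks to Lounes for this slide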

  20. Simulated genealogies with constant Ne [Figure: four simulated genealogies, panels 1 to 4] • TMRCA values, in units of 2Ne generations: 4.57, 2.93*, 1.48, 0.01 • * e.g. 2.93 × 2 × 10,000 × 20 = 1.2 million years
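
The spread of TMRCA values across genealogies can be reproduced with a short simulation. This is a minimal sketch, assuming Kingman's coalescent with time measured in units of 2Ne generations, so that the waiting time while k lineages remain is exponential with rate k(k-1)/2. The sample size of 10 is an arbitrary choice; Ne = 10,000 and 20 years per generation are the values implied by the arithmetic on the slide.

```python
import random

def simulate_tmrca(n, rng):
    """TMRCA of n lineages, in units of 2Ne generations."""
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2          # number of pairs that can coalesce
        total += rng.expovariate(rate)  # waiting time while k lineages remain
    return total

def to_years(t, ne=10_000, years_per_generation=20):
    """Convert coalescent time units (2Ne generations) to years."""
    return t * 2 * ne * years_per_generation

rng = random.Random(42)
draws = [simulate_tmrca(10, rng) for _ in range(20_000)]
mean_tmrca = sum(draws) / len(draws)
print(mean_tmrca)       # expectation is 2 * (1 - 1/10) = 1.8
print(to_years(2.93))   # the slide's worked example, about 1.2 million years
```

The draws vary severalfold between runs, which is the point of the slide: under constant Ne a single locus gives a very uncertain TMRCA.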

  21. Simulating recent expansion: not much variability in TMRCA between genealogies [Figure: four simulated genealogies, panels 1 to 4] • TMRCA, in units of 2Ne generations: 1. 0.0026 2. 0.0029 3. 0.0028 4. 0.0027 • ~1000 years of human evolution

  22. 1. A time scale is given by the coalescent model for the demography (drift history) 2. Add mutations

  23. Infinite-sites mutation in a gene tree

  24. The relationship between average pairwise sequence difference, π, and the parameter θ in Kingman’s coalescent [Figure label: 2N generations]
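
The average pairwise sequence difference (often written π) can be computed directly from aligned sequences. Under Kingman's coalescent with infinite-sites mutation, two lineages are separated on average by 2N generations back to their common ancestor, so the expected number of differences between them equals the scaled mutation parameter (often written θ), and the observed average is a simple estimate of it. This is a minimal sketch; the function name and the four toy sequences are made up for illustration.

```python
from itertools import combinations

def average_pairwise_difference(sequences):
    """Mean number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

# Hypothetical aligned sample of four sequences, four sites each.
sample = ["AAAA", "AAAT", "AATT", "AAAA"]
pi_hat = average_pairwise_difference(sample)
print(pi_hat)  # 7 differences over 6 pairs
```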

  25. Data: Aboriginal Australian mtDNAs. Model: Kingman’s coalescent [Figure: gene tree, MtDNA coding DNA sites 9000 to 16000] • One colonization event, or several founding lineages at different times? • Note the non-uniform spacing of mutations

  26. Another advantage of coalescent models over phylogeny • While the population bottlenecks implicitly assumed in phylogenetic and phylogeographic analyses can be explicitly assumed in a coalescent framework, alternative demographies may be assumed, or may be inferred. • (the relationship between coalescent nodes and colonization events is very ambiguous.)

  27. Kingman’s coalescent as H0 • Kingman’s coalescent model is a starting point, available to us even before we collect any data. • Having collected data, we can test whether the data show goodness-of-fit to the expectations of our starting model. • If not, we should change or add parameters to improve the model. At present there are some options available (not many, but some!)

  28. Variations from Kingman’s coalescent • Selection • Recurrent and back mutation • Recombination • *Non-random mating: e.g. geographic subdivision with specified migration between subpopulations • *Population size fluctuation, including bottlenecks and expansions • Non-‘Poisson’ distributions of offspring numbers • Unequal generation intervals between lineages (* similar model but additional parameters)

  29. The coalescent with structure [Figure panels: much migration; little migration] • Each generation, m alleles are exchanged between sub-populations. • In discrete time, an allele migrates with probability m/2N per generation. • In continuous time, the waiting time for migration is exponentially distributed, Exp(m).
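
The effect of migration on coalescence times can be sketched with a two-deme simulation. This is a minimal sketch, assuming two sub-populations of equal size, symmetric migration, and discrete generations; `migration_prob` plays the role of the per-allele probability m/2N on the slide, and the specific values (20 gene copies per deme, probabilities 0.5 and 0.01) are illustrative, not taken from the slides.

```python
import random

def coalescence_time_two_demes(deme_copies, migration_prob, rng):
    """Generations until two lineages sampled from different demes coalesce."""
    demes = [0, 1]  # the two lineages start in different sub-populations
    generations = 0
    while True:
        generations += 1
        # Backwards in time, each lineage switches deme with
        # probability migration_prob.
        demes = [1 - d if rng.random() < migration_prob else d for d in demes]
        # Lineages can only coalesce while they are in the same deme.
        if demes[0] == demes[1] and rng.random() < 1 / deme_copies:
            return generations

rng = random.Random(7)
high = [coalescence_time_two_demes(20, 0.5, rng) for _ in range(2_000)]
low = [coalescence_time_two_demes(20, 0.01, rng) for _ in range(2_000)]
mean_high = sum(high) / len(high)
mean_low = sum(low) / len(low)
print(mean_high, mean_low)
```

With little migration the two lineages wait much longer before they can share a deme, so coalescence between sub-populations is pushed further back in time.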

  30. Summary and points for discussion • Data drawn as gene trees show the relative ordering of coalescence events. • The length of time between coalescence events is a function of the number of mutation events inferred from the data AND the assumed demographic history. (Molecular clocks should NOT be applied directly.) • Present phylo-geographic methods fudge the data to circumvent thinking about demography. Consequently we do not learn anything about demography from them. Furthermore, these methods may be generating some highly inaccurate time estimates and they don’t provide satisfactory estimates of the uncertainty surrounding these estimates. • Coalescent modelling to date draws attention to many concerns, but to improve ‘phylo-geographic’ inference we need implementations of the structured coalescent appropriate for a colonization/extinction demography.

  31. MtDNA Coding DNA Sites: 500 to 9000

  32. Implications of drift as genealogy All the identical copies of a gene, eg all the copies of the MC1R-151 red hair allele, carried by thousands of people across Europe, have been inherited from a single common ancestor living some time in the past. Although mutation may have generated MC1R-151 alleles many times, all these mutations were quickly lost, except for one. On one occasion only, the new mutation increased in frequency, becoming a common polymorphism. Could this be true? (We think so!)
