Separating Population Structure from Recent Evolutionary History
Problem: The Spatial Patterns Inferred Earlier Represent An Equilibrium Between Recurrent Evolutionary Forces Such as Gene Flow and Drift, E.g., $f_{st} = \frac{1}{1 + 4N_e v}$. But The Same Pattern Can Also Arise From Recent Historical Events That Have Not Had Time to Reach Equilibrium.
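To make the problem concrete, here is a minimal sketch comparing the equilibrium value with the $f_{st}$ that two completely isolated demes reach through drift alone. It assumes the island-model reading of the equation above and the standard Wright-Fisher expectation $1 - (1 - 1/(2N_e))^t$ for isolated demes of constant size. The function names and parameter values are illustrative only.

```python
import math


def equilibrium_fst(n_e: float, v: float) -> float:
    """Equilibrium f_st between drift and gene flow at rate v (island model)."""
    return 1.0 / (1.0 + 4.0 * n_e * v)


def drift_only_fst(n_e: float, t: float) -> float:
    """Expected f_st after t generations of complete isolation (no gene flow),
    for demes of constant effective size n_e under Wright-Fisher drift."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** t


def generations_to_match(target_fst: float, n_e: float) -> float:
    """Generations of isolation needed for drift alone to reach target_fst."""
    return math.log(1.0 - target_fst) / math.log(1.0 - 1.0 / (2.0 * n_e))


if __name__ == "__main__":
    n_e, v = 1000, 0.001
    eq = equilibrium_fst(n_e, v)
    t = generations_to_match(eq, n_e)
    print(f"equilibrium f_st = {eq:.3f}")
    print(f"isolation alone reaches it after ~{t:.0f} generations")
```

With these hypothetical values, about 446 generations of complete isolation produce the same $f_{st} = 0.2$ as a gene-flow/drift equilibrium, so a single snapshot in space cannot tell the two situations apart.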
To Examine Historical Events & Non-Equilibrium States, We Need to Study Genetic Variation in Both Space & Time:
• Directly Sample Populations From the Past
• Reconstruct Variation Through Time Indirectly
Direct Study: mtDNA in the Woolly Mammoth. Debruyne et al. 2008. Out of America: Ancient DNA Evidence for a New World Origin of Late Quaternary Woolly Mammoths. Curr. Biol. 18:1320-1326.
Indirect Studies
• Recall that $D_t = D_0(1-r)^t$
• Therefore, Multi-Locus or Multi-Site Polymorphic Data Contain Historical Information, and This Information Is Retained For Long Periods of Time When $r$ Is Small.
• Attempts to Reconstruct History Depend Upon Multiple Loci or Upon Multi-Site Haplotypes.
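The decay formula can be turned into a "half-life" for linkage disequilibrium, which shows directly why small $r$ preserves history. This is a minimal sketch. The helper names are hypothetical, and it assumes the simple deterministic decay $D_t = D_0(1-r)^t$ with no drift or new mutation.

```python
import math


def ld_after(d0: float, r: float, t: int) -> float:
    """Linkage disequilibrium remaining after t generations: D_t = D_0 (1 - r)^t."""
    return d0 * (1.0 - r) ** t


def ld_half_life(r: float) -> float:
    """Generations for D to fall to half its initial value (0 < r <= 0.5)."""
    return math.log(0.5) / math.log(1.0 - r)


if __name__ == "__main__":
    for r in (0.5, 0.01, 0.001):
        print(f"r = {r}: half-life = {ld_half_life(r):.1f} generations")
```

Unlinked loci ($r = 0.5$) lose half their disequilibrium every generation, whereas sites with $r = 0.001$ keep half of it for about 693 generations.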
Multiple Loci: Principal Component Analysis of Genetic Data. This procedure has long been used in human genetics to extract multi-locus information about gene flow patterns (e.g., Cavalli-Sforza & Ammerman, 1984).
Multiple Loci: Principal Component Analysis of Genetic Data. Novembre et al., Nature, 31 Aug 2008, based on 197,146 loci in 1,387 individuals.
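The core computation of genotype PCA can be sketched in pure Python on simulated data. The sketch assumes a common convention: each biallelic locus is standardized by its sample allele frequency, and the leading eigenvector of the individual-by-individual relationship matrix is found by power iteration. This convention is not necessarily the exact pipeline of Novembre et al. The two simulated populations and all parameter values are hypothetical.

```python
import math
import random


def standardize(genotypes):
    """Center and scale 0/1/2 allele counts per locus; drop monomorphic loci."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [g[j] for g in genotypes]
        p = sum(col) / (2.0 * n)              # sample allele frequency
        sd = math.sqrt(2.0 * p * (1.0 - p))   # binomial SD of a genotype count
        if sd > 0:
            columns.append([(x - 2.0 * p) / sd for x in col])
    return [[c[i] for c in columns] for i in range(n)]


def first_pc(genotypes, iterations=200, seed=1):
    """Unit-length PC1 scores, via power iteration on X X^T / m."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    k = [[sum(a * b for a, b in zip(x[i], x[j])) / m for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    for _ in range(iterations):
        new = [sum(k[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    return vec


def simulate(n_per_pop=20, n_loci=300, seed=7):
    """Two hypothetical populations with independent allele frequencies per locus."""
    rng = random.Random(seed)
    freqs = [(rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)) for _ in range(n_loci)]
    genotypes, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            genotypes.append([sum(rng.random() < f[pop] for _ in range(2)) for f in freqs])
            labels.append(pop)
    return genotypes, labels


if __name__ == "__main__":
    g, labels = simulate()
    pc1 = first_pc(g)
    for pop in (0, 1):
        scores = [s for s, lab in zip(pc1, labels) if lab == pop]
        print(pop, round(sum(scores) / len(scores), 3))
```

The sign of a principal component is arbitrary. What matters is that the two populations fall on opposite sides of zero along PC1.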
Overlay of the steepest slope values (upper 5%). Microsatellite survey of naked mole rats in Meru National Park, Kenya (Jon Hess).
Haplotypes
• One Method Is To Look At the Spatial Distribution of Globally Rare, Tip Haplotypes (Although They May Be Locally Common)
• Coalescent Theory Implies That Such Haplotypes Are Recent, And Therefore Are Not In Equilibrium And Have Limited Spatial Distributions
• Therefore, Globally Rare, Tip Haplotypes Provide A Straightforward Method of Observing The Movements of Genes Through Space Over Short and Recent Time Periods.
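Screening a data set for globally rare tip haplotypes is a simple bookkeeping step. This sketch assumes the haplotype tree is given as a hypothetical child-to-parent mapping and the samples as per-population haplotype counts. The 5% cut-off for "globally rare" is an arbitrary illustrative choice.

```python
from collections import Counter


def tip_haplotypes(parent):
    """Haplotypes that are not ancestral to any other haplotype in the tree."""
    ancestors = set(parent.values())
    return {h for h in parent if h not in ancestors}


def rare_tips(parent, counts, max_global_freq=0.05):
    """Map each globally rare tip haplotype to the populations where it occurs.

    parent: dict haplotype -> ancestral haplotype (the root is not a key)
    counts: dict population -> Counter of haplotype copies sampled there
    """
    total = Counter()
    for pop_counts in counts.values():
        total.update(pop_counts)
    n = sum(total.values())
    result = {}
    for h in tip_haplotypes(parent):
        if 0 < total[h] / n <= max_global_freq:
            result[h] = sorted(p for p, c in counts.items() if c[h] > 0)
    return result


if __name__ == "__main__":
    parent = {"B": "A", "C": "A", "D": "B"}  # hypothetical tree rooted at A
    counts = {
        "pop1": Counter({"A": 60, "B": 20}),
        "pop2": Counter({"A": 40, "B": 10, "D": 8}),
        "pop3": Counter({"A": 30, "C": 2}),
    }
    print(rare_tips(parent, counts))
```

In this toy example haplotype D is locally common in pop2 (8 of 58 copies) but globally rare (8 of 170), so it flags a recent, geographically restricted lineage.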
“Private” 9-repeat allele: geographic distribution of the Asian and American populations genotyped for the microsatellite D9S1120 (Schroeder, K. B. et al. Mol Biol Evol 2009 26:995-1016).
Visual genotypes, clustered by population, for individuals either homozygous or heterozygous for the 9-repeat allele. This implies that the “private allele” is identical by descent in all Western Beringians and Native Americans, which in turn implies that Native Americans descended (at least in part) from these Western Beringian populations (Schroeder, K. B. et al. Mol Biol Evol 2009 26:995-1016).
Method for estimating the TMRCA of copies of an allele from the number of recombination events on its shared haplotypic background
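The principle of dating an allele from recombination on its haplotypic background can be illustrated with a much simpler textbook estimator than the coalescent simulations used by Schroeder et al. This is a swapped-in technique, not their method. It assumes a star-shaped genealogy and a single flanking marker at recombination fraction $r$, and it ignores marker mutation. The excess association of the ancestral marker allele with the focal allele then decays as $(1 - r)^t$, by the same logic as $D_t = D_0(1-r)^t$, so $t$ can be solved for. The function and parameter names are hypothetical.

```python
import math


def allele_age(p_carrier: float, p_background: float, r: float) -> float:
    """Approximate age (generations) of an allele from decay of its haplotype background.

    p_carrier: frequency of the ancestral marker allele on chromosomes carrying the focal allele
    p_background: frequency of that marker allele on all other chromosomes
    r: recombination fraction between the focal site and the marker
    """
    if not 0.0 < r < 0.5:
        raise ValueError("r must lie in (0, 0.5)")
    # Residual association after correcting for background marker frequency.
    delta = (p_carrier - p_background) / (1.0 - p_background)
    if not 0.0 < delta <= 1.0:
        raise ValueError("residual association must lie in (0, 1]")
    return math.log(delta) / math.log(1.0 - r)


if __name__ == "__main__":
    # Hypothetical numbers: 90% of carriers vs 20% of non-carriers share the marker allele.
    print(round(allele_age(0.9, 0.2, 0.001)), "generations")
```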
Schematics of the demographic models used for the coalescent simulations: (A) population split with two equal-size descendant populations (Asia and America); (B) population split with $N_{As}/N_{Am}$ equal to 0.15 at $T_{As/Am}$; and (C) population split with $N_{As}/N_{Am}$ equal to 0.02 at $T_{As/Am}$, followed by population growth such that $N_{As}/N_{Am}$ equals 0.15 at $T_0$. Models D and E are the same as models B and C, respectively, but include population substructure in Asia and in America. Under the different best models, the mean TMRCA of the 9-repeat allele ranged from 293 to 1,596 generations; using a generation time of 25 years, this corresponds to 7,325-39,900 years ago. Averaging over all of our best models, the mean TMRCA is 513 generations ago, or about 12,825 years ago. The 95% confidence intervals for all of the best models gave ages for the MRCA of the 9-repeat allele ranging from 144 to 1,951 generations ago, or approximately 3,600-48,775 years ago.
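The conversions in the caption are multiplications by the assumed 25-year generation time. This short check recomputes each reported figure. The constant and helper names are illustrative.

```python
YEARS_PER_GENERATION = 25  # generation time stated in the caption

# (generations, years) pairs as reported in the caption
CAPTION_VALUES = [(293, 7_325), (1_596, 39_900), (513, 12_825), (144, 3_600), (1_951, 48_775)]


def to_years(generations: int, years_per_generation: int = YEARS_PER_GENERATION) -> int:
    """Convert a time in generations to years under a fixed generation time."""
    return generations * years_per_generation


if __name__ == "__main__":
    for gens, reported in CAPTION_VALUES:
        print(f"{gens} generations -> {to_years(gens):,} years (matches: {to_years(gens) == reported})")
```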
Haplotype Trees
• Are Biologically Meaningful Only When Recombination Is Absent Or Rare
• Give Some Information About the Temporal Ordering of Mutational Variation, Both the Rare and the Common Mutations
• Are Not Limited to Recent Events, But Can Go Back Further In Time (Though Not Beyond the Most Recent Common Ancestral DNA Molecule)
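Whether recombination is "absent or rare" in a region can be screened before building a tree. One standard screen is the four-gamete test of Hudson and Kaplan (1985). Under the infinite-sites model without recombination, no pair of biallelic sites can show all four gametes (00, 01, 10, 11). This minimal sketch assumes haplotypes coded as strings of 0/1.

```python
from itertools import combinations


def four_gamete_violations(haplotypes):
    """Pairs of sites (i, j) at which all four gametes 00, 01, 10, 11 occur.

    Each returned pair implies at least one recombination (or recurrent
    mutation) event between sites i and j under the infinite-sites model.
    """
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


if __name__ == "__main__":
    tree_like = ["000", "100", "110", "001"]
    print(four_gamete_violations(tree_like))            # compatible with a single tree
    print(four_gamete_violations(tree_like + ["111"]))  # recombinant haplotype added
```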
A Haplotype Tree Should Never Be Equated To A Tree of Populations. It Is Only The Tree of The Genetic Variation For That DNA Region. There Is Information About Population History in the Haplotype Tree, But It Must Be Extracted Carefully.
It is dangerous to equate a haplotype tree to a species tree. It is NEVER justified to equate a haplotype tree to a tree of populations within a species, because the problem of lineage sorting is greater and the time between events is shorter. Moreover, a population tree need not exist at all.
Nested Clade Analysis
• Converts Haplotype Trees Into A Nested Statistical Design
• Other Data (Phenotypic or Geographical) Are Then Overlaid Upon The Nested Design
• Statistical Tests Are Performed To Detect Significant Associations Between the Data and The Haplotype Tree
• DOES NOT EQUATE THE HAPLOTYPE TREE TO A POPULATION TREE!
NCPA Distance Measures (figure legend: symbols mark sample locations)
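The two basic NCPA distances can be sketched as follows. The clade distance $D_c$ is the average distance of individuals bearing a clade from that clade's geographic center. The nested clade distance $D_n$ is their average distance from the geographic center of the nesting clade. As a simplification for illustration, this sketch uses planar coordinates and Euclidean distance rather than great-circle distances on real latitude/longitude data. The clade labels and coordinates are hypothetical.

```python
import math


def center(points):
    """Geographic center (mean x, mean y) of a list of sample coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def mean_distance(points, origin):
    """Average Euclidean distance of points from an origin."""
    return sum(math.dist(p, origin) for p in points) / len(points)


def clade_distances(samples):
    """samples: list of (clade_label, (x, y)), one entry per individual in one nesting clade.
    Returns {clade: (Dc, Dn)}."""
    nesting_center = center([xy for _, xy in samples])
    out = {}
    for clade in sorted({c for c, _ in samples}):
        pts = [xy for c, xy in samples if c == clade]
        out[clade] = (mean_distance(pts, center(pts)), mean_distance(pts, nesting_center))
    return out


if __name__ == "__main__":
    samples = [("1-1", (0, 0)), ("1-1", (10, 0)), ("1-1", (0, 10)), ("1-1", (10, 10)),
               ("1-2", (1, 1)), ("1-2", (1, 1))]
    for clade, (dc, dn) in clade_distances(samples).items():
        print(clade, round(dc, 3), round(dn, 3))
```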
A Haplotype Tree In Elephants (figure; sampling sites: Amboseli, Tsavo, Hwange, Sengwa, Matetsi, Victoria Falls)
Only When Statistical Significance Is Achieved Is The Biological Significance Interpreted, Using Explicit, a priori Criteria
• For Example, Under Isolation By Distance, It Takes Many Generations For A New Haplotype To Spread Across Many Demes.
• Therefore, Older Haplotypes Are Expected To Be More Widespread Than Younger Haplotypes
• Younger Haplotypes Tend To Have Geographical Ranges Nested Within the Ranges of Their Ancestral Haplotypes
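The isolation-by-distance expectation (older, interior clades more widespread than younger, tip clades) can be tested with an interior-minus-tip (I-T) contrast in clade distance, assessed by permuting clade labels over sampled individuals. This is a simplified sketch, not the full NCPA procedure. It uses an unweighted mean over clades, planar coordinates and a one-sided test, and all of these are assumptions. The sample layout is hypothetical.

```python
import math
import random


def clade_distance(points):
    """Average distance of points from their own geographic center (Dc)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.dist(p, (cx, cy)) for p in points) / len(points)


def interior_minus_tip(samples, interior):
    """Mean Dc of interior clades minus mean Dc of tip clades.
    samples: list of (clade, (x, y)); interior: set of interior clade labels."""
    by_clade = {}
    for clade, xy in samples:
        by_clade.setdefault(clade, []).append(xy)
    dc = {c: clade_distance(pts) for c, pts in by_clade.items()}
    inner = [d for c, d in dc.items() if c in interior]
    tips = [d for c, d in dc.items() if c not in interior]
    return sum(inner) / len(inner) - sum(tips) / len(tips)


def permutation_p(samples, interior, n_perm=999, seed=0):
    """Observed I-T and a one-sided permutation p-value for I-T > 0."""
    rng = random.Random(seed)
    observed = interior_minus_tip(samples, interior)
    labels = [c for c, _ in samples]
    coords = [xy for _, xy in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any clade-geography association
        if interior_minus_tip(list(zip(labels, coords)), interior) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    interior_pts = [("I", (x, y)) for x in range(0, 101, 20) for y in range(0, 101, 20)]
    tip_pts = [("T", xy) for xy in [(0, 0), (5, 0), (0, 5), (5, 5)]]
    print(permutation_p(interior_pts + tip_pts, {"I"}))
```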
A Haplotype Tree In Elephants (figure; sampling sites: Amboseli, Tsavo, Hwange, Sengwa, Matetsi, Victoria Falls; four clades labeled “Gene flow with IBD”)
Historical Events Also Leave Lasting Patterns in Haplotype Trees. For Example, When A Population Expands Into a New Area, Even Haplotypes Recently Created by Mutation Can Become Geographically Widespread, and Haplotypes Created By Mutation After the Expansion Can Be Located Far From the Geographical Center of Their Ancestral Haplotype.
(Figure: range expansion, past vs. present.)
Nested Clade Analysis of the Chub (Leuciscus cephalus): Range Expansion (from Durand et al. 1999). (Figure labels: older clade 2-1; younger clade; SPE.)
Historical Events Also Leave Lasting Patterns in Haplotype Trees. For Example, When A Population Is Fragmented or Otherwise Effectively Isolated, Haplotypes That Arise After The Fragmentation/Isolation Event Cannot Spread to Other Geographical Areas, and With Increasing Time, More Mutations Can Accumulate, Resulting In Larger Than Average Branch Lengths Between Clades in Different Isolates.
(Figure: fragmentation, old vs. recent.)
Fragmentation between Ambystoma tigrinum tigrinum (Clade 4-2) and A. t. mavortium (Clade 4-1)
African Elephants (Roca, A. L., N. Georgiadis, and S. J. O'Brien. 2005. Cytonuclear genomic dissociation in African elephant species. Nature Genetics 37:96-100). Figure labels: Forest Elephant; Savanna Elephant. The Nested Design Means That Inferences Are Robust To Topological Variation Induced by the Evolutionary Stochasticity of the Coalescent Process.
Fragmentation Inferences From NCA. Each of the 5 DNA regions (mtDNA, Y-DNA, PLP, BGN, PHKA2) had a different topology with respect to the 3 elephant taxa (only BGN gave the “species tree”); yet NCPA inferred a fragmentation event between forest and savanna elephants in all 5 DNA regions: Highly Significant Fragmentation Events Were Found In All Five Haplotype Trees. (Figure labels: “Past Fragmentation”; “Past Fragmentation Followed By Range Expansion and Secondary Contact”.)
Nested Clade Phylogeographic Analysis
• Recurrent Gene Flow, Range Expansion and Fragmentation Could All Have Occurred at Different Times and/or Places.
• NCPA Therefore Looks For Multiple Patterns, Not Just One
• The Relative Temporal Ordering of Events in a Nested Series of Clades Is Also Inferred by NCPA
Inferences from the mtDNA haplotype tree of Ambystoma tigrinum from NCPA and a supplemental test for secondary contact (Mol. Ecol. 10:779-791, 2001). (Figure labels: Range Expansion; Secondary Contact; Range Expansion; Isolation by Distance; Isolation by Distance; Fragmentation.)
By Analyzing Haplotype Trees for mtDNA, Y-DNA, X-linked DNA and Autosomal DNA, One Can Sample A Wide Variety of Time Scales, As Well As Both Male- and Female-Mediated Gene Flow and Historical Events
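Different genomic compartments sample different time depths because they have different numbers of gene copies per population. This sketch assumes a randomly mating Wright-Fisher population of $N_e$ individuals with an equal sex ratio, giving $2N_e$ autosomal copies, $1.5N_e$ X copies, and $0.5N_e$ copies each of mtDNA and Y. Two copies then coalesce after about as many generations as there are copies, and a sample of $n$ copies coalesces after $2(1 - 1/n)$ times that on average. Neutrality is assumed throughout.

```python
# Gene copies per individual under an equal sex ratio (assumption)
COPIES_PER_INDIVIDUAL = {"autosomal": 2.0, "X-linked": 1.5, "mtDNA": 0.5, "Y-DNA": 0.5}


def pairwise_coalescence_time(n_e: float, compartment: str) -> float:
    """Expected generations back to the common ancestor of two gene copies."""
    return COPIES_PER_INDIVIDUAL[compartment] * n_e


def sample_tmrca(n_e: float, compartment: str, n: int) -> float:
    """Expected TMRCA (generations) of a sample of n copies under the coalescent."""
    return 2.0 * pairwise_coalescence_time(n_e, compartment) * (1.0 - 1.0 / n)


if __name__ == "__main__":
    for comp in COPIES_PER_INDIVIDUAL:
        print(comp, pairwise_coalescence_time(10_000, comp), sample_tmrca(10_000, comp, 50))
```

Under these assumptions autosomal, X-linked, and uniparental (mtDNA, Y) regions sample time depths in the ratio 4 : 3 : 1.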
By Analyzing Multiple Haplotype Trees, One Can Statistically Correct For The Evolutionary Stochasticity of The Coalescent Process At Any One Genomic Region
Inference Errors in Nested Clade Analysis
• Inference Requires That An Appropriate Mutation Occurred At the Right Time and Right Place: Therefore, Some Events and Processes Are Missed With A Particular DNA Region.
• Selection and Evolutionary Stochasticity Can Distort The Distribution of Haplotypes in Space and Time, Thereby Leading to False Positive Inferences.
These errors can be minimized by studying multiple loci and requiring each inference (type, place and time) to be cross-validated by two or more loci.
Multilocus Nested Clade Analysis
• Perform Single-Locus NCPA on n Loci
• Discard any inferences made only by a single locus
• Group together all the inferences made by 2 or more loci that are concordant by type of inference and geographical location.
• Test the null hypothesis that all inferences of an event that are concordant by event type and location are a single event.
• Because gene flow is a recurrent process, inferences of gene flow between two regions are not necessarily concordant in time, but one can test the null hypothesis that there was no gene flow between two regions in an interval of time, say $t_1$ to $t_2$, given multiple inferences of gene flow between the two regions.
• ALL RETAINED INFERENCES HAVE BEEN CROSS-VALIDATED ACROSS LOCI AND HAVE EXPLICIT, QUANTIFIED STATISTICAL SUPPORT.
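The discard-and-group steps are plain bookkeeping. This minimal sketch covers only those two steps, not the temporal tests. It assumes each single-locus inference is recorded as a hypothetical `(locus, inference_type, location)` tuple.

```python
from collections import defaultdict


def cross_validated(inferences, min_loci=2):
    """Group single-locus inferences by (type, location) and keep groups seen at >= min_loci loci.

    inferences: iterable of (locus, inference_type, location)
    Returns {(inference_type, location): sorted list of supporting loci}.
    """
    support = defaultdict(set)
    for locus, kind, place in inferences:
        support[(kind, place)].add(locus)  # a locus counts once per (type, location)
    return {key: sorted(loci) for key, loci in support.items() if len(loci) >= min_loci}


if __name__ == "__main__":
    inferences = [  # hypothetical single-locus results
        ("locus1", "range expansion", "region A to region B"),
        ("locus2", "range expansion", "region A to region B"),
        ("locus3", "fragmentation", "region A / region C"),
        ("locus1", "range expansion", "region A to region B"),
    ]
    print(cross_validated(inferences))
```

The fragmentation inference supported by a single locus is discarded, while the range expansion found at two loci is retained for the temporal test.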
Using Theory Developed by Tajima (1983) and Kimura (1970), The Distribution Of The Inference Time Is: [equation], where $k_i$ is the average pairwise nucleotide diversity among the haplotypes in DNA region $i$ in the youngest monophyletic clade that contributed in a statistically significant fashion to the NCPA inference of interest, and $T_i$ is the age obtained by the Takahata et al. molecular clock estimator (or perhaps some other method) for this inference from DNA region $i$.
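The distribution itself is a gamma distribution (see the gamma curves for the gene flow inferences). This sketch rests on our reading that it has shape $k_i + 1$ and mean $T_i$, i.e. $f(t) = \lambda^{k_i+1} t^{k_i} e^{-\lambda t} / \Gamma(k_i + 1)$ with $\lambda = (k_i+1)/T_i$. That reading matches the waiting time to the $(k_i+1)$-th event of a Poisson mutation process, but treat the parameterization as an assumption. The function name is hypothetical.

```python
import math


def inference_time_pdf(t: float, k: float, T: float) -> float:
    """Gamma density with shape k + 1 and mean T, evaluated on t > 0 (0 otherwise).

    k: average pairwise differences in the clade; T: molecular-clock age estimate.
    """
    if t <= 0:
        return 0.0
    shape = k + 1.0
    rate = shape / T  # chosen so that the mean, shape / rate, equals T
    log_pdf = shape * math.log(rate) + k * math.log(t) - rate * t - math.lgamma(shape)
    return math.exp(log_pdf)


if __name__ == "__main__":
    # Hypothetical locus: k = 2 pairwise differences, clock age T = 1.0 (million years)
    for t in (0.25, 0.5, 1.0, 2.0):
        print(t, round(inference_time_pdf(t, 2, 1.0), 4))
```

Larger $k_i$ gives a tighter distribution around $T_i$, so loci with more mutational information contribute sharper time estimates.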
Estimated Times To Common Ancestor (Method of Takahata et al. 2001): $D_{hc}$ = nucleotide difference between humans and chimps; $D_h$ = nucleotide difference within humans; $\mathrm{TMRCA} = 12\,D_h/D_{hc}$, calibrated to a human-chimp split 6 million years ago.
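The estimator above is a single ratio. This sketch keeps the factor of 12 given on the slide, with the result read in millions of years under the 6-million-year human-chimp calibration. The divergence values in the example are hypothetical.

```python
def tmrca_million_years(d_h: float, d_hc: float, scale: float = 12.0) -> float:
    """TMRCA (million years) from within-human (d_h) and human-chimp (d_hc) divergence.

    scale = 12 as on the slide, tied to a human-chimp split ~6 million years ago.
    """
    if d_hc <= 0:
        raise ValueError("human-chimp divergence must be positive")
    return scale * d_h / d_hc


if __name__ == "__main__":
    print(tmrca_million_years(0.001, 0.012))  # hypothetical divergences
```

The resulting value plays the role of $T_i$ for one DNA region in the inference-time distribution.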
A Likelihood Ratio Test of The Hypothesis That The Estimated Times of An Event From j Loci Are The Same
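One way to set up such a test, under the gamma model with shape $k_i + 1$ and mean $T_i$ (an assumption, as above), is as follows. Under the null hypothesis all $j$ loci share one event time, whose maximum-likelihood value is $\hat t = \sum k_i / \sum (k_i+1)/T_i$. Under the alternative each locus sits at its own mode. Twice the log-likelihood difference is compared to a chi-square with $j - 1$ degrees of freedom, which fits the 4 degrees of freedom reported for five elephant loci. The helper names are hypothetical, and the simple series for the chi-square tail loses precision for extremely small p-values.

```python
import math


def log_pdf(t, k, T):
    """Log gamma density, shape k + 1 and mean T (t > 0, or t = 0 when k = 0)."""
    shape = k + 1.0
    rate = shape / T
    log_t_term = k * math.log(t) if k > 0 else 0.0
    return shape * math.log(rate) + log_t_term - rate * t - math.lgamma(shape)


def chi2_sf(x, df, terms=500):
    """Chi-square upper tail via the series for the regularized lower incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def common_time_lrt(loci):
    """loci: list of (k_i, T_i). H0: all loci date the same event.
    Returns (common-time MLE, statistic, df, p-value)."""
    ks = [k for k, _ in loci]
    rates = [(k + 1.0) / T for k, T in loci]
    t_common = sum(ks) / sum(rates)  # maximizes the summed log-likelihood under H0
    ll_null = sum(log_pdf(t_common, k, T) for k, T in loci)
    ll_alt = sum(log_pdf(k * T / (k + 1.0), k, T) for k, T in loci)  # each locus at its mode
    stat = 2.0 * (ll_alt - ll_null)
    df = len(loci) - 1
    return t_common, stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    print(common_time_lrt([(3, 1.0), (3, 5.0)]))  # hypothetical loci
    print(round(chi2_sf(1.497, 4), 4))            # statistic and df from the elephant test
```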
Fragmentation Inferences From NCA. Null Hypothesis: there was a single fragmentation event between forest and savanna elephants. Log-likelihood ratio test statistic = 1.497 with 4 degrees of freedom, p = 0.8272. The null hypothesis is not rejected, with T = 4.2 MYA. There are at least 2 lineages of African Elephants.
Performed Nested Clade Analyses on 25 DNA Regions in Humans
• Mitochondrial DNA (Ingman et al. Nature 408, 708-713, 2000; Sykes et al. American Journal of Human Genetics 57, 1463-1475, 1995; Torroni et al. American Journal of Human Genetics 53, 563-590, 1993, and American Journal of Human Genetics 53, 591-608, 1993)
• Y-DNA (Hammer et al. Molecular Biology and Evolution 15, 427-441, 1998)
• 11 X-Linked Regions (Balciuniene et al. 2001; Garrigan et al. 2005; Hammer et al. 2004; Harris & Hey 1999, 2001; Kaessmann et al. 1999; Nachman et al. 2004; Saunders et al. 2002; Verrelli et al. 2002; Yu et al. 2002)
• 12 Autosomal Genes (Bamshad et al. 2002; Harding et al. 1997; Hollox et al. 2001; Jin et al. 1999; Koda et al. 2001; Rana et al. 1999; Rogers et al. 2000; Toomajian and Kreitman 2002; Wooding et al. 2002; Zhang & Rosenberg 2000)
Three Out-of-Africa Events, Each Defined By Three or More Loci With A High Degree of Temporal Homogeneity (P = 0.95, P = 0.51 and P = 0.62 for the three events) But With Highly Significant Heterogeneity Between The Three Events: the log-likelihood ratio test rejects the null hypothesis that all 15 inferred events are temporally concordant with a probability value of $3.89 \times 10^{-15}$.
There Were At Least Three Out-of-Africa Expansion Events Over the Last 2 Million Years
Inferences of Gene Flow That Are Concordant Geographically Are NOT Necessarily Concordant Temporally Because Gene Flow is a Recurrent Process. However, We Can Test The Null Hypothesis of NO GENE FLOW Between Two Geographical Regions Over a Specified Time Interval.
Test Of The Null Hypothesis of NO GENE FLOW Between Two Geographical Regions Over a Specified Time Interval l to u:
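A natural way to formalize this test is sketched here, with assumptions that should be read as such. Treat each gene flow inference as an independent gamma variable (shape $k_i + 1$, mean $T_i$). The probability that none of them falls in $[l, u]$ is then $\prod_i [1 - P(l \le t_i \le u)]$, and a small value rejects the null hypothesis of no gene flow in that interval. The helper names are hypothetical, and the power series used for the gamma CDF is only suited to moderate values of rate times time.

```python
import math


def gamma_cdf(x, shape, rate, terms=1000):
    """Regularized lower incomplete gamma P(shape, rate * x) via its power series."""
    z = rate * x
    if z <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    for n in range(1, terms):
        term *= z / (shape + n)
        total += term
    return min(1.0, math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total)


def prob_no_gene_flow(inferences, lower, upper):
    """inferences: list of (k_i, T_i) for gene flow inferences between two regions.
    Probability that none falls in [lower, upper], assuming independent gamma times."""
    prob = 1.0
    for k, T in inferences:
        shape, rate = k + 1.0, (k + 1.0) / T
        inside = gamma_cdf(upper, shape, rate) - gamma_cdf(lower, shape, rate)
        prob *= 1.0 - inside
    return prob


if __name__ == "__main__":
    hypothetical = [(2, 0.5), (4, 1.0), (3, 1.5)]  # (k_i, T_i in million years)
    print(prob_no_gene_flow(hypothetical, 0.4, 1.6))
```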
Gamma Distributions For 19 African/Eurasian Gene Flow Inferences With Isolation By Distance. Extensive overlap implies cross-validation, with the exception of MX1, the only locus with most of its probability mass in the Pliocene. The lack of clusters implies that there were no prolonged breaks in gene flow throughout the Pleistocene.
Testing The Null Hypothesis of No African/Eurasian Gene Flow Throughout the Pleistocene: the null hypothesis of isolation (no gene flow) in this time interval is rejected with $p < 10^{-8}$.