This article discusses the different signatures that selection leaves in the genome, including population differentiation, frequency spectrum tests, selective sweeps, haplotype structure, and more.
Detecting selection using genome scans Roger Butlin University of Sheffield
Nielsen R (2005) Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218.
What signatures does selection leave in the genome?
• Population differentiation – today’s focus!
• Frequency spectrum, e.g. Tajima’s D
• Selective sweeps
• Haplotype structure (linkage disequilibrium)
• McDonald-Kreitman tests (or PAML over long time-scales)
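Frequency distribution: from Nielsen (2005), frequency of the derived allele in a sample of 20 alleles. Tajima’s D = (π − S/a1)/sd, where π is the mean number of pairwise differences, S is the number of segregating sites and a1 = Σ 1/i for i = 1 to n − 1 (n = sample size). A negative D summarises an excess of rare variants.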
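To make the Tajima’s D summary concrete, here is a minimal Python sketch that computes D from a small set of aligned sequences, using the standard constants from Tajima (1989). The function name `tajimas_d` and the toy 0/1 haplotypes are illustrative assumptions, not part of the original slides.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D is undefined for fewer than 4 sequences")
    # S: number of segregating (variable) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    sd = sqrt(e1 * S + e2 * S * (S - 1))
    return (pi - S / a1) / sd


# Hypothetical toy data: every variant is a singleton (excess of rare variants)
rare = ["10000", "01000", "00100", "00010", "00001", "00000"]
# Hypothetical toy data: every variant is at intermediate frequency
common = ["111", "111", "111", "000", "000", "000"]

print(tajimas_d(rare))    # negative: excess of rare variants
print(tajimas_d(common))  # positive: excess of intermediate-frequency variants
```

The sign is the useful part here: the singleton-only sample gives a negative D, while the sample with variants at 50% gives a positive D.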
McDonald-Kreitman and related tests
• dN = replacement changes per replacement site
• dS = silent changes per silent site
• dN/dS = 1: neutral
• dN/dS < 1: conserved (purifying selection)
• dN/dS > 1: adaptive evolution (positive selection)
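The McDonald-Kreitman test itself compares the ratio of replacement to silent changes among fixed differences between species with the same ratio among polymorphisms within species. The sketch below is a hedged illustration in plain Python: the counts are illustrative only, and the helper names `neutrality_index` and `fisher_exact_two_sided` are assumptions introduced here, not taken from the slides or from any particular package.

```python
from math import comb


def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds); NI < 1 suggests an excess of replacement fixations."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Illustrative counts (assumed for this example)
dn, ds = 7, 17   # fixed differences: replacement, silent
pn, ps = 2, 42   # polymorphisms: replacement, silent

ni = neutrality_index(pn, ps, dn, ds)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
print(f"NI = {ni:.3f}, Fisher exact p = {p_value:.4f}")
```

With these counts NI is well below 1, the pattern expected when positive selection has driven replacement substitutions to fixation.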
Selection on phenotypic traits:
• QTL
• Association analysis
• Candidate genes
Genome scans (aka ‘Outlier analysis’)
Littorina saxatilis – locally adapted morphs (‘H’ and ‘M’, Thornwick Bay). What signatures of selection might we look for?
Signatures of selection: • Departure from HWE • Low diversity (selective sweep) • Frequency spectrum tests • High divergence • Elevated proportion of non-synonymous substitutions • LD
A concrete example: adaptation to altitude in Rana temporaria (Bonin et al. 2006)
• Altitude categories: High – 2000 m, Intermediate – 1000 m, Low – 400 m
• 190 individuals
• 392 AFLP bands
Generating the expected distribution
• Dfdist – Beaumont & Nichols 1996: FST – symmetrical population differentiation, as a function of heterozygosity
• DetSel – Vitalis et al. 2001: F1, F2 – measures of the divergence of population 1 from population 2, and of population 2 from population 1
[Figure: demographic models assumed by the two methods; parameter labels include N, N0, N1, N2, Ne, m, μ and t]
Does the structure/history matter?
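The FST-against-heterozygosity idea can be sketched for a single biallelic locus sampled in two populations. This is a simplified illustration using a Nei-style calculation, FST = (HT − HS)/HT; it is not the estimator implemented in Dfdist or DetSel, it ignores sample-size corrections, and it ignores the dominance of AFLP markers. The function name and allele frequencies are hypothetical.

```python
def fst_and_heterozygosity(p1, p2):
    """Return (FST, HT) for one biallelic locus with allele frequencies p1 and p2.

    HT is the expected heterozygosity of the pooled populations,
    HS the mean expected heterozygosity within populations.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        return None, 0.0  # monomorphic locus: FST is undefined
    return (h_t - h_s) / h_t, h_t


# Hypothetical allele frequencies for three loci in a 'low' and a 'high' population
loci = {"locus_A": (0.50, 0.50), "locus_B": (0.40, 0.55), "locus_C": (0.20, 0.80)}

for name, (p_low, p_high) in loci.items():
    fst, h_t = fst_and_heterozygosity(p_low, p_high)
    print(f"{name}: heterozygosity = {h_t:.3f}, FST = {fst:.3f}")
```

In an outlier analysis, each locus would be plotted as a point (heterozygosity, FST) and compared with quantiles of the distribution simulated under a neutral model; loci above the upper quantile are candidate targets of divergent selection.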
[Figure: outlier plots for the ‘Low 1’ vs ‘High 1’ comparison: DetSel with 95% CI envelope; Dfdist with 5%, 50% and 95% quantiles]
392 AFLPs, 12 pairwise comparisons across altitude or 3 altitude categories, 95% cut off
343 loci 8 loci
Outliers and selected traits
Rogers and Bernatchez (2007), Coregonus clupeaformis (lake whitefish):
• Dwarf x Normal cross, both backcrosses
• Measure ‘adaptive’ traits (9)
• QTL map (>400 AFLP plus microsatellites)
• Homologous AFLP in 4 natural sympatric population pairs
• Outlier analysis (forward simulation based on Winkle)
*Only 3 outliers shared between lakes
Nosil et al. 2009 review of 14 studies: • 0.5 – 26% outliers, most studies 5-10% • 1 - 5% outliers replicated in pair-wise comparisons • 25 - 100% of outliers specific to habitat comparisons • No consistent pattern for EST-associated loci • LD among outliers typically low • But many methodological differences between studies • Population sampling • Marker type • Analysis type and options • Statistical cut-offs
Environmental correlations • SAM – Joost et al. 2007 • IBA – Nosil et al. 2007 • FST for each locus correlated with ‘adaptive distance’, controlling for geographic distance (partial Mantel test)
Methodological improvements – Bayesian approaches
• BayesFst – Beaumont & Balding 2004
• Bayescan – Foll & Gaggiotti 2008
For each locus i and population j we have an FST measure, Fij, relative to the ‘ancestral’ population. This is decomposed into locus and population components:
log(Fij / (1 − Fij)) = αi + βj
• αi is the locus effect: 0 neutral, +ve divergent selection, −ve balancing selection
• βj is the population effect
Assuming a Dirichlet distribution of allele frequencies among subpopulations, αi and βj can be estimated by MCMC. Bayescan also explicitly tests αi = 0.
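A small numerical sketch may help with the logit decomposition: given a locus effect αi and a population effect βj, the implied Fij is the inverse logit of their sum. The function names and parameter values below are hypothetical; this shows only the arithmetic of the decomposition, not the MCMC estimation performed by BayesFst or Bayescan.

```python
from math import exp, log


def logit(f):
    """log(F / (1 - F)) for 0 < F < 1."""
    return log(f / (1 - f))


def fst_from_effects(alpha, beta):
    """Invert log(F / (1 - F)) = alpha + beta to recover F."""
    return 1 / (1 + exp(-(alpha + beta)))


# Assumed population effect: a neutral locus (alpha = 0) has F = 0.10 in this population
beta = logit(0.10)

for label, alpha in [("neutral", 0.0), ("divergent", 1.5), ("balancing", -1.5)]:
    print(f"{label:>9}: alpha = {alpha:+.1f}, F = {fst_from_effects(alpha, beta):.3f}")
```

A positive αi pushes Fij above the population’s neutral level, and a negative αi pulls it below, which is why the sign of αi is read as divergent versus balancing selection.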
• Apparently much greater power to detect balancing selection than FDIST
• Lower false positive rate
• Wider applicability
Methodological improvements – hierarchical structure • Arlequin – Excoffier et al. 2009
Bayenv – Coop et al. 2010
• Estimates the variance-covariance matrix of allele frequencies, then tests for correlations with environmental variables (or categories).
• Software available at: http://www.eve.ucdavis.edu/gmcoop/Software/Bayenv/Bayenv.html
Multiple analyses? Candidate vs control? E.g. Shimada et al. 2010
Mäkinen et al 2008 • 7 populations • 3 marine, 4 freshwater • 103 STR loci • Analysed by BayesFst • (and LnRH) • 5 under directional selection • (3 in Eda locus) • 15 under balancing selection • Used as a test case by Excoffier et al • 2 directional • 3 balancing
Can we replicate these results? • Bayescan • Stickleback_allele.txt – input file • Output_fst.txt – view with R routine plot_Bayescan • Arlequin • Stickleback_data_standard.arp – IAM • Stickleback_data_repeat.arp – SMM • Run using Arlequin3.5 • Try hierarchical and island models, maybe different hierarchies
Sympatric speciation? • FST distribution as evidence of speciation with gene flow • Savolainen et al (2006) • Cf. Gavrilets and Vose (2007) • few loci underlying key traits • intermediate selection • initial environmental effect on phenology Howea - palms