Detecting selection using genome scans

Roger Butlin, University of Sheffield

Nielsen R (2005) Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218.
What signatures does selection leave in the genome?
• Population differentiation – today’s focus!
• Frequency spectrum, e.g. Tajima’s D
• Selective sweeps
• Haplotype structure (linkage disequilibrium)
• McDonald-Kreitman tests (or PAML over long time-scales)

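Frequency distribution: from Nielsen (2005), frequency of the derived allele in a sample of 20 alleles.
Tajima’s D = (π − S/a1)/sd, where π is the mean number of pairwise differences, S is the number of segregating sites and a1 is the sum of 1/i for i = 1 to n−1 in a sample of n sequences. Negative D indicates an excess of rare variants.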
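A minimal sketch of the Tajima's D calculation in Python, assuming the input is a list of equal-length haplotype strings with no missing data. The function name `tajimas_d` and the toy data are illustrative only. The statistic compares π with S/a1 and scales the difference by the standard deviation given in Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings (hypothetical helper).

    Returns None when D is undefined (fewer than 4 sequences or no
    segregating sites).
    """
    n = len(haplotypes)
    # S: number of alignment columns carrying more than one allele.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if n < 4 or s == 0:
        return None

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: every variant is a singleton (excess of rare variants).
rare = ["1000", "0100", "0010", "0001", "0000"]
# Toy data: both variants sit at intermediate frequency.
common = ["11", "11", "00", "00"]
d_rare = tajimas_d(rare)
d_common = tajimas_d(common)
```

In this toy case `d_rare` is negative and `d_common` is positive, matching the interpretation that an excess of rare variants pulls D below zero.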

Selective sweep:

Extended haplotype homozygosity (Sabeti et al. 2002)

McDonald-Kreitman and related tests
dN = replacement changes per replacement site
dS = silent changes per silent site
dN/dS = 1: neutral
dN/dS < 1: conserved (purifying selection)
dN/dS > 1: adaptive evolution (positive selection)

Selection on phenotypic traits:
• QTL
• Association analysis
• Candidate genes

Genome scans (aka ‘Outlier analysis’)

Littorina saxatilis – locally adapted morphs (‘H’ and ‘M’), Thornwick Bay
What signatures of selection might we look for?

Signatures of selection:
• Departure from HWE
• Low diversity (selective sweep)
• Frequency spectrum tests
• High divergence
• Elevated proportion of non-synonymous substitutions
• LD

Figure panels: neutral loci; stabilizing selection; local adaptation. Charlesworth et al. 1997 (from Nosil et al. 2009)

A concrete example: adaptation to altitude in Rana temporaria (Bonin et al. 2006)
• High – 2000 m
• Intermediate – 1000 m
• Low – 400 m
• 190 individuals
• 392 AFLP bands

Dfdist – Beaumont & Nichols 1996
• FST – symmetrical population differentiation, as a function of heterozygosity
DetSel – Vitalis et al. 2001
• F1,2 – measure of divergence of population 1,2 from population 2,1
[Model diagrams; parameter labels: N, Ne, N0, N1, N2, m, t, t0, μ]
Generating the expected distribution
Does the structure/history matter?
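The idea of plotting FST against heterozygosity for each locus can be sketched as follows. This is a simplified illustration only: it uses a Nei-style FST = (HT − HS)/HT for a biallelic locus in two equally weighted populations, not the estimator implemented in Dfdist, and the cut-off is assumed to come from a separately simulated neutral distribution. The names `locus_fst` and `flag_outliers` and the allele frequencies are hypothetical.

```python
def locus_fst(p1, p2):
    """Return (FST, HT) for one biallelic locus.

    p1 and p2 are the frequencies of the same allele in two populations.
    FST is None when the locus is monomorphic overall (HT = 0).
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                          # total heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    if h_t == 0:
        return None, 0.0
    return (h_t - h_s) / h_t, h_t


def flag_outliers(loci, fst_cutoff):
    """Names of loci whose FST exceeds a cut-off.

    In a real scan the cut-off would depend on heterozygosity and come from
    the simulated neutral distribution; here it is a single assumed number.
    """
    flagged = []
    for name, (p1, p2) in loci.items():
        fst, _ = locus_fst(p1, p2)
        if fst is not None and fst > fst_cutoff:
            flagged.append(name)
    return flagged


# Made-up allele frequencies for three loci in a 'low' and a 'high' population.
example_loci = {"locus_a": (0.5, 0.5), "locus_b": (0.9, 0.1), "locus_c": (0.4, 0.6)}
outliers = flag_outliers(example_loci, fst_cutoff=0.3)
```

A single fixed cut-off is the crudest possible choice; the methods named above instead condition the expected FST on heterozygosity and on an explicit population model.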

Figure: DetSel and Dfdist results for ‘Low 1’ vs ‘High 1’ (plot labels: 95% CI; 95%, 50% and 5% quantiles)

392 AFLPs, 12 pairwise comparisons across altitude or 3 altitude categories, 95% cut-off

343 loci 8 loci

Outliers and selected traits
• Rogers and Bernatchez (2007): Coregonus clupeaformis (lake whitefish)
• Dwarf x Normal cross → both backcrosses
• Measure ‘adaptive’ traits (9)
• QTL map (>400 AFLP plus microsatellites)
• Homologous AFLP in 4 natural sympatric population pairs
• Outlier analysis (forward simulation based on Winkle)
*Only 3 outliers shared between lakes


Nosil et al. 2009 review of 14 studies:
• 0.5 – 26% outliers, most studies 5-10%
• 1 - 5% outliers replicated in pair-wise comparisons
• 25 - 100% of outliers specific to habitat comparisons
• No consistent pattern for EST-associated loci
• LD among outliers typically low
• But many methodological differences between studies
  • Population sampling
  • Marker type
  • Analysis type and options
  • Statistical cut-offs

Environmental correlations
• SAM – Joost et al. 2007
• IBA – Nosil et al. 2007
• FST for each locus correlated with ‘adaptive distance’, controlling for geographic distance (partial Mantel test)
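A partial Mantel test of this kind can be sketched with the standard library alone. This is an illustrative version, not the implementation used in the cited studies: it computes the partial correlation between a locus's pairwise FST matrix and the adaptive-distance matrix, controlling for geographic distance, and gets a one-sided p-value by permuting population labels of the FST matrix. The function names, the seed and the permutation count are assumptions.

```python
import random
from math import sqrt


def upper(matrix):
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    rab, rac, rbc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (rab - rac * rbc) / sqrt((1 - rac ** 2) * (1 - rbc ** 2))


def partial_mantel(fst, adaptive, geographic, n_perm=999, seed=1):
    """Return (observed partial r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(fst)
    b, c = upper(adaptive), upper(geographic)
    observed = partial_r(upper(fst), b, c)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns together so the matrix stays symmetric.
        permuted = [[fst[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if partial_r(upper(permuted), b, c) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def symmetric(values, n=4):
    """Build an n x n symmetric matrix from upper-triangle values (toy helper)."""
    m = [[0.0] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = float(next(it))
    return m


# Made-up distances among four populations.
adaptive_dist = symmetric([1, 2, 3, 4, 5, 6])
geographic_dist = symmetric([2, 1, 4, 3, 6, 4])
r_obs, p_value = partial_mantel(adaptive_dist, adaptive_dist, geographic_dist, n_perm=199)
```

Permuting rows and columns together, rather than shuffling individual matrix entries, keeps the non-independence of pairwise distances intact under the null.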

Methodological improvements – Bayesian approaches
• BayesFst – Beaumont & Balding 2004
• Bayescan – Foll & Gaggiotti 2008
For each locus i and population j we have an FST measure, Fij, relative to the ‘ancestral’ population.
Then decompose into locus and population components:
log(Fij/(1 − Fij)) = αi + βj
αi is the locus effect: 0 neutral, +ve divergent selection, -ve balancing selection
βj is the population effect
Assuming a Dirichlet distribution of allele frequencies among subpopulations, αi and βj can be estimated by MCMC.
Bayescan also explicitly tests αi = 0.
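The decomposition can be illustrated by inverting the logit: given a locus effect and a population effect, Fij follows from the logistic function. The sketch below is only that arithmetic, not an MCMC sampler. The function names and the numeric effects are made up, and the Dirichlet scale 1/Fij − 1 is stated here as an assumption about the usual multinomial-Dirichlet parameterisation.

```python
from math import exp


def fst_from_effects(alpha_i, beta_j):
    """Invert log(F / (1 - F)) = alpha_i + beta_j to get F_ij."""
    return 1.0 / (1.0 + exp(-(alpha_i + beta_j)))


def dirichlet_scale(f_ij):
    """Scale of the Dirichlet on allele frequencies, assumed to be 1/F_ij - 1.

    Larger F_ij gives a smaller scale, i.e. allele frequencies that drift
    further from the 'ancestral' frequencies.
    """
    return 1.0 / f_ij - 1.0


beta = -2.0                                  # hypothetical population effect
f_neutral = fst_from_effects(0.0, beta)      # alpha = 0: neutral locus
f_divergent = fst_from_effects(1.5, beta)    # alpha > 0: raised FST
f_balancing = fst_from_effects(-1.5, beta)   # alpha < 0: lowered FST
```

Because the population effect is shared across loci, a locus stands out only through its own αi, which is why testing αi = 0 locus by locus serves as the outlier test.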

• Apparently much greater power to detect balancing selection than FDIST
• Lower false positive rate
• Wider applicability

Methodological improvements – hierarchical structure • Arlequin – Excoffier et al. 2009

Circles – simulated STR data, grey – null distribution

Bayenv – Coop et al. 2010
• Estimates the variance-covariance matrix of allele frequencies, then tests for correlations with environmental variables (or categories).
• Software available at: http://www.eve.ucdavis.edu/gmcoop/Software/Bayenv/Bayenv.html
Multiple analyses? Candidate vs control? E.g. Shimada et al. 2010

Hohenlohe et al. 2010

Mäkinen et al. 2008
• 7 populations: 3 marine, 4 freshwater
• 103 STR loci
• Analysed by BayesFst (and LnRH)
  • 5 under directional selection (3 in Eda locus)
  • 15 under balancing selection
• Used as a test case by Excoffier et al. 2009
  • 2 directional
  • 3 balancing

Can we replicate these results?
• Bayescan
  • Stickleback_allele.txt – input file
  • Output_fst.txt – view with R routine plot_Bayescan
• Arlequin
  • Stickleback_data_standard.arp – IAM
  • Stickleback_data_repeat.arp – SMM
  • Run using Arlequin3.5
  • Try hierarchical and island models, maybe different hierarchies

Sympatric speciation?
• FST distribution as evidence of speciation with gene flow
• Savolainen et al. (2006): Howea palms
• Cf. Gavrilets and Vose (2007)
  • few loci underlying key traits
  • intermediate selection
  • initial environmental effect on phenology
