Population Genetics and Statistical Inferences on Wright's Island Model

DAY 3 Lecture 5

V. Inferences

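1. Selfing
Genotypes AA, Aa and aa have frequencies $D_t$, $H_t$ and $R_t$ in generation $t$. At equilibrium, $H_t = H_{t+1} = H_{eq}$. Wright's generalized formula: $D = p^2 + F_{IS}\,pq$, $H = 2pq(1 - F_{IS})$, $R = q^2 + F_{IS}\,pq$.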
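A minimal numerical sketch, assuming the standard partial-selfing model (a fraction $s$ of offspring produced by selfing, the rest by random outcrossing) at a biallelic locus with allele frequencies $p$ and $q$. Iterating $H_t$ shows it converging to $H_{eq}$, from which $F_{IS} = 1 - H_{eq}/(2pq) = s/(2-s)$. Function names are illustrative only.

```python
def next_heterozygosity(h_t: float, p: float, s: float) -> float:
    """One generation of partial selfing at a biallelic locus.

    Outcrossed offspring (fraction 1 - s) are heterozygous with probability 2pq;
    selfed offspring (fraction s) keep half of the parental heterozygosity.
    """
    q = 1.0 - p
    return (1.0 - s) * 2.0 * p * q + s * h_t / 2.0


def equilibrium_fis(p: float, s: float, generations: int = 200) -> float:
    """Iterate H_t towards H_eq and return F_IS = 1 - H_eq / (2pq)."""
    q = 1.0 - p
    h = 2.0 * p * q  # start from Hardy-Weinberg proportions
    for _ in range(generations):
        h = next_heterozygosity(h, p, s)
    return 1.0 - h / (2.0 * p * q)


if __name__ == "__main__":
    s = 0.5
    print(round(equilibrium_fis(p=0.3, s=s), 6))  # numerical: 0.333333
    print(round(s / (2.0 - s), 6))                # analytical s/(2 - s): 0.333333
```

The equilibrium does not depend on $p$, which is why $F_{IS}$ is a convenient summary of the selfing rate.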

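2. Dispersal following Wright's island model with many islands and many alleles
Wright's island model with $n$ (number of sub-populations) large, $m$ (migration rate) and $u$ (mutation rate) small, and $K$ (number of possible alleles) large: $Q_T \approx 0$; local panmixia: $Q_I = Q_S$.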

Evolution of $Q_S$, the probability of drawing two identical alleles from the same sub-population, between generations $t$ and $t+1$. Both alleles must be autochthonous (non-immigrant), non-mutant and identical, either because they already were identical at $t$ or because they became so at $t+1$ (copies of the same parental allele), where $N$ is the size of each sub-population:
$Q_{S,t+1} = (1-m)^2(1-u)^2\left[\frac{1}{2N} + \left(1-\frac{1}{2N}\right)Q_{S,t}\right]$
At migration/mutation/drift equilibrium, $Q_{S,t+1} = Q_{S,t} = Q_S$.

At migration/mutation/drift equilibrium ($N > 0$), solving the recursion for $Q_S$ gives:
$Q_S = \frac{(1-m)^2(1-u)^2}{2N\left[1-(1-m)^2(1-u)^2\right] + (1-m)^2(1-u)^2}$

Since $m$ and $u$ are small, terms in $m$ and $u$ can be ignored relative to 1, as can terms in $mu$ relative to $m$, so that $(1-m)^2(1-u)^2 \approx 1 - 2(m+u)$ and, at migration/mutation/drift equilibrium:
$Q_S \approx \frac{1}{1+4N(m+u)}$
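The island-model recursion for $Q_S$ (both alleles non-immigrant and non-mutant with probability $(1-m)^2(1-u)^2$, then identical with probability $\frac{1}{2N} + (1-\frac{1}{2N})Q_{S,t}$, assuming $Q_T \approx 0$) can be iterated numerically to check the exact equilibrium and the approximation $1/(1+4N(m+u))$. This is a sketch; `pop_size` stands for $N$ and the parameter values used below are hypothetical.

```python
def next_qs(qs: float, pop_size: int, m: float, u: float) -> float:
    """Q_S at t+1 in Wright's island model, assuming Q_T ~ 0 (n and K large)."""
    keep = (1.0 - m) ** 2 * (1.0 - u) ** 2    # both alleles autochthonous and non-mutant
    new_identity = 1.0 / (2.0 * pop_size)     # identical because they became so at t+1
    return keep * (new_identity + (1.0 - new_identity) * qs)


def iterate_qs(pop_size: int, m: float, u: float, generations: int = 5000) -> float:
    """Run the recursion from Q_S = 0 until (near) equilibrium."""
    qs = 0.0
    for _ in range(generations):
        qs = next_qs(qs, pop_size, m, u)
    return qs


def qs_exact(pop_size: int, m: float, u: float) -> float:
    """Exact fixed point of the recursion."""
    keep = (1.0 - m) ** 2 * (1.0 - u) ** 2
    return keep / (2.0 * pop_size * (1.0 - keep) + keep)


def qs_approx(pop_size: int, m: float, u: float) -> float:
    """Approximation for small m and u."""
    return 1.0 / (1.0 + 4.0 * pop_size * (m + u))


if __name__ == "__main__":
    for args in [(50, 0.01, 1e-4), (100, 0.001, 0.0)]:
        print(args, iterate_qs(*args), qs_exact(*args), qs_approx(*args))
```

With $N = 50$, $m = 0.01$ and $u = 10^{-4}$, the exact and approximate values differ by only a few thousandths.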

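Inference of migration
With $n$ large, $m$ and $u$ small and $K$ large ($Q_T \approx 0$; local panmixia: $Q_I = Q_S$), $F_{ST} = \frac{Q_S - Q_T}{1 - Q_T} \approx Q_S$. At migration/mutation/drift equilibrium, if $u \ll m$: $F_{ST} \approx \frac{1}{1+4Nm}$, hence $Nm \approx \frac{1}{4}\left(\frac{1}{F_{ST}} - 1\right)$.
The maximum possible value $F_{ST,\max}$ is reached if $m = 0$: $F_{ST,\max} \approx Q_S = 1 - H_S$. Standardized value (Hedrick, or Meirmans with AMOVA): $F'_{ST} = F_{ST}/F_{ST,\max}$.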
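A minimal sketch of the two conversions, assuming an island model at equilibrium with $u \ll m$ and using $F_{ST,\max} \approx 1 - H_S$ as the approximation. The published Hedrick and Meirmans estimators obtain $F_{ST,\max}$ from the data in their own ways, so these helper functions (hypothetical names) are not substitutes for them.

```python
def nm_from_fst(fst: float) -> float:
    """Nm from F_ST ~ 1/(1 + 4Nm) (island model at equilibrium, u << m)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must be in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


def standardized_fst(fst: float, h_s: float) -> float:
    """F'_ST = F_ST / F_ST_max, with F_ST_max approximated by 1 - H_S."""
    fst_max = 1.0 - h_s
    if fst_max <= 0.0:
        raise ValueError("H_S must be smaller than 1")
    return fst / fst_max


if __name__ == "__main__":
    print(nm_from_fst(0.2))             # about 1 migrant per generation
    print(standardized_fst(0.05, 0.9))  # small raw F_ST, F'_ST about 0.5
```

The second call shows why highly polymorphic markers (large $H_S$, as for microsatellites) call for standardization: a raw $F_{ST}$ of 0.05 is already half of its maximum possible value.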

3. Dispersal in a finite island model ($n$ small), with homoplasy (small number $K$ of possible alleles) and local selfing ($s$)
Figure: impact of homoplasy on microsatellite loci in an infinite island model with $s = 0.5$, $N = 20$ and $m = 0.05$.

4. Dispersal in other models of populations: stepping-stone and neighborhood models
Figure: stepping-stone models in one, two and three dimensions (1D, 2D, 3D).

Figure (Rousset): 1D and 2D stepping-stone models, with identity probabilities $Q_S$ within and $Q_T$ between sub-populations.

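1D stepping-stone and neighborhood (Rousset): $F_{ST}/(1-F_{ST})$ increases linearly with geographic distance, with slope $b$; Neighborhood $= 1/b = 4D_e\sigma^2$.
$D_e$: effective density of individuals in the neighborhood (per m in 1D, per m² in 2D); $\sigma$: root-mean-square distance between reproducing adults and their parents.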

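2D stepping-stone and neighborhood (Rousset): $F_{ST}/(1-F_{ST})$ increases linearly with the logarithm of geographic distance, with slope $b$; Neighborhood $= 1/b = 4\pi D_e\sigma^2$.
$D_e$: effective density of individuals in the neighborhood (per m in 1D, per m² in 2D); $\sigma$: root-mean-square distance between reproducing adults and their parents.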
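Rousset's regression can be sketched as an ordinary least-squares slope of $F_{ST}/(1-F_{ST})$ on distance (1D) or on ln(distance) (2D), from which Neighborhood $= 1/b$ and $D_e\sigma^2$ follow. This is an illustration under stated assumptions: pairwise distances are strictly positive, and `rousset_regression` is a hypothetical helper, not a published implementation.

```python
import math


def rousset_regression(distances, fst_values, dimensions=2):
    """Slope b of F_ST/(1 - F_ST) on distance (1D) or ln(distance) (2D).

    Returns (b, neighborhood, de_sigma2) with neighborhood = 1/b and
    De * sigma^2 = 1/(4b) in 1D or 1/(4*pi*b) in 2D.
    """
    if dimensions == 1:
        xs = [float(d) for d in distances]
    elif dimensions == 2:
        xs = [math.log(d) for d in distances]  # distances must be > 0
    else:
        raise ValueError("dimensions must be 1 or 2")
    ys = [f / (1.0 - f) for f in fst_values]  # linearized pairwise F_ST
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    neighborhood = 1.0 / b
    if dimensions == 1:
        de_sigma2 = neighborhood / 4.0
    else:
        de_sigma2 = neighborhood / (4.0 * math.pi)
    return b, neighborhood, de_sigma2
```

Pairwise values are not independent, so the significance of the slope should be assessed with a Mantel test rather than with the regression's own P-value.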

5. Effective population size estimators
-Genetic differentiation between temporally spaced sub-samples, Ne: Waples
-In space and time, Ne and m: Wang & Whitlock
-Linkage disequilibrium, Ne: Bartley et al.; Waples & Do
-Heterozygote excess (dioecious or self-incompatible organisms), Ne: Balloux
-Intra- and inter-locus disequilibrium for spatial data, Ne and m: Vitalis & Couvet
-Many others

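6. Unbiased estimators for Wright's F-statistics ($F_{IS}$, $F_{ST}$, $F_{IT}$)
Sub-sample size: $N_s = 1$.
Reminder on variance estimators: $s^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$ is biased, whereas $s^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$ is unbiased.
Weir and Cockerham's unbiased estimators: $f$ (for $F_{IS}$), $\theta$ (for $F_{ST}$) and $F$ (for $F_{IT}$).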
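The bias of the $1/n$ variance can be checked by simulation: averaged over many samples of size $n$ drawn from a distribution with variance 1, the $1/n$ estimator converges to $(n-1)/n$ while the $1/(n-1)$ estimator converges to 1. This sketch only illustrates the divisor; Weir and Cockerham's estimators involve more elaborate sample-size corrections.

```python
import random
import statistics


def variance_biased(xs):
    """s^2 with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def variance_unbiased(xs):
    """s^2 with divisor n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def average_estimates(sample_size, replicates=20000, seed=1):
    """Average both estimators over samples from N(0, 1), whose true variance is 1."""
    rng = random.Random(seed)
    biased = unbiased = 0.0
    for _ in range(replicates):
        xs = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        biased += variance_biased(xs)
        unbiased += variance_unbiased(xs)
    return biased / replicates, unbiased / replicates


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    # The standard library uses the same two divisors.
    print(variance_biased(data), statistics.pvariance(data))
    print(variance_unbiased(data), statistics.variance(data))
    print(average_estimates(2))  # close to (0.5, 1.0)
```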

For $K$ alleles labelled from $A = 1$ to $K$: Robertson & Hill's estimator is biased but has a lower variance (a better « statistic »); Weir & Cockerham's estimators are unbiased but have a larger variance. Figure: comparison of estimators for $F_{IS}$, $F_{ST}$ and $F_{IT}$.

7. F-statistics for more than three hierarchical levels (Yang)

8. F-statistics in clones
Heterozygous individuals only: $Q_I = 0$. If $n$ is large and $m$ small: $Q_T \approx 0$. Other cases: $m \approx 0$; $n = 2$ and $m$ small.

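Figure: $F_{IS}$ per locus (axis from 0 to -1) in four situations ($C$: clonal rate): $C = 1$ with $Nm$ not small; $C = 1$ with $Nm$ small; $C$ between 0.999 and 0.99 with $Nm$ small; $C$ between 0.99 and 0.95 with $Nm$ not small. Depending on the situation, $F_{ST}$ ranges from $\ll 0.5$ through $\approx 0.5$ to $\gg 0.5$.
Population genetics of clonal or partially clonal diploids: Leishmania?, Phylloxera, Trypanosoma, Candida albicans.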

VI. Statistical procedures

1. Definitions
We are looking for the probability, called the P-value, that our data could be obtained by chance alone when they follow the null hypothesis H0. The test must be formulated a priori in one of the following forms:
-bilateral (two-sided): the alternative hypothesis H1 is that the observed values are too extreme (in either direction) to be explained by chance under H0;
-unilateral "greater": H1 is that the observed values are greater than what can be expected by chance under H0;
-unilateral "less": H1 is that the observed values are smaller than what can be expected by chance under H0.
By convention, the threshold P-value = 0.05 was arbitrarily chosen as the level below which a test is considered significant (H0 rejected). Depending on circumstances, one can choose to be more or less permissive; the final statistical decision is up to the user.
Type I error, α: probability of wrongly rejecting a true H0 (the risk taken when rejecting H0 at a given P-value); Type II error, β: probability of wrongly accepting a false H0. Powerful tests readily reject a false H0; robust tests do not reject a true H0 more often than they should.

2. F-statistics confidence intervals (CI)
Bootstrap (e.g. over loci): items are randomly resampled k times (e.g. 5000) with replacement, so that the same item (e.g. a locus) can be drawn several times, and F is re-estimated at each randomization over the resampled set of items. Figure: distribution of bootstrap values around the observed value.
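A minimal percentile-bootstrap sketch over loci. It assumes each locus contributes a numerator and a denominator (as variance components do in multilocus estimators), so that the multilocus F is a ratio of sums, and takes the CI from the 2.5% and 97.5% quantiles of the bootstrap values. The per-locus components in the usage example are invented for illustration.

```python
import random


def multilocus_f(components):
    """Multilocus F = (sum of per-locus numerators) / (sum of per-locus denominators)."""
    return sum(num for num, _ in components) / sum(den for _, den in components)


def bootstrap_ci(components, replicates=5000, level=0.95, seed=1):
    """Percentile bootstrap CI: resample loci with replacement, re-estimate F each time."""
    rng = random.Random(seed)
    k = len(components)
    values = sorted(
        multilocus_f(rng.choices(components, k=k)) for _ in range(replicates)
    )
    tail = (1.0 - level) / 2.0
    low = values[int(tail * replicates)]
    high = values[min(replicates - 1, int((1.0 - tail) * replicates))]
    return low, high


if __name__ == "__main__":
    # Hypothetical (numerator, denominator) pairs for 10 loci.
    loci = [(0.06, 0.40), (0.03, 0.35), (0.08, 0.45), (0.02, 0.30), (0.05, 0.38),
            (0.07, 0.42), (0.04, 0.33), (0.06, 0.36), (0.01, 0.28), (0.05, 0.41)]
    print(multilocus_f(loci), bootstrap_ci(loci))
```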

Jackknife (e.g. over sub-samples): one item (e.g. one sub-sample) is withdrawn at a time and F is computed over the remaining items. As many values as there are items are obtained, from which a mean and a variance are computed and then used to compute the standard error of F. Under the assumption of normality, a CI can be estimated as $F \pm \mathrm{StdErr}(F)\,t_{\alpha,\gamma}$, where $t$ can be found in a table of Student's t or computed in Excel (e.g. T.INV.2T(α, γ)), α is the chosen critical level (0.05 for a 95% CI, 0.01 for a 99% CI) and γ is the number of degrees of freedom (number of items - 1).

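Statistical procedures: Jackknife 95% CI (table of t)
Example: $F_{IS} = 0.2$ over 10 loci with $\mathrm{StdErr}(F_{IS}) = 0.01$, and $t_{0.05,9} = 2.262$. The 95% CI bounds are $0.2 - 2.262 \times 0.01$ and $0.2 + 2.262 \times 0.01$, hence 95% CI = [0.177, 0.223].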
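A small sketch of the jackknife steps, assuming F is a ratio of summed per-item components and using the standard jackknife standard error $\sqrt{\frac{n-1}{n}\sum_i (F_{(i)} - \bar{F})^2}$; the exact formula used by a given software package may differ. The critical $t$ is passed in (for instance from a t table), which keeps the block to the standard library; helper names are hypothetical.

```python
import math


def leave_one_out(components):
    """F recomputed after withdrawing each item in turn (ratio of summed components)."""
    total_num = sum(num for num, _ in components)
    total_den = sum(den for _, den in components)
    return [(total_num - num) / (total_den - den) for num, den in components]


def jackknife_se(estimates):
    """Standard jackknife standard error from the leave-one-out estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt((n - 1) / n * sum((e - mean) ** 2 for e in estimates))


def t_interval(f, std_err, t_crit):
    """CI as F +/- StdErr(F) * t, with t for the chosen alpha and n - 1 df."""
    return f - t_crit * std_err, f + t_crit * std_err


if __name__ == "__main__":
    # The slide's worked example: F_IS = 0.2, StdErr = 0.01, t(0.05, 9 df) = 2.262.
    low, high = t_interval(0.2, 0.01, 2.262)
    print(round(low, 3), round(high, 3))  # 0.177 0.223
```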

3. Significance testing through randomization
Randomization tests: H0 is simulated a large number of times; the P-value of the test is the proportion of simulated values that are as extreme as, or more extreme than, the value observed in the real sample. Understanding precisely what lies behind H0 and H1 is key: what exactly do we want to test? Number of randomizations: 10,000 for permutations, at least 1,000,000 for Markov chains.

Statistical procedures: testing the significance of F-statistics through randomization ($F_{IS}$)
The significance of $F_{IS}$ is a test of local panmixia. Testing $F_{IS} > 0$ gives P-value $P_1$, $F_{IS} < 0$ gives $P_2$, and $F_{IS} \neq 0$ (bilateral) gives $P_3 = \min(P_1,P_2) + [1-\max(P_1,P_2)]$, with Weir and Cockerham's estimator as the statistic. Other estimators (e.g. Robertson & Hill) can also be used as statistics. Haldane's exact test is an alternative, but it does not allow global testing over sub-samples and loci.
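A minimal randomization test of local panmixia within one sub-sample, sketched under these assumptions: alleles are permuted among individuals (which keeps allele frequencies fixed), the statistic is the simple $1 - H_o/H_e$ rather than Weir and Cockerham's estimator, and the P-values are proportions of permuted values at least as extreme, with $P_3$ from the slide's formula. Function names are hypothetical.

```python
import random
from collections import Counter


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i^2); terms are sorted so the float sum does not depend on order."""
    total = len(alleles)
    return 1.0 - sum(sorted((c / total) ** 2 for c in Counter(alleles).values()))


def fis_statistic(genotypes):
    """Simple F_IS statistic 1 - Ho/He (not Weir & Cockerham's estimator)."""
    alleles = [a for g in genotypes for a in g]
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return 1.0 - ho / expected_heterozygosity(alleles)


def fis_randomization_test(genotypes, permutations=10000, seed=1):
    """Permute alleles among the individuals of one sub-sample (H0: local panmixia)."""
    rng = random.Random(seed)
    observed = fis_statistic(genotypes)
    alleles = [a for g in genotypes for a in g]
    greater = less = 0
    for _ in range(permutations):
        rng.shuffle(alleles)
        simulated = fis_statistic(list(zip(alleles[0::2], alleles[1::2])))
        if simulated >= observed:
            greater += 1
        if simulated <= observed:
            less += 1
    p1 = greater / permutations              # H1: F_IS > 0
    p2 = less / permutations                 # H1: F_IS < 0
    p3 = min(p1, p2) + (1.0 - max(p1, p2))   # bilateral, as on the slide
    return observed, p1, p2, p3
```

For example, a sub-sample made only of AA and BB homozygotes gives $F_{IS} = 1$ and a $P_1$ close to 0.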

Statistical procedures: testing the significance of F-statistics through randomization ($F_{ST}$): testing whether $F_{ST} > 0$

Statistical procedures: testing whether the allelic distribution is random across sub-samples with the G statistic
H0: the observed G is not greater than the values generated by randomizing individuals across sub-samples. The G statistic is the log-likelihood ratio statistic (based on the natural logarithm of the maximum-likelihood ratio) computed on the contingency table of allele counts by sub-sample (one table per locus). The additive properties of G allow global testing over loci.
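A sketch of the G-based test, assuming $G = 2\sum O\ln(O/E)$ on each locus's allele-by-sub-sample table, summed over loci for the global test, with individuals (not alleles) permuted across sub-samples so that sub-sample sizes are kept. Each individual is a tuple of per-locus genotypes; the data layout and names are assumptions for illustration.

```python
import math
import random
from collections import Counter


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) over a contingency table {(row, column): count}."""
    rows, cols = Counter(), Counter()
    for (r, c), o in table.items():
        rows[r] += o
        cols[c] += o
    total = sum(rows.values())
    g = 0.0
    for (r, c), o in sorted(table.items()):   # fixed order keeps float sums reproducible
        if o > 0:
            g += o * math.log(o * total / (rows[r] * cols[c]))
    return 2.0 * g


def allele_table(individuals, labels, locus):
    """Allele counts by sub-sample at one locus."""
    table = Counter()
    for genotype, label in zip(individuals, labels):
        for allele in genotype[locus]:
            table[(allele, label)] += 1
    return table


def g_test(individuals, labels, permutations=10000, seed=1):
    """Global test: G summed over loci, individuals permuted across sub-samples."""
    n_loci = len(individuals[0])

    def summed_g(current):
        return sum(g_statistic(allele_table(individuals, current, k))
                   for k in range(n_loci))

    observed = summed_g(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)          # sub-sample sizes stay the same
        if summed_g(shuffled) >= observed:
            hits += 1
    return observed, hits / permutations
```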

Statistical procedures: testing the significance of the correlation between two distance matrices, e.g. when testing for isolation by distance
The values within a distance matrix are auto-correlated (not independent), so a standard correlation test cannot be used. Mantel test: the rows and columns of one of the matrices are permuted together and the correlation is computed again for each randomized matrix. The P-value is the proportion of randomized correlations as large as or larger than the observed one. It is a rather conservative (very robust) test.
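A minimal one-sided Mantel test, assuming symmetric matrices given as lists of lists, Pearson's correlation over the upper triangle, and joint permutation of rows and columns of the second matrix. The P-value follows the definition above (proportion of randomized correlations at least as large as the observed one); names and data are illustrative.

```python
import math
import random


def upper_triangle(matrix):
    """Off-diagonal values above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(a, b, permutations=9999, seed=1):
    """Correlate a and b, then permute rows and columns of b together."""
    rng = random.Random(seed)
    n = len(a)
    xs = upper_triangle(a)
    observed = pearson(xs, upper_triangle(b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(xs, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / permutations
```

For isolation by distance, `a` would hold geographic distances (or their logarithm in 2D) and `b` genetic distances such as $F_{ST}/(1-F_{ST})$.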

Statistical procedures: linkage disequilibrium
Contingency table of two-locus genotypes between Locus_1 and Locus_2: genotypes at one locus (11, 12, 13, 14, 15, 22, 23, 24, 25, 33, 34, 35, 44, 45, 55) in rows, genotypes at the other locus (11, 12, 13, 14, 22, 23, 24, 33, 34, 44) in columns, and in each cell the number n of individuals carrying that two-locus genotype (e.g. 11/12). Multi-locus measures.

Statistical procedures: linkage disequilibrium
The genotypes at two (or all) loci are randomly re-associated a large number of times (most of the time the haplotypes, also called the phase, are unknown), and a statistic is computed at each randomization. The P-value of the test is the proportion of randomized values as large as or larger than the value observed for the real data.
Tests between pairs of loci: G allows a global test over sub-samples for each pair of loci, giving as many P-values as pairs of loci: L(L-1)/2.
Multilocus testing: rD, for instance, allows a global test over loci for each sub-sample, giving as many P-values as sub-samples.
In either case, the repetition of tests must be taken into account.
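A sketch of the pairwise test within one sub-sample, assuming the phase is unknown so that whole genotypes (not haplotypes) are re-associated: the genotypes at the second locus are shuffled among individuals and G is recomputed on the two-locus genotype table. Genotype strings such as "11" and the function names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def two_locus_g(pairs):
    """G on the contingency table of (genotype at locus 1, genotype at locus 2)."""
    table = Counter(pairs)
    rows = Counter(g1 for g1, _ in pairs)
    cols = Counter(g2 for _, g2 in pairs)
    n = len(pairs)
    g = 0.0
    for (g1, g2), o in sorted(table.items()):  # fixed order for reproducible sums
        g += o * math.log(o * n / (rows[g1] * cols[g2]))
    return 2.0 * g


def ld_randomization_test(locus1, locus2, permutations=10000, seed=1):
    """Randomly re-associate genotypes between two loci (H0: no linkage disequilibrium)."""
    rng = random.Random(seed)
    observed = two_locus_g(list(zip(locus1, locus2)))
    shuffled = list(locus2)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if two_locus_g(list(zip(locus1, shuffled))) >= observed:
            hits += 1
    return observed, hits / permutations
```

Run over all $L(L-1)/2$ pairs of loci, this produces a series of tests whose multiplicity must then be handled with a procedure for combining k tests.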

Statistical procedures F-statistics for more than three hierarchical levels

Statistical procedures: comparison among groups (e.g. fields vs woods)
$S$ = $F_{IS}$, $F_{ST}$, AIc, $H_o$, $H_s$, etc.; $S_{Obs} = (S_{Obs1} - S_{Obs2})^2$

Statistical procedures: comparison among categories of individuals
$S$ = $F_{IS}$, $F_{ST}$, AIc, $H_o$, $H_s$, etc. The status of individuals is randomized while keeping the local ratio of categories constant; $S_{Obs} = (S_{Obs1} - S_{Obs2})^2$
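A minimal sketch of the randomization for categories of individuals, under a simplifying assumption: $S$ is the mean of a per-individual value (for instance 0/1 heterozygosity indicators, whose mean is $H_o$). Statistics such as $F_{IS}$ or $F_{ST}$ would instead have to be recomputed from genotypes at each randomization. Statuses are shuffled within each sub-sample, so local category ratios stay constant; names and data are hypothetical.

```python
import random
from collections import defaultdict


def squared_difference(values, statuses, cat1, cat2):
    """S_Obs = (S1 - S2)^2, with S the mean per-individual value in each category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, status in zip(values, statuses):
        sums[status] += value
        counts[status] += 1
    return (sums[cat1] / counts[cat1] - sums[cat2] / counts[cat2]) ** 2


def stratified_permutation_test(subsamples, statuses, values, cat1, cat2,
                                permutations=10000, seed=1):
    """Shuffle statuses within each sub-sample and recompute (S1 - S2)^2."""
    rng = random.Random(seed)
    observed = squared_difference(values, statuses, cat1, cat2)
    members = defaultdict(list)
    for index, subsample in enumerate(subsamples):
        members[subsample].append(index)
    shuffled = list(statuses)
    hits = 0
    for _ in range(permutations):
        for indices in members.values():
            local = [statuses[i] for i in indices]
            rng.shuffle(local)             # local ratio of categories is unchanged
            for i, status in zip(indices, local):
                shuffled[i] = status
        if squared_difference(values, shuffled, cat1, cat2) >= observed:
            hits += 1
    return observed, hits / permutations
```

For a comparison among groups of sub-samples (e.g. fields vs woods), the analogous randomization would permute the group labels of whole sub-samples rather than the statuses of individuals.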

4. Nested and orthogonal factors
Example: geographic differentiation ($F_{ST\_1}$; P-value_1) and differentiation between genders ($F_{ST\_2}$; P-value_2). The k P-values of such a series of k tests must then be combined.

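5. Procedures for combining a series of k tests with P-values $P_1, P_2, P_3, \dots, P_k$
Two questions: Is the series of k tests significant overall? Which tests are significant?
Which tests are significant: sequential Bonferroni. The smallest P-value is multiplied by k ($P_{min} \times k$), the next smallest by k - 1 ($P_{min-1} \times (k-1)$), etc. Those adjusted P-values that stay significant correspond to actually significant tests. This is a very conservative procedure, to be used only on the most powerful individual tests (largest and most polymorphic sub-samples).
Is the series significant overall (is at least one test significant in the series)? The procedure depends on whether the tests are independent or not (e.g. tests of LD between pairs of loci are not independent): Fisher's procedure, the exact binomial test, Stouffer's Z procedure (if k < 4) or the generalized binomial procedure (if k ≥ 4). Stouffer's Z in Excel: $Z_i$ = NORM.S.INV($P_i$), $Z = \sum_i Z_i/\sqrt{k}$, P-value = NORM.S.DIST(Z, TRUE).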
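A standard-library sketch of four of these procedures, assuming one-sided P-values strictly between 0 and 1: Holm's step-down form of sequential Bonferroni, Fisher's procedure ($-2\sum\ln P_i$ compared with a chi-square on $2k$ degrees of freedom, using its closed-form survival function for even degrees of freedom), Stouffer's Z, and the exact binomial test on the number of tests with $P \le \alpha$. Fisher, Stouffer and the exact binomial test as written assume independent tests; the generalized binomial procedure for non-independent tests is not implemented here.

```python
import math
from statistics import NormalDist


def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure: indices of tests that remain significant."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = []
    for rank, i in enumerate(order):
        if p_values[i] * (k - rank) <= alpha:   # Pmin * k, next * (k - 1), ...
            significant.append(i)
        else:
            break  # once one adjusted P-value fails, all larger ones fail too
    return significant


def fisher_combined(p_values):
    """P(chi-square with 2k df > -2 * sum(ln P)), closed form for even df."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # x / 2
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_z(p_values):
    """Z_i = Phi^-1(P_i), Z = sum(Z_i) / sqrt(k), combined P = Phi(Z)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(p) for p in p_values) / math.sqrt(len(p_values))
    return nd.cdf(z)


def exact_binomial(p_values, alpha=0.05):
    """P(at least the observed number of P <= alpha among k independent tests)."""
    k = len(p_values)
    s = sum(1 for p in p_values if p <= alpha)
    return sum(math.comb(k, j) * alpha ** j * (1 - alpha) ** (k - j)
               for j in range(s, k + 1))
```

For example, `sequential_bonferroni([0.01, 0.04, 0.03, 0.5])` keeps only the first test: $0.01 \times 4 = 0.04$ stays below 0.05, but $0.03 \times 3 = 0.09$ does not, which stops the procedure.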

6. Multivariate analyses: FCA and PCA
Example: PCA of tick populations from black-legged kittiwakes, common guillemots and Atlantic puffins: PC1 (48% of inertia, P < 0.001), PC2 (21% of inertia, P < 0.001). Assignment tests: Atlantic puffin 95%, black-legged kittiwake 82%, common guillemot 89%.

7. Exploration of a hidden structure: FCA; Bayesian methods for inferring the structure of populations; Structure, BAPS, Flock and many others.
