Genealogies of time structured data, an application on cave bear ancient DNA. Frantz Depaulis (UMR 7625, Laboratoire d’écologie, Paris 6/ENS); Ludovic Orlando and Catherine Hänni (UMR 5534, Centre de Génétique Moléculaire et Cellulaire, Université Claude Bernard, Lyon I).
Outline of the presentation
• Introduction: Gene genealogies
• Results
  - 1. Simulation exploratory results
  - 2. Cave bear application
• Conclusions
-Coalescence- Wright-Fisher neutral model: assumptions
• Selective neutrality (Ne s << 1)
• Demography
  - Isolated panmictic population
  - Constant size N
  - Poisson distribution of offspring, P(1)
  - Same sampling time
• Mutational, sequence data: infinite site model (ISM)
  - No recombination
  - Independent mutations
  - Constant mutation rate µ, along the sequence and across time
  - Each mutation affects a new nucleotide site
-Coalescence- Genealogy of a gene sample
[Figure: genealogy of a gene sample. Labels: gene sample, ancestral lineage, coalescence = common ancestor, most recent common ancestor (MRCA).]
-Coalescence- Coalescent
[Figure: coalescent tree of a sample of “genes” / of individuals (a to f). Labels: common ancestor (CA), most recent common ancestor of the sample (MRCA), neutral mutations placed on the branches.]
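-Coalescence- Constructing coalescents
1°) Ages of the nodes
• Each coalescence time (t1 to t5 for the sample a, b, c, d, e, f) is drawn from Exp(p)
• With n lineages: $p = \frac{n(n-1)/2}{2N}$; for the last interval t5 (two lineages): $p = 1/2N$
• Additional assumption: n << N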
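The step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `coalescent_times` and its arguments are hypothetical, and time is counted in generations, using the slide's rate p = (n (n - 1)/2)/2N for each interval.

```python
import random

def coalescent_times(n, N, rng=random):
    """Draw the ages of the n - 1 nodes of a coalescent (in generations).

    Hypothetical helper: assumes n << N and a constant population size N.
    """
    ages = []
    t = 0.0
    for k in range(n, 1, -1):              # k lineages still present
        p = (k * (k - 1) / 2) / (2 * N)    # coalescence rate with k lineages
        t += rng.expovariate(p)            # waiting time ~ Exp(p), mean 1/p
        ages.append(t)
    return ages                            # ages[-1] is the age of the MRCA

ages = coalescent_times(n=6, N=10_000, rng=random.Random(1))
```

With n = 6 the function returns five increasing ages, one per node of the tree.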
-Coalescence- Constructing-deconstructing coalescents
2°) Topology of the tree
3°) Uniform distribution of mutations
• Repeated 100 000 times: neutral distribution of sequence polymorphism
[Figure: coalescent tree of the gene sample (a to f) with node ages t1 to t5, common ancestors (CA), the MRCA and neutral mutations on the branches.]
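The three steps (node ages, topology, mutations) can be combined into one small simulation. The sketch below is an assumption-laden illustration rather than the program used for the slides: `simulate_sample` is a hypothetical name, time is in units of 2N generations, and the number of mutations S is fixed and spread over branches in proportion to their length, which is one common way to read "uniform distribution of mutations".

```python
import random

def simulate_sample(n, S, rng):
    """Return n haplotypes (lists of 0/1) carrying S mutations (infinite site model)."""
    # Each lineage is (time at which it starts, list of descendant genes).
    lineages = [(0.0, [i]) for i in range(n)]
    branches = []                                 # (length, descendant genes)
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)     # 1) age of the next node
        a, b = rng.sample(range(k), 2)            # 2) topology: random pair merges
        merged = []
        for idx in (a, b):
            start, desc = lineages[idx]
            branches.append((t - start, desc))    # close the two merging branches
            merged += desc
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)]
        lineages.append((t, merged))
    # 3) mutations: each one falls on a branch with probability ~ branch length
    lengths = [length for length, _ in branches]
    haplotypes = [[0] * S for _ in range(n)]
    for site, (_, desc) in enumerate(rng.choices(branches, weights=lengths, k=S)):
        for gene in desc:
            haplotypes[gene][site] = 1            # derived state in all descendants
    return haplotypes

haps = simulate_sample(n=6, S=8, rng=random.Random(0))
```

Calling the function many times (the slide mentions 100 000 replicates) and computing a statistic on each replicate gives its neutral distribution.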
-Coalescence- Haplotype tests: simulations*
• Parameters‡: S = 8, n = 6; 10 000 simulations
• Haplotype number K (simulated examples: K = 4, K = 5, K = 6)
• Haplotype diversity $H = 1 - \sum_i f_i^2$ (simulated examples: H = 0.72, H = 0.78, H = 0.83)
[Figure: density of the distribution of simulated H (x axis from 0.1 to 0.9), with the observed H: P = 0.03]
* Depaulis and Veuille MBE 1998 ‡ Hudson 1993
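The two haplotype statistics are simple to compute from a set of sequences. The following sketch assumes that f_i is the sample frequency of haplotype i, with no sample-size correction, as the formula on the slide suggests; the function name and the toy sequences are made up for illustration.

```python
from collections import Counter

def haplotype_stats(seqs):
    """Return (K, H): number of distinct haplotypes and haplotype diversity."""
    n = len(seqs)
    counts = Counter(seqs)                       # occurrences of each haplotype
    K = len(counts)
    H = 1 - sum((c / n) ** 2 for c in counts.values())
    return K, H

# Toy sample of n = 6 sequences: one haplotype seen twice, four seen once.
K, H = haplotype_stats(["AC", "AC", "AT", "GT", "GC", "GG"])
```

For this toy sample K = 5 and H = 1 - 8/36, which rounds to 0.78, one of the values displayed on the slide. Comparing an observed H with the values obtained on simulated samples gives the P value of the test.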
-Coalescence- Alignment of polymorphic sites: frequencies of mutations
n = 7, S = 15
GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT
GCGCGCGAACCCATT outgroup
121531416121423 frequencies
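The frequency row can be recomputed from the alignment by counting, at each site, the sequences that differ from the outgroup. A minimal sketch, assuming that the outgroup carries the ancestral state at every site (`derived_counts` is a hypothetical helper name):

```python
def derived_counts(seqs, outgroup):
    """Number of sequences carrying a non-outgroup (derived) state at each site."""
    return [
        sum(1 for s in seqs if s[site] != ancestral)
        for site, ancestral in enumerate(outgroup)
    ]

sample = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]
freqs = derived_counts(sample, "GCGCGCGAACCCATT")
# freqs == [1, 2, 1, 5, 3, 1, 4, 1, 6, 1, 2, 1, 4, 2, 3], the row shown on the slide
```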
-Coalescence- Frequency spectrum of mutations & neutrality tests
• Number of polymorphic sites S; $\theta = 4 N_e \mu$
• $f_i$: number of occurrences in a sample
• Tajima's D (Tajima Genetics 1989)
• Fu and Li's D* (Fu and Li Genetics 1993)
• $H = \theta_\pi - \theta_H = 0$ in expectation under neutrality (Fay and Wu Genetics 2000)
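As an illustration of how these tests use the frequency spectrum, the sketch below computes Tajima's D and the unnormalised Fay and Wu H from the per-site counts of derived mutations. It follows the standard published formulas; the function name and the input format (one derived count per polymorphic site) are assumptions made for this example.

```python
import math

def tajima_d_and_fay_wu_h(counts, n):
    """counts: derived-allele count at each polymorphic site; n: sample size."""
    S = len(counts)
    pairs = n * (n - 1) / 2
    theta_pi = sum(f * (n - f) for f in counts) / pairs   # mean pairwise differences
    theta_h = sum(f * f for f in counts) / pairs          # weights high frequencies
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares theta_pi with Watterson's estimate S / a1, scaled by its s.d.
    d = (theta_pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return d, theta_pi - theta_h

d_rare, h_rare = tajima_d_and_fay_wu_h([1] * 10, n=10)   # only singletons
d_mid, _ = tajima_d_and_fay_wu_h([5] * 10, n=10)         # only intermediate frequencies
```

An excess of rare variants gives a negative D (`d_rare`), an excess of intermediate frequency variants a positive D (`d_mid`).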
Mitochondria, correlation LD/distance: recombination or mutational effects?
• r² decreases with distance d: r² = ↘(d)
• Pearson’s statistic tested by permutations of sites
Awadalla et al. (Science 1999)
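A permutation test of this kind can be sketched as follows. The code is only an illustration under stated assumptions: sites are biallelic and coded 0/1, positions are shuffled among sites, and the P value is one-sided (proportion of permutations with a correlation at least as negative as the observed one). All function names are hypothetical.

```python
import random
from itertools import combinations

def r2(x, y):
    """Squared allelic correlation between two polymorphic 0/1 sites."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    d = sum(a * b for a, b in zip(x, y)) / n - px * py
    return d * d / (px * (1 - px) * py * (1 - py))

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def ld_distance_test(sites, positions, n_perm, rng):
    """Pearson correlation between pairwise r2 and distance, with permutation P value."""
    pairs = list(combinations(range(len(sites)), 2))
    ld = [r2(sites[i], sites[j]) for i, j in pairs]

    def corr(pos):
        return pearson(ld, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = corr(positions)
    perm = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)                 # reassign positions to sites at random
        if corr(perm) <= observed:
            hits += 1
    return observed, hits / n_perm

toy_sites = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1],
             [0, 1, 0, 1, 0, 1], [1, 1, 0, 0, 1, 0]]
obs, p_value = ld_distance_test(toy_sites, [0, 10, 20, 30], 200, random.Random(4))
```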
-Coalescence- Time structured data & genealogies
- Parasites during disease evolution (virus…)
- Microbial experimental evolution
- Ancient DNA
• Issues:
  - To what extent are the analyses affected by time structure?
  - How to correct for this?
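- Simulations- Algorithm for time structured coalescent
[Figure: genealogy of a recent subset (d, e, f; n = 3) joined at time t1 by an older subset (a, b, c; n1 = 3); the current number of lineages n is indicated along the tree (from n = 2 to n = 5).]
The exponential law is memoryless!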
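One way to exploit the memoryless property is sketched below for two sampling times. This is an assumption about how such an algorithm can be written, not a transcription of the authors' code: if the waiting time drawn with the current number of lineages would overshoot t1, it is discarded, the clock is set to t1, the n1 older genes are added and a new waiting time is drawn. Names are hypothetical and time is in units of 2N generations.

```python
import random

def serial_coalescent_ages(n0, n1, t1, rng):
    """Node ages with n0 genes sampled at time 0 and n1 older genes sampled at t1."""
    ages = []
    t, k = 0.0, n0
    added = False                     # have the older genes joined the tree yet?
    while k > 1 or not added:
        # With a single lineage no coalescence is possible: wait indefinitely.
        wait = rng.expovariate(k * (k - 1) / 2) if k > 1 else float("inf")
        if not added and t + wait > t1:
            # Memoryless: drop the overshooting draw and restart from t1
            # with the enlarged number of lineages.
            t, k, added = t1, k + n1, True
            continue
        t += wait
        k -= 1                        # one coalescence
        ages.append(t)
    return ages

serial_ages = serial_coalescent_ages(n0=3, n1=3, t1=0.2, rng=random.Random(2))
```

The function returns n0 + n1 - 1 ages; at most n0 - 1 of them can be younger than t1.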
- Simulations- Age structure effect on gene genealogies
[Figure: three genealogies, with an older subset (n1 = 4) sampled at time t1.]
• Contemporaneous sample
• Limited time structure: excess of rare variants, deficit of LD
• Two subsets with large time spacing: deficit of rare variants, excess of LD, differentiation
- Simulations- Effect of subset size on statistical tests: mean
t1 = 0.2 Ne generations; n = 40, S = 20
[Figure: mean of each statistic (y axis, -0.6 to 1.2) as a function of n1/n (x axis, 0 to 1). Series: Dt, D*fl, Hfw, ZnS, K, H, Pearson, Fst, pi/pi0, S/S0.]
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1 corresponding to θ); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
- Simulations- Effect of subset size on statistical tests: significance rate
t1 = 0.2 Ne generations; n = 40, S = 20
[Figure: significance rate (y axis, 0 to 0.15) as a function of n1/n (x axis, 0 to 1). Series: Dt_inf, D*fl_inf, Hfw_inf, ZnS_inf, K_sup, H_sup, Fst.]
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
- Simulations- Effect of a half subset age on statistical tests: mean
n1 = n/2
[Figure: mean of each statistic (y axis, -1 to 3) as a function of t1 in 2Ne generations (x axis, log scale, 0.001 to 10). Series: Dt, D*fl, Hfw, K, H, ZnS, Pearson, Fst, Pi/Theta0, S/S0.]
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1 corresponding to θ); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
- Simulations- Effect of a half subset age on statistical tests: significance rates
n1 = n/2
[Figure: significance rate (y axis, 0 to 0.35) as a function of t1 in 2Ne generations (x axis, log scale, 0.001 to 10). Series: Dt_inf, Dt_sup, D*fl_inf, D*fl_sup, ZnS_inf, K_sup, H_sup, Fst.]
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
- Application- Cave bear: Ursus spelaeus (12-300 kYA)
- Application- Sampling sites
- Application- Alignment of polymorphic sites: D-loop of cave bear
REF TTGTCAACTT TCGAATTGAA GT
#NOASC3500_40-45 ..A....T.C ..A....... ..
#NOASC3800_40-45 ..A....T.C ..A....... ..
#NOASC85F16_40-45 .......... .......... ..
#NOASC95456_40-45 ..A....T.C ..A....... ..
#NOASC92386_40-45 ..A....T.C ..A....... ..
#NOASC92413_40-45 C.A....T.C ..A....... ..
#NOASC92152_40-45 C.A....T.C ..A....... A.
#NOASC5300_50-60 ..A....T.C ..A....... ..
#NOASC11600_80 .......... .......... ..
#NOASC12500_80 .......... .......... ..
#NOASC13800_80 .......... .......... ..
#NOASC100801_80 .......... .......... ..
#NOASC12400_80 ..A....T.C ..A....... ..
#NOASC11800_80 .CA....T.C ..A.G..... ..
#NOASC11700_80 C.A....T.C ..A....... A.
#NOASC84E16_90-130 C.A....T.C ..A....... ..
#NOASC84G19_90-130 C.A....T.C ..A....... ..
#NOASCbrC5-02_90-130 C.A....T.C ..A....... ..
#NOASC15400_90-130 C.A....T.C ..A......G ..
#NOASC15700_90-130 ....T.G.C. .TA..C..G. ..
#NOATAB2_40 .......... .......... ..
#NOAGrotteMerve_? .......... .T........ ..
#NOAAZE_80-130 .......... .......... .C
#NOAGigny189F3_? ..A....T.C ..A....... ..
#NOAJAL104_? C.A....T.C ..A....... ..
#NOATAB15_25-35 ..A......C ..A....... ..
#NOAGailenreuth_? ..A......C ..A....... ..
#NOA47910_30 ..A....T.C ..A....A.. ..
#NOAHohleFels_? ..A....T.C ..A..C.... ..
#NOACLA_35 ..A....T.C C.A....... ..
#NOACLB_35 ..A....T.C C.A....... ..
#NOAChiemsee_35 ..A..G.... ..A...C... ..
#NOARamesch1_? ..A..G.... ..A...C... ..
#NOARamesch2_? ..A..G.... ..A...C... ..
#NOAGeissenklt1_? ...CT..... .T.G.C.... ..
#NOAGeissenklt2_? ...CT..... .T.G.C.... ..
#NOANixloch_? ...CT..... .T...C.... ..
--------------------------------------------- Alp barrier
#SOAPoto_? ...CT..... .T...C.... ..
#SOAVind1_? ...CT..... .T...C.... ..
#SOAVind2_? ...CT..... .T...C.... ..
#SOAConturi_? .......T.. .......... ..
n = 41, S = 22, Ne = 13 000
(Loreille et al. 2001) (Orlando et al. 2002) (Hofreiter et al. 2002) (Kühn et al. 2001)
- Application- Neutrality tests, Belgium cave
Scladina, n = 20, S = 15
Statistic       Dt             D*fl           Hfw            K           H            ZnS          Pearson(a)
Observed        -0.82          -1.55          -1.32          7           0.79         0.24         -0.39 (2.8*)
No time structure
P value (%)     21.0           5.3            18.4           16.4        37.7         43.7         2.8*
Mean            0.06           -0.05          0.30           8.3         0.79         0.26         0.00
CI              [-1.42;1.51]   [-1.89;1.18]   [-4.46;2.62]   [5;11]      [0.64;0.88]  [0.10;0.55]  [-0.25;0.20]
% rejected      (4.9;5.5)      (5.2;2.8)      (5.4;4.8)      (1.7;3.9)   (4.9;4.6)    (5.5;5.1)    (5.0;/)
Average time structure
P value (%)     30.0           8.8            17.2           8.6         31.2         31.7         2.7*
Mean            -0.30          -0.38          0.39           9.1         0.80         0.22         0.00
CI              [-1.56;1.26]   [-1.89;0.84]   [-4.04;2.56]   [6;12]      [0.66;0.89]  [0.08;0.47]  [-0.29;0.23]
% rejected      (7.8;3.0)      (8.2;1.0)      (4.2;3.7)      (0.8;9.5)   (3.3;7.8)    (11.5;2.9)   (4.9;/)
Uncertainty in time structure
P value (%)     30.0           8.6            17.4           7.9         30.9         31.9         2.8*
Mean            -0.33          -0.42          0.37           9.1         0.80         0.22         0.00
CI              [-1.59;1.18]   [-1.89;0.84]   [-4.20;2.54]   [6;12]      [0.66;0.89]  [0.08;0.48]  [-0.29;0.24]
% rejected      (9.3;2.8)      (9.3;0.8)      (4.5;3.6)      (0.7;9.8)   (3.7;7.5)    (11.6;2.8)   (4.8;/)
(a) permutation test
- Application- Neutrality tests, dated subsample
All dated, n = 27, S = 20
Statistic       Dt             D*fl           Hfw            K           H            ZnS          Pearson(a)
Observed        -1.21          -2.28          -0.69          12          0.86         0.14         -0.27 (11.4)
No time structure
P value (%)     10.5           0.6**          25.7           16.5        32.1         24.3         11.5
Mean            -0.09          -0.08          0.29           10.3        0.82         0.23         0.00
CI              [-1.49;1.50]   [-1.98;1.32]   [-5.66;3.18]   [7;14]      [0.69;0.90]  [0.09;0.48]  [-0.19;0.16]
% rejected      (5.0;5.2)      (3.6;1.4)      (5.3;4.7)      (4.0;2.8)   (5.3;4.7)    (5.7;5.0)    (4.7;/)
Average time structure
P value (%)     17.7           1.7*           24.3           38.2        42.6         41.8         11.2
Mean            -0.42          -0.59          0.35           11.8        0.84         0.18         0.00
CI              [-1.69;1.11]   [-2.28;0.72]   [-5.34;2.98]   [8;15]      [0.71;0.91]  [0.07;0.39]  [-0.23;0.20]
% rejected      (9.3;2.1)      (6.9;0.3)      (4.7;2.6)      (1.2;11.1)  (3.4;9.5)    (13.7;2.4)   (4.9;/)
Uncertainty in time structure
P value (%)     18.5           1.9*           23.4           39.9        43.2         41.1         11.9
Mean            -0.44          -0.61          0.37           11.8        0.84         0.18         0.00
CI              [-1.70;1.09]   [-2.28;0.72]   [-5.23;2.99]   [8;16]      [0.71;0.91]  [0.07;0.40]  [-0.24;0.19]
% rejected      (9.3;2.4)      (7.0;0.2)      (4.6;2.7)      (1.2;11.7)  (3.5;9.7)    (14.1;2.5)   (5.4;/)
(a) permutation test
- Application- Neutrality tests, total sample
n = 41, S = 22
Statistic       Dt             D*fl           Hfw            K           H            ZnS          Pearson(a)     Fst(a)
Observed        -0.45          -0.88          1.35           17          0.91         0.10         -0.09 (22.0)   0.32 (0.4**)
No time structure
P value (%)     37.1           14.7           47.1           1.7*        3.7*         18.1         21.5           0.4**
Mean            -0.09          -0.09          0.30           12.3        0.83         0.19         0.00           -0.03
CI              [-1.44;1.52]   [-1.85;1.38]   [-5.84;3.15]   [8;16]      [0.70;0.90]  [0.07;0.41]  [-0.20;0.17]   [-0.38;0.27]
% rejected      (4.5;5.3)      (4.1;1.1)      (4.8;4.7)      (3.0;4.3)   (4.8;4.9)    (5.5;4.6)    (4.8;/)        (/;4.6)
Average time structure
P value (%)     45.5           35.6           45.6           7.8         5.5          36.6         21.8           1.3*
Mean            -0.45          -0.74          0.32           13.9        0.84         0.15         0.00           -0.01
CI              [-1.71;1.10]   [-2.49;0.73]   [-5.38;2.93]   [9;18]      [0.71;0.91]  [0.05;0.34]  [-0.23;0.20]   [-0.40;0.38]
% rejected      (10.2;2.2)     (10.7;0.1)     (4.2;2.4)      (0.8;16.1)  (4.3;7.9)    (15.2;2.2)   (4.9;/)        (/;8.9)
Uncertainty in time structure
P value (%)     42.1           40.7           44.9           10.3        6.2          39.2         21.8           1.7*
Mean            -0.54          -0.90          0.26           14.3        0.84         0.14         0.00           -0.01
CI              [-1.76;0.96]   [-2.81;0.73]   [-5.70;2.90]   [10;18]     [0.71;0.91]  [0.05;0.32]  [-0.24;0.21]   [-0.40;0.41]
% rejected      (12.2;1.4)     (14.2;0.1)     (4.5;2.3)      (0.5;19.8)  (4.0;7.9)    (16.7;2.1)   (4.7;/)        (/;9.7)
(a) permutation test
- Application- LD as a function of distance
[Figure: r² (y axis, log scale, 0.01 to 1) as a function of distance in nt (x axis, 0 to 70); fitted trend with R² = 0.4174.]
Time structure, Conclusion
• Can substantially bias the results
• Even if it is within 10% of the age of the MRCA: the bottom of the tree has more branches, so the older genes miss a non-random subset of mutations (rare ones)
• Small time structure: long external branches, excess of rare variants (negative D, deficit of LD)
• Great time structure: a long internal branch, apparent differentiation, excess of intermediate frequency variants (positive D, excess of LD) if the subsets are equilibrated in size
Acknowledgements • CNRS • Nick Barton