Meiotic gene conversion in humans: rate, sex ratio, and GC bias. Amy L. Williams. June 19, 2013. University of Chicago.
Gene conversion defined • Meiosis: produces haploid germ cells with recombinations • Gene conversion: a short segment copied into a given chromosome from the other homolog • [Figure: the two types of meiotic recombination, crossover and gene conversion]
Study question 1: gene conversion rate? • Number of gene conversions per meiosis? • 4-15× # crossovers? Jeffreys and May (2004) • Length of gene conversion tracts? • 55-290 bp? Jeffreys and May (2004) • Per base-pair rate? Fraction of genome affected • R = (number × tract length) / genome length • 2.2×10⁻⁶ to 4.4×10⁻⁵? Jeffreys and May (2004)
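The per base-pair rate formula can be checked with a small Python sketch. The per-meiosis crossover count (30) and genome length (3×10⁹ bp) below are illustrative assumptions and do not come from the slides. With those values, the Jeffreys and May (2004) bounds on conversion count and tract length land close to the quoted range.

```python
def per_bp_rate(num_conversions: float, tract_length_bp: float,
                genome_length_bp: float) -> float:
    """R = (number of gene conversions x tract length) / genome length."""
    return num_conversions * tract_length_bp / genome_length_bp


# Assumed, illustrative inputs (NOT from the slides).
CROSSOVERS_PER_MEIOSIS = 30
GENOME_LENGTH_BP = 3e9

# Lower bound: 4x crossovers with 55 bp tracts; upper bound: 15x with 290 bp tracts.
low = per_bp_rate(4 * CROSSOVERS_PER_MEIOSIS, 55, GENOME_LENGTH_BP)
high = per_bp_rate(15 * CROSSOVERS_PER_MEIOSIS, 290, GENOME_LENGTH_BP)
print(f"{low:.2e} to {high:.2e}")  # 2.20e-06 to 4.35e-05
```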
Study question 2: male vs. female rate? • Gender differences in rate? • Crossovers: female rate 1.78× male (deCODE)
Study questions 3 & 4: GC bias? Localization? • GC bias observed in allelic transmissions? • Crossover hot spots influence location? • Locations of gene conversions independent in a given meiosis? Myers et al., Science 2005
Summary: study questions • Genome-wide de novo gene conversion rate? • Different rate between males/females? • Extent of GC bias in tracts? • Localization: Hotspots? Tracts independent?
Outline • Background / study questions • Study design and methods • Results • SNP chip data • Sequence data
Approaches to identify gene conversions • Linkage disequilibrium (LD) based: can give a rate estimate, but averaged over human history and both genders • Sperm-based: many meiotic products give per-individual estimates, but single-molecule, genome-wide assays are difficult • Pedigree-based: de novo, per-gender events are observable, but data for many samples are required
Study design: SNP chip data for pedigrees • Primary analysis: pedigree SNP chip data • Challenge: tracts are short • Each tract is covered by ≤ 1 SNP • Not all tracts are detected, but we can still obtain an overall rate • Chip data give a per base-pair rate • R = # gene conversions / # informative sites
Datasets for analysis • Mexican American pedigrees • Data source 1: San Antonio Family Studies • 2,490 genotyped samples, 80 pedigrees • SNP chip genotypes (Illumina 1M, 660k) • Can estimate de novo gene conversion rate • Data source 2: T2D-GENES Consortium • 607 sequenced samples, 20 pedigrees • Whole genome sequence (Complete Genomics) • Can examine tract length, distribution, etc. • Though this requires deep data on a single family
Study design: SNP chip data for pedigrees • Pedigree-based haplotypes/phase reveal recombinations • Heterozygous sites: informative for recombination • Phasing method: Hapi • Phases nuclear families • Williams et al., Genome Biol. 2010
Family-based phase reveals recombinations • Hapi output: paternal haplotype transmissions • [Figure: transmissions of Haplotype 1 and Haplotype 2 illustrating a crossover and a gene conversion]
Other pedigree phasing methods • Most pedigree phasing methods are slow • Runtime complexity for phasing ~O(m·2²ⁿ) • n = # non-founders • m = # markers • Example: nuclear family with 11 children • 2²² = 4,194,304 states per marker • Can merge exponential class of states • Many states extremely unlikely to be optimal
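The 2²ⁿ factor comes from counting inheritance patterns. Each of the n non-founders makes a paternal and a maternal transmission choice at every marker, so there are 2²ⁿ joint patterns. The sketch below uses hypothetical helper names and reproduces the 11-child example. Reading the states as inheritance vectors is an assumption, though it is consistent with the number on the slide.

```python
def inheritance_states_per_marker(num_nonfounders: int) -> int:
    """Number of joint inheritance patterns at one marker.

    Each non-founder inherits one of two paternal and one of two maternal
    haplotypes: 2 binary choices per non-founder, so 2**(2n) states.
    """
    return 2 ** (2 * num_nonfounders)


def naive_phasing_cost(num_markers: int, num_nonfounders: int) -> int:
    """Rough O(m * 2**(2n)) count of states an exhaustive method would visit."""
    return num_markers * inheritance_states_per_marker(num_nonfounders)


print(inheritance_states_per_marker(11))  # 4194304
```

State-space reduction of the kind described next (merging and pruning states) aims to avoid enumerating this full set at every marker.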
Hapi: efficient phasing of nuclear families • Hapi: state space reduction improves efficiency • Merges exponential class of states • Omits states that cannot yield optimal solution • Applied to family with 11 children • Average per marker states: 4.2, maximum 48 * Superlink failed to analyze 11 child family; 8/11 children used
Applying Hapi to multi-generational pedigrees • Hapi currently applies to nuclear families • For 3-generation pedigrees analyzed for gene conversions, omit sites with phase conflicts • Will not bias results, but data are reduced • Hapi could be extended to efficiently analyze arbitrarily large pedigrees • Most San Antonio Family Studies pedigrees are too large to be phased in practical time
Approach to identifying gene conversions • Perform QC, phase 3-generation pedigrees • Find gene conversions in 2nd generation: single-SNP double crossovers • Confirm: • Gene-converted allele in 3rd generation • Other allele in 2nd generation sibling(s) • False positive only if ≥ 2 genotyping errors
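The detection step can be sketched as a scan over the transmitted-haplotype calls for one meiosis. The function name and the 0/1 haplotype encoding are hypothetical. The confirmation checks in the 3rd generation and in siblings are not modeled here.

```python
from typing import List, Sequence


def single_snp_double_crossovers(transmitted: Sequence[int]) -> List[int]:
    """Flag candidate gene conversions in one meiosis (hypothetical helper).

    `transmitted[i]` is the parental haplotype (0 or 1) a child received at the
    i-th informative (heterozygous) marker, in genomic order. A candidate is a
    marker whose haplotype differs from both neighbours while the neighbours
    agree with each other: a double crossover spanning exactly one SNP.
    """
    candidates = []
    for i in range(1, len(transmitted) - 1):
        left, here, right = transmitted[i - 1], transmitted[i], transmitted[i + 1]
        if left == right and here != left:
            candidates.append(i)
    return candidates


# Index 3 is a one-SNP switch; the switch at index 6 is an ordinary crossover.
print(single_snp_double_crossovers([0, 0, 0, 1, 0, 0, 1, 1, 1]))  # [3]
```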
Current analysis dataset • Analyzed SNP chip data for 16 pedigrees • Data for both parents, 3+ children, 1+ grandchild • 190 samples • 42 meioses (21 paternal, 21 maternal) • 4.15×10⁶ informative sites
Result 1: 33 putative gene conversions, rate • Rate: 7.95×10⁻⁶/bp/generation • Within range of Jeffreys and May (2004) • Close to LD-based estimates • Are these real gene conversions? • [Figure: putative gene conversions, female and male]
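Plugging the counts from the current analysis dataset into the chip-based formula R = # gene conversions / # informative sites recovers the reported rate. This assumes, as the arithmetic suggests, that the 4.15×10⁶ informative sites are pooled over all 42 meioses.

```python
def per_bp_rate_from_chip(num_conversions: int, informative_sites: float) -> float:
    """R = # gene conversions / # informative sites."""
    return num_conversions / informative_sites


rate = per_bp_rate_from_chip(33, 4.15e6)
print(f"{rate:.2e}")  # 7.95e-06
```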
T2D-GENES sequence confirms events • 19 sites sequenced by T2D-GENES Consortium • 18/19 gene conversion genotypes verified • Differing site looks like a sequencing artifact • 2nd generation recipient has a genotype mismatch; 3rd generation grandchild shows the same genotype • If the sequence data are correct, the gene conversion is in the grandchild
Result 2: gene conversion rates by gender • More female gene conversions than male • Females transmit 1.54× as many gene conversions as males • Difference not (yet) significant; larger sample coming • Different rates expected based on crossovers • Female crossover rate 1.78× male (deCODE)
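With only 33 events, a ratio of about 1.5 is compatible with chance. The sketch below runs an exact two-sided binomial (sign) test. The 20 female / 13 male split is hypothetical and was chosen only to give a ratio near 1.54, because the slides do not report per-sex counts. The test also assumes equal numbers of informative sites in both sexes.

```python
from math import comb


def two_sided_binomial_p(female: int, male: int) -> float:
    """Exact two-sided test of H0: each event is equally likely female or male."""
    n = female + male
    k = max(female, male)
    # P(X >= k) under Binomial(n, 0.5); the null is symmetric, so double it.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Hypothetical counts, NOT data from the study.
p = two_sided_binomial_p(20, 13)
print(f"ratio {20 / 13:.2f}, P = {p:.2f}")
```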
Result 3: gene conversions localize in hotspots • 2.71% of genome in ≥10 cM/Mb hotspots • 10/33 gene conversions at sites with ≥10 cM/Mb: P = 1.1×10⁻⁸
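A one-sided binomial test gives a value matching the reported P. The test treats each of the 33 conversions as landing in a hotspot with probability equal to the hotspot share of the genome. Whether this is the test behind the slide is an assumption. It also ignores factors such as uneven SNP density across hotspots.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 10 of 33 conversions fall in hotspots covering 2.71% of the genome.
p_hotspot = binomial_upper_tail(10, 33, 0.0271)
print(f"P = {p_hotspot:.1e}")  # P = 1.1e-08
```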
Result 4: observe extreme GC bias • 31 GC-informative sites • A/C, A/G, T/C, T/G • GC transmission in 74% of cases (95% CI 59% to 90%) • GC bias likely (P = 5.3×10⁻³)
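The reported figures match 23 of 31 GC transmissions. That count is inferred from 74% of 31 and is not stated on the slide. They also match a normal-approximation (Wald) 95% interval and a one-sided exact binomial test against 50%. The choice of interval and test is an assumption that happens to reproduce the slide's numbers.

```python
from math import comb, sqrt


def gc_bias_summary(gc_transmitted: int, informative: int, z: float = 1.96):
    """Fraction of GC transmissions, Wald 95% CI, and one-sided exact binomial P."""
    frac = gc_transmitted / informative
    half_width = z * sqrt(frac * (1 - frac) / informative)
    # P(X >= gc_transmitted) under no bias, X ~ Binomial(informative, 0.5)
    p_value = sum(comb(informative, i)
                  for i in range(gc_transmitted, informative + 1)) / 2 ** informative
    return frac, (frac - half_width, frac + half_width), p_value


# 23/31 is the only count out of 31 that rounds to 74% (inferred, not reported).
frac, (lo, hi), p_value = gc_bias_summary(23, 31)
print(f"{frac:.0%} (95% CI {lo:.0%} to {hi:.0%}), P = {p_value:.1e}")
# 74% (95% CI 59% to 90%), P = 5.3e-03
```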
Sequence near chip-identified gene conversions • Sequence available for 11/33 putative sites • Shortest resolution for tract length ≤ 143 bp • Clustered gene conversions in 4 sequences • [Figure: sequence near putative sites; boxed regions confirmed by Sanger sequencing]
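Sequence data bound tract length using the positions of the converted sites and the nearest non-converted informative sites on either side. A minimal sketch of that bounding rule follows, with a hypothetical function and made-up positions. Whether the ≤ 143 bp figure was derived exactly this way is an assumption.

```python
from typing import Sequence, Tuple


def tract_length_bounds(converted: Sequence[int], left_flank: int,
                        right_flank: int) -> Tuple[int, int]:
    """Bound a gene conversion tract from informative site positions (bp).

    converted: sorted positions whose allele matches the donor homolog.
    left_flank/right_flank: nearest informative positions that are NOT converted.
    The tract must cover every converted site and cannot reach either flank.
    """
    if not converted:
        raise ValueError("need at least one converted site")
    if not left_flank < converted[0] <= converted[-1] < right_flank:
        raise ValueError("flanking sites must lie outside the converted sites")
    min_len = converted[-1] - converted[0] + 1
    max_len = right_flank - left_flank - 1
    return min_len, max_len


print(tract_length_bounds([1_000, 1_040], 950, 1_100))  # (41, 149)
```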
Relationship to complex crossover? • [Figure: Haplotype 1 and Haplotype 2]
Conclusions • Estimate of de novo gene conversion rate • 7.95×10⁻⁶/bp/generation • Females: 1.54× as many gene conversions as males • Enriched in hotspots: suggests a mechanism similar to crossover • GC vs. AT allele transmitted ~3:1: GC bias • Complex/clustered gene conversions observed in sequence data • Suggests events are correlated within a short region
Acknowledgements • The T2D-GENES Consortium (NIDDK) • San Antonio Family Studies (NIDDK, NIMH) • NHGRI NRSA Fellowship • Tom Dyer • Giulio Genovese • Kati Truax • Nick Patterson • John Blangero • David Reich