Genomic Evaluation with Many More Genotypes and Phenotypes
Topics • Methods to combine different marker densities and datasets • More markers: 500,000 simulation • More animals: 3,000 marker subset • More breeds: multi-trait markers • More traits: same genotype cost
Methods to Trace Inheritance • Few markers • Pedigree needed • Prob (paternal or maternal alleles inherited) computed within families • Many markers • Can find matching DNA segments without pedigree • Prob (haplotypes are identical) mostly near 0 or 1 if segments contain many markers
Haplotype Probabilities with Few Markers (12 SNP / chromosome)
Haplotype Probabilities with More Markers (50 SNP / chromosome)
Haplotyping Program findhap.f90 • Begin with population haplotyping • Divide chromosomes into segments, ~250 SNP / segment • List haplotypes by genotype match • Similar to FastPhase, IMPUTE, or long range phasing • End with pedigree haplotyping • Detect crossover, fix noninheritance • Impute nongenotyped ancestors
Recent Program Revisions • Improved imputation and reliability • Changes since January 2010 • Use known haplotype if second is unknown • Use current instead of base frequency • Combine parent haplotypes if crossover is detected • Begin search with parent or grandparent haplotypes • Store 2 most popular progeny haplotypes • Simulated crossover rate increased
Coding of Alleles and Segments • Genotypes • 0 = BB, 1 = AB or BA, 2 = AA • 3 = B_, 4 = A_, 5 = __ (missing) • Allele frequency used for missing • Haplotypes • 0 = B, 1 = not known, 2 = A • Segment inheritance (example) • Son has haplotype numbers 5 and 8 • Sire has haplotype numbers 8 and 21 • Son got haplotype number 5 from dam
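The coding above can be sketched in a few lines of Python. This is only an illustration, not code from findhap.f90: the names `GENOTYPE_CODES`, `HAPLOTYPE_CODES`, and `maternal_haplotype` are hypothetical, and the sketch assumes each animal's segment is summarized by two haplotype numbers, as in the son and sire example.

```python
# Genotype codes from the slide: 0-2 are complete genotypes,
# 3-5 are genotypes with one or both alleles missing.
GENOTYPE_CODES = {"BB": 0, "AB": 1, "BA": 1, "AA": 2, "B_": 3, "A_": 4, "__": 5}

# Haplotype allele codes from the slide.
HAPLOTYPE_CODES = {"B": 0, "unknown": 1, "A": 2}


def maternal_haplotype(son, sire):
    """Return the son's haplotype number that did not come from the sire.

    `son` and `sire` are pairs of haplotype numbers for one segment.
    Returns None when the parental origin cannot be decided (hypothetical rule).
    """
    a, b = son
    a_in_sire, b_in_sire = a in sire, b in sire
    if a_in_sire and not b_in_sire:
        return b
    if b_in_sire and not a_in_sire:
        return a
    if a == b and a_in_sire:
        return a  # two copies of the same haplotype: one from each parent
    return None  # neither matches the sire, or both match different sire haplotypes


# Slide example: son has 5 and 8, sire has 8 and 21, so 5 came from the dam.
print(maternal_haplotype((5, 8), (8, 21)))
```

Returning None for undecidable cases is a design choice of this sketch; the slides do not say how the program treats them.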
Most Frequent Haplotypes: 1st segment of chromosome 15
Rank  Frequency  Haplotype
 1  5.16%  022222222020020022002020200020000200202000022022222202220
 2  4.37%  022020220202200020022022200002200200200000200222200002202
 3  4.36%  022020022202200200022020220000220202200002200222200202220
 4  3.67%  022020222020222002022022202020000202220000200002020002002
 5  3.66%  022222222020222022020200220000020222202000002020220002022
 6  3.65%  022020022202200200022020220000220202200002200222200202222
 7  3.51%  022002222020222022022020220200222002200000002022220002220
 8  3.42%  022002222002220022022020220020200202202000202020020002020
 9  3.24%  022222222020200000022020220020200202202000202020020002020
10  3.22%  022002222002220022002020002220000202200000202022020202220
• For efficiency, store haplotypes just once • Most frequent haplotype in Holsteins had 4,316 copies = .0516 * 41,822 animals * 2 chromosomes each
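Storing each distinct haplotype once and giving every animal a pair of haplotype numbers might look like the following. This is a minimal sketch under stated assumptions: `build_library` and `expected_copies` are hypothetical names, the toy haplotypes are invented for illustration, and the actual storage layout in findhap.f90 is not described on the slides.

```python
from collections import Counter


def build_library(animal_haplotypes):
    """Store each distinct haplotype once, most frequent first.

    `animal_haplotypes` maps an animal ID to its two haplotype strings.
    Returns (library, codes, counts), where codes maps each animal to a
    pair of 1-based haplotype numbers into the library.
    """
    counts = Counter(h for pair in animal_haplotypes.values() for h in pair)
    library = [hap for hap, _ in counts.most_common()]
    number = {hap: i for i, hap in enumerate(library, start=1)}
    codes = {animal: (number[a], number[b])
             for animal, (a, b) in animal_haplotypes.items()}
    return library, codes, counts


def expected_copies(frequency, animals):
    """Copies of a haplotype = frequency * animals * 2 chromosomes each."""
    return round(frequency * animals * 2)


# Toy data (invented): three animals, three distinct haplotypes.
toy = {"cow1": ("0220", "0222"), "cow2": ("0220", "2002"), "bull1": ("0220", "0222")}
library, codes, counts = build_library(toy)
print(library, codes)

# Slide arithmetic: .0516 * 41,822 animals * 2 chromosomes each.
print(expected_copies(0.0516, 41822))
```

With the slide's numbers, `expected_copies` reproduces the 4,316 copies quoted for the most frequent Holstein haplotype.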
Population Haplotyping Steps • Put first genotype into haplotype list • Check next genotype against list • Do any homozygous loci conflict? • If haplotype conflicts, continue search • If match, fill any unknown SNP with homozygote • 2nd haplotype = genotype minus 1st haplotype • Search for 2nd haplotype in rest of list • If no match in list, add to end of list • Sort list to put frequent haplotypes 1st
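The steps above can be sketched as a short Python routine. This is a simplified illustration rather than the findhap.f90 algorithm itself: all function names are hypothetical, genotypes are assumed to use only codes 0, 1, and 2 (the partially missing codes 3 to 5 are ignored), an unmatched genotype is stored as a haplotype with its heterozygous loci marked unknown (1), and the search for the 2nd haplotype starts at the entry that matched the 1st so that animals carrying two copies of one haplotype are handled.

```python
def compatible(a, b):
    """No locus has 0 in one sequence and 2 in the other (1 = het or unknown)."""
    return all(x == y or x == 1 or y == 1 for x, y in zip(a, b))


def fill_unknown(hap, other):
    """Fill unknown loci (1) in hap with the values found in other."""
    return [o if h == 1 else h for h, o in zip(hap, other)]


def add_genotype(genotype, haps, counts):
    """Check one genotype against the list, updating haps and counts in place."""
    first = next((i for i, h in enumerate(haps) if compatible(genotype, h)), None)
    if first is None:
        # No match: add to end of list, heterozygous loci stay unknown.
        haps.append(list(genotype))
        counts.append(1)
        return
    # Match: fill any unknown SNP with the homozygous genotype.
    haps[first] = fill_unknown(haps[first], genotype)
    counts[first] += 1
    # 2nd haplotype = genotype minus 1st haplotype (2*g - h at each locus).
    second = [2 * g - h for g, h in zip(genotype, haps[first])]
    match = next((j for j in range(first, len(haps))
                  if compatible(second, haps[j])), None)
    if match is None:
        haps.append(second)
        counts.append(1)
    else:
        haps[match] = fill_unknown(haps[match], second)
        counts[match] += 1


def build_haplotype_list(genotypes):
    """Process all genotypes, then sort so frequent haplotypes come first."""
    haps, counts = [], []
    for genotype in genotypes:
        add_genotype(genotype, haps, counts)
    return sorted(zip(haps, counts), key=lambda pair: -pair[1])


# Toy genotypes (invented), four loci each.
for hap, n in build_haplotype_list([[0, 2, 2, 0], [1, 2, 1, 0], [2, 2, 0, 0], [0, 2, 2, 0]]):
    print(hap, n)
```

Sorting by count means later genotypes are compared with the most common haplotypes first, which is the efficiency motive the last step on the slide points to.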
Check New Genotype Against List: 1st segment of chromosome 15
Search for 1st haplotype that matches genotype:
       022112222011221022021110220010110212202000102020120002021
5.16%  022222222020020022002020200020000200202000022022222202220
4.37%  022020220202200020022022200002200200200000200222200002202
4.36%  022020022202200200022020220000220202200002200222200202220
3.67%  022020222020222002022022202020000202220000200002020002002
3.66%  022222222020222022020200220000020222202000002020220002022
Get 2nd haplotype by removing 1st from genotype:
       022002222002220022022020220020200202202000202020020002020
3.65%  022020022202200200022020220000220202200002200222200202222
3.51%  022002222020222022022020220200222002200000002022220002220
3.42%  022002222002220022022020220020200202202000202020020002020
3.24%  022222222020200000022020220020200202202000202020020002020
3.22%  022002222002220022002020002220000202200000202022020202220
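The search on this slide can be replayed with the genotype and the ten listed haplotypes. The sketch below is an illustration only: `conflicts` and `subtract` are hypothetical helper names, and it assumes every listed haplotype is fully known (alleles coded 0 or 2) and that the genotype has no missing codes.

```python
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"

# Haplotypes in the order listed on the slide (most frequent first).
HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",  # 5.16%
    "022020220202200020022022200002200200200000200222200002202",  # 4.37%
    "022020022202200200022020220000220202200002200222200202220",  # 4.36%
    "022020222020222002022022202020000202220000200002020002002",  # 3.67%
    "022222222020222022020200220000020222202000002020220002022",  # 3.66%
    "022020022202200200022020220000220202200002200222200202222",  # 3.65%
    "022002222020222022022020220200222002200000002022220002220",  # 3.51%
    "022002222002220022022020220020200202202000202020020002020",  # 3.42%
    "022222222020200000022020220020200202202000202020020002020",  # 3.24%
    "022002222002220022002020002220000202200000202022020202220",  # 3.22%
]


def conflicts(genotype, haplotype):
    """True if a homozygous locus (0 or 2) disagrees with the haplotype allele."""
    return any(g != "1" and g != h for g, h in zip(genotype, haplotype))


def subtract(genotype, haplotype):
    """2nd haplotype = genotype minus 1st haplotype (2*g - h at each locus)."""
    return "".join(str(2 * int(g) - int(h)) for g, h in zip(genotype, haplotype))


# 1st haplotype: first list entry with no homozygous conflict.
first = next(i for i, h in enumerate(HAPLOTYPES) if not conflicts(GENOTYPE, h))
second_hap = subtract(GENOTYPE, HAPLOTYPES[first])
# 2nd haplotype: search the rest of the list for the remainder.
second = next((j for j in range(first + 1, len(HAPLOTYPES))
               if HAPLOTYPES[j] == second_hap), None)
print(first + 1, second + 1)  # ranks in the list: 5 8
```

The first four haplotypes each conflict with a homozygous locus, so the 3.66% haplotype is the first match, and the remainder equals the 3.42% haplotype, as the slide shows.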
Simulated 500K Tests • How many 500K genotypes needed? • Is computation affordable? • Two subsets of mixed 500K and 50K: • Of 33,414 HO, only 1,406 (young) had 500K • Also bulls > 99% reliability, total 3,726 • Linkage generated in base population • Efficient and similar to autoregressive • Linkage affects gain from more markers
Measures of Haplotyping Success • Does estimated = true genotype? • Does estimated = true linkage for adjacent heterozygous markers? • Does estimated = true paternity? • How many alleles remain missing? • What is the error rate (Druet, 2010)? • What is corr^2(estimated, true genotype)? • Are resulting GEBVs reliable?
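Three of these measures can be computed directly when true genotypes are known, as in a simulation. The following is a minimal sketch with hypothetical function names; it assumes genotypes coded 0, 1, 2 with 5 for missing, and it uses a simple "fraction of called genotypes that differ from truth" as the error rate, which may not match the exact definition in Druet (2010).

```python
MISSING = 5  # genotype code for "both alleles missing"


def fraction_missing(estimated):
    """Share of loci still missing after imputation."""
    return sum(e == MISSING for e in estimated) / len(estimated)


def error_rate(estimated, true):
    """Share of called (non-missing) genotypes that differ from the truth."""
    called = [(e, t) for e, t in zip(estimated, true) if e != MISSING]
    return sum(e != t for e, t in called) / len(called)


def squared_correlation(estimated, true):
    """Squared Pearson correlation of called estimated and true genotypes."""
    pairs = [(e, t) for e, t in zip(estimated, true) if e != MISSING]
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    s_et = sum((e - mean_e) * (t - mean_t) for e, t in pairs)
    s_ee = sum((e - mean_e) ** 2 for e, _ in pairs)
    s_tt = sum((t - mean_t) ** 2 for _, t in pairs)
    return s_et * s_et / (s_ee * s_tt)


# Toy data (invented): six loci, one imputed wrongly, one left missing.
true_genotype = [0, 1, 2, 2, 0, 1]
estimated = [0, 1, 2, 1, 0, 5]
print(fraction_missing(estimated))
print(error_rate(estimated, true_genotype))
print(squared_correlation(estimated, true_genotype))
```

Excluding missing genotypes from the correlation is a choice of this sketch; replacing them with allele-frequency expectations, as the coding slide suggests for missing values, would be an alternative.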
Imputation Summary • 1,406 young animals genotyped at 500K • REL gain 0.8% vs. 1.4% with all 500K • Imputation better if ancestors also genotyped • Could genotype additional reference bulls instead of re-genotyping bulls already done • 32,008 animals imputed from 50K • 10% SNP known before, 93% after • 97-98% of 500K genotypes correct • .839 squared correlation (estimated, true genotype)
Multi-Breed Genomic Evaluation • Treat allele effects as independent, same, or correlated, using data of • 5,331 purebred Holsteins, • 1,361 purebred Jerseys, and • 506 purebred Brown Swiss
Protein Yield R^2 • Optimum correlation was .3 with 43K markers, and would be larger with more markers
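The three treatments of allele effects (independent, same, or correlated) can be viewed as one correlation matrix among breed-specific marker effects with a single off-diagonal value. The sketch below only illustrates that idea: `breed_correlation_matrix` is a hypothetical name, and using one common correlation for all breed pairs is an assumption, not a detail given on the slides.

```python
BREEDS = ["Holstein", "Jersey", "Brown Swiss"]


def breed_correlation_matrix(r, n_breeds=len(BREEDS)):
    """Correlation matrix of marker effects across breeds.

    r = 0 treats allele effects as independent across breeds,
    r = 1 treats them as the same, and 0 < r < 1 as correlated.
    """
    return [[1.0 if i == j else float(r) for j in range(n_breeds)]
            for i in range(n_breeds)]


# Independent, the optimum reported for protein yield at 43K, and same.
for r in (0.0, 0.3, 1.0):
    print(r, breed_correlation_matrix(r))
```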
Fewer Markers, More Animals • Half of young animals assigned 3K • Proven bulls, cows all had 43K • Dams imputed using 43K and 3K • Half of ALL animals assigned 3K • Could 3K reference animals help? • 10,000 proven bulls yet to genotype • Should cows with 3K be predictors?
Squared Correlations of 3K and PA with 43K (genotyped ancestors had 43K) • Consistent gains across traits • Reliability gain from progeny with 3K was 79-87% of gain from 43K • Gain % = [Corr(3K,43K)^2 - Corr(PA,43K)^2] / [1 - Corr(PA,43K)^2] • Large benefits for smaller cost
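The gain formula is easy to evaluate directly. In this small sketch, `gain_percent` is a hypothetical name and the two input correlations are invented for illustration; they are not values reported in the presentation.

```python
def gain_percent(corr_3k_43k, corr_pa_43k):
    """Share of the 43K reliability gain over parent average (PA) kept by 3K.

    Gain % = [Corr(3K,43K)^2 - Corr(PA,43K)^2] / [1 - Corr(PA,43K)^2], times 100.
    """
    r2_3k = corr_3k_43k ** 2
    r2_pa = corr_pa_43k ** 2
    return 100.0 * (r2_3k - r2_pa) / (1.0 - r2_pa)


# Hypothetical correlations: 0.95 for 3K with 43K, 0.70 for PA with 43K.
print(round(gain_percent(0.95, 0.70), 1))
```

The formula returns 0 when 3K predicts no better than parent average and 100 when 3K reproduces the 43K evaluation exactly.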
Conclusions - 1 • Missing genotypes can be filled easily • Population and pedigree haplotyping can both process long segments efficiently • Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours • Haplotyping implemented for April 2010 routine U.S. evaluation • Several recent improvements to accuracy • Ready to include lower or higher density genotypes in evaluations
Conclusions - 2 • More markers improved reliability < 2% • 1,406 high density genotypes sufficient • 32,008 other animals imputed from 50K to 500K in simulation • Fewer markers can decrease cost • More animals can greatly increase reliability and selection differential • Multi-breed model improves reliability only slightly (< 1%) at current density
Acknowledgments • Katie Olson computed the multi-breed genomic evaluation • Mel Tooker assisted with graphics and computation • Bob Schnabel helped improve marker locations on the map