
Genomic Evaluation with Many More Genotypes and Phenotypes

Presentation Transcript


  1. Genomic Evaluation with Many More Genotypes and Phenotypes

  2. Topics • Methods to combine different marker densities and datasets • More markers: 500,000 simulation • More animals: 3,000 marker subset • More breeds: multi-trait markers • More traits: same genotype cost

  3. Methods to Trace Inheritance • Few markers • Pedigree needed • Prob (paternal or maternal alleles inherited) computed within families • Many markers • Can find matching DNA segments without pedigree • Prob (haplotypes are identical) mostly near 0 or 1 if segments contain many markers

  4. Haplotype Probabilities with Few Markers (12 SNP / chromosome)

  5. Haplotype Probabilities with More Markers (50 SNP / chromosome)

  6. Haplotyping Program findhap.f90 • Begin with population haplotyping • Divide chromosomes into segments, ~250 SNP / segment • List haplotypes by genotype match • Similar to fastPHASE, IMPUTE, or long range phasing • End with pedigree haplotyping • Detect crossover, fix noninheritance • Impute nongenotyped ancestors

  7. Recent Program Revisions • Improved imputation and reliability • Changes since January 2010 • Use known haplotype if second is unknown • Use current instead of base frequency • Combine parent haplotypes if crossover is detected • Begin search with parent or grandparent haplotypes • Store 2 most popular progeny haplotypes • Simulated crossover rate increased

  8. Coding of Alleles and Segments • Genotypes • 0 = BB, 1 = AB or BA, 2 = AA • 3 = B_, 4 = A_, 5 = __ (missing) • Allele frequency used for missing • Haplotypes • 0 = B, 1 = not known, 2 = A • Segment inheritance (example) • Son has haplotype numbers 5 and 8 • Sire has haplotype numbers 8 and 21 • Son got haplotype number 5 from dam
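
The segment-inheritance example above can be sketched in a few lines of Python. This is only an illustration of the coding scheme on the slide, not the logic of findhap.f90; the function name `parental_origin` and the tuple representation of haplotype numbers are assumptions made for the example.

```python
# Codes as listed on the slide.
GENOTYPE_CODES = {0: "BB", 1: "AB or BA", 2: "AA", 3: "B_", 4: "A_", 5: "__ (missing)"}
HAPLOTYPE_CODES = {0: "B", 1: "not known", 2: "A"}


def parental_origin(progeny, sire):
    """Split a progeny's two haplotype numbers for one segment by origin.

    progeny, sire: pairs of haplotype numbers (hypothetical representation).
    Returns (from_sire, from_dam), or None when the origin is ambiguous
    (both or neither of the progeny haplotypes are carried by the sire).
    """
    first, second = progeny
    first_in_sire = first in sire
    second_in_sire = second in sire
    if first_in_sire and not second_in_sire:
        return first, second
    if second_in_sire and not first_in_sire:
        return second, first
    return None


# Slide example: son has haplotypes 5 and 8, sire has 8 and 21,
# so haplotype 8 came from the sire and haplotype 5 from the dam.
from_sire, from_dam = parental_origin((5, 8), (8, 21))
print(from_sire, from_dam)
```

Returning `None` for the ambiguous cases is a design choice of this sketch; a crossover within the segment or a genotyping error would need separate handling.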

  9. Most Frequent Haplotypes (1st segment of chromosome 15)
      1  5.16%  022222222020020022002020200020000200202000022022222202220
      2  4.37%  022020220202200020022022200002200200200000200222200002202
      3  4.36%  022020022202200200022020220000220202200002200222200202220
      4  3.67%  022020222020222002022022202020000202220000200002020002002
      5  3.66%  022222222020222022020200220000020222202000002020220002022
      6  3.65%  022020022202200200022020220000220202200002200222200202222
      7  3.51%  022002222020222022022020220200222002200000002022220002220
      8  3.42%  022002222002220022022020220020200202202000202020020002020
      9  3.24%  022222222020200000022020220020200202202000202020020002020
     10  3.22%  022002222002220022002020002220000202200000202022020202220
     For efficiency, store haplotypes just once. Most frequent haplotype in Holsteins had 4,316 copies = .0516 * 41,822 animals * 2 chromosomes each

  10. Population Haplotyping Steps • Put first genotype into haplotype list • Check next genotype against list • Do any homozygous loci conflict? • If haplotype conflicts, continue search • If match, fill any unknown SNP with homozygote • 2nd haplotype = genotype minus 1st haplotype • Search for 2nd haplotype in rest of list • If no match in list, add to end of list • Sort list to put frequent haplotypes 1st

  11. Check New Genotype Against List (1st segment of chromosome 15)
      Search for 1st haplotype that matches genotype:
               022112222011221022021110220010110212202000102020120002021
        5.16%  022222222020020022002020200020000200202000022022222202220
        4.37%  022020220202200020022022200002200200200000200222200002202
        4.36%  022020022202200200022020220000220202200002200222200202220
        3.67%  022020222020222002022022202020000202220000200002020002002
        3.66%  022222222020222022020200220000020222202000002020220002022
      Get 2nd haplotype by removing 1st from genotype:
               022002222002220022022020220020200202202000202020020002020
        3.65%  022020022202200200022020220000220202200002200222200202222
        3.51%  022002222020222022022020220200222002200000002022220002220
        3.42%  022002222002220022022020220020200202202000202020020002020
        3.24%  022222222020200000022020220020200202202000202020020002020
        3.22%  022002222002220022002020002220000202200000202022020202220
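
The search on this slide can be reproduced with a short Python sketch that uses the genotype and the ten haplotypes listed for the 1st segment of chromosome 15. It is a simplified illustration, not the findhap.f90 code: it assumes genotypes contain only codes 0, 1, and 2 (the partial and missing codes 3 to 5 are not handled), and the function names are hypothetical.

```python
# Ten most frequent haplotypes from the slides, most frequent first.
HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",
    "022020220202200020022022200002200200200000200222200002202",
    "022020022202200200022020220000220202200002200222200202220",
    "022020222020222002022022202020000202220000200002020002002",
    "022222222020222022020200220000020222202000002020220002022",
    "022020022202200200022020220000220202200002200222200202222",
    "022002222020222022022020220200222002200000002022220002220",
    "022002222002220022022020220020200202202000202020020002020",
    "022222222020200000022020220020200202202000202020020002020",
    "022002222002220022002020002220000202200000202022020202220",
]
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"


def conflicts(codes, haplotype):
    """True if a homozygous/known code (0 or 2) contradicts a known allele."""
    return any(c in "02" and h in "02" and c != h
               for c, h in zip(codes, haplotype))


def fill_unknown(genotype, haplotype):
    """Fill unknown alleles (1) where the genotype is homozygous."""
    return "".join(g if h == "1" and g in "02" else h
                   for g, h in zip(genotype, haplotype))


def subtract(genotype, first):
    """Second haplotype = genotype minus first haplotype, locus by locus."""
    out = []
    for g, h in zip(genotype, first):
        if h == "1":
            # First allele unknown: only a homozygous genotype fixes the second.
            out.append(g if g in "02" else "1")
        else:
            # Genotype g counts A alleles; haplotype code h is twice the count.
            out.append(str(2 * int(g) - int(h)))
    return "".join(out)


def split_genotype(genotype, hap_list):
    """Return (index of 1st match, index of 2nd match or None, 1st, 2nd)."""
    for i, hap in enumerate(hap_list):
        if conflicts(genotype, hap):
            continue
        first = fill_unknown(genotype, hap)
        second = subtract(genotype, first)
        # Assumption: the search resumes at position i so that a
        # homozygous animal can carry two copies of the same haplotype.
        for j in range(i, len(hap_list)):
            if not conflicts(second, hap_list[j]):
                return i, j, first, second
        return i, None, first, second  # caller would append `second` to the list
    return None  # no match: caller would append the genotype to the list


i, j, first, second = split_genotype(GENOTYPE, HAPLOTYPES)
print(i + 1, j + 1)  # 1-based positions in the frequency-sorted list
print(second)
```

With the slide data, the first four haplotypes each conflict with the genotype at a homozygous locus, the fifth (3.66%) matches, and the remainder equals the eighth haplotype (3.42%).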

  12. Simulated 500K Tests • How many 500K genotypes needed? • Is computation affordable? • Two subsets of mixed 500K and 50K: • Of 33,414 HO, only 1,406 (young) had 500K • Also bulls > 99% reliability, total 3,726 • Linkage generated in base population • Efficient and similar to autoregressive • Linkage affects gain from more markers

  13. Holstein Linkage Disequilibrium

  14. Simulated Linkage

  15. Computer Requirements: 500,000 markers, 33,414 animals

  16. Measures of Haplotyping Success • Does estimated = true genotype? • Does estimated = true linkage for adjacent heterozygous markers? • Does estimated = true paternity? • How many alleles remain missing? • What is the error rate (Druet, 2010)? • What is corr²(estimated, true genotype)? • Are resulting GEBVs reliable?
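
Two of the measures listed above, the proportion of genotypes imputed correctly and the squared correlation between estimated and true genotypes, can be computed as in the following Python sketch. The function names and the toy genotype vectors are hypothetical and are not data from the presentation; genotypes are assumed to be coded 0, 1, or 2.

```python
def proportion_correct(estimated, true):
    """Fraction of loci where the estimated genotype equals the true one."""
    return sum(e == t for e, t in zip(estimated, true)) / len(true)


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has no variation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy example (hypothetical values): one of eight genotypes is imputed wrongly.
true_genotypes = [0, 1, 2, 2, 1, 0, 1, 2]
estimated_genotypes = [0, 1, 2, 1, 1, 0, 1, 2]

print(proportion_correct(estimated_genotypes, true_genotypes))
print(squared_correlation(estimated_genotypes, true_genotypes))
```

The two measures answer different questions: the proportion correct counts exact matches, while the squared correlation also reflects how far wrong estimates are from the truth.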

  17. 500K Imputation Results

  18. 500K Imputation Results

  19. Imputation Summary • 1,406 young animals genotyped at 500K • REL gain 0.8% vs. 1.4% with all 500K • Imputation better if ancestors also genotyped • Could genotype additional reference bulls instead of re-genotyping bulls already done • 32,008 animals imputed from 50K • 10% SNP known before, 93% after • 97-98% of 500K genotypes correct • .839 squared correlation (estimated, true genotype)

  20. Multi-Breed Genomic Evaluation • Treat allele effects as independent, same, or correlated, using data of • 5,331 purebred Holsteins, • 1,361 purebred Jerseys, and • 506 purebred Brown Swiss

  21. Protein Yield R² • Optimum correlation was .3 with 43K markers, and would be larger with more markers

  22. Correlation with Single-Breed GEBV

  23. Fewer Markers, More Animals • Half of young animals assigned 3K • Proven bulls, cows all had 43K • Dams imputed using 43K and 3K • Half of ALL animals assigned 3K • Could 3K reference animals help? • 10,000 proven bulls yet to genotype • Should cows with 3K be predictors?

  24. Reliability from 3K, 43K Mixture

  25. Correlations² of 3K and PA with 43K (genotyped ancestors had 43K) • Consistent gains across traits • Reliability gain from progeny with 3K was 79-87% of gain from 43K • Gain % = [Corr(3K,43K)² - Corr(PA,43K)²] / [1 - Corr(PA,43K)²] • Large benefits for smaller cost
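
The gain formula on this slide expresses how much of the distance between parent average (PA) and the 43K evaluation is recovered by 3K genotypes. A minimal Python sketch follows; the function name and the input values are hypothetical and chosen only to show the arithmetic, since the per-trait correlations from the slide's figure are not in the transcript.

```python
def gain_fraction(corr2_3k_43k, corr2_pa_43k):
    """Share of the 43K gain over parent average that 3K genotypes capture.

    Both arguments are squared correlations with the 43K evaluation.
    """
    return (corr2_3k_43k - corr2_pa_43k) / (1.0 - corr2_pa_43k)


# Hypothetical squared correlations, for illustration only.
example = gain_fraction(0.90, 0.40)
print(f"{100 * example:.1f}% of the 43K gain")
```

A value of 0 means 3K adds nothing beyond parent average, and a value of 1 means 3K reproduces the 43K evaluation exactly.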

  26. Conclusions - 1 • Missing genotypes can be filled easily • Population and pedigree haplotyping can both process long segments efficiently • Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours • Haplotyping implemented for April 2010 routine U.S. evaluation • Several recent improvements to accuracy • Ready to include lower or higher density genotypes in evaluations

  27. Conclusions - 2 • More markers improved reliability < 2% • 1,406 high density genotypes sufficient • 32,008 other animals imputed from 50K to 500K in simulation • Fewer markers can decrease cost • More animals can greatly increase reliability and selection differential • Multi-breed model improves reliability only slightly (< 1%) at current density

  28. Acknowledgments • Katie Olson computed the multi-breed genomic evaluation • Mel Tooker assisted with graphics and computation • Bob Schnabel helped improve marker locations on the map
