
Imputation


Presentation Transcript


  1. Imputation (title slide; background graphic of 0/1/2 genotype codes omitted)

  2. Imputation • Based on splitting the genotype into individual chromosomes (maternal and paternal contributions) • Missing SNPs assigned by tracking inheritance from ancestors and descendants • Imputed dams increase predictor population • Genotypes from all chips merged by imputing SNPs not present on a given chip

  3. Terms • Genotype – Alleles on both chromosomes for all markers • Allele representation – A,B; A,C,T,G • Genotype representation – number of A’s; 0,1,2,5 (missing) • Imputation – Determination of an allele from alleles of other markers and animals • Phasing – Separating a genotype into individual chromosomes and possibly assigning maternal or paternal origin

  4. Genotype for Elevation • Chromosome 1 10001112200200121110111121111011110011211000201220022201111202101200211122110021112001111001011011010220011002201101120020110102022212112210201001110001122022122211202112012020100202202000021100011202011221112111022011110000212202000221012020002211220111012100111211102112110020102100022000220100020110000220221102211210112111012222001211212220020002002020201222110022222220022121111210021111200110111011200202220001112011010211121211102022100211201211001111102111211021112200010110111020220022111010201112111101120210210212110110221220012110112110120220110022200210021100011100211021101110002220020221212110002220102002222121221121112002011020200122222211221202121121011001211011020022000200100200011110110012110212121112010101212022101010111110211021122111111212111210110120011111021111011111220121012121101022202021211222120222002121210121210201100111222121101

  5. X chromosome • Bull 202220200002022220002020222020202 • Cow 1201201212222010111022210210212022

  6. Pedigree – parents, grandparents, etc.

  7. O-Style haplotypes – chromosome 15

  8. findhap • Developed by Paul VanRaden • Divides chromosomes into segments • Allows for successively shorter segments, typically 3 runs • Long segments lock in identical by descent • Shorter segments fill in missing SNPs • Separates genotype into maternal and paternal contribution, haplotypes (phasing) • Builds haplotype library sequenced by frequency

  9. findhap characteristics • Population haplotyping • Divides chromosomes into segments • Lists haplotypes by genotype match • Similar to FastPhase, Impute, or long range phasing • Pedigree haplotyping • Detects crossover; fixes noninheritance • Imputes nongenotyped ancestors

  10. Recent program revisions • Improved imputation and reliability • Changes since January 2010 • Use known haplotype if 2nd is unknown • Use current instead of base frequency • Combine parent haplotypes if crossover is detected • Begin search with parent or grandparent haplotypes • Store 2 most popular progeny haplotypes • Decreased computing time by using previous haplotype library

  11. Population haplotyping • Put 1st genotype into haplotype list • Check next genotype against list • Do any homozygous loci conflict? • If haplotype conflicts, continue search • If match, fill any unknown SNP with homozygote • 2nd haplotype = genotype minus 1st haplotype • Search for 2nd haplotype in rest of list • If no match in list, add to end of list • Sort list to put frequent haplotypes 1st
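The steps on this slide can be sketched in a few lines of Python. This is a simplified illustration, not findhap itself: the function names are hypothetical, genotypes are assumed to contain only the codes 0, 1, and 2, the search for the 2nd haplotype starts at the entry matched by the 1st, and the list is sorted only once at the end. A heterozygous genotype locus (1) doubles as an unknown haplotype allele (1) when an unmatched genotype is added to the list.

```python
def conflicts(hap, geno):
    """True if a known haplotype allele disagrees with a homozygous locus."""
    return any(h in "02" and g in "02" and h != g for h, g in zip(hap, geno))

def fill_unknown(hap, geno):
    """Fill unknown haplotype alleles (1) where the genotype is homozygous."""
    return "".join(g if h == "1" and g in "02" else h for h, g in zip(hap, geno))

def subtract(geno, hap):
    """2nd haplotype = genotype minus 1st haplotype (1 where still unknown)."""
    out = []
    for h, g in zip(hap, geno):
        if g in "02":
            out.append(g)                          # homozygous: same allele on both
        elif g == "1" and h in "02":
            out.append("0" if h == "2" else "2")   # heterozygous: the other allele
        else:
            out.append("1")                        # cannot be determined
    return "".join(out)

def find_match(library, target, start=0):
    """Index of the first list entry without a conflict, or None."""
    for i in range(start, len(library)):
        if not conflicts(library[i][0], target):
            return i
    return None

def build_library(genotypes):
    library = []                                   # [haplotype, count] pairs
    for geno in genotypes:
        i = find_match(library, geno)
        if i is None:                              # no match: add to end of list
            library.append([geno, 0])
            i = len(library) - 1
        library[i][0] = fill_unknown(library[i][0], geno)
        library[i][1] += 1
        second = subtract(geno, library[i][0])
        j = find_match(library, second, start=i)   # assumption: search from entry i
        if j is None:
            library.append([second, 0])
            j = len(library) - 1
        library[j][0] = fill_unknown(library[j][0], second)
        library[j][1] += 1
    library.sort(key=lambda entry: entry[1], reverse=True)  # frequent first
    return [tuple(entry) for entry in library]

toy_library = build_library(["0202", "0212", "0222", "0222"])
```

In this toy run every genotype contributes two haplotype copies, so the counts in `toy_library` sum to twice the number of genotypes.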

  12. Coding of alleles and segments • Genotypes • 0 = BB, 1 = AB or BA, 2 = AA, 3 = B_, 4 = A_, 5 = __ (missing) • Allele frequency used for missing • Haplotypes • 0 = B, 1 = not known, 2 = A • Segment inheritance (example) • Son has haplotype numbers 5 and 8 • Sire has haplotype numbers 8 and 21 • Son got haplotype number 5 from dam
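A minimal sketch of the coding and of the segment-inheritance example follows. The dictionary and function names are hypothetical and chosen for illustration; only the code values and the son/sire haplotype numbers come from the slide.

```python
from typing import Optional, Tuple

# Genotype codes from the slide: 0-2 count the A alleles, 3-5 mark partial or missing calls
GENOTYPE_CODES = {"BB": 0, "AB": 1, "BA": 1, "AA": 2, "B_": 3, "A_": 4, "__": 5}

# Haplotype allele codes from the slide
HAPLOTYPE_CODES = {"B": 0, "unknown": 1, "A": 2}

def haplotype_from_dam(son: Tuple[int, int], sire: Tuple[int, int]) -> Optional[int]:
    """Return the son's haplotype number that must have come from the dam.

    The haplotype the son shares with the sire is taken as paternal, so the
    other one is maternal. Returns None when the origin cannot be resolved
    (nothing shared with the sire, or both numbers could be paternal).
    """
    a, b = son
    if a in sire and b not in sire:
        return b
    if b in sire and a not in sire:
        return a
    if a == b and a in sire:
        return a      # son carries two copies; one of them came from the dam
    return None

maternal = haplotype_from_dam(son=(5, 8), sire=(8, 21))
```

For the slide's example, `maternal` is 5: haplotype 8 is shared with the sire, so haplotype 5 came from the dam.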

  13. Most frequent haplotypes • 1st segment of chromosome 15 • For efficiency, store haplotypes just once • Most frequent Holstein haplotype had 4,316 copies (0.0516 × 41,822 animals × 2 chromosomes each)
      Rank  Frequency  Haplotype
      1     5.16%      022222222020020022002020200020000200202000022022222202220
      2     4.37%      022020220202200020022022200002200200200000200222200002202
      3     4.36%      022020022202200200022020220000220202200002200222200202220
      4     3.67%      022020222020222002022022202020000202220000200002020002002
      5     3.66%      022222222020222022020200220000020222202000002020220002022
      6     3.65%      022020022202200200022020220000220202200002200222200202222
      7     3.51%      022002222020222022022020220200222002200000002022220002220
      8     3.42%      022002222002220022022020220020200202202000202020020002020
      9     3.24%      022222222020200000022020220020200202202000202020020002020
      10    3.22%      022002222002220022002020002220000202200000202022020202220

  14. Check new genotype against list • 1st segment of chromosome 15 • Search for 1st haplotype that matches genotype
      Genotype       022112222011221022021110220010110212202000102020120002021
    • Get 2nd haplotype by removing 1st from genotype
      2nd haplotype  022002222002220022022020220020200202202000202020020002020
      Frequency  Haplotype
      5.16%      022222222020020022002020200020000200202000022022222202220
      4.37%      022020220202200020022022200002200200200000200222200002202
      4.36%      022020022202200200022020220000220202200002200222200202220
      3.67%      022020222020222002022022202020000202220000200002020002002
      3.66%      022222222020222022020200220000020222202000002020220002022
      3.65%      022020022202200200022020220000220202200002200222200202222
      3.51%      022002222020222022022020220200222002200000002022220002220
      3.42%      022002222002220022022020220020200202202000202020020002020
      3.24%      022222222020200000022020220020200202202000202020020002020
      3.22%      022002222002220022002020002220000202200000202022020202220
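The check on this slide can be reproduced directly from the strings shown. The sketch below assumes every listed haplotype is fully known (only 0 and 2) and the genotype has no missing calls; the helper names are hypothetical.

```python
# Ten most frequent haplotypes for the 1st segment of chromosome 15, in list order
HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",
    "022020220202200020022022200002200200200000200222200002202",
    "022020022202200200022020220000220202200002200222200202220",
    "022020222020222002022022202020000202220000200002020002002",
    "022222222020222022020200220000020222202000002020220002022",
    "022020022202200200022020220000220202200002200222200202222",
    "022002222020222022022020220200222002200000002022220002220",
    "022002222002220022022020220020200202202000202020020002020",
    "022222222020200000022020220020200202202000202020020002020",
    "022002222002220022002020002220000202200000202022020202220",
]
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"

def matches(hap, geno):
    """A haplotype matches if it agrees with every homozygous genotype locus."""
    return all(g == "1" or h == g for h, g in zip(hap, geno))

def subtract(geno, hap):
    """Genotype minus 1st haplotype: at heterozygous loci take the other allele."""
    return "".join(
        g if g != "1" else ("0" if h == "2" else "2")
        for h, g in zip(hap, geno)
    )

first_rank = next(i for i, h in enumerate(HAPLOTYPES, start=1) if matches(h, GENOTYPE))
second = subtract(GENOTYPE, HAPLOTYPES[first_rank - 1])
second_rank = HAPLOTYPES.index(second) + 1
print(first_rank, second_rank)
```

With the strings as transcribed here, the first match is the 5th haplotype in the list (3.66%), and the remainder equals the 8th (3.42%), which is the 2nd haplotype printed on the slide.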

  15. Recessive defect discovery • Check for homozygous haplotypes • Most haplotype blocks ~5 Mbp long • 7–90 homozygotes expected, but 0 observed • 5 of top 11 haplotypes confirmed as lethal • Investigation of 936–52,449 carrier sire × carrier MGS fertility records found 3.0–3.7% lower conception rates

  16. Traditional evaluations 3X/year • Yield • Milk, fat, protein, component percentages • Type • Stature, udder characteristics, feet and legs • Calving • Calving ease, stillbirth rate • Functional • Somatic cell score, productive life, fertility

  17. Genomic prediction of progeny test • Timeline (years 0 to 5) • Select parents, transfer embryos to recipients • Calves born from DNA-selected parents • Bull receives progeny test • Calves born and DNA tested • Reduce generation interval from 5 to 2 yr

  18. Benefit of genomics • Determine value of bull at birth • Increase selection accuracy • Reduce generation interval • Increase selection intensity • Increase rate of genetic gain
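As background not shown on the slide, the four listed benefits correspond to the terms of the standard breeder's equation for annual genetic gain. The symbols below are the usual textbook ones, not notation from this presentation.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}
% i        : selection intensity
% r        : accuracy of selection
% \sigma_A : additive genetic standard deviation
% L        : generation interval in years
```

Raising the intensity or the accuracy increases the numerator, and shortening the generation interval shrinks the denominator, so each item on the slide raises the rate of gain when the other terms are held constant.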

  19. Genomic evaluation program • Identify animals to genotype • Send sample to genotyping laboratory • Genotype sample • Send genotype to evaluation center • Calculate genomic evaluation • Release monthly evaluation

  20. Genomic data flow (diagram) • Participants: DHI herd; DNA laboratory; AI organization, breed association; CDCB • Flows: DNA samples; nominations, pedigree data; genotypes; genotype quality reports; genomic evaluations

  21. Genotyped animals – April 2013

  22. Steps to prepare genotypes • Nominate animal for genotyping • Collect blood, hair, semen, nasal swab, or ear punch • Blood may not be suitable for twins • Extract DNA at laboratory • Prepare DNA and apply to beadchip • Do amplification and hybridization, 3-day process • Read red/green intensities from chip and call genotypes from clusters

  23. What can go wrong • Inadequate DNA quality or quantity from sample • Genotype with many SNPs that cannot be determined (90% call rate required) • Parent-progeny conflicts • Pedigree error • Sample ID error (switched samples) • Laboratory error • Parent-progeny relationship detected not in pedigree
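The call-rate requirement can be illustrated with a small check. This sketch assumes a genotype is a string of the single-digit codes used elsewhere in this presentation and that only code 5 counts as an uncalled SNP; the function names and the treatment of codes 3 and 4 are assumptions.

```python
def call_rate(genotype: str, missing: str = "5") -> float:
    """Fraction of SNPs with a called genotype (everything except `missing`)."""
    called = len(genotype) - genotype.count(missing)
    return called / len(genotype)

def passes_call_rate(genotype: str, minimum: float = 0.90) -> bool:
    """Apply the 90% call-rate requirement from the slide."""
    return call_rate(genotype) >= minimum

good = "0120120125"   # 1 of 10 SNPs missing
poor = "0125012501"   # 2 of 10 SNPs missing
```

Here `good` just meets the threshold at 90%, while `poor` fails at 80%.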

  24. Parentage validation and discovery • Parent-progeny conflicts detected • Animal checked against all other genotypes • Conflict reported to breeds and requesters • Correct sire usually detected • MGS checked • 1 SNP at a time • Haplotype checking more accurate • Breeds moving to accept SNPs in place of microsatellites

  25. Parent-progeny conflicts • Sire: Conflicts = 0, *Tests = 10, Conflict % = 0% • MGS: Conflicts = 3, *Tests = 10, Conflict % = 30.0% • (Chart: Conflict % by relationship)

  26. Parent-progeny conflicts • For animal • Pedigree wrong • Genotype unreliable (3K) • For SNP • SNP unreliable • Clustering needs adjustment • Parent 10212002101201211001020100100 • Progeny 10202010100200221001120120220
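A parent-progeny conflict is a SNP where the two animals are opposite homozygotes (0 versus 2), since a true parent must pass on one of its own alleles. The sketch below counts such SNPs in the two strings from this slide. Treating "tests" as the SNPs where both animals are homozygous is an assumption made for illustration, as are the function names.

```python
PARENT = "10212002101201211001020100100"
PROGENY = "10202010100200221001120120220"

def conflict_positions(parent, progeny):
    """0-based positions where parent and progeny are opposite homozygotes."""
    return [
        i for i, (p, c) in enumerate(zip(parent, progeny))
        if {p, c} == {"0", "2"}
    ]

def conflict_summary(parent, progeny):
    """Return (conflicts, tests, conflict %) for one parent-progeny pair."""
    # Assumption: a SNP is testable when both genotypes are homozygous (0 or 2)
    tests = sum(p in "02" and c in "02" for p, c in zip(parent, progeny))
    conflicts = len(conflict_positions(parent, progeny))
    percent = 100.0 * conflicts / tests if tests else 0.0
    return conflicts, tests, percent

positions = conflict_positions(PARENT, PROGENY)
conflicts, tests, percent = conflict_summary(PARENT, PROGENY)
```

For the strings shown, three SNPs conflict (the 8th, 25th, and 28th), out of 17 SNPs where both animals are homozygous.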

  27. Detecting unreliable genotypes (chart: accept and reject regions by Conflicts (%), x-axis 0 to 3.6; unreliable genotypes are rejected)

  28. MGS detection • SNP conflict method (SNP) • Check if animal and MGS have opposite homozygotes (duo test) • If sire is genotyped, some heterozygous SNPs can be checked (trio test) • Common haplotype method (HAP) • After imputation of all loci, determine maternal contribution by removing paternal haplotype • Count maternal haplotypes in common with MGS • Remove haplotypes from MGS and check remaining against maternal great-grandsire (MGGS)

  29. Results by breed *50K genotyped animals only

  30. Lab QC • Each SNP evaluated for • Call rate • Portion heterozygous • Parent-progeny conflicts • Clustering investigated if SNP exceeds limits • Number of failing SNPs indicates genotype quality • Target <10 SNPs in each category

  31. Before clustering adjustment 86% call rate

  32. After clustering adjustment 100% call rate

  33. Automated QC reporting • 6160 Genotypes Processed from LAB2013021811
      PASS/FAIL,Count,Description
      PASS,1,Parent Progeny Conflict SNP >2%
      PASS,5,Low Call Rate SNP >10%
      PASS,0,HWE SNP
      PASS,0,Chips w/ >20 Conflicts
      PASS,0.3,No Nomination %
      PASS,0,Genotype Submitted with No Sample Sheet Row

  34. Reliability of Holstein predictions *2011 deregressed value – 2007 genomic evaluation

  35. Marketed Holstein bulls

  36. Ways to increase accuracy • Automatic addition of traditional evaluations of genotyped bulls when they are 5 yr old • Possible genotyping of 10,000 bulls with semen in repository • Collaboration with other countries • Use of more SNPs from HD chips • Full sequencing – identify causative mutations

  37. Application to more traits • Animal’s genotype is good for all traits • Traditional evaluations required for accurate estimates of SNP effects • Traditional evaluations not currently available for heat tolerance or feed efficiency • Research populations could provide data for traits that are expensive to measure • Will resulting evaluations work in target population?

  38. Impact on producers • Young-bull evaluations with accuracy of early 1st-crop evaluations • AI organizations marketing genomically evaluated young bulls • Genotype usually required to be a bull dam • Rate of genetic improvement likely to increase by up to 50% • AI organizations reducing progeny-test programs

  39. Why genomics works for dairy cattle • Extensive historical data available • Well developed genetic evaluation program • Widespread use of AI sires • Progeny-test programs • High-value animals worth the cost of genotyping • Long generation interval that can be reduced substantially by genomics

  40. Council on Dairy Cattle Breeding – CDCB • CDCB assuming responsibility for receiving data and computing and delivering U.S. evaluations • USDA will continue research and development to improve evaluation system • CDCB and USDA employees located at USDA’s Beltsville Agricultural Research Center in Beltsville, Maryland

  41. Questions?
