Filling Missing Genotypes Using Haplotypes

Genotypes / Haplotypes
• Genotypes indicate how many copies of each allele were inherited
• Haplotypes indicate which alleles are on which chromosome
• Observed genotypes are partitioned into the two unknown haplotypes
• Pedigree haplotyping uses relatives
• Population haplotyping finds matching allele patterns

Filling missing genotypes
• Predict unknown SNP from known SNP
• Measure 3,000, predict 43,000 SNP
• Measure 50,000, predict 500,000 SNP
• Measure each haplotype at highest density only a few times
• Predict dam from progeny SNP
• Increase reliabilities for less cost

Haplotyping Program: findhap.f90
• Begin with population haplotyping
  • Divide chromosomes into segments, ~250 SNP / segment
  • List haplotypes by genotype match
  • Similar to fastPHASE, IMPUTE
• End with pedigree haplotyping
  • Detect crossover, fix noninheritance
  • Impute nongenotyped ancestors

Recent Program Revisions
• Improved imputation and GEBV reliability since 9WCGALP paper
• Changes since January 2010
  • Use known haplotype if second is unknown
  • Use current instead of base frequency
  • Combine parent haplotypes if crossover is detected
  • Begin search with parent or grandparent haplotypes
  • Store 2 most popular progeny haplotypes

Example Bull: O-Style (USA137611441), Sire = O-Man
• Read genotypes and pedigrees
• Write haplotype segments found
• List paternal / maternal inheritance
• List crossover locations

Pedigree Haplotyping: AB allele coding
Genotypes:
  O-Man          BB,AA,AA,AB,AA,AB,AB,AA,AA,AB
  O-Style        BB,AA,AA,AB,AB,AA,AA,AA,AA,AB
Haplotypes:
  O-Style (pat)  B A A _ A A A A A _
  O-Style (mat)  B A A _ B A A A A _
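
The O-Style haplotypes on this slide can be reproduced with a simple locus-by-locus rule. The sketch below is an illustration only, not the logic of findhap.f90: the function name `phase_from_sire` is hypothetical, it uses the sire's genotype alone, and it assumes there are no Mendelian conflicts or missing calls.

```python
def phase_from_sire(sire, progeny):
    """Split progeny genotypes into paternal and maternal alleles.

    Hypothetical helper for illustration; '_' marks an allele that
    cannot be assigned from the sire's genotype alone.
    """
    pat, mat = [], []
    for s, p in zip(sire, progeny):
        if p[0] == p[1]:
            # Homozygous progeny: both haplotypes carry the same allele.
            pat.append(p[0])
            mat.append(p[0])
        elif s[0] == s[1]:
            # Heterozygous progeny, homozygous sire: the sire's allele is
            # paternal, so the other allele must be maternal.
            pat.append(s[0])
            mat.append("B" if s[0] == "A" else "A")
        else:
            # Both heterozygous: phase stays unknown at this locus.
            pat.append("_")
            mat.append("_")
    return " ".join(pat), " ".join(mat)


o_man = "BB,AA,AA,AB,AA,AB,AB,AA,AA,AB".split(",")
o_style = "BB,AA,AA,AB,AB,AA,AA,AA,AA,AB".split(",")
pat, mat = phase_from_sire(o_man, o_style)
print(pat)  # B A A _ A A A A A _
print(mat)  # B A A _ B A A A A _
```

The two loci left as `_` are those where both O-Man and O-Style are heterozygous, so the sire's genotype alone cannot resolve them.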

Allele and Segment Coding
• Genotypes
  • 0 = BB, 1 = AB or BA, 2 = AA
  • 5 = missing
• Haplotypes
  • 0 = B, 1 = not known, 2 = A
• Segment storage (example)
  • O-Style has haplotype numbers 5 and 8
  • O-Man has haplotype numbers 8 and 21
  • O-Style got haplotype number 5 from dam
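
As a minimal sketch of the numeric coding above, the following converts AB calls into genotype and haplotype codes. The dictionary and function names are hypothetical, and treating any unrecognized call as missing (or unknown) is an assumption made for illustration.

```python
# Codes taken from the slide; the names are illustrative only.
GENOTYPE_CODES = {"BB": 0, "AB": 1, "BA": 1, "AA": 2}
MISSING_GENOTYPE = 5
ALLELE_CODES = {"B": 0, "A": 2}
UNKNOWN_ALLELE = 1


def encode_genotypes(calls):
    # Any call not in the table (assumption: e.g. "--") is coded as missing.
    return [GENOTYPE_CODES.get(call, MISSING_GENOTYPE) for call in calls]


def encode_haplotype(alleles):
    # Any allele other than A or B (e.g. "_") is coded as not known.
    return [ALLELE_CODES.get(allele, UNKNOWN_ALLELE) for allele in alleles]


o_man = encode_genotypes("BB,AA,AA,AB,AA,AB,AB,AA,AA,AB".split(","))
o_style_pat = encode_haplotype("B A A _ A A A A A _".split())
print(o_man)        # [0, 2, 2, 1, 2, 1, 1, 2, 2, 1]
print(o_style_pat)  # [0, 2, 2, 1, 2, 2, 2, 2, 2, 1]
```

With this coding a genotype is the average of its two haplotype codes, which is what makes the "genotype minus haplotype" step in population haplotyping a simple subtraction.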

Most Frequent Haplotypes: 1st segment of chromosome 15
 1  5.16%  022222222020020022002020200020000200202000022022222202220
 2  4.37%  022020220202200020022022200002200200200000200222200002202
 3  4.36%  022020022202200200022020220000220202200002200222200202220
 4  3.67%  022020222020222002022022202020000202220000200002020002002
 5  3.66%  022222222020222022020200220000020222202000002020220002022
 6  3.65%  022020022202200200022020220000220202200002200222200202222
 7  3.51%  022002222020222022022020220200222002200000002022220002220
 8  3.42%  022002222002220022022020220020200202202000202020020002020
 9  3.24%  022222222020200000022020220020200202202000202020020002020
10  3.22%  022002222002220022002020002220000202200000202022020202220
Most frequent haplotype in Holsteins had 4,316 copies = 0.0516 × 41,822 animals × 2 chromosomes each

Population Haplotyping Steps
• Put first genotype into haplotype list
• Check next genotype against list
  • Do any homozygous loci conflict?
  • If haplotype conflicts, continue search
  • If match, fill any unknown SNP with homozygote
  • 2nd haplotype = genotype minus 1st haplotype
  • Search for 2nd haplotype in rest of list
  • If no match in list, add to end of list
• Sort list to put frequent haplotypes 1st
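
The steps above can be sketched as a short list-building loop. This is a simplified illustration under stated assumptions, not the findhap.f90 implementation: all function names are hypothetical, the four toy genotypes are invented, and counting a fully homozygous genotype as two copies of the same haplotype is an assumption of this sketch. Codes follow the slides (genotypes 0/1/2 with 5 = missing; haplotypes 0/2 with 1 = not known).

```python
UNKNOWN = 1  # haplotype allele not known


def conflicts(hap, pattern):
    # A conflict exists only where the pattern is homozygous (0 or 2)
    # and the haplotype has a known, different allele.
    return any(p in (0, 2) and h != UNKNOWN and h != p
               for h, p in zip(hap, pattern))


def fill_unknown(hap, pattern):
    # Fill unknown haplotype alleles from homozygous loci of the pattern.
    return [p if h == UNKNOWN and p in (0, 2) else h
            for h, p in zip(hap, pattern)]


def complement(geno, hap):
    # 2nd haplotype = genotype minus 1st haplotype.
    out = []
    for g, h in zip(geno, hap):
        if g in (0, 2):
            out.append(g)          # homozygous: both alleles equal
        elif g == 1 and h != UNKNOWN:
            out.append(2 - h)      # heterozygous: the other allele
        else:
            out.append(UNKNOWN)    # missing genotype or unknown phase
    return out


def build_haplotype_list(genotypes):
    haps, counts = [], []

    def find_or_add(pattern, start):
        for i in range(start, len(haps)):
            if not conflicts(haps[i], pattern):
                haps[i] = fill_unknown(haps[i], pattern)
                counts[i] += 1
                return i
        # No match: add to end of list, heterozygous loci left unknown.
        haps.append([p if p in (0, 2) else UNKNOWN for p in pattern])
        counts.append(1)
        return len(haps) - 1

    for geno in genotypes:
        first = find_or_add(geno, 0)
        if 1 in geno:
            # Search for the 2nd haplotype in the rest of the list.
            find_or_add(complement(geno, haps[first]), first + 1)
        else:
            counts[first] += 1  # assumption: homozygote = same haplotype twice
    order = sorted(range(len(haps)), key=lambda i: -counts[i])
    return [(haps[i], counts[i]) for i in order]


toy_genotypes = [[2, 2, 0, 0], [2, 1, 1, 0], [2, 0, 2, 0], [2, 0, 2, 0]]
result = build_haplotype_list(toy_genotypes)
print(result)  # [([2, 0, 2, 0], 5), ([2, 2, 0, 0], 3)]
```

The counts sum to 8, two haplotypes for each of the 4 toy animals, and the final sort puts the most frequent haplotype first so that later searches tend to find a match early.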

Check New Genotype Against List: 1st segment of chromosome 15
Check genotype:
        022112222011221022021110220010110212202000102020120002021
 5.16%  022222222020020022002020200020000200202000022022222202220
 4.37%  022020220202200020022022200002200200200000200222200002202
 4.36%  022020022202200200022020220000220202200002200222200202220
 3.67%  022020222020222002022022202020000202220000200002020002002
 3.66%  022222222020222022020200220000020222202000002020220002022
 3.65%  022020022202200200022020220000220202200002200222200202222
 3.51%  022002222020222022022020220200222002200000002022220002220
 3.42%  022002222002220022022020220020200202202000202020020002020
 3.24%  022222222020200000022020220020200202202000202020020002020
 3.22%  022002222002220022002020002220000202200000202022020202220
Subtract 1st haplotype from genotype to get 2nd:
        022002222002220022022020220020200202202000202020020002020
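
The check on this slide can be replayed directly on the listed strings. The sketch below uses hypothetical helper names and assumes the genotype has no missing codes and the listed haplotypes have no unknown alleles, which holds for this example. A haplotype conflicts if it differs from the genotype at any homozygous locus (genotype code 0 or 2), and the second haplotype is 2 × genotype − first haplotype at each locus.

```python
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"

# The ten most frequent haplotypes, in order of frequency.
HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",
    "022020220202200020022022200002200200200000200222200002202",
    "022020022202200200022020220000220202200002200222200202220",
    "022020222020222002022022202020000202220000200002020002002",
    "022222222020222022020200220000020222202000002020220002022",
    "022020022202200200022020220000220202200002200222200202222",
    "022002222020222022022020220200222002200000002022220002220",
    "022002222002220022022020220020200202202000202020020002020",
    "022222222020200000022020220020200202202000202020020002020",
    "022002222002220022002020002220000202200000202022020202220",
]


def conflicts(hap, geno):
    # Only homozygous loci ('0' or '2') can rule a haplotype out.
    return any(g in "02" and h != g for h, g in zip(hap, geno))


def subtract(geno, hap):
    # Genotype code is the average of the two haplotype codes,
    # so the other haplotype is 2*g - h at every locus.
    return "".join(str(2 * int(g) - int(h)) for g, h in zip(geno, hap))


first = next(i for i, hap in enumerate(HAPLOTYPES)
             if not conflicts(hap, GENOTYPE))
second = subtract(GENOTYPE, HAPLOTYPES[first])
second_rank = HAPLOTYPES.index(second) + 1

print(first + 1)    # 5
print(second)       # 022002222002220022022020220020200202202000202020020002020
print(second_rank)  # 8
```

Under this coding the first four haplotypes each conflict at a homozygous locus, the fifth (3.66%) is the first match, and the remainder after subtraction is the eighth haplotype in the list (3.42%), the same string shown on the last line of the slide.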

Conclusions
• Missing genotypes can be filled easily
• Population and pedigree haplotyping can both process long segments efficiently
• Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte of memory and 3 CPU hours
• Program findhap.f90 implemented for April 2010 routine evaluation
• Several recent improvements to accuracy
• Ready to include lower or higher density genotypes in evaluations