Efficient Estimation of Breeding Values from Dense Genomic Data
Genomic Calculations
• Genotypes soon available from BFGL:
  • 50,000 SNPs / animal
  • 3,000 animals, many more possible
  • Need efficient computing algorithms
• Traditional PTAs available from AIPL:
  • PTAs combine phenotypes and pedigree
  • SNP effects evaluated in second step using deregressed PTAs weighted by reliability
Genomic Computer Programs
• Simulate SNPs and QTLs
  • Compare SNP numbers, size of QTLs
• Calculate genomic EBVs
  • Use selection index, G instead of A
  • Use iteration on data for SNP effects
• Form haplotypes from genotypes?
  • Not tested yet, SNP regression used
Simulation Program
• Save memory by processing each chromosome separately
• 3,000 Holstein bulls to genotype
• 17,000 ancestors in pedigree file
• 1 billion (20,000 x 50,000 SNPs) genotypes simulated per replicate
• Only 150 million (3,000 x 50,000) genotypes stored for evaluation
Linear Estimates using Markers
• Selection index equations for EBV
  • $\hat{u} = \mathrm{Cov}(u, y)\,\mathrm{Var}(y)^{-1}\,(y - Xb)$
  • $\hat{u} = ZZ'\,[ZZ' + R]^{-1}\,(y - Xb)$
  • R has diagonals = (1 / Reliability) − 1
• BLUP equations for marker effects, sum to get EBV
  • $\hat{u} = Z\,[Z'R^{-1}Z + Ik]^{-1}\,Z'R^{-1}\,(y - Xb)$
  • $k = \mathrm{var}(u) / \mathrm{var}(m)$
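The two sets of equations on this slide should give the same EBV. The sketch below checks that numerically on toy data. It assumes that the ZZ' in the selection index is scaled as G = ZZ' / k, so that both forms use the same variance ratio. It also assumes genotypes coded as allele counts minus 2p. The sizes, the value of k, and the random data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_markers = 8, 20            # toy sizes, not 3,000 x 50,000

# Assumed coding: allele count minus 2p (sum of two alleles coded -p / 1-p).
p = rng.uniform(0.2, 0.8, n_markers)    # hypothetical allele frequencies
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2 * p

rel = rng.uniform(0.5, 0.95, n_animals) # hypothetical reliabilities
R = np.diag(1.0 / rel - 1.0)            # R diagonals = (1 / Reliability) - 1
k = 10.0                                # var(u) / var(m), hypothetical value
y_adj = rng.normal(size=n_animals)      # stands in for (y - Xb)

# Selection index: animals x animals system, assuming G = Z Z' / k.
G = Z @ Z.T / k
u_index = G @ np.linalg.solve(G + R, y_adj)

# BLUP for marker effects: markers x markers system, then sum to get EBV.
R_inv = np.diag(1.0 / np.diag(R))
lhs = Z.T @ R_inv @ Z + k * np.eye(n_markers)
m_hat = np.linalg.solve(lhs, Z.T @ R_inv @ y_adj)
u_blup = Z @ m_hat

max_diff = float(np.max(np.abs(u_index - u_blup)))
print(f"largest EBV difference between the two forms: {max_diff:.2e}")
```

The selection index solves a system of order animals, the BLUP form one of order markers, so which is cheaper depends on which dimension is smaller.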
Iteration on Data
• Simple trick to reduce time from quadratic to linear in the number of SNPs
  • Sum coefficients x solutions once
  • Sum − diagonal = off-diagonals
• Janss and de Jong, 1999 conference
  • Rediscovered by Legarra and Misztal
• Elements of Z are −p and (1 − p), where p is frequency of 2nd allele
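A minimal sketch of the "sum once, then subtract the diagonal" trick, on toy data. It compares the off-diagonal contributions computed from an explicit markers x markers coefficient matrix with the same quantities computed from two passes over the genotypes. The sizes, weights, and current solutions are hypothetical, and coding genotypes as allele count minus 2p is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_markers = 50, 200                   # toy sizes
p = rng.uniform(0.1, 0.9, n_markers)
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2 * p
r_inv = 1.0 / rng.uniform(0.1, 1.0, n_animals)   # diagonal of R^-1 (hypothetical)
m = rng.normal(scale=0.01, size=n_markers)       # current marker solutions

# Naive: build the markers x markers matrix; work grows with markers^2.
C = Z.T @ (r_inv[:, None] * Z)
off_naive = C @ m - np.diag(C) * m

# Iteration on data: sum coefficients x solutions once per animal ...
animal_sums = Z @ m                              # one pass over the genotypes
full = Z.T @ (r_inv * animal_sums)               # all rows of C m in a second pass
diag = (r_inv[:, None] * Z**2).sum(axis=0)       # diagonal of C without forming C
# ... then sum - diagonal = off-diagonals.
off_data = full - diag * m

max_diff = float(np.max(np.abs(off_naive - off_data)))
print(f"largest difference: {max_diff:.2e}")
```

Each round touches every genotype a fixed number of times, so the work per round is proportional to animals x markers rather than animals x markers².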
Computer Memory
• Inversion including G matrix
  • Animals x markers to hold genotypes
  • Animals² to hold elements of G
  • <1 Gbyte for 50,000 SNPs, 3,000 bulls
• Iteration on genotype data
  • Markers + animals
  • <0.1 Gbyte for 50,000 SNPs, 3,000 bulls
• Little memory required for either
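The memory figures can be checked with rough arithmetic. The sketch assumes one byte per stored genotype and eight bytes per real number, and it assumes that iteration on data reads genotypes from disk each round so that only vectors of length markers + animals stay in memory. The slides state none of these details.

```python
animals, markers = 3_000, 50_000
BYTES_GENOTYPE = 1   # assumed: one byte per stored genotype
BYTES_REAL = 8       # assumed: double precision for G and solution vectors

# Inversion including G: genotypes (animals x markers) plus G (animals^2).
genotype_bytes = animals * markers * BYTES_GENOTYPE
g_matrix_bytes = animals ** 2 * BYTES_REAL
inversion_gb = (genotype_bytes + g_matrix_bytes) / 1e9

# Iteration on data: vectors of length markers + animals only.
iteration_gb = (markers + animals) * BYTES_REAL / 1e9

print(f"inversion including G: {inversion_gb:.3f} Gbyte")
print(f"iteration on data:     {iteration_gb:.6f} Gbyte")
```

Under these assumptions both totals fall inside the bounds quoted on the slide.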
Computing Times
• Inversion including G matrix
  • Animals² x markers to form G matrix
  • Animals³ to invert selection index
  • 10 hours for 3,000 bulls, 50,000 SNPs
• Iteration on genotype data
  • Markers x animals x iterations
  • 16 hours for 1,000 iterations
  • .997 correlation with inversion
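The operation counts named on this slide can be evaluated directly for the stated problem size. This is only a count of multiply-add steps taken from the bullets; constant factors, I/O, and hardware are not modelled.

```python
animals, markers, iterations = 3_000, 50_000, 1_000

# Inversion including G matrix.
form_g = animals ** 2 * markers        # animals^2 x markers to form G
invert = animals ** 3                  # animals^3 to invert the selection index
inversion_ops = form_g + invert

# Iteration on genotype data.
iteration_ops = markers * animals * iterations

print(f"form G:     {form_g:.2e}")
print(f"invert:     {invert:.2e}")
print(f"iteration:  {iteration_ops:.2e}")
```

Forming G dominates the inversion route. The counts ignore costs such as re-reading genotype data every round, so they need not track the reported hours.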
Convergence with Iteration on Data
• Jacobi iteration
  • Use previous round coefficients x solutions
• Adaptive under-relaxation
  • Increase relax if convergence improving
  • Decrease relax (each round) if diverging
• Solution convergence reasonable
  • SD of change < .0001 after 350 rounds
  • SD of change < .000001 after 1700 rounds
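One way the Jacobi scheme with adaptive under-relaxation might look is sketched below on toy data. The function name `jacobi_marker_effects`, the starting relaxation of 0.5, and the 1.01 / 0.9 adjustment factors are illustrative assumptions, not the values used for the results on the slide. The system solved is the marker-effect BLUP, (Z'R⁻¹Z + Ik) m = Z'R⁻¹(y − Xb).

```python
import numpy as np

def jacobi_marker_effects(Z, r_inv, y_adj, k, tol=1e-9, max_rounds=5000):
    """Under-relaxed Jacobi for (Z'R^-1 Z + I k) m = Z'R^-1 (y - Xb)."""
    diag = (r_inv[:, None] * Z**2).sum(axis=0) + k
    rhs = Z.T @ (r_inv * y_adj)
    m = np.zeros(Z.shape[1])
    relax, prev_sd = 0.5, np.inf             # starting relaxation is a guess
    for rounds in range(1, max_rounds + 1):
        # Coefficients x previous-round solutions, without forming the matrix.
        lhs_m = Z.T @ (r_inv * (Z @ m)) + k * m
        change = relax * (rhs - lhs_m) / diag
        m = m + change
        sd = float(np.std(change))           # SD of change in solutions
        if sd < tol:
            break
        # Increase relax while changes shrink, decrease it when they grow.
        relax = min(relax * 1.01, 1.0) if sd < prev_sd else relax * 0.9
        prev_sd = sd
    return m, rounds

rng = np.random.default_rng(2)
n_animals, n_markers = 30, 60                # toy sizes
p = rng.uniform(0.2, 0.8, n_markers)
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2 * p
rel = rng.uniform(0.5, 0.9, n_animals)       # hypothetical reliabilities
r_inv = 1.0 / (1.0 / rel - 1.0)              # inverse of R diagonals
y_adj = rng.normal(size=n_animals)           # stands in for (y - Xb)
k = 20.0                                     # hypothetical variance ratio

m_iter, rounds = jacobi_marker_effects(Z, r_inv, y_adj, k)
m_direct = np.linalg.solve(
    Z.T @ (r_inv[:, None] * Z) + k * np.eye(n_markers),
    Z.T @ (r_inv * y_adj),
)
print(rounds, float(np.max(np.abs(m_iter - m_direct))))
```

Plain Jacobi (relax = 1) tends to diverge on dense marker systems like this one, which is why the relaxation factor is cut whenever the SD of change grows.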
Potential Results
Simulation of 50,000 SNPs, 100 QTLs
• Higher REL if major QTLs exist or >3,000 bulls genotyped, lower if more loci (>100) affect trait
• Reliability = accuracy²
Reliability from Genotyping
• Daughter equivalents
  • $DE_{Total} = DE_{PA} + DE_{Prog} + DE_{YD} + DE_{G}$
  • $DE_{G}$ is additional DE from genotype
  • $REL = DE_{Total} / (DE_{Total} + k)$
• Gains in reliability
  • $DE_{G}$ could be about 15 for Net Merit
  • More for traits with low heritability
  • Less for traits with high heritability
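A small worked example of the daughter-equivalent formula. The value k = 14 and the traditional DE contributions are hypothetical numbers chosen for round results. Only DE_G = 15 comes from the slide. The k here is the constant in the REL formula, treated as given, and is presumably not the var(u) / var(m) ratio of the marker model.

```python
def reliability(de_total: float, k: float) -> float:
    """REL = DE_Total / (DE_Total + k)."""
    return de_total / (de_total + k)

k = 14.0                                  # hypothetical constant for the trait
de_pa, de_prog, de_yd = 4.0, 2.0, 0.0     # hypothetical traditional sources
de_g = 15.0                               # "about 15 for Net Merit" (slide)

rel_traditional = reliability(de_pa + de_prog + de_yd, k)
rel_genomic = reliability(de_pa + de_prog + de_yd + de_g, k)

print(f"REL without genotype: {rel_traditional:.2f}")   # 6 / 20  = 0.30
print(f"REL with genotype:    {rel_genomic:.2f}")       # 21 / 35 = 0.60
```

With these made-up inputs the genotype doubles reliability. An animal that already has many daughter equivalents from progeny would gain much less from the same DE_G.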
Conclusions
• Predictions from 50,000 SNPs using:
  • Selection index equations, or
  • Iteration on genotype data
• Predictions correlated by up to .9999
• Linear and nonlinear costs OK
• Convergence within 200 to 2500 rounds
• Nonlinear regression improved reliabilities
• Real data predictions available soon