This study explores the efficient computing algorithms required to estimate breeding values from dense genomic data: 50,000 SNPs per animal for 3,000 Holstein bulls. Linear estimates and nonlinear models are discussed, along with the impact of genomic calculations on pedigree evaluation. Using simulated SNPs and QTLs, genomic EBVs are calculated from selection index equations or from marker effects with an assumed prior distribution. Computer memory requirements, computing times, and convergence strategies are also outlined. Real data predictions for Holstein bulls will be available soon.
Efficient Estimation of Breeding Values from Dense Genomic Data
Genomic Calculations • Genotypes soon available from BFGL: • 50,000 SNPs / animal • 3,000 animals, many more possible • Need efficient computing algorithms • Traditional PTAs available from AIPL: • PTAs combine phenotypes and pedigree • SNP effects evaluated in second step using deregressed PTAs weighted by reliability
Genomic Computer Programs • Simulate SNPs and QTLs • Compare SNP numbers, size of QTLs • Calculate genomic EBVs • Use selection index, G instead of A • Use iteration on data for SNP effects • Form haplotypes from genotypes? • Not tested yet, SNP regression used
Simulation Program • Save memory by processing each chromosome separately • 3,000 Holstein bulls to genotype • 17,000 ancestors in pedigree file • 1 billion (20,000 x 50,000 SNPs) genotypes simulated per replicate • Only 150 million (3,000 x 50,000) genotypes stored for evaluation
Linear Estimates using Markers • Selection index equations for EBV • $\hat{u} = \mathrm{Cov}(u, y)\,\mathrm{Var}(y)^{-1}(y - Xb)$ • $\hat{u} = ZZ'[ZZ' + R]^{-1}(y - Xb)$ • R has diagonals = (1 / Reliability) - 1 • BLUP equations for marker effects, sum to get EBV • $\hat{u} = Z[Z'R^{-1}Z + Ik]^{-1}Z'R^{-1}(y - Xb)$ • k = var(u) / var(m)
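The two forms on this slide can be checked against each other numerically. The sketch below is an illustration only, not the author's program: it uses tiny made-up data, plain Python lists, and assumes that Z Z' in the selection index is scaled by 1/k (that is, G = Z Z'/k), which is what makes the two expressions algebraically identical. The genotype coding (sum of two alleles coded 1 - p or -p) and the helper names `matmul`, `transpose`, and `solve` are assumptions for the example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

random.seed(1)
n_animals, n_markers, k = 5, 8, 3.0  # hypothetical sizes and variance ratio

# Hypothetical data: allele frequencies, genotypes, reliabilities, and y - Xb.
p = [random.uniform(0.1, 0.9) for _ in range(n_markers)]
Z = [[sum((1 - p[j]) if random.random() < p[j] else -p[j] for _ in range(2))
      for j in range(n_markers)] for _ in range(n_animals)]
rel = [random.uniform(0.5, 0.95) for _ in range(n_animals)]
r = [1 / x - 1 for x in rel]                      # diagonals of R
y = [random.gauss(0, 1) for _ in range(n_animals)]
Zt = transpose(Z)

# Selection index: u = G (G + R)^-1 y, with G = Z Z' / k (assumed scaling).
G = [[v / k for v in row] for row in matmul(Z, Zt)]
V = [[G[i][j] + (r[i] if i == j else 0.0) for j in range(n_animals)]
     for i in range(n_animals)]
weights = solve(V, y)
u_index = [sum(G[i][j] * weights[j] for j in range(n_animals)) for i in range(n_animals)]

# Marker BLUP: (Z' R^-1 Z + I k) m = Z' R^-1 y, then u = Z m.
ZtRinv = [[Zt[j][i] / r[i] for i in range(n_animals)] for j in range(n_markers)]
lhs = matmul(ZtRinv, Z)
for j in range(n_markers):
    lhs[j][j] += k
rhs = [sum(ZtRinv[j][i] * y[i] for i in range(n_animals)) for j in range(n_markers)]
m = solve(lhs, rhs)
u_blup = [sum(Z[i][j] * m[j] for j in range(n_markers)) for i in range(n_animals)]

max_diff = max(abs(a - b) for a, b in zip(u_index, u_blup))
print("largest difference between the two EBV vectors:", max_diff)
```

The selection index solves a system of order animals, while the marker equations solve one of order markers, which is why the choice between them depends on which dimension is smaller.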
Iteration on Data • Simple trick to reduce time from quadratic to linear with # SNPs • Sum coefficients x solutions once • Sum – diagonal = off-diagonals • Janss and de Jong, 1999 conference • Rediscovered by Legarra and Misztal • Elements of Z are –p and (1 – p), where p is frequency of 2nd allele
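A minimal sketch of the "sum once, then subtract the diagonal" trick, assuming a Jacobi-style update and made-up data. Z m is summed once per round; each marker's off-diagonal contribution is then that sum minus its own diagonal term, so the marker-by-marker coefficient matrix is never formed. The function names, the weights `w` (diagonal of R^-1), and the genotype coding as a sum of two alleles are illustrative assumptions.

```python
import random

random.seed(2)
n_animals, n_markers, k = 6, 10, 5.0  # hypothetical sizes and variance ratio

p = [random.uniform(0.1, 0.9) for _ in range(n_markers)]
# Assumption: each genotype is the sum of two alleles coded (1 - p) or -p.
Z = [[sum((1 - p[j]) if random.random() < p[j] else -p[j] for _ in range(2))
      for j in range(n_markers)] for _ in range(n_animals)]
w = [random.uniform(1.0, 10.0) for _ in range(n_animals)]   # diagonal of R^-1
y = [random.gauss(0, 1) for _ in range(n_animals)]          # y - Xb
m = [random.gauss(0, 0.1) for _ in range(n_markers)]        # current solutions

def jacobi_round_quadratic(m):
    """Reference: build every coefficient z_j' R^-1 z_l explicitly (markers^2 work)."""
    new = []
    for j in range(n_markers):
        rhs = sum(Z[i][j] * w[i] * y[i] for i in range(n_animals))
        diag = sum(Z[i][j] ** 2 * w[i] for i in range(n_animals))
        off = sum(sum(Z[i][j] * w[i] * Z[i][l] for i in range(n_animals)) * m[l]
                  for l in range(n_markers) if l != j)
        new.append((rhs - off) / (diag + k))
    return new

def jacobi_round_on_data(m):
    """Iteration on data: sum coefficients x solutions once, then sum - diagonal."""
    fitted = [sum(Z[i][l] * m[l] for l in range(n_markers)) for i in range(n_animals)]
    new = []
    for j in range(n_markers):
        rhs = sum(Z[i][j] * w[i] * y[i] for i in range(n_animals))
        diag = sum(Z[i][j] ** 2 * w[i] for i in range(n_animals))
        total = sum(Z[i][j] * w[i] * fitted[i] for i in range(n_animals))
        off = total - diag * m[j]          # sum - diagonal = off-diagonals
        new.append((rhs - off) / (diag + k))
    return new

new_quadratic = jacobi_round_quadratic(m)
new_on_data = jacobi_round_on_data(m)
print(max(abs(a - b) for a, b in zip(new_quadratic, new_on_data)))
```

Both functions return the same updated solutions; only the second scales linearly with the number of markers per round.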
Computer Memory • Inversion including G matrix • Animals x markers to hold genotypes • Animals² to hold elements of G • <1 Gbyte for 50,000 SNPs, 3,000 bulls • Iteration on genotype data • Markers + animals • <0.1 Gbyte for 50,000 SNPs, 3,000 bulls • Little memory required for either
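The memory figures can be reproduced with back-of-envelope arithmetic. The byte sizes below are assumptions (one byte per stored genotype, eight bytes per real number), and the iteration case assumes genotypes are read from file each round rather than held in memory; the slide does not state either.

```python
n_animals, n_markers = 3_000, 50_000

BYTES_PER_GENOTYPE = 1   # assumption: one byte per genotype code
BYTES_PER_REAL = 8       # assumption: double precision

# Inversion including G: genotypes (animals x markers) plus G (animals^2).
genotype_bytes = n_animals * n_markers * BYTES_PER_GENOTYPE
g_matrix_bytes = n_animals ** 2 * BYTES_PER_REAL
inversion_gb = (genotype_bytes + g_matrix_bytes) / 1e9

# Iteration on data: vectors of length markers + animals only.
iteration_gb = (n_markers + n_animals) * BYTES_PER_REAL / 1e9

print(f"inversion: {inversion_gb:.3f} Gbyte, iteration: {iteration_gb:.6f} Gbyte")
```

Under these assumptions both approaches stay well inside the limits quoted on the slide.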
Computing Times • Inversion including G matrix • Animals² x markers to form G matrix • Animals³ to invert selection index • 10 hours for 3,000 bulls, 50,000 SNPs • Iteration on genotype data • Markers x animals x iterations • 16 hours for 1,000 iterations • 0.997 correlation with inversion
Convergence with iteration on data • Jacobi iteration • Use previous round coefficients x solutions • Adaptive under-relaxation • Increase relax if convergence improving • Decrease relax (each round) if diverging • Solution convergence reasonable • SD of change < 0.0001 after 350 rounds • SD of change < 0.000001 after 1,700 rounds
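One way the adaptive under-relaxation could look is sketched below on made-up data. The specific rules are assumptions, not taken from the slide: the relaxation factor grows by 5% (capped at 1) when the round's correction shrinks and is halved when it grows, and convergence is judged on the root-mean-square of the unrelaxed Jacobi correction as a stand-in for the SD of change. The function name `solve_marker_effects` and all parameter values are hypothetical.

```python
import math
import random

def solve_marker_effects(Z, w, y, k, tol=1e-10, max_rounds=20000):
    """Jacobi iteration on data with adaptive under-relaxation (illustrative rules)."""
    n_animals, n_markers = len(Z), len(Z[0])
    diag = [sum(Z[i][j] ** 2 * w[i] for i in range(n_animals)) + k
            for j in range(n_markers)]
    m = [0.0] * n_markers
    relax, prev_rms = 1.0, float("inf")
    for rnd in range(1, max_rounds + 1):
        # Residuals computed once from the previous round's solutions.
        resid = [y[i] - sum(Z[i][j] * m[j] for j in range(n_markers))
                 for i in range(n_animals)]
        # Unrelaxed Jacobi correction for every marker.
        step = [(sum(Z[i][j] * w[i] * resid[i] for i in range(n_animals)) - k * m[j])
                / diag[j] for j in range(n_markers)]
        rms = math.sqrt(sum(s * s for s in step) / n_markers)
        if rms < tol:
            return m, rnd, True
        # Assumed rule: grow relax slowly while improving, halve it when diverging.
        relax = min(1.0, relax * 1.05) if rms < prev_rms else relax * 0.5
        prev_rms = rms
        m = [mj + relax * s for mj, s in zip(m, step)]
    return m, max_rounds, False

random.seed(3)
n_animals, n_markers, k = 6, 10, 10.0  # hypothetical sizes and variance ratio
p = [random.uniform(0.1, 0.9) for _ in range(n_markers)]
# Assumption: each genotype is the sum of two alleles coded (1 - p) or -p.
Z = [[sum((1 - p[j]) if random.random() < p[j] else -p[j] for _ in range(2))
      for j in range(n_markers)] for _ in range(n_animals)]
rel = [random.uniform(0.5, 0.9) for _ in range(n_animals)]
w = [1 / (1 / x - 1) for x in rel]     # diagonal of R^-1
y = [random.gauss(0, 1) for _ in range(n_animals)]

m_hat, rounds, converged = solve_marker_effects(Z, w, y, k)
print("converged:", converged, "after", rounds, "rounds")
```

Judging convergence on the unrelaxed correction avoids mistaking a small relaxation factor for a small change in solutions.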
Potential Results • Simulation of 50,000 SNPs, 100 QTLs • Higher REL if major QTLs exist or >3,000 bulls genotyped, lower if more loci (>100) affect trait • Reliability = accuracy²
Reliability from Genotyping • Daughter equivalents • $DE_{Total} = DE_{PA} + DE_{Prog} + DE_{YD} + DE_{G}$ • $DE_{G}$ is additional DE from genotype • $REL = DE_{Total} / (DE_{Total} + k)$ • Gains in reliability • $DE_{G}$ could be about 15 for Net Merit • More for traits with low heritability • Less for traits with high heritability
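A small worked example of the daughter-equivalent formula. Only the value of 15 for the genotype contribution comes from the slide; the parent-average contribution of 7 and k = 14 are hypothetical numbers chosen for illustration, and k is left as a parameter because the slide does not define it here.

```python
def reliability(de_pa, de_prog, de_yd, de_g, k):
    """REL = DE_total / (DE_total + k), with DE_total summed over all sources."""
    de_total = de_pa + de_prog + de_yd + de_g
    return de_total / (de_total + k)

# Hypothetical young bull: parent average only, no progeny or own records.
rel_without_genotype = reliability(de_pa=7, de_prog=0, de_yd=0, de_g=0, k=14)
rel_with_genotype = reliability(de_pa=7, de_prog=0, de_yd=0, de_g=15, k=14)

print(f"REL without genotype: {rel_without_genotype:.3f}")
print(f"REL with genotype:    {rel_with_genotype:.3f}")
```

Because the same added daughter equivalents matter less as the total grows, the gain from genotyping is largest for animals with little other information.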
Conclusions • Predictions from 50,000 SNPs using: • Selection index equations, or • Iteration on genotype data • Predictions correlated by up to .9999 • Linear and nonlinear costs OK • Convergence within 200 to 2500 rounds • Nonlinear regression improved reliabilities • Real data predictions available soon