
Efficient Estimation of Breeding Values from Dense Genomic Data

This presentation covers efficient computing algorithms for estimating breeding values from dense genomic data: 50,000 SNPs per animal on 3,000 Holstein bulls. It compares linear and nonlinear models and describes how genomic calculations build on traditional pedigree-based evaluations. Using simulated SNPs and QTLs, genomic EBVs are calculated either with selection index equations or by iteration on genotype data, and a nonlinear model applies a prior distribution to marker effects. Memory requirements, computing times, and convergence strategies are also outlined. Real-data predictions for Holstein bulls are expected to be available soon.

dunnrobert

Presentation Transcript


  1. Efficient Estimation of Breeding Values from Dense Genomic Data

  2. Genomic Calculations
     • Genotypes soon available from BFGL:
        • 50,000 SNPs / animal
        • 3,000 animals, many more possible
        • Need efficient computing algorithms
     • Traditional PTAs available from AIPL:
        • PTAs combine phenotypes and pedigree
        • SNP effects evaluated in second step using deregressed PTAs weighted by reliability

  3. Genomic Computer Programs
     • Simulate SNPs and QTLs
        • Compare SNP numbers, size of QTLs
     • Calculate genomic EBVs
        • Use selection index, G instead of A
        • Use iteration on data for SNP effects
     • Form haplotypes from genotypes?
        • Not tested yet, SNP regression used

  4. Simulation Program
     • Save memory by processing each chromosome separately
     • 3,000 Holstein bulls to genotype
     • 17,000 ancestors in pedigree file
     • 1 billion (20,000 animals x 50,000 SNPs) genotypes simulated per replicate
     • Only 150 million (3,000 x 50,000) genotypes stored for evaluation

  5. Linear Estimates using Markers
     • Selection index equations for EBV
        • $\hat{u} = \mathrm{Cov}(u, y)\,\mathrm{Var}(y)^{-1}(y - Xb)$
        • $\hat{u} = ZZ'\,[ZZ' + R]^{-1}(y - Xb)$
        • $R$ has diagonals $= (1 / \mathrm{Reliability}) - 1$
     • BLUP equations for marker effects, sum to get EBV
        • $\hat{u} = Z\,[Z'R^{-1}Z + Ik]^{-1}Z'R^{-1}(y - Xb)$
        • $k = \mathrm{var}(u) / \mathrm{var}(m)$
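The two forms on slide 5 can be checked against each other numerically. The sketch below is a minimal illustration, not the presenter's program: the data are random toy values, genotypes are coded 0/1/2 and centered by twice the allele frequency, and the genomic relationship matrix is taken as G = ZZ'/k so that both equations are on the same variance scale (this scaling is an assumption; the slide writes ZZ' without it). The names `ebv_selection_index`, `ebv_marker_blup`, and `y_adj` (standing in for y - Xb) are hypothetical.

```python
import numpy as np

def ebv_selection_index(Z, rel, y_adj, k):
    """EBV from the animal-by-animal system: u = G (G + R)^-1 (y - Xb)."""
    G = Z @ Z.T / k                      # assumed scaling of ZZ'
    R = np.diag(1.0 / rel - 1.0)         # diagonals = (1 / reliability) - 1
    return G @ np.linalg.solve(G + R, y_adj)

def ebv_marker_blup(Z, rel, y_adj, k):
    """EBV as the sum of BLUP marker effects: u = Z m."""
    r_inv = 1.0 / (1.0 / rel - 1.0)      # diagonal of R^-1
    lhs = Z.T @ (r_inv[:, None] * Z) + k * np.eye(Z.shape[1])
    rhs = Z.T @ (r_inv * y_adj)
    m_hat = np.linalg.solve(lhs, rhs)
    return Z @ m_hat

# Toy data (all values hypothetical).
rng = np.random.default_rng(0)
n_animals, n_markers = 30, 200
p = rng.uniform(0.1, 0.9, n_markers)                 # allele frequencies
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2.0 * p
rel = rng.uniform(0.5, 0.95, n_animals)              # reliabilities of the records
y_adj = rng.normal(size=n_animals)                   # stands in for y - Xb
k = float(np.sum(2.0 * p * (1.0 - p)))               # assumed var(u) / var(m)

u_index = ebv_selection_index(Z, rel, y_adj, k)
u_blup = ebv_marker_blup(Z, rel, y_adj, k)
print(np.allclose(u_index, u_blup))
```

The first function solves a system of size animals x animals and the second a system of size markers x markers, which is why the choice between them depends on which dimension is larger.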

  6. Non-linear vs Linear Models

  7. Marker Effect Prior Distribution: Nonlinear Model

  8. Iteration on Data
     • Simple trick to reduce time from quadratic to linear in the number of SNPs
        • Sum coefficients x solutions once
        • Sum - diagonal = off-diagonals
     • Janss and de Jong, 1999 conference
        • Rediscovered by Legarra and Misztal
     • Elements of Z are $-p$ and $(1 - p)$, where $p$ is frequency of 2nd allele
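The "sum once, then subtract the diagonal" trick from slide 8 can be sketched for a single Jacobi round. The first function below builds the full markers x markers coefficient matrix; the second gets the same off-diagonal sums from the genotypes alone. This is an illustrative sketch under assumptions: the coefficient matrix is taken to be Z'R^-1 Z + Ik from the marker BLUP equations, each genotype is coded as the sum of two alleles coded -p or (1 - p), and all function and variable names are hypothetical.

```python
import numpy as np

def jacobi_round_explicit(Z, r_inv, k, rhs, m):
    """One Jacobi round using the full coefficient matrix (quadratic in markers)."""
    C = Z.T @ (r_inv[:, None] * Z) + k * np.eye(Z.shape[1])
    diag = np.diag(C)
    off_diag_sum = (C - np.diag(diag)) @ m       # explicit off-diagonal products
    return (rhs - off_diag_sum) / diag

def jacobi_round_on_data(Z, r_inv, k, rhs, m):
    """The same round without forming C (linear in markers)."""
    fitted = Z @ m                               # one pass over the genotypes
    total = Z.T @ (r_inv * fitted) + k * m       # coefficients x solutions, summed once
    diag = (r_inv[:, None] * Z ** 2).sum(axis=0) + k
    off_diag_sum = total - diag * m              # sum - diagonal = off-diagonals
    return (rhs - off_diag_sum) / diag

# Toy data (all values hypothetical).
rng = np.random.default_rng(1)
n_animals, n_markers = 40, 300
p = rng.uniform(0.1, 0.9, n_markers)
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2.0 * p
rel = rng.uniform(0.5, 0.95, n_animals)
r_inv = 1.0 / (1.0 / rel - 1.0)
k = float(np.sum(2.0 * p * (1.0 - p)))
rhs = Z.T @ (r_inv * rng.normal(size=n_animals))
m_prev = rng.normal(scale=0.01, size=n_markers)  # previous-round solutions

m_explicit = jacobi_round_explicit(Z, r_inv, k, rhs, m_prev)
m_on_data = jacobi_round_on_data(Z, r_inv, k, rhs, m_prev)
print(np.allclose(m_explicit, m_on_data))
```

In the second function the work per round is proportional to animals x markers, since only two matrix-vector products with Z are needed.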

  9. Computer Memory
     • Inversion including G matrix
        • Animals x markers to hold genotypes
        • Animals² to hold elements of G
        • < 1 Gbyte for 50,000 SNPs, 3,000 bulls
     • Iteration on genotype data
        • Markers + animals
        • < 0.1 Gbyte for 50,000 SNPs, 3,000 bulls
     • Little memory required for either
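The memory bounds on slide 9 can be reproduced with simple arithmetic. The sketch below assumes 1 byte per stored genotype, 8-byte reals for G and for solution vectors, and decimal gigabytes; none of these storage choices are stated in the slides, and the function names are hypothetical.

```python
BYTES_PER_GB = 1e9  # decimal gigabytes (assumption)

def inversion_memory_gb(n_animals, n_markers, genotype_bytes=1, real_bytes=8):
    """Genotypes (animals x markers) plus G (animals squared), both in memory."""
    genotypes = n_animals * n_markers * genotype_bytes
    g_matrix = n_animals ** 2 * real_bytes
    return (genotypes + g_matrix) / BYTES_PER_GB

def iteration_memory_gb(n_animals, n_markers, real_bytes=8):
    """One vector over markers plus one over animals; genotypes are read, not held."""
    return (n_markers + n_animals) * real_bytes / BYTES_PER_GB

inversion_gb = inversion_memory_gb(3_000, 50_000)
iteration_gb = iteration_memory_gb(3_000, 50_000)
print(inversion_gb, iteration_gb)
```

Under these assumptions both figures fall below the limits quoted on the slide (under 1 Gbyte and under 0.1 Gbyte respectively).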

  10. Computing Times
      • Inversion including G matrix
         • Animals² x markers to form G matrix
         • Animals³ to invert selection index
         • 10 hours for 3,000 bulls, 50,000 SNPs
      • Iteration on genotype data
         • Markers x animals x iterations
         • 16 hours for 1,000 iterations
         • 0.997 correlation with inversion
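The scaling terms on slide 10 can be evaluated for the stated problem size. The sketch below only counts operations to leading order; it ignores constant factors, disk access, and hardware, so it does not predict the hours reported on the slide. The function names are hypothetical.

```python
def inversion_operations(n_animals, n_markers):
    """Leading-order counts for forming G and for inverting the selection index."""
    form_g = n_animals ** 2 * n_markers
    invert = n_animals ** 3
    return form_g, invert

def iteration_operations(n_animals, n_markers, n_rounds):
    """Leading-order count for iteration on genotype data."""
    return n_markers * n_animals * n_rounds

form_g_ops, invert_ops = inversion_operations(3_000, 50_000)
iteration_ops = iteration_operations(3_000, 50_000, 1_000)
print(form_g_ops, invert_ops, iteration_ops)
```

With 3,000 bulls, forming G dominates the inversion itself, and the iteration count grows only linearly if more animals or markers are added.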

  11. Convergence with iteration on data
      • Jacobi iteration
         • Use previous round coefficients x solutions
      • Adaptive under-relaxation
         • Increase relax if convergence improving
         • Decrease relax (each round) if diverging
      • Solution convergence reasonable
         • SD of change < 0.0001 after 350 rounds
         • SD of change < 0.000001 after 1,700 rounds
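A minimal sketch of Jacobi iteration with adaptive under-relaxation follows. It is not the presenter's program: the growth and shrink factors, the floor and cap on the relaxation factor, and the use of the root mean square of the Jacobi change as the convergence measure (in place of the SD of change) are all assumptions, and the toy problem has many animals and few markers so that it converges in few rounds, unlike the 50,000-SNP case. All names are hypothetical.

```python
import numpy as np

def solve_by_iteration_on_data(Z, r_inv, k, y_adj, max_rounds=500, tol=1e-10,
                               relax=0.5, grow=1.05, shrink=0.7,
                               relax_min=0.1, relax_max=1.0):
    """Solve [Z'R^-1 Z + Ik] m = Z'R^-1 (y - Xb) without forming the matrix."""
    rhs = Z.T @ (r_inv * y_adj)
    diag = (r_inv[:, None] * Z ** 2).sum(axis=0) + k
    m = np.zeros(Z.shape[1])
    prev_change = np.inf
    for round_no in range(1, max_rounds + 1):
        # Jacobi: every update uses only previous-round solutions.
        total = Z.T @ (r_inv * (Z @ m)) + k * m
        jacobi_m = (rhs - (total - diag * m)) / diag
        step = jacobi_m - m
        change = float(np.sqrt(np.mean(step ** 2)))   # assumed convergence measure
        if change < tol:
            return m, round_no
        if change < prev_change:
            relax = min(relax * grow, relax_max)      # improving: relax more boldly
        else:
            relax = max(relax * shrink, relax_min)    # diverging: damp the update
        m = m + relax * step
        prev_change = change
    return m, max_rounds

# Toy data (all values hypothetical).
rng = np.random.default_rng(2)
n_animals, n_markers = 2000, 20
p = rng.uniform(0.2, 0.8, n_markers)
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2.0 * p
rel = rng.uniform(0.5, 0.9, n_animals)
r_inv = 1.0 / (1.0 / rel - 1.0)
k = float(np.sum(2.0 * p * (1.0 - p)))
y_adj = Z @ rng.normal(scale=0.05, size=n_markers) + rng.normal(size=n_animals)

m_iter, rounds_used = solve_by_iteration_on_data(Z, r_inv, k, y_adj)
m_direct = np.linalg.solve(Z.T @ (r_inv[:, None] * Z) + k * np.eye(n_markers),
                           Z.T @ (r_inv * y_adj))
print(rounds_used, np.allclose(m_iter, m_direct, atol=1e-8))
```

The direct solve at the end is only there to check the iterative answer on the toy problem; with 50,000 markers the point of the method is to avoid forming that matrix at all.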

  12. Potential Results
      • Simulation of 50,000 SNPs, 100 QTLs
      • Higher REL if major QTLs exist or >3,000 bulls genotyped, lower if more loci (>100) affect trait
      • Reliability = accuracy²

  13. Reliability from Genotyping
      • Daughter equivalents
         • $DE_{Total} = DE_{PA} + DE_{Prog} + DE_{YD} + DE_{G}$
         • $DE_{G}$ is additional DE from genotype
         • $REL = DE_{Total} / (DE_{Total} + k)$
      • Gains in reliability
         • $DE_{G}$ could be about 15 for Net Merit
         • More for traits with low heritability
         • Less for traits with high heritability
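The daughter-equivalent formula on slide 13 can be worked through for one animal. The sketch below assumes the constant k equals (4 - h²) / h², a common definition for daughter equivalents that the slide does not state, and uses a hypothetical heritability of 0.20 and a hypothetical parent-average reliability of 0.35; only the value of 15 for the genomic daughter equivalents comes from the slide. This k is not the variance ratio k used in the marker BLUP equations.

```python
def de_constant(h2):
    """k in REL = DE / (DE + k); assumed here to be (4 - h2) / h2."""
    return (4.0 - h2) / h2

def rel_from_de(de_total, k):
    """Reliability from total daughter equivalents."""
    return de_total / (de_total + k)

def de_from_rel(rel, k):
    """Daughter equivalents implied by a reliability (inverse of rel_from_de)."""
    return k * rel / (1.0 - rel)

h2 = 0.20                  # hypothetical heritability
k = de_constant(h2)
rel_pa = 0.35              # hypothetical reliability from parent average only
de_g = 15.0                # additional DE from genotype, as quoted on the slide

de_pa = de_from_rel(rel_pa, k)
rel_with_genotype = rel_from_de(de_pa + de_g, k)
print(round(rel_pa, 2), round(rel_with_genotype, 2))
```

Because the formula saturates as daughter equivalents accumulate, the same 15 extra daughter equivalents add less reliability to a bull that already has many progeny than to a young bull with parent average only.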

  14. Conclusions
      • Predictions from 50,000 SNPs using:
         • Selection index equations, or
         • Iteration on genotype data
      • Predictions correlated by up to 0.9999
      • Linear and nonlinear costs OK
      • Convergence within 200 to 2,500 rounds
      • Nonlinear regression improved reliabilities
      • Real data predictions available soon
