1 / 24

A scalable Bayesian mixed model approach for GWAS and genomic prediction

Explore a scalable Bayesian methodology for GWAS and genomic prediction, addressing big-data challenges and maximizing prediction accuracy. This method offers fast and accurate approximation, making it ideal for various sample types including dairy cattle and UK Biobank.

lamb
Download Presentation

A scalable Bayesian mixed model approach for GWAS and genomic prediction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A scalable Bayesian mixed model approach for GWAS and genomic prediction Jicai Jiang1, Li Ma2, Paul M. VanRaden3, Jeffrey R. O’Connell1 1University of Maryland School of Medicine, Baltimore, MD 2University of Maryland, College Park, MD 3AGIL-ARS-USDA, Beltsville, MD 2019 Interbull Meeting, Cincinnati, Ohio June 22, 2019

  2. Introduction

  3. Big-data challenges

  4. Recent advances

  5. Issues • in BOLT and SAIGE for GWAS • Correction factor c may have big variation. • Loss of accuracy in test statistics • Genomic prediction • Scalable and accurate Bayesian approaches are lacking.

  6. Methods

  7. SNP-set Genomic Prediction (SSGP) SNP grouping SNP weighting No grouping Bayes A Prior 1 No grouping Horseshoe Prior 2

  8. Mean field approximation Consider proximity-based SNP grouping of equal size … SNP set size S=1:BayesA S=M: GBLUP ↓ as S ↑ Prediction accuracy of P may ↑ as S↓ We want smaller and higher-accuracy P. A small S may work well for genomic prediction.

  9. Variational inference (VI)

  10. Association testing • Note . • We compute by block-wise Gauss-Seidel method. • We compute similarly. for z in group h. • S ∈ [1000, 5000] Fast and accurate approximation!

  11. Time complexity • Scaling linearly in S, N, and M

  12. Results

  13. Dairy bull data for genomic prediction

  14. Time with single core on Intel Xeon E5-2680

  15. Cow data for GWAS • 300K cows with yield deviations • 60K SNPs • SAIGE as benchmark • 10K cows randomly sampled from 300K • 100 replicates • BFMAP (like EMMAX but 15X faster) as benchmark • Slope and R2of lm() for 60K SNPs

  16. 100 replicates

  17. Red line: y=x Red line: y=x

  18. 300K cows Fat % ≈ PAEP FASN ≈

  19. SAIGE SSGP 30 markers 30 markers

  20. Time on MacBook Pro (Intel i9) for 300K cows Estimating the correction factor is time-consuming.

  21. Summary • SSGP can be applied to various types of samples. • Mostly unrelated, like UK Biobank • Highly related, like dairy cattle • Admixed samples • SSGP is accurate for GWAS and for genomic prediction. • SSGP is fast. • 1 million animals and 60K SNPs: <10 hours for GWAS and <5 hours for computing SNP effects on standard hardware. • SSGP can be applied to sequence GWA. • Reasonably more computational burden than linear regression.

  22. Acknowledgements • Funding NHLBI U01 HL137181-01 Contact • Jicai Jiang • jicai.jiang@gmail.com • Software • SSGP • https://sites.google.com/view/ssgp • BFMAP • https://jiang18.github.io/bfmap/ • GWA is currently not available in the online version.

More Related