Use of NGS to identify the causal variant associated with a complex phenotype

This overview discusses the use of Next Generation Sequencing (NGS) to identify the causal variant associated with a complex phenotype. It covers the process of sequencing, selecting animals, and analyzing the data. It also highlights the challenges and potential solutions in the field.

dunston

Presentation Transcript


  1. Use of NGS to identify the causal variant associated with a complex phenotype

  2. Overview • Why are we sequencing? • How did we select the animals to sequence? • What are the steps involved in the process? • What do you do with the reads once you have them? • Where are we now?

  3. Introduction • Several studies (Kuhn et al., 2003; Cole et al., 2007; Seidenspinner et al., 2009) have reported QTL on BTA 18 associated with dystocia • Bioinformatic analysis using SNP data has not identified the causal variant • Next generation sequencing (NGS) has recently been used to find causal variants for novel recessive disorders

  4. Chromosome 18 is different • Markers on chromosome 18 have large effects on several traits: • Dystocia and stillbirth: Sire and daughter calving ease and sire stillbirth • Conformation: rump width, stature, strength, and body depth • Efficiency: longevity and net merit • Large calves contribute to reduced lifetimes and decreased profitability

  5. Marker effects for dystocia complex ARS-BFGL-NGS-109285 Cole et al., 2009 (J. Dairy Sci. 92:2931–2946)

  6. Correlations in dystocia complex

  7. The QTL also affects gestation length Maltecca et al. 2011. Animal Genetics, 42:6, 585-591.

  8. Overview of the dystocia complex • The key marker is ARS-BFGL-NGS-109285 (rs109478645) at 57,585,121 bp on BTA18 • Intronic to SIGLEC12 (sialic acid binding Ig-like lectin 12) • Recent results indicate effects on gestation length (Maltecca et al., 2011) and calf birth weight (Cole et al., unpublished data)

  9. This is a gene-rich region http://useast.ensembl.org/Bos_taurus/Location/View?r=18%3A57583000-57587000 http://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=Graphics&list_uids=618463

  10. Copy number variants are present • ARS-BFGL-NGS-109285 is flanked by CNV • There’s a loss and a gain to the left (8 SNP region) • There’s a gain to the right (10 SNP region) • This can result in assembly problems Hou et al. 2011. Genomic characteristics of cattle copy number variations. BMC Genomics. 12:127. http://www.biomedcentral.com/1471-2164/12/127

  11. Where did this problem come from? 40,803 daughters http://aipl.arsusda.gov/CF-queries/Bull_Chromosomal_EBV/bull_chromosomal_ebv.cfm?

  12. What if we look at a different trait? • Cole et al. (2007) proposed the following mechanism: • SIGLEC12 may sequester circulating leptin • This increases gestation length • Calf birth weight (BW) is higher because of increased gestation length • Higher BW is associated with dystocia

  13. We don’t have birth weight data • Birth weights are not routinely recorded in the US • Collaborated with Hermann Swalve’s group to develop a selection index prediction of BW PTA • Performed GWAS and gene set enrichment analysis to search for interesting associations

  14. GWAS for birth weight PTA Cole et al. (2013), unpublished data

  15. Are we measuring anything new? • Identified a SNP intronic to LHX4, which is associated with cow body weight and length (Ren et al., 2010, Mol. Biol. Rep., 37:417-422). • 4 SNP in the QTL region on BTA 18 had large effects • Several other SNP with large effects are intronic or adjacent to genes with unknown functions

  16. KEGG pathways for birth weight What does regulation of the actin cytoskeleton have to do with birth weight in cattle? That is, do these results make sense? Maybe… these pathways may be involved in the establishment and maintenance of pregnancy, as well as the coordination of growth and development. Cole et al. (2013), unpublished data

  17. Sequencing is becoming very affordable

  18. Sequencing successes at AIPL/BFGL • Simple loss-of-function mutations • APAF1 – Spontaneous abortions in Holstein cattle (Adams et al., 2012) • CWC15 – Early embryonic death in Jersey cattle (Sonstegard et al., 2013) • Weaver syndrome – Neurological degeneration and death in Brown Swiss cattle (McClure et al., 2013)

  19. Original pedigree-based design Bull A (1968) AA, SCE: 8 Bull B (1962) AA, SCE: 7 MGS Bull C (1975) AA, SCE: 8 δ = 10 Bull E (1974) Aa, SCE: 10 Bull H (1989) Aa, SCE: 14 Bull D (1968) ??, SCE: 7 Bull I (1994) Aa, SCE: 18 Bull E (1982) Aa, SCE: 8 Bull F (1987) Aa, SCE: 15 MGS MGS

  20. Modified pedigree & haplotype design These bulls carry the haplotype with the largest, negative effect on SCE: Bull J (2002) Aa, SCE: 6 Bull A (1968) AA, SCE: 8 Bull B (1962) AA, SCE: 7 Bull K (2002) Aa, SCE: 15 MGS Bull C (1975) AA, SCE: 8 δ = 10 Bull J (2002) aa, SCE: 15 Bull E (1974) Aa, SCE: 10 Bull H (1989) Aa, SCE: 14 Bull I (1994) Aa, SCE: 18 Bull E (1982) Aa, SCE: 8 Bull F (1987) Aa, SCE: 15 MGS Couldn’t obtain DNA: Bull D (1968) ??, SCE: 7

  21. Molecular prep Sample Collection Library Construction Library Quality Control DNA Extraction DNA Quality Control

  22. Sample preparation time is substantial • DNA Extraction: ~12 hours (30 mins) • DNA QC: ~1-2 hours (1-2 hours) • Library Construction: 48 hours (12 hours) • Library QC: ~2-4 hours (1 hour) • Total: 3-4 days (15.5 hours)
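
The totals on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the first figure for each step is elapsed time and the figure in parentheses is hands-on time; the slide does not label them, so that reading is an assumption, as are the variable names.

```python
# Step durations in hours as (low, high) ranges, copied from the slide.
# ASSUMPTION: first figure = elapsed time, parenthesised figure = hands-on time.
elapsed = {
    "DNA extraction": (12, 12),
    "DNA QC": (1, 2),
    "Library construction": (48, 48),
    "Library QC": (2, 4),
}
hands_on = {
    "DNA extraction": (0.5, 0.5),
    "DNA QC": (1, 2),
    "Library construction": (12, 12),
    "Library QC": (1, 1),
}

def total(steps):
    """Sum the low ends and the high ends of every step's range."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high

elapsed_total = total(elapsed)
hands_on_total = total(hands_on)
print("elapsed hours (low, high):", elapsed_total)
print("hands-on hours (low, high):", hands_on_total)
```

Under this reading the upper bound of the parenthesised figures is 15.5 hours, which matches the slide's total. The elapsed figures sum to 63 to 66 hours of continuous running, which would plausibly stretch to the quoted 3-4 days once working hours are taken into account.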

  23. DNA quality

  24. Library quality

  25. Sequencing stage • Illumina cBot: • Preps DNA for sequencing • Takes 4-5 hours • Must be done 48 hours before • Illumina HiSeq 2000: • Does the sequencing • Takes ~10-14 days for 100 x 100 • Minimal hands-on time

  26. Anatomy of a flow cell • 8 lanes per flow cell • 3 columns per lane • 96 tiles per column • Each tile imaged 8 times • 1 from upper surface, 1 from lower • Approximately 300 Gb of sequence per flow cell http://www.qbi.uq.edu.au/images/genomics/genomics1.jpg
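
As a rough illustration of what these numbers mean per animal, the sketch below divides the flow-cell yield across its lanes and converts the result to raw coverage. The genome size of about 2.7 Gb and the one-animal-per-lane layout are assumptions made for the example; neither is stated on the slide.

```python
# Figures from the slide.
FLOW_CELL_YIELD_GB = 300    # approximate sequence per flow cell, in gigabases
LANES_PER_FLOW_CELL = 8

# ASSUMPTION: a bovine genome of roughly 2.7 Gb (not given on the slide).
GENOME_SIZE_GB = 2.7

def yield_per_lane(flow_cell_gb, lanes):
    """Sequence produced by one lane if yield is spread evenly across lanes."""
    return flow_cell_gb / lanes

def mean_coverage(sequence_gb, genome_gb):
    """Average number of times each base is sequenced (raw, before filtering)."""
    return sequence_gb / genome_gb

# ASSUMPTION: one animal is sequenced per lane.
lane_gb = yield_per_lane(FLOW_CELL_YIELD_GB, LANES_PER_FLOW_CELL)
coverage = mean_coverage(lane_gb, GENOME_SIZE_GB)
print(f"{lane_gb:.1f} Gb per lane, about {coverage:.1f}x raw coverage")
```

Usable coverage would be lower than this raw figure after duplicate and low-quality reads are removed.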

  27. Sequencing by synthesis https://www.broadinstitute.org/files/shared/illuminavids/sequencingSlides.pdf

  28. How many scientists does it take…

  29. Flowcell 1: Cluster densities Cluster densities from current HiSeq run finished 30 April 2013 (unpublished data):

  30. Flowcell 2: Cluster densities Cluster densities from current HiSeq run started 22 May 2013 (unpublished data):

  31. The Aftermath • Total Time (sample to sequence): • 3 weeks • That’s assuming nothing went wrong! • More realistic: months • Resulting Data • Large text files • ~300 gigabytes compressed • Analysis • Often underestimated • Can take months as well

  32. Variant detection Raw Sequencer Output Alignment to the Genome Variant Detection • Alignment against a reference genome • Analysis is very disk I/O-intensive.
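
To make the last step concrete, here is a deliberately tiny illustration of variant detection from reads that have already been aligned. It is not the method used by the presenters, and real analyses rely on dedicated aligners and variant callers. The function names, thresholds, and example sequences are all hypothetical, and alignments are assumed to be ungapped.

```python
from collections import Counter

def pileup(reads, ref_len):
    """Count the bases observed at each reference position.

    reads: iterable of (start, sequence) pairs; 0-based, ungapped alignments.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns

def call_variants(reference, reads, min_depth=3, min_alt_fraction=0.3):
    """Report positions where a non-reference base is well supported.

    Thresholds are illustrative only, not recommended values.
    """
    variants = []
    for pos, counts in enumerate(pileup(reads, len(reference))):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        for base, n in counts.most_common():
            if base != reference[pos] and n / depth >= min_alt_fraction:
                variants.append((pos, reference[pos], base, n, depth))
                break  # report only the best-supported alternate base
    return variants

# Toy data: three of the four reads covering position 4 carry T instead of A.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGT"),
    (1, "CGTTCGTA"),
    (2, "GTACGTAC"),
    (3, "TTCGTAC"),
]
variants = call_variants(reference, reads)
print(variants)  # (position, reference base, alternate base, alt reads, depth)
```

Even this toy version touches every base of every read, which hints at why the real analysis is so demanding on disk I/O.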

  33. Computational Logistics • Desktop computers • Viable for single lanes • Long computation time • Servers are better • >100 GB RAM and >16 processor cores • Cloud • Amazon Web Services • iAnimal/iPlant

  34. Storage considerations • What to save? • Raw data? • Processed results? • How much workspace? • Suggestions: • Workspace 10x compressed files • Save alignments • Backup REGULARLY!!!
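
The workspace suggestion translates into a quick estimate. The sketch below applies the 10x rule to the roughly 300 gigabytes of compressed output quoted on the "Aftermath" slide; the function name is hypothetical, and decimal units (1 TB = 1000 GB) are assumed.

```python
def workspace_needed_gb(compressed_gb, multiplier=10):
    """Scratch space suggested by the slide: a multiple of the compressed data size."""
    return compressed_gb * multiplier

# ~300 GB compressed is the figure from the "Aftermath" slide.
compressed_gb = 300
workspace_gb = workspace_needed_gb(compressed_gb)
print(f"{workspace_gb} GB (about {workspace_gb / 1000:.0f} TB) of workspace")
```

Backups of the saved alignments would come on top of this figure.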

  35. Why should you use a pipeline? • Automates analysis • Maximizes resource utilization • Because post-docs aren’t cheap

  36. Many options for analysis pipelines • Galaxy server • NextGene • Custom pipeline • Scripting languages • Open-source tools
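
For the custom-pipeline option, a minimal sketch of what a scripted driver might look like is shown below. It assumes each step reads one file and writes one file. The stage names and the placeholder stage bodies are hypothetical; a real stage would invoke an aligner or a variant caller instead of copying text.

```python
import tempfile
from pathlib import Path

def run_pipeline(stages, first_input, workdir):
    """Run stages in order, feeding each stage's output file to the next.

    A stage whose output file already exists is skipped, so an interrupted
    run can be restarted without repeating finished work.
    Returns the names of the stages that were actually executed.
    """
    executed = []
    current = Path(first_input)
    for name, out_name, func in stages:
        out_path = Path(workdir) / out_name
        if not out_path.exists():
            func(current, out_path)
            executed.append(name)
        current = out_path
    return executed

# HYPOTHETICAL placeholder stages: each copies its input text and appends a tag.
def make_stage(tag):
    def stage(in_path, out_path):
        out_path.write_text(in_path.read_text() + f"|{tag}")
    return stage

stages = [
    ("align", "aligned.txt", make_stage("aligned")),
    ("call_variants", "variants.txt", make_stage("called")),
]

with tempfile.TemporaryDirectory() as workdir:
    raw = Path(workdir) / "reads.txt"
    raw.write_text("reads")
    first_run = run_pipeline(stages, raw, workdir)
    second_run = run_pipeline(stages, raw, workdir)  # outputs exist, nothing reruns
    final_text = (Path(workdir) / "variants.txt").read_text()

print(first_run, second_run, final_text)
```

Skipping finished stages is one simple way to automate the analysis without redoing expensive steps. A production pipeline would also need to handle partially written outputs and failed steps.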

  37. Challenges • Annotation • This is a mess in the cow • The reference assembly may not be representative of all taurine cows • Validation • Doing functional genomics with large mammals is expensive – who pays? • When have we proven something?

  38. Conclusions • Sequencing is powerful, but presents many challenges • Computational requirements are substantial • We’re learning how much we don’t know about functional genomics in the cow • Validation remains a problem

  39. Acknowledgments • AIPL: Derek Bickhart, Dan Null, Paul VanRaden • BFGL: Reuben Anderson, Steve Schroeder, Tad Sonstegard, Curt Van Tassell

  40. Questions? http://gigaom.com/2012/05/31/t-mobile-pits-its-math-against-verizons-the-loser-common-sense/shutterstock_76826245/
