
Exome Sequencing Data Analysis

Exome Sequencing Data Analysis. Chunhua Yan, Ph.D. NCI Center for Biomedical Informatics and Information Technology (CBIIT) 3/18/2015. Sample preparation Genomic DNA. Exome sequencing workflow. Library construction. Sequencing Illumina, Roche, Life Tech. Raw reads FASTQ, QC report.





Presentation Transcript


  1. Exome Sequencing Data Analysis Chunhua Yan, Ph.D. NCI Center for Biomedical Informatics and Information Technology (CBIIT) 3/18/2015

  2. Exome sequencing workflow • Sample preparation: genomic DNA • Library construction • Sequencing: Illumina, Roche, Life Tech. • Raw reads: FASTQ, QC report • BWA-MEM, Novoalign, SAMtools → Mapped reads: BAM files, alignment stats • MuTect, VarScan, GATK → Possible mutations and indels • ANNOVAR, MutSig, HotNet, MEMo → Annotated and filtered functional variants • Manual review, target re-sequencing → Confirmed novel functional variants

  3. Next generation sequencing (NGS) • Platform • Productivity • Exome sequencing • Experimental design • Mutation study resources

  4. Sanger sequencing was the 20th century technology

  5. Next generation sequencing technology is just a decade old

  6. Comparison of sequencing methods https://www.qiagen.com/us/resources/e-learning/webinars/next-generation-sequencing/

  7. DNA sequencing – The past decade Mardis, E.R. (2011) Nature. 470(7333):198-203.

  8. http://www.genome.gov/sequencingcosts/

  9. http://dnatech.genomecenter.ucdavis.edu/next-generation-sequencing/

  10. Next generation sequencing (NGS) • Exome sequencing • Benefit • Capture technology • Experimental design • Mutation study resource

  11. Next Generation Sequencing Applications • Genomics (DNA-seq): mutations, SNVs, indels, CNVs, translocations • Transcriptomics (RNA-seq): expression level, novel transcripts, fusion transcripts, splice variants • Epigenomics (ChIP-Seq, Methyl-Seq): global mapping of DNA-protein interactions, DNA methylation, histone modification

  12. Whole Exome Sequencing, Why? • Focuses on the part of the genome we understand best, the exons of genes • About 85% of known mutations in Mendelian diseases affect the exome • Nonsense, missense, splice, indel mutations • Target size depends on the annotation and on the coverage of flanking sequences: ~35-60 Mb, or 1-2% of the human genome • There are ~200,000 coding exons in ~20,000 genes • A whole exome costs about 1/6 as much as a whole genome and produces about 1/15 the amount of data Biesecker, L. et al. (2011) Genome Biology 12:128
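The "1-2% of the human genome" figure on slide 12 can be checked with simple arithmetic. The sketch below is only illustrative: the genome size of about 3,100 Mb is an assumption that does not appear on the slide, and the function name is hypothetical.

```python
# Assumption (not from the slides): approximate haploid human genome size in Mb.
HUMAN_GENOME_MB = 3100.0


def exome_fraction(target_mb, genome_mb=HUMAN_GENOME_MB):
    """Return the capture target size as a fraction of the genome size."""
    if genome_mb <= 0:
        raise ValueError("genome_mb must be positive")
    return target_mb / genome_mb


# The slide quotes a target range of roughly 35-60 Mb.
for target_mb in (35.0, 60.0):
    percent = 100 * exome_fraction(target_mb)
    print(f"{target_mb:.0f} Mb target is about {percent:.1f}% of the genome")
```

Under that assumed genome size, both ends of the 35-60 Mb range fall between 1% and 2%, which agrees with the slide.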

  13. Exome sequencing balances the coverage and cost (columns: Sanger, Targeted, Exome, Whole Genome) • Accurate • Cheap per exon • High turn-around • Optimization possible • Low chance of incidental findings • Easy analysis • No bias for genes • Standardized workflow • Re-use of performed exomes to interpret new ones • No sequencing bias • Detect SVs and SNVs • Low diagnostic yield for genetically heterogeneous diseases • Design and re-design required • Different designs for different disorders • No non-coding regions • Sequencing bias • Incidental findings • Data analysis bottleneck • Interpretation of non-coding regions • Expensive, time-consuming http://webinar.sciencemag.org/webinar/archive/exome-sequencing-today%E2%80%99s-lab

  14. Exome sequencing detects mutations http://www.stjude.org

  15. Somatic mutation calls require tumor-normal paired samples (figure panels: normal, tumor)
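To illustrate why slide 15 asks for a matched normal, here is a toy sketch of a tumor-versus-normal comparison at a single site. It is not how MuTect, VarScan, or GATK work (those tools use statistical models); the function name, labels, and both thresholds are hypothetical values chosen only for the example.

```python
def classify_variant(tumor_alt, tumor_depth, normal_alt, normal_depth,
                     min_tumor_vaf=0.05, max_normal_vaf=0.01):
    """Toy classification of one site from read counts in a tumor-normal pair.

    Thresholds are hypothetical and for illustration only.
    """
    if normal_depth == 0:
        # Without reads in the normal sample, germline status cannot be judged.
        return "insufficient normal coverage"
    tumor_vaf = tumor_alt / tumor_depth if tumor_depth else 0.0
    normal_vaf = normal_alt / normal_depth
    if tumor_vaf < min_tumor_vaf:
        return "not called"
    if normal_vaf > max_normal_vaf:
        # The variant is also seen in the normal sample.
        return "germline or artifact"
    return "candidate somatic"


print(classify_variant(20, 100, 0, 80))   # seen in tumor only
print(classify_variant(50, 100, 40, 80))  # seen in both samples
```

The point of the sketch is the second branch: a variant that also appears in the normal sample cannot be labeled somatic, and that decision is impossible when no normal sample was sequenced.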

  16. The SureSelect Target Enrichment workflow http://www.genomics.agilent.com/article.jsp?pageId=3083

  17. Comparison of four exome capture systems Chilamakuri, C. et al. (2014) BMC Genomics. 15:449.

  18. Coverage of exome capture systems (33.3 Mb coding exons) Chilamakuri, C. et al. (2014) BMC Genomics. 15:449.

  19. Next generation sequencing (NGS) • Exome sequencing • Experimental design • Sample size • Sequencing coverage • Mutation study resource

  20. Complexity of the cancer genome OVCAR-3, NCI60 cell line, Ovarian cancer http://www.ncbi.nlm.nih.gov/projects/sky/skyquery.cgi

  21. Whole exome DNA sources • Tumor DNA • Fresh frozen (FF) • Formalin-fixed, paraffin-embedded tissue (FFPE) • Cell line • Xenograft • Normal tissue • Blood • Neighboring tissue • Clinical information • DOB • Date of diagnosis • Malignancy stage • Location of primary tumor • Location of metastatic tumor • Therapies

  22. The Number of samples needed to detect significantly mutated genes Lawrence, M.S. et al. (2014) Nature. 505(7484):495-501

  23. Variants detected in exome sequencing data from the paired FF/FFPE samples Hedegaard, J. et al. (2014) PLoS ONE 9(5): e98187.

  24. Sequencing terminology 1. Insert 2. Read 3. Single Read (SR) 4. Paired End (PE) 5. Multiplexing 6. Flowcell 7. Lane Normand, R. et al. (2013) Methods Mol Biol. 1038:1-26.

  25. Sequencing Coverage: Average coverage = (read length × number of mapped reads) / genome size Normand, R. et al. (2013) Methods Mol Biol. 1038:1-26.
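The coverage formula on slide 25 translates directly into code. The sketch below is a minimal version; the function names and example numbers are illustrative assumptions. For exome data the "genome size" is taken here to be the capture target size, and the calculation assumes every mapped read falls on target, which overestimates real on-target coverage.

```python
import math


def average_coverage(read_length, mapped_reads, target_size):
    """Average coverage = read length * number of mapped reads / target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return read_length * mapped_reads / target_size


def reads_needed(read_length, target_size, desired_coverage):
    """Invert the formula: mapped reads required to reach a desired average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(desired_coverage * target_size / read_length)


# Illustrative numbers: 100 bp reads, 50 million mapped reads, 50 Mb target.
print(average_coverage(100, 50_000_000, 50_000_000))  # 100.0
print(reads_needed(100, 50_000_000, 100))             # 50000000
```

For paired-end data, each read of a pair counts separately in `mapped_reads`.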

  26. Diploid genome and coverage Wendl, M.C. et al. (2008) BMC Bioinformatics. 9:239.

  27. Polyploid genome and coverage Wendl, M.C. et al. (2008) BMC Bioinformatics. 9:239.

  28. Single cell exome sequencing demonstrates the sample heterogeneity Triple-negative breast cancer (TNBC) Wang, Y. et al. (2014) Nature. 512(7513):155-60.

  29. Certain mutations only occur in a subset of TNBC cell populations Wang, Y. et al. (2014) Nature. 512(7513):155-60.

  30. Clonal evolution in the TNBC inferred from single cell exome and copy number data Wang, Y. et al. (2014) Nature. 512(7513):155-60.

  31. High coverage is needed for low tumor fraction samples. Ding, L. et al. (2014) Nat Rev Genet. 15(8):556-70

  32. Mutation detection sensitivity as a function of sequencing depth and allelic fraction Cibulskis, K. et al. (2013) Nat Biotechnol. 31(3):213-9.
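The relationship between depth, allelic fraction, and sensitivity on slides 31 and 32 can be approximated with a simple binomial model. This is a sketch under stated assumptions, not the model from Cibulskis et al.: it ignores sequencing error and mapping artifacts, and the rule "call a variant when at least `min_alt_reads` reads support it" is a hypothetical stand-in for a real caller.

```python
from math import comb


def detection_probability(depth, allelic_fraction, min_alt_reads=3):
    """Probability of observing at least min_alt_reads variant reads at a site.

    Assumes each of the `depth` reads independently carries the variant
    with probability `allelic_fraction` (binomial sampling, no errors).
    """
    if not 0.0 <= allelic_fraction <= 1.0:
        raise ValueError("allelic_fraction must be between 0 and 1")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # Sum the probabilities of seeing fewer than min_alt_reads variant reads.
    p_below = sum(
        comb(depth, i) * allelic_fraction**i * (1 - allelic_fraction) ** (depth - i)
        for i in range(min(min_alt_reads, depth + 1))
    )
    return max(0.0, 1.0 - p_below)


# Compare a low allelic fraction at several depths.
for depth in (30, 100, 200):
    print(depth, round(detection_probability(depth, 0.05), 3))
```

Even in this simplified model, a variant at 5% allelic fraction is much more likely to be missed at 30x than at 200x, which is the qualitative point of slide 31.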

  33. Steps to Bring in Projects to CCR-SF • Make sure to consult your bioinformatician about the experimental design https://ostr.cancer.gov/sequence/request Courtesy of Yongmei Zhao, CCR-SF

  34. Next generation sequencing (NGS) • Exome sequencing • Experimental design • Mutation study resources • Genome in a Bottle • DREAM mutation challenge

  35. Genome in a Bottle Consortium • No widely accepted set of metrics to characterize the fidelity of variant calls from NGS… • Genome in a Bottle Consortium is developing standards to address this… • well-characterized human genomes as Reference Materials (RMs) • characterized and disseminated by NIST • tools and methods to use these RMs • Global Alliance for Genomics and Health Benchmarking Team http://genomeinabottle.org

  36. The data sets for NA12878 are available at the Genome in a Bottle ftp site at NCBI ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878 Zook, J.M. et al. (2014) Nat Biotechnol. 32(3):246-51.

  37. Integration Methods to Establish Benchmark Variant Calls Zook, JM et al (2014) Nat Biotechnol. 32(3):246-51. http://www.slideshare.net/GenomeInABottle/presentations

  38. ~2.7M high-confidence SNPs are detected by multiple algorithms (figure panels: Exome SNPs) Zook, J.M. et al. (2014) Nat Biotechnol. 32(3):246-51.

  39. Mutation caller performance varies drastically (16 LUSC tumor-normal exome-seq pairs) Kim, S.Y., Speed, T.P. (2013) BMC Bioinformatics. 14:189.

  40. http://dreamchallenges.org/

  41. Challenge Data and Assessment http://dreamchallenges.org/

  42. In silico challenge #4 (SEPT, 2014) Tumor sample data, early 2015. http://dreamchallenges.org/

  43. What can we learn from exome sequencing Mudvari, P. et al. (2013) J Metabolomics Syst Biol. 1(1). pii: 7.
