
Intro to Next Generation Sequencing

Intro to Next Generation Sequencing. Nick Loman and James Hadfield. http://omicsmaps.com/. Koboldt et al., 2010 (Figure 3). Bench work to build libraries and sequence. Clean up and QA reads. Alignments to Genome or Transcriptome. Analysis of Alignments. Koboldt et al., 2010.


Presentation Transcript


  1. Intro to Next Generation Sequencing

  2. Nick Loman and James Hadfield http://omicsmaps.com/

  3. Koboldt et al., 2010 (Figure 3)

  4. Bench work to build libraries and sequence • Clean up and QA reads • Alignments to Genome or Transcriptome • Analysis of Alignments

  5. Koboldt et al., 2010 • Sample Contamination • Tumor-normal switches • Sample mix-ups • Run quality • Library chimeras

  6. Koboldt et al. (Fig 4A)

  7. Chor et al., 2009

  8. CCL Bio

  9. GCTACGGCATTCAGGCATCAGGCATTAGCAG
     GGCATTCAGGGATCAGGCATTAGC->
     <-CATGGCATTCAGGGATCAGGCATT
     <-GCCATGGCATTCAGGGATCAGGC
     CATTCAGGGATCAGGCATTAGCAG->
     GGCATTCAGGGATCAGGCATTAGC->
     CATTCAGGGATCAGGCATTAGCAG->
     GGCATTCAGGGATCAGGCATT->
     <-GGATCAGGCATTAGCAG
     <-GATCAGGCATTAGCAG
     <-GGATCAGGCATTAGCAG

  10. High Coverage: qualities may not be needed

  11. Low Coverage: qualities are important

  12. Custodia-Lora et al., 2003

  13. FASTQ Example • For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina (1.3+) files store PHRED quality scores ranging from 0 to 62, whereas Sanger quality scores range from 0 to 93. Solexa quality scores use a different scale and have to be converted to PHRED quality scores. • FASTQ example from: Cock et al. (2009). Nucleic Acids Res 38:1767-1771.
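
The conversions mentioned on this slide can be sketched in Python. This is a minimal illustration, not code from the presentation: it assumes the ASCII offsets described by Cock et al. (33 for Sanger, 64 for Solexa and Illumina 1.3+) and the Solexa-to-PHRED formula from the same paper. The function names are hypothetical.

```python
import math

SANGER_OFFSET = 33    # Sanger FASTQ: PHRED 0-93 stored as ASCII 33-126
ILLUMINA_OFFSET = 64  # Illumina 1.3+ (PHRED 0-62) and Solexa: ASCII offset 64


def illumina_to_sanger(qual: str) -> str:
    """Re-encode an Illumina 1.3+ (PHRED+64) quality string as Sanger (PHRED+33)."""
    out = []
    for ch in qual:
        q = ord(ch) - ILLUMINA_OFFSET
        if not 0 <= q <= 62:
            raise ValueError(f"character {ch!r} is outside the Illumina 1.3+ range")
        out.append(chr(q + SANGER_OFFSET))
    return "".join(out)


def solexa_to_phred(q_solexa: int) -> int:
    """Convert one Solexa score to the nearest PHRED score.

    Q_PHRED = 10 * log10(10 ** (Q_solexa / 10) + 1)
    """
    return int(round(10 * math.log10(10 ** (q_solexa / 10) + 1)))


def solexa_to_sanger(qual: str) -> str:
    """Re-encode a Solexa (Solexa+64) quality string as Sanger (PHRED+33)."""
    return "".join(
        chr(solexa_to_phred(ord(ch) - ILLUMINA_OFFSET) + SANGER_OFFSET) for ch in qual
    )
```

The two scales agree closely at high quality and diverge at the low end, which is why Solexa scores need the formula rather than a simple offset shift.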

  14. SAM (Sequence Alignment/Map) • It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format • SAM is the output of aligners that map reads to a reference genome • Tab-delimited, with a header section and an alignment section • Header lines begin with @ (the header section is optional) • Each alignment line has 11 mandatory fields • BAM is the binary format of SAM • http://samtools.sourceforge.net/
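
As a rough sketch of the layout described above, the following Python splits SAM text into header lines and alignment records. The field names follow the current SAM specification (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The `SamRecord` type, the `parse_sam` helper, and the example record are illustrative assumptions, not part of the presentation or of samtools.

```python
from typing import Iterable, List, NamedTuple, Tuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (PHRED+33) or "*"
    optional: List[str]  # any TAG:TYPE:VALUE fields after the mandatory 11


def parse_sam(lines: Iterable[str]) -> Tuple[List[str], List[SamRecord]]:
    """Split SAM text into header lines (starting with '@') and alignment records."""
    header, records = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        f = line.split("\t")
        if len(f) < 11:
            raise ValueError(f"expected at least 11 fields, got {len(f)}")
        records.append(SamRecord(
            f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
            f[6], int(f[7]), int(f[8]), f[9], f[10], f[11:],
        ))
    return header, records


# Illustrative input: one header line and one alignment line.
EXAMPLE_SAM = [
    "@SQ\tSN:ref\tLN:45",
    "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*",
]
```

For real work, an established library is a better choice than a hand-written parser; this sketch only makes the header/alignment split and the 11 mandatory columns concrete.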

  15. Mandatory Alignment Fields http://samtools.sourceforge.net/SAM1.pdf

  16. Alignment Examples Alignments in SAM format http://samtools.sourceforge.net/SAM1.pdf

  17. Valid BED files

      chr1   86114265  86116346  nsv433165
      chr2   1841774   1846089   nsv433166
      chr16  2950446   2955264   nsv433167
      chr17  14350387  14351933  nsv433168
      chr17  32831694  32832761  nsv433169
      chr17  32831694  32832761  nsv433170
      chr18  61880550  61881930  nsv433171

      chr1  16759829  16778548  chr1:21667704   270866  -
      chr1  16763194  16784844  chr1:146691804  407277  +
      chr1  16763194  16784844  chr1:144004664  408925  -
      chr1  16763194  16779513  chr1:142857141  291416  -
      chr1  16763194  16779513  chr1:143522082  293473  -
      chr1  16763194  16778548  chr1:146844175  284555  -
      chr1  16763194  16778548  chr1:147006260  284948  -
      chr1  16763411  16784844  chr1:144747517  405362  +
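
A minimal Python sketch of reading such lines is shown here. It assumes the usual BED convention of 0-based, half-open coordinates and the column order chrom, start, end, name, score, strand. The `BedInterval` type and `parse_bed_line` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class BedInterval(NamedTuple):
    chrom: str
    start: int                    # 0-based, inclusive
    end: int                      # exclusive
    name: Optional[str] = None
    score: Optional[int] = None
    strand: Optional[str] = None

    @property
    def length(self) -> int:
        # Half-open coordinates make the length a plain subtraction.
        return self.end - self.start


def parse_bed_line(line: str) -> BedInterval:
    """Parse one BED line with 3 to 6 whitespace-separated columns."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    start, end = int(fields[1]), int(fields[2])
    if start > end:
        raise ValueError("start must not exceed end")
    name = fields[3] if len(fields) > 3 else None
    score = int(fields[4]) if len(fields) > 4 else None
    strand = fields[5] if len(fields) > 5 else None
    return BedInterval(fields[0], start, end, name, score, strand)
```

For instance, `parse_bed_line("chr1 86114265 86116346 nsv433165").length` gives the interval length in bases, and the six-column rows additionally populate `score` and `strand`.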

  18. GTF

  19. GVF format

      ##gff-version 3
      ##gvf-version 1.02
      ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
      ##genome-build NCBI MGSCv36
      ##assembly-name MGSCv36
      ##assembly-accession GCF_000001635.15
      ##file-date 2011-11-18
      # Study_accession: Combined studies on MGSCv36
      # Display_name: Combined studies on MGSCv36
      # Study_description: Combined studies on MGSCv36
      chr1   dbVar  copy_number_variation  90044442   90114410   .  .  .  ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.
      chr4   dbVar  copy_number_variation  121483931  121646639  .  .  .  ID=nsv433534;Name=nsv433534;Start_range=.,121483931;End_range=121646639,.
      chr9   dbVar  copy_number_variation  109128634  109146964  .  .  .  ID=nsv433535;Name=nsv433535;Start_range=.,109128634;End_range=109146964,.
      chr17  dbVar  copy_number_variation  30240627   30614866   .  .  .  ID=nsv433536;Name=nsv433536;Start_range=.,30240627;End_range=30614866,.
      chr17  dbVar  copy_number_variation  30983722   31036099   .  .  .  ID=nsv433537;Name=nsv433537;Start_range=.,30983722;End_range=31036099,.
      chr17  dbVar  copy_number_variation  34907088   34962504   .  .  .  ID=nsv433538;Name=nsv433538;Start_range=.,34907088;End_range=34962504,.
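
The structure of a GVF file (`##` pragmas, `#` comments, then nine tab-delimited GFF3 columns ending in `key=value` attributes) can be sketched in Python as follows. The helper names are hypothetical, and the example feature line assumes that the source and type columns of the slide's records are `dbVar` and `copy_number_variation`.

```python
from typing import Dict, Iterable, List, Tuple


def parse_attributes(column: str) -> Dict[str, str]:
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    attrs = {}
    for pair in column.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def parse_gvf(lines: Iterable[str]) -> Tuple[Dict[str, str], List[dict]]:
    """Return (pragmas, features) from GVF text; plain '#' comments are skipped."""
    pragmas, features = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            key, _, value = line[2:].partition(" ")
            pragmas[key] = value
        elif line.startswith("#"):
            continue
        else:
            cols = line.split("\t")
            if len(cols) != 9:
                raise ValueError(f"expected 9 columns, got {len(cols)}")
            seqid, source, ftype, start, end, score, strand, phase, attrs = cols
            features.append({
                "seqid": seqid, "source": source, "type": ftype,
                "start": int(start), "end": int(end),  # 1-based, inclusive
                "score": score, "strand": strand, "phase": phase,
                "attributes": parse_attributes(attrs),
            })
    return pragmas, features


EXAMPLE_GVF = [
    "##gff-version 3",
    "##gvf-version 1.02",
    "##genome-build NCBI MGSCv36",
    "# Study_accession: Combined studies on MGSCv36",
    "\t".join(["chr1", "dbVar", "copy_number_variation", "90044442", "90114410",
               ".", ".", ".",
               "ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,."]),
]
```

Unlike BED, GFF-family coordinates are 1-based and inclusive, which matters when converting between the two formats.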

  20. Derived data • http://www.ncbi.nlm.nih.gov/dbvar • http://www.ebi.ac.uk/dgva • http://www.ncbi.nlm.nih.gov/snp

  21. Derived data

  22. Actual data

  23. Getting exponential growth under control

  24. Trace Organization: seq1 (FASTA, Quality, Chromatogram, Experimental info, Sample); seq2 (FASTA, Quality, Chromatogram, Experimental info, Sample) • SRA Organization: Experiments; Samples; Sequences and Qualities

  25. Era of NGS Explosion • FASTQ Era • Bits/Base Era • As of April 10, 2012, SRA contains fewer bytes than bases
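
To see how an archive can hold fewer bytes than bases, consider the simplest case: packing each of A, C, G, T into 2 bits, so four bases fit in one byte. The Python below is only an illustration of the bits-per-base idea. It is not the SRA storage format, it ignores quality scores and ambiguous bases such as N, and the function names are hypothetical.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base (4 bases per byte)."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(packed)


def unpack_bases(packed: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from a packed buffer."""
    return "".join(
        LETTERS[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(n_bases)
    )


# The 31-base reference from slide 9 fits in 8 bytes instead of 31.
REFERENCE = "GCTACGGCATTCAGGCATCAGGCATTAGCAG"
PACKED = pack_bases(REFERENCE)
```

The sequence length has to be stored separately, because the final byte may be only partly used.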

  26. New Cycle • Decision Circle • Increases the number of data series • BAM and similar formats containing both raw reads and alignments become primary output of raw sequencing • Compression By Reference reduces sizes of other data series • New compression algorithms • New sets of tradeoffs
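
The idea behind compression by reference can be sketched in a few lines of Python: once a read is aligned, it can be stored as a position, a length, and the few bases that differ from the reference. This is a simplified illustration for ungapped alignments only, not the algorithm used by SRA or any specific tool. The function names are hypothetical, and the read offset used below (0-based position 5 on the slide 9 reference) is an assumption made for the example.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (offset within the read, base observed in the read)


def encode_read(reference: str, read: str, pos: int) -> Tuple[int, int, List[Diff]]:
    """Encode an ungapped read as (pos, length, mismatches against the reference)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), diffs


def decode_read(reference: str, pos: int, length: int, diffs: List[Diff]) -> str:
    """Rebuild the read from the reference and its stored mismatches."""
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


REFERENCE = "GCTACGGCATTCAGGCATCAGGCATTAGCAG"  # reference line from slide 9
READ = "GGCATTCAGGGATCAGGCATTAGC"              # a forward read from slide 9
ENCODED = encode_read(REFERENCE, READ, 5)      # assumed alignment position
```

Here the 24-base read reduces to a position, a length, and a single mismatch. The tradeoff is that decoding requires access to the same reference sequence.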

  27. Analyzing New Compression Method • Data from the 1000 Genomes Project

  28. Changes To SRA Run Browser

  29. http://aws.amazon.com/datasets/4383

  30. https://main.g2.bx.psu.edu/

  31. http://www.genomespace.org/

  32. Science 1 July 2011: Vol. 333 no. 6038 pp. 53-58 DOI: 10.1126/science.1207018

  33. Li et al., 2011, Figure 1

  34. Li et al., 2011 Fig. 2

  35. Kleinman et al., 2012 Fig 1

  36. Kleinman et al., 2012 Table 1

  37. Lin et al., 2012 Fig 1

  38. Lin et al., 2012 Fig 2

  39. Pickrell et al., 2012 Fig 1

  40. Li et al., 2012 Fig 1

  41. Li et al., 2012 Fig 2

  42. Li et al., 2012 Fig 3

  43. Li et al., 2012 Fig 4
