
Current Sequencing Technologies and Data Generation

Current Sequencing Technologies and Data Generation. Corbin Jones & Piotr Mieczkowski Department of Biology, College of Arts and Sciences, Carolina Center for Genome Sciences Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill.


Presentation Transcript


  1. Current Sequencing Technologies and Data Generation Corbin Jones & Piotr Mieczkowski Department of Biology, College of Arts and Sciences, Carolina Center for Genome Sciences Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill

  2. Next-generation Sequencing (Deep Sequencing) Platforms • Short reads • Genome Analyzer IIx (GAIIx), HiSeq2000, HiSeq2500, MiSeq – Illumina • SOLiD 5500xl System – Applied Biosystems • HeliScope™ Single Molecule Sequencer – Helicos • Long reads • Genome Sequencer FLX System (454) – Roche • PacBio RS – Pacific Biosciences • Personal Genome Machine, Ion Proton – Ion Torrent • GridION – Oxford Nanopore • Mapping sequences to large DNA fragments • NABsys • BioNanomatrix

  3. UNC – HTSF • 9 HiSeq 2000/2500 • 1 GA II • PacBio • Ion Torrent • MiSeq (Jeff Dangl) Liz Buda and Donghui Tan Also on campus: 454 (Microbiome) 454 jr. (Viral genomics) MiSeq – Kevin Weeks

  4. What type of sequencing should I choose for an Illumina sequencing project? • HiSeq 2000/2500 – 100-160 million single-end reads per lane. • ChIPseq – Single End, 50 cycles (2-3 human samples per lane) • RNAseq – Single End, 50 cycles (2-3 human samples per lane) • If you are interested in splicing variants and fusion genes, either Single End 100 cycles or Paired End 2x50 cycles is a better option. • Whole Genome Sequencing – Paired End 2x100 cycles (2-3 lanes per genome) • Exome Capture – Paired End 2x100 cycles (4 samples per lane) • MiSeq – 3-7 million single-end reads per lane. Custom projects, fast turnaround. • Metagenomics, 16S profile – Paired End 2x150 cycles, up to 24 samples per lane. • Whole Microbial Genome Sequencing – Paired End 2x150 cycles
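
The lane recommendations on this slide come down to coverage arithmetic. The sketch below is illustrative only: it assumes a 3.1 Gb human genome, assumes every sequenced base is usable, and reads the "reads per lane" figure as clusters, so a paired-end run yields two reads per cluster. The names `GENOME_BP`, `coverage` and `reads_per_sample` are not from the slides.

```python
GENOME_BP = 3.1e9  # assumed human genome size in bases (not from the slides)


def coverage(clusters_per_lane, read_len, paired, lanes, genome_bp=GENOME_BP):
    """Mean fold coverage, assuming every sequenced base is usable."""
    reads_per_cluster = 2 if paired else 1
    total_bases = clusters_per_lane * reads_per_cluster * read_len * lanes
    return total_bases / genome_bp


def reads_per_sample(clusters_per_lane, samples):
    """Average reads per sample when a lane is split evenly between samples."""
    return clusters_per_lane / samples


# Whole genome, Paired End 2x100 cycles, 2-3 lanes at 100-160 million clusters.
wgs_low = coverage(100e6, 100, paired=True, lanes=2)
wgs_high = coverage(160e6, 100, paired=True, lanes=3)

# 16S profiling on a MiSeq: up to 24 samples sharing about 7 million clusters.
miseq_16s = reads_per_sample(7e6, 24)

print(f"WGS 2x100: {wgs_low:.0f}x to {wgs_high:.0f}x")
print(f"16S reads per sample: {miseq_16s:,.0f}")
```

Under these assumptions the whole-genome recommendation spans roughly 13x to 31x, so the upper end of the range is what reaches about 30x coverage.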

  5. SHORT READ PLATFORMS at UNC • HiSeq 2000: initially capable of up to 600Gb per run in 13 days. Cost of resequencing one human genome (30x coverage): now about $6,000 for UNC PIs, about $9,000 for users outside UNC. • HiSeq 2500: initially capable of up to 100Gb per run in 27 hours. Cost per genome: ???
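
The two instruments are easier to compare once run output is normalized by run time. A small sketch using only the figures quoted on this slide (the function name `gb_per_day` is illustrative):

```python
def gb_per_day(gb_per_run, run_hours):
    """Sequencing output per 24 hours of instrument time."""
    return gb_per_run / (run_hours / 24)


hiseq2000 = gb_per_day(600, 13 * 24)  # 600 Gb in 13 days
hiseq2500 = gb_per_day(100, 27)       # 100 Gb in 27 hours

print(f"HiSeq 2000: {hiseq2000:.1f} Gb/day")
print(f"HiSeq 2500: {hiseq2500:.1f} Gb/day")
```

On these numbers the HiSeq 2500 produces less per run but close to twice as much per day, which is the trade-off between batch capacity and turnaround.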

  6. MiSeq • Small-capacity system. PE 2x150 cycles in 27 hours. • PE 2x250bp coming soon: error rate less than 1% for read 1, about 1.2% for read 2. • In preparation: PE 2x400bp, error rate about 2% for read 1, about 4% for read 2. • In preparation: longer insert sizes possible (1.5kb)

  7. PacBio RS • Single molecule resolution in real time • Short waiting time for result and simple workflow • Generate basecalls in <1 day • Polymerase speed ≥1 base per second • No amplification required • Bias not introduced • More uniform coverage • Direct observation • Distinguish heterogeneous samples • Simultaneous kinetic measurements • Long reads • Identify repeats and structural variants • Less coverage required • Information content • One assay, multiple applications • Genetic variation (SVs to SNPs) • Methylation • Enzymology • C2 chemistry – installed March 2012 • Long reads 6-10kb • Median size of molecules 3kb • Still 15% error rate • No strobe sequencing • Software focus on: • De novo assembly • High-quality CCS consensus reads • In preparation • Load long molecules by magnetic beads • Modified nucleotides detection

  8. PacBio RS – two sequencing modes (Standard Sample Preparation, Circular Consensus) • LS – long sequencing reads • Large insert sizes (2kb-10kb) • Generates one pass on each molecule sequenced • CCS – high quality sequencing reads • Small insert sizes (500bp) • Generates multiple passes on each molecule sequenced
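
Why multiple passes over a small insert give higher quality can be illustrated with a toy model. The sketch below is not PacBio's consensus algorithm. It assumes errors are independent between passes, treats a position as wrong only when more than half of the passes are wrong, and ignores adapter length when counting passes. The 15% per-pass error and the 3kb and 500bp sizes are the figures quoted in the slides; the function names are illustrative.

```python
from math import comb


def expected_passes(read_len_bp, insert_bp):
    """Approximate number of passes over the insert, ignoring adapters."""
    return read_len_bp / insert_bp


def majority_error(per_pass_error, passes):
    """P(more than half of the passes are wrong at one position),
    assuming independent errors (a simplification)."""
    need = passes // 2 + 1
    return sum(
        comb(passes, k)
        * per_pass_error ** k
        * (1 - per_pass_error) ** (passes - k)
        for k in range(need, passes + 1)
    )


print(f"passes over a 500bp insert in a 3kb read: {expected_passes(3000, 500):.0f}")
for n in (1, 3, 5, 7):
    print(f"{n} passes: consensus error about {majority_error(0.15, n):.4f}")
```

Even in this simplified model the error falls quickly with the number of passes, which is the reason the CCS mode pairs small inserts with long polymerase reads.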

  9. Example Data: 1 SMRT Cell • # of Bases: 180,320,136 bp pre-filter, 165,424,592 bp post-filter • # of Reads: 75,153 pre-filter, 52,801 post-filter • Mean Readlength: 2,399 bp pre-filter, 3,133 bp post-filter • Mean Read Quality: 0.624 pre-filter, 0.827 post-filter • Adapter Dimer (0-10bp): 1.94% • Short Insert (11-100bp): 0.47%
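
The summary figures on this slide are internally consistent, which a few lines of arithmetic can confirm. The sketch uses only the base and read counts quoted above; the dictionary layout and variable names are illustrative.

```python
pre = {"bases": 180_320_136, "reads": 75_153}
post = {"bases": 165_424_592, "reads": 52_801}


def mean_read_length(stats):
    """Mean read length in bp: total bases divided by number of reads."""
    return stats["bases"] / stats["reads"]


reads_kept = post["reads"] / pre["reads"]  # fraction of reads passing the filter
bases_kept = post["bases"] / pre["bases"]  # fraction of bases passing the filter

print(f"mean length pre-filter:  {mean_read_length(pre):.0f} bp")
print(f"mean length post-filter: {mean_read_length(post):.0f} bp")
print(f"reads kept: {reads_kept:.1%}, bases kept: {bases_kept:.1%}")
```

Filtering drops about 30% of the reads but only about 8% of the bases, so the discarded reads are mostly short ones, which is why the mean read length rises after filtering.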

  10. Personal Genome Machine – Ion Torrent (Life Technologies) Three types of semiconductor chips: 314 – 20Mb, 316 – 200Mb, 318 – 1Gb. Read length depends on base composition: 200-250bp (200 cycles). System is enabled for Paired End 2x100 cycles. The fastest sequencing system on the market. How it works: an H+ ion is released during base incorporation. Individual polymerases attached to beads are positioned in tiny wells that rest on a tiny pH meter. • Recommendation: • Resequencing applications which require fast turnaround of samples • Amplicons (PCR products) • Small and medium size genomes • Custom DNA capture applications
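
The remark that read length depends on base composition follows from the flow-based chemistry: one nucleotide species is supplied at a time, and a flow extends the strand only if it matches the next base, in which case the whole homopolymer run is incorporated at once. The simulation below is a simplified sketch. The repeating `TACG` flow order, the error-free incorporation, and the function name `bases_read` are assumptions made for illustration, not details taken from the slides.

```python
from itertools import cycle


def bases_read(read_sequence, n_flows, flow_order="TACG"):
    """Number of bases synthesized after n_flows nucleotide flows.

    read_sequence is the strand being synthesized (the read itself).
    Each flow adds every consecutive base that matches the flowed nucleotide.
    """
    pos = 0
    flows = cycle(flow_order)
    for _ in range(n_flows):
        nucleotide = next(flows)
        while pos < len(read_sequence) and read_sequence[pos] == nucleotide:
            pos += 1
    return pos


# Same number of flows, very different read lengths.
in_phase = bases_read("TACG" * 10, 8)      # sequence follows the flow order
out_of_phase = bases_read("GCAT" * 10, 8)  # sequence runs against the flow order
homopolymers = bases_read("TTTTAAAA", 2)   # runs are incorporated in one flow

print(in_phase, out_of_phase, homopolymers)
```

With a fixed number of flows, a sequence that happens to follow the flow order is read several times further than one that runs against it, so the achieved read length varies from template to template.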

  11. PGM/Ion Torrent Data 316 chip Thr. • Total Number of Bases: 77.65 Mbp • Number of Q17 Bases: 36.11 Mbp • Number of Q20 Bases: 27.33 Mbp • Total Number of Reads: 368,860 • Mean Length: 211 bp • Longest Read: 380 bp
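
To put the Q17 and Q20 figures in context, the sketch below converts the thresholds to error probabilities and computes what share of the run clears each one. It assumes the thresholds are standard Phred-scaled values (Q = -10 log10 of the error probability), which the slide does not state explicitly; the variable names are illustrative.

```python
def phred_to_error(q):
    """Error probability implied by a Phred-scaled quality value."""
    return 10 ** (-q / 10)


total_mbp, q17_mbp, q20_mbp = 77.65, 36.11, 27.33
reads, mean_len_bp = 368_860, 211

frac_q17 = q17_mbp / total_mbp
frac_q20 = q20_mbp / total_mbp
implied_mbp = reads * mean_len_bp / 1e6  # reads x mean length, as a cross-check

print(f"Q17 = {phred_to_error(17):.1%} error, Q20 = {phred_to_error(20):.1%} error")
print(f"bases at Q17 or better: {frac_q17:.1%}")
print(f"bases at Q20 or better: {frac_q20:.1%}")
print(f"reads x mean length: {implied_mbp:.2f} Mbp")
```

Under that assumption, a little under half of the bases reach Q17 (about 2% error) and about a third reach Q20 (1% error). Reads multiplied by mean length lands within about 0.2 Mbp of the reported total, the difference being consistent with rounding of the mean length.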

  12. Library Preparation from Low Quantities of DNA or RNA Microfluidics stationary and portable systems Mondrian SP System – NuGEN Technologies • Human libraries from 5ng of total DNA. Only 10-15% duplicate reads. • Ultralow DNA library systems • Soon: • Ultralow RNA library systems • Libraries from total RNA with rRNA depletion. Advanced Liquid Logic from RTP

  13. Emerging Sequencing Technologies Semiconductor sequencing chip Nanopore / Nanochannel sequencing

  14. Ion Proton System • Human genome in one day • Cost of reagents $1000 per run • Error rate around 1.2% • Human Genome, RNAseq, ChIPseq • Ion Proton Chip I – 10Gb (whole exome capture experiments) • Ion Proton Chip II – 100Gb (whole human genome resequencing)

  15. Oxford Nanopore – new view on sequencing Hemolysin pore: inner diameter of 1nm, about 100,000 times smaller than the diameter of a human hair.

  16. Oxford Nanopore DNA sequencing Error rate 4%; predicted to reach 0.1-2% by the end of the year.

  17. Nanopore array

  18. Oxford Nanopore – new concepts MinION • 150Mb per run • Tested 48kb read length • $900 per instrument • 500 pores per device GridION • XXXMb per run • Tested 48kb read length • $XXX per instrument • 2000 pores per device, soon 8000 pores • Cost per human genome $1500.

  19. Oxford Nanopore – applications • DNA sequencing • Protein detection • Protein DNA interaction • Small molecule detection • 96 well plates for 96 samples • Controlled time of sequencing

  20. Intelligent BioSystems Mini20 System (manufactured by Azco Biotech) • Amplification by rolony method • Sequencing by Synthesis with announced 100 base reads, but expect to compete with Sanger down the road • Designed for clinical labs • 20 independent flow cells, no queue for loading, run asynchronously • 20M reads/flow cell, 4 GB/ flow cell • Potential problems with repeats • System cost $120K, $150 flow cell (disposable), full costs per sample not clear yet. • Entering early access now, expect commercial shipping late 2012

  21. Genia Technologies • Very early stage announcement – Backed by Life Technologies (at least 1 year away) • Describe system as a cross between Ion Torrent and Oxford Nanopore • Electronic “Active Control” technology enables highly efficient nanopore-membrane assembly and control of DNA movement through the channel • Initially used α-Hemolysin and claimed 98% raw accuracy with that but now are using an undisclosed pore for further development. • Claim sensitivity 1-2 orders of magnitude greater than Oxford Nanopore. • Ramping up pore density to 100K pores/chip by end of 2012. • Plan to market a mobile reader for <$1K and per sample costs <$100 • Plan early access in late 2012, commercial shipment 2013

  22. Basic RNAseq • Type 1: Description of transcriptome • Assembly of transcripts/isoforms • Annotation of genes • Type 2: “Paired” e.g. treatment vs control • Differential expression • Differential transcription • Type 3: Population • Elements of 1 and 2, but “random effects” • TCGA roughly falls into this category

  23. Strand Specific RNAseq • Perkins et al 2009, Levin et al 2010 • Goal: To mark the RNA molecules in order to know the direction of transcription. • Differentiate anti-sense transcripts, lncRNAs, mRNAs etc. • Many methods; dUTP may be best; Illumina has a kit

  24. End tagged RNAseq • GOAL: Identify ends of transcripts by attaching adaptors to ends of mRNAs • can be used in strand specific protocols • can be used in annotation and assembly protocols (Figure: mRNA with mG cap and poly-A tail)

  25. Normalized RNAseq • GOAL: To even the distribution of transcripts sequenced • Reduce the representation of high abundance transcripts and increase sensitivity to low abundance

  26. Normalized RNAseq 2 • Methods • Kinetic (Patanjali et al 1991, Bonaldo et al 1996) • dsDNA nuclease (Zhulidov 2004) • Cap-Trapper (Carninci et al 2000) • Results • Abundant transcripts reduced in proportion to their frequency • Coverage still proportional to expression • Problems: bias, contamination with ncRNA

  27. Total RNAseq • Goal: Sequence every RNA molecule in the cell • Observe: unspliced RNAs, small RNAs, non-coding RNAs, tRNAs • Must remove rRNA! • Variants: Nuclear only, cytoplasmic only, mRNA removal

  28. small RNA • GOAL: Small RNAs are important for gene regulation, synthesis, splicing, and immunity (miRNA/miR, snRNA, snoRNAs, scaRNAs) • Several protocols (e.g. Illumina, Morin et al 2010) • All involve size selection, which can lead to bias • Produce short sequences that are then mapped back to the genome. • Aside: small RNA counts seem more Poisson-like than other counts

  29. RIPseq/CLIPseq/HITS-CLIP • GOAL: Identify the sites on the RNA where RNA binding proteins are bound. • e.g. Components of the spliceosome • protocol is similar to ChIPseq except there is a random hexamer ds-cDNA synthesis step • refs: Khalil et al 2009, Sanford 2009, Licatalosi 2008, Zhang and Darnell 2011
