Next Generation Sequencing Platforms
• Sequencing by synthesis (SBS)
  – 454/Pyrosequencing
  – Illumina/Solexa
  – Helicos
  – PacBio
• (Charge-based detection system, Now-sequencing)
• Sequencing by ligation
  – SOLiD
• 454 Sequencing System: Chemistry & Platform; Sequencing-by-Synthesis; Library Preparation; Emulsion-Based Clonal Amplification
• SOLiD: Bioanalyzer; Chemistry & Platform; Library Construction; Emulsion PCR
• Applications: Fragment Sequencing; Transcriptome Studies; Paired-End Sequencing; Targeted Sequencing; Small RNA
UCSC Sequencing Center
Overview of the 454 Sequencing System
1) Prepare an adapter-ligated ssDNA library (A-[insert]-B)
2) emPCR: clonal amplification on 28 µm beads, followed by enrichment
3) Load beads and enzymes into the PicoTiterPlate™
4) Perform sequencing-by-synthesis on the 454 Sequencer
Emulsion-Based Clonal Amplification
• Mix the DNA library and capture beads at limiting dilution
• Create a water-in-oil emulsion (PCR reagents + emulsion oil) so that each bead and its adapter-carrying library DNA sit in their own micro-reactor
• Perform emulsion PCR
• Break the micro-reactors and isolate the DNA-containing beads
• Generation of millions of clonally amplified copies of a single sequencing template on each bead
• No cloning or colony picking
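The limiting-dilution step aims for "one fragment, one bead," but capture is random. As a minimal sketch, assuming the number of templates captured per bead follows a Poisson distribution (a standard idealization that the slide does not state), the expected fractions of empty, clonal, and mixed beads can be computed as follows. The function name `bead_occupancy` is hypothetical.

```python
import math
from typing import Dict


def bead_occupancy(templates_per_bead: float) -> Dict[str, float]:
    """Expected fractions of capture beads carrying 0, 1, or >1 templates.

    Assumes template capture is Poisson with mean `templates_per_bead`
    (an idealized model of limiting dilution, not a measured distribution).
    """
    lam = templates_per_bead
    p_empty = math.exp(-lam)          # no template: bead yields no signal
    p_clonal = lam * math.exp(-lam)   # exactly one template: usable clonal bead
    return {"empty": p_empty, "clonal": p_clonal, "mixed": 1.0 - p_empty - p_clonal}


# Compare a few hypothetical library-to-bead ratios.
for lam in (0.1, 0.5, 1.0, 2.0):
    f = bead_occupancy(lam)
    print(f"mean={lam}: empty={f['empty']:.3f} clonal={f['clonal']:.3f} mixed={f['mixed']:.3f}")
```

At a mean of one template per bead, only about 37% of beads are clonal under this model; lowering the ratio reduces mixed beads at the cost of many more empty ones.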
Depositing DNA Beads into the PicoTiter™ Plate
• Load DNA beads into the PicoTiter™ Plate (44 µm wells)
• Load enzyme beads
• Centrifuge step
Sequencing-By-Synthesis
• Simultaneous sequencing of the entire genome in hundreds of thousands of picoliter-size wells
• Pyrophosphate (PPi) signal generation: after the sequencing primer is annealed, each incorporated nucleotide releases PPi; sulfurylase converts PPi and APS into ATP, and luciferase uses that ATP to convert luciferin into oxyluciferin plus light
[Figure: DNA capture bead containing millions of copies of a single clonal fragment (example template A A T C G G C A T G C T A A A A G T C A T)]
Sequencing Workflow Overview
1) Sample input: genomic DNA, BACs, amplicons, cDNA
2) Generation of small DNA fragments via nebulization
3) Ligation of A/B adaptors flanking single-stranded DNA fragments
4) Emulsification of beads and fragments in water-in-oil microreactors
5) Clonal amplification of fragments bound to beads in microreactors
6) Sequencing and base calling
• One fragment = one bead = one read
• 400,000 reads per run
Sequencing Workflow: Library Preparation
[Timeline: DNA library preparation and titration (4.5 h and 10.5 h) → emPCR (8.0 h) → sequencing (7.5 h)]
• Genome fragmented by nebulization
• sstDNA library created with adaptors A/B
• Fragments selected using streptavidin-biotin purification
Sequencing Workflow: Emulsion PCR
Emulsion-based clonal amplification:
• Anneal sstDNA to an excess of DNA Capture Beads
• Emulsify beads and PCR reagents in water-in-oil microreactors
• Clonal amplification occurs inside the microreactors
• Break the microreactors and enrich for DNA-positive beads
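Enrichment for DNA-positive beads discards empty beads, but it cannot tell a bead that captured one fragment from a bead that captured two or more. A minimal sketch of that trade-off, assuming Poisson template capture and an idealized enrichment that keeps every DNA-positive bead and removes every empty one (both assumptions, not stated on the slide); `enrichment_tradeoff` is a hypothetical name.

```python
import math
from typing import Tuple


def enrichment_tradeoff(templates_per_bead: float) -> Tuple[float, float]:
    """Return (fraction of beads kept, clonal purity among kept beads).

    Assumes Poisson template capture with mean `templates_per_bead` and a
    perfect enrichment step that removes only the empty beads.
    """
    lam = templates_per_bead
    if lam <= 0:
        raise ValueError("templates_per_bead must be positive")
    p_empty = math.exp(-lam)
    p_clonal = lam * math.exp(-lam)
    kept = 1.0 - p_empty                # DNA-positive beads survive enrichment
    return kept, p_clonal / kept        # share of survivors that are clonal


for lam in (0.1, 0.5, 2.0):
    kept, purity = enrichment_tradeoff(lam)
    print(f"mean={lam}: kept={kept:.3f} clonal purity={purity:.3f}")
```

Under this model a low template-to-bead ratio yields few positive beads, but those beads are mostly clonal.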
Sequencing Workflow: Loading of the PicoTiterPlate Device
• Well diameter: average of 44 µm
• > 400,000 reads obtained in parallel
• A single clonally amplified sstDNA bead is deposited per well
[Figure: depositing amplified sstDNA library beads into the PicoTiterPlate device; quality-filtered bases]
Sequencing Workflow: Sequencing by Synthesis
• Bases (TACG) are flowed sequentially, always in the same order, across the PicoTiterPlate device during a sequencing run (100 times for a large GS FLX run).
• When the flowed nucleotide is complementary to the template strand, it is incorporated and generates a light signal.
• The light signal is recorded by the CCD camera.
• The signal strength is proportional to the number of nucleotides incorporated.
[Figure: flowgram, beginning with the key sequence]
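To make the flow scheme concrete, here is a minimal sketch that computes the ideal, noise-free flowgram for a read using the TACG flow order from the slide. It assumes the read is written as the synthesized strand (the bases actually incorporated); each output value is the homopolymer length incorporated during that flow, so a run of identical bases produces one proportionally larger signal. The function name `ideal_flowgram` is hypothetical, and real flow signals are noisy.

```python
from itertools import cycle
from typing import List


def ideal_flowgram(read: str, flow_order: str = "TACG") -> List[int]:
    """Noise-free flow values for `read` (synthesized-strand sequence).

    Flows cycle through `flow_order`; each flow incorporates the whole
    homopolymer run matching the flowed base, or nothing (value 0).
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    signals: List[int] = []
    i = 0
    for base in cycle(flow_order):
        if i >= len(read):
            break
        run = 0
        while i < len(read) and read[i] == base:
            run += 1                 # every base in the run is incorporated in this flow
            i += 1
        signals.append(run)
    return signals


print(ideal_flowgram("TCAG"))  # -> [1, 0, 1, 0, 0, 1, 0, 1]
```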
GS FLX Data Analysis: Flowgram Generation
• Flow order: TACG
• Key sequence = TCAG, used for signal calibration
• Signal heights correspond to 1-mer, 2-mer, 3-mer, and 4-mer homopolymer incorporations
• Example flowgram read: TTCTGCGAA
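Going the other way, a flowgram can be turned back into bases. The sketch below is a deliberately naive caller, assuming each flow value is simply rounded to the nearest integer homopolymer length and that the read must begin with the TCAG key from the slide; the actual GS FLX signal processing is more elaborate. The name `call_bases` and the noisy values in the example are hypothetical.

```python
from typing import List


def call_bases(flow_values: List[float], flow_order: str = "TACG", key: str = "TCAG") -> str:
    """Naive base calling: round each flow to a homopolymer length, then strip the key."""
    pieces = []
    for idx, value in enumerate(flow_values):
        base = flow_order[idx % len(flow_order)]   # flows repeat in a fixed order
        n = max(0, round(value))                   # nearest integer; negative noise -> 0
        pieces.append(base * n)
    seq = "".join(pieces)
    if not seq.startswith(key):
        raise ValueError("key sequence not found; read fails the key check")
    return seq[len(key):]


# Hypothetical noisy signals: 8 key flows (TCAG) followed by the read flows.
noisy = [0.98, 0.05, 1.03, 0.10, 0.02, 0.95, 0.08, 1.01,
         2.10, 0.10, 0.90, 0.00, 1.10, 0.20, 0.10, 0.97,
         0.05, 0.12, 1.04, 0.96, 0.00, 1.88]
print(call_bases(noisy))  # -> TTCTGCGAA
```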
GS FLX Data Analysis: Overview
• Image capture → image processing → signal processing
• Analysis software: GS Run Browser, GS De Novo Assembler, GS Reference Mapper, GS Amplicon Variant Analyzer
GS FLX System Performance: Read Length
The Genome Contains Repeat Regions
• Depending on the specific genome characteristics, microreads (~25 bp) cover only a portion of the genome
• In human, 25 bp reads can be mapped uniquely to only 80% of the genome
• Short reads are limiting even for known genomes; what about unknown genomes?
• Mapping versus de novo assembly
  – Mapping will miss genome rearrangements
  – Mapping is only as good as the reference
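Why repeats defeat short reads can be illustrated with a toy proxy: count how many read start positions yield a substring that occurs exactly once in the genome. This is a minimal sketch assuming exact matches on the forward strand only, with a made-up toy genome; it does not reproduce the 80% human figure from the slide. The name `uniquely_mappable_fraction` is hypothetical.

```python
from collections import Counter


def uniquely_mappable_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions whose length-`read_len` substring is unique.

    Toy model: forward strand only, exact matches only, no sequencing errors.
    """
    n = len(genome) - read_len + 1
    if n <= 0:
        return 0.0
    kmers = [genome[i:i + read_len] for i in range(n)]
    counts = Counter(kmers)
    return sum(1 for k in kmers if counts[k] == 1) / n


# Hypothetical genome: a 16 bp repeat present twice, separated by unique sequence.
repeat = "ACGTTGCAAGGCTTAC"
genome = "GATTACA" + repeat + "CCGGTA" + repeat + "TTGACC"
for k in (8, 16, 24, 40):
    print(k, round(uniquely_mappable_fraction(genome, k), 2))
```

Reads shorter than the repeat cannot be placed inside it; reads long enough to span the repeat into flanking unique sequence can.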
Why Does Length Matter?
• Longer sequencing reads mean more applications
• Identify and characterize small and short RNAs
• Full-length cDNA sequencing for expression levels and variations
• Amplicon resequencing for genetic variation, including somatic mutations
• Sequencing of micro-organisms in a single instrument run
• Sequencing of complex genomes: mammalian & plant
• Sequencing of complex samples: metagenomics, ancient DNA
Longer Sequencing Reads Mean More Applications
• HIV Studies (3)
• ChIP-Sequencing (8)
  – Boyle et al., Cell: Mixed technologies for mapping open chromatin
• Metagenomics (12)
  – Palacios et al., New England Journal of Medicine: Pathogenic virus detection
• Whole Genome Sequencing (30)
  – Velasco et al., PLoS: Pinot Noir genome
• Paired-End Sequencing
  – Detecting structural variations across two human genomes
• Technology and Bioinformatics (11)
  – Meyer et al., NAR: Using picogram quantities of sample
• Transcriptome Studies: cDNA (17)
• Small RNA (32)
• Amplicon and Methylation Studies (9)
Applications of Whole Genome, Ultra Broad, and Ultra Deep HT-Sequencing
• HT-sequencing technology applications: whole genome sequencing, ultra broad sequencing, ultra deep sequencing
• Applications shown: small RNA; virus; HIV resistance, tropism; SAGE/CAGE; bacteria; expression; fungus; amplicons; transcriptome; higher eukaryotes; metagenomics; population biology; human; novel strain ID; bacterial 16S; HLA typing
The Power of Metagenomics
• Identifying an environment based on the microbial organisms that are present
  – Microbial population structures in the deep marine biosphere
  – Huber et al., Science, 318, p. 97, 2007
• Determining the state of an environment based on the presence and mixture of microbial organisms
  – The interdependence of coral and its microbial environment
  – Wegley et al., Environmental Microbiology, 9, p. 2707, 2007
• Detecting viral pathogens quickly and accurately
  – Less than 12 months from first identification of affected hives to a possible pathogen
  – Cox-Foster et al., Science, 2007
  – Transplant recipients from Australia
  – Palacios et al., New England Journal of Medicine, 2008
Transcriptome Analysis: Workflow Comparison
• Sanger (E. coli cloning, often concatemerization): cDNA libraries (short tag library, EST library) → template generation: concatemerization, insertion of fragments into vectors, cloning into bacteria, growing and picking colonies (time: weeks) → sequencing
• GS FLX (clonal sequencing ensured through emPCR): cDNA libraries (short tag library, EST library) → emPCR (time: days) → sequencing
Sequencing of approximately 400,000 small RNAs from C. elegans
• Another 18 previously unknown miRNA genes were detected
• Thousands of endogenous siRNAs, acting preferentially on transcripts associated with spermatogenesis and transposons, were identified
• A new class of small RNAs was identified: 21U-RNAs, which all begin with a U and are precisely 21 nt long
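The two defining features of 21U-RNAs on this slide translate directly into a read filter. A minimal sketch, assuming reads arrive as cDNA sequence (so a leading T stands for U) and ignoring everything else a real analysis would check, such as genomic mapping; `is_21u_candidate` and the example reads are hypothetical.

```python
def is_21u_candidate(read: str) -> bool:
    """True if a small-RNA read is exactly 21 nt long and starts with U.

    Sequencers report cDNA, so T is treated as U before checking.
    """
    seq = read.strip().upper().replace("T", "U")
    return len(seq) == 21 and seq.startswith("U")


# Hypothetical reads: only the first is 21 nt and starts with T/U.
reads = ["TGACGGCTAGCTTAGCATGCA", "TGACGGCTAGCTTAGCATGC", "AGACGGCTAGCTTAGCATGCA"]
candidates = [r for r in reads if is_21u_candidate(r)]
print(candidates)
```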
Multiplex Identifier (MID) Basics
• What is it?
  – Two new kits, each with 6 different library adapters (12 adapters in total)
  – Each MID library adapter has an added, specially encoded 10-base region
  – Used to “bar-code” up to 12 different genomic library samples to be run in the same region of a single sequencing run
[Figure: adapter layouts. Standard library: Primer A, Key (4 bases), library fragment, Primer B. MID library: Primer A, Key (4 bases), MID 1 … MID n (10 bases), library fragment, Primer B. The read starts at the sequencing primer in Primer A.]
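Pooled MID reads must be sorted back into samples after the run. The sketch below assumes the read layout from the figure (4-base key, then the 10-base MID, then the library fragment) and tolerates one mismatch in the MID, refusing to assign reads that are ambiguous. The MID sequences, the mismatch threshold, and all function names are hypothetical illustrations, not the kit's actual barcodes or Roche software.

```python
from typing import Dict, Optional, Tuple

# Hypothetical 10-base MIDs for illustration only; NOT the real kit sequences.
EXAMPLE_MIDS: Dict[str, str] = {
    "sample_1": "ACGTACGTAC",
    "sample_2": "TGCATGCATG",
    "sample_3": "GGTTCCAAGG",
}
KEY = "TCAG"  # 4-base key read at the start of every read


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read: str,
                mids: Dict[str, str] = EXAMPLE_MIDS,
                key: str = KEY,
                max_mismatches: int = 1) -> Tuple[Optional[str], str]:
    """Return (sample, insert) for a unique MID match, else (None, read)."""
    if not read.startswith(key):
        return None, read  # fails the key check
    best_sample, best_dist, tied = None, max_mismatches + 1, False
    for sample, mid in mids.items():
        tag = read[len(key):len(key) + len(mid)]
        if len(tag) < len(mid):
            continue  # read too short to contain a full MID
        d = hamming(tag, mid)
        if d < best_dist:
            best_sample, best_dist, tied = sample, d, False
        elif d == best_dist and best_sample is not None:
            tied = True  # two MIDs equally close: refuse to guess
    if best_sample is None or tied:
        return None, read
    return best_sample, read[len(key) + len(mids[best_sample]):]


print(demultiplex("TCAG" + "ACGTACGTAA" + "GGATC"))  # one mismatch in the MID
```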
Paired-End Applications
• ~100 bp sequencing tags separated by 3 kb spacing
• Use for de novo assembly
  – Order contigs
• Use for structural variation studies
  – Inversions, deletions, insertions…
  – High-resolution detection: 3 kb spacing vs. 10 to 40 kb
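The structural-variation use rests on comparing where the two tags land on the reference with the ~3 kb spacing they had in the sample. A minimal sketch of that comparison: the 300 bp tolerance is a hypothetical parameter, and orientation is reduced to a single boolean because the expected tag orientation depends on the library chemistry. `MappedPair` and `classify_pair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    upstream_pos: int           # reference coordinate of the upstream tag
    downstream_pos: int         # reference coordinate of the downstream tag
    expected_orientation: bool  # True if tags map in the orientation the library predicts


def classify_pair(pair: MappedPair, expected_span: int = 3000, tolerance: int = 300) -> str:
    """Flag a mapped tag pair as concordant or as a structural-variant candidate."""
    if not pair.expected_orientation:
        return "inversion candidate"
    span = pair.downstream_pos - pair.upstream_pos
    if span > expected_span + tolerance:
        # Tags lie farther apart on the reference than in the sample:
        # the sample is missing sequence between them.
        return "deletion candidate"
    if span < expected_span - tolerance:
        # Tags lie closer together on the reference: the sample carries extra sequence.
        return "insertion candidate"
    return "concordant"


print(classify_pair(MappedPair(1_000, 6_000, True)))
```

A tighter spacing distribution permits a smaller tolerance, which is why 3 kb spacing gives finer resolution than 10 to 40 kb spacing.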
Paired-End Workflow
Targeted Enrichment of Human gDNA
gDNA (exons 1–5) → fragment and hybridize to NimbleGen capture array → elute → HT-sequencing → analyze exon sequences
Sequencing All the Known Exons from the Human Genome
• “Direct selection of human genomic loci by microarray hybridization,” Albert et al., Nature Methods, 4(11), 903–905, 2007
• ~6,700 gDNA loci selected
• 2 Mb region around BRCA1
Another Sequence-Capture Example
• 19 kb region from chromosome 4
[Figure tracks: targeted exons, Seq-Cap array probes, GS FLX sequencing reads, sequencing coverage]
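The coverage track in such a capture experiment is per-base read depth over the region. A minimal sketch, assuming reads have already been mapped to half-open intervals [start, end) in region coordinates; depth is computed with a difference array, and a second helper reports what fraction of the targeted exon bases reach a minimum depth. Function names, intervals, and the depth threshold are hypothetical.

```python
from typing import List, Tuple


def coverage(region_len: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth from half-open read intervals, via a difference array."""
    diff = [0] * (region_len + 1)
    for start, end in reads:
        s, e = max(0, start), min(region_len, end)  # clip reads to the region
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(region_len):
        running += diff[i]
        depth.append(running)
    return depth


def fraction_targets_covered(depth: List[int], targets: List[Tuple[int, int]],
                             min_depth: int = 1) -> float:
    """Fraction of bases inside the target intervals with depth >= min_depth."""
    total = covered = 0
    for start, end in targets:
        for i in range(start, end):
            total += 1
            if depth[i] >= min_depth:
                covered += 1
    return covered / total if total else 0.0


depth = coverage(10, [(0, 4), (2, 6), (8, 12)])
print(depth, fraction_targets_covered(depth, [(0, 3), (5, 9)]))
```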
SOLiD Library Preparation The SOLiD system uses either a fragment library or a mate-paired library depending on the user’s desired information or application.
Emulsion PCR and Bead Enrichment
PCR takes place in water-in-oil microreactors. After PCR, templated beads are separated from non-templated beads and modified at the 3’ end to allow covalent linkage to the SOLiD sequencing slide.
Bead Deposition
Beads are deposited onto a slide divided into 1, 2, 4, or 8 segmented chambers.
Sequencing By Ligation and Data Analysis
Primers hybridize to the adaptors, and a set of four dye-labeled probes compete for ligation to the primer. Probe specificity is determined by interrogation of the 4th and 5th bases during each ligation series of 5–7 rounds. After each series, a new primer offset by one base is hybridized for a new series of ligations. Reads of 25–35 bp are generated through five sequential rounds of primer reset and ligation.
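The numbers on this slide fit together: 5–7 ligation rounds yielding 25–35 bp implies about 5 bases advanced per ligation, and five primers offset by one base each shift the interrogated positions. A minimal sketch of which template positions get read, assuming the probe interrogates its 4th and 5th bases as stated, that each primer shifts the reading frame by +1 (the direction is an assumption), and that positions are counted from the first base after the offset-0 primer. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def ligation_schedule(n_primers: int = 5,
                      n_cycles: int = 7,
                      bases_per_ligation: int = 5,
                      interrogated: Tuple[int, ...] = (4, 5)) -> Dict[Tuple[int, int], Tuple[int, ...]]:
    """Map (primer offset, ligation cycle) -> 1-based template positions read."""
    schedule = {}
    for primer in range(n_primers):
        for cycle in range(n_cycles):
            start = primer + cycle * bases_per_ligation  # probe start for this round
            schedule[(primer, cycle)] = tuple(start + p for p in interrogated)
    return schedule


def interrogation_counts(schedule: Dict[Tuple[int, int], Tuple[int, ...]]) -> Counter:
    """How many times each template position is interrogated across all rounds."""
    return Counter(pos for positions in schedule.values() for pos in positions)


counts = interrogation_counts(ligation_schedule())
print(sorted(counts.items())[:6])
```

Under these assumptions every interior position is interrogated exactly twice, in two different primer rounds; only the first and last positions are read once.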
Library Construction
Two different library types can be constructed, depending on the application and desired information.
Fragment Library
DNA is fragmented, and PCR primer adaptors are ligated to the DNA.
Mate-Pair Library
DNA is sheared, selected for a desired input size, and circularized around an internal adaptor.
Mate-Pair Library (cont.)
The circularized DNA is enzymatically cleaved to yield two DNA tags separated by the internal adaptor. PCR primer adaptors are then ligated onto the ends of this piece of DNA.
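Circularization is what brings the two ends of a long fragment next to each other. A minimal sketch of the resulting construct, assuming cleavage leaves exactly `tag_len` bases on each side of the internal adaptor (real cleavage positions depend on the enzyme used); the function name, adaptor string, and tag length are hypothetical.

```python
def mate_pair_construct(fragment: str, internal_adaptor: str, tag_len: int) -> str:
    """Tag from the fragment's end + internal adaptor + tag from its start.

    Circularizing `fragment` around `internal_adaptor` joins the fragment's
    end back to its start; cutting `tag_len` bases on either side of the
    adaptor therefore keeps two tags that were `len(fragment)` bases apart.
    """
    if tag_len <= 0 or 2 * tag_len > len(fragment):
        raise ValueError("tag_len must be positive and at most half the fragment length")
    left_tag = fragment[-tag_len:]   # end of the original fragment
    right_tag = fragment[:tag_len]   # start of the original fragment
    return left_tag + internal_adaptor + right_tag


print(mate_pair_construct("AAAACCCCGGGGTTTT", "NNNN", 3))  # -> TTTNNNNAAA
```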
Emulsion PCR (ePCR)
PCR takes place in water-in-oil microreactors containing P1-coupled beads, templates, primers, and all required PCR reaction components.
ePCR
Emulsion PCR yields both monoclonal beads (a unique template) and polyclonal beads (multiple templates), as well as some non-templated beads.
Clonal Amplification
Post-Emulsion and ePCR
Bead Enrichment
Templated beads are separated from non-templated beads using polystyrene beads.
Pre- and Post-Bead Enrichment (P2 hybridization)
Bead Deposition
Templated beads are modified at their 3’ end and covalently attached to a glass slide.
Slide Configurations
SOLiD Sequencing Chemistry
4-color Ligation Reaction
A complementary dye-labeled probe hybridizes to the template and is ligated to the universal sequencing primer.