
Next-Generation Sequencing of Microbial Genomes and Metagenomes

Next-Generation Sequencing of Microbial Genomes and Metagenomes. Christine King, Farncombe Metagenomics Facility. Human Microbiome Journal Club, July 13, 2012. Overview: next-generation sequencing, applications, instruments, library prep and sequencing chemistry, sequence quality.





Presentation Transcript


  1. Next-Generation Sequencing of Microbial Genomes and Metagenomes • Christine King, Farncombe Metagenomics Facility • Human Microbiome Journal Club • July 13, 2012

  2. Overview • Next-generation sequencing • Applications • Instruments • Library prep and sequencing chemistry • Sequence quality • Project overview • Microbial genomes • Microbial communities

  3. DNA Sequencing • 1st generation • Sanger chain termination • Capillary electrophoresis • 2nd generation (NGS) • High throughput, “massively parallel” • Shorter reads • Sequencing-by-synthesis • 3rd generation • Single molecule • Nanopores

  4. Applications • DNA sequencing • De novo genomes • Resequencing • Shotgun (e.g. mutant strains) • Amplicon (e.g. HLA, cancer) • Sequence capture (e.g. exome) • Metagenome • Amplicon (e.g. 16S, COI, viral) • Shotgun • ChIP • RNA sequencing • Gene expression • Gene annotation, splice variants • Metatranscriptome

  5. Instruments

  6. Instruments

  7. Which instrument(s) to use? • Read length vs number of reads • Cost per base, per sample, per project (multiplexing?) • Accuracy • Run time, wait time

  8. Library Preparation • Goal: fragments of DNA, each end flanked by adaptor sequences • Adaptors contain amplification- and sequencing primer binding sites; platform- and chemistry-specific • Optional: sample-specific barcodes/indexes/MIDs/tags allow multiplexing during sequencing • Library QC: quantity, size

  9. Library Preparation • Library types: • Shotgun (DNA) • May begin with ChIP • May follow with sequence capture • Mate pair (DNA) • Amplicon (DNA) • Total RNA • May enrich for mRNA (poly-A enrichment, rRNA depletion) • Convert to cDNA (then similar to DNA protocols) • Small RNA • RNA ligations, convert to cDNA after

  10. Library Preparation: Shotgun • Fragmentation • Sonication • Nebulization • Enzymatic • End repair • 3’ overhangs digested • 5’ overhangs filled • 5’ phosphate added

  11. Library Preparation: Shotgun • Adapter ligation • T-overhangs • Forked structure controls orientation • Library amplification • Few cycles • Enrich for correctly-adapted fragments • Required to complete adapter structure in some protocols • Size selection • Gel excision, AMPure beads • Limit insert size as needed, remove artifacts

  12. Library Preparation: Amplicon • Amplify region of interest using PCR • Primers contain adapter sequences

  13. Library Preparation: Mate Pair • Begin with large fragments (e.g. 3kb, 20kb) • Circularize and fragment again • Illumina: direct ligation • 454: Cre/Lox recombination • Enrich for fragments containing the junction • Proceed with shotgun library prep

  14. Library Preparation: Mate Pair • Why? Paired sequences are a known distance apart; improves genome assembly • Note: 454 calls these “paired end libraries”, not to be confused with Illumina’s “paired end sequencing”!

  15. Sequencing: Illumina • Cluster generation • Library fragments hybridize to oligos on the flow cell • New strand synthesized, original denatured, removed • Free end binds to adjacent oligos (bridge formation) • Complementary strand synthesized, denatured (both tethered to flow cell) • Repeat to form clonal cluster • Cleave one oligo, denature to leave ssDNA clusters • ~800K clusters/mm^2

  16. Sequencing: Illumina • Variety of workflows: • Single- or paired end reads • 0, 1, or 2 index reads

  17. Sequencing: Illumina • At each cycle, all 4 fluorescently-labeled nucleotides pass over the flow cell • Each cluster incorporates one nt (terminator) per cycle • Fluor is imaged, then cleaved • De-block and repeat

  18. Sequencing: Illumina • Other terminology: • cBot – accessory instrument that performs cluster generation • Lanes – divisions (8) of HiSeq and GAIIx flow cells • PhiX – bacteriophage with small, balanced genome; PhiX library spiked in with samples for QC • Phasing/pre-phasing – nt incorporation falls behind or jumps ahead on a portion of strands in the cluster and contributes to noise • Chastity filter – measures signal purity (after intensity corrections); if the background signal is high, cluster will be discarded • BaseSpace – cloud computing site for processing MiSeq data • File format: fastq
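A minimal sketch of the chastity filter mentioned above. It assumes the commonly described definition (brightest channel intensity divided by the sum of the two brightest) and the usual pass rule (at most one cycle below 0.6 within the first 25 cycles); these numbers are not stated on the slide, and the function names are hypothetical.

```python
def chastity(intensities):
    """Signal purity for one cluster in one cycle.

    `intensities` holds the four corrected channel intensities (A, C, G, T).
    Assumed definition: brightest / (brightest + second brightest).
    """
    top, second = sorted(intensities, reverse=True)[:2]
    if top + second == 0:
        return 0.0
    return top / (top + second)


def passes_chastity_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """Return True if the cluster would be kept.

    `cycles` is a list of per-cycle intensity lists. The threshold, window,
    and allowed number of failures are assumptions, not values from the slide.
    """
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Illustrative, made-up intensities: one clean cluster and one noisy cluster.
clean_cluster = [[1000, 100, 50, 20]] * 25
noisy_cluster = [[500, 450, 50, 20]] * 25
print(passes_chastity_filter(clean_cluster), passes_chastity_filter(noisy_cluster))
```

A cluster whose second-brightest channel is nearly as strong as the brightest (high background or a mixed cluster) scores close to 0.5 and is discarded.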

  19. Sequencing: 454 • emPCR: clonal amplification of bead-bound library in microdroplets • Library input amounts critical! • One molecule per bead • Titration procedure

  20. Sequencing: 454 • Library capture: beads coated with complementary oligo • Amplification: droplet contains PCR reagents and the other oligo • Post-PCR: millions of identical fragments attached to the bead

  21. Sequencing: 454 • Bead Recovery: physical and chemical disruption • Enrichment: capture successfully amplified beads using biotinylated primers + magnetic, streptavidin beads

  22. Sequencing: 454 • Deposit bead layers onto PicoTiterPlate: • Enzyme beads • Enriched DNA beads • More enzyme beads • PPiase beads

  23. Sequencing: 454

  24. Sequencing: 454 • Pyrosequencing • 4 nucleotides flow separately • If a nt is incorporated → PPi released → light • APS + PPi → ATP (sulfurylase) • Luciferin + ATP → light + oxyluciferin (luciferase) • Amount of light proportional to # of nt incorporated • Rinse and repeat with next nt
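The idea that light output is proportional to the number of nucleotides incorporated per flow can be sketched as an idealized, noise-free simulation. The flow order "TACG" is an assumption (it is the order usually quoted for 454), and `simulate_flowgram` is a hypothetical name for illustration.

```python
def simulate_flowgram(synthesized_seq, n_cycles, flow_order="TACG"):
    """Return ideal light units per flow for the strand being synthesized.

    Each flow offers one nucleotide; the polymerase incorporates it as many
    times as that base occurs consecutively at the current position, and the
    signal equals that count (0 if the base does not match).
    """
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for nt in flow_order:
            count = 0
            while pos < len(synthesized_seq) and synthesized_seq[pos] == nt:
                count += 1
                pos += 1
            signals.append(count)
    return signals


# Two cycles (eight flows) over a short made-up sequence.
print(simulate_flowgram("TCCAACGGG", n_cycles=2))
```

Homopolymers such as "GGG" produce a single, stronger signal rather than three separate ones, which is why homopolymer length is the main source of uncertainty in this chemistry.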

  25. Sequencing: 454 • Camera captures light emitted from every well during every nucleotide flow

  26. Sequencing: 454 • Flowgram: representation of a sequence, based on the pattern of light emitted from a single well

  27. Sequencing: 454 • Other terminology: • Lib-L/Lib-A: adapter variants, “ligated” or “annealed” • Titanium chemistry: ~450 bp reads on all instruments • XL+ chemistry: ~700 bp reads on the FLX+ instrument • Flow: one of the four nucleotides flows over the PTP • Cycle: a set of four flows, in order • Valley flow: if number of bases incorporated in a given read during that flow is uncertain, e.g. 1.5 units of light (background signal, homopolymers) • File format: sff (standard flowgram format)
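As a rough illustration of flows, cycles, and valley flows, the sketch below turns a list of flow signals into bases by rounding each signal to the nearest integer, and flags flows whose signal sits near a half-integer. The flow order "TACG" and the 0.2 margin are assumptions, and real base callers are considerably more sophisticated.

```python
import math


def flowgram_to_sequence(flow_values, flow_order="TACG", valley_margin=0.2):
    """Convert flow signals to a base sequence and list uncertain flows.

    Returns (sequence, valley_flow_indices). A flow is treated as a valley
    flow when its fractional part is within `valley_margin` of 0.5.
    """
    bases = []
    valley_flows = []
    for i, signal in enumerate(flow_values):
        nt = flow_order[i % len(flow_order)]
        n_incorporated = max(0, int(signal + 0.5))  # round half up, clamp at 0
        bases.append(nt * n_incorporated)
        fractional = signal - math.floor(signal)
        if abs(fractional - 0.5) < valley_margin:
            valley_flows.append(i)
    return "".join(bases), valley_flows


# Made-up signals for two cycles; flow 5 (1.5 units) is ambiguous.
seq, uncertain = flowgram_to_sequence([1.02, 0.05, 2.1, 0.0, 0.1, 1.5, 0.97, 3.04])
print(seq, uncertain)
```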

  28. Sequencing: Ion Torrent • Procedures and chemistry similar to 454 • Instead of PPi, measure H+ release (pH change) via semiconductor chip • No expensive camera or laser required, no modified nucleotides

  29. Sequence Quality • Error probabilities determined using training sets, platform-specific biases • Expressed as a quality value (QV or Q score) per base • Similar to PHRED scores: • Q = -10 log10(P) • P = 10^(-Q/10)
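The two quality-score formulas can be checked with a few lines of Python. The FASTQ decoding helper assumes the Phred+33 ASCII encoding, which the slide does not specify; function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality line into Q scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


for q in (10, 20, 30, 40):
    print(f"Q{q}: 1 error in {round(1 / phred_to_error_prob(q))} bases")
```

So Q20 corresponds to a 1% chance of error per base and Q30 to 0.1%.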

  30. Project 1: Microbial Genome • Considerations: • Reference genome? • How much coverage do I want? • How big is the genome? • How much data do I need? • bp needed = genome size × coverage • Which instrument/chemistry configuration to use? • Coverage • Depth: number of times a particular base is “covered” by a read (e.g. 25X) • Breadth: % of genome with at least 1X coverage
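A small worked example of the data requirement and of depth versus breadth. The genome size, read length, and per-base coverage values are made up for illustration, and the function names are hypothetical.

```python
import math


def bases_needed(genome_size_bp, target_depth):
    """bp needed = genome size x coverage."""
    return genome_size_bp * target_depth


def reads_needed(genome_size_bp, target_depth, read_length_bp):
    """Number of reads of a given length required to reach the target depth."""
    return math.ceil(bases_needed(genome_size_bp, target_depth) / read_length_bp)


def depth_and_breadth(per_base_coverage):
    """Return (mean depth, fraction of positions covered at least 1X)."""
    n = len(per_base_coverage)
    mean_depth = sum(per_base_coverage) / n
    breadth = sum(1 for c in per_base_coverage if c >= 1) / n
    return mean_depth, breadth


# Assumed example: a 5 Mb genome at 25X depth with 250 bp reads.
print(bases_needed(5_000_000, 25))       # total bp required
print(reads_needed(5_000_000, 25, 250))  # number of reads required
print(depth_and_breadth([0, 2, 4, 2]))   # toy 4-position "genome"
```

When several barcoded genomes share a run, the same arithmetic applies per sample, so the run's total output divided by the per-sample requirement gives an upper bound on how many samples can be multiplexed.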

  31. Project 1: Microbial Genome • Sample preparation • Isolate high quality (not degraded) and high purity (no RNA) gDNA • Verify on a gel • Quantify using dsDNA-specific dye • Library preparation • Can do this yourself if you like • ~ $200 per sample for Nextera • Cheaper protocols • Cheaper in bulk • Barcode compatibility

  32. Project 1: Microbial Genome • Library QC • Insert size confirmed on BioAnalyzer (within range, no artifacts) • Pool barcoded libraries (normalize based on PicoGreen quantification) • Absolute quantification of library pools using qPCR
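Normalizing libraries before pooling amounts to converting each mass concentration to a molar concentration and then taking equal molar amounts. The sketch below assumes the usual approximation of 660 g/mol per base pair of dsDNA; the concentrations, fragment sizes, and function names are illustrative only.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration (ng/uL) to nM for a given mean fragment size."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def equimolar_pool_volumes(library_nM, fmol_per_library):
    """Volume (uL) of each library that contributes the same molar amount.

    1 nM equals 1 fmol/uL, so volume = fmol wanted / concentration in nM.
    """
    volumes = {}
    for name, conc in library_nM.items():
        if conc <= 0:
            raise ValueError(f"library {name!r} has no measurable DNA")
        volumes[name] = fmol_per_library / conc
    return volumes


# Made-up example: two libraries quantified by a dsDNA dye, sized on a BioAnalyzer.
libraries = {
    "strain_A": ng_per_ul_to_nM(6.6, 500),
    "strain_B": ng_per_ul_to_nM(3.3, 500),
}
print(equimolar_pool_volumes(libraries, fmol_per_library=40.0))
```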

  33. Project 1: Microbial Genome • MiSeq sequencing • Dilute and denature library pool (optimal concentration requires titration...) • Spike in PhiX library as needed (e.g. 1%) • Prepare and load reagents, flow cell • Basic filtering and de-multiplexing performed automatically • Download fastq files from BaseSpace

  34. Project 1: Microbial Genome • Data processing • Additional filtering • Trim the ends • Remove PCR duplicates • Assembly: overlapping reads are joined to each other based on sequence similarity = contigs
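The three processing steps can be sketched in toy form. This is a simplification under stated assumptions: trimming uses a fixed Q20 cutoff at the 3' end, duplicates are detected by exact sequence identity (real pipelines often use mapping positions instead), and "assembly" is a single exact suffix-prefix merge rather than a real assembler. All names are hypothetical.

```python
def trim_low_quality_end(seq, quals, min_q=20):
    """Remove trailing bases whose quality score is below `min_q`."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def drop_exact_duplicates(reads):
    """Keep the first copy of each identical read sequence, preserving order."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def merge_by_overlap(left, right, min_overlap=3):
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    Tries the longest possible overlap first; returns None if no overlap of
    at least `min_overlap` bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


print(trim_low_quality_end("ACGTAC", [30, 30, 30, 25, 10, 5]))
print(drop_exact_duplicates(["ACGT", "ACGT", "TTTT"]))
print(merge_by_overlap("ACGTAC", "TACGGA"))
```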

  35. Project 1: Microbial Genome • What’s next? • Polish the genome (hybrid assemblies, mate pair libraries) • Annotate (ORFs, RNA-seq) • Compare

  36. Project 2: Microbial Community • Shotgun metagenomics • Unbiased survey of community content • Random library fragments may provide very little taxonomic resolution (e.g. conserved, unknown) • Identify genes, classify by function • Targeted metagenomics • Limited survey of community content • Targeted loci provide excellent taxonomic resolution, but may exclude certain taxa • Identify OTUs, classify by taxonomy

  37. Project 2: Microbial Community • 16S rRNA • Multi-copy gene (1.5 kb) • Conserved and hypervariable regions • Extensive databases from known species

  38. Project 2: Microbial Community • Considerations: • Biases in sampling methods, culturing, DNA isolation, PCR...replicate • Available SOPs • How many reads per sample? • Read length matters! • Sample preparation: • Isolate DNA • PCR amplify, purify • High-fidelity polymerase • Barcoded primers • No primer dimers! • Normalize PCR products and pool

  39. Project 2: Microbial Community • 454 Sequencing • emPCR titrations with different library input • Bulk emPCR • Sequence • Basic filtering • Collect sff files • Data processing • De-multiplexing • Additional filtering • Trim the barcodes, primers • Check for chimeras
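De-multiplexing and barcode/primer trimming can be sketched as follows, assuming each read begins with its sample barcode immediately followed by the PCR primer and that both must match exactly. Production tools usually tolerate a few mismatches; the barcodes, primer, and sample names here are invented.

```python
def demultiplex(reads, barcode_to_sample, primer):
    """Assign reads to samples by barcode and strip barcode + primer.

    Returns (reads_by_sample, unassigned_reads). Matching is exact, which is
    a simplifying assumption.
    """
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            prefix = barcode + primer
            if read.startswith(prefix):
                by_sample[sample].append(read[len(prefix):])
                break
        else:
            # No barcode + primer combination matched this read.
            unassigned.append(read)
    return by_sample, unassigned


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical barcodes
reads = ["ACGTGGATTTTCC", "TGCAGGAAAAC", "CCCCGGATTT"]
print(demultiplex(reads, barcodes, primer="GGA"))
```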

  40. Project 2: Microbial Community • Clustering • Sequences grouped by similarity = OTUs
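One simple way to group sequences by similarity is greedy centroid clustering: each sequence joins the first existing cluster whose seed it matches closely enough, otherwise it seeds a new cluster. The sketch assumes sequences are already aligned and of equal length, and uses 97% identity as the default because that is a conventional OTU cutoff, not a value given on the slide.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Group sequences into OTUs; returns a list of clusters (lists of sequences).

    The first member of each cluster acts as its centroid. Input order matters,
    so sequences are often sorted by abundance first (not done here).
    """
    centroids = []
    otus = []
    for seq in sequences:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[idx].append(seq)
                break
        else:
            centroids.append(seq)
            otus.append([seq])
    return otus


# Toy 10-bp "reads" with a relaxed 90% threshold so the grouping is visible.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
print(greedy_otu_clustering(toy, threshold=0.9))
```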

  41. Project 2: Microbial Community • Taxonomic identification • OTUs are classified by comparing to known 16S sequences • Level of classification (e.g. family vs genus)? • Diversity • Within sample • Between samples
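Within-sample and between-sample diversity can each be summarized with a single number computed from OTU counts. The slide does not name specific metrics; the Shannon index and Bray-Curtis dissimilarity below are chosen only as common examples, and the counts are made up.

```python
import math


def shannon_index(otu_counts):
    """Within-sample diversity: H = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in otu_counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Between-sample dissimilarity: 0 = identical profiles, 1 = no shared OTUs.

    Both lists must report counts for the same OTUs in the same order.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must be described over the same OTUs")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / total


sample_1 = [10, 10, 0]  # reads per OTU, illustrative
sample_2 = [0, 10, 10]
print(shannon_index(sample_1), bray_curtis(sample_1, sample_2))
```

Because both measures depend on sequencing depth, samples are usually compared after subsampling to a common number of reads.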
