Ultra-High Throughput DNA Sequencing on the 454 FLX Principles & Applications Graham Wiley
A Brief History of Automated DNA Sequencing Instruments [Figure: instruments shown are ABI 370/377, ABI 3700, ABI 3730, 454-GS20 (labeled 64,000,000) and 454/Roche GS-FLX (labeled 150,000,000); year label: 2007]
454 GSFLX Sequencer • Pico-scale sequencing reactions • 2 Core Techniques: • Emulsion PCR • Pyrosequencing
Emulsion PCR • Micro-reactors • Water-in-oil emulsion generates millions of micelles. • Each micelle contains all reagents/templates for a PCR reaction. • ~10 Million individual PCR reactions in a single tube.
General overview of 454 DNA preparation protocol • Nebulization • DNA End Repair • Adaptor Ligation (A & B) • DNA End Repair • Library Quantification on Caliper
General overview of 454 DNA preparation protocol • Library dsDNA (ends labeled A and B) • Emulsify with DNA Beads, Primers, and Reaction Mix (16B’=A’; the A’ primer is 5’ biotinylated) • 94 °C Hot Start
General overview of 454 DNA preparation protocol • Free B’ Extends Along Free Template • Free Biotinylated A’ Extends • Excess Biotinylated A’ Strand Anneals to Bound B’ and Extends • emPCR Final Result (×10^6)
General overview of 454 DNA preparation protocol • emPCR Final Result (×10^6) • Bind to Streptavidin Coated Enrichment Bead • Use Magnet to Sequester Beads • Then Melt the DNA • Anneal Sequencing Primer • Load on 454
Load DNA Positive Beads into 454 Plates with 1.6 Million Wells • Each chamber is filled with DNA beads and sequencing enzymes
Load Beads into 454 Plate [Figure labels: 44 μm; Load Enzyme Beads; Load beads into PicoTiterPlate; Centrifugation]
Pyrosequencing • Sequence one base at a time: Template + dNTP → Template-dNMP + PPi (catalyzed by Polymerase) • Polymerase adds nucleotide (dNTP) • Pyrophosphate is released (PPi) • Sulfurylase creates ATP from PPi • Luciferase hydrolyses ATP to oxidize luciferin and produce light
Pyrosequencing • (1) Polymerase adds nucleotide (dNTP; dTTP in the figure) • (2) Pyrophosphate is released (PPi) • (3) Sulfurylase creates ATP from PPi and APS • (4) Luciferase hydrolyses ATP to oxidize luciferin and produce light (Light + oxyluciferin) • (5) CCD camera detects bursts of light [Figure labels: DNA Bead; Enzyme Bead; Annealed Primer; template AATCGGCATGCTAAAAGTCAT]
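The cycle above can be sketched as an idealized simulation in which each nucleotide flow produces a light signal equal to the number of bases incorporated during that flow. The function name `ideal_flowgram`, the cyclic flow order `TACG`, and the noise-free signal are assumptions made for illustration; they do not come from the slides.

```python
def ideal_flowgram(read, flow_order="TACG", max_flows=400):
    """Return (base, signal) pairs for a noise-free pyrosequencing run.

    `read` is the strand being synthesized, written 5'->3'. The signal of
    each flow is the length of the homopolymer incorporated in that flow
    (0 when the flowed nucleotide does not match the next base).
    """
    read = read.upper()
    if set(read) - set(flow_order):
        raise ValueError("read contains bases outside the flow order")
    flows = []
    pos = 0       # next base of the read still to be synthesized
    flow_no = 0   # index of the current nucleotide flow
    while pos < len(read) and flow_no < max_flows:
        base = flow_order[flow_no % len(flow_order)]
        run = 0
        # The polymerase extends through the whole homopolymer in one flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
        flow_no += 1
    return flows


if __name__ == "__main__":
    for base, signal in ideal_flowgram("TTCTGCGAA"):
        print(base, signal)
```

Real signals are noisy and have to be normalized, so this only shows the relationship between a sequence and its flow values.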
Output Assembly • Raw data is a series of images • Each well’s data is extracted, quantified and normalized • Read data converted into “flowgrams” [Figure: dNTP Base Addition]
Basecalling via Flowgram TTCTGCGAA
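A minimal sketch of how a flowgram could be turned into bases: each normalized flow signal is rounded to the nearest whole number of incorporated nucleotides. The `TACG` flow order, the function name `call_bases`, and the signal values are assumptions for illustration; the instrument software's actual basecaller is not described in the slides and is certainly more involved.

```python
def call_bases(signals, flow_order="TACG"):
    """Convert normalized flow signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    bases; a signal near 0 means the flowed nucleotide was not added.
    """
    seq = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        n = max(0, int(signal + 0.5))  # round half up, never negative
        seq.append(base * n)
    return "".join(seq)


# Hypothetical noisy signals, invented for this example.
signals = [2.04, 0.08, 0.97, 0.03, 1.02, 0.11, 0.05, 0.94,
           0.06, 0.02, 1.05, 0.99, 0.07, 1.93]
print(call_bases(signals))  # TTCTGCGAA
```

Simple rounding becomes unreliable for long homopolymers, where the gap between a signal for n and n+1 bases is small relative to the noise.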
Types of Libraries • 454/Roche: Shotgun (random 200+ bp reads); Paired End (25-50 bp ends of a circularized DNA molecule); Amplicon (PCR product for SNP discovery) • Roe Lab: Paired-End/Shotgun (best of both worlds)
General overview of 454 Paired End/Shotgun DNA preparation protocol • Hydroshear • Quantitate on Caliper AMS-90 • DNA End Repair & Linker Ligation • Cut Terminal Linkers with EcoRI and Ligate Ends Together (Circularized DNA) • Nebulize
General overview of 454 Paired End/Shotgun DNA preparation protocol • Quantitate on Caliper AMS-90 • DNA End Repair, Adaptor Ligation, Adaptor End Repair • Amplification (emPCR) • Pyrosequencing (on 454-GSFLX)
454 Paired-End/Shotgun • Separate Sequences • Linker? • Left and Right? • ~3-5% • Triple Assembly • 100 flows, 84 flows, 62 flows • Convert Paired Ends for Exgap • *.454f and *.454r
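The "Separate Sequences" step can be illustrated with a small sketch that looks for the linker inside a read and returns the left and right ends. The linker sequence below is a made-up placeholder, and the function name, the exact-match search, and the `min_end` threshold are all assumptions; the slides do not give the real linker or the criteria used.

```python
LINKER = "GAATTCGGATCC"  # hypothetical placeholder, NOT the real 454 linker


def split_paired_read(read, linker=LINKER, min_end=15):
    """Split a read at an exact linker match into (left, right) ends.

    Returns None when the linker is absent or when either flank is shorter
    than `min_end`; such reads would be handled as ordinary shotgun reads.
    """
    idx = read.find(linker)
    if idx == -1:
        return None
    left = read[:idx]
    right = read[idx + len(linker):]
    if len(left) < min_end or len(right) < min_end:
        return None
    return left, right


if __name__ == "__main__":
    read = "A" * 20 + LINKER + "C" * 25
    print(split_paired_read(read))
    print(split_paired_read("ACGT" * 10))  # no linker present
```

Writing the two ends to the *.454f and *.454r files is left out because the slides do not specify their format. A production version would also need to tolerate sequencing errors inside the linker.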
Applications • Whole Genome Sequencing • Sample Pools • BACs • Plant Viruses • EST Libraries
Xanthomonas campestris • ~5 Mb genome • 47.9 Mb total sequence • ~10x coverage
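The coverage figure follows from dividing the total sequence obtained by the genome size. A short check of that arithmetic, using the slide's rounded numbers (the function name `fold_coverage` is only illustrative):

```python
def fold_coverage(total_bases, genome_size):
    """Average fold coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


coverage = fold_coverage(47.9e6, 5e6)
print(f"{coverage:.1f}x")  # 9.6x
```

With a genome of roughly 5 Mb this comes to about 9.6x, consistent with the ~10x quoted on the slide.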
Medicago BAC Pooling Strategy • Massively parallel sequencing • Inefficient for 1 BAC at a time • Many BACs at the same time • Grow 10-24 BACs and pool cells in equal volumes • Isolate DNA together • Generate 454 library and sequence together
Medicago BAC Pooling Strategy • Pool 1 and Pool 2: 15x Random Coverage on 454 each • Compare reads from the two pools • Assign common contigs to common BAC • Check BAC assignments • Anchor orphan contigs
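One way to picture "assign common contigs to common BAC" is to match the set of pools in which a contig is covered against the set of pools each BAC was grown in. The sketch below assumes that simple set-matching rule; the BAC names, contig names, and pool layout are invented, and the slides do not describe the actual assignment procedure.

```python
def assign_contigs(bac_pools, contig_pools):
    """Map each contig to the BAC(s) whose pool membership matches.

    bac_pools:    {bac_name: set of pools the BAC was included in}
    contig_pools: {contig_name: set of pools whose reads cover the contig}
    Contigs without a matching BAC get an empty list (orphans to anchor).
    """
    by_signature = {}
    for bac, pools in bac_pools.items():
        by_signature.setdefault(frozenset(pools), []).append(bac)
    return {
        contig: sorted(by_signature.get(frozenset(pools), []))
        for contig, pools in contig_pools.items()
    }


if __name__ == "__main__":
    # Hypothetical layout: BAC_C is present in both pools.
    bac_pools = {"BAC_A": {1}, "BAC_B": {2}, "BAC_C": {1, 2}}
    contig_pools = {"contig1": {1, 2}, "contig2": {1}, "contig3": {3}}
    print(assign_contigs(bac_pools, contig_pools))
```

With only two pools there are just three possible signatures, so several BACs can share one; that ambiguity is presumably why the slide lists checking the assignments and anchoring orphan contigs as separate steps.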
Two BAC pools on 454 • Extra flows • Pooled Reagents • Bases obtained
Plant Viruses • Single or double stranded RNA • Typically <10,000 bp, ~12,000 bp max. • 4-12 encoded genes • Inherent instability of RNA leads to a large number of mutations and, hence, large species variation
cDNA pooling strategy • Tags on PCR primers will allow for deconvolution of viral sequences post sequencing • cDNA samples will be pooled in sets of 20 at the Noble Foundation and sent to OU for sequencing
Strategy for preparing cDNA ready for 454 sequencing from dsRNA • Anneal with Random Hexamer Primers followed by Reverse Transcriptase PCR Reaction • Additional Rounds of RT PCR with Random Hexamer Primers • RNAse Treatment to Remove any Excess Random Hexamer Primers followed by a Taq Polymerase PCR with one of the 20 Tagged Primers • Amplified Product Ready for Ligating 454 A and B Primers [Figure sequences: random hexamer primer 5’-CCTTCGGATCCTCCNNNNNN-3’; tagged primer shown 5’-AGAGCCTTCGGATCCTCC-3’; final product carries the A and B adaptors at its ends]
Average Read Lengths and Total Bases for 4 region/half-plate run of tgp_p01-4
Sample Statistics [Figure panels: Sample bead read length data; Control bead read length data; Sample bead quality data; Control bead quality data]
Uniquely Tagged cDNA Sample from the TGP on the 454 [Read components labeled in the figure: 454 tag (TCAG); TGP Unique tag (GACA); TGP common primer (CCTTCGGATCCTCC); RT-PCR Sequence]
Index starting location = 0
Index length = 4
Minimum contigs/reads per index = 41
19819 contigs read from fasta_input_file='EJ2SBFV03.sff.fna'
Creating index files
Tag    Count  Filename
AAAA       4
AAAC       3
AAAG     163  EJ2SBFV03.index_AAAG
AAAT       1
AACA      22
AACC       5
AACT       7
AAGA       2
AAGC       5
AAGG     931  EJ2SBFV03.index_AAGG
AAGT       9
AATA       5
AATC       2
AATT       3
ACAA       5
ACAC    1152  EJ2SBFV03.index_ACAC
ACAG       2
ACAT       2
ACCA       5
ACCC      18
ACCG       4
ACCT      14
ACGA       3
ACGC       6
ACGG       4
ACTA       3
ACTC    1124  EJ2SBFV03.index_ACTC
ACTG       3
TCTG     419  EJ2SBFV03.index_TCTG
TCTT       2
TGAA       1
TGAC      11
TGAG     778  EJ2SBFV03.index_TGAG
TGAT       2
TGCA       1
TGCC      14
TGCG       2
TGCT       6
TGGA       3
TGGC       1
TGGG       5
TGGT       6
TGTA       2
TGTC      18
TGTG     639  EJ2SBFV03.index_TGTG
TGTT       4
TTAA       3
TTAG       3
TTAT       1
TTCA       3
TTCC       6
TTCG      10
TTCT       6
TTGA       5
TTGC       3
TTGG       8
TTGT       2
TTTC       3
TTTG       3
TTTT       8
-bad-     10
10 contigs with bad (non-ACGT) tags were dropped
217 tags containing a total of 1378 contigs were skipped, because each tag had fewer than 41 contigs
23 index files created containing 18431 contigs
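The log above suggests a straightforward binning procedure: take the tag at a fixed offset in each read, drop reads whose tag contains non-ACGT characters, and keep only tags seen at least a minimum number of times. The sketch below is a reconstruction under those assumptions, not the script that produced the log; the function name `bin_by_tag`, its parameters, and the toy reads are hypothetical.

```python
from collections import defaultdict


def bin_by_tag(reads, start=0, length=4, min_reads=41):
    """Group reads by the tag found at read[start:start + length].

    reads: iterable of (name, sequence) pairs.
    Returns (kept, skipped, bad):
      kept    -- tag -> reads, for tags with at least `min_reads` reads
      skipped -- tag -> reads, for tags with fewer reads
      bad     -- reads whose tag is too short or has non-ACGT characters
    """
    bins = defaultdict(list)
    bad = []
    for name, seq in reads:
        tag = seq[start:start + length].upper()
        if len(tag) < length or set(tag) - set("ACGT"):
            bad.append((name, seq))
        else:
            bins[tag].append((name, seq))
    kept = {t: r for t, r in bins.items() if len(r) >= min_reads}
    skipped = {t: r for t, r in bins.items() if len(r) < min_reads}
    return kept, skipped, bad


if __name__ == "__main__":
    # Toy reads: 4-base tag followed by the common primer and an insert.
    reads = [
        ("r1", "AAGGCCTTCGGATCCTCCACGT"),
        ("r2", "AAGGCCTTCGGATCCTCCTTGA"),
        ("r3", "ACACCCTTCGGATCCTCCGGCA"),
        ("r4", "NAGGCCTTCGGATCCTCCGGCA"),
    ]
    kept, skipped, bad = bin_by_tag(reads, min_reads=2)
    print({tag: len(rs) for tag, rs in kept.items()})  # {'AAGG': 2}
    print(sorted(skipped), [name for name, _ in bad])
```

The minimum-count filter treats the many low-count tags as noise, which matches the pattern in the log, where each index file holds hundreds of reads and most other tags appear fewer than 25 times.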
TGP Pools 1-8
Pool  Minimum Reads  Tags Indexed  Reads Used  Reads Unused
1     100            20            28557       2275
2     125            20            29641       2145
3     41             23            18431       1388
4     100            21            21319       2206
5     75             20            3540        725
6     100            21            6796        921
7     70             23            5687        799
8     88             22            6533        696
• 1/8th vs 1/4th Half-plate • Indexed reads
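To compare pools it can help to express the indexed reads as a fraction of all reads. The sketch below does that with the used and unused read counts listed above for pools 1-8. Reading the last two columns as "reads used" and "reads unused" is an interpretation of the flattened table, and the helper name `used_fraction` is illustrative.

```python
# (reads used, reads unused) per pool, copied from the table above.
POOLS = {
    1: (28557, 2275),
    2: (29641, 2145),
    3: (18431, 1388),
    4: (21319, 2206),
    5: (3540, 725),
    6: (6796, 921),
    7: (5687, 799),
    8: (6533, 696),
}


def used_fraction(used, unused):
    """Fraction of all reads that were assigned to an index file."""
    return used / (used + unused)


if __name__ == "__main__":
    for pool, (used, unused) in POOLS.items():
        print(f"Pool {pool}: {100 * used_fraction(used, unused):.1f}% indexed")
```

For pool 3 the total, 18431 + 1388 = 19819, equals the number of contigs reported for EJ2SBFV03, which supports this reading of the columns.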
EST Libraries • Clontech SMART cDNA Library Construction Kit • CDS III/3' PCR Primer: 5'-ATTCTAGAGGCCGAGGCGGCCGACATG-d(T)30 N₋₁ N-3' (N = A, G, C, or T; N₋₁ = A, G, or C)