Sequencing tutorial. Peter HANTZ, EMBL Heidelberg
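Dideoxy (Sanger) sequencing (Uni Osnabruck; M. Waterman)
Principle:
- Gel electrophoresis: resolves 1 bp length differences for fragments below ~1000 bp
- Synthesis: starts from a DNA oligo (primer) and stops after a (labeled) ddNTP is incorporated, because ddNTPs lack the 3'-OH needed to extend the chain
- The first ~60 bp are uncertain: for such short fragments the fluorescent dye has a high mass relative to the DNA
- Radiolabeling: 4 separate reactions, one per ddNTP
- Dye termination: 4 fluorescent dyes, one reaction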
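To make the chain-termination logic concrete, here is a minimal Python sketch under simplifying assumptions: the template is written 3'->5' so that its complement reads left to right, termination can happen at every position, and primer length is ignored. All function names are hypothetical and not part of any sequencing software.

```python
import random

# Simplifying assumption: template written 3'->5', so the newly
# synthesized strand (its complement) reads 5'->3' left to right.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def termination_fragments(template):
    """All chain-terminated products as (length after primer, terminal ddNTP)."""
    new_strand = "".join(COMPLEMENT[b] for b in template)
    # A ddNTP can be incorporated at any position, so every prefix occurs.
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def four_lanes(template):
    """Radiolabeling: one reaction (gel lane) per ddNTP, listing band lengths."""
    lanes = {b: [] for b in "ACGT"}
    for length, base in termination_fragments(template):
        lanes[base].append(length)
    return lanes


def dye_terminator_read(template, seed=0):
    """Dye termination: one reaction; the dye colour identifies the last base."""
    fragments = termination_fragments(template)
    random.Random(seed).shuffle(fragments)  # the reaction yields an unordered mixture
    ladder = sorted(fragments)  # electrophoresis orders fragments by length
    return "".join(base for _, base in ladder)


print(four_lanes("TACGGA"))           # {'A': [1], 'C': [4, 5], 'G': [3], 'T': [2, 6]}
print(dye_terminator_read("TACGGA"))  # ATGCCT
```

Reading the four lanes from the shortest band upward gives the same sequence as the single dye-terminator reaction; the colour simply replaces the lane.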
Pyrosequencing (Roche / 454): Bead I, streptavidin coated
Library construction:
- A, B: short DNA oligos (adapters) ligated to genomic DNA segments; B is biotinylated
- Selection of dsDNA: binding to streptavidin-coated magnetic beads
- Denaturation: the single-stranded A-B strands are released and collected
(www.454.com, wiki)
Pyrosequencing (Roche / 454): Bead II, simple agarose beads coated with B oligos
- Single sstDNA (single-stranded template DNA) molecules carrying cA and cB oligos are immobilized, one per bead
- The bead-bound library is emulsified (water-in-oil), so each droplet hosts its own PCR reaction
- One strand of each amplified product is covalently bound to the bead
(www.454.com, wiki)
Pyrosequencing (Roche / 454)
- Denaturation: the strand not bound to the bead is released
- After selecting the DNA-positive beads (enrichment), beads and reactants are loaded into wells about 40 µm in diameter
(www.454.com, wiki)
Pyrosequencing (Roche / 454): the reaction
- dNTPs are added one type at a time; each incorporation releases pyrophosphate (PPi), since only one of the three phosphates is built into the backbone
- ATP sulfurylase converts PPi to ATP
- Luciferase uses the ATP to emit light, which is detected
- Apyrase degrades unincorporated nucleotides and ATP
- ~400,000 reads in parallel
- Multiple consecutive incorporations (homopolymers): higher signal intensity, but the exact run length is problematic to resolve
(www.454.com, wiki)
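The reaction cycle can be illustrated with an idealized flowgram simulator. This is a sketch, assuming noise-free signals proportional to the number of bases incorporated in each flow; the flow order `TACG` is treated as an illustrative parameter. It shows why homopolymers are troublesome: their length has to be read from the signal intensity.

```python
# Illustrative flow order (an assumption of this sketch).
FLOW_ORDER = "TACG"


def flowgram(seq, flow_order=FLOW_ORDER):
    """Idealized signal per nucleotide flow = number of bases incorporated.

    Assumes light is proportional to the PPi released, i.e. to the
    homopolymer length extended during that flow, with no noise.
    """
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals, pos, flow = [], 0, 0
    while pos < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:  # homopolymer: several incorporations
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Base calling: round each signal to an integer run length."""
    return "".join(base * round(value) for base, value in signals)


signals = flowgram("TTACGGGA")
print(signals)              # [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1)]
print(call_bases(signals))  # TTACGGGA
# With noisy intensities, runs of 3 and 4 differ by only one signal unit:
print(call_bases([("G", 3.4)]), call_bases([("G", 3.6)]))  # GGG GGGG
```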
Illumina (Solexa) sequencing
- Making a DNA library (~300 bp fragments)
- Ligation of adapters A and B to the fragments
- Binding the ssDNA randomly to the flow cell surface
- Primers complementary to the adapters are attached to the surface
(Illumina-Fasteris)
Illumina (Solexa) sequencing: bridge amplification, initiation. On the surface: complementary oligos (GeneCore)
Illumina (Solexa) sequencing EMBL Gene Core
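Illumina (Solexa) sequencing
Data acquisition: sequencing by synthesis with “reversible terminator” nucleotides
- Nucleotides are blocked and fluorescently labeled (one color per base: T, G, C, A), so only one base is incorporated per cycle
- Imaging: the color identifies the incorporated base
- De-blocking enables further synthesis
- Dye cleavage and elimination
- Wash step, then repeat
(illumina.com)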
Illumina (Solexa) sequencing Mate-pair sequencing
Single Molecule Real Time Sequencing
- Principle: a fluorescent label sits on the terminal phosphate of the dNTPs; the DNA polymerase cleaves it off during incorporation, which lasts ~ms
- Detection: "Zero-Mode Waveguide" holes, in which only a tiny near-field (evanescent) volume at the bottom is illuminated, similar to total internal reflection
- Reported performance: read lengths of ~1,500 bp
(Wiki, Pacific Biosciences)
Assembling: shotgun sequencing
- The genome is fragmented randomly (sonication)
- No positional or orientation information is available
- The fragments are sequenced
- The results have to be assembled: merging reads into contigs
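A toy simulation of the shotgun step, assuming uniformly random fragment positions and an equal chance of reading either strand; the helper names and the toy genome are hypothetical. Only the read strings come out, with positions and orientations discarded, which is exactly what assembly has to recover.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample reads at random positions and on a random strand.

    Only the strings are returned: position and orientation are lost.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = genome[start:start + read_len]
        if rng.random() < 0.5:  # the fragment may come from either strand
            read = reverse_complement(read)
        reads.append(read)
    return reads


def mean_coverage(genome_len, read_len, n_reads):
    """Average number of reads covering a base: N * L / G."""
    return n_reads * read_len / genome_len


genome = "ATGGCGTGCAATTGCCGTAC"  # hypothetical toy genome
print(shotgun_reads(genome, read_len=8, n_reads=6))
print(mean_coverage(len(genome), 8, 6))  # 2.4
```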
Graphs (www.bioalgorithms.info): a set of nodes plus a set of edges connecting pairs of nodes, used to model pairwise relations between objects
Bridges of Königsberg (Leonhard Euler, 1735): find a walk that crosses each bridge (= edge) exactly once!
Eulerian path problem: visit each edge once and only once; solvable by a linear-time algorithm
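Deciding whether an Eulerian path exists is the linear-time part. A sketch of Euler's criterion in Python: a connected undirected multigraph has an Eulerian path exactly when 0 or 2 nodes have odd degree. It is applied here to Königsberg, with the land masses labeled A to D purely for illustration.

```python
from collections import Counter, defaultdict


def is_connected(edges):
    """True if all nodes touched by an edge lie in one connected component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def has_eulerian_path(edges):
    """Euler's criterion: connected, and 0 or 2 nodes of odd degree.

    Runs in time linear in the number of edges.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return is_connected(edges) and odd in (0, 2)


# Königsberg: island A, river banks B and C, eastern land mass D; seven bridges.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(has_eulerian_path(konigsberg))      # False: degrees 5, 3, 3, 3
print(has_eulerian_path(konigsberg[1:]))  # True: one A-B bridge removed
```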
Hamiltonian Path Problem (www.ams.org)
Find a route that visits each node (= each airport) exactly once.
This is an NP-complete problem (NP: nondeterministic polynomial time): with the most efficient algorithms known at present, the amount of computation grows exponentially with the size of the route map.
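For contrast, the obviously correct method for the Hamiltonian path problem is to try node orderings. A brute-force sketch; the airport codes and routes are a made-up example, not data from the source.

```python
from itertools import permutations


def hamiltonian_path(nodes, edges):
    """Brute force over all n! orderings of the nodes (undirected edges)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for order in permutations(nodes):
        # Valid if every consecutive pair in the ordering is connected.
        if all(b in adj[a] for a, b in zip(order, order[1:])):
            return list(order)
    return None


airports = ["FRA", "MUC", "BER", "HAM"]  # hypothetical route map
routes = [("FRA", "MUC"), ("MUC", "BER"), ("BER", "HAM"), ("HAM", "FRA")]
print(hamiltonian_path(airports, routes))  # ['FRA', 'MUC', 'BER', 'HAM']
# A star (one hub, three spokes) has no Hamiltonian path:
print(hamiltonian_path(["C", "X", "Y", "Z"], [("C", "X"), ("C", "Y"), ("C", "Z")]))  # None
```

With n nodes there are n! orderings; for 20 airports that is already about 2.4 × 10^18. Cleverer exact algorithms (e.g. dynamic programming over node subsets) are still exponential.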
Traveling Salesman Problem (www.wolfram.com)
Find the shortest route that visits every vertex exactly once, i.e. the shortest Hamiltonian path (in the classic formulation, a closed tour back to the start).
This is also an NP-hard problem.
The Shortest Superstring Problem
Problem: given a set of strings, find a shortest string that contains all of them.
Input: strings s1, s2, …, sn
Output: a string s that contains all of s1, s2, …, sn as substrings, such that the length of s is minimized
Equivalent to:
- finding the shortest Hamiltonian path (through the overlap graph of the strings)
- the TSP
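A brute-force sketch of the SSP in Python, for tiny inputs only. It relies on the fact that, once strings contained in other strings are removed, the best superstring for a fixed order merges neighbors with their maximal overlap; trying every order is then a search over Hamiltonian paths, which is why it scales like the problems above.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(strings):
    """Exact SSP by brute force over all orders (exponential, toy inputs only)."""
    # Duplicates and strings contained in another string add nothing; drop them.
    unique = sorted(set(strings))
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for prev, nxt in zip(order, order[1:]):
            merged += nxt[overlap(prev, nxt):]  # glue with the maximal overlap
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(shortest_superstring(["ACG", "CGT", "GTA", "CG"]))  # ACGTA
```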
Graph theory helps DNA assembly (University of Maryland)
"Translation" of the problem into a model:
- Nodes: reads
- Edges: connect two nodes if the corresponding reads overlap
Example: assembling a bacterial genome (red lines: wrong assembly; bold black lines: correct assembly)
Assembling the reads = finding the shortest Hamiltonian path = TSP = SSP: NP-hard... impossible?
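Building the overlap graph itself is straightforward. A sketch with directed edges, since a suffix-prefix overlap is not symmetric; the minimum-overlap threshold and the reads are hypothetical values chosen for illustration.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads, min_overlap):
    """Directed edge a -> b (with overlap length) when a suffix of a matches a prefix of b."""
    edges = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b)
                if k >= min_overlap:
                    edges.append((a, b, k))
    return edges


reads = ["ATGGCG", "GCGTGC", "TGCA"]  # hypothetical reads
print(overlap_graph(reads, min_overlap=3))
# [('ATGGCG', 'GCGTGC', 3), ('GCGTGC', 'TGCA', 3)]
```

The edges chain ATGGCG → GCGTGC → TGCA; finding such a chain through all reads is the Hamiltonian path problem.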
The Way Out: Constructing and analyzing de Bruijn Graphs
Finding Eulerian paths in the de Bruijn graph can lead to sequence reconstruction. Solvable in linear time! (J. Kaptcianos)
Second-generation DNA sequencing: "sequencing by synthesis" methods
- Fragment (insert) length: (Solexa) 300 bp [normal] to 10 kb [mate-pair]; (454) 1-10 kb, and 20 kb at the experimental stage
- DNA colonies amplified by PCR ("polonies"): (Solexa) isothermal "bridge PCR" (note: even PCR-free!); (454) emulsion PCR
- Imaging of the entire array (fluorescence for Solexa, luciferase light for 454)
- Read length: (Solexa) ~50-80 bp; (454) ~200-300 bp
(Nature Biotech, vol. 26)
Illumina (Solexa) sequencing: flow cell, paired-end sequencing (EMBL GeneCore)
Directed graphs
Each edge is assigned a direction. The Eulerian path problem can be reformulated accordingly: visit each edge exactly once, traversing every edge only in its own direction.
Note: an Eulerian path might not exist!
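For directed graphs the existence question reduces to a degree count plus a connectivity check. A sketch, assuming edges are given as (from, to) pairs; the function name is hypothetical.

```python
from collections import defaultdict


def has_directed_eulerian_path(edges):
    """Degree test for a directed (multi)graph.

    An Eulerian path exists iff the edges form one weakly connected piece and
    either every node is balanced (in = out), or exactly one node has
    out - in = 1 (the start) and exactly one has in - out = 1 (the end).
    """
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    undirected = defaultdict(set)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        undirected[u].add(v)
        undirected[v].add(u)
    nodes = set(undirected)
    diffs = [out_deg[n] - in_deg[n] for n in nodes]
    if any(abs(d) > 1 for d in diffs):
        return False
    if not (diffs.count(1) == diffs.count(-1) <= 1):
        return False
    if not nodes:
        return True
    # Weak connectivity: depth-first search ignoring edge directions.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in undirected[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


print(has_directed_eulerian_path([("A", "B"), ("B", "C"), ("C", "A")]))  # True (a cycle)
print(has_directed_eulerian_path([("A", "B"), ("A", "C")]))  # False: A has out - in = 2
```

The second example shows the effect of direction: undirected, the edges A-B and A-C would form the path B-A-C, but with both edges leaving A no directed Eulerian path exists.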
Examples (M. Waterman): start; is it really the shortest? Red: repeats. (Also known as the Overlap-Layout-Consensus method)
The Way Out: Constructing and analyzing de Bruijn Graphs
A de Bruijn graph is a directed graph representing overlaps between sequences of symbols.
Given sequences of symbols (~reads): ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT ("k-length fragments", k = 3)
- Nodes: fragments of length k-1 (here k-1 = 2)
- Edges: the k-length fragments, each pointing from its (k-1)-length prefix to its (k-1)-length suffix
Finding Eulerian paths in the de Bruijn graph can lead to sequence reconstruction (Superpath problem, merging transformation, etc.). Solvable in linear time! (J. Kaptcianos)
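The k = 3 example above can be assembled with a short sketch: build the de Bruijn graph, then find an Eulerian path with Hierholzer's algorithm. Hierholzer's is one standard linear-time method, chosen here as an assumption; the source does not say which algorithm it has in mind, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes a non-empty graph with an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, if such a node exists.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: node is final here
    return path[::-1]


def assemble(kmers):
    """Spell the sequence: first node, then the last symbol of each next node."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


kmers = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
print(assemble(kmers))  # ATGCGTGGCA
```

The walk is not unique: ATGGCGTGCA also uses every 3-mer exactly once, which illustrates how repeated nodes (here TG and GC) make the reconstruction ambiguous.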