DNA Sequencing Jessica Scheld
Recall: DNA
• Polymer of nucleotides which encodes information
• Made up of long sequences of A's, T's, C's, and G's
• We want to read the DNA sequence, but it's often too long to just read off
• Problem: How do we know that a reassembled DNA sequence is in the right order?
Image: http://www.tokyo-med.ac.jp/genet/picts/dna.jpg
DNA Sequencing: Biology
The Players: primer, dideoxynucleotides, deoxynucleotides, DNA, Taq polymerase, DNA polymerase
[Figure labeled C, A, T, G]
What would our sequence read? T-C-G-A
Problem
ATCGACTATAAGGCATCGAA
• Say we want to read this piece of single-stranded DNA.
• We can't read it all in one piece, so we break it up into fixed-length "snippets" and then piece them back together to reconstruct the DNA sequence.
• Here, we'll use snippets of 4 nucleotides.
• For the sequence above, we would get these 17 snippets: ATAA, CGAC, CATC, GACT, ATCG, ATCG, ACTA, TATA, CGAA, AGGC, TCGA, CTAT, TCGA, TAAG, AAGG, GGCA, GCAT
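The snippet step can be sketched in a few lines of Python. This is only an illustration: the function name `snippets` and the parameter `k` are made up here, and the sketch assumes every overlapping window of the strand is read exactly once.

```python
def snippets(sequence, k=4):
    """Return every length-k window of the sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


dna = "ATCGACTATAAGGCATCGAA"
reads = snippets(dna)
print(len(reads))      # 17
print(sorted(reads))   # the same snippets as on the slide, in sorted order
```

A strand of 20 nucleotides has 20 - 4 + 1 = 17 windows of length 4, which matches the number of snippets listed on the slide.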
Constructing the De Bruijn Graph
• These snippets can be represented by a graph.
• Each snippet has length 4. Make each vertex of the graph consist of 3 of these letters. The "head" vertex contains the first three letters of a snippet and the "tail" vertex contains the last three.
• Example: GGCA = GGC → GCA
Constructing the De Bruijn Graph cont.
ATCGACTATAAGGCATCGAA
• Do this for all snippets and connect the vertices with directed edges.
• The result is the De Bruijn graph for the DNA sequence above.
[Figure: De Bruijn graph on the vertices ATA, TAA, ATC, GAA, ACT, TCG, GAC, TAT, CAT, CTA, AAG, CGA, AGG, GCA, GGC]
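As a rough sketch of the construction, each snippet can be turned into one directed edge from its first three letters to its last three. The helper name `de_bruijn_edges` is hypothetical, and a `Counter` is used (a choice made here, not taken from the slides) so that repeated snippets show up as parallel edges.

```python
from collections import Counter


def de_bruijn_edges(reads):
    """Map each snippet to a directed edge: first k-1 letters -> last k-1 letters."""
    return Counter((r[:-1], r[1:]) for r in reads)


dna = "ATCGACTATAAGGCATCGAA"
reads = [dna[i:i + 4] for i in range(len(dna) - 3)]
edges = de_bruijn_edges(reads)

# Print each edge with its multiplicity, e.g. "GGC -> GCA x1".
for (head, tail), count in sorted(edges.items()):
    print(f"{head} -> {tail} x{count}")

vertices = {v for edge in edges for v in edge}
print(len(vertices), sum(edges.values()))  # 15 vertices, 17 edges
```

The snippets ATCG and TCGA each occur twice in the strand, so the edges ATC -> TCG and TCG -> CGA appear with multiplicity 2.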
Creating the 2-in 2-out digraph
• Notice that only 3 vertices (ATC, TCG, CGA) have degree greater than two. We can redraw this graph so that it only has vertices of degree 4.
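The degree observation can be checked with a short sketch. It assumes the graph is built from the 17 snippets of ATCGACTATAAGGCATCGAA and that degree means in-degree plus out-degree; the variable names are illustrative.

```python
from collections import Counter

dna = "ATCGACTATAAGGCATCGAA"
reads = [dna[i:i + 4] for i in range(len(dna) - 3)]

in_deg, out_deg = Counter(), Counter()
for r in reads:
    out_deg[r[:-1]] += 1   # edge leaves the vertex made of the first three letters
    in_deg[r[1:]] += 1     # edge enters the vertex made of the last three letters

vertices = set(in_deg) | set(out_deg)
degree = {v: in_deg[v] + out_deg[v] for v in vertices}

# Vertices whose total degree exceeds two.
busy = sorted(v for v in vertices if degree[v] > 2)
print(busy)  # ['ATC', 'CGA', 'TCG']
```

Every other vertex has degree at most two, so it can be passed through in only one way.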
Recap
• There is almost always more than one way to reconstruct a strand of DNA (and only 1 correct way)
• The De Bruijn graph visualizes these ways and can be redrawn as an Eulerian digraph
• We want to find the number of ways to reconstruct the DNA
• This lets us find the probability of getting the correct one
Eulerian digraphs
[Figure: digraph on vertices A, B, C, D]
• Required:
• Only 2-in 2-out digraphs
• Why not 1 or 3?
Use circuit: ABCDCABDA
Then we can rewrite this graph state as:
Chord Diagram and Interlace Graph
Use circuit: ABCDCABDA
[Figures: chord diagram and interlace graph on vertices A, B, C, D]
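A minimal sketch of how the interlace graph can be read off a circuit, assuming the usual definition: two vertices are joined when their occurrences alternate around the circuit (u ... v ... u ... v), which is when their chords cross in the chord diagram. The function name `interlace_graph` is made up for this example, and it expects every vertex to be visited exactly twice.

```python
from itertools import combinations


def interlace_graph(circuit):
    """Return the set of interlaced vertex pairs of a closed circuit such as 'ABCDCABDA'."""
    # Drop the repeated start vertex at the end of a closed circuit.
    word = circuit[:-1] if circuit[0] == circuit[-1] else circuit
    positions = {}
    for i, v in enumerate(word):
        positions.setdefault(v, []).append(i)
    edges = set()
    for u, v in combinations(sorted(positions), 2):
        (a1, a2), (b1, b2) = positions[u], positions[v]
        # u and v interlace when their occurrences alternate.
        if a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2:
            edges.add((u, v))
    return edges


print(sorted(interlace_graph("ABCDCABDA")))
# [('A', 'B'), ('A', 'D'), ('B', 'D'), ('C', 'D')]
```

For this circuit the result is a triangle on A, B, D with C attached to D.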
Interlace polynomial (Arratia, Bollobás, Sorkin, '00)
• Interchange edges and non-edges among
[Figure]
Finding the Interlace Polynomial
[Figure: graphs on vertices A, B, C, D]
Finding the Interlace Polynomial cont.
[Figure: graphs on vertices A, B, C, D]
Finding the Interlace Polynomial cont.
[Figure: sum of graphs on vertices A, B, C, D]
Interlace Polynomial Reconstructions
• To find the number of reconstructions, we must relate the interlace polynomial to the circuit partition polynomial.
• Theorem*: If G is a 4-regular Eulerian digraph, C is any Eulerian circuit of G, and H is the circle graph of the chord diagram determined by C, then f(G; x) = x q(H; x + 1), where f is the circuit partition polynomial and q is the interlace polynomial.
• Thus: 6 reconstructions
*J.A. Ellis-Monaghan, I. Sarmiento. Properties of the Interlace Polynomial via Isotropic Systems, Preprint.
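The count of 6 can be checked numerically. The sketch below does not use the step-by-step reduction from the slides. Instead it assumes the known subset expansion of the interlace polynomial, q(H; x + 1) = sum over vertex subsets S of x^nullity(H[S]), with nullity taken over GF(2), and multiplies by x to get the circuit partition polynomial. The edge list for H is the interlace graph of the circuit ABCDCABDA, worked out by hand; all function names are hypothetical.

```python
from itertools import combinations


def gf2_nullity(rows, n):
    """Nullity over GF(2) of an n x n matrix given as a list of row bitmasks."""
    rows = list(rows)
    rank = 0
    for bit in range(n):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return n - rank


def circuit_partition_coeffs(vertices, edges):
    """coeffs[k] is the coefficient of x**k in x * q(H; x + 1)."""
    adjacent = {frozenset(e) for e in edges}
    coeffs = [0] * (len(vertices) + 2)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            # Adjacency matrix of the induced subgraph, one bitmask per row.
            rows = [
                sum(1 << j for j, w in enumerate(subset) if frozenset((v, w)) in adjacent)
                for v in subset
            ]
            coeffs[gf2_nullity(rows, size) + 1] += 1
    return coeffs


# Interlace graph of the circuit ABCDCABDA (assumed, derived by hand).
H_edges = [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D")]
coeffs = circuit_partition_coeffs("ABCD", H_edges)
print(coeffs)  # [0, 6, 8, 2, 0, 0], i.e. 6x + 8x^2 + 2x^3
```

The coefficient of x counts the ways to traverse the whole digraph as a single circuit, which is the 6 reconstructions quoted above.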
Different Possible Cycles
[Figure: drawings of the digraph on vertices A, B, C, D]
ABCDCABDA, ABCABDCDA, ABCDCABDA, ABCDABDCA, ABCABDCDA, ABCDABDCA
What does it mean?
• The term 6x gives 6 ways to reconstruct the original Eulerian graph.
• We can find this by counting, but with bigger graphs it would take much longer to find all the different circuits (and we might miss one)
• Useful in determining the probability of getting the right sequence of DNA
• Problem above: probability = 1/6
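For a graph this small, the counting can also be done by brute force. The sketch below is an illustration, not the method from the slides: it rebuilds the edges from the circuit ABCDCABDA, treats the two parallel A -> B edges as distinct, and fixes the first edge so that rotations of the same circuit are not counted again. The function name is made up.

```python
def count_eulerian_circuits(circuit):
    """Count Eulerian circuits of the digraph traced by `circuit` by exhaustive search."""
    edges = list(zip(circuit, circuit[1:]))
    start = circuit[0]

    def extend(vertex, used):
        if len(used) == len(edges):
            return 1 if vertex == start else 0
        total = 0
        for i, (u, v) in enumerate(edges):
            if i not in used and u == vertex:
                total += extend(v, used | {i})
        return total

    # Edge 0 is always taken first, so each circuit is counted once.
    return extend(edges[0][1], frozenset({0}))


n = count_eulerian_circuits("ABCDCABDA")
print(n)      # 6
print(1 / n)  # chance of picking the correct reconstruction at random
```

Because parallel edges are treated as distinct, each vertex sequence can arise from two different edge orderings, which may be why the list of circuits on the previous slide shows three sequences twice each. The search visits every partial path, so it is only practical for small graphs.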
Problem for the Class
• Find the interlace polynomial, using these graphs:
[Figures: two graphs on vertices A, B, C, D]
• What is the sequence we are using? How can you tell?
• How many reconstructions are possible?
Solution
[Figure: graphs on vertices A, B, C, D]
Solution cont.
[Figure: graphs on vertices A, B, C, D]
Solution cont.
[Figure: sum of graphs on vertices A, B, C, D]
Similar to the previous example, the interlace polynomial can be equated to the circuit partition polynomial, implying that there are 6 ways to reconstruct this snippet of DNA, so the probability that you have the correct one is 1/6.
The 6 circuits
[Figure: digraph on vertices A, B, C, D]
ABCDACBDA, ABCDACBDA, ABDACBCDA, ABCBDACDA, ABDACBCDA, ABCBDACDA