Purine-Pyrimidine Patterns in the Genetic Code and in Restriction Enzyme Recognition Sequences
Swetlana Nikolajewa, Andreas Beyer, Maik Friedel, Jens Hollunder, Thomas Wilhelm
Institute of Molecular Biotechnology, Jena, Germany
Overview: Purine-Pyrimidine Patterns
• Part 1: New Classification Scheme of the Genetic Code
• Part 2: Type II Restriction Enzyme Binding Sites
Overview: Genetic Code
Part 1. The purine-pyrimidine scheme of the genetic code shows
• amino acid patterns and regularities of codons
• symmetry characteristics
• possible predecessors of our contemporary quaternary triplet code
• an explanation for the number (22) of tRNA genes in the mammalian mitochondrial genome
PuRines (R) vs. PYrimidines (Y)
• Purines: G, A
• Pyrimidines: C, T
Purine pairs with Pyrimidine
• G-C: 3 H bonds
• A-T: 2 H bonds
The Common Genetic Code Table
• triplets of the nucleobases A, G, C, U code for 20 amino acids (AAs)
• 64 possible codons (4×4×4 = 4³)
• 3 termination codons: UGA, UAG, UAA
• the Met codon (AUG) is also the start codon
The Common Genetic Code Table contains 64 fields…
Purine-Pyrimidine Classification Scheme of the Genetic Code
• C and G bind via 3 hydrogen bonds in complementary base pairing
• A and U bind via 2 hydrogen bonds in complementary base pairing
• binary representation of nucleobases: purines A, G → 1; pyrimidines C, U → 0
• 2³ = 8 different binary triplets 000, 001, …, 111; each of these again has 8 possibilities, for instance:
• 000 stands for three pyrimidines: CCC, CCU, UUC, …, UUU
• 111 stands for three purines: GGG, GGA, GAA, …, AAA
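The binary coding described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; the helper name `binary_pattern` and the dictionary `classes` are assumptions introduced here.

```python
from itertools import product

# Binary coding from the slide: purines (A, G) -> 1, pyrimidines (C, U) -> 0.
PURINES = {"A", "G"}

def binary_pattern(codon):
    """Return the purine/pyrimidine pattern of an RNA codon, e.g. 'GAU' -> '110'."""
    return "".join("1" if base in PURINES else "0" for base in codon)

# Group all 4^3 = 64 codons by their binary pattern.
classes = {}
for bases in product("ACGU", repeat=3):
    codon = "".join(bases)
    classes.setdefault(binary_pattern(codon), []).append(codon)

# Print each of the 2^3 = 8 patterns with the codons it stands for.
for pattern in sorted(classes):
    print(pattern, len(classes[pattern]), " ".join(classes[pattern]))
```

Each of the 8 patterns collects exactly 8 codons, which matches the "8 possibilities" stated on the slide.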
Purine-Pyrimidine Table of the Genetic Code
Codon | Strong codons (6 H bonds) | Mixed codons (5 H bonds) | Mixed codons (5 H bonds) | Weak codons (4 H bonds)
000 | Pro CC(C/U) Proline | Ser UC(C/U) Serine | Leu CU(C/U) Leucine | Phe UU(C/U) Phenylalanine
001 | Pro CC(A/G) Proline | Ser UC(A/G) Serine | Leu CU(A/G) Leucine | Leu UU(A/G) Leucine
100 | Ala GC(C/U) Alanine | Thr AC(C/U) Threonine | Val GU(C/U) Valine | Ile AU(C/U) Isoleucine
101 | Ala GC(A/G) Alanine | Thr AC(A/G) Threonine | Val GU(A/G) Valine | Ile/Met AU(A/G) Isoleucine/Methionine
010 | Arg CG(C/U) Arginine | Cys UG(C/U) Cysteine | His CA(C/U) Histidine | Tyr UA(C/U) Tyrosine
011 | Arg CG(A/G) Arginine | Stop/Trp UG(A/G) Tryptophan | Gln CA(A/G) Glutamine | Stop UA(A/G)
110 | Gly GG(C/U) Glycine | Ser AG(C/U) Serine | Asp GA(C/U) Aspartic acid | Asn AA(C/U) Asparagine
111 | Gly GG(A/G) Glycine | Arg AG(A/G) Arginine | Glu GA(A/G) Glutamic acid | Lys AA(A/G) Lysine
…the new scheme contains the same information in only 32 fields.
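One way to see why 32 fields suffice is to collapse the third codon position to purine (R) or pyrimidine (Y) and check which fields still hold a single meaning. The sketch below assumes the standard genetic code written in the usual UCAG ordering with `*` for stop; the names `field`, `doublet_strength`, `fields`, and `ambiguous` are introduced here for illustration only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual UCAG x UCAG x UCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

H_BONDS = {"G": 3, "C": 3, "A": 2, "U": 2}  # H bonds in a Watson-Crick pair
PURINES = {"A", "G"}

def field(codon):
    """Map a codon to its field: (first two bases, 'R' or 'Y' for the third)."""
    return codon[:2], "R" if codon[2] in PURINES else "Y"

def doublet_strength(doublet):
    """H bonds of the first two bases: 6 = strong, 5 = mixed, 4 = weak."""
    return H_BONDS[doublet[0]] + H_BONDS[doublet[1]]

# Collect the set of meanings (amino acids or stop) found in each field.
fields = {}
for codon, aa in CODE.items():
    fields.setdefault(field(codon), set()).add(aa)

ambiguous = {f: aas for f, aas in fields.items() if len(aas) > 1}
print(len(fields), "fields; fields with two meanings:", ambiguous)
```

In the standard code only the AU(A/G) and UG(A/G) fields carry two meanings, which is why the table lists them as Ile/Met and Stop/Trp.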
Amino Acid Patterns: Polar Requirement of NCN and NUN Codons
[Table: the purine-pyrimidine table of the genetic code (rows 000 to 111; columns strong, mixed, mixed, weak), repeated on this slide.]
C. R. Woese, G. J. Olsen, M. Ibba, D. Söll. Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process. MMBR 64 (2000), 202-236
Amino Acid Patterns: Hydrophobicity
[Table: the purine-pyrimidine table of the genetic code, repeated on this slide.]
Kyte & Doolittle, 1982; http://biology-pages.info
Codon-Anticodon Symmetry
[Table: the purine-pyrimidine table of the genetic code, repeated on this slide.]
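A plausible reading of the codon-anticodon symmetry is that base-wise complementation inverts the binary pattern (000 ↔ 111, 001 ↔ 110, and so on) while keeping the H-bond class of the first two bases. The sketch below rests on that assumption and ignores wobble pairing; the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}
H_BONDS = {"G": 3, "C": 3, "A": 2, "U": 2}

def pattern(triplet):
    """Binary purine/pyrimidine pattern (purine = 1, pyrimidine = 0)."""
    return "".join("1" if b in PURINES else "0" for b in triplet)

def paired_triplet(codon):
    """Base-wise complement: the anticodon written 3'->5' under the codon."""
    return "".join(COMPLEMENT[b] for b in codon)

def anticodon(codon):
    """Anticodon in the conventional 5'->3' direction (reverse complement)."""
    return paired_triplet(codon)[::-1]

def strength(triplet):
    """H bonds contributed by the first two bases (6, 5 or 4)."""
    return H_BONDS[triplet[0]] + H_BONDS[triplet[1]]

for codon in ("CCU", "GAU", "AUG"):
    partner = paired_triplet(codon)
    print(codon, pattern(codon), strength(codon), "<->",
          partner, pattern(partner), strength(partner),
          "| anticodon 5'->3':", anticodon(codon))
```

Under this reading every codon and its pairing partner fall in the same strength column, and their rows are bitwise inverses of each other.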
Point Symmetry
[Table: the purine-pyrimidine table of the genetic code, repeated on this slide.]
D. Halitsky. Extending the (Hexa-)Rhombic Dodecahedral Model of the Genetic Code: the Code's Four 6-fold Degeneracies and the Ten Orthogonal Projections of the 5-cube as 3-cube. Computer Systems Technology 2004
Codon-Reverse Codon (XYZ↔ZYX) Symmetry
[Table: the purine-pyrimidine table of the genetic code, repeated on this slide.]
Codon-Reverse Codon (XYZ↔ZYX) Symmetry
[Figure: example pair GAU (Asp) ↔ UAG (Stop), shown with the triplets AUC and CUA.]
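The XYZ ↔ ZYX mapping can be illustrated with a short sketch. It assumes the standard genetic code in UCAG ordering and uses hypothetical helper names; it only demonstrates the mapping itself, not the authors' analysis.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual UCAG x UCAG x UCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def pattern(triplet):
    """Binary purine/pyrimidine pattern (purine = 1, pyrimidine = 0)."""
    return "".join("1" if b in PURINES else "0" for b in triplet)

def reverse_codon(codon):
    """Swap the first and third base: XYZ -> ZYX."""
    return codon[::-1]

# The example from the slide: GAU (Asp) maps to UAG (stop).
codon = "GAU"
rev = reverse_codon(codon)
print(codon, CODE[codon], pattern(codon), "<->", rev, CODE[rev], pattern(rev))

# Codons of the form XYX are their own reverse codon.
self_reverse = [c for c in CODE if c == reverse_codon(c)]
print(len(self_reverse), "codons map to themselves")
```

Reversal keeps the middle base fixed and mirrors the binary pattern, so 110 becomes 011 in the slide's example.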
Evolution of the Genetic Code
• our contemporary code is the quaternary triplet code: 4³ = 64 fields (CGU, UAC, …)
• binary doublet code: 4¹ = 4 fields
• quaternary doublet code: 4² = 16 fields
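The field counts on this slide follow from counting words of length n over an alphabet of a symbols. The symbol N(a, n) below is notation introduced here for illustration, and reading the binary doublet code as two binary positions is an assumption.

```latex
\[
N(a, n) = a^{n}
\]
\[
N(2, 2) = 2^{2} = 4^{1} = 4, \qquad
N(4, 2) = 4^{2} = 16, \qquad
N(4, 3) = 4^{3} = 64
\]
\[
\underbrace{4^{2}}_{\text{first two bases}} \times \underbrace{2}_{\text{third base: purine or pyrimidine}} = 32
\]
```

The last line gives the 32 fields of the purine-pyrimidine table, which sits between the quaternary doublet code (16 fields) and the full triplet code (64 fields).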
Evolution: Scenario 1
[Table: the purine-pyrimidine table of the genetic code, repeated on this slide.]
Evolution: Scenario 2
[Table: the purine-pyrimidine table of the genetic code, repeated on this slide.]
Mitochondrial genomes have several surprising features
• the genetic code of mitochondria
• only 22 tRNAs are required for mammalian mitochondrial protein synthesis: why?
The Mammalian Mitochondrial Genetic Code
Codon | Strong (6 H bonds) | Mixed (5 H bonds) | Mixed (5 H bonds) | Weak (4 H bonds)
000 | Pro CC(C/U) | Ser UC(C/U) | Leu CU(C/U) | Phe UU(C/U)
001 | Pro CC(A/G) | Ser UC(A/G) | Leu CU(A/G) | Leu UU(A/G)
100 | Ala GC(C/U) | Thr AC(C/U) | Val GU(C/U) | Ile AU(C/U)
101 | Ala GC(A/G) | Thr AC(A/G) | Val GU(A/G) | Met AU(A/G)
010 | Arg CG(C/U) | Cys UG(C/U) | His CA(C/U) | Tyr UA(C/U)
011 | Arg CG(A/G) | Trp UG(A/G) | Gln CA(A/G) | Stop UA(A/G)
110 | Gly GG(C/U) | Ser AG(C/U) | Asp GA(C/U) | Asn AA(C/U)
111 | Gly GG(A/G) | Stop AG(A/G) | Glu GA(A/G) | Lys AA(A/G)
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
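The mitochondrial table differs from the standard one in three fields: AU(A/G) is Met, UG(A/G) is Trp, and AG(A/G) is Stop. A small sketch, assuming the standard code in UCAG ordering and applying these reassignments as listed in the table, checks that every one of the 32 fields then has a single meaning. The names `MITO_CHANGES` and `field_meanings` are introduced here for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual UCAG x UCAG x UCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reassignments taken from the mitochondrial table on the slide.
MITO_CHANGES = {"AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"}
MITO = {**STANDARD, **MITO_CHANGES}

PURINES = {"A", "G"}

def field_meanings(code):
    """Group a codon table into (first two bases, R/Y third base) fields."""
    fields = {}
    for codon, aa in code.items():
        key = (codon[:2], "R" if codon[2] in PURINES else "Y")
        fields.setdefault(key, set()).add(aa)
    return fields

for name, code in (("standard", STANDARD), ("mitochondrial", MITO)):
    fields = field_meanings(code)
    mixed = sorted(k for k, v in fields.items() if len(v) > 1)
    print(name, "-", len(fields), "fields, two-meaning fields:", mixed)
```

With the reassignments applied, no field is split, so the purine or pyrimidine state of the third base is enough to determine the meaning of every codon.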
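The Mammalian Mitochondrial Code
8 tRNAs for family codons + 14 tRNAs for non-family codons = 22
Entries without a third base are family codons: one tRNA serves both rows of the pair.
Codon | Strong (6 H bonds) | Mixed (5 H bonds) | Mixed (5 H bonds) | Weak (4 H bonds)
000 | tRNA-Pro CC | tRNA-Ser1 UC | tRNA-Leu1 CU | tRNA-Phe UU(C/U)
001 | tRNA-Pro CC | tRNA-Ser1 UC | tRNA-Leu1 CU | tRNA-Leu2 UU(A/G)
100 | tRNA-Ala GC | tRNA-Thr AC | tRNA-Val GU | tRNA-Ile AU(C/U)
101 | tRNA-Ala GC | tRNA-Thr AC | tRNA-Val GU | tRNA-Met AU(A/G)
010 | tRNA-Arg CG | tRNA-Cys UG(C/U) | tRNA-His CA(C/U) | tRNA-Tyr UA(C/U)
011 | tRNA-Arg CG | tRNA-Trp UG(A/G) | tRNA-Gln CA(A/G) | STOP UA(A/G)
110 | tRNA-Gly GG | tRNA-Ser2 AG(C/U) | tRNA-Asp GA(C/U) | tRNA-Asn AA(C/U)
111 | tRNA-Gly GG | STOP AG(A/G) | tRNA-Glu GA(A/G) | tRNA-Lys AA(A/G)
http://mamit-trna.u-strasbg.fr/2DStructures.html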
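The count 8 + 14 = 22 can be reproduced by a simple tally: one tRNA per family box (all four codons with the same first two bases share one amino acid) and one tRNA per sense half-box otherwise. The sketch below assumes the vertebrate mitochondrial code in UCAG ordering and does not model anticodon wobble; the variable names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Vertebrate mitochondrial code in UCAG x UCAG x UCAG order ('*' = stop).
MITO_AMINO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
              "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
MITO_CODE = {"".join(c): aa
             for c, aa in zip(product(BASES, repeat=3), MITO_AMINO)}

# Collect the four meanings of each box (codons sharing the first two bases).
boxes = {}
for codon, aa in MITO_CODE.items():
    boxes.setdefault(codon[:2], []).append(aa)

# Family boxes: all four codons have the same meaning -> one tRNA each.
family = [d for d, aas in boxes.items() if len(set(aas)) == 1]

# Non-family boxes: one tRNA per half-box (third base Y or R), skipping stops.
HALVES = {"Y": "UC", "R": "AG"}
non_family = []
for doublet in boxes:
    if doublet in family:
        continue
    for half, thirds in HALVES.items():
        meanings = {MITO_CODE[doublet + b] for b in thirds}
        if meanings != {"*"}:
            non_family.append((doublet, half, meanings))

print(len(family), "family tRNAs +", len(non_family), "non-family tRNAs =",
      len(family) + len(non_family))
```

The two stop half-boxes, UA(A/G) and AG(A/G), need no tRNA, which is what reduces the 16 non-family half-boxes to 14.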
Part 2. Common Patterns in Type II Restriction Enzyme Binding Sites
Restriction Enzyme (Endonuclease)
Restriction enzymes
• recognize short specific DNA sequences
• enable bacteria to destroy foreign DNA
• are useful tools in biotechnology
[Figure: example recognition sequence GAATTC.]
• The best-studied class of REs is type II, which cleaves DNA within its recognition sequence
• Many recognition sequences are palindromic
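In this context "palindromic" means that a sequence equals its own reverse complement. A minimal sketch, using the GAATTC site from the slide and hypothetical helper names:

```python
# Translation table for DNA base complements: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(seq):
    """A site is palindromic if it reads the same on both strands."""
    return seq == reverse_complement(seq)

print("GAATTC", reverse_complement("GAATTC"), is_palindromic("GAATTC"))
print("GAATTA", reverse_complement("GAATTA"), is_palindromic("GAATTA"))
```

The second sequence, GAATTA, is a made-up counterexample included only to show a non-palindromic case.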
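Are REases similar in their binding sites?
[Figure: example recognition sites in binary purine/pyrimidine coding with the cut position marked by ↓: 11↓00 (two examples) and 1↓11 000 (three examples).]
Examples from Kimball's Biology Pages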
How significant is the Pattern RR/YY (11/00)?
• Type II: 3726 (symmetrical recognition sequences: 98%; asymmetrical recognition sequences: 2%)
• Frequencies of dinucleotides, trinucleotides and tetranucleotides, coded in three possible coding schemes:
• R vs Y (G, A vs C, T)
• K vs M (G, T vs C, A)
• S vs W (G, C vs A, T)
• In the symmetrical set the most significant dinucleotides are RR (or 11) (p-value < 10⁻⁶³) and YY (or 00) (p-value < 10⁻²⁹)
• In the asymmetric set RRR, YYY and YYYY are even more significant, but RR and YY also stand out.
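The three two-letter coding schemes and the k-mer counting step can be sketched as follows. The input list is illustrative only: GAATTC comes from the slides, the other two are well-known palindromic sites added here as an assumption, and none of this reproduces the 3726-sequence data set or the reported p-values. The function names are hypothetical.

```python
from collections import Counter

# The three binary coding schemes named on the slide.
SCHEMES = {
    "R/Y": {"G": "R", "A": "R", "C": "Y", "T": "Y"},  # purine vs pyrimidine
    "K/M": {"G": "K", "T": "K", "C": "M", "A": "M"},  # keto vs amino
    "S/W": {"G": "S", "C": "S", "A": "W", "T": "W"},  # strong vs weak pairing
}

def encode(seq, scheme):
    """Recode a DNA sequence in one of the two-letter schemes."""
    return "".join(SCHEMES[scheme][base] for base in seq)

def kmer_counts(seqs, scheme, k):
    """Count overlapping k-mers (k = 2, 3, 4, ...) in the recoded sequences."""
    counts = Counter()
    for seq in seqs:
        coded = encode(seq, scheme)
        counts.update(coded[i:i + k] for i in range(len(coded) - k + 1))
    return counts

# Illustrative input, NOT the data set analysed on the slide.
sites = ["GAATTC", "GGATCC", "AAGCTT"]

for scheme in SCHEMES:
    print(scheme, [encode(s, scheme) for s in sites],
          dict(kmer_counts(sites, scheme, 2)))
```

Judging significance would additionally need a null model for the expected k-mer frequencies, which the slide does not specify and which is therefore left out here.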
Why is the Motif RR..YY preferred?
Dinucleotides RR..YY are characterized by:
• specific geometrical properties
• minimal slide values
• strong tilt in the negative direction
• positive roll
• low stacking energy
• stronger H-bond donor and acceptor clusters
Figure 1. Example of an interaction between an H-bond donor cluster (resulting from two adjacent purines AA) and an H-bond acceptor.
Outlook
• Looking for binary patterns in genomes
• Additional information: http://www.imb-jena.de/tsb
Thank you for your attention!