
Evolution of regulatory interactions in bacteria

Mikhail Gelfand, Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS, Moscow, Russia. Singapore, 17-18 July 2006.


Presentation Transcript


  1. Evolution of regulatory interactions in bacteria. Mikhail Gelfand, Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS, Moscow, Russia. Singapore, 17-18 July 2006

  2. Comparative genomics of regulation
     • Why
       • Functional annotation of genes
       • Metabolic modeling
       • Practical applications in genetic engineering, drug targeting etc.
     • How
       • Close genomes: phylogenetic footprinting. Regulatory sites are seen as conservation islands in alignments of gene upstream regions
       • Distant genomes: consistency filtering. Candidate sites in one genome may be unreliable, but independent occurrence upstream of orthologous genes in many genomes yields reliable predictions
     • Caveats
       • Presence of (predicted) binding sites does not immediately imply functional regulation
       • Operon structure
       • Need to verify presence of orthologous transcription factors in the studied genomes
       • Orthologous factors may have different binding motifs
       • One functional system may be regulated by different factors within and between genomes
     • Many genomes
       • Taxon-specific regulation
       • Evolution
         • individual sites
         • transcription-factor families
         • transcription factors and their binding motifs
         • simple and complex regulatory systems
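The consistency-filtering idea for distant genomes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `consistency_filter`, the `min_genomes` threshold, and all genome and gene identifiers are hypothetical and are not taken from the talk. It also ignores the operon-structure caveat listed on the slide.

```python
from collections import defaultdict


def consistency_filter(hits_by_genome, ortholog_group, min_genomes=3):
    """Keep ortholog groups that have candidate sites in at least `min_genomes` genomes."""
    support = defaultdict(set)
    for genome, genes in hits_by_genome.items():
        for gene in genes:
            group = ortholog_group.get(gene)
            if group is not None:  # genes without known orthologs cannot be checked
                support[group].add(genome)
    return {
        group: sorted(genomes)
        for group, genomes in support.items()
        if len(genomes) >= min_genomes
    }


# Toy data (hypothetical): genes with a candidate site upstream, per genome.
hits_by_genome = {
    "genomeA": ["a_bioB", "a_xyz1"],
    "genomeB": ["b_bioB", "b_abc7"],
    "genomeC": ["c_bioB"],
    "genomeD": ["d_xyz1"],
}
# Hypothetical mapping from gene to ortholog group.
ortholog_group = {
    "a_bioB": "bioB", "b_bioB": "bioB", "c_bioB": "bioB",
    "a_xyz1": "xyz1", "d_xyz1": "xyz1",
    "b_abc7": "abc7",
}

reliable = consistency_filter(hits_by_genome, ortholog_group, min_genomes=3)
print(reliable)
```

In this toy run only the group supported by three independent genomes survives; isolated hits are treated as unreliable, which is the point of the filter.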

  3. How it works: Two simple examples • Biotin regulator of alpha-proteobacteria • Universal regulator of ribonucleotide reductases: reconstruction of the regulatory system and the mechanism of regulation

  4. BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing • Profile 1: Gram-positive bacteria, Archaea • Profile 2: Gram-negative bacteria

  5. BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing • Profile 1: Gram-positive bacteria, Archaea • Profile 2: Gram-negative bacteria • BirA of alpha-proteobacteria: no DNA-binding domain

  6. Identification of the candidate regulator (BioR) in alpha-proteobacteria
     • Candidate binding sites: similar palindromes upstream of biotin biosynthesis and transport genes in different genomes
     • Sites: TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATCTATAA TTATAGATAg TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA TTATCTATAA TTATCTATtA TTATCTAcAA TTATCTATAA TTATCTATAA TTATCTATAA TcATAGATtA cTATAGATAA TTATCTAcAA
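A short Python sketch can make the "similar palindromes" observation concrete. It takes the 19 candidate sites listed on the slide, builds a majority-vote consensus, and measures how far each site is from that consensus on its better-matching strand. The helper names (`reverse_complement`, `consensus`, `best_strand_distance`) are hypothetical, and the majority-vote consensus is a simplification of a real positional weight matrix.

```python
from collections import Counter

# Candidate sites as listed on the slide (lowercase marks deviating positions).
SITES = (
    "TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATCTATAA TTATAGATAg "
    "TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA TTATCTATAA TTATCTATtA "
    "TTATCTAcAA TTATCTATAA TTATCTATAA TTATCTATAA TcATAGATtA cTATAGATAA "
    "TTATCTAcAA"
).split()

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case-insensitive)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus(sites):
    """Majority-vote consensus over aligned sites of equal length."""
    columns = zip(*(s.upper() for s in sites))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def hamming(a, b):
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def best_strand_distance(site, ref):
    """Mismatches to `ref`, taking whichever strand of the site fits better."""
    return min(hamming(site, ref), hamming(reverse_complement(site), ref))


cons = consensus(SITES)
distances = [best_strand_distance(s, cons) for s in SITES]
print(cons, reverse_complement(cons), distances)
```

The two frequent variants in the list are reverse complements of each other, so once strand orientation is taken into account most sites sit within one or two mismatches of a single consensus.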

  7. Positional clustering: a candidate transcription factor from the GntR family is often found in the same loci (black arrows) • Phyletic patterns: the phyletic distribution of candidate sites (red circles) exactly coincides with the phyletic distribution of the candidate regulator • Autoregulation: in many cases there are candidate sites upstream of the bioR gene itself

  8. Conserved signal upstream of nrd genes

  9. Identification of the candidate regulator by the analysis of phyletic patterns
     • COG1327: the only COG with exactly the same phylogenetic pattern as the signal
       • “large scale”: on the level of major taxa
       • “small scale”: within major taxa
         • absent in small parasites among alpha- and gamma-proteobacteria
         • absent in Desulfovibrio spp. among delta-proteobacteria
         • absent in Nostoc sp. among cyanobacteria
         • absent in Oenococcus and Leuconostoc among Firmicutes
         • present only in Treponema denticola among four spirochetes
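The phyletic-pattern search described here amounts to comparing presence/absence sets. The sketch below assumes each pattern is available as a set of genome names; the function name `cogs_matching_signal`, the genome names and the COG labels are all hypothetical toy values, not data from the talk.

```python
def cogs_matching_signal(signal_genomes, cog_genomes):
    """Return COGs whose set of genomes is exactly the set where the signal occurs."""
    target = frozenset(signal_genomes)
    return sorted(
        cog for cog, genomes in cog_genomes.items() if frozenset(genomes) == target
    )


# Toy data (hypothetical): genomes in which the conserved signal was found.
signal_genomes = {"g1", "g2", "g4"}
# Hypothetical phyletic patterns: COG -> genomes that contain a member.
cog_genomes = {
    "COG_A": {"g1", "g2", "g3", "g4"},  # also present where the signal is absent
    "COG_B": {"g1", "g2", "g4"},        # exact match
    "COG_C": {"g1", "g4"},              # missing from a genome that has the signal
}

matches = cogs_matching_signal(signal_genomes, cog_genomes)
print(matches)
```

Exact set equality is the strictest criterion; with unfinished genomes or annotation errors a tolerance for a few mismatches might be needed, which this sketch does not attempt.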

  10. COG1327 “Predicted transcriptional regulator, consists of a Zn-ribbon and ATP-cone domains”: regulator of the riboflavin pathway?

  11. Additional evidence – 1 • nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA

  12. Additional evidence – 2 • In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes • dNTP salvage • topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II

  13. Multiple sites (nrd genes): FNR, DnaA, NrdR

  14. Mode of regulation
     • Repressor (overlaps with promoters)
     • Co-operative binding:
       • most sites occur in tandem (> 90% of cases)
       • the distance between the copies (centers of palindromes) equals an integer number of DNA turns:
         • mainly (94%) 30-33 bp, in 84% 31-32 bp (3 turns)
         • 21 bp (2 turns) in Vibrio spp.
         • 41-42 bp (4 turns) in some Firmicutes
       • experimental confirmation in Streptomyces (Borovok et al., 2004)
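The "integer number of DNA turns" statement can be checked with simple arithmetic. The sketch below assumes a helical repeat of 10.5 bp per turn for B-DNA; that value and the function name `helical_phase` are assumptions of this example, not figures quoted on the slide.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA, in base pairs per turn


def helical_phase(distance_bp, bp_per_turn=BP_PER_TURN):
    """Return (nearest whole number of turns, signed offset from it in turns)."""
    turns = distance_bp / bp_per_turn
    nearest = round(turns)
    return nearest, turns - nearest


# Distances between the centers of tandem sites quoted on the slide.
for distance in (21, 30, 31, 32, 33, 41, 42):
    nearest, offset = helical_phase(distance)
    print(f"{distance} bp: {nearest} turns, offset {offset:+.2f}")
```

Under this assumption every quoted distance lies within about 0.15 of a turn from a whole number, so the two sites face the same side of the helix, which is consistent with co-operative binding.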

  15. Evolutionary processes that shape regulatory systems • Expansion and contraction of regulons • Duplications of regulators with or without regulated loci • Loss of regulators with or without regulated loci • Re-assortment of regulators and structural genes • … especially in complex systems • Horizontal transfer

  16. Loss of regulators, and cryptic sites • Loss of RbsR in Y. pestis (the ABC transporter is also lost) • Figure labels: RbsR binding site; start codon of rbsD

  17. Regulon expansion: how FruR has become CRA • Taxonomic level shown: Gamma-proteobacteria • Sugars in the figure: mannose, glucose, mannitol, fructose • Genes in the figure: ptsHI-crr, manXYZ, edd, epd, eda, adhE, aceEF, icdA, pykF, ppsA, mtlD, mtlA, pgk, gpmA, pckA, gapA, fbp, pfkA, aceA, tpiA, fruBA, fruK, aceB

  18. Common ancestor of Enterobacteriales • Same pathway map as on slide 17 • Taxonomic levels shown: Gamma-proteobacteria, Enterobacteriales

  19. Common ancestor of Escherichia and Salmonella • Same pathway map as on slide 17 • Taxonomic levels shown: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.

  20. Trehalose/maltose catabolism, alpha-proteobacteria • Duplicated LacI-family regulators: lineage-specific post-duplication loss

  21. The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)

  22. Utilization of an unknown galactoside, gamma-proteobacteria • Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X • Erwinia: one regulon, GalR • Loss of regulator and merger of regulons: it seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)

  23. Utilization of maltose/maltodextrin, Firmicutes • Two different ABC transporters (shades of red) • PTS (pink) • Glucoside hydrolases (shades of green) • Two regulators (black and grey)

  24. Modularity of the functional subsystem • Two different ABC systems • Three hydrolases in one operon (E. faecalis) or separately

  25. Changes of regulation • Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?), blue sites

  26. Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)

  27. Catabolism of gluconate, proteobacteria

  28. Extreme variability of regulation of “marginal” regulon members • Figure labels: β, γ, Pseudomonas spp.

  29. Combined regulatory network for iron homeostasis genes in α-proteobacteria
     • Regulators: Irr, RirA, IscR, Fur
     • Signals and states in the figure: [+Fe], [-Fe], heme, FeS, FeS status of cell, degraded
     • Regulated functions: siderophore uptake, Fe2+/Fe3+ uptake, iron uptake systems, iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor], transcription factors
     • The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by a dead end or an arrow end of the line.

  30. Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
     [Table: presence (+) or absence (-) of the Fe and Mn regulons (Fur/Mur, MntR, Irr, RirA, IscR) per genome; columns: Group, Organism, Abbreviation, one column per regulon]
     • Group A (Rhizobiaceae / Rhizobiales): Sinorhizobium meliloti (SM), Rhizobium leguminosarum (RL), Rhizobium etli (RHE), Agrobacterium tumefaciens (AGR), Mesorhizobium loti (ML), Mesorhizobium sp. BNC1 (MBNC), Brucella melitensis (BME), Bartonella quintana and spp. (BQ)
     • Group B (Bradyrhizobiaceae): Bradyrhizobium japonicum (BJ), Rhodopseudomonas palustris (RPA), Nitrobacter hamburgensis (Nham), Nitrobacter winogradskyi (Nwi)
     • Group C (Rhodobacterales, Rhodobacteraceae): Rhodobacter capsulatus (RC), Rhodobacter sphaeroides (Rsph), Silicibacter sp. TM1040 (STM), Silicibacter pomeroyi (SPO), Jannaschia sp. CC51 (Jann), Rhodobacterales bacterium HTCC2654 (RB2654), Roseobacter sp. MED193 (MED193), Roseovarius nubinhibens ISM (ISM), Roseovarius sp. 217 (ROS217), Loktanella vestfoldensis SKA53 (SKA53), Sulfitobacter sp. EE-36 (EE36), Oceanicola batsensis HTCC2597 (OB2597)
     • Group D: Hyphomonadaceae: Oceanicaulis alexandrii HTCC2633 (OA2633); Caulobacterales: Caulobacter crescentus (CC); Parvularculales: Parvularcula bermudensis HTCC2503 (PB2503); Sphingomonadales: Erythrobacter litoralis (ELI), Novosphingobium aromaticivorans (Saro), Sphingopyxis alaskensis RB2256 (Sala), Zymomonas mobilis (ZM); Rhodospirillales: Gluconobacter oxydans (GOX), Rhodospirillum rubrum (Rrub), Magnetospirillum magneticum (Amb)
     • SAR11 cluster: Pelagibacter ubique HTCC1002 (PU1002)
     • Rickettsiales: Rickettsia and Ehrlichia species
     • “#?” in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.

  31. Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I
     • Clades: Fur in γ- and β-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes; Mur in α-proteobacteria (regulator of manganese uptake genes sit, mntH); Fur in α-proteobacteria (regulator of iron uptake and metabolism genes); Irr in α-proteobacteria
     • Reference Fur proteins: Escherichia coli (sp|P0A9A9, ECOLI), Pseudomonas aeruginosa (sp|Q03456, PSEAE), Neisseria meningitidis (sp|P0A0S7, NEIMA), Helicobacter pylori (sp|O25671, HELPY), Bacillus subtilis (sp|P54574, BACSU)
     • α-proteobacterial leaves (in the order shown): SM mur (Sinorhizobium meliloti), MBNC03003179 (Mesorhizobium sp. BNC1, I), BQ fur2 (Bartonella quintana), BMEI0375 (Brucella melitensis), EE36 12413 (Sulfitobacter sp. EE-36), MBNC03003593 (Mesorhizobium sp. BNC1, II), RB2654 19538 (Rhodobacterales bacterium HTCC2654), AGR C 620 (Agrobacterium tumefaciens), RHE_CH00378 (Rhizobium etli), RL mur (Rhizobium leguminosarum), Nham 0990 (Nitrobacter hamburgensis X14), Nwi 0013 (Nitrobacter winogradskyi), RPA0450 (Rhodopseudomonas palustris), BJ fur (Bradyrhizobium japonicum), ROS217 18337 (Roseovarius sp. 217), Jann 1799 (Jannaschia sp. CC51), SPO2477 (Silicibacter pomeroyi), STM1w01000993 (Silicibacter sp. TM1040), MED193 22541 (Roseobacter sp. MED193), OB2597 02997 (Oceanicola batsensis HTCC2597), SKA53 03101 (Loktanella vestfoldensis SKA53), Rsph03000505 (Rhodobacter sphaeroides), ISM 15430 (Roseovarius nubinhibens ISM), PU1002 04436 (Pelagibacter ubique HTCC1002), GOX0771 (Gluconobacter oxydans), ZM01411 (Zymomonas mobilis), Saro02001148 (Novosphingobium aromaticivorans), Sala 1452 (Sphingopyxis alaskensis RB2256), ELI1325 (Erythrobacter litoralis), OA2633 10204 (Oceanicaulis alexandrii HTCC2633), PB2503 04877 (Parvularcula bermudensis HTCC2503), CC0057 (Caulobacter crescentus), Rrub02001143 (Rhodospirillum rubrum), Amb1009 (Magnetospirillum magneticum, I), Amb4460 (Magnetospirillum magneticum, II)

  32. Sequence logos for Fur- and Mur-binding sites
     • Identified Fur-binding sites in the D group of α-proteobacteria: Erythrobacter litoralis, Caulobacter crescentus, Novosphingobium aromaticivorans, Zymomonas mobilis, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Rhodospirillum rubrum, Gluconobacter oxydans, Parvularcula bermudensis, Magnetospirillum magneticum
     • Identified Mur-binding sites of α-proteobacteria: the A, B, and C groups
     • Known Fur-binding sites in Escherichia coli and Bacillus subtilis

  33. Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II
     • Clades: Fur in γ- and β-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes; Mur / Fur in α-proteobacteria; Irr in α-proteobacteria (regulator of iron homeostasis)
     • Reference Fur proteins: Escherichia coli (sp|P0A9A9, ECOLI), Pseudomonas aeruginosa (sp|Q03456, PSEAE), Neisseria meningitidis (sp|P0A0S7, NEIMA), Helicobacter pylori (sp|O25671, HELPY), Bacillus subtilis (sp|P54574, BACSU)
     • Irr leaves (in the order shown): AGR C 249 (Agrobacterium tumefaciens), SM irr (Sinorhizobium meliloti), RHE CH00106 (Rhizobium etli), RL irr1 (Rhizobium leguminosarum, I), RL irr2 (Rhizobium leguminosarum, II), MLr5570 (Mesorhizobium loti), MBNC03003186 (Mesorhizobium sp. BNC1), BQ fur1 (Bartonella quintana), BMEI1955 (Brucella melitensis, I), BMEI1563 (Brucella melitensis, II), BJ blr1216 (Bradyrhizobium japonicum, II), RB2654 182 (Rhodobacterales bacterium HTCC2654), SKA53 01126 (Loktanella vestfoldensis SKA53), ROS217 15500 (Roseovarius sp. 217), ISM 00785 (Roseovarius nubinhibens ISM), OB2597 14726 (Oceanicola batsensis HTCC2597), Jann 1652 (Jannaschia sp. CC51), Rsph03001693 (Rhodobacter sphaeroides), EE36 03493 (Sulfitobacter sp. EE-36), STM1w01001534 (Silicibacter sp. TM1040), MED193 17849 (Roseobacter sp. MED193), SPOA0445 (Silicibacter pomeroyi), RC irr (Rhodobacter capsulatus), RPA2339 (Rhodopseudomonas palustris, I), RPA0424* (Rhodopseudomonas palustris, II), BJ irr* (Bradyrhizobium japonicum, I), Nwi 0035* (Nitrobacter winogradskyi), Nham 1013* (Nitrobacter hamburgensis X14), PU1002 04361 (Pelagibacter ubique HTCC1002)

  34. Sequence logos for the identified Irr-binding sites in α-proteobacteria • The A group (8 species): Irr • The B group (4 species): Irr • The C group (12 species): Irr

  35. Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
     • Labelled branches: NsrR, RirA, IscR, IscR-II, CymR, Rrf2
     • Characterized members: nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli); iron repressor RirA (Rhizobium leguminosarum); cysteine metabolism repressor CymR (Bacillus subtilis); cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris); iron-sulfur cluster synthesis repressor IscR (Escherichia coli)
     • Positional clustering of rrf2-like genes with: iron uptake and storage genes; Fe-S cluster synthesis operons; genes involved in nitrosative stress protection; sulfate uptake/assimilation genes; thioredoxin reductase; carboxymuconolactone decarboxylase-family genes; hmc cytochrome operon
     • Legend: proteins with the conserved C-X(6-9)-C-X(4-6)-C motif within the effector-responsive domain; proteins without a cysteine triad motif

  36. Sequence logos for the identified RirA-binding sites in α-proteobacteria • The A group: RirA (8 species) • The C group: RirA (12 species)

  37. Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria • Gene functions: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake

  38. An attempt to reconstruct the history

  39. Regulators and their signals • Subtle changes at close evolutionary distances • Cases of motif conservation at surprisingly large distances • Correlation between contacting nucleotides and amino acid residues

  40. DNA signals and protein-DNA interactions • Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance < cutoff from a protein atom) • Factors shown: CRP, PurR, IHF, TrpR
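The "entropy at aligned sites" half of this comparison can be sketched as a per-column Shannon entropy over aligned binding sites. The function name `column_entropies` and the four toy sites are hypothetical; counting protein-DNA contacts requires a structure and is outside this sketch.

```python
import math
from collections import Counter


def column_entropies(sites):
    """Shannon entropy (in bits) of each column of equal-length aligned sites."""
    sites = [s.upper() for s in sites]
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be aligned to the same length")
    entropies = []
    for column in zip(*sites):
        n = len(column)
        counts = Counter(column)
        entropies.append(
            sum(-(c / n) * math.log2(c / n) for c in counts.values())
        )
    return entropies


# Toy aligned sites (hypothetical): column 1 is invariant, column 4 is uniform.
toy_sites = ["ACGT", "ACGA", "ACTC", "AGAG"]
entropies = column_entropies(toy_sites)
print([round(h, 3) for h in entropies])
```

Low entropy marks conserved positions (0 bits for an invariant column) and 2 bits is the maximum for four equally frequent nucleotides; the slide relates this quantity to the number of contacts each base pair makes with the protein.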

  41. Specificity-determining positions in the LacI family
     • Training set: 459 sequences, average length 338 amino acids, 85 specificity groups
     • 44 SDPs:
       • 10 residues contact NPF (analog of the effector)
       • 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
       • 6 residues in the intersubunit contacts
       • 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
       • 7 residues contact the operator sequence
       • 6 residues in the operator contact zone (5 Å < dmin < 10 Å)
     • Structure shown: LacI from E. coli

  42. The LacI family: subtle changes in signals at close distances G n A CG Gn GC

  43. CRP/FNR family of regulators

  44. Correlation between contacting nucleotides and amino acid residues
     • CooA in Desulfovibrio spp. • CRP in Gamma-proteobacteria • HcpR in Desulfovibrio spp. • FNR in Gamma-proteobacteria
     • Contacting residues: REnnnR
       • TG: 1st arginine
       • GA: glutamate and 2nd arginine
     • Alignment of the DNA-binding region:
       DD COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
       DV COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
       EC CRP   KITRQEIGQIVGCSRETVGRILKMLED
       YP CRP   KXTRQEIGQIVGCSRETVGRILKMLED
       VC CRP   KITRQEIGQIVGCSRETVGRILKMLEE
       DD HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
       DV HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
       EC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
       YP FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
       VC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
     • Binding motifs (in the order shown):
       TGTCGGCnnGCCGACA
       TTGTGAnnnnnnTCACAA
       TTGTgAnnnnnnTcACAA
       TTGATnnnnATCAA
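The residue side of this correlation can be read directly off the alignment. The Python sketch below pulls out the three REnnnR positions from each sequence on the slide. The 0-based column indices 14, 15 and 19 are an assumption of this example, obtained by locating the RETVGR stretch in the CRP rows; the names `ALIGNMENT`, `CONTACT_COLUMNS` and `contact_residues` are hypothetical.

```python
# Aligned DNA-binding regions as listed on the slide (species code, factor).
ALIGNMENT = {
    "DD COOA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "DV COOA": "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    "EC CRP": "KITRQEIGQIVGCSRETVGRILKMLED",
    "YP CRP": "KXTRQEIGQIVGCSRETVGRILKMLED",
    "VC CRP": "KITRQEIGQIVGCSRETVGRILKMLEE",
    "DD HCPR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "DV HCPR": "DVTKGLLAGLLGTARETLSRCLSRMVE",
    "EC FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "YP FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "VC FNR": "TMTRGDIGNYLGLTVETISRLLGRFQK",
}

# Assumed 0-based columns of the R, E and second R in the "REnnnR" pattern,
# derived here from the position of "RETVGR" in the CRP rows.
CONTACT_COLUMNS = (14, 15, 19)


def contact_residues(sequence, columns=CONTACT_COLUMNS):
    """Return the residues found at the assumed DNA-contacting columns."""
    return "".join(sequence[i] for i in columns)


contacts = {name: contact_residues(seq) for name, seq in ALIGNMENT.items()}
for name, triple in contacts.items():
    print(f"{name:8s} {triple}")
```

With these columns, the CRP and HcpR rows carry the full R/E/R triple, the FNR rows differ at the first position, and the CooA rows keep only the first arginine, which can then be set against the differences between the binding motifs listed on the slide.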

  45. The correlation holds for other factors in the family

  46. Open problems
     • Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
       • Birth of a binding site; what are the mechanisms?
       • Loss of a binding site
       • Duplication of a regulated gene and/or a regulator
       • Horizontal transfer of a regulated gene and/or a regulator
       • Loss of a structural gene and/or a regulator
     • General properties?
       • Distribution of TF family and regulon sizes
       • Stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
     • Co-evolution of TFs and DNA sites:
       • “Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein)
       • How do the signals evolve? What is the driving force: changes in TFs?
       • TF-family, position-specific protein-DNA recognition code?
     • All of this needs to take into account the incompleteness of, and noise in, the data

  47. Acknowledgements
     • Andrei A. Mironov (algorithms and software)
     • Alexandra B. Rakhmaninova (SDPs)
     • Dmitry Rodionov (now at Burnham Institute) (BioR, NrdR, iron)
     • Olga Laikova (LacI, sugars)
     • Dmitry Ravcheev (FruR)
     • Olga Kalinina (SDPs/LacI)
     • Leonid Mirny, MIT (protein/DNA contacts, SDPs)
     • Andy Johnston, University of East Anglia (iron)
     • Howard Hughes Medical Institute
     • Russian Fund of Basic Research
     • Russian Academy of Sciences, program “Molecular and Cellular Biology”
     • INTAS
