Ab initio genotype-phenotype association reveals intrinsic modularity in genetic networks (in bacteria)
Olivier Elemento, Tavazoie lab
Some bacterial phenotypes …
• Motility
• Spore formation
• Gram-staining
• Hyper-thermophily
Can we find the genes underlying these phenotypes?
Motility in bacteria
• Some (but not all) bacteria are motile
• Motile bacteria may share genes involved in motility
• These genes may be absent from non-motile bacteria
[Figure, built up over four slides: presence/absence profiles across ~200 bacterial genomes (C. jejuni, B. subtilis, B. anthracis, E. coli, M. tuberculosis, S. pneumoniae, S. aureus, M. leprae, …). Each cell is marked present or absent.]
• Row 1: the Motility phenotype profile
• Rows added next: the profiles of E. coli Gene X and E. coli Gene Y, …
• The profile of Gene Y is highly correlated with the Motility profile: Gene Y is likely involved in motility
• The same comparison applies to genes of other species, such as B. subtilis gene Z (e.g. CheV)
(Levesque et al, 2003; Jim, Parmar, Singh and Tavazoie, 2004)
• Calculate a phylogenetic profile for all 600,000 genes in bacteria (~1.2x10^8 BLASTs)
• Collect the genes most correlated to the phenotype in all bacteria that have the phenotype (~3,000 for motility)
• Merge homologous genes (based on sequence similarity)
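The first two steps can be sketched on toy data as follows. This is only an illustration under stated assumptions: the slides say genes are "correlated" to the phenotype without naming the measure, so the Pearson correlation of 0/1 vectors (the phi coefficient) is used here as a stand-in, and the gene names, the 8-genome profiles, and the 0.7 cutoff are all hypothetical. As a side note, ~1.2x10^8 BLASTs is consistent with comparing each of the 600,000 genes against each of the ~200 genomes (600,000 x 200).

```python
from math import sqrt


def phi_correlation(profile, phenotype):
    """Pearson correlation between two equal-length 0/1 vectors.

    Assumption: the slides do not name the correlation measure; this is a stand-in.
    """
    n = len(profile)
    if n == 0 or n != len(phenotype):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(profile) / n
    mean_y = sum(phenotype) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(profile, phenotype))
    var_x = sum((x - mean_x) ** 2 for x in profile)
    var_y = sum((y - mean_y) ** 2 for y in phenotype)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no information
    return cov / sqrt(var_x * var_y)


def top_correlated(gene_profiles, phenotype, threshold):
    """Return gene names whose correlation with the phenotype is >= threshold,
    ordered from most to least correlated."""
    scores = {g: phi_correlation(p, phenotype) for g, p in gene_profiles.items()}
    kept = [g for g, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda g: -scores[g])


# Hypothetical toy data: 8 genomes, 1 = present, 0 = absent.
motility = [1, 1, 1, 1, 0, 0, 0, 0]
gene_profiles = {
    "gene_X": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to motility
    "gene_Y": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the phenotype exactly
    "gene_W": [1, 1, 1, 0, 0, 0, 0, 0],  # missing from one motile genome
}

candidates = top_correlated(gene_profiles, motility, threshold=0.7)
print(candidates)
```

In a real run each profile would come from BLAST hits of the gene against every genome; that step is not shown here.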
Merging homologous (orthologous/paralogous) genes
~3,000 motility genes are merged into 75 groups of homologs (Generic Genes)
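One simple way to picture this merging step is single-linkage grouping: any two genes linked by a chain of sequence-similarity hits end up in the same group. The sketch below does this with a union-find structure. It is an assumption about how such a merge could work, not the procedure used in the talk; the gene identifiers and the list of similar pairs are hypothetical, and in practice the pairs would come from sequence comparison.

```python
def merge_homologs(genes, similar_pairs):
    """Group genes into connected components of the similarity graph.

    genes: iterable of gene identifiers.
    similar_pairs: iterable of (gene_a, gene_b) pairs judged homologous.
    Returns a sorted list of sorted groups.
    """
    genes = list(genes)
    parent = {g: g for g in genes}

    def find(g):
        # Walk up to the group representative, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical identifiers: species prefix + gene letter.
genes = ["eco_Y", "bsu_Y", "ban_Y", "cje_Y", "eco_V", "bsu_V"]
similar_pairs = [
    ("eco_Y", "bsu_Y"),
    ("bsu_Y", "ban_Y"),
    ("ban_Y", "cje_Y"),
    ("eco_V", "bsu_V"),
]

generic_genes = merge_homologs(genes, similar_pairs)
print(generic_genes)
```

Single linkage can chain unrelated families together through one spurious hit, so a stricter similarity cutoff or a different grouping rule may be preferable on real data.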
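[Figure: the Motility profile shown above the profiles of E. coli Gene Y, B. subtilis Gene Y, B. anthracis Gene Y and C. jejuni Gene Y, which are merged into a single profile for Generic Gene Y.]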
Can we recover such modules?
[Figure: the Motility profile with the profiles of Generic Gene V, Generic Gene W, Generic Gene Y and Generic Gene Z. Reordered, the profiles fall into two groups.]
• Module 1: Generic Gene V, Generic Gene Z
• Module 2: Generic Gene W, Generic Gene Y
Can we recover such modules?
• Cluster Generic Gene profiles 1,000 times using Iclust with different random initializations (obtain slightly different clusters)
• Group together genes which almost always end up in the same cluster
Iclust: Slonim et al, 2006
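The repeat-and-group idea above can be sketched as a consensus clustering. Iclust is not reproduced here: a small k-means with random initial centers is swapped in purely as a stand-in clusterer. The number of clusters, the 0.9 cutoff used for "almost always", the seed, and the four toy profiles are all assumptions made for illustration.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(profiles, k, rng, n_iter=20):
    """One k-means run with centers initialized at k randomly chosen profiles.

    Stand-in for Iclust (assumption); returns {gene: cluster index}.
    """
    names = list(profiles)
    centers = [list(profiles[g]) for g in rng.sample(names, k)]
    labels = {}
    for _ in range(n_iter):
        labels = {
            g: min(range(k), key=lambda c: sq_dist(profiles[g], centers[c]))
            for g in names
        }
        for c in range(k):
            members = [profiles[g] for g in names if labels[g] == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_modules(profiles, k, n_runs=1000, min_fraction=0.9, seed=0):
    """Cluster n_runs times and group genes that co-cluster in at least
    min_fraction of the runs (connected components of that relation)."""
    rng = random.Random(seed)
    names = sorted(profiles)
    together = {pair: 0 for pair in combinations(names, 2)}
    for _ in range(n_runs):
        labels = kmeans_labels(profiles, k, rng)
        for a, b in together:
            if labels[a] == labels[b]:
                together[(a, b)] += 1

    neighbors = {g: set() for g in names}
    for (a, b), count in together.items():
        if count / n_runs >= min_fraction:
            neighbors[a].add(b)
            neighbors[b].add(a)

    modules, seen = [], set()
    for g in names:
        if g in seen:
            continue
        stack, component = [g], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbors[node] - seen)
        modules.append(sorted(component))
    return modules


# Hypothetical Generic Gene profiles over 8 genomes (1 = present).
profiles = {
    "GG_V": [1, 1, 1, 1, 1, 1, 0, 0],
    "GG_Z": [1, 1, 1, 1, 1, 1, 1, 0],
    "GG_W": [1, 1, 0, 0, 0, 0, 0, 0],
    "GG_Y": [1, 1, 1, 0, 0, 0, 0, 0],
}

modules = consensus_modules(profiles, k=2)
print(modules)
```

On this toy input every run converges to the same partition, so the co-clustering counts are all-or-nothing; on real profiles the runs differ slightly and the cutoff decides which genes are kept together.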
[Figure: Motility profile and Generic Gene profiles; axes: Motility, GG index]
GG-3 flagellar biosynthetic protein flhB
GG-4 flagellar biosynthetic protein flhA
GG-5 flagellar biosynthetic protein fliP
GG-22 flagellar biosynthetic protein fliR
GG-56 flagellar biosynthetic protein fliQ
GG-6 flagellar hook flgE/F/G
GG-7 flagellar motor switch fliG
GG-10 flagellar basal-body rod flgC
GG-12 flagellar MS-ring fliF
GG-13 flagellar hook-associated protein 1 flgK
GG-18 flagellar motor switch fliN
GG-21 flagellar motor switch fliM
GG-27 flagellar hook-associated protein 3 flgL
GG-29 flagellar hook-associated protein 2 fliD
GG-8 flagellin fliC
GG-17 motility protein A motA
GG-74 flagellar protein fliS
GG-20 motility protein B motB
GG-1 methyl-accepting chemotaxis protein
GG-11 chemotaxis protein cheA
GG-45 methyl-accepting chemotaxis protein
GG-73 methyl-accepting chemotaxis protein
GG-38 chemotaxis protein cheV
GG-15 chemotaxis protein cheW
GG-2 chemotaxis methyltransferase cheR
GG-30 glutamate methylesterase cheB
GG-32 flagellar L-ring protein precursor flgH
GG-36 flagellar P-ring protein precursor flgI
GG-9 RNA-polymerase sigma-54 factor
GG-14 transcription factor, sigma-54-dependent
These results are based on no prior knowledge, apart from genome sequences along with their phenotypic annotations
E. coli chemotaxis and flagellum modules
Some E. coli genes are not recovered. Why?
[Figure: Motility profile with the profiles of genes that are not recovered: fliI, cheY, fliO, cheZ, …]
Phylogenetic profiles / modules for Gram-staining
GG-2 3-deoxy-manno-octulosonate cytidylyltransferase
GG-3 UDP-3-O glucosamine N-acyltransferase
GG-4 lipid-A-disaccharide synthase
GG-5 polysialic acid capsule expression protein
GG-7 UDP-3-O N-acetylglucosamine deacetylase
GG-8 3-deoxy-D-manno-octulosonic-acid transferase
GG-11 tetraacyldisaccharide 4'-kinase
GG-1 outer membrane protein yaeT
GG-9 PAL peptidoglycan-associated lipoprotein
GG-10 tolQ/exbB protein
GG-12 tolB protein
GG-72 lipid A biosynthesis lauroyl acyltransferase
GG-20 HlyD family secretion protein
GG-96 HlyD family secretion protein
GG-53 HlyD family secretion protein
GG-111 membrane fusion protein (MFP)
GG-15 pyridoxal phosphate biosynthetic protein
GG-52 pyridoxal phosphate biosynthetic protein
GG-35 ABC transporter, permease
GG-68 glutaredoxin 3
GG-29 2-octaprenyl-6-methoxyphenol hydroxylase
GG-31 glutathione synthetase
GG-18 glutaredoxin-related protein
GG-73 coproporphyrinogen III oxidase, aerobic
GG-107 hydroxyacylglutathione hydrolase
GG-63 spore-cortex-lytic enzyme
GG-87 spore germination protein
GG-104 spore protease
GG-136 spore protease related
GG-71 stage III sporulation protein AB
GG-103 stage III sporulation protein AE
GG-132 stage III sporulation protein AG
GG-95 stage II sporulation protein E
GG-137 stage II sporulation protein M
GG-11 stage II sporulation protein P
GG-134 stage II sporulation protein R
GG-135 stage IV sporulation protein
GG-76 stage IV sporulation protein A
GG-46 stage IV sporulation protein B
GG-40 stage V sporulation protein AC
GG-34 stage V sporulation protein AD
GG-15 stage V sporulation protein AF
GG-37 translocation-enhancing protein
GG-94 hypothetical membrane protein
GG-127 hypothetical membrane protein
GG-8 sporulation-blocking protein yabP
GG-130 sporulation sigma-E factor processing peptidase
GG-58 stage III sporulation protein AC
GG-6 stage III sporulation protein AD
GG-3 stage III sporulation protein D
GG-49 small acid-soluble spore protein I sspI
GG-69 spoVID-dependent spore coat assembly factor
GG-101 spore coat protein
GG-52 spore coat protein E
GG-99 spore coat related, putative
GG-97 spore cortex biosynthesis, putative
GG-84 spore germination protein
GG-90 spore germination protein
GG-55 spore germination protein C1
GG-62 sporulation initiation phosphotransferase
GG-113 stage III sporulation protein AF
GG-64 stage IV sporulation protein FA
GG-91 stage VI sporulation protein D
GG-54 abi, CAAX amino terminal protease
GG-42 cytochrome C-550/C-551
GG-53 cytochrome C oxidase subunit IV
GG-36 menaquinol-cytochrome C reductase qcrC
GG-50 lipoprotein, putative
GG-18 prespore-specific transcriptional regulator
GG-66 putative lipoprotein
GG-56 putative ribonuclease H
GG-26 reductase ribT / acetyltransferase gnaT
GG-124 hypothetical membrane protein
GG-118 hypothetical membrane protein
GG-29 hypothetical cytosolic protein
GG-38 hypothetical cytosolic protein
GG-120 hypothetical cytosolic protein
GG-24 hypothetical protein
GG-27 hypothetical protein
GG-28 hypothetical protein
GG-30 hypothetical protein
GG-31 hypothetical protein
GG-32 hypothetical protein
GG-33 hypothetical protein
GG-41 hypothetical protein
GG-43 hypothetical protein
GG-47 hypothetical protein
GG-60 hypothetical protein
GG-61 hypothetical protein
GG-65 hypothetical protein
GG-67 hypothetical protein
GG-68 hypothetical protein
GG-70 hypothetical protein
GG-72 hypothetical protein
GG-73 hypothetical protein
GG-83 hypothetical protein
GG-88 hypothetical protein, HD domain
GG-100 hypothetical protein (ecsc)
GG-114 hypothetical protein
GG-116 hypothetical protein
GG-117 hypothetical protein
Focused hypotheses for experimental validation
Conclusion
• Systematic association of genotype and phenotype for several phenotypes
• Clustering reveals robust modules that correspond to protein complexes, signal transduction pathways, and enzymatic pathways
• Many predictions that can be verified experimentally
Acknowledgements
• Saeed Tavazoie
• Noam Slonim
• Tavazoie lab members