Turning genomics data into Biology. Martijn Huynen Nijmegen Center for Molecular Life Sciences, Centre for Molecular and Biomolecular Informatics. Comparative genomics. The (somewhat) intelligent comparative genomics meat grinder. Method development. Prediction of protein function, pathways.
Turning genomics data into Biology Martijn Huynen Nijmegen Center for Molecular Life Sciences, Centre for Molecular and Biomolecular Informatics
Comparative genomics
The (somewhat) intelligent comparative genomics meat grinder
• Method development
• Prediction of protein function, pathways
• Evolution of biosystems
A phosphomannomutase (pmm) is predicted to have acquired a phosphoribomutase (deoB) function
[Figure: deoxyribonucleoside catabolism pathway. Labels: deoxycytidine, Cdd, deoxyuridine, deoxythymidine, DeoA, purine deoxyribonucleosides, DeoD, deoxyribose-1-P, deoB, deoxyribose-5-P, deoC, glyceraldehyde-3-P, acetaldehyde. Gene-order panel for M. genitalium and M. tuberculosis: deoD, deoC, deoA, cdd, pmm; deoB marked "?"]
Predicting functional relations between genes using (conserved) genomic context
http://string.embl.de
Genomic context types:
• Co-occurrence
• Conserved neighborhood
• Gene fusion
References: Dandekar et al., 1998; Overbeek et al., 1999; Marcotte et al., 1999; Enright et al., 1999; Huynen and Bork, 1998; Pellegrini et al., 1999; Snel et al., NAR 1999; von Mering et al., NAR 2002; von Mering et al., NAR 2005
Gene fission in the evolution of carbamoyl phosphate synthase B (carB)
[Figure: phylogenetic tree of CarB / PyrAB sequences, with node values between 83 and 100. Leaves: YJR109C, D2085.1, YJL130C, Rv1384, sll0370, AF1274, AQ2101 & AQ1172, HP0919, EC0033, MTH997 & MTH996, MJ1378 & MJ1381]
Predicting functional interactions between proteins by the co-occurrence of their genes in genomes.
Distribution of four M. genitalium genes among 25 genomes:
MG299 (pta)   0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG357 (ackA)  0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1
MG019 (dnaJ)  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
MG305 (dnaK)  0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1
Using the mutual information between genes as a scoring heuristic for their co-occurrence:
M(pta, ackA) = 0.69 (phosphotransacetylase, acetate kinase)
M(dnaJ, dnaK) = 0.55 (heat shock proteins)
M(dnaJ, ackA) = 0.19
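The co-occurrence scores on this slide can be reproduced with a short sketch. It assumes the score is the standard mutual information between two presence/absence profiles, computed with natural logarithms; the slide does not state the log base, but this choice matches the three quoted values. The function name `mutual_information` and the helper `profile` are illustrative, not from the talk.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length 0/1 profiles."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum over observed joint states: p(a,b) * ln( p(a,b) / (p(a) * p(b)) )
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def profile(text):
    """Parse a space-separated string of 0/1 values into a list of ints."""
    return [int(token) for token in text.split()]


# Presence/absence of four M. genitalium genes in 25 genomes (from the slide).
pta = profile("0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1")
ackA = profile("0 0 0 1 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1")
dnaJ = profile("0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1")
dnaK = profile("0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1")

for name, a, b in [("pta, ackA", pta, ackA),
                   ("dnaJ, dnaK", dnaJ, dnaK),
                   ("dnaJ, ackA", dnaJ, ackA)]:
    print(f"M({name}) = {mutual_information(a, b):.2f}")
```

For two identical profiles the mutual information equals the entropy of the profile, which is why the pta/ackA pair (present in 12 of 25 genomes) scores higher than the dnaJ/dnaK pair (present in 19 of 25).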
Evolutionary conservation of genomic context increases the likelihood of functional interaction
[Figure: fraction of gene pairs in the same pathway (KEGG), 0 to 1, plotted against the evolutionary conservation score, 0 to 1, with separate curves for fusion, gene order, and co-occurrence]
Correlation between the strength of the genomic and functional associations
Genomic associations correlate with a wide array of functional interactions. Huynen et al., Genome Research 2000
Combining homology information with genomic association for function prediction
Repeated occurrence of MG009, a phosphohydrolase, with thymidylate kinase (tmk) suggests a role of MG009 in pyrimidine metabolism.
Conservation of gene order of the hypothetical gene MG134 with dnaX, RecR suggests physical interaction between their gene products
Phylogenomics for protein function prediction An ancient paralog of N7BM has been lost in the same lineages as N7BM itself, implicating a possible role in Complex I Gabaldon et al. (2005) J. Mol. Biol.
Experimental confirmation of a role of the N7BM paralog in Complex I J. Clin. Invest. (2005)
Verified function predictions: making predictions is easy, testing them is another matter.

| Protein | Context | Type of interaction | Function | Ref. |
|---|---|---|---|---|
| Mt-Ku | gene order | physical interaction | double-stranded DNA repair | [56] |
| GlnK | gene order | physical interaction | signal transduction for ammonium transport | [57,58] |
| PH0272 | gene order | metabolic pathway | methylmalonyl-CoA racemase | [59] |
| PrpD | gene order | metabolic pathway | 2-methylcitrate dehydratase | [22,60] |
| arok | gene order | metabolic pathway | shikimate kinase | [61] |
| ComB | gene order | metabolic pathway | 2-phosphosulfolactate phosphatase | [62] |
| KynB | gene order | metabolic pathway | kynurenine formamidase | [63] |
| PvlArgDC | gene order | metabolic pathway | arginine decarboxylase | [64] |
| FabK | gene order | metabolic pathway | enoyl-ACP reductase | [65] |
| FabM | gene order | metabolic pathway | trans-2-decenoyl-ACP isomerase | [66] |
| COG0042 | gene order | tRNA modification | tRNA-dihydrouridine synthase | [67] |
| Yfh1 | co-occurrence | process | iron-sulfur cluster assembly | [68,69] |
| YchB | co-occurrence | metabolic pathway | terpenoid synthesis | [70] |
| SmpB | co-occurrence | process | trans-translation | [5,71] |
| ThyX | complementary | enzymatic activity | thymidylate synthase | [14,72] |
| ThiN | complementary | enzymatic activity | thiamine phosphate synthase | [73,74] |
| ThiE | complementary | enzymatic activity | thiamine phosphate synthase | [74] |
| Prx | fusion | pathway | peroxiredoxin | [75] |
| YgbB | fusion / gene order | metabolic pathway | terpenoid synthesis | [76] |
| SelR | fusion / gene order / co-occurrence | enzymatic activity | methionine sulfoxide reductase | [14,22,77] |
| FadE | reg. sequence | metabolic pathway | acyl-CoA dehydrogenase | [78,79] |
| TogMNAB | reg. sequence | metabolic pathway | oligogalacturonide transport | [80,81] |
| MetD | reg. sequence | metabolic pathway | methionine transport | [82] |

Huynen et al., Curr. Opin. Cell Biol. 2003
Experimentally confirmed protein functions, predicted with various types of context
Evolutionary conservation of co-expression increases the likelihood of functional interaction
| | Total # of pairs | # of pairs > 0.6 | Observed fraction > 0.6 | Expected fraction > 0.6 | Observed/Expected |
|---|---|---|---|---|---|
| Gene-pairs with an orthologous gene-pair > 0.6 | | | | | |
| Worm | 18161 | 803 | 0.0442* | 0.00379 | 12 |
| Yeast | 36548 | 1215 | 0.0332* | 0.00216 | 15 |
| Gene-pairs with a paralogous gene-pair > 0.6 | | | | | |
| Worm | 207214 | 29031 | 0.1401* | 0.00379 | 37 |
| Yeast | 38253 | 2167 | 0.0566* | 0.00216 | 26 |

Low but significant levels of conservation of co-expression (see Teichmann et al., TIBS 2002; Stuart et al., Science 2003)
van Noort et al., TIG, 2003
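The observed fractions and observed/expected ratios in this table follow directly from the pair counts. The sketch below recomputes them as a consistency check. The expected fractions are taken from the slide as given, since how they were derived is not stated here. The function name `coexpression_enrichment` and the row labels are illustrative.

```python
def coexpression_enrichment(total_pairs, pairs_above, expected_fraction):
    """Return (observed fraction, observed/expected ratio) for one table row."""
    observed = pairs_above / total_pairs
    return observed, observed / expected_fraction


# (total pairs, pairs with co-expression > 0.6, expected fraction), from the slide.
rows = {
    "orthologous, worm": (18161, 803, 0.00379),
    "orthologous, yeast": (36548, 1215, 0.00216),
    "paralogous, worm": (207214, 29031, 0.00379),
    "paralogous, yeast": (38253, 2167, 0.00216),
}

results = {label: coexpression_enrichment(*row) for label, row in rows.items()}

for label, (observed, ratio) in results.items():
    print(f"{label}: observed {observed:.4f}, observed/expected {ratio:.0f}")
```

The recomputed ratios round to the values on the slide (12, 15, 37, 26), so the observed fraction is between roughly 12 and 37 times the expected fraction in every row.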
Conservation of protein-protein interaction measured by yeast-2-hybrid increases the likelihood of interaction Comparison of Giot (Fly) and Ito (Yeast), Uetz (Yeast) y-2-h interactions
A “new”, conserved interaction: GTPase XAB1/CG3704 hypothetical, GTPase YOR262/CG10222
XAB1 interacts with the DNA repair protein XPA1 and is inferred to be required for XPA1’s import into the nucleus.
The fraction of hypothetical proteins in conserved Y2H interactions is relatively low.
Hypotheticals:
• In conserved interactions: 13 (5%)
• In complete genome: ~1600 (27%)
Conservation of protein-protein interaction between species

| Dataset | Comparison | Protein interactions, both proteins in the other dataset | Conserved interactions | Fraction conserved interactions | Average fraction conserved interactions |
|---|---|---|---|---|---|
| Ito / Uetz | Yeast vs. Yeast | 858 / 697 | 201 | 23.4% / 28.8% | 26.1% |
| Ito / Giot | Yeast vs. Fly | 229 / 394 | 45 | 19.6% / 11.4% | 15.5% |
| Uetz / Giot | Yeast vs. Fly | 120 / 168 | 33 | 27.5% / 19.6% | 23.5% |

Physical interaction is reasonably well conserved between species (… compared to the “conservation” within a species …)
Huynen et al., TIG, 2004
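The conserved fractions in this table are the number of conserved interactions divided by the number of interactions in each dataset for which both proteins also occur in the other dataset. The sketch below recomputes them from the counts on the slide. It assumes the "average" column is the plain mean of the two fractions. The function name `conserved_fractions` is illustrative.

```python
def conserved_fractions(n_first, n_second, n_conserved):
    """Fraction of conserved interactions relative to each dataset, and their mean."""
    frac_first = n_conserved / n_first
    frac_second = n_conserved / n_second
    return frac_first, frac_second, (frac_first + frac_second) / 2


# (comparable interactions in first dataset, in second dataset, conserved), from the slide.
comparisons = {
    "Ito / Uetz (yeast vs. yeast)": (858, 697, 201),
    "Ito / Giot (yeast vs. fly)": (229, 394, 45),
    "Uetz / Giot (yeast vs. fly)": (120, 168, 33),
}

fractions = {name: conserved_fractions(*counts) for name, counts in comparisons.items()}

for name, (first, second, mean) in fractions.items():
    print(f"{name}: {first:.1%} / {second:.1%}, average {mean:.1%}")
```

Under that assumption the recomputed values agree with the slide to within about 0.1 percentage point; the small differences are presumably due to rounding in the original table.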
Is the low level of conservation of co-expression between S. cerevisiae and C. elegans (< 5%) “real”, reflecting evolution and species-specific interactions, or are we just comparing noisy datasets?
Species-specific (idiosyncratic) coregulation: “Efficient expression of the Saccharomyces cerevisiae glycolytic gene ADH1 is dependent upon a cis-acting regulatory element UASRPG found initially in genes encoding ribosomal proteins.” Tornow and Santangelo, Gene, 1990
Noisy genomics data Low (but significant) correlation between ChIP-on-chip data (sharing Transcription Factor Binding Sites) and expression data in S.cerevisiae
Filtering out the noise by combining ChIP-on-chip and co-expression in yeast
High level of conservation of co-regulation after speciation 76 %
Comparing co-regulation in Bacteria indicates a level of conservation of 80% (operons in B. subtilis versus regulons in E. coli)
NB:
• Based on operon conservation alone, the level is only 50%
• Cases of gene loss are disregarded
Noisy genomics data lead to drastic underestimations of conservation of interactions
Conclusions: conservation of co-regulation
• Gene co-regulation tends to be conserved in eukaryotes (76%) and in prokaryotes (80%)
• In the case of gene duplication, one gene tends to maintain the co-regulatory link: there appears to be one functionally equivalent ortholog
Snel et al., Nucleic Acids Res. 2004
Exploiting genomics data to predict the function for a hypothetical protein: BolA
An interaction of BolA with a mono-thiol glutaredoxin ? (STRING) BolA
BolA and Grx occur as neighbors in a number of genomes Bola Grx
BolA and Grx have an (almost) identical phylogenetic distribution
BolA and Grx have been shown to interact in Y2H in S.cerevisiae and D.melanogaster, and in Flag tag in S.cerevisiae BolA phylogeny
[Figure labels: Cell division / Cell wall; (oxidative) stress]
BolA does have (predicted) interactions with cell-division / cell-wall proteins, but those appear secondary to the link with Grx. Genomic context analyses have obtained a higher resolution in function prediction than phenotypic analyses.
BolA is homologous to the peroxide reductase OsmC, suggesting a similar function
| Protein family (PDB entry) | 3D similarity to BolA: DALI, Z-scores | Sequence profile similarity to BolA: COMPASS, SW-score (E-value) |
|---|---|---|
| OsmC (1ml8A/1lqlA), Ohr (1n2fA) | 5.8 / 5.5, 5.2 | 73 (2.4 E-5) |
| KH 1 (1hnxC) | 5.3 | 46 (9.4 E-3) |
| DUF150 (1ib8A) | 3.7 | 44 (4.2 E-2) |
| GMP synthase C (1gpmA) | 2.9 | 57 (7.0 E-4) |
| KH 2 (1egaB) | 3.8 | 35 (2.7 E-1) |
| RBFA (1kkgA) | 4.2 | 40 (9.6 E-2) |

BolA is, relative to other class II KH folds and sequences, most similar to OsmC
OsmC uses the thiol groups of two evolutionarily conserved cysteines to reduce substrates.
Problem: the BolA family does not have conserved cysteines. … It would have to obtain its reducing equivalents from elsewhere …
[Figure: BolA family alignment]
Prediction of interaction partner and molecular function complement each other?
• BolA interacts with Grx
• BolA is (homologous to) a reductase
• Grx provides BolA with reducing equivalents!?
There is a wealth of functional and structural genomics data that can be related to the function of individual proteins. Exploiting that data is becoming a trade in itself (biochemistry by other means)