Proteome analysis in silico. Part II Protein interactions and networks. Peer Bork EMBL & MDC Heidelberg & Berlin. bork@embl.de http://www.bork.embl-heidelberg.de/. II. Protein network analysis. Genomic context analysis: Interaction predictions.
Proteome analysis in silico. Part II: Protein interactions and networks. Peer Bork, EMBL & MDC, Heidelberg & Berlin. bork@embl.de http://www.bork.embl-heidelberg.de/
II. Protein network analysis
- Genomic context analysis: Interaction predictions
- Building and destroying interaction networks
- STRING: a framework for network analysis
- Towards spatial and temporal network aspects
Genomic context methods to predict protein interactions: Dandekar et al. TIBS 98; Enright et al. Nature 99; Marcotte et al. Science 99; Overbeek et al. PNAS 99; Pellegrini et al. PNAS 99; Korbel et al., Nat. Biotechn. 04; Morett et al., Nat. Biotechn. 03
Prediction of analogous enzymes by anti-correlation of gene occurrences

Species   A  B  C  D
Gene a    -  +  -  -
Gene b    +  -  +  -

Application: thiamine-PP biosynthesis. Collaboration with Enrique Morett et al., Mexico. Morett et al., Nature Biotech. 21(03)790
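The anti-correlation idea can be sketched as a correlation between presence/absence profiles. This is a minimal illustration, not the method of Morett et al.: the Pearson score, the function names and the threshold are assumptions; only the two example profiles (species A to D) come from the slide.

```python
from itertools import combinations


def profile_correlation(a, b):
    """Pearson correlation between two equal-length 0/1 presence profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a gene present everywhere or nowhere carries no signal
    return cov / (var_a * var_b) ** 0.5


def anticorrelated_pairs(profiles, threshold=-0.5):
    """Return gene pairs whose occurrence profiles correlate at or below threshold.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    hits = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = profile_correlation(profiles[g1], profiles[g2])
        if r <= threshold:
            hits.append((g1, g2, r))
    return hits


# Profiles from the slide: 1 = gene present, 0 = gene absent, species A..D.
profiles = {
    "gene_a": [0, 1, 0, 0],
    "gene_b": [1, 0, 1, 0],
}

for g1, g2, r in anticorrelated_pairs(profiles):
    print(g1, g2, round(r, 3))  # gene_a gene_b -0.577
```

Two genes that never co-occur but together cover most species that need the function are candidates for analogous (non-homologous) enzymes; with real data, far more than four species would be needed for the score to be meaningful.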
Gene neighbourhood conservation at evolutionary time scales: conservation of divergently transcribed gene pairs reveals functional constraints
The more conserved divergently transcribed neighboring genes are, the higher their level of co-expression. The resulting method can reliably predict associations between >2500 pairs of genes, ca. 650 of which are supported by other methods. Korbel, Jensen, von Mering, Bork, Nat. Biotechnol. 2004, July
Transcriptional regulators comprise the majority of conserved divergently transcribed gene pairs. They are all self-regulatory!
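A minimal sketch of the first step such an analysis needs: finding divergently transcribed neighbours in a single genome. The gene tuples, names and coordinates are hypothetical, and the conservation scoring across genomes used in the published method is not shown here.

```python
def divergent_pairs(genes):
    """Find adjacent gene pairs transcribed away from each other (<- ->).

    genes: iterable of (name, start, end, strand) with strand '+' or '-'.
    A pair is divergent when the upstream (left) gene lies on the minus
    strand and its right-hand neighbour lies on the plus strand.
    """
    ordered = sorted(genes, key=lambda g: g[1])  # order along the chromosome
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[3] == "-" and right[3] == "+":
            pairs.append((left[0], right[0]))
    return pairs


# Hypothetical toy genome.
genes = [
    ("g1", 100, 500, "-"),
    ("g2", 700, 1200, "+"),   # g1/g2: divergent
    ("g3", 1300, 1800, "+"),
    ("g4", 1900, 2400, "-"),  # g3/g4: convergent, not reported
    ("g5", 2600, 3000, "-"),
]

print(divergent_pairs(genes))  # [('g1', 'g2')]
```

Counting in how many genomes the orthologs of such a pair remain divergent neighbours would then give a simple conservation measure.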
Coverage: Homology vs. context (80% accuracy level, taken from STRING COG mode). Huynen, Snel, von Mering and Bork, Curr. Opin. Cell Biol. 15(03)191
II. Protein network analysis
- Genomic context analysis: Interaction predictions
- Building and destroying interaction networks
- STRING: a framework for network analysis
- Towards spatial and temporal network aspects
Three context methods to predict functional interactions, combined and quantified in STRING, allowing the study of networks: conserved neighborhood, phylogenetic co-occurrence, gene fusion events. Von Mering et al. NAR 31(03)258
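One simple way to combine per-method confidence scores into a single value is a "noisy-OR" under the assumption that the evidence types are independent. The sketch below illustrates that idea only; the function name and example scores are hypothetical, and the exact calibration used in STRING is not reproduced here.

```python
def combine_scores(scores):
    """Combine independent confidence scores s_i as 1 - prod(1 - s_i).

    Each score is read as the probability that the link is real according
    to one evidence type (e.g. neighborhood, co-occurrence, fusion).
    """
    remaining_doubt = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        remaining_doubt *= 1.0 - s
    return 1.0 - remaining_doubt


# Hypothetical scores for one protein pair from the three context methods.
evidence = {"neighborhood": 0.5, "cooccurrence": 0.5, "fusion": 0.5}
print(combine_scores(evidence.values()))  # 0.875
```

The combined score is never lower than the best single score, so weak but independent signals can add up to a confident prediction.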
Biochemical pathways vs functional modules. Pathway representation: purine biosynthesis, histidine biosynthesis. Comparative genomics: functional modules. www.string.embl-heidelberg.de
Giant component of the gene context network: high local connectivity (c=0.6), hence a lot of substructure. The more conservation (red), the higher the number of connections.
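Assuming that c denotes the average clustering coefficient (the slide does not define it), local connectivity can be computed as sketched below. The toy graph and function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def average_clustering(edges):
    """Mean local clustering coefficient over all nodes in the network."""
    adj = build_adjacency(edges)
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy network: a triangle (a, b, c) with one pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(round(average_clustering(edges), 3))  # 0.583
```

A value well above what a random network of the same density would give indicates tightly knit neighbourhoods, which is what makes a search for modules worthwhile.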
Biochemical pathways vs functional modules. Pathway representation: purine biosynthesis, histidine biosynthesis. Comparative genomics (unsupervised clustering): functional modules. Coverage: >70%. Specificity: ca. 90%. Von Mering et al. PNAS 100 (2003) 15428
Biological discoveries
- Functional assignment of >3000 hypothetical proteins
- Missing enzymes in known pathways
- ‘Target’ for transcription regulators, transporters etc.
- Pathway links (CoA and nucleotide biosynthesis)
- Independent modules within known pathways
- Potentially novel pathways/processes/complexes
Synergies between homology- and context-based methods. Doerks et al. TIG, 2004. [Figure: STRING annotations for two query proteins, labelled "known" and "novel". Query proteins: known transcriptional regulator PyrR; putative transcriptional regulator, uncharacterized. Other labels: uracil permease, pyrimidine biosynthesis, riboflavin biosynthesis, uncharacterized response regulator, uncharacterized.]
Functional modules in E. coli (only modules with >3 nodes shown). About 650 modules predicted (120 metabolic); about 140 modules dominated by ‘hypotheticals’.
Functional categories (COG):
- Information processing: translation, transcription, DNA
- Cellular processes: transport, motility, signalling
- Metabolism: anabolism, catabolism, energy
- Unassigned/uncharacterized, or multiple assignments
Example module: YfbU, YeaH, YcgB, YeaG. YeaH: predicted integrin I domain. YeaG: predicted ATPase domain.
II. Protein network analysis
- Genomic context analysis: Interaction predictions
- Building and destroying interaction networks
- STRING: a framework for network analysis
- Towards spatial and temporal network aspects
Functional associations between proteins: 80,000 from large-scale approaches in yeast
Counting functional associations: binary interactions vs. groups of interacting proteins. [Figure legend: TAP purification; HMS-PCI purification; two-hybrid interaction; annotated member of septin complex. Proteins shown: SHS1, CDC10, CIN2, CDC12, CDC3, GIN4, CDC11, ARC1, SPR28, LPD1.]
Distribution of interacting proteins (TAP complexes). [Figure: matrix of functional categories against each other, coloured by interaction density from 0 to 10 (actual interactions per 1000 possible pairs).]
Category key: E energy production; G aminoacid metabolism; M other metabolism; P translation; T transcription; B transcriptional control; F protein fate; O cellular organization; A transport and sensing; R stress and defense; D genome maintenance; C cellular fate/organization; U uncharacterized
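The interaction density between two functional categories (actual interactions per 1000 possible pairs) can be computed as sketched below. This is a minimal illustration that assumes each protein belongs to exactly one category; the protein names and the function name are hypothetical.

```python
def interaction_density(category_of, interactions, cat_x, cat_y):
    """Observed interactions per 1000 possible protein pairs between two categories.

    category_of: dict protein -> single category letter (simplifying assumption)
    interactions: iterable of (protein, protein) pairs, treated as undirected
    """
    members_x = {p for p, c in category_of.items() if c == cat_x}
    members_y = {p for p, c in category_of.items() if c == cat_y}

    if cat_x == cat_y:
        possible = len(members_x) * (len(members_x) - 1) // 2
    else:
        possible = len(members_x) * len(members_y)
    if possible == 0:
        return 0.0

    observed = set()
    for p, q in interactions:
        if p == q:
            continue  # self-interactions are not counted as pairs
        if (p in members_x and q in members_y) or (p in members_y and q in members_x):
            observed.add(frozenset((p, q)))  # de-duplicate (p, q) and (q, p)
    return 1000.0 * len(observed) / possible


# Hypothetical toy data: two proteins in category E, three in category G.
category_of = {"p1": "E", "p2": "E", "p3": "G", "p4": "G", "p5": "G"}
interactions = [("p1", "p3"), ("p2", "p4"), ("p1", "p2"), ("p3", "p1")]

print(round(interaction_density(category_of, interactions, "E", "G"), 1))  # 333.3
```

Normalising by the number of possible pairs keeps large categories from dominating the matrix simply because they contain more proteins.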
Reference interactions. [Figure: two category-by-category matrices. Panels: manually annotated protein complexes (MIPS / YPD); high-throughput interaction data, overlap of 2+ methods. Interaction counts shown: 2455 and 10907.]
Protein interaction datasets. [Figure: six category-by-category matrices. Panels: purified complexes (TAP), purified complexes (HMS-PCI), genomic associations, mRNA synexpression, yeast two-hybrid, synthetic lethals. Interaction counts shown: 18027, 7446, 33014, 16496, 886, 5125.]
A probabilistic approach for function prediction (update to 89 species). Benchmarking high-throughput interaction data: Von Mering, C., Krause, R., Snel, B., Oliver, S.G., Fields, S. and Bork, P., Nature 417 (2002) 399.
[Figure: coverage (fraction of reference set covered by data, %; log scale) plotted against accuracy (fraction of data confirmed by reference set, %; log scale). Data series: purified complexes (TAP), purified complexes (HMS-PCI), genomic associations, mRNA synexpression, yeast two-hybrid, synthetic lethality, combined evidence (two methods, three methods). Legend: raw data, filtered data, parameter choices.]
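The two benchmark axes follow directly from their definitions: coverage is the fraction of the reference set found in the data, and accuracy is the fraction of the data confirmed by the reference set. The sketch below is a simplified illustration with hypothetical protein pairs; any additional filtering applied in the published benchmark is not reproduced.

```python
def normalise(pairs):
    """Represent undirected interactions as a set of frozensets, dropping self-pairs."""
    return {frozenset((a, b)) for a, b in pairs if a != b}


def coverage_and_accuracy(predicted, reference):
    """Return (coverage %, accuracy %) of a dataset against a reference set.

    coverage = shared / |reference| * 100  (fraction of reference covered by data)
    accuracy = shared / |predicted| * 100  (fraction of data confirmed by reference)
    """
    pred, ref = normalise(predicted), normalise(reference)
    shared = pred & ref
    coverage = 100.0 * len(shared) / len(ref) if ref else 0.0
    accuracy = 100.0 * len(shared) / len(pred) if pred else 0.0
    return coverage, accuracy


# Hypothetical toy data: 4 reference interactions, 5 predicted, 2 in common.
reference = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
predicted = [("B", "A"), ("C", "D"), ("A", "E"), ("A", "C"), ("B", "E")]

print(coverage_and_accuracy(predicted, reference))  # (50.0, 40.0)
```

Requiring support from two or more methods amounts to intersecting the predicted sets before scoring, which typically raises accuracy at the cost of coverage.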
STRING: known and predicted functional links. "Please show me the functional context of these proteins": ATP1, QCR2
STRING: known and predicted functional links. Evidence types: conserved neighborhood, known pathways/complexes, phylogenetic profiles, high-throughput experiments, co-expression, literature co-occurrence. [Figure: network around QCR2 with the ATP synthase and ubiquinol-Cyt.C reductase complexes labelled.]
II. Protein network analysis
- Genomic context analysis: Interaction predictions
- Building and destroying interaction networks
- STRING: a framework for network analysis
- Towards spatial and temporal network aspects
EMBL’s Structural and Computational Biology unit: from molecules to organisms. [Figure: scales from protein/DNA, complex, subcellular structure and cell to organism, with methods and units: NMR, X-ray, EM, 3D tomography, synchrotrons, computational biology, core facilities, gene expression, cell biology, developmental biology. Subcellular labels: endosomes, peroxisomes, mitochondria, Golgi, nucleus, ER, microtubules. In red: other EMBL units.]
From interactions to 3D protein complexes: large-scale modeling and EM mapping (exosome case study: Aloy et al., EMBO Rep, 2002). [Workflow figure labels: TAP (Cellzome), x300; characterise the domains; have we seen any of these domains interacting before?; interface of 3D structure of interaction; parameters, rules, constraints; side-chain to side-chain, side-chain to main-chain; compatible?; predict which subunits interact; build assembly; EM screen.]
3D: Structure-based assembly of protein complexes. Analysis of 101 yeast complexes and their interactions: from functional associations to three-dimensional assemblies. Aloy, P., Boettcher, B., Ceulemans, H., Leutwein, C., Mellwig, C., Fischer, S., Gavin, A.-C., Bork, P., Superti-Furga, G., Serrano, L. and Russell, R.B., Science 303 (2004) 2026
4D: Dynamic complex formation during the 90 min yeast cell cycle
- Multiple arrays reveal 600 periodically expressed genes
- Projection to interaction data identifies novel assemblies
- Details on the time-dependent formation of some assemblies revealed
- Some unknown proteins detected in well-studied cell cycle assemblies
Color: periodically expressed proteins. Lichtenberg, Larsen et al
Losses/gains of functional associations. Legend: M. pneumoniae & M. genitalium; M. pneumoniae only. (Linked by conserved neighborhood or fused proteins, combined score >0.95)
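Once genes in two genomes are mapped to shared identifiers (for example orthologous groups), gains and losses of associations reduce to set operations on high-confidence edges. The sketch below assumes such a mapping already exists; the identifiers and scores are hypothetical, and only the >0.95 cut-off is taken from the slide.

```python
def high_confidence_edges(scored_edges, min_score=0.95):
    """Keep undirected links whose combined score exceeds min_score."""
    return {
        frozenset((a, b))
        for a, b, score in scored_edges
        if a != b and score > min_score
    }


def compare_networks(edges_a, edges_b, min_score=0.95):
    """Split links into those shared by both genomes and those unique to one."""
    a = high_confidence_edges(edges_a, min_score)
    b = high_confidence_edges(edges_b, min_score)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical links between orthologous groups in two genomes.
genome_a = [("og1", "og2", 0.99), ("og2", "og3", 0.97), ("og3", "og4", 0.60)]
genome_b = [("og2", "og1", 0.98), ("og4", "og5", 0.96)]

result = compare_networks(genome_a, genome_b)
print(len(result["shared"]), len(result["only_a"]), len(result["only_b"]))  # 1 1 1
```

Links found in only one genome are candidates for lost or gained associations, although a link can also drop out simply because its score falls just below the cut-off in the other genome.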
Comparison of the interaction networks in three mollicutes: differential analysis. [Table: gene present (+) per module in M. pneumoniae, U. parvum and M. pulmonis.] Modules compared: ribose/xylose sugar transport; fructose-specific phosphotransferase system (plus assoc. enzymes); urease enzyme complex; glycerol metabolism; ABC-type phosphate transport system (incl. regulator).
Modification of functional modules at evolutionary time scales: TCA cycle. Huynen et al., TIM 1999
Summary (network analysis)
- Gene context and other concepts for interaction prediction not only complement homology approaches, but are about to offer more functional information than BLAST et al.
- Gene context methods already reach ca. 90% specificity and 70% sensitivity in predicting functional modules in prokaryotes.
- In eukaryotes, accurate prediction of networks and modules is still difficult, and heterogeneous experimental data have to be integrated.
- Spatial and temporal aspects of protein networks have great potential, although data are still limited.
Credits
- Context methods: Enrique Morett et al. (Mex)
- Functional modules: Christos Ouzounis et al. (EBI)
- STRING: Berend Snel, Martijn Huynen (Nejm)
- Networks in 3D: Rob Russell, Patrick Aloy, Bettina Boettcher (EMBL), Cellzome AG
- Network in 4D: Ulrik de Lichtenberg, Soren Brunak (CBS)
- Chicken international sequencing and analysis consortium
+ all other group members + many experimental collaborators