Bioinformatics: Applications. ZOO 4903 Fall 2006, MW 10:30-11:45 Sutton Hall, Room 312 Jonathan Wren Protein-Protein Interaction Networks. Lecture overview. What we’ve talked about so far Proteins & their domains Protein 3D structure Overview Proteins do not function in a vacuum
Bioinformatics: Applications ZOO 4903 Fall 2006, MW 10:30-11:45 Sutton Hall, Room 312 Jonathan Wren Protein-Protein Interaction Networks
Lecture overview
• What we’ve talked about so far
  • Proteins & their domains
  • Protein 3D structure
• Overview
  • Proteins do not function in a vacuum
  • Methods of detecting protein-protein interactions (PPI)
  • Structure and types of networks
  • Behavior of networks
Cells are crowded places! Hoppert & Mayer, 1999, Prokaryotes. Am. Sci. 87:518
Importance of protein-protein interactions
• Many cellular processes are regulated by multiprotein complexes
• Distortions of protein interactions can cause diseases
• Protein function can be predicted from the functions of interacting partners (“guilt by association”)
[Figure: A comparison of sequence (GenBank) and protein-protein interaction data (DIP database). Adapted from S. Fields, FEBS, 2005]
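The “guilt by association” idea can be sketched in a few lines of Python. This is a minimal illustration, not a published method: it assumes a simple majority vote over annotated partners, and the protein names and annotations are hypothetical toy data.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Return the most common annotation among a protein's annotated partners, or None."""
    partners = set()
    for a, b in interactions:
        # Interactions are undirected, so check both ends of each pair.
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter(annotations[p] for p in partners if p in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical toy data: P1 and P5 have no known function.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P5", "P4")]
annotations = {"P2": "DNA repair", "P3": "DNA repair", "P4": "transport"}

print(predict_function("P1", interactions, annotations))
print(predict_function("P5", interactions, annotations))
```

A majority vote is the simplest choice; real predictors usually weight partners by the reliability of each interaction.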
Types of protein-protein interactions (PPI)
• Obligate PPI: usually permanent; the protomers are not found as stable structures on their own in vivo
  • Example: obligate heterodimer, human cathepsin D
• Non-obligate PPI
  • Permanent (stable): many enzyme-inhibitor complexes; dissociation constant Kd = [A][B] / [AB] of 10^-7 to 10^-13 M
    • Example: non-obligate permanent heterodimer, thrombin and rhodniin inhibitor
  • Transient
    • Weak (electron transport complexes), Kd mM to µM; the interaction is broken and formed continuously
      • Example: non-obligate transient homodimer, sperm lysin
    • Intermediate (antibody-antigen, TCR-MHC-peptide, signal transduction PPI), Kd µM to nM
    • Strong (require a molecular trigger to shift the oligomeric equilibrium), Kd nM to fM
      • Example: bovine G protein dissociates into Gα and Gβγ subunits upon GTP binding, but forms a stable trimer upon GDP binding
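To get a feel for what these Kd ranges mean, the definition Kd = [A][B] / [AB] can be rearranged to give the fraction of A that is bound at a given free concentration of B: [AB] / ([A] + [AB]) = [B] / (Kd + [B]). The sketch below assumes simple 1:1 binding at equilibrium; the 1 µM partner concentration is an arbitrary illustrative value.

```python
def fraction_bound(free_partner_molar, kd_molar):
    """Fraction of A in complex AB for 1:1 binding, given free [B] and Kd (both in M)."""
    return free_partner_molar / (kd_molar + free_partner_molar)

free_b = 1e-6  # assumed free partner concentration: 1 micromolar
for label, kd in [("weak, Kd = 1 mM", 1e-3),
                  ("intermediate, Kd = 1 uM", 1e-6),
                  ("strong, Kd = 1 nM", 1e-9),
                  ("permanent, Kd = 1 pM", 1e-12)]:
    print(f"{label}: {fraction_bound(free_b, kd):.6f} of A bound")
```

When [B] equals Kd, exactly half of A is bound, which is why Kd is often read as the concentration of half-maximal binding.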
Multiple interactions: Guanine-nucleotide binding protein Adapted from Vetter & Wittinghofer, Science 2001
Question: How conserved are the interactive vs non-interactive portions of this protein?
Protein evolution: gene duplication
[Figure: a pair of duplicated proteins and their shared interactions, shown right after duplication and over time]
Methods of identifying PPIs
• Experimental
  • Protein-protein arrays
  • Y2H assay
  • TAP assay
• Computational/Inferential
  • Interolog analysis
  • Co-localization, co-expression
  • Correlated mutations
  • Text-mining
Interologs
• Homolog
  • Common ancestors
  • Common 3D structure
  • Common active sites
• Ortholog
  • Derived from speciation
• Paralog
  • Derived from duplication
• Interolog
  • Conserved protein-protein interaction
Thus, finding one PPI may yield dividends!
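Interolog analysis can be sketched as a lookup: if A and B interact in one organism, and both have orthologs in another, the ortholog pair becomes a predicted interaction. The code below is a minimal sketch under that assumption; the function name, protein identifiers and ortholog table are hypothetical.

```python
def predict_interologs(source_ppis, orthologs):
    """Transfer each source interaction to every pair of orthologs in the target organism."""
    predicted = set()
    for a, b in source_ppis:
        for a2 in orthologs.get(a, ()):
            for b2 in orthologs.get(b, ()):
                # frozenset makes the pair unordered: (x, y) equals (y, x).
                predicted.add(frozenset((a2, b2)))
    return predicted

# Hypothetical toy data: interactions in a source organism, orthologs in a target organism.
source_ppis = [("yA", "yB"), ("yB", "yC")]
orthologs = {"yA": ["hA"], "yB": ["hB1", "hB2"]}  # yC has no known ortholog

predicted = predict_interologs(source_ppis, orthologs)
for pair in sorted(sorted(p) for p in predicted):
    print(pair)
```

Interactions whose partner lacks an ortholog (yC here) cannot be transferred, which is one reason predicted interactomes stay incomplete.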
Protein Arrays H Zhu et al (2000) “Analysis of yeast protein kinases using protein chips” Nature Genetics 26: 283-289
The Two-Hybrid System
• Two hybrid proteins are generated with transcription factor domains
• Both fusions are expressed in a yeast cell that carries a reporter gene whose expression is under the control of binding sites for the DNA-binding domain
[Figure: bait protein fused to the binding domain, prey protein fused to the activation domain, and the reporter gene]
The Two-Hybrid System
• Interaction of bait and prey proteins localizes the activation domain to the reporter gene, thus activating transcription.
• Since the reporter gene typically codes for a survival factor, yeast colonies will grow only when an interaction occurs.
[Figure: the bait (binding domain) and prey (activation domain) fusions interact at the reporter gene, which is transcribed into reporter mRNA]
Genome-wide analysis by Y2H
• Matrix approach: a matrix of prey clones is added to the matrix of bait clones. Diploids where X and Y interact are selected based on the expression of a reporter gene.
• Library approach: one bait X is screened against an entire library. Positives are selected based on their ability to grow on specific substrates.
Large-scale screens:
• Uetz et al., Nature 2000: 957 putative interactions in yeast
• Rain et al., Nature 2001: 1,200 putative interactions in H. pylori
• Ho et al., Nature 2002: 3,617 putative interactions in yeast (mass spec)
Adapted from B. Causier, Mass Spectrometry Reviews, 2004
Advantages of Y2H
• In vivo technique, a good approximation of processes which occur in higher eukaryotes.
• Transient interactions can be detected, and the affinity of an interaction can be predicted.
• Can be used to detect potential interactions of genes not yet observed to be translated into proteins (e.g. rarely expressed) or novel constructs (e.g. therapeutics)
• Relatively fast and efficient.
Disadvantages of Y2H
• Fusion of a protein into chimeras can change the structure of a target
• Protein interactions can be different in yeast and the organisms where the genes came from
• It is difficult to target extracellular proteins
• It is hard to detect interactions between proteins active only in a complex
• Proteins which can interact in two-hybrid experiments may never interact in vivo
Tandem affinity purification method (TAP) • Target protein ORF is fused with the DNA sequences encoding TAP tag; • Tagged ORFs are expressed in yeast cells and form native complexes; • The complexes are purified by TAP method; • Components of each complex are found by gel electrophoresis or MS.
Tandem affinity purification method (TAP)
• The TAP tag consists of two IgG-binding domains of Staphylococcus protein A and a calmodulin-binding peptide
• 7123 interactions can be clustered into 547 complexes (Krogan et al, 2006)
O. Puig et al, Methods, 2001
Differences and similarities between Y2H and MS-TAP
• TAP permits protein complexes to be isolated, but cannot detect weak/transient PPIs
• Both methods generate many false positives; only ~50% of interactions are biologically significant
• Y2H is an in vivo technique
• MS can detect large stable complexes and networks of interactions
Text Mining
• Searching Medline or PubMed for words or word combinations
• Co-occurrence of terms is the simplest metric, yet leads to a higher false-positive (FP) rate
• NLP methods are more specific (e.g., “X binds to Y”; “X interacts with Y”; “X associates with Y” etc.), yet such statements are difficult to detect, so NLP has a higher false-negative (FN) rate
• Normally requires a list of known gene names or protein names for a given organism
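The co-occurrence metric can be sketched as counting how many abstracts mention both names of a pair. This is a minimal illustration, not the method of any particular tool; the gene names and abstract sentences below are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(abstracts, gene_names):
    """Count, for each pair of names, the number of abstracts mentioning both."""
    patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b", re.IGNORECASE)
                for g in gene_names}
    counts = Counter()
    for text in abstracts:
        # Sort the names found so each pair is always stored in the same order.
        found = sorted(g for g, pat in patterns.items() if pat.search(text))
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts

# Hypothetical abstracts and gene names.
abstracts = [
    "GeneA binds to GeneB in vitro.",
    "GeneB and GeneA were studied; GeneC showed no interaction with either.",
    "GeneC is highly expressed.",
]
counts = cooccurrence_counts(abstracts, ["GeneA", "GeneB", "GeneC"])
for pair, n in counts.most_common():
    print(pair, n)
```

The second abstract shows where false positives come from: GeneC is counted as co-occurring with GeneA and GeneB even though the sentence reports no interaction.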
Pre-BIND • Used Support Vector Machine (SVM) to scan literature for PPIs • Precision, accuracy and recall of 92% for correctly classifying PPI abstracts • Estimated to capture 60% of all abstracted protein interactions for a given organism Donaldson et al. BMC Bioinformatics 2003 4:11
Drosophila interaction map From: A Protein Interaction Map of Drosophila Giot et al. Science 302, 1727-1736 (2003)
Comparing large-scale data of protein-protein interactions
• All methods except for Y2H and the synthetic lethality technique are biased toward abundant proteins.
• PPI data are biased toward certain cellular localizations.
• Evolutionarily conserved proteins have much better coverage in Y2H than the proteins restricted to a certain organism.
Von Mering et al, Nature, 2002
Functional organization of yeast proteome: network of protein complexes • Essential gene products are more likely to interact with essential rather than nonessential proteins • Orthologous proteins interact with complexes enriched with orthologs Gavin et al, Nature, 2002
PPI Databases online
• DIP: http://dip.doe-mbi.ucla.edu/
• MIPS (small scale): http://mips.gsf.de/proj/ppi/
• BIND (PPI, Prot-DNA, Prot-SM): http://www.bind.ca (now owned by Unleashed)
• OPHID (predicted interactions): http://ophid.utoronto.ca/ophid/
• MINT - Molecular Interactions Database: http://mint.bio.uniroma2.it/mint/Welcome.do
• IntAct (EBI): http://www.ebi.ac.uk/intact/site/
• InterDom (domain interactions): http://interdom.lit.org.sg/
• STRING (EMBL): http://string.embl.de/
Interaction databases
[Table: interaction databases by type. Legend: Experiment (E), Structure detail (S), Predicted Physical (P), Predicted Functional (F), Curated (C), Homology modeling (H). * International Molecular Exchange (IMEx) consortium]
Comparing the DBs
• High FP rate in high-throughput experiments
• Disagreement between benchmark sets
• Experimental PPI data is sparse relative to all PPIs, so dataset overlap is small and interactions are hard to confirm with multiple sources
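Dataset overlap can be quantified by treating each interaction as an unordered pair and comparing the sets. The sketch below uses the Jaccard index (shared pairs divided by all distinct pairs), which is one reasonable choice rather than the measure used by any specific study; the two datasets are hypothetical.

```python
def normalize(pairs):
    """Turn (a, b) tuples into unordered pairs so (a, b) and (b, a) compare equal."""
    return {frozenset(p) for p in pairs}

def overlap_stats(dataset_a, dataset_b):
    """Return the number of shared interactions and the Jaccard index of two datasets."""
    a, b = normalize(dataset_a), normalize(dataset_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard

# Hypothetical toy datasets from two screens.
screen_1 = [("A", "B"), ("B", "C"), ("C", "D")]
screen_2 = [("B", "A"), ("D", "E")]

shared, jaccard = overlap_stats(screen_1, screen_2)
print(shared, jaccard)
```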
PPI network properties Nodes & connections
Characteristics of networks
n = nodes, k = connections or “edges”
[Figure: example network with node degrees k=2, k=2, k=3, k=1]
• In biology, n refers to genes/proteins (and/or metabolites) while k refers to interactions
Network properties
• Network Structure Metrics
  • Average path length
  • Degree distribution (connectivity)
  • Clustering coefficient
• Network Structure Types
  • Regular
  • Random
  • Small-world
  • Scale-free
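The three metrics can be computed directly on a small undirected graph. The sketch below uses only the Python standard library and a hypothetical four-node network whose degrees (2, 2, 3, 1) match the example on the earlier slide; the function names are chosen here for illustration.

```python
from collections import Counter, deque

def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_distribution(graph):
    """Map each degree k to the number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in graph.values())

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical network: a triangle A-B-C with D attached to C.
g = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(degree_distribution(g))
print(clustering_coefficient(g, "C"))
print(average_path_length(g))
```

Node C has three neighbors but only one of the three possible neighbor pairs (A-B) is linked, so its clustering coefficient is 1/3.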
Scale-free networks New nodes preferentially attach to highly connected ones Coined by A.L. Barabasi in 1998
Different network models: Barabasi-Albert model of preferential attachment
• At each step, a new node is added to the graph.
• The new node is attached to one of the old nodes with probability proportional to that node's degree.
• Degree distribution: power-law distribution.
[Figure: log-log plot of the degree distribution, ln(P(k)) versus ln(k)]
Barabasi & Albert, Science, 1999
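The two steps of the model translate almost directly into code. The sketch below is a simplified version that assumes each new node adds exactly one edge and that the graph starts from a single connected pair; the function name and parameters are chosen for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a graph one node at a time, attaching each new node to one existing node."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Every node appears in this list once per edge end, so a uniform draw
    # from it picks a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

edges = preferential_attachment(2000)
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Fraction of nodes with each degree k, i.e. an estimate of P(k).
histogram = Counter(degree.values())
for k in sorted(histogram)[:5]:
    print(k, histogram[k] / len(degree))
print("largest degree:", max(degree.values()))
```

Plotting ln(P(k)) against ln(k) for these values should give a roughly straight line, while most nodes keep degree 1 and a few early nodes become hubs.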
Properties of scale-free networks
• Multiplying k by a constant does not change the shape of the distribution, hence “scale-free”
• Small diameter
• Tolerance to random errors, but vulnerability to targeted attacks on highly connected nodes
• But: sub-networks can be scale-free while the underlying degree distribution is not.
From T. Przytycka
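The scale-invariance statement follows in one line if the degree distribution is assumed to be a pure power law. The symbols C (normalization), a (the rescaling constant) and \gamma (the exponent) are notation introduced here for the derivation.

```latex
P(k) = C\,k^{-\gamma}
\quad\Longrightarrow\quad
P(ak) = C\,(ak)^{-\gamma} = a^{-\gamma}\,P(k),
\qquad
\ln P(k) = \ln C - \gamma \ln k .
```

Rescaling k only multiplies P(k) by the constant a^{-\gamma}, so the shape is unchanged, and on a log-log plot the distribution is a straight line with slope -\gamma.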
Difference between scale-free and random graph models
• Random networks are homogeneous: most nodes have roughly the same number of links.
• Scale-free networks have a number of highly connected vertices.
Adapted from Jeong et al, Nature, 2000
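The homogeneity of a random network can be checked with a small simulation. The sketch below assumes the simplest random-graph model, in which a fixed number of distinct node pairs is chosen uniformly at random; the sizes and seed are arbitrary.

```python
import random
from collections import Counter

def random_graph_degrees(n_nodes, n_edges, seed=0):
    """Degrees of a graph built from n_edges distinct node pairs picked uniformly at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)
        edges.add((min(a, b), max(a, b)))  # store each undirected pair once
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

degree = random_graph_degrees(2000, 2000)
histogram = Counter(degree)
for k in sorted(histogram):
    print(k, histogram[k])
```

With an average degree of 2, the counts cluster tightly around small k and no node comes close to hub-like connectivity, in contrast to a network grown by preferential attachment.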