
Network Analysis of Protein-Protein Interactions




  1. Network Analysis of Protein-Protein Interactions Xiaohua Tony Hu College of Information Science & Technology Drexel University Philadelphia, PA 19104 http://www.cis.drexel.edu/faculty/thu 215-8950551(O) 215-8952494(F)

  2. Outline • Introduction • Research Goals • Topological Analysis • The algorithm - CommBuilder • Experimental Results • Conclusions & Discussion • Q&A

  3. Introduction • Biological Networks • Protein-Protein Interaction (PPI) Networks • Network Community Structure • Community Detection • Network Growing

  4. GENOME PROTEOME protein-protein interactions METABOLISM Bio-chemical reactions Citrate Cycle Biological Networks

  5. Biological Networks • Modeling biological systems • Genetic networks • Gene association and expression • Protein networks • Protein structure and interactions • Metabolic pathways • Challenging • Non-trivial and irregular • Incomplete and noisy

  6. Related Work • The “small world” model was first proposed by Watts and Strogatz; it refers to the small average distance between any two vertices in the network. • Barabasi and Albert discovered a highly heterogeneous PPI network with scale-free connectivity properties in yeast. Scale-free connectivity has also been reported for the PPI networks of S. cerevisiae, H. pylori, C. elegans, and D. melanogaster [11-13]. Thus, the scale-free network model has been well accepted. • In PPI networks, the degree distribution is not the only property reported to follow a power law; other topological properties, such as the clustering coefficient, have been reported to do so as well. Yook and colleagues observe that the clustering coefficient of S. cerevisiae follows a power law. • However, not all research agrees that every PPI network shows power-law behavior. Thomas and colleagues find that the connectivity distribution in a human PPI network does not follow a power law. They argue that the current belief in a power-law distribution may reflect the behavior of a sampled subgraph. • From a slightly different angle, Colizza and colleagues evaluate three PPI networks constructed from yeast data sets. Although they observe that the connectivity distribution follows a power law, only one of the three networks exhibits approximate power-law behavior for the clustering coefficient. • Soffer and Vazquez find that the power-law dependence of the clustering coefficient is to some extent caused by the degree correlations of the networks, with high-degree vertices preferentially connecting to low-degree vertices.

  7. PPI Networks – Why? • Proteins are the executors of the genetic program and rarely act alone • Functional assignments of uncharacterized proteins • Targets of new drugs

  8. PPI Networks - Properties • PPI network models • Scale-free (Barabasi & Albert, 1999) • Geometric random (Przulj et al, 2004) • Tolerant to random errors but fragile against the removal of the most connected nodes (hubs) • Modularity and community structure

  9. PPI Networks - Models A.-L. Barabasi and Z.N. Oltvai, Nat. Rev. Genet. (2004)

  10. Yeast PPI Network Nodes: proteins Edges : physical Interactions H. Jeong et al, Nature 411, 41-42 (2001).

  11. PPI Networks - Topology H. Jeong et al., Nature 411, 41-42 (2001)

  12. PPI Networks - Topology • Origin of scale-free topology • Gene duplication • Preferential attachment • Vazquez et al. 2003; Sole et al. 2001; Bhan et al. 2002.

  13. Network Community Structure • Gathering of vertices into groups such that the connections within groups are denser than between groups (Girvan & Newman, 2002) • An important property of PPI networks • Delineation of functional groups/processes • Transfer of information

  14. Community Detection • The GN algorithm (Girvan & Newman, 2002) • Based on betweenness • High computational cost • Well adopted • Metabolic networks (Holme et al, 2003) • Functional units • Gene networks (Wilkinson & Huberman, 2004) • Related genes

  15. Network Growing • Genetic regulatory networks (Hashimoto et al, 2004) • Based on probabilistic Boolean networks • Web (Flake et al, 2002) • Based on the self-organization of the network structure and a maximum flow method

  16. Graph Theory • Modeling real-world phenomena, e.g. World Wide Web, electronic circuits, collaborations between scientists, co-citations, biological networks, etc. • A mathematical formalism • Global properties: e.g. diameter, clustering, degree distribution • Local properties: vertex density, motif and graphlet

  17. Graph Theory - Models • Random Model • Every pair of nodes is connected by an edge independently and with the same probability. • Erdos & Renyi • Small-world Model • Small diameters and large clustering coefficients • Watts & Strogatz

  18. Graph Theory - Models • Scale-free Model • Degree distribution follows a power law of the form $P(k) \sim k^{-\gamma}$. • Robustness and fragility • Preferential attachment (graph evolution) • A-L Barabasi • Random Geometric Model • Nodes randomly distributed in a geometric space (Przulj, et al)
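
To make the preferential-attachment idea on this slide concrete, here is a minimal sketch of a growth process in which each new node links to existing nodes with probability proportional to their current degree. The function name, the starting core of `m + 1` fully connected nodes, and the parameter values are illustrative assumptions, not details taken from the presentation.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=0):
    """Grow a graph to n nodes; each new node adds m edges to existing
    nodes chosen with probability proportional to their current degree.
    (Illustrative sketch; names and defaults are assumptions.)"""
    rng = random.Random(seed)
    # Assumed starting point: a small complete core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment_graph(n=2000, m=2)
degree = Counter(v for e in edges for v in e)
print("mean degree:", sum(degree.values()) / len(degree))
print("max degree:", max(degree.values()))
```

In a run like this, most nodes keep a degree close to `m` while a few early nodes accumulate many links, which is the hub-dominated pattern the scale-free model describes.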

  19. Research Goals • To analyze the topological properties of protein-protein interaction networks • Different organisms • Different experimental systems • Different confidence levels

  20. Research Goals • What is the community to which a given protein belongs? • Desirable and computationally more feasible to study a community containing a few proteins of interest

  21. Topological Analysis • Method • Protein-Protein Interaction Networks • Constructed from different data sets • Statistical analysis of topological properties • SPSS

  22. Topological Analysis • Data Sets • DIP – Database of Interacting Proteins • Species-specific PPI data sets: • D. melanogaster (fly), S. cerevisiae (yeast), E. coli, C. elegans (worm), H. pylori, H. sapiens (human), M. musculus (mouse) • BIND – Biomolecular Interaction Network Database • Fly PPI with assigned confidence scores (Giot, 2003) • GRID – General Repository for Interaction Datasets • Yeast and fly PPI (including experiment systems used to obtain the data)

  23. Topological Analysis The experimental-system-specific set includes fly and yeast PPI networks downloaded from the General Repository for Interaction Datasets (GRID). From the fly data set, we constructed three PPI networks, each representing interactions detected by one of the following experimental systems: Enhancement (Fly-E), Suppression (Fly-S), and Two Hybrid (Fly-TH). From the yeast data set, we likewise constructed three PPI networks: Affinity Precipitation (Yeast-AP), Synthetic Lethality (Yeast-SL), and Two Hybrid (Yeast-TH). We also constructed a network representing the entire set of protein interactions (Fly and Yeast). The confidence-level-specific set contains the fly data set downloaded from the Biomolecular Interaction Network Database (BIND). From it we constructed three networks: one with confidence >= 0.5 (Fly50), one with confidence >= 0.3 (Fly30), and one containing all interactions (Fly00).
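
The confidence-based construction described on this slide can be sketched as a simple edge filter. The record layout `(protein_a, protein_b, score)`, the protein names, and the scores below are made up for illustration; the actual BIND file format is not shown in the slides, and dropping self-interactions is an assumption of this sketch.

```python
def build_network(interactions, min_confidence=0.0):
    """Return an undirected edge set that keeps only interactions whose
    confidence score is at least min_confidence."""
    edges = set()
    for a, b, score in interactions:
        # Assumption: self-interactions are ignored; a frozenset makes
        # (a, b) and (b, a) count as the same undirected edge.
        if a != b and score >= min_confidence:
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical records: (protein A, protein B, confidence score).
toy = [
    ("p1", "p2", 0.90),
    ("p2", "p3", 0.40),
    ("p3", "p4", 0.10),
    ("p1", "p3", 0.55),
    ("p2", "p1", 0.90),  # duplicate of the first record, reversed
]

fly50 = build_network(toy, 0.5)  # analogous to Fly50
fly30 = build_network(toy, 0.3)  # analogous to Fly30
fly00 = build_network(toy, 0.0)  # analogous to Fly00
print(len(fly50), len(fly30), len(fly00))
```

By construction the three edge sets are nested: every edge in the stricter network also appears in the more permissive ones.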

  24. Topological Analysis • Definitions • Graph G(V, E) • V: vertex (node) set • E: edge set • |V|, |E|: sizes of the two sets, e.g. |V| = 4, |E| = 6 • Degree: number of edges connected to a vertex
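
The definitions above can be sketched with a plain adjacency-set representation. The example uses the slide's sizes, |V| = 4 and |E| = 6; assuming a simple graph (no self-loops or repeated edges), the only graph with those sizes is the complete graph on four vertices. The vertex labels are arbitrary.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


# Complete graph on 4 vertices: every pair is connected.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
adj = adjacency(edges)

num_vertices = len(adj)                                      # |V|
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2     # |E|
degree = {v: len(nbrs) for v, nbrs in adj.items()}

print(num_vertices, num_edges, degree)
```

Each edge contributes to the degree of both of its endpoints, which is why the degree sum is halved to recover |E|.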

  25. Topological Analysis • Degree distribution P(k) • the probability that a vertex has degree k • power law: $P(k) \sim k^{-\gamma}$ • Diameter (length) • measured from the shortest paths between pairs of vertices
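
As a rough illustration of these two quantities, the sketch below computes P(k) as the fraction of vertices with degree k and uses breadth-first search for shortest path lengths. The slide does not say whether the diameter is taken as the longest or the average shortest path, so both are computed here; the toy graph and all names are assumptions.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """P(k): fraction of vertices that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def shortest_path_lengths(adj, source):
    """Breadth-first search: distance in edges from source to each
    reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Toy graph: a - b - c - d, with e attached to b.
adj = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}

pk = degree_distribution(adj)
pair_distances = [
    d
    for s in adj
    for t, d in shortest_path_lengths(adj, s).items()
    if t != s
]
longest_path = max(pair_distances)
mean_path = sum(pair_distances) / len(pair_distances)
print(pk, longest_path, mean_path)
```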

  26. Topological Analysis • Clustering coefficient (C) • $C_i = 2e_i / (k_i (k_i - 1))$ • $e_i$: number of edges between the neighbors of vertex i • $k_i$: number of neighboring vertices of i • vertex i itself is not included in either count • Vertex Density (D) • Same as C but includes i
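
The clustering coefficient formula above translates directly into code. For vertex density, the slide only says "same as C but includes i"; the sketch below assumes this means the edge density of the subgraph induced by i together with its neighbors, which is one plausible reading rather than a confirmed definition. Returning 0 for vertices with too few neighbors is also an assumed convention.

```python
from itertools import combinations


def edges_among_neighbors(adj, i):
    """e_i: number of edges between the neighbors of vertex i."""
    return sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])


def clustering_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken as 0 when k_i < 2."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    return 2 * edges_among_neighbors(adj, i) / (k * (k - 1))


def vertex_density(adj, i):
    """Assumed reading of 'same as C but includes i': density of the
    subgraph on i plus its k_i neighbors, which has k_i + 1 vertices
    and e_i + k_i edges."""
    k = len(adj[i])
    if k < 1:
        return 0.0
    n = k + 1
    return 2 * (edges_among_neighbors(adj, i) + k) / (n * (n - 1))


# Toy graph: triangle a-b-c, plus d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

for v in adj:
    print(v, clustering_coefficient(adj, v), vertex_density(adj, v))
```

Under this reading, a vertex with a single neighbor has clustering coefficient 0 but vertex density 1, so the two measures can differ noticeably for low-degree vertices.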

  27. Table 1 Protein interaction networks of different organisms.

  28. Table 2 Protein interaction networks of D. melanogaster – different confidence.

  29. Table 3 Protein interaction networks of S. cerevisiae: with confidence.

  30. Table 4 Protein interaction networks of D. melanogaster: different experimental systems

  31. Table 5 Protein interaction networks of S. cerevisiae: different experimental systems

  32. Table 6 Topology of protein interaction networks.

  33. Table 6 (cont.) Topology of protein interaction networks.

  34. Table 6 (cont.) Topology of protein interaction networks.

  35. Observations: Average Topological Properties of the PPI Networks • Across species, all networks exhibit small average degrees and diameters, even though the absolute values differ significantly. Except for C. elegans, the PPI networks of all species have a larger average clustering coefficient than the corresponding random clustering coefficient, indicating a non-random and hierarchical structure within these networks. • Networks with higher confidence have larger diameters, larger average clustering coefficients, and a smaller average degree. They shift further away from a random structure.

  36. Figure 1 Degree Distribution P(k): PPI of Different Organisms

  37. Figure 2 Degree Distribution P(k) of yeast: with Confidence.

  38. Figure 3 Degree Distribution P(k) of Fly: with Confidence.

  39. Figure 4 Degree Distribution P(k): Methodology Difference.

  40. Observations: Degree Distribution P(k) • The log-log plot clearly demonstrates the power-law dependence of P(k) on degree k. For our analysis, we chose to use the raw data directly rather than follow [4] in applying an exponential cutoff. Without the exponential cutoff, our regression analysis yields a power-law exponent γ between 1.31 and 2.76, in fairly good agreement with previously reported results. • Using the SPSS software package, we create a scatter plot of residuals against fitted values for the power-law model. The result clearly indicates a pattern in the data that is not captured by the model. This means that the power law is a model with excellent fit statistics but poor residuals, an indication of its inadequacy.
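
The regression described on this slide was run in SPSS; as a rough stand-in, the sketch below fits a straight line to (log k, log P(k)) by ordinary least squares and returns the residuals paired with the fitted values, which is the data one would put in a residual scatter plot. The function name and the synthetic input (an exact power law with γ = 2) are assumptions for illustration, not the authors' data or procedure.

```python
import math


def fit_power_law(pk):
    """Least-squares line through (log10 k, log10 P(k)).

    Returns (gamma, intercept, r_squared, fitted_and_residuals), where
    gamma is the negated slope."""
    points = [(math.log10(k), math.log10(p)) for k, p in pk.items()
              if k > 0 and p > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return -slope, intercept, r_squared, list(zip(fitted, residuals))


# Synthetic, made-up distribution: an exact power law with gamma = 2.
pk = {k: k ** -2.0 for k in range(1, 51)}
gamma, intercept, r_squared, fitted_and_residuals = fit_power_law(pk)
print(round(gamma, 6), round(r_squared, 6))
```

For an exact power law the residuals are essentially zero. With real degree data, a high R² combined with a visible trend in the residuals when plotted against the fitted values would be the kind of mismatch the slide describes.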

  41. Figure 5 Average clustering coefficient C(k): Different Organisms.

  42. Figure 6 Average clustering coefficient C(k): with Confidence.

  43. Figure 7 Average clustering coefficient C(k): Methodology Difference.

  44. Observations: The Average Clustering Coefficient • The results indicate that while the E. coli and S. cerevisiae PPI networks show a somewhat weak power-law distribution, the networks of other species do not follow a power law. Different experimental systems and different confidence levels do not seem to change this non-scale-free behavior.

  45. Figure 8 Average vertex density D(k): Different Organisms

  46. Figure 9 Average vertex density D(k): with Confidence.

  47. Figure 10 Average vertex density D(k): Methodology Difference.

  48. Observations: The Average Vertex Density • All networks display consistent power law behavior for the vertex density spectrum

  49. Topological Analysis • Exponents • Degree Distribution: $P(k) \sim k^{-\gamma}$ • Clustering Coefficient: $C(k) \sim k^{-\alpha}$ • Vertex Density: $D(k) \sim k^{-\beta}$

  50. Table 7 Statistical analysis of the protein interaction networks. * P > 0.05
