LECTURE 3 Complex Network Models Properties of Protein-Protein Interaction Networks

LECTURE 3 • Complex Network Models • Properties of Protein-Protein Interaction Networks • Usage of KNApSAcK Database

Complex Network Models: • Average path length L, clustering coefficient C, and degree distribution P(k) help us understand the global structure of a network. • Some well-known types of network models are as follows: • Regular Coupled Networks • Random Graphs • Small-world Networks • Scale-free Networks • Hierarchical Networks
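The three measures named above can be computed directly from an adjacency structure. The following is a minimal sketch in plain Python, assuming an undirected, unweighted graph stored as a dictionary of neighbour sets; the function names and the toy graph are illustrative and not part of the lecture.

```python
from collections import Counter, deque


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links)."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # convention: C = 0 for nodes of degree < 2
            continue
        nbr_list = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)


def degree_distribution(adj):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


# Hypothetical toy graph: a triangle (a, b, c) with one pendant node d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_path_length(toy), clustering_coefficient(toy), degree_distribution(toy))
```

For real networks a graph library would normally be used; the sketch only makes the definitions concrete.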

Regular networks

Regular networks: diamond crystal and graphite crystal. Both diamond and graphite are carbon.

Regular network (a ring lattice): • Average path length L is high. • Clustering coefficient C is high. • Degree distribution is delta type.

Random Graph: Erdős and Rényi introduced the concept of the random graph around 60 years ago.

Random Graph: N = 10 nodes, $E_{\max} = N(N-1)/2 = 45$ possible edges. [Figure: example graphs for p = 0, p = 0.1, p = 0.15 and p = 0.25]
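The slide's numbers can be reproduced with a short sketch of the Erdős-Rényi G(N, p) construction, in which each of the N(N-1)/2 possible edges is included independently with probability p. The function name and the seed are illustrative assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Visit every possible node pair once and keep it with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


n = 10
e_max = n * (n - 1) // 2  # 45 possible edges for N = 10
for p in (0.0, 0.1, 0.15, 0.25):
    edges = erdos_renyi(n, p, seed=1)
    print(f"p={p}: {len(edges)} edges (expected about {p * e_max:.1f} of {e_max})")
```

The realized edge count fluctuates around p · E_max from one random draw to the next.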

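Random Graph (example with p = 0.25): • Average path length L is low. • Clustering coefficient C is low. • Degree distribution is exponential type (binomial, approximately Poisson for large N, with an exponentially decaying tail).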

Random Graph: • To compare a real network with a random network, we usually first generate a random network of the same size, i.e. with the same number of nodes and edges. • Besides Erdős-Rényi random graphs, there are other types of random graphs. • A random graph can be constructed so that it matches the degree distribution or some other topological property of a given graph. • Geometric random graphs: each vertex is assigned random coordinates in a geometric space of arbitrary dimensionality, and random edges are allowed between adjacent points or points constrained by a threshold distance.
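A geometric random graph can be sketched as follows. This is the simplest variant, assuming points drawn uniformly in the unit hypercube and an edge between every pair of points closer than a threshold radius; the function name, the parameters and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
import random


def geometric_random_graph(n, radius, dim=2, seed=None):
    """Place n random points in [0, 1]^dim and link pairs within `radius`."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= radius  # Euclidean distance
    ]
    return points, edges


points, edges = geometric_random_graph(20, radius=0.3, seed=0)
print(len(edges), "edges among", len(points), "points")
```

Other variants keep only a random subset of the admissible pairs, which matches the "random edges are allowed" wording more closely.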

Geometric random graph: Example

Small world model (Watts and Strogatz): Oftentimes, soon after meeting a stranger, one is surprised to find that they have a friend in common, so they both cheer: “What a small world!”

Small world model (Watts and Strogatz): • Begin with a nearest-neighbor coupled network. • Randomly rewire each edge of the network with some probability p.
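The two steps of the construction can be sketched as below. It is an illustrative version, assuming a ring lattice in which each node is linked to its k nearest neighbours (k even) and a rewiring rule that moves one endpoint of an edge to a uniformly chosen node while avoiding self-loops and duplicate edges; the function names are not from the lecture.

```python
import random


def ring_lattice(n, k):
    """Nearest-neighbor coupled network: node u linked to u±1, ..., u±k/2."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def watts_strogatz(n, k, p, seed=None):
    """Rewire each lattice edge (u, u+step) with probability p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for step in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + step) % n
            if v in adj[u] and rng.random() < p:
                # New endpoint: any node that is not u and not already linked to u.
                choices = [w for w in range(n) if w != u and w not in adj[u]]
                if choices:
                    w = rng.choice(choices)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


small_world = watts_strogatz(20, 4, 0.3, seed=1)
print(sum(len(nbrs) for nbrs in small_world.values()) // 2, "edges")
```

With p = 0 the regular lattice is returned unchanged, and the number of edges stays at n·k/2 for every p.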

Small world model (Watts and Strogatz): • Average path length L is low. • Clustering coefficient C is high. • Degree distribution is exponential type.

Scale-free model (Barabási and Albert): Start with a small number of nodes; at every time step, a new node is introduced and is connected to already-existing nodes following preferential attachment (the probability is high that a new node is connected to high-degree nodes).
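Preferential attachment can be sketched with a list in which every node appears once per incident edge, so that a uniform draw from the list is a degree-proportional draw. The seed graph (a complete graph on m + 1 nodes), the parameter m and the function name are assumptions for illustration.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumption: start from a small complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node is listed once per incident edge (i.e. `degree` times).
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # draw until m distinct targets are found
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


ba_edges = barabasi_albert(50, 2, seed=3)
print(len(ba_edges), "edges")
```

Nodes that join early tend to accumulate many links and become hubs.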

• Average path length L is low. • Clustering coefficient C is not clearly known. • Degree distribution is power-law type: $P(k) \sim k^{-\gamma}$.

Scale-free networks exhibit robustness • Robustness: the ability of complex systems to maintain their function even when the structure of the system changes significantly. • Tolerant to random removal of nodes (mutations). • Vulnerable to targeted attack on hubs (mutations), which makes hubs candidate drug targets.
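The contrast between random failure and targeted attack can be illustrated by measuring the largest connected component after nodes are removed. The sketch below uses a hypothetical star network (one hub, nine leaves) as an extreme case of a hub-dominated topology; the function names are illustrative.

```python
import random


def giant_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def random_failure(adj, count, seed=None):
    """Pick `count` nodes uniformly at random."""
    return set(random.Random(seed).sample(sorted(adj), count))


def hub_attack(adj, count):
    """Pick the `count` nodes of highest degree."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count])


# Hypothetical toy network: hub 0 linked to leaves 1..9.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
print("random failure:", giant_component_size(star, random_failure(star, 1, seed=0)))
print("hub attack:", giant_component_size(star, hub_attack(star, 1)))
```

Removing a leaf leaves the rest of the network connected, whereas removing the hub breaks it into isolated nodes.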

Scale-free model (Barabási and Albert): The term “scale-free” refers to any functional form f(x) that remains unchanged to within a multiplicative factor under a rescaling of the independent variable x, i.e. $f(ax) = b f(x)$. Power-law forms, $P(k) \sim k^{-\gamma}$, are the only solutions to $f(ax) = b f(x)$, and hence “power-law” is referred to as “scale-free”.
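As a short check of the scaling relation, a power law with an arbitrary prefactor c (a symbol introduced here only for illustration) satisfies it with a factor b that depends on a but not on x, whereas an exponential does not:

```latex
f(x) = c\,x^{-\gamma}
\;\Longrightarrow\;
f(ax) = c\,(ax)^{-\gamma} = a^{-\gamma}\,c\,x^{-\gamma} = b\,f(x),
\qquad b = a^{-\gamma};
\qquad\text{but}\qquad
g(x) = e^{-x}
\;\Longrightarrow\;
\frac{g(ax)}{g(x)} = e^{-(a-1)x}\ \text{depends on } x .
```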

Hierarchical Graphs: The starting point of this construction is a small cluster of four densely linked nodes (see the four central nodes in the figure). Next, three replicas of this module are generated and the three external nodes of the replicated clusters connected to the central node of the old cluster, which produces a large 16-node module. Three replicas of this 16-node module are then generated and the 12 peripheral nodes connected to the central node of the old module, which produces a new module of 64 nodes. (Source: A.-L. Barabási & Z. N. Oltvai, “Network biology: understanding the cell’s functional organization”, Nature Reviews Genetics, vol. 5, p. 101, February 2004.)

Hierarchical Graphs: The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent $\gamma = 1 + \ln 4/\ln 3 \approx 2.26$ and a large, system-size independent average clustering coefficient $\langle C \rangle \sim 0.6$. The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows $C(k) \sim k^{-1}$, a straight line of slope –1 on a log–log plot. (Source: A.-L. Barabási & Z. N. Oltvai, “Network biology: understanding the cell’s functional organization”, Nature Reviews Genetics, vol. 5, p. 101, February 2004.)

Comparison of random, scale-free and hierarchical networks. [Figure from A.-L. Barabási & Z. N. Oltvai, “Network biology: understanding the cell’s functional organization”, Nature Reviews Genetics, vol. 5, p. 101, February 2004]

Protein-protein interaction: In a typical protein-protein interaction, a protein binds with one or several other proteins in order to perform different biological functions; the resulting assemblies are called protein complexes.

Protein-protein interaction: This complex transports oxygen from the lungs to cells all over the body through blood circulation. (Source: Catherine Royer, “Protein-Protein Interactions”, Biophysics Textbook Online.)

protein-protein interaction PROTEIN-PROTEIN INTERACTIONS by Catherine Royer Biophysics Textbook Online

Twenty amino acids

Four nucleotides

Network of interactions and complexes • Usually protein-protein interaction data are produced by laboratory experiments (yeast two-hybrid, pull-down assay, etc.). • The results of the experiments are converted to binary interactions. • The binary interactions can be represented as a network/graph where a node represents a protein and an edge represents an interaction. [Figure: detected complex data (bait protein and interacted proteins, labelled A to F) converted to binary interactions by the spoke approach and by the matrix approach]
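The two conversions named in the figure can be sketched as follows, assuming the usual definitions: the spoke approach links the bait to each protein it pulled down, while the matrix approach links every pair of proteins in the detected complex. The function names and the example complex are hypothetical and do not reproduce the slide's data.

```python
from itertools import combinations


def spoke_interactions(bait, preys):
    """Spoke approach: one bait-prey edge per detected prey."""
    return {tuple(sorted((bait, prey))) for prey in preys}


def matrix_interactions(bait, preys):
    """Matrix approach: edges between all pairs of complex members."""
    members = sorted({bait, *preys})
    return set(combinations(members, 2))


# Hypothetical detected complex: bait A pulled down B, C and D.
bait, preys = "A", ["B", "C", "D"]
print(sorted(spoke_interactions(bait, preys)))   # 3 binary interactions
print(sorted(matrix_interactions(bait, preys)))  # 6 binary interactions
```

The matrix approach always yields at least as many binary interactions as the spoke approach for the same complex.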

Network of interactions
List of interactions: AtpB AtpA; AtpG AtpE; AtpA AtpH; AtpB AtpH; AtpG AtpH; AtpE AtpH
Adjacency matrix:
0 0 1 0 1
0 0 0 1 1
1 0 0 0 1
0 1 0 0 1
1 1 1 1 0
[Figure: corresponding network]
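Building the adjacency matrix from the list of interactions can be sketched as below. The row and column order is not stated on the slide; the order used here is an assumption, chosen because it is one ordering that reproduces the matrix shown.

```python
interactions = [
    ("AtpB", "AtpA"), ("AtpG", "AtpE"), ("AtpA", "AtpH"),
    ("AtpB", "AtpH"), ("AtpG", "AtpH"), ("AtpE", "AtpH"),
]
# Assumed node order (not given in the source).
order = ["AtpA", "AtpE", "AtpB", "AtpG", "AtpH"]


def adjacency_matrix(pairs, nodes):
    """Symmetric 0/1 matrix of an undirected graph given as node pairs."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in pairs:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1
    return matrix


for row in adjacency_matrix(interactions, order):
    print(" ".join(map(str, row)))
```

The matrix is symmetric because the interactions are undirected, and the row sums give the node degrees.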

S. cerevisiae: 985 proteins and 899 interactions; the giant component consists of 466 proteins. (Source: A. Wagner, “The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes”, Mol. Biol. Evol., 2001.)

S. cerevisiae: • Average degree ~ 2 • Clustering coefficient = 0.022 • Degree distribution is scale-free. (Source: A. Wagner, “The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes”, Mol. Biol. Evol., 2001.)

E. coli interaction network from DIP (http://dip.mbi.ucla.edu/): 300 proteins and 287 interactions. • The components of this graph have been determined by applying the depth-first search algorithm. • There are 62 components in total. • The giant component consists of 93 proteins.
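Finding the components with depth-first search can be sketched as follows. This is an iterative version with an explicit stack; the function name and the small edge list are hypothetical and are not DIP data.

```python
def connected_components(edges, nodes=()):
    """Return the components of an undirected graph, largest first."""
    adj = {u: set() for u in nodes}  # `nodes` lets isolated proteins be listed
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first traversal from `start`
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


# Hypothetical interactions among proteins A-E, plus an isolated protein F.
example = connected_components([("A", "B"), ("B", "C"), ("D", "E")], nodes=["F"])
print(len(example), "components; giant component has", len(example[0]), "proteins")
```

The first component in the returned list is the giant component.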

E. coli interaction network from DIP (http://dip.mbi.ucla.edu/): • Average degree ~ 1.913 • Clustering coefficient = 0.29 • Degree distribution ~ scale-free
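The average degrees quoted for these networks follow from the node and edge counts, because every edge contributes to the degree of two nodes: average degree = 2E/N. A small check using the counts given on the slides (the function name is illustrative):

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * num_edges / num_nodes


print(round(average_degree(300, 287), 3))     # E. coli (DIP): 1.913
print(round(average_degree(985, 899), 3))     # S. cerevisiae (Wagner): 1.825, i.e. ~ 2
print(round(average_degree(4546, 12319), 2))  # MIPS-based network: 5.42
```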

S. cerevisiae: 1870 proteins and 2240 interactions. • Almost all proteins are connected. • Degree distribution is scale-free. (Source: H. Jeong, S. P. Mason, A.-L. Barabási, Z. N. Oltvai, “Lethality and centrality in protein networks”, Nature, May 2001.)

PPI network based on the MIPS database: • 4546 proteins and 12319 interactions • Average degree 5.42 • Clustering coefficient = 0.18 • Giant component consists of 4385 proteins

PPI network based on the MIPS database (4546 proteins, 12319 interactions): degree distribution ~ scale-free

A complete PPI network tends to be a connected graph and tends to have a power-law degree distribution.

Introduction to the KNApSAcK database http://kanaya.aist-nara.ac.jp/KNApSAcK/

FT-MS high accurate MW for metabolites [molecular weight (ppm)] Since 2004 Candidates of Metabolites accurate mass: 226.0477 Molecular formula 597 # of candidates for molecular formula Error level for FT-MS 251 32 1 ±1 ±0.1 ±0.01 ±0.001 (Mw  margin ) C10H10O6 KNApSAcK: Species-metabolite relation DB Chorismic acid Isochorismic acid

Now! Last update information (species, metabolite): 50,048 unique metabolites; 101,500 species-metabolite relations. Since 2004.

Current Status of KNApSAcK project
• Plant kingdom (predicted): 200,000, D. Strack and R. Dixon (2003)
• Known NPs (predicted): 50,000 / plants, Luca and Pierre (2000)
• KNApSAcK (last update): 50,048 unique metabolites; 101,500 species-metabolite relations
• Model species: Arabidopsis thaliana: 5,000 (ca. 1/3 of 1200 protein types); Human: 2,500, Ryals (2004); Bacteria (E. coli, B. subtilis): 800 – 1700
Systematization of species-metabolite relation DB (KNApSAcK)
• Basic study: metabolomics (systems biology); evolution of NPs; gene-to-metabolite relations
• Applied works: food sciences; health creation; herbal medicine; drug development by herb

Main window http://kanaya.naist.jp/KNApSAcK/ We can retrieve metabolite information by: (a) Name (Organism, Metabolite); (b) Mw ± margin; (c) Molecular formula; (d) Taxonomic hierarchy; Substructure; (e) Ion mass of FT-MS with ionization mode. Other window elements: (g) A list of retrieved metabolites; (h) Mode selection.

Metabolites can be linked to KNApSAcK easily by Keywords (Organism, Metabolite, Molecular Formula)

(+)-Sesamin is reported in 122 species

Input: Allium cepa. Result: 38 metabolites.

KNApSAcK (http://kanaya.naist.jp/KNApSAcK/) (since 2004)
Papers that utilized the KNApSAcK DB to examine metabolomics (Thanks!)
2009 (6 papers):
• Davey, M.P., et al., Metabolomics (2009)
• Hounsome, N., et al., Postharvest Biol. Technol. (2009)
• Xie, Z., et al., J. Exp. Botany, 60, 87-97 (2009)
• Giavalisco, Anal. Chem. (2009)
• Draper et al., BMC Bioinformatics (2009)
• Shroff et al., PNAS (2009)
2008 (17 papers):
• Malitsky, S., et al., Plant Physiol. (2008)
• Warner, E., et al., J. Chromatography B (2008)
• Fait, A., et al., Plant Physiol., 148, 730-750 (2008)
• Mintz-Oron, S., et al., Plant Physiol., 147, 823-851 (2008)
• Hanhineva, K., et al., Phytochemistry, 69, 2463-2481 (2008)
• Bottcher, C., et al., Plant Physiol., 147, 2107-2120 (2008)
• Farder, A., et al., J. Nutrition, 138, 1282-1287 (2008)
• Mintz-Oron, S., et al., Plant Physiol., 147, 823-825 (2008)
• Overy, D.P., et al., Nature Protocols, 3, 471-485 (2008)
• Dunn, W.B., Physical Biol., 5, 1-24 (2008)
• Akiyama, K., In Silico Biol., 8, 27 (2008)
• Sawada, Plant Cell Physiol. (2008)
• Arita, M. and Suwa, K., BioData Mining, 1, 7.1-8 (2008)
• Saito, K., et al., Trends in Plant Sci., 13, 36-43 (2008)
• Akiyama, K., et al., In Silico Biol., 8, 339-345 (2008)
• Takahashi, H., Anal. Bioanal. Chem. (in press) (2008)
• Iijima, Y., et al., Plant J., 54, 949-962 (2008)
2007 (10 papers):
• Want, E.J., et al., J. Proteome Res., 6, 459-468 (2007)
• Sofia, M., et al., Trends in Anal. Chem., 26, 855-866 (2007)
• Hummel, J., et al., Topics in Curr. Genet., 18, 75-95 (2007)
• Gaida, A. and Neumann, S., J. Int. Bioinf. (2007)
• Griffiths, W.J., Metabolomics, Metabonomics and Metabolite Profiling (Royal Soc. Chem.) (2007)
• Ohta, D., et al., Anal. Biol. Chem. (2007)
• Nakamura, Y., et al., Planta (2007)
• Suzuki, H., et al., Phytochemistry (2007)
• Sakakibara, K., et al., J. Biol. Chem., 282, 14932-14941 (2007)
• Saito, K., et al., Trends in Plant Sci., 13, 36-42 (2007)
2006 (4 papers):
• Kikuchi, K. and Kakeya, H., Nature Chem. Biol., 2, 392-394 (2006)
• Oikawa, A., et al., Plant Physiol., 142, 398-413 (2006)
• Shinbo, Y., et al., Biotechnol. Agric. Forestry, 57, 166-181 (2006)
• Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101 (2006)
Web sites linked to KNApSAcK:
• (WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases
• (UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/
• (KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack
• (TAIR-Metabolomics Resource – Databases) http://www.arabidopsis.org/portals/metabolome/metabolome_database.jsp/
• (LECO manual) Form No. 203-821-333
• (PubMed) referred by C-ID http://metabolomics.jp/wiki/
[Figure: bar charts of the papers by target of research (Metabolomics, Non-targeted Analysis, Review Article, Bioinformatics, Methodology Development) and by organism (Arabidopsis thaliana, Fragaria x ananassa, Solanum lycopersicum, Brassica oleracea, Curcuma longa, E. coli, Rattus norvegicus)]


Recent information about the research works that used/introduced the KNApSAcK database
