Hybrid model of protein interaction network: Modularity and the family constraint. Doochul Kim (Seoul National University, Korea), based on K.-I. Goh, B. Kahng and D. Kim (q-bio.MN/0312009 v2) and Goh’s talk at Statphys 22. Academia Sinica, Taipei, 2004.09.15.
Introduction
• Most real-world networks are modular. How does the modularity emerge dynamically?
• Vertices can be grouped according to their common characteristics: vertex families.
• In some systems, the vertex families can be defined explicitly.
• Families themselves form a network, which may also evolve in time [cf. the social network models with a priori defined communities (Jin et al., PRE 2001; Watts et al., Science 2002)].
Family constraint
• Internet: Domains/ASes (families), Hosts/Routers (vertices)
• Society: Social parties (families), Individuals (vertices)
• Cell: Protein family (families), Proteins (vertices)
Yeast protein network
Topological properties of the yeast protein-protein interaction network [Jeong et al., Nature 2001; Maslov & Sneppen, Science 2002].
Yeast protein network
• From K.-I. Goh, B. Kahng and D. Kim, “Graphical analysis of biocomplex networks and transport phenomena,” book chapter in “Power Laws, Scale-free Networks and Genome Biology,” eds. E. Koonin, Y. Wolf and G. Karev (Landes Bioscience, 2004).
Yeast protein network
Re-analysis with an integrated data set:
a. DIP + MIPS + BIND
b. Ho et al., Nature ’02 (MassSpec)
c. Uetz et al., Nature ’00 + Ito et al., PNAS ’01 + Tong et al., Science ’02 (yeast two-hybrid, Y2H)
i) Scale-free: $p_d(k) \sim (k + k_0)^{-\gamma}$
ii) Modular clustering: $C = 0.13$ ($C_{\rm random} \approx 0.02$)
iii) Disassortative mixing: $k_{nn}(k) \sim k^{-\nu}$
Network correlations
• Local clustering coefficient: $C_i = \dfrac{\text{number of edges between the neighbors of } i}{k_i (k_i - 1)/2}$
• 2-node correlations: average neighbor degree function $k_{nn}(k)$. Positive correlation: assortative; no correlation: neutral; negative correlation: disassortative.
• 3-node correlations (clustering): local clustering function $C(k)$. Cases shown: modular clustering, hierarchical clustering, no clustering.
[Figure: schematic plots of $k_{nn}(k)$ and $C(k)$ versus $k$.]
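The two correlation measures above can be made concrete with a small sketch. It assumes the network is stored as a dictionary mapping each vertex to the set of its neighbors, and it takes $k_{nn}(k)$ to be the usual average neighbor degree of vertices with degree $k$ (the slide's own formula did not survive). The function names `local_clustering` and `degree_profiles` are illustrative only.

```python
from collections import defaultdict

def local_clustering(adj):
    """C_i = (edges among the neighbors of i) / (k_i (k_i - 1) / 2)."""
    c = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[i] = 0.0  # convention: vertices with k < 2 get C_i = 0
            continue
        # Each edge among the neighbors is seen from both ends, hence // 2.
        links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
        c[i] = links / (k * (k - 1) / 2)
    return c

def degree_profiles(adj):
    """Return (knn, ck): both averaged over all vertices of degree k."""
    c = local_clustering(adj)
    knn_sum = defaultdict(float)
    c_sum = defaultdict(float)
    count = defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated vertices have no neighbors to average over
        knn_sum[k] += sum(len(adj[j]) for j in nbrs) / k
        c_sum[k] += c[i]
        count[k] += 1
    knn = {k: knn_sum[k] / count[k] for k in count}
    ck = {k: c_sum[k] / count[k] for k in count}
    return knn, ck

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
knn, ck = degree_profiles(example)
```

A decreasing `knn` signals disassortative mixing, as reported for the yeast network; the shape of `ck` distinguishes the clustering cases listed on the slide.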
Yeast protein network
Protein interactions
• Physical interactions between proteins mostly occur on a structural basis (key-and-lock, induced fit, etc.).
• Protein structure is well conserved during evolution, so proteins can be classified into families on that basis.
How do the high clustering and the strong modular organization arise? Family compatibility constraint.
Yeast protein network
The protein family network is also scale-free [Park et al., J Mol Biol 2001].
Yeast protein network
The protein family size distribution follows a power law [Huynen & van Nimwegen, PNAS 1998].
Model
Evolution by gene duplication and divergence (DD) [Ohno, 1970], incorporated in previous models:
- Protein interaction network evolution [Solé et al., Adv Compl Sys 2002; Vázquez et al., ComplexUs 2003].
- Domain occurrence frequency distribution [Qian et al., J Mol Biol 2001; Karev et al., BMC Evol Biol 2002].
Protein family compatibility: an interaction between two proteins is possible only when the families they belong to are “compatible.”
- Families connected in the family network are compatible.
Model
Basic scenario: duplication + divergence + mutation + family constraint
[Figure: protein family network and protein interaction network; mutation with rate 1; divergence with probability d.]
Proteins can interact only with those in compatible families; e.g., the pink protein CANNOT interact with the black one.
Model (details)
Model: Stage 0
• $N_0 = 3$ proteins in the beginning.
• They interact with one another, forming a complete network of size $N_0$: $k_i(t=0) = 2$ for $i = 1, 2, 3$.
• Each protein constitutes a protein family: $N_f(t=0) = 1$.
• Each family contains a single domain: $D_f(t=0) = 1$.
[Figure at t = 0. Legend: protein, protein family, protein interaction, protein family link.]
Model (details)
Model: Stage 1
• With rate a, a randomly chosen protein is duplicated.
• Each inherited interaction is removed with probability d.
• The new protein establishes a new protein family.
• The initial protein family links are determined by the protein interactions.
• If no interaction is left, the new protein belongs to the original family.
[Figure labels: duplication with rate a; removed with probability d.]
Model (details)
Model: Stage 1-2
• With rate 1, a randomly chosen protein mutates.
• The mutating protein i gains a new interaction with another protein l, previously not linked to it, chosen with the probability [formula missing in source],
• where $F_l$ and $F_i$ (the families of l and i) set the constraint on family compatibility.
• $D_{F_i}$ is increased by 1.
[Figure label: mutation with rate 1.]
Model (details)
Model: Stage 2 (family map is fixed)
• With rate a, a randomly chosen protein is duplicated; the duplicate becomes a member of the original family. It again diverges with probability d. (a and d are the same as before, to keep the model minimal.)
• With rate 1, a randomly chosen protein mutates under the same constraint as in Stage 1.
Model
Parameters for the simulation:
• The duplication rate a = 0.8 and the divergence ratio d = 0.7.
  - d is fixed to accommodate the high level of “sequence diversity” within a family.
  - a is tuned to match the empirical average degree $\langle k \rangle \approx 6.4$.
• Family creation lasts until N = 1000, when the number of families becomes about 500.
• Evolution lasts until $N \approx 6000$, the size of the yeast proteome.
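The stages and parameters above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' code: the attachment probability for mutations is missing from the slides, so the partner is drawn uniformly among compatible, not yet linked proteins; proteins of the same family are assumed compatible with each other; the three initial families are assumed mutually linked; and domain counts are not tracked. The name `simulate` and its arguments are hypothetical.

```python
import random

def simulate(n_final, n_family_stop, a=0.8, d=0.7, seed=0):
    """Grow a network to n_final proteins; return (adj, family, fam_links)."""
    rng = random.Random(seed)
    # Stage 0: three mutually interacting proteins, one family each.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    family = {0: 0, 1: 1, 2: 2}
    # ASSUMPTION: the three initial families are mutually linked.
    fam_links = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}

    def compatible(i, j):
        # ASSUMPTION: members of the same family are compatible as well.
        fi, fj = family[i], family[j]
        return fi == fj or fj in fam_links[fi]

    while len(adj) < n_final:
        n = len(adj)  # proteins are labeled 0..n-1; the next label is n
        if rng.random() < a / (1.0 + a):
            # Duplication (rate a), then divergence: each inherited
            # interaction is removed with probability d.
            parent = rng.randrange(n)
            kept = {j for j in sorted(adj[parent]) if rng.random() >= d}
            adj[n] = kept
            for j in kept:
                adj[j].add(n)
            if n < n_family_stop and kept:
                # Stage 1: the duplicate founds a new family whose links
                # follow its surviving protein interactions.
                f = len(fam_links)
                family[n] = f
                fam_links[f] = {family[j] for j in kept}
                for g in fam_links[f]:
                    fam_links[g].add(f)
            else:
                # Stage 2, or no interaction left: stay in the parent family.
                family[n] = family[parent]
        else:
            # Mutation (rate 1): gain one new compatible interaction.
            # ASSUMPTION: uniform choice among the allowed partners.
            i = rng.randrange(n)
            cands = [j for j in range(n)
                     if j != i and j not in adj[i] and compatible(i, j)]
            if cands:
                j = rng.choice(cands)
                adj[i].add(j)
                adj[j].add(i)
    return adj, family, fam_links

# Small run; the slide's setting corresponds to simulate(6000, 1000).
adj, family, fam_links = simulate(200, 50, seed=1)
```

Duplication and mutation are chosen with probabilities a/(1+a) and 1/(1+a), which reproduces the rate ratio a : 1. The mutation step scans all proteins, so the full-size run is quadratic in N and slow but feasible.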
Results
Snapshot at $N = 600$, $N_F = 62$.
Results
Protein interaction network: $p_d(k) \sim (k + k_0)^{-\gamma}$, $C \sim 0.1$, $k_{nn}(k) \sim k^{-\nu}$.
[Figure legend: yeast data; model with family constraint; model without family constraint.]
Results
Degree-correlation profile (panels: model, yeast)
$z = \log_{10}[P(k,k')/P_{\rm rand}(k,k')]$
$P(k,k')$: probability that a randomly chosen link connects proteins with degrees $k$ and $k'$.
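The profile $z$ can be computed from an edge list as in the sketch below. The slide does not say how $P_{\rm rand}$ was obtained; as an assumption, this sketch uses the uncorrelated product $q(k)\,q(k')$, where $q(k)$ is the fraction of link ends attached to degree-$k$ proteins. A null model built from degree-preserving rewiring would give somewhat different values. The function name is illustrative.

```python
import math
from collections import Counter, defaultdict

def degree_correlation_profile(edges):
    """Return {(k, k'): z} with z = log10[P(k,k') / (q(k) q(k'))]."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Symmetric joint distribution: each link counts once per direction.
    joint = defaultdict(float)
    w = 1.0 / (2 * len(edges))
    for u, v in edges:
        joint[(deg[u], deg[v])] += w
        joint[(deg[v], deg[u])] += w
    # Marginal q(k): fraction of link ends sitting on degree-k vertices.
    q = defaultdict(float)
    for (k, _), p in joint.items():
        q[k] += p
    # ASSUMPTION: P_rand(k, k') = q(k) q(k') (uncorrelated null model).
    return {kk: math.log10(p / (q[kk[0]] * q[kk[1]]))
            for kk, p in joint.items()}

# Star with three leaves: every link joins degree 3 to degree 1.
z = degree_correlation_profile([(0, 1), (0, 2), (0, 3)])
```

Positive $z$ at a pair $(k, k')$ means such links are over-represented relative to the null model, negative $z$ that they are suppressed. Only degree pairs that occur in the data appear in the result.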
Results: Clustering of clustering
[Panels: a = 0.8, d = 0.7; yeast.]
$z = \log_{10}[P(c,c')/P_{\rm rand}(c,c')]$
Results Results: Statistics
Results
Modular clustering is preserved under edge shuffling that conserves the family constraint.
[Panels: model network before shuffling; model network shuffled with family constraint; model network shuffled without family constraint.]
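One way to realize such a shuffling test is sketched below. The slides do not specify the shuffling procedure; this sketch assumes a degree-preserving double-edge swap, and, for the constrained variant, accepts a swap only if both new links join compatible families (same family, or families linked in `fam_links`). All names are hypothetical.

```python
import random

def shuffle_edges(edges, family, fam_links=None, n_swaps=1000, seed=0):
    """Degree-preserving edge swaps; pass fam_links to enforce the constraint."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}

    def allowed(u, v):
        # Reject self-loops and links that already exist.
        if u == v or frozenset((u, v)) in present:
            return False
        if fam_links is None:
            return True  # unconstrained shuffling
        fu, fv = family[u], family[v]
        return fu == fv or fv in fam_links.get(fu, ())

    for _ in range(n_swaps):
        i = rng.randrange(len(edges))
        j = rng.randrange(len(edges))
        if i == j:
            continue
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second link
        # Proposed rewiring: (a, b), (c, d) -> (a, d), (c, b).
        if allowed(a, d) and allowed(c, b):
            present.discard(frozenset((a, b)))
            present.discard(frozenset((c, d)))
            present.add(frozenset((a, d)))
            present.add(frozenset((c, b)))
            edges[i], edges[j] = (a, d), (c, b)
    return edges

# Two families with no link between them: cross-family swaps are refused.
fam = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
links = {0: set(), 1: set()}
original = [(0, 1), (1, 2), (3, 4), (4, 5)]
constrained = shuffle_edges(original, fam, links, n_swaps=200)
free = shuffle_edges(original, fam, None, n_swaps=200)
```

Comparing the clustering of the original network with that of the `constrained` and `free` outputs mirrors the three panels: every swap keeps each protein's degree, so any difference in clustering comes from the family constraint alone.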
Results
Protein family network: $p(x) \sim (x + x_0)^{-a}$
Summary
• The “family constraint” is introduced as a mechanism behind the emergence of modularity in evolving networks.
• We applied it to the protein-protein interaction network and the protein family network of yeast, and achieved detailed agreement with the empirical data in the topological properties.
• Our results suggest that the physical constraint encoded in the domain structure within proteins is crucial in the organization of protein interaction networks.
• The concept can be applied to other systems, e.g., the Internet (domains/hosts) and social networks (social parties/individuals).