Modularity in Biological Networks
Traditional view of modularity: Modularity in Cellular Networks • Hypothesis: Biological functions are carried out by discrete functional modules. • Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999. • Question: Is modularity a myth, or a structural property of biological networks? (Are biological networks fundamentally modular?)
Definition of a module • Loosely linked island of densely connected nodes • Groups of co-expressed genes
Computational analysis of modular structures: data clustering approach
Concept of data clustering analysis • Partitioning a data set into groups so that points in the same group are similar to each other and as different as possible from points in other groups. • The validity of a clustering is often in the eye of the beholder.
Concept of data clustering analysis • To decide whether two data points are similar, we need to define a similarity measure. • We also need a score function that expresses our objective. • A clustering algorithm then partitions the data set so that the score function is optimized.
Types of clustering algorithms • Partition-based clustering algorithms • Hierarchical clustering algorithms • Probabilistic model-based clustering algorithms
Partitioning problem • Given a network with the set of n nodes D={x(1),x(2),∙∙∙,x(n)}, our task is to find K clusters C={C1,C2,∙∙∙,CK} such that each node x(i) is assigned to exactly one cluster Ck and the score function S(C1,C2,∙∙∙,CK) is optimized.
Community structure of a biological network (figure: a network divided into Community 1, Community 2 and Community 3)
Score function for network clustering • Make the intra-group connections as numerous as possible and the inter-group connections as few as possible.
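As a minimal sketch of the score-function idea, assuming a partition is given as a dictionary from node to cluster label (the names `edge_counts` and `partition_score` are hypothetical), the code below counts intra-group and inter-group edges and uses their difference as a naive score:

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_counts(edges: Iterable[Edge], labels: Dict[Hashable, int]) -> Tuple[int, int]:
    """Return (intra, inter): edges inside one cluster vs. edges between clusters."""
    intra = inter = 0
    for u, v in edges:
        if labels[u] == labels[v]:
            intra += 1
        else:
            inter += 1
    return intra, inter


def partition_score(edges: Iterable[Edge], labels: Dict[Hashable, int]) -> int:
    """Hypothetical score: reward intra-group edges, penalize inter-group edges."""
    intra, inter = edge_counts(edges, labels)
    return intra - inter


# Two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
mixed = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
print(partition_score(edges, good), partition_score(edges, mixed))  # 5 -3
```

Because every edge becomes intra-group when all nodes share one cluster, this naive score is maximized by the trivial one-cluster partition unless K is fixed; the modularity Q later in the deck avoids this by comparing against a random expectation.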
Adjacency matrix • Aij = 1 if the ith protein interacts with the jth protein • Aij = 0 otherwise • Aij = Aji (undirected graph) • A is sparse: most of its elements are zero
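A minimal sketch of building A from an edge list, assuming nodes are numbered 0..n-1 (function names are hypothetical). The dense form mirrors the definition; the neighbour-set form reflects the sparsity. For the yeast network used later in these slides (11,855 interactions among 2,617 proteins), roughly 2 × 11,855 / 2,617² ≈ 0.35% of the entries would be nonzero, assuming no duplicate pairs or self-interactions.

```python
from typing import List, Sequence, Set, Tuple


def adjacency_matrix(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Dense 0/1 adjacency matrix of an undirected graph with nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph: keep A symmetric
    return A


def adjacency_sets(n: int, edges: Sequence[Tuple[int, int]]) -> List[Set[int]]:
    """Sparse alternative: store only the neighbours of each node."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


A = adjacency_matrix(3, [(0, 1), (1, 2)])
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```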
Algorithm (spectral analysis) • Start from a random vector X=(X1,X2,…,Xn) • Iterate X(k+1)=AX(k), normalizing at each step, until it converges to an eigenvector • Repeat with a new vector perpendicular to the eigenspace found so far
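The iteration above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions: A is symmetric, the eigenvalues sought have distinct magnitudes, and all helper names are hypothetical. Each new start vector is made perpendicular to the eigenvectors already found (Gram-Schmidt), and because power iteration converges to the eigenvalue of largest magnitude, it returns negative eigenvalues as well as positive ones.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def normalize(x: Vector) -> Vector:
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        raise ValueError("vector collapsed to zero")
    return [v / norm for v in x]


def orthogonalize(x: Vector, basis: List[Vector]) -> Vector:
    # Gram-Schmidt: remove the components along eigenvectors already found.
    for b in basis:
        c = dot(x, b)
        x = [xi - c * bi for xi, bi in zip(x, b)]
    return x


def leading_eigenpairs(A: Matrix, k: int, iters: int = 10000,
                       tol: float = 1e-12, seed: int = 0) -> List[Tuple[float, Vector]]:
    """Return k (eigenvalue, unit eigenvector) pairs, largest |eigenvalue| first.

    Assumes A is symmetric and these k eigenvalues have distinct magnitudes;
    otherwise (e.g. +l and -l in a bipartite graph) the iteration can oscillate.
    """
    rng = random.Random(seed)
    basis: List[Vector] = []
    pairs: List[Tuple[float, Vector]] = []
    for _ in range(k):
        # Random start, made perpendicular to the eigenspace found so far.
        x = normalize(orthogonalize([rng.uniform(-1.0, 1.0) for _ in A], basis))
        for _ in range(iters):
            # X(k+1) = A X(k), re-orthogonalized to suppress round-off drift.
            y = normalize(orthogonalize(matvec(A, x), basis))
            # A negative eigenvalue flips the sign every step: compare up to sign.
            step = min(math.dist(x, y), math.dist(x, [-v for v in y]))
            x = y
            if step < tol:
                break
        lam = dot(x, matvec(A, x))  # Rayleigh quotient; x has unit length
        pairs.append((lam, x))
        basis.append(x)
    return pairs


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
for lam, vec in leading_eigenpairs(A, 2):
    print(round(lam, 3))
```

For large sparse networks a library eigensolver (for example a Lanczos-based one) is the practical choice; this sketch only mirrors the three steps on the slide.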
Topological structure (figure: the original network and its hidden topological structure)
An example: protein-protein interaction network of Saccharomyces cerevisiae
Data source • A confidence value is assigned to each of 80,000 interactions among 5,400 yeast proteins. • We take the 11,855 interactions with high and medium confidence among 2,617 proteins, including 353 proteins of unknown function.
Quasi-clique (positive eigenvalue) and quasi-bipartite (negative eigenvalue) structures
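One way to see why the two structures sit at opposite ends of the spectrum: for a symmetric A, the largest eigenvalue is at least the Rayleigh quotient xᵀAx / xᵀx of any nonzero vector x, and the smallest eigenvalue is at most it. A +1 indicator on a dense group makes the quotient large and positive, while +1/-1 on the two sides of a bipartite group makes it large and negative. The sketch below (hypothetical helper, toy graphs) only illustrates this; it is not the procedure used to extract the quasi-cliques.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def rayleigh_quotient(edges: List[Tuple[int, int]], x: Dict[int, float]) -> float:
    """Compute x^T A x / x^T x for an undirected graph given as an edge list.

    Nodes absent from x are treated as having value 0.
    """
    # Each undirected edge appears twice in A (A_ij and A_ji), hence the factor 2.
    num = 2.0 * sum(x.get(i, 0.0) * x.get(j, 0.0) for i, j in edges)
    den = sum(v * v for v in x.values())
    return num / den


# Clique on 4 nodes, indicator vector: all entries +1.
clique = list(combinations(range(4), 2))
print(rayleigh_quotient(clique, {i: 1.0 for i in range(4)}))  # 3.0

# Complete bipartite graph K_{2,3}: +1 on one side, -1 on the other.
left, right = [0, 1], [2, 3, 4]
bipartite = [(i, j) for i in left for j in right]
signs = {**{i: 1.0 for i in left}, **{j: -1.0 for j in right}}
print(rayleigh_quotient(bipartite, signs))  # -2.4
```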
Spectral analysis yields 48 quasi-cliques and 6 quasi-bipartites. • A single quasi-clique can contain annotated proteins alongside unannotated proteins and proteins of unknown function.
Hierarchical clustering algorithm • A similarity (distance) measure d(i,j) between nodes i and j • The similarity measure turns the network into a weighted network with weights Wij.
Types of hierarchical clustering • Agglomerative hierarchical clustering • Divisive hierarchical clustering
Properties of similarity measure • d(i,j)≥0 • d(i,j)=d(j,i) • d(i,j)≤d(i,k)+d(k,j)
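The three properties can be checked directly on a small distance matrix. The function below is a hypothetical brute-force helper (O(n³)), meant as a sanity check when a similarity score is converted into a distance, since not every such conversion satisfies the triangle inequality.

```python
from itertools import product
from typing import Sequence


def is_metric(d: Sequence[Sequence[float]], eps: float = 1e-12) -> bool:
    """Check d(i,j) >= 0, d(i,j) = d(j,i) and d(i,j) <= d(i,k) + d(k,j)."""
    n = len(d)
    for i, j in product(range(n), repeat=2):
        if d[i][j] < -eps or abs(d[i][j] - d[j][i]) > eps:
            return False
    for i, j, k in product(range(n), repeat=3):
        if d[i][j] > d[i][k] + d[k][j] + eps:
            return False
    return True


print(is_metric([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))  # True
print(is_metric([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # False: 5 > 1 + 1
```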
Similarity measure for agglomerative clustering • Correlation • Shortest path length • Edge betweenness
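Of the measures listed, shortest path length is the simplest to compute on an unweighted network: a breadth-first search from one node gives its hop distance to every reachable node. A minimal sketch (function name hypothetical); nodes in other connected components get no entry, so a finite stand-in distance would have to be chosen for them.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def shortest_path_lengths(edges: Iterable[Tuple[Hashable, Hashable]],
                          source: Hashable) -> Dict[Hashable, int]:
    """Hop distance from `source` to every node reachable in an unweighted graph."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:  # first visit in BFS = shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


print(shortest_path_lengths([(0, 1), (1, 2), (2, 3), (4, 5)], 0))  # {0: 0, 1: 1, 2: 2, 3: 3}
```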
Hierarchical tree (dendrogram), cut at a threshold
Distance between clusters: single link (the minimum distance between a node of Cluster 1 and a node of Cluster 2)
Distance between clusters: complete link (the maximum distance between a node of Cluster 1 and a node of Cluster 2)
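The two linkage rules can be written as reductions over all cross-cluster pairs: single link takes the minimum pairwise distance, complete link the maximum. A minimal sketch, where the distance `d` is any callable (here a hypothetical toy distance between points on a line):

```python
from typing import Callable, Hashable, Sequence

Distance = Callable[[Hashable, Hashable], float]


def single_link(c1: Sequence[Hashable], c2: Sequence[Hashable], d: Distance) -> float:
    """Distance between the closest pair of members, one from each cluster."""
    return min(d(a, b) for a in c1 for b in c2)


def complete_link(c1: Sequence[Hashable], c2: Sequence[Hashable], d: Distance) -> float:
    """Distance between the farthest pair of members, one from each cluster."""
    return max(d(a, b) for a in c1 for b in c2)


def line_distance(a: float, b: float) -> float:
    return abs(a - b)  # toy distance: points on a line


print(single_link([0.0, 1.0], [3.0, 5.0], line_distance))    # 2.0
print(complete_link([0.0, 1.0], [3.0, 5.0], line_distance))  # 5.0
```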
Single link: example dendrogram of points x1 to x5 (figure; levels 1.5, 2.0, 2.2 and 3.5)
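Putting the pieces together, a dendrogram is built by repeatedly merging the two closest clusters; the merge heights are the levels drawn in the tree, and stopping at a threshold cuts it. The naive sketch below (hypothetical function name) uses toy points on a line, not the x1 to x5 example from the figure, whose underlying distances are not given.

```python
from typing import List, Sequence, Tuple

Merge = Tuple[List[int], List[int], float]


def agglomerate(d: Sequence[Sequence[float]], linkage: str = "single",
                threshold: float = float("inf")) -> Tuple[List[List[int]], List[Merge]]:
    """Naive agglomerative clustering on a distance matrix d (nodes 0..n-1).

    Repeatedly merges the two closest clusters under the chosen linkage and
    stops once the closest pair is farther apart than `threshold`, which
    corresponds to cutting the dendrogram at that height.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    combine = min if linkage == "single" else max
    clusters = [[i] for i in range(len(d))]
    merges: List[Merge] = []  # (cluster a, cluster b, merge height)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                h = combine(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        if h > threshold:
            break
        merges.append((sorted(clusters[a]), sorted(clusters[b]), h))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters], merges


xs = [0.0, 1.0, 3.0, 7.0]  # toy 1-D points, not data from the slides
d = [[abs(p - q) for q in xs] for p in xs]
print([h for _, _, h in agglomerate(d, "single")[1]])    # [1.0, 2.0, 4.0]
print([h for _, _, h in agglomerate(d, "complete")[1]])  # [1.0, 3.0, 7.0]
print(agglomerate(d, "single", threshold=2.5)[0])        # [[0, 1, 2], [3]]
```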
Divisive hierarchical clustering (M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004))
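Newman and Girvan's divisive method repeatedly removes the edge with the highest edge betweenness (the number of shortest paths running through it), recomputing betweenness after each removal. The sketch below implements only the first split for an unweighted graph without self-loops; helper names are hypothetical, node labels are assumed sortable for tidy output, and the full method keeps dividing to record the whole hierarchy. Betweenness is accumulated with a Brandes-style BFS from every node.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def edge_betweenness(adj: Graph) -> Dict[FrozenSet[Hashable], float]:
    """Shortest-path edge betweenness of an unweighted undirected graph."""
    eb: Dict[FrozenSet[Hashable], float] = {}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {s: 1.0}
        preds: Dict[Hashable, List[Hashable]] = {s: []}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, splitting each node's dependency
        # over its incoming shortest-path edges.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    # Every unordered pair of endpoints was counted once from each side.
    return {e: b / 2.0 for e, b in eb.items()}


def components(adj: Graph) -> List[List[Hashable]]:
    seen: Set[Hashable] = set()
    comps = []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def girvan_newman_split(adj: Graph) -> List[List[Hashable]]:
    """Remove highest-betweenness edges, recomputing each time, until the graph splits."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = max(eb, key=eb.get)  # edge with the highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge (2, 3).
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(edge_betweenness(g)[frozenset((2, 3))])  # 9.0
print(girvan_newman_split(g))                   # [[0, 1, 2], [3, 4, 5]]
```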
Quantitative measure of network modularity: modularity Q
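For a partition of an undirected network with m edges, Q can be written as Q = Σ_c [ L_c/m - (d_c/2m)² ], where L_c is the number of edges inside community c and d_c is the total degree of its nodes: the observed fraction of intra-community edges minus the fraction expected if edges were placed at random with the same degrees. A minimal sketch (function name hypothetical), assuming a simple graph given as an edge list:

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]], labels: Dict[Hashable, int]) -> float:
    """Q = sum over communities c of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    intra: Dict[int, int] = defaultdict(int)   # L_c: edges with both ends in c
    degree: Dict[int, int] = defaultdict(int)  # d_c: summed degree of nodes in c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2.0 * m)) ** 2 for c in degree)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # 5/14, about 0.357
print(modularity(edges, {n: 0 for n in range(6)}))               # 0.0
```

The natural two-triangle split gives a positive Q, while lumping every node into one community gives Q = 0.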
Can we identify the modules? • J(i,j): the number of nodes to which both i and j are linked, plus 1 if there is a direct link between i and j
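J(i,j) is easy to compute from neighbour sets. Ravasz et al. normalize it by min(k_i, k_j), where k_i is the degree of node i, to obtain a topological overlap between 0 and 1; the sketch below follows that normalization, but treat the helper names, and any later use of the overlap (for instance 1 minus the overlap as a distance for hierarchical clustering), as assumptions. It also assumes every node has at least one link.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def overlap_count(adj: Graph, i: Hashable, j: Hashable) -> int:
    """J(i,j): number of common neighbours, plus 1 if i and j are directly linked."""
    return len(adj[i] & adj[j]) + (1 if j in adj[i] else 0)


def topological_overlap(adj: Graph, i: Hashable, j: Hashable) -> float:
    """J(i,j) / min(k_i, k_j); assumes both nodes have degree >= 1."""
    return overlap_count(adj, i, j) / min(len(adj[i]), len(adj[j]))


# Two triangles joined by the bridge (2, 3).
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(overlap_count(g, 0, 1), topological_overlap(g, 0, 1))  # 2 1.0
print(overlap_count(g, 2, 3), topological_overlap(g, 2, 3))  # 1 0.333...
```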
Modules in E. coli metabolism (E. Ravasz et al., Science, 2002): pyrimidine metabolism
Yeast signaling proteins in MIPS (PNAS, vol. 100, p. 1128, 2003)
Spotted microarray for Saccharomyces cerevisiae: similarity measure
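A common similarity measure for expression data is the Pearson correlation between two genes' profiles across arrays. A minimal sketch, assuming each gene is a list of expression values (for example log ratios from a two-channel spotted array) measured under the same conditions with no missing values; helper names are hypothetical. Converting r to sqrt(2(1 - r)) gives a distance proportional to the Euclidean distance between the standardized profiles, so it satisfies the triangle inequality, which 1 - r alone does not in general.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """sqrt(2 (1 - r)): proportional to the Euclidean distance of z-scored profiles."""
    # max() guards against tiny negative values from round-off when r is close to 1.
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))


print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))               # 1.0
print(correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # 2.0
```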