WEIGHTED CONSENSUS CLUSTERING FOR PPI NETWORK
Yi Zhang, Erliang Zeng, Tao Li, Giri Narasimhan (Florida International University)
The 8th International Conference on Machine Learning and Applications (ICMLA), 2009
Introduction
• Proteins are central components of cell machinery and life. Protein-protein interaction (PPI) networks are important sources of information about biological processes and the complex metabolic functions of the cell.

Challenges [1]
• Data quality: each experimental and in silico method used to detect interactions has its own strengths and weaknesses, and all of them are believed to be noisy.
• Partitioning a PPI network with classical graph partitioning or clustering schemes is inherently difficult because of the network's characteristics.
• Some proteins are believed to be multi-functional, so a single hard assignment per protein can be too restrictive.

[Figure: Diagram of the consensus clustering approach [1], extended with weighted consensus clustering (our contribution). Labels: PPI data; similarity matrix (clustering coefficient-based, betweenness-based); base clustering (repeated bisection, direct k-way partitioning, multilevel k-way partitioning, spectral clustering); consensus clustering; weighted consensus clustering (WCC).]

Similarity Matrix
• Similarity measures:
  • Clustering coefficient-based [1];
  • Betweenness-based [1].

Base Clustering
• Repeated bisection;
• Direct k-way partitioning;
• Multilevel k-way partitioning (Metis [2]);
• Spectral clustering [3].

Consensus Clustering Algorithm [5]
• Input: $T$ clusterings $P^1, \dots, P^T$ of the same $n$ proteins.
• Clustering: each $P^t$ assigns every protein to exactly one of its clusters.
• Connectivity matrix: $M^t_{ij} = 1$ if proteins $i$ and $j$ belong to the same cluster in $P^t$, and $M^t_{ij} = 0$ otherwise.
• Objective function:
$$\min_{U} \sum_{t=1}^{T} \| U - M^t \|_F^2 \;=\; \min_{U} \; T \, \| U - \bar{M} \|_F^2 + \text{const},$$
where $U$ is the connectivity matrix of the consensus clustering and $\bar{M} = \frac{1}{T} \sum_{t=1}^{T} M^t$.

Weighted Consensus Clustering [5]
• Weights for partitions: $w = (w_1, \dots, w_T)$ with $w_t \ge 0$ and $\sum_{t} w_t = 1$.
• Weighted connectivity matrix: $\tilde{M} = \sum_{t=1}^{T} w_t M^t$.
• Objective function:
$$\min_{U,\, w} \; \Big\| U - \sum_{t=1}^{T} w_t M^t \Big\|_F^2 .$$

Optimization Procedure [5]
Two-step iteration:
• Solve for $U$ while fixing $w$, using an NMF-based method.
• Solve for $w$ while fixing $U$. The optimization problem becomes
$$\min_{w} \; w^\top A w - 2\, b^\top w \quad \text{s.t. } w_t \ge 0,\ \sum_{t} w_t = 1,$$
which can be solved by quadratic programming. Here $A_{st} = \operatorname{tr}(M^s M^t)$ and $b_t = \operatorname{tr}(U M^t)$.

NMF Formulation [4]
• Symmetric NMF: the consensus connectivity matrix is factorized as $U = H H^\top$, where $H \in \mathbb{R}^{n \times K}$, $H \ge 0$, is the cluster indicator matrix and $K$ is the number of consensus clusters:
$$\min_{H \ge 0} \; \| \tilde{M} - H H^\top \|_F^2 .$$

Soft Clustering [5]
• $H$ is not exactly orthogonal, so a protein can carry weight in several columns of $H$. Normalizing row $i$ of $H$ gives a posterior distribution of protein $i$ over the $K$ clusters.
• A protein whose posterior distribution is concentrated on one cluster is assigned to that cluster only.
• A protein whose posterior distribution is split between two clusters is assigned to both, which accommodates multi-functional proteins.

Evaluation [1, 6, 7, 8]

References
1. S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework for clustering protein-protein interaction networks. Bioinformatics, 23(13):i29–i40, 2007.
2. METIS. http://glaros.dtc.umn.edu/gkhome/views/metis.
3. U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4), August 2007.
4. T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 60–63, 2006.
5. T. Li and C. Ding. Weighted consensus clustering. In Proceedings of the 2008 SIAM International Conference on Data Mining (SDM), 2008.
6. M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69, 026113, 2004.
7. The Gene Ontology Consortium online database. http://db.yeastgenome.org/cgi-bin/GO/goTermFinder.
8. A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2002.
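The clustering coefficient-based similarity listed under Similarity Matrix can be illustrated with the edge clustering coefficient of Radicchi et al. It is defined as $(z_{uv}+1)/\min(k_u-1, k_v-1)$, where $z_{uv}$ is the number of triangles through edge $(u,v)$ and $k_u$ is the degree of $u$. This is a stand-in chosen for illustration, and the exact measure used in [1] may differ. The sketch assumes an undirected network stored as a dict of neighbour sets. The function name and the toy network are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set

Graph = Dict[int, Set[int]]


def edge_clustering_coefficient(adj: Graph) -> Dict[FrozenSet[int], Optional[float]]:
    """Score every edge (u, v) by (z_uv + 1) / min(k_u - 1, k_v - 1).

    z_uv is the number of triangles containing the edge and k_u is the degree of u.
    Edges with a degree-1 endpoint get None because the ratio is undefined there.
    """
    scores: Dict[FrozenSet[int], Optional[float]] = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            edge = frozenset((u, v))
            if edge in scores:  # each undirected edge is visited from both ends
                continue
            z = len(adj[u] & adj[v])  # each common neighbour closes one triangle
            denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
            scores[edge] = (z + 1) / denom if denom > 0 else None
    return scores


# Hypothetical toy network: a triangle 0-1-2 with a pendant protein 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
scores = edge_clustering_coefficient(adj)
print(scores[frozenset((0, 1))])  # 2.0
print(scores[frozenset((2, 3))])  # None
```

Edges inside dense neighbourhoods score high. This is why such a coefficient can serve as a similarity between interacting proteins.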
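For the betweenness-based measure, the edge betweenness of Newman and Girvan [6] counts, for every pair of proteins, the fraction of shortest paths between them that pass through an edge. It can be computed with Brandes' breadth-first accumulation scheme, which the sketch below implements for an unweighted, undirected graph stored as an adjacency dict. How [1] converts betweenness into a similarity is not recoverable here. Any conversion, for example treating high-betweenness edges as weak inter-module links, would be an assumption. The function name is hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List


def edge_betweenness(adj: Dict[Hashable, List[Hashable]]) -> Dict[FrozenSet[Hashable], float]:
    """Unnormalized edge betweenness of an unweighted, undirected graph (Brandes-style)."""
    eb: Dict[FrozenSet[Hashable], float] = {}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and record predecessors.
        dist = {v: -1 for v in adj}
        sigma = {v: 0.0 for v in adj}
        pred: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        dist[s], sigma[s] = 0, 1.0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Walk back from the farthest nodes, splitting each node's dependency over its incoming edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in pred[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    # Every unordered pair of endpoints was counted once from each side.
    return {e: val / 2.0 for e, val in eb.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
eb = edge_betweenness(path)
print(eb[frozenset(("a", "b"))], eb[frozenset(("b", "c"))])  # 2.0 2.0
```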
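The connectivity matrices $M^t$ and a weighted consensus objective are straightforward to compute directly. The sketch below assumes the weighted objective takes the form $\|U - \sum_t w_t M^t\|_F^2$. It builds $M^t$ from hypothetical label vectors and evaluates the objective for a candidate consensus matrix $U$. It uses plain Python lists to stay self-contained, and all function names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def connectivity_matrix(labels: Sequence[int]) -> Matrix:
    """M[i][j] = 1.0 if proteins i and j share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)] for i in range(n)]


def weighted_connectivity(mats: Sequence[Matrix], w: Sequence[float]) -> Matrix:
    """Entry-wise weighted sum: sum_t w_t * M^t."""
    n = len(mats[0])
    return [[sum(wt * m[i][j] for wt, m in zip(w, mats)) for j in range(n)] for i in range(n)]


def wcc_objective(U: Matrix, mats: Sequence[Matrix], w: Sequence[float]) -> float:
    """Squared Frobenius distance ||U - sum_t w_t M^t||_F^2."""
    m_tilde = weighted_connectivity(mats, w)
    n = len(U)
    return sum((U[i][j] - m_tilde[i][j]) ** 2 for i in range(n) for j in range(n))


# Two hypothetical base clusterings of three proteins, given as label vectors.
mats = [connectivity_matrix([0, 0, 1]), connectivity_matrix([0, 1, 1])]
print(wcc_objective(mats[0], mats, [0.5, 0.5]))  # 1.0
print(wcc_objective(mats[0], mats, [1.0, 0.0]))  # 0.0
```

Putting all the weight on the clustering that matches $U$ drives the objective to zero. The weight step exploits exactly this effect.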
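The step that solves for $w$ with $U$ fixed is a quadratic program over the probability simplex. Assume the objective is $\|U - \sum_t w_t M^t\|_F^2$. It then expands to $w^\top A w - 2 b^\top w$ plus a constant, with $A_{st} = \operatorname{tr}(M^s M^t)$ and $b_t = \operatorname{tr}(U M^t)$. Instead of a general QP solver, this sketch uses projected gradient descent with the sort-based Euclidean projection onto the simplex. That solver is swapped in for illustration and is not necessarily what [5] used. The names, the iteration count, and the step-size rule are assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def connectivity_matrix(labels: Sequence[int]) -> Matrix:
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)] for i in range(n)]


def frob_inner(X: Matrix, Y: Matrix) -> float:
    """sum_ij X_ij * Y_ij, which equals tr(X Y) when X and Y are symmetric."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_t >= 0, sum_t w_t = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:  # the last k passing this test fixes the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def update_weights(U: Matrix, mats: Sequence[Matrix], iters: int = 500) -> List[float]:
    """Minimize w^T A w - 2 b^T w over the simplex by projected gradient descent."""
    T = len(mats)
    A = [[frob_inner(mats[s], mats[t]) for t in range(T)] for s in range(T)]
    b = [frob_inner(U, m) for m in mats]
    # The gradient 2(Aw - b) is Lipschitz with constant 2*lambda_max(A) <= 2 * max row sum of |A|.
    lip = 2.0 * max(sum(abs(a) for a in row) for row in A)
    step = 1.0 / lip if lip > 0 else 1.0
    w = [1.0 / T] * T  # start from uniform weights
    for _ in range(iters):
        grad = [2.0 * (sum(A[s][t] * w[t] for t in range(T)) - b[s]) for s in range(T)]
        w = project_to_simplex([w[s] - step * grad[s] for s in range(T)])
    return w


M1, M2 = connectivity_matrix([0, 0, 1]), connectivity_matrix([0, 1, 1])
w = update_weights(U=M1, mats=[M1, M2])
print(w)  # weight concentrates on the first clustering
```

In this usage, $U$ equals the first clustering's connectivity matrix. The weight therefore moves onto that clustering, since it alone reproduces $U$ exactly.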
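For the step that solves for $U = H H^\top$ with $w$ fixed, one commonly used multiplicative update for symmetric NMF is $H \leftarrow H \circ \bigl(\tfrac12 + \tfrac12\,(\tilde M H) \oslash (H H^\top H)\bigr)$. It keeps $H$ nonnegative automatically. It is offered here as an illustration under that assumption and may differ from the exact NMF-based update in [4, 5]. The random initialization, iteration count, and function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def symmetric_nmf(M: Matrix, k: int, iters: int = 1000, seed: int = 0) -> Matrix:
    """Approximate M ~ H H^T with H >= 0 using multiplicative updates (beta = 1/2)."""
    rng = random.Random(seed)
    n = len(M)
    H = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        MH = matmul(M, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        # H_ic <- H_ic * (1/2 + 1/2 * (M H)_ic / (H H^T H)_ic); every factor is >= 1/2,
        # so entries never turn negative.
        H = [[H[i][c] * (0.5 + 0.5 * MH[i][c] / (HHtH[i][c] + eps)) for c in range(k)]
             for i in range(n)]
    return H


# Hypothetical weighted connectivity matrix with two obvious modules {0, 1} and {2, 3}.
M = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
H = symmetric_nmf(M, k=2)
labels = [row.index(max(row)) for row in H]  # hard labels from the largest entry per row
print(labels)  # proteins 0-1 and 2-3 land in different clusters
```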
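Soft assignment from the cluster indicator $H$ can be sketched in two steps. First, normalize each row into a posterior over clusters. Then keep every cluster whose posterior exceeds a cutoff, so that a multi-functional protein can land in more than one cluster. The threshold of 0.3 is a hypothetical value chosen for illustration. The poster's actual rule and its example distributions did not survive extraction.

```python
from typing import List, Sequence


def soft_assign(H: Sequence[Sequence[float]], threshold: float = 0.3) -> List[List[int]]:
    """Return, for each protein, every cluster whose normalized posterior is >= threshold.

    `threshold` is a hypothetical cutoff, not a value taken from the poster.
    """
    assignments = []
    for row in H:
        total = sum(row)
        if total <= 0:
            assignments.append([])  # no evidence for any cluster
            continue
        posterior = [x / total for x in row]
        assignments.append([c for c, p in enumerate(posterior) if p >= threshold])
    return assignments


# One protein dominated by a single cluster, one split between two clusters.
print(soft_assign([[0.9, 0.05, 0.05], [0.5, 0.45, 0.05]]))  # [[0], [0, 1]]
```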
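Reference [6] introduces modularity, one standard way to score a partition of a network: $Q = \frac{1}{2m}\sum_{ij}\bigl[A_{ij} - \frac{k_i k_j}{2m}\bigr]\delta(c_i, c_j)$. The criteria the poster actually reported under Evaluation are not recoverable, so treat this sketch as an illustrative assumption. It is written for an unweighted, undirected graph whose nodes are numbered $0, \dots, n-1$.

```python
from typing import Dict, Sequence, Set


def modularity(adj: Dict[int, Set[int]], labels: Sequence[int]) -> float:
    """Newman-Girvan modularity Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # every edge is counted from both ends
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] != labels[j]:
                continue  # only pairs inside the same cluster contribute
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


# Two disconnected edges, each placed in its own cluster.
adj = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
print(modularity(adj, [0, 0, 1, 1]))  # 0.5
```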