Efficient Semi-supervised Spectral Co-clustering with Constraints
Xiaoxiao Shi, Wei Fan, Philip S. Yu
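Motivation
• Co-clustering with constraints
• Document-word co-clustering
[Figure: bipartite graph linking the documents Doc 1 (ICNP), Doc 2 (ICDM), Doc 3 (AAAI), Doc 4 (KDD) to the words "Co-clustering", "Network", "Clustering". Two places are marked "C". Caption in the figure: "How to use?"]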
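Motivation
• Co-clustering with constraints
• Author-conference co-clustering
[Figure: bipartite graph linking the authors John, Mary, Jack, Cathy, Tom to the conference editions ICDM 07, ICDM 08, ICDM 09, AAAI 08, AAAI 09. Labels in the figure: "Collaborators" (twice), "ICDM", "AAAI". Caption in the figure: "How to use?"]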
Straightforward solution I: transform the constraints into edges and solve a global graph partition problem
[Figure: keyword-conference co-clustering, two panels over the same nodes. Conferences: ICDM, KDD, AAAI, ICNP. Keywords: "Co-clustering", "Clustering", "Network". Two cuts are marked, Cut I and Cut II.]
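A minimal sketch of how solution I could be set up, assuming the constraints are must-link pairs on each side of the bipartite graph and that every constraint becomes one extra edge. The function name, the `weight` parameter, and the toy matrix are hypothetical and do not come from the slides.

```python
import numpy as np

def add_constraint_edges(A, row_must_links, col_must_links, weight=1.0):
    """Embed a bipartite graph and its must-link constraints in one general graph.

    A is the (rows x columns) bipartite weight matrix. Each must-link pair
    becomes an extra edge of the given (assumed) weight between two nodes
    on the same side of the graph.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    W = np.zeros((m + n, m + n))
    W[:m, m:] = A                     # original row-to-column edges
    W[m:, :m] = A.T
    for i, j in row_must_links:       # constraint edges among row nodes
        W[i, j] = W[j, i] = weight
    for i, j in col_must_links:       # constraint edges among column nodes
        W[m + i, m + j] = W[m + j, m + i] = weight
    return W

# Toy example: 3 row nodes, 2 column nodes, one constraint on each side.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = add_constraint_edges(A, row_must_links=[(0, 1)], col_must_links=[(0, 1)], weight=2.0)
print(W.shape)
```

The result is no longer bipartite, so it has to be handed to a general graph partitioner working on the full (rows + columns) square matrix.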
Straightforward solution II: transform the constraints into nodes and solve a bipartite graph partition problem in a larger graph
[Figure: two panels over the same nodes, each with an added pseudo node. Conferences: ICDM, KDD, AAAI, ICNP. Keywords: "Co-clustering", "Clustering", "Network". Two cuts are marked, Cut I and Cut II.]
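A minimal sketch of how solution II could be set up, assuming each must-link pair of row nodes is represented by one pseudo column node connected to both rows. The function name, the `weight` parameter, and the toy matrix are hypothetical; constraints on column nodes would be handled the same way with pseudo rows.

```python
import numpy as np

def add_pseudo_nodes(A, row_must_links, weight=1.0):
    """Append one pseudo column node per must-link pair of row nodes.

    The pseudo node is connected to both constrained rows, so separating
    the pair cuts at least one extra edge of the given (assumed) weight.
    The graph stays bipartite but gains one column per constraint.
    """
    A = np.asarray(A, dtype=float)
    m, _ = A.shape
    P = np.zeros((m, len(row_must_links)))
    for k, (i, j) in enumerate(row_must_links):
        P[i, k] = P[j, k] = weight
    return np.hstack([A, P])

# Toy example: 3 row nodes, 2 column nodes, one must-link between rows 0 and 1.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A_big = add_pseudo_nodes(A, row_must_links=[(0, 1)], weight=2.0)
print(A_big.shape)
```

Because the enlarged matrix is still bipartite, an ordinary co-clustering routine can be run on it unchanged, at the cost of a larger problem.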
Problems of the two straightforward solutions
• Not efficient
  • More edges are added and more nodes are included
  • 10 to 80 times slower than the original co-clustering without constraints
• Not effective
  • The graph becomes more complicated, so its optimal partition is harder to find
  • In some cases, the Normalized Mutual Information drops by 30% compared with the original co-clustering without constraints
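For readers unfamiliar with the quality measure mentioned above, here is a small sketch of Normalized Mutual Information between two labelings. It assumes the common normalization by the geometric mean of the two entropies; the slides do not say which variant the authors used.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized Mutual Information, normalized by sqrt(H(a) * H(b)) (assumed variant)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    # Mutual information: sum over joint cells of p(a,b) * log(p(a,b) / (p(a) p(b))).
    mi = sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in count_ab.items())
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # convention chosen here when one labeling has a single cluster
    return mi / math.sqrt(h_a * h_b)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, unrelated)
```

The score does not depend on how clusters are named, which is why it is a convenient way to compare a clustering against ground-truth classes.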
Formulate the problem as an optimization problem
• The solution can be obtained directly from the left and right eigenvectors of the following matrix (more details in Theorem 2 of the paper):
[Equation not recovered from the slide. Its annotations read: "Minimize the number of inter-group edges", "Maximize the number of satisfied constraints", "Graph Laplacian".]
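The constrained matrix from Theorem 2 is not available in this transcript, so it is not reproduced here. As a point of reference only, the sketch below shows the unconstrained baseline that the deck compares against, spectral co-clustering in the style of Dhillon (2001), where the partition is read off the second left and right singular vectors of the degree-normalized matrix. Splitting at zero instead of running k-means is a simplification, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def spectral_cocluster_2way(A):
    """Unconstrained two-way spectral co-clustering of a nonnegative matrix A.

    Assumes every row and every column of A has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)                    # row degrees
    d2 = A.sum(axis=0)                    # column degrees
    An = A / np.sqrt(np.outer(d1, d2))    # D1^{-1/2} A D2^{-1/2}
    U, _, Vt = np.linalg.svd(An)
    # The first singular pair is trivial; the second pair carries the partition.
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)
    # Simplification: threshold at zero rather than clustering z with k-means.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)

# Toy matrix with two dense blocks joined by a weak link.
A = np.array([[4, 4, 0, 0],
              [4, 4, 1, 0],
              [0, 1, 4, 4],
              [0, 0, 4, 4]])
row_labels, col_labels = spectral_cocluster_2way(A)
print(row_labels, col_labels)
```

Rows and columns are labeled together, so rows 0 and 1 land in the same co-cluster as columns 0 and 1. Which co-cluster is called 0 and which is called 1 depends on the sign returned by the SVD routine.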
Experiments • Document-word co-clustering
Experiments • Graph-pattern co-clustering
Conclusions
• In many co-clustering applications, some prior knowledge exists about the relationships among rows and columns
• Problem: how to use this knowledge (constraints) to find better co-clusters?
• Two straightforward solutions
  • Model the constraints as linkages
  • Model the constraints as additional pseudo nodes
  • Problem: not efficient; not effective
• Proposed method: model the problem as an optimization problem and solve it with the selected eigenvectors
Related Work
• Traditional co-clustering without constraints
  • Information-based co-clustering
    • Information-theoretic co-clustering (Dhillon et al., 2003)
  • Partition-based co-clustering
    • Spectral co-clustering (Dhillon, 2001)
• Previous constraint-based co-clustering models
  • Co-clustering with row constraints (Chen et al., 2008)
  • Co-clustering with order-based constraints (Pensa et al., 2008); suitable for a specific type of constraint and not comparable with the proposed model
• Straightforward modifications of traditional co-clustering models to use constraints
  • Link-based constraint co-clustering
  • Node-based constraint co-clustering