Idea of Co-Clustering

Idea of Co-Clustering
• Co-clustering combines the row clustering and the column clustering of a co-occurrence matrix so that each bootstraps the other.
• The rows X and the columns Y of the co-occurrence matrix are clustered simultaneously.

Hierarchical Co-Clustering Based on Entropy Splitting
• View the (scaled) co-occurrence matrix as a joint probability distribution between the row and column random variables.
• Objective: find a hierarchical co-clustering with a given number of clusters while preserving as much mutual information between the row and column clusters as possible.
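To make the objective concrete, here is a minimal Python sketch (not the authors' code). It scales a count matrix into a joint distribution, sums that distribution over row and column clusters, and computes mutual information in bits. The function names, the toy count matrix, and the cluster labels are illustrative assumptions.

```python
import math

def joint_distribution(counts):
    """Scale a co-occurrence count matrix so that its entries sum to 1."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def aggregate(p, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Sum joint probabilities inside each (row cluster, column cluster) block."""
    q = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            q[row_labels[i]][col_labels[j]] += v
    return q

def mutual_information(p):
    """I(X;Y) in bits for a joint distribution given as a nested list."""
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]
    mi = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v > 0:  # zero cells contribute nothing
                mi += v * math.log2(v / (px[i] * py[j]))
    return mi

# Toy example (hypothetical data): two clean 2x2 blocks on the diagonal.
counts = [[5, 5, 0, 0],
          [5, 5, 0, 0],
          [0, 0, 5, 5],
          [0, 0, 5, 5]]
p = joint_distribution(counts)
q = aggregate(p, row_labels=[0, 0, 1, 1], col_labels=[0, 0, 1, 1],
              n_row_clusters=2, n_col_clusters=2)
mi_full = mutual_information(p)
mi_clustered = mutual_information(q)
```

In this toy case the co-clustering that follows the block structure loses no mutual information: `mi_full` and `mi_clustered` both come out at 1 bit, up to floating-point rounding. Merging rows or columns across blocks would lower the clustered value.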

Hierarchical Co-Clustering Based on Entropy Splitting
[Figure: co-occurrence matrices and the joint probability distribution between the row and column cluster random variables; values shown on the slide: 0, 0.4691, 0.7751]

Hierarchical Co-Clustering Based on Entropy Splitting
Pipeline (recursive splitting):
While (Termination condition)
    Find optimal row/column cluster split which achieves maximal
    Update cluster indicators
Termination Condition:

Hierarchical Co-Clustering Based on Entropy Splitting
How to find an optimal split at each step? An entropy-based splitting algorithm, which converges to a local optimum:
Input: cluster S
Randomly split cluster S into S1 and S2
Repeat
    For every element x in S, re-assign it to cluster S1 or S2 to minimize:
    Update cluster indicators and probability values
Until convergence
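The quantity minimized in the re-assignment step is not preserved in this transcript. The sketch below therefore substitutes an assumed criterion: each row x moves to the side whose conditional distribution p(Y | S_k) is closer, in KL divergence, to p(Y | x). This is a common choice in information-theoretic clustering, but it is an assumption here, not necessarily the authors' formula. All function names are hypothetical.

```python
import math
import random

def kl_divergence(p, q):
    """D(p || q) in bits; infinite if q has no mass where p has some."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log2(a / b)
    return total

def normalize(v):
    """Rescale a non-negative vector to sum to 1 (assumes a positive sum)."""
    s = sum(v)
    return [x / s for x in v]

def cluster_profile(joint, members):
    """p(Y | cluster): column sums over the member rows, normalized."""
    n_cols = len(joint[0])
    return normalize([sum(joint[i][j] for i in members) for j in range(n_cols)])

def random_split(members, rng):
    """Random initial split into two non-empty parts (needs >= 2 members)."""
    members = list(members)
    rng.shuffle(members)
    cut = rng.randint(1, len(members) - 1)
    return members[:cut], members[cut:]

def split_cluster(joint, s1, s2, max_sweeps=100):
    """Refine an initial split (s1, s2) of row indices until no row moves."""
    side = {x: 0 for x in s1}
    side.update({x: 1 for x in s2})
    for _ in range(max_sweeps):
        groups = [[x for x in side if side[x] == c] for c in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # one side emptied; stop instead of dividing by zero
        # Assumed criterion: profiles are fixed during a sweep, then updated.
        profiles = [cluster_profile(joint, g) for g in groups]
        moved = False
        for x in side:
            px = normalize(joint[x])
            cur = side[x]
            # Move only on strict improvement so that ties cannot oscillate.
            if kl_divergence(px, profiles[1 - cur]) < kl_divergence(px, profiles[cur]):
                side[x] = 1 - cur
                moved = True
        if not moved:
            break
    return (sorted(x for x in side if side[x] == 0),
            sorted(x for x in side if side[x] == 1))
```

A call such as `split_cluster(joint, *random_split(range(len(joint)), random.Random(0)))` runs one split from a random start. Like the procedure on the slide, the sketch only reaches a local optimum, so the result depends on the initial split.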

Hierarchical Co-Clustering Based on Entropy Splitting
• Example: S = {X1, X2, X3, X4}
• The naïve method must try all 7 possible splits; the number of splits grows exponentially with the size of S.
• Randomly split: S1 = {X1}, S2 = {X2, X3, X4}
• Re-assign X4 to S1: S1 = {X1, X4}, S2 = {X2, X3}
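The count of 7 follows from the number of ways to split n elements into two non-empty, unordered parts, which is 2^(n-1) - 1. The short Python sketch below enumerates those splits to illustrate the exponential growth; the helper name `two_way_splits` is an illustrative assumption.

```python
from itertools import combinations

def two_way_splits(items):
    """Yield every split of distinct `items` into two non-empty, unordered parts."""
    items = list(items)
    first, rest = items[0], items[1:]
    # Keep the first element in S1 so that each unordered split appears once.
    # `extra` may have at most len(rest) - 1 elements so that S2 is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            s1 = [first, *extra]
            s2 = [x for x in rest if x not in extra]
            yield s1, s2

splits = list(two_way_splits(["X1", "X2", "X3", "X4"]))
```

For the four-element cluster on the slide, `len(splits)` is 7. A cluster of 30 elements already has 2^29 - 1 candidate splits, which is why an iterative re-assignment scheme is used instead of exhaustive search.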

Experiments
• Data sets
• Synthetic data
• 20 Newsgroups data: 20 classes, 20000 documents

Results-Synthetic Data
• (a) 1000 × 1000 matrix with hierarchical structure
• (b) Add noise to (a) by flipping values with probability 0.3
• Randomly permute rows and columns of (b)
• Clustering result
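The construction of the synthetic data can be sketched roughly as follows. This is an assumption-laden illustration: the slide does not say how the structured matrix was built, so a flat block-diagonal binary matrix stands in for the hierarchical one, and all function names are hypothetical. Only the flip probability of 0.3 and the row and column permutation come from the slide.

```python
import random

def block_matrix(n, blocks):
    """n x n binary matrix with ones on `blocks` diagonal blocks (stand-in structure)."""
    size = n // blocks
    return [[1 if i // size == j // size else 0 for j in range(n)]
            for i in range(n)]

def flip_noise(matrix, prob, rng):
    """Flip each binary entry independently with probability `prob`."""
    return [[1 - v if rng.random() < prob else v for v in row] for row in matrix]

def permute(matrix, rng):
    """Shuffle rows and columns; also return the permutations that were used."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    rng.shuffle(rows)
    rng.shuffle(cols)
    shuffled = [[matrix[i][j] for j in cols] for i in rows]
    return shuffled, rows, cols

def make_synthetic(n=1000, blocks=4, prob=0.3, seed=0):
    """Clean matrix, its noisy version, and the permuted noisy version."""
    rng = random.Random(seed)
    clean = block_matrix(n, blocks)
    noisy = flip_noise(clean, prob, rng)
    shuffled, rows, cols = permute(noisy, rng)
    return clean, noisy, shuffled, rows, cols
```

Keeping the permutations makes it possible to compare a recovered clustering against the known block membership of each row and column.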

Results-20 Newsgroups Data
• Micro-averaged precision: M/N, where M is the number of documents correctly clustered and N is the total number of documents.
• Compare with baselines:
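The slide does not state how a document counts as "correctly clustered". A minimal Python sketch of M/N under one common convention follows: a document is correct when its true class is the majority class of its cluster. That convention and the function name are assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict

def micro_averaged_precision(cluster_ids, true_labels):
    """M / N, where M counts documents whose class is their cluster's majority class."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster[cluster][label] += 1
    # Assumed convention: each cluster is credited with its most frequent class.
    correct = sum(max(counts.values()) for counts in by_cluster.values())
    return correct / len(true_labels)

# Hypothetical toy input: six documents in two clusters.
score = micro_averaged_precision([0, 0, 0, 1, 1, 1],
                                 ["a", "a", "b", "b", "b", "a"])
```

Here each cluster contains two documents of its majority class, so M = 4 and N = 6.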

Thank You! Questions?
