Idea of Co-Clustering

  1. Idea of Co-Clustering • Co-clustering: combine the row clustering and the column clustering of a co-occurrence matrix so that the two bootstrap each other. • Simultaneously cluster the rows X and columns Y of the co-occurrence matrix.
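As a minimal illustration (the matrix and labels below are hypothetical, not from the slides), a co-clustering is just a pair of label vectors, one over the rows and one over the columns, learned jointly rather than independently:

```python
import numpy as np

# A toy 4x5 co-occurrence matrix, e.g. documents (rows) x words (columns).
C = np.array([[5, 4, 0, 0, 1],
              [6, 3, 1, 0, 0],
              [0, 0, 7, 5, 4],
              [1, 0, 6, 6, 5]])

# A co-clustering assigns every row of X and every column of Y to a cluster.
row_labels = np.array([0, 0, 1, 1])       # clusters of the rows X
col_labels = np.array([0, 0, 1, 1, 1])    # clusters of the columns Y
```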

  2. Hierarchical Co-Clustering Based on Entropy Splitting • View the (scaled) co-occurrence matrix as a joint probability distribution between the row and column random variables. • Objective: find a hierarchical co-clustering with a given number of clusters while preserving as much mutual information between the row and column clusters as possible.
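The mutual information being preserved is the standard quantity between the row-cluster and column-cluster variables; written out in LaTeX:

```latex
I(\hat{X};\hat{Y}) \;=\; \sum_{\hat{x}} \sum_{\hat{y}} p(\hat{x},\hat{y}) \,\log \frac{p(\hat{x},\hat{y})}{p(\hat{x})\,p(\hat{y})}
```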

  3. Hierarchical Co-Clustering Based on Entropy Splitting • [Figure: example co-occurrence matrices and the joint probability distributions between the row- and column-cluster random variables, with mutual information values 0, 0.4691, and 0.7751.]
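A minimal sketch of this view (the matrix values are illustrative, not the slide's): scale the raw counts so they sum to 1, then compute I(X;Y) directly from the joint distribution and its marginals.

```python
import numpy as np

def mutual_information(P):
    """Mutual information I(X;Y) in bits of a joint distribution P (2-D array summing to 1)."""
    px = P.sum(axis=1, keepdims=True)   # row marginals p(x)
    py = P.sum(axis=0, keepdims=True)   # column marginals p(y)
    mask = P > 0                        # zero cells contribute nothing; avoid log(0)
    return float(np.sum(P[mask] * np.log2(P[mask] / (px @ py)[mask])))

# Scale a raw co-occurrence matrix into a joint distribution, then measure I(X;Y).
C = np.array([[10, 0, 2],
              [8,  1, 0],
              [0,  9, 7]], dtype=float)
P = C / C.sum()
print(mutual_information(P))
```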

  4. Hierarchical Co-Clustering Based on Entropy Splitting • Pipeline (recursive splitting; sketched in code below): • While the termination condition is not met: find the row- or column-cluster split that retains maximal mutual information between the row and column clusters, then update the cluster indicators. • Termination condition: the given number of clusters has been reached.
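A sketch of the pipeline follows. The slide's exact split-selection formula is an image lost from this transcript, so as a labeled simplification this version splits the largest remaining cluster on each side; `split(P, cluster, axis)` stands for the entropy-based splitting routine of the next slide and is assumed, not given here.

```python
def hierarchical_co_cluster(P, k_rows, k_cols, split):
    """Sketch of the recursive-splitting pipeline (slide 4).
    `split(P, cluster, axis)` must return the two halves of `cluster`;
    a sketch of such a routine appears after slide 5."""
    row_clusters = [set(range(P.shape[0]))]      # start: one cluster of all rows
    col_clusters = [set(range(P.shape[1]))]      # and one cluster of all columns
    # Termination condition: the requested numbers of clusters are reached.
    while len(row_clusters) < k_rows or len(col_clusters) < k_cols:
        for k, clusters, axis in ((k_rows, row_clusters, 0),
                                  (k_cols, col_clusters, 1)):
            if len(clusters) < k:
                c = max(clusters, key=len)           # pick a cluster to split
                clusters.remove(c)
                clusters.extend(split(P, c, axis))   # update cluster indicators
    return row_clusters, col_clusters
```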

  5. Hierarchical Co-Clustering Based on Entropy Splitting • How do we find an optimal split at each step? • An entropy-based splitting algorithm (converges to a local optimum; sketched below): • Input: cluster S. • Randomly split S into S1 and S2. • Repeat: for each element x in S, re-assign x to S1 or S2 so as to minimize the splitting objective, then update the cluster indicators and probability values. • Until convergence.
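The slide's minimization objective is also an image that did not survive the transcript, so the sketch below substitutes a standard stand-in from information-theoretic co-clustering: each member is re-assigned to the half whose aggregate conditional distribution is nearest in KL divergence. Like the original, it stops when no re-assignment changes and reaches only a local optimum.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q), smoothed so empty cells do not blow up."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log2(p / q)))

def entropy_split(P, cluster, axis=0, seed=0):
    """Sketch of slide 5's splitting step for one row cluster (axis=0)
    or column cluster (axis=1) of the joint distribution P."""
    M = P if axis == 0 else P.T
    members = list(cluster)
    # Random initial bipartition S -> S1, S2.
    labels = np.random.default_rng(seed).integers(2, size=len(members))
    while True:
        protos = []
        for side in (0, 1):                       # prototype distribution of each half
            mass = M[[m for m, l in zip(members, labels) if l == side]].sum(axis=0)
            protos.append(mass / max(mass.sum(), 1e-12))
        # Re-assign every element to the half whose prototype is closest.
        new = np.array([min((0, 1),
                            key=lambda s: kl(M[m] / max(M[m].sum(), 1e-12), protos[s]))
                        for m in members])
        if np.array_equal(new, labels):           # converged: assignments are stable
            return ({m for m, l in zip(members, labels) if l == 0},
                    {m for m, l in zip(members, labels) if l == 1})
        labels = new
```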

  6. Hierarchical Co-Clustering Based on Entropy Splitting • Example: S = {X1, X2, X3, X4}. A naïve method must try all 7 possible splits, a number exponential in the size of S. • The algorithm instead starts from a random split S1 = {X1}, S2 = {X2, X3, X4}, then re-assigns X4 to S1, giving S1 = {X1, X4}, S2 = {X2, X3}.
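To make the "7 splits" concrete, this tiny enumeration (element names as on the slide) counts the unordered bipartitions the naïve method would have to score; for |S| elements there are 2^(|S|-1) - 1 of them.

```python
from itertools import combinations

S = {"X1", "X2", "X3", "X4"}
# All unordered splits of S into two non-empty halves.
halves = [set(c) for r in range(1, len(S)) for c in combinations(sorted(S), r)]
splits = {frozenset({frozenset(h), frozenset(S - h)}) for h in halves}
print(len(splits))   # 7 for |S| = 4; the count grows exponentially with |S|
```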

  7. Experiments • Data sets: • Synthetic data. • 20 Newsgroups data: 20 classes, 20,000 documents.

  8. Results: Synthetic Data • [Figure: a 1000×1000 matrix (a); noise added to (a) by flipping values with probability 0.3 (b); the rows and columns of (b) randomly permuted; and the clustering result, which recovers the hierarchical structure.]

  9. Results: 20 Newsgroups Data • Micro-averaged precision = M/N, where M is the number of documents correctly clustered and N is the total number of documents. • Compared with baselines:
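The metric itself is a one-liner; the counts below are hypothetical, for illustration only (the slide's baseline table is not in this transcript).

```python
def micro_averaged_precision(num_correct, num_total):
    """Micro-averaged precision M/N from the slide: M correctly clustered
    documents out of N total documents."""
    return num_correct / num_total

# Hypothetical counts, not results from the slides.
print(micro_averaged_precision(15000, 20000))   # 0.75
```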

  10. Thank You! Questions?
