About Me Swaroop Butala • MSCS – graduating in Dec 09 • Specialization: Systems and Databases • Interests: • Learning new technologies • Application of technology to financial sectors
Co-clustering Documents and Words using Bipartite Spectral Graph Partitioning Author: Inderjit S. Dhillon Department of Computer Sciences University of Texas, Austin Presented by: Swaroop Butala, Fall 2008
Clustering and Current Solutions(1) Clustering: • Groups a collection of objects so that similar objects end up together • Supports future navigation and searches
Clustering and Current Solutions(2) • Current Solutions • K-means • Fuzzy C-means • Hierarchical clustering • Document Clustering • Word Clustering
Document Clustering • Problem • Vector Space Model • Extract unique content-bearing words • Word-by-document matrix • Existing Solutions: • K-means algorithm • Self-organizing maps • Computationally prohibitive
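A minimal sketch of the vector space model step, assuming whitespace tokenization and a tiny illustrative stop-word list (the names `STOP_WORDS` and `word_by_document_matrix` are hypothetical, not from the paper). Rows are words, columns are documents, and each entry is a raw term count:

```python
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}

def word_by_document_matrix(docs):
    """Return (vocab, A) where A[i][j] counts how often vocab[i] occurs in docs[j]."""
    counts = [
        Counter(w for w in doc.lower().split() if w not in STOP_WORDS)
        for doc in docs
    ]
    vocab = sorted(set().union(*counts))  # unique content-bearing words
    A = [[c[w] for c in counts] for w in vocab]
    return vocab, A

docs = [
    "the graph cut of a graph",
    "stock market and stock price",
    "graph partitioning",
]
vocab, A = word_by_document_matrix(docs)
for word, row in zip(vocab, A):
    print(f"{word:13s} {row}")
```

Raw counts are used here for simplicity; other weightings (for example tf-idf) fit the same matrix layout.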
Word Clustering • Words are clustered on the basis of the documents in which they co-occur • Words that typically appear together in documents should be associated with similar concepts • Uses • Automatic classification of documents
Co-clustering Documents and Words • Novel Idea • Duality of word and document clustering • Use of Bipartite Graphs • The clustering problem can now be posed as a partitioning problem • Solution: • Spectral Co-Clustering algorithm
Bipartite Graphs(1) • Edges connect words to the documents that contain them • No edges between two words or between two documents
Bipartite Graphs(2) Adjacency Matrix:
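The slide's figure is not available here. One standard way to write the adjacency matrix, assuming the vertices are ordered with all words first and all documents second, and writing A for the word-by-document matrix (the symbols M and A are notation assumed for this note):

```latex
M = \begin{bmatrix} 0 & A \\ A^{T} & 0 \end{bmatrix}
```

The two zero blocks reflect the absence of word-word and document-document edges.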
The Partitioning Problem • Find minimum-cut vertex partitions in bipartite graphs • Finding the optimal solution is NP-complete • Heuristics such as the Kernighan-Lin (KL) and Fiduccia-Mattheyses (FM) algorithms exist • Spectral algorithm gives a good global solution • Better solutions than the KL and FM algorithms, which can get stuck in local minima
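To make the "cut" concrete, here is a small sketch that sums the weights of word-document edges crossing a given partition. The function name `cut_weight` and the label lists are hypothetical, and the matrix is a made-up toy example:

```python
def cut_weight(A, word_side, doc_side):
    """Sum the weights of edges whose word and document lie in different parts.

    A[i][j] is the edge weight between word i and document j;
    word_side[i] and doc_side[j] are part labels (e.g. 0 or 1).
    """
    total = 0
    for i, row in enumerate(A):
        for j, w in enumerate(row):
            if word_side[i] != doc_side[j]:
                total += w
    return total

# Toy word-by-document matrix: 3 words, 2 documents.
A_demo = [
    [2, 0],
    [1, 1],
    [0, 3],
]
good_cut = cut_weight(A_demo, word_side=[0, 0, 1], doc_side=[0, 1])
bad_cut = cut_weight(A_demo, word_side=[1, 0, 0], doc_side=[0, 1])
print(good_cut, bad_cut)
```

A partitioning algorithm searches over such label assignments for one with a small cut, subject to a balance condition on the parts.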
Graph Partitioning • Goal: find equally sized vertex subsets such that the cut is minimum • The discrete problem is NP-complete, so it is relaxed to real values • Eigenvectors serve as approximations to the optimal partition vectors • The Bipartitioning Algorithm
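A rough sketch of a spectral bipartitioning routine in the spirit of the algorithm named above, assuming NumPy is available: scale the word-by-document matrix by the square roots of its row and column sums, take the second left and right singular vectors, and split the resulting one-dimensional embedding into two groups. The function name, the simple 1-D 2-means loop, and the toy matrix are assumptions made for illustration, and the code assumes no word or document has an all-zero row or column:

```python
import numpy as np

def spectral_bipartition(A, n_iter=20):
    """Return (word_labels, doc_labels), each an array of 0/1 part labels."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # word degrees (row sums), assumed nonzero
    d2 = A.sum(axis=0)  # document degrees (column sums), assumed nonzero

    # Normalized matrix An = D1^{-1/2} A D2^{-1/2}
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

    # Singular values come back in descending order; index 1 is the second.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    u2, v2 = U[:, 1], Vt[1, :]

    # One-dimensional embedding of words followed by documents.
    z = np.concatenate([u2 / np.sqrt(d1), v2 / np.sqrt(d2)])

    # Simple 1-D 2-means, initialized at the extremes of z.
    centers = np.array([z.min(), z.max()])
    for _ in range(n_iter):
        labels = (np.abs(z - centers[0]) > np.abs(z - centers[1])).astype(int)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = z[labels == k].mean()

    m = A.shape[0]
    return labels[:m], labels[m:]

# Toy matrix: two word/document groups joined by one weak edge.
A_toy = [
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
word_labels, doc_labels = spectral_bipartition(A_toy)
print(word_labels, doc_labels)
```

Which group receives label 0 or 1 depends on the sign of the singular vectors, so only the grouping itself is meaningful. Words and documents that share a label form one co-cluster.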
Conclusions • A novel idea of co-clustering words and documents together is proposed • A real-valued relaxation of the optimal partitioning problem is provided • Algorithm works well on real examples
Critique • Actual motivation for combining document and word clustering is not stated • The solution is not guaranteed to be optimal, since the partitioning problem is NP-complete