Clustering Categorical Data: An Approach Based on Dynamical Systems (1998) • David Gibson, Jon Kleinberg, Prabhakar Raghavan • VLDB Journal: Very Large Data Bases • Aaron Sherman
Presentation • What is this presentation about? • Definitions and Algorithms • Evaluations with Generated Data • Real-World Test • Conclusions + Q&A
Categorize this! • Categorizing ints is easy, but what about categorical values like “red,” “blue,” “august,” and “Moorthy”? • STIRR – Sieving Through Iterated Relational Reinforcement
Why is STIRR Better? • No a Priori Quantization • Correlation vs. Categorical Similarity • New Methods for Hypergraph Clustering
Definitions • Table of relational data – a set T of tuples • Set of k fields (columns) – each with many possible values • Abstract node – one node for each possible value of each field • Tuple τ ∈ T – consists of one node from each field • Configuration – an assignment of a weight w_v to each node v • N(w) – normalization function – rescales all weights so their squares sum to 1 • Dynamical system – repeated application of an update function f • Fixed point – a configuration u where f(u) = u
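In symbols, the normalization rescales the weight vector to unit length, and a fixed point is a configuration the update f leaves unchanged:

```latex
N(w)_v = \frac{w_v}{\sqrt{\sum_u w_u^2}}, \qquad f(u) = u \quad \text{(fixed point)}
```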
Weighting Scheme • To update the weight w_v: • For each tuple τ = {v, u_1, …, u_{k−1}} containing v: • x_τ ← ⊕(u_1, …, u_{k−1}) • w_v ← Σ_τ x_τ • Finally normalize: w ← N(w), giving the new configuration f(w)
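A minimal Python sketch of one such update, assuming each tuple is given as a list of node ids (one per field; ids should be field-qualified so equal values in different fields stay distinct) and the combining operator ⊕ is passed in as a function. The names `normalize` and `stirr_iteration` are mine, not from the paper.

```python
import math

def normalize(weights):
    """N(w): rescale all weights so their squares sum to 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {v: w / norm for v, w in weights.items()}

def stirr_iteration(tuples, weights, combine):
    """One application of f: update every node's weight, then normalize.

    tuples  -- list of tuples, each a list of node ids (one per field)
    weights -- dict mapping node id -> current weight w_v
    combine -- the combining operator, e.g. sum_op or product_op below
    """
    new_weights = {v: 0.0 for v in weights}
    for tup in tuples:
        for i, v in enumerate(tup):
            # x_tau = combination of the weights of the *other* nodes in the tuple
            x = combine([weights[u] for j, u in enumerate(tup) if j != i])
            new_weights[v] += x
    return normalize(new_weights)
```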
Combining Operator Π • Product operator Π: ⊕(w_1, …, w_k) = w_1 · w_2 ⋯ w_k • Non-linear term – encodes co-occurrence strongly • Does not converge • Relatively small number of large basins • Very useful data in early iterations
Combining Operator + • Addition operator +: ⊕(w_1, …, w_k) = w_1 + w_2 + … + w_k • Linear • Does a good job converging
Combining Operator S_p • S_p combining rule: ⊕(w_1, …, w_k) = (w_1^p + w_2^p + … + w_k^p)^{1/p} • Non-linear term – encodes co-occurrence strongly • Does a good job converging
Combining Operator S_∞ • S_∞ – the limiting version of S_p as p → ∞ • Takes the largest value among the weights • Easy to compute, sum-like properties • Converges best of all the options shown
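The four combining rules above as plain Python functions, to plug into `stirr_iteration` from the earlier sketch (for S_p the weights are assumed non-negative, so the fractional power stays real):

```python
def product_op(ws):
    """Pi rule: w1 * w2 * ... * wk -- non-linear, may not converge."""
    out = 1.0
    for w in ws:
        out *= w
    return out

def sum_op(ws):
    """+ rule: w1 + w2 + ... + wk -- linear, converges well."""
    return sum(ws)

def make_sp(p):
    """S_p rule: (w1^p + ... + wk^p)^(1/p); assumes non-negative weights."""
    def sp(ws):
        return sum(w ** p for w in ws) ** (1.0 / p)
    return sp

def max_op(ws):
    """S_infinity rule: the limiting case of S_p -- just the largest weight."""
    return max(ws)
```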
Initial Configuration • Uniform initialization – all weights = 1 • Random initialization – independently choose a random value for each weight, then normalize • Some operators are more sensitive to the initial configuration than others • Masking / modification – a specific rule that sets certain nodes' weights higher or lower
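A sketch of both initializations, reusing `normalize` from the earlier sketch. The slide's random distribution is garbled in this copy, so a standard Gaussian is used here purely as a stand-in assumption:

```python
import random

def uniform_init(nodes):
    """Uniform initialization: every node starts at weight 1, then normalize."""
    return normalize({v: 1.0 for v in nodes})

def random_init(nodes):
    """Random initialization: an independent random draw per node, then
    normalize. The Gaussian here is an assumption, not the paper's choice."""
    return normalize({v: random.gauss(0.0, 1.0) for v in nodes})
```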
Quasi-Random Input • Create semi-random data, then add tuples to the data to plant artificial clusters • Use this to test whether STIRR works • Questions: • Number of iterations needed • Density of the cluster relative to the background
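A hypothetical generator in this spirit: mostly uniform random tuples, plus a planted cluster of values that co-occur at a chosen density. Every name and parameter here is mine, not from the paper.

```python
import random

def quasi_random_table(n_tuples, n_fields, n_values, cluster_values, density):
    """Background tuples pick a uniformly random value per field; with
    probability `density` a tuple instead draws every field from the small
    set `cluster_values`, creating above-average co-occurrence. Node ids
    are (field, value) pairs so equal values in different fields differ."""
    table = []
    for _ in range(n_tuples):
        if random.random() < density:
            table.append([(f, random.choice(cluster_values)) for f in range(n_fields)])
        else:
            table.append([(f, random.randrange(n_values)) for f in range(n_fields)])
    return table
```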
How well does STIRR distill a cluster of nodes with above-average co-occurrence? • Number of iterations • Purity
How well does STIRR separate distinct planted clusters? Will the data partition, and how long does it take? • Separation measure: S(A, B) = (|a_0 − b_0| + |a_1 − b_1|) / (total nodes), for planted clusters A and B, where a_0 is the number of A's nodes at one end of the weight ordering and a_1 the number at the other end (b_0, b_1 likewise for B)
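One plausible reading of this measure in Python, splitting the ranked nodes at the median weight into two "ends" (the paper's exact split may differ):

```python
def separation(weights, cluster_a, cluster_b):
    """S(A, B): sort nodes by weight, split into two halves, and compare
    how many nodes of each planted cluster land at each end. Values near 1
    mean the clusters were driven to opposite ends of the ordering."""
    ranked = sorted(weights, key=weights.get)
    half = len(ranked) // 2
    low, high = set(ranked[:half]), set(ranked[half:])
    a0, a1 = len(low & cluster_a), len(high & cluster_a)
    b0, b1 = len(low & cluster_b), len(high & cluster_b)
    return (abs(a0 - b0) + abs(a1 - b1)) / len(ranked)
```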
How well does STIRR cope with clusters planted in a few columns while the rest are random? • Want it to mask out the irrelevant factors (columns)
Effect of Convergence Operator • The max function (S_∞) is the best • The product rule does not converge • The sum rule is good, but slow
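To make "converges" concrete, a small driver that iterates `stirr_iteration` until the configuration stops moving, i.e. approaches a fixed point (the tolerance and iteration cap are my assumptions):

```python
def run_stirr(tuples, weights, combine, tol=1e-6, max_iters=100):
    """Repeatedly apply f until successive configurations differ by < tol."""
    for _ in range(max_iters):
        new_weights = stirr_iteration(tuples, weights, combine)
        delta = max(abs(new_weights[v] - weights[v]) for v in weights)
        weights = new_weights
        if delta < tol:
            break
    return weights
```

For example, `run_stirr(table, uniform_init(nodes), max_op)` runs the best-converging variant from this slide.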
Real-World Data • Papers on theory and on database systems • Tuples of the form (Author 1, Author 2, Journal, Year) • The two sets of papers were clearly separated in the STIRR representation • Done using S_p • Grouped most theoretical papers around 1976
Login Data from IBM Servers • Masked one user who logged in and out very frequently • The four highest-weight (most similar) users: root, help, and two administrators' names • The 8pm–12am logins were very similar
Conclusion • A powerful technique for clustering categorical data • Relatively fast algorithm – runs in O(n) time • Questions?
Additional References • Periklis Andritsos, “Data Clustering Techniques,” Qualifying Oral Examination Paper • http://www.cs.toronto.edu/~periklis/pubs/depth.pdf