MONIC - Modeling and Monitoring Cluster Transitions. M. Spiliopoulou, I. Ntoutsi, Y. Theodoridis, and R. Schult. Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD, 2006. Presenter: 吳建良
Outline • Motivation • Cluster Model in MONIC • Cluster Transitions in MONIC • Experimental Results
Motivation • Example: data records at four timepoints • Categorize and trace the changes to clusters • Did some clusters disappear? • Were clusters absorbed by others? • When is a cluster still the same? • When does a cluster mutate? • MONIC provides insights into the nature of cluster change across the whole clustering
Cluster Model in MONIC • Data stream application • Assume re-clustering at each timepoint • Any clustering method can be used • Monitor both changes in existing clusters and the emergence of new clusters • Data record • for i≠j • Initial dataset
Data ageing function • Assigns lower weights to older records • The data ageing function assigns a weight to data record x at timepoint ti, for each record x and for each ti • Sliding windows are a special case of this function: the weights of records outside the window are zero
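A minimal Python sketch of two ageing functions, assuming integer timepoints. `sliding_window_age` mirrors the sliding-window case on the slide (weight 1 inside the window, 0 outside). `exponential_age` and its `decay` parameter are a hypothetical alternative shown only for contrast; they are not taken from the slides.

```python
def sliding_window_age(record_time: int, current_time: int, window: int) -> float:
    """Weight 1.0 if the record arrived within the last `window` timepoints, else 0.0."""
    if record_time > current_time:
        return 0.0  # the record has not arrived yet
    return 1.0 if current_time - record_time < window else 0.0


def exponential_age(record_time: int, current_time: int, decay: float = 0.5) -> float:
    """Hypothetical alternative: the weight is multiplied by `decay` per elapsed timepoint."""
    if record_time > current_time:
        return 0.0
    return decay ** (current_time - record_time)


# With a window of size 2, records from the current and previous timepoint count fully.
print(sliding_window_age(3, 4, window=2))  # 1.0
print(sliding_window_age(2, 4, window=2))  # 0.0
print(exponential_age(2, 4))               # 0.25
```

The sliding window yields a hard cut-off: a record either counts fully or not at all. A decaying weight phases old records out gradually.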
Cluster Matching • Cluster overlap • Overlap of X to Y • Cluster match • Y is a match for X in clustering Cj, subject to threshold τ
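The slide's formulas are not recoverable here. The following sketch therefore rests on two assumptions: the overlap of X to Y is the age-weighted fraction of X's records that also belong to Y, with weights taken at the later timepoint; and the match is the cluster with the highest overlap, provided that overlap reaches τ. Clusters are sets of record ids. `overlap`, `match`, and the `age` dictionary are hypothetical names.

```python
from typing import Dict, Hashable, Optional, Set

Cluster = Set[Hashable]


def overlap(x: Cluster, y: Cluster, age: Dict[Hashable, float]) -> float:
    """Age-weighted fraction of the records of X that also belong to Y."""
    total = sum(age.get(r, 0.0) for r in x)
    if total == 0.0:
        return 0.0  # every record of X has aged out
    shared = sum(age.get(r, 0.0) for r in x & y)
    return shared / total


def match(x: Cluster, clustering: Dict[str, Cluster],
          age: Dict[Hashable, float], tau: float) -> Optional[str]:
    """Name of the cluster with the highest overlap to X, or None if it is below tau."""
    best_name, best_ov = None, 0.0
    for name, y in clustering.items():
        ov = overlap(x, y, age)
        if ov > best_ov:
            best_name, best_ov = name, ov
    return best_name if best_ov >= tau else None


ages = {r: 1.0 for r in range(1, 10)}
c_j = {"Y1": {1, 2, 3, 9}, "Y2": {4, 5}}
x = {1, 2, 3, 4}
print(overlap(x, c_j["Y1"], ages))   # 0.75
print(match(x, c_j, ages, tau=0.5))  # Y1
print(match(x, c_j, ages, tau=0.8))  # None
```

Under this definition the overlap is asymmetric, because it is normalised by X alone. Records of X that have aged out by the later timepoint contribute nothing.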
Cluster Transitions in MONIC • External transitions • Survive • Split into multiple clusters
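A sketch of split detection, assuming that X splits when no single new cluster matches it, yet at least two new clusters each receive an overlap of at least τsplit from X and together reach τ. The function `split_or_match` is hypothetical. It takes precomputed overlaps of X with every new cluster. A single match is reported as `"match"`, because whether that counts as survival or absorption depends on the other old clusters.

```python
from typing import Dict, List, Optional, Tuple


def split_or_match(overlaps: Dict[str, float], tau: float,
                   tau_split: float) -> Tuple[Optional[str], List[str]]:
    """Classify an old cluster X from its overlaps with every new cluster Y."""
    best = max(overlaps, key=overlaps.get, default=None)
    if best is not None and overlaps[best] >= tau:
        # A single cluster matches X: survival or absorption, decided across all old clusters.
        return "match", [best]
    parts = sorted(name for name, ov in overlaps.items() if ov >= tau_split)
    if len(parts) >= 2 and sum(overlaps[n] for n in parts) >= tau:
        return "split", parts
    return None, []


print(split_or_match({"Y1": 0.8, "Y2": 0.2}, tau=0.5, tau_split=0.1))
# ('match', ['Y1'])
print(split_or_match({"Y1": 0.3, "Y2": 0.35, "Y3": 0.05}, tau=0.5, tau_split=0.1))
# ('split', ['Y1', 'Y2'])
```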
Cluster Transitions in MONIC contd. • Absorb • Disappear: none of the above cases holds for X • Emerge
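A sketch that labels every old cluster and finds emerging new clusters. It rests on several assumptions. X survives if its match Y matches no other old cluster. X is absorbed if several old clusters share the same match. X disappears if it neither matches nor splits. A new cluster emerges if no old cluster leads to it. The function `external_transitions` and its inputs are hypothetical. They take the output of a matching step and a split step as given.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def external_transitions(
    matches: Dict[str, Optional[str]],
    splits: Dict[str, List[str]],
    new_clusters: Set[str],
) -> Tuple[Dict[str, str], Set[str]]:
    """Label every old cluster and return the set of emerging new clusters.

    matches: old cluster -> its match in the new clustering (None if unmatched
             or split); must contain every old cluster.
    splits:  old cluster -> new clusters it split into.
    """
    hits = Counter(y for y in matches.values() if y is not None)
    labels: Dict[str, str] = {}
    targets: Set[str] = set()
    for x, y in matches.items():
        if x in splits:
            labels[x] = "split"
            targets.update(splits[x])
        elif y is not None:
            # Y matched only X -> X survives; Y matched several -> they are absorbed.
            labels[x] = "survive" if hits[y] == 1 else "absorb"
            targets.add(y)
        else:
            labels[x] = "disappear"  # none of the above cases holds
    emerged = new_clusters - targets  # new clusters that no old cluster leads to
    return labels, emerged


labels, emerged = external_transitions(
    matches={"X1": "Y1", "X2": "Y2", "X3": "Y2", "X4": None, "X5": None},
    splits={"X5": ["Y3", "Y4"]},
    new_clusters={"Y1", "Y2", "Y3", "Y4", "Y5"},
)
print(labels)   # {'X1': 'survive', 'X2': 'absorb', 'X3': 'absorb', 'X4': 'disappear', 'X5': 'split'}
print(emerged)  # {'Y5'}
```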
Cluster Transitions in MONIC contd. • Internal transitions • Size transition: based on the weights of the records • Shrink • Expand • Compactness transition: based on the data distribution • More compact • More diffuse
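A sketch of size and compactness transitions between X and its match Y. It assumes that size is the sum of record weights and compactness is the root-mean-square distance of points from their centroid. Records are left unweighted in that measure for simplicity. The tolerance `eps` is a hypothetical parameter that absorbs small fluctuations, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def spread(points: List[Point]) -> float:
    """Root-mean-square distance of the points from their centroid."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)


def size_transition(size_x: float, size_y: float, eps: float) -> str:
    """Compare age-weighted sizes (sums of record weights) of X and its match Y."""
    if size_y - size_x > eps:
        return "expand"
    if size_x - size_y > eps:
        return "shrink"
    return "no change"


def compactness_transition(spread_x: float, spread_y: float, eps: float) -> str:
    """A smaller spread in Y means the cluster became more compact."""
    if spread_x - spread_y > eps:
        return "more compact"
    if spread_y - spread_x > eps:
        return "more diffuse"
    return "no change"


x_pts = [(0.0, 0.0), (4.0, 0.0)]
y_pts = [(1.0, 0.0), (3.0, 0.0), (2.0, 1.0)]
print(size_transition(sum([1.0, 1.0]), sum([1.0, 1.0, 1.0]), eps=0.5))  # expand
print(compactness_transition(spread(x_pts), spread(y_pts), eps=0.1))    # more compact
```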
Cluster Transitions in MONIC contd. • Location transition • Shift of center • Change in skewness • No change • Properties of transitions • Transitions within the same group are mutually exclusive • Transitions from different groups can be combined • Example: a cluster X matched by Y can become larger and more compact.
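A sketch of a location transition and of how transitions from different groups combine. It assumes a center shift is detected when the Euclidean distance between the centroids of X and Y exceeds a hypothetical tolerance `eps`. Skewness is omitted here. The `InternalTransition` record is an illustrative design, not the paper's notation. It holds exactly one label per group, so labels within a group exclude each other while the three groups combine freely.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def centroid(points: List[Sequence[float]]) -> List[float]:
    n, dim = len(points), len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def location_transition(points_x, points_y, eps: float) -> str:
    """Report a shift when the centroids of X and Y are more than eps apart."""
    shift = math.dist(centroid(points_x), centroid(points_y))
    return "shift of center" if shift > eps else "no change"


@dataclass(frozen=True)
class InternalTransition:
    # One field per group: labels within a group exclude each other,
    # while labels from different groups are combined.
    size: str         # "expand" | "shrink" | "no change"
    compactness: str  # "more compact" | "more diffuse" | "no change"
    location: str     # "shift of center" | "no change" (skewness omitted)


loc = location_transition([(0.0, 0.0), (2.0, 0.0)], [(0.5, 0.0), (2.5, 0.0)], eps=1.0)
# X became larger and more compact, but its center moved by only 0.5.
print(InternalTransition(size="expand", compactness="more compact", location=loc))
```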
Cluster Transitions in MONIC contd. • Lifetime of clusterings • The lifetime of clusterings gives insight into the evolution of the underlying population • Survival ratio • Absorption ratio • Passforward ratio = survival ratio + absorption ratio
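A sketch of the lifetime ratios. It assumes each ratio is the fraction of clusters in the older clustering that received the corresponding label, such as the labels produced by an external-transition step. The passforward ratio is computed as the slide states, as survival ratio plus absorption ratio. The function name `lifetime_ratios` is hypothetical.

```python
from typing import Dict


def lifetime_ratios(labels: Dict[str, str]) -> Dict[str, float]:
    """Survival, absorption and passforward ratios over the clusters of the older clustering."""
    n = len(labels)
    if n == 0:
        raise ValueError("the older clustering has no clusters")
    survived = sum(1 for t in labels.values() if t == "survive")
    absorbed = sum(1 for t in labels.values() if t == "absorb")
    return {
        "survival": survived / n,
        "absorption": absorbed / n,
        "passforward": (survived + absorbed) / n,  # = survival + absorption
    }


print(lifetime_ratios({"X1": "survive", "X2": "absorb", "X3": "absorb",
                       "X4": "disappear", "X5": "split"}))
# {'survival': 0.2, 'absorption': 0.4, 'passforward': 0.6}
```

The ratio is computed from the counts rather than by adding the two floats, which avoids rounding noise such as 0.6000000000000001.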
Experimental Results • Dataset • ACM Digital Library, section H.2.8 “Database Applications” • 6 classes: (1) data mining, (2) spatial databases, (3) image databases, (4) statistical databases, (5) scientific databases, (6) uncategorized documents • Time: 1997–2004 • Documents: title and list of keywords • Feature space: the 30 most frequent words, TF×IDF-weighted • Clustering algorithm: K-means with K=10 • Data ageing • Sliding window of size 2
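A sketch of which years feed each clustering under a sliding window of size 2. It assumes one timepoint per year and a window that includes the current year. The helper `window_years` is hypothetical.

```python
from typing import List, Tuple


def window_years(years: List[int], size: int) -> List[Tuple[int, List[int]]]:
    """For each timepoint, the timepoints whose records keep non-zero weight."""
    return [(t, years[max(0, i - size + 1): i + 1]) for i, t in enumerate(years)]


for t, active in window_years(list(range(1997, 2005)), size=2):
    print(t, active)
# 1997 [1997]
# 1998 [1997, 1998]
# ...
# 2004 [2003, 2004]
```

Under these assumptions, each K-means run after the first sees two years of documents. Consecutive clusterings therefore share one year of records, which is what makes their clusters comparable through overlap.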
Cluster transitions and threshold impact • Fixed τsplit = 0.1 • τ varied from 0.45 to 0.7
Cluster transitions and threshold impact • Fixed τ = 0.5 • τsplit varied from 0.1 to 0.35
Lifetime of clusterings • Passforward ratios for different τ