
RIC: Parameter-Free Noise-Robust Clustering




  1. RIC: Parameter-Free Noise-Robust Clustering Presenter: Shu-Ya Li Authors: CHRISTIAN BÖHM, CHRISTOS FALOUTSOS, JIA-YU PAN, CLAUDIA PLANT TKDD, 2007

  2. Outline • Motivation • Objective • Methodology • Experiments and Results • Conclusion • Personal Comments

  3. Motivation • How can we find a natural clustering of a real-world point set that contains • an unknown number of clusters of different shapes • clusters that may be contaminated by noise?

  4. Objectives • MDL for classification, VAC for clustering • Find a natural clustering in a dataset • Goodness of a clustering • We use Volume after Compression (VAC) to quantify the 'goodness' of a grouping • Efficient algorithms for good clustering • Robust Fitting • Cluster Merging

  5. VAC (Volume after Compression) • VAC tells which grouping is better • Lower VAC => better grouping • Formula uses the decorrelation matrix • Computing VAC • Compute the covariance matrix of cluster C • Run PCA to obtain the decorrelation matrix • Compute VAC from the matrix

  6. Computing VAC • Bytes to record each cluster's distribution type (Gaussian, uniform, ...) • Bytes for the number of clusters k • Bytes to describe the parameters of each distribution (e.g., mean, variance, covariance, slope, intercept), and then the location of each point • Example from the slide: a cluster model costing 2.3 + 4.3 = 6.6 bits, with stat = (μi, σi, lbi, ubi, ...)
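The steps above can be sketched in code. This is a minimal, hypothetical simplification of the VAC idea, not the paper's exact coding scheme: it charges a fixed number of bits per model parameter plus a per-point coding cost under a decorrelated Gaussian obtained via PCA; the `bits_per_param` constant is an assumption for illustration.

```python
import numpy as np

def vac(points, bits_per_param=32.0):
    """Simplified VAC sketch: model-parameter cost plus per-point coding
    cost under a decorrelated Gaussian (a hypothetical simplification of
    the paper's full VAC formula)."""
    n, d = points.shape
    cov = np.cov(points, rowvar=False)
    # PCA: eigen-decomposition gives the decorrelation matrix V and the
    # variances lam along the decorrelated axes (Sigma = V Lam V^T)
    lam, V = np.linalg.eigh(cov)
    lam = np.maximum(lam, 1e-12)  # guard against degenerate directions
    # per-point cost: differential entropy of independent Gaussians, in bits
    point_bits = 0.5 * n * np.sum(np.log2(2.0 * np.pi * np.e * lam))
    # model cost: mean (d), eigenvalues (d), and rotation (d*(d-1)/2) params
    model_bits = bits_per_param * (d + d + d * (d - 1) / 2.0)
    return model_bits + point_bits
```

A tighter cluster needs fewer bits per point, so its VAC is lower, matching the "lower VAC => better grouping" rule on the previous slide.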

  7. Methodology – RIC framework • Robust Fitting • Mahalanobis distance defined by Λ and V from PCA (Σ = V Λ Vᵀ) • Conventional estimation: covariance matrix built around the mean μ • Robust estimation: covariance matrix built around the median μR • The median is less affected by outliers than the mean
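A minimal sketch of the robust-fitting idea, under assumptions of my own: the center is the coordinate-wise median, and the covariance is estimated only from the fraction of points closest to that center (the `trim` fraction is a hypothetical choice, not the paper's exact procedure).

```python
import numpy as np

def robust_center_and_cov(points, trim=0.5):
    """Robust fitting sketch: coordinate-wise median as center, covariance
    from the trim-fraction of points nearest to it (outliers excluded)."""
    center = np.median(points, axis=0)          # median resists outliers
    d2 = np.sum((points - center) ** 2, axis=1)
    k = max(points.shape[1] + 1, int(trim * len(points)))
    core = points[np.argsort(d2)[:k]]           # keep only the closest points
    cov = np.cov(core, rowvar=False)
    return center, cov
```

With a fifth of the points placed far away as outliers, the median-based center stays near the true cluster center while the mean is dragged toward the outliers, which is exactly why the slide prefers the median.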

  8. Methodology – RIC framework • Cluster Merging • Merge Ci and Cj only if the combined VAC decreases • If savedCost > 0, then merge Ci and Cj • Greedy search to maximize savedCost, hence minimize VAC
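The merging rule above can be sketched as a greedy loop. The per-cluster cost function here is a hypothetical axis-aligned-Gaussian stand-in for the VAC term, not the paper's formula; the structure of the loop (merge the pair with the largest positive savedCost, repeat until no merge saves bits) follows the slide.

```python
import numpy as np

def gaussian_bits(points):
    """Coding cost of one cluster under an axis-aligned Gaussian
    (hypothetical stand-in for the per-cluster VAC term)."""
    n, d = points.shape
    var = np.maximum(points.var(axis=0), 1e-12)
    # per-point coding cost plus a fixed per-parameter model cost
    return 0.5 * n * np.sum(np.log2(2.0 * np.pi * np.e * var)) + 32.0 * 2 * d

def merge_clusters(clusters):
    """Greedy cluster merging: repeatedly merge the pair Ci, Cj with the
    largest positive savedCost = cost(Ci) + cost(Cj) - cost(Ci U Cj)."""
    clusters = [np.asarray(c) for c in clusters]
    while len(clusters) > 1:
        best, best_saved = None, 0.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = np.vstack([clusters[i], clusters[j]])
                saved = (gaussian_bits(clusters[i]) + gaussian_bits(clusters[j])
                         - gaussian_bits(merged))
                if saved > best_saved:
                    best, best_saved = (i, j), saved
        if best is None:          # no merge decreases the total cost
            break
        i, j = best
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters
```

Merging two halves of the same Gaussian saves one set of model parameters, so savedCost > 0 and they fuse; merging a distant cluster would inflate the variances and raise the cost, so it stays separate.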

  9. Experiments • Results on Synthetic Data

  10. Experiments • Performance on Real Data

  11. Experiments • Comparison of the results of filterOpt and filterDist

  12. Conclusion • The contributions of this work are the answers to two questions, organized in the RIC framework. • (Q1) Goodness Measure. • The VAC criterion, built on information-theoretic concepts, specifically the volume after compression. • (Q2) Efficiency. • The Robust Fitting (RF) algorithm, which carefully avoids outliers. • The Cluster Merging (CM) algorithm, which stitches clusters together when the stitching yields a better VAC score.

  13. Personal Comments • Advantage • Detailed description • Many pictures and examples • Drawback • The black-and-white figures are difficult to distinguish • Application • Clustering
