
A Cluster Validity Measure With Outlier Detection for Support Vector Clustering

A Cluster Validity Measure With Outlier Detection for Support Vector Clustering. Presenter: Lin, Shu-Han. Authors: Jeen-Shing Wang, Jen-Chieh Chiang. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (2008). Outline. Introduction of SVC Motivation Objective Methodology


Presentation Transcript


  1. A Cluster Validity Measure With Outlier Detection for Support Vector Clustering • Presenter: Lin, Shu-Han • Authors: Jeen-Shing Wang, Jen-Chieh Chiang • IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (2008)

  2. Outline • Introduction of SVC • Motivation • Objective • Methodology • Experiments • Conclusion • Comments

  3. SVC • SVC is derived from SVMs • SVMs are a supervised classification technique • Fast convergence • Good generalization performance • Robustness to noise • SVC is an unsupervised approach • Data points are mapped to a high-dimensional feature space using a Gaussian kernel. • Look for the smallest sphere that encloses the data. • Map the sphere back to data space to form a set of contours. • Contours are treated as the cluster boundaries.

  4. SVC - Sphere Analysis • Find the minimal enclosing sphere (center a) with a soft margin • Solve this problem through its Lagrangian function
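The equations on this slide did not survive the transcript. As a sketch, assuming the slides follow the standard SVC formulation (Ben-Hur et al.), the soft-margin problem and its Lagrangian can be written as below. The symbols Φ (feature map), R (sphere radius), ξ_j (slack variables), and β_j, μ_j ≥ 0 (Lagrange multipliers) are the usual notation and are not taken from the slide text itself.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C\sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0

L = R^2 - \sum_j \bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\beta_j
      - \sum_j \xi_j \mu_j + C \sum_j \xi_j
```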

  5. SVC - Sphere Analysis

  6. SVC - Sphere Analysis • Bounded support vectors (BSVs) are outliers • Karush-Kuhn-Tucker complementarity conditions
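In the standard SVC formulation (assumed here, since the slide's equations are missing), the Karush-Kuhn-Tucker complementarity conditions are ξ_j μ_j = 0 and (R² + ξ_j − ‖Φ(x_j) − a‖²) β_j = 0. They imply that a point with β_j = 0 lies inside the sphere, a point with 0 < β_j < C lies on it (a support vector), and a point with β_j = C lies outside it (a bounded support vector). A minimal sketch of that labeling, where the function name and the tolerance `tol` are hypothetical:

```python
def classify_by_beta(betas, C, tol=1e-8):
    """Label each point from its Lagrange multiplier beta_j.

    Assumes the standard SVC reading of the KKT conditions:
      beta_j == 0      -> interior point
      0 < beta_j < C   -> support vector (SV), on the sphere
      beta_j == C      -> bounded support vector (BSV), an outlier
    `tol` is a hypothetical numerical tolerance, not from the slides.
    """
    labels = []
    for b in betas:
        if b <= tol:
            labels.append("interior")
        elif b >= C - tol:
            labels.append("BSV")
        else:
            labels.append("SV")
    return labels
```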

  7. SVC - Sphere Analysis • Wolfe dual optimization problem • To find the minimal enclosing sphere with soft margin • C: controls whether outliers are allowed
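The dual itself is missing from the transcript. Assuming the standard SVC derivation, eliminating R, a, and ξ_j from the Lagrangian gives the Wolfe dual below in terms of the multipliers β_j and a kernel K. Because the β_j sum to 1 and each BSV has β_j = C, there can be at most 1/C BSVs, so C has to be made small before outliers can appear.

```latex
\max_{\beta}\; W = \sum_j \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1
```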

  8. SVC - Sphere Analysis • Mercer kernel • Kernel: Gaussian function • The distance between x and the sphere center a • q: determines the number of clusters and the smoothness/tightness of the cluster boundaries.
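The kernel and distance formulas are not preserved in the transcript. The sketch below assumes the usual SVC choices: the Gaussian kernel K(x, y) = exp(−q‖x − y‖²) and the feature-space distance R²(x) = K(x, x) − 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). The function names are hypothetical, and `betas` is assumed to come from an already solved dual problem.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel exp(-q * ||x - y||^2) with width parameter q."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-q * sq_dist)


def squared_distance_to_center(x, points, betas, q):
    """Squared feature-space distance between x and the sphere center a.

    Uses a = sum_j beta_j * Phi(x_j), so the distance expands into
    kernel evaluations only (assumed standard SVC expression).
    """
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, betas))
    const = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, betas)
        for pj, bj in zip(points, betas)
    )
    return gaussian_kernel(x, x, q) - 2.0 * cross + const
```

A point x is inside a cluster contour when this value does not exceed the squared radius, which is the value obtained at any support vector.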

  9. Motivation • Traditional cluster validity measures, such as • Partition coefficient (PC) • Separation measures • are based on fuzzy membership grades and centroids of clusters. • The cluster boundaries generated by the SVC algorithm • have arbitrary shapes • carry no fuzzy membership grades. • Which clustering is better?

  10. Objectives • Cluster merging • Outlier detection • Optimal cluster number • Cluster validity measure • Outlier-detection algorithm • Cluster-merging mechanism

  11. Methodology – Overview • C = 1: no outliers are allowed • Outlier detection • Cluster validity measure for the SVC algorithm • Cluster-merging mechanism

  12. Methodology – Cluster Validity Measure for the SVC Algorithm • Compactness (intra-cluster) • Separation (inter-cluster) • Cluster validity measure (ratio) for SVC • min
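The slide names compactness, separation, and their ratio, but the formulas are not in the transcript. The sketch below is a generic stand-in, not the paper's definition: it assumes compactness is the largest pairwise distance within any cluster, separation is the smallest distance between points of different clusters, and a smaller ratio indicates a better clustering. The function name and these definitions are illustrative assumptions.

```python
import math
from itertools import combinations


def validity_ratio(clusters):
    """Return compactness / separation for a list of clusters.

    clusters: list of lists of points (tuples of floats), at least two
    clusters with no point shared between clusters.
    Stand-in definitions (NOT from the paper):
      compactness = largest intra-cluster pairwise distance
      separation  = smallest inter-cluster point distance
    """
    compactness = max(
        (math.dist(p, r) for c in clusters for p, r in combinations(c, 2)),
        default=0.0,  # all clusters are singletons
    )
    separation = min(
        math.dist(p, r)
        for ca, cb in combinations(clusters, 2)
        for p in ca
        for r in cb
    )
    return compactness / separation
```

Under these assumptions, one would compute the ratio for each candidate clustering produced by SVC and prefer the one with the smallest value.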

  13. Methodology – Outlier Detection • In SVC, outliers (BSVs) are the data points in boundary regions. [Figure labels: singleton; q = 1; q = 2; q = 1.8; C = 0.02; q = 4]

  14. Methodology – Outlier Detection [Figure labels: singleton; q = 1; q = 2; q = 1.8; C = 0.02; q = 4] • C • If C = 1, the resulting clusters are smooth but not desirable • BSVs (outliers) • All outliers are SVs • Some outliers are far away from the other data in the clusters • SVs • More SVs make the boundaries fit the data too tightly • q • Increasing q makes the clusters more compact • Singleton • Important criterion

  15. Methodology – Outlier Detection • Suggested γ = 2 • Outlier existence criterion • Desirable cluster criterion • The number of singleton clusters cannot exceed a threshold • The percentage of data points that are SVs cannot be greater than a threshold (suggested: 50%) • Recursively adjust C to satisfy these two criteria
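The two criteria above can be sketched as simple checks. This assumes that γ is the threshold on the number of singleton clusters and that "exceed" means strictly greater than; the slide does not spell out either point, and the function names are hypothetical.

```python
def outliers_exist(cluster_sizes, gamma=2):
    """Outlier existence check (assumed reading of the slide).

    Outliers are taken to exist when the number of singleton clusters
    exceeds gamma. gamma = 2 is the value suggested on the slide.
    """
    singletons = sum(1 for size in cluster_sizes if size == 1)
    return singletons > gamma


def clusters_desirable(n_support_vectors, n_points, max_sv_fraction=0.5):
    """Desirable cluster check: the SV share must not exceed the threshold."""
    return n_support_vectors / n_points <= max_sv_fraction
```

When either check fails, the slides adjust C and rerun SVC until both criteria hold.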

  16. Methodology – Cluster-Merging Mechanism • Gaussian function • Similarity: overlapping degree [Figure labels: PA > 0; PC = 0]

  17. Methodology – Cluster-Merging Mechanism • 1) Agglomerative outliers/noises: identification. For all c_i < ε, i = 1, ..., K, where ε is the density threshold, chosen as 3%~5%: {Set x ← m_i. For each j, j ≠ i, compute p_j(x), where p_j ∈ [0, 1] is the normalized overlapping index of the jth cluster. If p_j(x) > 0, merge cluster i and cluster j. Otherwise, discard cluster i. Set K ← K − 1.} • 2) Compatible clusters: combination (similarity). Sort the remaining K clusters by size in ascending order such that c_K = max(c_i), ∀ i ∈ K. For each i, i = 1, ..., K: {Set x ← m_i. For each j, j = i + 1, ..., K, compute p_j(x). Find l = arg max_{i+1≤j≤K} p_j(x), where arg max_a denotes the value of a at which the expression that follows is maximized. If p_l > 0, merge cluster i with cluster l. Set K ← K − 1 and repeat 2) until no further combination.}
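The two merging stages can be sketched as follows. Several points are assumptions rather than facts from the slides: m_i is taken to be the mean of cluster i, ε is read as a fraction of the total number of points, the overlapping index is passed in as a hypothetical callable `overlap(cluster, x)`, and in stage 1 a small cluster is compared only against the non-small clusters and merged into the one with the highest overlap (a simplification of "for each j ≠ i").

```python
def mean_point(cluster):
    """Coordinate-wise mean of a cluster (assumed meaning of m_i)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def merge_clusters(clusters, overlap, eps=0.05):
    """Two-stage merging sketch.

    clusters: list of non-empty lists of points (tuples).
    overlap:  hypothetical callable overlap(cluster, x) -> value in [0, 1].
    eps:      size threshold as a fraction of all points (assumption).
    """
    total = sum(len(c) for c in clusters)
    small = [list(c) for c in clusters if len(c) < eps * total]
    kept = [list(c) for c in clusters if len(c) >= eps * total]

    # Stage 1: absorb small clusters that overlap a kept cluster;
    # small clusters with no overlap are discarded as noise.
    for c in small:
        x = mean_point(c)
        scores = [overlap(k, x) for k in kept]
        if scores and max(scores) > 0:
            kept[scores.index(max(scores))].extend(c)

    # Stage 2: merge each cluster into the larger cluster it overlaps
    # most, restarting after every merge until nothing changes.
    merged = True
    while merged:
        merged = False
        kept.sort(key=len)
        for i in range(len(kept) - 1):
            x = mean_point(kept[i])
            scores = [overlap(kept[j], x) for j in range(i + 1, len(kept))]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] > 0:
                kept[i + 1 + best].extend(kept[i])
                del kept[i]
                merged = True
                break
    return kept
```

Passing the overlap index in as a callable keeps the sketch independent of how p_j(x) is actually computed, which the transcript does not preserve.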

  18. Methodology – Summary • 1) Initialize a small value of q, and set C = 1 and γ = 2. • 2) Perform the SVC algorithm and get |clusters|. • 3) If |clusters| < 2, increase q and go to 2). • 4) If the outlier-detection criterion holds, decrease C, fix q, and go to 2). Otherwise, go to 5). • 5) If |SVs| < 50% of the data points, go to 6). Otherwise, decrease C and go to 2). • 6) Compute the validity measure index (V(m)). If |clusters| > √N, increase q and go to 2). Otherwise, stop the SVC. • 7) Use the cluster-merging mechanism to identify an ideal |clusters|. Output |clusters|.
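The search loop in the steps above can be sketched as a small driver around an SVC routine. Everything named here is hypothetical: `run_svc(q, C)` stands for any SVC implementation returning cluster sizes, the SV count, and a validity value, and the step sizes `q_step` and `c_factor` are arbitrary. One further assumption deserves emphasis: taken literally, the stopping test would keep increasing q after |clusters| exceeds √N, so the sketch assumes the intended reading is that q is increased while |clusters| ≤ √N and the search stops once that bound is passed.

```python
import math


def search_parameters(run_svc, n_points, q=0.1, C=1.0, gamma=2,
                      q_step=0.1, c_factor=0.5, max_iter=1000):
    """Sweep q and C, recording the validity index of accepted settings.

    run_svc(q, C) is a hypothetical callable returning a dict with keys
    'cluster_sizes' (list of int), 'n_sv' (int), and 'validity' (float).
    Returns a list of (q, C, n_clusters, validity) tuples.
    """
    history = []
    for _ in range(max_iter):
        result = run_svc(q, C)                       # step 2
        sizes = result["cluster_sizes"]
        if len(sizes) < 2:                           # step 3
            q += q_step
            continue
        singletons = sum(1 for s in sizes if s == 1)
        if singletons > gamma:                       # step 4: outliers exist
            C *= c_factor
            continue
        if result["n_sv"] >= 0.5 * n_points:         # step 5: too many SVs
            C *= c_factor
            continue
        history.append((q, C, len(sizes), result["validity"]))  # step 6
        if len(sizes) > math.sqrt(n_points):         # assumed stopping rule
            break
        q += q_step
    return history
```

The cluster-merging step 7) would then be applied to the clustering with the best recorded validity value.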

  19. Experiments – Benchmark and Artificial Examples • Bensaid Data Set

  20. Experiments – Benchmark and Artificial Examples • Five-Cluster Data Set • Five-Cluster Data Set With Noise

  21. Experiments – Benchmark and Artificial Examples • Five-Cluster Data Set With Noise, after cluster merging

  22. Experiments – Benchmark and Artificial Examples • Crescent Data Set

  23. Experiments – Iris Data Set • Misclassification

  24. Conclusions • This paper integrated the following for SVC: • Cluster validity measure • Outlier detection • Merging mechanism • Automatically determines suitable values for • Kernel parameter • Soft-margin constant • Clustering with • Compact and smooth arbitrary-shaped cluster contours • Increased robustness to outliers and noise

  25. Comments • Advantage • Provides a cluster validity index for a clustering method • Drawback • … • Application • SVC
