A Cluster Validity Measure With Outlier Detection for Support Vector Clustering • Presenter: Lin, Shu-Han • Authors: Jeen-Shing Wang, Jen-Chieh Chiang • IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (2008)
Outline • Introduction of SVC • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
SVC • SVC is derived from SVMs • SVMs are a supervised classification technique with • Fast convergence • Good generalization performance • Robustness to noise • SVC is an unsupervised approach: • Data points are mapped to a high-dimensional feature space using a Gaussian kernel • Find the smallest sphere that encloses the data • Map the sphere back to data space, where it forms a set of contours • The contours are treated as the cluster boundaries
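The last two bullets turn the enclosing sphere into cluster labels. A common way to do this in SVC is an adjacency test: two points belong to the same cluster if every point on the line segment between them stays inside the sphere after mapping to feature space, and clusters are the connected components of that adjacency graph. The sketch below assumes a hypothetical predicate `inside(x)` (in practice, a check that the feature-space distance from Φ(x) to the sphere center is at most R) and samples each segment at `n_samples + 1` evenly spaced points; the default of 10 is an arbitrary choice.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def segment_inside(a: Point, b: Point, inside: Callable[[Point], bool],
                   n_samples: int = 10) -> bool:
    """True if every sampled point on the segment a-b satisfies `inside`."""
    for k in range(n_samples + 1):
        t = k / n_samples
        y = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        if not inside(y):
            return False
    return True


def label_clusters(points: List[Point], inside: Callable[[Point], bool],
                   n_samples: int = 10) -> List[int]:
    """Assign labels as connected components of the segment-adjacency graph."""
    n = len(points)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over still-unlabeled adjacent points
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and segment_inside(points[i], points[j], inside, n_samples):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Points that lie outside the sphere (BSVs) fail the test at their own endpoint, so in this sketch each of them ends up as a singleton cluster.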
SVC - Sphere Analysis • Find the minimal enclosing sphere (center a, radius R) with a soft margin • To solve this problem, form the Lagrangian function
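The slide's formulas were lost in extraction. Assuming the paper uses the standard soft-margin SVC formulation, with feature map Φ, sphere center a, radius R, slack variables ξ_j ≥ 0, and Lagrange multipliers β_j ≥ 0 and μ_j ≥ 0, the problem and its Lagrangian are:

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad \lVert \Phi(x_j) - a \rVert^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0

L = R^2 - \sum_j \bigl( R^2 + \xi_j - \lVert \Phi(x_j) - a \rVert^2 \bigr) \beta_j
    - \sum_j \xi_j \mu_j + C \sum_j \xi_j
```

Setting the derivatives of L with respect to R, a, and ξ_j to zero gives Σ_j β_j = 1, a = Σ_j β_j Φ(x_j), and β_j = C − μ_j.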
SVC - Sphere Analysis • Karush-Kuhn-Tucker complementarity conditions • Bounded SVs (BSVs) are the outliers
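Under the standard SVC formulation, the complementarity conditions ξ_j μ_j = 0 and (R² + ξ_j − ‖Φ(x_j) − a‖²) β_j = 0, together with β_j = C − μ_j, sort each point by its multiplier: β_j = 0 lies inside the sphere, 0 < β_j < C lies on its surface (an SV), and β_j = C marks a BSV, i.e. an outlier. A minimal sketch of that rule; the tolerance `tol` is an assumption added because numerical solvers rarely return exact bounds.

```python
from typing import List


def classify_points(beta: List[float], C: float, tol: float = 1e-8) -> List[str]:
    """Label each point 'interior', 'SV', or 'BSV' from its multiplier beta_j."""
    labels = []
    for b in beta:
        if b <= tol:
            labels.append("interior")  # beta_j = 0: strictly inside the sphere
        elif b >= C - tol:
            labels.append("BSV")       # beta_j = C: bounded SV (outlier)
        else:
            labels.append("SV")        # 0 < beta_j < C: on the sphere surface
    return labels
```

For example, with C = 0.25 and β = [0.0, 0.1, 0.15, 0.25, 0.25, 0.25] (which sums to 1), the last three points are BSVs.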
SVC - Sphere Analysis • Wolfe dual optimization problem for the minimal enclosing sphere with soft margin • C controls how many outliers are allowed
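Substituting the stationarity conditions Σ_j β_j = 1, a = Σ_j β_j Φ(x_j), and β_j = C − μ_j into the Lagrangian leaves a problem in the β_j alone. Assuming the standard SVC formulation, the Wolfe dual is:

```latex
\max_{\beta}\; W = \sum_j \Phi(x_j)^2 \beta_j - \sum_{i,j} \beta_i \beta_j \, \Phi(x_i) \cdot \Phi(x_j)
\quad \text{s.t.} \quad 0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1
```

Because the β_j sum to 1 and none can exceed C, choosing C ≥ 1 effectively leaves no room for BSVs, which matches starting the method with C = 1 (no outliers allowed); lowering C below 1 is what admits outliers.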
SVC - Sphere Analysis • Mercer kernel: the Gaussian function • Distance from the image of x to the sphere center a • q controls the number of clusters and the smoothness/tightness of the cluster boundaries
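With a Mercer kernel K(x_i, x_j) = Φ(x_i)·Φ(x_j) and the center a = Σ_j β_j Φ(x_j), the squared distance from Φ(x) to a needs only kernel values: R²(x) = K(x, x) − 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). A minimal sketch, assuming the Gaussian kernel K(x, y) = exp(−q‖x − y‖²) of standard SVC and a vector `beta` already obtained from the dual problem; the function names are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, q: float) -> float:
    """K(x, y) = exp(-q * ||x - y||^2); a larger q gives a narrower kernel."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def sphere_distance_sq(x: Point, data: List[Point], beta: List[float], q: float) -> float:
    """R^2(x) = K(x,x) - 2 sum_j beta_j K(x_j,x) + sum_ij beta_i beta_j K(x_i,x_j)."""
    k_xx = gaussian_kernel(x, x, q)  # equals 1 for the Gaussian kernel
    cross = sum(b * gaussian_kernel(xj, x, q) for b, xj in zip(beta, data))
    center = sum(bi * bj * gaussian_kernel(xi, xj, q)
                 for bi, xi in zip(beta, data)
                 for bj, xj in zip(beta, data))
    return k_xx - 2.0 * cross + center
```

A point x lies inside a cluster contour when R²(x) does not exceed R², which equals R²(x_s) for any SV x_s.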
Motivation • Traditional cluster validity measures, such as • Partition coefficient (PC) • Separation measures • are based on fuzzy membership grades and the centroids of clusters. • The SVC algorithm generates cluster boundaries that • have arbitrary shapes • provide no fuzzy membership grades. • Which clustering is better?
Objectives • Cluster merging • Outlier detection • Optimal cluster number • Cluster validity measure • Outlier-detection algorithm • Cluster-merging mechanism
Methodology – Overview • C = 1: no outliers are allowed • Outlier detection • Cluster validity measure for the SVC algorithm • Cluster-merging mechanism
Methodology – Cluster Validity Measure for the SVC Algorithm • Compactness (intra-cluster) • Separation (inter-cluster) • Cluster validity measure for SVC: the ratio of compactness to separation, where the minimum value indicates the best clustering
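The slide gives only the shape of the measure: a ratio of compactness to separation that is minimized. The exact definitions used for V(m) are not recoverable here, so the sketch below swaps in generic ones as an assumption: compactness is the mean squared distance of points to their own cluster mean, and separation is the smallest squared distance between two cluster means (a Xie-Beni-style ratio with hard memberships). It illustrates the ratio's behavior and is not the paper's index.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    return [sum(c) / len(points) for c in zip(*points)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def validity_ratio(points: List[Point], labels: List[int]) -> float:
    """Compactness / separation for a hard partition; lower is better (hypothetical definitions)."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    means = {lab: _mean(ps) for lab, ps in clusters.items()}
    # Compactness: mean squared distance from each point to its own cluster mean.
    compactness = sum(_sq_dist(p, means[lab]) for p, lab in zip(points, labels)) / len(points)
    # Separation: smallest squared distance between any two cluster means.
    keys = list(means)
    separation = min(_sq_dist(means[a], means[b])
                     for i, a in enumerate(keys) for b in keys[i + 1:])
    return compactness / separation
```

On two well-separated pairs of points, the natural split scores far lower than a split that mixes the pairs, which is the behavior the "min" on the slide relies on.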
Methodology – Outlier Detection • In SVC, outliers (BSVs) are the data points lying outside the cluster boundaries • [Figure: SVC results for q = 1, 1.8, 2, and 4 and for C = 0.02, with a singleton cluster marked]
Methodology – Outlier Detection • C: if C = 1, the resulting clusters are smooth but not desirable • BSVs (outliers): all outliers are SVs, and some outliers lie far away from the other data in their clusters • SVs: too many SVs make the boundaries fit the data too tightly • q: increasing q makes the clusters more compact • Singleton clusters: an important criterion
Methodology – Outlier Detection • Outlier existence criterion (suggested γ = 2) • Desirable cluster criterion: • The number of singleton clusters must not exceed a threshold • The fraction of data points that are SVs must not exceed a threshold (suggested 50%) • Recursively adjust C until both criteria are satisfied
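The desirable cluster criterion can be checked directly from one SVC result. A minimal sketch, assuming the result is available as a list of cluster labels plus the number of SVs; `max_singletons` is a hypothetical, caller-supplied parameter because its value is not given here, while the 50% SV limit follows the suggested setting. The outlier existence criterion (the one governed by γ) is not reproduced, since its formula is not stated.

```python
from collections import Counter
from typing import List


def desirable_clusters(labels: List[int], n_svs: int, max_singletons: int,
                       sv_fraction_limit: float = 0.5) -> bool:
    """Check the desirable cluster criterion for one SVC result.

    labels: cluster label of every data point; n_svs: number of support vectors.
    """
    sizes = Counter(labels)
    n_singletons = sum(1 for size in sizes.values() if size == 1)
    sv_fraction = n_svs / len(labels)
    return n_singletons <= max_singletons and sv_fraction <= sv_fraction_limit
```

When the check fails, C is adjusted and SVC is rerun until it passes.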
Methodology – Cluster-Merging Mechanism • Gaussian function • Similarity: degree of overlap between clusters • [Figure: overlapping-index examples with P_A > 0 and P_C = 0]
Methodology – Cluster-Merging Mechanism
1) Agglomerative outliers/noise: identification. For each cluster i with c_i < ε (i = 1, ..., K), where c_i is the relative size of cluster i and ε is a density threshold chosen as 3% to 5%: set x ← m_i, the center of cluster i. For each j ≠ i, compute p_j(x), where p_j ∈ [0, 1] is the normalized overlapping index of cluster j. If p_j(x) > 0, merge cluster i and cluster j; otherwise, discard cluster i. Set K ← K − 1.
2) Compatible clusters: combination (similarity). Sort the remaining K clusters by size in ascending order, so that c_K = max(c_i). For each i = 1, ..., K: set x ← m_i; for each j = i + 1, ..., K, compute p_j(x). Find l = arg max_{i+1≤j≤K} p_j(x), where arg max_a denotes the value of a that maximizes the expression that follows. If p_l > 0, merge cluster i with cluster l. Set K ← K − 1 and repeat 2) until no further combination is possible.
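The two merging steps can be sketched as follows, under several assumptions. The normalized overlapping index p_j(x) is a hypothetical `overlap_index`: a Gaussian centered on cluster j's mean with the cluster's RMS spread as its width, cut to 0 beyond `cutoff` spreads so that the test p_j(x) > 0 can actually fail. c_i is taken as the fraction of all N points in cluster i, and when a small cluster overlaps several others in step 1 it is merged into the one with the largest index, a choice the description leaves open. The default eps = 0.04 is one value inside the suggested 3% to 5% range.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(ps: List[Point]) -> List[float]:
    return [sum(c) / len(ps) for c in zip(*ps)]


def overlap_index(x: Sequence[float], cluster: List[Point], cutoff: float = 3.0) -> float:
    """Hypothetical p_j(x) in [0, 1]: Gaussian around the cluster mean, 0 beyond cutoff * spread."""
    m = _mean(cluster)
    spread = math.sqrt(sum(math.dist(p, m) ** 2 for p in cluster) / len(cluster))
    spread = max(spread, 1e-9)  # avoid division by zero for singleton clusters
    d = math.dist(x, m)
    if d > cutoff * spread:
        return 0.0
    return math.exp(-d * d / (2.0 * spread * spread))


def merge_clusters(clusters: List[List[Point]], n_total: int,
                   eps: float = 0.04, cutoff: float = 3.0) -> List[List[Point]]:
    clusters = [list(c) for c in clusters]
    # Step 1: a cluster holding less than eps of the data is merged into the
    # cluster that overlaps its center most, or discarded as outliers/noise.
    while len(clusters) > 1:
        small = next((i for i, c in enumerate(clusters) if len(c) / n_total < eps), None)
        if small is None:
            break
        cluster = clusters.pop(small)
        x = _mean(cluster)
        scores = [overlap_index(x, c, cutoff) for c in clusters]
        best = max(range(len(clusters)), key=lambda j: scores[j])
        if scores[best] > 0:
            clusters[best].extend(cluster)
        # otherwise the small cluster is simply dropped
    # Step 2: sort by size (largest last); merge cluster i into the later
    # cluster with the largest positive p_j(m_i), then restart the pass.
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        clusters.sort(key=len)
        for i in range(len(clusters) - 1):
            x = _mean(clusters[i])
            scores = {j: overlap_index(x, clusters[j], cutoff) for j in range(i + 1, len(clusters))}
            best = max(scores, key=scores.get)
            if scores[best] > 0:
                clusters[best].extend(clusters[i])
                del clusters[i]
                merged = True
                break
    return clusters
```

On two overlapping groups, one distant group, and one isolated point, the isolated point is discarded in step 1 and the overlapping groups are combined in step 2.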
Methodology – Summary
1) Initialize q to a small value, and set C = 1 and γ = 2.
2) Perform the SVC algorithm and obtain |clusters|.
3) If |clusters| < 2, increase q and go to 2).
4) If the outlier-detection criterion holds, decrease C, keep q fixed, and go to 2). Otherwise, go to 5).
5) If |SVs| < 50% of the data points, go to 6). Otherwise, decrease C and go to 2).
6) Compute the validity measure index V(m).
7) If |clusters| < √N (N: number of data points), increase q and go to 2). Otherwise, stop the SVC.
8) Use the cluster-merging mechanism to identify the ideal |clusters|, and output |clusters|.
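The control flow of the summary steps can be sketched as a parameter sweep. Everything about the SVC run itself is abstracted behind hypothetical callables: `run_svc(q, C)` returns a hypothetical `SVCResult`, and `validity(result)` returns V(m). Further assumptions: q grows by a fixed `q_step`, C is "decreased" by multiplying with `c_factor`, the sweep continues while the cluster count stays below √N, the candidate with the smallest V is kept (the measure is minimized), `max_iter` guards against non-termination, and the final merging step is left out.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SVCResult:
    """Hypothetical summary of one SVC run."""
    n_clusters: int
    n_svs: int
    outliers_detected: bool  # whether the outlier-detection criterion holds


def tune_svc(run_svc: Callable[[float, float], SVCResult],
             validity: Callable[[SVCResult], float],
             n_points: int, q0: float = 0.1, q_step: float = 0.1,
             c_factor: float = 0.5, sv_limit: float = 0.5,
             max_iter: int = 1000) -> Tuple[float, float, SVCResult]:
    """Sweep q and C; return (q, C, result) of the run with the smallest V."""
    q, C = q0, 1.0                                   # 1) small q, C = 1
    candidates: List[Tuple[float, float, float, SVCResult]] = []
    for _ in range(max_iter):                        # guard against endless loops
        res = run_svc(q, C)                          # 2) run SVC
        if res.n_clusters < 2:                       # 3) too few clusters: raise q
            q += q_step
            continue
        if res.outliers_detected:                    # 4) decrease C, keep q fixed
            C *= c_factor
            continue
        if res.n_svs >= sv_limit * n_points:         # 5) too many SVs: decrease C
            C *= c_factor
            continue
        candidates.append((validity(res), q, C, res))  # 6) record V(m)
        if res.n_clusters < math.sqrt(n_points):     # 7) keep sweeping q
            q += q_step
        else:
            break
    if not candidates:
        raise RuntimeError("no admissible clustering found")
    _, best_q, best_C, best_res = min(candidates, key=lambda t: t[0])
    return best_q, best_C, best_res
```

With a stub whose cluster count grows with q and a validity function that is smallest at five clusters, the sweep visits 2 to 10 clusters (√100 = 10) and keeps the five-cluster run.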
Experiments - Benchmark and Artificial Examples • Bensaid Data Set
Experiments - Benchmark and Artificial Examples • Five-Cluster Data Set and Five-Cluster Data Set With Noise
Experiments - Benchmark and Artificial Examples • Five-Cluster Data Set With Noise, after cluster merging
Experiments - Benchmark and Artificial Examples • Crescent Data Set
Experiments - IRIS Data Set • Misclassification
Conclusions • This paper integrates into SVC: • a cluster validity measure • outlier detection • a cluster-merging mechanism • It automatically determines suitable values for • the kernel parameter • the soft-margin constant • The resulting clustering has • compact, smooth, arbitrary-shaped cluster contours • increased robustness to outliers and noise
Comments • Advantage • Provides a cluster validity index for a clustering method • Drawback • … • Application • SVC