Semisupervised Clustering with Metric Learning Using Relative Comparisons Nimit Kumar, Member, IEEE, and Krishna Kummamuru IEEE Transactions on Knowledge and Data Engineering, Volume 20, Issue 4, Pages 496-503 Advisors: Prof. 陳彥良 and Prof. 許秉瑜 Presenter: 林欣瑾 August 14, 2008
Outline • Introduction • Related work • Problem definition • The learning algorithms • Experimental study • Summary and conclusions
Introduction (1/3) • Semisupervised clustering algorithms are becoming more popular mainly because of (1) the abundance of unlabeled data and (2) the high cost of obtaining labeled data • The most popular form of supervision used in clustering algorithms is pairwise feedback → must-link: data points that belong to the same cluster → cannot-link: data points that belong to different clusters
Introduction (2/3) • Pairwise constraints have two drawbacks: (1) The points in a cannot-link constraint may both lie in the wrong clusters and still satisfy the constraint (2) A must-link constraint can mislead the clustering algorithm if its two points belong to two different clusters of the same class • This paper assumes supervision is available in the form of relative comparisons: "x is closer to y than to z" (triplet constraints)
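To make the two forms of supervision concrete, below is a minimal sketch of how they might be encoded; the type names and fields are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class PairwiseConstraint:
    i: int           # index of the first point
    j: int           # index of the second point
    must_link: bool  # True = must-link (same cluster), False = cannot-link

@dataclass
class RelativeComparison:
    x: int  # the triplet reads: "x is closer to y than to z"
    y: int
    z: int

# A relative comparison constrains the dissimilarity measure itself,
# d(x, y) < d(x, z), independently of which clusters the points end up in,
# which is why it avoids the two drawbacks of pairwise constraints listed above.
```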
Introduction (3/3) • This paper calls the proposed algorithm Semisupervised SVaD (SSSVaD) • Given a set of labeled data, relative comparisons can be obtained from any three points in which two share a class and the third belongs to a different class • Triplet constraints give more information about the underlying dissimilarity measure than pairwise constraints
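The rule for deriving relative comparisons from labeled data can be illustrated with a short sketch; the function name is hypothetical and only mirrors the rule stated above.

```python
from itertools import permutations

def triplets_from_labels(labels):
    """Enumerate relative comparisons (x, y, z), read as "x is closer to y than
    to z": x and y share a class and z belongs to a different class."""
    n = len(labels)
    triplets = []
    for x, y in permutations(range(n), 2):  # ordered pairs with the same label
        if labels[x] != labels[y]:
            continue
        for z in range(n):
            if labels[z] != labels[x]:
                triplets.append((x, y, z))
    return triplets

# Example with class labels [0, 0, 1]:
# each class-0 point is closer to the other class-0 point than to the class-1 point.
print(triplets_from_labels([0, 0, 1]))  # [(0, 1, 2), (1, 0, 2)]
```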
Problem definition • Given a set of unlabeled samples and a set of triplet constraints, the objective of SSSVaD is to find a partition of the data set, along with the parameters of the SVaD measure, that minimizes the within-cluster dissimilarity while satisfying as many triplet constraints as possible.
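A hedged sketch of the kind of objective this slide describes: total within-cluster dissimilarity plus a penalty for every violated relative comparison. The weighted squared Euclidean distance and the simple violation count stand in for the paper's actual SVaD measure and penalty term, and `gamma` is an assumed trade-off parameter.

```python
import numpy as np

def sssvad_style_objective(X, assign, centers, weights, triplets, gamma=1.0):
    """Illustrative objective: within-cluster dissimilarity + gamma * (number of
    violated relative comparisons). Not the paper's exact formulation."""
    def dissim(x, j):
        # per-cluster weighted distance: a stand-in for the SVaD measure
        return np.sum(weights[j] * (x - centers[j]) ** 2)

    within = sum(dissim(X[i], assign[i]) for i in range(len(X)))

    violated = 0
    for x, y, z in triplets:  # require d(x, y) < d(x, z) under x's cluster weights
        j = assign[x]
        if np.sum(weights[j] * (X[x] - X[y]) ** 2) >= np.sum(weights[j] * (X[x] - X[z]) ** 2):
            violated += 1

    return within + gamma * violated
```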
The learning algorithms (1/2) • 1. Spatially Variant Dissimilarity (SVaD) • 2. Semisupervised SVaD (SSSVaD) • 3. Metric Pairwise Constrained K-Means (MPCK-Means) • 4. rMPCK-Means • 5. K-Means Algorithms (KMA)
The learning algorithms (2/2) • SSSVaD vs. MPCK-Means
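The following is a rough, illustrative sketch of the K-Means-style alternating structure that an algorithm like SSSVaD follows: assign points under the current per-cluster weighted measure, update the prototypes, then adjust the weights so that more relative comparisons are satisfied. The specific update rules here are assumptions for illustration, not the paper's actual SVaD update equations.

```python
import numpy as np

def sssvad_sketch(X, k, triplets, n_iter=20, gamma=1.0, lr=0.01):
    """Assumed K-Means-style alternation with per-cluster feature weights
    nudged by relative comparisons; illustrative only."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)].astype(float)
    weights = np.ones((k, d))                       # per-cluster feature weights

    def dist(x, j):
        return np.sum(weights[j] * (x - centers[j]) ** 2)

    for _ in range(n_iter):
        # (1) assign each point to its nearest prototype under the current measure
        assign = np.array([min(range(k), key=lambda j: dist(x, j)) for x in X])
        # (2) update prototypes as cluster means
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
        # (3) nudge the weights of x's cluster so that violated comparisons
        #     "x is closer to y than to z" become better satisfied
        for x, y, z in triplets:
            j = assign[x]
            g = (X[x] - X[y]) ** 2 - (X[x] - X[z]) ** 2  # gradient of d(x,y) - d(x,z) w.r.t. weights
            if np.sum(weights[j] * g) >= 0:              # comparison violated
                weights[j] = np.clip(weights[j] - lr * gamma * g, 1e-6, None)
                weights[j] *= d / weights[j].sum()       # keep weights normalized
    return assign, centers, weights
```

By contrast, MPCK-Means learns its metric from must-link and cannot-link pairs, which is the comparison the slide above refers to.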
Experimental study • Data sets (20 Newsgroups): Binary, Multi5, and Multi10
Experimental study • Effect of the Number of Clusters: figures for the (1) Binary, (2) Multi5, and (3) Multi10 data sets
Experimental study • Effect of the Amount of Supervision: figures for the (1) Binary, (2) Multi5, and (3) Multi10 data sets
Experimental study • Effect of Initialization: figures for the (1) Binary and (2) Multi10 data sets
Summary and conclusions • The effectiveness of relative comparisons over pairwise constraints was established through extensive experiments • The proposed algorithm, SSSVaD, achieves higher accuracy and is more robust than comparable algorithms that use pairwise constraints for supervision