Semisupervised Clustering with Metric Learning Using Relative Comparisons Nimit Kumar, Member, IEEE, and Krishna Kummamuru IEEE Transactions on Knowledge and Data Engineering, Volume 20, Issue 4, Pages 496-503 Advisors: Prof. 陳彥良 and Prof. 許秉瑜 Presenter: 林欣瑾 August 14, 2008
Outline • Introduction • Related work • Problem definition • The learning algorithms • Experimental study • Summary and conclusions
Introduction (1/3) • Semisupervised clustering algorithms are becoming more popular mainly because of (1) the abundance of unlabeled data and (2) the high cost of obtaining labeled data • The most popular form of supervision used in clustering algorithms is pairwise feedback → must-link: data points that belong to the same cluster → cannot-link: data points that belong to different clusters
Introduction (2/3) • Pairwise constraints have two drawbacks: (1) The points in a cannot-link constraint may both lie in the wrong clusters and still satisfy the constraint (2) A must-link constraint can mislead the clustering algorithm if its two points belong to two different clusters of the same class • This paper assumes supervision is available in the form of relative comparisons: "x is closer to y than to z" (triplet constraints)
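To make the two forms of supervision concrete, below is a minimal sketch of how they might be encoded; the type names and fields are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class PairwiseConstraint:
    i: int           # index of the first point
    j: int           # index of the second point
    must_link: bool  # True = must-link (same cluster), False = cannot-link

@dataclass
class RelativeComparison:
    x: int  # the triplet reads: "x is closer to y than to z"
    y: int
    z: int

# A relative comparison constrains the dissimilarity measure itself,
# d(x, y) < d(x, z), independently of which clusters the points end up in,
# which is why it avoids the two drawbacks of pairwise constraints listed above.
```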
Introduction (3/3) • This paper calls the proposed algorithm Semisupervised SVaD (SSSVaD) • Given a set of labeled data, relative comparisons can be obtained from any three points in which two share a class and the third belongs to a different class • Triplet constraints give more information about the underlying dissimilarity measure than pairwise constraints
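The rule for deriving relative comparisons from labeled data can be illustrated with a short sketch; the function name is hypothetical and only mirrors the rule stated above.

```python
from itertools import permutations

def triplets_from_labels(labels):
    """Enumerate relative comparisons (x, y, z), read as "x is closer to y than
    to z": x and y share a class and z belongs to a different class."""
    n = len(labels)
    triplets = []
    for x, y in permutations(range(n), 2):  # ordered pairs with the same label
        if labels[x] != labels[y]:
            continue
        for z in range(n):
            if labels[z] != labels[x]:
                triplets.append((x, y, z))
    return triplets

# Example with class labels [0, 0, 1]:
# each class-0 point is closer to the other class-0 point than to the class-1 point.
print(triplets_from_labels([0, 0, 1]))  # [(0, 1, 2), (1, 0, 2)]
```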
Problem definition • Given a set of unlabeled samples and a set of triplet constraints, the objective of SSSVaD is to find a partition of the data set, along with the parameters of the SVaD measure, that minimizes the within-cluster dissimilarity while satisfying as many triplet constraints as possible.
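A hedged sketch of the kind of objective this slide describes: total within-cluster dissimilarity plus a penalty for every violated relative comparison. The weighted squared Euclidean distance and the simple violation count stand in for the paper's actual SVaD measure and penalty term, and `gamma` is an assumed trade-off parameter.

```python
import numpy as np

def sssvad_style_objective(X, assign, centers, weights, triplets, gamma=1.0):
    """Illustrative objective: within-cluster dissimilarity + gamma * (number of
    violated relative comparisons). Not the paper's exact formulation."""
    def dissim(x, j):
        # per-cluster weighted distance: a stand-in for the SVaD measure
        return np.sum(weights[j] * (x - centers[j]) ** 2)

    within = sum(dissim(X[i], assign[i]) for i in range(len(X)))

    violated = 0
    for x, y, z in triplets:  # require d(x, y) < d(x, z) under x's cluster weights
        j = assign[x]
        if np.sum(weights[j] * (X[x] - X[y]) ** 2) >= np.sum(weights[j] * (X[x] - X[z]) ** 2):
            violated += 1

    return within + gamma * violated
```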
The learning algorithms (1/2) • 1. Spatially Variant Dissimilarity (SVaD) • 2. Semisupervised SVaD (SSSVaD) • 3. Metric Pairwise Constrained K-Means (MPCK-Means) • 4. rMPCK-Means • 5. K-Means Algorithms (KMA)
The learning algorithms (2/2) • SSSVaD vs. MPCK-Means
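The following is a rough, illustrative sketch of the K-Means-style alternating structure that an algorithm like SSSVaD follows: assign points under the current per-cluster weighted measure, update the prototypes, then adjust the weights so that more relative comparisons are satisfied. The specific update rules here are assumptions for illustration, not the paper's actual SVaD update equations.

```python
import numpy as np

def sssvad_sketch(X, k, triplets, n_iter=20, gamma=1.0, lr=0.01):
    """Assumed K-Means-style alternation with per-cluster feature weights
    nudged by relative comparisons; illustrative only."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)].astype(float)
    weights = np.ones((k, d))                       # per-cluster feature weights

    def dist(x, j):
        return np.sum(weights[j] * (x - centers[j]) ** 2)

    for _ in range(n_iter):
        # (1) assign each point to its nearest prototype under the current measure
        assign = np.array([min(range(k), key=lambda j: dist(x, j)) for x in X])
        # (2) update prototypes as cluster means
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
        # (3) nudge the weights of x's cluster so that violated comparisons
        #     "x is closer to y than to z" become better satisfied
        for x, y, z in triplets:
            j = assign[x]
            g = (X[x] - X[y]) ** 2 - (X[x] - X[z]) ** 2  # gradient of d(x,y) - d(x,z) w.r.t. weights
            if np.sum(weights[j] * g) >= 0:              # comparison violated
                weights[j] = np.clip(weights[j] - lr * gamma * g, 1e-6, None)
                weights[j] *= d / weights[j].sum()       # keep weights normalized
    return assign, centers, weights
```

By contrast, MPCK-Means learns its metric from must-link and cannot-link pairs, which is the comparison the slide above refers to.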
Experimental study • Data sets (20 Newsgroups): Binary, Multi5, and Multi10
Experimental study • Effect of the Number of Clusters: figures for the (1) Binary, (2) Multi5, and (3) Multi10 data sets
Experimental study • Effect of the Amount of Supervision: figures for the (1) Binary, (2) Multi5, and (3) Multi10 data sets
Experimental study • Effect of Initialization: figures for the (1) Binary and (2) Multi10 data sets
Summary and conclusions • The effectiveness of relative comparisons over pairwise constraints was established through extensive experiments • The proposed algorithm, SSSVaD, achieves higher accuracy and is more robust than comparable algorithms that use pairwise constraints for supervision