Kernel Methods for Weakly Supervised Mean Shift Clustering Oncel Tuzel & Fatih Porikli Mitsubishi Electric Research Labs Peter Meer Rutgers University
Outline • Motivation • Mean Shift • Method Overview • Kernel Mean Shift • Constrained Kernel Mean Shift • Experiments • Conclusion
Motivation • Clustering is an inherently ambiguous task • In many cases the initially designed similarity metric fails to resolve the ambiguities • Simple supervision can guide clustering toward the desired structure • We present a semi-supervised mean shift clustering algorithm based on pair-wise similarity constraints
Mean Shift • Given n data points x_i on R^d and associated bandwidths h_i, the sample point density estimator is given by

f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^d} \, k\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)

where k(x) is the kernel profile • Stationary points of the density can be found via the mean shift procedure

\bar{x} = \frac{ \sum_{i=1}^{n} \frac{x_i}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right) }{ \sum_{i=1}^{n} \frac{1}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right) }

where g(x) = -k'(x)
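As an illustration, the procedure above can be sketched in a few lines of NumPy. This is a minimal hypothetical sketch, not the authors' implementation; it assumes the Gaussian profile k(x) = exp(-x/2), so g(x) = -k'(x) is proportional to k(x) and the constant cancels in the weighted mean. The function name and its arguments are choices made here.

```python
import numpy as np

def mean_shift(X, h, n_iter=50, tol=1e-6):
    """Run the mean shift procedure starting from every data point.

    X : (n, d) data points, h : (n,) per-point bandwidths.
    Gaussian profile assumed: g(.) proportional to exp(-0.5 * .).
    """
    n, d = X.shape
    Y = X.copy()                                    # one iterate per starting point
    for _ in range(n_iter):
        Y_new = np.empty_like(Y)
        for j in range(n):
            d2 = np.sum((Y[j] - X) ** 2, axis=1) / h ** 2
            w = np.exp(-0.5 * d2) / h ** (d + 2)    # g(.) / h_i^{d+2}
            Y_new[j] = w @ X / w.sum()              # weighted mean of the data
        if np.max(np.abs(Y_new - Y)) < tol:         # stop when all iterates settle
            Y = Y_new
            break
        Y = Y_new
    return Y
```

Each row of the returned array is the mode its starting point converged to.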
Mean Shift Clustering • Mean shift iterations are initialized at the data points • The cluster centers are located by the mean shift procedure • The data points associated with the same local maximum of the density function produce a partitioning of the space • Until now there has been no systematic semi-supervised mean shift algorithm
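The partitioning step can be sketched as follows: once the iterates have converged, points whose iterates ended near the same location are assigned to the same mode. The helper name and the `eps` tolerance below are illustrative assumptions, and the sketch assumes modes are separated by much more than `eps`.

```python
import numpy as np

def group_modes(Y, eps=1e-2):
    """Label each converged mean shift iterate by its mode.

    Y : (n, d) converged iterates, one per original data point.
    Iterates within eps of each other share a cluster label.
    """
    n = Y.shape[0]
    labels = -np.ones(n, dtype=int)
    k = 0
    for i in range(n):
        if labels[i] >= 0:
            continue                                 # already assigned
        close = np.linalg.norm(Y - Y[i], axis=1) < eps
        labels[close] = k                            # all iterates at this mode
        k += 1
    return labels
```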
Method Overview • The supervision is given in the form of a few pair-wise similarity constraints • We embed the input space into a space where the constrained pairs are associated with the same mode • Mode seeking is performed on the embedded space • The method preserves all the advantages of mean shift clustering
[Figure: input space with constrained point pairs (marked x) mapped to the embedded space, where each pair collapses onto the same mode]
Pair-wise Constraints on the Input Space • Data points are projected onto the null space of the constraint matrix • Since the constrained point pairs overlap after projection, they are clustered together • The method fails if the clusters are not linearly separable • At most d-1 constraints can be defined
[Figure: input points, constraint vector, projection, and resulting clustering]
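The input-space projection can be sketched directly: with each constraint vector x_i - x_j as a row of A, the matrix P = I - A^T (A A^T)^{-1} A sends every constraint vector to zero, so constrained pairs coincide after projection. The function name below is a choice made here, and A A^T is assumed invertible.

```python
import numpy as np

def null_space_projection(X, pairs):
    """Project data onto the null space of the constraint matrix.

    X : (n, d) data; pairs : list of (i, j) index pairs to be merged.
    Each row of A is the difference x_i - x_j.
    """
    A = np.array([X[i] - X[j] for i, j in pairs])          # (m, d) constraint matrix
    P = np.eye(X.shape[1]) - A.T @ np.linalg.solve(A @ A.T, A)
    return X @ P.T                                         # P is symmetric and idempotent
```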
Pair-wise Constraints on the Feature Space • The method can be extended to handle a larger number of constraints, or the linearly inseparable case, using a mapping function • The mapping embeds the input space into an enlarged feature space • The projection is performed on the feature space • Defining the mapping explicitly is not practical. Solution: the kernel trick
[Figure: input points mapped to the feature space, constraint vector, projection, and resulting clustering]
Kernel Mean Shift (Explicit Form) • Given a mapping \phi : R^d \to H to a d_\phi-dimensional feature space and a p.s.d. kernel satisfying K(x, x') = \phi(x)^\top \phi(x') • The density estimator at y = \phi(x) is given by

f(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi}} \, k\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right)

• The stationary points can be found via the mean shift procedure

\bar{y} = \frac{ \sum_{i=1}^{n} \frac{\phi(x_i)}{h_i^{d_\phi+2}} \, g\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right) }{ \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right) }
Kernel Mean Shift (Implicit Form) • Let \Phi = [\phi(x_1) \ldots \phi(x_n)] be the d_\phi \times n feature matrix and K = \Phi^\top \Phi be the n \times n kernel matrix • At each iteration the estimate y lies in the column space of \Phi, and any point on that subspace can be written as y = \Phi \alpha_y • The distance between two points y = \Phi \alpha_y and y' = \Phi \alpha_{y'} is given by

\|y - y'\|^2 = \alpha_y^\top K \alpha_y + \alpha_{y'}^\top K \alpha_{y'} - 2 \alpha_y^\top K \alpha_{y'}

• The implicit form of mean shift updates the weighting vectors

\bar{\alpha} = \frac{ \sum_{i=1}^{n} \frac{e_i}{h_i^{d_\phi+2}} \, g\!\left( \frac{\|y - \phi(x_i)\|^2}{h_i^2} \right) }{ \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \frac{\|y - \phi(x_i)\|^2}{h_i^2} \right) }

where e_i denotes the i-th canonical basis vector for R^n (so \alpha_{\phi(x_i)} = e_i)
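One implicit-form update can be sketched as below. All distances are computed through the kernel matrix alone, using the identity that the squared distance from y = Phi alpha to phi(x_i) = Phi e_i equals alpha^T K alpha + K_ii - 2 (K alpha)_i. This is a sketch, not the authors' code: the Gaussian profile is assumed, and treating d_phi as a fixed constant argument is an assumption made here.

```python
import numpy as np

def kernel_mean_shift_step(K, alpha, h, d_phi=1.0):
    """One implicit mean shift update of a weighting vector.

    K : (n, n) kernel matrix, alpha : (n,) weights so that y = Phi alpha,
    h : (n,) per-point bandwidths. Gaussian profile assumed.
    """
    yy = alpha @ K @ alpha
    # ||y - phi(x_i)||^2 via the kernel matrix only:
    d2 = yy + np.diag(K) - 2.0 * (K @ alpha)
    w = np.exp(-0.5 * d2 / h ** 2) / h ** (d_phi + 2)   # g(.) / h_i^{d_phi+2}
    return w / w.sum()          # new alpha: convex combination of the e_i
```

Iterating this map from alpha = e_i for each i gives the implicit analogue of the explicit procedure.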
Kernel Mean Shift Clustering • The clustering algorithm starts the iterations at the data points, \alpha_i = e_i • Upon convergence each mode can be expressed via its weighting vector \bar{\alpha} • When the rank of the kernel matrix K is smaller than n, the columns of \Phi form an overcomplete basis and the modes can only be identified up to an equivalence relationship: \bar{\alpha} and \bar{\alpha}' represent the same mode when (\bar{\alpha} - \bar{\alpha}')^\top K (\bar{\alpha} - \bar{\alpha}') = 0 • The procedure is restricted to the subspace spanned by the feature points, therefore it is sufficient to search this subspace • The convergence of the procedure follows from the original mean shift proof
Constrained Kernel Mean Shift • Let \{(c_{j,1}, c_{j,2})\}_{j=1,\ldots,m} be the set of point pairs to be clustered together • The constraint matrix is given by

A = \left[ \phi(c_{1,1}) - \phi(c_{1,2}) \;\; \ldots \;\; \phi(c_{m,1}) - \phi(c_{m,2}) \right]^\top

• The null space of A is the set of vectors satisfying A v = 0, and the matrix P = I - A^\top (A A^\top)^{-1} A projects the feature space onto null(A) • Under the projection the constraint point pairs overlap: P \phi(c_{j,1}) = P \phi(c_{j,2})
[Figure: feature space, constraint vector, and projection onto the null space]
Constrained Kernel Mean Shift • The constrained mean shift algorithm implicitly maps the data points to the null space of the constraint matrix and performs mean shift on the embedded space • This process is equivalent to applying the kernel mean shift algorithm with the projected kernel function \hat{K}(x, x') = \phi(x)^\top P^\top P \, \phi(x') • The projected kernel matrix involves mappings only through the kernel function and can be expressed in terms of the original kernel matrix as

\hat{K} = K - K Z \, (Z^\top K Z)^{-1} \, Z^\top K

where Z = [e_{c_{1,1}} - e_{c_{1,2}} \;\ldots\; e_{c_{m,1}} - e_{c_{m,2}}] encodes the constraint pairs, K Z is the part of the kernel matrix involving the constraint set, and S = Z^\top K Z is the scaling matrix
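The projected kernel matrix can be computed from K alone, as sketched below. The function name is a choice made here, the constraint differences are encoded by an n-by-m indicator matrix Z, and the scaling matrix Z^T K Z is assumed invertible.

```python
import numpy as np

def projected_kernel(K, pairs):
    """Projected kernel matrix for must-link constraints.

    K : (n, n) kernel matrix; pairs : list of (i, j) must-link index pairs.
    Column c of Z is e_i - e_j, so the constraint directions in feature
    space are Phi Z; projecting them out gives
        K_hat = K - K Z (Z^T K Z)^{-1} Z^T K.
    """
    n = K.shape[0]
    Z = np.zeros((n, len(pairs)))
    for c, (i, j) in enumerate(pairs):
        Z[i, c], Z[j, c] = 1.0, -1.0
    KZ = K @ Z                                  # part of K involving the constraint set
    S = Z.T @ KZ                                # (m, m) scaling matrix
    return K - KZ @ np.linalg.solve(S, KZ.T)
```

After the projection, the rows (and columns) of the projected kernel matrix belonging to a constrained pair become identical, reflecting that the pair overlaps in the embedded space.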
Experiments • We conduct experiments on three datasets • Synthetic experiments • Clustering faces across illumination on the CMU PIE dataset • Clustering object categories on the Caltech-4 dataset • For the first two experiments we utilize the Gaussian kernel function • For the last experiment we utilize a different kernel function • We use adaptive bandwidth mean shift, where the bandwidth for each point is selected as the k-th smallest distance from the point to all the data points in the feature space
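The adaptive bandwidth rule above can be sketched as: sort each row of the pairwise distance matrix and take the k-th entry. Whether the zero self-distance counts toward k is not stated on the slide; the sketch below excludes it by taking column k (column 0 of each sorted row is the self-distance). The function name is a choice made here.

```python
import numpy as np

def adaptive_bandwidths(D, k):
    """Per-point bandwidth: distance to the k-th nearest other point.

    D : (n, n) pairwise distance matrix on the feature space.
    """
    Ds = np.sort(D, axis=1)    # column 0 is the zero self-distance
    return Ds[:, k]
```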
Clustering Linear Structure
[Figure: data points, mean shift result, and constrained mean shift result]
• We generated 240 data points originating from six different lines • The data is corrupted with normally distributed noise with standard deviation 0.1 • Three pair-wise constraints are given
Clustering Circular Structure
[Figure: data points, data points with outliers, mean shift result, and constrained mean shift result]
• We generated 200 data points originating from five concentric circles • The data is corrupted with normally distributed noise with standard deviation 0.1 • 80 outlier points are added • Four pair-wise constraints are enforced, each pairing points from the same circle
Clustering Faces Across Illumination
[Figure: samples from the CMU PIE dataset and the constraint set]
• The dataset contains 441 images of 21 subjects under 21 different illumination conditions • Images are coarsely registered and scaled to the same size, 128×128 • Each image is represented with a 16384-dimensional vector • Two pair-wise similarity constraints are given per subject • Approximately 1/10 of the dataset is labeled
Clustering Faces with Mean Shift
[Figure: pair-wise distances and the mean shift result]
• Mean shift finds 5 clusters, corresponding partly to illumination conditions and partly to subject identities
Clustering Faces with Constrained Mean Shift
[Figure: pair-wise distances after embedding and the constrained mean shift result]
• Constrained mean shift recovers all 21 subjects perfectly
Clustering Object Categories
[Figure: samples from the Caltech-4 dataset]
• The dataset contains 400 images from four object categories: cars, motorcycles, faces, airplanes • Each image is represented with a 500-bin feature histogram • Pair-wise constraints are randomly selected within classes • The experiment is repeated with a varying number of constraints (1 to 20 per object class)
Clustering Object Categories with Mean Shift
[Figure: pair-wise distances and the mean shift result]
• Some of the samples from the airplanes class and half of the motorcycles class are incorrectly identified as cars • The overall clustering accuracy is 74.25%
Clustering Object Categories with Constrained Mean Shift
[Figure: pair-wise distances after embedding and the constrained mean shift result]
• Clustering example after enforcing 10 constraints per class • Only a single example among the 400 is misclustered
Clustering Performance vs. Number of Constraints • The results are averaged over 20 runs where at each run a different constraint set is selected • Clustering accuracy is over 99% for more than 7 constraints per class
Conclusion • We presented a novel constrained mean shift clustering method that can incorporate pair-wise must-link priors • The method preserves all the advantages of the original mean shift clustering algorithm • The presented approach also extends to inner product spaces, thus it is applicable to a wide range of problems