Fuzzy Clustering with Multiple Kernels Naouel Baili Multimedia Research Laboratory Computer Engineering & Computer Science Dept. University of Louisville, USA April 2010
Outline • Introduction • Prototype-based Fuzzy Clustering • Proposed Fuzzy C-Means with Multiple Kernels • Preliminary results • Relational data Fuzzy Clustering • Proposed Relational Fuzzy C-Means with Multiple Kernels • Preliminary results • Conclusions
Introduction, what is clustering? • Clustering • The goal of clustering is to separate a finite unlabeled data set into a finite and discrete set of "natural," hidden data structures; • As a data mining task, data clustering aims at identifying clusters, or densely populated regions, according to some measurement or similarity function; • Intra-cluster distances are minimized while inter-cluster distances are maximized. • Studied and applied in many fields • Statistics; • Spatial databases; • Machine learning; • Data mining.
Introduction, Data Clustering Methods • Hierarchical clustering • Organize elements into a tree; leaves represent objects, and the length of the paths between leaves represents the distances between objects. Similar objects lie within the same sub-trees. • Partitional clustering • Organize elements into disjoint groups; • Hard vs. Fuzzy clustering • Kernel-based clustering • Spectral clustering • Object data vs. Relational data clustering
Kernel methods: the mapping • A kernel • is a similarity measure • defined by an implicit mapping φ from the original space to a vector space (feature space) • such that k(x, x′) = ⟨φ(x), φ(x′)⟩. (Figure: the mapping φ takes points from the original space to the feature (vector) space.)
Benefits from Kernels • Generalizes (nonlinearly) pattern recognition algorithms in clustering, classification, density estimation, … • When these algorithms are dot-product based, by replacing the dot product with k(x, y): e.g. linear discriminant analysis, logistic regression, perceptron, SOM, PCA, ICA, … • When these algorithms are distance-based, by replacing d²(x, y) with k(x,x) + k(y,y) − 2k(x,y) • Freedom in choosing the mapping φ implies a large variety of learning algorithms
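As a small illustration of the distance-based substitution above (a sketch; the helper names are mine, not from the slides), the kernel-induced squared distance can be computed from kernel evaluations alone:

```python
import numpy as np

def kernel_distance_sq(k, x, y):
    """Squared feature-space distance induced by a kernel k:
    ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2*k(x,y)."""
    return k(x, x) + k(y, y) - 2.0 * k(x, y)

# With a plain linear kernel (dot product), the kernel-induced distance
# reduces to the ordinary squared Euclidean distance.
def linear(a, b):
    return float(np.dot(a, b))

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.0])
print(kernel_distance_sq(linear, x, y))  # = ||x - y||^2 = 8.0
```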
Gaussian Kernel • Probably the most popular kernel in practice: k(x, x′) = exp(−‖x − x′‖² / 2σ²) • This kernel requires tuning for the proper value of σ: • Manual tuning (trial and error); • Brute-force search: stepping through a range of values for σ, or running a gradient-ascent optimization, seeking optimal performance of a model on training data. • Although these approaches are feasible with supervised learning, it is much more difficult to tune σ for unsupervised learning methods.
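A minimal sketch of the Gaussian kernel and of why σ matters (illustrative only; the example points and values are my own, not from the slides):

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])  # ||x - y|| = 5
for sigma in (1.0, 5.0, 10.0):
    print(sigma, gaussian_kernel(x, y, sigma))
# A small sigma makes the two points look dissimilar (k close to 0), a large
# sigma makes them look similar (k close to 1) -- hence the need for tuning.
```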
Limitations, varying densities (1) • Kernel-based clustering remains feasible, but only after trying several choices of σ. (Figures: the original data set and the Gaussian-kernel clustering results for σ = 5 and σ = 8.)
Limitations, varying densities (2) • The success of kernel-based clustering relies on the choice of the kernel function; • It is often unclear which kernel is the most suitable for a particular task; • Kernel-based clustering maps all points using the same global similarity. (Figures: the original data set and the Gaussian-kernel clustering results for σ = 2 and σ = 5.)
Contributions (1) • Construct the kernel from a number of multi-resolution Gaussian kernels • And learn a resolution-specific weight for each kernel in each cluster • Better characterization; • Density fitting; • Adaptivity to each individual cluster. (Figure: the original data set.)
Contributions (2) • Fuzzy C-Means with Multiple Kernels (FCM-MK) • Unsupervised • Object data • Prototype defined in the input space • Clusters with varying sizes and densities • Relational Fuzzy C-Means with Multiple Kernels (RFCM-MK) • Unsupervised • Relational data • Clusters of different shapes with unbalanced densities • Multiple-resolution within the same cluster
Part 1 – FCM-MK Part 1 – Prototype-based Clustering • Fuzzy C-Means with Multiple Kernels
Part 1 – FCM-MK Input, Output • Input: unlabeled object data, the number of clusters, and a set of Gaussian kernels at multiple resolutions • Output: fuzzy membership matrix, cluster prototypes in the input space, and a resolution-specific weight for each kernel in each cluster
Part 1 – FCM-MK Kernel-based Similarity • We construct a new kernel-induced similarity defined as • The normalized kernel is given by • The distance between a point and a cluster center in feature space is
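The formulas on this slide did not survive extraction. As one plausible form, consistent with the idea of a resolution-specific weight for each kernel in each cluster (an assumption on my part, not necessarily the slide's exact definition):

```latex
% Sketch (assumed form): cluster-dependent kernel as a weighted combination
% of M Gaussian base kernels, with non-negative resolution-specific weights.
\kappa_i(\mathbf{x}_j, \mathbf{x}_k) \;=\; \sum_{l=1}^{M} w_{il}\, K_l(\mathbf{x}_j, \mathbf{x}_k),
\qquad w_{il} \ge 0 .
% The induced feature-space distance between point x_j and prototype a_i
% then follows the usual kernel trick:
D_{ij}^{2} \;=\; \kappa_i(\mathbf{x}_j, \mathbf{x}_j) \;-\; 2\,\kappa_i(\mathbf{x}_j, \mathbf{a}_i) \;+\; \kappa_i(\mathbf{a}_i, \mathbf{a}_i).
```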
Part 1 – FCM-MK Objective function • Optimization of an “objective function” or “performance index”
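The objective itself was not extracted; a sketch of the usual FCM-style form it would take with the multiple-kernel distance D_ij and a fuzzifier m > 1 (both standard FCM ingredients, assumed here rather than taken from the slide):

```latex
% FCM-style objective sketch with the cluster-specific, multiple-kernel distance:
J \;=\; \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{\,m}\, D_{ij}^{2},
\qquad \text{subject to } \sum_{i=1}^{C} u_{ij} = 1 \ \ \forall j .
```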
Part 1 – FCM-MK Minimizing objective function (1) • Zeroing the gradient of the objective function with respect to the memberships • Zeroing the gradient of the objective function with respect to the cluster prototypes
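For reference, zeroing the gradient with respect to the memberships under the sum-to-one constraint yields the familiar FCM-style update (shown in its generic form; the slide's exact expression may differ):

```latex
% Generic FCM membership update obtained from the Lagrangian of J:
u_{ij} \;=\; \left[ \sum_{k=1}^{C} \left( \frac{D_{ij}^{2}}{D_{kj}^{2}} \right)^{\frac{1}{m-1}} \right]^{-1}.
```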
Part 1 – FCM-MK Minimizing objective function (2) • We optimize with respect to the resolution-specific weights using the gradient descent method
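A generic sketch of such a gradient-descent step (the learning rate η and the projection back onto non-negative, normalized weights are assumptions, not taken from the slides):

```latex
% Gradient-descent update for the resolution-specific weights, followed by a
% projection that keeps the weights non-negative and summing to one per cluster:
w_{il} \;\leftarrow\; w_{il} \;-\; \eta\, \frac{\partial J}{\partial w_{il}},
\qquad
w_{il} \;\leftarrow\; \frac{\max(w_{il}, 0)}{\sum_{l'} \max(w_{il'}, 0)} .
```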
Part 1 – FCM-MK Experimental Evaluation
Part 1 – FCM-MK Experimental evaluation, Data set 1
Part 1 – FCM-MK Experimental evaluation, Data set 2
Part 1 – FCM-MK Experimental evaluation, Data set 3
Part 1 – FCM-MK Experimental evaluation, Data set 4
Part 1 – FCM-MK Object Data vs. Relational Data (Figures: distribution of σ for object data and distribution of σ for relational data.)
Part 2 – RFCM-MK Part 2 – Relational Data Clustering • Relational Fuzzy C-Means with Multiple Kernels
Part 2 – RFCM-MK Input, Output • Input: relational (pairwise dissimilarity) data, the number of clusters, and a set of Gaussian kernels at multiple resolutions • Output: fuzzy membership matrix and a resolution-specific weight for each kernel in each cluster
Part 2 – RFCM-MK Kernel-based Similarity • We construct a new kernel-induced similarity defined as • The relational data between two feature points with respect to a cluster can be defined as
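As background (not the slide's missing formula), relational fuzzy c-means in the style of Hathaway and Bezdek computes point-to-cluster distances directly from a relational matrix R; RFCM-MK would presumably use an analogous quantity built from the kernel-induced, cluster-specific relational data:

```latex
% Background sketch: relational FCM distance of object x_j to fuzzy cluster i,
% computed from the relational matrix R and the membership-derived weights v_i.
\mathbf{v}_i \;=\; \frac{\bigl(u_{i1}^{\,m}, \dots, u_{iN}^{\,m}\bigr)^{T}}{\sum_{j=1}^{N} u_{ij}^{\,m}},
\qquad
d_{ij}^{2} \;=\; (R\,\mathbf{v}_i)_j \;-\; \tfrac{1}{2}\,\mathbf{v}_i^{T} R\,\mathbf{v}_i .
% When R holds squared Euclidean distances, d_{ij}^2 equals the squared distance
% from x_j to the i-th fuzzy centroid, without ever forming the centroid explicitly.
```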
Part 2 – RFCM-MK Objective function • Optimization of an “objective function” or “performance index”
Part 2 – RFCM-MK Minimizing objective function (1) • Zeroing the gradient of the objective function with respect to the memberships
Part 2 – RFCM-MK Minimizing objective function (2) • We optimize with respect to the resolution-specific weights using the gradient descent method
Part 2 – RFCM-MK Experimental Evaluation
Part 2 – RFCM-MK Experimental Evaluation, Data set 1
Part 2 – RFCM-MK Experimental Evaluation, Data set 2
Part 2 – RFCM-MK Experimental Evaluation, Data set 3
Conclusions • Find the optimal kernel-induced feature map in a completely unsupervised way • Multiple Kernel Learning; • Resolution-specific weight for each base kernel in each cluster; • Fuzzy C-Means with Multiple Kernels approach • Object data; • Clusters of different densities; • Prototypes in the input space. • Relational Fuzzy C-Means with Multiple Kernels approach • Relational data; • Multiple resolutions within the same cluster or across different clusters; • Clusters of different shapes.
Future Work (1) • Improve the performance of the proposed algorithms by using supervision information • Must-link constraints: The penalty for violating a must-link constraint between distant points should be higher than that between nearby points • Cannot-link constraints: The penalty for violating a cannot-link constraint between two points that are nearby according to the current metric should be higher than for two distant points
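One common way to encode such distance-sensitive penalties, in the style of HMRF-based semi-supervised clustering (illustrative background, not the author's SS-FCM-MK formulation), is to add terms of the following form to the clustering objective, where M is the must-link set, C the cannot-link set, and ℓ_j the cluster label of x_j:

```latex
% Distance-sensitive constraint penalties (HMRF-style sketch; an assumption,
% not taken from the slides):
\sum_{(x_j,\,x_k)\in\mathcal{M}} d^{2}(x_j, x_k)\,\mathbb{1}\!\left[\ell_j \neq \ell_k\right]
\;+\;
\sum_{(x_j,\,x_k)\in\mathcal{C}} \bigl(d_{\max}^{2} - d^{2}(x_j, x_k)\bigr)\,\mathbb{1}\!\left[\ell_j = \ell_k\right].
% A must-link violation costs more when the two points are far apart, and a
% cannot-link violation costs more when they are close together, matching the
% intuition stated on this slide.
```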
Future Work (2) • The objective function of Semi-Supervised Fuzzy C-Means with Multiple Kernels (SS-FCM-MK) is given by
Future Work (3) • The objective function of Semi-Supervised Relational Fuzzy C-Means with Multiple Kernels (SS-RFCM-MK) is given by • Study the performance versus the amount of supervision; • Compare to other semi-supervised algorithms.
Future Work (4) • Automatically identify the optimal number of clusters and reduce the effect of noise and outliers • Competitive Agglomeration: starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration; • Dave's Noise Clustering technique (NC): seeks to separate noisy data by clustering them all into a (c+1)-th conceptual cluster, based on the assumption that the center of such a cluster (called the noise cluster) is equidistant from all noise points in the data set.
Future Work (5) • The nearest neighbor classifier (NNC) is used in many pattern recognition applications where the underlying probability distribution of the data is unknown a priori. • Traditional NNC stores all the known data points as labeled prototypes • computationally prohibitive for very large databases; • limited computer storage; • cost of searching for the nearest neighbors of an input vector. • Apply our algorithms to real applications involving very large and high-dimensional data • Content-Based Image Retrieval (CBIR); • Land mine detection using GPR.