
Fuzzy Clustering with Multiple Kernels



  1. Fuzzy Clustering with Multiple Kernels Naouel Baili Multimedia Research Laboratory Computer Engineering & Computer Science Dept. University of Louisville, USA April 2010

  2. Outline • Introduction • Prototype-based Fuzzy Clustering • Proposed Fuzzy C-Means with Multiple Kernels • Preliminary results • Relational data Fuzzy Clustering • Proposed Relational Fuzzy C-Means with Multiple Kernels • Preliminary results • Conclusions

  3. Introduction, what is clustering? • Clustering • The goal of clustering is to separate a finite, unlabeled data set into a finite and discrete set of “natural,” hidden data structures; • As a data mining task, data clustering aims at identifying clusters, or densely populated regions, according to some measure of similarity. [Figure: intra-cluster distances are minimized, inter-cluster distances are maximized] • Studied and applied in many fields • Statistics; • Spatial databases; • Machine learning; • Data mining.

  4. Introduction, Data Clustering Methods • Hierarchical clustering • Organizes elements into a tree: leaves represent objects, and the length of the path between two leaves represents the distance between the corresponding objects. Similar objects lie within the same subtree. • Partitional clustering • Organizes elements into disjoint groups; • Hard vs. fuzzy clustering • Kernel-based clustering • Spectral clustering • Object data vs. relational data clustering

  5. Kernel methods: the mapping • A kernel • is a similarity measure • defined by an implicit mapping φ • from the original space to a vector space (feature space) • such that k(x, y) = ⟨φ(x), φ(y)⟩. [Figure: the mapping φ takes points from the original space into the feature (vector) space]
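To make the implicit-mapping idea concrete, here is a minimal check (illustrative only, not from the slides) that the quadratic kernel on 2-D points really is an inner product under an explicit feature map:

```python
import numpy as np

# For k(x, y) = (x . y)^2 in 2-D, the feature map is known explicitly:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
k_implicit = np.dot(x, y) ** 2          # evaluated in the original space
k_explicit = np.dot(phi(x), phi(y))     # inner product in the feature space
assert np.isclose(k_implicit, k_explicit)
```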

  6. Benefits from Kernels • Generalizes (nonlinearly) pattern recognition algorithms in clustering, classification, density estimation, … • When these algorithms are dot-product based, by replacing the dot product with k(x, y): e.g. linear discriminant analysis, logistic regression, perceptron, SOM, PCA, ICA, … • When these algorithms are distance-based, by replacing d²(x, y) with k(x,x) + k(y,y) − 2k(x,y) • Freedom in choosing the mapping (equivalently, the kernel) implies a large variety of learning algorithms
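A small sketch of that distance substitution (my example, not the talk's): with the linear kernel it recovers the ordinary squared Euclidean distance, and swapping in any other kernel gives the corresponding feature-space distance.

```python
import numpy as np

def kernel_dist2(x, y, k):
    # Distance form of the kernel trick from this slide:
    # ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y).
    return k(x, x) + k(y, y) - 2.0 * k(x, y)

linear = np.dot
x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
# With the linear kernel this is exactly the squared Euclidean distance.
assert np.isclose(kernel_dist2(x, y, linear), np.sum((x - y) ** 2))
```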

  7. Gaussian Kernel • Probably the most popular kernel in practice: k(x, y) = exp(−‖x − y‖² / 2σ²) • This kernel requires tuning for the proper value of σ: • Manual tuning (trial and error); • Brute-force search: involves stepping through a range of values for σ, in a gradient-ascent optimization, seeking optimal performance of a model on training data. Although these approaches are feasible in supervised learning, it is much more difficult to tune σ for unsupervised learning methods.
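A minimal sketch of the brute-force option; the `score` function is a hypothetical stand-in for whatever model-quality measure is available (with labels it could be validation accuracy, which is exactly what is missing in the unsupervised case):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def brute_force_sigma(X, sigmas, score):
    # Step through candidate sigmas and keep the best-scoring kernel.
    return max(sigmas, key=lambda s: score(gaussian_kernel_matrix(X, s)))
```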

  8. Limitations, varying densities (1) • Kernel-based clustering is still feasible, but only after trying several choices of σ. [Figures: original data sets and the Gaussian-kernel clusterings obtained with σ = 5 and σ = 8]

  9. Limitations, varying densities (2) • The success of kernel-based clustering relies on the choice of the kernel function; • It is often unclear which kernel is the most suitable for a particular task; • Kernel-based clustering maps all points using the same global similarity. [Figures: original data set and the Gaussian-kernel clusterings obtained with σ = 2 and σ = 5]

  10. Contributions (1) • Construct the kernel from a number of multi-resolution Gaussian kernels • And learn a resolution-specific weight for each kernel in each cluster • Better characterization; • Density fitting; • Adaptivity to each individual cluster. [Figure: original data set]
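One plausible reading of this construction, as a hedged sketch: a convex, cluster-specific combination of Gaussian kernels at several resolutions. The convex-combination rule is my assumption; the exact combination the talk derives is not preserved in this transcript.

```python
import numpy as np

def gaussian_K(X, sigma):
    # Base Gaussian kernel matrix at one resolution sigma.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cluster_kernel(X, sigmas, w_i):
    # Combine one kernel per resolution with the resolution-specific
    # weights w_i of cluster i (assumed: w_i >= 0, sum(w_i) = 1).
    return sum(w * gaussian_K(X, s) for w, s in zip(w_i, sigmas))
```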

  11. Contributions (2) • Fuzzy C-Means with Multiple Kernels (FCM-MK) • Unsupervised • Object data • Prototype defined in the input space • Clusters with varying sizes and densities • Relational Fuzzy C-Means with Multiple Kernels (RFCM-MK) • Unsupervised • Relational data • Clusters of different shapes with unbalanced densities • Multiple-resolution within the same cluster

  12. Part 1 – Prototype-based Clustering • Fuzzy C-Means with Multiple Kernels (FCM-MK)

  13. Part 1 – FCM-MK Input, Output • Input: • Output:

  14. Part 1 – FCM-MK Kernel-based Similarity • We construct a new kernel-induced similarity defined as • The normalized kernel is given by • The distance between point and center in feature space is
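The slide's equations did not survive the transcript. As a point of reference only, here is a sketch of the standard pieces kernel-based fuzzy clustering uses for these steps: cosine-style kernel normalization, and the distance from a point to a fuzzy cluster center computed implicitly in feature space. Whether these match the slide's exact definitions cannot be checked from the transcript.

```python
import numpy as np

def normalized_kernel(K):
    # Standard normalization: K~(x, y) = K(x, y) / sqrt(K(x,x) K(y,y)).
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def dist2_to_center(K, u_i, m=2.0):
    # Squared feature-space distance from every point to the implicit
    # center c_i = sum_k a_k phi(x_k), with a_k = u_ik^m / sum_k u_ik^m.
    a = u_i ** m
    a = a / a.sum()
    return np.diag(K) - 2.0 * K @ a + a @ K @ a
```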

  15. Part 1 – FCM-MK Objective function • Optimization of an “objective function” or “performance index”
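The equation itself is not preserved here. For orientation, the classic fuzzy C-means objective that kernel variants minimize, with the kernel-induced distance standing in for the Euclidean one, is:

```latex
J = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d^{2}(\mathbf{x}_j, \mathbf{c}_i),
\qquad \text{subject to } \sum_{i=1}^{c} u_{ij} = 1 \;\; \forall j, \quad u_{ij} \ge 0 .
```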

  16. Part 1 – FCM-MK Minimizing objective function (1) • Zeroing the gradient of the objective function with respect to the cluster centers • Zeroing the gradient of the objective function with respect to the fuzzy memberships

  17. Part 1 – FCM-MK Minimizing objective function (2) • We optimize with respect to the resolution-specific weights using the gradient descent method
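Taken together, slides 16-17 describe an alternating scheme. Below is a hedged schematic of one such loop: the membership update is the standard FCM closed form, the weight step exploits the fact that the kernel-induced distance is linear in the kernel weights (under the convex-combination assumption above), and the renormalization back onto the simplex is my simplification. The paper's derived update expressions are not preserved in this transcript.

```python
import numpy as np

def dist2(K, a):
    # Feature-space squared distances of all points to the fuzzy center
    # with normalized coefficients a (sum(a) == 1).
    return np.diag(K) - 2.0 * K @ a + a @ K @ a

def fcm_mk_sketch(bank, c, m=2.0, lr=0.01, iters=100, seed=0):
    # bank: list of precomputed n x n base kernel matrices, one per sigma.
    rng = np.random.default_rng(seed)
    n, L = bank[0].shape[0], len(bank)
    U = rng.dirichlet(np.ones(c), size=n).T            # c x n memberships
    W = np.full((c, L), 1.0 / L)                       # per-cluster kernel weights
    for _ in range(iters):
        A = U ** m
        A /= A.sum(axis=1, keepdims=True)              # center coefficients
        # D2L[i, l, j]: distance of point j to center i under kernel l alone
        D2L = np.array([[dist2(Kl, A[i]) for Kl in bank] for i in range(c)])
        D2 = np.einsum('il,ilj->ij', W, D2L)           # combined distances
        U = (1.0 / np.maximum(D2, 1e-12)) ** (1.0 / (m - 1.0))
        U /= U.sum(axis=0)                             # standard FCM update
        grad = np.einsum('ij,ilj->il', U ** m, D2L)    # dJ/dW: J is linear in W
        W = np.maximum(W - lr * grad, 1e-12)
        W /= W.sum(axis=1, keepdims=True)              # back onto the simplex
    return U, W
```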

  18. Part 1 – FCM-MK Experimental Evaluation

  19. Part 1 – FCM-MK Experimental evaluation, Data set 1

  20. Part 1 – FCM-MK Experimental evaluation, Data set 1

  21. Part 1 – FCM-MK Experimental evaluation, Data set 2

  22. Part 1 – FCM-MK Experimental evaluation, Data set 2

  23. Part 1 – FCM-MK Experimental evaluation, Data set 3

  24. Part 1 – FCM-MK Experimental evaluation, Data set 3

  25. Part 1 – FCM-MK Experimental evaluation, Data set 4

  26. Part 1 – FCM-MK Experimental evaluation, Data set 4

  27. Part 1 – FCM-MK Object Data vs. Relational Data [Figures: distribution of σ for object data vs. distribution of σ for relational data]

  28. Part 2 – Relational Data Clustering • Relational Fuzzy C-Means with Multiple Kernels (RFCM-MK)

  29. Part 2 – RFCM-MK Input, Output • Input: • Output:

  30. Part 2 – RFCM-MK Kernel-based Similarity • We construct a new kernel-induced similarity defined as • The relational data between feature points and with respect to cluster can be defined as
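The definitions themselves are missing from the transcript. For reference, the standard way to turn a kernel (similarity) matrix into relational squared distances between feature points is the same substitution as in slide 6; whether the slide's cluster-dependent version reduces to this cannot be verified here.

```python
import numpy as np

def relational_from_kernel(K):
    # R[j, k] = ||phi(x_j) - phi(x_k)||^2 = K_jj + K_kk - 2 K_jk
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K
```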

  31. Part 2 – RFCM-MK Objective function • Optimization of an “objective function” or “performance index”

  32. Part 2 – RFCM-MK Minimizing objective function (1) • Zeroing the gradient of the objective function with respect to the fuzzy memberships

  33. Part 2 – RFCM-MK Minimizing objective function (2) • We optimize with respect to the resolution-specific weights using the gradient descent method
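Since the derived update is not preserved, here is a generic projected-gradient step of the kind this slide describes: move the resolution-specific weights along the negative gradient, then project back onto the probability simplex. The projection routine is a standard algorithm, not something the talk specifies.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {w : w >= 0, sum(w) = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def weight_step(w, grad, lr=0.01):
    # One gradient-descent update of the resolution-specific weights.
    return project_simplex(w - lr * grad)
```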

  34. Part 2 – RFCM-MK Experimental Evaluation

  35. Part 2 – RFCM-MK Experimental Evaluation, Data set 1

  36. Part 2 – RFCM-MK Experimental Evaluation, Data set 1

  37. Part 2 – RFCM-MK Experimental Evaluation, Data set 2

  38. Part 2 – RFCM-MK Experimental Evaluation, Data set 2

  39. Part 2 – RFCM-MK Experimental Evaluation, Data set 3

  40. Part 2 – RFCM-MK Experimental Evaluation, Data set 3

  41. Conclusions • Find the optimal kernel-induced feature map in a completely unsupervised way • Multiple Kernel Learning; • Resolution-specific weight for each kernel base in each cluster. • Fuzzy C-Means with Multiple Kernels approach • Object data; • Clusters of different densities; • Prototypes in the input space. • Relational Fuzzy C-Means with Multiple Kernels approach • Relational data; • Multiple resolutions within the same cluster or across different clusters; • Clusters of different shapes.

  42. Future Work (1) • Improve the performance of the proposed algorithms by using supervision information • Must-link constraints: The penalty for violating a must-link constraint between distant points should be higher than that between nearby points • Cannot-link constraints: The penalty for violating a cannot-link constraint between two points that are nearby according to the current metric should be higher than for two distant points
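As an illustration only (the names and scalings below are my assumptions, not the talk's formulas), the asymmetry between the two penalty types could look like this:

```python
import numpy as np

def violation_penalty(x, y, must_link_violated, cannot_link_violated):
    d2 = np.sum((x - y) ** 2)
    p = 0.0
    if must_link_violated:
        p += d2                  # dearer when the points are far apart
    if cannot_link_violated:
        p += 1.0 / (1.0 + d2)    # dearer when the points are close together
    return p
```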

  43. Future Work (2) • The objective function of Semi-Supervised Fuzzy C-Means with Multiple Kernels (SS-FCM-MK) is given by

  44. Future Work (3) • The objective function of Semi-Supervised Relational Fuzzy C-Means with Multiple Kernels (SS-RFCM-MK) is given by • Study the performance versus the amount of supervision; • Compare to other semi-supervised algorithms.

  45. Future Work (4) • Automatically identify the optimal number of clusters and reduce the effect of noise and outliers • Competitive Agglomeration: starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration; • Dave's Noise Clustering (NC) technique: seeks to separate noisy data by clustering it all into a (c+1)-th conceptual cluster, based on the assumption that the center of such a cluster (called the noise cluster) is equidistant from all noise points in the data set.
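For reference, the published form of Dave's noise-clustering objective (quoted as generally stated in the literature, not necessarily as the authors would adapt it) augments the FCM cost with a noise cluster at a fixed distance δ from every point:

```latex
J_{NC} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d^{2}(\mathbf{x}_j, \mathbf{c}_i)
       + \sum_{j=1}^{n} \delta^{2} \Bigl( 1 - \sum_{i=1}^{c} u_{ij} \Bigr)^{m}
```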

  46. Future Work (5) • The nearest neighbor classifier (NNC) is used for many pattern recognition applications where the underlying probability distribution of the data is unknown a priori. • Traditional NNC stores all the known data points as labeled prototypes • Computationally prohibitive for very large databases; • Limited computer storage; • Cost of searching for the nearest neighbors of an input vector. • Apply our algorithms to real applications involving very large and high-dimensional data • Content-Based Image Retrieval (CBIR); • Land mine detection using GPR.
