A Fuzzy k-Modes Algorithm for Clustering Categorical Data

National Yunlin University of Science and Technology • Advisor: Dr. Hsu • Graduate: Chien-Ming Hsiao • Authors: Zhexue Huang and Michael K. Ng

Outline • Motivation • Objective • Introduction • Notation • Hard and fuzzy k-means algorithms • Hard and fuzzy k-Modes algorithms • Experimental Results • Conclusions • Personal Opinion

Motivation • k-means-type algorithms work only on numeric data, which limits their use in data mining. • Most algorithms for clustering categorical data suffer from a common efficiency problem when applied to massive categorical-only data sets.

Objective • To tackle the problem of clustering large categorical data sets in data mining

Introduction • Fuzzy versions of the k-means algorithm • Each pattern is allowed to have a degree of membership in every cluster. • Working only on numeric data limits the use of these k-means-type algorithms in areas such as data mining.

Introduction • Methods for clustering categorical data • The conceptual k-means algorithm [Ralambondrainy, 1995] • Hierarchical clustering methods [Gower, 1991] • The PAM algorithm [Kaufman et al., 1990] • The fuzzy-statistical algorithms [Woodbury, 1974] • The conceptual clustering methods [Michalski, 1983]

Notation • The set of objects to be clustered is stored in a database table T defined by a set of attributes A1, A2,…, Am.

Hard and fuzzy k-means algorithms • Let X be a set of n objects described by m numeric attributes.

Hard and fuzzy k-means algorithms • The usual way to optimize the objective F is partial (alternating) optimization over the cluster centers Z and the partition matrix W • Fix Z and find the necessary conditions on W to minimize F • Fix W and minimize F with respect to Z
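The formula for F is not reproduced in this transcript. For orientation, a standard statement of the fuzzy k-means problem is given below. The symbols are assumptions that follow the usual convention: W = [w_li] is the k × n partition matrix, Z = {Z_1, …, Z_k} are the cluster centers, α ≥ 1 is the fuzziness exponent, and d is the squared Euclidean distance. The hard version corresponds to α = 1 with w_li restricted to {0, 1}.

```latex
\min_{W,Z}\; F(W,Z) \;=\; \sum_{l=1}^{k}\sum_{i=1}^{n} w_{li}^{\alpha}\, d(Z_l, X_i)
\quad\text{subject to}\quad
0 \le w_{li} \le 1,\qquad
\sum_{l=1}^{k} w_{li} = 1 \;\; (1 \le i \le n),\qquad
0 < \sum_{i=1}^{n} w_{li} < n \;\; (1 \le l \le k).
```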

Hard and fuzzy k-means algorithms • Theorem 1 • Let Z be fixed and consider Problem (P1), the minimization of F with respect to W

Hard and fuzzy k-means algorithms • Theorem 2 • Let W be fixed and consider Problem (P2), the minimization of F with respect to Z
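The statements of Theorems 1 and 2 are not preserved in this transcript. The sketch below shows the standard fuzzy k-means updates that such theorems usually give, assuming α > 1 and squared Euclidean distance. The function names and the toy data are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_memberships(X, Z, alpha=2.0):
    """Fix the centers Z and return the memberships W[l][i] (requires alpha > 1)."""
    k, n = len(Z), len(X)
    W = [[0.0] * n for _ in range(k)]
    for i, x in enumerate(X):
        d = [squared_distance(z, x) for z in Z]
        zero = [l for l in range(k) if d[l] == 0.0]
        if zero:
            # The object coincides with a center: give it full membership there.
            W[zero[0]][i] = 1.0
            continue
        for l in range(k):
            W[l][i] = 1.0 / sum(
                (d[l] / d[h]) ** (1.0 / (alpha - 1.0)) for h in range(k)
            )
    return W


def update_centers(X, W, alpha=2.0):
    """Fix W and return each center as the membership-weighted mean."""
    m = len(X[0])
    Z = []
    for row in W:
        weights = [w ** alpha for w in row]
        total = sum(weights)
        Z.append(
            [sum(wt * x[j] for wt, x in zip(weights, X)) / total for j in range(m)]
        )
    return Z


# Toy one-dimensional data with two obvious groups (hypothetical values).
X = [[0.0], [1.0], [9.0], [10.0]]
Z = [[0.0], [10.0]]
W = update_memberships(X, Z)
Z_new = update_centers(X, W)
```

Alternating these two steps until F stops decreasing gives the partial optimization described on the earlier slide.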

Hard and fuzzy k-means algorithms • Time complexity of the algorithm • O(tkmn), for n objects, m attributes, k clusters and t iterations • Space requirement of the algorithm • O(n(m+k) + km)

Hard and fuzzy k-Modes algorithms • Using a simple matching dissimilarity measure for categorical objects • Replacing the means of clusters with the modes • Using a frequency-based method to find the modes

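Hard and fuzzy k-Modes algorithms • Let X and Y be two categorical objects described by m categorical attributes • $X = [x_1, x_2, \ldots, x_m]$ • $Y = [y_1, y_2, \ldots, y_m]$ • The simple matching dissimilarity measure between X and Y is the number of attributes on which they differ: $d_c(X, Y) = \sum_{j=1}^{m} \delta(x_j, y_j)$, where $\delta(x_j, y_j) = 0$ if $x_j = y_j$ and $1$ otherwise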
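A minimal sketch of the simple matching dissimilarity, assuming each object is a tuple of category labels of equal length. The function name and the example values are illustrative.

```python
def matching_dissimilarity(x, y):
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a != b)


x = ("red", "small", "round")
y = ("red", "large", "round")
distance = matching_dissimilarity(x, y)  # the objects differ only in the second attribute
```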

Hard and fuzzy k-Modes algorithms • Using a frequency-based method to update Z • The Hard k-modes Update Method • The Fuzzy k-modes Update Method

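Hard and fuzzy k-Modes algorithms • Theorem 3: The Hard k-modes Update Method • The category of attribute Aj of the cluster mode Zl is the mode (the most frequent category) of attribute Aj among the objects belonging to cluster l • This choice minimizes the quantity $\sum_{l=1}^{k}\sum_{i=1}^{n} w_{li}\, d_c(Z_l, X_i)$, where $w_{li}$ is the membership of object $X_i$ in cluster l and $d_c$ is the simple matching dissimilarity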
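The hard update can be sketched as a per-attribute frequency count. This assumes objects are tuples of category labels and `labels[i]` is the cluster index of object i. Ties are broken by first occurrence here, which is an arbitrary choice.

```python
from collections import Counter


def hard_modes(X, labels, k):
    """Return, for each cluster, the per-attribute most frequent category."""
    m = len(X[0])
    modes = []
    for l in range(k):
        members = [x for x, lab in zip(X, labels) if lab == l]
        if not members:
            raise ValueError(f"cluster {l} is empty")
        modes.append(
            tuple(
                Counter(x[j] for x in members).most_common(1)[0][0]
                for j in range(m)
            )
        )
    return modes


# Hypothetical data: five objects with two attributes, assigned to two clusters.
X = [("a", "x"), ("a", "y"), ("b", "y"), ("c", "z"), ("c", "z")]
labels = [0, 0, 0, 1, 1]
modes = hard_modes(X, labels, k=2)
```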

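Hard and fuzzy k-Modes algorithms • Theorem 4: The Fuzzy k-modes Update Method • The category of attribute Aj of the cluster mode Zl is the category with the largest sum of memberships $w_{li}^{\alpha}$ to cluster l, summed over the objects that take that category • This choice minimizes the quantity $\sum_{l=1}^{k}\sum_{i=1}^{n} w_{li}^{\alpha}\, d_c(Z_l, X_i)$, where $w_{li}$ is the membership of object $X_i$ in cluster l and $d_c$ is the simple matching dissimilarity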
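A sketch of the fuzzy update for a single cluster, assuming `weights[i]` is the membership of object i in that cluster and α is the fuzziness exponent. The names and values are illustrative. The example is chosen so that the fuzzy mode differs from the plain frequency mode.

```python
from collections import defaultdict


def fuzzy_mode(X, weights, alpha=1.5):
    """Per attribute, pick the category with the largest sum of weights**alpha."""
    m = len(X[0])
    mode = []
    for j in range(m):
        score = defaultdict(float)
        for x, w in zip(X, weights):
            score[x[j]] += w ** alpha
        mode.append(max(score, key=score.get))
    return tuple(mode)


# "a" occurs twice, but both of its objects belong only weakly to this cluster.
X = [("a",), ("a",), ("b",)]
weights = [0.2, 0.2, 0.9]
mode = fuzzy_mode(X, weights, alpha=2.0)  # scores: a = 0.08, b = 0.81
```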

Hard and fuzzy k-Modes algorithms • Theorem 5

Hard and fuzzy k-Modes algorithms
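The algorithm listing for this slide is not preserved in the transcript. As a hedged sketch, the pieces described above can be combined into an alternating loop: compute memberships for fixed modes, update the modes, and stop when the cost no longer decreases. The membership formula mirrors the fuzzy k-means one with the matching dissimilarity substituted. The initial modes, α = 1.1, and the stopping rule are assumptions made for illustration.

```python
from collections import defaultdict


def dissim(x, y):
    """Simple matching dissimilarity between two categorical objects."""
    return sum(a != b for a, b in zip(x, y))


def memberships(X, Z, alpha):
    """Fix the modes Z and compute the fuzzy partition matrix W[l][i]."""
    k = len(Z)
    W = [[0.0] * len(X) for _ in range(k)]
    for i, x in enumerate(X):
        d = [dissim(z, x) for z in Z]
        if 0 in d:
            W[d.index(0)][i] = 1.0  # object equals a mode: full membership
        else:
            for l in range(k):
                W[l][i] = 1.0 / sum(
                    (d[l] / d[h]) ** (1.0 / (alpha - 1.0)) for h in range(k)
                )
    return W


def modes(X, W, alpha):
    """Fix W and pick, per cluster and attribute, the best-supported category."""
    Z = []
    for row in W:
        z = []
        for j in range(len(X[0])):
            score = defaultdict(float)
            for x, w in zip(X, row):
                score[x[j]] += w ** alpha
            z.append(max(score, key=score.get))
        Z.append(tuple(z))
    return Z


def cost(X, W, Z, alpha):
    """Objective value: membership-weighted sum of dissimilarities to the modes."""
    return sum(
        W[l][i] ** alpha * dissim(Z[l], X[i])
        for l in range(len(Z))
        for i in range(len(X))
    )


def fuzzy_k_modes(X, initial_modes, alpha=1.1, max_iter=100):
    """Alternate the two updates until the cost stops decreasing."""
    Z = list(initial_modes)
    W = memberships(X, Z, alpha)
    best = cost(X, W, Z, alpha)
    for _ in range(max_iter):
        Z_new = modes(X, W, alpha)
        W_new = memberships(X, Z_new, alpha)
        new = cost(X, W_new, Z_new, alpha)
        if new >= best:
            break
        Z, W, best = Z_new, W_new, new
    return W, Z


# Hypothetical data: two groups of three objects with three attributes each.
X = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
     ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
W, Z = fuzzy_k_modes(X, initial_modes=[X[0], X[3]])
hard_labels = [max(range(len(Z)), key=lambda l: W[l][i]) for i in range(len(X))]
```

Taking the cluster with the largest membership for each object, as in the last line, turns the fuzzy partition into a hard one.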

Experimental Results • To evaluate the performance and efficiency of the fuzzy k-modes algorithm • To compare the fuzzy k-modes algorithm with the conceptual k-means algorithm and the hard k-modes algorithm • Use real and artificial data • Soybean disease data set.

Experimental Results

Conclusions • The paper introduces the fuzzy k-modes algorithm for clustering categorical objects, based on extensions to the fuzzy k-means algorithm. • Theorem 4 allows the k-means paradigm to be used to generate the fuzzy partition matrix from categorical data.

Personal Opinion • The fuzzy partition matrix provides more information to help the user to determine the final clustering and to identify the boundary objects
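One way to act on this observation, sketched under assumptions: flag an object as a boundary object when its two largest memberships are close. The `margin` threshold and the function name are hypothetical and are not taken from the paper.

```python
def boundary_objects(W, margin=0.2):
    """Return indices of objects whose top two memberships differ by less than margin."""
    flagged = []
    for i in range(len(W[0])):
        column = sorted((row[i] for row in W), reverse=True)
        if len(column) > 1 and column[0] - column[1] < margin:
            flagged.append(i)
    return flagged


# Hypothetical 2 x 3 fuzzy partition matrix: rows are clusters, columns are objects.
W = [[0.95, 0.55, 0.10],
     [0.05, 0.45, 0.90]]
flagged = boundary_objects(W)  # only the middle object sits between the clusters
```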
