
A Fuzzy k-Modes Algorithm for Clustering Categorical Data

This research introduces a fuzzy k-modes algorithm for efficiently clustering large categorical data sets in data mining. The algorithm extends the k-means paradigm to categorical data, and its fuzzy partition matrix gives the user more information for deciding the final clustering. The study compares the efficiency and performance of the algorithm with the conceptual k-means and hard k-modes algorithms on real and artificial data sets.


Presentation Transcript


  1. National Yunlin University of Science and Technology • A Fuzzy k-Modes Algorithm for Clustering Categorical Data • Advisor: Dr. Hsu • Graduate: Chien-Ming Hsiao • Authors: Zhexue Huang and Michael K. Ng

  2. Outline • Motivation • Objective • Introduction • Notation • Hard and fuzzy k-means algorithms • Hard and fuzzy k-Modes algorithms • Experimental Results • Conclusions • Personal Opinion

  3. Motivation • k-means-type algorithms work only on numeric data, which limits their use in data mining. • Most algorithms for clustering categorical data suffer from a common efficiency problem when applied to massive categorical-only data sets.

  4. Objective • To tackle the problem of clustering large categorical data sets in data mining

  5. Introduction • Fuzzy versions of the k-means algorithm • Each pattern has a degree of membership in every cluster rather than belonging to exactly one. • Working only on numeric data limits the use of these k-means-type algorithms in areas such as data mining.

  6. Introduction • Methods for clustering categorical data • the conceptual k-means algorithm [Ralambondrainy, 1995] • hierarchical clustering methods [Gower, 1991] • the PAM algorithm [Kaufman et al., 1990] • the fuzzy-statistical algorithms [Woodbury, 1974] • the conceptual clustering methods [Michalski, 1983]

  7. Notation • The set of objects to be clustered is stored in a database table T defined by a set of attributes A1, A2,…, Am.

  8. Hard and fuzzy k-means algorithms • Let X be a set of n objects described by m numeric attributes.

  9. Hard and fuzzy k-means algorithms • The usual method toward optimization of F is to use partial optimization for Z and W • fix Z and find necessary conditions on W to minimize F • Fix W and minimize F with respect to Z
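The transcript does not preserve the definition of F. Assuming it is the standard fuzzy k-means objective (an assumption, not text recovered from the slides), with k clusters, n objects, a weighting exponent α ≥ 1, and a dissimilarity d between a cluster center Z_l and an object X_i, it would read roughly as follows:

```latex
F(W, Z) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li}^{\alpha} \, d(Z_l, X_i),
\qquad \text{subject to } 0 \le w_{li} \le 1, \quad \sum_{l=1}^{k} w_{li} = 1 \text{ for each } i .
```

Under this reading, α = 1 gives the hard algorithm and α > 1 the fuzzy one, and the two partial optimizations on this slide alternate between W and Z until F stops decreasing.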

  10. Hard and fuzzy k-means algorithms • Theorem 1 • Let Z be fixed and consider Problem (P1): minimize F over W

  11. Hard and fuzzy k-means algorithms • Theorem 2 • Let W be fixed and consider Problem (P2): minimize F over Z
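The statements of Theorems 1 and 2 are lost in the transcript. As a minimal sketch, assuming they give the usual fuzzy k-means update rules with squared Euclidean distance and exponent α > 1, the two partial optimizations might look like this. The function names and the list-of-lists layout (`W[l][i]` is the membership of object i in cluster l) are illustrative choices, not the authors' code.

```python
def squared_distance(x, z):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def update_memberships(X, Z, alpha=2.0):
    """Fix the centers Z and compute the membership matrix W (k rows, n columns).

    Assumes alpha > 1 (the fuzzy case).
    """
    k, n = len(Z), len(X)
    W = [[0.0] * n for _ in range(k)]
    for i, x in enumerate(X):
        d = [squared_distance(x, z) for z in Z]
        coincident = [l for l in range(k) if d[l] == 0.0]
        if coincident:
            # The object sits exactly on a center: full membership there.
            W[coincident[0]][i] = 1.0
            continue
        for l in range(k):
            W[l][i] = 1.0 / sum(
                (d[l] / d[h]) ** (1.0 / (alpha - 1.0)) for h in range(k)
            )
    return W


def update_centers(X, W, alpha=2.0):
    """Fix W and compute each center as a membership-weighted mean.

    Assumes every cluster has at least one nonzero membership.
    """
    m = len(X[0])
    Z = []
    for row in W:
        weights = [w ** alpha for w in row]
        total = sum(weights)
        Z.append(
            [sum(wt * x[j] for wt, x in zip(weights, X)) / total for j in range(m)]
        )
    return Z
```

Calling `update_memberships` and `update_centers` in turn, until the memberships stop changing, is the alternating scheme described on slide 9.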

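  12. Hard and fuzzy k-means algorithms • Time complexity of the algorithm: O(tkmn), where t is the number of iterations, k the number of clusters, m the number of attributes, and n the number of objects • Space requirement of the algorithm: O(n(m+k) + km)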

  13. Hard and fuzzy k-Modes algorithms • Using a simple matching dissimilarity measure for categorical objects • Replacing the means of clusters with the modes • Using a frequency-based method to find the modes

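  14. Hard and fuzzy k-Modes algorithms • Let X and Y be two categorical objects described by m attributes • X = [x1, x2, …, xm] • Y = [y1, y2, …, ym] • The simple matching dissimilarity measure between X and Y is defined as follows: d(X, Y) = δ(x1, y1) + δ(x2, y2) + … + δ(xm, ym), where δ(xj, yj) = 0 if xj = yj and 1 otherwise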
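The simple matching dissimilarity counts the attributes on which two objects disagree. A minimal sketch, with objects represented as equal-length sequences of category labels (the function name is illustrative):

```python
def matching_dissimilarity(x, y):
    """Number of attribute positions where categorical objects x and y differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a != b)


# Two objects that differ only in their second attribute.
distance = matching_dissimilarity(
    ["red", "small", "round"], ["red", "large", "round"]
)
```

Here `distance` is 1, because only the size attribute differs.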

  15. Hard and fuzzy k-Modes algorithms • Using a frequency-based method to update Z • The Hard k-modes Update Method • The Fuzzy k-modes Update Method

  16. Hard and fuzzy k-Modes algorithms • Theorem 3 : The Hard k-modes Update Method • The category of attribute Aj of the cluster mode Zl is determined by the mode of categories of attribute Aj in the set of objects belonging to cluster l • the quantity
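The hard update described above can be sketched as a per-attribute frequency count over the objects assigned to one cluster. This is an illustration under assumptions: the function name is hypothetical, and ties are broken by whichever category was seen first, which the slides do not specify.

```python
from collections import Counter


def hard_mode(cluster_objects):
    """Return the mode of a cluster: the most frequent category per attribute."""
    if not cluster_objects:
        raise ValueError("cluster must contain at least one object")
    m = len(cluster_objects[0])
    return [
        Counter(obj[j] for obj in cluster_objects).most_common(1)[0][0]
        for j in range(m)
    ]


# Three objects with two attributes each.
mode = hard_mode([["a", "x"], ["a", "y"], ["b", "y"]])
```

For this cluster the mode is `["a", "y"]`: "a" occurs twice in the first attribute and "y" twice in the second.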

  17. Hard and fuzzy k-Modes algorithms • Theorem 4 : The Fuzzy k-modes Update Method • The category of attribute Aj of the cluster mode Zl is the category of Aj with the largest summed membership wli to cluster l, where the sum for each category runs over the objects that take that category. • the quantity
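The fuzzy update replaces the frequency count with a sum of membership weights. The sketch below assumes the memberships are raised to the exponent α before summing, as in the fuzzy k-means objective; the exponent does not survive in the transcript, so treat that detail and the function name as assumptions.

```python
from collections import defaultdict


def fuzzy_mode(X, memberships, alpha=2.0):
    """Return the fuzzy mode of one cluster.

    X is the list of all objects; memberships[i] is the membership of X[i]
    in this cluster. For each attribute, the category with the largest
    total of memberships ** alpha wins.
    """
    m = len(X[0])
    mode = []
    for j in range(m):
        score = defaultdict(float)
        for x, w in zip(X, memberships):
            score[x[j]] += w ** alpha
        mode.append(max(score, key=score.get))
    return mode


# "b" is more frequent, but "a" carries more membership weight.
X = [["a"], ["b"], ["b"]]
weighted = fuzzy_mode(X, [0.9, 0.3, 0.3])
unweighted = fuzzy_mode(X, [1.0, 1.0, 1.0])
```

With memberships 0.9, 0.3, 0.3 the fuzzy mode is "a", whereas equal memberships reduce to the hard rule and give "b".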

  18. Hard and fuzzy k-Modes algorithms • Theorem 5

  19. Hard and fuzzy k-Modes algorithms

  20. Experimental Results • To evaluate the performance and efficiency of the fuzzy k-modes algorithm • To compare the fuzzy k-modes algorithm with the conceptual k-means algorithm and the hard k-modes algorithm • Use real and artificial data • Soybean disease data set.

  21. Experimental Results

  22. Experimental Results

  23. Experimental Results

  24. Experimental Results

  25. Experimental Results

  26. Conclusions • Introduced the fuzzy k-modes algorithm for clustering categorical objects based on extensions to the fuzzy k-means algorithm. • The most important result is the consequence of Theorem 4: it allows the k-means paradigm to be used to generate the fuzzy partition matrix from categorical data.

  27. Personal Opinion • The fuzzy partition matrix provides more information to help the user to determine the final clustering and to identify the boundary objects
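One way to act on this observation is to flag objects whose two largest memberships are close. The rule and the `margin` threshold below are hypothetical illustrations, not something taken from the slides; `W[l][i]` is again the membership of object i in cluster l.

```python
def boundary_objects(W, margin=0.2):
    """Return indices of objects whose top two memberships differ by less than margin."""
    k, n = len(W), len(W[0])
    flagged = []
    for i in range(n):
        column = sorted((W[l][i] for l in range(k)), reverse=True)
        if len(column) > 1 and column[0] - column[1] < margin:
            flagged.append(i)
    return flagged


# Object 1 is split almost evenly between the two clusters.
flagged = boundary_objects([[0.9, 0.55, 0.1], [0.1, 0.45, 0.9]])
```

Only object 1 is flagged here; the other two belong clearly to one cluster each.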
