
Determining the number of clusters using information entropy for mixed data

Presenter: Hong-Yi, Cai. Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang. PR, 2012. Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments.


Presentation Transcript


  1. Presenter: Hong-Yi, Cai. Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang. PR, 2012. Determining the number of clusters using information entropy for mixed data

  2. Outlines Motivation Objectives Methodology Experiments Conclusions Comments

  3. Motivation Determining the initial parameters of a clustering algorithm is the most difficult problem. No existing clustering algorithm clusters mixed data sets effectively.

  4. Objectives To propose a generalized mechanism for mixed data sets by integrating Renyi entropy and complement entropy. To improve the k-prototype algorithm by using the new generalized mechanism.

  5. Methodology K-Prototype…

  6. Methodology By the convolution theorem… Renyi Entropy : Within-Cluster Entropy: Parzen window density estimation: Between-Cluster Entropy: Improved Entropy for numerical data: A generalized mechanism for numerical data…

  7. Methodology Indiscernibility relation… Within-Cluster Entropy: Complement Entropy: Between-Cluster Entropy: Huang Dissimilarity for categorical data: Improved Entropy for categorical data: A generalized mechanism for categorical data…

  8. Methodology • A generalized mechanism for mixed data set…

  9. Methodology For numerical data… For categorical data… For mixed data… Cluster validity index for mixed data…

  10. Methodology

  11. Experiments Ten Cluster

  12. Experiments STUDENT

  13. Experiments Real data sets…

  14. Experiments Wine Breast

  15. Experiments Voting Car

  16. Experiments DNA TAE

  17. Experiments Heart Credit

  18. Experiments CMC Adult

  19. Experiments

  20. Conclusions The generalized mechanism and algorithm can cluster effectively and determine the optimal number of clusters for mixed data sets.

  21. Comments • Advantages • The entropy-based mechanism can be applied to mixed data sets. • Applications • Clustering of mixed-type data
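
Slides 5 to 7 name the quantities the method builds on, but the formulas themselves did not survive in this transcript. The sketch below is not the authors' implementation and does not reproduce their improved entropies or validity index. It only illustrates, as an assumption, the standard textbook forms those names usually refer to: Huang's k-prototype dissimilarity (squared Euclidean distance on numerical attributes plus a weight gamma times the number of categorical mismatches), the Parzen-window estimate of Renyi's quadratic entropy with Gaussian kernels, and complement entropy over the equivalence classes of an indiscernibility relation. All function names, the gamma and sigma parameters, and their default values are hypothetical.

```python
import math
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """k-prototype style dissimilarity between an object and a prototype.

    Numerical part: squared Euclidean distance.
    Categorical part: simple matching (count of differing attributes),
    weighted by gamma (hypothetical default of 1.0).
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * categorical


def renyi_quadratic_entropy(points, sigma=1.0):
    """Parzen-window estimate of Renyi's quadratic entropy, -log(integral of p^2).

    With Gaussian kernels of width sigma, the integral of the squared
    density estimate reduces (by the convolution of two Gaussians) to a
    double sum of Gaussians with variance 2 * sigma^2.
    """
    n = len(points)
    dim = len(points[0])
    var = 2.0 * sigma ** 2
    norm = (2.0 * math.pi * var) ** (-dim / 2.0)
    total = 0.0
    for x in points:
        for y in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            total += norm * math.exp(-sq_dist / (2.0 * var))
    return -math.log(total / (n * n))


def complement_entropy(objects):
    """Complement entropy of the partition induced by indiscernibility.

    Objects with identical (hashable) descriptions fall into the same
    equivalence class X_i; the entropy is sum(|X_i|/|U| * (1 - |X_i|/|U|)).
    """
    n = len(objects)
    return sum((c / n) * (1 - c / n) for c in Counter(objects).values())
```

For categorical objects with several attributes, each object can be passed to complement_entropy as a tuple of attribute values, so that two objects are indiscernible exactly when their tuples are equal. The double loop in renyi_quadratic_entropy is quadratic in the number of points, which is acceptable for a sketch but would need vectorizing for data sets of the size used in the experiments.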
