Determining the number of clusters using information entropy for mixed data. Presenter: Hong-Yi, Cai. Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang. PR, 2012.
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation Determining the initial parameters of a clustering algorithm, such as the number of clusters, is the most difficult problem. No existing clustering algorithm clusters mixed data sets effectively.
Objectives To propose a generalized mechanism for mixed data sets by integrating Rényi entropy and complement entropy. To improve the k-prototypes algorithm by using the new generalized mechanism.
Methodology K-Prototype…
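The slide's own formulas are not preserved in this transcript. As a rough illustration of the standard k-prototypes idea, the sketch below combines squared Euclidean distance on the numerical attributes with a weighted count of categorical mismatches. The function name `kprototypes_dissimilarity` and the weight parameter `gamma` are illustrative assumptions, not names taken from the slides.

```python
def kprototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Mixed dissimilarity between one object and one cluster prototype.

    Assumption: the usual k-prototypes form, i.e. squared Euclidean distance
    on numerical attributes plus gamma times the number of mismatched
    categorical attributes. `gamma` is a user-chosen weight.
    """
    # Numerical part: squared Euclidean distance to the prototype's mean vector.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching, one unit per mismatched attribute.
    mismatches = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * mismatches


# One numerical attribute differs by 1.0 and one categorical attribute mismatches.
d = kprototypes_dissimilarity([1.0, 2.0], ["red", "small"],
                              [0.0, 2.0], ["red", "large"], gamma=0.5)
print(d)  # 1.5
```

A larger `gamma` makes categorical mismatches dominate the assignment of objects to clusters; a smaller one favours the numerical attributes.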
Methodology • By the convolution theorem… • Rényi entropy • Within-cluster entropy • Parzen window density estimation • Between-cluster entropy • Improved entropy for numerical data • A generalized mechanism for numerical data…
Methodology • Indiscernibility relation… • Within-cluster entropy • Complement entropy • Between-cluster entropy • Huang dissimilarity for categorical data • Improved entropy for categorical data • A generalized mechanism for categorical data…
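The formulas on this slide are missing from the transcript, so the following is only a sketch of the well-known link between Rényi entropy, Parzen windows, and the convolution theorem. It assumes the quadratic (order-2) Rényi entropy and an isotropic Gaussian kernel of width `sigma`. Under those assumptions the integral of the squared density estimate collapses to a double sum of Gaussians with variance 2·sigma² evaluated at pairwise differences. All function names are hypothetical.

```python
import math


def gaussian_kernel(u, var):
    """Isotropic Gaussian density with variance `var` per dimension, evaluated at u."""
    d = len(u)
    sq_norm = sum(c * c for c in u)
    return math.exp(-sq_norm / (2.0 * var)) / ((2.0 * math.pi * var) ** (d / 2.0))


def information_potential(points, sigma):
    """Integral of the squared Parzen estimate.

    Convolving two Gaussians of variance sigma^2 gives one of variance
    2*sigma^2, so the integral reduces to an average over all point pairs.
    """
    n = len(points)
    var = 2.0 * sigma ** 2
    total = 0.0
    for xi in points:
        for xj in points:
            diff = [a - b for a, b in zip(xi, xj)]
            total += gaussian_kernel(diff, var)
    return total / (n * n)


def renyi_quadratic_entropy(points, sigma):
    """Quadratic Renyi entropy estimate: minus the log of the information potential."""
    return -math.log(information_potential(points, sigma))


tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
# A compact cluster yields a lower entropy estimate than a scattered one.
print(renyi_quadratic_entropy(tight, 1.0) < renyi_quadratic_entropy(spread, 1.0))  # True
```

The estimate depends on the kernel width: `sigma` is a free parameter here, and how the paper chooses it is not stated in the transcript.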
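As a small illustration of complement entropy for categorical data, the sketch below treats a single attribute: objects with equal values form the equivalence classes of the indiscernibility relation, and each class contributes its relative size times the relative size of its complement. How the paper aggregates several attributes and clusters is not preserved in the transcript, so that step is left out. The function name is hypothetical.

```python
from collections import Counter


def complement_entropy(values):
    """Complement entropy of the partition induced by one categorical attribute.

    Each equivalence class X_i contributes (|X_i| / n) * (1 - |X_i| / n),
    where n is the number of objects.
    """
    n = len(values)
    counts = Counter(values)
    return sum((c / n) * (1.0 - c / n) for c in counts.values())


print(complement_entropy(["a", "a", "a"]))        # 0.0, all objects indiscernible
print(complement_entropy(["a", "b"]))             # 0.5
print(complement_entropy(["a", "b", "c", "d"]))   # 0.75
```

The value is zero when every object shares the same category and grows toward 1 as the objects spread over more distinct categories.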
Methodology • A generalized mechanism for mixed data set…
Methodology For numerical data… For categorical data… For mixed data… Cluster validity index for mixed data…
Experiments Ten Cluster
Experiments STUDENT
Experiments Real data sets: Wine, Breast, Voting, Car, DNA, TAE, Heart, Credit, CMC, Adult
Conclusions The generalized mechanism and the improved algorithm can cluster mixed data sets effectively and determine the optimal number of clusters.
Comments • Advantages • The entropy-based mechanism can be applied to mixed data sets. • Applications • Clustering of mixed-type data