Clustering Algorithms Meta Applier (CAMA) Toolbox Dmitry S. Shalymov Kirill S. Skrygan Dmitry A. Lyubimov
Clustering
• Goals
  • To detect the underlying structure in data
  • To reduce the size of a data set
  • To extract unique objects
• Usage
  • Data mining
  • Machine learning
  • Financial mathematics
  • Optimization
  • Statistics
  • Pattern recognition
  • Development of control strategies
SYRCoSE’09
Clustering Problem
Clustering and Classification
Variety of Clustering Algorithms
• Hierarchical
  • Agglomerative
• Partitioning
  • Iterative
    • Hard (K-means, SVM, SPSA)
    • Fuzzy (FCM)
Important parameters:
• Distance norm
• Number of clusters
• Initial values of cluster centers
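A minimal sketch of one hard, iterative partitioning algorithm (Lloyd-style K-means) that makes the three important parameters explicit: the distance norm is the `dist` argument, the number of clusters is `len(centers)`, and the initial cluster centers are passed in directly. The function names and signature are illustrative assumptions, not taken from any toolbox.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance norm; pass a different function to change the norm."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    points: List[Point],
    centers: List[Point],
    dist: Callable[[Point, Point], float] = euclidean,
    max_iter: int = 100,
) -> Tuple[List[List[float]], List[int]]:
    """Hard K-means; the number of clusters is len(centers)."""
    centers = [list(c) for c in centers]  # copy the initial centers
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center under `dist`.
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each center to the mean of its members.
        # (The mean is the exact minimizer only for squared Euclidean distance.)
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels
```

For example, `kmeans([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)], centers=[(0.0, 0.0), (5.0, 5.0)])` returns the labels `[0, 0, 1, 1]`; a poor choice of initial centers can lead the same data to a different local optimum.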
Cluster Stability Algorithms
• Indexes
• Stability (similarity, merit) functions
• Probabilistic measures assessing the likelihood of a decision
• Density estimation approaches
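Stability-based approaches typically cluster perturbed copies or subsamples of the data and measure how similar the resulting partitions are. The Rand index, the fraction of point pairs on which two partitions agree, is one standard similarity function of this kind; it is shown here as an illustration, not as the function CAMA uses. The helper name `rand_index` is hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of point pairs on which two partitions agree.

    A pair agrees if both partitions put the two points in the same cluster,
    or both put them in different clusters. Label names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one point: trivially identical partitions
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because only co-membership is compared, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0: the two labelings describe the same partition.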
Stochastic Approximation
• Recursive stochastic approximation
• FDSA (finite-difference stochastic approximation)
• SPSA (simultaneous perturbation stochastic approximation)
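In the standard notation of Spall's formulation (assumed here; these symbols do not come from this document), recursive stochastic approximation and the two gradient estimates read as follows, where $y(\cdot)$ is a possibly noisy loss measurement, $p$ is the dimension of $\theta$, $e_i$ is the $i$-th unit vector, $\Delta_k$ is a random perturbation vector (commonly with independent $\pm 1$ components), and $a_k, c_k$ are positive gain sequences decreasing to zero:

```latex
\hat{\theta}_{k+1} = \hat{\theta}_k - a_k\,\hat{g}_k(\hat{\theta}_k)

\text{FDSA:}\quad \hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k + c_k e_i) - y(\hat{\theta}_k - c_k e_i)}{2c_k}, \qquad i = 1,\dots,p

\text{SPSA:}\quad \hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}{2c_k\,\Delta_{ki}}, \qquad i = 1,\dots,p
```

FDSA therefore needs $2p$ loss measurements per iteration, while SPSA needs only two, whatever the dimension $p$.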
Effectiveness of SPSA
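A minimal SPSA minimizer, assuming a scalar loss over a real vector. The gain schedules $a_k = a/(k+1+A)^{\alpha}$ and $c_k = c/(k+1)^{\gamma}$ with $\alpha = 0.602$, $\gamma = 0.101$ follow commonly cited practical guidelines for SPSA; `spsa_minimize` and its default constants are illustrative choices, not values taken from CAMA.

```python
import random
from typing import Callable, List, Optional


def spsa_minimize(
    loss: Callable[[List[float]], float],
    theta0: List[float],
    iterations: int = 2000,
    a: float = 0.1,
    c: float = 0.1,
    A: float = 100.0,
    alpha: float = 0.602,
    gamma: float = 0.101,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Minimize `loss` with simultaneous perturbation stochastic approximation."""
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps runs reproducible
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha  # step-size gain, decreasing to zero
        c_k = c / (k + 1) ** gamma      # perturbation size, decreasing to zero
        # Perturb all coordinates at once with independent +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss measurements per iteration, regardless of len(theta).
        diff = loss(plus) - loss(minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    target = [1.0, -2.0, 3.0]
    quadratic = lambda v: sum((x - t) ** 2 for x, t in zip(v, target))
    print(spsa_minimize(quadratic, [0.0, 0.0, 0.0]))  # approximately the target
```

For clustering, `loss` would be a distortion measure evaluated at a flattened vector of cluster centers; that mapping is an assumption of this sketch.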
Finding the number of clusters in a data set
• Run the SPSA algorithm for different numbers of clusters, K, and calculate the corresponding distortions $d_K$
• Select a transformation power, Y
• Calculate the "jumps" in transformed distortion, $J_K = d_K^{-Y} - d_{K-1}^{-Y}$ (with $d_0^{-Y} = 0$)
• Estimate the number of clusters in the data set as the K with the largest jump, $K^* = \arg\max_K J_K$
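These steps match the "jump" method of Sugar and James. A minimal sketch, assuming the distortions $d_K$ have already been computed for each K = 1, ..., K_max by some clustering run (the slide uses SPSA for that). A transformation power of $Y = p/2$, with $p$ the number of features, is the value usually suggested for this method; `jump_method` is a hypothetical helper name.

```python
from typing import Dict


def jump_method(distortions: Dict[int, float], y: float) -> int:
    """Return the K with the largest jump in transformed distortion.

    `distortions` maps each K = 1, ..., K_max to its distortion d_K (> 0).
    """
    ks = sorted(distortions)
    if not ks or ks != list(range(1, len(ks) + 1)):
        raise ValueError("distortions must be given for K = 1, ..., K_max")
    # Transform each distortion: d_K -> d_K^(-Y); by convention d_0^(-Y) = 0.
    transformed = {0: 0.0}
    transformed.update({k: distortions[k] ** (-y) for k in ks})
    # Jump J_K = d_K^(-Y) - d_{K-1}^(-Y).
    jumps = {k: transformed[k] - transformed[k - 1] for k in ks}
    return max(ks, key=lambda k: jumps[k])
```

With made-up distortions `{1: 10.0, 2: 6.0, 3: 1.0, 4: 0.9, 5: 0.85}` and `y=1.0`, the largest jump occurs at K = 3, so `jump_method` returns 3.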
Structure of data set detection
Examples
• Iris (3 clusters, 4 features, 150 instances)
• Wine (3 clusters, 13 features, 178 instances)
• Breast Cancer (2 clusters, 32 features, 569 instances)
• Image Segmentation (7 clusters, 19 features, 2310 instances)
Software Tools for Clustering Analysis
• Research
  • COMPACT
  • DCPR (Data Clustering & Pattern Recognition)
  • FCDA (Fuzzy Clustering and Data Analysis Toolbox)
  • ClusterPack Matlab Toolbox
  • The Curve Clustering Toolbox
  • SOM (Self-Organizing Map)
  • Spectral Clustering Toolbox
  • Yashil's FCM Clustering
• Licensed software
  • SPSS
  • STATISTICA
• Characteristics
  • Visualization
  • Effectiveness analysis with patterns
  • Tools to check performance
• Shortcomings
  • Limited number of data sets and algorithms
  • No possibility to load one's own algorithm
  • No online services
  • MATLAB
Clustering Algorithms Meta Applier
CAMA. Kernel
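The kernel's internals are not described in the surviving text. The following is a hypothetical sketch of the meta-applier idea suggested by the shortcomings listed for existing tools: users plug in their own algorithm, and every algorithm is run over a shared collection of data sets and scored with the same measure. All names (`MetaApplier`, `register_algorithm`, `register_dataset`, `run_all`, `within_cluster_sse`) are invented for illustration and are not CAMA's API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Labels = List[int]
# Hypothetical plug-in contract: (points, number_of_clusters) -> cluster labels.
Algorithm = Callable[[Sequence[Point], int], Labels]
Score = Callable[[Sequence[Point], Labels], float]


class MetaApplier:
    """Hypothetical registry that applies every plugged-in algorithm to every data set."""

    def __init__(self) -> None:
        self.algorithms: Dict[str, Algorithm] = {}
        self.datasets: Dict[str, Tuple[Sequence[Point], int]] = {}

    def register_algorithm(self, name: str, algorithm: Algorithm) -> None:
        self.algorithms[name] = algorithm  # user-supplied algorithms plug in here

    def register_dataset(self, name: str, points: Sequence[Point], k: int) -> None:
        self.datasets[name] = (points, k)  # k: expected number of clusters

    def run_all(self, score: Score) -> Dict[Tuple[str, str], float]:
        """Score each (algorithm, data set) pair with the same quality measure."""
        results: Dict[Tuple[str, str], float] = {}
        for alg_name, algorithm in self.algorithms.items():
            for ds_name, (points, k) in self.datasets.items():
                results[(alg_name, ds_name)] = score(points, algorithm(points, k))
        return results


def within_cluster_sse(points: Sequence[Point], labels: Labels) -> float:
    """Sum of squared distances from each point to its cluster mean (lower is better)."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, label in zip(points, labels) if label == lab]
        mean = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total
```

Keeping the algorithm contract to a single callable signature is what lets new algorithms be added without touching the runner; a real service would add input validation, timeouts, and a data-set loader.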
CAMA Toolbox: http://ancient.punklan.net:8084/CAMA2/index.jsp
Thank you!