Fuzzy Clustering Algorithms. SSIE 617 2nd Presentation. Benjamin James Bush, 05/02/2012. What is Clustering? Crisp & Fuzzy Clustering. C-Means Clustering: fixed number of clusters; one centroid per cluster; each data point belongs to the cluster corresponding to the closest centroid.
Fuzzy Clustering Algorithms SSIE 617 2nd Presentation Benjamin James Bush 05/02/2012
C-Means Clustering Fixed number of clusters. One centroid per cluster. Each data point belongs to the cluster corresponding to the closest centroid. Figure: animation by Andrey A. Shabalin, Ph.D.
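C-Means Clustering: cost function
$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{k:\, x_k \in G_i} \lVert x_k - c_i \rVert^2$$
Here $J$ is the cost function, $c$ is the number of clusters, $J_i$ is the cost of the $i$th cluster, $G_i$ is the set of data points belonging to the $i$th group, and $\lVert x_k - c_i \rVert$ is the distance between data point $x_k$ and cluster center $c_i$.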
C-Means Clustering 1. Pick c centroids at random. 2. Assign each data point to the cluster corresponding to the nearest centroid. 3. Move each centroid to the mean value of its cluster's data points. Repeat steps 2 and 3 until the assignments stop changing. Animation by Andrey A. Shabalin, Ph.D.
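The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `c_means`, the choice of initial centroids (sampled from the data points), the Euclidean distance, and the stopping rule (stop when assignments no longer change) are all assumptions.

```python
import math
import random


def c_means(points, c, iterations=100, seed=0):
    """Crisp c-means on a list of equal-length tuples.

    Returns (centroids, labels, cost), where cost is the sum of squared
    distances from each point to its assigned centroid.
    """
    rng = random.Random(seed)
    # Step 1: pick c centroids at random (assumption: sample them from the data).
    centroids = rng.sample(points, c)
    labels = None
    for _ in range(iterations):
        # Step 2: assign each point to the cluster of its nearest centroid.
        new_labels = [
            min(range(c), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its cluster's points.
        for i in range(c):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an emptied cluster keeps its old centroid
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(c_means(data, 2))
```

The result depends on the random initial centroids, so on harder data it is common to run the procedure several times and keep the lowest-cost result.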
Fuzzy C-Means Clustering (FCM) Fixed number of clusters. One centroid per cluster. Clusters are fuzzy sets. The membership degree of a point in a cluster can be any number between 0 and 1. The membership degrees of a point across all clusters must sum to 1. Figure: animation by Matteo Matteucci, Ph.D.
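Fuzzy C-Means Clustering (FCM): cost function
$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \lVert x_k - c_i \rVert^2$$
The inner sum runs over all $n$ data points, $m > 1$ is the fuzziness exponent, and $u_{ik}$ is the membership degree of data point $x_k$ in the $i$th cluster.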
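Fuzzy C-Means Clustering 1. Pick c centroids at random. 2. Assign membership degrees according to:
$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}}, \qquad d_{ik} = \lVert x_k - c_i \rVert$$
3. Move each centroid to the following position:
$$c_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} x_k}{\sum_{k=1}^{n} u_{ik}^{m}}$$
Repeat steps 2 and 3 until the centroids stop moving. Note: these formulas result from applying the method of Lagrange multipliers to the FCM cost function. The proof is left as an exercise.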
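The FCM loop can be sketched the same way, using the standard membership and centroid updates. This is a hedged illustration, not the presenter's implementation: the function name `fuzzy_c_means`, the default fuzziness exponent `m=2.0`, the tolerance `tol`, and the handling of a point that sits exactly on a centroid are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, tol=1e-6, seed=0):
    """Fuzzy c-means on a list of equal-length tuples; returns (centroids, u)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, c)   # step 1: start from c data points
    n, dim = len(points), len(points[0])
    u = [[0.0] * n for _ in range(c)]   # u[i][k]: degree of point k in cluster i
    for _ in range(iterations):
        # Step 2: membership update, u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        for k, x in enumerate(points):
            d = [math.dist(x, ctr) for ctr in centroids]
            zeros = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zeros:
                    # Assumption: a point lying on a centroid belongs fully to it.
                    u[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    u[i][k] = 1.0 / sum(
                        (d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                    )
        # Step 3: centroid update, the mean of the points weighted by u_ik^m.
        # Assumes every cluster keeps some nonzero membership.
        moved = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            moved.append(tuple(
                sum(wk * x[a] for wk, x in zip(w, points)) / total
                for a in range(dim)
            ))
        shift = max(math.dist(old, new) for old, new in zip(centroids, moved))
        centroids = moved
        if shift < tol:  # centroids stopped moving
            break
    return centroids, u


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, degrees = fuzzy_c_means(data, 2)
    print(centers)
```

Each column of `u` (one data point across all clusters) sums to 1, which is the constraint the Lagrange multiplier enforces.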
Fuzzy Min-Max Clustering NN Variable number of clusters. Each cluster is a hyperbox fuzzy set. Membership degrees inside the hyperbox are 1. Membership degrees outside the hyperbox decrease linearly with distance from the box. The membership degrees of a point need not sum to 1. Hyperboxes are not allowed to overlap.
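A hyperbox membership function with these properties can be sketched as follows. The slides do not give the exact formula, so this is an assumption: a per-dimension ramp averaged over dimensions, with a hypothetical sensitivity parameter `gamma` controlling how fast membership falls off outside the box.

```python
def ramp(x, gamma):
    """Linear ramp: 0 for x*gamma <= 0, x*gamma in between, 1 for x*gamma >= 1."""
    return max(0.0, min(1.0, x * gamma))


def hyperbox_membership(point, box_min, box_max, gamma=1.0):
    """Membership degree of `point` in the hyperbox [box_min, box_max].

    Inside the box the degree is 1; outside, it decreases linearly with the
    distance past the violated face in each dimension (averaged over dimensions).
    """
    total = 0.0
    for a, v, w in zip(point, box_min, box_max):
        # a - w > 0 means the point is past the max face; v - a > 0 means
        # it is below the min face. At most one of the two is positive.
        total += 1.0 - ramp(a - w, gamma) - ramp(v - a, gamma)
    return total / len(point)


if __name__ == "__main__":
    print(hyperbox_membership((0.5, 0.5), (0.2, 0.2), (0.8, 0.8)))  # inside the box
    print(hyperbox_membership((1.0, 0.5), (0.2, 0.2), (0.8, 0.8)))  # outside in one dimension
```

Because each box is evaluated independently, a point's degrees across several boxes are not normalized, which is why they need not sum to 1.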
Hyperbox Fuzzy Sets Start Mathematica...
Hyperbox Fuzzy Sets Easy to implement as ANNs. Potential to take advantage of massive parallel processing.
1. Initialize a population of 250 randomly chosen individuals, each with a random number of boxes. For each box, choose the min point and max point at random. 2. Evaluate the fitness of each individual based on its Minimum Description Length (MDL), which combines a penalty for the number of clusters with a goodness-of-fit term. 3. Create a child individual from each member of the population. When creating a child, add a Gaussian random variable to each component of the min and max points, and change the number of boxes with probability 0.5. 4. Eliminate half of the individuals via round-robin tournament competition.
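The evolutionary loop above can be sketched as below. Much of it is assumed rather than taken from the slides: the MDL fitness is not specified, so `fitness` is a caller-supplied function where lower is better; boxes live in the unit hypercube; min and max points are re-sorted per dimension after mutation; changing the number of boxes means adding or removing one box; and `max_boxes`, `sigma`, and `generations` are hypothetical parameters.

```python
import random


def ordered(lo, hi):
    """Return (min point, max point) with min <= max in every dimension."""
    return ([min(a, b) for a, b in zip(lo, hi)],
            [max(a, b) for a, b in zip(lo, hi)])


def random_box(dim, rng):
    return ordered([rng.random() for _ in range(dim)],
                   [rng.random() for _ in range(dim)])


def mutate(parent, dim, max_boxes, sigma, rng):
    """Create a child: Gaussian noise on every box corner, then maybe resize."""
    child = [ordered([x + rng.gauss(0.0, sigma) for x in lo],
                     [x + rng.gauss(0.0, sigma) for x in hi])
             for lo, hi in parent]
    if rng.random() < 0.5:  # change the number of boxes with probability 0.5
        if len(child) > 1 and (len(child) >= max_boxes or rng.random() < 0.5):
            child.pop(rng.randrange(len(child)))
        elif len(child) < max_boxes:
            child.append(random_box(dim, rng))
    return child


def round_robin_survivors(population, fitness, keep):
    """Every individual meets every other; lower fitness wins; keep the top `keep`."""
    scores = [fitness(ind) for ind in population]
    wins = [sum(1 for t in scores if s < t) for s in scores]
    order = sorted(range(len(population)), key=lambda i: wins[i], reverse=True)
    return [population[i] for i in order[:keep]]


def evolve(fitness, dim=2, pop_size=250, max_boxes=10, sigma=0.05,
           generations=50, seed=0):
    rng = random.Random(seed)
    population = [[random_box(dim, rng) for _ in range(rng.randint(1, max_boxes))]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(p, dim, max_boxes, sigma, rng) for p in population]
        # Parents and children compete; half of them are eliminated.
        population = round_robin_survivors(population + children, fitness, pop_size)
    return min(population, key=fitness)


if __name__ == "__main__":
    # Toy stand-in for MDL: only the penalty for the number of boxes.
    best = evolve(len, pop_size=20, max_boxes=5, generations=30)
    print(len(best), best)
```

With a deterministic fitness, a full round-robin ranks individuals the same way as sorting by fitness; a real MDL fitness would also need the goodness-of-fit term evaluated against the data.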
Bibliography Ch. 15 Videolectures.net: MDL Tutorial http://videolectures.net/icml08_grunwald_mld/ Ch. 1