Fast Accurate Fuzzy Clustering Through Data Reduction
Outline • Motivation • Objective • Introduction • Related Work • BRFCM • BRFCM Implementation • Experiments • Conclusion • Personal Opinion • Review
Motivation • The problem of clustering. • Fuzzy c-means (FCM).
Objective • As the title says: “Fast Accurate Fuzzy Clustering Through Data Reduction” (brFCM). • Reduce the number of distinct patterns that must be clustered without adversely affecting partition quality. • The reduction is done by aggregating similar examples and then using a weighted exemplar in the clustering process.
Introduction • Clustering in images. • Some modifications to the fuzzy c-means clustering algorithm. • Two experiments to test speedup and correspondence with FCM results. • Infrared images of natural scenes. • Magnetic resonance images of the human brain.
Related Work (1/2) • For large data sets, FCM requires a significant amount of CPU time. • Variants of FCM: • AFCM. • mrFCM. • Subsampling algorithm. • In this paper, similar feature vectors are combined to speed up FCM.
Related Work (2/2) • Our work on speeding up fuzzy c-means has some connection to vector quantization, in the sense that our first step can be seen as a quantization of the data.
BRFCM • 2rFCM • Reducing the precision of the data in order to speed up the clustering. • The brFCM algorithm consists of two phases: • Data reduction. • Fuzzy clustering using FCM. • We attempt to reduce the number of distinct examples to be clustered from n to n_o, for some n_o << n.
BRFCM-Data Reduction:Overview • The first step is quantization. • Quantization forces different continuous values into the same quantization level or bin. • The second step is aggregation. • Aggregation combines identical feature vectors into a single weighted exemplar that represents the quantization bin, e.g., the mean value of all full-precision feature vectors. • When both quantization and aggregation are used, significant data reduction can be obtained.
BRFCM-Data Reduction:Overview • The quantization is an optional step in data reduction. • The brFCM with only aggregation is functionally equivalent to the original FCM. • If data redundancy is significant, the dataset can be represented in a more compact form for clustering.
BRFCM-brFCM Details • Data reduction -> brFCM. • In more formal terms: • Data reduction produces a set X’ of example vectors representing a reduced-precision view of the dataset X. • There are n_o such vectors in X’. • Each vector in X’ represents the mean of all full-precision members in its quantization bin. • Each vector in X’ has a weight representing the number of feature vectors aggregated into it.
BRFCM-brFCM Details • The cluster centroids are calculated by • The cluster membership values are calculated by
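The two update formulas can be sketched in the usual weighted-FCM form. This is a hedged reconstruction under assumed notation, not a copy of the slide: $x'_i$ is a reduced exemplar, $w_i$ its weight (number of aggregated feature vectors), $u_{ij}$ the membership of exemplar $i$ in cluster $j$, $v_j$ the centroid of cluster $j$, $c$ the number of clusters, $m > 1$ the fuzzifier, and $n_o$ the number of exemplars.

```latex
% Weighted centroid update (assumed notation)
v_j = \frac{\sum_{i=1}^{n_o} w_i \, u_{ij}^{m} \, x'_i}{\sum_{i=1}^{n_o} w_i \, u_{ij}^{m}}

% Membership update (same form as in plain FCM)
u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x'_i - v_j \rVert}{\lVert x'_i - v_k \rVert} \right)^{2/(m-1)} \right]^{-1}
```

With $w_i = 1$ for all $i$ and $n_o = n$, both expressions are the standard FCM updates.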
BRFCM-brFCM Details • Two particular features of this algorithm. • When no quantization occurs and the aggregation step doesn’t reduce the dataset, the reduced set has as many vectors as the original (n_o = n) and every weight equals 1, so the algorithm reduces to FCM. • When the aggregation step is used by itself, the algorithm is also equivalent to FCM. This formulation can significantly improve the speed of clustering without a loss of accuracy.
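The equivalence claimed on this slide can be checked with a small sketch. The function below is a minimal, assumed implementation of one weighted FCM iteration (the name `weighted_fcm_step`, the fuzzifier default `m=2.0`, and the toy data are illustrative, not from the slides). Running it on a dataset with a duplicated point and on the aggregated version with weight 2 gives the same centroids.

```python
import math

def weighted_fcm_step(points, weights, centroids, m=2.0):
    """One weighted FCM iteration: update memberships, then centroids."""
    c = len(centroids)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [math.dist(x, v) for v in centroids]
        if any(dj == 0.0 for dj in d):
            # Point coincides with a centroid: split membership among those centroids.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            total = sum(hits)
            memberships.append([h / total for h in hits])
        else:
            memberships.append(
                [1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                 for j in range(c)]
            )
    dim = len(points[0])
    new_centroids = []
    for j in range(c):
        # Each exemplar contributes w * u^m, i.e. as if it appeared w times.
        coeffs = [w * u[j] ** m for w, u in zip(weights, memberships)]
        denom = sum(coeffs)
        new_centroids.append(
            tuple(sum(k * x[t] for k, x in zip(coeffs, points)) / denom
                  for t in range(dim))
        )
    return memberships, new_centroids

# Full data (duplicate point) versus aggregated data (weight 2 on the duplicate).
full_points = [(0.0,), (0.0,), (1.0,), (4.0,)]
agg_points, agg_weights = [(0.0,), (1.0,), (4.0,)], [2, 1, 1]
start = [(0.5,), (3.0,)]
_, full_centroids = weighted_fcm_step(full_points, [1, 1, 1, 1], start)
_, agg_centroids = weighted_fcm_step(agg_points, agg_weights, start)
```

The aggregated run iterates over three exemplars instead of four points, which is where the speedup comes from when redundancy is high.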
BRFCM-Image Characteristics • A 24-bit RGB image has 2^24 possible values (as many as the pixels in a 4096 * 4096 pixel image). • Consider quantizing RGB space by r = 2 bits per channel; this creates a space of size 2^18 (as many as the pixels in a 512 * 512 pixel image).
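The arithmetic behind these sizes can be verified in a few lines. The sketch assumes 8 bits per RGB channel and that r counts the low-order bits dropped from each channel; `rgb_space_size` is a hypothetical helper name.

```python
def rgb_space_size(bits_per_channel=8, r=0, channels=3):
    """Number of distinct values after dropping r low-order bits per channel."""
    return 2 ** ((bits_per_channel - r) * channels)

full_space = rgb_space_size()         # 2**24 values at full precision
reduced_space = rgb_space_size(r=2)   # 2**18 values after dropping 2 bits per channel
```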
BRFCM Implementation • For this work, quantization was implemented via bit-masking and aggregation was done using a hashing scheme. • A. Formula Implementation • The cluster centroids in (1). • The membership values in (2). When i = j.
BRFCM Implementation • B. Quantization • Quantization of a feature space can be done either using fixed-size bins or variable-sized bins. • The brFCM can be implemented efficiently using fixed-size bins. • A more general approach to quantization can be
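Fixed-size-bin quantization by bit-masking might look like the sketch below. It assumes integer feature values of a known bit width and zeroes the r least-significant bits; whether the original implementation zeroes or shifts the bits is not stated in the slides, and the names `quantize` and `quantize_vector` are illustrative.

```python
def quantize(value, r, bits=8):
    """Zero the r least-significant bits of a non-negative integer feature value."""
    mask = ((1 << bits) - 1) & ~((1 << r) - 1)
    return value & mask

def quantize_vector(vec, r, bits=8):
    """Apply the same bit mask to every feature of a vector."""
    return tuple(quantize(v, r, bits) for v in vec)

# With r = 2, the values 200..203 all fall into the same bin (200).
same_bin = {quantize(v, 2) for v in (200, 201, 202, 203)}
```

Every bin has the same width, 2^r, which is what makes the masking approach cheap.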
BRFCM Implementation • C. Aggregation Using Hashing. • The function is given by
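Aggregation by hashing can be sketched with a Python dictionary keyed by the quantized vector. This is an assumption-laden illustration: the built-in dict hash stands in for the slide's hash function (which did not survive), and `aggregate` is a hypothetical name. Each bin keeps a running sum and a count, so the exemplar is the mean of the full-precision members and the weight is the member count.

```python
def aggregate(vectors, r=0, bits=8):
    """Group integer feature vectors by quantized key; return (exemplars, weights)."""
    mask = ((1 << bits) - 1) & ~((1 << r) - 1)
    sums, counts = {}, {}
    for vec in vectors:
        key = tuple(v & mask for v in vec)   # quantized vector is the hash key
        counts[key] = counts.get(key, 0) + 1
        prev = sums.get(key, (0,) * len(vec))
        sums[key] = tuple(s + v for s, v in zip(prev, vec))
    # Exemplar = mean of the full-precision members; weight = number of members.
    exemplars = [tuple(s / counts[k] for s in sums[k]) for k in sums]
    weights = [counts[k] for k in sums]
    return exemplars, weights

pixels = [(200, 10), (201, 11), (203, 9), (100, 50)]
exemplars, weights = aggregate(pixels, r=2)
```

With `r=0` the function performs aggregation only, merging exact duplicates and nothing else.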
Experiments • The experiments cover two image domains. • A set of infrared images. • Magnetic resonance images of the normal human brain, which are segmented into gray matter, white matter and cerebrospinal fluid. • Data reduction. • Clustering time. • Cluster result.
Experiments-Infrared Images • Our 172 ATR images are 8-bit (256-value) infrared images of size 398400 pixels. • The images were clustered into c = 5 clusters. • We use two features: intensity and one Laws’ Texture Energy feature. • Table 3 shows the remarkable level of reduction seen in these images.
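Experiments-Correspondence With FCM • To measure the correspondence between brFCM and FCM clustering results: • Consider two partitions of X = {x1, x2, …, xn}. • The maximal intersection of a cluster in one partition is its largest overlap (number of shared examples) with any cluster in the other partition. • The correspondence mapping maps each cluster in the first partition to the cluster in the second partition with which it has the maximal intersection.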
Experiments-Correspondence With FCM • The algorithm for calculating the cluster correspondence: • Find the correspondence mapping from the first partition to the second. • Correspondence rate Corr1 is the sum of all maximal intersections in the correspondence mapping, divided by the number of examples in X. • Repeat for Corr2 (using the mapping in the opposite direction). • Correspondence rate CR = max(Corr1, Corr2).
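The procedure above can be sketched as follows. The sketch assumes each partition is given as a list of hard cluster labels, one per example (a fuzzy partition would first be hardened, for instance by taking the cluster of maximum membership), and that Corr2 uses the mapping from the second partition back to the first. The function names are illustrative.

```python
from collections import Counter

def corr(labels_a, labels_b):
    """Sum of maximal intersections of A's clusters with B's clusters, divided by n."""
    overlap = Counter(zip(labels_a, labels_b))   # |cluster a ∩ cluster b| per pair
    best = {}
    for (a, _b), count in overlap.items():
        best[a] = max(best.get(a, 0), count)     # maximal intersection for cluster a
    return sum(best.values()) / len(labels_a)

def correspondence_rate(labels_a, labels_b):
    """CR = max(Corr1, Corr2), taking the mapping in both directions."""
    return max(corr(labels_a, labels_b), corr(labels_b, labels_a))

fcm_labels = [0, 0, 0, 1, 1, 1]
brfcm_labels = [1, 1, 0, 0, 0, 0]
cr = correspondence_rate(fcm_labels, brfcm_labels)
```

Because only overlaps are counted, the measure does not depend on how the clusters happen to be numbered in either partition.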
Experiments-Correspondence With FCM • How significant are the brFCM-FCM correspondence rates as r increases? • brFCM generally creates partitions very similar to FCM, given the same centroid initializations for this dataset.
Experiments-Magnetic Resonance Images • The set of MR images consisted of 32 MRI slices, each a 256*256 12-bit image. Each pixel consisted of three features (T1, T2 and PD). • Each MR image has an associated ground truth. • The ground-truth images were created by a k-nearest-neighbor (KNN) classifier with k = 7, where the training data was chosen by a person who could be labeled a radiology technician. • There are three classes of interest in the magnetic resonance images: cerebrospinal fluid, gray matter and white matter.
Experiments-Magnetic Resonance Images • 1) Performance Speedups
Experiments-Magnetic Resonance Images • 2)Correspondence With FCM on Ground Truth
Experiments-Discussion • The brFCM algorithm generates a significant speedup over literal FCM on both the infrared image dataset and the MRI dataset. • A trade-off exists between FCM correspondence and speedup (Fig. 2).
Conclusion • Speedups versus the bit reduction. • The higher the value of r, the higher the speedup and the lower the accuracy. • This approach to speeding up clustering can be applied equally well to hard c-means and EM clustering or the optimization to FCM. • For many image clustering problems, brFCM is a fast alternative to traditional FCM.
Personal Opinion • A trade-off between accuracy and speedup. • Data reduction • Numerical data => bit mask. • Categorical data => concept hierarchy.
Review • Fuzzy c-means (FCM) • Data Reduction • Quantization Using Bit Mask. • Aggregation Using Hashing. • Fuzzy clustering using FCM. • Two experiments • Infrared images. • Magnetic resonance images of the normal human brain.