Fast accurate fuzzy clustering through data reduction

Outline • Motivation • Objective • Introduction • Related Work • BRFCM • BRFCM Implementation • Experiments • Conclusion • Personal Opinion • Review

Motivation • The clustering problem. • Fuzzy c-means (FCM).

Objective • As the title says: “Fast Accurate Fuzzy Clustering Through Data Reduction” (brFCM). • Reduce the number of distinct patterns that must be clustered without adversely affecting partition quality. • The reduction is done by aggregating similar examples and then using a weighted exemplar in the clustering process.

Introduction • Clustering in images. • Some modifications to the fuzzy c-means clustering algorithm. • Two experiments to test speedup and correspondence with FCM results: • Infrared images of natural scenes. • Magnetic resonance images of the human brain.

Related Work (1/2) • For large data sets, the problem with FCM is that it consumes significant amounts of CPU time. • Variants of FCM: • AFCM. • mrFCM. • Subsampling algorithm. • In this paper, the combination of similar feature vectors is used to speed up FCM.

Related Work (2/2) • Our work on speeding up fuzzy c-means has some connection to vector quantization, in the sense that our first step can be seen as a quantization of the data.

BRFCM • 2rFCM • Reducing the precision of the data in order to speed up the clustering. • The brFCM algorithm consists of two phases: • Data reduction. • Fuzzy clustering using FCM. • We attempt to reduce the number of distinct examples to be clustered from n to no, for some no << n.

BRFCM－Data Reduction: Overview • The first step is quantization. • Quantization forces different continuous values into the same quantization level or bin. • The second step is aggregation. • Aggregation combines identical feature vectors into a single, weighted exemplar representing the quantization bin, e.g. the mean value of all full-precision feature vectors in the bin. • When both quantization and aggregation are used, significant data reduction can be obtained.
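The two data-reduction steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function name `reduce_data` is hypothetical, features are assumed to be non-negative integers (such as 8-bit pixel values), and quantization is assumed to drop the r low-order bits of every feature.

```python
import numpy as np

def reduce_data(X, r):
    """Quantize by dropping the r low-order bits, then aggregate (illustrative sketch).

    X : (n, d) array of non-negative integer features.
    Returns (exemplars, weights): the mean of the full-precision vectors in
    each occupied bin, and the number of vectors that fell into that bin.
    """
    X = np.asarray(X, dtype=np.int64)
    quantized = X >> r                      # vectors with equal rows share a bin
    bins, inverse, counts = np.unique(
        quantized, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()               # keep a flat index across NumPy versions
    sums = np.zeros(bins.shape, dtype=np.float64)
    np.add.at(sums, inverse, X)             # per-bin sum of full-precision vectors
    exemplars = sums / counts[:, None]      # per-bin mean = weighted exemplar
    return exemplars, counts

X = np.array([[4, 5], [5, 4], [7, 7], [8, 8]])
exemplars, weights = reduce_data(X, r=2)    # first three vectors share one bin
```

With r = 0 the function performs aggregation only, so exact duplicates are merged and nothing else changes.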

BRFCM－Example

BRFCM－Data Reduction: Overview • Quantization is an optional step in data reduction. • brFCM with only aggregation is functionally equivalent to the original FCM. • If data redundancy is significant, the dataset can be represented in a more compact form for clustering.

BRFCM－brFCM Details • Data reduction －> brFCM. • In more formal terms: • X’ is the set of example vectors representing a reduced-precision view of the dataset X. • There are no such vectors (no is the reduced count, with no << n). • Each vector in X’ represents the mean of all full-precision members in its quantization bin. • Each vector in X’ carries a weight representing the number of feature vectors aggregated into it.

BRFCM－brFCM Details • The cluster centroids are calculated by • The cluster membership values are calculated by
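The formulas did not survive in this transcript. As a hedged reconstruction, the standard weighted form of the FCM updates is given below; it is consistent with the description of weighted exemplars on these slides, but the notation is an assumption: $x'_i$ is the i-th exemplar, $w_i$ its weight, $v_j$ the j-th centroid, $u_{ij}$ the membership of $x'_i$ in cluster $j$, $m > 1$ the fuzzifier, $c$ the number of clusters, and $n_o$ the reduced count written "no" on the slides. The tags (1) and (2) follow the order in which the implementation slide refers to the two formulas.

```latex
v_j = \frac{\sum_{i=1}^{n_o} w_i \, u_{ij}^{m} \, x'_i}
           {\sum_{i=1}^{n_o} w_i \, u_{ij}^{m}},
\qquad j = 1, \dots, c
\tag{1}

u_{ij} = \left[ \sum_{k=1}^{c}
         \left( \frac{\lVert x'_i - v_j \rVert}{\lVert x'_i - v_k \rVert} \right)^{\frac{2}{m-1}}
         \right]^{-1}
\tag{2}
```

Only the centroid update involves the weights; the membership of an exemplar depends only on its distances to the centroids.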

BRFCM－brFCM Details • Two particular features of this algorithm: • When no quantization occurs and the aggregation step doesn’t reduce the dataset, no = n and every weight equals 1, so the algorithm reduces to FCM. • When the aggregation step is used by itself, the algorithm is also equivalent to FCM, because only identical vectors are merged. This formulation can significantly improve the speed of clustering without a loss of accuracy.
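The equivalence claimed above can be checked numerically. The sketch below assumes the standard weighted FCM updates (weights enter the centroid update only); `weighted_fcm` and `memberships` are hypothetical names, and the fixed iteration count and the small `eps` guard against zero distances are simplifications, not details from the paper.

```python
import numpy as np

def memberships(X, V, m=2.0, eps=1e-12):
    """Fuzzy memberships u_ij of each row of X in each centroid of V."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # squared distances (n, c)
    d2 = np.maximum(d2, eps)                 # avoid division by zero on exact hits
    inv = d2 ** (-1.0 / (m - 1.0))           # equals distance ** (-2 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)

def weighted_fcm(X, w, V0, m=2.0, iters=50):
    """Weighted fuzzy c-means with a fixed number of iterations (sketch)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    V = np.array(V0, dtype=float)
    for _ in range(iters):
        U = memberships(X, V, m)
        Um = w[:, None] * U ** m             # weights scale each exemplar's contribution
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return V, memberships(X, V, m)

# Full data with duplicates versus its aggregated form (same initial centroids).
X_full = np.array([[0.0], [0.0], [0.0], [1.0], [10.0], [10.0], [11.0]])
X_red = np.array([[0.0], [1.0], [10.0], [11.0]])
w_red = np.array([3, 1, 2, 1])
V0 = np.array([[0.5], [9.0]])

V_full, _ = weighted_fcm(X_full, np.ones(len(X_full)), V0)
V_red, _ = weighted_fcm(X_red, w_red, V0)
```

Setting every weight to 1 gives plain FCM, and the aggregated run reaches the same centroids while iterating over four vectors instead of seven.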

BRFCM－Image Characteristics • A 24-bit RGB image can take 2^24 possible values (as many as the pixels in a 4096 * 4096 image). • Quantizing RGB space by r = 2 bits per channel creates a space of size 2^18 (as many as the pixels in a 512 * 512 image).
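The sizes on this slide follow from simple arithmetic, sketched here under the assumption of three channels at 8 bits each, with r bits dropped from every channel. The helper name `rgb_space_size` is illustrative only.

```python
def rgb_space_size(r=0, bits_per_channel=8, channels=3):
    """Number of distinct colour vectors left after dropping r bits per channel."""
    return 2 ** (channels * (bits_per_channel - r))

full = rgb_space_size()        # 2**24, the same count as a 4096 * 4096 pixel grid
reduced = rgb_space_size(r=2)  # 2**18, the same count as a 512 * 512 pixel grid
```

Each additional bit of quantization therefore shrinks the space of possible vectors by a factor of 8 for three-channel data.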

BRFCM Implementation • For this work, quantization was implemented via bit-masking and aggregation was done using a hashing scheme. • A. Formula Implementation • The cluster centroids in (1). • The membership values in (2). When i = j.

BRFCM Implementation • B. Quantization • Quantization of a feature space can be done either using fixed-size bins or variable-sized bins. • The brFCM can be implemented efficiently using fixed-size bins. • A more general approach to quantization can be

BRFCM Implementation • C. Aggregation Using Hashing. • The function is given by
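The hash function itself is missing from this transcript, so the following is a stand-in rather than the paper's scheme: it masks off the r low-order bits of each feature, packs the masked features into one integer key, and lets a Python dict act as the hash table. The name `aggregate_with_hashing` and the `bits` parameter are assumptions; feature values are assumed to be non-negative integers below 2**bits.

```python
def aggregate_with_hashing(pixels, r, bits=8):
    """Bit-mask quantization plus dict-based aggregation (illustrative sketch)."""
    mask = ((1 << bits) - 1) & ~((1 << r) - 1)   # clears the r low-order bits
    table = {}                                   # key -> [per-feature sums, count]
    for px in pixels:
        key = 0
        for value in px:
            key = (key << bits) | (value & mask)  # pack masked features into one key
        entry = table.get(key)
        if entry is None:
            table[key] = [list(px), 1]
        else:
            for k, value in enumerate(px):
                entry[0][k] += value              # accumulate full-precision values
            entry[1] += 1
    exemplars = [[s / count for s in sums] for sums, count in table.values()]
    weights = [count for _, count in table.values()]
    return exemplars, weights

exemplars, weights = aggregate_with_hashing([(4, 5), (5, 4), (7, 7), (8, 8)], r=2)
```

A single pass over the pixels builds both the exemplars and their weights, so the reduction cost grows linearly with the number of pixels.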

Experiments • Experiments in two image domains: • A set of infrared images. • Magnetic resonance images of the normal human brain, which are segmented into gray matter, white matter and cerebrospinal fluid. • Data reduction. • Clustering time. • Cluster result.

Experiments－Infrared Images • Our 172 ATR images are 8-bit (256-value) infrared images of size 398400 pixels. • The images were clustered into c=5 clusters. • We use two features: intensity and one Laws’ Texture Energy feature. • Table 3 shows the remarkable level of reduction seen in these images.

Experiments－Infrared Images

Experiments－Correspondence With FCM • To measure the cluster correspondence between brFCM and FCM clustering results: • Consider two partitions of X={x1,x2,…,xn}. • For a cluster in one partition, the maximal intersection is the largest number of examples it shares with any single cluster of the other partition. • The correspondence mapping then maps each cluster to the cluster of the other partition that attains this maximal intersection.

Experiments－Correspondence With FCM • The algorithm for calculating the cluster correspondence: • Find the correspondence mapping. • Correspondence rate Corr1 is the sum of all maximal intersections in the correspondence mapping, divided by the number of examples in X. • Repeat for Corr2 (starting from the other partition). • Correspondence rate CR=max(Corr1, Corr2).
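The correspondence rate described above can be sketched as follows. The sketch assumes each partition is given as hard labels (for example, each example assigned to its highest-membership cluster), which is an assumption about how the fuzzy partitions are compared; `correspondence_rate` is a hypothetical name.

```python
import numpy as np

def correspondence_rate(labels_a, labels_b):
    """CR = max(Corr1, Corr2) between two hard partitions of the same examples."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    clusters_a = np.unique(a)
    clusters_b = np.unique(b)
    # overlap[i, j] = number of examples in cluster i of A and cluster j of B
    overlap = np.array(
        [[np.sum((a == i) & (b == j)) for j in clusters_b] for i in clusters_a]
    )
    n = a.size
    corr1 = overlap.max(axis=1).sum() / n   # map every cluster of A to its best match in B
    corr2 = overlap.max(axis=0).sum() / n   # map every cluster of B to its best match in A
    return max(corr1, corr2)

cr_same = correspondence_rate([0, 0, 1, 1], [1, 1, 0, 0])   # relabelled but identical
cr_diff = correspondence_rate([0, 0, 1, 1], [0, 1, 1, 1])   # one example moved
```

Because the mapping is built from overlaps, the rate ignores how the clusters happen to be numbered in each partition.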

Experiments－Correspondence With FCM • How significant are the brFCM-FCM correspondence rates as r increases? • brFCM generally creates partitions very similar to FCM, given the same centroid initializations for this dataset.

Experiments－Magnetic Resonance Images • The set of MR images consisted of 32 MRI slices, each a 256*256 12-bit image. Each pixel consisted of three features (T1, T2 and PD). • Each MR image has an associated ground truth. • The ground-truth images were created by kNN with k=7, where the training data was chosen by a person who could be labeled a radiology technician. • There are three classes of interest in the magnetic resonance images: cerebrospinal fluid, gray matter and white matter.

Experiments－Magnetic Resonance Images

Experiments－Magnetic Resonance Images • 1) Performance Speedups

Experiments－Magnetic Resonance Images • 2)Correspondence With FCM on Ground Truth

Experiments－Discussion • The brFCM algorithm generates significant speedup over literal FCM on both the infrared image dataset and the MRI dataset. • A trade-off exists between FCM correspondence and speedup (Fig. 2).

Conclusion • Speedup versus bit reduction: the higher the value of r, the higher the speedup and the lower the accuracy. • This approach to speeding up clustering can be applied equally well to hard c-means and EM clustering, or to optimized variants of FCM. • For many image clustering problems, brFCM is a fast alternative to traditional FCM.

Personal Opinion • A trade-off between accuracy and speedup. • Data reduction: • Numerical data => bit mask. • Categorical data => concept hierarchy.

Review • Fuzzy c-means (FCM) • Data Reduction • Quantization Using Bit Mask. • Aggregation Using Hashing. • Fuzzy clustering using FCM. • Two experiments • Infrared images. • Magnetic resonance images of the normal human brain.
