MINIMUM-DISTANCE-TO-MEANS CLUSTERING FOR VECTOR QUANTIZATION: NEW ALGORITHMS AND APPLICATIONS • A short presentation of two unsupervised learning algorithms for vector quantization recently published in the literature
Biography • Andrea Baraldi • Laurea in Elect. Engineering, Univ. Bologna, 1989 • Consultant at ESA-ESRIN, 1991-1993 • Research associate at ISAO-CNR, Bologna, 1994-1996 • Post-doctoral fellowship at ICSI, Berkeley, 1997-1999 • Scientific interests: • Remote sensing applications • Image processing • Computer vision • Artificial intelligence (neural networks)
About this presentation • Basic concepts related to minimum-distance-to-means clustering • Applications in data analysis and image processing • Interesting clustering models taken from the literature: • Fully self-Organizing Simplified Adaptive Resonance Theory (FOSART, IEEE TSMC, 1999) • Enhanced Linde-Buzo-Gray (ELBG, IJKIES, 2000)
Minimum-distance-to-means clustering • Clustering as an ill-posed problem (heuristic techniques for grouping the data at hand) • Cost function minimization (inductive learning to characterize future samples) • Mean-square-error minimization = minimum-distance-to-means (vector quantization) • Entropy maximization (equiprobable cluster detection) • Joint probability maximization (pdf estimation)
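The link between mean-square-error minimization and minimum-distance-to-means assignment can be made concrete with a short sketch (not part of the original slides): each sample is mapped to its nearest codeword, and the quantization cost is the resulting mean squared distance. Array names such as `X` and `codebook` are illustrative.

```python
import numpy as np

def quantize(X, codebook):
    """Minimum-distance-to-means assignment: nearest codeword per sample."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def mse_distortion(X, codebook, labels):
    """Mean-square-error cost minimized by a vector quantizer."""
    return ((X - codebook[labels]) ** 2).sum(axis=1).mean()

X = np.random.rand(140, 2)                                  # toy data set
codebook = X[np.random.choice(len(X), 4, replace=False)]    # 4 codewords
labels = quantize(X, codebook)
print(mse_distortion(X, codebook, labels))
```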
Applications of unsupervised vector quantizers • Detection of hidden data structures (data clustering, perceptual grouping) • First-stage unsupervised learning in RBF networks (data classification, function regression) (Bruzzone, IEEE TGARS, 1999) • Pixel-based initialization of context-based image segmentation techniques (image partitioning and classification)
FOSART by A. Baraldi, ISAO-CNR, IEEE TSMC, 1999 • Constructive: generates (resp. removes) units and lateral connections on an example-driven (resp. mini-batch) basis • Topology-preserving • Minimum-distance-to-means clustering • On-line learning • Soft-to-hard competitive • Incapable of shifting codewords through non-contiguous Voronoi regions • Input parameters: • vigilance threshold in (0,1] (ART-based) • convergence threshold (e.g., 0.001)
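The slide lists FOSART's properties without pseudocode; the fragment below is only a rough sketch of the example-driven, constructive behaviour (vigilance test on the winner, unit creation on failure, running-mean update on success). It omits lateral connections, topology preservation and mini-batch pruning, and the distance-based vigilance test is an assumption, not FOSART's actual activation and match functions.

```python
import numpy as np

def constructive_step(x, units, counts, rho):
    """One example-driven step of an ART-like constructive clusterer (sketch).
    units: list of prototype vectors; counts: samples absorbed by each unit."""
    if not units:
        return [x.copy()], [1]
    d = [np.linalg.norm(x - u) for u in units]
    w = int(np.argmin(d))                       # winner unit
    if d[w] <= rho:                             # vigilance test (illustrative form)
        counts[w] += 1
        units[w] += (x - units[w]) / counts[w]  # running-mean update (rate decays with unit age)
    else:
        units.append(x.copy())                  # constructive: new unit at the sample
        counts.append(1)
    return units, counts
```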
FOSART APPLICATIONS: Perceptual grouping of non-convex data sets • Non-convex data set: circular ring plus three Gaussian clusters, 140 data points • FOSART processing: 11 templates, 3 maps
FOSART APPLICATIONS: 3-D surface reconstruction Input: 3-D digitized human face, 9371 data points. Output: 3370 nodes, 60 maps.
ELBG by M. Russo and G. Patane`, Univ. Messina, IJKIES, 2000 • c-means minimum-distance-to-means clustering (MacQueen, 1967; LBG, 1980) • Initialized by means of random selection or splitting by two (Moody and Darken, 1988) • Non-constructive • Batch learning • Hard competitive • Capable of shifting codewords through non-contiguous Voronoi regions (in line with LBG-U, Fritzke, 1997) • Input parameters: • c: number of clusters • convergence threshold (e.g., 0.001)
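For reference, a minimal sketch of the batch c-means/LBG loop that ELBG extends: Voronoi assignment of the samples, centroid update of the codewords, repeated until the relative drop in distortion falls below the convergence threshold. This is plain LBG with random-selection initialization, not the ELBG block itself; the names and the stopping rule are illustrative.

```python
import numpy as np

def lbg(X, c, eps=1e-3, rng=np.random.default_rng(0)):
    """Plain batch LBG / c-means (sketch): alternate Voronoi partitioning
    and codeword (centroid) update until the distortion stabilizes."""
    codebook = X[rng.choice(len(X), c, replace=False)]    # random-selection init
    prev = np.inf
    while True:
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                        # Voronoi partition
        dist = d2[np.arange(len(X)), labels].mean()       # MSE distortion
        if (prev - dist) / max(dist, 1e-12) < eps:        # convergence threshold
            return codebook, labels
        prev = dist
        for i in range(c):                                # codewords = cell means
            if np.any(labels == i):
                codebook[i] = X[labels == i].mean(axis=0)
```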
Combination of ELBG with FOSART • FOSART initializes ELBG • Input parameters of the two-stage clustering system: • vigilance threshold in (0,1] (ART-based) • convergence threshold (e.g., 0.001)
ELBG algorithm • Ym: codebook at iteration m • P(Ym): Voronoi (ideal) partition • S(Ym): non-Voronoi (sub-optimal) partition • D{Ym, S(Ym)} ≥ D{Ym, P(Ym)} • Voronoi cell Si, i = 1, …, Nc, such that Si = {x ∈ X : d(x, yi) ≤ d(x, yj), j = 1, …, Nc, j ≠ i}
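The inequality D{Ym, S(Ym)} ≥ D{Ym, P(Ym)} holds because the Voronoi partition maps every sample to its nearest codeword; the snippet below (illustrative names, not from the slides) simply checks it numerically for a random codebook and a random non-Voronoi partition.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 2))                         # data set
Y = rng.random((5, 2))                           # fixed codebook Ym, Nc = 5

d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
voronoi = d2.argmin(axis=1)                      # P(Ym): nearest-codeword partition
arbitrary = rng.integers(0, 5, size=len(X))      # some sub-optimal S(Ym)

D_P = d2[np.arange(len(X)), voronoi].mean()
D_S = d2[np.arange(len(X)), arbitrary].mean()
assert D_S >= D_P                                # D{Ym, S(Ym)} >= D{Ym, P(Ym)}
```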
ELBG block • Utility Ui = Di / Dmean, Ui ∈ [0, ∞), i = 1, …, Nc, a dimensionless distortion measure • “low” utility (< 1): distortion below average → codeword to be shifted • “high” utility (> 1): distortion above average → cell to be split
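A small sketch of the utility computation described above, with illustrative names: Di is the total distortion of Voronoi cell Si, Dmean is the average over the Nc cells, and Ui = Di / Dmean singles out shift candidates (Ui < 1) and split candidates (Ui > 1).

```python
import numpy as np

def utilities(X, codebook, labels):
    """Ui = Di / Dmean: dimensionless per-cell distortion."""
    Nc = len(codebook)
    D = np.array([((X[labels == i] - codebook[i]) ** 2).sum()   # Di of cell Si
                  for i in range(Nc)])
    return D / D.mean()        # Ui < 1: codeword to shift; Ui > 1: cell to split
```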
ELBG block: iterative scheme • C.1) Sequential search of cell Si to be shifted (distortion below average) • C.2) Stochastic search of cell Sp to be split (distortion above average) • C.3) • a) Detection of codeword yn closest to yi • b) “Local” LBG arrangement of codewords yi and yp • c) Arrangement of yn such that S’n = Sn ∪ Si • C.4) Compute D’n, D’p and D’i • C.5) if (D’n + D’p + D’i) < (Dn + Dp + Di) then accept the shift
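The steps above are only outlined in the slides; the sketch below is one simplified reading of a single shift-of-codeword attempt (C.3-C.5): the low-utility codeword yi is moved into the high-utility cell Sp, yi and yp are rearranged by a short local LBG, yn absorbs the abandoned cell Si, and the move is kept only if the local distortion decreases. The helper names, the crude local initialization and the fixed number of local iterations are assumptions, not the paper's exact procedure.

```python
import numpy as np

def cell_distortion(X, y):
    """Total squared distortion of samples X with respect to codeword y."""
    return ((X - y) ** 2).sum()

def shift_attempt(Xi, Xp, Xn, yi, yp, yn, local_iters=5):
    """Simplified shift-of-codeword attempt. Xi, Xp, Xn: samples of cells
    Si, Sp, Sn; yi low-utility, yp high-utility, yn closest codeword to yi."""
    old = cell_distortion(Xi, yi) + cell_distortion(Xp, yp) + cell_distortion(Xn, yn)
    # C.3.b) "local" LBG on cell Sp with the two codewords yi and yp
    yi2, yp2 = Xp.min(axis=0), Xp.max(axis=0)      # crude local initialization
    for _ in range(local_iters):
        to_i = ((Xp - yi2) ** 2).sum(axis=1) <= ((Xp - yp2) ** 2).sum(axis=1)
        if to_i.any():
            yi2 = Xp[to_i].mean(axis=0)
        if (~to_i).any():
            yp2 = Xp[~to_i].mean(axis=0)
    # C.3.c) yn takes over the abandoned cell: S'n = Sn ∪ Si
    Xn2 = np.vstack([Xn, Xi])
    yn2 = Xn2.mean(axis=0)
    # C.4, C.5) accept the shift only if the local distortion drops
    new = (cell_distortion(Xp[to_i], yi2) + cell_distortion(Xp[~to_i], yp2)
           + cell_distortion(Xn2, yn2))
    return (yi2, yp2, yn2) if new < old else (yi, yp, yn)
```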
ELBG block: initial situation before the shift-of-codeword attempt • C.1) Sequential search of cell Si to be shifted • C.2) Stochastic search of cell Sp to be split • C.3.a) Detection of codeword yn closest to yi
ELBG block: initialization of the “local” LBG arrangement of yi and yp • C.3.b) “Local” LBG arrangement of codewords yi and yp;
ELBG block: situation after the initialization of the shift-of-codeword attempt • C.3.a) Detection of codeword yn closest to yi • C.3.b) “Local” LBG arrangement of codewords yi and yp
ELBG block: situation after the shift-of-codeword attempt • C.3.b) “Local” LBG arrangement of codewords yi and yp • C.3.c) Arrangement of yn such that S’n = Sn ∪ Si • C.4) Compute D’n, D’p and D’i • C.5) if (D’n + D’p + D’i) < (Dn + Dp + Di) then accept the shift
Examples • Polynomial case (Russo and Patane`, IJKIES 2000) • Cantor distribution (same as above) • Fritzke’s 2-D data set (same as above) • RBF network classification (Baraldi and Blonda, IGARSS 2000) • Lena image compression
Conclusions ELBG (+ FOSART): • is stable with respect to changes in initial conditions (i.e., it is effective in approaching the absolute minimum of the cost function) • converges quickly • adds low computational overhead with respect to traditional LBG (< 5%) • performs better than or equal to other minimum-distance-to-means clustering algorithms found in the literature