2806 Neural Computation: Self-Organizing Maps, Lecture 9, 2005, Ari Visa
Agenda • Some historical notes • Some theory • Self-Organizing Map • Learning Vector Quantization • Conclusions
Some Historical Notes • Local ordering (von der Malsburg, 1973) • (Amari, 1980): a mathematical analysis elucidates the dynamic stability of a cortical map • Self-organizing feature map (SOM) (Kohonen, 1982) • (Erwin, 1992): a convex neighbourhood function should be used (~ Gaussian) • The relationship between the SOM and principal curves is discussed in (Ritter, 1992 & Cherkassky and Mulier, 1995)
Some Historical Notes • Vector quantization: Lloyd algorithm, 1957, for scalar quantization (= Max quantizer) • The generalized Lloyd algorithm for vector quantization (= k-means algorithm, MacQueen 1967 = LBG algorithm, Linde et al. 1980) • The idea of learning vector quantization (LVQ) (Kohonen, 1986) • Convergence properties of the LVQ algorithm analyzed using the ordinary differential equation (ODE) method (Baras and LaVigna, 1990)
Some Theory • The spatial location of an output neuron in a topographic map corresponds to a particular domain or feature of data drawn from the input space. (Kohonen, 1990) • Two approaches: the Willshaw–von der Malsburg model (explains neurobiological details) and the Kohonen model (more general than the first in the sense that it is capable of performing data reduction).
Some Theory • The principal goal of the self-organizing map is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion.
SOM The formation of the self-organizing map involves three processes: 1. Competition 2. Cooperation 3. Synaptic adaptation. Competitive process: x = [x1, x2, ..., xm]T, wj = [wj1, wj2, ..., wjm]T, where j = 1, 2, ..., l (l = total number of neurons in the network). Select the neuron with the largest inner product wjTx, or equivalently (for weight vectors of equal norm) the smallest Euclidean distance: i(x) = arg minj ||x – wj||. A continuous input space of activation patterns is mapped onto a discrete output space of neurons by a process of competition among the neurons in the network.
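The competitive step above can be sketched as follows (a minimal illustration; the function and variable names are my own, not from the lecture):

```python
import numpy as np

def best_matching_unit(x, W):
    """Return the index i(x) of the neuron whose weight vector is
    closest to the input x (the "winner" of the competition).

    x : input vector, shape (m,)
    W : weight matrix, shape (l, m), one row per neuron
    """
    distances = np.linalg.norm(W - x, axis=1)  # ||x - w_j|| for every j
    return int(np.argmin(distances))

# tiny example: 3 neurons in a 2-D input space
W = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
x = np.array([0.9, 1.1])
winner = best_matching_unit(x, W)  # neuron 1 is closest to x
```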
SOM Cooperation • How to define a topological neighborhood that is neurobiologically correct? • Let hj,i denote the topological neighborhood centered on winning neuron i, encompassing a set of excited neurons denoted by j. • The topological neighborhood is symmetric about the maximum point • The amplitude of the topological neighborhood decreases monotonically with increasing lateral distance: hj,i(x)(n) = exp(–d²j,i / 2σ²(n)) • σ(n) = σ0 exp(–n/τ1), n = 0, 1, 2, …
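The Gaussian neighborhood with its shrinking width can be sketched like this (a minimal sketch; parameter defaults are illustrative, not values from the lecture):

```python
import numpy as np

def neighborhood(d_ji, n, sigma0=1.0, tau1=1000.0):
    """Gaussian topological neighborhood h_{j,i}(n).

    d_ji   : lateral distance between neuron j and winning neuron i
    n      : iteration index
    sigma0 : initial effective width sigma(0)
    tau1   : time constant of the width decay
    """
    sigma_n = sigma0 * np.exp(-n / tau1)          # sigma(n) = sigma0 exp(-n/tau1)
    return np.exp(-d_ji**2 / (2.0 * sigma_n**2))  # h = exp(-d^2 / (2 sigma^2))

h_winner = neighborhood(0.0, 0)   # at the winner itself, h = 1
h_far = neighborhood(3.0, 0)      # decays monotonically with lateral distance
```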
SOM Adaptation • wj(n+1) = wj(n) + η(n) hj,i(x)(n) [x – wj(n)], note: Hebbian-like learning • The synaptic weight vector wj of winning neuron i moves toward the input vector x. Upon repeated presentations of the training data, the synaptic weight vectors tend to follow the distribution of the input vectors due to the neighborhood updating → topological ordering • The learning-rate parameter η(n) should be time-varying: η(n) = η0 exp(–n/τ2), n = 0, 1, 2, …
SOM • Ordering and Convergence • Self-organizing or ordering phase (at most ~1000 iterations): η(n) decreases from 0.1 to 0.01, τ2 = 1000; hj,i(n) shrinks from the ”radius” of the lattice down to the winning neuron and a few neighboring neurons around it, τ1 = 1000 / log σ0 • Convergence phase (fine-tunes the feature map; about 500 × the number of neurons in the network iterations): η(n) = 0.01, hj,i(n) = the winning neuron and one or zero neighboring neurons. (summary of SOM)
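Putting the three processes and the decay schedules together, a complete SOM training loop can be sketched as follows (a minimal sketch assuming a 2-D rectangular lattice; names and defaults are my own, not from the lecture):

```python
import numpy as np

def train_som(data, grid_w, grid_h, n_iter=2000, eta0=0.1, sigma0=None,
              tau2=1000.0, seed=0):
    """Minimal 2-D SOM training loop: competition, cooperation, adaptation."""
    rng = np.random.default_rng(seed)
    m = data.shape[1]
    if sigma0 is None:
        sigma0 = max(grid_w, grid_h) / 2.0       # "radius" of the lattice
    tau1 = 1000.0 / np.log(sigma0)               # tau1 = 1000 / log(sigma0)
    # lattice coordinates of each neuron, shape (l, 2)
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)],
                      dtype=float)
    W = rng.random((grid_w * grid_h, m))         # random initial weights
    for n in range(n_iter):
        x = data[rng.integers(len(data))]
        i_win = np.argmin(np.linalg.norm(W - x, axis=1))   # 1. competition
        sigma = sigma0 * np.exp(-n / tau1)                 # shrinking width
        eta = eta0 * np.exp(-n / tau2)                     # decaying rate
        d2 = np.sum((coords - coords[i_win]) ** 2, axis=1) # lateral distances
        h = np.exp(-d2 / (2.0 * sigma ** 2))               # 2. cooperation
        W += eta * h[:, None] * (x - W)                    # 3. adaptation
    return W

data = np.random.default_rng(1).random((200, 2))
W = train_som(data, 4, 4, n_iter=500)
```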
Some Theory • Property 1. Approximation of the Input Space. The feature map, represented by the set of synaptic weight vectors {wj} in the output space A, provides a good approximation to the input space H. • The theoretical basis of this idea is rooted in vector quantization theory.
Some Theory • c(x) acts as an encoder of the input vector x and x′(c) acts as a decoder of c(x). The vector x is selected at random from the training sample, subject to an underlying probability density function fX(x). The optimum encoding–decoding scheme is determined by varying the functions c(x) and x′(c), so as to minimize the expected distortion D = ½ ∫ fX(x) d(x, x′(c(x))) dx.
Some Theory • d(x,x′) = ||x – x′||² = (x – x′)T(x – x′) Generalized Lloyd algorithm Condition 1. Given the input vector x, choose the code c = c(x) to minimize the squared error distortion ||x – x′(c)||². Condition 2. Given the code c, compute the reconstruction vector x′ = x′(c) as the centroid of the input vectors x that satisfy condition 1. The generalized Lloyd algorithm is closely related to the SOM.
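One iteration of the two conditions above can be sketched as follows (a minimal k-means-style sketch; names are my own, not from the lecture):

```python
import numpy as np

def lloyd_step(data, codebook):
    """One iteration of the generalized Lloyd (k-means) algorithm.

    Condition 1: encode each x by its nearest code vector.
    Condition 2: move each code vector to the centroid of its cell.
    """
    # condition 1: nearest-neighbor encoding
    codes = np.argmin(
        np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2),
        axis=1)
    # condition 2: centroid decoding (skip empty cells)
    new_codebook = codebook.copy()
    for c in range(len(codebook)):
        members = data[codes == c]
        if len(members) > 0:
            new_codebook[c] = members.mean(axis=0)
    return new_codebook, codes

data = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
cb = np.array([[0.0, 1.0], [4.0, 3.0]])
cb, codes = lloyd_step(data, cb)   # each code vector moves to its centroid
```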
Some Theory Condition 1. Given the input vector x, choose the code c = c(x) to minimize the distortion measure D2. Condition 2. Given the code c, compute the reconstruction vector x′ = x′(c) to satisfy this condition. • x′new(c) ← x′old(c) + η π(c, c(x)) [x – x′old(c)], where π(c, c(x)) is a neighborhood function centered on the code c(x).
Some Theory • Property 2. Topological Ordering. The feature map computed by the SOM algorithm is topologically ordered in the sense that the spatial location of a neuron in the lattice corresponds to a particular domain or feature of input patterns.
Some Theory • Property 3. Density Matching. The feature map reflects variations in the statistics of the input distribution: regions in the input space H from which the sample vectors x are drawn with a high probability of occurrence are mapped onto larger domains of the output space A, and therefore with better resolution than regions in H from which sample vectors x are drawn with a low probability of occurrence. • Minimum-distortion encoding, according to which the curvature terms and all higher-order terms in the distortion measure due to the noise model (π) are retained: m(x) ∝ fX^(1/3)(x) • Nearest-neighbor encoding, which emerges if the curvature terms are ignored, as in the standard form of the SOM algorithm: m(x) ∝ fX^(2/3)(x)
Some Theory • Property 4. Feature Selection. Given data from an input space with a nonlinear distribution, the self-organizing map is able to select a set of best features for approximating the underlying distribution.
Learning Vector Quantizer • Vector quantization: an input space is divided into a number of distinct regions, and for each region a reconstruction vector is defined. • A vector quantizer with minimum encoding distortion is called a Voronoi or nearest-neighbor quantizer. • The collection of possible reproduction vectors is called the code book of the quantizer, and its members are called code vectors.
Learning Vector Quantizer The SOM algorithm provides an approximate method for computing the Voronoi vectors in an unsupervised manner. Learning vector quantization (LVQ) is a supervised learning technique that uses class information to move the Voronoi vectors slightly, so as to improve the quality of the classifier decision regions.
Learning Vector Quantizer An input vector x is picked at random from the input space. If the class labels of the input vector x and a Voronoi vector w agree, the Voronoi vector w is moved in the direction of the input vector x. If the class labels of the input vector x and the Voronoi vector w disagree, the Voronoi vector w is moved away from the input vector x. Let {wi}li=1 denote the set of Voronoi vectors, and {xi}Ni=1 denote the set of input vectors. LVQ: I. Suppose that the Voronoi vector wc is the closest to the input vector xi. Let Lwc denote the class associated with the Voronoi vector wc and Lxi denote the class label of the input vector xi. The Voronoi vector wc is adjusted as follows: If Lwc = Lxi, then wc(n+1) = wc(n) + αn [xi – wc(n)], where 0 < αn < 1. If Lwc ≠ Lxi, then wc(n+1) = wc(n) – αn [xi – wc(n)], where 0 < αn < 1. II. The other Voronoi vectors are not modified.
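The LVQ update rule above can be sketched as follows (a minimal sketch; the function name and the example labels are my own, not from the lecture):

```python
import numpy as np

def lvq1_update(W, labels_w, x, label_x, alpha=0.1):
    """One LVQ step: move the closest Voronoi vector toward x if the class
    labels agree, away from x if they disagree; all other Voronoi vectors
    stay unchanged."""
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # closest Voronoi vector
    sign = 1.0 if labels_w[c] == label_x else -1.0     # agree: +, disagree: -
    W = W.copy()
    W[c] = W[c] + sign * alpha * (x - W[c])
    return W

W = np.array([[0.0, 0.0], [4.0, 4.0]])
labels_w = ["A", "B"]
x = np.array([1.0, 1.0])
W_agree = lvq1_update(W, labels_w, x, "A")     # w0 moves toward x
W_disagree = lvq1_update(W, labels_w, x, "B")  # w0 moves away from x
```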
Learning Vector Quantizer Vector quantization is a form of lossy data compression. Rate distortion theory (Gray 1984): Better data compression performance can always be achieved by coding vectors instead of scalars, even if the source of data is memoryless, or if the data compression system has memory. A multistage hierarchical vector quantizer (Luttrell 1989)
Learning Vector Quantizer • First-order autoregressive (AR) model: x(n+1) = αx(n) + ν(n), where α is the AR coefficient and the ν(n) are independent and identically distributed (iid) Gaussian random variables of zero mean and unit variance.
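Samples from this AR(1) model can be generated with a short sketch (illustrative code; not from the lecture):

```python
import numpy as np

def ar1_samples(alpha, n, seed=0):
    """Generate n samples from x(k+1) = alpha * x(k) + nu(k),
    with nu(k) iid Gaussian, zero mean, unit variance, x(0) = 0."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for k in range(n - 1):
        x[k + 1] = alpha * x[k] + rng.standard_normal()
    return x

x = ar1_samples(0.9, 1000)  # a strongly correlated AR(1) sequence
```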
Some Theory • Attribute code xa • Symbol code xs • x =[xs, 0]T+ [0,xa]T • Contextual map or semantic map
Summary • The SOM algorithm is neurobiologically inspired, incorporating all the mechanisms that are basic to self-organization: competition, cooperation, and self-amplification. • Kohonen's SOM algorithm is simple to implement, yet mathematically difficult to analyze in a general setting. • The self-organizing map may be viewed as a vector quantizer.