3) Vector Quantization (VQ) and Learning Vector Quantization (LVQ)

References
M. Biehl, A. Freking, G. Reents: Dynamics of on-line competitive learning. Europhysics Letters 38 (1997) 73-78
M. Biehl, A. Ghosh, B. Hammer: Dynamics and generalization ability of LVQ algorithms. J. Machine Learning Research 8 (2007) 323-360
and references in the latter
Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping of similar data in clusters
assignment of a feature vector to the closest prototype w (similarity or distance measure, e.g. Euclidean distance)
unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example

intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function ...
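As an illustration of this winner-takes-all update, here is a minimal sketch in Python/NumPy; the function name and the learning rate eta are choices made for this example, not taken from the slides:

```python
import numpy as np

def vq_step(prototypes, xi, eta=0.05):
    """One step of unsupervised competitive (winner-takes-all) learning:
    find the prototype closest to the example xi and move it towards xi."""
    # squared Euclidean distances of xi to all K prototypes
    dists = np.sum((prototypes - xi) ** 2, axis=1)
    j = np.argmin(dists)                          # index of the winner
    prototypes[j] += eta * (xi - prototypes[j])   # move the winner towards the example
    return prototypes
```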
quantization error: sum over all examples of the distance to the respective winner wj
here: Euclidean distance, wj is the winner among the prototypes

aim: faithful representation of the data (in general: ≠ clustering)
result depends on
- the number of prototype vectors
- the distance measure / metric used
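A small sketch of how the quantization error could be evaluated for a given set of prototypes (NumPy; the function name and the use of the mean rather than the sum are choices of this illustration):

```python
import numpy as np

def quantization_error(prototypes, data):
    """Mean squared Euclidean distance between each data point and its winner."""
    # pairwise squared distances: shape (num_examples, num_prototypes)
    d2 = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return np.mean(np.min(d2, axis=1))  # distance to the closest prototype, averaged
```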
Learning Vector Quantization (LVQ)
∙ identification of prototype vectors from labelled example data
∙ distance-based classification (e.g. Euclidean, Manhattan, …) in N-dim. feature space
classification: assignment of a vector to the class of the closest prototype w
piecewise linear decision boundaries
aim: generalization ability, i.e. classification of novel data after learning from examples

basic, heuristic LVQ scheme: LVQ1 [Kohonen]
• initialize prototype vectors for different classes
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
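For concreteness, a minimal sketch of the distance-based (nearest-prototype) classification; the array layout and function name are assumptions of this illustration:

```python
import numpy as np

def classify(prototypes, labels, xi):
    """Assign xi to the class of its closest prototype (Euclidean distance)."""
    d2 = np.sum((prototypes - xi) ** 2, axis=1)
    return labels[np.argmin(d2)]
```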
LVQ algorithms ...
• frequently applied in a variety of practical problems
• plausible, intuitive, flexible
• fast, easy to implement

limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
often based on heuristic arguments or cost functions with unclear relation to generalization

here: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation
Model situation: two clusters of N-dimensional data
random vectors ξ ∈ ℝN generated according to a mixture of two Gaussians,
with independent components of mean ℓ Bσ and variance vσ (σ = ±1)
prior weights of the classes p+, p- with p+ + p- = 1
orthonormal center vectors: B+, B- ∈ ℝN, (B±)² = 1, B+·B- = 0
cluster distance ∝ ℓ
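A minimal sketch of how data from this model could be generated for simulations (NumPy); the parameter names follow the slides, while the function itself and the choice of B+, B- as the first two unit vectors are assumptions of this illustration:

```python
import numpy as np

def generate_data(P, N, ell, p_plus, v_plus, v_minus, rng=None):
    """Draw P examples from the mixture of two isotropic Gaussian clusters
    centered at ell*B_plus and ell*B_minus with variances v_plus, v_minus."""
    rng = np.random.default_rng() if rng is None else rng
    # orthonormal center vectors, here simply the first two unit vectors
    B_plus, B_minus = np.eye(N)[0], np.eye(N)[1]
    sigma = rng.choice([+1, -1], size=P, p=[p_plus, 1.0 - p_plus])  # class labels
    centers = np.where(sigma[:, None] == 1, ell * B_plus, ell * B_minus)
    variances = np.where(sigma == 1, v_plus, v_minus)
    xi = centers + rng.normal(size=(P, N)) * np.sqrt(variances)[:, None]
    return xi, sigma
```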
high-dimensional data (formally: N→∞)
example: ξμ ∈ ℝN, N=200, ℓ=1, p+=0.4, v+=1.44, v-=0.64 (● 240 examples, ○ 160 examples)
left panel: projections of the data onto two independent random directions w1, w2
  (x = ξμ·w1, y = ξμ·w2)
right panel: projections into the plane of the center vectors B+, B-
  (x = ξμ·B+, y = ξμ·B-)
Dynamics of on-line training
sequence of new, independent random examples ξμ drawn according to the model density
update of the two prototype vectors w+, w-:
  wSμ = wSμ-1 + (η/N) fS(...) (ξμ - wSμ-1)
learning rate (step size) η; the modulation function fS controls competition,
the direction of the update (change of prototype towards or away from the current data) etc.

example: LVQ1, original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm
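A minimal sketch of one LVQ1 (winner-takes-all) on-line step in this setting; the 1/N scaling follows the slides' convention of learning steps per degree of freedom, everything else (names, signature) is an assumption of this illustration:

```python
import numpy as np

def lvq1_step(w, w_labels, xi, sigma, eta, N):
    """One on-line LVQ1 update: only the winner moves, towards xi if its label
    matches the class sigma of the example, away from xi otherwise."""
    d2 = np.sum((w - xi) ** 2, axis=1)
    s = np.argmin(d2)                                # winner
    direction = 1.0 if w_labels[s] == sigma else -1.0
    w[s] += (eta / N) * direction * (xi - w[s])
    return w
```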
Mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities
   length and relative position of the prototypes:
   projections RSσ = wS·Bσ into the (B+, B-)-plane and overlaps QST = wS·wT
   (reduction ℝ2N → ℝ7: 2N prototype components → 7 characteristic quantities)
   algorithm → recursions for RSσ, QST

2. average over the current example ξμ (random vector drawn according to the model density)
   the relevant projections of ξμ onto prototypes and centers become correlated Gaussian
   random quantities in the thermodynamic limit N→∞, completely specified in terms of
   first and second moments (w/o indices μ)
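For illustration, the characteristic quantities could be computed from given prototypes as follows (a small sketch, with array and dictionary names chosen for this example):

```python
import numpy as np

def order_parameters(w_plus, w_minus, B_plus, B_minus):
    """Projections R_{S,sigma} = w_S . B_sigma and overlaps Q_{S,T} = w_S . w_T."""
    R = {('+', '+'): w_plus @ B_plus,  ('+', '-'): w_plus @ B_minus,
         ('-', '+'): w_minus @ B_plus, ('-', '-'): w_minus @ B_minus}
    Q = {('+', '+'): w_plus @ w_plus,  ('+', '-'): w_plus @ w_minus,
         ('-', '-'): w_minus @ w_minus}
    return R, Q
```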
3. self-averaging property
   the characteristic quantities depend on the random sequence of example data,
   but their fluctuations vanish with N→∞
   → the learning dynamics is completely described in terms of averages;
   the averaged recursions close in 1/N

   computer simulations (LVQ1): mean results approach the theoretical prediction,
   the variance of the characteristic quantities vanishes as N→∞
   (example: mean and variance of R++ at α=10, plotted vs. 1/N)
4. continuous learning time
   α = (# of examples) / N = # of learning steps per degree of freedom
   stochastic recursions → deterministic ODEs;
   integration yields the evolution of the projections

5. learning curve
   generalization error εg(α): probability for misclassification of a novel example
   after training with αN examples
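As a simple numerical cross-check of the learning curve, the generalization error for given prototypes could be estimated by Monte Carlo sampling from the model. This sketch reuses the hypothetical generate_data and classify helpers introduced above:

```python
import numpy as np

def generalization_error(prototypes, labels, N, ell, p_plus, v_plus, v_minus,
                         num_test=10000, rng=None):
    """Estimate eps_g: probability of misclassifying a fresh example from the model."""
    xi, sigma = generate_data(num_test, N, ell, p_plus, v_plus, v_minus, rng)
    predictions = np.array([classify(prototypes, labels, x) for x in xi])
    return np.mean(predictions != sigma)
```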
LVQ1: The winner takes it all
only the winner ws is updated, according to the class label

theory and simulation (N=100): p+=0.8, v+=4, v-=9, ℓ=2.0, η=1.0,
averaged over 100 independent runs, initialization ws(0)≈0
- evolution of the characteristic quantities Q++, Q+-, Q--, RSσ with α,
  confirming the self-averaging property (means and variances vs. 1/N)
- trajectories of w+, w- in the (B+,B-)-plane, (•) at α=20,40,...,140
  ....... optimal decision boundary    ____ asymptotic position
Learning curve εg(α)
p+ = 0.2, ℓ=1.0, v+ = v- = 1.0, η = 2.0, 1.0, 0.2
• suboptimal, non-monotonic behavior for small η
• stationary state: εg(α→∞) grows linearly with η
• well-defined asymptotics for η→0, α→∞ with (ηα)→∞

achievable generalization error εg as a function of p+
(left: v+ = v- = 1.0, right: v+ = 0.25, v- = 0.81)
.... best linear boundary    ― LVQ1
LVQ 2.1 [Kohonen]
here: update of both the correct and the wrong winner

problem: instability of the algorithm due to the repulsion of wrong prototypes,
trivial classification for α→∞: εg = min{p+, p-}

theory and simulation (N=100): p+=0.8, ℓ=1, v+=v-=1, η=0.5,
averages over 100 independent runs (projections RS-, RS+)
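A hedged sketch of the LVQ 2.1 update in this two-prototype setting: the closest prototype of the correct class is attracted, the closest prototype of the wrong class is repelled. Names and signature are assumptions of this illustration:

```python
import numpy as np

def lvq21_step(w, w_labels, xi, sigma, eta, N):
    """LVQ 2.1: move the correct winner towards xi, the wrong winner away from xi."""
    w_labels = np.asarray(w_labels)
    d2 = np.sum((w - xi) ** 2, axis=1)
    correct = np.where(w_labels == sigma)[0]
    wrong = np.where(w_labels != sigma)[0]
    c = correct[np.argmin(d2[correct])]   # closest prototype with the correct label
    r = wrong[np.argmin(d2[wrong])]       # closest prototype with a wrong label
    w[c] += (eta / N) * (xi - w[c])
    w[r] -= (eta / N) * (xi - w[r])
    return w
```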
εg(α) for η = 2.0, 1.0, 0.5
suggested strategy: selection of data in a window close to the current decision boundary
→ slows down the repulsion, but the system remains unstable

Early stopping: end the training process at minimal εg (idealized)
• pronounced minimum in εg(α)
• depends on initialization and cluster geometry
• here: lowest minimum value reached for η→0

achievable εg as a function of p+ (v+ = 0.25, v- = 0.81)
― LVQ1    __ early stopping
Learning From Mistakes (LFM)
LVQ2.1 update only if the current classification is wrong;
crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]

learning curves εg(α): p+=0.8, ℓ=3.0, v+=4.0, v-=9.0, η = 2.0, 1.0, 0.5
→ η-independent asymptotic εg
projected trajectory in the (B+,B-)-plane (RS-, RS+): p+=0.8, ℓ=1.2, v+=v-=1.0
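A minimal sketch of the LFM rule, assuming the hypothetical classify and lvq21_step helpers from the earlier sketches: the LVQ 2.1 step is applied only when the current nearest-prototype classification of the example is wrong.

```python
def lfm_step(w, w_labels, xi, sigma, eta, N):
    """Learning From Mistakes: apply the LVQ 2.1 update only to misclassified examples."""
    if classify(w, w_labels, xi) != sigma:   # current classification is wrong
        w = lvq21_step(w, w_labels, xi, sigma, eta, N)
    return w
```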
Comparison: achievable generalization ability
asymptotic εg as a function of p+
left: equal cluster variances (v+ = v- = 1.0)
right: unequal variances (v+ = 0.25, v- = 0.81)
..... best linear boundary   ― LVQ1   --- LVQ2.1 (early stopping)   ·-· LFM   ― trivial classification
Vector Quantization
class membership is unknown or identical for all data
→ purely competitive learning: only the winner ws is updated

numerical integration for ws(0)≈0  (p+=0.2, ℓ=1.0, η=1.2)
the system is invariant under exchange of the prototypes
→ weakly repulsive fixed points
plots: R++, R+-, R-+, R-- vs. α and εg(α) for VQ, LVQ+, LVQ1
asymptotics (η→0, α→∞, ηα→∞): εg as a function of p+
for p+≈0 (i.e. p-≈1): low quantization error, but high generalization error εg

interpretations:
• VQ, unsupervised learning from unlabelled data
• LVQ with two prototypes of the same class, identical labels
• LVQ with different classes, but labels are not used in training
Summary
• a model scenario of LVQ training: two clusters, two prototypes
• dynamics of on-line training
• comparison of algorithms (within the model):
  - LVQ1: original formulation of LVQ, with close to optimal asymptotic generalization
  - LVQ2.1: intuitive extension creates instability, trivial (stationary) classification
    ... + early stopping: potentially good performance,
    but practical difficulties, depends on initialization
  - LFM: crisp limit of Soft Robust LVQ, stable behavior, far from optimal generalization
  - VQ: description of in-class competition
Outlook
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules
• variational approach / Bayes optimal on-line learning
• Generalized Relevance LVQ [e.g. Hammer & Villmann]:
  adaptive metrics, e.g. training of the distance measure (relevance factors)
• Self-Organizing Maps (SOM): neighborhood preserving
• Neural Gas (distance-rank based)
• applications
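As a small illustration of an adaptive metric of the kind used in relevance LVQ, a relevance-weighted squared Euclidean distance might look as follows; the normalization of the relevance factors is an assumption of this sketch:

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Relevance-weighted squared Euclidean distance; lam holds one
    non-negative relevance factor per feature, normalized to sum to 1."""
    lam = np.asarray(lam, dtype=float) / np.sum(lam)
    return np.sum(lam * (xi - w) ** 2)
```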