
3) Vector Quantization (VQ) and Learning Vector Quantization (LVQ)


Presentation Transcript


1. 3) Vector Quantization (VQ) and Learning Vector Quantization (LVQ)
References:
M. Biehl, A. Freking, G. Reents: Dynamics of on-line competitive learning. Europhysics Letters 38 (1997) 73-78
M. Biehl, A. Ghosh, B. Hammer: Dynamics and generalization ability of LVQ algorithms. Journal of Machine Learning Research 8 (2007) 323-360, and references therein

2. Vector Quantization (VQ)
Aim: representation of large amounts of data by (few) prototype vectors.
Example: identification and grouping of similar data in clusters.
Each feature vector ξ is assigned to the closest prototype w, according to a similarity or distance measure, e.g. the Euclidean distance.

3. Unsupervised competitive learning:
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
An intuitively clear, plausible procedure, which
- places prototypes in areas with a high density of data
- identifies the most relevant combinations of features
- corresponds to (stochastic) on-line gradient descent with respect to the cost function introduced on the next slide.
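A minimal sketch of one such winner-takes-all update step in Python; the function name, the example learning rate and the use of the squared Euclidean distance are illustrative assumptions, not part of the slides:

```python
import numpy as np

def vq_online_step(prototypes, xi, eta=0.1):
    """One unsupervised competitive (winner-takes-all) learning step."""
    # identify the winner: the prototype closest to the example xi
    distances = np.sum((prototypes - xi) ** 2, axis=1)
    winner = np.argmin(distances)
    # move the winner even closer towards the example
    prototypes[winner] += eta * (xi - prototypes[winner])
    return prototypes
```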

4. Quantization error: each data point contributes its distance to the winner, i.e.
E = Σμ d(wJ(μ), ξμ), where wJ(μ) is the winner for example ξμ; here d is the (squared) Euclidean distance.
Aim: a faithful representation of the data (in general: ≠ clustering).
The result depends on
- the number of prototype vectors
- the distance measure / metric used
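A sketch of how this error could be evaluated for a fixed set of prototypes and data; the function name and the array layout (one example per row) are my own choices:

```python
import numpy as np

def quantization_error(prototypes, data):
    """Sum over all examples of the squared Euclidean distance to the winning prototype."""
    # pairwise squared distances, shape (num_examples, num_prototypes)
    d = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    # each example contributes its distance to the winner (the closest prototype)
    return np.sum(np.min(d, axis=1))
```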

5. Learning Vector Quantization (LVQ)
∙ identification of prototype vectors from labelled example data
∙ distance-based classification (e.g. Euclidean, Manhattan, ...) in the N-dim. feature space
Classification: a vector ξ is assigned to the class of the closest prototype w, which yields piecewise linear decision boundaries.
Basic, heuristic LVQ scheme: LVQ1 [Kohonen]
• initialize prototype vectors for the different classes
• present a single example ξ
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data if it belongs to the same class
  - away from the data if it belongs to a different class
Aim: generalization ability, i.e. classification of novel data after learning from examples.
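A minimal sketch of nearest-prototype classification and a single heuristic LVQ1 step, assuming the squared Euclidean distance and one class label per prototype; names and the learning rate value are illustrative:

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, xi, label, eta=0.1):
    """LVQ1: move the winner towards xi if the labels agree, away from xi otherwise."""
    d = np.sum((prototypes - xi) ** 2, axis=1)
    winner = np.argmin(d)
    sign = 1.0 if proto_labels[winner] == label else -1.0
    prototypes[winner] += sign * eta * (xi - prototypes[winner])
    return prototypes

def classify(prototypes, proto_labels, xi):
    """Nearest-prototype classification: the class of the closest prototype."""
    return proto_labels[np.argmin(np.sum((prototypes - xi) ** 2, axis=1))]
```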

6. LVQ algorithms ...
• frequently applied in a variety of practical problems
• plausible, intuitive, flexible
• fast, easy to implement
• but: limited theoretical understanding of
  - dynamics and convergence properties
  - achievable generalization ability
They are often based on heuristic arguments or on cost functions with an unclear relation to generalization.
Here: analysis of LVQ algorithms with respect to
- the dynamics of the learning process
- the performance, i.e. the generalization ability
- typical properties in a model situation

7. Model situation: two clusters of N-dimensional data.
Random vectors ξ ∈ ℝN are generated according to a mixture of two Gaussians with
- orthonormal center vectors B+, B- ∈ ℝN with (B±)² = 1 and B+·B- = 0
- prior weights p+, p- of the two classes, p+ + p- = 1
- cluster distance ∝ ℓ
- independent components within each cluster, with variance v+ or v-, respectively
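A sketch of a sampler for this mixture, assuming (in line with the later slides) that cluster σ = ±1 is centered at ℓ·Bσ and has i.i.d. components of variance vσ; the function and variable names are mine:

```python
import numpy as np

def sample_cluster_data(P, B_plus, B_minus, p_plus=0.5, ell=1.0,
                        v_plus=1.0, v_minus=1.0, rng=None):
    """Draw P examples from the two-Gaussian mixture model."""
    rng = np.random.default_rng() if rng is None else rng
    N = B_plus.shape[0]
    # choose the cluster sigma = +1 or -1 with prior weights p_plus, p_minus
    sigma = np.where(rng.random(P) < p_plus, 1, -1)
    means = np.where(sigma[:, None] == 1, ell * B_plus, ell * B_minus)
    var = np.where(sigma == 1, v_plus, v_minus)
    # independent Gaussian components around the cluster center
    xi = means + rng.standard_normal((P, N)) * np.sqrt(var)[:, None]
    return xi, sigma
```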

8. High-dimensional data (formally: N → ∞). Example: ξμ ∈ ℝN with N=200, ℓ=1, p+=0.4, v+=1.44, v-=0.64 (● 240 points, ○ 160 points).
Projections onto two independent random directions w1, w2: xμ = w1·ξμ, yμ = w2·ξμ.
Projections into the plane of the center vectors B+, B-: xμ = B+·ξμ, yμ = B-·ξμ.
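A small sketch of the two kinds of projections, assuming the data and center vectors come from the sampler above; the helper name is mine:

```python
import numpy as np

def project(xi, d1, d2):
    """Project each example (row of xi) onto two given directions d1, d2."""
    return xi @ d1, xi @ d2

# e.g., with xi, B_plus, B_minus from the sampler above:
# rng = np.random.default_rng(0)
# w1 = rng.standard_normal(xi.shape[1]); w1 /= np.linalg.norm(w1)
# w2 = rng.standard_normal(xi.shape[1]); w2 /= np.linalg.norm(w2)
# x_rand, y_rand = project(xi, w1, w2)           # two independent random directions
# x_ctr, y_ctr = project(xi, B_plus, B_minus)    # plane of the center vectors
```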

9. Dynamics of on-line training: a sequence of new, independent random examples ξμ is drawn according to the model density. Update of the two prototype vectors w+, w-:
wsμ = wsμ-1 + (η/N) fs[...] (ξμ - wsμ-1), s = ±1,
where η is the learning rate (step size) and the modulation function fs encodes the competition and the direction of the update, i.e. the change of the prototype towards or away from the current data, etc.
Example: LVQ1 in its original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm.
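A sketch of the corresponding sequential training loop for LVQ1 (winner-takes-all with a label-dependent sign), assuming the learning rate is scaled as η/N as written above; all names are illustrative:

```python
import numpy as np

def lvq1_online_training(w, w_labels, examples, labels, eta):
    """Present a sequence of independent examples; update only the winner at each step."""
    N = w.shape[1]
    for xi, sigma in zip(examples, labels):
        d = np.sum((w - xi) ** 2, axis=1)
        s = np.argmin(d)                              # the winner (WTA)
        f = 1.0 if w_labels[s] == sigma else -1.0     # towards / away from the data
        w[s] += (eta / N) * f * (xi - w[s])
    return w
```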

10. Mathematical analysis of the learning dynamics: algorithm → recursions.
1. Description in terms of a few characteristic quantities (here: ℝ2N → ℝ7): the projections RSσ = wS·Bσ and QST = wS·wT specify the lengths and relative positions of the prototypes and their projections into the (B+, B-)-plane.
2. Average over the current example: for a random vector ξμ drawn according to the model density, the relevant projections of ξμ become correlated Gaussian random quantities in the thermodynamic limit N → ∞, completely specified in terms of their first and second moments (w/o indices μ).
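A sketch of how these characteristic quantities could be measured in a simulation, assuming two prototype rows w[0] = w+ and w[1] = w-; the names are mine:

```python
import numpy as np

def order_parameters(w, B_plus, B_minus):
    """Characteristic quantities: R[s, sigma] = w_s . B_sigma and Q[s, t] = w_s . w_t."""
    B = np.stack([B_plus, B_minus], axis=1)   # columns are the center vectors, shape (N, 2)
    R = w @ B                                  # 2x2 matrix of projections onto B+, B-
    Q = w @ w.T                                # 2x2 matrix of prototype lengths and overlaps
    return R, Q
```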

11. 3. Self-averaging property: the characteristic quantities depend on the random sequence of example data, but their fluctuations vanish as N → ∞. The learning dynamics is therefore completely described in terms of averages, and the averaged recursions are closed (to leading order in 1/N).
Computer simulations (LVQ1) confirm this: the mean results approach the theoretical prediction and the variance vanishes as N → ∞.
[Figure: mean and variance of R++ at α=10 as a function of 1/N.]

12. 4. Continuous learning time α = (# of examples)/N, i.e. the number of learning steps per degree of freedom: the stochastic recursions become deterministic ODEs, whose integration yields the evolution of the projections.
5. Learning curve: the generalization error εg(α), i.e. the probability of misclassifying a novel example after training with αN examples.
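In a simulation, εg can be estimated by Monte Carlo on a large set of freshly drawn test examples (e.g. generated with the sampler sketched after slide 7); a minimal sketch with illustrative names:

```python
import numpy as np

def generalization_error(w, w_labels, xi_test, sigma_test):
    """Fraction of novel test examples misclassified by nearest-prototype classification."""
    d = np.sum((xi_test[:, None, :] - w[None, :, :]) ** 2, axis=2)
    predicted = np.asarray(w_labels)[np.argmin(d, axis=1)]
    return float(np.mean(predicted != sigma_test))
```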

13. LVQ1: The winner takes it all. Only the winner ws is updated, according to the class label.
Theory and simulation (N=100) for p+=0.8, v+=4, v-=9, ℓ=2.0, η=1.0, averaged over 100 independent runs; initialization ws(0)=0.
[Figure: evolution of the characteristic quantities Q++, Q--, Q+-, RSσ as functions of α; self-averaging property (mean and variances of R++ at α=10 vs. 1/N).]

14. LVQ1: The winner takes it all. Only the winner ws is updated, according to the class label; initialization ws(0)≈0.
Theory and simulation (N=100) for p+=0.8, v+=4, v-=9, ℓ=2.0, η=1.0, averaged over 100 independent runs.
[Figure: evolution of Q++, Q--, Q+-, RS+, RS- vs. α, and trajectories of w+, w- in the (B+, B-)-plane for (•) α=20, 40, ..., 140; cluster centers at ℓB+, ℓB-; dotted: optimal decision boundary; solid: asymptotic positions.]

15. Learning curve (p+=0.2, ℓ=1.0, v+=v-=1.0; η=2.0, 1.0, 0.2):
• suboptimal, non-monotonic behavior for small η
• in the stationary state, εg(α→∞) grows linearly with η
• well-defined asymptotics for η→0, α→∞ with (ηα)→∞
Achievable generalization error εg as a function of p+, for v+=v-=1.0 and for v+=0.25, v-=0.81:
.... best linear boundary, ― LVQ1

16. LVQ2.1 [Kohonen]: here, both the correct and the wrong winner are updated.
Problem: instability of the algorithm due to the repulsion of wrong prototypes; for α→∞ the classification becomes trivial, with εg = min{p+, p-}.
Theory and simulation (N=100) for p+=0.8, ℓ=1, v+=v-=1, η=0.5, averaged over 100 independent runs.
[Figure: RS+, RS- as functions of α.]
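A sketch of one LVQ2.1-type step in the two-prototype model, where the correct and the wrong winner coincide with the two prototypes and the window condition of the original algorithm is omitted (as on the slide); the names are illustrative:

```python
import numpy as np

def lvq21_step(w, w_labels, xi, sigma, eta):
    """LVQ2.1 (unrestricted): attract the correct-class prototype, repel the wrong-class one."""
    N = w.shape[1]
    for s in range(len(w)):
        sign = 1.0 if w_labels[s] == sigma else -1.0
        w[s] += (eta / N) * sign * (xi - w[s])
    return w
```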

17. Suggested strategy: selection of data in a window close to the current decision boundary. This slows down the repulsion, but the system remains unstable.
Early stopping: end the training process at minimal εg (idealized).
• pronounced minimum in εg(α) (curves for η=2.0, 1.0, 0.5)
• its position and value depend on the initialization and on the cluster geometry
• here: the lowest minimum value is reached for η→0
Achievable εg as a function of p+ for v+=0.25, v-=0.81: ― LVQ1, __ early stopping

18. Learning From Mistakes (LFM): the LVQ2.1 update is performed only if the current classification is wrong; this is the crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003].
Learning curves εg(α) for p+=0.8, ℓ=3.0, v+=4.0, v-=9.0 and η=2.0, 1.0, 0.5: the asymptotic εg is η-independent.
[Figure: projected trajectory in the (B+, B-)-plane, RS+, RS-, for p+=0.8, ℓ=1.2, v+=v-=1.0; cluster centers at ℓB+, ℓB-.]
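A sketch of the corresponding LFM step, again for the two-prototype model and with illustrative names: the example is first classified, and the LVQ2.1-type update is applied only on a mistake.

```python
import numpy as np

def lfm_step(w, w_labels, xi, sigma, eta):
    """Learning From Mistakes: LVQ2.1-type update only if xi is currently misclassified."""
    N = w.shape[1]
    d = np.sum((w - xi) ** 2, axis=1)
    if w_labels[np.argmin(d)] == sigma:
        return w                                  # correct classification: no update
    for s in range(len(w)):
        sign = 1.0 if w_labels[s] == sigma else -1.0
        w[s] += (eta / N) * sign * (xi - w[s])
    return w
```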

19. Comparison: achievable generalization ability as a function of p+, for equal cluster variances (v+=v-=1.0) and unequal variances (v+=0.25, v-=0.81):
..... best linear boundary, ― LVQ1, --- LVQ2.1 (early stopping), ·-· LFM, ― trivial classification

20. Vector Quantization: the class membership is unknown or identical for all data.
Competitive learning: only the winner ws is updated. The system is invariant under the exchange of the prototypes, which leads to weakly repulsive fixed points.
Numerical integration for ws(0)≈0 (p+=0.2, ℓ=1.0, η=1.2).
[Figure: R++, R+-, R-+, R-- and εg as functions of α for VQ, LVQ1 and LVQ+.]

21. Asymptotics (η→0, α→∞, ηα→∞): low quantization error, but high generalization error εg.
[Figure: asymptotic εg as a function of p+; annotations p+≈0, p-≈1.]
Interpretations:
• VQ, i.e. unsupervised learning from unlabelled data
• LVQ with two prototypes of the same class (identical labels)
• LVQ with different classes, but the labels are not used in training

22. Summary
• a model scenario of LVQ training: two clusters, two prototypes
• dynamics of on-line training
• comparison of algorithms (within the model):
  - LVQ1: original formulation of LVQ, with close to optimal asymptotic generalization
  - LVQ2.1: the intuitive extension creates an instability and a trivial (stationary) classification; with early stopping, potentially good performance, but practical difficulties, and the result depends on the initialization
  - LFM: crisp limit of Soft Robust LVQ, stable behavior, but far from optimal generalization
  - VQ: description of the in-class competition

23. Outlook
• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. training of the distance measure (see the sketch below)
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules
• variational approach / Bayes optimal on-line learning
• Self-Organizing Maps (SOM): neighborhood-preserving training; Neural Gas (distance-rank based)
• applications
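As a flavour of the adaptive-metric idea, a sketch of a relevance-weighted squared distance of the kind used in Generalized Relevance LVQ; the training of the relevance factors is not shown, and the names are mine:

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Relevance-weighted squared distance; lam holds non-negative relevance factors (e.g. summing to 1)."""
    return float(np.sum(lam * (w - xi) ** 2))
```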
