
Unsupervised recurrent networks


Presentation Transcript


  1. Unsupervised recurrent networks Barbara Hammer, Institute of Informatics, Clausthal University of Technology

  2. [Photos: Clausthal-Zellerfeld and the Brocken]

  3. Prototype-based clustering …

  4. Prototype-based clustering • data contained in a real-vector space • prototypes characterized by locations in the data space • clustering induced by the receptive fields based on the Euclidean metric

  5. Vector quantization • init prototypes • repeat: • present a data point • adapt the winner toward the data point
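A minimal Python sketch of this online vector-quantization loop; the prototype count, learning rate, number of epochs, and the toy data in the usage lines are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def online_vq(data, n_prototypes=5, eta=0.05, epochs=10, seed=0):
    """Online vector quantization: adapt the winning prototype toward each presented point."""
    rng = np.random.default_rng(seed)
    # init prototypes on randomly chosen data points
    w = data[rng.choice(len(data), n_prototypes, replace=False)].astype(float)
    for _ in range(epochs):
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(w - x, axis=1))  # Euclidean winner
            w[winner] += eta * (x - w[winner])                  # move winner toward x
    return w

# usage: cluster 2-D points drawn from three blobs (illustrative data)
data = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
prototypes = online_vq(data)
```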

  6. Cost function • vector quantization minimizes the quantization error (the cost function) • online training: stochastic gradient descent
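The cost function itself appeared only as a figure on the slide and is not in the transcript; a standard formulation of the quantization error that the online rule descends, together with the resulting stochastic gradient step, is (a reconstruction, not necessarily the slide's exact notation):

```latex
E \;=\; \frac{1}{2}\sum_j \int \chi_{R_j}(x)\,\lVert x - w_j\rVert^2\, p(x)\,dx,
\qquad
\Delta w_{I(x)} \;=\; \eta\,\bigl(x - w_{I(x)}\bigr)
```

where R_j is the receptive field of prototype w_j, χ its indicator function, I(x) the winner for x, and η the learning rate; the update on the right is the stochastic gradient step performed for each presented data point.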

  7. Neighborhood cooperation • Self-Organizing Map: neurons wj on a regular lattice, indices j = (j1, j2) • Neural gas: data-optimum topology
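A sketch contrasting the two neighborhood schemes: the SOM weights neighbors by distance on a fixed lattice of neuron indices, neural gas by rank in data space. The parameter names (eta, sigma, lam), the 5x5 grid, and the sample point are illustrative assumptions.

```python
import numpy as np

def som_update(W, grid, x, eta=0.1, sigma=1.0):
    """SOM step: neighborhood measured on a fixed lattice of neuron indices."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    lattice_dist = np.linalg.norm(grid - grid[winner], axis=1)   # distance on the grid
    h = np.exp(-lattice_dist**2 / (2 * sigma**2))                # Gaussian neighborhood
    W += eta * h[:, None] * (x - W)

def ng_update(W, x, eta=0.1, lam=1.0):
    """Neural gas step: neighborhood is the rank of each prototype in data space."""
    d = np.linalg.norm(W - x, axis=1)
    rank = np.argsort(np.argsort(d))                             # 0 = winner, 1 = runner-up, ...
    h = np.exp(-rank / lam)
    W += eta * h[:, None] * (x - W)

# usage (illustrative sizes): a 5x5 SOM lattice for 2-D data
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
W = np.random.rand(25, 2)
som_update(W, grid, x=np.array([0.3, 0.7]))
ng_update(W, x=np.array([0.3, 0.7]))
```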

  8. Clustering recurrent data …

  9. Old models…

  10. Old models • Temporal Kohonen Map: leaky integration over the sequence x1, x2, x3, x4, …, xt, …: d(xt, wi) = |xt − wi| + α·d(xt−1, wi); training: wi → xt • Recurrent SOM: d(xt, wi) = |yt| where yt = (xt − wi) + α·yt−1; training: wi → yt
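A short sketch of the two leaky-integration rules as written on the slide; the function names, the value of α, and the data layout (one sequence element per row of seq) are illustrative assumptions.

```python
import numpy as np

def tkm_distances(seq, W, alpha=0.5):
    """Temporal Kohonen Map: leaky integration of the distances themselves,
    d_i(t) = |x_t - w_i| + alpha * d_i(t-1)."""
    d = np.zeros(len(W))
    for x in seq:
        d = np.linalg.norm(W - x, axis=1) + alpha * d
    return d  # distance of each neuron to the sequence seen so far

def rsom_activations(seq, W, alpha=0.5):
    """Recurrent SOM: leaky integration of the difference vectors,
    y_i(t) = (x_t - w_i) + alpha * y_i(t-1); the distance is |y_i(t)|."""
    y = np.zeros_like(W, dtype=float)
    for x in seq:
        y = (x - W) + alpha * y
    return np.linalg.norm(y, axis=1), y  # training adapts the winner's w_i along its y_t
```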

  11. Our model…

  12. Merge neural gas/SOM • explicit temporal context: a neuron is a pair (w, c); w matches the current entry xt via |xt – w|2, c matches the history xt−1, xt−2, …, x0 via |Ct – c|2 • merge context Ct: content of the winner of the previous step • training: w → xt, c → Ct

  13. Merge neural gas/SOM • neurons (wj, cj) in ℝn × ℝn: explicit context, global recurrence • wj represents the current entry xt • cj represents the context, which equals the winner content of the previous time step • distance: d(xt, wj) = α·|xt − wj| + (1 − α)·|Ct − cj| where Ct = γ·wI(t−1) + (1 − γ)·cI(t−1) and I(t−1) is the winner in step t−1 (merge) • training: wj → xt, cj → Ct
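A hedged sketch of one merge-context step following the distance and update on slides 12-13; neighborhood cooperation is omitted, and the initial context, hyperparameter values, and usage data are assumptions for illustration.

```python
import numpy as np

def merge_som_step(x, W, C, prev_winner, alpha=0.5, gamma=0.5, eta=0.05):
    """One merge-SOM/-NG step (no neighborhood shown): the context C_t merges the
    weight and context of the previous winner, the distance blends entry match and
    context match, and the winner is adapted Hebbian-style toward (x_t, C_t)."""
    if prev_winner is None:
        Ct = np.zeros_like(x)                      # initial context (assumption)
    else:
        Ct = gamma * W[prev_winner] + (1 - gamma) * C[prev_winner]
    d = alpha * np.linalg.norm(W - x, axis=1) + (1 - alpha) * np.linalg.norm(C - Ct, axis=1)
    i = int(np.argmin(d))                          # winner I(t)
    W[i] += eta * (x - W[i])                       # w -> x_t
    C[i] += eta * (Ct - C[i])                      # c -> C_t
    return i

# usage over a stand-in sequence (illustrative sizes)
rng = np.random.default_rng(0)
W, C = rng.random((20, 3)), rng.random((20, 3))
winner = None
for x in rng.random((100, 3)):
    winner = merge_som_step(x, W, C, winner)
```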

  14. Merge neural gas/SOM Example: 42 → 33 → 33 → 34; C1 = (42 + 50)/2 = 46, C2 = (33 + 45)/2 = 39, C3 = (33 + 38)/2 = 35.5 (merge with γ = 0.5: each context averages a weight and a context)

  15. Merge neural gas/SOM • speaker identification, Japanese vowel ‘ae’ [UCI-KDD archive] • 9 speakers, 30 articulations each, 12-dim. cepstrum features per time step • MNG, 150 neurons: 2.7% test error • MNG, 1000 neurons: 1.6% test error • rule-based: 5.9%, HMM: 3.8%

  16. Merge neural gas/SOM Experiment: • classification of donor sites for C. elegans • 5 settings with 10000 training data, 10000 test data; 50 nucleotides (TCGA) embedded in 3 dimensions, 38% donor sites [Sonnenburg, Rätsch et al.] • MNG with posterior labeling • 512 neurons, γ = 0.25, η = 0.075, α: 0.999 → [0.4, 0.7] • 14.06% ± 0.66% training error, 14.26% ± 0.39% test error • sparse representation: 512 · 6 dim

  17. Merge neural gas/SOM Theorem – context representation: Assume • a map with merge context is given (no neighborhood) • a sequence x0, x1, x2, x3, … is given • enough neurons are available Then: • the optimum weight/context pair for xt is w = xt, c = ∑i=0..t−1 γ·(1−γ)^(t−i−1)·xi • Hebbian training converges to this setting as a stable fixed point Compare to TKM: • the optimum weights are w = ∑i=0..t (1−α)^i·xt−i / ∑i=0..t (1−α)^i • but: no fixed point for TKM • MSOM is the correct implementation of TKM
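The theorem's closed form can be checked numerically against the merge recursion under the assumption that every step is won by its optimal pair (w = previous entry, c = previous optimal context); a small sketch with a scalar sequence for simplicity, function names being illustrative:

```python
import numpy as np

def merged_context_recursive(x, gamma=0.5):
    """Context produced by the merge recursion C_t = gamma*w_{I(t-1)} + (1-gamma)*c_{I(t-1)}
    when every step wins with its optimal pair (w = x_{t-1}, c = previous optimal context)."""
    c = 0.0
    for t in range(1, len(x)):
        c = gamma * x[t - 1] + (1 - gamma) * c
    return c

def merged_context_closed_form(x, gamma=0.5):
    """Closed form from the theorem: c = sum_{i=0}^{t-1} gamma*(1-gamma)^(t-i-1) * x_i."""
    t = len(x) - 1
    return sum(gamma * (1 - gamma) ** (t - i - 1) * x[i] for i in range(t))

x = np.random.default_rng(1).random(10)        # stand-in scalar sequence
assert np.isclose(merged_context_recursive(x), merged_context_closed_form(x))
```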

  18. More models…

  19. More models What is the correct temporal context? • as before, a neuron (w, c) matches the current entry xt via |xt – w|2 and the history xt−1, xt−2, …, x0 via |Ct – c|2; training: w → xt, c → Ct • choices of context Ct: RSOM/TKM – the neuron itself, MSOM – winner content, SOMSD – winner index, RecSOM – all activations
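A compact sketch placing the four context choices side by side; the function name, argument names, and the exponential form of the RecSOM activation profile are illustrative assumptions, not code from the original models.

```python
import numpy as np

def context(model, W, C, grid, d_prev, i, gamma=0.5):
    """Temporal context C_t used by the different models; i is the previous winner,
    d_prev the previous distances of all neurons, grid the lattice positions."""
    if model in ("RSOM", "TKM"):
        return None                                  # context kept inside each neuron's own leaky sum
    if model == "MSOM":
        return gamma * W[i] + (1 - gamma) * C[i]     # winner content
    if model == "SOMSD":
        return grid[i]                               # winner index / lattice position
    if model == "RecSOM":
        return np.exp(-d_prev)                       # full activation profile of the map
    raise ValueError(model)
```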

  20. More models * for normalised WTA context

  21. More models Experiment: • Mackey-Glass time series • 100 neurons • different lattices • different contexts • evaluation by the temporal quantization error: average of (mean activity k steps into the past − observed activity k steps into the past)²
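A sketch of this evaluation measure under the assumption that seq is the scalar time series and winners is the sequence of winner indices produced by the trained map; the mean past entry of each neuron is computed from its own winning times, and all names are illustrative.

```python
import numpy as np

def temporal_quantization_error(seq, winners, k):
    """Temporal quantization error k steps into the past: for each neuron, average the
    entries observed k steps before the times it won, then average the squared deviation
    of the observed entries from that mean over all time steps."""
    seq, winners = np.asarray(seq, float), np.asarray(winners)
    errs = []
    for n in np.unique(winners[k:]):
        past = seq[np.flatnonzero(winners[k:] == n)]   # entries k steps before neuron n won
        errs.append(np.sum((past - past.mean()) ** 2))
    return float(np.sum(errs) / (len(seq) - k))
```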

  22. More models [Plot: temporal quantization error from the current step ('now') back into the past for SOM, RSOM, NG, RecSOM, SOMSD, HSOMSD, and MNG]

  23. So what?

  24. So what? • inspection / clustering of high-dimensional events within their temporal context could be possible • strong regularization as for standard SOM / NG • possible training methods for reservoirs • some theory • some examples • no supervision • the representation of context is critical and not clear at all • training is critical and not clear at all
