Unsupervised recurrent networks Barbara Hammer, Institute of Informatics, Clausthal University of Technology
Clausthal-Zellerfeld and the Brocken (location photos)
Prototype-based clustering • data contained in a real-vector space • prototypes characterized by their locations in the data space • clustering induced by the receptive fields, based on the Euclidean metric
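A minimal sketch of the receptive-field assignment described above (not from the slides; array names and the toy data are illustrative): each data point is assigned to the prototype that is closest under the Euclidean metric.

```python
import numpy as np

def receptive_fields(X, W):
    """Assign each data point in X to its closest prototype in W
    (Euclidean metric); returns the winner index per point."""
    # squared Euclidean distances between every point and every prototype
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# toy usage: 100 points in R^2, 4 prototypes
X = np.random.randn(100, 2)
W = np.random.randn(4, 2)
clusters = receptive_fields(X, W)
```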
Vector quantization • initialize the prototypes • repeat: • present a data point • adapt the winner towards the data point
Cost function • vector quantization minimizes the quantization error E = ½ ∑j ∫ χj(x)·|x − wj|² p(x) dx, where χj is the indicator function of the receptive field of wj • online training: stochastic gradient descent on E (only the winner is adapted for each presented data point)
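A minimal sketch of online vector quantization as stochastic gradient descent on this cost (not from the slides; the learning rate, epoch count and initialization are illustrative assumptions):

```python
import numpy as np

def online_vq(X, n_prototypes, eta=0.1, n_epochs=10, seed=0):
    """Online vector quantization: for each presented data point,
    move the closest prototype (the winner) towards the point."""
    rng = np.random.default_rng(seed)
    # initialize prototypes on randomly chosen data points
    W = X[rng.choice(len(X), n_prototypes, replace=False)].copy()
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(((W - x) ** 2).sum(axis=1))
            W[winner] += eta * (x - W[winner])   # stochastic gradient step
    return W
```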
Neighborhood cooperation • Self-Organizing Map: prototypes wj arranged on a regular lattice of indices j = (j1, j2); lattice neighbors of the winner are adapted as well • Neural gas: no fixed lattice, the neighborhood follows the data-optimal topology (prototypes ranked by their distance to the data point)
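An illustrative sketch of the two neighborhood schemes (assumptions: a Gaussian neighborhood function, a square SOM lattice and rank-based neural-gas cooperation as in the standard formulations; parameter values are placeholders):

```python
import numpy as np

def som_step(W, grid, x, eta=0.1, sigma=1.0):
    """One SOM update: all prototypes move towards x, weighted by the
    lattice distance of their index j=(j1,j2) to the winner's index."""
    winner = np.argmin(((W - x) ** 2).sum(axis=1))
    lattice_d2 = ((grid - grid[winner]) ** 2).sum(axis=1)
    h = np.exp(-lattice_d2 / (2 * sigma ** 2))   # neighborhood on the lattice
    return W + eta * h[:, None] * (x - W)

def ng_step(W, x, eta=0.1, lam=1.0):
    """One neural gas update: neighborhood given by the rank of each
    prototype's distance to x (data-optimal topology, no fixed lattice)."""
    d2 = ((W - x) ** 2).sum(axis=1)
    ranks = np.argsort(np.argsort(d2))           # rank 0 for the winner
    h = np.exp(-ranks / lam)
    return W + eta * h[:, None] * (x - W)

# usage: 5x5 SOM lattice for data in R^3
grid = np.array([(j1, j2) for j1 in range(5) for j2 in range(5)], float)
W = np.random.randn(25, 3)
W = som_step(W, grid, np.random.randn(3))
```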
Old models • Temporal Kohonen Map: leaky integration of the distances over the sequence x1, x2, x3, x4, …, xt, …: d(xt, wi) = |xt − wi| + α·d(xt−1, wi); training adapts wi towards xt • Recurrent SOM: leaky integration of the directions: d(xt, wi) = |yt| where yt = (xt − wi) + α·yt−1; training adapts wi along yt
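A minimal sketch of both leaky-integrated distances (the zero initialization of the recursions and the toy data are assumptions, not stated on the slide):

```python
import numpy as np

def tkm_distances(seq, W, alpha=0.5):
    """Temporal Kohonen Map: d_i(t) = |x_t - w_i| + alpha * d_i(t-1)."""
    d = np.zeros(len(W))
    for x in seq:
        d = np.linalg.norm(x - W, axis=1) + alpha * d
    return d   # leaky-integrated distance of every neuron after the sequence

def rsom_distances(seq, W, alpha=0.5):
    """Recurrent SOM: y_i(t) = (x_t - w_i) + alpha * y_i(t-1), d_i = |y_i|."""
    y = np.zeros_like(W)
    for x in seq:
        y = (x - W) + alpha * y
    return np.linalg.norm(y, axis=1), y   # distances and integrated directions

# usage: a short sequence in R^2 with 3 neurons
seq = np.random.randn(10, 2)
W = np.random.randn(3, 2)
print(tkm_distances(seq, W))
```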
Merge neural gas/SOM • explicit temporal context: each neuron stores a pair (w, c) • the current entry xt is compared via |xt − w|², the context Ct via |Ct − c|² • merge context: Ct is the merged content of the previous winner • training: w towards xt, c towards Ct
Merge neural gas/SOM • explicit context, global recurrence: (wj, cj) ∈ ℝn × ℝn • wj represents the current entry xt • cj represents the context, which equals the merged winner content of the last time step • distance: d(xt, wj) = α·|xt − wj| + (1−α)·|Ct − cj|, where Ct = γ·wI(t−1) + (1−γ)·cI(t−1) and I(t−1) is the winner in step t−1 (merge) • training: wj towards xt, cj towards Ct
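A minimal sketch of one merge-context training step under these definitions (assumptions: winner-only Hebbian updates without neighborhood cooperation, placeholder learning rates, and the merge computed from the post-update winner; these choices are illustrative, not from the slides):

```python
import numpy as np

def msom_step(W, C, x, Ct, alpha=0.5, gamma=0.5, eta=0.1):
    """One merge-SOM/NG step: the distance mixes entry and context,
    the winner moves towards (x, Ct), and the next context merges the
    winner's weight and context."""
    d = alpha * np.linalg.norm(x - W, axis=1) \
        + (1 - alpha) * np.linalg.norm(Ct - C, axis=1)
    i = np.argmin(d)                              # winner I(t)
    W[i] += eta * (x - W[i])                      # w_I towards x_t
    C[i] += eta * (Ct - C[i])                     # c_I towards C_t
    C_next = gamma * W[i] + (1 - gamma) * C[i]    # context for step t+1
    return W, C, C_next

# usage: process a sequence, carrying the merge context along
W, C = np.random.randn(20, 2), np.zeros((20, 2))
Ct = np.zeros(2)                                  # assumed initial context
for x in np.random.randn(50, 2):
    W, C, Ct = msom_step(W, C, x, Ct)
```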
Merge neural gas/SOM Example (γ = 0.5), sequence 42, 33, 33, 34: the new context averages the previous winner's weight and context • C1 = (42 + 50)/2 = 46 • C2 = (33 + 45)/2 = 39 • C3 = (33 + 38)/2 = 35.5
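A tiny check of the merge arithmetic with γ = 0.5, using the weight/context pairs from the example (the helper name is illustrative):

```python
def merge_context(w_prev, c_prev, gamma=0.5):
    """C_t = gamma * w_{I(t-1)} + (1 - gamma) * c_{I(t-1)}."""
    return gamma * w_prev + (1 - gamma) * c_prev

print(merge_context(42, 50))   # 46.0
print(merge_context(33, 45))   # 39.0
print(merge_context(33, 38))   # 35.5
```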
Merge neural gas/SOM Experiment: • speaker identification, Japanese vowel 'ae' [UCI KDD archive] • 9 speakers, 30 articulations each, time series of 12-dim. cepstrum vectors • MNG, 150 neurons: 2.7% test error • MNG, 1000 neurons: 1.6% test error • rule-based: 5.9%, HMM: 3.8%
Merge neural gas/SOM Experiment: • classification of donor sites for C. elegans • 5 settings with 10,000 training data and 10,000 test data; 50 nucleotides (TCGA) embedded in 3 dimensions, 38% donor sites [Sonnenburg, Rätsch et al.] • MNG with posterior labeling • 512 neurons, γ = 0.25, η = 0.075, α annealed from 0.999 to [0.4, 0.7] • 14.06% ± 0.66% training error, 14.26% ± 0.39% test error • sparse representation: 512 · 6 dimensions
Merge neural gas/SOM Theorem (context representation): Assume • a map with merge context is given (no neighborhood) • a sequence x0, x1, x2, x3, … is given • enough neurons are available Then: • the optimum weight/context pair for xt is w = xt, c = ∑i=0..t−1 γ·(1−γ)^(t−i−1)·xi • Hebbian training converges to this setting as a stable fixed point Compare to TKM: • the optimum weights are w = ∑i=0..t (1−α)^i·xt−i / ∑i=0..t (1−α)^i • but: this is no fixed point of TKM training • MSOM is the correct implementation of TKM
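A small numerical check of the closed form (assumptions: the merge recursion starts at c = 0 and the map already represents every prefix optimally, i.e. the previous winner carries w = x_{t−1} and the optimal context; function names are illustrative):

```python
import numpy as np

def optimal_context(seq, gamma=0.5):
    """Closed form from the theorem: c = sum_{i=0}^{t-1} gamma*(1-gamma)^(t-i-1) * x_i."""
    t = len(seq)
    return sum(gamma * (1 - gamma) ** (t - i - 1) * seq[i] for i in range(t))

def recursive_context(seq, gamma=0.5):
    """Merge recursion C_t = gamma * w_{I(t-1)} + (1-gamma) * c_{I(t-1)},
    assuming optimal winners: w_{I(t-1)} = x_{t-1}, c_{I(t-1)} = C_{t-1}."""
    c = 0.0
    for x in seq:
        c = gamma * x + (1 - gamma) * c
    return c

seq = np.random.randn(8)
assert np.isclose(optimal_context(seq), recursive_context(seq))
```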
More models What is the correct temporal context? Each neuron stores a pair (w, c); the entry is compared via |xt − w|², the context via |Ct − c|², and training adapts w towards xt and c towards Ct. The models differ in what Ct refers to: • RSOM/TKM: the neuron itself (leaky integration inside each neuron) • MSOM: the winner content (merge) • SOMSD: the winner index • RecSOM: all activations of the map
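An illustrative sketch of how Ct could be computed for each choice, given the previous step's distances and winner (the concrete formulas, e.g. the exponential activation for RecSOM, follow the standard formulations and are assumptions here, not taken from the slide):

```python
import numpy as np

def context(model, W, C, d_prev, winner_prev, grid, gamma=0.5):
    """Temporal context C_t under the different models, given the previous
    step's distances d_prev, winner index winner_prev and lattice positions."""
    if model == "MSOM":     # merged content of the previous winner
        return gamma * W[winner_prev] + (1 - gamma) * C[winner_prev]
    if model == "SOMSD":    # index / lattice position of the previous winner
        return grid[winner_prev]
    if model == "RecSOM":   # all activations of the map (here: exp(-d))
        return np.exp(-d_prev)
    raise ValueError("RSOM/TKM keep the context inside each neuron "
                     "(leaky integration), there is no shared C_t")
```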
More models (comparison table of the context models; * for normalised WTA context)
More models Experiment: • Mackey-Glass time series • 100 neurons • different lattices • different contexts • evaluation by the temporal quantization error: the average squared difference between the mean entry a neuron represents k steps into the past and the entry actually observed k steps into the past
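A minimal sketch of this evaluation measure, assuming the winner sequence of the trained map has already been computed (the per-neuron averaging follows the verbal definition above; boundary handling and the placeholder data are assumptions):

```python
import numpy as np

def temporal_quantization_error(x, winners, k_max):
    """For each lag k, compare the observed value k steps in the past with
    the mean past value of the neuron winning at time t, averaged over t."""
    errors = []
    for k in range(k_max + 1):
        ts = np.arange(k, len(x))                    # times with a valid past
        means = {}                                   # mean past value per neuron
        for i in np.unique(winners[ts]):
            sel = ts[winners[ts] == i]
            means[i] = x[sel - k].mean()
        diffs = np.array([means[winners[t]] - x[t - k] for t in ts])
        errors.append(np.mean(diffs ** 2))
    return np.array(errors)                          # one value per lag k

# usage with a scalar time series and precomputed winner indices
x = np.sin(np.linspace(0, 20, 500))
winners = np.random.randint(0, 100, size=500)        # placeholder winners
print(temporal_quantization_error(x, winners, k_max=5))
```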
More models [figure: temporal quantization error, plotted from now into the past, for SOM, NG, RSOM, RecSOM, SOMSD, HSOMSD and MNG]
So what? • inspection / clustering of high-dimensional events within their temporal context could be possible • strong regularization as for standard SOM / NG • possible training methods for reservoirs • some theory • some examples • no supervision • the representation of context is critical and not clear at all • training is critical and not clear at all