Self-organizing learning for non-standard data
Barbara Hammer, AG LNM, Universität Osnabrück, Germany
and colleagues: Thorsten Bojer, Marc Strickert (Uni Osnabrück), Thomas Villmann (Uni Leipzig), Alessandro Sperduti (Uni Padova), Alessio Micheli (Uni Pisa)
Outline
• Non-standard data
• Self-organization
• GRLVQ
• Recursive SOMs
• Conclusions
Non-standard data…
Standard data… real vectors (x1, x2, …, xn) with the Euclidean metric and the standard dot product; examples: the mushroom dataset, the wine data, and many others.
Standard data … occur in real-life problems.
Standard data… example: classifying fruit as apple or pear. A feature encoding (shape, softness) yields real vectors; the training set is labeled (apple = 1, pear = 0), a classifier is trained, and predictions are made. In this case, we're happy with standard data.
Non-standard data…
• paragraphs of the BGB = text sequences
• laser sensor spectrum = functional data
• smashed apple = high-dimensional, unscaled vector
• forgotten fruits = set
• hair = DNA sequence
• foot print = graph structure
• tree structures
Non-standard data… sets, functions, sequences, tree structures, graph structures, …: an a priori unlimited number of basic constituents, often described by real vectors, with relations which include important information. Options:
• encode as a vector (high-dimensional, loss of information) and use the Euclidean metric/dot product (not appropriate)
• process structures as a whole using a 'better' dot product or metric → GRLVQ
• process the basic constituents separately and integrate the information using recurrence → recursive SOM
Self-organization…
Self-organization… intuitive paradigms: prototypes; competition, winner takes all; neighborhood, cooperation.
• elastic net [Willshaw, von der Malsburg]
• self-organizing map, learning vector quantization [Kohonen]
• neural gas [Martinetz, Berkovich, Schulten]
• generative topographic mapping [Bishop, Svensén]
• adaptive resonance theory [Carpenter, Grossberg]
• …
Self-organization… Learning Vector Quantization (LVQ) [Kohonen]: supervised prototype-based classification given by prototypes (wi, c(wi)) ∈ ℝn × {1,…,m}; winner-takes-all classification x ↦ c(wi) s.t. |x − wi| is minimal. Hebbian learning, given examples (xi, c(xi)): adapt the winner wj by wj ±= η·(xi − wj), attracting it if the labels agree and repelling it otherwise.
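A minimal sketch of one such update step (function and parameter names, and the learning rate, are illustrative assumptions, not from the talk):

```python
import numpy as np

def lvq1_step(prototypes, labels, x, y, eta=0.05):
    """One Hebbian LVQ update: attract the winner toward x if its
    label matches y, repel it otherwise."""
    d = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distances
    j = int(np.argmin(d))                      # winner-takes-all
    sign = 1.0 if labels[j] == y else -1.0
    prototypes[j] += sign * eta * (x - prototypes[j])
```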
Self-organization… Self-Organizing Map (SOM) [Kohonen]: unsupervised topology-preserving mapping given by prototypes wi ∈ ℝn with a neighborhood given by a lattice of indices i = (i1, i2); visualizes data by a topology-representing map f: ℝn → I, x ↦ i where |x − wi| is minimal. Hebbian learning, given examples (xi): adapt all neurons according to their lattice distance from the winner, wj := wj + η·exp(−|j − jw|/σ²)·(xi − wj), where jw is the winner index, i.e. |xi − wjw| is minimal.
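A corresponding sketch of one SOM update, following the neighborhood formula on the slide (array layout and learning rate are assumptions):

```python
import numpy as np

def som_step(weights, lattice, x, eta=0.1, sigma=1.0):
    """weights: (N, n) prototypes; lattice: (N, 2) grid positions.
    All neurons move toward x, weighted by lattice distance to the winner."""
    jw = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))    # winner
    grid_dist = np.linalg.norm(lattice - lattice[jw], axis=1)  # |j - j_w|
    h = np.exp(-grid_dist / sigma ** 2)                        # neighborhood strength
    weights += eta * h[:, None] * (x - weights)
```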
GRLVQ…
GRLVQ… basic situation: structures are processed as a whole using a feature encoding. LVQ heavily relies on the Euclidean metric, which is not appropriate for high-dimensional feature encodings.
GRLVQ… substitute the Euclidean metric by a metric with adaptive relevance terms, dλ(x, w) = Σi λi·(xi − wi)² with λi ≥ 0, Σi λi = 1, and adapt the relevance terms with Hebbian learning + normalize: RLVQ.
GRLVQ… minimize the cost function Σi sgd(μ(xi)) with μ(xi) = (dλ+(xi) − dλ−(xi))/(dλ+(xi) + dλ−(xi)), where dλ+/dλ− is the squared weighted Euclidean distance to the closest correct/incorrect prototype: GRLVQ. Adding neighborhood cooperation and allowing an arbitrary adaptive differentiable similarity measure generalizes the scheme further.
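A sketch of one GRLVQ step (the derivative of the sigmoidal transfer function is absorbed into the learning rates here; all names are illustrative assumptions):

```python
import numpy as np

def grlvq_step(protos, labels, lam, x, y, eta=0.05, eta_lam=0.001):
    d = np.sum(lam * (protos - x) ** 2, axis=1)             # weighted squared distances
    jp = int(np.argmin(np.where(labels == y, d, np.inf)))   # closest correct prototype
    jm = int(np.argmin(np.where(labels != y, d, np.inf)))   # closest incorrect prototype
    dp, dm = d[jp], d[jm]
    xp = 2 * dm / (dp + dm) ** 2                 # d/d(dp) of (dp - dm)/(dp + dm)
    xm = 2 * dp / (dp + dm) ** 2
    diff_p, diff_m = x - protos[jp], x - protos[jm]
    protos[jp] += eta * xp * lam * diff_p        # attract correct winner
    protos[jm] -= eta * xm * lam * diff_m        # repel incorrect winner
    lam -= eta_lam * (xp * diff_p ** 2 - xm * diff_m ** 2)  # relevance update
    np.clip(lam, 0.0, None, out=lam)
    lam /= lam.sum()                             # keep relevances normalized
```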
GRLVQ… noise: 1+N(0.05), 1+N(0.1), 1+N(0.2), 1+N(0.5), U(0.5), U(0.2), N(0.5), N(0.2)
GRLVQ… online detection of faults in piston engines (thanks: PROGNOST)
GRLVQ… detection based on heterogeneous data: time-dependent signals from sensors measuring pressure and oscillation, process characteristics, characteristics of the pV diagram, …
GRLVQ… data:
• ca. 30 time series with 36 entries per series
• ca. 20 values from a time interval
• ca. 40 global features
ca. 15 classes, ca. 100 training patterns; similarity measure: a combination of adaptive and fixed parts.
GRLVQ… splicing for higher eukaryotes: on a copy of the DNA, a donor site and an acceptor site delimit the region to be spliced out; consensus strings donor A64G73G100T100G62A68G84T63 and acceptor C65A100G100 (indices: percentage of occurrence), with a branch site 18–40 bp upstream of the acceptor followed by pyrimidines (T, C); reading frames. Task: decide for a position in a sequence such as ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC whether it is a splice site (yes/no).
GRLVQ… C. elegans (Sonnenburg et al.): only acceptors vs. decoys, 1000/10000 training examples, 10000 test examples, window size 50; decoys are close to acceptors.
• GRLVQ with few (5 per class) prototypes
• LIK similarity accounting for local correlations
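A hypothetical sketch of how such fixed-size sequence windows could be encoded as vectors for GRLVQ (the one-hot scheme is an assumption; the original experiments may have used a different encoding):

```python
import numpy as np

CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def encode_windows(seq, size=50):
    """One-hot encode a DNA string and cut it into overlapping windows
    of the given size, each flattened to a 4*size-dimensional vector."""
    onehot = np.zeros((len(seq), 4))
    for t, base in enumerate(seq):
        onehot[t, CODE[base]] = 1.0
    return np.array([onehot[t:t + size].ravel()
                     for t in range(len(seq) - size + 1)])
```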
GRLVQ… C. elegans: GRLVQ yields sparser solutions, and we can derive dimensionality-independent large-margin generalization bounds for GRLVQ with adaptive metric, comparable to SVM bounds.
Recursive SOMs…
Recursive SOMs… basic situation: sequences should be processed recursively, step by step, using a SOM. For comparison, supervised recurrent networks process sequences via frec([x1, x2, x3, …]) = f(x1, frec([x2, x3, …])). Are there similar ideas for the SOM?
Recursive SOMs… Temporal Kohonen Map (TKM) [Chappell, Taylor] (similar: RSOM [Varsta, Millán, Heikkonen]): lattice i = (i1, i2), standard Hebbian learning for wi. For a sequence x1, x2, x3, x4, … the winner is determined recursively:
i1: d1(i) := |x1 − wi|² minimal
i2: d2(i) := |x2 − wi|² + α·d1(i) minimal
i3: d3(i) := |x3 − wi|² + α·d2(i) minimal
i.e. leaky integration: d3(i) = |x3 − wi|² + α·|x2 − wi|² + α²·|x1 − wi|²; the recurrence runs over the accumulated distances.
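A direct transcription of this leaky recurrence as a sketch (array names assumed):

```python
import numpy as np

def tkm_winners(weights, seq, alpha=0.5):
    """Winner index per step: d_t(i) = |x_t - w_i|^2 + alpha * d_{t-1}(i)."""
    d = np.zeros(len(weights))                # d_0(i) = 0
    winners = []
    for x in seq:
        d = np.sum((weights - x) ** 2, axis=1) + alpha * d
        winners.append(int(np.argmin(d)))
    return winners
```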
Recursive SOMs… SOMSD [Hagenbuchner, Sperduti, Tsoi]: neurons (wi, ci) with context ci ∈ ℝ², referring to positions on the lattice i = (i1, i2); Hebbian learning for wi and ci. For a sequence x1, x2, x3, x4, …:
i1: |x1 − wi|² minimal
i2: |x2 − wi|² + α·|i1 − ci|² minimal
i3: |x3 − wi|² + α·|i2 − ci|² minimal
The recurrence runs over the lattice position of the previous winner.
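One SOMSD step as a sketch (names assumed); the winner's lattice position is fed back as the context target of the next step:

```python
import numpy as np

def somsd_step(weights, contexts, lattice, x, prev_pos, alpha=0.5):
    """weights: (N, n); contexts: (N, 2); prev_pos: lattice position
    of the previous winner."""
    d = (np.sum((weights - x) ** 2, axis=1)
         + alpha * np.sum((contexts - prev_pos) ** 2, axis=1))
    j = int(np.argmin(d))
    return j, lattice[j].astype(float)        # next step's prev_pos
```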
Recursive SOMs… example (figure): a sequence is mapped to a trajectory of winner positions on the lattice, e.g. (1,1), (1,1), (1,3), (2,3), (3,3), (3,1).
Recursive SOMs… Recursive SOM (RecSOM) [Voegtlin]: neurons (wi, ci) with context ci ∈ ℝN; Hebbian learning for wi and ci. For a sequence x1, x2, x3, x4, …:
i1: d1(i) := |x1 − wi|² minimal
i2: d2(i) := |x2 − wi|² + α·|(exp(−d1(1)), …, exp(−d1(N))) − ci|² minimal
i3: d3(i) := |x3 − wi|² + α·|(exp(−d2(1)), …, exp(−d2(N))) − ci|² minimal
The recurrence runs over the exponentially transformed distance vector of the previous step.
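One RecSOM step as a sketch (names assumed):

```python
import numpy as np

def recsom_step(weights, contexts, x, prev_d, alpha=0.5):
    """contexts: (N, N); prev_d: distance vector of the previous step."""
    rep = np.exp(-prev_d)                     # (exp(-d(1)), ..., exp(-d(N)))
    d = (np.sum((weights - x) ** 2, axis=1)
         + alpha * np.sum((contexts - rep) ** 2, axis=1))
    return int(np.argmin(d)), d               # d becomes prev_d next time
```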
Recursive SOMs… general scheme: to process a sequence [x1|[x2, …]], a neuron (w, c) compares the current entry x1 with w via |w − x1|² and a representation rep([x2, …]) of the context, i.e. the rest of the sequence, with c via |rep([x2, …]) − c|²; Hebbian learning for w and c.
Recursive SOMs… choice of the context model: storing the full winner content (w, c) as context would be the most faithful choice, but its dimensionality is too high; MSOM instead merges it into n dimensions: (1 − γ)·w + γ·c.
Recursive SOMs… Merge SOM (MSOM) [Strickert, Hammer]: neurons (wi, ci) with context ci ∈ ℝn; Hebbian learning for wi and ci. For a sequence x1, x2, x3, x4, …:
i1: d1(i) := |x1 − wi|² minimal
i2: d2(i) := |x2 − wi|² + α·|((1−γ)·wi1 + γ·ci1) − ci|² minimal
i3: d3(i) := |x3 − wi|² + α·|((1−γ)·wi2 + γ·ci2) − ci|² minimal
The recurrence runs over the merged content of the previous winner.
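One MSOM step as a sketch (names assumed):

```python
import numpy as np

def msom_step(weights, contexts, x, prev_winner, alpha=0.5, gamma=0.5):
    """contexts: (N, n); the context target is the merged content
    (1 - gamma) * w + gamma * c of the previous winner."""
    merged = (1 - gamma) * weights[prev_winner] + gamma * contexts[prev_winner]
    d = (np.sum((weights - x) ** 2, axis=1)
         + alpha * np.sum((contexts - merged) ** 2, axis=1))
    return int(np.argmin(d))
```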
Recursive SOMs… Mackey-Glass time series: train different context and lattice models with 100 neurons each, and measure the temporal quantization error for the present and past steps.
Recursive SOMs… (figure) temporal quantization error (about 0 to 0.2) of SOM, RSOM, NG, RecSOM, SOMSD, and MNG plotted against the index of past inputs, 0 to 30 (index 0: present).
Recursive SOMs… (figure) resulting maps for SOMSD and MNG.
Recursive SOMs… speaker identification: exemplary patterns of »æ« articulations from different speakers; 12-dim. cepstrum coefficient vectors, 7 to 29 cepstrum vectors per articulation.
Recursive SOMs… 9 speakers, 30 training examples per speaker, further test examples (UCI). Classification results for MNG:
• 150 neurons: training set 0 % error (all neurons used), test set 2.7 %
• 1000 neurons: training set 0 % error (21 idle neurons), test set 1.6 %
Reference error: 5.9 % (supervised, rule-based, Kudo et al. 1999)
Recursive SOMs… extension to tree structures. For comparison, supervised recursive networks process trees via frec(a(t1, t2)) = f(a, frec(t1), frec(t2)).
Recursive SOMs… for a tree a(t, t′) with root label a and subtrees t, t′, a neuron (w, r, r′) compares the label with w via |w − a|² and the subtree representations with the contexts via |rep(t) − r|² and |rep(t′) − r′|²; Hebbian learning for w, r, and r′.
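A sketch of the tree case in SOMSD style, where rep(t) is the winner's lattice position (tree encoding and names are assumptions):

```python
import numpy as np

def tree_rep(node, weights, r1, r2, lattice, alpha=0.5):
    """node: (label, left, right) or None. Returns the winner's lattice
    position, which serves as rep(t) for the parent node."""
    if node is None:
        return np.array([-1.0, -1.0])         # representation of the empty tree
    label, left, right = node
    rep_l = tree_rep(left, weights, r1, r2, lattice, alpha)
    rep_r = tree_rep(right, weights, r1, r2, lattice, alpha)
    d = (np.sum((weights - label) ** 2, axis=1)
         + alpha * np.sum((r1 - rep_l) ** 2, axis=1)
         + alpha * np.sum((r2 - rep_r) ** 2, axis=1))
    return lattice[int(np.argmin(d))].astype(float)
```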
Recursive SOMs… properties and principled limitations:
• fractal encoding of sequences, but not a fixed point of Hebbian training (only RSOM); mixing of representation and learning (TKM/RSOM)
• not efficient (RecSOM)
• efficient; can represent every finite set; can represent tree automata; Hebbian learning has a Markovian flavor (SOMSD)
• fractal encoding is a stable fixed point of Hebbian learning, but commutativity of children for a tree (MSOM)
Conclusions…
Conclusions…
• GRLVQ = LVQ + adaptive metric + cost function
  • with diagonal metric: suited for high-dimensional and heterogeneous data, includes margin optimization
  • with a general similarity measure: more general structures possible
• Recursive SOMs: different models depending on the choice of context
  • nice models for time-series processing (MSOM, SOMSD)
  • one model for tree structures (SOMSD)
  • some theory available
Advanced
Advanced… the SRNG cost function can be formulated for arbitrary adaptive differentiable distance measures, e.g.
• alternative exponents
• shift invariance
• local correlations for time series
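As one concrete illustration of the "alternative exponents" case (a sketch, not the measure used in the experiments):

```python
import numpy as np

def weighted_quartic_dist(x, w, lam):
    """Relevance-weighted distance with exponent 4 instead of 2; it stays
    differentiable in w and lam, so it fits the SRNG cost function."""
    return np.sum(lam * (x - w) ** 4)
```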
Experiments… IPsplice (results figure).
GRLVQ… learning theory: F := the binary function class given by GRLVQ with p prototypes; (xi, yi), i = 1..m, training data, i.i.d. w.r.t. P; the algorithm selects some f ∈ F. Goal: EP(f) := P(y ≠ f(x)) should be small. Learning theory: EP(f) ≤ |{i | yi ≠ f(xi)}|/m + structural risk, i.e. the empirical error, optimized during training, plus the amount of surprise possible in the function class.
GRLVQ… it holds for GRLVQ:
EP(f) ≤ |{i | yi ≠ f(xi)}|/m + Σ{i: 0 < Mf(xi) < ρ} (1 − Mf(xi)/ρ)/m + O(p²(B³ + (ln 1/δ)^1/2)/(ρ·m^1/2))
whereby Mf(xi) := −dλ+(xi) + dλ−(xi) is the margin (= security of the classification). The three terms are the training error, the correctly classified points with small margin, and a structural term depending on m = number of data, p = number of prototypes, B = support of the data, δ = confidence, and ρ = margin.
→ a dimension-independent large-margin bound! GRLVQ optimizes the margin.
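The margin of a single point, exactly as defined above, is easy to compute from the weighted distances (a sketch; names assumed):

```python
import numpy as np

def margin(protos, labels, lam, x, y):
    """M_f(x) = -d_lam^+(x) + d_lam^-(x); positive iff x is classified correctly."""
    d = np.sum(lam * (protos - x) ** 2, axis=1)
    return d[labels != y].min() - d[labels == y].min()
```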