
Self-organizing learning for non-standard data



  1. Self-organizing learning for non-standard data Barbara Hammer, AG LNM, Universität Osnabrück, Germany and colleagues: Thorsten Bojer, Marc Strickert (Uni Osnabrück), Thomas Villmann (Uni Leipzig), Alessandro Sperduti (Uni Padova), Alessio Micheli (Uni Pisa) Ruhr-Universität Bochum

  2. Outline • Non-standard data • Self-organization • GRLVQ • Recursive SOMs • Conclusions Ruhr-Universität Bochum

  3. Non-standard data… Ruhr-Universität Bochum

  4. Standard data… real vectors (x1,x2,…,xn) with Euclidean metric / standard dot product, e.g. the mushroom dataset, the wine data … and many others Ruhr-Universität Bochum

  5. Standard data … … occur in real life problems … ouch!!! rrrrrringgggg!!! Ruhr-Universität Bochum

  6. Standard data… feature encoding (shape, softness), labeling of the training set (0/1), training, prediction: apple/pear. In this case, we’re happy with standard data Ruhr-Universität Bochum

  7. Non-standard data… paragraphs of BGB = text sequences; laser sensor spectrum = functional data; smashed apple = high-dimensional, unscaled vector; forgotten fruits = set; hair = DNA sequence; foot print = graph structure; tree structures Ruhr-Universität Bochum

  8. Non-standard data… sets, functions, sequences, tree structures, graph structures, …: an a priori unlimited number of basic constituents, often described by real vectors, plus relations which include important information. Options: encode (→ high-dimensional vector / loss of information) and use the Euclidean metric / dot product (→ not appropriate); process structures as a whole using a ‚better‘ dot product or metric → GRLVQ; process the basic constituents separately and integrate the information using recurrence → recursive SOM Ruhr-Universität Bochum

  9. Self-organization… Ruhr-Universität Bochum

  10. Self-organization… intuitive paradigms: prototypes; competition, winner takes all; neighborhood, cooperation elastic net [Willshaw, von der Malsburg] self-organizing map, learning vector quantization [Kohonen] neural gas [Martinetz, Berkovich, Schulten] generative topographic mapping [Bishop, Svensen] adaptive resonance theory [Carpenter, Grossberg] … Ruhr-Universität Bochum

  11. Self-organization... Learning Vector Quantization (LVQ) [Kohonen]: supervised prototype-based classification given by prototypes (wi, c(wi)) ∈ ℝn x {1,…,m}; winner-takes-all classification: x ↦ c(wi) s.t. |x-wi| is minimal; Hebbian learning, given examples (xi, c(xi)): adapt the winner wj by wj ±= η·(xi-wj) (+ if the labels agree, - otherwise) Ruhr-Universität Bochum
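A minimal NumPy sketch of the LVQ1 step described on this slide; the prototypes, labels, and learning rate η in the toy usage are illustrative values, not taken from the talk:

```python
import numpy as np

def lvq1_step(prototypes, labels, x, y, eta=0.05):
    """One LVQ1 step: move the winner towards x if labels agree, away otherwise."""
    dists = np.sum((prototypes - x) ** 2, axis=1)   # squared Euclidean distances
    j = np.argmin(dists)                            # winner-takes-all
    sign = 1.0 if labels[j] == y else -1.0          # + if labels agree, - otherwise
    prototypes[j] += sign * eta * (x - prototypes[j])
    return j

# toy usage with two illustrative prototypes of classes 0 and 1
W = np.array([[0.0, 0.0], [1.0, 1.0]])
c = np.array([0, 1])
print(lvq1_step(W, c, x=np.array([0.2, 0.1]), y=0))
```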

  12. Self-organization… Self-Organizing Map (SOM) [Kohonen]: unsupervised topology-preserving mapping given by prototypes wi ∈ ℝn with a neighborhood structure given by a lattice of indices i = (i1,i2); visualizes data by a topology-representing map f: ℝn → I, x ↦ i where |x-wi| is minimal; Hebbian learning, given examples (xi): adapt all neurons according to their lattice distance from the winner jw (the neuron with |xi - wi| minimal), wj := wj + η·exp(-|j-jw|/σ2)·(xi-wj) Ruhr-Universität Bochum
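A corresponding sketch of the SOM update; here a Gaussian of the squared lattice distance is used as the neighborhood function (the slide writes exp(-|j-jw|/σ2)), and the lattice size, learning rate, and σ are illustrative assumptions:

```python
import numpy as np

def som_step(weights, grid, x, eta=0.1, sigma=1.0):
    """One SOM step: every neuron moves towards x, scaled by a Gaussian of its
    lattice distance to the winner."""
    winner = np.argmin(np.sum((weights - x) ** 2, axis=1))   # closest prototype in data space
    lattice_d2 = np.sum((grid - grid[winner]) ** 2, axis=1)  # squared lattice distances to the winner
    h = np.exp(-lattice_d2 / sigma ** 2)                     # neighborhood function
    weights += eta * h[:, None] * (x - weights)
    return winner

# a 3x3 lattice with random initial prototypes (illustrative values)
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
W = np.random.rand(9, 2)
print(som_step(W, grid, x=np.array([0.5, 0.5])))
```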

  13. GRLVQ … Ruhr-Universität Bochum

  14. GRLVQ... basic situation: structures are processed as a whole using a feature encoding; LVQ heavily relies on the Euclidean metric, which is not appropriate for high-dimensional feature encodings Ruhr-Universität Bochum

  15. GRLVQ… substitute the Euclidean metric by a metric with adaptive relevance terms λi, i.e. a weighted squared distance Σi λi (xi-wi)2 (RLVQ); adapt the relevance terms with Hebbian learning and normalize them Ruhr-Universität Bochum
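A rough sketch of the relevance-weighted distance and a relevance update along the lines of RLVQ; the exact update rule on the slide is a formula image, so the step size ε and the clip-and-normalize step below are assumptions:

```python
import numpy as np

def weighted_dist2(x, w, lam):
    """Squared Euclidean distance with adaptive relevance terms lambda_i."""
    return np.sum(lam * (x - w) ** 2)

def rlvq_relevance_step(lam, x, w_winner, correct, eps=0.01):
    """Hebbian-style relevance update (a sketch along the lines of RLVQ):
    shrink relevances of dimensions where the winner already fits a correctly
    classified point, grow them otherwise, then clip and normalize."""
    delta = (x - w_winner) ** 2
    lam = lam - eps * delta if correct else lam + eps * delta
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()

lam = np.ones(3) / 3                        # start with uniform relevances (illustrative)
lam = rlvq_relevance_step(lam, np.array([1.0, 0.0, 2.0]), np.zeros(3), correct=True)
print(lam, weighted_dist2(np.array([1.0, 0.0, 2.0]), np.zeros(3), lam))
```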

  16. GRLVQ… with dλ+/dλ- the squared weighted Euclidean distance to the closest correct/incorrect prototype, minimize a cost function that compares dλ+ and dλ-, i.e. adapt prototypes and relevance terms by stochastic gradient descent → GRLVQ; + neighborhood cooperation, arbitrary adaptive differentiable similarity measure Ruhr-Universität Bochum
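The cost function itself is only an image on the slide; a commonly used GRLVQ-style cost compares dλ+ and dλ- via the ratio (dλ+-dλ-)/(dλ++dλ-), squashed by a sigmoid. A minimal sketch under that assumption, with illustrative toy data:

```python
import numpy as np

def grlvq_cost(X, Y, prototypes, proto_labels, lam):
    """Sketch of a GRLVQ-style cost: sum of sgd((d+ - d-)/(d+ + d-)) over the data,
    where d+/d- are the weighted squared distances to the closest prototype
    carrying the correct label / an incorrect label."""
    cost = 0.0
    for x, y in zip(X, Y):
        d = np.sum(lam * (prototypes - x) ** 2, axis=1)
        d_plus = d[proto_labels == y].min()       # closest correct prototype
        d_minus = d[proto_labels != y].min()      # closest incorrect prototype
        mu = (d_plus - d_minus) / (d_plus + d_minus)
        cost += 1.0 / (1.0 + np.exp(-mu))         # logistic (sigmoidal) transfer
    return cost

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([0, 1])
W = np.array([[0.1, 0.0], [0.9, 1.0]])
c = np.array([0, 1])
print(grlvq_cost(X, Y, W, c, lam=np.array([0.5, 0.5])))
```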

  17. GRLVQ… noise: 1+N(0.05), 1+N(0.1),1+N(0.2),1+N(0.5),U(0.5),U(0.2),N(0.5),N(0.2) Ruhr-Universität Bochum

  18. GRLVQ… • online detection of faults in piston engines (thanks: PROGNOST) Ruhr-Universität Bochum

  19. GRLVQ… detection based on heterogeneous data: time-dependent signals from sensors measuring pressure and oscillation, process characteristics, characteristics of the pV diagram, … Ruhr-Universität Bochum

  20. GRLVQ… data: • ca. 30 time series with 36 entries per series • ca. 20 values from a time interval • ca. 40 global features; ca. 15 classes, ca. 100 training patterns; similarity measure: a combination of adaptive and fixed terms (formula on the slide) Ruhr-Universität Bochum
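The similarity measure itself is shown only as a formula image; one plausible reading of the adaptive/fixed labels is a weighted sum of per-block distances over the heterogeneous feature groups, with adaptive block weights. A hypothetical sketch (block layout and weights are assumptions, not the measure used in the talk):

```python
import numpy as np

def block_similarity(x, w, block_slices, betas):
    """Weighted sum of squared distances over feature blocks; the weights beta
    play the role of adaptive relevance terms for whole blocks (illustrative)."""
    return sum(b * np.sum((x[s] - w[s]) ** 2) for b, s in zip(betas, block_slices))

# hypothetical layout: 30*36 time-series values, 20 interval values, 40 global features
blocks = [slice(0, 1080), slice(1080, 1100), slice(1100, 1140)]
betas = np.ones(3) / 3
x, w = np.random.rand(1140), np.random.rand(1140)
print(block_similarity(x, w, blocks, betas))
```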

  21. GRLVQ… • splicing for higher eukaryotes: copy of DNA; donor site (consensus A64G73G100T100G62A68G84T63), branch site, a stretch of 18-40 bp pyrimidines (i.e. T,C), acceptor site (consensus C65A100G100); reading frames • classify windows such as ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC as splice site: no / yes Ruhr-Universität Bochum

  22. GRLVQ… • C. elegans (Sonnenburg et al.): only acceptors/decoys, 1000/10000 training examples, 10000 test examples, window size 50, decoys are close to acceptors • GRLVQ with few (5 per class) prototypes • LIK similarity, local correlations Ruhr-Universität Bochum

  23. GRLVQ… C. elegans: GRLVQ yields sparser solutions; we can derive dimensionality-independent large-margin generalization bounds for GRLVQ with adaptive metric, comparable to SVM bounds Ruhr-Universität Bochum

  24. Recursive SOMs … Ruhr-Universität Bochum

  25. Recursive SOMs… basic situation: sequences should be processed recursively, step by step, using a SOM; for comparison, supervised recurrent networks for sequences compute frec([x1,x2,x3,...]) = f(x1, frec([x2,x3,...])) … similar ideas for SOM? Ruhr-Universität Bochum

  26. Recursive SOMs… Temporal Kohonen Map [Chappell, Taylor] (similar: RSOM – Varsta/Millán/Heikkonen): standard Hebbian learning for wi; for a sequence x1,x2,x3,x4,… the winner is determined by leaky integration of the distances (recurrence): i1: d1(i) := |x1 - wi|2 minimal; i2: d2(i) := |x2 - wi|2 + α·d1(i) minimal; i3: d3(i) := |x3 - wi|2 + α·d2(i) minimal, i.e. d3(i) = |x3 - wi|2 + α·|x2 - wi|2 + α2·|x1 - wi|2 Ruhr-Universität Bochum
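A small sketch of this Temporal Kohonen Map recursion dt(i) = |xt - wi|2 + α·dt-1(i); the prototypes, α, and the zero initialisation of d0 are illustrative assumptions:

```python
import numpy as np

def tkm_winners(xs, W, alpha=0.5):
    """Winner sequence of a Temporal Kohonen Map: leaky integration of
    squared distances, d_t(i) = |x_t - w_i|^2 + alpha * d_{t-1}(i)."""
    d = np.zeros(len(W))                      # d_0 := 0 (an assumption)
    winners = []
    for x in xs:
        d = np.sum((W - x) ** 2, axis=1) + alpha * d
        winners.append(int(np.argmin(d)))
    return winners

W = np.random.rand(9, 2)                      # 9 prototypes (illustrative)
seq = [np.array([0.1, 0.2]), np.array([0.4, 0.4]), np.array([0.9, 0.8])]
print(tkm_winners(seq, W))
```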

  27. Recursive SOMs… SOMSD [Hagenbuchner, Sperduti, Tsoi]: neurons (wi, ci) with context ci in ℝ2 (lattice coordinates); Hebbian learning for wi, ci; for a sequence x1,x2,x3,x4,… (recurrence): i1: |x1 - wi|2 minimal; i2: |x2 - wi|2 + α·|i1 - ci|2 minimal; i3: |x3 - wi|2 + α·|i2 - ci|2 minimal Ruhr-Universität Bochum
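The analogous SOMSD sketch: the context of each neuron is compared with the lattice position of the previous winner; the all-zero start context and all values are assumptions for illustration:

```python
import numpy as np

def somsd_winners(xs, W, C, grid, alpha=0.5):
    """Winner sequence of SOMSD: each neuron carries a weight w_i and a
    context c_i in lattice coordinates; the context is compared with the
    lattice position of the previous winner."""
    prev = np.zeros(grid.shape[1])            # start-of-sequence context (a convention)
    winners = []
    for x in xs:
        d = np.sum((W - x) ** 2, axis=1) + alpha * np.sum((C - prev) ** 2, axis=1)
        i = int(np.argmin(d))
        winners.append(i)
        prev = grid[i]                        # lattice position of the current winner
    return winners

grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
W, C = np.random.rand(9, 2), np.random.rand(9, 2)
print(somsd_winners([np.array([0.2, 0.1]), np.array([0.7, 0.6])], W, C, grid))
```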

  28. Recursive SOMs… Example: a sequence 4 2 3 3 3 3 … is mapped to winner positions on the lattice, e.g. (1,1) (1,1) (1,3) (2,3) (3,3) (3,1) Ruhr-Universität Bochum

  29. Recursive SOMs… Recursive SOM [Voegtlin]: neurons (wi, ci) with context ci in ℝN; Hebbian learning for wi and ci; for a sequence x1,x2,x3,x4,… (recurrence): i1: d1(i) := |x1 - wi|2 minimal; i2: d2(i) := |x2 - wi|2 + α·|(exp(-d1(1)),…,exp(-d1(N))) - ci|2 minimal; i3: d3(i) := |x3 - wi|2 + α·|(exp(-d2(1)),…,exp(-d2(N))) - ci|2 minimal Ruhr-Universität Bochum
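A sketch of this RecSOM recursion with the activation-vector context (exp(-dt-1(1)),…,exp(-dt-1(N))); the initialisation d0 = 0 and all values are assumptions for illustration:

```python
import numpy as np

def recsom_winners(xs, W, C, alpha=0.5):
    """Winner sequence of Voegtlin's recursive SOM: the context of each neuron
    lives in R^N and is compared with the exponentially transformed distance
    vector of the previous step."""
    d_prev = np.zeros(len(W))                 # d_0 := 0 (an assumption)
    winners = []
    for x in xs:
        rep = np.exp(-d_prev)                 # activation-vector context of the previous step
        d = np.sum((W - x) ** 2, axis=1) + alpha * np.sum((C - rep) ** 2, axis=1)
        winners.append(int(np.argmin(d)))
        d_prev = d
    return winners

W, C = np.random.rand(5, 2), np.random.rand(5, 5)   # context dimension = number of neurons
print(recsom_winners([np.array([0.3, 0.3]), np.array([0.8, 0.1])], W, C))
```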

  30. Recursive SOMs… general framework: a neuron (w,c) processes a sequence [x1|[x2,…]] by comparing its weight with the current entry, |w - x1|2, and its context with the representation of the already processed tail, |rep([x2,…]) - c|2; Hebbian learning for w and c Ruhr-Universität Bochum

  31. Recursive SOMs… context model for MSOM: the winner content; taking the full winner content (w,c) as context makes the dimensionality too high, therefore merge: (1-γ)∙w + γ∙c Ruhr-Universität Bochum

  32. Recursive SOMs… MSOM [Strickert/Hammer]: neurons (wi, ci) with context ci in ℝn; Hebbian learning for wi and ci; for a sequence x1,x2,x3,x4,… (recurrence): i1: d1(i) := |x1 - wi|2 minimal; i2: d2(i) := |x2 - wi|2 + α·|((1-γ)∙wi1 + γ∙ci1) - ci|2 minimal; i3: d3(i) := |x3 - wi|2 + α·|((1-γ)∙wi2 + γ∙ci2) - ci|2 minimal Ruhr-Universität Bochum
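And the MSOM sketch, where the context is the merged winner content (1-γ)·w + γ·c of the previous step; the start context and all parameter values are illustrative assumptions:

```python
import numpy as np

def msom_winners(xs, W, C, alpha=0.5, gamma=0.5):
    """Winner sequence of MSOM: the context is the merged winner content
    (1-gamma)*w_winner + gamma*c_winner of the previous time step."""
    context = np.zeros(W.shape[1])            # start-of-sequence context (a convention)
    winners = []
    for x in xs:
        d = np.sum((W - x) ** 2, axis=1) + alpha * np.sum((C - context) ** 2, axis=1)
        i = int(np.argmin(d))
        winners.append(i)
        context = (1 - gamma) * W[i] + gamma * C[i]   # merge step
    return winners

W, C = np.random.rand(9, 2), np.random.rand(9, 2)
print(msom_winners([np.array([0.2, 0.5]), np.array([0.6, 0.4])], W, C))
```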

  33. Recursive SOMs… Mackey-Glass time series: train different context and lattice models with 100 neurons each; measure the quantization error for the present and past steps Ruhr-Universität Bochum

  34. Recursive SOMs… [Plot: quantization error (0 to .2) versus index of past inputs (0 to 30; index 0: present) for SOM, RSOM, NG, RecSOM, SOMSD, and MNG] Ruhr-Universität Bochum

  35. Recursive SOMs… [Figure: maps obtained by SOMSD and MNG] Ruhr-Universität Bochum

  36. Recursive SOMs — speaker identification: exemplary patterns of »Æ« articulations from different speakers; 12-dim. cepstrum coefficient vectors, 7 … 29 cepstrum vectors per articulation Ruhr-Universität Bochum

  37. Recursive SOMs… 9 speakers, 30 training examples per speaker, some test examples (UCI); classification results for MNG: with 150 neurons, 0 % error on the training set (all neurons used) and 2.7 % on the test set; with 1000 neurons, 0 % on the training set (21 idle neurons) and 1.6 % on the test set; reference error: 5.9 % (supervised, rule-based, Kudo et al. 1999) Ruhr-Universität Bochum

  38. Recursive SOMs… extension to tree structures; for comparison, supervised recursive networks for trees compute frec(a(t1,t2)) = f(a, frec(t1), frec(t2)) Ruhr-Universität Bochum
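A SOMSD-style sketch of the recursive processing of labelled binary trees described here and on the next slide; the lattice-position representation, the handling of empty children, and all values are assumptions for illustration:

```python
import numpy as np

def tree_rep(tree, W, R1, R2, grid, alpha=0.5):
    """Bottom-up processing of a labelled binary tree (label, left, right):
    a neuron (w, r, r') compares its weight with the node label and its two
    context vectors with the representations of the subtrees (here, as in
    SOMSD, the lattice positions of the subtree winners)."""
    if tree is None:
        return np.zeros(grid.shape[1])        # representation of the empty tree (a convention)
    label, left, right = tree
    rep_l = tree_rep(left, W, R1, R2, grid, alpha)
    rep_r = tree_rep(right, W, R1, R2, grid, alpha)
    d = (np.sum((W - label) ** 2, axis=1)
         + alpha * np.sum((R1 - rep_l) ** 2, axis=1)
         + alpha * np.sum((R2 - rep_r) ** 2, axis=1))
    return grid[int(np.argmin(d))]            # lattice position of the winner = representation

grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
W, R1, R2 = np.random.rand(9, 1), np.random.rand(9, 2), np.random.rand(9, 2)
tree = (np.array([1.0]), (np.array([0.5]), None, None), None)   # a(b, empty), illustrative labels
print(tree_rep(tree, W, R1, R2, grid))
```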

  39. Recursive SOMs… a tree a(t,t’) with root label a and subtrees t, t’ is processed by a neuron (w,r,r’) via the distance terms |w - a|2, |rep(t) - r|2, and |rep(t’) - r’|2, where rep(t), rep(t’) are the representations of the subtrees; Hebbian learning for w, r, r’ Ruhr-Universität Bochum

  40. Recursive SOMs… properties of the different context models: • efficient, can represent every finite set, can represent tree automata, Hebbian learning has a Markovian flavor, but principled limitations • fractal encoding of sequences, but not a fixed point of Hebbian training (only RSOM), mixing of representation and learning • fractal encoding is a stable fixed point of Hebbian learning, but commutativity of children for a tree; not efficient Ruhr-Universität Bochum

  41. Conclusions … Ruhr-Universität Bochum

  42. Conclusions… • GRLVQ = LVQ + adaptive metric + cost function → with diagonal metric: for high-dimensional and heterogeneous data, includes margin optimization → with general similarity measure: more general structures possible • Recursive SOMs: different models depending on the choice of context → nice models for time series processing (MSOM, SOMSD) → one model for tree structures (SOMSD) → some theory available Ruhr-Universität Bochum

  43. Ruhr-Universität Bochum

  44. End of slide show, click to exit.. Ruhr-Universität Bochum

  45. Ruhr-Universität Bochum

  46. Advanced Ruhr-Universität Bochum

  47. Advanced The SRNG cost function can be formulated for arbitrary adaptive differentiable distance measures e.g. … alternative exponents … shift invariance … local correlations for time series Ruhr-Universität Bochum

  48. Experiments IPsplice: Ruhr-Universität Bochum

  49. GRLVQ… F := binary function class given by the GRLVQ algorithm with p prototypes; (xi,yi), i=1..m: training data, i.i.d. w.r.t. P; the algorithm selects f in F; goal: EP(f) := P(y≠f(x)) should be small. Learning theory: EP(f) ≤ |{ i | yi≠f(xi)}|/m (empirical error, optimized during training) + structural risk (amount of surprise possible in the function class) Ruhr-Universität Bochum

  50. GRLVQ… It holds for GRLVQ: EP(f) ≤ |{ i | yi ≠ f(xi)}|/m + Σ0<Mf(xi)<ρ (1-Mf(xi)/ρ)/m + O(p2(B3+(ln 1/δ)1/2)/(ρm1/2)), whereby Mf(xi) := -dλ+(xi) + dλ-(xi) is the margin (= security of classification) • dimension-independent large-margin bound! GRLVQ optimizes the margin: training error + correct points with small margin + bound depending on m = number of data, p = number of prototypes, B = support, δ = confidence, ρ = margin Ruhr-Universität Bochum
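For reference, the bound above in typeset form (as reconstructed from the slide):

```latex
E_P(f) \;\le\; \frac{|\{\, i \mid y_i \neq f(x_i) \,\}|}{m}
  \;+\; \frac{1}{m} \sum_{0 < M_f(x_i) < \rho} \Bigl(1 - \tfrac{M_f(x_i)}{\rho}\Bigr)
  \;+\; \mathcal{O}\!\left( \frac{p^2 \bigl(B^3 + \sqrt{\ln 1/\delta}\,\bigr)}{\rho \sqrt{m}} \right),
\qquad M_f(x_i) \;:=\; -\,d_\lambda^{+}(x_i) + d_\lambda^{-}(x_i).
```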
