
Relevance learning



  1. Relevance learning. Barbara Hammer, AG LNM, Universität Osnabrück, Germany. Coworkers: Thorsten Bojer, Marc Strickert, Thomas Villmann.

  2. Outline • LVQ • Relevance learning • Advanced • Experiments • Generalization ability • Conclusions

  3. LVQ …

  4. LVQ. Learning Vector Quantization (LVQ) [Kohonen]: supervised prototype-based classification, given by prototypes (w_i, c(w_i)) ∈ ℝ^n × {1,…,m}; winner-takes-all classification, x ↦ c(w_i) s.t. |x − w_i| is minimal; Hebbian learning, given examples (x_i, c(x_i)): adapt the winner w_j by w_j ±= η·(x_i − w_j), attracting it toward x_i if the classes agree and repelling it otherwise (see the sketch below).
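
To make the winner-takes-all rule and the Hebbian update concrete, here is a minimal NumPy sketch of one LVQ1 step; the function name, array layout, and learning rate are illustrative, not taken from the talk.

```python
import numpy as np

def lvq1_step(x, label, prototypes, proto_labels, eta=0.05):
    """One LVQ1 step: find the winner-takes-all prototype and apply the
    Hebbian update w_j +-= eta * (x - w_j) from the slide."""
    dists = np.sum((prototypes - x) ** 2, axis=1)      # squared Euclidean distances
    j = int(np.argmin(dists))                          # winner: minimal |x - w_j|
    sign = 1.0 if proto_labels[j] == label else -1.0   # attract if classes agree
    prototypes[j] += sign * eta * (x - prototypes[j])
    return j
```

Classification itself needs no update: a point x is simply assigned proto_labels[np.argmin(dists)], the winner-takes-all rule above.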

  5. LVQ. Example: distinguish apples and pears, represented by (Øx/Øy, hardness) ∈ ℝ². [scatter plot with axes x1, x2]

  6. LVQ cannot solve interesting problems: [figure]

  7. LVQ … crucially depends on the Euclidean metric and is thus inappropriate for high-dimensional, heterogeneous, complex data; … is not stable for overlapping classes; … is very sensitive to initialization.

  8. Relevance learning …

  9. Relevance learning (RLVQ). Substitute the Euclidean metric by a metric with adaptive relevance terms, d_λ(x,w) = Σ_l λ_l (x_l − w_l)² with λ_l ≥ 0 and Σ_l λ_l = 1; adapt the relevance terms with Hebbian learning (see the sketch below).
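
The slide's formulas are figures in the original. As a sketch of the idea, the weighted metric and a Hebbian relevance update in the style of RLVQ might look as follows; the step size, clipping, and renormalization details are assumptions, not the talk's exact rule.

```python
import numpy as np

def weighted_sq_dist(x, w, lam):
    # weighted Euclidean metric: d_lambda(x, w) = sum_l lam_l * (x_l - w_l)^2
    return np.sum(lam * (x - w) ** 2)

def rlvq_relevance_step(x, label, winner, winner_label, lam, alpha=0.01):
    """Hebbian relevance update (RLVQ-style sketch): relevances shrink on
    dimensions where a correctly classifying winner deviates from x and
    grow on errors; afterwards lam is clipped to >= 0 and renormalized."""
    sign = -1.0 if winner_label == label else 1.0
    lam = np.clip(lam + sign * alpha * np.abs(x - winner), 0.0, None)
    return lam / lam.sum()
```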

  10. Advanced …

  11. I: stability…

  12. Advanced. LVQ is a stochastic gradient descent on a cost function of the form Σ_i f(d⁺(x_i), d⁻(x_i)), where d⁺(x_i)/d⁻(x_i) denote the squared distance of x_i to the closest correct/incorrect prototype. RLVQ uses the weighted Euclidean distance in f ☹

  13. Advanced. GLVQ is a stochastic gradient descent on Σ_i sgd((d⁺(x_i) − d⁻(x_i))/(d⁺(x_i) + d⁻(x_i))), where sgd denotes the logistic function [Sato/Yamada]. GRLVQ uses the weighted Euclidean distance in f ☺

  14. Advanced. Minimize Σ_i sgd((d_λ⁺(x_i) − d_λ⁻(x_i))/(d_λ⁺(x_i) + d_λ⁻(x_i))), where d_λ± is the squared weighted Euclidean distance to the closest correct/incorrect prototype, i.e. adapt prototypes and relevance terms by stochastic gradient descent (see the sketch below).
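
One possible single-sample gradient step for this cost function, with the chain-rule factors written out; this is a sketch derived from the formula above, not the authors' reference implementation, and the learning rates are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grlvq_step(x, wp, wm, lam, eta_w=0.05, eta_l=0.005):
    """One GRLVQ step on sgd((d+ - d-)/(d+ + d-)): wp/wm are the closest
    correct/incorrect prototypes, d+/d- the squared weighted distances."""
    dp = np.sum(lam * (x - wp) ** 2)                 # d_lambda^+
    dm = np.sum(lam * (x - wm) ** 2)                 # d_lambda^-
    mu = (dp - dm) / (dp + dm)
    s = sigmoid(mu)
    phi_prime = s * (1.0 - s)                        # derivative of sgd
    xi_p = 2.0 * dm / (dp + dm) ** 2                 # d mu / d d+
    xi_m = 2.0 * dp / (dp + dm) ** 2                 # -(d mu / d d-)
    grad_lam = phi_prime * (xi_p * (x - wp) ** 2 - xi_m * (x - wm) ** 2)
    wp = wp + eta_w * phi_prime * xi_p * 2.0 * lam * (x - wp)   # attract correct
    wm = wm - eta_w * phi_prime * xi_m * 2.0 * lam * (x - wm)   # repel incorrect
    lam = np.clip(lam - eta_l * grad_lam, 0.0, None)            # relevance update
    return wp, wm, lam / lam.sum()
```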

  15. Advanced. [figure: results on data with appended noise dimensions] Noise: 1+N(0.05), 1+N(0.1), 1+N(0.2), 1+N(0.5), U(0.5), U(0.2), N(0.5), N(0.2).

  16. II: initialization…

  17. Advanced. A stochastic gradient descent is highly sensitive to initialization for multimodal functions; d_λ± = squared weighted Euclidean distance to the closest correct/incorrect prototype. Remedy: a global (but unsupervised) update, Neural Gas (NG) [Martinetz].

  18. Advanced. NG is a stochastic gradient descent on the cost function Σ_i Σ_j h_γ(rk(w_j, x_i)) · |x_i − w_j|² (suitably normalized), where rk(w_j, x_i) is the rank of w_j when all prototypes are ordered by their distance to x_i and h_γ(t) = exp(−t/γ). NG + GRLVQ: SRNG minimizes the cost function in which the single closest correct prototype is replaced by a rank-weighted sum over all correct prototypes, i.e. all correct prototypes are adapted according to their rank (see the sketch below).
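
The rank-weighted adaptation of all correct prototypes can be sketched as follows; h_γ(t) = exp(−t/γ) is the usual NG neighborhood function, and the normalization is an assumption.

```python
import numpy as np

def srng_weights(x, correct_protos, lam, gamma=1.0):
    """SRNG-style neighborhood weights: instead of updating only the
    closest correct prototype, every prototype of the correct class gets
    a share h_gamma(rank) = exp(-rank/gamma) according to its distance rank."""
    d = np.array([np.sum(lam * (x - w) ** 2) for w in correct_protos])
    ranks = np.argsort(np.argsort(d))   # rank 0 = closest correct prototype
    h = np.exp(-ranks / gamma)
    return h / h.sum()
```

Each correct prototype then receives the GRLVQ-style update scaled by its weight; as γ → 0 the weight concentrates on the closest correct prototype and plain GRLVQ is recovered.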

  19. Advanced. [figure]

  20. III: greater flexibility…

  21. Advanced. The SRNG cost function can be formulated for arbitrary adaptive differentiable distance measures, e.g. … alternative exponents, … shift invariance, … local correlations for time series (see the sketch below).
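
Two illustrative instances of such distance measures; the concrete forms below (a quartic-exponent variant and a windowed comparison for local correlations) are guesses in the spirit of the slide, not the definitions used in the talk.

```python
import numpy as np

def weighted_quartic_dist(x, w, lam):
    # alternative exponents: e.g. sum_l lam_l * (x_l - w_l)^4 instead of squares
    return np.sum(lam * (x - w) ** 4)

def local_corr_dist(x, w, lam, k=3):
    # local correlations for time series: compare windows of k consecutive
    # entries, with one adaptive relevance term lam[t] per window position
    n = len(x) - k + 1
    return sum(lam[t] * np.sum((x[t:t + k] - w[t:t + k]) ** 2) for t in range(n))
```

Any such measure can be plugged into the SRNG cost as long as it is differentiable in the prototypes and in the relevance terms.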

  22. Experiments …

  23. I: time series prediction …

  24. Experiments. [figure: a real-valued time series is discretized; task: predict what comes next (?)]

  25. Experiments. [figure: results]

  26. II: fault detection …

  27. Experiments • online detection of faults in piston engines (thanks: PROGNOST)

  28. Experiments. Detection based on heterogeneous data: time-dependent signals from sensors measuring pressure and oscillation, process characteristics, characteristics of the pV diagram, … [figure: sensors]

  29. Experiments. Data: ca. 30 time series with 36 entries per series; ca. 20 values from a time interval; ca. 40 global features; ca. 15 classes, ca. 100 training patterns. Similarity measure: a combination with adaptive and fixed parts [formula]; see the sketch below.
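
A sketch of how such a heterogeneous similarity measure could be composed; the block structure and which parts carry adaptive relevance terms are assumptions based on the slide's "adaptive"/"fixed" labels, not the talk's exact formula.

```python
import numpy as np

def heterogeneous_dist(x_series, w_series, x_feat, w_feat, lam_series, lam_feat):
    """Composite distance for heterogeneous data: one relevance weight per
    time series block (e.g. 30 series of 36 entries each) plus one per
    global feature; illustrative only."""
    # per-series squared distances, each series weighted as one block
    d_series = np.sum(lam_series * np.sum((x_series - w_series) ** 2, axis=1))
    # per-feature weighted squared distances for the global characteristics
    d_feat = np.sum(lam_feat * (x_feat - w_feat) ** 2)
    return d_series + d_feat
```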

  30. III: splice site recognition…

  31. Experiments • splicing for higher eukaryotes: copy of DNA with branch site, donor site (consensus A64G73G100T100G62A68G84T63), acceptor site (consensus C65A100G100), 18-40 bp of pyrimidines (i.e. T, C), reading frames • task: classify positions in a window such as ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC as splice site (yes) or not (no)

  32. Experiments • IPsplice (UCI): human DNA, 3 classes, ca. 3200 points, window size 60, old data set • C. elegans (Sonnenburg et al.): only acceptors/decoys, 1000/10000 training examples, 10000 test examples, window size 50, decoys are close to acceptors • SRNG with few (8 resp. 5 per class) prototypes • LIK similarity (local correlations)

  33. Experiments. IPsplice: [results]

  34. Experiments. C. elegans: [results] … GRLVQ yields sparser solutions ☺

  35. Generalization ability …

  36. Generalization ability. F := binary function class given by GRLVQ with p prototypes; (x_i, y_i), i = 1..m: training data, i.i.d. w.r.t. P; the algorithm selects some f in F. Goal: E_P(f) := P(y ≠ f(x)) should be small.

  37. Generalization ability. Goal: E_P(f) := P(y ≠ f(x)) should be small. Learning theory: E_P(f) ≤ |{i | y_i ≠ f(x_i)}|/m + structural risk, i.e. the empirical training error plus the amount of surprise possible in the function class. For GRLVQ it holds: E_P(f) ≤ |{i | y_i ≠ f(x_i)}|/m + Σ_{0<M_f(x_i)<ρ} (1 − M_f(x_i)/ρ)/m + O(p²(B³ + (ln 1/δ)^{1/2})/(ρ·m^{1/2})), whereby M_f(x_i) := −d_λ⁺(x_i) + d_λ⁻(x_i) is the margin (= security of the classification), m = number of data, p = number of prototypes, B = support, δ = confidence, ρ = margin parameter. The three terms are the training error, the contribution of correct points with small margin (optimized during training), and a term independent of the input dimension: a dimension-independent large-margin bound! GRLVQ optimizes the margin (see the sketch below).
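
To make the empirical part of the bound concrete, the following helper evaluates its first two terms from the margins M_f(x_i) of the training points; the function and its interface are illustrative.

```python
import numpy as np

def empirical_margin_terms(margins, rho):
    """Training error |{i : y_i != f(x_i)}|/m plus the penalty
    sum_{0 < M < rho} (1 - M/rho)/m for correct points with small margin,
    where M = -d_lambda^+ + d_lambda^- (positive iff correctly classified)."""
    m = len(margins)
    margins = np.asarray(margins, dtype=float)
    err = np.sum(margins <= 0) / m                 # misclassified points
    small = (margins > 0) & (margins < rho)        # correct but small margin
    penalty = np.sum(1.0 - margins[small] / rho) / m
    return err + penalty
```

Increasing ρ shrinks the structural term of the bound but lets more correct points fall below the margin, so training effectively trades these terms off against each other.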

  38. Conclusions …

  39. Conclusions • SRNG as generalization of LVQ with: adaptive diagonal metric → much more flexible (RLVQ); cost function → stable (GRLVQ); neighborhood cooperation → global (SRNG) • competitive to state-of-the-art algorithms in various applications, thereby fast and simple • generalization bounds; training includes structural risk minimization


  42. LVQ. Alternative encoding of the pears/apples: (stem length, number of seeds, color, price, worm). Problem: LVQ is based on the Euclidean metric.

  43. Experiments • lysimeter in St. Arnold (thanks: H. Lange)

  44. LVQ provides excellent generalization: [Crammer, Gilad-Bachrach, Navot, Tishby]: dimensionality-independent large-margin generalization bound for LVQ

  45. Generalization ability • margin: M_f(x) := −d_λ⁺(x) + d_λ⁻(x) • empirical loss with margin: EL_m(f,x) := |{i | y_i ≠ f(x_i)}|/m + Σ_{M_f(x_i)<ρ} (1 − M_f(x_i)/ρ)/m • then (using tricks from [Bartlett/Mendelson]): E_P(f) ≤ EL_m(f,x) + term(Gaussian complexity of F) • f = fixed Boolean formula over two-prototype classifiers • two-prototype classifier = linear classifier • ⇒ bound of order p²(B³ + (ln 1/δ)^{1/2})/(ρ·m^{1/2}) • SRNG optimizes the margin!
