Relevance learning
Barbara Hammer, AG LNM, Universität Osnabrück, Germany
and coworkers: Thorsten Bojer, Marc Strickert, Thomas Villmann
RU Groningen
Outline
• LVQ
• Relevance learning
• Advanced
• Experiments
• Generalization ability
• Conclusions
LVQ …
LVQ
Learning Vector Quantization (LVQ) [Kohonen]: supervised prototype-based classification
• given by prototypes (wi, c(wi)) ∈ ℝn x {1,…,m}
• winner-takes-all classification: x ↦ c(wj) s.t. |x − wj| is minimal
• Hebbian learning, given examples (xi, c(xi)): adapt the winner wj by wj ± η·(xi − wj), + if the classes of xi and wj agree, − otherwise
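The winner-takes-all rule and the Hebbian update above can be sketched in a few lines of Python (a minimal illustration; the function name, learning rate, and toy data are assumptions, not from the slides):

```python
import numpy as np

def lvq1_step(x, label, prototypes, proto_labels, eta=0.1):
    """One LVQ1 step: move the winner towards x if its class matches,
    away from x otherwise (Hebbian learning)."""
    dists = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distances
    j = int(np.argmin(dists))                      # winner-takes-all
    sign = 1.0 if proto_labels[j] == label else -1.0
    prototypes[j] += sign * eta * (x - prototypes[j])
    return j

# toy example: two prototypes, one per class
protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1])
winner = lvq1_step(np.array([0.2, 0.1]), 0, protos, labels)
```

Repeating this step over randomly drawn training examples yields the basic LVQ training loop.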
LVQ
distinguish apples and pears: represented by (Øx/Øy, hardness) ∈ ℝ2
(figure: data in the (x1, x2)-plane)
LVQ cannot solve interesting problems:
LVQ
… crucially depends on the Euclidean metric and is thus inappropriate for high-dimensional, heterogeneous, complex data
… is not stable for overlapping classes
… is very sensitive to initialization
Relevance learning …
Relevance learning (RLVQ)
substitute the Euclidean metric by a metric with adaptive relevance terms:
dλ(x,w) = Ʃi λi(xi − wi)2 with λi ≥ 0, Ʃi λi = 1
adapt the relevance terms with Hebbian learning: for the winner, decrease λi on a correct classification and increase it on an error, proportionally to |xi − wi|; then clip at 0 and renormalize
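A minimal sketch of the weighted distance and the Hebbian relevance update (clip at zero, renormalize); the function names and the learning rate are illustrative assumptions, not from the slides:

```python
import numpy as np

def weighted_dist2(x, w, lam):
    """Squared weighted Euclidean distance: sum_i lam_i * (x_i - w_i)^2."""
    return np.sum(lam * (x - w) ** 2)

def rlvq_relevance_step(x, label, prototypes, proto_labels, lam, alpha=0.05):
    """Hebbian relevance update: decrease lam_i where dimension i drew the
    winner close on a correct classification, increase it on an error,
    then clip at zero and renormalize so that the lam_i sum to one."""
    dists = [weighted_dist2(x, w, lam) for w in prototypes]
    j = int(np.argmin(dists))              # winner under the weighted metric
    diff = np.abs(x - prototypes[j])
    if proto_labels[j] == label:
        lam = lam - alpha * diff           # correct: these dimensions may relax
    else:
        lam = lam + alpha * diff           # error: these dimensions matter more
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()
```

Dimensions that repeatedly help correct classifications keep high relevance; noisy dimensions are driven towards zero.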
Advanced …
I: stability…
Advanced
LVQ is a stochastic gradient descent on the cost function
Ʃi f(d+(xi) − d−(xi))
where d+(xi) / d−(xi) = squared distance to the closest correct / incorrect prototype.
RLVQ uses the weighted Euclidean distance in f.
Advanced
GLVQ is a stochastic gradient descent on
Ʃi f((d+(xi) − d−(xi)) / (d+(xi) + d−(xi)))  [Sato/Yamada]
GRLVQ uses the weighted Euclidean distance in f.
Advanced
GRLVQ: minimize
Ʃi f((dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi)))
where dλ+(xi) / dλ−(xi) = squared weighted Euclidean distance to the closest correct / incorrect prototype, i.e. adapt prototypes and relevance terms by a stochastic gradient descent.
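The cost term being minimized can be evaluated per example as follows (a sketch; f is taken as the identity here, whereas a sigmoidal f may be used in practice, and the function name is an assumption):

```python
import numpy as np

def grlvq_cost_term(x, label, prototypes, proto_labels, lam):
    """Cost contribution mu(x) = (d+ - d-) / (d+ + d-), with d+/d- the
    squared weighted distance to the closest correct/incorrect prototype.
    mu(x) is negative exactly when x is classified correctly."""
    d = np.sum(lam * (prototypes - x) ** 2, axis=1)
    correct = proto_labels == label
    d_plus = d[correct].min()      # closest prototype of the correct class
    d_minus = d[~correct].min()    # closest prototype of a wrong class
    return (d_plus - d_minus) / (d_plus + d_minus)
```

Because the term is bounded in (−1, 1), the summed cost stays finite even for overlapping classes, which is the source of the improved stability.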
Advanced
noise: 1+N(0.05), 1+N(0.1), 1+N(0.2), 1+N(0.5), U(0.5), U(0.2), N(0.5), N(0.2)
II: initialization…
Advanced
a stochastic gradient descent is highly sensitive to initialization for multimodal cost functions
global (but unsupervised) update: Neural Gas (NG) [Martinetz]
Advanced
NG is a stochastic gradient descent on a cost function in which every prototype is adapted according to its rank.
NG + GRLVQ: SRNG minimizes the combined cost function, i.e. all correct prototypes are adapted according to their rank.
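The neighborhood idea, all correct prototypes attracted with a strength decaying in their rank while the closest incorrect prototype is repelled, can be sketched as follows (a simplified illustration, not the exact gradient of the SRNG cost function; names and rates are assumptions):

```python
import numpy as np

def srng_like_step(x, label, prototypes, proto_labels, eta=0.1, sigma=1.0):
    """Neighborhood update: every correct prototype is attracted towards x
    with strength exp(-rank/sigma), the rank taken among the correct
    prototypes; the closest incorrect prototype is repelled."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    correct = np.where(proto_labels == label)[0]
    wrong = np.where(proto_labels != label)[0]
    ranks = np.argsort(np.argsort(d[correct]))   # rank 0 = closest correct prototype
    for idx, r in zip(correct, ranks):
        prototypes[idx] += eta * np.exp(-r / sigma) * (x - prototypes[idx])
    k = wrong[np.argmin(d[wrong])]
    prototypes[k] -= eta * (x - prototypes[k])
    return prototypes
```

Since every correct prototype receives some update, badly initialized prototypes are dragged into useful positions instead of being left behind.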
III: greater flexibility…
Advanced
The SRNG cost function can be formulated for arbitrary adaptive differentiable distance measures, e.g.
… alternative exponents
… shift invariance
… local correlations for time series
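As one example of such a measure, a weighted distance with an alternative exponent (a sketch; the quartic exponent is just one illustrative choice):

```python
import numpy as np

def adaptive_distance(x, w, lam, p=4):
    """Adaptive, differentiable distance with an alternative exponent:
    d(x, w) = sum_i lam_i * |x_i - w_i|^p.
    p = 2 recovers the weighted squared Euclidean distance of GRLVQ;
    larger even p emphasizes large deviations in single dimensions."""
    return np.sum(lam * np.abs(x - w) ** p)
```

Any such measure can be plugged into the SRNG cost function as long as it remains differentiable with respect to w and λ.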
Experiments …
I: time series prediction …
Experiments
(figure: discretization of the time series)
II: fault detection …
Experiments
• online detection of faults for piston engines (thanks: PROGNOST)
Experiments
detection based on heterogeneous data:
• time-dependent signals from sensors measuring pressure and oscillation
• process characteristics
• characteristics of the pV diagram
• …
Experiments
data:
• ca. 30 time series with 36 entries per series
• ca. 20 values from a time interval
• ca. 40 global features
ca. 15 classes, ca. 100 training patterns
similarity measure: combination of adaptive and fixed parts
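One way such a heterogeneous similarity measure can be organized is as a weighted sum of per-block distances, with one weight per data block (this structure and the function name are illustrative assumptions, not the exact measure used in the experiments):

```python
import numpy as np

def blockwise_similarity(x_blocks, w_blocks, betas):
    """Similarity measure for heterogeneous data: a weighted sum of
    per-block squared distances; the betas act as relevance factors
    per block (some adaptive, some fixed)."""
    return sum(b * np.sum((x - w) ** 2)
               for b, x, w in zip(betas, x_blocks, w_blocks))
```

The adaptive betas can then be trained with the same relevance-learning rule as the per-dimension λ.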
III: splice site recognition…
Experiments
• splicing for higher eukaryotes: copy of DNA
• donor site: consensus A64G73 G100T100G62A68G84T63
• branch site, then 18–40 bp of pyrimidines (i.e. T, C), then acceptor site: consensus C65A100G100
• reading frames
• task: classify windows of a sequence such as ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC as splice site: yes / no
Experiments
• IPsplice (UCI): human DNA, 3 classes, ca. 3200 points, window size 60, old
• C.elegans (Sonnenburg et al.): only acceptors/decoys, 1000/10000 training examples, 10000 test examples, window size 50, decoys are close to acceptors
• SRNG with few (8 resp. 5 per class) prototypes
• LIK-similarity: local correlations
Experiments
IPsplice: (results figure)
Experiments
C.elegans: GRLVQ yields sparser solutions
Generalization ability …
Generalization ability
• F := binary function class given by GRLVQ with p prototypes
• (xi,yi), i=1..m: training data, i.i.d. w.r.t. P
• the algorithm outputs some f in F
Goal: EP(f) := P(y≠f(x)) should be small
Generalization ability
Goal: EP(f) := P(y≠f(x)) should be small
Learning theory: EP(f) ≤ |{ i | yi≠f(xi)}|/m + structural risk
It holds for GRLVQ:
EP(f) ≤ |{ i | yi ≠ f(xi)}|/m  (training error)
  + Ʃ0<Mf(xi)<ρ (1 − Mf(xi)/ρ)/m  (correct points with small margin)
  + O(p2(B3 + (ln 1/δ)1/2)/(ρm1/2))  (amount of surprise possible in the function class)
whereby Mf(xi) := −dλ+(xi) + dλ−(xi) is the margin (= security of the classification),
m = number of data, p = number of prototypes, B = support, δ = confidence, ρ = margin parameter
• dimension-independent large-margin bound!
• GRLVQ optimizes the margin: the empirical error terms are minimized during training
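The margin and the empirical part of the bound can be computed directly (a sketch; the helper names are assumptions):

```python
import numpy as np

def margin(x, label, prototypes, proto_labels, lam):
    """Margin M_f(x) = d_lambda^- - d_lambda^+ : positive exactly when x is
    classified correctly; larger values mean a more secure classification."""
    d = np.sum(lam * (prototypes - x) ** 2, axis=1)
    correct = proto_labels == label
    return d[~correct].min() - d[correct].min()

def empirical_margin_loss(margins, rho):
    """Empirical part of the bound: training error plus the penalty for
    correctly classified points with margin below rho:
    |{i : M_i <= 0}|/m + sum_{0 < M_i < rho} (1 - M_i/rho) / m."""
    m = len(margins)
    err = sum(1 for M in margins if M <= 0)
    small = sum(1 - M / rho for M in margins if 0 < M < rho)
    return (err + small) / m
```

Driving the margins above ρ during training shrinks exactly the terms of the bound that depend on the data.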
Conclusions …
Conclusions
• SRNG as a generalization of LVQ with
  – adaptive diagonal metric → much more flexible (RLVQ)
  – cost function → stable (GRLVQ)
  – neighborhood cooperation → global (SRNG)
• competitive to state-of-the-art algorithms in various applications, thereby fast and simple
• generalization bounds; training includes structural risk minimization
LVQ
alternative encoding of the pears/apples: (stalk length, number of pips, colour, price, worm)
Problem: LVQ is based on the Euclidean metric.
Experiments
• Lysimeter in St. Arnold (thanks: H. Lange)
LVQ provides excellent generalization:
[Crammer, Gilad-Bachrach, Navot, Tishby]: dimensionality-independent large-margin generalization bound for LVQ
Generalization ability
• margin: Mf(x) := dλ−(x) − dλ+(x)
• empirical loss with margin: ELm(f,x) := |{ i | yi ≠ f(xi)}|/m + ƩMf(xi)<ρ (1 − Mf(xi)/ρ)/m
• then (using tricks from [Bartlett/Mendelson]): EP(f) ≤ ELm(f,x) + term(Gaussian complexity of F)
• f = fixed Boolean formula over two-prototype classifiers; each two-prototype classifier is a linear classifier
• bound of order p2(B3 + (ln 1/δ)1/2) / (ρm1/2)
• SRNG optimizes the margin!