Relevance learning
Barbara Hammer, AG LNM, Universität Osnabrück, Germany
and coworkers: Thorsten Bojer, Marc Strickert, Thomas Villmann
RU Groningen
Outline
• LVQ
• Relevance learning
• Advanced
• Experiments
• Generalization ability
• Conclusions
LVQ …
LVQ
Learning Vector Quantization (LVQ) [Kohonen]: supervised prototype-based classification
• given by prototypes (wi, c(wi)) ∈ ℝn x {1,…,m}
• winner-takes-all classification: x ↦ c(wj) s.t. |x − wj| is minimal
• Hebbian learning, given examples (xi, c(xi)): adapt the winner wj by wj ± η·(xi − wj), + if the classes of xi and wj agree, − otherwise
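The winner-takes-all rule and the Hebbian update above can be sketched in a few lines of Python (a minimal illustration; the function name, learning rate, and toy data are assumptions, not from the slides):

```python
import numpy as np

def lvq1_step(x, label, prototypes, proto_labels, eta=0.1):
    """One LVQ1 step: move the winner towards x if its class matches,
    away from x otherwise (Hebbian learning)."""
    dists = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distances
    j = int(np.argmin(dists))                      # winner-takes-all
    sign = 1.0 if proto_labels[j] == label else -1.0
    prototypes[j] += sign * eta * (x - prototypes[j])
    return j

# toy example: two prototypes, one per class
protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1])
winner = lvq1_step(np.array([0.2, 0.1]), 0, protos, labels)
```

Repeating this step over randomly drawn training examples yields the basic LVQ training loop.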
LVQ
distinguish apples and pears: represented by (Øx/Øy, hardness) ∈ ℝ2
(figure: data in the (x1, x2)-plane)
LVQ cannot solve interesting problems:
LVQ
… crucially depends on the Euclidean metric and is thus inappropriate for high-dimensional, heterogeneous, complex data
… is not stable for overlapping classes
… is very sensitive to initialization
Relevance learning …
Relevance learning (RLVQ)
substitute the Euclidean metric by a metric with adaptive relevance terms:
dλ(x,w) = Ʃi λi(xi − wi)2 with λi ≥ 0, Ʃi λi = 1
adapt the relevance terms with Hebbian learning: for the winner, decrease λi on a correct classification and increase it on an error, proportionally to |xi − wi|; then clip at 0 and renormalize
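A minimal sketch of the weighted distance and the Hebbian relevance update (clip at zero, renormalize); the function names and the learning rate are illustrative assumptions, not from the slides:

```python
import numpy as np

def weighted_dist2(x, w, lam):
    """Squared weighted Euclidean distance: sum_i lam_i * (x_i - w_i)^2."""
    return np.sum(lam * (x - w) ** 2)

def rlvq_relevance_step(x, label, prototypes, proto_labels, lam, alpha=0.05):
    """Hebbian relevance update: decrease lam_i where dimension i drew the
    winner close on a correct classification, increase it on an error,
    then clip at zero and renormalize so that the lam_i sum to one."""
    dists = [weighted_dist2(x, w, lam) for w in prototypes]
    j = int(np.argmin(dists))              # winner under the weighted metric
    diff = np.abs(x - prototypes[j])
    if proto_labels[j] == label:
        lam = lam - alpha * diff           # correct: these dimensions may relax
    else:
        lam = lam + alpha * diff           # error: these dimensions matter more
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()
```

Dimensions that repeatedly help correct classifications keep high relevance; noisy dimensions are driven towards zero.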
Advanced …
I: stability…
Advanced
LVQ is a stochastic gradient descent on the cost function
Ʃi f(d+(xi) − d−(xi))
where d+(xi) / d−(xi) = squared distance to the closest correct / incorrect prototype.
RLVQ uses the weighted Euclidean distance in f.
Advanced
GLVQ is a stochastic gradient descent on
Ʃi f((d+(xi) − d−(xi)) / (d+(xi) + d−(xi)))  [Sato/Yamada]
GRLVQ uses the weighted Euclidean distance in f.
Advanced
GRLVQ: minimize
Ʃi f((dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi)))
where dλ+(xi) / dλ−(xi) = squared weighted Euclidean distance to the closest correct / incorrect prototype, i.e. adapt prototypes and relevance terms by a stochastic gradient descent.
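The cost term being minimized can be evaluated per example as follows (a sketch; f is taken as the identity here, whereas a sigmoidal f may be used in practice, and the function name is an assumption):

```python
import numpy as np

def grlvq_cost_term(x, label, prototypes, proto_labels, lam):
    """Cost contribution mu(x) = (d+ - d-) / (d+ + d-), with d+/d- the
    squared weighted distance to the closest correct/incorrect prototype.
    mu(x) is negative exactly when x is classified correctly."""
    d = np.sum(lam * (prototypes - x) ** 2, axis=1)
    correct = proto_labels == label
    d_plus = d[correct].min()      # closest prototype of the correct class
    d_minus = d[~correct].min()    # closest prototype of a wrong class
    return (d_plus - d_minus) / (d_plus + d_minus)
```

Because the term is bounded in (−1, 1), the summed cost stays finite even for overlapping classes, which is the source of the improved stability.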
Advanced
noise: 1+N(0.05), 1+N(0.1), 1+N(0.2), 1+N(0.5), U(0.5), U(0.2), N(0.5), N(0.2)
II: initialization…
Advanced
a stochastic gradient descent is highly sensitive to initialization for multimodal cost functions
global (but unsupervised) update: Neural Gas (NG) [Martinetz]
Advanced
NG is a stochastic gradient descent on a cost function in which every prototype is adapted according to its rank.
NG + GRLVQ: SRNG minimizes the combined cost function, i.e. all correct prototypes are adapted according to their rank.
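The neighborhood idea, all correct prototypes attracted with a strength decaying in their rank while the closest incorrect prototype is repelled, can be sketched as follows (a simplified illustration, not the exact gradient of the SRNG cost function; names and rates are assumptions):

```python
import numpy as np

def srng_like_step(x, label, prototypes, proto_labels, eta=0.1, sigma=1.0):
    """Neighborhood update: every correct prototype is attracted towards x
    with strength exp(-rank/sigma), the rank taken among the correct
    prototypes; the closest incorrect prototype is repelled."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    correct = np.where(proto_labels == label)[0]
    wrong = np.where(proto_labels != label)[0]
    ranks = np.argsort(np.argsort(d[correct]))   # rank 0 = closest correct prototype
    for idx, r in zip(correct, ranks):
        prototypes[idx] += eta * np.exp(-r / sigma) * (x - prototypes[idx])
    k = wrong[np.argmin(d[wrong])]
    prototypes[k] -= eta * (x - prototypes[k])
    return prototypes
```

Since every correct prototype receives some update, badly initialized prototypes are dragged into useful positions instead of being left behind.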
III: greater flexibility…
Advanced
The SRNG cost function can be formulated for arbitrary adaptive differentiable distance measures, e.g.
… alternative exponents
… shift invariance
… local correlations for time series
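As one example of such a measure, a weighted distance with an alternative exponent (a sketch; the quartic exponent is just one illustrative choice):

```python
import numpy as np

def adaptive_distance(x, w, lam, p=4):
    """Adaptive, differentiable distance with an alternative exponent:
    d(x, w) = sum_i lam_i * |x_i - w_i|^p.
    p = 2 recovers the weighted squared Euclidean distance of GRLVQ;
    larger even p emphasizes large deviations in single dimensions."""
    return np.sum(lam * np.abs(x - w) ** p)
```

Any such measure can be plugged into the SRNG cost function as long as it remains differentiable with respect to w and λ.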
Experiments …
I: time series prediction …
Experiments
(figure: discretization of the time series)
II: fault detection …
Experiments
• online detection of faults for piston engines (thanks: PROGNOST)
Experiments
detection based on heterogeneous data:
• time-dependent signals from sensors measuring pressure and oscillation
• process characteristics
• characteristics of the pV diagram
• …
Experiments
data:
• ca. 30 time series with 36 entries per series
• ca. 20 values from a time interval
• ca. 40 global features
ca. 15 classes, ca. 100 training patterns
similarity measure: combination of adaptive and fixed parts
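One way such a heterogeneous similarity measure can be organized is as a weighted sum of per-block distances, with one weight per data block (this structure and the function name are illustrative assumptions, not the exact measure used in the experiments):

```python
import numpy as np

def blockwise_similarity(x_blocks, w_blocks, betas):
    """Similarity measure for heterogeneous data: a weighted sum of
    per-block squared distances; the betas act as relevance factors
    per block (some adaptive, some fixed)."""
    return sum(b * np.sum((x - w) ** 2)
               for b, x, w in zip(betas, x_blocks, w_blocks))
```

The adaptive betas can then be trained with the same relevance-learning rule as the per-dimension λ.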
III: splice site recognition…
Experiments
• splicing for higher eukaryotes: copy of DNA
• donor site: consensus A64G73 G100T100G62A68G84T63
• branch site, then 18–40 bp of pyrimidines (i.e. T, C), then acceptor site: consensus C65A100G100
• reading frames
• task: classify windows of a sequence such as ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC as splice site: yes / no
Experiments
• IPsplice (UCI): human DNA, 3 classes, ca. 3200 points, window size 60, old
• C.elegans (Sonnenburg et al.): only acceptors/decoys, 1000/10000 training examples, 10000 test examples, window size 50, decoys are close to acceptors
• SRNG with few (8 resp. 5 per class) prototypes
• LIK-similarity: local correlations
Experiments
IPsplice: (results figure)
Experiments
C.elegans: GRLVQ yields sparser solutions
Generalization ability …
Generalization ability
• F := binary function class given by GRLVQ with p prototypes
• (xi,yi), i=1..m: training data, i.i.d. w.r.t. P
• the algorithm outputs some f in F
Goal: EP(f) := P(y≠f(x)) should be small
Generalization ability
Goal: EP(f) := P(y≠f(x)) should be small
Learning theory: EP(f) ≤ |{ i | yi≠f(xi)}|/m + structural risk
It holds for GRLVQ:
EP(f) ≤ |{ i | yi ≠ f(xi)}|/m  (training error)
  + Ʃ0<Mf(xi)<ρ (1 − Mf(xi)/ρ)/m  (correct points with small margin)
  + O(p2(B3 + (ln 1/δ)1/2)/(ρm1/2))  (amount of surprise possible in the function class)
whereby Mf(xi) := −dλ+(xi) + dλ−(xi) is the margin (= security of the classification),
m = number of data, p = number of prototypes, B = support, δ = confidence, ρ = margin parameter
• dimension-independent large-margin bound!
• GRLVQ optimizes the margin: the empirical error terms are minimized during training
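The margin and the empirical part of the bound can be computed directly (a sketch; the helper names are assumptions):

```python
import numpy as np

def margin(x, label, prototypes, proto_labels, lam):
    """Margin M_f(x) = d_lambda^- - d_lambda^+ : positive exactly when x is
    classified correctly; larger values mean a more secure classification."""
    d = np.sum(lam * (prototypes - x) ** 2, axis=1)
    correct = proto_labels == label
    return d[~correct].min() - d[correct].min()

def empirical_margin_loss(margins, rho):
    """Empirical part of the bound: training error plus the penalty for
    correctly classified points with margin below rho:
    |{i : M_i <= 0}|/m + sum_{0 < M_i < rho} (1 - M_i/rho) / m."""
    m = len(margins)
    err = sum(1 for M in margins if M <= 0)
    small = sum(1 - M / rho for M in margins if 0 < M < rho)
    return (err + small) / m
```

Driving the margins above ρ during training shrinks exactly the terms of the bound that depend on the data.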
Conclusions …
Conclusions
• SRNG as a generalization of LVQ with
  – adaptive diagonal metric → much more flexible (RLVQ)
  – cost function → stable (GRLVQ)
  – neighborhood cooperation → global (SRNG)
• competitive to state-of-the-art algorithms in various applications, thereby fast and simple
• generalization bounds; training includes structural risk minimization
LVQ
alternative encoding of the pears/apples: (stalk length, number of pips, colour, price, worm)
Problem: LVQ is based on the Euclidean metric.
Experiments
• Lysimeter in St. Arnold (thanks: H. Lange)
LVQ provides excellent generalization:
[Crammer, Gilad-Bachrach, Navot, Tishby]: dimensionality-independent large-margin generalization bound for LVQ
Generalization ability
• margin: Mf(x) := dλ−(x) − dλ+(x)
• empirical loss with margin: ELm(f,x) := |{ i | yi ≠ f(xi)}|/m + ƩMf(xi)<ρ (1 − Mf(xi)/ρ)/m
• then (using tricks from [Bartlett/Mendelson]): EP(f) ≤ ELm(f,x) + term(Gaussian complexity of F)
• f = fixed Boolean formula over two-prototype classifiers; each two-prototype classifier is a linear classifier
• bound of order p2(B3 + (ln 1/δ)1/2) / (ρm1/2)
• SRNG optimizes the margin!