
Supervised Learning: Linear Perceptron NN



Presentation Transcript


  1. Supervised Learning: Linear Perceptron NN

  2. Distinction Between Approximation-Based and Decision-Based NNs. The teacher in an approximation-based NN is quantitative, taking real or complex values; the teacher in a decision-based NN is a symbol (a class label) rather than a numeric value.

  3. Decision-Based NN (DBNN)
  • Linear perceptron
  • Discriminant function (score function)
  • Reinforced and anti-reinforced learning rules
  • Hierarchical and modular structures

  4. [Figure: decision-based training loop — each next pattern is presented to the discriminant functions f1(x,w), f2(x,w), …, fM(x,w), and the teacher indicates only whether the resulting class is correct or incorrect.]

  5. Supervised Learning: Linear Perceptron NN

  6. Two Classes: Linear Perceptron Learning Rule. Discriminant function f_j(x, w_j) = x^T w_j + w_0 = z^T ŵ_j (written below simply as z^T w), with gradient ∇f_j(z, w_j) = z. Upon presentation of the m-th training pattern z^(m), the weight vector w^(m) is updated as
  w^(m+1) = w^(m) + η (t^(m) − d^(m)) z^(m),
  where η is a positive learning rate, t^(m) is the teacher (target) value, and d^(m) is the network's decision for z^(m).
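As a concrete illustration of the rule on slide 6, here is a minimal Python sketch, assuming a bipolar teacher t ∈ {+1, −1}, a hard-limiting decision d = sign(zᵀw), and augmented patterns z = [x, 1]; the function name and stopping criterion are illustrative, not from the slides.

```python
import numpy as np

def train_two_class_perceptron(X, t, eta=0.1, max_epochs=100):
    """Two-class linear perceptron learning rule from slide 6:
    w <- w + eta * (t - d) * z, with decision d = sign(z^T w)."""
    Z = np.hstack([X, np.ones((len(X), 1))])        # augmented patterns z = [x, 1]
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for z, target in zip(Z, t):                 # present one pattern at a time
            d = 1.0 if z @ w >= 0 else -1.0         # current decision
            if d != target:
                w += eta * (target - d) * z         # update only on misclassification
                mistakes += 1
        if mistakes == 0:                           # every pattern classified correctly
            break
    return w
```

On a linearly separable training set this loop terminates with zero mistakes after finitely many epochs, in line with the convergence theorem on the next slide.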

  7. Linear Perceptron: Convergence Theorem (Two Classes) If a set of training patterns is linearly separable, then the linear perceptron learning algorithm converges to a correct solution in a finite number of iterations.

  8. w^(m+1) = w^(m) + η (t^(m) − d^(m)) z^(m). The rule converges provided the learning rate η is small enough.

  9. Multiple Classes: strongly linearly separable vs. linearly separable.

  10. Linear Perceptron Convergence Theorem (Multiple Classes) If the given multiple-class training set is linearly separable, then the linear perceptron learning algorithm converges to a correct solution after a finite number of iterations.

  11. Multiple Classes: Linear Perceptron Learning Rule (linear separability)

  12. P_1j = [ z  0  0  …  −z  0  …  0 ]   (the augmented pattern z is placed in the block of its correct class, here class 1, and −z in the block of the competing class j; all other blocks are zero)
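Read this way, the block construction on slide 12 is equivalent to keeping one weight vector per class and, whenever the true class i is out-scored by some class j, reinforcing w_i while anti-reinforcing w_j. Below is a minimal Python sketch under that reading; the function name and integer class labels are illustrative assumptions.

```python
import numpy as np

def train_multiclass_perceptron(X, labels, num_classes, eta=0.1, max_epochs=100):
    """Multi-class linear perceptron: one augmented weight vector per class.
    When the true class i is beaten by a wrong class j, add eta*z to w_i
    (reinforced) and subtract eta*z from w_j (anti-reinforced)."""
    Z = np.hstack([X, np.ones((len(X), 1))])        # augmented patterns
    W = np.zeros((num_classes, Z.shape[1]))
    for _ in range(max_epochs):
        mistakes = 0
        for z, i in zip(Z, labels):
            j = int(np.argmax(W @ z))               # current winning class
            if j != i:                              # misclassified pattern
                W[i] += eta * z
                W[j] -= eta * z
                mistakes += 1
        if mistakes == 0:
            break
    return W
```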

  13. DBNN Structure for Nonlinear Discriminant Functions. [Figure: input x feeds discriminant functions f1(x,w), f2(x,w), f3(x,w); a MAXNET selects the winner to produce the output y.]

  14. DBNN Training. [Figure: the same structure with weights w1, w2, w3; the teacher triggers weight updates only when it indicates the need.]

  15. The decision-based learning rule is based on a minimal-updating principle: it tends to avoid or minimize unnecessary side-effects due to overtraining.
  • In the first scenario, the pattern is already correctly classified by the current network; no updating is attributed to that pattern, and the learning process proceeds with the next training pattern.
  • In the second scenario, the pattern is incorrectly classified to another, winning class. In this case the parameters of two classes must be updated: the score of the winning class is reduced by the anti-reinforced learning rule, while the score of the correct (but not winning) class is enhanced by the reinforced learning rule.

  16. Reinforced and Anti-Reinforced Learning. Suppose the m-th training pattern x^(m) is known to belong to the i-th class, and the leading challenger is j = arg max_{j≠i} φ(x^(m), Θ_j). Then
  Δw_i = +η ∇_{w_i} f_i(x, w_i)   (reinforced learning, correct class i)
  Δw_j = −η ∇_{w_j} f_j(x, w_j)   (anti-reinforced learning, winning class j)

  17. For the simple RBF discriminant function f_j(x, w_j) = 0.5 ‖x − w_j‖², with ∇f_j(x, w_j) = (x − w_j). Upon presentation of the m-th training pattern x^(m), the weights are updated as
  Δw_i = +η (x^(m) − w_i)   (reinforced learning)
  Δw_j = −η (x^(m) − w_j)   (anti-reinforced learning)
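A short sketch of one such decision-based update with this RBF discriminant, assuming the score is taken as the negative half squared distance so that the MAXNET (an argmax) selects the closest prototype; the single-prototype-per-class layout and the names are assumptions for illustration.

```python
import numpy as np

def rbf_scores(x, W):
    """Score of class j: -0.5 * ||x - w_j||^2, so argmax = nearest prototype."""
    return -0.5 * np.sum((W - x) ** 2, axis=1)

def decision_based_update(x, true_class, W, eta=0.1):
    """Minimal-updating rule: change weights only for a misclassified pattern."""
    winner = int(np.argmax(rbf_scores(x, W)))
    if winner != true_class:
        W[true_class] += eta * (x - W[true_class])   # reinforced: pull correct prototype toward x
        W[winner] -= eta * (x - W[winner])           # anti-reinforced: push wrong winner away from x
    return W
```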

  18. Decision-Based Learning Rule. The learning scheme of the DBNN consists of two phases:
  • locally unsupervised learning;
  • globally supervised learning.

  19. Locally Unsupervised Learning via VQ or EM Clustering. Several approaches can be used to estimate the number of hidden nodes, and the initial clustering can be determined by VQ or EM clustering methods.
  • EM allows the final decision to incorporate prior information, which can be instrumental for multiple-expert or multiple-channel information fusion.
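A minimal sketch of this locally unsupervised phase using per-class k-means (VQ) to obtain initial prototypes; scikit-learn's KMeans is one convenient choice, and the number of clusters per class is an assumed hyperparameter (a per-class EM/Gaussian-mixture fit could be substituted).

```python
import numpy as np
from sklearn.cluster import KMeans

def init_class_prototypes(X, labels, num_classes, clusters_per_class=2, seed=0):
    """Locally unsupervised phase: cluster each class's own data (VQ / k-means);
    no inter-class information is used at this stage."""
    X, labels = np.asarray(X), np.asarray(labels)
    prototypes = {}
    for c in range(num_classes):
        Xc = X[labels == c]                          # intra-class data only
        km = KMeans(n_clusters=clusters_per_class, n_init=10, random_state=seed)
        prototypes[c] = km.fit(Xc).cluster_centers_
    return prototypes
```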

  20. Globally Supervised Learning Rules.
  • The objective of learning is minimum classification error (not maximum likelihood estimation).
  • Inter-class mutual information is used to fine-tune the decision boundaries (i.e., the globally supervised learning).
  • In this phase, the DBNN applies the reinforced/anti-reinforced learning rule [Kung95] or the discriminative learning rule [Juang92] to adjust network parameters. Only misclassified patterns need to be involved in this training phase.

  21. Pictorial Presentation of Hierarchical DBNN. [Figure: 2-D scatter plot of training samples from three classes (labeled a, b, c) and the resulting class regions.]

  22. Discriminant Function (Score Function):
  • LBF function (or mixture of)
  • RBF function (or mixture of)
  • prediction error function
  • likelihood function (HMM)

  23. Hierarchical and Modular DBNN:
  • Subcluster DBNN
  • Probabilistic DBNN
  • Local experts via K-means or EM
  • Reinforced and anti-reinforced learning

  24. Subcluster DBNN. [Figure: architecture with MAXNET.]

  25. Subcluster DBNN

  26. Subcluster Decision-Based Learning Rule

  27. Probabilistic DBNN

  28. Probabilistic DBNN. [Figure: architecture with MAXNET.]

  29. Probabilistic DBNN

  30. Probabilistic DBNN. [Figure: architecture with MAXNET.]

  31. A subnetwork of a probabilistic DBNN is basically a mixture of local experts. [Figure: k-th subnetwork — input x feeds RBF local experts P(y|x, θ1), P(y|x, θ2), P(y|x, θ3), which are combined into the subnetwork output P(y|x, φ_k).]
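To make the mixture-of-local-experts structure concrete, here is a hedged sketch of one subnetwork's class-conditional score as a mixture of spherical Gaussian (RBF) local experts; the shared spherical covariance, the mixture weights (priors), and the function name are assumptions rather than details from the slide.

```python
import numpy as np

def subnetwork_score(x, centers, sigmas, priors):
    """k-th subnetwork sketched as a mixture of RBF (Gaussian) local experts:
    p(x | class k) = sum_r prior_r * N(x; center_r, sigma_r^2 * I)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    total = 0.0
    for mu, sigma, prior in zip(centers, sigmas, priors):
        norm = (2.0 * np.pi * sigma ** 2) ** (-d / 2.0)          # Gaussian normalizer
        total += prior * norm * np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))
    return total
```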

  32. Probabilistic Decision-Based Neural Networks

  33. Training of Probabilistic DBNN.
  • Selection of initial local experts: intra-class training; unsupervised training; EM (probabilistic) training.
  • Training of the experts: inter-class training; supervised training; reinforced and anti-reinforced learning.

  34. Probabilistic Decision-Based Neural Networks: training procedure. [Flowchart: feature vectors with class IDs → K-means → EM (locally unsupervised phase); then classification → misclassified vectors → K-NNs → reinforced learning → repeat until convergence (globally supervised phase).]
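Putting the two phases together, a rough end-to-end sketch: the prototypes can be initialized by a locally unsupervised step (e.g., the per-class k-means sketch under slide 19), and the globally supervised loop below sweeps the data, applying reinforced/anti-reinforced updates only to misclassified vectors until the error count reaches zero or an epoch limit. All names are illustrative, and the probabilistic refinements from the flowchart (EM, K-NNs) are omitted for brevity.

```python
import numpy as np

def train_globally_supervised(X, labels, prototypes, eta=0.05, max_epochs=50):
    """Globally supervised phase: update only misclassified vectors.
    `prototypes` maps class -> array of prototype vectors (e.g. from k-means)."""
    def classify(x):
        # score each class by its closest prototype (negative half squared distance)
        scores = {c: -0.5 * np.min(np.sum((P - x) ** 2, axis=1))
                  for c, P in prototypes.items()}
        return max(scores, key=scores.get)

    for _ in range(max_epochs):
        errors = 0
        for x, c in zip(X, labels):
            winner = classify(x)
            if winner == c:
                continue                                     # minimal updating: no change
            Pc, Pw = prototypes[c], prototypes[winner]
            rc = int(np.argmin(np.sum((Pc - x) ** 2, axis=1)))
            rw = int(np.argmin(np.sum((Pw - x) ** 2, axis=1)))
            Pc[rc] += eta * (x - Pc[rc])                     # reinforced learning
            Pw[rw] -= eta * (x - Pw[rw])                     # anti-reinforced learning
            errors += 1
        if errors == 0:
            break
    return prototypes
```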

  35. Probabilistic Decision-Based Neural Networks. [Figure: 2-D vowel problem — decision boundaries obtained by a GMM vs. a PDBNN.]

  36. Difference between MOE and DBNN. For the MOE, the influence of the training patterns on each expert is regulated by the gating network (which is itself under training), so that as training proceeds the training patterns have greater influence on nearby experts and less influence on far-away ones. (The MOE updates all the classes.) Unlike the MOE, the DBNN makes use of both unsupervised (EM-type) and supervised (decision-based) learning rules. The DBNN uses only misclassified training patterns for its globally supervised learning, and it updates only the "winner" class and the class to which the misclassified pattern actually belongs. Its training strategy is to abide by a "minimal updating principle".

  37. DBNN/PDBNN Applications:
  • OCR (DBNN)
  • Texture segmentation (DBNN)
  • Mammogram diagnosis (PDBNN)
  • Face detection (PDBNN)
  • Face recognition (PDBNN)
  • Money recognition (PDBNN)
  • Multimedia library (DBNN)

  38. OCR Classification (DBNN)

  39. Image Texture Classification (DBNN)
