
Presentation Transcript


  1. NEURAL NETWORK: Radial Basis Function (RBF)

  2. Radial Basis Functions • RBF networks, just like MLP networks, can be used for classification and/or function approximation problems. • RBFs have a similar architecture to that of MLPs, but achieve this goal using a different strategy: an input layer, a nonlinear transformation layer (which generates local receptive fields), and a linear output layer.

  3. Radial Basis Function • A hidden layer of radial kernels: the hidden layer performs a nonlinear transformation of the input space; the resulting hidden space is typically of higher dimensionality than the input space. • An output layer of linear neurons: the output layer performs linear regression to predict the desired targets. • The dimension of the hidden layer is much larger than that of the input layer. • Cover's theorem on the separability of patterns: "A complex pattern-classification problem cast in a high-dimensional space non-linearly is more likely to be linearly separable than in a low-dimensional space." • RBFs are also among the kernel functions most commonly used in Support Vector Machines.

  4. Radial Basis Function • The network output is a weighted sum of the basis functions: y = Σ_{i=1..m1} w_i φ_i(X), where each φ_i(X) is a nonlinear function. • Increased dimension: the mapping takes the N-dimensional input to an m1-dimensional hidden space (N → m1), where the patterns are more likely to be linearly separable. • RBFs have their origins in techniques for performing exact function interpolation [Bishop, 1995].
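
To make the weighted-sum output concrete, here is a minimal forward-pass sketch in Python/NumPy. It assumes Gaussian basis functions and uses made-up centers and weights purely for illustration; none of these values come from the slides.

```python
import numpy as np

def rbf_forward(x, centers, weights, sigma=1.0):
    """Forward pass y = sum_i w_i * phi_i(x) with Gaussian basis functions (assumed)."""
    dists = np.linalg.norm(centers - x, axis=1)       # distance of x to each center
    phi = np.exp(-dists**2 / (2 * sigma**2))          # nonlinear hidden-layer activations
    return phi @ weights                              # linear output layer

# Illustrative values only (not from the slides)
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # m1 = 3 hidden units
weights = np.array([0.5, -0.2, 1.0])
print(rbf_forward(np.array([0.2, 0.8]), centers, weights))
```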

  5. ARCHITECTURE • Input layer: source nodes x1, …, xm that connect the network with its environment. • Hidden layer: applies a nonlinear transformation from the input space to the hidden space. • Output layer: applies a linear transformation from the hidden space to the output space (output y, weights w1, …, wm1).

  6. φ-separability of patterns • Hidden function: φ(x) = [φ1(x), …, φm1(x)]^T maps each input into the hidden space. • A (binary) partition, also called a dichotomy, (C1, C2) of the training set C is φ-separable if there is a vector w of dimension m1 such that w^T φ(x) > 0 for x ∈ C1 and w^T φ(x) < 0 for x ∈ C2.

  7. Examples of φ-separability • The separating surface in the hidden space is defined by w^T φ(x) = 0. • Examples of separable partitions (C1, C2): linearly separable (separating hyperplane), quadratically separable (polynomial-type functions), and spherically separable (separating hypersphere).

  8. HIDDEN NEURON MODEL • Hidden units use a radial basis function: the output φ(||x − t||) depends on the distance of the input x = (x1, x2, …, xm) from the center t. • t is called the center and σ is called the spread; center and spread are parameters.

  9. Hidden Neurons • A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the spread σ. • Larger spread → less sensitivity.

  10. Gaussian Radial Basis Function • φ(x) = exp(−||x − t||² / (2σ²)), centered at t. • σ is a measure of how spread out the curve is: a small σ gives a narrow, sharply peaked curve, while a large σ gives a wide, flat curve.

  11. Types of φ • Multiquadrics: φ(r) = (r² + c²)^(1/2). • Inverse multiquadrics: φ(r) = (r² + c²)^(−1/2). • Gaussian functions: φ(r) = exp(−r² / (2σ²)), where r = ||x − t||.
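
The three families above can be written in a few lines. The sketch below assumes the common textbook forms with r = ||x − t|| and a shape parameter c or σ; these are assumptions consistent with the slide, not formulas taken from it.

```python
import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2 * sigma**2))

r = np.linspace(0.0, 5.0, 6)                 # radial distances r = ||x - t||
print(multiquadric(r))
print(inverse_multiquadric(r))
print(gaussian(r))
```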

  12. Nonlinear Receptive Fields • The hallmark of RBF networks is their use of nonlinear receptive fields. • RBFs are universal approximators! • The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden unit space, where the mapped inputs may be linearly separable. • The hidden unit space often needs to be of a higher dimensionality. • Cover's Theorem (1965) on the separability of patterns: a complex pattern classification problem that is nonlinearly separable in a low-dimensional space is more likely to be linearly separable in a high-dimensional space.

  13. Example: the XOR problem • Input space: the four corners of the unit square, x ∈ {(0,0), (0,1), (1,0), (1,1)}. • Output space: y ∈ {0, 1}. • Construct an RBF pattern classifier such that (0,0) and (1,1) are mapped to 0 (class C1), and (1,0) and (0,1) are mapped to 1 (class C2).

  14. Example: the XOR problem • In the feature (hidden) space, use two basis functions φ1 and φ2, with φ1 associated with (1,1) and φ2 with (0,0). • When mapped into the feature space <φ1, φ2>, C1 and C2 become linearly separable: (1,1) and (0,0) land at opposite corners of the plot while (0,1) and (1,0) coincide at an interior point, so a straight decision boundary separates them. • So a linear classifier with φ1(x) and φ2(x) as inputs can be used to solve the XOR problem.

  15. The (you guessed it right) XOR Problem • Consider nonlinear functions that map the input vector x = [x1 x2] from the x1-x2 space to the φ1-φ2 space. • In the φ1-φ2 plane, (1,1) and (0,0) map to distinct points while (0,1) and (1,0) map to the same point, and the two classes fall on opposite sides of a line. • The nonlinear φ functions transformed a nonlinearly separable problem into a linearly separable one!
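
A short sketch of this mapping, assuming the usual choice of two Gaussian basis functions centered at (1,1) and (0,0) (the centers are read off the feature-space plot on the previous slide; the exact functions are not spelled out in the transcript):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # the four XOR inputs
t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])           # assumed RBF centers

phi1 = np.exp(-np.sum((X - t1)**2, axis=1))    # phi_1(x) = exp(-||x - t1||^2)
phi2 = np.exp(-np.sum((X - t2)**2, axis=1))    # phi_2(x) = exp(-||x - t2||^2)

for x, f1, f2 in zip(X, phi1, phi2):
    print(x, "->", round(f1, 3), round(f2, 3))
# (0,0) and (1,1) map to opposite corners of the phi1-phi2 plane, while
# (0,1) and (1,0) both map to the same interior point, so a straight line
# in the phi1-phi2 plane separates the two XOR classes.
```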

  16. Initial Assessment • Using nonlinear functions, we can convert a nonlinearly separable problem into a linearly separable one. • From a function approximation perspective, this is equivalent to implementing a complex function (corresponding to the nonlinearly separable decision boundary) using simple functions (corresponding to linearly separable decision boundaries). • Implementing this procedure with a network architecture yields the RBF networks, if the nonlinear mapping functions are radial basis functions. • Radial Basis Functions: Radial: symmetric around its center. Basis Functions: a set of functions whose linear combination can generate an arbitrary function in a given function space.

  17. RBF Networks • [Network diagram: d input nodes x1, …, xd; a hidden layer of H RBFs (receptive fields) with input-to-hidden weights uji and spread constant σ; c output nodes z1, …, zc with linear activation functions and hidden-to-output weights wkj.]

  18. Principle of Operation • Each hidden unit computes yj = φ(||x − uj||; σ), where ||·|| is the Euclidean norm, uj is the jth center, and σ is the spread constant. • Each output computes a weighted sum of the hidden outputs y1, …, yH through the weights wkj. • Unknowns: uji, wkj, σ.

  19. Principle of Operation • What do these parameters represent? Physical meanings: • φ: the radial basis function for the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d-dimensional input patterns to a (typically higher) H-dimensional space. The complex decision boundary will be constructed from linear combinations (weighted sums) of these simple building blocks. • uji: the weights joining the input layer to the hidden layer. These weights constitute the center points of the radial basis functions. • σ: the spread constant(s). These values determine the spread (extent) of each radial basis function. • wkj: the weights joining the hidden and output layers. These are the weights used in obtaining the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined to form the complex function.

  20. Principle of Operation • The output is formed as the weighted sum Σ_J wJ · φJ(x), where wJ is the relative weight of the Jth RBF, φJ is the Jth RBF function, and uJ is the center of the Jth RBF.

  21. Learning Algorithms • Parameters to be learnt are: centers, spreads, and weights. • Different learning algorithms exist, depending on how each set of parameters is determined.

  22. Learning Algorithm 1 • Centers are selected at random: center locations are chosen randomly from the training set. • Spreads are chosen by normalization: σ = d_max / √(2·m1), where d_max is the maximum distance between the chosen centers and m1 is the number of centers.

  23. Learning Algorithm 1 • Weights are found by means of the pseudo-inverse method: w = Φ⁺ d, where d is the vector of desired responses and Φ⁺ is the pseudo-inverse of the matrix Φ with entries Φji = φ(||xj − ti||).
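
A minimal sketch of this training scheme, assuming Gaussian basis functions, centers drawn at random from the data, the spread-normalization rule from the previous slide, and a toy 1-D target chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 100).reshape(-1, 1)          # training inputs
d = np.sin(X).ravel()                                # desired responses (toy target)

m1 = 10                                              # number of hidden RBFs
centers = X[rng.choice(len(X), m1, replace=False)]   # centers chosen at random from the data
d_max = np.max(np.abs(centers - centers.T))          # largest distance between centers
sigma = d_max / np.sqrt(2 * m1)                      # spread by normalization

Phi = np.exp(-((X - centers.T)**2) / (2 * sigma**2)) # design matrix Phi[j, i] = phi(||x_j - t_i||)
w = np.linalg.pinv(Phi) @ d                          # weights via the pseudo-inverse
print("training RMSE:", np.sqrt(np.mean((Phi @ w - d)**2)))
```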

  24. Learning Algorithm 2 • Hybrid learning process: a self-organized learning stage for finding the centers (spreads chosen by normalization), followed by a supervised learning stage for finding the weights, using the LMS algorithm. • Centers are obtained from unsupervised learning (clustering); spreads are obtained as the variances of the clusters; the weights w are obtained through the LMS algorithm. • Clustering (k-means) and LMS are iterative. This is the most commonly used procedure and typically provides good results.

  25. Learning Algorithm 2: Centers • K-means clustering algorithm for the centers: • 1. Initialization: tk(0) set randomly, k = 1, …, m1. • 2. Sampling: draw a sample x(n) from the input space C. • 3. Similarity matching: find the index k(x) of the best (closest) center, k(x) = arg min_k ||x(n) − tk(n)||. • 4. Updating: adjust the winning center, tk(n+1) = tk(n) + η[x(n) − tk(n)] for k = k(x); the other centers are unchanged. • 5. Continuation: increment n by 1, go to step 2, and continue until no noticeable changes of the centers occur.
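
A sketch of this online k-means procedure; the learning rate, the fixed iteration count used in place of the convergence test, and the toy data are illustrative assumptions:

```python
import numpy as np

def kmeans_centers(X, m1, eta=0.1, n_iter=1000, seed=0):
    """Self-organized selection of RBF centers by online k-means."""
    rng = np.random.default_rng(seed)
    t = X[rng.choice(len(X), m1, replace=False)].copy()   # 1. initialization: random centers
    for _ in range(n_iter):                                # 5. continuation (fixed budget here)
        x = X[rng.integers(len(X))]                        # 2. sampling: draw x from the data
        k = np.argmin(np.linalg.norm(t - x, axis=1))       # 3. similarity matching: closest center
        t[k] += eta * (x - t[k])                           # 4. updating: move the winner toward x
    return t

X = np.random.default_rng(1).normal(size=(200, 2))         # toy 2-D input data
print(kmeans_centers(X, m1=4))
```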

  26. Learning Algorithm 3 • Supervised learning of all the parameters using the gradient-descent method; all unknowns are obtained from supervised learning. • Instantaneous error function: E = ½ Σ_k (dk − zk)². • Modify centers: tj ← tj − η_t ∂E/∂tj, with a learning rate η_t for the centers. • Depending on the specific basis function φ, the derivative ∂E/∂tj can be computed using the chain rule of calculus.

  27. Learning Algorithm 3 • Modify spreads: σj ← σj − η_σ ∂E/∂σj. • Modify output weights: wkj ← wkj − η_w ∂E/∂wkj.
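
A compact sketch of one gradient-descent pass over the data for Learning Algorithm 3, assuming a Gaussian basis, a single linear output, a shared spread, and hand-picked learning rates; the chain-rule gradients below are derived for that Gaussian choice and are not copied from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)          # toy inputs
d = np.cos(X).ravel()                               # toy desired responses

m1 = 6
t = X[rng.choice(len(X), m1, replace=False)]        # centers (initialized from the data)
w = rng.normal(size=m1)                             # output weights
sigma = 1.0                                         # shared spread
eta_w, eta_t, eta_s = 0.05, 0.01, 0.001             # learning rates (assumed)

for x, dk in zip(X, d):
    diff = x - t                                    # differences to each center, shape (m1, 1)
    phi = np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))
    e = dk - phi @ w                                # instantaneous error for E = 0.5 * e**2
    # chain-rule gradient steps for the Gaussian basis
    w += eta_w * e * phi
    t += eta_t * e * (w * phi)[:, None] * diff / sigma**2
    sigma += eta_s * e * np.sum(w * phi * np.sum(diff**2, axis=1)) / sigma**3
```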

  28. RBFs Training [Haykin, 1999] • Unsupervised selection of RBF centers: RBF centers are selected so as to match the distribution of training examples in the input feature space. • Supervised computation of output vectors: hidden-to-output weight vectors are determined so as to minimize the sum-squared error between the RBF outputs and the desired targets. Since the outputs are linear, the optimal weights can be computed using fast, linear matrix inversion. • Once the RBF centers have been selected, the hidden-to-output weights are computed so as to minimize the MSE at the output.

  29. Linear Regression Models

  30. Polynomial model family • Linear in w: reduces to the linear regression case, but with more variables. • The number of terms grows as D^M (input dimension D, polynomial order M).

  31. Generalized linear model • y(x, w) = Σ_k wk hk(x): linear in w, so it reduces to the linear regression case, but with more variables. • Requires a good guess for the "basis functions" hk(x).
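
Both model families are fit by ordinary least squares once the basis functions are fixed, since the model is linear in w. The sketch below assumes a small, hand-picked set of basis functions hk(x) for a 1-D input:

```python
import numpy as np

# Assumed basis functions h_k(x); the model y = sum_k w_k * h_k(x) stays linear in w
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2, np.sin]

X = np.linspace(0, 4, 50)
d = 1.5 + 0.3 * X + np.sin(X)                       # toy target

H = np.column_stack([h(X) for h in basis])          # design matrix, one column per h_k
w, *_ = np.linalg.lstsq(H, d, rcond=None)           # ordinary least squares, linear in w
print(w)
```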

  32. Polynomial Classifier: XOR problem • The XOR problem solved with a polynomial function. • With nonlinear polynomial functions, the classes can be separated. • Example: the XOR problem again, but with a polynomial function instead of radial basis functions!
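
Since the polynomial equations on the following slides were not captured in the transcript, here is a minimal sketch assuming the usual trick of adding the product term x1·x2, which makes XOR linearly separable in the expanded feature space:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])                  # XOR labels

# Polynomial feature map: [1, x1, x2, x1*x2]
P = np.column_stack([np.ones(4), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
w, *_ = np.linalg.lstsq(P, y, rcond=None)           # linear classifier in the polynomial features

print(np.round(P @ w))                              # recovers [0, 1, 1, 0]
```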

  33. Polynomial Classifier: XOR problem

  34. Polynomial Classifier: XOR problem

  35. Polynomial Classifier: XOR problem

  36. Polynomial Classifier: XOR problem

  37. Polynomial Classifier more general

  38. The Gaussian is the most commonly used RBF • Note that φ(x) → 0 as ||x − t|| → ∞: Gaussian RBFs are localized functions, unlike the sigmoids used by MLPs. • [Figure: the same approximation using sigmoidal basis functions vs. Gaussian radial basis functions.]

  39. Function Width Demo (1) • Application model: function approximation. • Target function: shown as a figure in the original slide. • Focus: the width and the center of the RBFs, and the generalization of the RBF network.

  40. Change the width of RBF (1) • Radial basis function model: Gaussian. • Test cases (the number of RBFs/neurons is fixed at 21): Case 1: width = 1; Case 2: width = 0.2; Case 3: width = 200. • Validation test: to check the generalization ability of RBF networks.

  41. Change the width of RBF (2) • [Plots of the fitted function for width = 1, width = 0.2, and width = 200.]
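
The width demo can be reproduced along these lines; the 21 evenly spaced centers and the three widths follow the test cases above, while the target function used in the original demo is unknown and a sine is assumed here:

```python
import numpy as np

def fit_rbf_weights(X, d, centers, width):
    """Fit output weights by pseudo-inverse for fixed Gaussian centers and width."""
    Phi = np.exp(-((X[:, None] - centers[None, :])**2) / (2 * width**2))
    return Phi, np.linalg.pinv(Phi) @ d

X = np.linspace(-10, 10, 200)
d = np.sin(0.5 * X)                                  # assumed target function
centers = np.linspace(-10, 10, 21)                   # 21 RBF neurons, as in the demo

for width in (1.0, 0.2, 200.0):                      # the three test cases
    Phi, w = fit_rbf_weights(X, d, centers, width)
    rmse = np.sqrt(np.mean((Phi @ w - d)**2))
    print(f"width = {width}: training RMSE = {rmse:.4f}")
```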

  42. Change the number of RBF centers (1) • Test cases: • Number of RBFs = 7, center positions [-8, -5, -2, 0, 2, 5, 8]: Case 1: width = 1; Case 2: width = 6. • Number of RBFs = 2, center positions [-3, 10]: Case 3: width = 1; Case 4: width = 6.

  43. Change the number of RBF centers (2) • [Plots for 7 RBFs at centers [-8, -5, -2, 0, 2, 5, 8], with width = 1 and width = 6.]

  44. Change the number of RBF centers (3) • [Plots for 2 RBFs at centers [-3, 10], with width = 1 and width = 6.]
