Neural Networks, Lecture 15: Radial Basis Functions

This lecture covers radial basis functions in neural networks: the RBF network architecture, its error function, gradient-descent learning rules, and a practical hybrid training strategy.
Radial Basis Functions

If we are using such linear interpolation, then our radial basis function (RBF) φ, which weights an input vector based on its distance D to a neuron's reference (weight) vector, is φ(D) = D⁻¹. For the training samples x_p, p = 1, …, P₀, surrounding the new input x, we find for the network's output o the distance-weighted average of their desired outputs d_p:

o = Σ_{p=1}^{P₀} φ(‖x − x_p‖) · d_p  /  Σ_{p=1}^{P₀} φ(‖x − x_p‖)

(In the following, to keep things simple, we will assume that the network has only one output neuron. However, any number of output neurons could be implemented.)
Since it is difficult to define what "surrounding" should mean, it is common to consider all P training samples and use any monotonically decreasing RBF φ:

o = Σ_{p=1}^{P} φ(‖x − x_p‖) · d_p  /  Σ_{p=1}^{P} φ(‖x − x_p‖)

This, however, implies a network that has as many hidden nodes as there are training samples. This is unacceptable because of its computational complexity and likely poor generalization ability – the network resembles a look-up table.
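As a small illustration, the following NumPy sketch computes this all-samples, distance-weighted interpolation. The function name rbf_interpolate and the particular Gaussian choice of φ in the usage lines are illustrative assumptions, not prescribed by the lecture.

```python
import numpy as np

def rbf_interpolate(x, X_train, d_train, phi):
    """Distance-weighted interpolation over all P training samples.

    x        : query input, shape (n,)
    X_train  : training inputs x_p, shape (P, n)
    d_train  : desired outputs d_p, shape (P,)
    phi      : monotonically decreasing RBF applied to distances
    """
    dists = np.linalg.norm(X_train - x, axis=1)        # ||x - x_p|| for all p
    weights = phi(dists)                                # phi(||x - x_p||)
    return np.sum(weights * d_train) / np.sum(weights)  # weighted average of d_p

# Usage example with a Gaussian RBF (any monotonically decreasing phi works):
X_train = np.array([[0.0], [1.0], [2.0]])
d_train = np.array([0.0, 1.0, 4.0])
phi = lambda D: np.exp(-D**2)
print(rbf_interpolate(np.array([1.5]), X_train, d_train, phi))
```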
It is more useful to have fewer neurons and accept that the training set cannot be learned 100% accurately:

o(x) = Σ_{i=1}^{N} w_i · φ(‖x − μ_i‖)

Here, ideally, each reference vector μ_i of these N neurons should be placed in the center of an input-space cluster of training samples with identical (or at least similar) desired output. To learn near-optimal values for the reference vectors μ_i and the output weights w_i, we can – as usual – employ gradient descent.
The RBF Network

Example: Network function f: R³ → R.

[Figure: a three-layer RBF network. The input layer receives the input vector; the RBF layer contains four neurons with reference vectors and spreads (μ₁, σ₁), …, (μ₄, σ₄) plus a constant node 1; the output layer combines the RBF activations with the weights w₀, w₁, …, w₄ to produce the output o₁.]
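The sketch below shows a forward pass through a network of this shape (3 inputs, 4 RBF neurons, a constant bias node, one output), assuming Gaussian node functions as introduced later in the lecture. The function name rbf_forward and all parameter values are illustrative.

```python
import numpy as np

def rbf_forward(x, mu, sigma, w):
    """Forward pass of an example RBF network: 3 inputs, 4 RBF neurons, 1 output.

    x     : input vector, shape (3,)
    mu    : reference vectors mu_i, shape (4, 3)
    sigma : spreads sigma_i of the RBF neurons, shape (4,)
    w     : output weights w0..w4; w[0] multiplies the constant node 1
    """
    d2 = np.sum((mu - x) ** 2, axis=1)            # squared distances ||x - mu_i||^2
    phi = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian RBF activations
    return w[0] * 1.0 + np.dot(w[1:], phi)        # o1 = w0 * 1 + sum_i w_i * phi_i

# Usage example with arbitrary parameters:
rng = np.random.default_rng(0)
mu, sigma, w = rng.normal(size=(4, 3)), np.ones(4), rng.normal(size=5)
print(rbf_forward(np.array([0.5, -1.0, 2.0]), mu, sigma, w))
```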
• For a fixed number of neurons N, we could learn the output weights w₁, …, w_N and reference vectors μ₁, …, μ_N.
• To do this, we first have to define an error function E, here the summed squared error over all P training samples:

E = Σ_{p=1}^{P} (d_p − o(x_p))²

• Taken together, we get:

E = Σ_{p=1}^{P} ( d_p − Σ_{i=1}^{N} w_i · φ(‖x_p − μ_i‖) )²
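As a quick illustration, here is one way to evaluate this error function in NumPy. The helper names network_output and error are mine, and the Gaussian φ in the usage lines is just an example choice.

```python
import numpy as np

def network_output(X, mu, w, phi):
    """o(x_p) = sum_i w_i * phi(||x_p - mu_i||) for every row x_p of X."""
    dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (P, N)
    return phi(dists) @ w                                            # (P,)

def error(X, d, mu, w, phi):
    """Summed squared error E = sum_p (d_p - o(x_p))^2."""
    return np.sum((d - network_output(X, mu, w, phi)) ** 2)

# Usage example with a Gaussian phi and random data:
rng = np.random.default_rng(1)
X, d = rng.normal(size=(20, 3)), rng.normal(size=20)
mu, w = rng.normal(size=(5, 3)), rng.normal(size=5)
phi = lambda D: np.exp(-D**2)
print(error(X, d, mu, w, phi))
```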
Learning in RBF Networks

• Then the error gradient with regard to w₁, …, w_N is:

∂E/∂w_i = −2 Σ_{p=1}^{P} (d_p − o(x_p)) · φ(‖x_p − μ_i‖)

• For μ_{i,j}, the j-th vector component of μ_i, we get:

∂E/∂μ_{i,j} = −2 Σ_{p=1}^{P} (d_p − o(x_p)) · w_i · ∂φ(‖x_p − μ_i‖)/∂μ_{i,j}
• The vector length (‖…‖) expression is inconvenient, because it is the square root of the given vector multiplied by itself.
• To eliminate this difficulty, we introduce a function R with R(D²) = φ(D) and substitute the squared distance ‖x_p − μ_i‖² for D².
• This leads to a simplified differentiation:

∂φ(‖x_p − μ_i‖)/∂μ_{i,j} = ∂R(‖x_p − μ_i‖²)/∂μ_{i,j} = R′(‖x_p − μ_i‖²) · ∂‖x_p − μ_i‖²/∂μ_{i,j}
Together with the following derivative …

∂‖x_p − μ_i‖²/∂μ_{i,j} = ∂/∂μ_{i,j} Σ_k (x_{p,k} − μ_{i,k})² = −2 (x_{p,j} − μ_{i,j})

… we finally get the result for our error gradient:

∂E/∂μ_{i,j} = 4 w_i Σ_{p=1}^{P} (d_p − o(x_p)) · R′(‖x_p − μ_i‖²) · (x_{p,j} − μ_{i,j})
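A finite-difference comparison is a handy way to convince yourself of this gradient. The sketch below checks ∂E/∂μ_{i,j} against a numerical estimate for a Gaussian node function; the names, data, and the particular choice R(D²) = exp(−D²) are illustrative.

```python
import numpy as np

# Compare the analytic gradient dE/dmu_{i,j} with a central-difference estimate,
# using R(D^2) = exp(-D^2), so R'(D^2) = -exp(-D^2).
rng = np.random.default_rng(2)
X, d = rng.normal(size=(10, 3)), rng.normal(size=10)
mu, w = rng.normal(size=(4, 3)), rng.normal(size=4)
R  = lambda D2: np.exp(-D2)
Rp = lambda D2: -np.exp(-D2)

def output(mu):
    D2 = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)   # (P, N)
    return R(D2) @ w

def E(mu):
    return np.sum((d - output(mu)) ** 2)

i, j = 1, 2
D2_i = np.sum((X - mu[i]) ** 2, axis=1)
analytic = 4.0 * w[i] * np.sum((d - output(mu)) * Rp(D2_i) * (X[:, j] - mu[i, j]))

eps = 1e-6
mu_plus, mu_minus = mu.copy(), mu.copy()
mu_plus[i, j] += eps
mu_minus[i, j] -= eps
numeric = (E(mu_plus) - E(mu_minus)) / (2 * eps)
print(analytic, numeric)   # the two values should agree closely
```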
This gives us the following update rules:

Δw_i = −η_i · ∂E/∂w_i = 2 η_i Σ_{p=1}^{P} (d_p − o(x_p)) · R(‖x_p − μ_i‖²)

Δμ_{i,j} = −η_{i,j} · ∂E/∂μ_{i,j} = −4 η_{i,j} w_i Σ_{p=1}^{P} (d_p − o(x_p)) · R′(‖x_p − μ_i‖²) · (x_{p,j} − μ_{i,j})

where the (positive) learning rates η_i and η_{i,j} could be chosen individually for each parameter w_i and μ_{i,j}. As usual, we can start with random parameters and then iterate these rules for learning until a given error threshold is reached.
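A minimal batch gradient-descent loop that applies these update rules for an arbitrary node function is sketched below. The function name train_rbf, the default learning rates, and the stopping criterion are choices made for illustration, not prescribed by the lecture.

```python
import numpy as np

def train_rbf(X, d, mu, w, R, Rp, eta_w=0.01, eta_mu=0.01, epochs=1000, tol=1e-3):
    """Batch gradient descent on w_i and mu_{i,j} for a generic node function.

    R, Rp : the function R(D^2) = phi(D) and its derivative R'(D^2).
    Stops when the summed squared error falls below tol or epochs run out.
    """
    for _ in range(epochs):
        D2 = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)  # (P, N)
        act = R(D2)                                                  # activations
        resid = d - act @ w                                          # d_p - o(x_p)
        if np.sum(resid ** 2) < tol:
            break
        # Delta w_i = 2 * eta_w * sum_p (d_p - o_p) * R(||x_p - mu_i||^2)
        delta_w = 2 * eta_w * act.T @ resid
        # Delta mu_{i,j} = -4 * eta_mu * w_i * sum_p (d_p - o_p) * R'(.) * (x_{p,j} - mu_{i,j})
        diff = X[:, None, :] - mu[None, :, :]                        # (P, N, n)
        grad_mu = np.einsum('p,pn,pnj->nj', resid, Rp(D2), diff)
        delta_mu = -4 * eta_mu * (w[:, None] * grad_mu)
        w += delta_w
        mu += delta_mu
    return mu, w
```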
If the node function is given by a Gaussian with spread σ_i, then:

R(D²) = exp( −D² / (2σ_i²) )

As a result:

R′(D²) = −1/(2σ_i²) · exp( −D² / (2σ_i²) )
The specific update rules are now:

Δw_i = 2 η_i Σ_{p=1}^{P} (d_p − o(x_p)) · exp( −‖x_p − μ_i‖² / (2σ_i²) )

and

Δμ_{i,j} = (2 η_{i,j} w_i / σ_i²) Σ_{p=1}^{P} (d_p − o(x_p)) · exp( −‖x_p − μ_i‖² / (2σ_i²) ) · (x_{p,j} − μ_{i,j})
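For concreteness, here is a self-contained sketch of one batch update with Gaussian node functions, implementing the two rules above with the spreads σ_i held fixed. The function name gaussian_rbf_step and the toy data are illustrative.

```python
import numpy as np

def gaussian_rbf_step(X, d, mu, sigma, w, eta_w=0.05, eta_mu=0.05):
    """One batch update of w_i and mu_{i,j} with Gaussian node functions.

    X     : training inputs, shape (P, n)     d     : desired outputs, shape (P,)
    mu    : reference vectors, shape (N, n)   sigma : fixed spreads, shape (N,)
    w     : output weights, shape (N,)
    """
    D2 = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)   # (P, N)
    act = np.exp(-D2 / (2.0 * sigma[None, :] ** 2))              # Gaussian activations
    resid = d - act @ w                                          # d_p - o(x_p)
    w_new = w + 2 * eta_w * act.T @ resid                        # Delta w_i
    diff = X[:, None, :] - mu[None, :, :]                        # (P, N, n)
    grad = np.einsum('p,pn,pnj->nj', resid, act, diff)
    mu_new = mu + (2 * eta_mu * w / sigma ** 2)[:, None] * grad  # Delta mu_{i,j}
    return mu_new, w_new

# Usage example: a few update steps on toy data
rng = np.random.default_rng(3)
X, d = rng.normal(size=(30, 2)), rng.normal(size=30)
mu, sigma, w = rng.normal(size=(3, 2)), np.ones(3), np.zeros(3)
for _ in range(5):
    mu, w = gaussian_rbf_step(X, d, mu, sigma, w)
```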
It turns out that, particularly for Gaussian RBFs, it is more efficient and typically leads to better results to use partially offline training: First, we use any clustering procedure (e.g., k-means) to estimate cluster centers, which are then used to set the values of the reference vectors μ_i and their spreads (standard deviations) σ_i. Then we use the gradient descent method described above to determine the weights w_i.
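Below is a rough sketch of this partially offline scheme in NumPy. The lecture does not prescribe a particular clustering routine or a rule for deriving σ_i from the clusters, so the minimal k-means and the per-cluster standard deviation used here are assumptions, as are the function names kmeans and hybrid_train.

```python
import numpy as np

def kmeans(X, N, iters=50, seed=0):
    """Minimal k-means; a stand-in for any clustering procedure."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), N, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                            else centers[i] for i in range(N)])
    return centers, labels

def hybrid_train(X, d, N, eta_w=0.05, epochs=500, seed=0):
    """Partially offline training: clustering fixes mu_i and sigma_i, gradient descent learns w_i."""
    mu, labels = kmeans(X, N, seed=seed)
    # Spread sigma_i = standard deviation of the samples in cluster i (one heuristic choice).
    sigma = np.array([X[labels == i].std() if np.any(labels == i) else 1.0 for i in range(N)])
    sigma = np.maximum(sigma, 1e-3)
    D2 = np.sum((X[:, None, :] - mu[None, :, :]) ** 2, axis=2)
    act = np.exp(-D2 / (2.0 * sigma[None, :] ** 2))     # fixed Gaussian activations
    w = np.zeros(N)
    for _ in range(epochs):                              # gradient descent on w only
        resid = d - act @ w
        w += 2 * eta_w * act.T @ resid
    return mu, sigma, w
```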