Function Approximation
Fariba Sharifian, Somaye Kafi
Contents • Introduction to Counterpropagation • Full Counterpropagation • Architecture • Algorithm • Application • example • Forward only Counterpropagation • Architecture • Algorithm • Application • example
Contents • Function Approximation Using Neural Network • Introduction • Development of Neural Network Weight Equations • Algebra Training Algorithms • Exact Matching of Function Input-Output Data • Approximate Matching of Gradient Data in Algebra Training • Approximate Matching of Function Input-Output Data • Exact Matching of Function Gradient Data
Introduction to Counterpropagation • are multilayer networks based on a combination of input, clustering, and output layers • can be used to compress data, to approximate functions, or to associate patterns • approximate their training input vector pairs by adaptively constructing a lookup table
Introduction to Counterpropagation (cont.) • training has two stages • Clustering • Output weight updating • There are two types • Full • Forward only
Full Counterpropagation • Produces an approximation x*:y* based on • input of an x vector only • input of a y vector only • input of an x:y pair, possibly with some distorted or missing elements in either or both vectors
Full Counterpropagation (cont.) • Phase 1 • The units in the cluster layer compete. The learning rule for weight updates on the winning cluster unit is (only the winning unit is allowed to learn)
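Assuming the standard formulation (with v the X-to-cluster weights, w the Y-to-cluster weights, J the index of the winning unit, and α, β the Phase 1 learning rates), the update rule is:
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha\,[x_i - v_{iJ}(\text{old})], \qquad w_{kJ}(\text{new}) = w_{kJ}(\text{old}) + \beta\,[y_k - w_{kJ}(\text{old})]$$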
Full Counterpropagation (cont.) • Phase 2 • The weights from the winning cluster unit J to the output units are adjusted so that the vector of activations of the units in the Y output layer, y*, is an approximation to the input vector y, and x* is an approximation to the input vector x. The weight updates for the units in the Y output and X output layers are:
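Assuming the same standard formulation (u the cluster-to-Y* weights, t the cluster-to-X* weights, a and b the Phase 2 learning rates):
$$u_{Jk}(\text{new}) = u_{Jk}(\text{old}) + a\,[y_k - u_{Jk}(\text{old})], \qquad t_{Ji}(\text{new}) = t_{Ji}(\text{old}) + b\,[x_i - t_{Ji}(\text{old})]$$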
Architecture of Full Counterpropagation • [Figure: the X input layer (X1 … Xn) and the Y input layer (Y1 … Ym) feed the cluster (hidden) layer Z1 … Zp through weights v and w; the cluster layer feeds the X* output layer (X1* … Xn*) and the Y* output layer (Y1* … Ym*) through weights t and u]
Full Counterpropagation Algorithm (phase 1) • Step 1. Initialize weights, learning rates, etc. • Step 2. While stopping condition for Phase 1 is false, do Steps 3-8 • Step 3. For each training input pair x:y, do Steps 4-6 • Step 4. Set X input layer activations to vector x; set Y input layer activations to vector y. • Step 5. Find the winning cluster unit; call its index J • Step 6. Update weights for unit ZJ (using the Phase 1 rule above) • Step 7. Reduce the learning rates α and β. • Step 8. Test stopping condition for Phase 1 training
Full Counterpropagation Algorithm (phase 2) • Step 9. While stopping condition for Phase 2 is false, do Steps 10-16 • (Note: α and β are small, constant values during phase 2) • Step 10. For each training input pair x:y, do Steps 11-14 • Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y. • Step 12. Find the winning cluster unit; call its index J • Step 13. Update weights for unit ZJ:
Full Counterpropagation Algorithm (phase 2) (cont.) • Step 14. Update weights from unit ZJ to the output layers (using the Phase 2 rule above) • Step 15. Reduce the learning rates a and b. • Step 16. Test stopping condition for Phase 2 training.
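A compact sketch of the whole two-phase procedure (Steps 1-16), assuming Euclidean winner selection and an exponential learning-rate decay; the function name, array shapes, and default rates below are illustrative assumptions:

```python
# Minimal sketch of two-phase full counterpropagation training.
import numpy as np

def train_full_cpn(X, Y, n_clusters, alpha=0.4, beta=0.4, a=0.1, b=0.1,
                   phase1_epochs=50, phase2_epochs=50, decay=0.98, seed=0):
    n, q = X.shape                      # n training pairs, q = dim(x)
    m = Y.shape[1]                      # m = dim(y)
    rng = np.random.default_rng(seed)
    v = rng.random((n_clusters, q))     # X -> cluster weights
    w = rng.random((n_clusters, m))     # Y -> cluster weights
    t = np.zeros((n_clusters, q))       # cluster -> X* weights
    u = np.zeros((n_clusters, m))       # cluster -> Y* weights

    def winner(x, y):
        # squared Euclidean distance of the pair from every cluster unit
        return int(np.argmin(((v - x) ** 2).sum(1) + ((w - y) ** 2).sum(1)))

    # Phase 1: move the winning cluster unit toward each training pair
    for _ in range(phase1_epochs):
        for x, y in zip(X, Y):
            J = winner(x, y)
            v[J] += alpha * (x - v[J])
            w[J] += beta * (y - w[J])
        alpha *= decay                  # Step 7: reduce learning rates
        beta *= decay

    # Phase 2: keep refining the clusters slowly while learning the
    # output (lookup-table) weights t and u of each cluster unit
    for _ in range(phase2_epochs):
        for x, y in zip(X, Y):
            J = winner(x, y)
            v[J] += alpha * (x - v[J])  # alpha, beta now small and constant
            w[J] += beta * (y - w[J])
            u[J] += a * (y - u[J])      # cluster -> Y* update
            t[J] += b * (x - t[J])      # cluster -> X* update
    return v, w, t, u
```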
Which cluster is the winner? • dot product (find the cluster with the largest net input) • Euclidean distance (find the cluster with smallest square distance from the input)
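Both criteria in a few illustrative lines (cluster weights stored one row per unit; the function names are placeholders):

```python
import numpy as np

def winner_dot(x, v):
    # dot product: cluster with the largest net input
    # (assumes the weight vectors and inputs are normalized)
    return int(np.argmax(v @ x))

def winner_euclidean(x, v):
    # Euclidean distance: cluster with the smallest squared distance
    return int(np.argmin(((v - x) ** 2).sum(axis=1)))
```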
Full Counterpropagation Application • The application procedure for counterpropagation is as follows: • Step 0: initialize weights. • Step 1: for each input pair x:y, do Steps 2-4. • Step 2: set X input layer activations to vector x; set Y input layer activations to vector y;
Full Counterpropagation Application (cont.) • Step 3: find the cluster unit ZJ that is closest to the input pair • Step 4: compute approximations to x and y: • x*i = tJi • y*k = uJk
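A minimal sketch of this recall procedure, reusing the v, w, t, u arrays from the training sketch above; either half of the pair may be omitted:

```python
import numpy as np

def full_cpn_recall(x, y, v, w, t, u):
    # x or y may be None when only half of the pair is available
    d = np.zeros(v.shape[0])
    if x is not None:
        d += ((v - x) ** 2).sum(axis=1)
    if y is not None:
        d += ((w - y) ** 2).sum(axis=1)
    J = int(np.argmin(d))               # Step 3: closest cluster unit
    return t[J], u[J]                   # Step 4: x*_i = t_Ji, y*_k = u_Jk
```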
Full counterpropagation example • Function approximation of y = 1/x • After the training phase we have • Cluster unit v w • z1 0.11 9.0 • z2 0.14 7.0 • z3 0.20 5.0 • z4 0.30 3.3 • z5 0.60 1.6 • z6 1.60 0.6 • z7 3.30 0.3 • z8 5.00 0.2 • z9 7.00 0.14 • z10 9.00 0.11
Full counterpropagation example (cont.) • [Figure: the trained example network for y = 1/x, with inputs X1 and Y1, cluster units Z1 … Z10, and outputs Y1* and X1*; the connection weights are the v and w values listed in the table above]
Full counterpropagation example (cont.) • To approximate a value of y for x = 0.12 • As we don't know anything about y, compute D using x alone • D1 = (0.12 - 0.11)² = .0001, D2 = .0004, D3 = .0064, D4 = .032, D5 = .23, D6 = 2.2, D7 = 10.1, D8 = 23.8, D9 = 47.3, D10 = 78.9 • D1 is the smallest, so z1 wins and the approximation is y* = w1 = 9.0 (the exact value is 1/0.12 ≈ 8.33)
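The same lookup written out in a few lines, using the v and w columns from the table above (an illustrative sketch):

```python
import numpy as np

v = np.array([0.11, 0.14, 0.20, 0.30, 0.60, 1.6, 3.3, 5.0, 7.0, 9.0])
w = np.array([9.0, 7.0, 5.0, 3.3, 1.6, 0.6, 0.3, 0.2, 0.14, 0.11])

x = 0.12
D = (x - v) ** 2            # y is unknown, so only x contributes to D
J = int(np.argmin(D))       # D1 = 0.0001 is smallest, so z1 wins
print(J + 1, w[J])          # -> 1 9.0   (exact value: 1/0.12 ≈ 8.33)
```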
Forward Only Counterpropagation • Is a simplified version of full counterpropagation • Is intended to approximate a function y = f(x) that is not necessarily invertible • It may be used if the mapping from x to y is well defined, but the mapping from y to x is not.
Forward Only Counterpropagation Architecture • [Figure: the input layer (X1 … Xn) feeds the cluster layer (Z1 … Zp) through weights w, and the cluster layer feeds the output layer (Y1 … Ym) through weights u]
Forward Only Counterpropagation Algorithm • Step 1. Initialize weights, learning rates, etc. • Step 2. While stopping condition for Phase 1 is false, do Steps 3-8 • Step 3. For each training input x, do Steps 4-6 • Step 4. Set X input layer activations to vector x • Step 5. Find the winning cluster unit; call its index J • Step 6. Update weights for unit ZJ: • Step 7. Reduce the learning rate α • Step 8. Test stopping condition for Phase 1 training.
Forward Only Counterpropagation Algorithm (cont.) • Step 9. While stopping condition for Phase 2 is false, do Steps 10-16 • (Note: α is small and constant during phase 2) • Step 10. For each training input pair x:y, do Steps 11-14 • Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y. • Step 12. Find the winning cluster unit; call its index J • Step 13. Update weights for unit ZJ (α is small) • Step 14. Update weights from unit ZJ to the output layers • Step 15. Reduce the learning rate a. • Step 16. Test stopping condition for Phase 2 training.
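Assuming the standard forward-only formulation (w the input-to-cluster weights, u the cluster-to-output weights), the updates referred to in Steps 6, 13, and 14 are:
$$w_{iJ}(\text{new}) = w_{iJ}(\text{old}) + \alpha\,[x_i - w_{iJ}(\text{old})], \qquad u_{Jk}(\text{new}) = u_{Jk}(\text{old}) + a\,[y_k - u_{Jk}(\text{old})]$$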
Forward Only Counterpropagation Application • Step 0: initialize weights (from the training in the previous subsection). • Step 1: present input vector x. • Step 2: find the unit J closest to vector x. • Step 3: set activations of the output units: yk = uJk
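A minimal sketch of this recall step (names and array shapes are illustrative assumptions):

```python
import numpy as np

def forward_cpn_recall(x, w, u):
    # w: (clusters, n) input-to-cluster weights, u: (clusters, m) output weights
    J = int(np.argmin(((w - x) ** 2).sum(axis=1)))   # Step 2: nearest cluster
    return u[J]                                      # Step 3: y_k = u_Jk
```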
Forward only counterpropagation example • Function approximation of y = 1/x • After the training phase we have • Cluster unit w u • z1 0.5 5.5 • z2 1.5 0.75 • z3 2.5 0.4 • z4 . . • z5 . . • z6 . . • z7 . . • z8 . . • z9 . . • z10 9.5 0.1
Function Approximation Using Neural Network • Introduction • Development of Neural Network Weight Equations • Algebra Training Algorithms • Exact Matching of Function Input-Output Data • Approximate Matching of Gradient Data in Algebra Training • Approximate Matching of Function Input-Output Data • Exact Matching of Function Gradient Data
Introduction • analytical description for a set of data • referred to as data modeling or system identification
Standard tools • Splines • Wavelets • Neural networks
Why Use Neural Networks • splines & wavelets do not generalize well to spaces of dimension higher than three • universal approximators • parallel architecture • can be trained to map multidimensional nonlinear functions
Why Use Neural Networks (cont.) • Central to the solution of differential equations • Provide differentiable, closed-analytic-form solutions • have very good generalization properties • widely applicable • training translates into a set of nonlinear, transcendental weight equations • cascade structure • nonlinearity of the hidden nodes • linear operations in the input and output layers
Function Approximation Using Neural Network • functions not known analytically • have a set of precise input–output samples • functions modeled using an algebraic approach • design objectives: • exact matching • approximate matching • feedforward neural networks • Data: • Input • Output • And/or gradient information
Objective • exact solutions • sufficient degrees of freedom • retaining good generalization properties • synthesize a large data set by a parsimonious network
Input-to-node values • the basis of algebraic training • if the inputs to all sigmoidal functions are known, the weight equations become algebraic • the input-to-node values are the inputs to the sigmoidal functions • they determine the saturation level of each sigmoid at a given data point
Weight equations structure • makes it possible to analyze & train a nonlinear neural network by means of • linear algebra • controlling the distribution of the input-to-node values • controlling the saturation level of the active nodes
Function Approximation Using Neural Network • Introduction • Development of Neural Network Weight Equations • Algebra Training Algorithms • Exact Matching of Function Input-Output Data • Approximate Matching of Gradient Data in Algebra Training • Approximate Matching of Function Input-Output Data • Exact Matching of Function Gradient Data
Development of Neural Network Weight Equations • Objective • approximate a smooth scalar function of q inputs • using a feedforward sigmoidal network
Derivative information • can improve the network's generalization properties • partial derivatives of the function with respect to its inputs • can be incorporated in the training set
Network Output • z: the network output, computed as a nonlinear transformation of the input • w: input weights • p: input • d: input bias • b: output bias • v: output weights • σ: sigmoid functions • n: input-to-node variables
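Assuming the usual single-hidden-layer sigmoidal form with the symbols above (s hidden nodes), the output and the input-to-node variables are:
$$z = \sum_{j=1}^{s} v_j\,\sigma(n_j) + b, \qquad n_j = \mathbf{w}_j^{T}\mathbf{p} + d_j$$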
Exactly Match of the Function’s Outputs • output weighted equation
Gradient Equations • derivative of the network output with respect to its inputs
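Under the output form assumed above, this derivative is:
$$\frac{\partial z}{\partial p_i} = \sum_{j=1}^{s} v_j\,\sigma'(n_j)\,w_{ji}$$
since ∂n_j/∂p_i = w_{ji}.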
Exact Matching of the Function’s Derivatives • gradient weight equations
Input-to-node Weight Equations • obtained by rewriting equation (12)
Four Algebraic Algorithms • Exact Matching of Function Input-Output Data • Approximate Matching of Gradient Data in Algebra Training • Approximate Matching of Function Input-Output Data • Exact Matching of Function Gradient Data
Function Approximation Using Neural Network • Introduction • Development of Neural Network Weight Equations • Algebra Training Algorithms • Exact Matching of Function Input-Output Data • Approximate Matching of Gradient Data in Algebra Training • Approximate Matching of Function Input-Output Data • Exact Matching of Function Gradient Data
A. Exact Matching of Function Input-Output Data • Input: S is a known p × s matrix • strategy for producing a well-conditioned S • the input weights are drawn as random numbers from N(0,1) and scaled by a factor L • L is a user-defined scalar, chosen to give input-to-node values that do not saturate the sigmoids
Input bias • The input bias d is computed to center each sigmoid at one of the training pairs
Finally, the linear system in (9) is solved for v by inverting S
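A short sketch of the whole procedure (tanh as an assumed sigmoid, the output bias ignored, and every name below illustrative rather than taken from the paper):

```python
# Random input weights scaled by a user factor L, input biases that center
# each sigmoid at one training input, then a single linear solve for v.
import numpy as np

def sigma(n):
    return np.tanh(n)                      # assumed sigmoid

def exact_io_match(P, u, L=1.0, seed=0):
    # P: (p, q) training inputs, u: (p,) training outputs -> s = p hidden nodes
    p, q = P.shape
    rng = np.random.default_rng(seed)
    W = L * rng.standard_normal((p, q))    # input weights ~ N(0,1), scaled by L
    d = -np.einsum('ij,ij->i', W, P)       # d_j = -w_j . p_j centers sigmoid j
    N = P @ W.T + d                        # (p, p) input-to-node values
    S = sigma(N)                           # known p x s matrix of sigmoid values
    v = np.linalg.solve(S, u)              # solve S v = u for the output weights
    return W, d, v

def predict(x, W, d, v):
    return sigma(W @ x + d) @ v            # exact at the p training inputs
```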
Fig. 2-a. Exact input-output-based algebraic algorithm