Least-squares-based multilayer perceptron training with weighted adaptation -- Software simulation project
EE 690 Design of Embodied Intelligence
Outline • Multilayer Perceptron • Least-squares based Learning Algorithm • Weighted Adaptation in training • Signal-to-Noise Ratio Figure and Overfitting • Software simulation project
Multilayer perceptron (MLP)
Feedforward (no recurrent connections) network with units arranged in layers, mapping inputs x to outputs z
Multilayer perceptron (MLP)
• Efficient mapping from inputs to outputs
• Powerful universal function approximator
• Number of inputs and outputs determined by the data
• Number of hidden neurons and number of hidden layers chosen by the designer
Multilayer Perceptron Learning
Back-propagation (BP) training algorithm: how much each weight is responsible for the error signal
BP has two phases:
• Forward pass phase: feedforward propagation of input signals through the network (input layer, hidden layer, output layer)
• Backward pass phase: propagates the error backwards through the network
Multilayer Perceptron Learning
• Backward pass: we want to know how to modify the weights in order to decrease the error E.
• Use gradient descent: Δw = -η ∂E/∂w (a minimal sketch follows this slide)
• Gradient-based adjustment can get stuck in local minima
• Training is time-consuming due to the large number of learning steps, and the step size (learning rate) must be chosen by the user
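For contrast with the least-squares approach, here is a minimal sketch of one gradient-descent (delta-rule) update for a single tanh output layer. This is an illustration only, not the project code; the learning rate eta and the variable shapes are assumptions.

% gradient-descent update sketch for one tanh layer (not the project code)
% x: M x N inputs, d: 1 x N desired outputs, W: 1 x M weights, b: scalar bias, eta: learning rate
N = size(x, 2);
y = W*x + b;                       % pre-activation
z = tanh(y);                       % layer output
e = z - d;                         % output error
delta = e .* (1 - z.^2);           % error times tanh derivative dz/dy
W = W - eta * (delta * x') / N;    % weight update W <- W - eta * dE/dW, averaged over samples
b = b - eta * mean(delta);         % bias update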
Least-squares based Learning Algorithm
• Least-squares fit (LSF): obtains the minimum sum of squared errors (SSE)
• For an overdetermined problem, the LSF finds the solution with the minimum SSE
• For an underdetermined problem, the pseudo-inverse finds the solution with the minimum norm
• Can be applied to optimize either the weights or the signals on the layers (a pinv sketch follows)
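A minimal sketch of the basic LSF step using MATLAB's pinv; the names A, t, w are illustrative only.

% least-squares fit sketch: solve A*w ≈ t for w
% overdetermined case (more equations than unknowns): pinv gives the minimum-SSE solution
% underdetermined case (fewer equations than unknowns): pinv gives the minimum-norm solution
w = pinv(A) * t;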
Least-squares based Learning Algorithm (I)
(Network notation: input x, hidden-layer pre-activations y1 and outputs z1, output-layer pre-activations y2 and outputs z2, weights and biases W1, b1, W2, b2, desired output d.)
I. Start with back-propagation of the desired output signal and signal optimization
• Propagation of the desired outputs back through the layers
• Optimization of the weights between layers (a sketch of one pass follows this slide)
(1). y2 = f^-1(z2), where z2 = d is scaled to (-1, 1).
(2). Based on W2, b2, solve W2·z1 = y2 - b2 for z1.
(3). y1 = f^-1(z1), with z1 scaled to (-1, 1).
(4). Optimize W1, b1 to satisfy W1·x + b1 = y1.
(5). Evaluate z1, y1 using the new W1 and bias b1.
(6). Optimize W2, b2 to satisfy W2·z1 + b2 = y2.
(7). Evaluate z2, y2 using the new W2 and bias b2.
(8). Evaluate the MSE.
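A hedged MATLAB sketch of one pass of Algorithm (I), assuming tanh units, pinv for each least-squares step, and a simple rescaling into (-1, 1); the scaling helper, its placement, and the variable shapes are assumptions, not the project code.

% one pass of Algorithm (I) sketch: back-propagate desired outputs, then LS-fit the weights
% x: M x N inputs, d: K x N desired outputs, W1: H x M, b1: H x 1, W2: K x H, b2: K x 1
N = size(x, 2);
scale = @(s) 0.99 * s / (max(abs(s(:))) + eps);        % assumed rescaling into (-1, 1)
ds = scale(d);
y2 = atanh(ds);                                        % (1) invert the output transfer function
z1 = pinv(W2) * (y2 - repmat(b2, 1, N));               % (2) back-propagate through W2, b2
y1 = atanh(scale(z1));                                 % (3) invert the hidden transfer function
Wb1 = y1 * pinv([x; ones(1, N)]);                      % (4) LS fit of [W1 b1] to W1*x + b1 = y1
W1 = Wb1(:, 1:end-1);  b1 = Wb1(:, end);
y1 = W1*x + repmat(b1, 1, N);  z1 = tanh(y1);          % (5) re-evaluate hidden signals
Wb2 = y2 * pinv([z1; ones(1, N)]);                     % (6) LS fit of [W2 b2] to W2*z1 + b2 = y2
W2 = Wb2(:, 1:end-1);  b2 = Wb2(:, end);
z2 = tanh(W2*z1 + repmat(b2, 1, N));                   % (7) network output
mse = mean((z2(:) - ds(:)).^2);                        % (8) evaluate the MSE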
Least-squares based Learning Algorithm (I)
• Weights optimization with weighted LSF: optimize W1, b1 to satisfy W1·x + b1 = y1
• The location of a sample on the transfer function determines its effect on the output signal of this layer: for the same Δx, the output change Δy depends on the local slope, so the derivative dy/dx is used as the weighting term in the LSF (weighted LSF; a sketch follows).
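A hedged sketch of a weighted least-squares fit, using the transfer-function slope at each sample as its weight; the tanh slope 1 - tanh(t).^2 and the names A, t, w are assumptions for illustration.

% weighted LSF sketch: samples where the transfer function is steep get larger weight
% A: N x (M+1) regressor matrix [x' 1], t: N x 1 target pre-activations
s = 1 - tanh(t).^2;                   % assumed weight: transfer-function slope at each sample
Sw = diag(sqrt(s));                   % square-root weighting matrix
w = pinv(Sw * A) * (Sw * t);          % minimizes sum_i s_i * (A(i,:)*w - t(i))^2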
Least-squares based Learning Algorithm (II)
II. Weights optimization with iterative fitting
• W1 can be further adjusted based on the output error
• Each hidden neuron acts as a basis function
• Start with the 1st hidden neuron, and continue to the other neurons as long as output error e_out remains (see the sketch below)
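A hedged sketch of the basis-function idea behind iterative fitting: each hidden neuron's output is fitted to the remaining output error, one neuron at a time. This only illustrates the residual-fitting loop for the output weights; adjusting W1 itself, as the slide describes, is more involved. The loop structure, tolerance, and names are assumptions.

% iterative neuron fitting sketch: fit hidden-neuron basis functions one at a time
% Z1: H x N matrix of hidden outputs (basis functions), d: 1 x N desired output
H = size(Z1, 1);
tol = 1e-6;                                              % assumed stopping tolerance
b2 = mean(d);
e_out = d - b2;                                          % start with the full residual
w2 = zeros(1, H);
for h = 1:H                                              % continue while output error remains
    basis = Z1(h, :);
    w2(h) = (basis * e_out') / (basis * basis' + eps);   % 1-D least-squares fit to the residual
    e_out = e_out - w2(h) * basis;                       % remove the fitted component
    if mean(e_out.^2) < tol, break; end                  % stop when the output error is gone
end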
Least-squares based Learning Algorithm (III)
III. Start with feedforward propagation of the inputs, then weights optimization
• Propagation of the inputs forward through the layers
• Optimization of the weights between layers and of the signals on the layers (a sketch of step (4) follows)
(1). Evaluate z1, y1 using the initial W1 and bias b1.
(2). y2 = f^-1(d).
(3). Optimize W2, b2 to satisfy W2·z1 + b2 = y2.
(4). Based on W2, b2, optimize z1 to satisfy W2·z1 + b2 = y2.
(5). y1 = f^-1(z1).
(6). Optimize W1, b1 to satisfy W1·x + b1 = y1.
(7). Evaluate y1, z1, y2, z2 using the new W1, W2 and biases b1, b2.
(8). Evaluate the MSE.
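A hedged sketch of the signal-optimization step (4), solving for the hidden signals z1 with W2 and b2 held fixed; pinv and the variable shapes are assumptions.

% step (4) sketch: with W2, b2 fixed, find hidden signals z1 that best satisfy W2*z1 + b2 = y2
% W2: K x H output weights, b2: K x 1 bias, y2: K x N target pre-activations
z1 = pinv(W2) * (y2 - repmat(b2, 1, size(y2, 2)));   % least-squares / minimum-norm solution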
Least-squares based Learning Algorithm (III)
• Signal optimization with weighted adaptation: the location of the signal on the transfer function determines how much it can be changed.
Overfitting problem
• A learning algorithm can adapt the MLP to fit the training data closely.
• For noisy training data, how closely should the data be learned? Overfitting.
• The number of hidden neurons and the number of layers affect the training accuracy; they are chosen by the user, so this choice is critical.
• Optimized Approximation Algorithm: the SNRF criterion
Signal-to-noise ratio figure (SNRF)
• Sampled data: function value + noise
• Error signal: approximation error component (useful signal, should be reduced by further learning) + noise component (should not be learned)
• Assumptions: continuous function, white Gaussian noise (WGN)
• Signal-to-noise ratio figure (SNRF): signal energy / noise energy
• Compare SNRF_e with SNRF_WGN: if useful signal is left unlearned, continue training; if noise dominates the error signal, learning should stop.
Signal-to-noise ratio figure (SNRF): figure showing the training data and the approximating function; the error signal = approximation error component + noise component.
Optimization using SNRF
• Start with a small network (small number of neurons or layers)
• Train the MLP and obtain the training error e_train
• Compare SNRF_e and SNRF_WGN
• If useful signal remains, add hidden neurons and repeat
• Stopping criterion: SNRF_e < threshold SNRF_WGN, i.e., noise dominates the error signal, little information is left unlearned, and learning should stop (see the sketch below).
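A hedged sketch of the SNRF-driven growth loop. train_mlp_ls, estimate_snrf, and estimate_snrf_wgn are hypothetical helper names standing in for the project's training routine and SNRF estimators; the slides do not specify how the WGN threshold is computed.

% SNRF-driven structure optimization sketch (hypothetical helpers, not the project code)
n_hidden = 2;                                        % start with a small network
while true
    [net, e_train] = train_mlp_ls(x, d, n_hidden);   % hypothetical: LS-based MLP training
    snrf_e   = estimate_snrf(e_train);               % hypothetical: SNRF of the error signal
    snrf_wgn = estimate_snrf_wgn(numel(e_train));    % hypothetical: WGN threshold, same length
    if snrf_e < snrf_wgn                             % noise dominates: stop growing
        break;
    end
    n_hidden = n_hidden + 1;                         % useful signal remains: add a hidden neuron
end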
Optimization using SNRF
• The same criterion can be applied to optimize the number of iterations in back-propagation training and avoid overfitting (overtraining)
• Set the structure of the MLP
• Train the MLP with back-propagation for some iterations and obtain e_train
• Compare SNRF_e and SNRF_WGN
• If useful signal remains, keep training with more iterations
Software simulation project
• Prepare the data
• "Features": an M x N matrix, with the M features along the rows and the N data samples along the columns
• "Values": a 1 x N row vector of desired outputs
• Save "features" and "values" in a training MAT file (see the sketch below)
• How to run the program
• Run "main_MLP_LS.m"
• Specify the MAT file path and name and the MLP parameters in the command window.
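A minimal sketch of preparing such a MAT file, assuming the variable names features and values described above; the random data and the file name are placeholders only.

% data preparation sketch: M features x N samples, plus a 1 x N row of desired values
M = 4;  N = 500;
features = rand(M, N);               % placeholder input data, one sample per column
values   = sum(features, 1) / M;     % placeholder desired outputs, 1 x N
save('my_training_data.mat', 'features', 'values');   % assumed file name
% then run main_MLP_LS.m and enter the MAT file path and name when prompted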
Software simulation project
• Input the path where the data file can be found (C:*): E:\Research\MLP_LSInitial_desired\MLP_LS_package\
• Input the name of the data file (*.mat): mackey_glass_data.mat
• There are overall 732 samples. How would you like to divide them into training and testing sets? Number of training samples: 500 Number of testing samples: 232
• How many layers does the MLP have? 3:2:7
• How many neurons are there on each hidden layer? 3:1:10
• What kind of transfer function would you like to have on the hidden neurons?
• 0. Linear transfer function
• 1. Tangent sigmoid
• 2. Logarithmic sigmoid
• 2
Software simulation project
• There are 4 types of training algorithms you can choose from. Which type would you like to use?
• 1. Least-squares based training (I)
• 2. Least-squares based training with iterative neuron fitting (II)
• 3. Least-squares based training with weighted signal adaptation (III)
• 4. Back-propagation training (BP)
• 1
• How many iterations would you like to have in the training? 3
• How many Monte-Carlo runs would you like to have for the training? 2
Software simulation project
• Results: J_train(num_layer, num_neuron), J_test(num_layer, num_neuron), SNRF(num_layer, num_neuron)
• Present training and testing errors for various configurations of the MLP
• Present the optimum configuration found by the SNRF criterion
• Present a comparison of the results, including errors and network structure
Software simulation project
• Typical databases and literature survey
• Function approximation & classification datasets: "IEEE Neural Networks Council Standards Committee Working Group on Data Modeling Benchmarks" http://neural.cs.nthu.edu.tw/jang/benchmark/#MG ; "Neural Network Databases and Learning Data" http://www.neoxi.com/NNR/Neural_Network_Databases.php ; "UCI Machine Learning Repository" http://www.ics.uci.edu/~mlearn/MLRepository.html
• Data are normalized
• Multiple inputs, single output
• For multiple-output data, use separate MLPs
• Compare with results from the literature that use the same dataset (*)