
Back Propagation and Representation in PDP Networks


Presentation Transcript


  1. Back Propagation and Representation in PDP Networks
  Psychology 209, February 6, 2013

  2. Homework 4
  • Part 1 due Feb 13
    • Complete Exercises 5.1 and 5.2.
    • It may be helpful to carry out some explorations of parameters, as suggested in Exercise 5.3. This may help you achieve a solution in the last part of the homework, below. However, no write-up is required for this.
  • Part 2 due Feb 20
    • Consult Chapter 8 of the PDP Book by Rumelhart, Hinton, and Williams (in the readings directory for Feb 6). Consider the problems described there that were solved using back propagation, and choose one; or create a problem of your own to investigate with back propagation.
    • Carry out Exercise 5.4, creating your own network, template, pattern, and startup file (similar to bpxor.m), and answer question 5.4.1.

  3. The Perceptron
  For input pattern p, with teacher t_p and output o_p, change the threshold and weights as follows:
    Δθ = -ε(t_p - o_p)
    Δw_i = ε(t_p - o_p) i_pi
  Note: including the bias (= -θ) in the net input and using a threshold of 0, then treating the bias as a weight from a unit that is always on, is equivalent.
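A minimal sketch of this rule in Python (not part of the original slides): it trains a perceptron on OR, treating the bias as a weight from an always-on unit as in the note above. The learning rate, epoch count, and dataset are illustrative assumptions.

```python
# Minimal perceptron sketch (illustration only; learning rate and dataset are assumptions).
def train_perceptron(patterns, targets, eps=1.0, epochs=25):
    n = len(patterns[0])
    w = [0.0] * (n + 1)          # last weight plays the role of the bias (-theta)
    for _ in range(epochs):
        for ip, t in zip(patterns, targets):
            x = list(ip) + [1.0]                      # append the always-on input
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for i in range(n + 1):                    # delta-w_i = eps * (t - o) * x_i
                w[i] += eps * (t - o) * x[i]
    return w

# OR is linearly separable, so this converges; XOR would not.
w_or = train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 1])
```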

  4. AND, OR, XOR

  5. Adding a unit to make XOR solvable
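To make the point concrete, here is a small sketch (not from the slides) showing that XOR becomes linearly separable once a third input computing AND of the original two inputs is added; the particular weights and threshold are one illustrative choice.

```python
# Sketch: XOR becomes linearly separable once an extra unit computing AND(x1, x2) is added.
# The weights (1, 1, -2) and threshold 0.5 are one illustrative solution, not from the slides.
def xor_with_and_unit(x1, x2):
    h = x1 and x2                        # the added unit
    net = 1.0 * x1 + 1.0 * x2 - 2.0 * h  # linear combination of the three inputs
    return 1 if net > 0.5 else 0

assert [xor_with_and_unit(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```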

  6. Gradient Descent Learning in the 'LMS' Associator
  Output is a linear function of the inputs and weights:
    o_p = Σ_i w_i i_pi
  Find a learning rule to minimize the summed squared error:
    E = ½ Σ_p (t_p - o_p)²
  Consider the policy:
    Δw_i ∝ -∂E/∂w_i
  This breaks down into the sum over patterns of terms of the form:
    E_p = ½ (t_p - o_p)²
  Taking derivatives, we find:
    ∂E_p/∂w_i = -(t_p - o_p) i_pi, so Δw_i = ε(t_p - o_p) i_pi
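A brief Python sketch of this pattern-wise delta rule (an illustration, not the exercise's implementation); the learning rate, epoch count, and the OR-with-bias training set are assumptions.

```python
import numpy as np

# Sketch of LMS gradient descent (pattern-wise delta rule); eps and the data are assumptions.
def lms_train(inputs, targets, eps=0.1, epochs=200):
    inputs, targets = np.asarray(inputs, float), np.asarray(targets, float)
    w = np.zeros(inputs.shape[1])
    for _ in range(epochs):
        for ip, t in zip(inputs, targets):
            o = w @ ip                      # linear output: o_p = sum_i w_i * i_pi
            w += eps * (t - o) * ip         # delta-w_i = eps * (t_p - o_p) * i_pi
    return w

# OR patterns with a bias input appended; a linear unit can only approximate OR.
w = lms_train([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], [0, 1, 1, 1])
```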

  7. Error Surface for OR function in LMS Associator

  8. What if we want to learn how to solve XOR? We need to figure out how to adjust the weights into the 'hidden' unit, following the principle of gradient descent.

  9. We start with an even simpler problem
  A chain of three units: input unit 0 feeds unit 1 through weight w_10, and unit 1 feeds output unit 2 through weight w_21.
  Assume units are linear, both weights = .5, and i = 1, t = 1. Weight changes should follow the gradient:
    Δw ∝ -∂E/∂w
  We use the chain rule to calculate ∂E/∂w for each weight. First we unpack the chain, then we calculate the elements of it:
    ∂E/∂w_21 = (∂E/∂a_2)(∂a_2/∂net_2)(∂net_2/∂w_21)
    ∂E/∂w_10 = (∂E/∂a_2)(∂a_2/∂net_2)(∂net_2/∂a_1)(∂a_1/∂net_1)(∂net_1/∂w_10)
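The following sketch (not part of the slides) works through these chains numerically for the values given, assuming the error measure E = ½(t − a_2)², and checks the result against a finite-difference estimate.

```python
# Numeric check of the chain-rule gradients for the linear chain 0 -> 1 -> 2
# with w10 = w21 = 0.5, i = 1, t = 1. Assumes E = 0.5 * (t - a2)**2.
w10, w21, i, t = 0.5, 0.5, 1.0, 1.0

a1 = w10 * i          # 0.5   (linear unit: activation = net input)
a2 = w21 * a1         # 0.25

dE_da2  = -(t - a2)           # -0.75
dE_dw21 = dE_da2 * a1         # -0.375
dE_dw10 = dE_da2 * w21 * i    # -0.375  (the error must pass through w21 to reach w10)

# Finite-difference check of dE/dw10:
def E(w10_, w21_):
    return 0.5 * (t - w21_ * (w10_ * i)) ** 2

h = 1e-6
assert abs((E(w10 + h, w21) - E(w10 - h, w21)) / (2 * h) - dE_dw10) < 1e-6
```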

  10. Including a non-linear activation function
  • Let a_i = f(net_i) = 1 / (1 + e^(-net_i))
  • Then f'(net_i) = a_i(1 - a_i)
  • So our chains from before become:
    ∂E/∂w_21 = -(t - a_2) f'(net_2) a_1
    ∂E/∂w_10 = -(t - a_2) f'(net_2) w_21 f'(net_1) a_0
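A quick numerical check (illustration only) that the logistic derivative really is a(1 − a), using an arbitrary net input:

```python
import math

# Assuming f is the logistic function, verify numerically that f'(net) = a * (1 - a).
def f(net):
    return 1.0 / (1.0 + math.exp(-net))

net = 0.3                                   # arbitrary example value
a = f(net)
h = 1e-6
numeric_fprime = (f(net + h) - f(net - h)) / (2 * h)
assert abs(numeric_fprime - a * (1 - a)) < 1e-8
```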

  11. Including the activation function in the chain rule, and including more than one output unit, leads to the formulation below, in which we use δ_i to represent -∂E/∂net_i.
  Calculating the δ term for an output unit i:
    δ_i = (t_i - a_i) f'(net_i)
  And the δ term for a hidden unit j:
    δ_j = f'(net_j) Σ_i δ_i w_ij
  We can continue this back indefinitely:
    δ_s = f'(net_s) Σ_r δ_r w_rs
  The weight change rule at every layer is:
    Δw_rs = ε δ_r a_s
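As a small numeric illustration of the δ recursion (all activation, weight, and target values below are made up for the example):

```python
# Tiny numeric illustration of the delta recursion (all values are made-up assumptions).
def fprime(a):                 # logistic derivative expressed via the activation: a(1 - a)
    return a * (1 - a)

# One output unit i with target t, activation a_i, and weights w_ij to two hidden units j.
t, a_i = 1.0, 0.73
w_ij = [0.4, -0.6]             # w_i1, w_i2
a_j  = [0.55, 0.35]            # hidden-unit activations

delta_i = (t - a_i) * fprime(a_i)                   # output delta: (t_i - a_i) f'(net_i)
delta_j = [fprime(aj) * delta_i * wij               # hidden deltas; with a single output
           for aj, wij in zip(a_j, w_ij)]           #  unit the sum has only one term
dw_ij   = [0.5 * delta_i * aj for aj in a_j]        # weight changes, eps = 0.5 assumed
```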

  12. Back propagation algorithm
  • Propagate activation forward.
    • Activation can only flow from lower-numbered units to higher-numbered units.
  • Propagate "error" backward.
    • Error flows from higher-numbered units back to lower-numbered units.
  • Calculate 'weight error derivative' terms = δ_r a_s.
  • One can change weights after processing a single pattern or accumulate weight error derivatives over a batch of patterns before changing the weights.
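Putting the pieces together, here is a minimal pattern-wise back propagation sketch for a 2-2-1 logistic network learning XOR. This is an illustration only; the architecture, learning rate, initialization range, and epoch count are assumptions and are not the homework's bpxor.m setup.

```python
import numpy as np

# Sketch: pattern-wise back propagation for a 2-2-1 logistic network on XOR.
# Architecture, learning rate, and initialization are illustrative assumptions.
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (2, 3))   # hidden weights (last column is the bias)
W2 = rng.uniform(-0.5, 0.5, (1, 3))   # output weights (last column is the bias)
eps = 0.5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)

for epoch in range(5000):
    for x, t in zip(X, T):
        # Forward pass: activation flows from input to hidden to output.
        xb = np.append(x, 1.0)                 # bias as an always-on unit
        a_h = sigmoid(W1 @ xb)
        hb = np.append(a_h, 1.0)
        a_o = sigmoid(W2 @ hb)
        # Backward pass: delta_o = (t - a) f'(net); delta_h = f'(net) * sum_i delta_i w_ij.
        d_o = (t - a_o) * a_o * (1 - a_o)
        d_h = a_h * (1 - a_h) * (W2[:, :2].T @ d_o)
        # Weight change rule at every layer: delta_w_rs = eps * delta_r * a_s.
        W2 += eps * np.outer(d_o, hb)
        W1 += eps * np.outer(d_h, xb)

# Outputs should approach [0, 1, 1, 0], though convergence depends on the random start.
for x in X:
    out = sigmoid(W2 @ np.append(sigmoid(W1 @ np.append(x, 1.0)), 1.0))
    print(x, round(float(out[0]), 2))
```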

  13. Variants/Embellishments to back propagation
  • Full "batch mode" (epoch-wise) learning rule with weight decay and momentum:
    Δw_rs = ε Σ_p δ_rp a_sp - ω w_rs + α Δw_rs(prev)
  • Weights can alternatively be updated after each pattern or after every k patterns.
  • An alternative error measure has both conceptual and practical advantages:
    CE_p = -Σ_i [t_ip log(a_ip) + (1 - t_ip) log(1 - a_ip)]
  • If targets are actually probabilistic, minimizing CE_p maximizes the probability of the observed target values.
  • This also eliminates the 'pinned output unit' problem.
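The sketch below (illustration only; the coefficient names wd and momentum, and all numeric values, are assumptions) shows the batch-mode update with weight decay and momentum, the cross-entropy measure CE_p, and why the output δ simplifies with a logistic unit.

```python
import numpy as np

# Sketch of the batch-mode (epoch-wise) update with weight decay and momentum:
#   delta_W = eps * sum_p outer(delta_p, a_p) - wd * W + momentum * delta_W_prev
# eps, wd, and momentum values are illustrative assumptions.
def batch_update(W, deltas, acts, dW_prev, eps=0.1, wd=0.0001, momentum=0.9):
    grad = sum(np.outer(d, a) for d, a in zip(deltas, acts))   # accumulated over the batch
    dW = eps * grad - wd * W + momentum * dW_prev
    return W + dW, dW

# Tiny usage example with made-up values: one output unit, two inputs plus bias.
W = np.zeros((1, 3))
dW_prev = np.zeros_like(W)
W, dW_prev = batch_update(W, [np.array([0.2])], [np.array([1.0, 0.0, 1.0])], dW_prev)

# Cross-entropy error for one pattern: CE_p = -sum_i [t_i log a_i + (1 - t_i) log(1 - a_i)].
def cross_entropy(t, a):
    return -np.sum(t * np.log(a) + (1 - t) * np.log(1 - a))

# With a logistic output unit, dCE/dnet_i = -(t_i - a_i), so the output delta is simply
# (t_i - a_i): the f'(net) factor cancels, and a 'pinned' (saturated) output unit still
# receives a large error signal.
```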

  14. Why is back propagation important?
  • Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem.
  • Contrary to expectation, it does not get stuck in local minima except in cases where the network is exceptionally tightly constrained.
  • Allows networks to learn how to represent information as well as how to use it.
  • Raises questions about the nature of representations and of what must be specified in order to learn them.

  15. Is Backprop biologically plausible?
  • Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell.
  • But we shouldn't be too literal-minded about the actual biological implementation of the learning rule.
  • Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information.
