The relation between biological vision and computer vision Principles of Back-Propagation Prof. Bart ter Haar Romeny
How does this actually work? Deep Learning Convolutional Neural Networks. In AlexNet (Alex Krizhevsky, 2012): error backpropagation. ImageNet challenge: 1.4 million images, 1000 classes, accuracy 75% → 94%. A typical big deep NN has (hundreds of) millions of connections: weights. Convolution, ReLU, max pooling, convolution, convolution, etc.
A numerical example of backpropagation on a simple network, from Prakash Jay, Senior Data Scientist @FractalAnalytics: https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
Approach • Build a small neural network as defined in the architecture on the right. • Initialize the weights and biases randomly. • Fix the input and output. • Forward-pass the inputs and calculate the cost. • Compute the gradients and errors. • Backpropagate and adjust the weights and biases accordingly. We initialize the network randomly:
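A minimal sketch of this random initialization in code (the 3-3-3-3 layer sizes, the NumPy initialization scale, and the input values are assumptions for illustration; only the 3-class target [1.0, 0.0, 0.0] is given on a later slide):

import numpy as np

rng = np.random.default_rng(0)

# Assumed architecture for illustration: 3 inputs, a ReLU hidden layer,
# a sigmoid hidden layer, and a 3-class softmax output (3-3-3-3),
# matching the 3-element output vectors on the following slides.
sizes = [3, 3, 3, 3]

# Random initialization of one weight matrix and one bias row per layer.
W = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [rng.normal(scale=0.5, size=(1, n)) for n in sizes[1:]]

# Fixed input (placeholder values) and target output from the later slide.
x = np.array([[0.1, 0.2, 0.7]])
y = np.array([[1.0, 0.0, 0.0]])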
Forward pass layer 1: Matrix operation: ReLU operation: Example:
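A sketch of this step in code, assuming the x, W, b variables initialized above:

def relu(z):
    # ReLU operation: element-wise max(0, z)
    return np.maximum(0.0, z)

# Matrix operation: z1 = x·W1 + b1, followed by the ReLU operation.
z1 = x @ W[0] + b[0]
a1 = relu(z1)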
Forward pass layer 2: Matrix operation: Sigmoid operation: Example:
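The same pattern for layer 2, assuming a1 from the previous step:

def sigmoid(z):
    # Sigmoid operation: 1 / (1 + exp(-z)), element-wise
    return 1.0 / (1.0 + np.exp(-z))

# Matrix operation: z2 = a1·W2 + b2, followed by the sigmoid operation.
z2 = a1 @ W[1] + b[1]
a2 = sigmoid(z2)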
Forward pass output layer: Matrix operation: Softmax operation: Example: [0.19858, 0.28559, 0.51583]
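A sketch of the output layer, assuming a2 from layer 2 (with the slide's weights this gives the example vector above; with the random initialization here the numbers will differ):

def softmax(z):
    # Softmax operation: exponentiate (shifted for numerical stability) and normalize.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Matrix operation: z3 = a2·W3 + b3, followed by the softmax operation.
z3 = a2 @ W[2] + b[2]
p = softmax(z3)   # slide example: [0.19858, 0.28559, 0.51583]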
Analysis: • The actual output should be [1.0, 0.0, 0.0], but we got [0.19858, 0.28559, 0.51583]. • To calculate the error, let us use cross-entropy. • Error = -(1·log(0.19858) + 0·log(1-0.19858) + 0·log(0.28559) + 1·log(1-0.28559) + 0·log(0.51583) + 1·log(1-0.51583)) = 2.67818 We are done with the forward pass. We know the error of the first iteration (we will do this numerous times). Now let us study the backward pass.
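The slide's error measure written out as code, for the predicted vector p and target y defined above:

def cross_entropy(p, y):
    # Per-class cross-entropy as on this slide:
    # E = -sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

error = cross_entropy(p, y)   # 2.67818 for the slide's example values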
A chain of functions: From Rohan Kapur: https://ayearofai.com/rohan-lenny-1-neural-networks-the-backpropagation-algorithm-explained-abf4609d4f9d
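For the three-layer network of the example above, this chain of functions can be written as (a sketch, with the notation assumed to match the forward-pass slides):

p = \operatorname{softmax}\!\big(W_3\,\sigma\!\big(W_2\,\operatorname{ReLU}(W_1 x + b_1) + b_2\big) + b_3\big)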
For gradient descent, the derivative of this function with respect to some arbitrary weight (for example w1) is calculated by applying the chain rule. For a simple error measure (p = predicted, a = actual):
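Spelled out, assuming the common squared-error form for the simple error measure:

E = \tfrac{1}{2}(p - a)^2, \qquad
\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial p}\cdot\frac{\partial p}{\partial w_1}, \qquad
\frac{\partial E}{\partial p} = p - a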
Important derivatives: Sigmoid: ReLU: SoftMax:
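Written out, these derivatives take the standard forms:

\sigma'(x) = \sigma(x)\big(1 - \sigma(x)\big), \qquad
\operatorname{ReLU}'(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}, \qquad
\frac{\partial s_i}{\partial x_j} = s_i\,(\delta_{ij} - s_j) \quad \text{for } s = \operatorname{softmax}(x)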
Two slides ago, we saw that:
Going one more layer backwards, we can determine the gradients for the earlier weights in the same way. And finally, we update the weights and iterate until convergence:
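A compact sketch of the backward pass and gradient-descent update for the network sketched above. The learning rate eta is an assumed value, and the error is taken here as the categorical cross-entropy -Σ yᵢ log pᵢ, whose gradient through the softmax simplifies to p - y (slightly different from the per-class error measure on the earlier slide):

eta = 0.01   # learning rate (assumed value)

for step in range(1000):
    # Forward pass, as on the previous slides.
    z1 = x @ W[0] + b[0]; a1 = relu(z1)
    z2 = a1 @ W[1] + b[1]; a2 = sigmoid(z2)
    z3 = a2 @ W[2] + b[2]; p = softmax(z3)

    # Backward pass, layer by layer (chain rule).
    d3 = p - y                              # dE/dz3 for softmax + categorical cross-entropy
    d2 = (d3 @ W[2].T) * a2 * (1.0 - a2)    # through the sigmoid layer
    d1 = (d2 @ W[1].T) * (z1 > 0)           # through the ReLU layer

    # Gradient-descent update of all weights and biases.
    W[2] -= eta * a2.T @ d3;  b[2] -= eta * d3
    W[1] -= eta * a1.T @ d2;  b[1] -= eta * d2
    W[0] -= eta * x.T @ d1;   b[0] -= eta * d1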
Numerical example in great detail by Prakash Jay on Medium.com: • https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
Deeper reading: • https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative • https://eli.thegreenplace.net/2018/backpropagation-through-a-fully-connected-layer/