Introduction to Neural Networks
Gianluca Pollastri, Head of Lab
School of Computer Science and Informatics and Complex and Adaptive Systems Labs
University College Dublin
gianluca.pollastri@ucd.ie
Credits • Geoffrey Hinton, University of Toronto. • I borrowed some of his slides for the “Neural Networks” and “Computation in Neural Networks” courses. • Paolo Frasconi, University of Florence. • This guy taught me Neural Networks in the first place (*and* I borrowed some of his slides too!).
Recurrent Neural Networks (RNN) • One of the earliest versions: Jeffrey Elman, 1990, Cognitive Science. • Problem: it isn’t easy to represent time with Feedforward Neural Nets: time usually ends up represented as space, with the sequence laid out across the input units. • An attempt to design networks with memory.
RNNs • The idea is to have discrete time steps, and to treat the hidden layer at time t-1 as an extra input at time t. • This effectively removes the cycles: we can model the network with an FFNN, and represent memory explicitly.
[Figure: an RNN with input It, hidden/memory state Xt and output Ot; a delay element (d) feeds Xt back as an input for the next time step.]
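To make the recurrence concrete, here is a minimal NumPy sketch of one discrete time step, with the previous hidden state re-entering as an input through the delay element. All names, sizes and the tanh/linear choices are illustrative assumptions, not taken from the slides:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3                # illustrative sizes

W_in  = rng.normal(0, 0.1, (n_hid, n_in))   # input I(t)    -> memory X(t)
W_rec = rng.normal(0, 0.1, (n_hid, n_hid))  # memory X(t-1) -> memory X(t), via the delay
W_out = rng.normal(0, 0.1, (n_out, n_hid))  # memory X(t)   -> output O(t)

def step(x_prev, i_t):
    # One discrete time step: the old memory x_prev is just another input.
    x_t = np.tanh(W_rec @ x_prev + W_in @ i_t)
    o_t = W_out @ x_t
    return x_t, o_t

x = np.zeros(n_hid)                         # X(0) = 0
for i_t in rng.normal(size=(5, n_in)):      # a toy input sequence of length 5
    x, o = step(x, i_t)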
BPTT • BackPropagation Through Time. • If Ot is the output at time t, It the input at time t, and Xt the memory (hidden state) at time t, we can model the dependencies as follows:
Xt = f( Xt-1 , It )
Ot = g( Xt , It )
BPTT • We can model both f() and g() with (possibly multilayered) networks. • We can transform the recurrent network by unrolling it in time. • Backpropagation works on any DAG. An RNN becomes one once it’s unrolled.
[Figures: the RNN with its delay element again, followed by the same network unrolled in time from t-2 to t+2; each Xt-1 feeds Xt, and each (It, Xt) pair produces Ot.]
Gradient in BPTT

GRADIENT(I, O, T) {
  # I = inputs, O = outputs, T = targets
  n := size(I);                  # sequence length (renamed from T to avoid clashing with the targets)
  X0 := 0;
  for t := 1..n
    Xt := f( Xt-1 , It );        # forward sweep: compute the memory states
  for t := 1..n {
    Ot := g( Xt , It );
    g.gradient( Ot - Tt );       # accumulate g's weight gradients
    δt := g.deltas( Ot - Tt );   # error signal entering Xt
  }
  for t := n..1 {
    f.gradient( δt );            # accumulate f's weight gradients
    δt-1 += f.deltas( δt );      # carry the error back one time step
  }
}
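The pseudocode above in runnable form: a NumPy sketch of BPTT under concrete assumptions of mine (f is a single tanh layer, g a linear read-out of Xt alone, and the loss is squared error; the slides also feed It into g). The forward sweep stores every Xt so the backward sweep can reuse them:

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, T = 4, 8, 3, 6

W_in  = rng.normal(0, 0.1, (n_hid, n_in))
W_rec = rng.normal(0, 0.1, (n_hid, n_hid))
W_out = rng.normal(0, 0.1, (n_out, n_hid))

I   = rng.normal(size=(T, n_in))      # inputs  I(1..T)
Tgt = rng.normal(size=(T, n_out))     # targets T(1..T)

# Forward sweep: unroll in time, keeping every X(t) for the backward sweep.
X = np.zeros((T + 1, n_hid))          # X[0] is X(0) = 0
for t in range(T):
    X[t + 1] = np.tanh(W_rec @ X[t] + W_in @ I[t])
O = X[1:] @ W_out.T                   # O(t) = g(X(t))

# Backward sweep: gradients flow along the unrolled DAG.
dW_in, dW_rec, dW_out = (np.zeros_like(W) for W in (W_in, W_rec, W_out))
delta = np.zeros(n_hid)               # δ carried from time t+1 back to t
for t in reversed(range(T)):
    d_o = O[t] - Tgt[t]               # dE/dO(t) for squared error
    dW_out += np.outer(d_o, X[t + 1])
    d_x = W_out.T @ d_o + delta       # dE/dX(t): local term + carried term
    d_a = d_x * (1.0 - X[t + 1] ** 2) # back through the tanh
    dW_rec += np.outer(d_a, X[t])
    dW_in  += np.outer(d_a, I[t])
    delta = W_rec.T @ d_a             # pass the error back to time t-1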
[Animation, spanning several slides: the unrolled network with It-2 … It+2, Xt-2 … Xt+2 and Ot-2 … Ot+2; successive frames highlight the forward sweep computing each Xt and Ot, then the backward sweep carrying the deltas from Xt+2 back to Xt-2.]
What I will talk about • Neurons • Multi-Layered Neural Networks: • Basic learning algorithm • Expressive power • Classification • How can we *actually* train Neural Networks: • Speeding up training • Learning just right (not too little, not too much) • Figuring out you got it right • Feed-back networks? • Anecdotes on real feed-back networks (Hopfield Nets, Boltzmann Machines) • Recurrent Neural Networks • Bidirectional RNN • 2D-RNN • Concluding remarks
BRNN

Ft = φ( Ft-1 , Ut )
Bt = β( Bt+1 , Ut )
Yt = η( Ft , Bt , Ut )

• φ(), β() and η() are realised with NNs.
• φ(), β() and η() are independent of t: the model is stationary.
Inference in BRNNs

FORWARD(U) {
  T := size(U);
  F0 := BT+1 := 0;
  for t := 1..T
    Ft := φ( Ft-1 , Ut );        # forward (left-to-right) chain
  for t := T..1
    Bt := β( Bt+1 , Ut );        # backward (right-to-left) chain
  for t := 1..T
    Yt := η( Ft , Bt , Ut );     # output sees past, future and present
  return Y;
}
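A NumPy rendering of FORWARD, under assumptions of mine (φ and β realised as single tanh layers, η as a linear layer; all sizes and names are illustrative):

import numpy as np

rng = np.random.default_rng(2)
n_in, n_f, n_b, n_out = 4, 6, 6, 3

Wf_u = rng.normal(0, 0.1, (n_f, n_in))   # φ: input  -> forward state
Wf_f = rng.normal(0, 0.1, (n_f, n_f))    # φ: F(t-1) -> F(t)
Wb_u = rng.normal(0, 0.1, (n_b, n_in))   # β: input  -> backward state
Wb_b = rng.normal(0, 0.1, (n_b, n_b))    # β: B(t+1) -> B(t)
Wy_f = rng.normal(0, 0.1, (n_out, n_f))  # η: F(t) -> Y(t)
Wy_b = rng.normal(0, 0.1, (n_out, n_b))  # η: B(t) -> Y(t)
Wy_u = rng.normal(0, 0.1, (n_out, n_in)) # η: U(t) -> Y(t)

def forward(U):
    T = len(U)
    F = np.zeros((T + 1, n_f))           # F[0]   is F(0)   = 0
    B = np.zeros((T + 2, n_b))           # B[T+1] is B(T+1) = 0
    for t in range(1, T + 1):            # F(t) = φ(F(t-1), U(t))
        F[t] = np.tanh(Wf_f @ F[t - 1] + Wf_u @ U[t - 1])
    for t in range(T, 0, -1):            # B(t) = β(B(t+1), U(t))
        B[t] = np.tanh(Wb_b @ B[t + 1] + Wb_u @ U[t - 1])
    # Y(t) = η(F(t), B(t), U(t)): each output sees past, future and present.
    return np.array([Wy_f @ F[t] + Wy_b @ B[t] + Wy_u @ U[t - 1]
                     for t in range(1, T + 1)])

U = rng.normal(size=(5, n_in))           # a toy sequence of length 5
Y = forward(U)                           # one prediction per position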
Learning in BRNNs

GRADIENT(U, Ȳ) {
  # U = inputs, Ȳ = targets
  T := size(U);
  F0 := BT+1 := 0;
  for t := 1..T
    Ft := φ( Ft-1 , Ut );
  for t := T..1
    Bt := β( Bt+1 , Ut );
  for t := 1..T {
    Yt := η( Ft , Bt , Ut );
    [δFt, δBt] := η.backprop&gradient( Yt - Ȳt );   # error signals entering Ft and Bt
  }
  for t := T..1
    δFt-1 += φ.backprop&gradient( δFt );            # carry F's error back in time
  for t := 1..T
    δBt+1 += β.backprop&gradient( δBt );            # carry B's error forward in time
}
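In formulas, the two accumulation loops implement the following delta recursions (notation mine, with Et denoting the error at position t): each forward-chain signal combines a local term from η with a term carried back in time through φ, and symmetrically for the backward chain, whose error travels forward in time:

\[
\delta^{F}_{t} \;=\; \frac{\partial E_t}{\partial F_t}
  \;+\; \Big(\frac{\partial F_{t+1}}{\partial F_t}\Big)^{\!\top}\!\delta^{F}_{t+1},
\qquad
\delta^{B}_{t} \;=\; \frac{\partial E_t}{\partial B_t}
  \;+\; \Big(\frac{\partial B_{t-1}}{\partial B_t}\Big)^{\!\top}\!\delta^{B}_{t-1}.
\]

Because φ, β and η are stationary (shared across all t), each network's weight gradient is the sum of its per-position contributions, which is exactly what the += lines accumulate.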
2D RNNs
• Pollastri & Baldi 2002, Bioinformatics.
• Baldi & Pollastri 2003, JMLR.