390 likes | 547 Views
Introduction to Neural Networks. Gianluca Pollastri, Head of Lab School of Computer Science and Informatics and Complex and Adaptive Systems Labs University College Dublin gianluca.pollastri@ucd.ie. Credits. Geoffrey Hinton, University of Toronto.
E N D
Introduction to Neural Networks Gianluca Pollastri, Head of Lab School of Computer Science and Informatics and Complex and Adaptive Systems Labs University College Dublin gianluca.pollastri@ucd.ie
Credits • Geoffrey Hinton, University of Toronto. • borrowed some of his slides for “Neural Networks” and “Computation in Neural Networks” courses. • Paolo Frasconi, University of Florence. • This guy taught me Neural Networks in the first place (*and* I borrowed some of his slides too!).
Recurrent Neural Networks (RNN) • One of the earliest versions: Jeffrey Elman, 1990, Cognitive Science. • Problem: it isn’t easy to represent time with Feedforward Neural Nets: usually time is represented with space. • Attempt to design networks with memory.
RNNs • The idea is having discrete time steps, and considering the hidden layer at time t-1 as an input at time t. • This effectively removes cycles: we can model the network using an FFNN, and model memory explicitly.
It Xt Ot d d = delay element
BPTT • BackPropagation Through Time. • If Ot is the output at time t, It the input at time t, and Xt the memory (hidden) at time t, we can model the dependencies as follows:
BPTT • We can model both f() and g() with (possibly multilayered) networks. • We can transform the recurrent network by unrolling it in time. • Backpropagation works on any DAG. An RNN becomes one once it’s unrolled.
It Xt Ot d d = delay element
It Xt Ot It+1 Xt+1 Ot+1 It-1 Xt-1 Ot-1 It+2 Xt+2 Ot+2 It-2 Xt-2 Ot-2
gradient in BPTT • GRADIENT(I,O,T) { • # I=inputs, O=outputs, T=targets • T := size(O); • X0 := 0; • for t := 1..T • Xt := f( Xt-1 , It ); • for t := 1..T { • Ot := g( Xt , It ); • g.gradient( Ot - Tt ); • δt = g.deltas( Ot - Tt ); • } • for t := T..1 • f.gradient(δt ); • δt-1 += f.deltas(δt ); • }
It Xt Ot It+1 Xt+1 Ot+1 It-1 Xt-1 Ot-1 It+2 Xt+2 Ot+2 It-2 Xt-2 Ot-2
It Xt Ot It+1 Xt+1 Ot+1 It-1 Xt-1 Ot-1 It+2 Xt+2 Ot+2 It-2 Xt-2 Ot-2
It Xt Ot It+1 Xt+1 Ot+1 It-1 Xt-1 Ot-1 It+2 Xt+2 Ot+2 It-2 Ot-2 Xt-2
It Xt Ot It+1 Xt+1 Ot+1 It-1 Ot-1 It+2 Xt+2 Ot+2 It-2 Ot-2 Xt-2 Xt-1
It Ot It+1 Xt+1 Ot+1 It-1 It+2 Xt+2 Ot+2 It-2 Ot-2 Ot-1 Xt-2 Xt-1 Xt
It It+1 Ot+1 It-1 It+2 Xt+2 Ot+2 It-2 Ot-2 Ot-1 Ot Xt-2 Xt-1 Xt Xt+1
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
It It+1 It-1 It+2 It-2 Ot-2 Ot-1 Ot Ot+1 Ot+2 Xt-2 Xt-1 Xt Xt+1 Xt+2
What I will talk about • Neurons • Multi-Layered Neural Networks: • Basic learning algorithm • Expressive power • Classification • How can we *actually* train Neural Networks: • Speeding up training • Learning just right (not too little, not too much) • Figuring out you got it right • Feed-back networks? • Anecdotes on real feed-back networks (Hopfield Nets, Boltzmann Machines) • Recurrent Neural Networks • Bidirectional RNN • 2D-RNN • Concluding remarks
BRNN Ft = ( Ft-1 , Ut ) Bt = ( Bt+1 , Ut ) Yt = ( Ft , Bt , Ut ) • () () ed () are realised with NN • (), () and () are independent from t: stationary
BRNN Ft = ( Ft-1 , Ut ) Bt = ( Bt+1 , Ut ) Yt = ( Ft , Bt , Ut ) • () () ed () are realised with NN • (), () and () are independent from t: stationary
BRNN Ft = ( Ft-1 , Ut ) Bt = ( Bt+1 , Ut ) Yt = ( Ft , Bt , Ut ) • () () ed () are realised with NN • (), () and () are independent from t: stationary
BRNN Ft = ( Ft-1 , Ut ) Bt = ( Bt+1 , Ut ) Yt = ( Ft , Bt , Ut ) • () () ed () are realised with NN • (), () and () are independent from t: stationary
Inference in BRNNs • FORWARD(U) { • T size(U); • F0 BT+1 0; • for t 1..T • Ft = ( Ft-1 , Ut ); • for t T..1 • Bt = ( Bt+1 , Ut ); • for t 1..T • Yt = ( Ft , Bt , Ut ); • return Y; • }
GRADIENT(U,Y) { T size(U); F0 BT+1 0; for t 1..T Ft = ( Ft-1 , Ut ); for t T..1 Bt = ( Bt+1 , Ut ); for t 1..T { Yt = ( Ft , Bt , Ut ); [δFt, δBt] = .backprop&gradient( Yt - Yt ); } for t T..1 δFt-1 += .backprop&gradient(δFt ); for t 1..T δBt+1 += .backprop&gradient(δBt ); } Learning in BRNNs
What I will talk about • Neurons • Multi-Layered Neural Networks: • Basic learning algorithm • Expressive power • Classification • How can we *actually* train Neural Networks: • Speeding up training • Learning just right (not too little, not too much) • Figuring out you got it right • Feed-back networks? • Anecdotes on real feed-back networks (Hopfield Nets, Boltzmann Machines) • Recurrent Neural Networks • Bidirectional RNN • 2D-RNN • Concluding remarks
2D RNNs Pollastri & Baldi 2002, Bioinformatics Baldi & Pollastri 2003, JMLR