Recurrent Neural Networks & LSTM
Advisor: S. J. Wang, F. T. Chien
Student: M. C. Sun
2015-02-26
Outline
• Neural Network
• Recurrent Neural Network (RNN)
  • Introduction
  • Training
• Long Short-Term Memory (LSTM)
  • Evolution
  • Architecture
• Connectionist Temporal Classification (CTC)
  • Application
    • Speech recognition
  • Architecture
Neural Network
• Inspired by the human nervous system
• A complicated architecture
• With some specific limitations
• e.g., DBN, CNN, …
Feedforward Neural Network
• Define the width and depth of the network
• In the training step: given the input data and their targets, learn the weights
• [Figure: layered network from the input layer through the hidden layers to the output layer]
• h: activation function
• σ: sigmoid function / softmax function
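As a concrete illustration of the forward pass described on this slide, below is a minimal NumPy sketch of a one-hidden-layer feedforward network. The weight names (W1, b1, W2, b2) and the layer sizes are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch of a forward pass through one hidden layer, assuming
# hypothetical weight matrices W1, W2 and bias vectors b1, b2.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    # h: activation function of the hidden layer (sigmoid here)
    h = sigmoid(W1 @ x + b1)
    # softmax at the output layer (the sigma on the slide)
    z = W2 @ h + b2
    y = np.exp(z - z.max())
    return y / y.sum()

# Usage: 4-dimensional input, 5 hidden units, 3 output classes
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
print(forward(x, W1, b1, W2, b2))   # probabilities summing to 1
```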
Training in Neural Network
• Error function: each output unit k is compared with its target at time t
• Forward propagation: activations are computed layer by layer from the input to the output
• Backpropagation: the error gradient is propagated back through the layers to update the weights
• [Figure: forward and backward passes through the layered network]
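A minimal sketch of one training iteration for the same kind of one-hidden-layer network: forward propagation, an error signal at each output unit, and backpropagation with a gradient-descent weight update. The cross-entropy error, learning rate, and variable names are assumptions chosen for illustration.

```python
# One training step (forward propagation, error, backpropagation,
# weight update) for a one-hidden-layer network with softmax output.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, target, W1, b1, W2, b2, lr=0.1):
    # Forward propagation
    h = sigmoid(W1 @ x + b1)
    z = W2 @ h + b2
    y = np.exp(z - z.max()); y /= y.sum()           # softmax output

    # Error function: cross-entropy between output y and one-hot target.
    # Backpropagation: for softmax + cross-entropy, the delta at the
    # output units is (y - target).
    delta_out = y - target
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # sigmoid derivative

    # Gradient-descent weight update (in place)
    W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid
    return -np.log(y[target.argmax()])              # loss, for monitoring
```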
RNN: in a sense the deepest neural network, since it unrolls over time
Recurrent Neural Network
• A network of neurons with feedback connections
• For time-varying input
• Good at temporal processing and sequence learning
• [Figure: inputs arriving over time]
Recurrent Neural Network
• For supervised learning
• Training: backpropagation through time (BPTT)
• [Figure: the hidden layer unfolded over time, t = 1 … 5, with input, hidden, and output layers at each step]
Unidirectional RNN
• [Figure: network unrolled over t-1, t, t+1 with input, hidden, and output layers]
• The state of a hidden node can represent short-term memory because it connects to its own state at the previous time step
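A minimal sketch of a unidirectional RNN unrolled over time: the hidden state h is carried from one step to the next, which is the short-term memory mentioned above. The weight names (W_xh, W_hh, W_hy), the tanh activation, and the sizes are illustrative assumptions.

```python
# Unidirectional RNN forward pass over a whole sequence.
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    h = np.zeros(W_hh.shape[0])                   # initial hidden state
    ys = []
    for x in xs:                                  # one step per time index t
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)    # new state depends on old state
        ys.append(W_hy @ h + b_y)                 # output at time t
    return ys, h

# Usage: sequence of five 3-dimensional inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))
ys, h = rnn_forward(xs, rng.normal(size=(4, 3)), rng.normal(size=(4, 4)),
                    rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2))
```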
Bidirectional RNN
• [Figure: forward states and backward states between the input and output layers, over t-1, t, t+1]
• Two hidden layers process the sequence in opposite directions; both feed the output at each time step
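A minimal sketch of the bidirectional idea: one RNN reads the sequence forwards, a second reads it backwards, and their hidden states are combined at each step. Parameter names and sizes are illustrative assumptions.

```python
# Bidirectional hidden states by running two RNN passes over the sequence.
import numpy as np

def birnn_hidden(xs, fwd_params, bwd_params):
    # Each params tuple is (W_xh, W_hh, b_h); sizes are illustrative.
    def run(seq, W_xh, W_hh, b_h):
        h, states = np.zeros(W_hh.shape[0]), []
        for x in seq:
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)
            states.append(h)
        return states
    fwd = run(xs, *fwd_params)
    bwd = run(xs[::-1], *bwd_params)[::-1]        # re-align backward states
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```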
Training in RNN
• Error function: output unit k is compared with its target at time t
• Feedforward, with f(x) the logistic activation function
• Backpropagation through time: the error is propagated backwards through the unrolled network, t = 5, 4, …, 1
• The logistic derivative f'(x) = f(x)(1 - f(x)) has maximal value 0.25, so the error signal shrinks at every step back in time
• Vanishing gradient problem
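The effect can be seen numerically. The sketch below backpropagates an error signal through T time steps and multiplies by the logistic derivative at each step; since f'(x) is at most 0.25, the factor decays at least as fast as 0.25^T. The recurrent weight of 1.0 and the pre-activation value are assumptions chosen to isolate the effect of f'.

```python
# Numerical illustration of the vanishing gradient: backpropagating through
# T steps multiplies by the logistic derivative f'(a) = f(a)(1 - f(a)) <= 0.25
# at every step, so the error signal shrinks at least as fast as 0.25**T.
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

a = 0.0                    # pre-activation where f' is maximal (0.25)
grad = 1.0
for T in range(1, 21):
    f = logistic(a)
    grad *= f * (1.0 - f)  # one backprop step through the recurrence
    if T in (5, 10, 20):
        print(f"after {T:2d} steps: gradient factor = {grad:.2e}")
# after  5 steps: 9.77e-04; after 10 steps: 9.54e-07; after 20 steps: 9.09e-13
```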
LSTM: one solution to the vanishing gradient problem when training RNNs
Long Short-Term Memory (LSTM)
• In a conventional RNN, the self-cycled hidden state can only represent short-term memory, due to the vanishing gradient problem
• Invented by Hochreiter & Schmidhuber (1997)
• Long short-term memory is designed to allow the hidden state to retain important information over a longer period of time
• Replaces the neuron structure of the hidden layer in an RNN
• Reference: Long Short-Term Memory. Sepp Hochreiter, Jürgen Schmidhuber (1997)
Constant Error Carrousel (CEC)
• Let the error flow pass through the CEC
• Enforces non-decaying error flow back through time
• The unit's connection to itself has weight 1
• Introduces the cell concept
• [Figure: cell with a self-cycle of weight 1, unrolled over t = 1 … 5]
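A toy sketch of the CEC idea: a linear unit with a self-connection of weight exactly 1 carries its stored value, and (by the chain rule) the backpropagated error, from step to step without decay. The input values are made up for illustration.

```python
# Constant error carrousel: self-cycle weight = 1, linear activation.
state = 0.0
for x in [0.3, 0.0, 0.0, 0.0, -0.1]:
    state = 1.0 * state + x         # stored value passes through unchanged
    print(round(state, 2))          # 0.3, 0.3, 0.3, 0.3, 0.2 -- no decay

# Backwards, d(state_t)/d(state_{t-1}) = 1, so the error signal is neither
# scaled down (vanishing) nor scaled up (exploding) across time steps.
```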
Gate
• Weight conflict: all incoming data, whether important or irrelevant, reach the node through the same weights
• Introduce the gate concept
• From input to hidden (input gate): protects the cell from irrelevant input and controls storing a value in the cell
• From hidden to output (output gate): protects other units from an irrelevant cell and controls outputting the cell's value
Long Short-Term Memory Structure
• LSTM = CEC + gates
• Cell: stores the memory
• Input gate: protects the cell from irrelevant input
• Output gate: protects other units from irrelevant cell content
• Forget gate: resets the memory stored in the cell
• Peephole connections: improve precise timing
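Below is a minimal sketch of one LSTM step with input, forget, and output gates; peephole connections are omitted for brevity, and all weight names and sizes are illustrative assumptions rather than the notation of the original paper.

```python
# One LSTM step: the cell state c is the gated CEC, h the exposed state.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, b):
    # W stacks the four weight matrices; split the joint pre-activation
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                  # candidate cell input
    c = f * c_prev + i * g                          # cell: gated memory (CEC)
    h = o * np.tanh(c)                              # exposed hidden state
    return h, c

# Usage: 3-dimensional input, 4 LSTM cells
rng = np.random.default_rng(0)
W, b = rng.normal(size=(16, 7)), np.zeros(16)
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
```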
Long Short-Term Memory Structure
• Uses the CEC to achieve long short-term memory
• The gates' weights are learned to obtain the model