Neural Network for Machine Translation Wan-Ru Lin 2015/12/15
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Introduction • Machine translation: software designed to translate both spoken and written text from one language to another • Advantages of machine translation over human translation • No need to spend hours poring over dictionaries to translate individual words • Comparatively cheap • Handing sensitive data to a human translator can be risky, whereas machine translation keeps the information in-house
Introduction • Rule-based machine translation • Large collections of rules, manually developed over time by human experts, that map structures from the source to the target language • Shortcoming: costly and complicated to implement • Statistical machine translation • A computer algorithm explores millions of possible ways of putting small pieces together, looking for the translation that statistically looks best
Introduction • Statistical machine translation has become more popular with the rise of neural networks • Uses large numbers of training sentences to teach the machine about human language • Does not just select an appropriate sentence from the target corpus, but learns both the syntax and semantics of the language
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Background-Feedforward Neural Network • Powerful machine learning model • Limitation: the lengths of both the source and target sentences must be known and fixed in advance
Background-Recurrent Neural Network • RNNs can use their internal memory to process input sequences of arbitrary length • Advantage: well suited to sequence tasks such as speech recognition and handwriting recognition
Background-Long Short-Term Memory • Designed to deal with long sentences • Input gate: controls whether a new value is allowed to enter the memory cell • Forget gate: controls whether the cell keeps or forgets the value it is currently remembering • Output gate: controls when the unit exposes the value in its memory to the rest of the network (a minimal sketch follows below)
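To make the three gates concrete, here is a minimal numpy sketch of a single LSTM step. The weight names (W, U, b), the toy dimensions, and the random initialisation are illustrative assumptions, not values or notation from the referenced papers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold the parameters of the three gates and the candidate cell."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate: how much new input enters the cell
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate: how much of the old cell state to keep
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate: how much of the cell to expose
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate cell value
    c = f * c_prev + i * g                               # new cell state (the unit's memory)
    h = o * np.tanh(c)                                   # new hidden state (the unit's output)
    return h, c

# toy dimensions: 4-dimensional input, 3 hidden units, random weights
rng = np.random.default_rng(0)
d_x, d_h = 4, 3
W = {k: rng.standard_normal((d_h, d_x)) for k in "ifog"}
U = {k: rng.standard_normal((d_h, d_h)) for k in "ifog"}
b = {k: np.zeros(d_h) for k in "ifog"}
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((5, d_x)):   # a 5-step toy sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```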
Background-Long Short-Term Memory • In the real world, each layer contains far more than two memory cells
Background-Bidirectional RNN • Keeps the hidden state at every position, reading the sentence in both directions • Forward -> the cat sat on the mat • Backward -> mat the on sat cat the
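A small numpy sketch of the bidirectional idea: run one RNN forward and one backward, keep every state, and concatenate the two readings per position. The plain tanh cell, the toy dimensions, and the random weights are assumptions for illustration; the translation papers discussed later use gated units.

```python
import numpy as np

def rnn_states(xs, Wx, Wh, b):
    """Run a simple tanh RNN over xs and return the hidden state at every position."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return states

def birnn_states(xs, fwd_params, bwd_params):
    """Concatenate forward states with backward states (read in reverse, then re-aligned)."""
    forward = rnn_states(xs, *fwd_params)
    backward = rnn_states(xs[::-1], *bwd_params)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

rng = np.random.default_rng(1)
d_in, d_h = 4, 3
make = lambda: (rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h))
sentence = rng.standard_normal((6, d_in))          # e.g. "the cat sat on the mat" as 6 word vectors
annotations = birnn_states(sentence, make(), make())
print(len(annotations), annotations[0].shape)      # one 2*d_h annotation per word
```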
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Neural Probabilistic Language Model • The goal of statistical language modeling is to learn the joint probability function of sequences of words in a language • Input: the previous n-1 words, each encoded with a 1-of-N (one-hot) mapping • Output: a vector whose i-th element estimates the probability P(w_t = i | w_{t-n+1}, ..., w_{t-1}) • Ex: the -> (0,1,0,0,...,0,0), cat -> (0,0,...,1,...,0,0), and so on for "the cat sat on the mat"
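A tiny sketch of the 1-of-N input encoding described above, using an assumed five-word toy vocabulary. It only prepares the model input; the probability vector on the output side comes from a trained model such as the NNLM sketched a few slides below.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]          # assumed toy vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """1-of-N mapping: a vector of zeros with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# context of n-1 = 4 previous words, concatenated into one input vector
context = ["the", "cat", "sat", "on"]
x = np.concatenate([one_hot(w) for w in context])
print(x.shape)   # (n-1) * |V| = 20

# the model's output is a |V|-dimensional vector whose i-th element
# estimates P(next word = vocab[i] | context)
```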
Neural Probabilistic Language Model-Learning a distributed representation for words • Curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training • Ex: to model the joint distribution of 10 consecutive words with a vocabulary V of size 100,000, there are potentially 100,000^10 - 1 = 10^50 - 1 free parameters • Far too expensive and unrealistic!
Neural Probabilistic Language Model-Learning a distributed representation for words • Solution: use continuous representations that capture a large number of precise syntactic and semantic word relationships • Input vocabulary transformation: an embedding matrix maps each 1-of-N word vector to a dense, low-dimensional feature vector
Neural Probabilistic Language Model-Learning a distributed representation for words • Word embedding matrix C • (2003) Feed-forward Neural Net Language Model (NNLM) • Maximize the penalized log-likelihood of the training corpus: L = (1/T) Σ_t log f(w_t, w_{t-1}, ..., w_{t-n+1}; θ) + R(θ) • C: a |V| × m matrix, i.e. |V| × m free parameters (one m-dimensional feature vector per word) • g: the probability function over words, mapping the context feature vectors to a distribution over the next word
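A minimal forward pass of a feed-forward NNLM in the spirit of Bengio (2003), with toy dimensions, random weights, and no training loop, and omitting the paper's optional direct input-to-output connections. The variable names (C, H, U) follow the conventional notation but the numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
V, m, h, n = 5, 30, 16, 5        # vocab size, features per word, hidden units, n-gram order

C = rng.standard_normal((V, m)) * 0.1            # word embedding matrix: one m-dim feature vector per word
H = rng.standard_normal((h, (n - 1) * m)) * 0.1  # hidden layer weights
U = rng.standard_normal((V, h)) * 0.1            # hidden-to-output weights

def nnlm_probs(context_ids):
    """P(w_t | w_{t-n+1}, ..., w_{t-1}) for every word in the vocabulary."""
    x = np.concatenate([C[i] for i in context_ids])   # look up and concatenate the n-1 embeddings
    a = np.tanh(H @ x)                                # hidden layer
    y = U @ a                                         # unnormalised score for each vocabulary word
    e = np.exp(y - y.max())
    return e / e.sum()                                # softmax

p = nnlm_probs([0, 1, 2, 3])     # context "the cat sat on" in the toy vocabulary
print(p.sum(), p.argmax())       # probabilities sum to 1
```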
Neural Probabilistic Language Model-Learning a distributed representation for words • Number of features per word (the embedding dimension) m = 30
Neural Probabilistic Language Model-Learning a distributed representation for words • Novel models for computing continuous vector representations of words (Mikolov, 2013) • Recurrent Neural Net Language Model • Continuous Bag-of-Words Model • Continuous Skip-gram Model
Neural Probabilistic Language Model-Learning a distributed representation for words • Word relationships in vector form • Big is similar to biggest in the same sense that small is similar to smallest • X = vector("biggest") - vector("big") + vector("small") ⇒ the word whose vector is closest to X is "smallest"
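A small sketch of the vector-arithmetic analogy above. The four hand-made 2-D vectors are fabricated purely for illustration; real systems use embeddings trained on large corpora.

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest (by cosine similarity) to
    vec(b) - vec(a) + vec(c), excluding the three query words."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for w, v in vectors.items():
        if w in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# toy hand-made vectors; only their relative geometry matters here
vectors = {
    "big":      np.array([1.0, 0.0]),
    "biggest":  np.array([1.0, 1.0]),
    "small":    np.array([-1.0, 0.0]),
    "smallest": np.array([-1.0, 1.0]),
}
print(analogy("big", "biggest", "small", vectors))   # -> "smallest"
```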
Neural Probabilistic Language Model-Learning a distributed representation for words • Transform the input into a lower-dimensional continuous space before it is fed into the translation system • Lower computational load • Encodes meaningful information inside the vector • The same idea can be used to map phrases or whole sentences into the continuous space
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
Recurrent Continuous Translation Model Ⅰ • The Recurrent Continuous Translation Model Ⅰ (RCTM Ⅰ) uses a convolutional sentence model (CSM) in the conditioning architecture
Recurrent Continuous Translation Model Ⅰ • Architecture of the Recurrent Language Model (RLM) • Input vocabulary transformation: I (maps each 1-of-V word vector into the hidden space) • Recurrent transformation: R (carries the hidden state from one word to the next) • Output vocabulary transformation: O (maps the hidden state back to vocabulary scores) • Output: h_{i+1} = σ(R·h_i + I·v(w_{i+1})), o_{i+1} = O·h_i, followed by a softmax over the vocabulary
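A minimal numpy sketch of the RLM recurrence as reconstructed above (matrices I, R, O), with toy sizes and random weights. It illustrates the recurrence only; it is not the paper's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

rng = np.random.default_rng(3)
V, q = 6, 4                              # toy vocabulary size and hidden dimension
I = rng.standard_normal((q, V)) * 0.1    # input vocabulary transformation
R = rng.standard_normal((q, q)) * 0.1    # recurrent transformation
O = rng.standard_normal((V, q)) * 0.1    # output vocabulary transformation

def rlm_next_word_probs(word_ids):
    """Feed a prefix through the RLM and return P(next word | prefix)."""
    h = np.zeros(q)
    for w in word_ids:
        v = np.zeros(V)
        v[w] = 1.0                       # 1-of-V encoding of the word
        h = sigmoid(R @ h + I @ v)       # recurrent update
    return softmax(O @ h)                # output vocabulary transformation + softmax

print(rlm_next_word_probs([0, 2, 3]))
```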
Recurrent Continuous Translation Model Ⅰ • The Convolutional Sentence Model (CSM) builds a continuous representation of a sentence from the continuous representations of the words in the sentence • K: kernels of the convolution ⇒ learnt feature detectors • Local features are combined first, then global ones
Recurrent Continuous Translation Model Ⅰ • Turns a sequence of words into a vector of fixed dimensionality, regardless of sentence length (see the sketch below)
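Below is a loose sketch of the "local first, then global" idea: neighbouring vectors are repeatedly merged by learnt kernels until a single fixed-size sentence vector remains. The pairwise merging and the kernel shapes are simplifying assumptions; the actual CSM uses convolution kernels of varying widths over the sentence matrix.

```python
import numpy as np

def csm_sentence_vector(word_vectors, kernels):
    """Hierarchical convolution sketch: each level merges neighbouring vectors with a
    learnt kernel, so local features are combined first and the whole sentence last."""
    level = list(word_vectors)
    k = 0
    while len(level) > 1:
        K = kernels[k % len(kernels)]                   # learnt feature detector for this level
        level = [np.tanh(K @ np.concatenate([a, b]))    # merge adjacent pairs of vectors
                 for a, b in zip(level[:-1], level[1:])]
        k += 1
    return level[0]                                     # fixed-dimensional sentence representation

rng = np.random.default_rng(4)
d = 8
words = [rng.standard_normal(d) for _ in range(5)]      # embeddings for a 5-word sentence
kernels = [rng.standard_normal((d, 2 * d)) * 0.1 for _ in range(3)]
print(csm_sentence_vector(words, kernels).shape)        # (d,) regardless of sentence length
```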
Recurrent Continuous Translation Model Ⅱ • The convolutional n-gram model is obtained by truncating the CSM at the level where n-grams are represented
Recurrent Continuous Translation Model • Convolutional sentence models may lose the ordering of the words • RCTM Ⅱ achieves the better performance of the two
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
RNN Encoder-Decoder • Encodes the source sentence into a fixed-length vector representation • Gated hidden unit inspired by the LSTM (reset gate and update gate) • 1000 hidden units (dimension of c = 1000)
RNN Encoder-Decoder • The entire source sentence is summarized in a single vector representation c, which conditions every decoding step (a minimal sketch follows below)
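A minimal sketch of the encoder-decoder idea: the encoder compresses the source into one vector c, and every decoder step is conditioned on c. Plain tanh units, toy vocabularies, greedy decoding, and random weights are assumptions standing in for the gated units and trained parameters of Cho et al.

```python
import numpy as np

def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

rng = np.random.default_rng(5)
V_src, V_tgt, d = 6, 7, 5
enc = dict(Wx=rng.standard_normal((d, V_src)) * 0.1, Wh=rng.standard_normal((d, d)) * 0.1)
dec = dict(Wc=rng.standard_normal((d, d)) * 0.1, Wh=rng.standard_normal((d, d)) * 0.1,
           Wy=rng.standard_normal((d, V_tgt)) * 0.1, Wo=rng.standard_normal((V_tgt, d)) * 0.1)

def encode(src_ids):
    """Compress the whole source sentence into one fixed-length vector c."""
    h = np.zeros(d)
    for w in src_ids:
        x = np.zeros(V_src)
        x[w] = 1.0
        h = np.tanh(enc["Wx"] @ x + enc["Wh"] @ h)
    return h                                   # c: the sentence summary

def decode_step(c, h, prev_word_id):
    """One decoder step: the fixed vector c conditions every output word."""
    y = np.zeros(V_tgt)
    y[prev_word_id] = 1.0
    h = np.tanh(dec["Wy"] @ y + dec["Wh"] @ h + dec["Wc"] @ c)
    return h, softmax(dec["Wo"] @ h)

c = encode([0, 3, 2, 5])
h, prev = np.zeros(d), 0
for _ in range(4):                             # greedily emit 4 target words
    h, p = decode_step(c, h, prev)
    prev = int(p.argmax())
    print(prev, end=" ")
```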
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
Neural Machine Translation by Jointly Learning to Align and Translate • Deals with long sentences • Applies a bidirectional RNN (BiRNN) so that the state at every source position is kept as an annotation • 1000 forward and 1000 backward hidden units • At each decoding step, a soft-alignment mechanism scores these annotations and forms a context vector (sketched below)
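A small sketch of the soft-alignment (attention) step that RNNsearch adds: each BiRNN annotation is scored against the previous decoder state, and the normalised scores weight a context vector. The additive scoring form follows the paper; the dimensions and random parameters here are illustrative assumptions.

```python
import numpy as np

def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

def attention_context(annotations, s_prev, Wa, Ua, va):
    """Soft alignment: score every source annotation against the previous decoder
    state, normalise the scores, and return the weighted sum as the context vector."""
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in annotations])
    alpha = softmax(scores)                          # alignment weights over source positions
    context = sum(a * h for a, h in zip(alpha, annotations))
    return context, alpha

rng = np.random.default_rng(6)
d_h, d_s, d_a = 6, 5, 4                              # annotation, decoder state, alignment dims
annotations = [rng.standard_normal(d_h) for _ in range(7)]   # one BiRNN state per source word
s_prev = rng.standard_normal(d_s)
Wa = rng.standard_normal((d_a, d_s))
Ua = rng.standard_normal((d_a, d_h))
va = rng.standard_normal(d_a)
context, alpha = attention_context(annotations, s_prev, Wa, Ua, va)
print(alpha.round(2), context.shape)
```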
Neural Machine Translation by Jointly Learning to Align and Translate • RNNsearch (with alignment) vs. RNNencdec (fixed-length vector) • Test set: WMT'14
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks • Fixed-length vector representation • The LSTM learns much better when the source sentences are reversed (reversal sketch below) • Perplexity dropped from 5.8 to 4.7 • BLEU score increased from 25.9 to 30.6 • Deep LSTMs with 4 layers • 1000 cells at each layer
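A tiny sketch of the source-reversal trick: only the source side of each training pair is reversed, which brings the first source words close to the first target words across the encoder/decoder boundary. The <bos>/<eos> markers and the French example sentence are illustrative assumptions.

```python
def prepare_pair(source_tokens, target_tokens):
    """Seq2seq training pair with the source reversed, so early source words
    end up nearest to the early target words the decoder must emit first."""
    return source_tokens[::-1], ["<bos>"] + target_tokens + ["<eos>"]

src, tgt = prepare_pair("the cat sat on the mat".split(),
                        "le chat s'est assis sur le tapis".split())
print(src)
print(tgt)
```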
Sequence to Sequence Learning with Neural Networks • Two ways to use the model • Direct translation: decode with a beam search over the LSTM • Rescoring: re-rank the 1000-best lists produced by a baseline SMT system (see the sketch below)
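A toy sketch of the rescoring setup: candidates from a baseline n-best list are re-ranked by averaging the baseline score with an LSTM score, as in the paper's 1000-best rescoring. The two candidates, their scores, and the stand-in scorer are fabricated for illustration.

```python
def rescore(nbest, lstm_logprob):
    """Re-rank an SMT n-best list by averaging the baseline model score with the
    LSTM's log-probability of each candidate translation."""
    rescored = [(0.5 * smt_score + 0.5 * lstm_logprob(candidate), candidate)
                for candidate, smt_score in nbest]
    return max(rescored)[1]

# toy n-best list: (candidate translation, baseline SMT score)
nbest = [("le chat est assis sur le tapis", -4.1),
         ("le chat assis sur le tapis", -3.9)]
toy_lstm = lambda sent: -0.5 * len(sent.split())   # stand-in for a trained LSTM's log-probability
print(rescore(nbest, toy_lstm))
```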
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Conclusion • Learned sentence representations capture meaning • Sentences with similar meanings lie close together in the vector space • Sentences with different meanings lie far apart • In the future, it may be possible for machines to extract the important messages from long paragraphs
Reference • Long Short-Term Memory, S. Hochreiter and J. Schmidhuber, 1997. • A Neural Probabilistic Language Model, Y. Bengio et al., 2003. • Distributed Representations of Words and Phrases and their Compositionality, Tomas Mikolov et al., 2013. • Efficient Estimation of Word Representations in Vector Space, Tomas Mikolov et al., 2013. • Recurrent Continuous Translation Models, Nal Kalchbrenner and Phil Blunsom, 2013. • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014. • Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, 2014. • Sequence to Sequence Learning with Neural Networks, Sutskever et al., 2014.