Neural Network for Machine Translation Wan-Ru Lin 2015/12/15
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Introduction • Use of software programs specifically designed to translate both spoken and written text from one language to another • Advantages of machine translation over human translation • You don't have to spend hours poring over dictionaries to translate the words • It is comparatively cheap • Handing sensitive data to a human translator can be risky, while with machine translation your information stays protected
Introduction • Rule-based machine translation • Large collections of rules, manually developed over time by human experts, that map structures from the source language to the target language • Drawback: costly and complicated to implement • Statistical machine translation • A computer algorithm explores millions of possible ways of putting the small pieces together, looking for the translation that statistically looks best
Introduction • Statistical machine translation has become popular with the rise of neural networks • Use large numbers of training sentences to teach the machine about human language • Do not just select an appropriate sentence from the target corpus, but learn both the syntax and semantics of the language
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Background-Feedforward Neural Network • Powerful machine learning model • Limitation: the lengths of both the source and target sentences must be known in advance
Background-Recurrent Neural Network • RNNs can use their internal memory to process arbitrary sequences of inputs • Advantage : suitable for speech recognition and handwriting recognition
Background-Long Short-Term Memory • Designed to deal with long sentences • Input gate: controls whether a new value enters the memory cell and is passed on to the next layer • Forget gate: when closed, the cell effectively forgets whatever value it was remembering • Output gate: determines when the unit should output the value in its memory (see the sketch below)
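To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM time step; the parameter names (W, U, b) and the toy dimensions are illustrative, not taken from any of the cited papers.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters for the
    input (i), forget (f), output (o) gates and the candidate cell (g)."""
    z = W @ x + U @ h_prev + b          # all pre-activations at once, shape (4*d,)
    d = h_prev.shape[0]
    i = sigmoid(z[0*d:1*d])             # input gate: admit new information
    f = sigmoid(z[1*d:2*d])             # forget gate: keep or erase the old memory
    o = sigmoid(z[2*d:3*d])             # output gate: expose the memory to the next layer
    g = np.tanh(z[3*d:4*d])             # candidate value to write into the cell
    c = f * c_prev + i * g              # updated cell memory
    h = o * np.tanh(c)                  # hidden state passed onward
    return h, c

# toy dimensions: 5-dim input, 3-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 5, 3
W = rng.normal(size=(4*d_h, d_in)); U = rng.normal(size=(4*d_h, d_h)); b = np.zeros(4*d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)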
Background-Long Short-Term Memory • In practice, each layer uses many more than just two cells
Background-Bidirectional RNN • Remember the state of each hidden unit in both reading directions • Forward -> The cat sat on the map • Backward -> map the on sat cat The
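A minimal sketch of the bidirectional idea, assuming a plain tanh RNN: the same sequence is read once left-to-right and once right-to-left, and the two hidden states for each position are concatenated. All names and sizes below are illustrative.

import numpy as np

def rnn_states(xs, W, U, b):
    """Run a plain tanh RNN over the sequence xs and return every hidden state."""
    h = np.zeros(U.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return states

rng = np.random.default_rng(1)
d_in, d_h = 4, 3
W = rng.normal(size=(d_h, d_in)); U = rng.normal(size=(d_h, d_h)); b = np.zeros(d_h)
xs = [rng.normal(size=d_in) for _ in range(6)]   # stand-ins for "The cat sat on the map"

fwd = rnn_states(xs, W, U, b)                    # reads left-to-right
bwd = rnn_states(xs[::-1], W, U, b)[::-1]        # reads right-to-left, re-aligned
annot = [np.concatenate([f, b_]) for f, b_ in zip(fwd, bwd)]  # one annotation per word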
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Neural Probabilistic Language Model • The goal of statistical language modeling is to learn the joint probability function of sequences of words in a language • Input: the first n−1 words, each encoded as a 1-of-N vector • Output: a vector whose i-th element estimates the probability P(w_t = i | w_1, ..., w_(t−1)) • Ex: for "The cat sat on the map", each word is a 1-of-N vector, e.g. "The" → (0,1,0,...,0), "cat" → (0,0,...,1,...,0); given the context "The cat sat on the", the model predicts the next word
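A small sketch of the 1-of-N (one-hot) input encoding described above, using a toy five-word vocabulary; the vocabulary and its ordering are made up for illustration.

import numpy as np

vocab = ["the", "cat", "sat", "on", "map"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return the 1-of-N vector for a word: all zeros except a 1 at its index."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# the first n-1 words of "the cat sat on the map" as 1-of-N vectors
context = [one_hot(w) for w in ["the", "cat", "sat", "on"]]
print(one_hot("cat"))   # [0. 1. 0. 0. 0.]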
Neural Probabilistic Language Model-Learning a distributed representation for words • Curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training • Ex: if one wants to model the joint distribution of 10 consecutive words in a natural language with a vocabulary V of size 100,000, there are potentially 100,000^10 − 1 = 10^50 − 1 free parameters • Too expensive and unrealistic!
Neural Probabilistic Language Model-Learning a distributed representation for words • Solution: use a continuous representation that captures a large number of precise syntactic and semantic word relationships • Input vocabulary transformation: each 1-of-N word vector is multiplied by a word-embedding matrix to obtain a dense, low-dimensional feature vector
Neural Probabilistic Language Model-Learning a distributed representation for words • Word-embedding matrix C • (2003) Feed-forward Neural Net Language Model (NNLM) • Maximize the penalized log-likelihood over the training corpus: L = (1/T) Σ_t log f(w_t, w_(t−1), ..., w_(t−n+1); θ) + R(θ) • The matrix C contributes |V| × m free parameters • f: probability function over words
Neural Probabilistic Language Model-Learning a distributed representation for words • The number of features per word = 30
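A minimal sketch of the feed-forward NNLM forward pass in the spirit of Bengio (2003): look up the n−1 context words in the embedding matrix C, concatenate them, pass the result through a tanh hidden layer, and take a softmax over the vocabulary. The tiny sizes and the omission of the original model's direct input-to-output connections are simplifications.

import numpy as np

rng = np.random.default_rng(2)
V, m, n, nh = 5, 30, 5, 16     # vocabulary size, features per word, context length n, hidden units

C = rng.normal(0, 0.1, size=(V, m))        # word-embedding matrix (V x m free parameters)
H = rng.normal(0, 0.1, size=(nh, (n-1)*m)) # hidden-layer weights
U = rng.normal(0, 0.1, size=(V, nh))       # output weights

def nnlm_probs(context_ids):
    """P(w_t | w_{t-n+1}, ..., w_{t-1}) for every word in the vocabulary."""
    x = np.concatenate([C[i] for i in context_ids])   # look up and concatenate n-1 embeddings
    a = np.tanh(H @ x)
    y = U @ a
    e = np.exp(y - y.max())
    return e / e.sum()                                # softmax over the vocabulary

p = nnlm_probs([0, 1, 2, 3])   # context "the cat sat on"
log_likelihood = np.log(p[0])  # log P(next word = word 0)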
Neural Probabilistic Language Model-Learning a distributed representation for words • Novel models for computing continuous vector representations of words • Recurrent Language Model • Continuous Bag-of-Words Model • Continuous Skip-gram Model
Neural Probabilistic Language Model-Learning a distributed representation for words • Word relationships in vector form • "Big" is similar to "biggest" in the same sense that "small" is similar to "smallest" • vector("biggest") − vector("big") + vector("small") ≈ vector("smallest")
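A sketch of the analogy arithmetic above: with trained embeddings, the nearest vector to vector("biggest") − vector("big") + vector("small") should be vector("smallest"). The embeddings below are random placeholders, so the answer is only meaningful with real word2vec-style vectors.

import numpy as np

# toy embeddings; in practice these come from a trained model such as word2vec
rng = np.random.default_rng(3)
emb = {w: rng.normal(size=50) for w in ["big", "biggest", "small", "smallest", "cat"]}

def nearest(query, exclude):
    """Return the vocabulary word whose vector has the highest cosine similarity to query."""
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in exclude:
            continue
        sim = v @ query / (np.linalg.norm(v) * np.linalg.norm(query) + 1e-9)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

x = emb["biggest"] - emb["big"] + emb["small"]
print(nearest(x, exclude={"biggest", "big", "small"}))   # ideally "smallest"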
Neural Probabilistic Language Model-Learning a distributed representation for words • Transform the input to a lower dimension before sending it into the translation system • Lower computation load • Encodes meaningful information inside the vector • Use the same concept to map phrases or sentences to the continuous space
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
Recurrent Continuous Translation Model Ⅰ • The Recurrent Continuous Translation Model Ⅰ (RCTM Ⅰ) uses a convolutional sentence model (CSM) in the conditioning architecture
Recurrent Continuous Translation Model Ⅰ • Architecture of the Recurrent Language Model (RLM) • Input vocabulary transformation: I ∈ R^(q×|V|) • Recurrent transformation: R ∈ R^(q×q) • Output vocabulary transformation: O ∈ R^(|V|×q) • Output: h_1 = σ(I·v(f_1)), h_(i+1) = σ(R·h_i + I·v(f_(i+1))), o_(i+1) = O·h_i, and P(f_(i+1) | f_1,...,f_i) = softmax(o_(i+1))
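A minimal NumPy sketch of the RLM recurrence as reconstructed above: each word's 1-of-V vector is pushed through I, mixed with the previous hidden state through R, and the next-word distribution is read out through O. The sizes and the sigmoid/softmax helpers are illustrative.

import numpy as np

rng = np.random.default_rng(4)
V, q = 5, 8                                   # vocabulary size, hidden dimension
I = rng.normal(0, 0.1, size=(q, V))           # input vocabulary transformation
R = rng.normal(0, 0.1, size=(q, q))           # recurrent transformation
O = rng.normal(0, 0.1, size=(V, q))           # output vocabulary transformation

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

def rlm_next_word_probs(word_ids):
    """P(f_{i+1} | f_1, ..., f_i) after reading the prefix word_ids (1-of-V indices)."""
    h = np.zeros(q)
    for i in word_ids:
        v = np.zeros(V); v[i] = 1.0           # 1-of-V encoding of the word
        h = sigmoid(I @ v + R @ h)            # h_{i+1} = sigma(I v(f_{i+1}) + R h_i)
    return softmax(O @ h)                     # distribution over the next word

p = rlm_next_word_probs([0, 1, 2])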
Recurrent Continuous Translation Model Ⅰ • The Convolutional Sentence Model (CSM) models the continuous representation of a sentence based on the continuous representations of the words in the sentence • K_i: kernels of the convolution ⇒ learnt feature detectors • Local first, then global
Recurrent Continuous Translation Model Ⅰ • Turn a sequence of words into a vector of fixed dimensionality
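A simplified sketch of the CSM idea: adjacent word vectors are repeatedly merged by a learnt kernel, so local n-gram features are built first and increasingly global ones later, until a single fixed-size sentence vector remains. The real CSM uses a different kernel (and kernel width) at each level; the single shared kernel K here is a deliberate simplification.

import numpy as np

rng = np.random.default_rng(5)
d = 8                                   # dimensionality of word and phrase vectors
K = rng.normal(0, 0.1, size=(d, 2*d))   # one convolution kernel: a learnt feature detector

def csm_like_sentence_vector(word_vectors):
    """Merge adjacent vectors layer by layer until one sentence vector remains."""
    level = list(word_vectors)
    while len(level) > 1:
        merged = []
        for a, b in zip(level[:-1], level[1:]):              # local n-gram pairs first
            merged.append(np.tanh(K @ np.concatenate([a, b])))
        level = merged                                        # then increasingly global spans
    return level[0]

sentence = [rng.normal(size=d) for _ in range(6)]   # six word vectors
s = csm_like_sentence_vector(sentence)              # fixed-size vector, any sentence length
print(s.shape)                                      # (8,)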
Recurrent Continuous Translation Model Ⅱ • The convolutional n-gram model is obtained by truncating the CSM at the level where n-grams are represented
Recurrent Continuous Translation Model • Convolutional neural networks might lose the ordering of words • RCTM Ⅱ achieved better performance
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
RNN Encoder-Decoder • Fixed-length vector representation • A modified, LSTM-inspired gated hidden unit (reset gate, update gate) • 1000 hidden units (dimension of c = 1000)
RNN Encoder-Decoder • Vector representation c
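A sketch of the encoder-decoder flow: a gated recurrent step in the spirit of Cho et al. (2014) compresses the source sentence into a single fixed-length vector c, which then conditions the decoder. Here c only initialises the decoder state, whereas the paper also feeds c at every decoding step; the parameter names and sizes are illustrative.

import numpy as np

rng = np.random.default_rng(6)
d_emb, d_h = 8, 10

def gru_like_step(x, h, P):
    """A gated recurrent step in the spirit of Cho et al. (2014); parameters in dict P."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(P["Wz"] @ x + P["Uz"] @ h)                 # update gate
    r = sig(P["Wr"] @ x + P["Ur"] @ h)                 # reset gate
    h_tilde = np.tanh(P["W"] @ x + P["U"] @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde                   # mix old and candidate state

def params():
    return {k: rng.normal(0, 0.1, size=(d_h, d_emb if k.startswith("W") else d_h))
            for k in ["Wz", "Uz", "Wr", "Ur", "W", "U"]}

enc, dec = params(), params()

# Encoder: compress the whole source sentence into one fixed-length vector c
source = [rng.normal(size=d_emb) for _ in range(7)]
h = np.zeros(d_h)
for x in source:
    h = gru_like_step(x, h, enc)
c = h

# Decoder: generate target-side states conditioned on c
h_dec = np.tanh(c)
for y_prev in [rng.normal(size=d_emb) for _ in range(5)]:
    h_dec = gru_like_step(y_prev, h_dec, dec)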
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
Neural Machine Translation by Jointly Learning to Align and Translate • Deals with long sentences • Applies a bidirectional RNN (BiRNN) to remember the state of each hidden unit • 1000 forward and 1000 backward hidden units
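A sketch of the soft-alignment (attention) computation in Bahdanau et al.: an alignment score is computed between the previous decoder state and every BiRNN annotation, the scores are normalised with a softmax, and the context vector is the weighted sum of the annotations. The alignment-network names (Wa, Ua, va) and the toy sizes are illustrative.

import numpy as np

rng = np.random.default_rng(7)
d_h, d_a = 6, 5

# annotations h_j from a bidirectional encoder: forward and backward states concatenated
annotations = [rng.normal(size=2 * d_h) for _ in range(8)]
s_prev = rng.normal(size=d_h)                 # previous decoder state

# small alignment network
Wa = rng.normal(0, 0.1, size=(d_a, d_h))
Ua = rng.normal(0, 0.1, size=(d_a, 2 * d_h))
va = rng.normal(0, 0.1, size=d_a)

scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h_j) for h_j in annotations])
alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()   # soft alignment weights
context = sum(a * h_j for a, h_j in zip(alpha, annotations))  # weighted sum of annotations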
Neural Machine Translation by Jointly Learning to Align and Translate • RNNsearch vs. RNNencdec • Test set: WMT'14 (English→French)
Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks • Fixed-length vector representation • The LSTM learns much better when the source sentences are reversed • Perplexity dropped from 5.8 to 4.7 • BLEU score increased from 25.9 to 30.6 • Deep LSTMs with 4 layers • 1000 cells at each layer
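The source-reversal trick is just a preprocessing step, sketched below with a toy sentence: reversing the source places the first source words next to the first target words, which shortens the dependencies the LSTM has to bridge.

# Reverse the source sentence before feeding it to the encoder (Sutskever et al., 2014).
source = ["the", "cat", "sat", "on", "the", "map"]
encoder_input = list(reversed(source))        # ["map", "the", "on", "sat", "cat", "the"]
print(encoder_input)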
Sequence to Sequence Learning with Neural Networks • Direct translation • Rescoring the n-best lists of a baseline SMT system
Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference
Conclusion • Sentence representations capture their meaning • Sentences with similar meanings are close together • Sentences with different meanings are far apart • In the future, it may become possible for machines to extract the important messages from long paragraphs
Reference • Long Short-Term Memory, S. Hochreiter and J. Schmidhuber, 1997. • A Neural Probabilistic Language Model, Y. Bengio et al., 2003. • Distributed Representations of Words and Phrases and their Compositionality, Tomas Mikolov et al., 2013. • Efficient Estimation of Word Representations in Vector Space, Tomas Mikolov et al., 2013. • Recurrent Continuous Translation Models, Nal Kalchbrenner and Phil Blunsom, 2013. • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014. • Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau and Kyunghyun Cho, 2014. • Sequence to Sequence Learning with Neural Networks, Sutskever et al., 2014.