
Neural Network for Machine Translation



  1. Neural Network for Machine Translation Wan-Ru, Lin 2015/12/15

  2. Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference

  3. Introduction • The use of software programs specifically designed to translate both spoken and written text from one language to another • Advantages of machine translation over human translation • You don't have to spend hours poring over dictionaries to translate the words • It is comparatively cheap • Handing sensitive data to a human translator can be risky, while with machine translation your information stays protected

  4. Introduction • Rule-based machine translation • Large collections of rules, manually developed over time by human experts, that map structures from the source to the target language • Drawback: costly and complicated to implement • Statistical machine translation • A computer algorithm explores millions of possible ways of putting the small pieces together, looking for the translation that looks statistically best

  5. Introduction • Statistical machine translation became popular with the rise of neural networks • Uses large numbers of training sentences to teach the machine about human language • Does not just select an appropriate sentence from the target corpus, but learns both the syntax and semantics of the language

  6. Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference

  7. Background-Feedforward Neural Network • Powerful machine learning model • Limitation: the lengths of both the source and target sentences must be known in advance

  8. Background-Recurrent Neural Network • RNNs can use their internal memory to process input sequences of arbitrary length • Advantage: suitable for speech recognition and handwriting recognition
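
A minimal NumPy sketch of the idea on this slide: the same weights are reused at every time step, so a vanilla RNN can consume a sequence of any length. The tanh activation and the dimensions (8-dimensional inputs, 16 hidden units) are illustrative assumptions, not values from the talk.

    import numpy as np

    def rnn_forward(inputs, W_xh, W_hh, b_h):
        """Run a vanilla RNN over a sequence of arbitrary length."""
        h = np.zeros(W_hh.shape[0])           # initial hidden state
        states = []
        for x in inputs:                       # same weights reused at every step
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)
            states.append(h)
        return states                          # one hidden state per input

    # Illustrative sizes: 8-dim inputs, 16 hidden units, a sequence of 5 steps.
    rng = np.random.default_rng(0)
    W_xh, W_hh, b_h = rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), np.zeros(16)
    states = rnn_forward([rng.normal(size=8) for _ in range(5)], W_xh, W_hh, b_h)
    print(len(states), states[-1].shape)       # 5 (16,)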

  9. Background-Long Short-Term Memory • Designed to deal with long sentences • Input gate: decides whether a new value is allowed into the memory cell • Forget gate: when it closes, the cell effectively forgets whatever value it was remembering • Output gate: determines when the unit should output the value in its memory
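
A sketch of a single LSTM step, assuming the common formulation in which one weight matrix produces the input, forget and output gates plus the candidate cell value; the sizes below are made up for illustration. It shows the forget gate scaling the old memory, the input gate admitting the new value, and the output gate deciding what the unit exposes.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step. W maps [x; h_prev] to the stacked gate pre-activations."""
        z = W @ np.concatenate([x, h_prev]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
        g = np.tanh(g)                                 # candidate cell value
        c = f * c_prev + i * g                         # forget old memory, admit new input
        h = o * np.tanh(c)                             # output gate decides what is exposed
        return h, c

    # Illustrative sizes: 4-dim input, 8-dim hidden and cell state.
    rng = np.random.default_rng(1)
    W, b = rng.normal(size=(4 * 8, 4 + 8)), np.zeros(4 * 8)
    h, c = lstm_step(rng.normal(size=4), np.zeros(8), np.zeros(8), W, b)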

  10. Background-Long Short-Term Memory • In the real world, we use many more than two cells at each layer

  11. Background-Bidirectional RNN • Remembers the state of each hidden unit in both directions • Forward -> The cat sat on the map • Backward -> map the on sat cat The
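
A small sketch of the bidirectional idea, assuming plain tanh RNNs in each direction and toy dimensions: one RNN reads the sentence left to right, the other right to left, and the two hidden states for each position are concatenated.

    import numpy as np

    def rnn_states(inputs, W_xh, W_hh):
        h, states = np.zeros(W_hh.shape[0]), []
        for x in inputs:
            h = np.tanh(W_xh @ x + W_hh @ h)
            states.append(h)
        return states

    def birnn(inputs, fwd, bwd):
        """Concatenate forward and backward hidden states for every position."""
        forward = rnn_states(inputs, *fwd)                 # "The cat sat on the map"
        backward = rnn_states(inputs[::-1], *bwd)[::-1]    # "map the on sat cat The"
        return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

    rng = np.random.default_rng(2)
    fwd = (rng.normal(size=(6, 3)), rng.normal(size=(6, 6)))
    bwd = (rng.normal(size=(6, 3)), rng.normal(size=(6, 6)))
    annotations = birnn([rng.normal(size=3) for _ in range(6)], fwd, bwd)
    print(annotations[0].shape)    # (12,) -- each position sees both directions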

  12. Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference

  13. Neural Probabilistic Language Model • The goal of statistical language modeling is to learn the joint probability function of sequences of words in a language • Input: the first n-1 words, each as a 1-of-N vector • Output: a vector whose i-th element estimates the probability that word i comes next • Ex: "The" = (0,1,0,0,...,0,0), "cat" = (0,0,...,1,...,0,0), ...; given the context "The cat sat on the", the model predicts the next word ("map")
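
A toy illustration of the input/output convention described above: each context word is a 1-of-N vector, and the output is a vector of probabilities over the vocabulary. A single linear layer plus softmax stands in for the full language model here; the vocabulary and weights are made up.

    import numpy as np

    vocab = ["The", "cat", "sat", "on", "the", "map"]
    word_to_id = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        v = np.zeros(len(vocab))
        v[word_to_id[word]] = 1.0      # 1-of-N mapping, e.g. "The" -> (1,0,0,0,0,0)
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Toy model: concatenate the n-1 context words and score every vocabulary word.
    rng = np.random.default_rng(3)
    n = 5                               # context = the first n-1 = 4 words
    W = rng.normal(size=(len(vocab), (n - 1) * len(vocab)))
    context = np.concatenate([one_hot(w) for w in ["The", "cat", "sat", "on"]])
    probs = softmax(W @ context)        # i-th element ~ P(next word = vocab[i])
    print(probs.sum())                  # 1.0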

  14. Neural Probabilistic Language Model-Learning a distributed representation for words • Curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training • Ex: to model the joint distribution of 10 consecutive words in a natural language with a vocabulary V of size 100,000, there are potentially 100,000^10 - 1 = 10^50 - 1 free parameters • Too expensive and unrealistic!
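
The count on this slide made concrete (the vocabulary size and the value of n come from the example in Bengio, 2003):

    # Number of entries a raw table of 10-gram probabilities would need
    # for a 100,000-word vocabulary.
    V, n = 100_000, 10
    free_parameters = V ** n - 1
    print(f"{free_parameters:.3e}")    # ~1e50 -- impossible to estimate from any corpus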

  15. Neural Probabilistic Language Model-Learning a distributed representation for words • Solution: use a continuous representation that captures a large number of precise syntactic and semantic word relationships • Input vocabulary transformation: each word's 1-of-N vector is mapped to a low-dimensional feature vector through a shared embedding matrix
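
A two-line check of what the input vocabulary transformation amounts to: multiplying the embedding matrix by a 1-of-N vector just reads out one column, i.e. that word's feature vector. The matrix name C and the sizes below are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    V, d = 10, 3                           # toy vocabulary of 10 words, 3 features each
    C = rng.normal(size=(d, V))            # embedding (feature) matrix

    w = 7                                  # index of some word
    one_hot = np.zeros(V); one_hot[w] = 1.0

    dense = C @ one_hot                    # the "input vocabulary transformation"
    assert np.allclose(dense, C[:, w])     # equivalent to reading out column w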

  16. Neural Probabilistic Language Model-Learning a distributed representation for words • Word embedding matrix C • (2003) Feed-forward Neural Net Language Model (NNLM) • Maximize the penalized log-likelihood of the training corpus • C: a |V| x m matrix of free parameters, one m-dimensional feature vector per vocabulary word • g: probability function over words, applied to the concatenated feature vectors of the context words
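
A toy version of the training objective: the average log-probability assigned to each next word, minus an L2 penalty on the weights. A linear scorer over the concatenated context features stands in for the full NNLM hidden layer; all names and sizes here are illustrative assumptions.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def penalized_log_likelihood(contexts, targets, C, W, weight_decay=1e-4):
        """Average log P(w_t | context) over the corpus, minus an L2 penalty
        on the weights (the 'penalized log-likelihood' the model maximises)."""
        total = 0.0
        for ctx, t in zip(contexts, targets):
            x = np.concatenate([C[:, w] for w in ctx])   # look up and concatenate features
            total += np.log(softmax(W @ x)[t])
        penalty = weight_decay * np.sum(W ** 2)
        return total / len(targets) - penalty

    rng = np.random.default_rng(9)
    V, d, n = 12, 3, 4                                   # toy sizes
    C = rng.normal(size=(d, V))                          # word feature matrix (d x |V|)
    W = rng.normal(size=(V, (n - 1) * d))                # output weights of the toy scorer
    L = penalized_log_likelihood([[0, 1, 2], [3, 4, 5]], [6, 7], C, W)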

  17. Neural Probabilistic Language Model-Learning a distributed representation for words • The number of features per word = 30

  18. Neural Probabilistic Language Model-Learning a distributed representation for words • Novel models for computing continuous vector representations of words • Recurrent Language Model • Continuous Bag-of-Words Model • Continuous Skip-gram Model

  19. Neural Probabilistic Language Model-Learning a distributed representation for words • Word relationships in vector form • "Big" is similar to "biggest" in the same sense that "small" is similar to "smallest" • x = vector("biggest") - vector("big") + vector("small"); the closest vector to x is vector("smallest")
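
The vector-arithmetic relationship on this slide, spelled out with made-up 3-dimensional embeddings: compute x = vector("biggest") - vector("big") + vector("small") and look for the nearest vector by cosine similarity.

    import numpy as np

    # Toy embeddings (invented for illustration); real vectors come from a trained model.
    emb = {
        "big":      np.array([1.0, 0.2, 0.0]),
        "biggest":  np.array([1.0, 0.9, 0.0]),
        "small":    np.array([0.1, 0.2, 1.0]),
        "smallest": np.array([0.1, 0.9, 1.0]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    x = emb["biggest"] - emb["big"] + emb["small"]   # "big : biggest = small : ?"
    best = max(emb, key=lambda w: cosine(emb[w], x))
    print(best)                                       # smallest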

  20. Neural Probabilistic Language Model-Learning a distributed representation for words • Transform the input to a lower dimension before sending it into the translation system • Lower computation load • Encodes meaningful information inside the vector • The same concept can be used to map phrases or sentences to the continuous space

  21. Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference

  22. Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks

  23. Recurrent Continuous Translation Model Ⅰ • The Recurrent Continuous Translation Model Ⅰ (RCTM Ⅰ) uses a convolutional sentence model (CSM) in the conditioning architecture

  24. Recurrent Continuous Translation Model Ⅰ • Architecture of the Recurrent Language Model (RLM) • Input vocabulary transformation I: maps the current word's 1-of-V vector into the hidden space • Recurrent transformation R: carries the previous hidden state forward, h_(i+1) = σ(I·v(f_(i+1)) + R·h_i) • Output vocabulary transformation O: maps the hidden state to a score for every vocabulary word, o_(i+1) = O·h_i • Output: a probability distribution over the next word
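
A sketch of the RLM recurrence just described, assuming a sigmoid nonlinearity and a softmax over the output scores; the matrix names I, R, O follow the bullet points above, and the sizes are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def rlm_step(word_one_hot, h_prev, I, R, O, b):
        """One step of the recurrent language model recurrence sketched above."""
        h = sigmoid(I @ word_one_hot + R @ h_prev + b)   # new hidden state
        p_next = softmax(O @ h)                          # distribution over the next word
        return h, p_next

    # Illustrative sizes: vocabulary of 20 words, 10 hidden units.
    rng = np.random.default_rng(5)
    V, q = 20, 10
    I, R, O, b = (rng.normal(size=(q, V)), rng.normal(size=(q, q)),
                  rng.normal(size=(V, q)), np.zeros(q))
    h = np.zeros(q)
    v = np.zeros(V); v[3] = 1.0                          # current word as a 1-of-V vector
    h, p_next = rlm_step(v, h, I, R, O, b)
    print(p_next.sum())                                   # 1.0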

  25. Recurrent Continuous Translation Model Ⅰ • The Convolution Sentence Model (CSM) models the continuous representation of a sentence based on the continuous representations of the words in the sentence • Kernels of the convolution ⇒ learnt feature detectors • Local first, then global

  26. Recurrent Continuous Translation Model Ⅰ • Turns a sequence of words into a vector of fixed dimensionality
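
A loose sketch of the "local first, then global" composition: adjacent vectors are merged level by level until a single fixed-size sentence vector remains. This only approximates the CSM's convolution (the real model uses level-specific kernel widths and weights); the kernels and dimensions here are made up.

    import numpy as np

    def csm_like_compose(word_vectors, kernels):
        """Merge adjacent vectors level by level until one sentence vector remains."""
        level = list(word_vectors)
        k = 0
        while len(level) > 1:
            K = kernels[k % len(kernels)]
            level = [np.tanh(K @ np.concatenate([a, b]))   # combine two neighbours
                     for a, b in zip(level[:-1], level[1:])]
            k += 1
        return level[0]                                     # fixed-size sentence vector

    rng = np.random.default_rng(6)
    d = 4
    kernels = [rng.normal(size=(d, 2 * d)) for _ in range(3)]
    sentence = [rng.normal(size=d) for _ in range(6)]       # 6 words, 4-dim embeddings
    print(csm_like_compose(sentence, kernels).shape)        # (4,) regardless of length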

  27. Recurrent Continuous Translation Model Ⅱ • The convolutional n-gram model is obtained by truncating the CSM at the level where n-grams are represented

  28. Recurrent Continuous Translation Model • Convolutional neural networks might lose the ordering of words • RCTM Ⅱ achieves better performance

  29. Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks

  30. RNN Encoder-Decoder • Fixed-length vector representation • Modified LSTM-style hidden unit (reset and update gates) • 1000 hidden units (dimension of c = 1000)
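
A sketch of the encoder-decoder shape described here, with plain tanh units standing in for the gated hidden units of the actual model and made-up dimensions: the encoder folds a source sentence of any length into one vector c, and every decoder step is conditioned on that same c.

    import numpy as np

    def encode(source, W_x, W_h):
        """Encoder: fold a variable-length source into a fixed-length vector c."""
        h = np.zeros(W_h.shape[0])
        for x in source:
            h = np.tanh(W_x @ x + W_h @ h)
        return h                                   # c: summary of the whole sentence

    def decode_step(y_prev, s_prev, c, U_y, U_s, U_c):
        """Decoder: every step is conditioned on the same summary vector c."""
        return np.tanh(U_y @ y_prev + U_s @ s_prev + U_c @ c)

    rng = np.random.default_rng(7)
    d, m = 8, 6                                     # toy hidden size 8, embedding size 6
    W_x, W_h = rng.normal(size=(d, m)), rng.normal(size=(d, d))
    U_y, U_s, U_c = (rng.normal(size=(d, m)), rng.normal(size=(d, d)),
                     rng.normal(size=(d, d)))

    c = encode([rng.normal(size=m) for _ in range(9)], W_x, W_h)   # any source length
    s = decode_step(np.zeros(m), np.zeros(d), c, U_y, U_s, U_c)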

  31. RNN Encoder-Decoder • Vector representation c

  32. Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks

  33. Neural Machine Translation by Jointly Learning to Align and Translate • Deals with long sentences • Applies a bidirectional RNN (BiRNN) to remember the state of each hidden unit in both directions • 1000 forward and 1000 backward hidden units
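
A sketch of the soft alignment used by this model: at each decoder step, every BiRNN annotation is scored against the decoder state, the scores are normalised into alignment weights, and their weighted sum becomes the context vector. The additive scoring form follows the paper, but the weight names and sizes below are illustrative.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def attention_context(s_prev, annotations, W_a, U_a, v_a):
        """Score every annotation, normalise, and take the weighted sum."""
        scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h) for h in annotations])
        alpha = softmax(scores)                     # alignment weights over source words
        return alpha, sum(a * h for a, h in zip(alpha, annotations))

    rng = np.random.default_rng(8)
    d = 6
    W_a, U_a, v_a = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
    annotations = [rng.normal(size=d) for _ in range(7)]   # one BiRNN state per source word
    alpha, context = attention_context(rng.normal(size=d), annotations, W_a, U_a, v_a)
    print(alpha.round(2), context.shape)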

  34. Neural Machine Translation by Jointly Learning to Align and Translate • RNNsearch vs. RNNencdec • Test set: WMT’14

  35. Neural Translation Model • 2013 Recurrent Continuous Translation Models • 2014 RNN Encoder-Decoder • 2014 Neural Machine Translation by Jointly Learning to Align and Translate • 2014 Sequence to Sequence Learning with Neural Networks

  36. Sequence to Sequence Learning with Neural Networks • Fixed-length vector representation • The LSTM learns much better when the source sentences are reversed • Perplexity dropped from 5.8 to 4.7 • BLEU score increased from 25.9 to 30.6 • Deep LSTMs with 4 layers • 1000 LSTM cells at each layer
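
The source-reversal trick is literally just reversing the encoder input while leaving the target untouched; the example sentence pair below is made up for illustration.

    # Reverse the source (not the target) before feeding the encoder, as the paper
    # reports helping the LSTM: the first target words end up close to the source
    # words they depend on, shortening the paths gradients must travel.
    source = ["the", "cat", "sat", "on", "the", "map"]
    target = ["le", "chat", "s'est", "assis"]      # target order unchanged (illustrative)

    encoder_input = list(reversed(source))
    print(encoder_input)   # ['map', 'the', 'on', 'sat', 'cat', 'the']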

  37. Sequence to Sequence Learning with Neural Networks • Direct translation • Rescoring the n-best lists of a baseline SMT system

  38. Sequence to Sequence Learning with Neural Networks

  39. Outline • Introduction • Background • Recurrent Neural Network (RNN) • LSTM • Bidirectional RNN (BiRNN) • Neural Probabilistic Language Model • Neural Machine Translation • Conclusion • Reference

  40. Conclusion • Sentence representations capture their meaning • Sentences with similar meanings are close together • Sentences with different meanings are far apart • In the future, it may be possible for machines to extract the important messages from long paragraphs

  41. Reference • Long Short-Term Memory, S. Hochreiter and J. Schmidhuber, 1997 • A Neural Probabilistic Language Model, Y. Bengio et al., 2003 • Distributed Representations of Words and Phrases and their Compositionality, T. Mikolov et al., 2013 • Efficient Estimation of Word Representations in Vector Space, T. Mikolov et al., 2013 • Recurrent Continuous Translation Models, N. Kalchbrenner and P. Blunsom, 2013 • Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, K. Cho et al., 2014 • Neural Machine Translation by Jointly Learning to Align and Translate, D. Bahdanau, K. Cho, and Y. Bengio, 2014 • Sequence to Sequence Learning with Neural Networks, I. Sutskever et al., 2014
