
Generating Text with Recurrent Neural Networks


Presentation Transcript


  1. Generating Text with Recurrent Neural Networks Ilya Sutskever, James Martens, and Geoffrey Hinton, ICML 2011 2013-4-1 Institute of Electronics, NCTU Advisor: S. J. Wang (王聖智) Student: K. T. Chen (陳冠廷)

  2. Outline • Introduction • Motivation • What is the RNN? • Why do we choose RNN to solve this problem? • How to train RNN? • Contribution • Character-Level language modeling • The multiplicative RNN • The Experiments • Discussion

  3. Outline • Introduction • Motivation • What is the RNN? • Why do we choose RNN to solve this problem? • How to train RNN? • Contribution • Character-Level language modeling • The multiplicative RNN • The Experiments • Discussion

  4. Motivation • Read some sentences and then try to predict the next character. Example: "Easter is a Christian festival and holiday celebrating the resurrection of Jesus Christ on the third day after his crucifixion at Calvary as described in the New Testament." Given "Easter is a Christian festival and holiday celebrating the resurrection of Jesus Christ..", what character comes next?

  5. Recurrent neural networks • A recurrent neural network (RNN) is a class of neural network in which connections between units form a directed cycle. [Figure: a feed-forward neural network (input → hidden → output) next to a recurrent neural network whose hidden layer feeds back into itself]
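As a rough illustration of that directed cycle, here is a minimal sketch of one recurrent step in NumPy; the weight names (W_hx, W_hh, W_oh) and the tanh nonlinearity are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hx, W_hh, W_oh, b_h, b_o):
    # Recurrent update: the previous hidden state feeds back in (the directed cycle).
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)
    # Output: pre-softmax scores computed from the current hidden state.
    o_t = W_oh @ h_t + b_o
    return h_t, o_t
```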

  6. Why do we choose RNNs? • RNNs are suitable for sequential data (they have memory). • An RNN is a neural network unrolled in time. [Figure: the RNN unrolled over time steps t-1, t, t+1, with inputs, hidden states, and predictions at each step]

  7. How to train RNN? • Backpropagation through time (BPTT). • The gradient is easy to compute with backpropagation. • RNNs learn by minimizing the training error. [Figure: the unrolled network over time steps t-1, t, t+1, showing inputs, hidden states, and predictions]
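For concreteness, a minimal BPTT sketch under the same assumed weight names as the step above: the network is unrolled over a sequence of one-hot inputs, and only the gradient for the recurrent matrix is returned, since it accumulates contributions from every time step.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bptt_recurrent_grad(xs, ys, h0, W_hx, W_hh, W_oh, b_h, b_o):
    """Return dLoss/dW_hh for a sequence of one-hot inputs xs and targets ys."""
    # Forward pass: unroll the RNN over the whole sequence, caching hidden states.
    hs, logits, h = [h0], [], h0
    for x in xs:
        h = np.tanh(W_hx @ x + W_hh @ h + b_h)
        hs.append(h)
        logits.append(W_oh @ h + b_o)
    # Backward pass: push the cross-entropy error back through every time step.
    dW_hh = np.zeros_like(W_hh)
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):
        do = softmax(logits[t]) - ys[t]      # gradient of cross-entropy w.r.t. logits
        dh = W_oh.T @ do + dh_next           # error from the output and from the future
        dz = (1.0 - hs[t + 1] ** 2) * dh     # back through the tanh nonlinearity
        dW_hh += np.outer(dz, hs[t])         # the same matrix accumulates over time
        dh_next = W_hh.T @ dz                # pass the error one step further back
    return dW_hh
```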

  8. RNNs are hard to train • They can be volatile and can exhibit long-range sensitivity to small parameter perturbations (the butterfly effect). • The "vanishing gradient problem" makes gradient descent ineffective. [Figure: the unrolled network (inputs, hiddens, outputs) over time]

  9. How to overcome vanishing gradients? • Long short-term memory (LSTM): modify the architecture of the neural network. • Hessian-Free optimizer (James Martens et al., 2011): based on Newton's method plus the conjugate gradient algorithm. • Echo State Network: only the hidden-to-output weights are learned. [Figure: a memory cell with write, read, and keep gates controlling the stored data]

  10. Outline • Introduction • Motivation • What is the RNN? • Why do we choose RNN to solve this problem? • How to train RNN? • Contribution • Character-Level language modeling • The multiplicative RNN • The Experiments • Discussion

  11. Character-Level language modeling • The RNN observes a sequence of characters. • The target output at each time step is defined as the input character at the next time step. • The hidden state stores the relevant information. [Figure: inputs H, e, l, l, o with targets e, l, l, o; after each step the hidden state encodes "H", "He", "Hel", "Hell", "Hello"]
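A tiny sketch of this setup, assuming characters are first mapped to integer ids (the mapping itself is only illustrative): the target sequence is just the input sequence shifted by one character.

```python
text = "Hello"
char_to_id = {c: i for i, c in enumerate(sorted(set(text)))}

inputs  = [char_to_id[c] for c in text[:-1]]   # "H", "e", "l", "l"
targets = [char_to_id[c] for c in text[1:]]    # "e", "l", "l", "o"  (input shifted by one)
```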

  12. The standard RNN • The current input is transformed via the visible-to-hidden weight matrix, and then contributes additively to the input for the current hidden state. [Figure: a 1-of-86 character input feeding a hidden layer, with a softmax output layer predicting the distribution over the next character]
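As a rough sketch of that output layer (the sizes 86 and 1500 follow the talk; the random weights and hidden state are placeholders purely for illustration), the hidden state is mapped to logits over the 1-of-86 character vocabulary and normalized with a softmax.

```python
import numpy as np

n_chars, n_hidden = 86, 1500                 # sizes from the talk
W_oh = 0.01 * np.random.randn(n_chars, n_hidden)
h_t  = np.random.randn(n_hidden)             # current hidden state (random placeholder)

logits = W_oh @ h_t                          # hidden-to-output projection
p_next = np.exp(logits - logits.max())
p_next /= p_next.sum()                       # softmax: distribution over the next character
```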

  13. Some motivation from modeling a tree • Each node is a hidden state vector. The next character must transform this to a new node. • The next hidden state needs to depend on the conjunction of the current character and the current hidden representation. [Figure: a tree of contexts: "..fix" branches on 'e' to "..fixe" and on 'i' to "..fixi", which extends on 'n' to "..fixin"]

  14. The Multiplicative RNN • They tried several neural network architectures and found the "Multiplicative RNN" (MRNN) to be more effective than the regular RNN. • The hidden-to-hidden weight matrix is chosen by the current input character.

  15. The Multiplicative RNN • Naïve implementation: assign a separate transition matrix to each character. • This requires a lot of parameters (86 × 1500 × 1500). • This could make the net overfit. • Difficult to parallelize on a GPU. • Instead, factorize the per-character matrices: • Fewer parameters. • Easier to parallelize.
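A quick back-of-the-envelope check of those parameter counts, assuming the talk's sizes (86 characters, 1500 hidden units, 1500 factors):

```python
n_chars, n_hidden, n_factors = 86, 1500, 1500

naive    = n_chars * n_hidden * n_hidden       # one full transition matrix per character
factored = (n_factors * n_chars                # character -> factor weights
            + n_hidden * n_factors             # hidden -> factor weights
            + n_factors * n_hidden)            # factor -> hidden weights

print(f"naive: {naive:,}  factored: {factored:,}")   # ~193.5 million vs ~4.6 million
```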

  16. The Multiplicative RNN • We can get groups a and b to interact multiplicatively by using "factors" f. [Figure: groups a and b drive factor f, which applies a scalar coefficient to a rank-one (outer-product) transition matrix feeding group c]

  17. The Multiplicative RNN • Each factor f defines a rank-one matrix. [Figure: two layers of 1500 hidden units connected through factors f that are gated by the 1-of-86 input character; a softmax layer predicts the distribution over the next character]
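A minimal sketch of the factored multiplicative update, assuming one-hot character inputs; the matrix names (W_hx, W_fx, W_fh, W_hf) follow common MRNN notation rather than the slides. The current character gates the factor activities, so each factor contributes a character-dependent, rank-one piece of the transition.

```python
import numpy as np

def mrnn_step(x_t, h_prev, W_hx, W_fx, W_fh, W_hf, b_h):
    # Factor activities: the one-hot character x_t gates the projected hidden state.
    f_t = (W_fx @ x_t) * (W_fh @ h_prev)
    # Hidden update: additive input contribution plus the gated recurrent contribution.
    h_t = np.tanh(W_hx @ x_t + W_hf @ f_t + b_h)
    return h_t
```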

  18. The Multiplicative RNN [Figure: the MRNN unrolled over time steps t-1, t, t+1, t+2, with an input character and an output at each step]

  19. The Multiplicative RNN: Key advantages • The MRNN combines the conjunction of contexts and characters more easily. • The MRNN has two nonlinearities per time step, which make its dynamics even richer and more powerful. [Figure: after seeing "fix" the model predicts "i", "e", or "_"; after "fixi" it predicts "n"]

  20. Outline • Introduction • Motivation • What is the RNN? • Why do we choose RNN to solve this problem? • How to train RNN? • Contribution • Character-Level language modeling • The multiplicative RNN • The Experiments • Discussion

  21. The Experiments • Training on three large datasets: • ~1GB of the English Wikipedia • ~1GB of articles from the New York Times • ~100MB of JMLR and AISTATS papers • Compared with the Sequence Memoizer (Wood et al.) and PAQ (Mahoney et al.)

  22. Training on subsequences of a text millions of characters long • The training text (e.g. "This is an extremely long string of text…") is cut into overlapping subsequences of 250 characters: "This is an extre…", "his is an extrem…", "is is an extreme…", … • Compute the gradient and the curvature on a subset of the subsequences. • Use a different subset at each iteration.
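A small sketch of that slicing; the stride of one character is an assumption based on the overlapping windows shown on the slide.

```python
def subsequences(text, length=250, stride=1):
    """Cut a long string into overlapping fixed-length training subsequences."""
    return [text[i:i + length] for i in range(0, len(text) - length + 1, stride)]

windows = subsequences("This is an extremely long string of text...", length=16)
# ['This is an extre', 'his is an extrem', 'is is an extreme', ...]
```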

  23. Parallelization • Use the HF optimizer to evaluate the gradient and curvature on large minibatches of data. [Figure: the data is split across several GPUs; the partial gradients are summed across GPUs, and likewise for the curvature]

  24. The architecture of the model • Uses 1500 hidden units and 1500 multiplicative factors on subsequences of length 250. • Arguably the largest and deepest neural network ever trained. [Figure: the unrolled network with input, hidden (1500 units), and prediction layers at each time step]

  25. Demo • The MRNN extracts "higher level information", stores it for many time steps, and uses it to make a prediction. • Parentheses sensitivity (samples generated by the MRNN): • (Wlching et al. 2005) the latter has received numerical testimony without much deeply grow • (Wlching, Wulungching, Alching, Blching, Clching et al." 2076) and Jill Abbas, The Scriptures reported that Achsia and employed a • the sequence memoizer (Wood et al McWhitt), now called "The Fair Savings.'"" interpreted a critic. In t • Wlchingethics, like virtual locations. The signature tolerator is necessary to en • Wlching et al., or Australia Christi and an undergraduate work in over knowledge, inc • They often talk about examples as of January 19, . The "Hall Your Way" (NRF film) and OSCIP • Her image was fixed through an archipelago's go after Carol^^'s first century, but simply to

  26. Outline • Introduction • Motivation • What is the RNN? • Why do we choose RNN to solve this problem? • How to train RNN? • Contribution • Character-Level language modeling • The multiplicative RNN • The Experiments • Discussion

  27. Discussion • The text generated by the MRNN contains very few non-words (e.g., "cryptoliation", "homosomalist"), and the MRNN can deal with real words that it did not see in the training set. • With more computational power, they could train much bigger MRNNs with millions of units and billions of connections.

  28. Reference • Generating Text with Recurrent Neural Networks, Ilya Sutskever, James Martens, and Geoffrey Hinton, ICML 2011 • Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style, Graham W. Taylor, Geoffrey E. Hinton • Coursera: Neural Networks for Machine Learning, Geoffrey Hinton • http://www.cs.toronto.edu/~ilya/rnn.html
