
Deep Learning for Natural Language Processing




  1. Deep Learning for Natural Language Processing

  2. Topics • Word embeddings • Recurrent neural networks • Long short-term memory (LSTM) networks • Neural machine translation • Automatically generating image captions

  3. Word meaning in NLP • How do we capture the meaning and context of words? Synonyms: “I loved the movie.” / “I adored the movie.” Synecdoche: “Today, Washington affirmed its opposition to the trade pact.” Homonyms: “I deposited the money in the bank.” / “I buried the money in the bank.” Polysemy: “I read a book today.” / “I wasn’t able to book the hotel room.”

  4. Word Embeddings “One of the most successful ideas of modern NLP.” One example: Google’s Word2Vec algorithm

  5. Word2Vec algorithm [diagram of the network architecture, built up over the next few slides]

  6. Word2Vec algorithm Input: one-hot representation of the input word over the vocabulary (10,000 units)

  7. Word2Vec algorithm Hidden layer (linear activation function), 300 units. Input: one-hot representation of the input word over the vocabulary (10,000 units)

  8. Word2Vec algorithm Output: probability, for each word wi in the vocabulary, that wi is nearby the input word in a sentence (10,000 units). Hidden layer (linear activation function), 300 units. Input: one-hot representation of the input word over the vocabulary (10,000 units)

  9. Word2Vec algorithm Same network with the weight matrices labeled: 10,000 × 300 weights from input to hidden layer, and 300 × 10,000 weights from hidden layer to output
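
To make the architecture on slides 6–9 concrete, here is a minimal NumPy sketch of the skip-gram network: a one-hot input over a 10,000-word vocabulary, a 300-unit linear hidden layer, and a softmax output over the vocabulary. The sizes and variable names are illustrative, not taken from the original Word2Vec implementation.

```python
import numpy as np

VOCAB_SIZE = 10_000   # vocabulary size (one-hot input, softmax output)
EMBED_DIM = 300       # number of hidden units = embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(VOCAB_SIZE, EMBED_DIM))   # 10,000 x 300 weights
W_out = rng.normal(scale=0.01, size=(EMBED_DIM, VOCAB_SIZE))  # 300 x 10,000 weights

def forward(input_word_id):
    """Forward pass for one input word.

    Because the input is one-hot, multiplying it by W_in just selects one row,
    which becomes the hidden-layer activation (linear activation function).
    """
    hidden = W_in[input_word_id]          # shape (300,)
    logits = hidden @ W_out               # shape (10,000,)
    exps = np.exp(logits - logits.max())  # softmax over the vocabulary
    probs = exps / exps.sum()
    return hidden, probs                  # probs[i] = P(word i is nearby the input word)
```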

  10. Word2Vec training • Training corpus of documents • Collect pairs of nearby words • Example “document”: Every morning she drinks Starbucks coffee. Training pairs (window size = 3): (every, morning) (every, she) (morning, she) (morning, drinks) (she, drinks) (she, Starbucks) (drinks, Starbucks) (drinks, coffee) (Starbucks, coffee)
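
The pair-collection step above can be sketched in a few lines of Python. Reading “window size = 3” as a sliding window of three consecutive tokens is one plausible interpretation; it reproduces exactly the nine pairs on the slide (lowercased here).

```python
from itertools import combinations

def training_pairs(sentence, window_size=3):
    """Collect unique (word, context) pairs from every sliding window of
    `window_size` consecutive tokens (one reading of "window size = 3")."""
    tokens = sentence.lower().rstrip(".").split()
    pairs = []
    for start in range(len(tokens) - window_size + 1):
        window = tokens[start:start + window_size]
        for a, b in combinations(window, 2):
            if (a, b) not in pairs:
                pairs.append((a, b))
    return pairs

print(training_pairs("Every morning she drinks Starbucks coffee."))
# [('every', 'morning'), ('every', 'she'), ('morning', 'she'), ('morning', 'drinks'),
#  ('she', 'drinks'), ('she', 'starbucks'), ('drinks', 'starbucks'),
#  ('drinks', 'coffee'), ('starbucks', 'coffee')]
```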

  11. Word2Vec training via backpropagation [diagram: input word “drinks”; target output “Starbucks”; the target is the probability that “Starbucks” is nearby “drinks”; 10,000 × 300 weights into the linear hidden layer, 300 × 10,000 weights to the output]

  12. Word2Vec training via backpropagation [same diagram: input word “drinks”; target output “coffee”; the target is the probability that “coffee” is nearby “drinks”]

  13. Learned word vectors [diagram: after training, each word’s vector is its row of the 10,000 × 300 input-to-hidden weight matrix, e.g. the row for “drinks”]
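
Since the learned embedding for a word is simply that word’s row of the 10,000 × 300 input-to-hidden weight matrix, similar words can be found by comparing rows. A small, hypothetical helper for nearest neighbors by cosine similarity (the function and parameter names are mine, not from the slides):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_words(word_id, W_in, k=5):
    """Return the ids of the k words whose vectors (rows of the 10,000 x 300
    input-to-hidden weight matrix W_in) are most similar to the given word's."""
    target = W_in[word_id]
    sims = np.array([cosine_similarity(target, row) for row in W_in])
    sims[word_id] = -np.inf              # exclude the word itself
    return np.argsort(-sims)[:k]         # ids of the k most similar words
```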

  14. Some surprising results of word2vec http://www.aclweb.org/anthology/N13-1#page=784
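
The best known of these surprising results are the word-analogy regularities: vector(“king”) minus vector(“man”) plus vector(“woman”) lands closest to vector(“queen”). A hedged sketch of how one can reproduce this with the gensim library and its pretrained Google News Word2Vec vectors (gensim and this particular model name are assumptions, not something used in the slides; the vectors are a large download):

```python
# pip install gensim   (the pretrained vectors are a large download)
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # pretrained Word2Vec vectors

# vector("king") - vector("man") + vector("woman") is closest to vector("queen")
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```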

  15. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

  16. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

  17. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

  18. Word embeddings demo http://bionlp-www.utu.fi/wv_demo/

  19. Recurrent Neural Network (RNN) From http://axon.cs.byu.edu/~martinez/classes/678/Slides/Recurrent.pptx

  20. Recurrent Neural Network “unfolded” in time From http://eric-yuan.me/rnn2-lstm/ Training algorithm: “backpropagation through time”
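
A minimal NumPy sketch of what “unfolded in time” means: the same three weight matrices are reused at every time step, and the hidden state carries information forward. Backpropagation through time is ordinary backpropagation applied to this unrolled computation. All names and shapes here are illustrative.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, W_hy, h0):
    """Forward pass of a simple (Elman) RNN, unfolded in time: the same three
    weight matrices are applied at every time step, and the hidden state h
    carries information from one step to the next."""
    h = h0
    outputs = []
    for x in inputs:                        # one loop iteration = one time step
        h = np.tanh(W_xh @ x + W_hh @ h)    # new hidden state from input + previous state
        outputs.append(W_hy @ h)            # output at this time step
    return outputs, h
```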

  21. Encoder-decoder (or “sequence-to-sequence”) networks for translation http://book.paddlepaddle.org/08.machine_translation/image/encoder_decoder_en.png
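
A highly simplified sketch of the encoder-decoder idea from this slide, using plain RNN steps and greedy decoding. Real translation systems use LSTM or GRU units, attention, and beam search; every name and parameter below is illustrative.

```python
import numpy as np

def encode(source_vectors, W_xh, W_hh, h0):
    """Encoder: run an RNN over the source sentence; its final hidden state is
    a fixed-length summary (context vector) of the whole sentence."""
    h = h0
    for x in source_vectors:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h

def decode(context, W_xh, W_hh, W_hy, embeddings, start_id=0, end_id=1, max_len=20):
    """Decoder: a second RNN, initialized with the context vector, that emits one
    target word at a time and feeds each predicted word back in as the next input."""
    h, word_id, output_ids = context, start_id, []
    for _ in range(max_len):
        x = embeddings[word_id]               # embedding of the previously emitted word
        h = np.tanh(W_xh @ x + W_hh @ h)
        word_id = int(np.argmax(W_hy @ h))    # greedy choice of the next target word
        if word_id == end_id:                 # stop at the end-of-sentence token
            break
        output_ids.append(word_id)
    return output_ids
```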

  22. Problem for RNNs: learning long-term dependencies. “The cat that my mother’s sister took to Hawaii the year before last when you were in high school is now living with my cousin.” Backpropagation through time: problem of vanishing gradients
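
A toy numeric illustration of why gradients vanish (the numbers are made up for illustration): in backpropagation through time, the gradient reaching an early time step is a product of one factor per step, and when those factors are below 1 the product shrinks exponentially with distance.

```python
# Toy illustration (made-up numbers): in backpropagation through time, the gradient
# reaching an early time step is a product of one factor per step. With tanh
# derivatives below 1 and a modest recurrent weight, that product shrinks
# exponentially with sequence length (the "vanishing gradient").
w_hh = 0.5              # an illustrative (scalar) recurrent weight
tanh_derivative = 0.7   # a typical value of 1 - tanh(h)^2
for T in (5, 20, 50):
    grad_factor = (w_hh * tanh_derivative) ** T
    print(f"sequence length {T:2d}: gradient scaled by {grad_factor:.2e}")
```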

  23. Long Short Term Memory (LSTM) • A “neuron” with a complicated memory gating structure. • Replaces ordinary hidden neurons in RNNs. • Designed to avoid the long-term dependency problem

  24. Long Short-Term Memory (LSTM) unit [side-by-side diagrams: a simple RNN (hidden) unit vs. an LSTM (hidden) unit] From https://deeplearning4j.org/lstm.html

  25. Comments on LSTMs • LSTM unit replaces simple RNN unit • LSTM internal weights still trained with backpropagation • Cell value has feedback loop: can remember value indefinitely • Function of gates (“input”, “forget”, “output”) is learned via minimizing loss
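
The gate structure described on this slide can be written out directly. Below is a sketch of one LSTM step in NumPy; the weight layout (dictionaries W, U, b keyed by gate name) is my own convention for readability, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of an LSTM unit: input, forget, and output gates plus a candidate
    cell value, all of whose weights are learned by minimizing the loss."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate cell value
    c = f * c_prev + i * g     # cell-state feedback loop: can retain a value indefinitely
    h = o * np.tanh(c)         # hidden state exposed to the rest of the network
    return h, c
```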

  26. Google “Neural Machine Translation”: (unfolded in time) From https://arxiv.org/pdf/1609.08144.pdf

  27. Neural Machine Translation: Training: maximum likelihood, using gradient descent on the weights. Trained on a very large corpus of parallel texts in the source (X) and target (Y) languages.

  28. How to evaluate automated translations? Human raters’ side-by-side comparisons: Scale of 0 to 6 0: “completely nonsense translation” 2: “the sentence preserves some of the meaning of the source sentence but misses significant parts” 4: “the sentence retains most of the meaning of the source sentence, but may have some grammar mistakes” 6: “perfect translation: the meaning of the translation is completely consistent with the source, and the grammar is correct.”

  29. Results from Human Raters

  30. Automating Image Captioning

  31. Automating Image Captioning [diagram labels: CNN features; word embeddings; words in caption; softmax probability distribution over vocabulary] Training: large dataset of image/caption pairs from Flickr and other sources. Vinyals et al., “Show and Tell: A Neural Image Caption Generator”, CVPR 2015
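
A rough sketch of the generation loop implied by this architecture: CNN features seed the recurrent state, and the network emits the caption one word at a time, feeding each predicted word’s embedding back in. This is a simplification of the actual Show and Tell model (which, for example, passes the image through the LSTM as a first input and uses beam search); all names below are illustrative.

```python
import numpy as np

def generate_caption(image_features, lstm_step, W_embed, W_out,
                     start_id=0, end_id=1, max_len=20):
    """Greedy caption generation: the image's CNN feature vector seeds the
    recurrent state, and each predicted word is embedded and fed back in.

    lstm_step(x, h, c) -> (h, c) is any LSTM step function; W_embed holds the
    word embeddings, and W_out maps the hidden state to vocabulary scores.
    """
    h = image_features                       # CNN features as the initial hidden state
    c = np.zeros_like(h)                     # initial cell state
    word_id, caption = start_id, []
    for _ in range(max_len):
        x = W_embed[word_id]                 # embedding of the previous word
        h, c = lstm_step(x, h, c)            # one recurrent step
        word_id = int(np.argmax(W_out @ h))  # most probable next word (softmax argmax)
        if word_id == end_id:                # stop at the end-of-caption token
            break
        caption.append(word_id)
    return caption
```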

  32. “NeuralTalk” sample results From http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/

  33. Microsoft Captionbot https://www.captionbot.ai/

  34. From Andrej Karpathy’s Blog, Oct. 22, 2012: “The State of Computer Vision and AI: We are Really, Really Far Away.” What knowledge do you need to understand this situation? http://karpathy.github.io/2012/10/22/state-of-computer-vision/

  35. Microsoft CaptionBot.ai: “I can understand the content of any photograph and I’ll try to describe it as well as any human.”

  36. Microsoft CaptionBot.ai: “I can understand the content of any photograph and I’ll try to describe it as well as any human.”

  37. Winograd Schema “Common Sense” Challenge

  38. Winograd Schema “Common Sense” Challenge I poured water from the bottle into the cup until it was full. What was full? I poured water from the bottle into the cup until it was empty. What was empty? Winograd Schemas (Levesque et al., 2011)

  39. Winograd Schema “Common Sense” Challenge The steel ball hit the glass table and it shattered. What shattered? The glass ball hit the steel table and it shattered. What shattered? Winograd Schemas (Levesque et al., 2011)

  40. State-of-the-art AI: ~60% (vs. 50% with random guessing) Humans: 100% (if paying attention)

  41. State-of-the-art AI: ~60% (vs. 50% with random guessing) Humans: 100% (if paying attention) “When AI can’t determine what ‘it’ refers to in a sentence, it’s hard to believe that it will take over the world.” — Oren Etzioni, Allen Institute for AI

  42. https://www.seattletimes.com/business/technology/paul-allen-invests-125-million-to-teach-computers-common-sense/ https://allenai.org/alexandria/

  43. https://www.darpa.mil/news-events/2018-10-11 Today’s machine learning systems are more advanced than ever, capable of automating increasingly complex tasks and serving as a critical tool for human operators. Despite recent advances, however, a critical component of Artificial Intelligence (AI) remains just out of reach – machine common sense. Defined as “the basic ability to perceive, understand, and judge things that are shared by nearly all people and can be reasonably expected of nearly all people without need for debate,” common sense forms a critical foundation for how humans interact with the world around them. Possessing this essential background knowledge could significantly advance the symbiotic partnership between humans and machines. But articulating and encoding this obscure-but-pervasive capability is no easy feat. “The absence of common sense prevents an intelligent system from understanding its world, communicating naturally with people, behaving reasonably in unforeseen situations, and learning from new experiences,” said Dave Gunning, a program manager in DARPA’s Information Innovation Office (I2O). “This absence is perhaps the most significant barrier between the narrowly focused AI applications we have today and the more general AI applications we would like to create in the future.”

  44. Allen AI Institute Common Sense Challenge • Which factor will most likely cause a person to develop a fever? (A) a leg muscle relaxing after exercise (B) a bacterial population in the bloodstream (C) several viral particles on the skin (D) carbohydrates being digested in the stomach
