
NLP: Grand Challenges


amanda

Presentation Transcript


  1. NLP: Grand Challenges
     The Ultimate Goal: for computers to use natural language (NL) as effectively as humans do.
     Reading and writing text
       • Abstracting
       • Monitoring
       • Extraction into Databases
     Interactive Dialogue: natural, effective access to computer systems
       • Informal Speech Input and Output
     Translation: Input and Output in Multiple Languages

  2. The Past: Human Machine Interfaces
     • SHRDLU (Winograd, 1969)
       • A fragile demonstration of the fundamental vision
     • PUNDIT (Hirschman, Palmer, … 1989)
       • Information Extraction from Real World texts

  3. SHRDLU: A demonstration proof
     Person: PICK UP A BIG RED BLOCK.
     Computer: OK. (does it)
     Person: GRASP THE PYRAMID.
     Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN.
     Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
     Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
     Computer: OK. (does it)
     Person: WHAT DOES THE BOX CONTAIN?
     Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
     Person: WHAT IS THE PYRAMID SUPPORTED BY?
     Computer: THE BOX.

  4. SHRDLU: A demonstration proof

  5. The Past: Information Extraction
     PUNDIT (Palmer et al. 1987)
     Sample CASREP & automatic summary:
       FAILURE OF ONE OF TWO SACS.
       UNIT HAD LOW OUTPUT AIR PRESSURE.
       RESULTED IN SLOW GAS TURBINE START.
       TROUBLESHOOTING REVEALED NORMAL SAC LUBE OIL PRESSURE AND TEMPERATURE.
       EROSION OF IMPELLOR BLADE TIP EVIDENT.
       CAUSE OF EROSION OF IMPELLOR BLADE UNDETERMINED.
       NEW SAC RECEIVED.

  6. The Past: Crucial flaws in the paradigm
     These systems worked well, BUT
     • Usually, only for a small set of examples
     • Person-years of work to port to new applications and, often, to extend coverage on a single application
     • Very limited and inconsistent coverage of English

  7. An Early Robust Statistical NLP Application
     • A Statistical Model For Etymology
     • Church, K.W. (1985) "Stress assignment in letter to sound rules for speech synthesis", Proceedings of the 23rd Annual Meeting (University of Chicago) [text to speech; phonetics]
     • Determining etymology is crucial for text-to-speech

  8. An Early Robust Statistical NLP Application
     • Etymology can be determined reasonably accurately from statistics computed over letter sequences: trigrams!
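The letter-trigram idea on slide 8 can be illustrated with a minimal sketch. This is not Church's actual model: the two origin labels, the tiny word lists, and the add-one smoothing are all assumptions chosen only to make the example self-contained. Each origin gets its own letter-trigram counts, and a new word is assigned to the origin under which its trigrams are most probable.

```python
import math
from collections import Counter

def letter_trigrams(word):
    """Return the letter trigrams of a word, padded with '#' boundary markers."""
    padded = "##" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(words_by_origin):
    """Count letter trigrams separately for each origin label."""
    return {origin: Counter(t for w in words for t in letter_trigrams(w))
            for origin, words in words_by_origin.items()}

def log_score(word, counts, vocab_size):
    """Add-one smoothed log-probability of the word's trigrams under one origin."""
    total = sum(counts.values())
    return sum(math.log((counts[t] + 1) / (total + vocab_size))
               for t in letter_trigrams(word))

def guess_origin(word, models):
    """Pick the origin whose trigram statistics best explain the word."""
    vocab = set().union(*models.values())   # all trigrams seen in training
    vocab_size = len(vocab) + 1             # +1 reserves mass for unseen trigrams
    return max(models, key=lambda o: log_score(word, models[o], vocab_size))

# Hypothetical toy training data (illustrative only, far too small for real use).
models = train({
    "Italian": ["spaghetti", "ravioli", "cappuccino", "piccolo", "allegro", "zucchini"],
    "Japanese": ["tsunami", "karaoke", "sushi", "kimono", "samurai", "origami"],
})

print(guess_origin("macchiato", models))
print(guess_origin("tsukimi", models))
```

With realistic training lists the same scoring would run unchanged; only the counts get better. The guessed origin could then select which letter-to-sound and stress rules a text-to-speech system applies.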

  9. A Central Challenge: Extracting Meaning
     Text or speech → ??Meaning Extractor?? → Meaning

  10. Literal vs. Implicit Meaning
      “The founder of Pakistan's nuclear program, Abdul Qadeer Khan, has admitted he transferred nuclear technology to Iran, Libya and North Korea, a Pakistani government official said Monday… The transfers were made during the late 1980s and in the early and mid 1990s, and were motivated by "personal greed and ambition," an official said.”
      Q: Whose greed? Q: Whose ambition?
      • Cognitive beings automatically combine literal meaning with world knowledge to see implicit meaning
      • Understanding this involves inferring implicit meaning
      • Recent NLP has focused on robust extraction of shallow, literal meaning

  11. Levels of Representation
      • Full Semantics
      • Explicit Semantics
      • Syntax
      • Words
      • Morphology

  12. Word Unigram Representation (Unigrams)
      The founder of Pakistan's nuclear program, Abdul Qadeer Khan, has admitted he transferred nuclear technology to Iran, Libya and North Korea, a Pakistani government official said Monday. Khan made the confession in a written statement submitted "a couple of days ago" to investigators probing allegations of nuclear proliferation by Pakistan, the official told The Associated Press on condition of anonymity. The transfers were made during the late 1980s and in the early and mid 1990s, and were motivated by "personal greed and ambition," the official said. The official said the transfers were not authorized by the government.

  13. Word Bigram Representation (Bigrams)
      The founder of Pakistan's nuclear program, Abdul Qadeer Khan, has admitted he transferred nuclear technology to Iran, Libya and North Korea, a Pakistani government official said Monday. Khan made the confession in a written statement submitted "a couple of days ago" to investigators probing allegations of nuclear proliferation by Pakistan, the official told The Associated Press on condition of anonymity. The transfers were made during the late 1980s and in the early and mid 1990s, and were motivated by "personal greed and ambition," the official said. The official said the transfers were not authorized by the government.
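The unigram and bigram representations on slides 12 and 13 amount to counting single words and adjacent word pairs. A minimal sketch, assuming a deliberately crude regular-expression tokenizer (the slides do not say how the text was tokenized) and using the last sentence of the sample passage:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (crude, for illustration)."""
    return re.findall(r"[a-z0-9']+", text.lower())

text = "The official said the transfers were not authorized by the government."
tokens = tokenize(text)

# Unigram representation: how often each word occurs, ignoring order.
unigram_counts = Counter(tokens)

# Bigram representation: how often each pair of adjacent words occurs.
bigram_counts = Counter(zip(tokens, tokens[1:]))

print(unigram_counts.most_common(3))
print(bigram_counts.most_common(3))
```

The unigram view discards word order entirely; the bigram view keeps only local order, which is why neither can answer "whose greed?" on its own.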

  14. Syntax Representation: Treebank
      • TreeBank includes
        • Part of speech
        • Syntactic structure
      [Figure: parse tree over "The | founder | of | Pakistan’s | nuclear department | Abdul Qadeer Khan | has | admitted | he | transferred | nuclear technology | to | Iran, | Libya, | and | North Korea", with constituents labeled NP, PP, VP, SBAR, and S]

  15. 1995: A breakthrough in parsing
      10^6 words of Treebank Annotation + Machine Learning = Robust Parsers
      [Figure: diagram with the labels "training sentences", "answers", "Training Program", "Models", "Parser", and "Trees"; the example input is "The founder of Pakistan's nuclear program, Abdul Qadeer Khan, has admitted he transferred nuclear technology to Iran, Libya and North Korea", shown with the same NP/PP/VP/SBAR/S parse tree as on slide 14]
      • 1990 Best hand-built parsers: ~40-60% accuracy (guess)
      • 1995+ Statistical parsers: ~90% accuracy
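The "Treebank Annotation + Machine Learning" recipe on slide 15 can be sketched in its simplest form: read bracketed trees and estimate grammar rule probabilities by relative frequency. This is only an illustration under stated assumptions. The two toy trees are hypothetical, the bracketed format follows the common Penn Treebank style, and the statistical parsers of that period used considerably richer models than this plain rule counting.

```python
from collections import Counter

def parse_bracketed(s):
    """Parse a bracketed tree string into nested (label, children) tuples.
    Leaves (words) are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()

def rules(tree):
    """Yield one (lhs, rhs) rule per internal node of the tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_rule_probs(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(r for t in trees for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical two-tree "treebank" (illustrative only).
treebank = [
    "(S (NP (PRP he)) (VP (VBD transferred) (NP (NN technology))))",
    "(S (NP (NNP Khan)) (VP (VBD admitted)))",
]
probs = estimate_rule_probs([parse_bracketed(t) for t in treebank])

for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

The "answers" in the training data are the trees themselves; the learned "model" here is just the table of rule probabilities that a parser could use to rank candidate trees for a new sentence.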

  16. Rich Linguistic Representations + Powerful Machine Learning = Robust, Effective NLP
      • 1970s, ’80s: Focus on Linguistic Representations
      • 1990s, early 2000s: Focus on Machine Learning
      • Recently: New work combining the two
