
Natural Language Processing



  1. Natural Language Processing Lecture 1: Syntax

  2. Outline for today’s lecture • Motivation • Paradigms for studying language • Levels of NL analysis • Syntax • Parsing • Top-down • Bottom-up • Chart parsing

  3. Motivation • ‘Natural’ interface for humans • Programming language interfaces are difficult to learn • WIMP (windows, icons, menus, pointers) can be inefficient, impractical • Flatten out search space • Ubiquitous computing

  4. Motivation • Economics • Cost of maintaining a phone bank • Cost of voice transactions • Turing Test • Language makes us human (?) • Example – problem with expert system interfaces

  5. Motivation • Large text databases • Question answering • Text summarization

  6. Why can’t we do it yet? • Speech recognition • Technology is getting better, but we may be pushing up against what is possible with signal processing only • The real problem • AMBIGUITY!

  7. Paradigms for studying language • Linguistic • How do words form sentences and phrases? • What constrains possible meanings for a sentence? • Psycholinguistic • How do people identify the structure of sentences? • How are word meanings identified?

  8. Paradigms for studying language • Philosophic • What is meaning, and how do words and sentences acquire it? • How do words identify objects in the world? • Computational linguistic • How is the structure of sentences identified? • How can knowledge and reasoning be modeled? • How can language be used to accomplish specific tasks?

  9. Levels of understanding • Phonetic • How are words related to the sounds that make them up? • /puh-lee-z/ = please • Important for speech recognition systems

  10. Levels of understanding • Morphological • How are words constructed from more basic components? • Un-friend-ly • Gives information about the function of words

  11. Levels of understanding • Syntactic • How are words combined to form correct sentences? • What role do words play? • Best understood level; well studied for formal languages

  12. Levels of understanding • Semantic • What do words mean? • How do these meanings combine in sentences?

  13. Levels of understanding • Pragmatic • How are sentences used in different situations? • How does this affect the interpretation of a sentence?

  14. Levels of understanding • Discourse level • How does the surrounding language context affect the interpretation of a sentence? • Pronoun resolution, temporal references

  15. Levels of understanding • World knowledge • General knowledge about the world necessary to communicate • Includes knowledge about the goals and intentions of other users

  16. Ambiguity in language • Language can be ambiguous on many levels • Too, two, to • Cook, set, bug • The man saw the boy with the telescope. • Every boy loves a dog. • Green ideas have large noses. • Can you pass the salt?

  17. Syntax • The syntactic structure of a sentence indicates the way that the words in the sentence are related to each other. • The structure can indicate relationships between words and phrases, and can store information that can be used later in processing

  18. Example • The boy saw the cat • The cat saw the boy • The girl saw the man in the store • Was the girl in the store?

  19. Syntactic processing • Main goals • Determining whether a sequence of symbols constitutes a legal sentence • Assigning a phrase/constituent structure to legal sentences for later processing

  20. Grammars and parsing techniques • We need a grammar in order to parse • Grammar = formal specification of structures allowed in a language • Given a grammar, we also need a parsing technique, or a method of analyzing a sentence to determine its structure according to the grammar

  21. Statistical vs. Deterministic • Deterministic • Provably correct • Brittle • Statistical • Always gives an answer • No guarantees • We probably want to split the difference

  22. NL and CFGs • Context-free grammars (CFG) are a good choice • Powerful enough to describe most NL structure • Restricted enough to allow for efficient parsing • A CFG has rules with a single symbol on the left-hand side
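The CFG restriction above (a single symbol on the left-hand side) maps naturally onto a dictionary from non-terminals to lists of right-hand sides. A minimal Python sketch using the lecture's toy rules, with lexical rules split into a separate lexicon; the names `GRAMMAR`, `LEXICON`, and `is_terminal` are illustrative, not from the lecture:

```python
# Toy CFG from the lecture, encoded as Python data.
# Phrase-structure rules: non-terminal -> list of possible expansions.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
}

# Lexical rules (NAME -> John, etc.) kept separately as word -> category.
LEXICON = {
    "John": "NAME",
    "ate":  "V",
    "the":  "ART",
    "cat":  "N",
}

def is_terminal(symbol):
    """A symbol is terminal here if no phrase-structure rule rewrites it."""
    return symbol not in GRAMMAR

print(is_terminal("NAME"))  # True: NAME is produced only by the lexicon
print(is_terminal("S"))     # False
```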

  23. A simple top-down parser • (example in handouts)

  24. A simple, silly grammar • S -> NP VP • VP -> V NP • NP -> NAME • NP -> ART N • NAME -> John • V -> ate • ART -> the • N -> cat • A parse tree for “John ate the cat”: (S (NP (NAME John)) (VP (V ate) (NP (ART the) (N cat))))

  25. Simple top-down parse • Grammar: S -> NP VP; VP -> V NP; NP -> NAME; NP -> ART N; NAME -> John; V -> ate; ART -> the; N -> cat • Derivation: S ⇒ NP VP ⇒ NAME VP ⇒ John VP ⇒ John V NP ⇒ John ate NP ⇒ John ate ART N ⇒ John ate the N ⇒ John ate the cat
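The top-down derivation in slide 25 can be mechanized as a tiny recursive-descent recognizer: keep a list of symbols still to expand, rewrite the leftmost non-terminal, and backtrack over the alternative rules. A minimal sketch; the function and variable names are illustrative:

```python
# Toy grammar with lexical rules folded in; terminals are plain words.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["NAME"], ["ART", "N"]],
    "NAME": [["John"]],
    "V":    [["ate"]],
    "ART":  [["the"]],
    "N":    [["cat"]],
}

def parse(symbols, words):
    """Return True if the symbol list can derive exactly the word list."""
    if not symbols:
        return not words           # success only if all input is consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:           # non-terminal: try each expansion
        return any(parse(list(body) + rest, words) for body in GRAMMAR[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "John ate the cat".split()))  # True
print(parse(["S"], "John ate cat".split()))      # False
```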

  26. Simple bottom-up parse • Grammar: S -> NP VP; VP -> V NP; NP -> NAME; NP -> ART N; NAME -> John; V -> ate; ART -> the; N -> cat • Reductions: NAME ate the cat ⇒ NAME V the cat ⇒ NAME V ART cat ⇒ NAME V ART N ⇒ NP V ART N ⇒ NP V NP ⇒ NP VP ⇒ S
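The bottom-up sequence in slide 26 corresponds to shift-reduce parsing: shift words onto a stack and reduce whenever the top of the stack matches a rule's right-hand side. A minimal sketch, with illustrative names; greedy reduction happens to work for this tiny grammar but is not a general strategy:

```python
# Rules as (left-hand side, right-hand side) pairs; lexical rules included.
RULES = [
    ("NAME", ["John"]), ("V", ["ate"]), ("ART", ["the"]), ("N", ["cat"]),
    ("NP", ["NAME"]), ("NP", ["ART", "N"]),
    ("VP", ["V", "NP"]), ("S", ["NP", "VP"]),
]

def reduce_all(stack):
    """Repeatedly reduce the top of the stack while any rule matches."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]   # replace matched suffix with LHS
                changed = True
    return stack

def shift_reduce(words):
    stack = []
    for w in words:            # shift one word, then reduce as far as possible
        stack.append(w)
        reduce_all(stack)
    return stack

print(shift_reduce("John ate the cat".split()))  # ['S']
```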

  27. Parsing as search • Parsing can be viewed as a special case of the search problem • What are the similarities?

  28. Chart parsing • Maintains information about partial parses, so constituents do not have to be recomputed

  29. Top-down chart parsing • Algorithm: • Do until no input left and agenda is empty: • If agenda is empty, look up interpretations of next word and add them to the agenda • Select a constituent C from the agenda • Combine C with every active arc on the chart. Add newly formed constituents to the agenda • For newly created active arcs, add to chart using arc introduction algorithm
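One standard realization of this agenda-plus-chart idea is an Earley-style recognizer, where arcs are dotted rules stored per input position so that no constituent is rebuilt. This is a generic Earley sketch, not the exact algorithm from the lecture handout; all names are illustrative:

```python
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["NAME"], ["ART", "N"]],
    "NAME": [["John"]],
    "V":    [["ate"]],
    "ART":  [["the"]],
    "N":    [["cat"]],
}

def recognize(words, start="S"):
    n = len(words)
    # chart[i] holds arcs (lhs, rhs, dot, origin) ending at position i
    chart = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:               # predict: arc introduction
                    for body in GRAMMAR[nxt]:
                        arc = (nxt, tuple(body), 0, i)
                        if arc not in chart[i]:
                            chart[i].add(arc); agenda.append(arc)
                elif i < n and words[i] == nxt:  # scan: match the next word
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                # complete: extend waiting arcs
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

print(recognize("John ate the cat".split()))  # True
print(recognize("John the cat".split()))      # False
```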

  30. Top-down chart parsing • Arcs keep track of completed constituents, or potential constituents • Arc introduction algorithm: • To add an arc S -> C1 . . . ° Ci . . . Cn ending at position j, do the following: • For each rule in the grammar of form Ci -> X1 … Xk, recursively add the new arc Ci -> ° X1 … Xk
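Arc introduction is essentially top-down prediction: whenever the dot sits just before a non-terminal Ci, add a fresh arc Ci -> ° X1 … Xk for each rule expanding Ci, recursing until only terminals or already-present arcs remain. A minimal sketch (names are illustrative); it reproduces the three initial arcs shown on the next slide:

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "VP": [["V", "NP"]],
}

def introduce(arc, chart_col):
    """Add an arc (lhs, rhs, dot) to a chart position, recursively
    predicting sub-arcs for the non-terminal after the dot."""
    lhs, rhs, dot = arc
    if arc in chart_col:        # already on the chart: stop recursion
        return
    chart_col.add(arc)
    if dot < len(rhs) and rhs[dot] in GRAMMAR:
        for body in GRAMMAR[rhs[dot]]:
            introduce((rhs[dot], tuple(body), 0), chart_col)

col = set()
introduce(("S", ("NP", "VP"), 0), col)
print(sorted(arc[0] for arc in col))  # ['NP', 'NP', 'S']
```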

  31. Arc introduction example • Input: 0 John 1 ate 2 the 3 cat 4 • Lexicon: NAME -> John, V -> ate, ART -> the, N -> cat • Grammar: S -> NP VP, VP -> V NP, NP -> ART N, NP -> NAME • Initial arcs at position 0: S -> ° NP VP, NP -> ° ART N, NP -> ° NAME

  32. • Agenda: John • Constituents: NAME • Chart arcs: NP -> NAME °; S -> ° NP VP; NP -> ° ART N; NP -> ° NAME • (input, lexicon, and grammar as on slide 31)

  33. • Agenda: NP • Constituents: NP1, NAME1 • Chart arcs: NP -> NAME °; S -> NP ° VP; VP -> ° V NP; S -> ° NP VP; NP -> ° ART N; NP -> ° NAME

  34. • Agenda: ate • Constituents: NP1, NAME1, V1 • Chart arcs: NP -> NAME °; S -> NP ° VP; VP -> ° V NP; S -> ° NP VP; NP -> ° ART N; NP -> ° NAME

  35. • Agenda: (empty) • Constituents: NP1, NAME1, V1 • Chart arcs: NP -> NAME °; VP -> V ° NP; S -> NP ° VP; NP -> ° ART N; NP -> ° NAME; VP -> ° V NP; S -> ° NP VP; NP -> ° ART N; NP -> ° NAME

  36. • Agenda: the • Constituents: NP1, NAME1, V1, ART1 • Chart arcs: NP -> NAME °; VP -> V ° NP; NP -> ART ° N; S -> NP ° VP; NP -> ° ART N; NP -> ° NAME; VP -> ° V NP; S -> ° NP VP; NP -> ° ART N; NP -> ° NAME

  37. • Agenda: cat • Constituents: NP1, NAME1, V1, ART1, N1 • Chart arcs: NP -> NAME °; VP -> V ° NP; NP -> ART ° N; NP -> ART N °; S -> NP ° VP; NP -> ° ART N; NP -> ° NAME; VP -> ° V NP; S -> ° NP VP; NP -> ° ART N; NP -> ° NAME

  38. • Agenda: NP • Constituents: NP1, NP2, NAME1, V1, ART1, N1 • Chart arcs: (as on slide 37)

  39. • Agenda: VP • Constituents: VP1, NP1, NP2, NAME1, V1, ART1, N1 • Chart arcs: (as on slide 37)

  40. A bigger example • Input: 1 the 2 large 3 can 4 can 5 hold 6 the 7 water 8 • The chart contains many overlapping constituents (S1, S2; NP1–NP3; VP1–VP3; N1–N4; V1, V3, V4; ART1, ART2; ADJ1; AUX1, AUX2), since words such as “can” have several readings • (chart diagram not reproduced)

  41. Complexity • For a sentence of length n: • Pure search: Cⁿ, where C depends on the algorithm • Chart-based: Kn³, where K depends on the algorithm

  42. Next time • Semantics • Maybe some Prolog

  43. Other ideas • Augmented transition networks • Features
