Natural Language Processing Lecture 1: Syntax
Outline for today’s lecture • Motivation • Paradigms for studying language • Levels of NL analysis • Syntax • Parsing • Top-down • Bottom-up • Chart parsing
Motivation • ‘Natural’ interface for humans • Programming language interfaces are difficult to learn • WIMP (windows, icons, menus, pointers) can be inefficient, impractical • Flatten out search space • Ubiquitous computing
Motivation • Economics • Cost of maintaining a phone bank • Cost of voice transactions • Turing Test • Language makes us human (?) • Example – problem with expert system interfaces
Motivation • Large text databases • Question answering • Text summarization
Why can’t we do it yet? • Speech recognition • Technology is getting better, but we may be pushing up against what is possible with signal processing only • The real problem • AMBIGUITY!
Paradigms for studying language • Linguistic • How do words form sentences and phrases? • What constrains possible meanings for a sentence? • Psycholinguistic • How do people identify the structure of sentences? • How are word meanings identified?
Paradigms for studying language • Philosophic • What is meaning, and how do words and sentences acquire it? • How do words identify objects in the world? • Computational linguistic • How is the structure of sentences identified? • How can knowledge and reasoning be modeled? • How can language be used to accomplish specific tasks?
Levels of understanding • Phonetic • How are words related to the sounds that make them? • /puh-lee-z/ = please • Important for speech recognition systems
Levels of understanding • Morphological • How are words constructed from more basic components? • Un-friend-ly • Gives information about the function of words
Levels of understanding • Syntactic • How are words combined to form correct sentences? • What role do words play? • Best understood: well studied for formal languages
Levels of understanding • Semantic • What do words mean? • How do these meanings combine in sentences?
Levels of understanding • Pragmatic • How are sentences used in different situations? • How does this affect the interpretation of a sentence?
Levels of understanding • Discourse level • How does the surrounding language content affect the interpretation of a sentence? • Pronoun resolution, temporal references
Levels of understanding • World knowledge • General knowledge about the world necessary to communicate • Includes knowledge about the goals and intentions of other users
Ambiguity in language • Language can be ambiguous on many levels • Too, two, to • Cook, set, bug • The man saw the boy with the telescope. • Every boy loves a dog. • Green ideas have large noses. • Can you pass the salt?
Syntax • The syntactic structure of a sentence indicates the way that the words in the sentence are related to each other. • The structure can indicate relationships between words and phrases, and can store information that can be used later in processing
Example • The boy saw the cat • The cat saw the boy • The girl saw the man in the store • Was the girl in the store?
Syntactic processing • Main goals • Determining whether a sequence of symbols constitutes a legal sentence • Assigning a phrase/constituent structure to legal sentences for later processing
Grammars and parsing techniques • We need a grammar in order to parse • Grammar = formal specification of structures allowed in a language • Given a grammar, we also need a parsing technique, or a method of analyzing a sentence to determine its structure according to the grammar
Statistical vs. Deterministic • Deterministic • Provably correct • Brittle • Statistical • Always gives an answer • No guarantees • We probably want to split the difference
NL and CFGs • Context-free grammars (CFGs) are a good choice • Powerful enough to describe most NL structure • Restricted enough to allow for efficient parsing • A CFG has rules with a single nonterminal symbol on the left-hand side
A simple top-down parser • (example in handouts)
A simple, silly grammar • S -> NP VP • VP -> V NP • NP -> NAME • NP -> ART N • NAME -> John • V -> ate • ART -> the • N -> cat • A parse tree for “John ate the cat”: (S (NP (NAME John)) (VP (V ate) (NP (ART the) (N cat))))
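The grammar and parse tree on this slide can be written down as plain data. The following is a minimal sketch, not part of the original handouts: the names `GRAMMAR`, `LEXICON`, `TREE`, `leaves`, and `licensed` and the nested-tuple tree layout are assumptions made for illustration.

```python
# Toy grammar from the slide. The dict layout is an illustrative assumption.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
}
# Lexical rules (NAME -> John, etc.) stored as word -> category.
LEXICON = {"John": "NAME", "ate": "V", "the": "ART", "cat": "N"}

# Parse tree as nested tuples: (label, child, ...); a leaf is (category, word).
TREE = ("S",
        ("NP", ("NAME", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "cat"))))


def leaves(tree):
    """Return the words at the leaves of the tree, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [word for child in children for word in leaves(child)]


def licensed(tree):
    """Check that every node in the tree corresponds to a grammar or lexicon rule."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON.get(children[0]) == label
    rhs = [child[0] for child in children]
    return rhs in GRAMMAR.get(label, []) and all(licensed(c) for c in children)


print(" ".join(leaves(TREE)))  # John ate the cat
print(licensed(TREE))          # True
```

Reading the leaves back recovers the sentence, and `licensed` covers the first goal of syntactic processing (is this a legal structure?) for a tree that is already built.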
Simple top-down parse • Grammar: S -> NP VP, VP -> V NP, NP -> NAME, NP -> ART N, NAME -> John, V -> ate, ART -> the, N -> cat • Derivation: S => NP VP => NAME VP => John VP => John V NP => John ate NP => John ate ART N => John ate the N => John ate the cat
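A top-down parse of this kind can be sketched as a small recursive-descent parser with backtracking. This is one possible reading of the slide, not the parser from the handouts; `expand`, `expand_sequence`, and `parse_top_down` are hypothetical names, and the sketch would loop forever on a left-recursive grammar (this toy grammar has none).

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
}
LEXICON = {"John": "NAME", "ate": "V", "the": "ART", "cat": "N"}


def expand(symbol, words, i):
    """Yield (tree, next_position) for each way `symbol` can start at words[i]."""
    if symbol in GRAMMAR:
        # Non-terminal: try every rule for it (this is the top-down prediction).
        for rhs in GRAMMAR[symbol]:
            yield from expand_sequence(symbol, rhs, words, i)
    elif i < len(words) and LEXICON.get(words[i]) == symbol:
        # Lexical category: succeeds only if the next word has that category.
        yield (symbol, words[i]), i + 1


def expand_sequence(label, rhs, words, i):
    """Match the symbols of one right-hand side one after another."""
    states = [([], i)]  # partial results: (children so far, position reached)
    for symbol in rhs:
        states = [(children + [tree], k)
                  for children, j in states
                  for tree, k in expand(symbol, words, j)]
    for children, j in states:
        yield (label, *children), j


def parse_top_down(sentence, start="S"):
    """Return the first tree that covers the whole sentence, or None."""
    words = sentence.split()
    for tree, end in expand(start, words, 0):
        if end == len(words):
            return tree
    return None


print(parse_top_down("John ate the cat"))
print(parse_top_down("ate the cat"))  # None: no NP at the start
```

The parser starts from S and only looks at the words when it reaches a lexical category, which mirrors the derivation on the slide.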
Simple bottom-up parse • Grammar: S -> NP VP, VP -> V NP, NP -> NAME, NP -> ART N, NAME -> John, V -> ate, ART -> the, N -> cat • Reduction: NAME ate the cat => NAME V the cat => NAME V ART cat => NAME V ART N => NP V ART N => NP V NP => NP VP => S
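The bottom-up reduction sequence can be reproduced with a few lines of rewriting code. This is a greedy sketch under stated assumptions: it always reduces the leftmost match and never backtracks, which happens to be enough for this toy grammar but is not a general bottom-up parser. `RULES`, `reduce_once`, and `parse_bottom_up` are illustrative names.

```python
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["NAME"]),
    ("NP", ["ART", "N"]),
]
LEXICON = {"John": "NAME", "ate": "V", "the": "ART", "cat": "N"}


def reduce_once(symbols):
    """Replace the leftmost span that matches a rule's right-hand side."""
    for i in range(len(symbols)):
        for lhs, rhs in RULES:
            if symbols[i:i + len(rhs)] == rhs:
                return symbols[:i] + [lhs] + symbols[i + len(rhs):]
    return None  # nothing left to reduce


def parse_bottom_up(sentence):
    """Return (list of intermediate strings, True if the input reduced to S)."""
    words = sentence.split()
    symbols = list(words)
    steps = [" ".join(symbols)]
    # First rewrite each word as its lexical category, left to right.
    for i, word in enumerate(words):
        if word not in LEXICON:
            return steps, False
        symbols[i] = LEXICON[word]
        steps.append(" ".join(symbols))
    # Then keep reducing until no rule applies.
    while True:
        reduced = reduce_once(symbols)
        if reduced is None:
            break
        symbols = reduced
        steps.append(" ".join(symbols))
    return steps, symbols == ["S"]


steps, ok = parse_bottom_up("John ate the cat")
for step in steps:
    print(step)
print(ok)  # True
```

For "John ate the cat" the printed steps follow the same order as the reduction on the slide, ending in S.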
Parsing as search • Parsing can be viewed as a special case of the search problem • What are the similarities?
Chart parsing • Maintains information about partial parses, so constituents do not have to be recomputed
Top-down chart parsing • Algorithm: • Do until no input left and agenda is empty: • If agenda is empty, look up interpretations of next word and add them to the agenda • Select a constituent C from the agenda • Combine C with every active arc on the chart. Add newly formed constituents to the agenda • For newly created active arcs, add to chart using arc introduction algorithm
Top-down chart parsing • Arcs keep track of completed constituents, or potential constituents • Arc introduction algorithm: • To add an arc S -> C1 … ° Ci … Cn ending at position j, do the following: • For each rule in the grammar of the form Ci -> X1 … Xk, recursively add the new arc Ci -> ° X1 … Xk
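The agenda loop and the arc introduction step can be put together in a short recognizer. This is a minimal sketch, assuming a first-in-first-out agenda (the slides do not say which constituent to select) and assuming that predicted arcs start and end at position j. The names `chart_parse` and `add_arc` and the tuple encodings are hypothetical, not taken from the handouts.

```python
from collections import deque

GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("ART", "N"), ("NAME",)],
}
LEXICON = {"John": ["NAME"], "ate": ["V"], "the": ["ART"], "cat": ["N"]}


def chart_parse(words, start="S"):
    """Return the set of completed constituents (category, begin, end)."""
    arcs = set()      # active arcs: (lhs, rhs, dot, begin, end), dot < len(rhs)
    chart = set()     # completed constituents
    agenda = deque()  # constituents waiting to be combined with the arcs

    def add_arc(lhs, rhs, dot, begin, end):
        """Arc introduction: record the arc, then predict the symbol after the dot."""
        arc = (lhs, rhs, dot, begin, end)
        if arc in arcs:
            return  # already on the chart, so the work is not repeated
        arcs.add(arc)
        for new_rhs in GRAMMAR.get(rhs[dot], []):
            add_arc(rhs[dot], new_rhs, 0, end, end)

    for rhs in GRAMMAR[start]:
        add_arc(start, rhs, 0, 0, 0)

    position = 0
    while agenda or position < len(words):
        if not agenda:
            # Agenda empty: look up the next word and add its interpretations.
            for category in LEXICON.get(words[position], []):
                agenda.append((category, position, position + 1))
            position += 1
            continue
        category, begin, end = agenda.popleft()
        if (category, begin, end) in chart:
            continue
        chart.add((category, begin, end))
        # Combine the constituent with every active arc that is waiting for it.
        for lhs, rhs, dot, arc_begin, arc_end in list(arcs):
            if arc_end == begin and rhs[dot] == category:
                if dot + 1 == len(rhs):
                    agenda.append((lhs, arc_begin, end))      # arc completed
                else:
                    add_arc(lhs, rhs, dot + 1, arc_begin, end)  # arc extended
    return chart


words = "John ate the cat".split()
chart = chart_parse(words)
print(("S", 0, len(words)) in chart)  # True
```

Because arcs and constituents are kept in sets, each one is added at most once, which is the point of keeping a chart. The sentence is accepted when an S spanning positions 0 to n is on the chart.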
Example: “John ate the cat” • Positions: 0 John 1 ate 2 the 3 cat 4 • Lexicon: NAME -> John, V -> ate, ART -> the, N -> cat • Grammar: S -> NP VP, VP -> V NP, NP -> ART N, NP -> NAME • Arcs after arc introduction: S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: John • Constituents: NAME • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: NP • Constituents: NP1, NAME1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, S -> NP ° VP, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: ate • Constituents: NP1, NAME1, V1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, S -> NP ° VP, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: (empty) • Constituents: NP1, NAME1, V1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, VP -> V ° NP, S -> NP ° VP, NP -> ° ART N, NP -> ° NAME, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: the • Constituents: NP1, NAME1, V1, ART1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, VP -> V ° NP, NP -> ART ° N, S -> NP ° VP, NP -> ° ART N, NP -> ° NAME, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: cat • Constituents: NP1, NAME1, V1, ART1, N1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, VP -> V ° NP, NP -> ART ° N, NP -> ART N °, S -> NP ° VP, NP -> ° ART N, NP -> ° NAME, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: NP • Constituents: NP1, NP2, NAME1, V1, ART1, N1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, VP -> V ° NP, NP -> ART ° N, NP -> ART N °, S -> NP ° VP, NP -> ° ART N, NP -> ° NAME, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
Agenda: VP • Constituents: VP1, NP1, NP2, NAME1, V1, ART1, N1 • Positions: 0 John 1 ate 2 the 3 cat 4 • Arcs: NP -> NAME °, VP -> V ° NP, NP -> ART ° N, NP -> ART N °, S -> NP ° VP, NP -> ° ART N, NP -> ° NAME, VP -> ° V NP, S -> ° NP VP, NP -> ° ART N, NP -> ° NAME
A bigger example • Positions: 1 the 2 large 3 can 4 can 5 hold 6 the 7 water 8 • Constituents on the chart: S1, S2, VP3, NP2, VP2, NP1, VP1, N1, N2, NP3, V1, N2, V3, V4, ART1, ADJ1, AUX1, AUX2, N3, ART2, N4
Complexity • For a sentence of length n: • Pure search: C^n, where C depends on the algorithm • Chart-based: Kn^3, where K depends on the algorithm
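To get a feel for the gap between the exponential bound for pure search (C to the power n) and the cubic bound for chart parsing (K times n cubed), the two can be tabulated for a few sentence lengths. The constants C = 2 and K = 1 below are arbitrary assumptions chosen only for illustration; real values depend on the grammar and the algorithm.

```python
def pure_search_steps(n, c=2):
    """Exponential bound C^n for pure search (C = 2 is an assumed constant)."""
    return c ** n


def chart_steps(n, k=1):
    """Cubic bound K * n^3 for chart parsing (K = 1 is an assumed constant)."""
    return k * n ** 3


for n in (5, 10, 20, 40):
    print(n, pure_search_steps(n), chart_steps(n))
```

With these constants the two bounds are close at n = 10 (1024 against 1000), but at n = 20 the exponential bound is already 1048576 against 8000.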
Next time • Semantics • Maybe some Prolog
Other ideas • Augmented transition networks • Features