Explore Syntax in Computational Linguistics - word order, sentence formation, formal structures. Importance for grammar checkers, semantic interpretation, question answering, machine translation. Learn about key constituents like Noun Phrases, Verb Phrases, Prepositional Phrases. Unlock Context-Free Grammar for English with examples.
CPSC 503 Computational Linguistics Lecture 8 Giuseppe Carenini CPSC503 Winter 2019
Knowledge-Formalisms Map (next ~two lectures)
Formalisms:
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
• Neural Language Models, Neural Sequence Modeling
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics, Prob. Logics)
• AI planners (MDP Markov Decision Processes, Reinforcement learning)
Knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue
Today Jan 30
• Start Syntax
• English Syntax
• Context-Free Grammar for English
• Rules
• Trees
• Problems
• Intro Parsing
Syntax
Def. The study of how sentences are formed by grouping and ordering words
Example: Ming and Sue prefer morning flights
* Ming Sue flights morning and prefer
Groups behave as a single unit w.r.t.
• Substitution (they, it, do so)
• Movement (morning flights are preferred…)
• Coordination (…and…)
Syntax: Useful tasks
• Why should you care?
• Grammar checkers
• Basis for semantic interpretation
• Question answering
• Information extraction
• Summarization
• Machine translation
• Features for Machine Learning solutions to ….
Key Constituents – with heads (English)
General pattern: (Specifier) X (Complement)
• Noun Phrases: (Det) N (PP)
• Verb Phrases: (Qual) V (NP)
• Prepositional Phrases: (Deg) P (NP)
• Adjective Phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)
Some simple specifiers
Category       Typical function        Examples
Determiner     specifier of N          the, a, this, no..
Qualifier      specifier of V          never, often..
Degree word    specifier of A or P     very, almost..
Complements?
Sample Complements… more on handout For Verbs
More complements
Key Constituents: Examples
• Noun Phrases: (Det) N (PP), e.g., the cat on the table
• Verb Phrases: (Qual) V (NP), e.g., never eat a cat
• Prepositional Phrases: (Deg) P (NP), e.g., almost in the net
• Adjective Phrases: (Deg) A (PP), e.g., very happy about it
Context Free Grammar (Example)
• S -> NP VP
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left
Figure labels: Start-symbol, Non-terminal, Terminal
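The example grammar above can be written down directly as data. The following is a minimal sketch, not part of the lecture: the dictionary layout and the function name `generate` are illustrative assumptions. It expands the start symbol S until only terminals remain, which is what "generating a string in the language" means for a CFG.

```python
import random

# Toy grammar copied from the slide: each non-terminal maps to a list of
# right-hand sides; symbols that never appear as keys are terminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def generate(symbol="S", rng=random):
    """Expand `symbol` top-down until only terminals remain."""
    if symbol not in GRAMMAR:          # terminal: emit the word itself
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])  # pick one production for this symbol
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words

print(" ".join(generate()))  # prints "a flight left"
```

Because every non-terminal here has exactly one production, the language of this grammar contains a single sentence; adding alternatives to any list enlarges it.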
CFG more complex Example
• Grammar with example phrases
• Lexicon
CFGs
• Define a Formal Language (un/grammatical sentences)
• Generative Formalism
• Generate strings in the language
• Reject strings not in the language
• Impose structures (trees) on strings in the language
Context Free Grammar (CFG)
• 4-tuple (non-term., term., productions, start)
• $(N, \Sigma, P, S)$
• $P$ is a set of rules $A \to \alpha$; $A \in N$, $\alpha \in (\Sigma \cup N)^*$
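CFG: Formal Definitions
• 4-tuple (non-term., term., productions, start)
• $(N, \Sigma, P, S)$
• $P$ is a set of rules $A \to \alpha$; $A \in N$, $\alpha \in (\Sigma \cup N)^*$
• A derivation is the process of rewriting $\alpha_1$ into $\alpha_m$ (both strings in $(\Sigma \cup N)^*$) by applying a sequence of rules: $\alpha_1 \Rightarrow^* \alpha_m$
• $L_G = \{\, w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \,\}$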
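As a worked instance of the derivation relation, here is one possible leftmost derivation using the toy grammar from the earlier example slide (S -> NP VP, NP -> Det NOMINAL, NOMINAL -> Noun, VP -> Verb, Det -> a, Noun -> flight, Verb -> left). Choosing the leftmost non-terminal at each step is an assumption made for illustration; any rewriting order yields the same string.

```latex
\begin{align*}
S &\Rightarrow NP\ VP \\
  &\Rightarrow Det\ NOMINAL\ VP \\
  &\Rightarrow a\ NOMINAL\ VP \\
  &\Rightarrow a\ Noun\ VP \\
  &\Rightarrow a\ flight\ VP \\
  &\Rightarrow a\ flight\ Verb \\
  &\Rightarrow a\ flight\ left
\end{align*}
```

So $S \Rightarrow^* a\ flight\ left$, and since the result contains only terminals, the string belongs to $L_G$.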
Derivations as Trees
[Figure: parse tree; recoverable node labels: Nominal, Nominal, flight]
Parsing
• It is completely analogous to running a finite-state transducer with its tapes
• It’s just more powerful
• Chpt. 11
[Figure: the sentence "I prefer a morning flight" and a CFG feed a Parser, which outputs a parse tree]
Other Options
• Regular languages (FSA): $A \to xB$ or $A \to x$
  • Too weak (e.g., cannot deal with recursion in a general way: no center-embedding)
• CFGs: $A \to \alpha$ (also produce more understandable and “useful” structure)
• Context-sensitive: $\alpha A \beta \to \alpha \gamma \beta$; $\gamma \neq \epsilon$
  • Can be computationally intractable
• Turing equiv.: $\alpha \to \beta$; $\alpha \neq \epsilon$
  • Too powerful / computationally intractable
Common Sentence-Types
• Declaratives: A plane left (S -> NP VP)
• Imperatives: Leave! (S -> VP)
• Yes-No Questions: Did the plane leave? (S -> Aux NP VP)
• WH Questions: Which flights serve breakfast? (S -> WH NP VP)
• WH Questions: When did the plane leave? (S -> WH Aux NP VP)
NP: more details
• NP -> (Predet)(Det)(Card)(Ord)(Quant)(AP) Nom, e.g., all the other cheap cars
  (general pattern: NP -> Specifiers N Complements)
• Nom -> Nom PP (PP) (PP), e.g., reservation on BA456 from NY to YVR
• Nom -> Nom GerundVP, e.g., flight arriving on Monday
• Nom -> Nom RelClause; RelClause -> (who | that) VP, e.g., flight that arrives in the evening
Conjunctive Constructions
• S -> S and S: John went to NY and Mary followed him
• NP -> NP and NP: John went to NY and Boston
• VP -> VP and VP: John went to NY and visited MOMA
• …
• In fact the right rule for English is X -> X and X
Journal of the American Medical Informatics Association, 2005, Improved Identification of Noun Phrases in Clinical Radiology Reports ….
Problems with CFGs
• Agreement
• Subcategorization
Agreement
• In English,
  • Determiners and nouns have to agree in number
  • Subjects and verbs have to agree in person and number
• Many languages have agreement systems that are far more complex than this (e.g., gender).
Agreement
Grammatical:
• This dog
• Those dogs
• This dog eats
• Those dogs eat
• You have it
Ungrammatical:
• *This dogs
• *Those dog
• *This dog eat
• *Those dogs eats
• *You has it
Possible CFG Solution
OLD Grammar:
• S -> NP VP
• NP -> Det Nom
• VP -> V NP
• …
NEW Grammar (Sg = singular, Pl = plural):
• SgS -> SgNP SgVP
• PlS -> PlNP PlVP
• SgNP -> SgDet SgNom
• PlNP -> PlDet PlNom
• PlVP -> PlV NP
• SgVP3p -> SgV3p NP
• …
CFG Solution for Agreement
• It works and stays within the power of CFGs
• But it doesn’t scale all that well (explosion in the number of rules)
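The rule explosion can be made concrete with a short sketch. The helper `split_rule` and the feature labels below are hypothetical, introduced only for illustration: the function copies one rule once per combination of agreement feature values, prefixing the symbols that must agree.

```python
from itertools import product

def split_rule(lhs, rhs, agreeing, features):
    """Copy one rule once per feature combination.

    Symbols listed in `agreeing` get the combination as a prefix
    (e.g. NP becomes SgNP); all other symbols are left untouched.
    """
    rules = []
    for combo in product(*features.values()):
        tag = "".join(combo)

        def mark(sym, tag=tag):
            return tag + sym if sym in agreeing else sym

        rules.append((mark(lhs), [mark(s) for s in rhs]))
    return rules

# Number agreement only: one rule becomes two, as on the slide.
number = {"number": ["Sg", "Pl"]}
for lhs, rhs in split_rule("S", ["NP", "VP"], {"S", "NP", "VP"}, number):
    print(lhs, "->", " ".join(rhs))

# Adding person multiplies the copies: 2 numbers x 3 persons = 6 rules.
number_person = {"number": ["Sg", "Pl"], "person": ["1p", "2p", "3p"]}
print(len(split_rule("S", ["NP", "VP"], {"S", "NP", "VP"}, number_person)))
```

Each additional agreement feature multiplies the number of copies of every affected rule, which is why the approach stays within CFG power but scales poorly.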
Subcategorization
• Def. It expresses constraints that a predicate (here, a verb) places on the number and type of its arguments (see complements table)
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight
Subcategorization
• Sneeze: John sneezed
• Find: Please find [a flight to NY]NP
• Give: Give [me]NP [a cheaper fare]NP
• Help: Can you help [me]NP [with a flight]PP
• Prefer: I prefer [to leave earlier]TO-VP
• Told: I was told [United has a flight]S
• …
So?
• So the various rules for VPs overgenerate.
• They allow strings containing verbs and arguments that don’t go together
• For example:
  • VP -> V NP, therefore *sneezed the book
  • VP -> V S, therefore *go she will go there
Possible CFG Solution
OLD Grammar:
• VP -> V
• VP -> V NP
• VP -> V NP PP
• …
NEW Grammar:
• VP -> IntransV
• VP -> TransV NP
• VP -> TransPPto NP PPto
• …
• TransPPto -> hand, give, ..
This solution has the same problem as the one for agreement
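An alternative way to picture the constraint is a lexicon lookup rather than more rules. The sketch below is an assumption for illustration: the `SUBCAT` table and the `licenses` function are hypothetical names, and the frames are taken from the example sentences on the subcategorization slides.

```python
# Hypothetical subcategorization lexicon: each verb lists the complement
# sequences (frames) it accepts, following the slide examples.
SUBCAT = {
    "sneeze": [[]],                          # John sneezed
    "find": [["NP"]],                        # find [a flight to NY]
    "give": [["NP", "NP"], ["NP", "PPto"]],  # give [me] [a cheaper fare]
    "help": [["NP", "PP"]],                  # help [me] [with a flight]
    "prefer": [["TO-VP"]],                   # prefer [to leave earlier]
    "tell": [["S"]],                         # was told [United has a flight]
}

def licenses(verb, complements):
    """Return True if `verb` accepts exactly this sequence of complements."""
    return list(complements) in SUBCAT.get(verb, [])

print(licenses("sneeze", []))        # True
print(licenses("sneeze", ["NP"]))    # False: *John sneezed the book
print(licenses("prefer", ["S"]))     # False: *I prefer United has a flight
print(licenses("give", ["PP"]))      # False: *Give with a flight
```

Keeping the frames in the lexicon is the common theme of the lexicalized alternatives to plain CFGs; the table here is only a toy stand-in for that idea.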
CFG for NLP: summary
• CFGs cover most syntactic structure in English.
• But there are problems (over-generation)
  • These can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• Numerous alternative approaches have been developed that all share the common theme of making better use of the lexicon (Possible Pedagogical Project):
  • Lexical-Functional Grammar (LFG)
  • Head-Driven Phrase Structure Grammar (HPSG)
  • Tree-Adjoining Grammar (TAG)
  • Combinatory Categorial Grammar (CCG) (Sect. 10.6.1 J&M 3rd Ed)
Today Jan 30
• Start Syntax
• English Syntax
• Context-Free Grammar for English
• Rules
• Trees
• Problems
• Intro (CFG) Parsing
Parsing with CFGs
• Input: sequence of words (e.g., "I prefer a morning flight")
• Output: valid parse trees
• Assign valid trees: a valid tree covers all and only the elements of the input and has an S at the top
[Figure: the word sequence and the CFG feed a Parser, which outputs valid parse trees]
CFG Parsing as Search
• The CFG defines the search space of possible parse trees
• Grammar: S -> NP VP; S -> Aux NP VP; NP -> Det Noun; VP -> Verb; Det -> a; Noun -> flight; Verb -> left, arrive; Aux -> do, does
• Parsing: find all trees that cover all and only the words in the input
Constraints on Search
• Input: sequence of words (e.g., "I prefer a morning flight")
• Output: valid parse trees
• The CFG defines the search space
Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed
Top-Down Parsing
• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.
[Figure: first levels of the top-down search space; input: flight]
Next step: Top Down Space
• When POS categories are reached, reject trees whose leaves fail to match all words in the input
[Figure: next level of the top-down search space]
Bottom-Up Parsing
• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.
[Figure: first levels of the bottom-up search space over the word "flight"]
Two more steps: Bottom-Up Space
[Figure: two further levels of the bottom-up search space over the word "flight"]
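The bottom-up (data-directed) idea can be sketched as a search that starts from the words and repeatedly replaces a matched right-hand side with its left-hand side. This is an illustrative recognizer under stated assumptions, not the algorithm from the lecture: the grammar is the small one from the "CFG Parsing as Search" slide, and the names `RULES`, `reductions`, and `recognizes` are hypothetical.

```python
# Grammar from the "CFG Parsing as Search" slide as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Verb", ("left",)),
    ("Verb", ("arrive",)),
    ("Aux", ("do",)),
    ("Aux", ("does",)),
]

def reductions(form):
    """Yield every form obtained by reducing one matching right-hand side."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                yield form[:i] + (lhs,) + form[i + n:]

def recognizes(sentence):
    """Bottom-up recognizer: search from the words up to a single S."""
    start = tuple(sentence.split())
    seen, frontier = {start}, [start]
    while frontier:
        form = frontier.pop()
        if form == ("S",):
            return True
        for nxt in reductions(form):
            if nxt not in seen:      # avoid revisiting the same form
                seen.add(nxt)
                frontier.append(nxt)
    return False

print(recognizes("a flight left"))         # True
print(recognizes("does a flight arrive"))  # True
print(recognizes("flight a left"))         # False
```

Every intermediate form is consistent with the words, but many of them (for example a VP built where no S can use it) never lead to an S, which is the weakness of pure bottom-up search.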
Top-Down vs. Bottom-Up
• Top-down
  • Only searches for trees that can be answers
  • But suggests trees that may not be consistent with the words
• Bottom-up
  • Only forms trees consistent with the words
  • But suggests trees that may make no sense globally
So Combine Them
• Top-down: control strategy to generate trees
• Bottom-up: to filter out inappropriate parses
• Top-down control strategy:
  • Depth vs. breadth first
  • Which node to try to expand next (left-most)
  • Which grammar rule to use to expand a node (textual order)
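The control choices listed above (depth-first, left-most node, rules in textual order) can be sketched as a small backtracking parser. This is a minimal illustration under assumptions: it uses the small grammar from the "CFG Parsing as Search" slide, it omits the bottom-up filtering, the function names are hypothetical, and it would loop forever on a left-recursive rule such as NP -> NP PP.

```python
# Small grammar from the "CFG Parsing as Search" slide; rule order in each
# list is the textual order in which alternatives are tried.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"], ["arrive"]],
    "Aux": [["do"], ["does"]],
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end].

    Python generators give depth-first search with backtracking.
    Assumes the grammar has no left-recursive rules.
    """
    if symbol not in GRAMMAR:                      # terminal symbol
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:                    # textual rule order
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols, left-most first."""
    if not rhs:
        yield (symbol, children), pos
        return
    for subtree, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [subtree])

def parse_sentence(sentence):
    """Return all S-rooted trees that cover the whole input."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in parse_sentence("does a flight arrive"):
    print(tree)
```

The parser first tries S -> NP VP, fails as soon as Det cannot match "does", backtracks, and then succeeds with S -> Aux NP VP; that wasted first attempt is the kind of work a bottom-up filter is meant to avoid.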
Top-Down, Depth-First, Left-to-Right Search
Sample sentence: “Does this flight include a meal?”
Example: “Does this flight include a meal?”
[Figures: three slides of parse-tree diagrams for this sentence; not recoverable]
Problems with TD-BU
• Ambiguity
• Repeated Parsing
• SOLUTION: (once again dynamic programming!)
(1) Structural Ambiguity
Three basic kinds: Attachment / Coordination / NP-bracketing
Example: “I shot an elephant in my pajamas”
Structural Ambiguity (Ex. 1)
“I shot an elephant in my pajamas”
• PP attached to the NP: VP -> V NP ; NP -> NP PP
• PP attached to the VP: VP -> V NP PP
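The two attachments can be counted mechanically. The sketch below is an assumption-laden illustration: the rules for S, PP, and the lexical categories, as well as the names `RULES`, `LEXICON`, `count`, and `count_seq`, are added here for the example and are not from the slide. It counts trees per span with memoization, a simple form of the dynamic programming mentioned as the solution to repeated parsing.

```python
from functools import lru_cache

# VP and NP attachment rules from the slide; the remaining rules and the
# lexicon are assumptions needed to make the example self-contained.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "NP": [("NP", "PP"), ("Det", "N"), ("Pro",)],
    "PP": [("P", "NP")],
}
LEXICON = {
    "I": "Pro", "shot": "V", "an": "Det", "elephant": "N",
    "in": "P", "my": "Det", "pajamas": "N",
}
WORDS = "I shot an elephant in my pajamas".split()

@lru_cache(maxsize=None)
def count(symbol, i, j):
    """Number of distinct trees rooted in `symbol` spanning WORDS[i:j]."""
    if symbol not in RULES:                      # part-of-speech tag
        return int(j == i + 1 and LEXICON[WORDS[i]] == symbol)
    return sum(count_seq(rhs, i, j) for rhs in RULES[symbol])

@lru_cache(maxsize=None)
def count_seq(rhs, i, j):
    """Number of ways the symbol sequence `rhs` can span WORDS[i:j]."""
    if len(rhs) == 1:
        return count(rhs[0], i, j)
    # Every symbol must cover at least one word, so split strictly inside.
    return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
               for k in range(i + 1, j - len(rhs) + 2))

print(count("S", 0, len(WORDS)))  # prints 2
```

The result 2 corresponds to the two readings: the PP "in my pajamas" modifies the elephant (NP -> NP PP) or the shooting (VP -> V NP PP). Because each (symbol, span) pair is computed once, the left-recursive NP rule causes no looping here.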
Structural Ambiguity (Ex. 2)
“I saw Mary passing by cs2”
Parse 1 ("Mary passing by cs2" is a single clause):
(ROOT (S (NP (PRP I)) (VP (VBD saw) (S (NP (NNP Mary)) (VP (VBG passing) (PP (IN by) (NP (NNP cs2))))))))
Parse 2 ("Mary" is the object of "saw", followed by a separate clause):
(ROOT (S (NP (PRP I)) (VP (VBD saw) (NP (NNP Mary)) (S (VP (VBG passing) (PP (IN by) (NP (NNP cs2))))))))