Natural Language Processing: Syntactic Parsing Meeting 13, Oct 11, 2012 Rodney Nielsen Most of these slides were adapted from James Martin
Subcategorization • Many valid VP rules • But not valid for all verbs • Subcategorize verbs by sets of VP rules • Variation on transitive/intransitive • Grammars may have 100s of classes
Subcategorization • Sneeze: John sneezed • Find: Please find [a flight to NY]NP • Give: Give [me]NP [a cheaper fare]NP • Help: Can you help [me]NP [with a flight]PP • Prefer: I prefer [to leave earlier]TO-VP • Told: I was told [United has a flight]S • …
Programming Analogy • Verbs = methods • Subcat frames specify the number, position and type of arguments • Like formal parameters to a method
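The verbs-as-methods analogy can be made concrete. A minimal sketch, assuming a toy lexicon (the frame entries and the `licenses` helper are illustrative, not from a real subcategorization lexicon):

```python
# Hypothetical sketch: subcategorization frames as "method signatures".
# Each verb lists the argument sequences it licenses, like formal parameters.
SUBCAT = {
    "sneeze": [[]],                  # intransitive: John sneezed
    "find":   [["NP"]],              # find [a flight to NY]NP
    "give":   [["NP", "NP"]],        # give [me]NP [a cheaper fare]NP
    "help":   [["NP", "PP"]],        # help [me]NP [with a flight]PP
    "prefer": [["TO-VP"]],           # prefer [to leave earlier]TO-VP
    "tell":   [["NP", "S"]],         # told [me]NP [United has a flight]S
}

def licenses(verb, args):
    """Return True if the verb subcategorizes for this argument sequence,
    just as a compiler checks actual arguments against formal parameters."""
    return list(args) in SUBCAT.get(verb, [])

print(licenses("sneeze", []))      # True
print(licenses("sneeze", ["NP"]))  # False: *John sneezed the book
```

A real grammar would have hundreds of such classes, as the previous slide notes.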
Subcategorization • *John sneezed the book • *I prefer United has a flight • *Give with a flight • As with agreement phenomena, we need a way to formally express these facts
Treebanks • Treebanks • Corpora of sentence parse trees • Generally created by • First parsing automatically • Then hand-correcting • Detailed annotation guidelines • POS tagset • Grammar • Instructions per grammatical construction
Penn Treebank • The Penn Treebank is a widely used treebank. • Its best-known part is the Wall Street Journal section. • 1M words from the 1987-1989 Wall Street Journal.
Lexically Decorated Tree • Head Finding
Dependency Grammars • CFG-style phrase-structure grammars • Focus on constituents • Dependency grammar • Tree • Nodes = words • Links = dependency relations • Relations may be typed (labeled), or not.
Dependency Parse They hid the letter on the shelf
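The dependency parse on the slide can be written down as a set of head-dependent links. A sketch, assuming roughly Stanford-style relation labels and one plausible attachment of the PP (the head choices and labels are assumptions, not taken from the slides):

```python
# The dependency parse of "They hid the letter on the shelf" encoded as
# (dependent, head, relation) triples; words are 1-indexed, head 0 = root.
sentence = ["They", "hid", "the", "letter", "on", "the", "shelf"]
deps = [
    (1, 2, "nsubj"),   # They   <- hid
    (2, 0, "root"),    # hid    is the root
    (3, 4, "det"),     # the    <- letter
    (4, 2, "dobj"),    # letter <- hid
    (5, 2, "prep"),    # on     <- hid  (could also attach to "letter")
    (6, 7, "det"),     # the    <- shelf
    (7, 5, "pobj"),    # shelf  <- on
]

# Every word has exactly one head, so the links form a tree over the words.
assert sorted(d for d, h, r in deps) == list(range(1, len(sentence) + 1))
```

Note the attachment ambiguity: "on the shelf" could depend on "hid" (where they hid it) or on "letter" (which letter); the triples above pick the verb attachment.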
Treebank and Head-Finding Uses • Critical to develop statistical parsers • Chapter 14 • Valuable to Corpus Linguistics • Investigating empirical details of constructions
Summary • CFGs model syntax • Parsers are often critical application components • Constituency: key phenomena easily captured with CFG rules • Agreement and subcategorization pose significant problems • Treebanks: corpora of sentence trees
Today 10/11/2012 Syntactic Parsing • CKY
CFG Parsing • Assigning proper trees • Trees that exactly cover the input • Not necessarily the correct tree
For Now • Assume… • Words are in a buffer • No POS tags • Ignore morphology • Words are known • No out of vocabulary (OOV) terms • Poor assumptions for a real application
Top-Down Search • Start with a rule rooted at S (sentence) • Progress down from there to the words
Bottom-Up Parsing • Or … • Start with trees rooted at the words • Progress up to larger trees
Top-Down versus Bottom-Up • Top-down • Proper, feasible trees • But potentially inconsistent with the words • Bottom-up • Consistent with the words • But trees might not make sense globally
Search Strategy • How to search space and make choices • Node to expand next? • Grammar rule used for expansion • Backtracking • Make a choice • If it works, continue • If not, back up and make a different choice
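The choice points and backtracking above can be sketched as a tiny top-down recognizer. A minimal sketch, assuming a toy grammar (the grammar and function names are illustrative, not from the slides):

```python
# A minimal top-down recognizer: at each non-terminal we face a choice of
# rules; a failed choice is abandoned and the next alternative is tried.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Pro"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]], "N": [["letter"]], "Pro": [["they"]], "V": [["hid"]],
}

def parse(symbol, words, i):
    """Yield every position j such that symbol can span words[i:j]."""
    if symbol not in GRAMMAR:            # terminal: must match the next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:          # choice point: try each rule in turn
        positions = [i]
        for sym in rhs:                  # extend every partial match by sym
            positions = [j2 for j in positions for j2 in parse(sym, words, j)]
        yield from positions

def recognize(words):
    return len(words) in parse("S", words, 0)

print(recognize("they hid the letter".split()))  # True
```

Note how `parse` may re-derive the same constituent over the same span many times, which is exactly the duplicated work the next slides discuss.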
Problems • Even with the best filtering, backtracking methods are doomed because of two inter-related problems • Ambiguity • Shared subproblems
Shared Sub-Problems • No matter what kind of search • Don’t want to redo work already done • Naïve backtracking leads to duplicated work
Shared Sub-Problems • Consider: a flight from Indianapolis to Houston on TWA
Shared Sub-Problems • Assume a top-down parse making choices among the various Nominal rules • In particular, between these two • Nominal -> Noun • Nominal -> Nominal PP • Statically choosing the rules in this order forces the parser to re-derive the same sub-constituents again and again
Dynamic Programming • Dynamic Programming search • Fill tables with partial results • Avoid repeating work • Solve exponentially large search problems in polynomial time • Efficiently store ambiguous structures with shared sub-parts • Bottom-up approach • CKY • Top-down approach • Earley
CKY Parsing • Limit grammar to epsilon-free binary rules • Consider the rule A -> B C • If there is an A somewhere in the input generated by this rule, then there must be a B followed by a C in the input • If A spans [i to j), there must be a k s.t. i<k<j • I.e., B splits from C someplace after i and before j
Problem • What if your grammar isn't binary? • E.g., the Penn Treebank grammar • Convert it to binary • Any CFG can be rewritten into Chomsky Normal Form automatically • What does this mean? • The resulting grammar accepts (and rejects) the same set of strings • But the derivations (trees) are binary
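The core of that conversion is binarization: any rule with more than two right-hand-side symbols is split using fresh non-terminals. A sketch of just this step (the naming scheme for the new non-terminals is an assumption; full CNF conversion also removes epsilon and unit productions, omitted here):

```python
# Binarization step of CNF conversion: split long right-hand sides by
# introducing fresh non-terminals, preserving the language generated.
def binarize(rules):
    """rules: list of (lhs, rhs_tuple). Returns an equivalent binary rule set."""
    out = []
    for n, (lhs, rhs) in enumerate(rules):
        while len(rhs) > 2:
            new = f"{lhs}_{n}_{len(rhs)}"   # fresh non-terminal (assumed scheme)
            out.append((lhs, (rhs[0], new)))
            lhs, rhs = new, rhs[1:]
        out.append((lhs, rhs))
    return out

print(binarize([("VP", ("V", "NP", "PP"))]))
# e.g. VP -> V VP_0_3  and  VP_0_3 -> NP PP
```

The two output rules generate exactly the strings the original ternary rule did, which is the sense in which the grammar "accepts and rejects the same set of strings" while the trees become binary.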
CKY • Build a table so that an A spanning [i to j) in the input is placed in cell [i, j] of the table • A non-terminal spanning the entire string is in [0, n] • The parts of A must span i to k and k to j, for some k
CKY • Given A -> B C • Look for B in [i, k] and C in [k, j] • I.e., if there is an A spanning [i, j] AND A -> B C, THEN there must be a B in [i, k] and a C in [k, j] for some k such that i<k<j
CKY • Fill the table by looping over the cell values [i, j] in a systematic way • For each cell, loop over the appropriate k values to search for things to add
CKY Algorithm What’s the complexity of this?
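The algorithm on the slide can be sketched as a recognizer over a CNF grammar; the toy lexicon and rules below are illustrative, not from the slides. The three nested loops over positions (span width, start i, split k) answer the complexity question: O(n^3) in the sentence length, times a grammar-dependent constant.

```python
# Compact CKY recognizer. table[(i, j)] holds the set of non-terminals
# spanning words[i:j]. This sketch fills cells by increasing span width
# (the slide fills column by column; either order works, since both
# guarantee the sub-spans are filled before the cell that combines them).
from collections import defaultdict

def cky(words, lexical, binary, start="S"):
    n = len(words)
    table = defaultdict(set)
    for i, w in enumerate(words):          # width-1 spans from the lexicon
        table[(i, i + 1)] |= lexical.get(w, set())
    for span in range(2, n + 1):           # widths 2..n
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # every split point
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return start in table[(0, n)]

lexical = {"they": {"NP"}, "hid": {"V"}, "the": {"Det"}, "letter": {"N"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky("they hid the letter".split(), lexical, binary))  # True
```

Because each cell stores a *set* of non-terminals, ambiguous sub-analyses are computed once and shared, which is how dynamic programming avoids the duplicated work of backtracking.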
CKY • Fills the table one column at a time, from left to right, bottom to top • When filling a cell, the parts needed are already in the table (to the left and below) • It's somewhat natural in that it processes the input left to right, a word at a time • Known as online
Example • Filling col 5 == processing word 5 (Houston) • j is 5. • i goes from 3 to 0 (3,2,1,0)
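The cell-and-split schedule for that column can be enumerated directly (a sketch; "Houston" as word 5 follows the slide's example sentence):

```python
# Filling column j = 5 (spans ending after word 5, "Houston"): for each
# start i from 3 down to 0, try every split point k between i and j.
j = 5
for i in range(3, -1, -1):
    for k in range(i + 1, j):
        print(f"combine [{i},{k}] with [{k},{j}] into [{i},{j}]")
```

i starts at 3 because [4, 5] is the width-1 lexical cell; the widest cell filled here, [0, 5], tries all four splits k = 1..4.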