Parsing with Context Free Grammars Reading: Chap 13, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Paul Tarau, based on Rada Mihalcea’s original slides
Parsing • Parsing with CFGs refers to the task of assigning correct trees to input strings • Correct here means a tree that covers all and only the elements of the input and has an S at the top • It doesn’t actually mean that the system can select the correct tree from among the possible trees • As with everything of interest, parsing involves a search, and a search involves making choices
Some assumptions • Assume… • You have all the words already in some buffer • The input is/isn’t POS tagged • All the words are known • These are all (quite) feasible • State of the art in POS tagging? • “All words are known”?
Top-Down Parsing • Since we are trying to find trees rooted with an S (Sentence), start with the rules that give us an S. • Then work your way down from there to the words.
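The top-down strategy above can be sketched as a small recursive recognizer. This is a minimal sketch, not the slides' own code: the toy grammar, the lexicon, and the names `expand` and `recognize_top_down` are assumptions made for illustration, and the grammar is deliberately free of left recursion.

```python
# Toy grammar and lexicon (assumptions for illustration, not from the slides).
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "Det": {"that", "the", "a"},
    "Noun": {"flight", "book"},
    "Verb": {"book", "include"},
}


def expand(symbols, words):
    """Top-down, depth-first, left-to-right: can `symbols` derive exactly `words`?"""
    if not symbols:
        # All goals satisfied: succeed only if all the words were consumed.
        return not words
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # Part-of-speech goal: it must match the next input word.
        return bool(words) and words[0] in LEXICON[first] and expand(rest, words[1:])
    # Nonterminal goal: try each rule in order, backtracking on failure.
    return any(expand(rhs + rest, words) for rhs in GRAMMAR[first])


def recognize_top_down(sentence):
    """Start from S and work down to the words."""
    return expand(["S"], sentence.lower().split())


print(recognize_top_down("Book that flight"))
print(recognize_top_down("that flight"))
```

Every hypothesis starts from S, so the search never builds a tree that cannot be an answer, but it may try rules (such as S -> NP VP here) that the first word already rules out.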
Bottom-Up Parsing • Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way. • Then work your way up from there.
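A bottom-up search can be sketched in the same spirit: begin with the part-of-speech tags of the words and keep replacing a matching right-hand side with its left-hand side until only S remains. This is a minimal sketch; the toy grammar, the lexicon, and the names `reduce_to_s` and `recognize_bottom_up` are assumptions made for illustration.

```python
from itertools import product

# Toy grammar as (lhs, rhs) pairs and a word -> tags lexicon
# (assumptions for illustration, not from the slides).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("S", ("VP",)),
    ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)),
    ("VP", ("Verb", "NP")),
    ("VP", ("Verb",)),
]
LEXICON = {"book": ["Noun", "Verb"], "that": ["Det"], "flight": ["Noun"]}


def reduce_to_s(symbols, seen):
    """Depth-first search over reductions; True if `symbols` reduces to ('S',)."""
    if symbols == ("S",):
        return True
    if symbols in seen:
        return False  # already explored this sequence without success
    seen.add(symbols)
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == rhs:
                # Replace the matched right-hand side by its left-hand side.
                reduced = symbols[:i] + (lhs,) + symbols[i + n:]
                if reduce_to_s(reduced, seen):
                    return True
    return False


def recognize_bottom_up(sentence):
    """Start from the words' tags and work up toward S."""
    words = sentence.lower().split()
    tag_options = [LEXICON.get(w, []) for w in words]
    return any(reduce_to_s(tags, set()) for tags in product(*tag_options))


print(recognize_bottom_up("Book that flight"))
```

Every intermediate sequence is consistent with the input words, but many of them (for example, treating "book" as a Noun here) can never be extended to an S.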
Top-Down vs. Bottom-Up • Top-down • Only searches for trees that can be answers • But suggests trees that are not consistent with the words • Guarantees that the tree starts with S as root • Does not guarantee that the tree will match the input words • Bottom-up • Only forms trees consistent with the words • Suggests trees that make no sense globally • Guarantees that the tree matches the input words • Does not guarantee that the parse tree will lead to S as a root • Combine the advantages of the two by doing a search constrained from both sides (top and bottom)
Top-Down, Depth-First, Left-to-Right Search + Bottom-up Filtering
Example (cont’d)
Possible Problem: Left-Recursion • What happens in the following situation? • S -> NP VP • S -> Aux NP VP • NP -> NP PP • NP -> Det Nominal • … • With a sentence starting with • Did the flight…
Solution: Rule Ordering • S -> Aux NP VP • S -> NP VP • NP -> Det Nominal • NP -> NP PP • The key for the NP is that you want the recursive option after any base case.
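The ordering idea can be sketched as a stable sort that moves directly left-recursive rules behind the base cases. This is a minimal sketch; the helper names `is_left_recursive` and `base_cases_first` are assumptions made for illustration, only direct left recursion (a rule whose first right-hand symbol is its own left-hand side) is detected, and the S rules keep their original relative order rather than the order shown on the slide.

```python
# Rules from the slide, as (lhs, rhs) pairs; the elided "…" rules are omitted.
RULES = [
    ("S", ["NP", "VP"]),
    ("S", ["Aux", "NP", "VP"]),
    ("NP", ["NP", "PP"]),
    ("NP", ["Det", "Nominal"]),
]


def is_left_recursive(rule):
    """True if the rule's first right-hand symbol is its own left-hand side."""
    lhs, rhs = rule
    return bool(rhs) and rhs[0] == lhs


def base_cases_first(rules):
    """Stable reorder: non-recursive rules first, left-recursive rules last."""
    # sorted() is stable and False sorts before True.
    return sorted(rules, key=is_left_recursive)


for lhs, rhs in base_cases_first(RULES):
    print(lhs, "->", " ".join(rhs))
```

A depth-first top-down parser that tries NP -> NP PP first expands NP into NP again without consuming any input; trying NP -> Det Nominal first lets it make progress on the words before the recursive option is considered.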
Avoiding Repeated Work • Parsing is hard, and slow. It’s wasteful to redo stuff over and over and over. • Consider an attempt to top-down parse the following as an NP • A flight from Indianapolis to Houston on TWA
Dynamic Programming • We need a method that fills a table with partial results and that • Does not do (avoidable) repeated work • Does not fall prey to left-recursion • Solves an exponential problem in polynomial time (O(N³) in the worst case for Earley parsing)
Earley Parsing • Fills a table in a single sweep over the input words • Table is length N+1; N is number of words • Table entries represent • Completed constituents and their locations • In-progress constituents • Predicted constituents
States • The table entries are called states and are represented with dotted rules. • S -> · VP (a VP is predicted) • NP -> Det · Nominal (an NP is in progress) • VP -> V NP · (a VP has been found)
States/Locations • It would be nice to know where these things are in the input so… • S -> · VP [0,0] Predictor • A VP is predicted at the start of the sentence • NP -> Det · Nominal [1,2] Scanner • An NP is in progress; the Det goes from 1 to 2 • VP -> V NP · [0,3] Completer • A VP has been found starting at 0 and ending at 3
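A dotted rule plus its span maps naturally onto a small immutable record. This is a minimal sketch; the class name `State` and its field and method names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """A dotted rule with the input span it covers (hypothetical layout)."""
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # how many rhs symbols have been recognized so far
    start: int    # input position where the constituent begins
    end: int      # input position the dot has reached

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The symbol just after the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "·")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"


print(State("S", ("VP",), 0, 0, 0))
print(State("NP", ("Det", "Nominal"), 1, 1, 2))
print(State("VP", ("V", "NP"), 2, 0, 3))
```

Making the record frozen means two states with the same rule, dot position, and span compare equal and hash identically, which is convenient for checking whether a state is already in the chart.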
Earley • As with most dynamic programming approaches, the answer is found by looking in the table in the right place. • In this case, there should be a complete S state in the final column that spans from 0 to N. • If that’s the case you’re done. • S -> α · [0,N] • So sweep through the table from 0 to N… • Predictor: New predicted states are created by states in the current chart • Scanner: New incomplete states are created by advancing existing states as new constituents are discovered • Completer: New complete states are created in the same way.
Earley • More specifically… 1. Predict all the states you can up front 2. Read a word • Extend states based on matches • Add new predictions • Go to 2 3. Look at the final table entry (position N) to see if you have a winner
Earley and Left Recursion • So Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search • Never place a state into the chart that’s already there • Copy states before advancing them • S -> NP VP • NP -> NP PP • The first rule predicts • S -> · NP VP [0,0] that adds • NP -> · NP PP [0,0] • stops there since adding any subsequent prediction would be fruitless • When a state gets advanced make a copy and leave the original alone • Say we have NP -> · NP PP [0,0] • We find an NP from 0 to 2 so we create NP -> NP · PP [0,2] • But we leave the original state as is
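Both safeguards can be sketched with plain tuples standing in for states. This is a minimal sketch; the tuple layout `(lhs, rhs, dot, start, end)` and the names `predict_closure` and `advance` are assumptions made for illustration, and only the prediction step is shown.

```python
# The two rules from the slide plus a base case for NP.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("NP", "PP"), ("Det", "Nominal")],
}


def predict_closure(start_symbol, position):
    """Predict every state reachable from `start_symbol` at `position`.

    States are (lhs, rhs, dot, start, end) tuples. A state is never added
    twice, so the left-recursive NP -> NP PP is predicted exactly once.
    """
    chart, seen = [], set()
    agenda = [start_symbol]
    while agenda:
        symbol = agenda.pop()
        for rhs in GRAMMAR.get(symbol, []):
            state = (symbol, rhs, 0, position, position)
            if state in seen:
                continue  # already in the chart: predicting again is fruitless
            seen.add(state)
            chart.append(state)
            agenda.append(rhs[0])  # the symbol after the dot needs predicting
    return chart


def advance(state, new_end):
    """Return an advanced copy; tuples are immutable, so the original survives."""
    lhs, rhs, dot, start, _ = state
    return (lhs, rhs, dot + 1, start, new_end)


chart = predict_closure("S", 0)
original = ("NP", ("NP", "PP"), 0, 0, 0)
print(chart)
print(advance(original, 2), original)
```

The loop terminates even though NP predicts NP again, because the second attempt produces only states that are already present.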
Example • Book that flight • We should find an S from 0 to 3 that is a completed state
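The whole procedure can be sketched as a compact recognizer run on this sentence. This is a minimal sketch under stated assumptions: the toy grammar, the lexicon, the dummy start symbol `GAMMA`, the tuple layout `(lhs, rhs, dot, start, end)`, and the function names are all illustrative, and grammars with empty right-hand sides are not handled.

```python
# Toy grammar and lexicon (assumptions for illustration, not from the slides).
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
    "VP": [("Verb", "NP"), ("Verb",)],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Det", "Noun", "Verb"}


def earley_chart(words):
    """Fill a chart of N+1 entries with (lhs, rhs, dot, start, end) states."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state, i):
        if state not in chart[i]:  # never add a state that is already there
            chart[i].append(state)

    enqueue(("GAMMA", ("S",), 0, 0, 0), 0)  # dummy start state
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            lhs, rhs, dot, start, _ = chart[i][j]
            j += 1
            if dot < len(rhs) and rhs[dot] not in POS:
                # Predictor: expand the nonterminal after the dot.
                for expansion in GRAMMAR.get(rhs[dot], []):
                    enqueue((rhs[dot], expansion, 0, i, i), i)
            elif dot < len(rhs):
                # Scanner: the next word must carry the expected tag.
                if i < n and rhs[dot] in LEXICON.get(words[i], set()):
                    enqueue((rhs[dot], (words[i],), 1, i, i + 1), i + 1)
            else:
                # Completer: advance copies of the states waiting for `lhs`.
                for l2, r2, d2, s2, _ in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        enqueue((l2, r2, d2 + 1, s2, i), i)
    return chart


def recognize(sentence):
    """True if the last chart entry holds a complete S spanning 0 to N."""
    words = sentence.lower().split()
    final = earley_chart(words)[len(words)]
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start, _ in final)


print(recognize("Book that flight"))
```

For the three-word input, the chart has four entries, and the last one contains the completed state S -> VP · spanning 0 to 3.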