PARSING
Analyzing Linguistic Units
• Why should we parse a sentence?
  • to detect relations among words
  • to normalize surface syntactic variations
  • invaluable for a number of NLP applications
Some Concepts
• Grammar: a generative device that prescribes a set of valid strings.
• Parser: a device that uncovers the sequence of grammar rules that might have generated the input sentence.
  • Input: grammar, sentence
  • Output: parse tree, derivation tree
• Recognizer: a device that returns "yes" if the input string could be generated by the grammar.
  • Input: grammar, sentence
  • Output: boolean
(A toy grammar-as-generator sketch follows.)
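To make the "generative device" idea concrete, here is a minimal Python sketch: a toy CFG stored as a dict, used to generate valid strings. The grammar, symbol names, and words are illustrative assumptions, not the fragment used later in these slides.

```python
import random

# Toy CFG as a dict: non-terminal -> list of possible expansions (illustrative only).
TOY_GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["flight"], ["meal"]],
    "V":   [["book"], ["prefer"]],
}

def generate(symbol="S"):
    """Use the grammar as a generative device: expand non-terminals at random."""
    if symbol not in TOY_GRAMMAR:                 # terminal: emit the word itself
        return [symbol]
    expansion = random.choice(TOY_GRAMMAR[symbol])
    return [word for sym in expansion for word in generate(sym)]

print(" ".join(generate()))                       # e.g. "the flight prefer a meal"
```

A recognizer or parser for the same grammar runs this process in reverse: it searches for a sequence of expansions that yields the observed sentence.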
Searching for a Parse
• Grammar + rewrite procedure encodes
  • all strings generated by the grammar: L(G)
  • all parse trees for each generated string s: T(G) = ∪_s {T_s(G)}
• Given an input sentence I, its set of parse trees is T_I(G).
• Parsing is searching for T_I(G) ⊆ T(G).
• Ideally, the parser finds the appropriate parse for the sentence.
CFG for Fragment of English
[Parse tree for "Book that flight": S → VP, VP → V NP, NP → Det Nom, Nom → N, with Book, that, flight at the leaves. The same tree can be built by bottom-up parsing or top-down parsing.]
Top-down/Bottom-up Parsing
• Control strategy -- how do we explore the search space?
  • Pursue all parses in parallel, or backtrack, or ...?
  • Which rule to apply next?
  • Which node to expand next?
• Work through top-down and bottom-up parsing on the board for "Book that flight".
Top-down, Depth-First, Left-to-Right Parser
• Systematic, incremental expansion of the search space
  • in contrast to a parallel parser
• Start state: (•S, 0)
• End state: (•, n), where n is the length of the input to be parsed
• Next-state rules
  • (•w_{j+1} β, j) ⇒ (•β, j+1)
  • (•B β, j) ⇒ (•γ β, j) if B → γ (note B is the leftmost non-terminal)
• Agenda: a data structure that keeps track of the states to be expanded
  • depth-first expansion if the agenda is a stack (see the sketch below)
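A minimal sketch of this control strategy, reusing the dict-of-lists grammar format from the earlier sketch. The stack-based agenda gives depth-first expansion; the toy grammar here only covers "Book that flight" and is an assumption for illustration.

```python
BTF_GRAMMAR = {                                   # toy grammar for "book that flight"
    "S":   [["VP"], ["NP", "VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["that"]],
    "N":   [["flight"]],
    "V":   [["book"]],
}

def topdown_recognize(tokens, grammar, start="S"):
    agenda = [((start,), 0)]                      # start state (•S, 0)
    while agenda:
        symbols, j = agenda.pop()                 # stack agenda => depth-first
        if not symbols:
            if j == len(tokens):                  # end state (•, n)
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if first in grammar:                      # leftmost symbol is a non-terminal: expand
            for expansion in grammar[first]:
                agenda.append((tuple(expansion) + rest, j))
        elif j < len(tokens) and tokens[j] == first:   # terminal matches input: consume it
            agenda.append((rest, j + 1))
        # otherwise: dead end; backtracking happens by popping the next stack entry
    return False

print(topdown_recognize("book that flight".split(), BTF_GRAMMAR))   # True
```

Note that this sketch would loop forever on a left-recursive grammar, which is exactly the limitation discussed a few slides below.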
[Fig 10.7: CFG -- the grammar listing appears on a later slide.]
Left Corners
• Can we help top-down parsers with some bottom-up information?
• Unnecessary states are created if there are many B → γ rules.
• If, after successive expansions, B ⇒* w δ and w does not match the input, the whole series of expansions is wasted.
• The leftmost symbol derivable from B needs to match the input.
  • look ahead to the left corner of the tree
  • B is a left corner of A if A ⇒* B γ
• Build a table of the left corners of all non-terminals in the grammar and consult it before applying a rule (see the sketch below).

  Category | Left Corners
  S        | Det, PropN, Aux, V
  NP       | Det, PropN
  Nom      | N
  VP       | V

• At a given point in state expansion (•B β, j):
  • pick the rule B → C γ only if a left corner of C matches the input w_{j+1}
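A rough sketch of how such a table could be computed, assuming the same grammar representation as the earlier sketches: take the first symbol of every expansion, then close the relation transitively.

```python
def left_corners(grammar):
    """Map each non-terminal A to the set of symbols B such that A =>* B γ."""
    corners = {nt: set(exp[0] for exp in exps if exp) for nt, exps in grammar.items()}
    changed = True
    while changed:                                # transitive closure
        changed = False
        for nt in grammar:
            for sym in list(corners[nt]):
                new = corners.get(sym, set()) - corners[nt]
                if new:
                    corners[nt] |= new
                    changed = True
    return corners

# e.g. with the "book that flight" grammar above:
# left_corners(BTF_GRAMMAR)["S"] contains "VP", "NP", "V", "Det", "book", "that"
```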
Limitation of Top-down Parsing: Left Recursion
• Depth-first search will never terminate if the grammar is left-recursive (e.g. NP → NP PP)
• Solutions:
  • Rewrite the grammar into a weakly equivalent one that is not left-recursive (see the sketch below):

      NP → NP PP              NP  → Nom NP'
      NP → Nom PP     ⇒       NP' → PP NP'
      NP → Nom                NP' → ε

    • this may make the rules unnatural
  • Fix the depth of search explicitly
• Other book-keeping needed in top-down parsing:
  • memoization, for reusing previously parsed substrings
  • packed representations, for parse ambiguity
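A sketch of the standard rewrite for direct left recursion, in the same grammar format: for A → A α | β, introduce A' with A → β A' and A' → α A' | ε. It handles only direct left recursion, and the primed rule names are an assumption.

```python
def remove_direct_left_recursion(grammar):
    new_grammar = {}
    for A, expansions in grammar.items():
        alphas = [exp[1:] for exp in expansions if exp and exp[0] == A]        # A -> A α
        betas = [exp for exp in expansions if not exp or exp[0] != A]          # A -> β
        if not alphas:
            new_grammar[A] = expansions
            continue
        A_prime = A + "'"
        new_grammar[A] = [beta + [A_prime] for beta in betas]
        new_grammar[A_prime] = [alpha + [A_prime] for alpha in alphas] + [[]]  # [] is ε
    return new_grammar

# {"NP": [["NP", "PP"], ["Nom", "PP"], ["Nom"]]} becomes
# {"NP": [["Nom", "PP", "NP'"], ["Nom", "NP'"]], "NP'": [["PP", "NP'"], []]}
```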
Dynamic Programming for Parsing
• Memoization:
  • create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds
  • look up subtrees for each constituent rather than re-parsing
  • since all parses are implicitly stored, all are available for later disambiguation
• Examples: Cocke-Younger-Kasami (CYK, 1960), Graham-Harrison-Ruzzo (GHR, 1980) and Earley (1970) algorithms
• Earley parser: an O(n^3) parser
  • top-down parser with bottom-up information
  • State: [i, A → α • β, j]
    • j is the position in the string that has been parsed
    • i is the position in the string where A begins
  • Top-down prediction: S ⇒* w_1 … w_i A γ
  • Bottom-up completion: α ⇒* w_{i+1} … w_j
Earley Parser
• Data structure: an (n+1)-cell array called the chart
  • For each word position, the chart contains the set of states representing all partial parse trees generated so far.
  • E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence.
• Chart entries represent three types of constituents:
  • predicted constituents (top-down predictions)
  • in-progress constituents (we're in the midst of ...)
  • completed constituents (we've found ...)
• Progress in the parse is represented by dotted rules; the position of • indicates the type of constituent.
  0 Book 1 that 2 flight 3
  (0, S → • VP, 0)         (predicting VP)
  (1, NP → Det • Nom, 2)   (finding NP)
  (0, VP → V NP •, 3)      (found VP)
Earley Parser: Parse Success
• The final answer is found by looking at the last entry in the chart.
• If that entry resembles (0, S → α •, n), the input was parsed successfully.
• But note that the chart also contains a record of all possible parses of the input string given the grammar -- not just the successful one(s).
• Why is this useful?
Earley Parsing Steps
• Start state: (0, S' → • S, 0)
• End state: (0, S' → S •, n), where n is the input size
• Next-state rules
  • Scanner: read input
    (i, A → α • w_{j+1} β, j) ⇒ (i, A → α w_{j+1} • β, j+1)
  • Predictor: add top-down predictions
    (i, A → α • B β, j) ⇒ (j, B → • γ, j) if B → γ (note B is the leftmost non-terminal)
  • Completer: move the dot to the right when a new constituent is found
    (i, B → α • A β, k) + (k, A → γ •, j) ⇒ (i, B → α A • β, j)
• No backtracking and no states removed: keep the complete history of the parse (see the sketch below).
• Why is this useful?
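A compact sketch of the recognizer version of these steps, assuming a toy grammar split into phrasal rules and a lexicon, plus a dummy start symbol S'. A state (i, A, rhs, dot) lives in chart[j], so its end position j is simply the index of the cell that holds it.

```python
GRAMMAR = {                                       # phrasal rules (toy, for illustration)
    "S'":  [("S",)],
    "S":   [("VP",), ("NP", "VP")],
    "NP":  [("Det", "Nom")],
    "Nom": [("N",)],
    "VP":  [("V", "NP"), ("V",)],
}
LEXICON = {"book": "V", "that": "Det", "flight": "N"}   # word -> pre-terminal

def earley_recognize(tokens):
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0].add((0, "S'", ("S",), 0))            # start state (0, S' -> . S, 0)
    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            i, lhs, rhs, dot = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:            # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    new = (j, rhs[dot], expansion, 0)
                    if new not in chart[j]:
                        chart[j].add(new)
                        agenda.append(new)
            elif dot < len(rhs):                                  # Scanner (pre-terminal)
                if j < n and LEXICON.get(tokens[j]) == rhs[dot]:
                    chart[j + 1].add((i, lhs, rhs, dot + 1))
            else:                                                 # Completer
                for (k, b_lhs, b_rhs, b_dot) in list(chart[i]):
                    if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                        new = (k, b_lhs, b_rhs, b_dot + 1)
                        if new not in chart[j]:
                            chart[j].add(new)
                            agenda.append(new)
    return (0, "S'", ("S",), 1) in chart[n]       # end state (0, S' -> S ., n)

print(earley_recognize("book that flight".split()))               # True
```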
Book that flight (Chart[0])
• Seed the chart with top-down predictions for S from the grammar.
CFG for Fragment of English
  S → NP VP             Det → that | this | a
  S → Aux NP VP         N → book | flight | meal | money
  S → VP                V → book | include | prefer
  NP → Det Nom          Aux → does
  NP → PropN            Prep → from | to | on
  Nom → N               PropN → Houston | TWA
  Nom → N Nom
  Nom → Nom PP
  VP → V
  VP → V NP
  PP → Prep NP
Chart[1]
• The completed state V → book • is passed to the Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right.
Retrieving the Parses
• Augment the Completer to add a pointer to the prior states it advances, as a field in the current state
  • i.e. which states combined to arrive here? (see the sketch below)
• Read the pointers back from the final state.
• What if the final cell does not contain the final state? -- error handling.
• Is it a total loss? No...
  • The chart contains every constituent and combination of constituents possible for the input, given the grammar.
  • Useful for the partial (shallow) parsing used in information extraction.
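One way to record such pointers, sketched on top of the recognizer above (this augmentation is my assumption, not something spelled out on the slide): pass the Completer a dict of backpointers keyed by (state, end position).

```python
def complete_with_backpointers(chart, backpointers, completed, end):
    """Completer step that also records which two states combined to arrive here."""
    i, lhs, rhs, dot = completed                  # a completed state spanning i..end
    for waiting in list(chart[i]):
        w_i, w_lhs, w_rhs, w_dot = waiting
        if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
            advanced = (w_i, w_lhs, w_rhs, w_dot + 1)
            chart[end].add(advanced)
            # reading these pairs back from the final state reconstructs the parse tree
            backpointers.setdefault((advanced, end), []).append((waiting, completed))
```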
Alternative Control Strategies
• Change the Earley top-down strategy to bottom-up, or ...
• Change to a best-first strategy based on the probabilities of constituents
  • compute and store the probabilities of constituents in the chart as you parse
  • then, instead of expanding states in a fixed order, let the probabilities control the order of expansion
Probabilistic CFGs
• Weighted CFGs
  • attach weights to the rules of a CFG
  • compute weights of derivations
  • use the weights to pick preferred parses
• Utility: pruning and ordering the search space, disambiguation, language model for ASR
• Parsing with weighted grammars (like weighted FAs): T* = arg max_T W(T, S)
• Probabilistic CFGs are one form of weighted CFG.
Probability Model
• Rule probability:
  • attach probabilities to grammar rules; the expansions for a given non-terminal sum to 1

      R1: VP → V         .55
      R2: VP → V NP      .40
      R3: VP → V NP NP   .05

  • estimate the probabilities from annotated corpora, e.g. P(R1) = count(R1) / count(VP)
• Derivation probability:
  • a derivation T = {R1 … Rn}
  • probability of a derivation: P(T) = ∏_i P(R_i)
  • most likely parse: T* = arg max_T P(T, S)
  • probability of a sentence: P(S) = Σ_T P(T, S), summing over all possible derivations of the sentence
• Note the independence assumption: the parse probability does not change based on where a rule is expanded. (A small sketch of these computations follows.)
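A minimal sketch of these estimates; the rule counts below are invented so that the relative frequencies reproduce the example probabilities above.

```python
from collections import Counter

# Invented treebank rule counts, chosen to give .55 / .40 / .05 for the VP expansions.
rule_counts = Counter({
    ("VP", ("V",)):            11,
    ("VP", ("V", "NP")):        8,
    ("VP", ("V", "NP", "NP")):  1,
})
lhs_counts = Counter()
for (lhs, rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

def rule_prob(lhs, rhs):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

def derivation_prob(rules):
    """P(T) = product of the probabilities of the rules used in the derivation."""
    p = 1.0
    for lhs, rhs in rules:
        p *= rule_prob(lhs, rhs)
    return p

print(rule_prob("VP", ("V", "NP")))               # 8 / 20 = 0.4
```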
Structural Ambiguity: "John called Mary from Denver"
  S → NP VP           NP → John | Mary | Denver
  VP → V NP           V → called
  NP → NP PP          P → from
  VP → VP PP
  PP → P NP
[Two parse trees: one with the PP "from Denver" attached to the VP (the calling was from Denver), and one with the PP attached to the NP (Mary from Denver).]
Cocke-Younger-Kasami Parser
• Bottom-up parser with top-down filtering
• Start state(s): (A, i, i+1) for each A → w_{i+1}
• End state: (S, 0, n), where n is the input size
• Next-state rule: (B, i, k) + (C, k, j) ⇒ (A, i, j) if A → B C
Probabilistic CKY
• Assign probabilities to constituents as they are completed and placed in the table.
• Computing the probability:
  • since we are interested in the max P(S, 0, n), use the max probability for each constituent
• Maintain back-pointers to recover the parse (see the sketch below).
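A rough sketch of probabilistic CKY for a grammar in Chomsky normal form. The table maps each span (i, j) to the best probability and a backpointer per non-terminal; the toy rules and probabilities (covering the "John called Mary from Denver" example above) are illustrative assumptions.

```python
BINARY = {                       # A -> B C : probability (toy values)
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.6,
    ("VP", "VP", "PP"): 0.4,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {                      # A -> w : probability (toy values)
    ("NP", "John"): 0.3, ("NP", "Mary"): 0.3, ("NP", "Denver"): 0.2,
    ("V", "called"): 1.0, ("P", "from"): 1.0,
}

def pcky(tokens):
    n = len(tokens)
    table = {}
    for i, w in enumerate(tokens):                 # fill the diagonal from the lexicon
        table[(i, i + 1)] = {A: (p, w) for (A, word), p in LEXICAL.items() if word == w}
    for span in range(2, n + 1):                   # longer spans, bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            cell = table.setdefault((i, j), {})
            for k in range(i + 1, j):              # split point
                for (A, B, C), p_rule in BINARY.items():
                    if B in table.get((i, k), {}) and C in table.get((k, j), {}):
                        p = p_rule * table[(i, k)][B][0] * table[(k, j)][C][0]
                        if p > cell.get(A, (0.0, None))[0]:
                            cell[A] = (p, (k, B, C))   # keep the max, with a backpointer
    return table.get((0, n), {}).get("S")

print(pcky("John called Mary from Denver".split()))   # best S probability + backpointer
```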
Problems with PCFGs
• The probability model we're using is based only on the rules in the derivation.
• Lexical insensitivity:
  • doesn't use the words in any real way, yet structural disambiguation is lexically driven
  • PP attachment often depends on the verb, its object, and the preposition
    • I ate pickles with a fork.
    • I ate pickles with relish.
• Context insensitivity of the derivation:
  • doesn't take into account where in the derivation a rule is used
  • pronouns are more often subjects than objects
    • She hates Mary.
    • Mary hates her.
• Solution: lexicalization
  • add lexical information to each rule
An Example of Lexical Information: Heads
• Make use of the notion of the head of a phrase
  • the head of an NP is its noun
  • the head of a VP is its main verb
  • the head of a PP is its preposition
• Each LHS of a rule in the PCFG carries a lexical item.
• Each RHS non-terminal carries a lexical item; one of them is shared with the LHS.
• If R is the number of binary branching rules in the CFG, the lexicalized CFG has O(2·|Σ|·|R|) rules; unary rules contribute O(|Σ|·|R|). (A head-percolation sketch follows.)
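A minimal sketch of percolating head words up a tree using per-category head rules of this kind; the tree encoding and the HEAD_CHILD table are assumptions for illustration.

```python
# Which child category supplies the head word for each phrase type (illustrative).
HEAD_CHILD = {"NP": "N", "Nom": "N", "VP": "V", "PP": "P", "S": "VP"}

def head_word(tree):
    """tree is either a (category, word) leaf or a (category, [children]) node."""
    category, body = tree
    if isinstance(body, str):                     # leaf: the word is its own head
        return body
    wanted = HEAD_CHILD.get(category)
    for child in body:                            # pick the designated head child
        if child[0] == wanted:
            return head_word(child)
    return head_word(body[-1])                    # fallback: rightmost child

tree = ("S", [("NP", [("N", "John")]),
              ("VP", [("V", "called"), ("NP", [("N", "Mary")])])])
print(head_word(tree))                            # "called"
```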
[Example (correct parse): a lexicalized tree annotated with head words, presented as an attribute grammar.]
Computing Lexicalized Rule Probabilities
• We started with rule probabilities:
  • VP → V NP PP with P(rule | VP)
    • e.g., the count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities:
  • VP(dumped) → V(dumped) NP(sacks) PP(in)
    • P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
    • not likely to have significant counts in any treebank
Another Example
• Consider the VPs:
  • ate spaghetti with gusto
  • ate spaghetti with marinara
• The relevant dependency is not between a mother and its child.
[Two trees: in "ate spaghetti with gusto", PP(with) attaches to VP(ate); in "ate spaghetti with marinara", PP(with) attaches to NP(spaghetti).]
Log-linear Models for Parsing
• Why restrict the conditioning to the elements of a rule?
  • use an even larger context: word sequence, word types, sub-tree context, etc.
• In general, compute P(y|x) ∝ exp(Σ_i λ_i f_i(x, y)), where the features f_i(x, y) test properties of the context and λ_i is the weight of that feature.
• Use these as scores in the CKY algorithm to find the best-scoring parse (see the sketch below).
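A small sketch of the log-linear computation: sum weighted feature values for each candidate, exponentiate, and normalize over the candidate set. The feature functions and weights below are made-up placeholders.

```python
import math

def loglinear_probs(x, candidates, features, weights):
    """P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x), normalized over the candidates."""
    scores = [sum(w * f(x, y) for w, f in zip(weights, features)) for y in candidates]
    z = sum(math.exp(s) for s in scores)          # partition function Z(x)
    return [math.exp(s) / z for s in scores]

# Hypothetical binary features over a (sentence, parse) pair, e.g. PP attachment site.
features = [lambda x, y: 1.0 if y == "vp-attach" else 0.0,
            lambda x, y: 1.0 if y == "np-attach" else 0.0]
weights = [1.5, 0.5]
print(loglinear_probs("ate spaghetti with gusto", ["vp-attach", "np-attach"],
                      features, weights))         # ~[0.73, 0.27]
```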
Supertagging: Almost Parsing
• Example sentence: "Poachers now control the underground trade"
• [Figure: each word (poachers, now, control, the, underground, trade) is paired with its set of candidate elementary trees (supertags); selecting the correct supertag for each word leaves very little parsing left to do.]