Optimal Ambiguity Packing in Context-Free Parsers with Interleaved Unification Alon Lavie Carnegie Mellon University and Carolyn Penstein Rosé University of Pittsburgh
Outline • CF Parsers with Interleaved Unification • The Problem: Packing with Interleaved Unification • The Rule Prioritization Heuristic • Why is the Heuristic Optimal? • Experimental Evaluation • Discussion and Conclusions
Unification-Augmented CFGs • CFGs can be parsed efficiently (cubic time) • Unification-based grammars (e.g. HPSG) are more difficult to parse efficiently • Unification-augmented CFGs are a good compromise: • context-free backbone grammar • rules augmented with unification constraints • parsing produces a c-structure and an f-structure
Unification-augmented CFG: Example
(<DECL> <--> (<NP> <VP>)
  (((x2 agr) = (x1 agr))
   ((x0 subject) = x1)
   ((x2 form) = *finite)
   (x0 = x2)))
CF Parsing with Interleaved Unification • f-structure computation is interleaved with the context-free c-structure computation • unification of functional constraints associated with a rule applied whenever the parser completes a constituent according to the rule • if parsing is bottom-up: the f-structure of the LHS constituent computed from the f-structures of the RHS constituents • if unification fails - the rule fails and LHS constituent is pruned from further consideration
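A minimal sketch of what "unification applied when a constituent is completed" could look like for the <DECL> example rule. It assumes f-structures are plain nested Python dicts without structure sharing (real unification grammars use reentrant structures); the names `unify`, `apply_decl_rule` and `UnificationFailure` are hypothetical and not taken from the parsers discussed here.

```python
class UnificationFailure(Exception):
    """Raised when two feature values cannot be unified."""


def unify(a, b):
    """Unify two feature structures (nested dicts or atomic values)."""
    # An empty structure is treated as unconstrained.
    if a == {}:
        return b
    if b == {}:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            result[key] = unify(result[key], value) if key in result else value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} does not unify with {b!r}")


def apply_decl_rule(np, vp):
    """Build the f-structure x0 of <DECL> from x1 (<NP>) and x2 (<VP>).

    Returns None when a constraint fails, i.e. the rule fails and the
    LHS constituent would be pruned.
    """
    try:
        # ((x2 agr) = (x1 agr))
        agr = unify(np.get("agr", {}), vp.get("agr", {}))
        x1 = unify(np, {"agr": agr})
        x2 = unify(vp, {"agr": agr})
        # ((x2 form) = *finite)
        x2 = unify(x2, {"form": "finite"})
        # ((x0 subject) = x1) and (x0 = x2)
        return unify(x2, {"subject": x1})
    except UnificationFailure:
        return None
```

In a bottom-up parser, a function like this would run each time the rule's reduction is performed, which is why a late-arriving packed analysis would force the result to be recomputed.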
Local Ambiguity Packing • NL grammars are often highly ambiguous • Number of parses as a function of sentence length may be exponential • a Local Ambiguity: a portion of the input that can be analyzed as a particular grammar category in multiple ways • Local Ambiguity Packing: the multiple sub-parses are stored in a common data-structure indexed by a single pointer. The parser can refer to the entire set of sub-parses using this pointer
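As an illustration of the "common data-structure indexed by a single pointer", the sketch below keys packed nodes by category and span. The class and method names (`PackedNode`, `PackedForest`, `add`) are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class PackedNode:
    """All sub-parses of one category over one span, behind a single pointer."""
    category: str
    start: int
    end: int
    analyses: list = field(default_factory=list)


class PackedForest:
    def __init__(self):
        self._nodes = {}

    def add(self, category, start, end, children):
        """Record one analysis; return (node, is_new).

        is_new is False when the analysis was packed into an existing node,
        i.e. a local ambiguity was detected.
        """
        key = (category, start, end)
        node = self._nodes.get(key)
        is_new = node is None
        if is_new:
            node = self._nodes[key] = PackedNode(category, start, end)
        node.analyses.append(tuple(children))
        return node, is_new
```

Parent constituents hold a reference to the `PackedNode` itself, so a second analysis added later is visible to them without copying.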
Utilizing Local Ambiguity Packing • Parsing algorithm must be able to detect all local ambiguities and pack them together • Some parsing algorithms are better suited for local ambiguity packing: • Tabular parsing algorithms synchronize processing so that local ambiguities are easy to identify • GLR is not capable of performing full ambiguity packing: only constituents in same state contexts • Differences in packing effectiveness may account for conflicting evidence on parsing efficiency of Chart parsing versus GLR parsing
The Problem: Ambiguity Packing with Interleaved Unification • Most CF parsing algorithms are under-specified in terms of how to pursue multiple analyses • Parsing actions of different ambiguities may be arbitrarily interleaved • In Chart Parsing: which inactive edge should be picked next from the agenda? • In GLR Parsing: which of multiple reduce actions should be performed next? • The particular order of parsing actions determines if and when local ambiguities are detected
The Problem: Ambiguity Packing with Interleaved Unification • A new local ambiguity may be detected after the packed constituent has been further processed • with pure CF parsing - just pack the new analysis into the existing packed node • Problem with unification - the f-structures have already been computed, must be re-computed • Alternatively - do not pack, create a new node • Our Goal: order the parsing actions so that local ambiguities are detected prior to the parse node being further processed.
Example: GLR Parsing • In GLR parsing - choice of which reduction to perform next • Assume we just performed a reduction by rule R0:[A --> B C] creating a constituent A: (4,7) • Assume we have a choice between the following rule reductions: • R1:[D --> A], reducing the recent A to D: (4,7) • R2:[A --> E F], creating a new constituent A: (4,7) • R3:[G --> B A], reducing B and previous A to G: (3,7) • Preferred choice: R2 • may allow packing new A with previous A
How to Prioritize the Rules? • Goal: find a fast rule ordering heuristic that can achieve maximal ambiguity packing • Main idea: we wish to delay applying rules that further process A until all other As of same span have been detected and packed. • The Rightmost Criterion: select rule that creates a constituent with the rightmost starting position • This is sufficient if grammar has no unary or epsilon rules! • Originally observed by Tomita and applied in GLR implementation, but not published
Improved Heuristic for Unary Rules • With unary rules, rightmost is not enough: • In our example: both R1 and R2 are rightmost, but R1 would further process the previous A before R2 detects a new local ambiguity • We need to extend the heuristic to model the dependency between constituents in unary rules • We define a partial order relation GE between constituents: • for every unary rule [A --> B] in the grammar, GE(A,B) • compute GE* - the transitive closure of GE • Extended Heuristic: among rightmost rules, pick the one with the “GE*-least” LHS category
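The GE relation and its closure can be computed once from the grammar. The sketch below assumes rules are given as `(lhs, rhs_tuple)` pairs over category names; the function names and the extra rule `H --> D` are made up for illustration.

```python
def unary_ge(rules):
    """GE(A, B) for every unary rule [A --> B]."""
    return {(lhs, rhs[0]) for lhs, rhs in rules if len(rhs) == 1}


def transitive_closure(pairs):
    """Naive fixed-point closure; fine for grammar-sized relations."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Rules R0-R3 from the GLR example, plus a hypothetical H --> D.
rules = [
    ("A", ("B", "C")),
    ("D", ("A",)),
    ("A", ("E", "F")),
    ("G", ("B", "A")),
    ("H", ("D",)),
]
ge = unary_ge(rules)
ge_star = transitive_closure(ge)
```

Here `ge` holds the pairs (D, A) and (H, D), and the closure adds (H, A), so a reduction to A is preferred over reductions to D or H of the same span.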
Rule Ordering Heuristic for GLR
Input: a set of applicable grammar rule reductions
Output: a selected grammar rule reduction to perform next
Heuristic:
(1) For each potential grammar rule reduction, determine the span and category of the resulting (reduced) constituent
(2) Select the rule reduction that is rightmost - has the greatest start position
(3) If there are multiple rule reductions that are rightmost, pick one that results in a category that is GE*-least.
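The three steps can be sketched as a small selection function. This is an illustrative reading of the heuristic, not the GLR implementation itself: `Reduction`, `select_reduction` and the fallback for the cyclic case are assumptions.

```python
from typing import NamedTuple


class Reduction(NamedTuple):
    rule: str       # rule label, e.g. "R2"
    category: str   # category of the constituent the reduction creates
    start: int      # span start of that constituent
    end: int        # span end of that constituent


def select_reduction(candidates, ge_star):
    """Pick the rightmost reduction, breaking ties by a GE*-least category."""
    # Step (2): keep only reductions with the greatest start position.
    best_start = max(c.start for c in candidates)
    rightmost = [c for c in candidates if c.start == best_start]
    # Step (3): a category is GE*-least if it is not above any other
    # category among the rightmost candidates.
    for cand in rightmost:
        above_another = any(
            (cand.category, other.category) in ge_star
            for other in rightmost
            if other.category != cand.category
        )
        if not above_another:
            return cand
    # Cyclic grammar: no least element exists, so fall back arbitrarily.
    return rightmost[0]


# The choice from the GLR example: R1, R2 and R3 are all applicable.
candidates = [
    Reduction("R1", "D", 4, 7),
    Reduction("R2", "A", 4, 7),
    Reduction("R3", "G", 3, 7),
]
chosen = select_reduction(candidates, ge_star={("D", "A")})
```

With GE*(D,A) from the unary rule R1, `chosen` is the R2 reduction, matching the preferred choice in the example.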
Handling Epsilon Rules • Epsilon rules are still a problem: • there may be non-unary rules that further process A and that are still rightmost • Problem is similar to unary rules and can be treated via a revised partial order:
1. Find all nullable symbols in grammar G
2. Define a revised partial order GEE(A,B):
(a) if GE(A,B) then GEE(A,B)
(b) for every rule [A --> B1 B2 … Bk]: if all Bi are nullable, then for all i, GEE(A,Bi); if at most one Bi is not nullable, then GEE(A,Bi)
(c) compute GEE* - the transitive closure of GEE
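Steps 1 and 2 can be sketched as below. The code takes one reading of step (b): GEE(A,Bi) holds whenever every other right-hand-side symbol is nullable, which also covers unary rules (step (a)). Rules are assumed to be `(lhs, rhs_tuple)` pairs, epsilon rules have an empty tuple, and any symbol that never appears as an LHS is treated as a terminal; the toy grammar is invented for illustration.

```python
def nullable_symbols(rules):
    """Step 1: fixed-point computation of the nullable categories."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def gee_relation(rules):
    """Step 2: GEE(A, Bi) when all RHS symbols other than Bi are nullable."""
    nullable = nullable_symbols(rules)
    nonterminals = {lhs for lhs, _ in rules}
    gee = set()
    for lhs, rhs in rules:
        for i, sym in enumerate(rhs):
            others = rhs[:i] + rhs[i + 1:]
            if sym in nonterminals and all(o in nullable for o in others):
                gee.add((lhs, sym))
    return gee


toy_rules = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("Det", ()),            # epsilon rule: Det is nullable
    ("Det", ("the",)),
    ("N", ("dog",)),
    ("VP", ("V",)),
    ("V", ("barks",)),
]
toy_nullable = nullable_symbols(toy_rules)
toy_gee = gee_relation(toy_rules)
```

Because Det is nullable, an NP can cover exactly the same span as its N, so the pair (NP, N) enters the relation even though the rule is not unary. Step (c) is then an ordinary transitive closure over these pairs.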
Rule Ordering Heuristic: Properties • The heuristic is extremely fast to apply at runtime • The GEE* partial order can be statically computed from the grammar • It is possible for a grammar to have both GEE*(A,B) and GEE*(B,A) - the grammar is cyclic, but unification may resolve the cycle • This may result in sub-optimal ambiguity packing • Heuristic is best possible given just the static CF structure of the grammar • More sophisticated tests are most likely not cost effective computationally
Sketch of Optimality Proof • Assume it is not optimal • constituent A created, then B created using A, then another A of same span created and not packed • assume second A not a result of processing first A • look at sequence of rules applied after B was created and until second A was created • all of these constituents A, B, Xi have same span • according to definition of GEE*, GEE*(A,Xi) • also GEE*(B,A) thus GEE*(B,Xi) • at least one of the Xi was available when rule creating B was selected, so B was not least.
Rule Prioritization in Chart Parsing • The Agenda stores completed constituents waiting to be processed (used to extend active arcs) • Ambiguity packing is done on items stored in the Agenda (thus, not yet further processed) • Prioritize the order in which items are taken out from the Agenda • Same criteria: rightmost and GEE*
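One way to realize this ordering is a priority queue over agenda items. The sketch below is an assumption about how such an agenda could be built, not LCFlex code: it ranks each category by how many categories lie below it in GEE*, which for an acyclic GEE* respects the partial order. Packing of equal items on the agenda is omitted.

```python
import heapq
from itertools import count


class Agenda:
    """Agenda that pops rightmost items first, then GEE*-least categories."""

    def __init__(self, gee_star):
        # rank[c] = number of distinct categories below c in GEE*.
        self._rank = {}
        for upper, lower in gee_star:
            if upper != lower:
                self._rank[upper] = self._rank.get(upper, 0) + 1
        self._heap = []
        self._tie = count()  # FIFO among otherwise equal items

    def push(self, category, start, end):
        key = (-start, self._rank.get(category, 0), next(self._tie))
        heapq.heappush(self._heap, (key, (category, start, end)))

    def pop(self):
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)


agenda = Agenda(gee_star={("D", "A")})
agenda.push("D", 4, 7)
agenda.push("G", 3, 7)
agenda.push("A", 4, 7)
order = [agenda.pop() for _ in range(len(agenda))]
```

The items come out as A (4,7), D (4,7), G (3,7): the A is taken off the agenda before anything that would be built from it.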
Empirical Evaluations • Two parsers: a GLR parser and a Chart parser • Both parsers also have robust versions - GLR* and LCFlex - robust mode adds significant amounts of ambiguity • Same LFG-style syntactic grammar • Grammar has 412 rules and 71 categories and produces complete predicate-argument f-structure • GLR parsing table has 628 states and 8822 actions • Test set of 520 sentences from ESST domain
Results: Non-Robust Parsers • Significant improvements in both number of parse nodes and parse times • For sentences of length 12: • GLR: 12% fewer nodes, 21% less time • LC Parser: 40% fewer nodes, 21% less time
Results: Robust Parsers • GLR* run with search beam of 30 • LCFlex set to simulate the same skipping behavior as GLR* • Significant reductions in both number of parse nodes and parsing times • For sentences of length 12: • GLR*: 19% fewer nodes, 44% less time • LCFlex: 39% fewer nodes, 21% less time
Additional Independent Evaluation • Conducted by Paul Placeway at CMU • Rule ordering heuristic incorporated into independent parsing system for syntactic analysis of documentation manuals: • similar grammar formalism • different highly efficient Chart Parser with LC predictions, grammar path compression • different grammar and test set
Additional Independent Evaluation: Results

condition          CPU time (sec)   Gross Memory (kB)   Num Entries   Num Arcs
Strawman           2463             690960              592589        406889
Rightmost          2231             603603              491087        357842
                   (10.4%)          (14.5%)             (20.7%)       (13.7%)
Full >=*           2173             599310              483921        353197
  comp to r'most:  (2.7%)           (0.7%)              (1.5%)        (1.3%)
  comp to straw:   (13.3%)          (15.3%)             (22.5%)       (15.2%)
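A note on reading the percentages: they appear to be computed relative to the improved (smaller) value rather than the baseline. The check below copies the figures from the results above; the dictionary layout and the function name `relative_gain` are only for illustration.

```python
# Columns: CPU time (sec), gross memory (kB), number of entries, number of arcs.
results = {
    "Strawman": (2463, 690960, 592589, 406889),
    "Rightmost": (2231, 603603, 491087, 357842),
    "Full": (2173, 599310, 483921, 353197),
}


def relative_gain(baseline, improved):
    """Percent difference, measured against the improved condition's value."""
    return [
        round(100 * (b - i) / i, 1)
        for b, i in zip(results[baseline], results[improved])
    ]


rightmost_vs_straw = relative_gain("Strawman", "Rightmost")
full_vs_rightmost = relative_gain("Rightmost", "Full")
full_vs_straw = relative_gain("Strawman", "Full")
```

Under this reading all three rows of percentages are reproduced; measured against the Strawman baseline instead, the CPU-time saving of the Rightmost condition would come out at about 9.4% rather than 10.4%.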
Further Issues • Efficient packing of the f-structures • [Maxwell & Kaplan 91,93] [Miyao 99] • Other strategies for combining CF parsing and unification: • sequential composition • multi-pass parsing, with partial/full unification • Additional possible tie-breaking secondary ordering heuristics: • use a probabilistic model • apply a FIFO or “match the most recent” policy
Future Work • Further investigate f-structure packing and multi-pass strategies • Further development of the LCFlex Parser • Investigating the tight relationship between the parser’s robustness features, search strategy and disambiguation mechanisms