Efficient Parsing Techniques and Strategies for Context-Free Grammars

PARSING WITH CONTEXT-FREE GRAMMARS cc437

PARSING • Parsing is the process of recognizing and assigning STRUCTURE • Parsing a string with a CFG: • Finding a derivation of the string consistent with the grammar • The derivation gives us a PARSE TREE

EXAMPLE (CFR LAST WEEK)

PARSING AS SEARCH • Just as in the case of non-deterministic regular expressions, the main problem with parsing is the existence of CHOICE POINTS • There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered

TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES • The search has to be guided by the INPUT and the GRAMMAR • TOP-DOWN search: the parse tree has to be rooted in the start symbol S • EXPECTATION-DRIVEN parsing • BOTTOM-UP search: the parse tree must be an analysis of the input • DATA-DRIVEN parsing

AN EXAMPLE OF TOP-DOWN SEARCH(IN PARALLEL)

AN EXAMPLE OF BOTTOM-UP SEARCH

NON-PARALLEL SEARCH • If it’s not possible to examine all alternatives in parallel, it’s necessary to make further decisions: • Which node in the current search space to expand first (breadth-first or depth-first) • Which of the applicable grammar rules to expand first • Which leaf node in a parse tree to expand next (e.g., leftmost)

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)

TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)

A T-D, D-F, L-R PARSER

TOP-DOWN vs BOTTOM-UP • TOP-DOWN: • Only search among grammatical answers • BUT: suggests hypotheses that may not be consistent with data • Problem: left-recursion • BOTTOM-UP: • Only forms hypotheses consistent with data • BUT: may suggest hypotheses that make no sense globally

LEFT-RECURSION • A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return • Examples of left-recursive rules: • NP  NP PP • S  S and S • But also: • NP  Det Nom • Det  NP’s

THE PROBLEM WITH LEFT-RECURSION

LEFT-RECURSION: POOR SOLUTIONS • Rewrite the grammar to a weakly equivalent one • Problem: may not get correct parse tree • Limit the depth during search • Problem: limit is arbitrary

LEFT-CORNER PARSING • A hybrid of top-down and bottom-up parsing • Strategy: don’t consider any expansion unless the current input can serve as the LEFT-CORNER of that expansion

FURTHER PROBLEMS IN PARSING • Ambiguity • Church and Patel (1982): the number of attachment ambiguities grows like the Catalan numbers • C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 132, C(6) = 469, C(7) = 1430, C(8) = 4867 • Avoiding reparsing

COMMON STRUCTURAL AMBIGUITIES • COORDINATION ambiguity • OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN • ATTACHMENT ambiguity: • Gerundive VP attachment ambiguity • I saw the Eiffel Tower flying to Paris • PP attachment ambiguity • I shot an elephant in my pajamas

PP ATTACHMENT AMBIGUITY

AMBIGUITY: SOLUTIONS • Use a PROBABILISTIC GRAMMAR (not covered in this module) • Use semantics

AVOID RECOMPUTING INVARIANTS • Consider parsing with a top-down parser the NP: • A flight from Indianapolis to Houston on TWA • With the grammar rules: • NP  Det Nominal • NP  NP PP • NP  ProperNoun

INVARIANTS AND TOP-DOWN PARSING

THE EARLEY ALGORITHM

DYNAMIC PROGRAMMING • A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way • A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work • The Earley algorithm also • Does not suffer from the left-recursion problem • Solves an exponential problem in O(n3)

THE CHART • The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input • Table entries sit in the `gaps’ between words • Each entry in the chart is a list of • Completed constituents • In-progress constituents • Predicted constituents • All three types of objects are represented in the same way as STATES

THE CHART: GRAPHICAL REPRESENTATION

STATES • A state encodes two types of information: • How much of a certain rule has been encountered in the input • Which positions are covered • A  , [X,Y] • DOTTED RULES • VP  V NP  • NP  Det  Nominal • S   VP

EXAMPLES

SUCCESS • The parser has succeeded if entry N+1 of the chart contains the state • S   , [0,N]

THE ALGORITHM • The algorithm loops through the input without backtracking, at each step performing three operations: • PREDICTOR: add predictions to the chart • COMPLETER: Move the dot to the right when looked-for constituent is found • SCANNER: read in the next input word

THE ALGORITHM: CENTRAL LOOP

EARLEY ALGORITHM: THE THREE OPERATORS

EXAMPLE, AGAIN

EXAMPLE: BOOK THAT FLIGHT

EXAMPLE: BOOK THAT FLIGHT (II)

EXAMPLE: BOOK THAT FLIGHT (III)

EXAMPLE: BOOK THAT FLIGHT (IV)

READINGS • Jurafsky and Martin, chapter 10.1-10.4

Efficient Parsing Techniques and Strategies for Context-Free Grammars