300 likes | 457 Views
Parsing context-free grammars. Context-free grammars specify structure, not process. There are many different ways to parse input in accordance with a given context-free grammar. We will review a top-down parsing algorithm a bottom-up parsing algorithm We will present the Earley algorithm.
E N D
Parsing context-free grammars • Context-free grammars specify structure, not process. • There are many different ways to parse input in accordance with a given context-free grammar. • We will review • a top-down parsing algorithm • a bottom-up parsing algorithm • We will present the Earley algorithm
S NP VP S Aux NP VP S VP NP Det Nominal NP ProperNoun VP Verb VP Verb NP Det that | this | a Noun book | flight | meal | money Verb book | include | prefer Aux does Prep from | to | on ProperNoun Houston | TWA PP P NP Nominal Nominal PP Nominal Noun Nominal Noun Nominal A simple grammar Figure 10.2
Bottom-up parsing • Yngve (1955) presented a bottom-up algorithm • Example (figure 10.4): Book that flight.
Look up words in lexicon Book is ambiguous – there are two possible POS tags for the word “Book”. Noun Det Noun Verb Det Noun Book that flight Book that flight
Build structure from bottom up NOM NOM NOM Noun Det Noun Verb Det Noun Book that flight Book that flight
Build structure from bottom up Now we have three possible structures: NPNP NOM NOM VP NOM NOM Noun Det Noun Verb Det Noun Verb Det Noun Book that flight Book that flight Book that flight
Build structure from bottom up The Noun interpretation of Book leads to a dead end, so only two parse trees survive: VP NP NP VP NOM NOM Verb Det Noun Verb Det Noun Book that flight Book that flight
Build structure from bottom up There is way to combine a VP and an NP to form an S, so only one parse tree survives: S VP NP NOM Verb Det Noun Book that flight
Build structure from top down When parsing top-down, we start with the grammar’s start symbol and apply productions to try to match input: S Book that flight
Build structure from top down Here we show only the successful choices: S VP Book that flight
Build structure from top down Here we show only the successful choices: S VP NP Verb Book that flight
Build structure from top down Here we show only the successful choices: S VP NP Verb Book that flight
Build structure from top down Here we show only the successful choices: S VP NP NOM Verb Det Book that flight
Build structure from top down Here we show only the successful choices: S VP NP NOM Verb Det Book that flight
Build structure from top down Here we show only the successful choices: S VP NP NOM Verb Det Noun Book that flight
Build structure from top down Here we show only the successful choices: S VP NP NOM Verb Det Noun Book that flight
Top-down advantages Doesn’t explore trees which cannot be S Subtrees fit under S Top-down disadvantages Many fruitless trees are explored: trees explored may have no hope of matching input Bottom-up advantages All trees explored are consistent with input Bottom-up disadvantages Builds structure even if S cannot be formed Builds neighboring structures which can never combine Top-down versus bottom-up approaches
Approaches to dealing with ambiguity • parallel exploration • depth-first strategy with backtracking
Improving top-down parsing • Make top-down parser pay attention to input with bottom-up filtering (left-corner parsing) • “The parser should not consider any grammar rule if he current input cannot serve as the first word along the left edge of some derivation from this rule.” [pg. 369] • Left corners are pre-compiled.
Problems with top-down parsers • left-recursion X * X * Infinite loop in derivation! • ambiguity not efficiently handled • recomputation subtrees can be built multiple times (built, then thrown away during backtracking)
Earley’s algorithm • Earley’s algorithm employs the dynamic programming technique to address the weaknesses of general top-down parsing. • Dynamic programming involves storing of results so they don’t ever need to be recomputed. • Dynamic programming reduces exponential time requirement to polynomial time requirement: O(N3), where N is length of input in words.
Data structure • Earley’s algorithm uses a data structure called a chart to store information about the progress of the parse. • A chart contains an entry for each position in the input • A position occurs before the first word, between words, and after the last word. word1 word2 … wordN • A position is represented by a number; positions in the input are numbered from 0 (at the left) to N (at the right).
Chart details • A chart entry consists of a sequence of states. • A state represents • a subtree corresponding to a single grammar rule • information about how much of a rule has been processed • information about the span of the subtree w.r.t. the input • A state is represented by an annotated grammar rule • a dot () is used to show how much of the rule has been processed • a pair of positions, [x,y], indicates the span of the subtree w.r.t. the input; x is the position of the left edge of the subtree, and y is the position of the dot.
Three operators on a chart • Predictor • applies when NonTerminal to right of in a state is not a POS category (i.e. is not a pre-terminal) • adds states to current chart entry • Scanner • applies when NonTerminal to right of in a state is a POS category (i.e. is a pre-terminal) • adds states to next chart entry • Completer • applies when there is no NonTermial (and hence no Terminal) to right of in a state (i.e. is at end) • adds states to current chart entry
Predictor • Suppose rule to which Predicator applies is: X NT [x,y] • Predictor adds, to the current chart entry, a new state for each possible expansion of NT • For each expansion EX of NT, state added is NT EX [y,y]
Scanner • Suppose rule to which Scanner applies is: X POS [x,y] • Scanner adds, to the next chart entry, a new state for each possible expansion of POS • The new state added is X POS [x,y+1]
Completer • Suppose rule to which Completer applies is: X [x,y] • Completer adds, to the current chart entry, a new state for each possible reduction using the (now completed) state • For each state (from any earlier chart entry) of the form Y X [w,x] a new state of the following form is added Y X [w,y]
Completer (modification) • In order to recover parse tree information from the chart once parsing is complete, we need to modify the completer slightly. • Each state in the chart must be given a unique identifier (N for state N) • Each time the completer adds a state, it also adds the unique identifier of the state completed to the list of previous states for that new state (which is a copy of an already existing state, waiting for the category which the current state just completed).
Example (from text) • (work through on board)