Earley’s algorithm
• Earley’s algorithm employs the dynamic programming technique to address the weaknesses of general top-down parsing.
• Dynamic programming involves storing results so that they never need to be recomputed.
• Dynamic programming reduces the exponential time requirement to a polynomial one: O(N^3), where N is the length of the input in words.

Data structure
• Earley’s algorithm uses a data structure called a chart to store information about the progress of the parse.
• A chart contains an entry for each position in the input.
• A position occurs before the first word, between words, and after the last word:
  0 word1 1 word2 2 … wordN N
• A position is represented by a number; positions in the input are numbered from 0 (at the left) to N (at the right).

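As a rough illustration of the numbering scheme, the sketch below builds an empty chart for the three-word sentence used later in these slides. The names `words`, `positions`, `chart` and `word_between` are hypothetical, chosen only for this example.

```python
# Example input: the sentence traced later in these slides.
words = ["book", "that", "flight"]

# N words give N + 1 positions, numbered 0 (far left) to N (far right).
positions = list(range(len(words) + 1))

# One chart entry per position; each entry starts as an empty list of states.
chart = [[] for _ in positions]


def word_between(i, j):
    """Return the word lying between adjacent positions i and j = i + 1."""
    if j != i + 1:
        raise ValueError("positions must be adjacent")
    return words[i]


print(positions)           # [0, 1, 2, 3]
print(word_between(1, 2))  # that
```

With this layout, chart[k] collects every state whose dot sits at input position k.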
Chart details
• A chart entry consists of a sequence of states.
• A state represents
  • a subtree corresponding to a single grammar rule
  • information about how much of a rule has been processed
  • information about the span of the subtree w.r.t. the input
• A state is represented by an annotated grammar rule
  • a dot (•) is used to show how much of the rule has been processed
  • a pair of positions, [x,y], indicates the span of the subtree w.r.t. the input; x is the position of the left edge of the subtree, and y is the position of the dot.

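One possible way to represent such a state in code is sketched below. The class name `State` and its field names are assumptions made for illustration; the slides do not prescribe a concrete representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """An annotated grammar rule (hypothetical representation)."""
    lhs: str     # left-hand side of the grammar rule
    rhs: tuple   # right-hand-side symbols
    dot: int     # how many rhs symbols have been processed
    start: int   # x: position of the left edge of the subtree
    end: int     # y: position of the dot in the input

    def next_symbol(self):
        """Symbol immediately to the right of the dot, or None at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} [{self.start},{self.end}]"


s = State("VP", ("Verb", "NP"), dot=1, start=0, end=1)
print(s)  # VP → Verb • NP [0,1]
```

Making the class frozen keeps states hashable and comparable, which is convenient when checking whether a state is already in a chart entry.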
Three operators on a chart
• Predictor
  • applies when the NonTerminal to the right of the • in a state is not a POS category (i.e. is not a pre-terminal)
  • adds states to the current chart entry
• Scanner
  • applies when the NonTerminal to the right of the • in a state is a POS category (i.e. is a pre-terminal)
  • adds states to the next chart entry
• Completer
  • applies when there is no NonTerminal (and hence no Terminal) to the right of the • in a state (i.e. the • is at the end)
  • adds states to the current chart entry

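The choice among the three operators depends only on what follows the dot. A small sketch of that dispatch is given below; the function name `choose_operator` is hypothetical, and the set of POS categories is assumed from the categories that appear in the worked example in these slides.

```python
# Assumed set of POS (pre-terminal) categories, taken from the worked example.
POS_CATEGORIES = {"Aux", "Det", "ProperNoun", "Verb", "Noun"}


def choose_operator(rhs, dot):
    """Name the operator that applies to a state with this rhs and dot index."""
    if dot == len(rhs):
        return "Completer"          # nothing to the right of the dot
    if rhs[dot] in POS_CATEGORIES:
        return "Scanner"            # next symbol is a pre-terminal
    return "Predictor"              # next symbol is any other non-terminal


print(choose_operator(("NP", "VP"), 0))    # Predictor
print(choose_operator(("Verb", "NP"), 0))  # Scanner
print(choose_operator(("Verb",), 1))       # Completer
```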
Predictor
• Suppose the state to which Predictor applies is: X → α • NT β [x,y] (α and β stand for any sequences of symbols)
• Predictor adds, to the current chart entry, a new state for each possible expansion of NT
• For each expansion EX of NT, the state added is: NT → • EX [y,y]

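The Predictor step can be sketched as follows. States are plain tuples `(lhs, rhs, dot, x, y)`, the grammar is the set of rules that appear in the worked example in these slides, and the names `GRAMMAR` and `predictor` are hypothetical.

```python
# Grammar rules as they appear in the worked example (assumed to be complete
# enough for illustration only).
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
}


def predictor(state, chart):
    """Add NT → • EX [y,y] to chart[y] for each expansion EX of NT."""
    lhs, rhs, dot, x, y = state
    nt = rhs[dot]                          # non-terminal right of the dot
    for expansion in GRAMMAR[nt]:
        new_state = (nt, expansion, 0, y, y)
        if new_state not in chart[y]:      # never add a duplicate state
            chart[y].append(new_state)


# Start from the dummy state γ → • S [0,0] and predict the expansions of S.
chart = [[("γ", ("S",), 0, 0, 0)]]
predictor(chart[0][0], chart)
for st in chart[0]:
    print(st)
```

The duplicate check matters because the same non-terminal is often predicted more than once at the same position.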
Scanner
• Suppose the state to which Scanner applies is: X → α • POS β [x,y]
• Scanner adds, to the next chart entry, a new state if the word in the next position (between positions y and y+1) can be a member of the category POS.
• The new state added is: POS → word • [y,y+1]

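A minimal sketch of the Scanner, using the same tuple layout `(lhs, rhs, dot, x, y)` for states. The lexicon below is an assumption that lists only the categories used for the three words in the worked example; `LEXICON` and `scanner` are hypothetical names.

```python
# Assumed lexicon: only the word categories used in the worked example.
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def scanner(state, chart, words):
    """Add POS → word • [y,y+1] to chart[y+1] if the next word can be a POS."""
    lhs, rhs, dot, x, y = state
    pos = rhs[dot]                         # POS category right of the dot
    if y < len(words) and pos in LEXICON.get(words[y], set()):
        new_state = (pos, (words[y],), 1, y, y + 1)
        if new_state not in chart[y + 1]:  # never add a duplicate state
            chart[y + 1].append(new_state)


words = ["book", "that", "flight"]
chart = [[] for _ in range(len(words) + 1)]
scanner(("VP", ("Verb",), 0, 0, 0), chart, words)           # adds Verb → book •
scanner(("VP", ("Verb", "NP"), 0, 0, 0), chart, words)      # duplicate: no change
scanner(("NP", ("Det", "Nominal"), 0, 0, 0), chart, words)  # "book" is not a Det
print(chart[1])
```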
Completer
• Suppose the state to which Completer applies is: X → δ • [x,y]
• Completer adds, to the current chart entry, a new state for each possible reduction using the (now completed) state
• For each state in chart entry x (where the completed subtree begins) of the form Y → α • X β [w,x], a new state of the following form is added: Y → α X • β [w,y]

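The basic Completer, before the modification for parse-tree recovery, might be sketched like this. States are again tuples `(lhs, rhs, dot, x, y)`; the function name `completer` is hypothetical.

```python
def completer(state, chart):
    """For a completed state X → δ • [x,y], advance the dot over X in every
    state of chart[x] that is waiting for an X, adding the result to chart[y]."""
    lhs, rhs, dot, x, y = state
    for (l2, r2, d2, w, _end) in list(chart[x]):   # copy: chart[x] may grow
        if d2 < len(r2) and r2[d2] == lhs:         # dot sits just before X
            new_state = (l2, r2, d2 + 1, w, y)
            if new_state not in chart[y]:
                chart[y].append(new_state)


# chart[0] holds two states waiting for a Verb; chart[1] holds Verb → book •.
chart = [
    [("VP", ("Verb",), 0, 0, 0), ("VP", ("Verb", "NP"), 0, 0, 0)],
    [("Verb", ("book",), 1, 0, 1)],
]
completer(chart[1][0], chart)
for st in chart[1]:
    print(st)
```

Only chart[x] needs to be searched, because a state waiting for X at position x must have its dot at x and is therefore stored in that entry.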
Completer (modification)
• In order to recover parse tree information from the chart once parsing is complete, we need to modify the completer slightly.
• Each state in the chart must be given a unique identifier (N for state N).
• Each time the completer adds a state, it also adds the unique identifier of the state completed to the list of previous states for that new state (which is a copy of an already existing state, waiting for the category which the current state just completed).

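The modification can be sketched by giving each state an id and a list of previous states. The dictionary layout and the names `make_state` and `completer` are assumptions for illustration, and the duplicate check is left out to keep the example short.

```python
import itertools

_ids = itertools.count()   # source of unique state identifiers


def make_state(lhs, rhs, dot, start, end, previous=()):
    """Create a state with a fresh id and its list of previous states."""
    return {"id": next(_ids), "lhs": lhs, "rhs": rhs, "dot": dot,
            "start": start, "end": end, "previous": list(previous)}


def completer(done, chart):
    """Advance waiting states over done['lhs'], recording done's id."""
    for waiting in list(chart[done["start"]]):
        rhs, dot = waiting["rhs"], waiting["dot"]
        if dot < len(rhs) and rhs[dot] == done["lhs"]:
            chart[done["end"]].append(make_state(
                waiting["lhs"], rhs, dot + 1, waiting["start"], done["end"],
                # copy the waiting state's links and append the completed id
                previous=waiting["previous"] + [done["id"]]))


det = make_state("Det", ("that",), 1, 0, 1)                           # id 0
np = make_state("NP", ("Det", "Nominal"), 1, 0, 1, [det["id"]])       # id 1
nominal = make_state("Nominal", ("Noun",), 1, 1, 2)                   # id 2
chart = [[], [det, np], [nominal]]
completer(nominal, chart)
print(chart[2][-1])
```

The new NP state carries the ids of both the Det state and the Nominal state, in the same way that state 21 in the trace that follows carries [14, 19].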
chart[0] – initial state
Id   State                     Span   Previous states
0:   γ → • S                   [0,0]  []
This is a dummy start state.

chart[0] – after 0: γ → • S (Predictor)
Id   State                     Span   Previous states
0:   γ → • S                   [0,0]  []
1:   S → • NP VP               [0,0]  []
2:   S → • Aux NP VP           [0,0]  []
3:   S → • VP                  [0,0]  []

chart[0] – after 1: S → • NP VP (Predictor)
Id   State                     Span   Previous states
0:   γ → • S                   [0,0]  []
1:   S → • NP VP               [0,0]  []
2:   S → • Aux NP VP           [0,0]  []
3:   S → • VP                  [0,0]  []
4:   NP → • Det Nominal        [0,0]  []
5:   NP → • ProperNoun         [0,0]  []

chart[1] – after 2: S → • Aux NP VP (Scanner)
Since the input does not start with an auxiliary verb, the scanner does not add any state to chart[1], which therefore remains empty.

chart[0] – after 3: S → • VP (Predictor)
Id   State                     Span   Previous states
0:   γ → • S                   [0,0]  []
1:   S → • NP VP               [0,0]  []
2:   S → • Aux NP VP           [0,0]  []
3:   S → • VP                  [0,0]  []
4:   NP → • Det Nominal        [0,0]  []
5:   NP → • ProperNoun         [0,0]  []
6:   VP → • Verb               [0,0]  []
7:   VP → • Verb NP            [0,0]  []

chart[1] – after 4: NP → • Det Nominal (Scanner)
Since the input does not start with a determiner, the scanner does not add any state to chart[1], which therefore remains empty.

chart[1] – after 5: NP → • ProperNoun (Scanner)
Since the input does not start with a proper noun, the scanner does not add any state to chart[1], which therefore remains empty.

chart[1] – after 6: VP → • Verb (Scanner)
Id   State                     Span   Previous states
8:   Verb → book •             [0,1]  []

chart[1] – after 7: VP → • Verb NP (Scanner)
Id   State                     Span   Previous states
8:   Verb → book •             [0,1]  []
The state to be added is already in chart[1], so no change.

After finishing processing of chart[0]
chart[0]
0:   γ → • S                   [0,0]  []
1:   S → • NP VP               [0,0]  []
2:   S → • Aux NP VP           [0,0]  []
3:   S → • VP                  [0,0]  []
4:   NP → • Det Nominal        [0,0]  []
5:   NP → • ProperNoun         [0,0]  []
6:   VP → • Verb               [0,0]  []
7:   VP → • Verb NP            [0,0]  []
chart[1]
8:   Verb → book •             [0,1]  []

chart[1] – after 8: Verb → book • (Completer)
Id   State                     Span   Previous states
8:   Verb → book •             [0,1]  []
9:   VP → Verb •               [0,1]  [8]
10:  VP → Verb • NP            [0,1]  [8]
The completer advances the dot in those states already in the chart with annotation [0,0] that are waiting for a Verb (states 6 and 7), and adds the resulting states to chart[1].
More generally, for a completed state with annotation [j,k], the completer advances the dot in those states already in the chart with annotation [i,j] whose dot is immediately before the completed category.

chart[1] – after 9: VP → Verb • (Completer)
Id   State                     Span   Previous states
8:   Verb → book •             [0,1]  []
9:   VP → Verb •               [0,1]  [8]
10:  VP → Verb • NP            [0,1]  [8]
11:  S → VP •                  [0,1]  [9]

chart[1] – after 10: VP → Verb • NP (Predictor)
Id   State                     Span   Previous states
8:   Verb → book •             [0,1]  []
9:   VP → Verb •               [0,1]  [8]
10:  VP → Verb • NP            [0,1]  [8]
11:  S → VP •                  [0,1]  [9]
12:  NP → • Det Nominal        [1,1]  []
13:  NP → • ProperNoun         [1,1]  []

chart[1] – after 11: S → VP • (Completer)
Id   State                     Span   Previous states
8:   Verb → book •             [0,1]  []
9:   VP → Verb •               [0,1]  [8]
10:  VP → Verb • NP            [0,1]  [8]
11:  S → VP •                  [0,1]  [9]
12:  NP → • Det Nominal        [1,1]  []
13:  NP → • ProperNoun         [1,1]  []
The book does not process this state. I’m not sure why. However, if it were processed, it would clearly not indicate a successful parse, since it does not span the entire input.

chart[2] – after 12: NP → • Det Nominal (Scanner)
Id   State                     Span   Previous states
14:  Det → that •              [1,2]  []

chart[2] – after 13: NP → • ProperNoun (Scanner)
Id   State                     Span   Previous states
14:  Det → that •              [1,2]  []
Since the next word in the input (“that”) is not a proper noun, the scanner does not add any state to chart[2], which therefore remains the same.

After finishing processing of chart[1]
chart[1]
8:   Verb → book •             [0,1]  []
9:   VP → Verb •               [0,1]  [8]
10:  VP → Verb • NP            [0,1]  [8]
11:  S → VP •                  [0,1]  [9]
12:  NP → • Det Nominal        [1,1]  []
13:  NP → • ProperNoun         [1,1]  []
chart[2]
14:  Det → that •              [1,2]  []

chart[2] – after 14: Det → that • (Completer)
Id   State                     Span   Previous states
14:  Det → that •              [1,2]  []
15:  NP → Det • Nominal        [1,2]  [14]

chart[2] – after 15: NP → Det • Nominal (Predictor)
Id   State                     Span   Previous states
14:  Det → that •              [1,2]  []
15:  NP → Det • Nominal        [1,2]  [14]
16:  Nominal → • Noun          [2,2]  []
17:  Nominal → • Noun Nominal  [2,2]  []

chart[3] – after 16: Nominal → • Noun (Scanner)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []

chart[3] – after 17: Nominal → • Noun Nominal (Scanner)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
The state to be added is already in chart[3], so no change.

After finishing processing of chart[2]
chart[2]
14:  Det → that •              [1,2]  []
15:  NP → Det • Nominal        [1,2]  [14]
16:  Nominal → • Noun          [2,2]  []
17:  Nominal → • Noun Nominal  [2,2]  []
chart[3]
18:  Noun → flight •           [2,3]  []

chart[3] – after 18: Noun → flight • (Completer)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]

chart[3] – after 19: Nominal → Noun • (Completer)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]
21:  NP → Det Nominal •        [1,3]  [14, 19]

chart[3] – after 20: Nominal → Noun • Nominal (Predictor)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]
21:  NP → Det Nominal •        [1,3]  [14, 19]
22:  Nominal → • Noun          [3,3]  []
23:  Nominal → • Noun Nominal  [3,3]  []

chart[3] – after 21: NP → Det Nominal • (Completer)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]
21:  NP → Det Nominal •        [1,3]  [14, 19]
22:  Nominal → • Noun          [3,3]  []
23:  Nominal → • Noun Nominal  [3,3]  []
24:  VP → Verb NP •            [0,3]  [8, 21]

chart[3] – after 22: Nominal → • Noun (Scanner)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]
21:  NP → Det Nominal •        [1,3]  [14, 19]
22:  Nominal → • Noun          [3,3]  []
23:  Nominal → • Noun Nominal  [3,3]  []
24:  VP → Verb NP •            [0,3]  [8, 21]
Since there is no more input, no new states are added.

chart[3] – after 23: Nominal → • Noun Nominal (Scanner)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]
21:  NP → Det Nominal •        [1,3]  [14, 19]
22:  Nominal → • Noun          [3,3]  []
23:  Nominal → • Noun Nominal  [3,3]  []
24:  VP → Verb NP •            [0,3]  [8, 21]
Since there is no more input, no new states are added.

chart[3] – after 24: VP → Verb NP • (Completer)
Id   State                     Span   Previous states
18:  Noun → flight •           [2,3]  []
19:  Nominal → Noun •          [2,3]  [18]
20:  Nominal → Noun • Nominal  [2,3]  [18]
21:  NP → Det Nominal •        [1,3]  [14, 19]
22:  Nominal → • Noun          [3,3]  []
23:  Nominal → • Noun Nominal  [3,3]  []
24:  VP → Verb NP •            [0,3]  [8, 21]
25:  S → VP •                  [0,3]  [24]

We’re done!
• All states in chart[3] have been processed, no new states have been added to chart[4], and a state with LHS S spanning all the input is in chart[3]:
  25:  S → VP •                  [0,3]  [24]

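Putting the three operators together, a compact recognizer might look like the sketch below. It is not the slides' own implementation: the grammar and lexicon are assumed from the rules and words that appear in the trace above, states are tuples `(lhs, rhs, dot, x, y)` without ids or previous-state links, and names such as `earley_chart` and `recognizes` are hypothetical. Unlike the trace above, this sketch also processes completed S states, so it adds a completed dummy state to the chart.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
}
POS = {"Aux", "Det", "ProperNoun", "Verb", "Noun"}
LEXICON = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}  # assumed


def earley_chart(words):
    """Fill and return the chart for a list of words."""
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    add(("γ", ("S",), 0, 0, 0), 0)               # dummy start state
    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):                 # chart[k] grows as we go
            lhs, rhs, dot, x, y = chart[k][i]
            i += 1
            if dot == len(rhs):                  # Completer
                for (l2, r2, d2, w, _e) in list(chart[x]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, w, y), k)
            elif rhs[dot] in POS:                # Scanner
                if k < len(words) and rhs[dot] in LEXICON.get(words[k], set()):
                    add((rhs[dot], (words[k],), 1, k, k + 1), k + 1)
            else:                                # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, k, k), k)
    return chart


def recognizes(words):
    """True if a completed S state spans the whole input."""
    last = earley_chart(words)[len(words)]
    return any(lhs == "S" and dot == len(rhs) and x == 0
               for (lhs, rhs, dot, x, y) in last)


print(recognizes(["book", "that", "flight"]))  # True
print(recognizes(["that", "book"]))            # False
```

The success test mirrors the condition stated above: a state with LHS S, with its dot at the end, spanning positions 0 to N in the last chart entry.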
The basic idea is to trace back through the “previous state” links:
25:  S → VP •                  [0,3]  [24]
24:  VP → Verb NP •            [0,3]  [8, 21]
21:  NP → Det Nominal •        [1,3]  [14, 19]
19:  Nominal → Noun •          [2,3]  [18]
18:  Noun → flight •           [2,3]  []
14:  Det → that •              [1,2]  []
8:   Verb → book •             [0,1]  []
Recovering the tree
S
  VP
    Verb  book
    NP
      Det  that
      Nominal
        Noun  flight
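The trace-back can be sketched as a short recursive function over the completed states listed above. The table `STATES` simply transcribes those seven states (id, LHS, RHS, previous states); the function names `build_tree` and `bracketed` are hypothetical.

```python
# id: (lhs, rhs, previous states), transcribed from the completed states above.
STATES = {
    25: ("S", ("VP",), [24]),
    24: ("VP", ("Verb", "NP"), [8, 21]),
    21: ("NP", ("Det", "Nominal"), [14, 19]),
    19: ("Nominal", ("Noun",), [18]),
    18: ("Noun", ("flight",), []),
    14: ("Det", ("that",), []),
    8: ("Verb", ("book",), []),
}


def build_tree(state_id):
    """Follow the previous-state links to build a nested tuple tree."""
    lhs, rhs, previous = STATES[state_id]
    if not previous:                     # lexical state: rhs is the word
        return (lhs, rhs[0])
    return (lhs, *[build_tree(p) for p in previous])


def bracketed(tree):
    """Render a tree in bracket notation."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(bracketed(c) for c in children) + "]"


print(bracketed(build_tree(25)))
# [S [VP [Verb book] [NP [Det that] [Nominal [Noun flight]]]]]
```

Each previous-state id corresponds to one symbol on the right-hand side, in order, which is why the children come out left to right.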