170 likes | 279 Views
10. Parsing with Context-free Grammars -Speech and Language Processing-. 발표자 : 정영임 발표일 : 2007. 8. 7. 10.4 The Earley Algorithm. Earley Algorithm Dynamic programming Solution for those three parsing problems Information Represented by Chart: N+1 entries (N: Number of words) Dotted rule
E N D
10. Parsing with Context-free Grammars -Speech and Language Processing- 발표자: 정영임 발표일: 2007. 8. 7.
10.4 The Earley Algorithm • Earley Algorithm • Dynamic programming • Solution for those three parsing problems • Information Represented by • Chart: N+1 entries (N: Number of words) • Dotted rule e.g.) S → • VP [0,0]
10.4 The Earley Algorithm • Fig.10.16 The Earley algorighm
10.4 The Earley Algorithm • Predictor • To create new states representing top-down expectations • is applied to any state that has a non-terminal immediately to the right of its dot that is not a part-of-speech category • results in the creation of one new state for each alternative expansion of that non-terminal provided by the grammar • begins and ends at the point in the input where the generating state ends. • Example • S→• VP, [0,0] Predictor
10.4 The Earley Algorithm • Scanner • is called to examine the input and incorporate a state corresponding to the prediction of a word with a particular part-of-speech into the chart. • is accomplished by creating a new state from the input state with the dot advanced over the predicted input category. • Example • VP→• Verb NP, [0,0] • Scanner consults the current word in the input since the category following the dot is a part-of-speech. • It notes that book can be a verb, matching the expectation in the current state • This results in the creation of the new state VP → Verb• NP, [0,1]. • The new state is added to the chart entry that follows the one currently being processed
10.4 The Earley Algorithm • Completer • is applied to a state when its dot has reached the right end of the rule. • is to find, and advance, all previously created states that were looking for this grammatical category at this position in the input. • New states are then created by copying the older state, advancing the dot over the expected category, and installing the new state in the current chart entry. • Example • NP→ Det Nominal• [1,3] • Completer looks for states ending at 1 expecting an NP • VP→ Verb • NP, [0,1] • This results in the addition of a new complete state VP→ Verb NP •, [0,3]
10.4 The Earley Algorithm • Fig.10.17 An Example “Book that filght.”
10.4 The Earley Algorithm • Retrieving Parse Trees from a Chart • the version of the Earley algorithm is a recognizer not a parser. • valid sentences will leave the state S → α •, [0,N] in the chart. • Extraction of individual parses from the chart • the representation of each state must be augmented with an additional field to store information about the completed states that generated its constituents. • change necessary is to have COMPLETER add a pointer to the older state onto a list of constituent-states for the new state. • following pointers starting with the state (or states) representing a complete S in the final chart entry.
10.4 The Earley Algorithm • Retrieving Parse Trees from a Chart 8 14 18 19 18 21 14 19 22 8 21 23 22 10.18
10.4 The Earley Algorithm • Cost at tree retrieval process • if there are an exponential number of trees for a given sentence, the algorithm will require an exponential amount of time to return them all. • The Earley algorithm may fill the table in O(N3) time but it can’t magically return them as quickly.
10.5 Finite-State Parsing Methods • Efficient in a partial parse or shallow parse • Recognition of basic phrases(noun groups, verb groups, location, preposition and etc.) • Extraction of some sort of template in required data
10.5 Finite-State Parsing Methods • Finite-state rules for detecting noun groups(NG) • NG → Pronoun|Time-NP|Date-NP • NG → (DETP)(Adjs) HdNns|DETP Ving HdNns|DETP-CP (and HdNns) • DETP → DETP-CP|DETP-INCP • DETP-CP → ({Adv-pre-num|“another”|{Det|Pro-Poss}({Adv-pre-num|“only”(“other”)})})Number|Q|Q-er|(“the”)Q-est| “another”|Det-cp|DetQ|Pro-Poss-cp • DETP-INCP {{{Det|Pro-Poss}|“only”|“a”|“an”|Det-incomp|Pro-Poss-incomp}(“other”)|(DET-CP)“other”} • Adjs → AdjP({ “,”|(“,”) Conj}{AdjP|Vparticiple})* • AdjP → Ordinal|{(Q-er|Q-est}{Adj|Vparticiple}+|Number(“-”){“month”| “day” | “year”}(“-”) “old”}
10.5 Finite-State Parsing Methods • Finite-state rules for detecting noun groups(NG) (Ctnd’) • HdNns -> HdNn(“and” HdNn) • HdNn -> PropN|{PreNs|PropNPreNs}N[!Time-NP] |{PropN CommonN[!Time-NP]} • PreNs -> PreN(“and” PreN2)* • PreN -> (Adj”-”)Common-Sing-N • PreN2 -> PreN|Ordinal|Adj-noun-like
10.5 Finite-State Parsing Methods • Fig. 10.20-10.21
10.5 Finite-State Parsing Methods • Handling recursion of complete English grammar • Allowing only a limited amount of recursion • FASTUS does this by using its automata cascade • The second level of FASTUS finds non-recursive noun group • The third level combines these groups into larger NP-like units by • adding on measure phrases • 20,000 iron and “metal wood” clubs a month • Attaching preposition phrases • Production of 20,000 iron and “metal wood” … • Dealing with noun group conjunction • A local concern and a Japanese trading house => By splitting the parsing into two levels, NP on the left side is treated as a different kind of object from NP on the right side
10.5 Finite-State Parsing Methods • Handling recursion of complete English grammar • Chunk-based partial parsing via a set of finite-set cascades(Abney, 1996)
10.5 Finite-State Parsing Methods • Handling recursion of complete English grammar • Recursive Transition Network(RTN) • RTN is defined by a set of graphs like those in Fig.10.20 and Fig. 10.21 • Each arc contains a terminal or non-terminal node • Difference between RTN and FSA • In an RTN, whenever the machine comes to an arc labeled with a non-terminals, it treats that non-terminal as a subroutine • It places its current location onto a stack • It jumps to the non-terminal • Then it jumps back when that non-terminal has been parsed • RTN is exactly equivalent to a context-free grammar • A graphical way to view a simple top-down parser for context-free rules