The CYK Parsing Method Chiyo Hotani Tanya Petrova CL2 Parsing Course 28 November, 2007
Overview
• CYK Recognition with CF grammar
• Basic Algorithm
• Problems: unit-rules, ε-rules
• Recognition with a grammar in CNF
• CYK Parsing with CNF
• Parsing with CNF
• Recognition Table
• Chart Parsing
• Summary
• Advantages and Disadvantages
• Other remarks
Basic Algorithm of CYK Recognition (1)
Example Grammar: a grammar describing numbers in scientific notation
Input: 32.5e+1
Basic Algorithm of CYK Recognition (2)
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sign -> + | -
Derivations of substrings of length 1
Basic Algorithm of CYK Recognition (3)
NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Derivations of substrings of length 1
• Unit rule: a rule of the form A -> B, where A and B are non-terminals. A derivation can contain chains of them.
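The effect of unit-rule chains on a length-1 substring can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name `close_under_unit_rules` and the dictionary layout are assumptions, and only the unit rules shown on this slide are included.

```python
# Unit rules from the slide: left-hand side -> non-terminals it can rewrite to.
UNIT_RULES = {
    "NumberS": ["Integer", "Real"],
    "Integer": ["Digit"],
}

def close_under_unit_rules(found):
    """Add A whenever A -> B is a unit rule and B is already in the set.

    The loop repeats until a full pass adds nothing new, which is what
    makes chains such as NumberS -> Integer -> Digit show up.
    """
    result = set(found)
    changed = True
    while changed:
        changed = False
        for lhs, targets in UNIT_RULES.items():
            if lhs not in result and any(t in result for t in targets):
                result.add(lhs)
                changed = True
    return result

# The substring "3" is derived directly by Digit; the chain adds the rest.
print(sorted(close_under_unit_rules({"Digit"})))
```

Starting from Digit alone, the closure also picks up Integer and NumberS, so a single digit is already recognized as a number.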
Basic Algorithm of CYK Recognition (4)
NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Fraction -> . Integer
Scale -> e Sign Integer | Empty
Basic Algorithm of CYK Recognition (5)
NumberS -> Integer | Real
Real -> Integer Fraction Scale
Number does indeed derive 32.5e+1.
Basic Algorithm of CYK Recognition (7)
• $R_\varepsilon$ = { Empty, Scale } (the non-terminals that derive ε)
• sentence: $z = z_1 z_2 \ldots z_n$
• $s_{i,l} = z_i z_{i+1} \ldots z_{i+l-1}$: the substring of $z$ starting at position $i$, of length $l$
• $R_{s_{i,l}}$: the set of non-terminals deriving the substring $s_{i,l}$
[Figure: a graphical presentation of substrings]
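The two definitions above can be made concrete with a short sketch. It assumes the example grammar as shown on the earlier slides plus a rule Empty -> ε, which the slides imply but do not spell out; the names `nullable` and `substring` are hypothetical helpers, not part of the presentation.

```python
# Rules as (lhs, rhs) pairs; an empty tuple stands for an ε right-hand side.
# Digit and Sign rules are left out because they can never derive ε.
RULES = [
    ("NumberS", ("Integer",)), ("NumberS", ("Real",)),
    ("Integer", ("Digit",)), ("Integer", ("Integer", "Digit")),
    ("Real", ("Integer", "Fraction", "Scale")),
    ("Fraction", (".", "Integer")),
    ("Scale", ("e", "Sign", "Integer")), ("Scale", ("Empty",)),
    ("Empty", ()),  # assumption: Empty -> ε is implied, not shown on the slides
]

def nullable(rules):
    """Compute R_ε: non-terminals whose right-hand side can vanish entirely."""
    r_eps = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in r_eps and all(sym in r_eps for sym in rhs):
                r_eps.add(lhs)
                changed = True
    return r_eps

def substring(z, i, l):
    """s_{i,l}: the substring of z starting at 1-based position i, of length l."""
    return z[i - 1:i - 1 + l]

print(sorted(nullable(RULES)))
print(substring("32.5e+1", 2, 4))
```

Under these assumptions the computed set matches the slide's $R_\varepsilon$ = { Empty, Scale }.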
CYK recognition with a grammar in CNF
• Required restrictions:
  • Eliminate ε-rules and unit rules
  • Limit the maximum length of the RHS of a rule to 2
• CNF
  • No ε-rules and no unit rules
  • All rules have one of the following two forms: A -> a, A -> B C
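The two allowed rule forms can be checked mechanically. The sketch below is illustrative only: `is_cnf` is a hypothetical helper, and rules are represented as (lhs, rhs) pairs with the right-hand side as a tuple of symbols.

```python
def is_cnf(rules, nonterminals):
    """Return True if every rule has the form A -> a or A -> B C."""
    for lhs, rhs in rules:
        # A -> a: exactly one symbol, and it is a terminal.
        terminal_rule = len(rhs) == 1 and rhs[0] not in nonterminals
        # A -> B C: exactly two symbols, both non-terminals.
        binary_rule = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        if not (terminal_rule or binary_rule):
            return False
    return True

NONTERMINALS = {"Integer", "Digit", "Real", "Fraction", "Scale", "Empty"}

print(is_cnf([("Integer", ("Integer", "Digit")), ("Digit", ("0",))], NONTERMINALS))
print(is_cnf([("Integer", ("Digit",))], NONTERMINALS))  # a unit rule
```

A unit rule, an ε-rule, and a right-hand side of length 3 are all rejected by the same check.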
CYK Parsing with CNF
• Building the recognition table
• Input: our example grammar in CNF; input sentence: 32.5e+1
CYK Parsing with the CNF
• Bottom row: read directly from the grammar (rules of the form A -> a)
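Filling the bottom row might look like the sketch below. The CNF grammar itself is not reproduced in this text, so the terminal rules here are an assumed reconstruction (digits derived by Number, Integer and Digit; helper non-terminals T1 and T2 for "." and "e"); `bottom_row` is a hypothetical name.

```python
DIGITS = "0123456789"

# terminal a -> set of non-terminals A with a rule A -> a
# (assumed reconstruction of the CNF grammar's terminal rules)
TERMINAL_RULES = {d: {"Number", "Integer", "Digit"} for d in DIGITS}
TERMINAL_RULES.update({
    ".": {"T1"},
    "e": {"T2"},
    "+": {"Sign"},
    "-": {"Sign"},
})

def bottom_row(sentence):
    """One set per input symbol: the non-terminals deriving that symbol."""
    return [set(TERMINAL_RULES.get(ch, set())) for ch in sentence]

for ch, cell in zip("32.5e+1", bottom_row("32.5e+1")):
    print(ch, sorted(cell))
```

No search is needed for this row: each cell is a direct lookup on the input symbol.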
Two Ways to Compute $R_{s_{i,l}}$:
• check each right-hand side
• compute possible right-hand sides from the recognition table
How this is done
Example: 2.5e ( = $s_{2,4}$)
1) Check each right-hand side, e.g. N1 Scale':
   • N1 is not in $R_{s_{2,1}}$ or $R_{s_{2,2}}$
   • N1 is a member of $R_{s_{2,3}}$
   • But Scale' is not a member of $R_{s_{5,1}}$
2) $R_{s_{2,4}}$ is the set of non-terminals that have a right-hand side A B where either:
   • A in $R_{s_{2,1}}$ and B in $R_{s_{3,3}}$
   • A in $R_{s_{2,2}}$ and B in $R_{s_{4,2}}$
   • A in $R_{s_{2,3}}$ and B in $R_{s_{5,1}}$
   Possible combinations: N1 T2 or Number T2
   In our grammar we do not have such a right-hand side, so nothing is added to $R_{s_{2,4}}$.
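The second way (combining entries already in the recognition table) can be sketched as a complete recognizer. The CNF grammar is not reproduced in this text, so the rules below are an assumed reconstruction chosen to agree with the non-terminals named in the example (N1, T2, Scale', Number); `recognition_table` is a hypothetical name.

```python
from itertools import product

DIGITS = "0123456789"
# Assumed CNF grammar. Terminal rules: a -> {A | A -> a}
TERMINAL_RULES = {d: {"Number", "Integer", "Digit"} for d in DIGITS}
TERMINAL_RULES.update({".": {"T1"}, "e": {"T2"}, "+": {"Sign"}, "-": {"Sign"}})

# Binary rules: (B, C) -> {A | A -> B C}
BINARY_RULES = {
    ("Integer", "Digit"): {"Number", "Integer"},
    ("Integer", "Fraction"): {"Number", "N1"},
    ("N1", "Scale'"): {"Number"},
    ("T1", "Integer"): {"Fraction"},
    ("N2", "Integer"): {"Scale'"},
    ("T2", "Sign"): {"N2"},
}

def recognition_table(z):
    """table[i, l] holds R_{s_{i,l}} for 1-based start i and length l."""
    n = len(z)
    table = {}
    for i in range(1, n + 1):
        table[i, 1] = set(TERMINAL_RULES.get(z[i - 1], set()))
    for l in range(2, n + 1):              # shorter substrings first
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):          # split into s_{i,k} and s_{i+k,l-k}
                for pair in product(table[i, k], table[i + k, l - k]):
                    cell |= BINARY_RULES.get(pair, set())
            table[i, l] = cell
    return table

table = recognition_table("32.5e+1")
print(sorted(table[2, 3]), sorted(table[5, 1]))  # the cells behind "N1 T2 or Number T2"
print(table[2, 4])
print("Number" in table[1, 7])
```

With this grammar the only non-empty split of 2.5e pairs { Number, N1 } with { T2 }, no rule has such a right-hand side, and the cell for $s_{2,4}$ stays empty, as in the example.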
As a result we find out that: • This process is much less complicated than the one we saw before
Reasons
• We do not have to repeat the process again and again until no new non-terminals are added to $R_{s_{i,l}}$ (the substrings we are dealing with are proper substrings and cannot be equal to the string we start with)
• We only have to find one place where the substring must be split into two
[Figure: A -> B C, with the split point marked "Here!"]
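Finding that one split point is also what turns the recognition table into a parser. The sketch below is an illustration under assumptions: the CNF grammar is an assumed reconstruction (it is not reproduced in this text), and `build_table` and `derive` are hypothetical names. It returns the first derivation it finds, not all of them.

```python
DIGITS = "0123456789"
# Assumed CNF grammar: terminal rules a -> {A}, binary rules as (A, B, C).
TERMINAL_RULES = {d: {"Number", "Integer", "Digit"} for d in DIGITS}
TERMINAL_RULES.update({".": {"T1"}, "e": {"T2"}, "+": {"Sign"}, "-": {"Sign"}})
BINARY_RULES = [
    ("Number", "Integer", "Digit"), ("Number", "Integer", "Fraction"),
    ("Number", "N1", "Scale'"), ("Integer", "Integer", "Digit"),
    ("N1", "Integer", "Fraction"), ("Fraction", "T1", "Integer"),
    ("Scale'", "N2", "Integer"), ("N2", "T2", "Sign"),
]

def build_table(z):
    """table[i, l] = non-terminals deriving the substring at 1-based i, length l."""
    n = len(z)
    table = {(i, 1): set(TERMINAL_RULES.get(z[i - 1], set())) for i in range(1, n + 1)}
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            table[i, l] = {
                a for a, b, c in BINARY_RULES for k in range(1, l)
                if b in table[i, k] and c in table[i + k, l - k]
            }
    return table

def derive(table, z, a, i, l):
    """Return one derivation of s_{i,l} from a as nested tuples, or None."""
    if a not in table[i, l]:
        return None
    if l == 1:
        return (a, z[i - 1])
    for lhs, b, c in BINARY_RULES:
        if lhs != a:
            continue
        for k in range(1, l):  # look for one place to split the substring
            if b in table[i, k] and c in table[i + k, l - k]:
                return (a, derive(table, z, b, i, k), derive(table, z, c, i + k, l - k))
    return None

z = "32.5e+1"
print(derive(build_table(z), z, "Number", 1, len(z)))
```

Because both halves of a split are strictly shorter than the substring being derived, the recursion always terminates without any repeat-until-stable loop.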
Chart Parsing A chart is just a recognition table.
A short retrospective of CYK • First: recognition table using the original grammar. • Then: transforming grammar to CNF.
A short retrospective of CYK cont. • CNF is useful for improving the efficiency, but it is actually a bit too restrictive • Disadvantage of CNF: • Resulting recognition table lacks the information we need to construct a derivation using the original grammar!
A short retrospective of CYK cont. • In the transformation process, some non-terminals were thrown away (non-productive) • Missing information could be added.
A short retrospective of CYK cont. • Result: almost the same recognition table. • Extra information on non-terminals • Obtained in a simpler and much more efficient way.
Thank you for your attention!