1 / 14

The Cocke-Kasami-Younger Algorithm

The Cocke-Kasami-Younger Algorithm. An example of a CFG in CNF. An example of bottom-up parsing, for CFG in Chomsky normal form. G : S  AB | BB A  CC | AB | a B  BB | CA | b C  BA | AA | b. 2 possibilities for first production. S. S. S. B. B. A. A. B. A.

lael
Download Presentation

The Cocke-Kasami-Younger Algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Cocke-Kasami-Younger Algorithm An example of a CFG in CNF An example of bottom-up parsing, for CFG in Chomsky normal form G : S  AB | BB A CC | AB | a B  BB | CA | b C  BA | AA | b 2 possibilities forfirst production S S S B B A A B A aa bb a abb aab b S S S Possible splits for the string aabb B B B B B B aa bb a abb aab b

  2. The CKYounger Algorithm 4,1 3,1 3,2 2,1 2,2 2,3 1,1 1,2 1,3 1,4 a a b b Provides an efficient way of generating substring devisions and checking whether each substring can be legally derived Thus if the cell (4,1) contains S, string  L(G) A non terminal will be placed in the cell (i,j) if it can derive i consecutive symbolsof the string starting at jth position If the cell (i,j) contains the nonterminal A1 and the cell (i’,i+j) contains the nonterminal A2 and there is a production A A1 A2 then the cell (i+i’,j) will contain the nonterminal A

  3. The CKYounger Algorithm 4,1 3,1 3,2 2,1 2,2 2,3 1,1 1,2 1,3 1,4 a a b b Provides an efficient way of generating substring devisions and checking whether each substring can be legally derived G : S  AB | BB A  CC | AB | a B  BB | CA | b C  BA | AA | b A nonterminal will be placed in the cell (i,j) if it can derive i consecutive symbolsof the string starting at jth position

  4. The Cocke-Kasami-Younger Algorithm S S S A B A B B A a a a a a a a a a a a a Relation derivation tree and pyramid S S S B A B A B A aa bb aab b a abb

  5. S S S B B B B B B a a a a a a a a a a a a S S S B B B B B B aa bb a abb aab b

  6. The Cocke-Kasami-Younger Algorithm A B,C A B,C a a a a Builds up the pyramid in a bottom-up fashion G : S  AB | BB A  CC | AB | a B  BB | CA | b C  BA | AA | b Step 1, fill the cell at row 1 Because of A  a Because of B  b, and C  b

  7. The Cocke-Kasami-Younger Algorithm C A S,B A B,C A B,C a a a a Builds up the pyramid in a bottom-up fashion G : S  AB | BB A  CC | AB | a B  BB | CA | b C  BA | AA | b B is in cell (2,3) Because of B  BB and B is in cell (1,3) and B is in cell (1,4) Step 2, fill the cell at row 2 C is in cell (2,1) Because of C  AA and A is in cell (1,1) and A is in cell (1,2) A is in cell (2,2) Because of A  AB and A is in cell (1,2) and B is in cell (1,3) S is in cell (2,3) Because of S  BB and B is in cell (1,3) and B is in cell (1,4)

  8. The Cocke-Kasami-Younger Algorithm A,C S,A,C C A S,B A B,C A B,C a a a a Builds up the pyramid in a bottom-up fashion G : S  AB | BB A  CC | AB | a B  BB | CA | b C  BA | AA | b C is in cell (3,1) Because of C  AA and A is in cell (1,1) and A is in cell (2,2) Step 3, fill the cell at row 3 ? is in cell (3,1) Because of ?  XY X is in cell (1,1) Y is in cell (2,2) or X is in cell (2,1) Y is in cell (1,3) or A is in cell (3,1) Because of C  CC and C is in cell (2,1) and C is in cell (1,3)

  9. The Cocke-Kasami-Younger Algorithm A,B,C,S A,C S,A,C C A S,B A B,C A B,C a a a a Builds up the pyramid in a bottom-up fashion G : S  AB | BB A  CC | AB | a B  BB | CA | b C  BA | AA | b Since S is at the top, aabb  L(G) Step 4, fill the cell at row 4 S General rule ? is in cell (i,j) Because of ?  XY X is in cell (m,j) Y is in cell (i-m,j+m) with 1 ≤ m ≤ i-1 Step i B A b C C A A b a a

  10. Theorem The CKY algorithm is correct Given a grammar (T, N, P, S) in Chomsky normal form and w = x1...xn T* then A  N is in cell (i,j) of the CKY pyramid if and only if A xj...xj+i-1 Proof by induction on the row number Base step i= 1 in row 1 we get the nonterminals from which length 1 substrings of the string to parse can be derived. This is only possible by using productions of type A  a. Thus if A is in cell (1,i), 1 ≤ i ≤ n, then A xi  P, thus A xi Induction hypothesis theorem applies for all rows < i, i.e. all substrings of length < i. * *

  11. Induction step we first prove  Assume a derivation of a substring of length i, i>1, ABC xj...xj+i-1, then for some m > 0there must hold that B xj...xj+m-1 andC xj+m...xj+i-1. Thus by the induction hypothesis if B is in cell (m,j) and C in the cell (i-m, j+m). Since there is a production A BC, A is in the cell (i,j). We now prove  Assume A is in the cell (i,j), then form A we can derive a string xj...xj+i-1, with length i > 1, therefore there must be a production of the form A BC with B,C  N, and for some m, 1 ≤ m ≤ i-1, B is in cell (m,j) and C in the cell (i-m, j+m). By the induction hypothesis we have B xj...xj+m-1 and C xj+m...xj+i-1. Therefore we can write ABC xj...xj+i-1 and conclude Axj...xj+i-1 * * * * * Both cells have a lower row #, so induction hypothesis applies

  12. The complexity of the CKY algorithm The time complexity for wL(G)? • Let G = (T, N, P, S) be a CFG in Chomsky normal form, with k = #N. • Then using the CKY algorithm, w L(G) can be decided in time proportional to n3 , • where n = |w|. • Proof • First notice that • the number of entries in a cell is at most k. • maximum number of productions is k3, • I Complexity for row 1 cells • For each A  N, we have to check if it can be placed in cell(1,i), i.e. if A derives (in 1 step) • the terminal on position i. There are k nonterminals, thus cost per cell is k X 1. • There are n row 1 cells, thus total cost for row 1 = kn. Each nonterminal can only occur once in a cell A  BC Cfr. 3

  13. II Complexity for cell in a row > 1 The content of a cell is the result of at most n-1 pairings of lower cells. For each paring at most k nonterminals are paired with at most k other nonterminals, and each pairing is checked against at most k3 productions. Thus for each cell : cost ≤ k X k X k3 X1X(n-1) =k5 X(n-1) There are (n-1)+ (n-2) + …. + 1 = n(n-1)/2 cells in rows 2 to n, thus total cost for these rows is bounded above by n(n-1)/2 Xk5 X(n-1) To conclude : The total cost is bounded above by : kn + n(n-1)/2 Xk5 X(n-1) See slide 119 Cfr. 1 and 2 Since k is independent of n the conclusion is O(n3)

  14. Some remarks • Not really of practical use since • O(n3) is too slow • the grammar must be converted to CNF • only tests membership, this is not the complexity for building the derivation tree See course on compilers for faster algorithms Semantics!!!! To think about : CKY and unambiguous grammars.

More Related