CS 461 – Oct. 10
• Review PL grammar as needed
• How to tell if a word is in a CFL?
• Convert to PDA and run it.
• CYK algorithm
• Modern parsing techniques
Accepting input
• How can we tell if a given source file (input stream of tokens) is a valid program? The language is defined by a CFG, so …
• Can we see if there is some derivation from the grammar?
• Can we convert the CFG to a PDA?
• Exponential performance is not acceptable (e.g. the running time doubling every time we add a token).
• Two improvements:
  • CYK algorithm, runs in O(n³)
  • Bottom-up parsing, generally linear, but with restrictions on the grammar.
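To make the O(n³) figure concrete, here is a rough sketch that counts how many (substring, split point) pairs a CYK-style table fill looks at for a word of length n, next to 2ⁿ. The name `cyk_split_count` is made up for this illustration, and the count ignores the extra work of scanning the grammar rules at each split.

```python
def cyk_split_count(n: int) -> int:
    """Count the (substring, split) pairs examined for a word of length n."""
    total = 0
    for length in range(2, n + 1):      # substrings of length 2..n
        substrings = n - length + 1     # how many substrings have this length
        cuts = length - 1               # ways to cut one of them into 2 pieces
        total += substrings * cuts
    return total


# Compare cubic growth with "doubling every time we add a token".
for n in (3, 4, 10, 20, 40):
    print(n, cyk_split_count(n), 2 ** n)
```

For n = 3 and n = 4 the counts are 4 and 10, which is the number of cases worked through by hand in the two examples in these slides.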
CYK algorithm
• Discovered independently in 1965-67 by Cocke, Younger, and Kasami.
• Given any CFG and any string, it can tell whether the grammar generates the string.
• The grammar needs to be in CNF (Chomsky normal form) first.
• This ensures that the rules are simple: every rule has the form X → a or X → YZ.
• Consider all substrings of length 1 first and see which variables generate them. Next try all substrings of length 2, length 3, …, up to length n.
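The procedure described above can be sketched as a short recognizer. This is only one possible way to write it, under some assumptions that are not from the slides: the function name `cyk_accepts`, the dictionary representation of the grammar, and the restriction that every variable and terminal is a single character. The grammar used to try it out is the 3⁺7⁺ grammar from the example in these slides.

```python
from itertools import product


def cyk_accepts(rules, start, word):
    """Return True if the CNF grammar `rules` generates `word` from `start`.

    `rules` maps each variable to a list of right-hand sides; each one is
    either a single terminal ("a") or two variables ("YZ").
    Assumption: variables and terminals are single characters.
    """
    n = len(word)
    if n == 0:
        return False  # rules of the form X -> a or X -> YZ never yield the empty word
    # table[(i, j)] = variables that generate word[i..j] (0-based, inclusive)
    table = {}
    for i, symbol in enumerate(word):
        table[(i, i)] = {x for x, rhss in rules.items() if symbol in rhss}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into word[i..k] and word[k+1..j]
                for y, z in product(table[(i, k)], table[(k + 1, j)]):
                    cell |= {x for x, rhss in rules.items() if y + z in rhss}
            table[(i, j)] = cell
    return start in table[(0, n - 1)]


# Grammar for 3+7+ taken from the example in these slides.
rules = {"S": ["AB"], "A": ["3", "AC"], "B": ["7", "BD"], "C": ["3"], "D": ["7"]}
print(cyk_accepts(rules, "S", "337"))  # True
print(cyk_accepts(rules, "S", "373"))  # False
```

A dictionary keyed by (i, j) is used instead of an n×n array purely to keep the sketch short; only the cells with i ≤ j are ever created.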
continued
• Maintain results in an n×n table. Top right portion not used.
• Example on right is for testing a word of length 3.
• Start at the bottom; work your way up.
• For length 1, just look for the “unit rules” in the grammar, i.e. rules with a single terminal on the right-hand side, e.g. X → a.
continued
• For the general case i..j:
• Think of all possible ways this substring can be broken into 2 pieces.
• Ex. 1..3 = 1..2 + 3..3 or 1..1 + 2..3
• We want to know if both pieces can be generated (the first from some variable B, the second from some variable C). This handles rules of the form A → BC.
• Let’s try an example from 3⁺7⁺ (grammar in CNF).
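The "all possible ways to break i..j into 2 pieces" step is easy to enumerate mechanically. A minimal sketch, using the same 1-based, inclusive positions as the slides; the helper name `splits` is hypothetical.

```python
def splits(i, j):
    """List every way to cut positions i..j (1-based, inclusive) into two pieces."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]


for left, right in splits(1, 3):
    print(f"{left[0]}..{left[1]} + {right[0]}..{right[1]}")
# prints:
# 1..1 + 2..3
# 1..2 + 3..3
```

A substring of length m has m - 1 split points, so a single position (i = j) has none.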
337 ∈ 3⁺7⁺ ?
S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7
For each length-1 substring, which variables generate it?
1..1 is 3. Rules A and C.
2..2 is 3. Rules A and C.
3..3 is 7. Rules B and D.
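The length-1 step amounts to a lookup of the rules with a single terminal on the right. A small sketch, assuming the grammar is stored as a dictionary of right-hand sides; the helper name `generators` is made up for this example.

```python
# Grammar for 3+7+ from the slide: variable -> list of right-hand sides.
rules = {"S": ["AB"], "A": ["3", "AC"], "B": ["7", "BD"], "C": ["3"], "D": ["7"]}


def generators(symbol, rules):
    """Variables X that have a rule X -> symbol."""
    return {x for x, rhss in rules.items() if symbol in rhss}


bottom_row = [generators(s, rules) for s in "337"]
for pos, cell in enumerate(bottom_row, start=1):
    print(f"{pos}..{pos}: {sorted(cell)}")
# prints:
# 1..1: ['A', 'C']
# 2..2: ['A', 'C']
# 3..3: ['B', 'D']
```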
337 ∈ 3⁺7⁺ ?
S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7
Length 2:
1..2 = 1..1 + 2..2 = (A or C)(A or C) = rule A
2..3 = 2..2 + 3..3 = (A or C)(B or D) = rule S
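Combining two cells, as in "(A or C)(A or C) = rule A", means forming every pair YZ and looking for a rule X → YZ. One way to sketch that step, with a hypothetical helper `combine` and the same dictionary form of the grammar assumed:

```python
from itertools import product

# Grammar for 3+7+ from the slide: variable -> list of right-hand sides.
rules = {"S": ["AB"], "A": ["3", "AC"], "B": ["7", "BD"], "C": ["3"], "D": ["7"]}


def combine(left, right, rules):
    """Variables X with a rule X -> YZ where Y is in `left` and Z is in `right`."""
    pairs = {y + z for y, z in product(left, right)}
    return {x for x, rhss in rules.items() if pairs & set(rhss)}


print(combine({"A", "C"}, {"A", "C"}, rules))  # 1..2 -> {'A'}, via A -> AC
print(combine({"A", "C"}, {"B", "D"}, rules))  # 2..3 -> {'S'}, via S -> AB
```

If either cell is empty there are no pairs at all, so the result for that split is empty as well.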
337 ∈ 3⁺7⁺ ?
S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7
Length 3: 2 cases for 1..3:
1..2 + 3..3: (A)(B or D) = S
1..1 + 2..3: (A or C)(S) no!
We only need one case to work. The start variable S generates 1..3, so 337 is accepted.
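The whole table for 337 can be reproduced with a few lines. This is a sketch rather than a reference implementation: it stores one row per substring length, so the unused top-right part of the n×n table never exists, and the names `heads` and `rows` are invented for the example.

```python
from itertools import product

# Grammar for 3+7+ from the slides: variable -> list of right-hand sides.
rules = {"S": ["AB"], "A": ["3", "AC"], "B": ["7", "BD"], "C": ["3"], "D": ["7"]}
word = "337"
n = len(word)


def heads(left, right):
    """Variables X with a rule X -> YZ, Y in `left`, Z in `right`."""
    pairs = {y + z for y, z in product(left, right)}
    return {x for x, rhss in rules.items() if pairs & set(rhss)}


# rows[length - 1][start] = variables generating the substring of that
# length beginning at 0-based position `start`.
rows = [[{x for x, rhss in rules.items() if s in rhss} for s in word]]
for length in range(2, n + 1):
    row = []
    for start in range(n - length + 1):
        cell = set()
        for cut in range(1, length):  # left piece has `cut` symbols
            left = rows[cut - 1][start]
            right = rows[length - cut - 1][start + cut]
            cell |= heads(left, right)
        row.append(cell)
    rows.append(row)

# Print the longest substrings first so that length 1 ends up at the bottom.
for length in range(n, 0, -1):
    print(length, ["".join(sorted(c)) or "Ø" for c in rows[length - 1]])
print("accepted:", "S" in rows[n - 1][0])
# prints:
# 3 ['S']
# 2 ['A', 'S']
# 1 ['AC', 'AC', 'BD']
# accepted: True
```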
CYK example #2
Let’s test the word baab.
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Length 1:
‘a’ generated by A, C
‘b’ generated by B
baab
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Length 2:
1..2 = 1..1 + 2..2 = (B)(A,C) = S,A
2..3 = 2..2 + 3..3 = (A,C)(A,C) = B
3..4 = 3..3 + 4..4 = (A,C)(B) = S,C
baab
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Length 3: [ each has 2 chances! ]
1..3 = 1..2 + 3..3 = (S,A)(A,C) = Ø
1..3 = 1..1 + 2..3 = (B)(B) = Ø
2..4 = 2..3 + 4..4 = (B)(B) = Ø
2..4 = 2..2 + 3..4 = (A,C)(S,C) = B
Finally…
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
Length 4 [has 3 chances!]
1..4 = 1..3 + 4..4 = (Ø)(B) = Ø
1..4 = 1..2 + 3..4 = (S,A)(S,C) = Ø
1..4 = 1..1 + 2..4 = (B)(B) = Ø
Ø means we lose! baab ∉ L.
However, in general don’t give up if you encounter Ø in the middle of the process.
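The hand computation for baab can be checked with a short sketch that fills the table using the same 1-based i..j positions as the slides. The function name `cyk_table` is an assumption for illustration. The second word, baaba, is not from the slides; it is added here only to illustrate the last remark, since its table contains the same Ø cells as baab and yet the word is accepted.

```python
from itertools import product

# Grammar from example #2: variable -> list of right-hand sides.
rules = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}


def cyk_table(word):
    """Fill table[(i, j)] for 1-based, inclusive positions i..j of `word`."""
    n = len(word)
    table = {(i, i): {x for x, r in rules.items() if word[i - 1] in r}
             for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # pieces i..k and k+1..j
                pairs = {y + z for y, z in product(table[(i, k)], table[(k + 1, j)])}
                cell |= {x for x, r in rules.items() if pairs & set(r)}
            table[(i, j)] = cell
    return table


for word in ("baab", "baaba"):
    t = cyk_table(word)
    empty = sorted(span for span, cell in t.items() if not cell)
    print(word, "accepted:", "S" in t[(1, len(word))], "empty cells:", empty)
# prints:
# baab accepted: False empty cells: [(1, 3), (1, 4)]
# baaba accepted: True empty cells: [(1, 3), (1, 4)]
```

Only the top cell, 1..n, decides acceptance; an empty cell lower down just means that particular substring cannot be used as a piece.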