CSCI 3130: Formal Languages and Automata Theory, Tutorial 5. Hung Chun Ho, Office: SHB 1026, Department of Computer Science & Engineering. Agenda: Cocke-Younger-Kasami (CYK) algorithm (parsing CFG in normal form); Pushdown Automata (PDA) design.
CSCI 3130: Formal Languages and Automata Theory
Tutorial 5
Hung Chun Ho
Office: SHB 1026
Department of Computer Science & Engineering
Agenda
• Cocke-Younger-Kasami (CYK) algorithm
  • Parsing CFG in normal form
• Pushdown Automata (PDA)
  • Design
CYK Algorithm
Bottom-up parsing for grammars in normal form
Cocke-Younger-Kasami Algorithm
• Used to parse a context-free grammar in Chomsky normal form (or simply normal form)
Normal Form
• Every production is of one of these types:
  • X → YZ
  • X → a
  • S → ε
Example
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c
CYK Algorithm - Idea
• = Algorithm 2 in Lecture Note (10L8.pdf)
• Idea: Bottom-up parsing
• Algorithm: Given a string s of length N
    For k = 1 to N
      For every substring of length k
        Determine what variable(s) can derive it
CYK Algorithm - Example
• CFG:
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c
• Parse abbc
CYK Algorithm – Idea (1)
• Idea: We parse the substrings of abbc in this order:
  • Length-1 substrings: a, b, b, c
  • Length-2 substrings: ab, bb, bc
  • Length-3 substrings: abb, bbc
  • Length-4 substring: abbc
  • Done!
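The visiting order above can be sketched in a few lines of Python. This is only an illustration: the function name `substrings_by_length` is an assumption made here and does not come from the lecture notes. Start indices are 1-based to match the slides.

```python
def substrings_by_length(s):
    """Yield (length, start, substring), shortest substrings first.

    `start` is 1-based, as on the slides (hypothetical helper).
    """
    n = len(s)
    for length in range(1, n + 1):
        for start in range(1, n - length + 2):
            yield length, start, s[start - 1:start - 1 + length]


order = [sub for _, _, sub in substrings_by_length("abbc")]
print(order)
```

For abbc this visits 4 + 3 + 2 + 1 = 10 substrings, ending with the whole string.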
CYK Algorithm – Idea (2)
• Idea: Parsing of longer substrings depends on parsing of shorter substrings
• Example: abb may be decomposed as
  • ab + b
  • a + bb
• If we know how to parse ab and b (or a and bb), then we know how to parse abb
CYK Algorithm – Substring
• Denote sub(i, j) := the substring with start index i and end index j
• Example: For abbc, sub(2,4) = bbc
• This notation is not meant to complicate things; it is just for convenience in the following discussion
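As a small sketch of this notation, the helper below implements sub(i, j) with 1-based, inclusive indices, and a second helper lists every way to cut sub(i, j) into two non-empty parts. Both names (`sub`, `splits`) are chosen here for illustration.

```python
def sub(s, i, j):
    """Substring of s from index i to index j (1-based, inclusive)."""
    return s[i - 1:j]


def splits(s, i, j):
    """All decompositions sub(i, k) + sub(k+1, j) for i <= k < j."""
    return [(sub(s, i, k), sub(s, k + 1, j)) for k in range(i, j)]


print(sub("abbc", 2, 4))
print(splits("abbc", 1, 3))
```

A substring of length m has m - 1 such decompositions, which is why length-1 substrings need a separate base case.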
CYK Algorithm – Table
• Each cell corresponds to a substring
• Each cell stores the variables deriving that substring
• Rows are indexed by the length of the substring, columns by its start index (over a b b c)
• Example: the cell for the substring of length 3 starting at index 2 corresponds to sub(2,4) = bbc
CYK Algorithm – Simulation
• Grammar: S → AB; A → CC | a | c; B → BC | b; C → CB | BA | c
• Base case: length = 1
• The possible variable(s) for each symbol can be found by scanning through each production
• Table, length 1: a: {A}, b: {B}, b: {B}, c: {A, C}
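The base case can be sketched as a scan over the productions for each input symbol. The dictionary encoding of the grammar and the name `base_row` are assumptions made for this example.

```python
# Example grammar from the slides, encoded as variable -> list of bodies.
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}


def base_row(word):
    """For each symbol, collect the variables X with a rule X -> symbol."""
    return [{var for var, bodies in GRAMMAR.items() if ch in bodies}
            for ch in word]


row = base_row("abbc")
print([sorted(cell) for cell in row])
```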
CYK Algorithm – Simulation
• Loop: length = 2
• For each substring of length 2
  • Decompose it into shorter substrings
  • Check the cells below it
• Let's parse the substring ab first
CYK Algorithm – Simulation
• For sub(1,2) = ab, it can be decomposed:
  • ab = a + b = sub(1,1) + sub(2,2)
  • Possible choices: AB
  • Scan rules: S → AB matches, so the cell for ab is {S}
• For sub(2,3) = bb, it can be decomposed:
  • bb = b + b = sub(2,2) + sub(3,3)
  • Possible choices: BB
  • Scan rules: no suitable rule is found, so the CFG cannot parse this substring; the cell for bb is ∅
• For sub(3,4) = bc, it can be decomposed:
  • bc = b + c = sub(3,3) + sub(4,4)
  • Possible choices: BA, BC
  • Scan rules: C → BA and B → BC match, so the cell for bc is {B, C}
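The "possible choices, then scan rules" step can be sketched as one function that pairs every variable of the left cell with every variable of the right cell and looks for a matching production. The grammar encoding and the name `combine` are illustrative assumptions.

```python
# Example grammar from the slides, encoded as variable -> list of bodies.
GRAMMAR = {
    "S": ["AB"],
    "A": ["CC", "a", "c"],
    "B": ["BC", "b"],
    "C": ["CB", "BA", "c"],
}


def combine(left, right):
    """Variables X with a rule X -> YZ where Y is in left and Z is in right."""
    return {head
            for head, bodies in GRAMMAR.items()
            for body in bodies
            if len(body) == 2 and body[0] in left and body[1] in right}


print(sorted(combine({"A"}, {"B"})))       # cell for ab
print(sorted(combine({"B"}, {"B"})))       # cell for bb
print(sorted(combine({"B"}, {"A", "C"})))  # cell for bc
```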
CYK Algorithm – Simulation
• For sub(1,3) = abb:
  • abb = ab + b = sub(1,2) + sub(3,3)
  • Possible choices: SB
  • Scan rules: no suitable variable is found yet, but there is another way to decompose the string
• For sub(1,3) = abb:
  • abb = a + bb = sub(1,1) + sub(2,3)
  • Possible choices: ∅
  • The smaller substring bb cannot be parsed, so this decomposition cannot parse abb; there is no need to scan the rules
• For sub(1,3) = abb:
  • abb = sub(1,1) + sub(2,3) gives no valid parsing
  • abb = sub(1,2) + sub(3,3) gives no valid parsing
  • abb cannot be parsed; the cell for abb is ∅
CYK Algorithm – Simulation
• For sub(2,4) = bbc:
  • bbc = sub(2,2) + sub(3,4); possible choices: BB, BC
  • bbc = sub(2,3) + sub(4,4); possible choices: ∅
  • Variable: B (from B → BC), so the cell for bbc is {B}
CYK Algorithm – Simulation
• Finally, for sub(1,4) = abbc:
  • Possible choices: AB, SB, SC
  • Variables: S (from S → AB)
• This cell represents the original string and it contains S, so abbc is in the language
CYK Algorithm – Parse Tree
• abbc is in the language!
• How to obtain the parse tree?
• Trace back the derivations:
  • sub(1,4) is derived using S → AB from sub(1,1) and sub(2,4)
  • sub(1,1) is derived using A → a
  • sub(2,4) is derived using B → BC from sub(2,2) and sub(3,4)
  • …
• So, also record the derivations used!
CYK Algorithm – Parse Tree
• Obtained from the table
[Figure: parse tree with leaves a b b c]
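One way to "record the used derivations" is to store a back-pointer next to each table entry and follow the pointers from the top cell. The sketch below is an assumption about how this could be coded, not the tutorial's own implementation; the names `RULES`, `TERMINALS`, `cyk_with_backpointers` and `tree` are made up here, and only the first derivation found per entry is kept.

```python
# Example grammar from the slides, split into X -> YZ and X -> a rules.
RULES = [("S", "A", "B"), ("A", "C", "C"), ("B", "B", "C"),
         ("C", "C", "B"), ("C", "B", "A")]
TERMINALS = [("A", "a"), ("A", "c"), ("B", "b"), ("C", "c")]


def cyk_with_backpointers(w):
    """Map (i, j, X) to how X derives sub(i, j): a terminal or (k, Y, Z)."""
    n = len(w)
    back = {}
    for i in range(1, n + 1):
        for var, ch in TERMINALS:
            if ch == w[i - 1]:
                back[(i, i, var)] = ch
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for k in range(i, j):
                for head, left, right in RULES:
                    if (i, k, left) in back and (k + 1, j, right) in back:
                        # Keep only the first derivation found.
                        back.setdefault((i, j, head), (k, left, right))
    return back


def tree(back, i, j, var):
    """Rebuild a parse tree as nested tuples by following back-pointers."""
    entry = back[(i, j, var)]
    if isinstance(entry, str):
        return (var, entry)
    k, left, right = entry
    return (var, tree(back, i, k, left), tree(back, k + 1, j, right))


back = cyk_with_backpointers("abbc")
print(tree(back, 1, 4, "S"))
```

For abbc the result is the tree S → A B with A → a and B → B C, where the inner B → b and C → B A (B → b, A → c).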
CYK Algorithm – Conclusion
• A bottom-up parsing algorithm
• Dynamic programming
  • The solution of a subproblem (parsing of a substring) depends on that of smaller subproblems
• Before employing the CYK algorithm, convert the grammar into normal form
  • Remove ε-productions
  • Remove unit-productions
CYK Algorithm – Detailed
D = “On input w = w1w2…wn:
  If w = ε and S → ε is a rule, accept.
  For i = 1 to n:
    For each variable A:
      Test whether A → b is a rule, where b = wi.
      If so, place A in table(i, i).
  For l = 2 to n:
    For i = 1 to n - l + 1:
      Let j = i + l - 1.
      For k = i to j - 1:
        For each rule A → BC:
          If table(i, k) contains B and table(k+1, j) contains C, put A in table(i, j).
  If S is in table(1, n), accept. Otherwise, reject.”
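The procedure D translates almost line by line into Python. The sketch below is one possible rendering under stated assumptions: the function name `cyk`, the rule encodings `UNIT` and `PAIR`, and the `start_derives_empty` flag are introduced here for illustration. As in D, table(i, j) is indexed by start and end position.

```python
# Example grammar from the slides: X -> a rules and X -> YZ rules.
UNIT = [("A", "a"), ("A", "c"), ("B", "b"), ("C", "c")]
PAIR = [("S", "A", "B"), ("A", "C", "C"), ("B", "B", "C"),
        ("C", "C", "B"), ("C", "B", "A")]


def cyk(word, unit_rules, pair_rules, start="S", start_derives_empty=False):
    """Return (accepted, table) where table[(i, j)] holds variables deriving sub(i, j)."""
    n = len(word)
    if n == 0:
        # w = ε is accepted only if S -> ε is a rule.
        return start_derives_empty, {}
    table = {}
    for i in range(1, n + 1):
        table[(i, i)] = {a for a, b in unit_rules if b == word[i - 1]}
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            j = i + l - 1
            cell = set()
            for k in range(i, j):
                for a, b, c in pair_rules:
                    if b in table[(i, k)] and c in table[(k + 1, j)]:
                        cell.add(a)
            table[(i, j)] = cell
    return start in table[(1, n)], table


accepted, table = cyk("abbc", UNIT, PAIR)
print(accepted, sorted(table[(1, 4)]))
```

The three nested loops over l, i and k give a running time cubic in the length of w, times the number of rules.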
Pushdown Automata
NFA with infinite memory/states
Pushdown Automata
• PDA ≈ NFA with a stack as memory
• Transition:
  • NFA: depends on the input
  • PDA: depends on the input and the top of the stack
    • Push a symbol onto the stack (possibly ε)
    • Pop a symbol from the stack (possibly ε)
    • Read a terminal from the string (possibly ε)
• Transitions are non-deterministic
Pushdown Automata and NFA
• Accept:
  • NFA: go to an accept state
  • PDA: go to an accept state
PDA – Example 1
• Given the following language:
  L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}
• Design a PDA for it
PDA – Example 1 - Idea
• Idea: The input has two sections
  • First half
    • All ‘0’s
  • Second half
    • All ‘1’s
    • #‘1’ depends on #‘0’
    • #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
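Before designing the PDA, it can help to pin the language down with a direct membership test that counts the two sections. This is only a reference check under the definition above, not a PDA; the name `in_language` is chosen here.

```python
import re


def in_language(w):
    """True iff w = 0^i 1^j with i <= j <= 2i."""
    match = re.fullmatch(r"(0*)(1*)", w)
    if match is None:
        return False  # not of the form 0...01...1
    i, j = len(match.group(1)), len(match.group(2))
    return i <= j <= 2 * i


print([w for w in ["", "01", "011", "0111", "00111", "10"] if in_language(w)])
```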
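PDA – Example 1 – Solution
• Solution for L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}
[Figure: PDA state diagram with states q0, q1, q2, q3. Transition labels (read, pop / push): ε,ε/$; 0,ε/X; ε,ε/ε; 1,X/ε; 1,X/X followed by 1,X/ε; ε,$/ε]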
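To experiment with the solution, the sketch below simulates a nondeterministic PDA by a breadth-first search over configurations (state, input position, stack). The wiring of the transitions is an assumption reconstructed from the labels in the diagram: in particular, the intermediate state `qm` on the "two 1s per X" path is a hypothetical name, since the diagram residue does not name it. The function name `accepts` is also chosen here.

```python
from collections import deque

# (state, read, pop) -> list of (next_state, push); "" stands for ε.
# Assumed wiring; "qm" is a hypothetical name for the unnamed middle state.
TRANSITIONS = {
    ("q0", "", ""): [("q1", "$")],                 # mark the stack bottom
    ("q1", "0", ""): [("q1", "X")],                # one X per 0
    ("q1", "", ""): [("q2", "")],                  # switch to reading 1s
    ("q2", "1", "X"): [("q2", ""), ("qm", "X")],   # one 1, or first of two
    ("qm", "1", "X"): [("q2", "")],                # second 1 pops the X
    ("q2", "", "$"): [("q3", "")],                 # stack empty: finish
}


def accepts(w, transitions=TRANSITIONS, start="q0", accepting=("q3",)):
    """Accept iff some run ends in an accepting state with all input read."""
    seen = set()
    queue = deque([(start, 0, ())])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if state in accepting and pos == len(w):
            return True
        for (src, read, pop), targets in transitions.items():
            if src != state:
                continue
            if read and (pos >= len(w) or w[pos] != read):
                continue
            if pop and (not stack or stack[-1] != pop):
                continue
            rest = stack[:-1] if pop else stack
            new_pos = pos + (1 if read else 0)
            for dst, push in targets:
                queue.append((dst, new_pos, rest + tuple(push)))
    return False


print(accepts("00111"), accepts("0111"))
```

On w = 00111 one accepting run pops the first X with a single 1 and the second X with two 1s, which matches 2 ≤ 3 ≤ 4.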
PDA – Example 1 – Explain
• Let's try a string: w = 00111
  • See the whiteboard for the simulation
• ε,ε/$ indicates the start of parsing
• 0,ε/X saves information about #‘0’
  • #‘X’ in stack = #‘0’
• The transitions reading ‘1’ account for #‘1’
  • #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
  • 1,X/ε consumes one ‘X’ and eats one ‘1’
  • 1,X/X followed by 1,X/ε consumes one ‘X’ and eats two ‘1’s
  • So each ‘X’ is consumed by either one ‘1’ or two ‘1’s
• ε,$/ε indicates the end of parsing
PDA – Example 2
• Given the following language:
  L = {a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l}, where the alphabet Σ = {a, b, c, d}
• Design a PDA for it
PDA – Example 2 – Idea
• Idea:
  • Sequentially read (multiple) ‘a’, ‘b’, ‘c’ and ‘d’
  • Maintain:
    • #‘a’ + #‘c’
    • #‘b’ + #‘d’
  • If these numbers are equal, accept
PDA – Example 2 – Solution
• Solution for L = {a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l}, where the alphabet Σ = {a, b, c, d}
[Figure: PDA state diagram with states q1, q2, q3, q4, q5. Transition labels (read, pop / push): ε,ε/$; a,ε/X; b,X/ε; b,$/$Y; b,Y/YY; c,$/$X; c,X/XX; c,Y/ε; d,X/ε; d,$/$Y; d,Y/YY; ε,ε/ε (three times); ε,$/ε]
PDA – Example 2 – Explain
• The diagram has a start, one stage each for reading a's, b's, c's and d's, and an end
• Each X in the stack = an extra a or c
• Each Y in the stack = an extra b or d
• X and Y ‘cancel’ each other
  • The stack contains only X's or only Y's
• No X's and no Y's means
  • #a + #c = #b + #d
  • Accept
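The cancelling behaviour of X and Y can be sketched with an explicit stack. This is a deterministic illustration of the idea rather than a transcription of the state diagram: the order a, b, c, d is checked with a regular expression instead of states, and the name `accepts_example2` is an assumption.

```python
import re


def accepts_example2(w):
    """True iff w = a^i b^j c^k d^l with i + k = j + l."""
    if not re.fullmatch(r"a*b*c*d*", w):
        return False  # symbols out of order
    stack = []
    for ch in w:
        mark = "X" if ch in "ac" else "Y"
        if stack and stack[-1] != mark:
            stack.pop()         # an X cancels a Y, or a Y cancels an X
        else:
            stack.append(mark)  # one more extra symbol of the same kind
    return not stack            # no X's and no Y's left: counts are equal


print(accepts_example2("abbccd"), accepts_example2("aab"))
```

Because opposite marks cancel immediately, the stack never holds X's and Y's at the same time, matching the invariant stated on the slide.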