CSC 3130: Automata theory and formal languages
Tutorial 4
KN Hung
Office: SHB 1026
Department of Computer Science & Engineering
Agenda
• Context Free Grammar (CFG)
  • Design
  • Parse Tree
• Cocke-Younger-Kasami (CYK) algorithm
  • Parsing CFG in normal form
• Pushdown Automata (PDA)
  • Design
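Context-Free Grammar (Recap)
• A context-free grammar consists of:
  1) Variables
  2) Terminals
  3) Production rules
  4) A start variable
• Example: S → AB | ba, A → aA | a, B → b (each line is a production rule; S, A, B are variables, a and b are terminals, S is the start variable)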
Context-Free Grammar (Recap)
• A string belongs to the language of the CFG if it can be derived from the start variable, where each derivation step (⇒) applies one production rule
• Example: with S → AB | ba, A → aA | a, B → b:
  S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab
• Therefore, aab belongs to the language
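To make the derivation steps concrete, here is a minimal Python sketch (an illustration, not taken from the lecture notes) that replays a leftmost derivation: each step rewrites the leftmost variable with a chosen right-hand side. The names `GRAMMAR`, `leftmost_step` and `derive` are hypothetical, and the sketch assumes variables are upper-case letters and terminals are lower-case letters.

```python
# Example grammar: S -> AB | ba, A -> aA | a, B -> b.
# Assumption: upper-case letters are variables, lower-case letters are terminals.
GRAMMAR = {
    "S": ["AB", "ba"],
    "A": ["aA", "a"],
    "B": ["b"],
}


def leftmost_step(form: str, rhs: str) -> str:
    """Replace the leftmost variable in `form` by `rhs` (which must be one of its rules)."""
    for i, symbol in enumerate(form):
        if symbol.isupper():
            if rhs not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} -> {rhs} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no variable left to replace")


def derive(choices: list[str], start: str = "S") -> list[str]:
    """Apply the chosen right-hand sides in order and return every sentential form."""
    forms = [start]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms


steps = derive(["AB", "aA", "a", "b"])
print(" => ".join(steps))  # S => AB => aAB => aaB => aab
```

The derivation above happens to be leftmost, so replaying the rule choices AB, aA, a, b reproduces it exactly; once the last form contains only terminals, that string belongs to the language.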
Why CFG?
• L = {0ⁿ1ⁿ : n is a positive integer}
• L is not a regular language (proved by the pumping lemma)
• A context-free grammar can describe it: S → 0S1 | 01
• Every regular language is also context-free, so CFGs are strictly more general than regular expressions (and hence than DFAs and NFAs, which describe the same class of languages)
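The grammar S → 0S1 | 01 translates almost directly into a recursive membership test. The sketch below is illustrative (the function name `derives_from_S` is hypothetical): a string is accepted if it is "01" (rule S → 01) or if it is 0x1 with x itself derivable (rule S → 0S1).

```python
def derives_from_S(w: str) -> bool:
    """Return True iff w can be derived from S with the rules S -> 0S1 | 01."""
    if w == "01":                       # rule S -> 01 ends the derivation
        return True
    if len(w) >= 4 and w[0] == "0" and w[-1] == "1":
        return derives_from_S(w[1:-1])  # rule S -> 0S1: strip the outer 0 and 1
    return False


for w in ["01", "0011", "000111", "0101", "001", ""]:
    print(repr(w), derives_from_S(w))
```

Each recursive call undoes one application of S → 0S1, so the recursion depth is n for the string 0ⁿ1ⁿ; no finite automaton can do this kind of unbounded counting, which is the intuition behind the pumping-lemma proof.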
CFG Design
• Given a context-free language, design a CFG for it
• L = { w : w is an ab-string and the number of a’s < the number of b’s }
• Some time for you to think… (1 min)
• S → ?
CFG Design (Con’t)
• Trial: bottom-up
• Shortest string in L: “b”
• Given a string in L, we can expand it such that it is still in L
• i.e., add terminals without violating the constraint
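CFG Design (Con’t)
• One wrong trial:
  • S → b (base case)
  • S → bS | Sb (after adding one “b”, the number of b’s is still greater than the number of a’s)
  • S → abS | baS | bSa | aSb (adding one “a” and one “b” keeps the difference between the numbers of a’s and b’s constant)
• However, this grammar cannot generate strings like “aabbbbbaa”, which is in L (4 a’s, 5 b’s): every rule makes the string start with “b”, start with “ab”, or end with “b”, while “aabbbbbaa” starts with “aa” and ends with “a”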
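CFG Design (Con’t)
• Approach 1:
  • S → b (base case)
  • S → SS (each S has #b ≥ #a + 1, so #b is still > #a)
  • S → SaS | aSS | SSa (1st S: #b ≥ #a + 1; 2nd S: #b ≥ #a + 1; the extra a adds one a; in total #b - #a ≥ 2 - 1 = 1)
• This argument shows that every string the grammar generates is in L. But is it sufficient to say the grammar is correct?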
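One way to gain confidence (but not a proof) is to compare the grammar with the language on every short string. The sketch below is a hypothetical helper, not from the lecture notes: `in_G1` tries each rule of Approach 1 directly on the string, memoized with `functools.lru_cache`, and the loop lists the ab-strings of length at most 10 where grammar and language disagree.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def in_G1(w: str) -> bool:
    """True iff w is derivable from S -> b | SS | SaS | aSS | SSa."""
    if w == "b":                                   # S -> b
        return True
    n = len(w)
    # S -> SS: two non-empty derivable pieces
    if any(in_G1(w[:i]) and in_G1(w[i:]) for i in range(1, n)):
        return True
    # S -> SaS: an 'a' with non-empty derivable pieces on both sides
    if any(w[i] == "a" and in_G1(w[:i]) and in_G1(w[i + 1:]) for i in range(1, n - 1)):
        return True
    # S -> aSS: a leading 'a', then two derivable pieces
    if w.startswith("a") and any(in_G1(w[1:i]) and in_G1(w[i:]) for i in range(2, n)):
        return True
    # S -> SSa: two derivable pieces, then a trailing 'a'
    if w.endswith("a") and any(in_G1(w[:i]) and in_G1(w[i:n - 1]) for i in range(1, n - 1)):
        return True
    return False


mismatches = [
    "".join(letters)
    for n in range(11)
    for letters in product("ab", repeat=n)
    if in_G1("".join(letters)) != (letters.count("a") < letters.count("b"))
]
print(mismatches)  # []
```

An empty list only shows the grammar agrees with L up to length 10; for a full answer you still need both directions of the proof, L(G) ⊆ L and L ⊆ L(G).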
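CFG Design (Con’t)
Approach 2:
• Start with the grammar for ab-strings with the same number of a’s and b’s
• Call the start symbol of this grammar E
• Now, generate all strings of the form EbE | EbEbE | EbEbEbE | …
• This covers all of L: a string with d ≥ 1 more b’s than a’s can be cut at the b that follows the last position where the running count #b - #a equals 0, 1, …, d - 1, and every piece between the cuts has equal numbers of a’s and b’s
• Thus, we have the grammar…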
CFG Design (Con’t)
Approach 2 (Con’t):
• S → EbET
• T → bET | ε
• E → … (E generates ab-strings with the same number of a’s and b’s; cf. “09L7.pdf”, Slide #32)
• S and T together produce the pattern EbE | EbEbE | …
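The same exhaustive check can be run on Approach 2. Since the rules for E are left elided above, this sketch ASSUMES the common grammar E → aEbE | bEaE | ε for equal numbers of a’s and b’s; the lecture's own grammar on Slide #32 may differ. The functions `gen_E`, `gen_T`, `gen_ET` and `gen_S` are hypothetical helpers, one per variable, each testing whether a string can be derived from it.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def gen_E(w: str) -> bool:
    """ASSUMED rules E -> aEbE | bEaE | ε (equal numbers of a's and b's)."""
    if w == "":
        return True
    if w[0] not in "ab":
        return False
    partner = "b" if w[0] == "a" else "a"  # aEbE pairs a with b; bEaE pairs b with a
    return any(w[i] == partner and gen_E(w[1:i]) and gen_E(w[i + 1:])
               for i in range(1, len(w)))


@lru_cache(maxsize=None)
def gen_T(w: str) -> bool:
    """T -> bET | ε."""
    return w == "" or (w[0] == "b" and gen_ET(w[1:]))


@lru_cache(maxsize=None)
def gen_ET(w: str) -> bool:
    """w splits as x + y with x derived from E and y derived from T."""
    return any(gen_E(w[:k]) and gen_T(w[k:]) for k in range(len(w) + 1))


@lru_cache(maxsize=None)
def gen_S(w: str) -> bool:
    """S -> EbET."""
    return any(w[i] == "b" and gen_E(w[:i]) and gen_ET(w[i + 1:]) for i in range(len(w)))


mismatches = [
    "".join(letters)
    for n in range(11)
    for letters in product("ab", repeat=n)
    if gen_S("".join(letters)) != (letters.count("a") < letters.count("b"))
]
print(mismatches)  # []
```

Every recursive call is on a shorter string or moves from one variable to a "smaller" one (T to ET to E), so the recursion always terminates.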
CFG Design (Con’t)
• After designing the grammar G, you may have to prove (if required) that the language of this grammar equals the given language
• i.e., prove that L(G) = L
• Proof:
  Part 1) L(G) ⊆ L
  Part 2) L ⊆ L(G)
• Due to the time limit, I will not do this part
Parse Tree
• How do we parse “aab” in this grammar? (previous example)
• CFG Example: S → AB | ba, A → aA | a, B → b
• Derivation: S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab
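Parse Tree (Con’t)
• Idea: each production rule applied = a node + its children
• Should be very intuitive to understand
• Derivation: S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab
[Figure: parse tree of “aab”: S has children A and B; A has children a and A; the inner A has child a; B has child b]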
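The "production rule = node + children" idea maps directly onto a small tree data structure. This is a minimal sketch with a hypothetical `Node` class: each node stores a grammar symbol and the symbols it was rewritten into, and reading the leaves from left to right gives back the derived string.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parse-tree node: a grammar symbol plus the children it was rewritten into."""
    symbol: str
    children: list["Node"] = field(default_factory=list)

    def frontier(self) -> str:
        """Read the leaves from left to right: the string this tree derives."""
        if not self.children:
            return self.symbol
        return "".join(child.frontier() for child in self.children)

    def show(self, depth: int = 0) -> None:
        """Print the tree with one indented line per node."""
        print("  " * depth + self.symbol)
        for child in self.children:
            child.show(depth + 1)


# S -> AB, then A -> aA, then A -> a, then B -> b: one node (+ children) per rule.
tree = Node("S", [
    Node("A", [Node("a"), Node("A", [Node("a")])]),
    Node("B", [Node("b")]),
])
tree.show()
print(tree.frontier())  # aab
```

Note that the tree forgets the order in which rules were applied: any derivation of “aab” that uses the same rules in the same places yields this same tree.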
Parse Tree (Con’t)
• Ambiguity: a string may have more than one parse tree
• CFG: S → S - S | 1 | 2 | 3
• String: 3 - 1 - 2
• Two different parse trees: (3 - 1) - 2 and 3 - (1 - 2)
[Figure: the two parse trees of 3 - 1 - 2]
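Ambiguity matters because different parse trees can give different meanings. The sketch below (hypothetical helpers `tokens` and `tree_values`, assuming single-digit numbers) enumerates every parse tree of an expression under S → S - S | 1 | 2 | 3 by choosing each "-" in turn as the root, and returns the value of each tree.

```python
from functools import lru_cache


def tokens(expr: str) -> tuple[str, ...]:
    """Split an expression such as "3 - 1 - 2" into single-character tokens."""
    return tuple(expr.replace(" ", ""))


@lru_cache(maxsize=None)
def tree_values(toks: tuple[str, ...]) -> tuple[int, ...]:
    """Value of every parse tree of toks under S -> S - S | 1 | 2 | 3 (one entry per tree)."""
    if len(toks) == 1:
        return (int(toks[0]),) if toks[0] in {"1", "2", "3"} else ()
    values = []
    for i, tok in enumerate(toks):
        if tok == "-":  # use this '-' as the root: S -> S - S
            for left in tree_values(toks[:i]):
                for right in tree_values(toks[i + 1:]):
                    values.append(left - right)
    return tuple(values)


print(tree_values(tokens("3 - 1 - 2")))  # (4, 0): 3 - (1 - 2) and (3 - 1) - 2
```

The two trees evaluate to 4 and 0, which is why compilers use unambiguous grammars (or precedence and associativity rules) for arithmetic.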
Parse Tree (Con’t)
• Useful in programming languages (CSC3180)
• Useful in compilers (CSC3120)
Cocke-Younger-Kasami Algorithm
• Used to parse strings with a context-free grammar in Chomsky normal form (or simply normal form)
• Normal form: every production is of type
  • X → YZ
  • X → a
  • S → ε (S is the start variable)
• Example: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
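The three allowed rule shapes are easy to check mechanically. In this sketch (hypothetical `is_normal_form`), a grammar is a dictionary from variables to lists of bodies; it assumes every symbol is a single character, the dictionary keys are the variables, and the empty string "" stands for ε.

```python
# Example grammar; "" would stand for ε. The keys are the variables.
GRAMMAR = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}


def is_normal_form(grammar: dict[str, list[str]], start: str = "S") -> bool:
    """Check that every rule is X -> YZ, X -> a, or start -> ε ("")."""
    variables = set(grammar)
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 2 and all(sym in variables for sym in body):
                continue  # X -> YZ: exactly two variables
            if len(body) == 1 and body not in variables:
                continue  # X -> a: exactly one terminal
            if body == "" and head == start:
                continue  # S -> ε, allowed only for the start variable
            return False
    return True


print(is_normal_form(GRAMMAR))                    # True
print(is_normal_form({"S": ["aB"], "B": ["b"]}))  # False: terminal mixed with a variable
print(is_normal_form({"S": ["A"], "A": ["a"]}))   # False: unit production
```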
CYK Algorithm - Idea
• = Algorithm 2 in the lecture notes (09L8.pdf)
• Idea: bottom-up parsing
• Algorithm: given a string s of length N,
    For k = 1 to N:
      For every substring of length k:
        Determine which variable(s) can derive it
• sub(x,y): the substring that starts at index x and ends at index y
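The loop structure above can be sketched in Python as follows. This is an illustration, not the lecture's code: `cyk_table` and `cyk_accepts` are hypothetical names, the grammar is stored as (head, body) pairs with single-character symbols, and indices are 1-based and inclusive so that `table[(x, y)]` matches sub(x,y).

```python
from itertools import product

# Example grammar in normal form, written as (head, body) pairs.
RULES = [
    ("S", "AB"), ("S", "BC"),
    ("A", "BA"), ("A", "a"),
    ("B", "CC"), ("B", "b"),
    ("C", "AB"), ("C", "a"),
]


def cyk_table(s: str, rules=RULES) -> dict[tuple[int, int], set[str]]:
    """table[(x, y)] = variables deriving sub(x, y), with 1-based inclusive indices."""
    n = len(s)
    table: dict[tuple[int, int], set[str]] = {}
    # k = 1: a variable derives one character if it has a rule X -> that character.
    for x in range(1, n + 1):
        table[(x, x)] = {head for head, body in rules if body == s[x - 1]}
    # k = 2..N: shortest substrings first, so both halves are already filled in.
    for k in range(2, n + 1):
        for x in range(1, n - k + 2):
            y = x + k - 1
            cell: set[str] = set()
            for m in range(x, y):  # split into sub(x, m) + sub(m + 1, y)
                for left, right in product(table[(x, m)], table[(m + 1, y)]):
                    cell |= {head for head, body in rules if body == left + right}
            table[(x, y)] = cell
    return table


def cyk_accepts(s: str, start: str = "S") -> bool:
    """True iff the start variable derives the whole (non-empty) string."""
    return len(s) > 0 and start in cyk_table(s)[(1, len(s))]


print(cyk_accepts("baaba"))  # True
```

With N characters there are O(N²) cells and O(N) splits per cell, so the running time is O(N³) times the cost of scanning the rules.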
CYK Algorithm - Init
• Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
• We want to parse the string b a a b a
• Base case: k = 1
• The possible variable(s) for each single character are found by scanning through each production:
  b: B | a: A,C | a: A,C | b: B | a: A,C
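The base case is a single scan over the productions for each character. A minimal sketch with a hypothetical `base_row` helper and the same (head, body) representation of the example grammar:

```python
# Example grammar as (head, body) pairs; bodies of length 1 are terminals.
RULES = [("S", "AB"), ("S", "BC"), ("A", "BA"), ("A", "a"),
         ("B", "CC"), ("B", "b"), ("C", "AB"), ("C", "a")]


def base_row(s: str) -> dict[int, set[str]]:
    """k = 1: for each 1-based index x, the variables X with a rule X -> s[x]."""
    return {x: {head for head, body in RULES if body == ch}
            for x, ch in enumerate(s, start=1)}


row1 = base_row("baaba")
print({x: sorted(heads) for x, heads in row1.items()})
# {1: ['B'], 2: ['A', 'C'], 3: ['A', 'C'], 4: ['B'], 5: ['A', 'C']}
```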
CYK Algorithm – Table
• Each cell: the variables deriving one substring
• Row = length of the substring; column = start index of the substring
• e.g., the cell in row 3, column 2 is for the substring of length 3 starting at index 2, i.e., “aab” = sub(2,4)
• Row 1 for b a a b a: B | A,C | A,C | B | A,C
CYK Algorithm – Loop (k > 1)
• When k = 2, e.g., sub(1,2) = “ba”
• “ba” = “b” + “a” = sub(1,1) + sub(2,2)
• Possible bodies: BA | BC (from {B} × {A,C})
• Variables: A, S, since A → BA and S → BC
• So S,A is put into the cell for sub(1,2)
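For k = 2 each substring has exactly one split, so row 2 comes from combining neighbouring cells of row 1. In this sketch, `combine` is a hypothetical helper that forms every candidate body YZ from the two cells and keeps the heads X with a rule X → YZ.

```python
from itertools import product

# Only the X -> YZ rules of the example grammar matter for k > 1.
BINARY_RULES = [("S", "AB"), ("S", "BC"), ("A", "BA"), ("B", "CC"), ("C", "AB")]


def combine(left: set[str], right: set[str]) -> set[str]:
    """Variables X with a rule X -> YZ, where Y is in `left` and Z is in `right`."""
    bodies = {y + z for y, z in product(left, right)}
    return {head for head, body in BINARY_RULES if body in bodies}


# Row 1 for "baaba"; sub(x, x+1) = sub(x, x) + sub(x+1, x+1).
row1 = [{"B"}, {"A", "C"}, {"A", "C"}, {"B"}, {"A", "C"}]
row2 = [combine(row1[i], row1[i + 1]) for i in range(len(row1) - 1)]
print([sorted(cell) for cell in row2])
# [['A', 'S'], ['B'], ['C', 'S'], ['A', 'S']]
```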
CYK Algorithm – Loop (k > 1)
• For each substring, decompose it into two substrings in every possible way
• Example: sub(2,4) = “aab” = sub(2,2) + sub(3,4) = sub(2,3) + sub(4,4)
• Possible bodies: AS, AC, CS, CC (from {A,C} × {S,C}) and BB (from {B} × {B})
• Only B → CC matches, so B is put into the cell
• Row 2 (k = 2): S,A | B | S,C | S,A
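For longer substrings the cell is the union over all split points. This sketch reuses the hypothetical `combine` idea and adds a hypothetical `fill_cell(table, x, y)` that tries every split sub(x,m) + sub(m+1,y); the starting table holds the length-1 and length-2 cells for “baaba” from the earlier steps (1-based indices).

```python
from itertools import product

BINARY_RULES = [("S", "AB"), ("S", "BC"), ("A", "BA"), ("B", "CC"), ("C", "AB")]


def combine(left: set[str], right: set[str]) -> set[str]:
    """Variables X with a rule X -> YZ, where Y is in `left` and Z is in `right`."""
    bodies = {y + z for y, z in product(left, right)}
    return {head for head, body in BINARY_RULES if body in bodies}


def fill_cell(table: dict[tuple[int, int], set[str]], x: int, y: int) -> set[str]:
    """Variables deriving sub(x, y): union over every split sub(x, m) + sub(m+1, y)."""
    cell: set[str] = set()
    for m in range(x, y):
        cell |= combine(table[(x, m)], table[(m + 1, y)])
    return cell


# Length-1 and length-2 cells of "baaba".
table = {
    (1, 1): {"B"}, (2, 2): {"A", "C"}, (3, 3): {"A", "C"}, (4, 4): {"B"}, (5, 5): {"A", "C"},
    (1, 2): {"S", "A"}, (2, 3): {"B"}, (3, 4): {"S", "C"}, (4, 5): {"S", "A"},
}
print(sorted(fill_cell(table, 2, 4)))  # ['B']
```

The split sub(2,2) + sub(3,4) contributes B (via B → CC), while sub(2,3) + sub(4,4) only offers BB, which matches no rule.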
CYK Algorithm – Loop (k > 1)
• How about sub(3,5)?
• Give it a try (1 min)
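CYK Algorithm – Parse Tree
• The completed table for b a a b a (∅ = no variable; columns are start indices 1 to 5):
  Length 5: S,A,C
  Length 4: ∅ | S,A,C
  Length 3: ∅ | B | B
  Length 2: S,A | B | S,C | S,A
  Length 1: B | A,C | A,C | B | A,C
• S is in the cell for sub(1,5), so b a a b a can be derived from S
• The parse tree is known from the table; see “09L8.pdf”, Slide #21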
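One common way to recover a parse tree from the table (not necessarily the method on Slide #21) is to store a back-pointer in each cell: for every variable, remember which split and which rule first put it there. In this sketch, `cyk_backpointers` and `parse_tree` are hypothetical helpers; the rules are scanned in a fixed list order, so the first tree found is deterministic.

```python
# Example grammar in normal form, as (head, body) pairs, in a fixed order.
RULES = [
    ("S", "AB"), ("S", "BC"),
    ("A", "BA"), ("A", "a"),
    ("B", "CC"), ("B", "b"),
    ("C", "AB"), ("C", "a"),
]


def cyk_backpointers(s: str) -> dict:
    """Map (x, y, X) to how X derives sub(x, y): None for X -> character,
    or (m, Y, Z) for X -> YZ with Y deriving sub(x, m) and Z deriving sub(m+1, y).
    Only the first way found is kept."""
    n = len(s)
    back = {}
    for x in range(1, n + 1):
        for head, body in RULES:
            if body == s[x - 1]:
                back[(x, x, head)] = None
    for k in range(2, n + 1):
        for x in range(1, n - k + 2):
            y = x + k - 1
            for m in range(x, y):
                for head, body in RULES:
                    if (len(body) == 2 and (x, y, head) not in back
                            and (x, m, body[0]) in back and (m + 1, y, body[1]) in back):
                        back[(x, y, head)] = (m, body[0], body[1])
    return back


def parse_tree(s: str, back: dict, x: int, y: int, head: str) -> str:
    """Rebuild one parse tree as a bracketed string by following the back-pointers."""
    entry = back[(x, y, head)]
    if entry is None:
        return f"({head} {s[x - 1]})"
    m, left, right = entry
    return (f"({head} {parse_tree(s, back, x, m, left)} "
            f"{parse_tree(s, back, m + 1, y, right)})")


s = "baaba"
back = cyk_backpointers(s)
if (1, len(s), "S") in back:
    print(parse_tree(s, back, 1, len(s), "S"))
    # (S (B b) (C (A a) (B (C (A a) (B b)) (C a))))
```

The tree uses S → BC, C → AB, B → CC and C → AB, each of which can be checked against the table cells; keeping a list of all back-pointers per variable instead of only the first one would let you enumerate every parse tree.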
CYK Algorithm (Conclusion)
• Fill the table from the shortest substrings to the longest
  • i.e., from single characters to the whole string
• For a context-free grammar G:
  1) Convert G into normal form
    • Remove ε-productions
    • Remove unit productions
  2) Apply the CYK algorithm
• Con: loss of intuition, since the normal-form grammar is harder to read than the original
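Removing ε-productions usually starts by finding the nullable variables, i.e. those that can derive ε; afterwards, each rule gets extra copies with nullable variables left out. A minimal sketch of that first step only (hypothetical `nullable_variables`, assuming bodies are strings of single-character symbols and "" stands for ε), computed as a fixed point:

```python
def nullable_variables(grammar: dict[str, list[str]]) -> set[str]:
    """Variables that can derive ε; bodies are strings and "" stands for ε."""
    nullable: set[str] = set()
    changed = True
    while changed:  # repeat until no new nullable variable is found
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # a body is nullable if every symbol in it is nullable ("" trivially is)
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


example = {"S": ["AB"], "A": ["aA", ""], "B": ["b", ""]}
print(sorted(nullable_variables(example)))  # ['A', 'B', 'S']
```

Terminals never become nullable because they are not keys of the dictionary, so any body containing a terminal is correctly treated as non-nullable.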
End • Thanks for coming! =] • Any questions?