CSC 3130: Automata theory and formal languages Tutorial 4

CSC 3130: Automata theory and formal languages • Tutorial 4 • KN Hung • Office: SHB 1026 • Department of Computer Science & Engineering

Agenda • Context Free Grammar (CFG) • Design • Parse Tree • Cocke-Younger-Kasami (CYK) algorithm • Parsing CFG in normal form • Pushdown Automata (PDA) • Design

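Context-Free Grammar (Recap) • A context-free grammar consists of 1) Variables 2) Terminals 3) Production rules 4) A start variable • Example: S → AB | ba, A → aA | a, B → b • Here the variables are S, A, B; the terminals are a, b; each line such as A → aA | a is a production rule; the start variable is S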

Context-Free Grammar (Recap) • A string is said to belong to the language (of the CFG) if it can be derived from the start variable • Deriving = applying production rules • CFG example: S → AB | ba, A → aA | a, B → b • Derivation: S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab • Therefore, aab belongs to the language
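The derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming single-character variables; the names `GRAMMAR`, `derive`, and `STEPS` are illustrative and not part of the tutorial.

```python
# Example grammar from the slide: S -> AB | ba, A -> aA | a, B -> b
GRAMMAR = {"S": ["AB", "ba"], "A": ["aA", "a"], "B": ["b"]}


def derive(start, steps):
    """Apply each (variable, right-hand side) step to the leftmost
    occurrence of that variable and return every sentential form."""
    forms = [start]
    current = start
    for var, rhs in steps:
        if rhs not in GRAMMAR.get(var, []):
            raise ValueError(f"{var} -> {rhs} is not a production")
        i = current.index(var)  # raises ValueError if var does not occur
        current = current[:i] + rhs + current[i + 1:]
        forms.append(current)
    return forms


# The four rule applications used on the slide.
STEPS = [("S", "AB"), ("A", "aA"), ("A", "a"), ("B", "b")]
forms = derive("S", STEPS)
print(" => ".join(forms))  # S => AB => aAB => aaB => aab
```

The last form contains only terminals, which is what "aab belongs to the language" means.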

Why CFG? • L = {w = 0^n1^n : n is a positive integer} • L is not a regular language • Proved by the “Pumping Lemma” • A context-free grammar can describe it: S → 0S1, S → 01 • Thus, CFGs are more general than regular expressions • NFA ⇔ Regular Expression ⇔ DFA
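As a small illustration of how the two rules S → 0S1 and S → 01 produce exactly the strings 0^n1^n, here is a sketch with two hypothetical helpers: `generate` follows the derivation, and `in_language` undoes it by peeling off a matching 0 and 1.

```python
def generate(n):
    """Return 0^n 1^n by applying S -> 0S1 (n - 1 times), then S -> 01."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return "01" if n == 1 else "0" + generate(n - 1) + "1"


def in_language(w):
    """Undo S -> 0S1 from the outside in; accept if "01" remains."""
    while len(w) > 2 and w[0] == "0" and w[-1] == "1":
        w = w[1:-1]
    return w == "01"


print(generate(3))           # 000111
print(in_language("0011"))   # True
print(in_language("0101"))   # False
```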

CFG Design • Given a context-free language, design the CFG • L = { w : w is an ab-string in which the number of a’s < the number of b’s } • Take 1 minute to think about it • S → ? …

CFG Design (Con’t) • Trial: Bottom-up • Shortest string in L : “b” • Given a string in L, we can expand it, s.t. it is still in L • i.e., Add terminals, while not violating the constraints

CFG Design (Con’t) • One wrong trial: S → b, S → bS | Sb, S → abS | baS | bSa | aSb • S → bS | Sb: after adding one “b”, the number of b’s is still greater than the number of a’s • S → abS | baS | bSa | aSb: adding one “a” and one “b” keeps the difference between the numbers of a’s and b’s constant • However, this grammar cannot derive strings like “aabbbbbaa”
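The claim that the wrong trial misses “aabbbbbaa” can be checked by trying every rule in reverse. This is a sketch written only for this one grammar; the function name `derives` is an assumption, not something from the tutorial.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derives(w):
    """True if S derives w in the wrong-trial grammar
    S -> b | bS | Sb | abS | baS | bSa | aSb."""
    if w == "b":
        return True
    if len(w) < 2:
        return False
    return (
        (w[0] == "b" and derives(w[1:]))                         # S -> bS
        or (w[-1] == "b" and derives(w[:-1]))                    # S -> Sb
        or (w[:2] == "ab" and derives(w[2:]))                    # S -> abS
        or (w[:2] == "ba" and derives(w[2:]))                    # S -> baS
        or (w[0] == "b" and w[-1] == "a" and derives(w[1:-1]))   # S -> bSa
        or (w[0] == "a" and w[-1] == "b" and derives(w[1:-1]))   # S -> aSb
    )


print(derives("abb"))        # True
print(derives("aabbbbbaa"))  # False, although it has 4 a's and 5 b's
```

The string starts with “aa” and ends with “aa”, so no rule can produce its outermost symbols, which is why the check fails at the first step.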

CFG Design (Con’t) • Approach 1: S → b (base case) • S → SS (#b is still > #a) • S → SaS | aSS | SSa • Why the last rule is safe: the 1st S has #b ≥ #a + 1, the 2nd S has #b ≥ #a + 1, and the added a contributes #a = 1, so overall #b ≥ #a + 2 - 1 = #a + 1 • But is this sufficient to say the grammar is correct?
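One way to gain confidence in Approach 1 is to compare it against the language exhaustively for short strings. The sketch below is evidence rather than a proof, and the names `generated` and `expected` are hypothetical. It relies on the fact that every rule except S → b combines strictly shorter strings, so the derivable strings can be built in order of length.

```python
from itertools import product


def generated(max_len):
    """All strings of length <= max_len derivable from S in
    S -> b | SS | SaS | aSS | SSa (max_len >= 1)."""
    by_len = {n: set() for n in range(1, max_len + 1)}
    by_len[1].add("b")
    for n in range(2, max_len + 1):
        for i in range(1, n):              # S -> SS
            for u in by_len[i]:
                for v in by_len[n - i]:
                    by_len[n].add(u + v)
        for i in range(1, n - 1):          # two S-parts plus one 'a'
            for u in by_len[i]:
                for v in by_len[n - 1 - i]:
                    by_len[n].update({u + "a" + v, "a" + u + v, u + v + "a"})
    return set().union(*by_len.values())


def expected(max_len):
    """All ab-strings of length <= max_len with fewer a's than b's."""
    return {
        "".join(chars)
        for n in range(1, max_len + 1)
        for chars in product("ab", repeat=n)
        if chars.count("a") < chars.count("b")
    }


print(generated(8) == expected(8))
print("aabbbbbaa" in generated(9))
```

Agreement up to a fixed length does not replace the two-direction proof of L(G) = L, but a mismatch would immediately expose a wrong grammar.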

CFG Design (Con’t) Approach 2: • Start with the grammar for ab-strings with the same number of a’s and b’s • Call the start symbol of this grammar E • Now, we generate all strings of the form EbE | EbEbE | EbEbEbE | … • Thus, we have the grammar…

CFG Design (Con’t) Approach 2 (Con’t): • S → EbET • T → bET | ε • E → … • S and T produce the pattern EbE | EbEbE | … • E generates ab-strings with the same number of a’s and b’s (c.f. “09L7.pdf” – Slide #32)
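To see why every string with more b’s than a’s fits the pattern EbE | EbEbE | …, one can split it at the b’s where the running count (#b - #a) first reaches 1, 2, …, up to its final value. The sketch below does this without needing the rules for E; the function `decompose` is an illustrative assumption and not taken from the lecture notes.

```python
def decompose(w):
    """Split w (with more b's than a's) into balanced blocks so that
    w == "b".join(blocks). Each block has as many a's as b's."""
    if set(w) - {"a", "b"}:
        raise ValueError("expected an ab-string")
    total = w.count("b") - w.count("a")
    if total < 1:
        raise ValueError("w needs more b's than a's")
    blocks, current = [], []
    diff = 0    # running (#b - #a) over the prefix read so far
    level = 0   # number of separator b's found so far
    for ch in w:
        diff += 1 if ch == "b" else -1
        if level < total and diff == level + 1:
            # First time the running count reaches level + 1:
            # this b is one of the separators in E b E b ... E.
            blocks.append("".join(current))
            current = []
            level += 1
        else:
            current.append(ch)
    blocks.append("".join(current))
    return blocks


print(decompose("aabbbbbaa"))  # ['aabb', 'bbaa']
print(decompose("bb"))         # ['', '', '']
```

For “aabbbbbaa” the blocks “aabb” and “bbaa” are both balanced, so the string has the form EbE, which S → EbET with T → ε covers.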

CFG Design (Con’t) • After designing the grammar, G, you may have to prove (if required) that the language of this grammar is equivalent to the given language • i.e., Prove that L(G) = L • Proof Part 1) L(G) ⊂ L Part 2) L ⊂ L(G) • Due to time limit, I will not do this part

Parse Tree • How to parse “aab” in this grammar? (Previous example) • CFG example: S → AB | ba, A → aA | a, B → b • Derivation: S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab

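Parse Tree (Con’t) • Idea: Production rule = node + children • Should be very intuitive to understand • Derivation: S ⇒ AB ⇒ aAB ⇒ aaB ⇒ aab • Parse tree: the root S has children A and B; that A has children a and A; the inner A has child a; B has child b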
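The “production rule = node + children” idea can be written down directly as nested data. This is a minimal sketch assuming a tuple representation (variable, list of children) with terminals as plain strings; the names `TREE`, `is_valid`, and `yield_of` are illustrative.

```python
GRAMMAR = {"S": ["AB", "ba"], "A": ["aA", "a"], "B": ["b"]}

# Parse tree of "aab": a node is (variable, children); a leaf is a terminal.
TREE = ("S", [("A", ["a", ("A", ["a"])]), ("B", ["b"])])


def label(node):
    """Symbol at the root of a node (terminal or variable)."""
    return node if isinstance(node, str) else node[0]


def is_valid(node):
    """Every internal node together with its children must be a production."""
    if isinstance(node, str):
        return True
    var, children = node
    rhs = "".join(label(child) for child in children)
    return rhs in GRAMMAR[var] and all(is_valid(child) for child in children)


def yield_of(node):
    """Read the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(yield_of(child) for child in node[1])


print(is_valid(TREE), yield_of(TREE))  # True aab
```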

Parse Tree (Con’t) • Ambiguity • CFG: S → S - S, S → 1 | 2 | 3 • String: 3 - 1 - 2 • Two parse trees: one groups the string as (3 - 1) - 2, the other as 3 - (1 - 2)
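The two parse trees are not just different drawings: read as arithmetic, they give different values. The sketch below enumerates every parse of a token list under S → S - S | 1 | 2 | 3; the function name `parses` is an assumption for illustration.

```python
def parses(tokens):
    """Return (bracketed text, value) for every parse tree of the tokens
    under S -> S - S | 1 | 2 | 3."""
    if len(tokens) == 1:
        return [(tokens[0], int(tokens[0]))]
    results = []
    for i in range(1, len(tokens), 2):   # each '-' can be the top-level rule
        for left_text, left_val in parses(tokens[:i]):
            for right_text, right_val in parses(tokens[i + 1:]):
                results.append(
                    (f"({left_text} - {right_text})", left_val - right_val)
                )
    return results


for text, value in parses(["3", "-", "1", "-", "2"]):
    print(text, "=", value)
# (3 - (1 - 2)) = 4
# ((3 - 1) - 2) = 0
```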

Parse Tree (Con’t) • Useful in programming language • CSC3180 • Useful in compiler • CSC3120

Cocke-Younger-Kasami Algorithm • Used to parse a context-free grammar in Chomsky normal form (or simply normal form) • Normal form: every production is of type X → YZ, X → a, or S → ε • Example: S → AB | BC, A → BA | a, B → CC | b, C → AB | a

CYK Algorithm - Idea • Same as Algorithm 2 in the lecture notes (09L8.pdf) • Idea: Bottom-up parsing • Algorithm: given a string s of length N, for k = 1 to N, for every substring of length k, determine which variable(s) can derive it • sub(x,y): the substring that starts at index x and ends at index y
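The loop described above can be sketched in a few lines. This is one possible table-filling implementation, not necessarily the exact form of Algorithm 2 in the lecture notes; it assumes single-character symbols, a non-empty input, and the example normal-form grammar used in this tutorial. The names `GRAMMAR` and `cyk_table` are illustrative.

```python
# Example normal-form grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
GRAMMAR = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}


def cyk_table(s, grammar):
    """table[(x, y)] = set of variables deriving sub(x, y),
    with 1-based, inclusive indices as on the slides."""
    n = len(s)
    table = {}
    for k in range(1, n + 1):                 # substring length
        for x in range(1, n - k + 2):         # start index
            y = x + k - 1                     # end index
            cell = set()
            if k == 1:
                # Base case: variables with a rule X -> terminal.
                cell = {v for v, rhss in grammar.items() if s[x - 1] in rhss}
            else:
                for m in range(x, y):         # sub(x, m) + sub(m + 1, y)
                    for v, rhss in grammar.items():
                        for rhs in rhss:
                            if (len(rhs) == 2
                                    and rhs[0] in table[(x, m)]
                                    and rhs[1] in table[(m + 1, y)]):
                                cell.add(v)
            table[(x, y)] = cell
    return table


table = cyk_table("baaba", GRAMMAR)
print(sorted(table[(2, 4)]))   # ['B']
print("S" in table[(1, 5)])    # True: "baaba" is in the language
```

The string is accepted exactly when the start variable appears in the cell for the whole string, sub(1, N).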

CYK Algorithm - Init • Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a • We want to parse the string “baaba” • Base case: k = 1 • The possible variable(s) for each character are found by scanning through each production • Row for length 1: b: B • a: A,C • a: A,C • b: B • a: A,C

CYK Algorithm – Table • Each cell: the variables deriving the substring • Rows: length of substring • Columns: start index of substring • Example: the cell in row 3, column 2 is for the substring of length 3 starting at index 2, i.e., “aab” = sub(2,4) • Row for length 1 of “baaba”: B • A,C • A,C • B • A,C

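CYK Algorithm – Loop (k>1) • Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a • For each substring, decompose it into two substrings • Example: sub(2,4) = “aab” = sub(2,2) + sub(3,4) = sub(2,3) + sub(4,4) • Possible right-hand sides: AS, AC, CS, CC (from the first split) and BB (from the second split) • Only CC is the right-hand side of a production (B → CC); therefore, B is put into the cell • Table so far for “baaba”: length 2: S,A • B • S,C • S,A; length 1: B • A,C • A,C • B • A,C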
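The single cell worked out on this slide can be recomputed in isolation. The sketch below takes the four smaller cells as given and lists the candidate right-hand sides; `candidate_pairs` and `heads` are hypothetical helper names.

```python
from itertools import product

GRAMMAR = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}


def candidate_pairs(left, right):
    """All two-variable right-hand sides YZ with Y in left and Z in right."""
    return [y + z for y, z in product(sorted(left), sorted(right))]


def heads(pairs):
    """Variables X that have a production X -> YZ for some candidate YZ."""
    return {v for v, rhss in GRAMMAR.items() for p in pairs if p in rhss}


# Cells already filled in for "baaba" (shorter substrings).
sub_2_2, sub_3_4 = {"A", "C"}, {"S", "C"}
sub_2_3, sub_4_4 = {"B"}, {"B"}

pairs = candidate_pairs(sub_2_2, sub_3_4) + candidate_pairs(sub_2_3, sub_4_4)
print(pairs)         # ['AC', 'AS', 'CC', 'CS', 'BB']
print(heads(pairs))  # {'B'}
```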

CYK Algorithm – Loop (k>1) • Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a • How about sub(3,5)? • Take 1 minute to work it out • Table so far for “baaba”: length 2: S,A • B • S,C • S,A; length 1: B • A,C • A,C • B • A,C

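CYK Algorithm – Parse Tree • Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a • Parse tree is known from the table • See “09L8.pdf” - Slide #21 • Completed table for “baaba” (rows: length of substring; columns: start index of substring; “-” marks an empty cell):
length 5: S,A,C
length 4: - • S,A,C
length 3: - • B • B
length 2: S,A • B • S,C • S,A
length 1: B • A,C • A,C • B • A,C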
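To recover a parse tree, it is enough to remember, for each variable placed in a cell, one split and one production that justified it. The following is a sketch under the same assumptions as the table (single-character symbols, normal-form grammar); `cyk_parse` and `leaves` are illustrative names, and the function returns just one tree even when several exist.

```python
GRAMMAR = {"S": ["AB", "BC"], "A": ["BA", "a"], "B": ["CC", "b"], "C": ["AB", "a"]}


def cyk_parse(s, grammar, start="S"):
    """Return one parse tree of s as nested tuples, or None if s is rejected.
    A leaf node is (variable, terminal); an inner node is (variable, left, right)."""
    n = len(s)
    back = {}  # back[(x, y, V)] = terminal, or (split point m, Y, Z) for V -> YZ
    for x in range(1, n + 1):
        for v, rhss in grammar.items():
            if s[x - 1] in rhss:
                back[(x, x, v)] = s[x - 1]
    for k in range(2, n + 1):
        for x in range(1, n - k + 2):
            y = x + k - 1
            for m in range(x, y):
                for v, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2
                                and (x, m, rhs[0]) in back
                                and (m + 1, y, rhs[1]) in back):
                            # Keep the first justification found.
                            back.setdefault((x, y, v), (m, rhs[0], rhs[1]))

    def build(x, y, v):
        entry = back[(x, y, v)]
        if isinstance(entry, str):
            return (v, entry)
        m, left, right = entry
        return (v, build(x, m, left), build(m + 1, y, right))

    return build(1, n, start) if (1, n, start) in back else None


def leaves(tree):
    """Concatenate the terminals at the leaves, left to right."""
    if isinstance(tree[1], str):
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


tree = cyk_parse("baaba", GRAMMAR)
print(tree)
print(leaves(tree))  # baaba
```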

CYK Algorithm (Conclusion) • Start from shortest substring to the longest • i.e., from single-character-string to the whole string • For Context-free grammar, G 1) Convert G into normal form • Remove ε-productions • Remove unit-productions 2) Apply CYK algorithm • Con: Loss in intuition

End • Thanks for coming! =] • Any questions?
