270 likes | 424 Views
Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Context-free languages. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Context-free grammar. A → 0 A 1 A → B B → #. A , B are variables. 0 , 1 , # are terminals.
E N D
Fall 2010 The Chinese University of Hong Kong CSCI 3130: Automata theory and formal languages Context-free languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Context-free grammar A → 0A1A → BB →# A, B are variables 0, 1, # are terminals A is the start variable A 0A1 00A11 000A111 000B111 000#111 this is a derivation
Context-free grammar • A context-free grammar (CFG) is (V, S, R, S) where • V is a finite set of variables or non-terminals • S is a finite set of terminals (VS = ) • R is a set of productions or substitution rules of the formwhere A is a variable V and a is a string of variables and terminals • S is a variable called the start variable A → a
The grammar of English ART NOUN PREP ART NOUN VERB ART NOUN CMPLX-NOUN CMPLX-NOUN CMPLX-NOUN NOUN-PHRASE PREP-PHRASE CMPLX-VERB VERB-PHRASE NOUN-PHRASE a girl with a flower likes the boy SENTENCE
The grammar of English ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB variables: SENTENCE, NOUN-PHRASE, … terminals: a, the, boy, girl, flower, likes, touches, sees, with start variable:SENTENCE This grammar describes (a part of) English
Derivations in English (10) (11) (12) (13) (14) (15) (16) (17) (18) ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB (1) (2) (3) (4) (5) (6) (7) (8) (9) SENTENCE NOUN-PHRASE VERB-PHRASE (1) CPLX-NOUN VERB-PHRASE (2) ARTICLE NOUN VERB-PHRASE (7) a NOUN VERB-PHRASE (10) a boy VERB-PHRASE (12) a boy CPLX-VERB (4) a boy VERB (9) a boy sees (17)
Grammars for programming languages bash-3.2$ python Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) >>> (2+3)*5 25 E E + E E E * E E (E) E 0 E 1 … E 9 E E * E (E)* E (E + E)* E (2+ E)* E (2+3)* E (2+3)*5 meaning: “add 2 and 3, and then multiply by 5” Variables:E Terminals:+*()0123456789
Notation and conventions E E + E E E * E E (E) E N N 0N N 1N N 0 N 1 Variables: E, N Terminals:+, *, (, ), 0, 1 Start variable: E conventions: shorthand: Variables in UPPERCASE E E + E | E * E | (E)| N N 0N | 1N | 0 | 1 Start variable comes first
Derivation • A derivation is a sequential application of productions: E E * E (E)* E (E)* N (E + E)*1 (E + E)*1 (E + N)*1 (N + N)*1 (N +1N)*1 (N +10)*1 (1+10)*1 E E + E | E * E | (E)| N N 0N | 1N | 0 | 1 derivation a b b obtained from a in one production * a b b obtained from a in zero or more productions * E (1+10)*1
Context-free languages • The language of a CFG is the set of all strings of terminals that can be derived from the start variable * L(G) = {w : w S* and S w } • Questions we will ask: I give you a CFG, what is the language? I give you a language, write a CFG for it
Analysis example 1 • Can you derive: A → 0A1 | BB →# L(G) = {0n#1n: n ≥ 0} 00#11 A 0A1 00A11 00B11 00#11 A B # # 00#111 No, there is an uneven number of 0s and 1s 00##11 No, there are too many #
Analysis example 1 • Can you derive: • What is the language of this CFG? A → 0A1 | BB →# variables: A, Bterminals: 0, 1, # start variable: A 00##11 00#11 # 00#111 L = {0n#1n: n ≥ 0}
Analysis example 2 • Can you derive S SS | (S) | • S (S) (2) • () (3) • S (S) • (SS) • ((S)S) • ((S)(S)) • (()(S)) • (()()) (()()) ()
Parse trees • A parse tree gives a more compact representation: S SS | (S) | S • S (S) • (SS) • ((S)S) • ((S)(S)) • (()(S)) • (()()) ) ( S S S ( ) ( ) S S (()()) e e
Parse trees • One parse tree can represent several derivations • S (S) • (SS) • ((S)S) • ((S)(S)) • (()(S)) • (()()) • S (S) • (SS) • (S(S)) • ((S)(S)) • (()(S)) • (()()) S ( ) S S S • S (S) • (SS) • ((S)S) • (()S) • (()(S)) • (()()) • S (S) • (SS) • (S(S)) • (S()) • ((S)()) • (()()) ( ) ( ) S S e e
Analysis example 2 • Can you derive S SS | (S) | No, because there is an uneven number of ( and ) (()() ())()) No, because there is a prefix with an excess of )
Analysis example 2 S SS | (S) | L(G) = {w: w has the same number of ( and ) no prefix of w has more )than(} S S S Parsing rules: Divide w up in blocks with same number of ( and ) S S S S Each block is in L(G) S S e e e Parse each block recursively ( ( ) ( ) ) ( )
Design example 1 L = {0n1n | n 0} These strings have recursive structure: 000000111111 0000011111 00001111 000111 0011 01 e 0S1| S
Design example 2 L = numbers without leading zeros 0, 109, 2, 23 e, 01, 003 allowed not allowed S → 0|LN 1052870032 N → ND|e any number N D → 0|L leading digit L L → 1|2|3|4|5|6|7|8|9
Design examples L = {0n1n0m1m | n 0, m 0} 010011 00110011 These strings have two parts: 000111 L = L1L2 L1 = {0n1n | n 0} S S1S1 L2 = {0m1m | m 0} S1 0S11| rules for L1: S1 0S11| L2 is the same asL1
Design examples L = {0n1m0m1n | n 0, m 0} 011001 0011 These strings have nested structure: 1100 00110011 outer part: 0n1n inner part: 1m0m S 0S1|I I 1I0|
Design examples L = {x: xhas two 0-blocks with same number of 0s} 01011, 001011001, 10010101001 01001000, 01111 allowed not allowed 10010011010010110 A: e, or ends in 1 initial part middle part final part C: e, or begins with 1 A B C
Design examples 10010011010010110 A: e, or ends in 1 C: e, or begins with 1 A B C U: any string S → ABC B has recursive structure: A → e | U1 00110100 U → 0U | 1U | e D C → e | 1U B → 0D0 | 0B0 same number of 0s at least one 0 D → 1U1 | 1 D: begins and ends in 1
Context-free versus regular • Write a CFG for the language (0 + 1)*111 • Can you do so for every regular language? • S U111U 0U | 1U | e Every regular language is context-free regularexpression NFA DFA
From regular to context-free CFG regular expression Æ grammar with no rules e S→ e S →a a (alphabet symbol) S→ S1 | S2 E1 + E2 S→ S1S2 E1E2 S→ SS1 | e E1* In all cases, S becomes the newstart symbol
Context-free versus regular • Is every context-free language regular? L = {0n1n: n ≥ 0} S → 0S1 | e Is context-free but not regular regular context-free