320 likes | 536 Views
Fall 2009. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. Context-free languages. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Context-free grammar. This is an a different model for describing languages
E N D
Fall 2009 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages Context-free languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Context-free grammar • This is an a different model for describing languages • The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g. • Using these rules, we can derive strings like this: A → 0A1A → BB → # A, B are variables 0, 1, # are terminals A is the start variable A 0A1 00A11 000A111 000B111 000#111
ART NOUN PREP ART NOUN VERB ART NOUN CMPLX-NOUN CMPLX-NOUN CMPLX-NOUN NOUN-PHRASE PREP-PHRASE CMPLX-VERB VERB-PHRASE NOUN-PHRASE SENTENCE Natural languages • CFGs were first used for natural languages a girl with a flower likes the boy
Some examples • We can describe parts of English like this: ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB variables: SENTENCE, NOUN-PHRASE, … terminals: a, the, boy, girl, flower, likes, touches, sees, with start variable: SENTENCE
Derivations (10) (11) (12) (13) (14) (15) (16) (17) (18) ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB (1) (2) (3) (4) (5) (6) (7) (8) (9) SENTENCE NOUN-PHRASE VERB-PHRASE (1) CPLX-NOUN VERB-PHRASE (2) ARTICLE NOUN VERB-PHRASE (7) a NOUN VERB-PHRASE (10) a boy VERB-PHRASE (12) a boy CPLX-VERB (4) a boy VERB (9) a boy sees (17)
Programming languages • Context-free grammars are also used to describe (parts of) programming languages • For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG E E + E E E * E E (E) E 0 E 1 … E 9 Variables:E Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9
Motivation for studying CFGs • Context-free grammars are essential for understanding the meaning of computer programs code: (2 + 3) * 5 E E * E (E) * E (E + E) * E (2 + E) * E (2 + 3) * E (2 + 3) * 5 meaning: “add 2 and 3, and then multiply by 5”
Definition of context-free grammar • A context-free grammar (CFG) is a 4-tuple (V, S, R, S) where • V is a finite set of variables or non-terminals • S is a finite set of terminals (VS = ) • R is a set of productions or substitution rules of the formwhere A is a variable V and a is a string with variables and terminals • S is a variable called the start variable A → a
Shorthand notation for productions • When we have multiple productions with the same variable on the left likewe can write this in shorthand as E E + E E E * E E (E) E N N 0N N 1N N 0 N 1 Variables: E, N Terminals: +, *, (, ), 0, 1 Start variable: E E E + E | E * E | (E)| 0 | 1 N 0N | 1N | 0 | 1
Derivation • A derivation is a sequential application of productions: E E * E (E) * E (E) * N (E + E ) * N (E + E ) * 1 (E + N) * 1 (N + N) * 1 (N + 1N) * 1 (N + 10) * 1 (1 + 10) * 1 a b means b can be obtainedfrom a with one production derivation * a b means b can be obtainedfrom a after zero or moreproductions
Language of a CFG • The language of a CFG is the set of all strings of terminals that can be derived from the start variable • Such languages are called context-free * L(G) = { | S* and S }
Example 1 • Can you derive: • What is the language of this CFG? A → 0A1 | BB → # variables: A, Bterminals: 0, 1, # start variable: A 00#11 00#111 00##11 L = {0n#1n: n ≥ 0}
Example 2 S SS | (S) | • Give derivations of (), (()()) • How about ())? convention: variables in uppercase, terminals in lowercase, start variable first • S (S) (2) • () (3) • S (S) • (SS) • ((S)S) • ((S)(S)) • (()(S)) • (()())
Design examples L = {0n1n | n 0} These strings have recursive structure: 000000111111 0000011111 00001111 000111 0011 01 e 0S1| S
Design examples L = numbers without leading zeros 0, 109, 2, 23 e, 01, 003 allowed not allowed S → 0|LN 1052870032 N → NA|e any number A → 0|L leading digit L → 1|2|3|4|5|6|7|8|9
Design examples L = {0n1n0m1m | n 0, m 0} These strings have two parts: L = L1L2 L1 = {0n1n | n 0} S S1S1 L2 = {0m1m | m 0} S1 0S11| rules for L1: S1 0S11| L2 is the same asL1
Design examples L = {0n1m0m1n | n 0, m 0} These strings have nested structure: outer part: 0n1n inner part: 1m0m S 0S1|I I 1I0|
Design examples L = {x: xhas two 0-blocks with same number of 0s} 01011, 001011001, 10010101001 01001000, 01111 allowed not allowed 10010011010010110 A: e, or ends in 1 initial part middle part final part C: e, or begins with 1 A B C
Design examples 10010011010010110 A: e, or ends in 1 C: e, or begins with 1 A B C S → ABC B has recursive structure: A → e | U1 00110100 U → 0U | 1U | e D C → e | 1U B → 0D0 | 0B0 at least one 0 D → 1U1 | 1
Context-free versus regular • Write a CFG for the language (0 + 1)*111 • Can you do so for every regular language? • Proof: • S A111A e | 0A | 1A Every regular language is context-free regularexpression NFA DFA
From regular to context-free CFG regular expression Æ grammar with no rules e S→ e S →a a (alphabet symbol) S→ S1 | S2 E1 + E2 S→ S1S2 E1E2 S→ SS1 | e E1* In all cases, S becomes the newstart symbol
Context-free versus regular • Is every context-free language regular? • No! We already saw some examples: • This language is context-free but not regular L = {0n1n: n ≥ 0} S → 0S1 | e
Parse tree • Derivations can also be represented using parse trees E E E + E | E - E | (E) | V V x | y | z E + E E E + E V + E x + E x + (E) x + (E - E) x + (V - E) x + (y - E) x + (y - V) x + (y - z) V ( E ) x - E E V V y z
Left derivation • Always derive the leftmost variable first: • Corresponds to a left-to-right traversal of parse tree E E E + E V + E x + E x + (E) x + (E - E) x + (V - E) x + (y - E) x + (y - V) x + (y - z) E + E V ( E ) - x E E V V y z
Ambiguity • A grammar is ambiguous if some strings have more than one parse tree • Example: E E + E | E - E | (E) | V V x | y | z E E E + E E + E x + y + z E + E V V E + E V V z x V V x y y z
Why ambiguity matters • The parse tree represents the intended meaning: E E E + E E + E x + y + z E + E V V E + E V V z x V V x y y z “first add y and z, and then add this to x” “first add x and y, and then add z to this”
Why ambiguity matters • Suppose we also had multiplication: E E + E | E - E | E E | (E) | V V x | y | z E E E + E E * E x y + z E E V V E + E V V z x V V x y y z “first y + z, then x ” “first x y, then + z”
Examples • Which of these are ambiguous: L = numbers without leading 0s L = {0n1n | n 0} S → 0|LN S 0S1| N → NA|e A → 0|P L = {0n1n0m1m | n, m 0} L → 1|2|3|4|5|6|7|8|9 S S1S1 S1 0S11| L = {x: xhas two 0-blocks with same number of 0s} L = {0n1m0m1n | n, m 0} B → 0D0 | 0B0 S → ABC D → 1U1 | 1 A → e | U1 S 0S1|I U → 0U | 1U | e C → e | 1U I 1I0|
Disambiguation • Sometimes we can rewrite the grammar to remove the ambiguity • Rewrite grammar so cannot be broken by +: E E + E | E - E | E E | (E) | V V x | y | z E T | E + T | E - TT F | T F F (E) | V V x | y | z T stands for term:x * (y + z)F stands for factor: x, (y + z) A term always splits into factors A factor is either a variable or a parenthesized expression
Disambiguation • Example E E T | E + T | E - TT F | T F F (E) | V V x | y | z E T T F F T F V V V x y + z
Disambiguation • In general, disambiguation is not possible • There exist inherently ambiguous languages:Every CFG for this language is ambiguous • There is no general procedure that can tell if a grammar is ambiguous • However, grammars used in programming languages can typically be disambiguated
Analysis example • What is the language? • Is it ambiguous? e ab baba abbbaa a bba • S aB | bA • A a | aS | bAA • B b | bS | aBB