1 / 26

CSCI 3130: Automata theory and formal languages

Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Context-free languages. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Context-free grammar. A → 0 A 1 A → B B → #. A , B are variables. 0 , 1 , # are terminals.

elma
Download Presentation

CSCI 3130: Automata theory and formal languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2010 The Chinese University of Hong Kong CSCI 3130: Automata theory and formal languages Context-free languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Context-free grammar A → 0A1A → BB →# A, B are variables 0, 1, # are terminals A is the start variable A 0A1 00A11 000A111 000B111 000#111 this is a derivation

  3. Context-free grammar • A context-free grammar (CFG) is (V, S, R, S) where • V is a finite set of variables or non-terminals • S is a finite set of terminals (VS = ) • R is a set of productions or substitution rules of the formwhere A is a variable V and a is a string of variables and terminals • S is a variable called the start variable A → a

  4. The grammar of English ART NOUN PREP ART NOUN VERB ART NOUN CMPLX-NOUN CMPLX-NOUN CMPLX-NOUN NOUN-PHRASE PREP-PHRASE CMPLX-VERB VERB-PHRASE NOUN-PHRASE a girl with a flower likes the boy SENTENCE

  5. The grammar of English ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB variables: SENTENCE, NOUN-PHRASE, … terminals: a, the, boy, girl, flower, likes, touches, sees, with start variable:SENTENCE This grammar describes (a part of) English

  6. Derivations in English (10) (11) (12) (13) (14) (15) (16) (17) (18) ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB (1) (2) (3) (4) (5) (6) (7) (8) (9) SENTENCE  NOUN-PHRASE VERB-PHRASE (1)  CPLX-NOUN VERB-PHRASE (2)  ARTICLE NOUN VERB-PHRASE (7)  a NOUN VERB-PHRASE (10)  a boy VERB-PHRASE (12)  a boy CPLX-VERB (4)  a boy VERB (9)  a boy sees (17)

  7. Grammars for programming languages bash-3.2$ python Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) >>> (2+3)*5 25 E  E + E E  E * E E (E) E 0 E  1 … E  9 E  E * E (E)* E  (E + E)* E (2+ E)* E (2+3)* E (2+3)*5 meaning: “add 2 and 3, and then multiply by 5” Variables:E Terminals:+*()0123456789

  8. Notation and conventions E  E + E E  E * E E  (E) E N N  0N N  1N N  0 N  1 Variables: E, N Terminals:+, *, (, ), 0, 1 Start variable: E conventions: shorthand: Variables in UPPERCASE E  E + E | E * E | (E)| N N  0N | 1N | 0 | 1 Start variable comes first

  9. Derivation • A derivation is a sequential application of productions: E  E * E  (E)* E  (E)* N  (E + E)*1  (E + E)*1  (E + N)*1  (N + N)*1  (N +1N)*1  (N +10)*1  (1+10)*1 E  E + E | E * E | (E)| N N  0N | 1N | 0 | 1 derivation a  b b obtained from a in one production * a  b b obtained from a in zero or more productions * E  (1+10)*1

  10. Context-free languages • The language of a CFG is the set of all strings of terminals that can be derived from the start variable * L(G) = {w : w  S* and S  w } • Questions we will ask: I give you a CFG, what is the language? I give you a language, write a CFG for it

  11. Analysis example 1 • Can you derive: A → 0A1 | BB →# L(G) = {0n#1n: n ≥ 0} 00#11 A 0A1 00A11 00B11 00#11 A B # # 00#111 No, there is an uneven number of 0s and 1s 00##11 No, there are too many #

  12. Analysis example 1 • Can you derive: • What is the language of this CFG? A → 0A1 | BB →# variables: A, Bterminals: 0, 1, # start variable: A 00##11 00#11 # 00#111 L = {0n#1n: n ≥ 0}

  13. Analysis example 2 • Can you derive S  SS | (S) |  • S  (S) (2) •  () (3) • S  (S) •  (SS) •  ((S)S) •  ((S)(S)) •  (()(S)) •  (()()) (()()) ()

  14. Parse trees • A parse tree gives a more compact representation: S  SS | (S) |  S • S  (S) •  (SS) •  ((S)S) •  ((S)(S)) •  (()(S)) •  (()()) ) ( S S S ( ) ( ) S S (()()) e e

  15. Parse trees • One parse tree can represent several derivations • S  (S) •  (SS) •  ((S)S) •  ((S)(S)) •  (()(S)) •  (()()) • S  (S) •  (SS) •  (S(S)) •  ((S)(S)) •  (()(S)) •  (()()) S ( ) S S S • S  (S) •  (SS) •  ((S)S) •  (()S) •  (()(S)) •  (()()) • S  (S) •  (SS) •  (S(S)) •  (S()) •  ((S)()) •  (()()) ( ) ( ) S S e e

  16. Analysis example 2 • Can you derive S  SS | (S) |  No, because there is an uneven number of ( and ) (()() ())()) No, because there is a prefix with an excess of )

  17. Analysis example 2 S  SS | (S) |  L(G) = {w: w has the same number of ( and ) no prefix of w has more )than(} S S S Parsing rules: Divide w up in blocks with same number of ( and ) S S S S Each block is in L(G) S S e e e Parse each block recursively ( ( ) ( ) ) ( )

  18. Design example 1 L = {0n1n | n 0} These strings have recursive structure: 000000111111 0000011111 00001111 000111 0011 01 e 0S1|  S 

  19. Design example 2 L = numbers without leading zeros 0, 109, 2, 23 e, 01, 003 allowed not allowed S → 0|LN 1052870032 N → ND|e any number N D → 0|L leading digit L L → 1|2|3|4|5|6|7|8|9

  20. Design examples L = {0n1n0m1m | n 0, m 0} 010011 00110011 These strings have two parts: 000111 L = L1L2 L1 = {0n1n | n 0} S  S1S1 L2 = {0m1m | m 0} S1 0S11|  rules for L1: S1 0S11|  L2 is the same asL1

  21. Design examples L = {0n1m0m1n | n 0, m 0} 011001 0011 These strings have nested structure: 1100 00110011 outer part: 0n1n inner part: 1m0m S  0S1|I I  1I0| 

  22. Design examples L = {x: xhas two 0-blocks with same number of 0s} 01011, 001011001, 10010101001 01001000, 01111 allowed not allowed 10010011010010110 A: e, or ends in 1 initial part middle part final part C: e, or begins with 1 A B C

  23. Design examples 10010011010010110 A: e, or ends in 1 C: e, or begins with 1 A B C U: any string S → ABC B has recursive structure: A → e | U1 00110100 U → 0U | 1U | e D C → e | 1U B → 0D0 | 0B0 same number of 0s at least one 0 D → 1U1 | 1 D: begins and ends in 1

  24. Context-free versus regular • Write a CFG for the language (0 + 1)*111 • Can you do so for every regular language? • S  U111U  0U | 1U | e Every regular language is context-free regularexpression NFA DFA

  25. From regular to context-free CFG regular expression Æ grammar with no rules e S→ e S →a a (alphabet symbol) S→ S1 | S2 E1 + E2 S→ S1S2 E1E2 S→ SS1 | e E1* In all cases, S becomes the newstart symbol

  26. Context-free versus regular • Is every context-free language regular? L = {0n1n: n ≥ 0} S → 0S1 | e Is context-free but not regular regular context-free

More Related