Chapter 5 Context-Free Grammars

Chapter 5Context-Free Grammars

Grammar (Review) • A grammar is a 4-tuple G = (NT, T, P, S), where • NT: a finite set of non-terminal symbols • T: a finite set of terminal (alphabet) symbols • S: is the starting symbol S NT • P: a finite set of production rules: x→y • x (NT  T)+y (NT  T)* • Given w = uxv and production x → y • Apply rule replacing x with y to obtain z = uyv (written as w z) • Derivation of wT*: • S w1··· wnw (written as S *w) • The language generated by the grammar is • L(G) = {wT* : S *w}

Linear and Regular Grammars (Review) Given a grammar G = (NT, T, P, S) let A and B NT and wT* A grammar is linear if at most one NT is on the right side of any rule A right-linear grammar has productions of the form A → w, or A → wB A left-linear grammar has productions of the form A → w, or A →Bw A regular grammar is either right-linear or left-linear A regular grammar is linear but not all linear grammars are regular. For example, grammar S → aSa | bSb | a | b | , is linear but not regular.

5.1: Context-Free Grammars (1) Context-Free Grammars (CFG) are language-description mechanisms used to generate the strings of a language A grammar is said to be context-free if every rule has a single non-terminal on the left-hand side A A NT: single non-terminal on the left hand side  (NT  T)*: string of terminals and non-terminals A language L is called context-free language (CFL) if and only if there is a context-free grammar G (CFG) that generates it (i.e., L = L(G))

5.1: Context-Free Grammars (2) The term context-free is based on the fact that the substitution of a NT on the left can be made anytime that NT appears in a sentential form, i.e., it does not depend on the rest of the sentential form (the context). This means you can apply the rule in any context. You might imagine a context-sensitive rule aaAb → aayb where the substitution of the variable depends on its context. In the example, we can only replace A with y when it is in the given context.

5.1: Context-Free Grammars (3) Example 5.1 G = ({S}, {a, b}, S, {S → aSa | bSb | }) context-free? linear? regular? typical derivation might be: S aSa aaSaa aabSbaa aabbaa Grammar generates language L(G)={wwR: w  {a, b}*}

5.1: Context-Free Grammars (4) Example 5.2 G = ({S, A, B}, {a, b}, S, {S → abB A → aaBb B → bbAa A → }) context-free? linear? regular? grammar generates language L(G) = {ab (bbaa)n bba (ba)n: n ≥ 0} See Examples 5.3 and 5.4

5.1: Context-Free Grammars (5) What is the language of this grammar? G = ({S}, {a, b}, S, {S →aSb, S → ab}) L(G) = {anbn | n 1} What is the language of this grammar? G = ({S}, {a, b}, S, {S→aSb, S→e}) L(G) = {anbn | n 0}

5.1: Context-Free Grammars (6) What is the language of this grammar? G = ({S, A}, {a, b}, S, {S→aSa | aAa A→bA | b } L(G) = {anbman | n 1, m 1} What is the language of this grammar? G1 = ({S, A, B}, {a, b}, S, {S →AB A→aA | a B→bB | e } G2 = ({S, B}, {a, b}, S, {S→aS | aB B→bB | e } L(G) = {a+b*} = {anbm | n 1, m 0}

5.1: Context-Free Grammars (7) What is the language of this grammar? G = ({S, B}, {a, b, c}, S, {S→abScB | e B→bB | b } L = {(ab)n(cbm)n | n 0, m 1} What is the grammar of this language? L = {anbmcmdn | n 1, m 1} G = ({S, A}, {a, b, c, d}, S, { S→aSd | aAd A→bAc | bc } What is the grammar of this language? L = {anbmcmd2n | n 0, m 1} G = ({S, A}, {a, b, c, d}, S, { S→aSdd | A A→bAc | bc }

5.1: Context-Free Grammars (8) What is the grammar of this language? L = {a*ba*ba*} G1 = ({S, A}, {a, b}, S, {S→AbAbA A→aA | e } G2 = ({S, A, C}, {a, b}, S, {S→aS | bA A→aA | bC } C→aC | e

5.1: Context-Free Grammars (9) What is the grammar of this language? A language over {a, b} with at least 2 b’s G1 = ({S, A}, {a, b}, S, {S→AbAbA A→aA | bA | e } G2 = ({S, A, C}, {a, b}, S, {S→aS | bA A→aA | bC C→aC | bC | e }

5.1: Context-Free Grammars (10) What is the grammar of this language? A language over {a, b} of even-length strings G = ({S, O}, {a, b}, S, {S→aO | bO| e O→aS | bS } What is the grammar of this language? A language over {a, b} of an even no. of b’s G = ({S, B, C}, {a, b}, S, { S→aS | bA| e A→aA | bS }

5.1: Context-Free Grammars (11) What is the grammar of this language? A language over {a, b, c} that do not contain the substring abc G = ({S, B, C}, {a, b, c}, S, {S→bS | cS| aA | e A→aA | bB | cS | e B→aA | bS | e}

5.1: Context-Free Grammars (12) Write a CFG for the following Languages L1 = {wcwR | w {a, b}*} S aSa | bSb | c L2 = {anbncmdm| n 1, m  1} S XY X  aXb | ab Y  cYd | cd

5.1: Context-Free Grammars (13) The following languages are NOT CFLs L1 = {wcw | w {a, b}*} L2 = {anbmcndm | n 1, m  1} L3 = {anbncn | n 0} G = ({S}, {a, b}, S, {S →aSb | SS | }) is this grammar context-free? yes, there is a single variable on the left hand side is this grammar linear? no, a rule has more than one variable on the right hand side

5.1: Context-Free Grammars (14) Are regular languages context-free? Why? yes, because context-free means that there is a single variable on the left side of each rule. All regular languages are generated by grammars that have a single variable on the left side of each grammar rule. But, not all context-free grammars are regular. Regular languages are a proper subset of the class of context-free languages. Regular grammars are a proper subset of context-free grammars. Non-regular languages can be generated by context-free grammars, so the term context-free generally includes non-regular languages and regular languages.

5.1: Context-Free Grammars (15)Leftmost & Rightmost Derivations • Given the grammar • S → aAB, A → bBb, B → A |  • String abbbb can be derived in different ways: • Left-most derivation: • always replace the leftmost NT in each sentential form • S aAB abBbB abAbB abbBbbB abbbbB abbbb • Righ-tmost derivation: • always replace the rightmost NT in each sentential form • S aAB aA abBb abAb abbBbb abbbb

5.1: Context-Free Grammars (16) DEFINITION  derives in one step  derives in  one step  indicates that the derivation utilizes the rules of grammar G + * G

5.1: Context-Free Grammars (17)   lm rm Derivation Order Leftmost: Replace the leftmost non-terminal symbol E E op E  id op E  id *E  id * id Rightmost: Replace the rightmost non-terminal symbol E E op E  E opid  E*id  id * id lm lm lm lm rm rm rm rm Important Notes:A y If xAzxyz , what’s true about x ? terminals If xAzxyz , what’s true about z ? terminals

5.1: Context-Free Grammars (18) Example • S  S + E | E • E  number | ( S ) • Left-most derivation • S  S+E  E+E  (S)+E  (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E (1+2+(S))+E (1+2+(S+E))+E (1+2+(E+E))+E (1+2+(3+E))+E (1+2+(3+4))+E (1+2+(3+4))+5 • Right-most derivation • S  S+E E+5  (S)+5  (S+E)+5 (S+(S))+5 (S+(S+E))+5 (S+(S+4))+5 (S+(E+4))+5 (S+(3+4))+5 (S+E+(3+4))+5 (S+2+(3+4))+5 (E+2+(3+4))+5 (1+2+(3+4))+5

5.1: Context-Free Grammars (19)Derivation (Parsing) Trees Given the grammar S → aaSB | , B → bB | b A leftmost derivation S aaSB aaB aab A rightmost derivation S aaSB aaSb aab Both derivations correspond to the parse (or derivation) tree above.

5.1: Context-Free Grammars (20) In a parse tree, the nodes labeled with NTs correspond to the left side of a production rule and the children of that node correspond to the right side of the rule, e.g., S → aaSB. The tree structure shows the rule that is applied to each NT (for a specific derivation), without showing the order of rule application. Each internal node of the tree corresponds to a NT, and the leaves of the derivation tree represent the string of terminals. Note the tree applies to a specific derivation and may not include all rules, e.g., B → bB is not shown above.

5.1: Context-Free Grammars (21) Definition 5.3: Let G = (NT, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G if and only if it has the following properties: 1. the root is labeled S 2. every leaf has a label from T {} 3. every non-leaf (interior) vertex has a label from NT. 4. if a vertex has label A NT, and its children are labeled (left to right) a1, a2, ..., an, then P contain a production of the form A → a1a2…an 5. a leaf labeled has no siblings; that is, a vertex with a child labeled can have no other children Partial derivation tree: a tree having properties 3–5, but for which property 1 may not hold, and for which property 2 is replaced with: 2a. every leaf has a label from NT T {}

5.1: Context-Free Grammars (22) S  S +E E+E  (S)+E  (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E … (1+2+(3+4))+E (1+2+(3+4))+5 Derivation  Parse Tree S Parse Tree S + E S  S + E | E E  number | ( S ) E 5 ( S ) S + E ( S ) S + E E S + E 2 4 1 E 3 Derivation

5.1: Context-Free Grammars (23) E E E E E E E E op op op op E E E E id * id id * E  E op E E  E op E | ( E ) | - E | idop  + | - | * | / |   id op E id  id * E  id * id

5.1: Context-Free Grammars (24) • A derivation tree for grammar G yields a sentence of the language L(G) • A partial derivation tree yields a sentential form • The yield of a parse tree is the string of leaf symbols obtained by reading the tree left-to-right • the order they are encountered when the tree is traversed in a depth-first manner, always taking the leftmost unexplored branch

5.1: Context-Free Grammars (25) • Given the grammar • S → aAB A → bBbB → A | , • the yield of the partial derivation tree is the sentential form abBbB

5.1: Context-Free Grammars (26) • Given the grammar • S → aAB A → bBbB → A | , • the yield of the derivation tree is the sentence abbbb L(G).

5.1: Context-Free Grammars (27) • Theorem 5.1 (Relationship between Sentential Forms & Derivation Trees) • Let G = (NT, T, S, P) be a CFG. Then • for every w L(G), there exists a derivation tree of G whose yield is w. • the yield, w, of any derivation tree of G is such that w L(G). • if tGis any partial derivation tree for G whose root is labeled S, then the yield of tGis a sentential form of G. • As a side note, any w L(G) has a leftmost and a rightmost derivation.

5.1: Context-Free Grammars (28) • Given G = ({S, A, B}, {a, b}, S, {S→abB, A→aaBb|, B→bbAa}), • show L(G) = {ab(bbaa)nbba(ba)n : n ≥ 0}. • Basis: S abB abbbAa abbba =ab(bbaa)0bba(ba)0 • Inductive hypothesis: assume S *ab(bbaa)ibba(ba)i • Inductive step: if we “back up” one step and rather than terminating the hypothesis step via an A → substitution, we continue to expand, we get: • S ⇒*ab(bbaa)ibbAa(ba)i ab(bbaa)ibba(ba)i •  ab(bbaa)ibbaaBba(ba)i •  ab(bbaa)i+1B(ba)i+1 •  ab(bbaa)i+1bbAa(ba)i+1 •  ab(bbaa)i+1bba(ba)i+1 • By induction S *ab(bbaa)nbba(ba)n

5.1: Context-Free Grammars (29) • Given G = ({S, A, B}, {a, b}, S, { S → abB, A → aaBb | , B → bbAa}) • a leftmost derivation tree for w = abbbaabbaba.

5.1: Context-Free Grammars (30) • This grammar only has leftmost or rightmost derivations since there can never be more than one variable on the right hand side of the sentential forms. • Even so, to do a leftmost derivation from the tree, we simply expand the leftmost variables first starting from the root. The result is shown below • S abB abbbAa abbbaaBba abbbaabbAaba abbbaabbaba

In practical applications, it is usually not enough to decide whether a string belongs to a language. It is also important to know how to derive the string from the language. Parsing uncovers the syntactical structure of a string, which is represented by a parse tree. The syntactical structure is important for assigning semantics to the string -- for example, if it is a computer program. 5.2: Parsing and Ambiguity (1)

5.2: Parsing and Ambiguity (2) • Application of parsing (Compilers) • Let G be a context-free grammar for the C language. Let the string w be a C program. • One thing a compiler does - in particular, the part of the compiler called the “parser” - is determine whether w is a syntactically correct C program. • It also constructs a parse tree for the program that is used in code generation. • There are many sophisticated and efficient algorithms for parsing. You may study them in more advanced classes (for example, on compilers).

5.2: Parsing and Ambiguity (3) • S-grammars • Definition 5.4: A context-free grammar G = (NT, T, S, P) is said to be a simple grammar or s-grammar if all of its productions are of the form • A → ax where A  NT, a  T, x NT*, and any pair (A, a) occurs at most once in P. • Examples: • S → aS | bSS | c is an s-grammar. • S → aS| bSS | aSB| c is not an s-grammar. • because pair (S, a) occurs in two productions: • S → aS, and S → aSB

5.2: Parsing and Ambiguity (4) • Example: • given s-grammar G with production • S → aS | bSS | c • show the derivation of the string w = abcc • Since G is an s-grammar, • the only way to get the a up front is via rule • S → aS • the only way to get the b is via rule • S → bSS • the only way to get each c is via rule • S → c • Thus, we have parsed the string in 4 steps as • S aS abSS abcS abcc

5.2: Parsing and Ambiguity (5) • Find an s-grammar for • aaa*b | b • We need to start with an a or have only the b. • S → aA • S → b • A → aB • B → aB • B → b

5.2: Parsing and Ambiguity (6) • A terminal string may be generated by a number of different derivations • Let G be a CFG. A string w is in L(G), iff there is • a leftmost derivation of w from S. (S w) • Question • Is there a unique leftmost derivation of every sentence (string) in the language of a grammar? • NO * lm

5.2: Parsing and Ambiguity (7) id id id E E E E * + E E Consider the expression grammar: E  E+E | E*E | (E) | -E | id Two different leftmost derivations of id + id * id E E+ E  id+ E  id+E* E  id+id* E  id+id* id

5.2: Parsing and Ambiguity (8) E E E * E E + E id id id Consider the expression grammar: E  E+E | E*E | (E) | -E | id Two different leftmost derivations of id + id * id E E* E  E+ E * E  id +E* E  id + id *E  id + id * id

5.2: Parsing and Ambiguity (9)Ambiguity • Definition 5.5: • a grammar G is ambiguous if there is a string with at least two parse trees, • two or more leftmost or rightmost derivations. • Example: CFG G with productions S → aSb | SS | is ambiguous because there are two parse tree for w = aabb as shown below.

5.2: Parsing and Ambiguity (10) • A CFG is ambiguous if there is a string w L(G) that can be derived by two distinct leftmost derivation • A grammar G is ambiguous if there exists a sentence in G with more than one derivation (parsing) tree • A grammar that is not ambiguous is called unambiguous • If G is ambiguous then L(G) is not necessarily ambiguous • A language L is inherently ambiguous if there is no unambiguous grammar that generates it

5.2: Parsing and Ambiguity (11) • Let G be S →aS | Sa | a • G is ambiguous since the string aa has 2 distinct leftmost derivation • S aS  aa • S Sa  aa • L(G) = a+ • This language is also generated by the unambiguous grammar S→aS | a • L(G) is not ambiguous. Why?

5.2: Parsing and Ambiguity (12) • Let G be S →bS | Sb | aL(G) = b*ab* • G is ambiguous since the string bab has 2 distinct leftmost derivation • S bS  bSb  bab • S Sb  bSb  bab • The ability to generate the b’s in either order must be eliminated to obtain an unambiguous grammar • This language is also generated by the unambiguous grammars G2: S→bS | A A→Ab | a G1: S→bS | aA A→bA | e

5.2: Parsing and Ambiguity (13) 1 3 2 E E E E + * E E E E E * E E + E 3 1 2 • Eliminating Ambiguity • Consider the following grammar • E  E + E | E * E | num • Consider the sentence 1 + 2 * 3 • Leftmost Derivation 1 • E  E + E  1 + E  1 + E * E  1 + 2 * E  1 + 2 * 3 • Leftmost Derivation 2 • E E * E  E + E * E  1 + E * E  1 + 2 * E  1 + 2 * 3

5.2: Parsing and Ambiguity (14) • Different parse trees correspond to different evaluations (meaning) LMD-1 LMD-2 * + + * 1 2 3 1 2 3 = 9 = 7

5.2: Parsing and Ambiguity (15) • Ambiguous: E →E + E | E * E | number • Both + and * have the same precedence • To remove ambiguity, you have to give + and * different precedence • Let us say * has higher precedence than + • E E + T | T • T T*num | num

5.2: Parsing and Ambiguity (16) 3 2 E num T * E + T T num num 1 Eliminating Ambiguity • E E + T | T • T T * num | num • Consider the sentence 1 + 2 * 3 • Leftmost Derivation (only one LMD) • E  E + T  T + T  num + T  1 + T  1 + T * num  1 + num * num  1 + 2 * num  1 + 2 * 3 = 7

5.2: Parsing and Ambiguity (17) 3 2 E E * T 1 • Let us say + has higher precedence than * • E  E * T | T • T T+ num | num • Consider the sentence 1 + 2 * 3 • Leftmost Derivation (only one LMD) • E  E * T  T * T  T + num * T  num + num * T  1 + num * T  1 + 2 * T  1 + 2 * num  1 + 2 * 3 T num num T + num = 9

Chapter 5 Context-Free Grammars