Discussion #3: Grammar Formalization & Parse-Tree Construction
Topics • Grammar Definitions • Parse Trees • Constructing Parse Trees
Formal Definition of a Grammar A grammar G is a 4-tuple: G = (VN, VT, S, Φ), where • VN, VT are sets of non-terminal and terminal symbols • S ∈ VN is a start symbol • Φ is a finite set of relations from (VT ∪ VN)+ to (VT ∪ VN)* • an element of Φ, (α, β), is written α → β and is called a production rule or a rewriting rule
Definition of a Context-Free Grammar • A context-free grammar is a grammar with the following restriction: • The relation Φ is a finite set of relations from VN to (VT ∪ VN)+ • i.e. the left-hand side of a production is a single non-terminal • i.e. the right-hand side of any production cannot be empty • Context-free grammars generate context-free languages. With slight variations, the syntax of essentially all programming languages can be described by context-free grammars.
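The two definitions can be made concrete with a small data structure. The following is a minimal sketch, not part of the original slides: the names `Grammar` and `is_context_free` are illustrative, and the identifier grammar from the later examples is abbreviated to three letters and three digits.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    nonterminals: frozenset  # VN
    terminals: frozenset     # VT
    start: str               # S
    productions: tuple       # Φ, as (lhs, rhs) pairs of symbol tuples


def is_context_free(g: Grammar) -> bool:
    """True if every left-hand side is a single non-terminal and no right-hand side is empty."""
    return all(
        len(lhs) == 1 and lhs[0] in g.nonterminals and len(rhs) >= 1
        for lhs, rhs in g.productions
    )


# Identifier grammar with a shortened alphabet (assumption: only a, b, c and 0, 1, 2).
identifier = Grammar(
    nonterminals=frozenset("ILD"),
    terminals=frozenset("abc012"),
    start="I",
    productions=(
        (("I",), ("L",)), (("I",), ("I", "D")), (("I",), ("I", "L")),
        (("L",), ("a",)), (("L",), ("b",)), (("L",), ("c",)),
        (("D",), ("0",)), (("D",), ("1",)), (("D",), ("2",)),
    ),
)

# A rule with two symbols on the left is allowed by the general definition,
# but it violates the context-free restriction.
not_cf = identifier._replace(
    productions=identifier.productions + ((("I", "D"), ("D", "I")),)
)

print(is_context_free(identifier))  # True
print(is_context_free(not_cf))      # False
```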
Examples of Grammars (again) Which are context-free grammars?
Backus-Naur Form (BNF) • A traditional metalanguage to represent grammars for programming languages • Every non-terminal is enclosed in < and > • Instead of the symbol → we use ::= • Example • I → L | ID | IL • L → a | b | … | z • D → 0 | 1 | … | 9 • BNF: • <I> ::= <L> | <I><D> | <I><L> • <L> ::= a | b | … | z • <D> ::= 0 | 1 | … | 9
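The rewriting into BNF is mechanical, which a few lines of code can show. This is a sketch under assumptions: the function name `to_bnf` and the dictionary layout are illustrative, every symbol is a single character, and the alphabets are cut down to two letters and two digits.

```python
def to_bnf(productions: dict) -> list:
    """Render each rule as BNF: non-terminals in angle brackets, ::= instead of the arrow."""
    lines = []
    for lhs, alternatives in productions.items():
        rendered = [
            "".join(f"<{s}>" if s in productions else s for s in alt)
            for alt in alternatives
        ]
        lines.append(f"<{lhs}> ::= " + " | ".join(rendered))
    return lines


# Shortened identifier grammar (assumption: only letters a, b and digits 0, 1).
PRODUCTIONS = {"I": ["L", "ID", "IL"], "L": ["a", "b"], "D": ["0", "1"]}

for line in to_bnf(PRODUCTIONS):
    print(line)
# <I> ::= <L> | <I><D> | <I><L>
# <L> ::= a | b
# <D> ::= 0 | 1
```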
Definition: Direct Derivative Let G = (VN, VT, S, Φ) be a grammar and α, β ∈ (VN ∪ VT)*. β is said to be a direct derivative of α (written α ⇒ β) if there are strings φ1 and φ2 (including possibly empty strings) such that α = φ1Bφ2, β = φ1γφ2, B ∈ VN and B → γ is a production of G.
Example: Direct Derivatives G = (VN, VT, S, Φ), where: VN = {I, L, D} VT = {a, b, …, z, 0, 1, …, 9} S = I Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }
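One way to get a feel for the definition is to compute every direct derivative of a string. The sketch below assumes each symbol is one character and uses a hypothetical helper name, `direct_derivatives`; it replaces exactly one occurrence of a non-terminal B by γ for each production B → γ.

```python
PRODUCTIONS = {
    "I": ["L", "ID", "IL"],
    "L": list("abcdefghijklmnopqrstuvwxyz"),
    "D": list("0123456789"),
}


def direct_derivatives(alpha: str) -> set:
    """All strings beta such that alpha => beta in the identifier grammar."""
    result = set()
    for i, symbol in enumerate(alpha):
        # alpha = phi1 + symbol + phi2; terminals have no productions and are skipped
        for gamma in PRODUCTIONS.get(symbol, []):
            result.add(alpha[:i] + gamma + alpha[i + 1:])
    return result


print(sorted(direct_derivatives("I")))   # ['ID', 'IL', 'L']
print("IDD" in direct_derivatives("ID"))  # True, using I -> ID
print(direct_derivatives("a1"))           # set(): no non-terminal left to rewrite
```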
Definition: Derivation Let G = (VN, VT, S, Φ) be a grammar. A string α produces β (β reduces to α, or β is the derivation of α, written α ⇒+ β) if there are strings φ0, φ1, …, φn (n > 0) such that α = φ0 ⇒ φ1, φ1 ⇒ φ2, …, φn-1 ⇒ φn, φn = β.
Example: Derivation • Let G = (VN, VT, S, Φ), where: VN = {I, L, D} VT = {a, b, …, z, 0, 1, …, 9} S = I Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 } • I produces abc12: I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12
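The derivation of abc12 can be checked step by step: each string must be a direct derivative of the one before it. A minimal sketch, with the helper names `is_direct_derivative` and `is_derivation` chosen for illustration and one character per symbol assumed:

```python
PRODUCTIONS = {
    "I": ["L", "ID", "IL"],
    "L": list("abcdefghijklmnopqrstuvwxyz"),
    "D": list("0123456789"),
}


def is_direct_derivative(alpha: str, beta: str) -> bool:
    """True if beta results from rewriting one non-terminal of alpha with one production."""
    for i, symbol in enumerate(alpha):
        for gamma in PRODUCTIONS.get(symbol, []):
            if alpha[:i] + gamma + alpha[i + 1:] == beta:
                return True
    return False


def is_derivation(steps: list) -> bool:
    """True if steps[0] => steps[1] => ... => steps[-1] with at least one step (n > 0)."""
    return len(steps) >= 2 and all(
        is_direct_derivative(a, b) for a, b in zip(steps, steps[1:])
    )


steps = ["I", "ID", "IDD", "ILDD", "ILLDD", "LLLDD",
         "aLLDD", "abLDD", "abcDD", "abc1D", "abc12"]
print(is_derivation(steps))           # True
print(is_derivation(["I", "abc12"]))  # False: not reachable in a single step
```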
Definition: Language • A sentential form is any derivative of the start symbol S. • A language L generated by a grammar G is the set of all sentential forms whose symbols are all terminals; that is, L(G) = { σ | S ⇒+ σ and σ ∈ VT* }
Example: Language • Let G = (VN, VT, S, Φ), where: VN = {I, L, D} VT = {a, b, …, z, 0, 1, …, 9} S = I Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 } • I produces abc12: I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12 • L(G) = {abc12, x, m934897773645, a1b2c3, …}
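L(G) is infinite, but the sentences up to a given length can be enumerated by exploring sentential forms breadth-first. This is a sketch under assumptions: the alphabet is reduced to the letters a, b and digits 0, 1 to keep the output small, and `language_up_to` is an illustrative name. Because no right-hand side is empty, forms never shrink, so any form longer than the limit can be discarded.

```python
from collections import deque

# Shortened identifier grammar (assumption: only letters a, b and digits 0, 1).
PRODUCTIONS = {"I": ["L", "ID", "IL"], "L": ["a", "b"], "D": ["0", "1"]}


def language_up_to(max_len: int, start: str = "I") -> set:
    """All terminal-only sentential forms of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        if not any(s in PRODUCTIONS for s in form):
            sentences.add(form)  # only terminals left: a sentence of L(G)
            continue
        for i, symbol in enumerate(form):
            for gamma in PRODUCTIONS.get(symbol, []):
                nxt = form[:i] + gamma + form[i + 1:]
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return sentences


print(sorted(language_up_to(2)))
# ['a', 'a0', 'a1', 'aa', 'ab', 'b', 'b0', 'b1', 'ba', 'bb']
```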
Syntax Analysis: Parsing • The parse of a sentence is the construction of a derivation for that sentence • The parsing of a sentence results in • acceptance or rejection • and, if acceptance, then also a parse tree • We are looking for an algorithm to parse a sentence (i.e. to parse a program) and produce a parse tree.
Parse Trees • A parse tree is composed of • interior nodes representing syntactic categories (non-terminal symbols) • leaf nodes representing terminal symbols • For each interior node N, the transition from N to its children represents the application of a production.
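A parse tree maps directly onto a small recursive data structure. The sketch below is illustrative only (the `Node` class and both helper functions are assumptions, not from the slides); it builds the tree for the identifier a1 and reads back both the sentence at the leaves and the production applied at each interior node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # non-terminal (interior) or terminal (leaf)
    children: list = field(default_factory=list)  # empty for leaf nodes


def tree_yield(node: Node) -> str:
    """Concatenate the leaves from left to right."""
    if not node.children:
        return node.symbol
    return "".join(tree_yield(child) for child in node.children)


def productions_used(node: Node) -> list:
    """List the production applied at each interior node, in pre-order."""
    if not node.children:
        return []
    used = [(node.symbol, "".join(child.symbol for child in node.children))]
    for child in node.children:
        used += productions_used(child)
    return used


# Parse tree for "a1": I -> ID, then I -> L, L -> a, D -> 1
tree = Node("I", [Node("I", [Node("L", [Node("a")])]), Node("D", [Node("1")])])

print(tree_yield(tree))        # a1
print(productions_used(tree))  # [('I', 'ID'), ('I', 'L'), ('L', 'a'), ('D', '1')]
```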
Parse Tree Construction • Top-down • Starts with the root (starting symbol) • Proceeds downward to leaves using productions • Bottom-up • Starts from leaves • Proceeds upward to the root • Although these seem like reasonable approaches to develop a parsing algorithm, we’ll see that neither works well so we’ll need to find a better way.
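Example: Top-Down Parse for 4 * 2 + 3 • VN = {E, D} • VT = {0, 1, …, 9, +, –, *, /, (, )} • S = E • Φ = { E → D | ( E ) | E + E | E – E | E * E | E / E, D → 0 | 1 | … | 9 } • [Parse tree: the root E expands by E → E * E; the left E derives D → 4; the right E expands by E → E + E, whose operands derive D → 2 and D → 3, so the tree groups the input as 4 * (2 + 3)] • Problems: • How do we guess which rule applies? • Note that we produced the wrong parse tree (precedence is wrong)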
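Ambiguous Grammar: Two Different Parse Trees for 4*2+3 • Φ = { E → D | ( E ) | E + E | E – E | E * E | E / E, D → 0 | 1 | … | 9 } • [Parse tree 1: the root E expands by E → E + E; the left E expands by E → E * E with operands D → 4 and D → 2; the right E derives D → 3, i.e. (4 * 2) + 3] • [Parse tree 2: the root E expands by E → E * E; the left E derives D → 4; the right E expands by E → E + E with operands D → 2 and D → 3, i.e. 4 * (2 + 3)]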
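The ambiguity can be observed by brute force: enumerate every parse tree the grammar allows for a token list. This is a sketch, not an efficient parser. It ignores the parenthesis rule, uses the ASCII minus sign, and represents trees as nested tuples, all of which are assumptions made for brevity.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def parses(tokens: list) -> list:
    """All parse trees for tokens under E -> D | E op E (parentheses omitted)."""
    if len(tokens) == 1 and len(tokens[0]) == 1 and tokens[0].isdigit():
        return [int(tokens[0])]  # E -> D, D -> digit
    trees = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:  # try this operator as the root: E -> E op E
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Compute the value a tree assigns to the expression."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


trees = parses("4 * 2 + 3".split())
for tree in trees:
    print(tree, "=", evaluate(tree))
# ('*', 4, ('+', 2, 3)) = 20
# ('+', ('*', 4, 2), 3) = 11
```

Two trees with two different values for the same sentence is exactly why the ambiguity matters in practice.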
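Example: Bottom-Up Parse • A → V | I | (A + A) | (A * A) • V → L | VL | VD • I → D | ID • D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • L → x | y | z • Input: ( ( z * ( x + y ) ) + 1 2 ) • Reductions, from the input up to the root: ( ( z * ( x + y ) ) + 1 2 ), then ( ( L * ( L + L ) ) + D D ), then ( ( V * ( V + V ) ) + I D ), then ( ( A * ( A + A ) ) + I ), then ( ( A * A ) + A ), then ( A + A ), then A • Problem: I ?? D (how do we know to reduce only the first D of D D to I?) • Problem: scanning the entire program repeatedly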
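Both problems show up in a naive implementation. The sketch below is a brute-force illustration, not a practical parser: it rescans the whole string for any right-hand side, tries every possible reduction, and backtracks when a choice (such as reducing both digits to I) leads to a dead end. The function name and the space-free input format are assumptions.

```python
RULES = (
    [("A", "V"), ("A", "I"), ("A", "(A+A)"), ("A", "(A*A)"),
     ("V", "L"), ("V", "VL"), ("V", "VD"),
     ("I", "D"), ("I", "ID")]
    + [("D", d) for d in "0123456789"]
    + [("L", c) for c in "xyz"]
)


def reduces_to_start(form: str, start: str = "A", failed=None) -> bool:
    """Try every reduction at every position; remember forms already known to fail."""
    if failed is None:
        failed = set()
    if form == start:
        return True
    if form in failed:
        return False
    for lhs, rhs in RULES:
        pos = form.find(rhs)  # scans the entire string for each rule
        while pos != -1:
            reduced = form[:pos] + lhs + form[pos + len(rhs):]
            if reduces_to_start(reduced, start, failed):
                return True
            pos = form.find(rhs, pos + 1)
    failed.add(form)
    return False


print(reduces_to_start("((z*(x+y))+12)"))  # True
print(reduces_to_start("((z*(x+y))+12"))   # False: missing closing parenthesis
```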
So, how do we develop a parsing algorithm? • “Fix” the grammar • So that we can go top down, left to right, with no backup • LL(1) grammar: Left-to-right, Left-most non-terminal, one symbol look ahead • “Fix” (How?) • Observe grammar properties: determine what’s needed to make them LL(1) • Transform grammars to make them LL(1) • Note: works for many grammars, but not all
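To show what "top down, left to right, with no backup" can look like, here is a sketch of a one-symbol-lookahead parser for arithmetic expressions. The grammar it follows is an assumption and does not come from the slides: the expression grammar is layered into expr, term and factor, and left recursion is replaced by repetition. Trees are nested tuples and the minus sign is ASCII.

```python
def parse(tokens: list):
    """Predictive parse: each decision looks only at the next token, and never backs up."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():  # expr: term followed by any number of (+ or -) term
        tree = term()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, term())
        return tree

    def term():  # term: factor followed by any number of (* or /) factor
        tree = factor()
        while peek() in ("*", "/"):
            op = advance()
            tree = (op, tree, factor())
        return tree

    def factor():  # factor: a single digit, or a parenthesized expr
        token = peek()
        if token is not None and len(token) == 1 and token.isdigit():
            return int(advance())
        if token == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("4 * 2 + 3".split()))      # ('+', ('*', 4, 2), 3)
print(parse("4 * ( 2 + 3 )".split()))  # ('*', 4, ('+', 2, 3))
```

With this layering, 4 * 2 + 3 has exactly one tree and multiplication binds tighter than addition.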