
Parsing Context-Free Grammars (CFG) and Ambiguous Grammars

This article explores the concept of parsing context-free grammars (CFG) and dealing with ambiguous grammars using examples and explanations.



Presentation Transcript


  1. CSE P501 – Compilers
     Parsing
     • Context-Free Grammars (CFG)
     • Ambiguous Grammars
     • Next
     Jim Hogg - UW - CSE - P501

  2. Parsing
     [Figure: compiler pipeline from Source to Target]
     • Front End: Scan (chars → tokens), Parse (tokens → AST), Convert (AST → IR)
     • 'Middle End': Optimize (IR → IR)
     • Back End: Select Instructions, Allocate Registers, Emit (IR → Machine Code)
     AST = Abstract Syntax Tree; IR = Intermediate Representation

  3. Valid Tokens != Valid Program
     MiniJava includes the following tokens (among many others):
     • class int [ ( . true < this ) + * ; while = if id ilit ! / new {
     So a MiniJavaScanner would happily accept the following program:
     • int ; = true { while ( x < true * if { or 123 ) goto count_99
     We rely on a MiniJavaParser to reject this kind of gibberish.
     But how do we specify what makes a valid MiniJava program?

  4. What is Parsing?
     • Analogous to parsing an English sentence
       • Analyze words into subject, verb, object, etc
     • How to parse a program?
       • Analyze tokens into language constructs: assignment, if-clause, function-call, while-loop, etc

  5. Parsing – so what’s the problem?
     • The set of valid programs is infinite
     • The set of invalid programs is infinite
     • Q: How to specify all valid programs, succinctly?
     • A: Define a Grammar
       • More specifically, a Context-Free Grammar (CFG)

  6. Context-Free Grammars (CFG)
     Grammar for the Hokum Language
     • Prog → Stm ; Prog | Stm
     • Stm → AsStm | IfStm
     • AsStm → Var = Exp
     • IfStm → if Exp then AsStm
     • VorC → Var | Const
     • Exp → VorC | VorC + VorC
     • Var → [a-z]
     • Const → [0-9]
     • Context-Free Grammar ~ CFG ~ Grammar ~ Backus-Naur Form ~ BNF
     • Productions, or Rules
     • Terminals & Non-Terminals; Start (Symbol)
     • Multiple languages present in the description

  7. Example Hokum Programs
     Hokum BNF Grammar
       Prog → Stm ; Prog | Stm
       Stm → AsStm | IfStm
       AsStm → Var = Exp
       IfStm → if Exp then AsStm
       VorC → Var | Const
       Exp → VorC | VorC + VorC
       Var → [a-z]
       Const → [0-9]
     Legal
       a = 1 ; b = a + 4 ; z = 1 ; if b + 3 then z = 2
     Illegal
       a = x < 20 b = a + 4 + 5 ; z = 1 if (a == 33) z < 2 ;
     But how do we know which programs are legal or illegal, in Hokum?
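One way to answer the question on slide 7 is to turn each Hokum production into a small function. The following is a minimal sketch, not part of the original deck: the class name `HokumRecognizer`, the tokenizer, and `is_legal` are hypothetical names chosen for illustration, and the recognizer only answers yes or no without building a tree.

```python
import re

def tokenize(text):
    # Lowercase words, digit runs, or any other single non-space character.
    return re.findall(r"[a-z]+|[0-9]+|\S", text)

def is_var(tok):
    return re.fullmatch(r"[a-z]", tok) is not None    # Var -> [a-z]

def is_const(tok):
    return re.fullmatch(r"[0-9]", tok) is not None    # Const -> [0-9]

class HokumRecognizer:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, accepts):
        tok = self.peek()
        if tok is None or not accepts(tok):
            raise SyntaxError(f"unexpected token {tok!r} at position {self.pos}")
        self.pos += 1

    def prog(self):      # Prog -> Stm ; Prog | Stm
        self.stm()
        if self.peek() == ";":
            self.pos += 1
            self.prog()

    def stm(self):       # Stm -> AsStm | IfStm
        if self.peek() == "if":
            self.if_stm()
        else:
            self.as_stm()

    def as_stm(self):    # AsStm -> Var = Exp
        self.expect(is_var)
        self.expect(lambda t: t == "=")
        self.exp()

    def if_stm(self):    # IfStm -> if Exp then AsStm
        self.expect(lambda t: t == "if")
        self.exp()
        self.expect(lambda t: t == "then")
        self.as_stm()

    def exp(self):       # Exp -> VorC | VorC + VorC
        self.vor_c()
        if self.peek() == "+":
            self.pos += 1
            self.vor_c()

    def vor_c(self):     # VorC -> Var | Const
        self.expect(lambda t: is_var(t) or is_const(t))

def is_legal(text):
    parser = HokumRecognizer(text)
    try:
        parser.prog()
    except SyntaxError:
        return False
    # Legal only if the whole input was consumed.
    return parser.pos == len(parser.toks)
```

For example, `is_legal("a = 1 ; b = a + 4 ; z = 1 ; if b + 3 then z = 2")` accepts the legal program from the slide, while `is_legal("b = a + 4 + 5 ;")` rejects it because Exp allows at most one `+`.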

  8. Derivation
     Prog → Stm ; Prog | Stm
     Stm → AsStm | IfStm
     AsStm → Var = Exp
     IfStm → if Exp then AsStm
     VorC → Var | Const
     Exp → VorC | VorC + VorC
     Var → [a-z]
     Const → [0-9]

     Prog
     => Stm ; Prog
     => AsStm ; Prog
     => Var = Exp ; Prog
     => a = Exp ; Prog
     => a = VorC ; Prog
     => a = Const ; Prog
     => a = 1 ; Prog
     => a = 1 ; Stm
     => a = 1 ; IfStm
     => a = 1 ; if Exp then AsStm
     => a = 1 ; if VorC + VorC then AsStm
     => a = 1 ; if Var + VorC then AsStm
     => a = 1 ; if a + VorC then AsStm
     => a = 1 ; if a + Const then AsStm
     => a = 1 ; if a + 1 then AsStm
     => a = 1 ; if a + 1 then Var = Exp
     => a = 1 ; if a + 1 then b = Exp
     => a = 1 ; if a + 1 then b = VorC
     => a = 1 ; if a + 1 then b = Const
     => a = 1 ; if a + 1 then b = 2

     • => versus →
     • Leftmost, rightmost, middlemost
     • Sentential Form & Sentence
     • What is a Context-Sensitive Grammar?
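The derivation on slide 8 can be replayed mechanically: at each step, replace the leftmost non-terminal with the right-hand side of one of its productions. The sketch below assumes a dictionary encoding of the Hokum grammar; `GRAMMAR`, `leftmost_step`, and `derive` are illustrative names, not anything defined in the course.

```python
import string

# Hokum grammar: non-terminal -> list of right-hand sides (space-separated symbols).
GRAMMAR = {
    "Prog":  ["Stm ; Prog", "Stm"],
    "Stm":   ["AsStm", "IfStm"],
    "AsStm": ["Var = Exp"],
    "IfStm": ["if Exp then AsStm"],
    "VorC":  ["Var", "Const"],
    "Exp":   ["VorC", "VorC + VorC"],
    "Var":   list(string.ascii_lowercase),
    "Const": list(string.digits),
}

def leftmost_step(form, rhs):
    """Rewrite the leftmost non-terminal of `form` (a list of symbols) using `rhs`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if rhs not in GRAMMAR[sym]:
                raise ValueError(f"{sym} -> {rhs} is not a production")
            return form[:i] + rhs.split() + form[i + 1:]
    raise ValueError("no non-terminal left to expand")

def derive(start, choices):
    """Return every sentential form of the leftmost derivation given by `choices`."""
    steps = [[start]]
    for rhs in choices:
        steps.append(leftmost_step(steps[-1], rhs))
    return steps

# The production choices made on the slide, in order.
SLIDE_CHOICES = [
    "Stm ; Prog", "AsStm", "Var = Exp", "a", "VorC", "Const", "1",
    "Stm", "IfStm", "if Exp then AsStm", "VorC + VorC", "Var", "a",
    "Const", "1", "Var = Exp", "b", "VorC", "Const", "2",
]

steps = derive("Prog", SLIDE_CHOICES)
sentence = " ".join(steps[-1])
```

Every intermediate list in `steps` is a sentential form; the last one contains only terminals and is therefore a sentence.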

  9. Parse Tree
     Prog → Stm ; Prog | Stm
     Stm → AsStm | IfStm
     AsStm → Var = Exp
     IfStm → if Exp then AsStm
     VorC → Var | Const
     Exp → VorC | VorC + VorC
     Var → [a-z]
     Const → [0-9]
     [Figure: parse tree for a = 1 ; if a + 1 then b = 2]
     Prog( Stm( AsStm( Var(a) = Exp( VorC( Const(1) ) ) ) ) ;
           Prog( Stm( IfStm( if Exp( VorC( Var(a) ) + VorC( Const(1) ) )
                             then AsStm( Var(b) = Exp( VorC( Const(2) ) ) ) ) ) ) )

  10. Junk Nodes in the Parse Tree
      [Figure: the same parse tree as on slide 9]

  11. AST (Abstract Syntax Tree)
      [Figure: AST for a = 1 ; if a + 1 then b = 2]
      Prog( =( Var:a, Const:1 ), IfStm( +( Var:a, Const:1 ), =( Var:b, Const:2 ) ) )

  12. Why Not Just RegEx?
      Try to invent a regex description for arithmetic expressions: single-character variable names; operators + - × and ÷
      [Note: red denotes terminals, below]
        v = [a-z]             // variable
        o = + | - | × | ÷     // operator
        v ( o v )*            // derives a + b × c
      ok, but now add ( )
        (? v (o v )? )*       // derives (a + b) × c
                              // but also gibberish like: a + b) × c (
      Almost every programming language includes such balanced pairs: ( ), { }, begin end.
      Conclusion: regex won’t work.
      More generally, regex correspond to DFAs. They can only ‘count’ pairs up to a finite limit.
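The point of slide 12 can be tried out directly. The sketch below is one possible Python rendering of the slide's second regex attempt (the exact pattern is an assumption, with `*` and `/` standing in for the multiply and divide operators), next to a depth counter that does reject unbalanced input. The counter is unbounded, which is exactly what a finite automaton lacks.

```python
import re

# Assumed rendering of the slide's second attempt: an optional "(", a variable,
# then any number of (operator, variable, optional ")") groups.
ATTEMPT = re.compile(r"\(?[a-z]([+\-*/][a-z]\)?)*")

def regex_accepts(expr):
    """True if the whole expression (spaces ignored) matches ATTEMPT."""
    return ATTEMPT.fullmatch(expr.replace(" ", "")) is not None

def balanced(expr):
    """True if every ")" closes an earlier "(" and no "(" is left open."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ")" with no matching "("
                return False
    return depth == 0
```

Here `regex_accepts("(a + b) * c")` and `regex_accepts("a + b) * c")` are both true, while `balanced("a + b) * c")` is false: the regex has no way to remember whether a parenthesis was opened.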

  13. Context-Sensitive Grammar?
      • All compiler work uses Context-Free Grammars, or CFGs
      • Why so-called? Alternatively:
      • What is a non-context-free grammar? (ie, a Context-Sensitive Grammar)
      • Suppose production B → β
      • CFG: we can replace B by β, no matter what
        • eg: α B γ => α β γ
      • CSG: we can replace B by β only in certain contexts. Ie, only when B is preceded and/or followed by certain strings
        • eg: c B  d 

  14. Example CSG
      The following CSG generates the language a^n b^n c^n for n >= 1
      • S → a b c
      •   | a S B c
      • c B → W B
      • W B → W X
      • W X → B X
      • B X → B c
      • b B → b b
      Note: CSGs will not be discussed further, nor examined, as part of P501
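To see the CSG of slide 14 at work, one derivation of aabbcc is S => aSBc => aabcBc => aabWBc => aabWXc => aabBXc => aabBcc => aabbcc. The sketch below searches for such derivations automatically. It relies on an observation that is not on the slide: no rule makes a sentential form shorter, so forms longer than the target can be discarded and the search always terminates. The names `RULES` and `derivable` are illustrative.

```python
from collections import deque

# The rules from the slide, written as (left-hand side, right-hand side) strings.
RULES = [
    ("S", "abc"), ("S", "aSBc"),
    ("cB", "WB"), ("WB", "WX"), ("WX", "BX"), ("BX", "Bc"),
    ("bB", "bb"),
]

def derivable(target):
    """Breadth-first search over sentential forms reachable from S."""
    seen = {"S"}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:                      # try every occurrence of lhs
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return False
```

With these rules, `derivable("aabbcc")` succeeds, while strings with mismatched counts such as `"aabbc"` are never reached.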

  15. Parsing
      • The syntax of most programming languages can be specified by a Context-Free Grammar or CFG
      • Parsing = "How to fill the gap between Start Symbol and Sentence"
      • L(G) = the language generated by G = the set of sentences generated by G
      • Parsing: Given G and a sentence w in L(G), construct the derivation (parse tree) for w in some order
      • As we parse, do something useful at each node in the tree

  16. "in some order"
      • Top-down
        • Start with the root
        • Traverse the parse tree depth-first; scan tokens, Left-to-right; create a Leftmost derivation
        • LL(k)
      • Bottom-up
        • Start at leaves and build up to the root; scan tokens, Left-to-right; create a Rightmost derivation (in reverse)
        • LR(k) and subsets: LALR(k) and SLR(k)

  17. "do something useful at each node"
      • Perform some semantic action:
        • Construct nodes of full parse tree (rare)
        • Construct abstract syntax tree (common)
        • Construct linear, lower-level representation
          • like assembler code
        • Generate target code on the fly
          • 1-pass compiler
          • not common in production compilers: poor code quality

  18. Context-Free Grammars – Formal Description
      • A grammar G is a tuple <N, T, P, S> where
        • N a finite set of Non-terminal symbols
        • T a finite set of Terminal symbols
        • P a finite set of Productions
          • A subset of N × (N ∪ T)*
        • S the start symbol, a distinguished element of N
          • If not specified otherwise, this is taken as the Non-Terminal on the LHS of the first production

  19. Standard Notations
      • a, b, c elements of T
      • A, B, C elements of N
      • w, x, y, z elements of T*
      • X, Y, Z elements of N ∪ T
      • α, β, γ elements of (N ∪ T)*
      • (A, α) ∈ P is written A → α

  20. Derivation Relations (1)
      • if B → β ∈ P then α B γ => α β γ
        • simply affirms G is context-free
      • A =>* α
        • denotes there is a chain of zero-or-more productions, starting with A, that generates α
        • reflexive, transitive closure of =>

  21. Derivation Relations (2)
      • if B → β ∈ P then w B γ =>lm w β γ
        • derives leftmost
        • the prefix before B is all terminals (by construction)
      • if B → β ∈ P then α B w =>rm α β w
        • derives rightmost
        • the prefix before B may include terminals and non-terminals
      • We will only be interested in leftmost and rightmost derivations – not random orderings

  22. Languages
      • All the sentences (strings of Terminals) I can generate from Non-Terminal A:
        • For A ∈ N, L(A) = { w | A =>* w }
      • All the sentences (strings of Terminals) I can generate from start symbol S:
        • If S is the start symbol of grammar G, define L(G) = L(S)

  23. Reduced Grammars
      • Grammar G is reduced iff for every production A → α in P there is some derivation S =>* x A z => x α z =>* xyz
        • ie, no production is useless
      • Convention: we will use only reduced grammars
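The definition on slide 23 can be checked with the usual two-pass computation, which is not spelled out in the deck: first find the non-terminals that can derive a string of terminals ("productive"), then find which of them are reachable from the start symbol through productions built only from productive symbols. The sketch assumes a grammar encoded as a dictionary from non-terminal to a list of right-hand-side tuples; `useless_productions` is a hypothetical helper name.

```python
def useless_productions(grammar, start):
    """Return the productions (lhs, rhs) that appear in no derivation S =>* xyz."""
    # Pass 1: productive non-terminals, ie those that derive some terminal string.
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                if all(s not in grammar or s in productive for s in rhs):
                    productive.add(lhs)
                    changed = True
                    break

    def usable(lhs, rhs):
        # Every symbol involved can eventually be turned into terminals.
        return lhs in productive and all(
            s not in grammar or s in productive for s in rhs)

    # Pass 2: non-terminals reachable from the start symbol via usable productions.
    reachable = set()
    work = []
    if start in productive:
        reachable.add(start)
        work.append(start)
    while work:
        lhs = work.pop()
        for rhs in grammar[lhs]:
            if usable(lhs, rhs):
                for s in rhs:
                    if s in grammar and s not in reachable:
                        reachable.add(s)
                        work.append(s)

    return [(lhs, rhs)
            for lhs, alternatives in grammar.items()
            for rhs in alternatives
            if not (lhs in reachable and usable(lhs, rhs))]

# A small made-up grammar: B never terminates and C is never reached.
EXAMPLE = {
    "S": [("A", "B"), ("a",)],
    "A": [("a",)],
    "B": [("B", "b")],
    "C": [("c",)],
}
useless = useless_productions(EXAMPLE, "S")
```

A grammar is reduced exactly when this function returns an empty list. In `EXAMPLE` only S → a survives, since A is reachable only through S → A B, which can never finish.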

  24. Ambiguous Grammars
      • Grammar G is unambiguous iff every w in L(G) has a unique leftmost (or rightmost) derivation
        • Fact: unique leftmost or unique rightmost implies the other
      • A grammar lacking this property is ambiguous
        • Note: other grammars that generate the same language may be unambiguous
        • So, "ambiguous" applies to a grammar – not a language
      • We need unambiguous grammars for parsing (well, mostly: see later)

  25. Example: Ambiguous Grammar
      Exp → Exp Op Exp | Dig
      Op → + | - | * | /
      Dig → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
      • Exercise: show that this is ambiguous
        • How? Show two different leftmost or rightmost derivations for the same string
        • Equivalently: show two different parse trees for the same string

  26. 2+3*4 – part 1
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Give a leftmost derivation of 2+3*4; show the parse tree
      Exp
      [Figure: parse tree built so far]

  27. 2+3*4 – part 2
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp
      [Figure: parse tree built so far]

  28. 2+3*4 – part 3
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp => Dig + Exp
      [Figure: parse tree built so far]

  29. 2+3*4 – part 4
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp => Dig + Exp => 2 + Exp
      [Figure: parse tree built so far]

  30. 2+3*4 – part 5
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp
      [Figure: parse tree built so far]

  31. 2+3*4 – part 6
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp
      [Figure: parse tree built so far]

  32. 2+3*4 – part 7
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp => 2 + 3 * Exp
      [Figure: parse tree built so far]

  33. 2+3*4 – part 8
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp => 2 + 3 * Exp => 2 + 3 * Dig
      [Figure: parse tree built so far]

  34. 2+3*4 – part 9
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Exp => Exp Op Exp => Dig Op Exp => 2 Op Exp => 2 + Exp => 2 + Exp Op Exp => 2 + Dig Op Exp => 2 + 3 Op Exp => 2 + 3 * Exp => 2 + 3 * Dig => 2 + 3 * 4
      [Figure: completed parse tree, with + at the root]

  35. 2+3*4 – part 10
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Give a different leftmost derivation of 2+3*4
      Exp => Exp * Exp => Exp + Exp * Exp => 2 + Exp * Exp => 2 + 3 * Exp => 2 + 3 * 4
      [Figure: completed parse tree, with * at the root]

  36. Are derivations equivalent?
      • Tree +( 2, *( 3, 4 ) ): Result = 2 + (3 * 4) = 14
      • Tree *( +( 2, 3 ), 4 ): Result = (2 + 3) * 4 = 20
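The ambiguity shown on slides 26 to 36 can be made concrete by enumerating every parse tree the grammar Exp → Exp Op Exp | Dig allows for a token string and evaluating each one. This is a brute-force sketch for illustration only; `parse_trees` and `evaluate` are hypothetical names, and trees are plain nested tuples.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parse_trees(tokens):
    """All parse trees of `tokens` under Exp -> Exp Op Exp | Dig.

    A tree is either an int (a Dig) or a tuple (op, left_tree, right_tree).
    """
    if len(tokens) == 1:
        tok = tokens[0]
        return [int(tok)] if len(tok) == 1 and tok.isdigit() else []
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((tokens[i], left, right))
    return trees

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

trees_234 = parse_trees(list("2+3*4"))
values_234 = [evaluate(t) for t in trees_234]
values_567 = [evaluate(t) for t in parse_trees(list("5-6-7"))]
```

Two trees come back for each input, and they evaluate to different results: 14 and 20 for 2+3*4, and 6 and -8 for 5-6-7.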

  37. Another example
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      • Give two different derivations of 5 - 6 - 7

  38. Another example
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Give two different rightmost derivations of 5 - 6 - 7
      • Exp => Exp - Exp => Exp - Exp - Exp => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7
        Tree -( 5, -( 6, 7 ) ): result = 6
      • Exp => Exp - Exp => Exp - 7 => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7
        Tree -( -( 5, 6 ), 7 ): result = -8

  39. Another example
      Exp → Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | Dig
      Dig → [0-9]
      Give two different leftmost derivations of 5 - 6 - 7
      • Exp => Exp - Exp => 5 - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7
        Tree -( 5, -( 6, 7 ) ): result = 6
      • Exp => Exp - Exp => Exp - Exp - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7
        Tree -( -( 5, 6 ), 7 ): result = -8

  40. What went wrong?
      • Grammar did not capture precedence or associativity
        • Eg: 2 + (3 * 4) = 14 versus (2 + 3) * 4 = 20
        • Eg: 5 - (6 - 7) = 6 versus (5 - 6) - 7 = -8
      • Solution
        • Create a non-terminal for each level of precedence
        • Isolate the corresponding part of the grammar
        • Force the parser to recognize higher precedence sub-expressions first

  41. Classic Expression Grammar
      exp → exp + term | exp - term | term
      term → term * factor | term / factor | factor
      factor → int | ( exp )
      int → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

      Abbreviated:
      E → E + T | E - T | T
      T → T * F | T / F | F
      F → ( E ) | D
      D → [0-9]

  42. Derive 2 + 3 * 4
      E → E + T | E - T | T
      T → T * F | T / F | F
      F → ( E ) | D
      D → [0-9]
      E => E + T => E + T * F => E + T * D => E + T * 4 => E + F * 4 => E + D * 4 => E + 3 * 4 => T + 3 * 4 => F + 3 * 4 => D + 3 * 4 => 2 + 3 * 4
      Tree +( 2, *( 3, 4 ) ): Result = 2 + (3 * 4) = 14
      This grammar yields the correct, expected (school algebra) result

  43. Derive 5 - 6 - 7
      E → E + T | E - T | T
      T → T * F | T / F | F
      F → ( E ) | D
      D → [0-9]
      E => E - T => E - F => E - D => E - 7 => E - T - 7 => E - F - 7 => E - D - 7 => E - 6 - 7 => T - 6 - 7 => F - 6 - 7 => D - 6 - 7 => 5 - 6 - 7
      Tree -( -( 5, 6 ), 7 ): result = -8
      • This grammar yields the correct, expected (school algebra) result
      • Note how left-recursive rules yield left-associativity
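The layered E / T / F grammar maps naturally onto one function per non-terminal. A hand-written top-down parser cannot call itself on a left-recursive rule such as E → E + T without looping forever, so the sketch below uses the common rewrite, assumed here rather than taken from the slides, of a loop that folds operands from left to right. This keeps the left-associativity the left-recursive rules express. The function name `evaluate` is illustrative.

```python
import re

def evaluate(text):
    """Evaluate `text` according to the E / T / F / D grammar of slide 41."""
    tokens = re.findall(r"[0-9]|[-+*/()]", text)   # D is a single digit
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():      # E -> E + T | E - T | T, as a left-folding loop
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():      # T -> T * F | T / F | F, as a left-folding loop
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():    # F -> ( E ) | D
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result
```

Because `term()` is called from inside `expr()`, multiplication binds tighter than addition, so `evaluate("2 + 3 * 4")` gives 14 and `evaluate("5 - 6 - 7")` gives -8, matching slides 42 and 43.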

  44. Classic Example of Ambiguous Grammar
      • Grammar for conditional statements
        stm → if ( cond ) stm | if ( cond ) stm else stm
      • Exercise: show that this is ambiguous
        • How? “The Dangling Else” - a 'weakness' in C, Pascal, etc

  45. Two Derivations
      stm → if ( cond ) stm | if ( cond ) stm else stm
      Sentence: if (cond) if (cond) stm else stm
      [Figure: two parse trees for the sentence]
      • Tree 1: the outer stm is if ( cond ) stm; its inner stm is if ( cond ) stm else stm (else pairs with the inner if)
      • Tree 2: the outer stm is if ( cond ) stm else stm; its first inner stm is if ( cond ) stm (else pairs with the outer if)

  46. Solving the Dangling Else
      • Fix the grammar to separate if statements with else clause from those without
        • Done in Java reference grammar
        • Adds lots of non-terminals
      • Use some ad-hoc rule in parser
        • “else matches closest unpaired if”
      • Change the language
        • Only possible if you 'own' the language
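The ad-hoc rule “else matches closest unpaired if” falls out of a top-down parser that greedily consumes an `else` as soon as it has finished the then-branch. The sketch below is a simplified illustration: the token names `if`, `cond`, `stm`, and `else` are assumptions, with `cond` standing for the whole parenthesised condition and `stm` for any non-if statement.

```python
def parse_stm(tokens, pos=0):
    """Parse one statement starting at `pos`; return (tree, next_pos).

    Grammar: stm -> if cond stm | if cond stm else stm | 'stm'
    Trees are "stm" or tuples ("if", then_tree) / ("if", then_tree, else_tree).
    """
    if pos < len(tokens) and tokens[pos] == "stm":
        return "stm", pos + 1
    if pos + 1 < len(tokens) and tokens[pos] == "if" and tokens[pos + 1] == "cond":
        then_branch, pos = parse_stm(tokens, pos + 2)
        # Greedy choice: an else directly after the then-branch belongs to THIS if,
        # which is the closest one that does not yet have an else.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stm(tokens, pos + 1)
            return ("if", then_branch, else_branch), pos
        return ("if", then_branch), pos
    raise SyntaxError(f"unexpected input at position {pos}")

tree, end = parse_stm("if cond if cond stm else stm".split())
```

For the sentence from slide 45 this yields `("if", ("if", "stm", "stm"))`: the else is attached to the inner if, and the outer if has no else.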

  47. Resolving Ambiguity with Grammar (1)
      Stm → IfElse | IfNoElse
      IfElse → if ( Exp ) IfElse else IfElse
      IfNoElse → if ( Exp ) Stm
               | if ( Exp ) IfElse else IfNoElse
      • formal, no additional rules beyond syntax
      • sometimes obscures original grammar

  48. Resolving Ambiguity with Grammar (2)
      • If you can (re-)design the language, avoid the problem entirely
        IfStm → if Exp then Stm end
              | if Exp then Stm else Stm end
      • formal, clear, elegant
      • allows sequence of Stms in then and else branches, no { } needed
      • extra end required for every if

  49. Parser Tools and Operators
      • Most parser tools cope with ambiguous grammars
        • Earlier productions chosen before later ones
        • Longest match used if there is a choice
        • Makes life simpler if used with discipline
        • But be sure the tool does what you really want
      • Specify operator precedence & associativity
        • Allows simpler, ambiguous grammar with fewer non-terminals
        • Used in CUP
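The idea of keeping the simple, ambiguous grammar and declaring precedence and associativity separately can be illustrated with a precedence-climbing parser. This is not how CUP works internally (CUP is an LR parser generator); it is only a hand-written sketch of the same idea, in which a hypothetical `OPERATORS` table plays the role of the declarations.

```python
# Hypothetical declaration table: operator -> (precedence, associativity).
OPERATORS = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
}

def parse(tokens):
    """Parse single-digit operands and operators into a nested-tuple tree."""
    pos = 0

    def expression(min_prec):
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a digit at position {pos}")
        left = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            prec, assoc = OPERATORS[op]
            if prec < min_prec:
                break               # this operator belongs to an enclosing call
            pos += 1
            # A left-associative operator must not absorb another operator of
            # equal precedence on its right, so the right side starts one level up.
            right = expression(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = expression(1)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

tree_234 = parse(list("2+3*4"))
tree_567 = parse(list("5-6-7"))
```

The table alone decides which of the two candidate trees is built: `tree_234` groups as 2 + (3 * 4) and `tree_567` as (5 - 6) - 7.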

  50. Next
      • Next
        • LR (bottom-up / shift-reduce) parsing
      • Reading
        • Continue Cooper & Torczon chapter 3
      • Note
        • LR parsing is the toughest session in P501
