1 / 87

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 1

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 1. Mayer Goldberg and Roman Manevich Ben-Gurion University. Books. Compilers Principles, Techniques, and Tools Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman.

mareo
Download Presentation

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 1 Mayer Goldberg and Roman Manevich Ben-Gurion University

  2. Books CompilersPrinciples, Techniques, and ToolsAlfred V. Aho, Ravi Sethi, Jeffrey D. Ullman Modern Compiler Implementation in JavaAndrew W. Appel Modern Compiler DesignD. Grune, H. Bal, C. Jacobs, K. Langendoen Advanced Compiler Design and ImplementationSteven Muchnik

  3. Today • Understand role of syntax analysis • Context-free grammars • Basic definitions • Ambiguities • Top-down parsing • Predictive parsing • Next time: bottom-up parsing method

  4. The bigger picture Program consists of legal tokens Program included in a given context-free language Type checking, legal inheritance graph, variables initialized before used Memory safety: null dereference, array-out-of-bounds access, data races, assertion violation • Compilers include different kinds of program analyses each further constrains the set of legal programs • Lexical constraints • Syntax constraints • Semantic constraints • “Logical” constraints(Verifying Compiler grand challenge)

  5. Role of syntax analysis High-levelLanguage(scheme) LexicalAnalysis Syntax Analysis Parsing AST SymbolTableetc. Inter.Rep.(IR) CodeGeneration Executable Code • Recover structure from stream of tokens • Parse tree / abstract syntax tree • Error reporting (recovery) • Other possible tasks • Syntax directed translation (one pass compilers) • Create symbol table • Create pretty-printed version of the program, e.g., Auto Formatting function in Eclipse

  6. + num * num id From tokens to abstract syntax trees 5 + (7 * x) program text Lexical Analyzer Regular expressionsFinite automata token stream Grammar:E id E num E  E+EE  E*EE  (E ) Parser Context-free grammarsPush-down automata valid syntaxerror Abstract Syntax Tree

  7. Example grammar shorthand for Statement S  S;S S id :=E S print(L) E id E num E  E+E L  E L  L,E shorthand for Expression shorthand for List(of expressions)

  8. CFG terminology S  S;S S id :=E S print(L) E id E num E  E+E L  E L  L,E Symbols:Terminals(tokens): ; := ( )id num printNon-terminals: S E L Start non-terminal: SConvention: the non-terminal appearingin the first derivation rule Grammar productions (rules)N  α

  9. Language of a CFG • A sentence ω is in L(G) (valid program) if • There exists a corresponding derivation • There exists a corresponding parse tree

  10. Derivations • Show that a sentence ω is in a grammar G • Start with the start symbol • Repeatedly replace one of the non-terminals by a right-hand side of a production • Stop when the sentence contains only terminals • Given a sentence αNβ and rule NµαNβ => αµβ • ω is in L(G) if S =>* ω • Rightmost derivation • Leftmost derivation

  11. Leftmost derivation a := 56 ; b := 7 + 3 S S  S;S S id :=E S print(L) E id E num E  E+E L  E L  L,E => S ; S => id := E ; S => id := num ; S => id := num ; id := E => id := num ; id := E + E => id := num ; id := num + E => id := num ; id := num + num id := num ; id := num + num

  12. Rightmost derivation a := 56 ; b := 7 + 3 S S  S;S S id :=E S print(L) E id E num E  E+E L  E L  L,E => S ; S => S ; id := E => S ; id := E + E => S ; id := E + num => S ; id := num + num => id := E ; id := num + num => id := num ; id := num + num id := num ; id := num + num

  13. Parse trees N µ1 µk … Tree nodes are symbols, children ordered left-to-right Each internal node is non-terminal and its children correspond to one of its productions N  µ1 … µk Root is start non-terminal Leaves are tokens Yield of parse tree: left-to-right walk over leaves

  14. Parse tree example S  S;S S id :=E S print(L) E id E num E  E+E L  E L  L,E Draw parse tree for expression id := num ; id := num + num

  15. Parse tree example Order-independent representation S S  S;S S id :=E S print(L) E id E num E  E+E L  E L  L,E S S E E E E id := num ; id := num + num Equivalently add parentheses labeled by non-terminal names (S(Sa := (E56)E)S ; (Sb := (E(E7)E + (E3)E)E)S)S

  16. Capabilities and limitations of CFGs p. 173 • CFGs naturally express • Hierarchical structure • A program is a list of classes,A Class is a list of definition,A definition is either… • Beginning-end type of constraints • Balanced parentheses S  (S)S | ε • Cannot express • Correlations between unbounded strings (identifiers) • Variables are declared before use: ω S ω • Handled by semantic analysis

  17. Sometimes there are two parse trees Arithmetic expressions:E id E num E  E+EE  E*EE  (E ) 1 + 2 + 3 1 + (2 + 3) (1 + 2) + 3 E E E E E E E E E E num(1) + num(2) + num(3) num(1) + num(2) + num(3) Leftmost derivationEE + Enum + Enum + E +Enum + num +E num + num + num Rightmost derivationEE +EE +numE + E + numE + num + numnum + num + num

  18. Is ambiguity a problem? Arithmetic expressions:E id E num E  E+EE  E*EE  (E ) 1 + 2 + 3 1 + (2 + 3) (1 + 2) + 3 Depends on semantics E E E E E E E E E E num(1) + num(2) + num(3) = 6 num(1) + num(2) + num(3) = 6 Leftmost derivationEE + Enum + Enum + E +Enum + num +E num + num + num Rightmost derivationEE +EE +numE + E + numE + num + numnum + num + num

  19. Problematic ambiguity example Arithmetic expressions:E id E num E  E+EE  E*EE  (E ) 1 + 2 * 3 1 + (2 * 3) (1 + 2) * 3 This is what we usually want: * has precedence over + E E E E E E E E E E num(1) + num(2) * num(3) = 7 num(1) + num(2) * num(3) = 9 Leftmost derivationEE + Enum + Enum + E * Enum + num * E num + num * num Rightmost derivationEE * EE *numE + E * numE + num * numnum + num * num

  20. Ambiguous grammars • A grammar is ambiguous if there exists a sentence for which there are • Two different leftmost derivations • Two different rightmost derivations • Two different parse trees • Property of grammars, not languages • Some languages are inherently ambiguous – no unambiguous grammars exist • No algorithm to detect whether arbitrary grammar is ambiguous

  21. Drawbacks of ambiguous grammars • Ambiguous semantics • Parsing complexity • May affect other phases • Solutions • Transform grammar into non-ambiguous • Handle as part of parsing method • Using special form of “precedence” • Wait for bottom-up parsing lecture

  22. Transforming ambiguous grammars to non-ambiguous by layering Unambiguous grammar E  E + T E  T T  T * F T  F F  id F  num F  ( E ) Ambiguous grammarE  E + EE  E * EE  id E  num E  ( E ) Each layer takes care of one way of composing sub-strings to form a string:1: by +2: by *3: atoms Layer 1 Layer 2 Layer 3 Let’s derive 1 + 2 * 3

  23. Transformed grammar: * precedes + Unambiguous grammar E  E + T E  T T  T * F T  F F  id F  num F  ( E ) Ambiguous grammarE  E + EE  E * EE  id E  num E  ( E ) Parse tree E DerivationE=> E + T=> T + T=> F + T=> 1 + T=> 1 + T * F=> 1 + F * F=> 1 + 2 * F=> 1 + 2 * 3 E T T T F F F 1 + 2 * 3

  24. Transformed grammar: + precedes * Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) Ambiguous grammarE  E + EE  E * EE  id E  num E  ( E ) Parse tree E DerivationE=> E * T=> T * T=> T + F * T=> F + F * T=> 1 + F * T=> 1 + 2 * T=> 1 + 2 * F=> 1 + 2 * 3 E T T T F F F 1 + 2 * 3

  25. Another example for layering Unambiguous grammar S P S | ε P  ( S ) Ambiguous grammarP ε | P P | ( P ) Takes care of “concatenation” Takes care of nesting

  26. “dangling-else” example p. 174 Ambiguous grammar S ifEthenS S |ifEthenS else S | other This is what we usually want: match else to closest unmatched then ifE1then ifE2thenS1else S2 ifE1then (ifE2thenS1else S2) ifE1then (ifE2thenS1)else S2 S S if E then S else S if E then S E1 E1 S2 if E then S else S if E then S E2 S1 S2 E2 S1

  27. “dangling-else” example p. 174 Ambiguous grammar S ifEthenS S |ifEthenS else S | other Unambiguous grammar S  M | UM ifEthenM else M | otherU ifEthenS |ifEthenM else U Matched statements Unmatched statements ifE1then ifE2thenS1else S2 ifE1then (ifE2thenS1else S2) ifE1then (ifE2thenS1)else S2 S S if E then S else S if E then S E1 E1 S2 if E then S else S if E then S E2 S1 S2 E2 S1

  28. Broad kinds of parsers • Parsers for arbitrary grammars • Earley’s method, CYK method O(n3) • Not used in practice • Top-Down • Construct parse tree in a top-down matter • Find the leftmost derivation • Predictive: for every non-terminal and k-tokens predict the next production LL(k) • Preorder tree traversal • Bottom-Up • Construct parse tree in a bottom-up manner • Find the rightmost derivation in a reverse order • For every potential right hand side and k-tokens decide when a production is found LR(k) • Postorder tree traversal

  29. Top-down vs. bottom-up • Top-down parsing • Beginning with the start symbol, try to guess the productions to apply to end up at the user's program • Bottom-up parsing • Beginning with the user's program, try to apply productions in reverse to convert the program back into the start symbol

  30. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F F 1 + 2 * 3

  31. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) We need this rule to get the * E 1 + 2 * 3

  32. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T 1 + 2 * 3

  33. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T F 1 + 2 * 3

  34. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F 1 + 2 * 3

  35. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F F 1 + 2 * 3

  36. Top-down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F F 1 + 2 * 3

  37. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) 1 + 2 * 3

  38. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) F 1 + 2 * 3

  39. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) F F 1 + 2 * 3

  40. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) T F F 1 + 2 * 3

  41. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) T F F F 1 + 2 * 3

  42. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) T T F F F 1 + 2 * 3

  43. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) T T T F F F 1 + 2 * 3

  44. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E T T T F F F 1 + 2 * 3

  45. Bottom-up parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) E E T T T F F F 1 + 2 * 3

  46. Challenges in top-down parsing • Top-down parsing begins with virtually no • information • Begins with just the start symbol, which matches every program • How can we know which productions to apply? • In general, we can‘t • There are some grammars for which the best we can do is guess and backtrack if we're wrong • If we have to guess, how do we do it? • Parsing as a search algorithm • Too expensive in theory (exponential worst-case time) and practice

  47. Predictive parsing • Given a grammar G and a word w attempt to derive w using G • Idea • Apply production to leftmost nonterminal • Pick production rule based on next input token • General grammar • More than one option for choosing the next production based on a token • Restricted grammars (LL) • Know exactly which single rule to apply • May require some lookahead to decide

  48. Boolean expressions example E  LIT | (E OP E) | not E LIT true|false OP and | or | xor not ( not true or false ) production to apply known from next token E E E => notE => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not ( E OP E ) not LIT or LIT true false

  49. Recursive descent parsing • Define a function for every nonterminal • Every function work as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead

  50. Matching tokens E  LIT | (E OP E) | not E LIT true|false OP and | or | xor match(token t) { if (current == t) current = next_token() else error } Variable current holds the current input token

More Related