Syntax Definition in Programming Languages: Context-Free Grammars

Learn about context-free grammars, hierarchical structures, and productions in programming languages. Explore examples illustrating grammar rules, parse trees, and ambiguity. Understand operator associativity, precedence, and syntax-directed translation.

omcloud

Presentation Transcript


  1. Chapter 2 A Simple One-Pass Compiler Yu-Chen Kuo

  2. 2.1 Overview • A programming language is specified by: • What its programs look like (syntax: specified with context-free grammars) • What its programs mean (semantics: more difficult to specify)

  3. 2.2 Syntax Definition • Context-free grammar • A grammar describes the hierarchical structure of programming-language constructs • stmt → if ( expr ) stmt else stmt • Such a rule is called a production • tokens: if, (, ), else • nonterminals: expr, stmt

  4. Context-free Grammar • A set of tokens (terminals) • Digits • Signs (+, -, <, =) • Keywords such as if, while • A set of nonterminals • A set of productions • Each production has the form nonterminal → sequence of terminals and/or nonterminals • left side → right side • One nonterminal is designated the start symbol; by convention its productions are listed first

  5. Example 2.1: Grammar for expressions such as ‘9-5+2’ • list → list + digit • list → list - digit • list → digit • digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • The three list productions can be combined: list → list + digit | list - digit | digit • nonterminals: list (the start symbol), digit • terminals (tokens): + - 0 1 2 3 4 5 6 7 8 9
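Every string in the language of Example 2.1 is a digit followed by zero or more sign-digit pairs. So membership can be checked with a single left-to-right scan. The C sketch below does this. The function name `in_list_language` is illustrative and does not come from the slides. The sketch assumes the input contains no blanks.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if s is generated by
 *   list  -> list + digit | list - digit | digit
 *   digit -> 0 | 1 | ... | 9
 * Assumes s contains no blanks. */
bool in_list_language(const char *s)
{
    if (!isdigit((unsigned char)*s))      /* the innermost list is list -> digit */
        return false;
    s++;
    while (*s == '+' || *s == '-') {      /* each list + digit or list - digit step */
        s++;
        if (!isdigit((unsigned char)*s))
            return false;
        s++;
    }
    return *s == '\0';                     /* nothing may follow the last digit */
}
```

For example, `in_list_language("9-5+2")` is true, while `"9-"` and `"95"` are rejected.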

  6. Example 2.1: Grammar for expressions such as ‘9-5+2’ • Token strings are derived by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of one of its productions • The empty string of tokens is written ε • The set of all token strings that can be derived from the start symbol forms the language defined by the grammar
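As a worked example, 9-5+2 belongs to the language of Example 2.1 because it can be derived from the start symbol list. One derivation, which always replaces the leftmost nonterminal, is:

```latex
\begin{aligned}
\mathit{list} &\Rightarrow \mathit{list} + \mathit{digit} \\
&\Rightarrow \mathit{list} - \mathit{digit} + \mathit{digit} \\
&\Rightarrow \mathit{digit} - \mathit{digit} + \mathit{digit} \\
&\Rightarrow 9 - \mathit{digit} + \mathit{digit} \\
&\Rightarrow 9 - 5 + \mathit{digit} \\
&\Rightarrow 9 - 5 + 2
\end{aligned}
```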

  7. Example 2.2: Parse Tree • A parse tree shows how the start symbol of a grammar derives a string • list → list + digit • list → list - digit • list → digit • digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  8. Parse Trees • (Figure: an interior node A with children X, Y, Z, for the production A → X Y Z) • The root is labeled by the start symbol • Each leaf is labeled by a token or by ε • Each interior node is labeled by a nonterminal • If A is the nonterminal labeling an interior node and X1, X2, ..., Xn are the labels of its children from left to right, then A → X1 X2 ... Xn is a production
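Here is a minimal C sketch that makes the parse-tree definition concrete. It assumes a hypothetical `struct node` type with at most three children, which is enough for Example 2.1. The sketch builds the tree for 9-5+2 by hand. It then reads the leaves from left to right (the tree's yield), which recovers the derived string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parse-tree node: interior nodes are labeled by a nonterminal,
 * leaves by a token. Children are kept left to right, so a node labeled A
 * with children X1 ... Xn corresponds to the production A -> X1 ... Xn. */
struct node {
    const char *label;
    size_t nchildren;                 /* 0 for a leaf */
    const struct node *child[3];      /* three children suffice for Example 2.1 */
};

/* Appends the leaf labels of t, read left to right, to buf
 * (buf must already hold a NUL-terminated string and have enough room). */
void yield(const struct node *t, char *buf)
{
    if (t->nchildren == 0) {
        strcat(buf, t->label);
        return;
    }
    for (size_t i = 0; i < t->nchildren; i++)
        yield(t->child[i], buf);
}

/* Parse tree for 9-5+2 under Example 2.1, built by hand. */
static const struct node tok9 = {"9", 0, {NULL}}, tok5 = {"5", 0, {NULL}}, tok2 = {"2", 0, {NULL}};
static const struct node tok_plus = {"+", 0, {NULL}}, tok_minus = {"-", 0, {NULL}};
static const struct node dig9 = {"digit", 1, {&tok9}};
static const struct node dig5 = {"digit", 1, {&tok5}};
static const struct node dig2 = {"digit", 1, {&tok2}};
static const struct node list9 = {"list", 1, {&dig9}};                       /* list -> digit */
static const struct node list95 = {"list", 3, {&list9, &tok_minus, &dig5}};  /* list -> list - digit */
const struct node example_tree = {"list", 3, {&list95, &tok_plus, &dig2}};  /* list -> list + digit */
```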

  9. Example 2.3: Pascal begin-end blocks • block → begin opt_stmts end • opt_stmts → stmt_list | ε • stmt_list → stmt_list ; stmt | stmt • stmt → if ( expr ) stmt else stmt | assignment

  10. Ambiguity of a Grammar • A grammar is said to be ambiguous if some string of tokens has more than one parse tree under it.

  11. Ambiguity of a Grammar • string → string + string | string - string • string → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • The string 9-5+2 has two parse trees under this grammar, corresponding to the expressions (9-5)+2 and 9-(5+2)
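The two parse trees give the string two different meanings. The C sketch below shows this. It uses a hypothetical `struct expr` expression tree that records the grouping each parse tree imposes. The sketch builds both trees for 9-5+2 and evaluates them.

```c
#include <stddef.h>

/* Hypothetical expression tree: op is '+' or '-' for an interior node,
 * or 0 for a digit leaf. */
struct expr {
    char op;
    int value;                              /* used only when op == 0 */
    const struct expr *left, *right;
};

int eval(const struct expr *e)
{
    switch (e->op) {
    case '+': return eval(e->left) + eval(e->right);
    case '-': return eval(e->left) - eval(e->right);
    default:  return e->value;              /* leaf */
    }
}

static const struct expr d9 = {0, 9, NULL, NULL};
static const struct expr d5 = {0, 5, NULL, NULL};
static const struct expr d2 = {0, 2, NULL, NULL};

/* First parse tree of 9-5+2: (9-5)+2 */
static const struct expr minus_first = {'-', 0, &d9, &d5};
const struct expr tree_a = {'+', 0, &minus_first, &d2};

/* Second parse tree of 9-5+2: 9-(5+2) */
static const struct expr plus_first = {'+', 0, &d5, &d2};
const struct expr tree_b = {'-', 0, &d9, &plus_first};
```

`eval(&tree_a)` yields 6 and `eval(&tree_b)` yields 2. This is why an ambiguous grammar on its own cannot fix what an expression means.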

  12. Associativity of Operators • Left associative: 9+5-2 is grouped as (9+5)-2 • +, -, *, / are left associative • The parse tree grows down towards the left • Right associative: a=b=c is grouped as a=(b=c) • The parse tree grows down towards the right

  13. Associativity of Operators • right → letter = right | letter • letter → a | b | c | … | z
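In right → letter = right, the recursion is on the right end. A recursive-descent procedure for this rule therefore groups assignments to the right. The C sketch below shows this. It assumes single lowercase letters and no blanks, and its names (`right`, `parenthesize`) are illustrative. It echoes the input and parenthesizes each nested right-hand side.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char *p;        /* next unread input character */
static char out[256];        /* parenthesized result */
static size_t n;             /* current length of out */

static void emit(char c)
{
    if (n < sizeof out - 1) {    /* silently truncate very long inputs */
        out[n++] = c;
        out[n] = '\0';
    }
}

/* right -> letter = right | letter */
static int right(void)
{
    if (!islower((unsigned char)*p))
        return 0;                              /* expected a letter */
    emit(*p++);
    if (*p != '=')
        return 1;                              /* right -> letter */
    emit(*p++);                                /* right -> letter = right */
    int nested = p[0] != '\0' && p[1] == '=';  /* is the right side another assignment? */
    if (nested)
        emit('(');
    if (!right())
        return 0;
    if (nested)
        emit(')');
    return 1;
}

/* Returns the input with each nested right side parenthesized,
 * or NULL if the input is not generated by right. */
const char *parenthesize(const char *s)
{
    p = s;
    n = 0;
    out[0] = '\0';
    if (!right() || *p != '\0')
        return NULL;
    return out;
}
```

For instance, `parenthesize("a=b=c")` returns `"a=(b=c)"`, and `parenthesize("a=b=c=d")` returns `"a=(b=(c=d))"`.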

  14. Precedence of Operators • 9+5*2 is grouped as 9+(5*2) • * and / have higher precedence than + and - • *, /, +, - are all left associative • Nonterminal term for the level of * and /: term → term * factor | term / factor | factor • Nonterminal expr for the level of + and -: expr → expr + term | expr - term | term • factor → digit | ( expr )

  15. Precedence of Operators • Syntax of expressions: expr → expr + term | expr - term | term • term → term * factor | term / factor | factor • factor → digit | ( expr ) • Syntax of statements for Pascal (is it ambiguous?): stmt → id := expr | if expr then stmt | if expr then stmt else stmt | while expr do stmt | begin opt_stmts end
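The expr / term / factor layering is what gives * and / higher precedence: a whole term is parsed before + or - is considered. Below is a minimal C evaluator that follows this grammar for single-digit operands. A recursive-descent procedure for a left-recursive rule would call itself without consuming input (the issue treated under "Eliminating Left Recursion"). For that reason, this sketch implements each left-recursive rule as a loop, which keeps all four operators left associative. It assumes well-formed input with no blanks and no division by zero.

```c
static const char *p;               /* next unread character */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void)
{
    if (*p == '(') {
        p++;                         /* consume '(' */
        int v = expr();
        p++;                         /* consume ')' */
        return v;
    }
    return *p++ - '0';               /* digit */
}

/* term -> term * factor | term / factor | factor, as a loop */
static int term(void)
{
    int v = factor();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        int rhs = factor();
        v = (op == '*') ? v * rhs : v / rhs;
    }
    return v;
}

/* expr -> expr + term | expr - term | term, as a loop */
static int expr(void)
{
    int v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

int evaluate(const char *s)
{
    p = s;
    return expr();
}
```

With this sketch:
- `evaluate("9+5*2")` gives 19 (precedence).
- `evaluate("9-5-2")` and `evaluate("8/2/2")` both give 2 (left associativity).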

  16. 2.3 Syntax-Directed Translation • Syntax-directed definitions and translation schemes are two formalisms for specifying translations for programming languages • A syntax-directed definition uses a context-free grammar to specify the syntactic structure • It associates a set of attributes with each grammar symbol X, and with each production a set of semantic rules for computing the values X.a of the attributes of the symbols in that production • The grammar together with the set of semantic rules constitutes the syntax-directed definition

  17. 2.3 Syntax-Directed Translation • A syntax-directed definition for translating expressions consisting of digits separated by plus or minus into postfix notation
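In the standard textbook formulation, such a definition attaches a string-valued attribute t to each nonterminal. Here ∥ denotes string concatenation. Each rule builds the parent's t only from its children's values. For 9-5+2, the value of t at the root is 95-2+.

```latex
\begin{array}{ll}
\textbf{Production} & \textbf{Semantic rule} \\ \hline
\mathit{expr} \to \mathit{expr}_1 + \mathit{term} & \mathit{expr}.t := \mathit{expr}_1.t \mathbin{\|} \mathit{term}.t \mathbin{\|} \text{`+'} \\
\mathit{expr} \to \mathit{expr}_1 - \mathit{term} & \mathit{expr}.t := \mathit{expr}_1.t \mathbin{\|} \mathit{term}.t \mathbin{\|} \text{`-'} \\
\mathit{expr} \to \mathit{term} & \mathit{expr}.t := \mathit{term}.t \\
\mathit{term} \to 0 & \mathit{term}.t := \text{`0'} \\
\quad\vdots & \quad\vdots \\
\mathit{term} \to 9 & \mathit{term}.t := \text{`9'}
\end{array}
```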

  18. Postfix Notation • If E is a variable or a constant, then postfix(E) = E • If E is an expression of the form E1 op E2, where op is a binary operator, then postfix(E) = E1' E2' op, where E1' = postfix(E1) and E2' = postfix(E2) • If E is an expression of the form (E1), then postfix(E) = postfix(E1) • postfix(9-5+2) = 95-2+
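When this definition is applied to a tree rather than to text, it becomes a depth-first, postorder walk: visit the left operand, then the right operand, then emit the operator. The C sketch below assumes a hypothetical `struct expr` type whose leaves hold single-character operands. The tree already records the grouping, so the rule for parentheses needs no separate case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operator tree: op is '+' or '-' for an interior node,
 * or 0 for a leaf holding a single-character operand. */
struct expr {
    char op;
    char operand;                 /* used only when op == 0 */
    const struct expr *left, *right;
};

static void append_char(char *out, char c)
{
    size_t n = strlen(out);
    out[n] = c;
    out[n + 1] = '\0';
}

/* Appends postfix(e) to out, which must hold a NUL-terminated string
 * with room for the result. */
void postfix(const struct expr *e, char *out)
{
    if (e->op == 0) {             /* an operand is its own postfix form */
        append_char(out, e->operand);
        return;
    }
    postfix(e->left, out);        /* E1' */
    postfix(e->right, out);       /* E2' */
    append_char(out, e->op);      /* op  */
}

/* The tree for 9-5+2, grouped as (9-5)+2. */
static const struct expr leaf9 = {0, '9', NULL, NULL};
static const struct expr leaf5 = {0, '5', NULL, NULL};
static const struct expr leaf2 = {0, '2', NULL, NULL};
static const struct expr diff  = {'-', 0, &leaf9, &leaf5};
const struct expr tree_9_5_2   = {'+', 0, &diff, &leaf2};
```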

  19. Postfix Notation

  20. Robot’s position

  21. Robot’s position

  22. Robot’s position

  23. Depth-First Traversals

  24. Translation Schemes • A translation scheme is a context-free grammar in which program fragments called semantic actions are embedded within the right sides of productions • A translation scheme is like a syntax-directed definition, except that the order of evaluation of the semantic rules is shown explicitly

  25. Translation Schemes

  26. 2.4 Parsing • Parsing is the process of determining whether a string of tokens can be generated by a grammar. • For any context-free grammar there is a parser that takes at most O(n³) time to parse a string of n tokens, which is too expensive in practice. • For a given programming language, we can generally construct a grammar that can be parsed in linear time (making a single left-to-right scan and looking ahead one token at a time).

  27. 2.4 Parsing • Top-down parser: parse-tree construction starts at the root and proceeds towards the leaves • Bottom-up parser: parse-tree construction starts at the leaves and proceeds towards the root; bottom-up parsing can handle a larger class of grammars

  28. Top-Down Parsing • The parse tree is constructed by starting with the root, labeled with the start symbol, and repeatedly performing the following two steps: • At node n, labeled with nonterminal A, select one of the productions for A and construct children at n for the symbols on the right side of that production. • Find the next node at which a subtree is to be constructed.

  29. Example • type → simple | ↑id | array [ simple ] of type • simple → integer | char | num dotdot num • e.g., array [ num dotdot num ] of integer

  30. Example (Cont.) • type → simple | ↑id | array [ simple ] of type • simple → integer | char | num dotdot num • e.g., array [ num dotdot num ] of integer

  31. Example (Cont.) • type → simple | ↑id | array [ simple ] of type • simple → integer | char | num dotdot num

  32. Example (Cont.) • type → simple | ↑id | array [ simple ] of type • simple → integer | char | num dotdot num

  33. Predictive Parsing • Recursive-descent parsing is a form of top-down parsing

  34. Predictive Parsing (Cont.) • type → simple | ↑id | array [ simple ] of type • simple → integer | char | num dotdot num
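A predictive parser for this grammar has one procedure per nonterminal, and the lookahead token alone selects the production. The C sketch below works on an array of token codes rather than calling a lexer. The token names (`UPARROW`, `DOTDOT`, `EOI`, and so on) are hypothetical. The sketch also assumes that the second alternative of type is the Pascal pointer form ↑ id.

```c
#include <stdbool.h>

/* Hypothetical token codes; EOI marks the end of the token array. */
enum token { INTEGER, CHAR, NUM, DOTDOT, ARRAY, OF, UPARROW, ID, LBRACKET, RBRACKET, EOI };

static const enum token *lookahead;  /* current position in the token array */
static bool ok;                      /* cleared on the first syntax error */

static void match(enum token t)
{
    if (ok && *lookahead == t)
        lookahead++;
    else
        ok = false;
}

/* simple -> integer | char | num dotdot num */
static void simple(void)
{
    if (!ok) return;
    switch (*lookahead) {
    case INTEGER: match(INTEGER); break;
    case CHAR:    match(CHAR); break;
    case NUM:     match(NUM); match(DOTDOT); match(NUM); break;
    default:      ok = false;
    }
}

/* type -> simple | ^ id | array [ simple ] of type */
static void type(void)
{
    if (!ok) return;
    switch (*lookahead) {
    case INTEGER: case CHAR: case NUM:       /* lookahead in FIRST(simple) */
        simple();
        break;
    case UPARROW:
        match(UPARROW); match(ID);
        break;
    case ARRAY:
        match(ARRAY); match(LBRACKET); simple(); match(RBRACKET);
        match(OF); type();
        break;
    default:
        ok = false;                          /* no production applies */
    }
}

/* Returns true if the EOI-terminated token array is a type. */
bool parse_type(const enum token *tokens)
{
    lookahead = tokens;
    ok = true;
    type();
    return ok && *lookahead == EOI;
}
```

For example, `parse_type` returns true for `{ARRAY, LBRACKET, NUM, DOTDOT, NUM, RBRACKET, OF, INTEGER, EOI}`, which is the token form of array [ num dotdot num ] of integer.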

  35. Predictive Parsing (Cont.) • The lookahead symbol and the FIRST sets of the productions' right sides unambiguously determine the procedure selected for each nonterminal. • FIRST(α): the set of tokens that appear as the first symbols of one or more strings generated from α • FIRST(simple) = { integer, char, num } • FIRST(↑id) = { ↑ } • FIRST(array [ simple ] of type) = { array } • If A → α | β, predictive parsing requires FIRST(α) ∩ FIRST(β) = ∅

  36. When to Use -Production stmt begin opt_stmts end opt_stmts  stmt_list | • While parsing opt_stmts, if lookahead symbol is not in FIRST(stmt_list), then –production is used, lookahead symbol is end; otherwise, error Yu-Chen Kuo

  37. Designing a Predictive Parser • A predictive parser consists of a procedure for every nonterminal • Each procedure does two things: • It decides which production to use by looking at the lookahead symbol. The production with right side α is used if the lookahead symbol is in FIRST(α). If the lookahead symbol is not in the FIRST set of any other right side, a production with ε on the right side is used. • It uses a production by mimicking the right side: a nonterminal results in a call to the procedure for that nonterminal, and a token matching the lookahead symbol results in reading the next input token.

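  38. Eliminating Left Recursion • expr → expr + term | term • A recursive-descent procedure expr( ) for this rule would call itself before consuming any input, and so loop forever • In general, A → Aα | β can be rewritten as A → βR, R → αR | ε • Applied to expr → expr + term | term (α = + term, β = term): expr → term rest, rest → + term rest | ε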

  39. Eliminating Left Recursion (Cont.)

  40. A Translator for Simple Expressions

  41. Adapting the Translation Scheme • Eliminate left recursion: A → Aα | Aβ | γ becomes A → γR, R → αR | βR | ε • expr → expr + term { print('+') } | expr - term { print('-') } | term becomes: • expr → term rest • rest → + term { print('+') } rest | - term { print('-') } rest | ε • term → 0 { print('0') } | 1 { print('1') } | … | 9 { print('9') }

  42. Adapting the Translation Scheme (Cont.)

  43. Procedures for the Nonterminals expr, term, and rest
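The procedures for expr, term, and rest can be sketched in C as follows. A version built on putchar would print directly. This sketch instead appends output to a caller-supplied buffer, so the result can be inspected; the buffer is assumed to hold at least strlen(in) + 1 characters. The function `infix_to_postfix` and its error convention are illustrative.

```c
#include <ctype.h>
#include <string.h>

static const char *lookahead;   /* next unread input character */
static char *out;               /* next free position in the output buffer */
static int syntax_error;        /* set on the first syntax error */

static void emit(char c) { *out++ = c; *out = '\0'; }

static void match(char t)
{
    if (*lookahead == t)
        lookahead++;
    else
        syntax_error = 1;
}

/* term -> 0 { print('0') } | ... | 9 { print('9') } */
static void term(void)
{
    if (isdigit((unsigned char)*lookahead)) {
        char d = *lookahead;
        match(d);
        emit(d);
    } else {
        syntax_error = 1;
    }
}

/* rest -> + term { print('+') } rest | - term { print('-') } rest | epsilon */
static void rest(void)
{
    if (syntax_error)
        return;
    if (*lookahead == '+') {
        match('+'); term(); emit('+'); rest();
    } else if (*lookahead == '-') {
        match('-'); term(); emit('-'); rest();
    }
    /* any other lookahead: rest -> epsilon, so do nothing */
}

/* expr -> term rest */
static void expr(void)
{
    term();
    rest();
}

/* Returns 0 and leaves the postfix form in buf on success, -1 on a syntax error. */
int infix_to_postfix(const char *in, char *buf)
{
    lookahead = in;
    out = buf;
    *out = '\0';
    syntax_error = 0;
    expr();
    return (syntax_error || *lookahead != '\0') ? -1 : 0;
}
```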

  44. Optimizing the Translator • Replacing tail recursion by iteration: the recursive call to rest() at the end of each branch becomes a jump back to the start of the procedure • rest ( ) { L: if (lookahead == '+') { match('+'); term ( ); putchar('+'); goto L; } else if (lookahead == '-') { match('-'); term ( ); putchar('-'); goto L; } else ; }
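The optimization can go one step further. Since expr → term rest calls rest exactly once, the loop can be folded into expr itself, and a structured while loop replaces the goto. Here is a C sketch under the same assumptions: single digits, no blanks, and an output buffer of at least strlen(in) + 1 characters. The function name `infix_to_postfix` is illustrative.

```c
#include <ctype.h>
#include <string.h>

static const char *lookahead;   /* next unread input character */
static char *out;               /* next free position in the output buffer */

/* term -> digit { print(digit) }; returns -1 if no digit is present */
static int term(void)
{
    if (!isdigit((unsigned char)*lookahead))
        return -1;
    *out++ = *lookahead++;       /* match the digit and print it */
    return 0;
}

/* expr -> term rest, with rest's tail recursion turned into a loop */
int infix_to_postfix(const char *in, char *buf)
{
    int status = 0;
    lookahead = in;
    out = buf;
    if (term() != 0) {
        status = -1;
    } else {
        while (*lookahead == '+' || *lookahead == '-') {   /* replaces goto L */
            char op = *lookahead++;                      /* match('+') or match('-') */
            if (term() != 0) { status = -1; break; }
            *out++ = op;                                 /* print the operator */
        }
        if (status == 0 && *lookahead != '\0')           /* rest -> epsilon must end the input */
            status = -1;
    }
    *out = '\0';
    return status;
}
```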

  45. Optimizing the Translator (Cont.)

  46. The Complete Program

  47. The Complete Program (Cont.)

  48. The Complete Program (Cont.)

  49. 2.6 Lexical Analysis • Removal of white space and comments • Blanks, tabs, newlines • Constants • Integer constants can be handled either by adding productions to the grammar for expressions or by creating a token num for constants • 31 + 28 + 59 becomes <num, 31> <+, > <num, 28> <+, > <num, 59> • Recognizing identifiers and keywords • Keywords are reserved • begin /* keyword */ count = count + increment; /* id = id + id */ end
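A lexical analyzer covering these tasks can be sketched in C as follows. It does three things:
- It strips blanks, tabs, and newlines; comments are not handled.
- It groups a run of digits into a single num token and leaves the value in `tokenval`.
- It looks each identifier up in a keyword table, so reserved words such as begin and end get their own tokens.

Single characters such as + are returned as their own token code. The token codes, the `lexan`/`lex_init` names, and the 31-character lexeme limit are assumptions made for this sketch.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes. Single characters such as '+' are returned
 * as their own character code, so named tokens start above 255. */
enum { NUM = 256, ID, BEGIN, END, DONE };

static const char *input;       /* next unread character */
int tokenval;                   /* value of the most recent NUM */
char lexeme[32];                /* text of the most recent ID or keyword (truncated to 31 chars) */

/* Reserved words are checked before a lexeme is treated as an identifier. */
static const struct { const char *text; int token; } keywords[] = {
    {"begin", BEGIN},
    {"end", END},
};

void lex_init(const char *s) { input = s; }

int lexan(void)
{
    for (;;) {
        int c = (unsigned char)*input;
        if (c == ' ' || c == '\t' || c == '\n') {
            input++;                                   /* strip white space */
        } else if (isdigit(c)) {
            tokenval = 0;                              /* group digits into one NUM */
            while (isdigit((unsigned char)*input))
                tokenval = tokenval * 10 + (*input++ - '0');
            return NUM;
        } else if (isalpha(c)) {
            size_t n = 0;                              /* identifier or keyword */
            while (isalnum((unsigned char)*input)) {
                if (n < sizeof lexeme - 1)
                    lexeme[n++] = *input;
                input++;
            }
            lexeme[n] = '\0';
            for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
                if (strcmp(lexeme, keywords[i].text) == 0)
                    return keywords[i].token;
            return ID;
        } else if (c == '\0') {
            return DONE;
        } else {
            input++;                                   /* any other character is a token by itself */
            return c;
        }
    }
}
```

On "31 + 28 + 59", successive calls return NUM (tokenval 31), '+', NUM (28), '+', NUM (59), and then DONE.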

  50. Interface to the Lexical Analyzer • A lexical analyzer reads characters from the input, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the later stages of the compiler.
