Syntax Definition in Programming Languages: Context-Free Grammars

Chapter 2 A Simple One-Pass Compiler Yu-Chen Kuo

2.1 Overview
• A programming language is defined by:
  • What its programs look like (syntax: context-free grammars)
  • What its programs mean (semantics: more difficult to specify)

2.2 Syntax Definition
• Context-free grammar
• A grammar describes the hierarchical structure of a language
• Example: stmt → if ( expr ) stmt else stmt
  • A rule of this form is called a production
  • Tokens: if, (, ), else
  • Nonterminals: expr, stmt

Context-free Grammar
• A set of tokens (terminals)
  • Digits
  • Signs (+, -, <, =)
  • Keywords (if, while)
• A set of nonterminals
• A set of productions
  • nonterminal → sequence of terminals and/or nonterminals
  • left side → right side
• One nonterminal designated as the start symbol (by convention, the left side of the first production listed)

Example 2.1: Grammar of the expression ‘9-5+2’
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Equivalently: list → list + digit | list - digit | digit
• Nonterminals: list (start symbol), digit
• Terminals (tokens): +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

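Example 2.1: Grammar of the expression ‘9-5+2’ (Cont.)
• A token string is derived by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of a production for that nonterminal
• For example: list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit ⇒ 9 - digit + digit ⇒ 9 - 5 + digit ⇒ 9 - 5 + 2
• Empty string: ε
• The set of all token strings that can be derived from the start symbol forms the language defined by the grammar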

Example 2.2: Parse Tree
• A parse tree shows how the start symbol derives a string
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse Trees
• For a production A → X Y Z, the tree has an interior node labeled A with children labeled X, Y, Z from left to right
• The root is labeled by the start symbol
• Each leaf is labeled by a token or by ε
• Each interior node is labeled by a nonterminal
• If A is the nonterminal labeling an interior node and X1, X2, ..., Xn are the labels of its children from left to right, then A → X1 X2 ... Xn is a production
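The properties above can be checked on a small hand-built tree. The following is a minimal sketch in C, assuming a hypothetical `PNode` structure (not part of the original slides) and the grammar of Example 2.1; it builds the parse tree for 9-5+2 and reads off its leaves from left to right.

```c
#include <stddef.h>

#define MAX_CHILDREN 3

/* Hypothetical parse-tree node: interior nodes carry a nonterminal name,
   leaves (nchildren == 0) carry the text of a token. */
typedef struct PNode {
    const char *label;
    size_t nchildren;
    const struct PNode *child[MAX_CHILDREN];
} PNode;

/* Appends the leaves of the tree rooted at n, left to right, to buf
   starting at index len; returns the new length. No bounds checking. */
static size_t yield(const PNode *n, char *buf, size_t len) {
    if (n->nchildren == 0) {
        for (const char *c = n->label; *c; c++)
            buf[len++] = *c;
        return len;
    }
    for (size_t i = 0; i < n->nchildren; i++)
        len = yield(n->child[i], buf, len);
    return len;
}

/* Builds the parse tree of 9-5+2 and writes its leaves into buf,
   which must hold at least 6 bytes. Returns the number of leaves written. */
size_t yield_of_example(char *buf) {
    static const PNode t9 = {"9", 0, {NULL}};
    static const PNode t5 = {"5", 0, {NULL}};
    static const PNode t2 = {"2", 0, {NULL}};
    static const PNode minus = {"-", 0, {NULL}};
    static const PNode plus = {"+", 0, {NULL}};
    static const PNode d9 = {"digit", 1, {&t9}};
    static const PNode d5 = {"digit", 1, {&t5}};
    static const PNode d2 = {"digit", 1, {&t2}};
    static const PNode l1 = {"list", 1, {&d9}};               /* list -> digit */
    static const PNode l2 = {"list", 3, {&l1, &minus, &d5}};  /* list -> list - digit */
    static const PNode l3 = {"list", 3, {&l2, &plus, &d2}};   /* list -> list + digit */
    size_t n = yield(&l3, buf, 0);
    buf[n] = '\0';
    return n;
}
```

Calling `yield_of_example` with a 16-byte buffer fills it with the token string the tree derives. Every interior node and its children correspond to one production of the grammar.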

Example 2.3: Pascal begin-end blocks
    block → begin opt_stmts end
    opt_stmts → stmt_list | ε
    stmt_list → stmt_list ; stmt | stmt
    stmt → if ( expr ) stmt else stmt | assignment stmt

Ambiguity of A Grammar
• A grammar is said to be ambiguous if some string it generates has more than one parse tree

Ambiguity of A Grammar (Cont.)
    string → string + string | string - string
    string → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• The string 9-5+2 has two parse trees, corresponding to the two groupings (9-5)+2 and 9-(5+2)
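The practical cost of the ambiguity is that the two parse trees give different values. A minimal sketch in C, assuming a hypothetical `Node` type for expression trees (the names are illustrative, not from the slides):

```c
#include <stddef.h>

/* Hypothetical tree node for the ambiguous grammar:
   op is '+' or '-' for an interior node, 0 for a digit leaf. */
typedef struct Node {
    char op;
    int value;                       /* used only when op == 0 */
    const struct Node *left, *right;
} Node;

/* Evaluates a tree bottom-up. */
static int eval(const Node *n) {
    if (n->op == 0)
        return n->value;
    int l = eval(n->left);
    int r = eval(n->right);
    return n->op == '+' ? l + r : l - r;
}

/* First parse tree of 9-5+2: the '-' is grouped first, i.e. (9-5)+2. */
int eval_left_grouping(void) {
    Node nine = {0, 9, NULL, NULL};
    Node five = {0, 5, NULL, NULL};
    Node two = {0, 2, NULL, NULL};
    Node minus = {'-', 0, &nine, &five};
    Node plus = {'+', 0, &minus, &two};
    return eval(&plus);
}

/* Second parse tree of 9-5+2: the '+' is grouped first, i.e. 9-(5+2). */
int eval_right_grouping(void) {
    Node nine = {0, 9, NULL, NULL};
    Node five = {0, 5, NULL, NULL};
    Node two = {0, 2, NULL, NULL};
    Node plus = {'+', 0, &five, &two};
    Node minus = {'-', 0, &nine, &plus};
    return eval(&minus);
}
```

The first function returns 6 and the second returns 2, so a compiler working from the ambiguous grammar could not know which value the programmer intended.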

Associativity of Operators
• Left associative: 9+5-2 means (9+5)-2
  • +, -, *, /
  • The parse tree grows down towards the left
• Right associative: a=b=c means a=(b=c)
  • The parse tree grows down towards the right

Associativity of Operators (Cont.)
    right → letter = right | letter
    letter → a | b | c | … | z

Precedence of Operators
• 9+5*2 means 9+(5*2)
• *, / have higher precedence than +, -
• *, /, +, - are all left associative
• term for *, /
    term → term * factor | term / factor | factor
• expr for +, -
    expr → expr + term | expr - term | term
• factor → digit | ( expr )
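The layering of expr, term, and factor can be seen in a small evaluator. This is a sketch under stated assumptions, not code from the slides: operands are single digits, the left-recursive productions are realized as loops (which keeps the operators left associative), and error handling is reduced to a minimum. The names `evaluate`, `parse_expr`, `parse_term`, and `parse_factor` are hypothetical.

```c
#include <ctype.h>

static const char *cur;   /* next unread character of the input */

static int parse_expr(void);

/* factor -> digit | ( expr ) */
static int parse_factor(void) {
    if (*cur == '(') {
        cur++;
        int v = parse_expr();
        if (*cur == ')')
            cur++;
        return v;
    }
    if (isdigit((unsigned char)*cur))
        return *cur++ - '0';
    return 0;   /* malformed input: treated as 0 in this sketch */
}

/* term -> term * factor | term / factor | factor, written as a loop */
static int parse_term(void) {
    int v = parse_factor();
    while (*cur == '*' || *cur == '/') {
        char op = *cur++;
        int r = parse_factor();
        if (op == '*')
            v *= r;
        else if (r != 0)
            v /= r;   /* division by zero is silently skipped here */
    }
    return v;
}

/* expr -> expr + term | expr - term | term, written as a loop */
static int parse_expr(void) {
    int v = parse_term();
    while (*cur == '+' || *cur == '-') {
        char op = *cur++;
        int r = parse_term();
        v = (op == '+') ? v + r : v - r;
    }
    return v;
}

/* Evaluates an expression containing no white space. */
int evaluate(const char *s) {
    cur = s;
    return parse_expr();
}
```

Because `parse_expr` only ever sees whole terms, a * or / is always applied before the surrounding + or -, which is exactly what the two grammar levels express.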

2.3 Syntax-Directed Translation
• Syntax-directed definitions and translation schemes are two formalisms for specifying translations for programming languages
• A syntax-directed definition uses a context-free grammar to specify the syntactic structure of the input
• With each grammar symbol X it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes (written X.a) of the symbols appearing in that production
• The grammar and the set of semantic rules together constitute the syntax-directed definition

2.3 Syntax-Directed Translation (Cont.)
• A syntax-directed definition for translating expressions consisting of digits separated by plus or minus signs into postfix notation

Postfix Notation
• If E is a variable or constant, then postfix(E) = E
• If E is an expression of the form E1 op E2, then postfix(E) = E1′ E2′ op, where E1′ = postfix(E1) and E2′ = postfix(E2)
• If E is an expression of the form (E1), then postfix(E) = postfix(E1)
• postfix(9-5+2) = 95-2+
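One reason postfix notation needs no parentheses is that it can be evaluated with a single stack. The following is a minimal sketch in C, assuming single-digit operands and only the operators + and -; the function name `eval_postfix` and its error convention are hypothetical.

```c
#include <ctype.h>

#define STACK_MAX 64

/* Evaluates a postfix string of single digits and the operators + and -.
   On success stores the value in *result and returns 0;
   returns -1 if the input is malformed or the stack would overflow. */
int eval_postfix(const char *s, int *result) {
    int stack[STACK_MAX];
    int top = 0;   /* number of values currently on the stack */

    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            if (top == STACK_MAX)
                return -1;
            stack[top++] = *s - '0';
        } else if (*s == '+' || *s == '-') {
            if (top < 2)
                return -1;
            int r = stack[--top];   /* right operand was pushed last */
            int l = stack[--top];
            stack[top++] = (*s == '+') ? l + r : l - r;
        } else {
            return -1;
        }
    }
    if (top != 1)
        return -1;
    *result = stack[0];
    return 0;
}
```

For the string 95-2+ the stack holds 9, then 9 5, then 4, then 4 2, and finally 6, which matches the value of (9-5)+2.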

Postfix Notation (Cont.)

Robot’s position

Depth-First Traversals

Translation Schemes
• A translation scheme is a context-free grammar in which semantic actions are embedded within the right sides of productions
• A translation scheme is like a syntax-directed definition, except that the order of evaluation of the semantic rules is shown explicitly

Translation Schemes (Cont.)

2.4 Parsing
• Parsing is the process of determining whether a string of tokens can be generated by a grammar
• For any context-free grammar there is a parser that takes at most O(n^3) time to parse a string of n tokens, which is too expensive in practice
• Given a programming language, we can generally construct a grammar that can be parsed in linear time (a single left-to-right scan, looking ahead one token at a time)

2.4 Parsing (Cont.)
• Top-down parser: parse tree construction starts at the root and proceeds towards the leaves
• Bottom-up parser: parse tree construction starts at the leaves and proceeds towards the root (handles a larger class of grammars)

Top-Down Parsing
• The parse tree is constructed by starting with the root, labeled with the start nonterminal, and repeatedly performing the following two steps:
  • At node n, labeled with nonterminal A, select one of the productions for A and construct children at n for the symbols on the right side of the production
  • Find the next node at which a subtree is to be constructed

Example
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num
• e.g., array [ num dotdot num ] of integer

Example (Cont.)
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num
• e.g., array [ num dotdot num ] of integer

Example (Cont.)
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num

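Predictive Parsing
• Recursive-descent parsing is a top-down method of parsing in which a set of recursive procedures processes the input
• Predictive parsing is the special case in which the lookahead symbol unambiguously determines the procedure selected for each nonterminal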

Predictive Parsing (Cont.)
    type → simple | ↑ id | array [ simple ] of type
    simple → integer | char | num dotdot num

Predictive Parsing (Cont.)
• Use the lookahead symbol and the first symbols (FIRST) of each production to determine unambiguously the procedure selected for each nonterminal
• FIRST(α): the set of tokens that appear as the first symbols of one or more strings generated from α
  • FIRST(simple) = { integer, char, num }
  • FIRST(↑ id) = { ↑ }
  • FIRST(array [ simple ] of type) = { array }
• If A → α | β, then predictive parsing requires FIRST(α) ∩ FIRST(β) = ∅

When to Use ε-Productions
    stmt → begin opt_stmts end
    opt_stmts → stmt_list | ε
• While parsing opt_stmts, if the lookahead symbol is not in FIRST(stmt_list), the ε-production is used
• This choice is correct if the lookahead symbol is end; any other lookahead symbol is an error
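The ε-production rule can be sketched as a procedure for opt_stmts that consumes nothing when the lookahead is end. This is a minimal sketch in C under simplifying assumptions: each inner statement is a single hypothetical token `B_STMT`, stmt_list is taken to be statements separated by semicolons, and all names are illustrative.

```c
/* Hypothetical token set for begin-end blocks. */
typedef enum { B_BEGIN, B_END, B_STMT, B_SEMI, B_EOF } BTok;

static const BTok *look;   /* points at the lookahead token */
static int good;           /* cleared on the first syntax error */

/* Consumes the lookahead if it equals t; otherwise records an error. */
static void eat(BTok t) {
    if (*look == t) {
        if (t != B_EOF)
            look++;
    } else {
        good = 0;
    }
}

/* stmt_list -> stmt_list ; stmt | stmt, written as a loop */
static void stmt_list(void) {
    eat(B_STMT);
    while (good && *look == B_SEMI) {
        eat(B_SEMI);
        eat(B_STMT);
    }
}

/* opt_stmts -> stmt_list | epsilon */
static void opt_stmts(void) {
    if (*look == B_STMT) {
        stmt_list();        /* lookahead is in FIRST(stmt_list) */
    } else if (*look == B_END) {
        /* epsilon-production: consume nothing */
    } else {
        good = 0;           /* any other lookahead is an error */
    }
}

/* Returns 1 if the B_EOF-terminated token array is begin opt_stmts end. */
int parse_begin_end(const BTok *tokens) {
    look = tokens;
    good = 1;
    eat(B_BEGIN);
    opt_stmts();
    eat(B_END);
    return good && *look == B_EOF;
}
```

An empty block (begin immediately followed by end) is accepted through the ε branch, while a block that starts with a semicolon is rejected.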

Designing a Predictive Parser
• A predictive parser consists of a procedure for every nonterminal
• Each procedure does two things:
  • It decides which production to use by looking at the lookahead symbol. The production with right side α is used if the lookahead symbol is in FIRST(α). If the lookahead symbol is not in the FIRST set of any other right side, a production with ε on the right side is used
  • It uses the production by mimicking the right side. A nonterminal results in a call to the procedure for that nonterminal. A token matching the lookahead symbol results in reading the next input token
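The two duties of each procedure can be sketched for the type grammar used in the earlier example, with the pointer alternative read as ↑ id. This is an illustrative sketch in C, not the procedures from the slides: the token names, the `parse_type` entry point, and the error flag are all assumptions.

```c
/* Hypothetical token set for the type grammar. */
typedef enum {
    T_INTEGER, T_CHAR, T_NUM, T_DOTDOT, T_CARET, T_ID,
    T_ARRAY, T_LBRACKET, T_RBRACKET, T_OF, T_END
} Token;

static const Token *input;   /* points at the lookahead token */
static int ok;               /* cleared on the first syntax error */

/* Reads the next token if the lookahead equals t; otherwise records an error. */
static void match(Token t) {
    if (*input == t) {
        if (t != T_END)
            input++;
    } else {
        ok = 0;
    }
}

/* simple -> integer | char | num dotdot num */
static void simple(void) {
    if (!ok)
        return;
    switch (*input) {
    case T_INTEGER: match(T_INTEGER); break;
    case T_CHAR:    match(T_CHAR); break;
    case T_NUM:     match(T_NUM); match(T_DOTDOT); match(T_NUM); break;
    default:        ok = 0;
    }
}

/* type -> simple | ^ id | array [ simple ] of type */
static void type(void) {
    if (!ok)
        return;
    switch (*input) {
    case T_INTEGER: case T_CHAR: case T_NUM:   /* FIRST(simple) */
        simple();
        break;
    case T_CARET:                              /* FIRST(^ id) */
        match(T_CARET); match(T_ID);
        break;
    case T_ARRAY:                              /* FIRST(array [ simple ] of type) */
        match(T_ARRAY); match(T_LBRACKET); simple();
        match(T_RBRACKET); match(T_OF); type();
        break;
    default:
        ok = 0;
    }
}

/* Returns 1 if the T_END-terminated token array is a valid type. */
int parse_type(const Token *tokens) {
    input = tokens;
    ok = 1;
    type();
    return ok && *input == T_END;
}
```

Each `case` label lists the FIRST set of one right side, and each branch mimics that right side: a token becomes a `match` call and a nonterminal becomes a procedure call.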

Eliminating Left Recursion (Cont.)

A Translator for Simple Expressions

Adapting the Translation Scheme
• Eliminate left recursion
    A → A α | A β | γ
  becomes
    A → γ R
    R → α R | β R | ε
• Left-recursive form: expr → expr + term {print('+')}
• After the transformation:
    expr → term rest
    rest → + term {print('+')} rest | - term {print('-')} rest | ε
    term → 0 {print('0')}
    ...
    term → 9 {print('9')}

Adapting the Translation Scheme (Cont.)

Procedures for the Nonterminals expr, term, and rest
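One way to write procedures for expr, term, and rest directly from the transformed translation scheme is sketched below. This is an assumption-laden illustration rather than the original slide code: output goes to a caller-supplied buffer instead of being printed, the lookahead is simply the next input character, and the names `to_postfix`, `emit`, and `failed` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

static const char *src;   /* the lookahead is the character *src */
static char *out;         /* next free position in the output buffer */
static char *out_end;     /* one past the last usable position */
static int failed;        /* set on a syntax error or a full buffer */

/* Plays the role of print() in the translation scheme. */
static void emit(char c) {
    if (out < out_end)
        *out++ = c;
    else
        failed = 1;
}

/* Advances past the lookahead if it equals c. */
static void match(char c) {
    if (*src == c)
        src++;
    else
        failed = 1;
}

/* term -> 0 {print('0')} | ... | 9 {print('9')} */
static void term(void) {
    if (isdigit((unsigned char)*src)) {
        emit(*src);
        match(*src);
    } else {
        failed = 1;
    }
}

/* rest -> + term {print('+')} rest | - term {print('-')} rest | epsilon */
static void rest(void) {
    if (failed)
        return;
    if (*src == '+') {
        match('+'); term(); emit('+'); rest();
    } else if (*src == '-') {
        match('-'); term(); emit('-'); rest();
    }
    /* otherwise the epsilon-production applies and nothing is consumed */
}

/* expr -> term rest */
static void expr(void) {
    term();
    rest();
}

/* Translates infix to postfix in buf; returns 0 on success, -1 on error. */
int to_postfix(const char *infix, char *buf, size_t size) {
    if (size == 0)
        return -1;
    src = infix;
    out = buf;
    out_end = buf + size - 1;   /* keep one byte for the terminator */
    failed = 0;
    expr();
    *out = '\0';
    return (!failed && *src == '\0') ? 0 : -1;
}
```

Each semantic action sits at the same position in the procedure body as it does in the production, so the operator is emitted only after both of its operands.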

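Optimizing the Translator
• Replacing tail recursion by iteration
    /* lookahead, match() and term() are defined elsewhere in the translator */
    void rest(void)
    {
    L:  if (lookahead == '+') {
            match('+'); term(); putchar('+'); goto L;   /* was: rest() */
        }
        else if (lookahead == '-') {
            match('-'); term(); putchar('-'); goto L;   /* was: rest() */
        }
        /* otherwise rest derives the empty string: nothing to do */
    }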

Optimizing the Translator (Cont.)
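Once the tail recursion in rest is a jump back to its start, the jump can be written as an ordinary loop and merged with expr. The following is a sketch of that idea in C, assuming single-digit operands and output to a caller-supplied buffer; the function name `to_postfix_loop` is hypothetical and this is not the original slide code.

```c
#include <ctype.h>
#include <stddef.h>

/* Translates an infix string of digits separated by + and - into postfix.
   Returns 0 on success, -1 on a syntax error or if buf is too small.
   On failure the contents of buf are unspecified. */
int to_postfix_loop(const char *s, char *buf, size_t size) {
    size_t n = 0;

    /* expr -> term rest: the first term */
    if (size < 2 || !isdigit((unsigned char)*s))
        return -1;
    buf[n++] = *s++;

    /* rest as a loop: each pass handles one "+ term" or "- term" */
    while (*s == '+' || *s == '-') {
        char op = *s++;
        if (!isdigit((unsigned char)*s) || n + 2 >= size)
            return -1;
        buf[n++] = *s++;   /* the operand is emitted first */
        buf[n++] = op;     /* then the operator, as in the action */
    }

    buf[n] = '\0';
    return *s == '\0' ? 0 : -1;
}
```

The loop body corresponds to one alternative of rest, and leaving the loop corresponds to choosing the ε-production.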

The Complete Program

The Complete Program (Cont.)

2.6 Lexical Analysis
• Removal of white space and comments
  • Blanks, tabs, newlines
• Constants
  • Adding productions for constants to the grammar for expressions
  • Creating a token num for constants
  • 31 + 28 + 59
  • <num, 31> <+, > <num, 28> <+, > <num, 59>
• Recognizing identifiers and keywords
  • Keywords are reserved
  • begin /* keyword */ count = count + increment; /* id = id + id */ end
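The handling of white space and constants can be sketched as a function that returns one token at a time. This is a minimal illustration in C, assuming only the tokens num, + and -; the `Token` structure, the `TokenKind` names, and `next_token` are hypothetical, and integer overflow on very long constants is ignored.

```c
#include <ctype.h>

typedef enum { TOK_NUM, TOK_PLUS, TOK_MINUS, TOK_DONE, TOK_ERROR } TokenKind;

/* A token together with its attribute value (used only for TOK_NUM). */
typedef struct {
    TokenKind kind;
    int value;
} Token;

/* Reads one token starting at *pp and advances *pp past it. */
Token next_token(const char **pp) {
    const char *p = *pp;
    Token t = { TOK_DONE, 0 };

    /* Strip blanks, tabs and newlines before the token. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;

    if (isdigit((unsigned char)*p)) {
        /* Group a run of digits into a single num token. */
        t.kind = TOK_NUM;
        while (isdigit((unsigned char)*p))
            t.value = t.value * 10 + (*p++ - '0');
    } else if (*p == '+') {
        t.kind = TOK_PLUS;
        p++;
    } else if (*p == '-') {
        t.kind = TOK_MINUS;
        p++;
    } else if (*p != '\0') {
        t.kind = TOK_ERROR;   /* unexpected character */
        p++;
    }

    *pp = p;
    return t;
}
```

Calling `next_token` repeatedly on the string 31 + 28 + 59 yields the five tokens listed above, followed by `TOK_DONE` at the end of the input. The parser then works on tokens only and never sees white space or individual digits.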

Interface to the Lexical Analyzer
• A lexical analyzer reads characters, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the later stages of the compiler
