510 likes | 518 Views
Learn about the meaning of syntax and semantics in programming languages, the role of the lexical analyzer, syntax analysis, semantic analysis, intermediate code generation, and code optimization.
E N D
Chapter 2: A Simple Syntax-directed Translator Salam R. Al-E’mari Adham University College
Languages are described by their syntaxes and semantics What is the meaning of Syntax? • It means the form or structure of the expressions, statements and program units. The Syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. What is the meaning of Semantics? It means the meaning of the expressions, statements, and program units.
character stream The Role of the Lexical Analyzer Lexical Analyzer Analysis part Front end of the compiler token stream Syntax Analyzer (Parser) syntax tree Semantic Analyzer Symbol Table syntax tree Phases of a compiler Intermediate Code Generator intermediate representation Machine-Independent Code Optimizer Synthesis part Back end of the compiler intermediate representation Code Generator target-machine code Machine-Dependent Code Optimizer target-machine code
Syntax Definition • Definition of Grammars • Derivations • Parse trees • Ambiguity • Associativity of operators • Precedence of operators
Definition of Grammars • Context-free grammar is a 4-tuple with • A set of tokens (terminal symbols) • A set of nonterminals • A set of productions • A designated start symbol • Lexemeis basic component during lexical analysis. A lexeme consists of related alphabet from the language. e.g. numeric literals, operators (+), and special words (begin). • Token (variable names) : of a language is a category of its lexemes, and a lexeme is an instance of a token.
Example: consider the following Java statement • index = 2 * count +17; • The lexemes and tokens of this statement are represented in the following table
Formal Methods of Describing Syntax The formal language that is used to describe the syntax of programming languages is called grammar. • BNF (Backus-Naur Form) and Context-Free Grammars (FG) BNF is widely accepted way to describe the syntax of a programming language. • Context-Free Grammars Regular expression and context-free grammar are useful in describing the syntax of a programming language. Regular expression describes how a token is made of alphabets, and context-free grammar determines how the tokens are put together as a valid sentence in the grammar.
BNF Fundamentals • Metalanguage is a language that is used to describe another language. BNF is a metalanguage for programming languages. • The BNF consists of rules (or productions). A rule has a left-hand side (LHS) as the abstraction, and a right-hand side (RHS) as the definition. As in java assignment statement the definition could be as follows: <assign> → <var> = <expression> • The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals. • Lexemes and tokens of the rules are called terminal symbols, or simply terminals.
Example Grammar Context-free grammar for simple expressions: G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list> with productions P = list list+digit list list-digit list digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
2. Derivation • Given a CF grammar we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation • We begin with the start symbol • In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal
Derivation for the Example Grammar listlist+digit list-digit+digit digit-digit+digit 9 -digit+digit 9 - 5 +digit 9 - 5 + 2 This is an example leftmost derivation, because we replacedthe leftmost nonterminal (underlined) in each step.Likewise, a rightmost derivation replaces the rightmostnonterminal in each step
Example : A derivation of a program in this language follows: <program> → begin <stmt_list> end <stmt_list> → <stmt> | <stmt>; <stmt_list> <stmt> → <var> = <expression> <var> → A | B | C <expression> → <var> + <var> | <var> - <var> | <var> <program> => begin <stmt_list> end => begin <stmt> ; <stmt_list> end => begin <var> = <expression>; <stmt_list> end => begin A= <expression>; <stmt_list> end => begin A = <var> + < var> ; <stmt_list> end => begin A = B+ < var> ; <stmt_list> end => begin A = B + C; <stmt_list> end => begin A = B + C; <stmt> end => begin A = B + C ; <var> = <expression> end => begin A = B + C ; B = <expression> end => begin A = B + C ; B = <var> end => begin A = B + C ; B = C end
Homework: a grammar for a Simple Assignment Statements <assign> → <id> = <expr> <id> → A | B | C | <expr> → <id> + <expr> | <id>*<expr> | (<expr>) | <id> A = B*(A+C) is generated by the leftmost derivation:
3. Parse Trees • The root of the tree is labeled by the start symbol • Each leaf of the tree is labeled by a terminal (=token) or • Each interior node is labeled by a nonterminal • If AX1 X2 … Xn is a production, then node A has immediate childrenX1, X2, …, Xn where Xi is a (non)terminal or ( denotes the empty string)
Parse Tree for the Example Grammar Parse tree of the string 9-5+2 using grammar G list list digit list digit digit The sequence ofleaf is called theyield of the parse tree 9 - 5 + 2
Parse Tree for the Example Grammar <assign> A parse tree for the simple statement: A = B * (A + C) <id> <expr> = <id> <expr> * A ( ) <expr> B <id> <expr> + <id> A C
4. Ambiguity Consider the following context-free grammar: G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string> with production P = string string+string | string-string | 0 | 1 | … | 9 This grammar is ambiguous, because more than one parse treerepresents the string 9-5+2
Ambiguity (cont’d) string string string string string string string string string string 9 - 5 + 2 9 - 5 + 2
Ambiguity Example : Two distinct parse trees (generated by the grammar on the next slide) for the same sentence, A = B +C * A <assign> <assign> <id> <expr> = <id> <expr> = <expr> <expr> A + <expr> <expr> A * <expr> <expr> * <expr> <expr> <id> + <id> <id> <id> <id> <id> B A A 19 C C B
5- Associativity of Operators Left-associative operators have left-recursive productions left left+term | term String a+b+c has the same meaning as (a+b)+c Right-associative operators have right-recursive productions right term=right | term String a=b=c has the same meaning as a=(b=c)
Associativity of Operators Operator associativity can also be indicated by a grammar. <expr> => <expr> + <expr> | const (ambiguous) <expr> => <expr> + const | const (unambiguous) <expr> <expr> const + <expr> const + const
6. Precedence of Operators Operators with higher precedence “bind more tightly” expr expr+term | termterm term *factor | factorfactor number | ( expr ) String 2+3*5 has the same meaning as 2+(3*5) expr expr term term term factor factor factor number number number + * 2 3 5
Example: If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr> → <expr> - <term> | <term> <term> → <term> / const | const
Extended BNF - EBNF has the same expression power as BNF - The addition includes optional constructs (parts), repetition, and multiple choices, very much like regular expression. - Optional parts are placed in brackets ([ ]) • <if_stmt> → if (<expression>) <statement> [else <statement>] - Alternative parts of RHSs in parentheses and separate them with vertical bars • <term> → <term> (* | / | %) <factor> - Put repetitions (0 or more) are placed inside braces ({}) • <ident_list> → <identifier> {, <identifier>}
BNF <expr> <expr> + <term> | <expr> - <term> | <term> <term> <term> * <factor> | <term> / <factor> | <factor> EBNF <expr> <term> {(+ | -) <term>} <term> <factor> {(* | /) <factor>}
Syntax-Directed Translation • Uses a CF grammar to specify the syntactic structure of the language • AND associates a set of attributes with the terminals and nonterminals of the grammar • AND associates with each production a set of semantic rules to compute values of attributes • A parse tree is traversed and semantic rules applied: after the tree traversal(s) are completed, the attribute values on the nonterminals contain the translated form of the input
Synthesized and Inherited Attributes • An attribute is said to be: • Synthesized if its value at a parse-tree node is determined from the attribute values at the children of the node. • Inherited if its value at a parse-tree node is determined by the parent (by enforcing the parent’s semantic rules)
Example Attribute Grammar String concat operator Production Semantic Rule expr expr1+termexpr expr1-termexpr termterm 0term 1…term 9 expr.t := expr1.t // term.t // “+”expr.t := expr1.t // term.t // “-”expr.t := term.tterm.t := “0”term.t := “1”… term.t := “9”
Example Annotated Parse Tree expr.t = “95-2+” expr.t = “95-” term.t = “2” expr.t = “9” term.t = “5” term.t = “9” 9 - 5 + 2
Depth-First Traversals procedure visit(n : node);begin for each child m of n, from left to right dovisit(m); evaluate semantic rules at node nend
Depth-First Traversals (Example) expr.t = “95-2+” expr.t = “95-” term.t = “2” expr.t = “9” term.t = “5” term.t = “9” 9 - 5 + 2 Note: all attributes areof the synthesized type
Translation Schemes • A translation scheme is a CF grammar embedded with semantic actions rest +term { print(“+”) } rest Embeddedsemantic action rest + term { print(“+”) } rest
Example Translation Scheme expr expr+termexpr expr-termexpr termterm 0term 1…term 9 { print(“+”) }{ print(“-”) }{ print(“0”) }{ print(“1”) }…{ print(“9”) }
Example Translation Scheme (cont’d) expr { print(“+”) } + expr term { print(“2”) } { print(“-”) } - 2 expr term { print(“5”) } 5 term { print(“9”) } 9 Translates 9-5+2 into postfix 95-2+
Parsing • Parsing = process of determining if a string of tokens can be generated by a grammar • For any CF grammar there is a parser that takes at most O(n3) time to parse a string of n tokens • Linear algorithms suffice for parsing programming language source code • Top-down parsing“constructs” a parse tree from root to leaves • Bottom-up parsing“constructs” a parse tree from leaves to root
Predictive Parsing • Recursive descent parsing is a top-down parsing method • Each nonterminal has one (recursive) procedure tha tis responsible for parsing the nonterminal’s syntactic category of input tokens • When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information • Predictive parsing is a special form of recursive descent parsing where we use one lookahead token to unambiguously determine the parse operations
Example Predictive Parser (Grammar) type simple|^ id|array [ simple ] of typesimple integer|char| num dotdot num
Example Predictive Parser (Program Code) procedure match(t : token);begin if lookahead = t thenlookahead := nexttoken()else error()end;procedure type();begin if lookahead in { ‘integer’, ‘char’, ‘num’ } thensimple()else if lookahead = ‘^’thenmatch(‘^’); match(id)else if lookahead = ‘array’thenmatch(‘array’); match(‘[‘); simple();match(‘]’); match(‘of’); type()else error()end; procedure simple();begin if lookahead = ‘integer’thenmatch(‘integer’)else if lookahead = ‘char’thenmatch(‘char’)else if lookahead = ‘num’thenmatch(‘num’);match(‘dotdot’);match(‘num’)else error()end;
Example Predictive Parser (Execution Step 1) type() Check lookaheadand call match match(‘array’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 2) type() match(‘array’) match(‘[’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 3) type() match(‘array’) match(‘[’) simple() match(‘num’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 4) type() match(‘array’) match(‘[’) simple() match(‘num’) match(‘dotdot’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 5) type() match(‘array’) match(‘[’) simple() match(‘num’) match(‘dotdot’) match(‘num’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 6) type() match(‘array’) match(‘[’) simple() match(‘]’) match(‘num’) match(‘dotdot’) match(‘num’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 7) type() match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) match(‘num’) match(‘dotdot’) match(‘num’) Input: array [ num dotdot num ] of integer lookahead
Example Predictive Parser (Execution Step 8) type() match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) type() match(‘num’) match(‘dotdot’) match(‘num’) simple() match(‘integer’) Input: array [ num dotdot num ] of integer lookahead
Left Factoring When more than one production for nonterminal A startswith the same symbols, the FIRST sets are not disjoint stmt if expr then stmtendif | if expr then stmt else stmt endif We can use left factoring to fix the problem stmt if expr then stmt opt_elseopt_else else stmt endif|endif
Left Recursion When a production for nonterminal A starts with aself reference then a predictive parser loops forever A A | | We can eliminate left recursive productions by systematicallyrewriting the grammar using right recursive productions A R| RR R|
A Translator for Simple Expressions expr expr+termexpr expr-termexpr termterm 0term 1…term 9 { print(“+”) }{ print(“-”) }{ print(“0”) }{ print(“1”) }…{ print(“9”) } After left recursion elimination: expr term rest rest +term { print(“+”) } rest | -term { print(“-”) } rest | term 0 { print(“0”) }term 1 { print(“1”) }…term 9 { print(“9”) }