280 likes | 313 Views
Lecture 5: Syntactic Analysis. (Revised based on the Tucker’s slides). Compilation Process. ID EQ ID PLUS INT TIMES ID. Intermediate Code (IC). Intermediate Code (IC). Abstract Syntax. Tokens. z = x + 2 * y. Machine Code. Lexical Analyzer. Syntactic Analyzer. Code Optimizer.
E N D
Lecture 5: Syntactic Analysis (Revised based on the Tucker’s slides) CS485, Lecture 5 Syntactic Analysis
Compilation Process ID EQ ID PLUS INT TIMES ID Intermediate Code (IC) Intermediate Code (IC) Abstract Syntax Tokens z = x + 2 * y Machine Code Lexical Analyzer Syntactic Analyzer Code Optimizer Code Generator Semantic Analyzer Find syntax errors Find semantic errors CS485, Lecture 5 Syntactic Analysis
Syntactic Analysis • Purpose is to recognize source structure • Phase also known as: parser • Input: tokens • Output: parse tree or abstract syntax tree • A recursive descent parser is one in which each nonterminal in the grammar is converted to a function which recognizes input derivable from the nonterminal. CS485, Lecture 5 Syntactic Analysis
Program Structure consists of: • Expressions: x + 2 * y • Assignment Statement: z = x + 2 * y • Loop Statements: • while (i < n) a[i++] = 0; • Function definitions • Declarations: int i; CS485, Lecture 5 Syntactic Analysis
SiEtSS’ | a S’ eS | e E b How to detect Syntax Error? • What the following strings are correct? • ibta • ibtb CS485, Lecture 5 Syntactic Analysis
First Set • First(X) is set of leftmost terminal symbols derivable from X First(X) = { a T | X →* a w, w (N T)* } • Since there are no production rules for terminal symbols: First(a) = a, a T CS485, Lecture 5 Syntactic Analysis
SiEtSS’ | a S’ eS | e E b First(X) is the set of terminals that begins the strings derived from X S S’ i E t S …… i b t b a CS485, Lecture 5 Syntactic Analysis
More on First • Sometimes we need to consider First (ω), where ω = X1X2…Xi…Xn, 1<=i<=n • First(X1X2…Xi…Xn )=First(X1) U…U First(Xi) where X1…Xi-1 are nullable and Xi is not nullable. • A is nullable if A =>* CS485, Lecture 5 Syntactic Analysis
Recursive Descent Parsing • Method/function for each nonterminal • Recognize longest sequence of tokens derivable from the nonterminal • Need an algorithm for converting productions to code • Based on EBNF CS485, Lecture 5 Syntactic Analysis
T(EBNF) = Code: A → w • 1 If w is nonterminal, call it. • 2 If w is terminal, match it against given token. • 3 If w is { w' }: while (token in First(w')) T(w') • 4 If w is: w1 | ... | wn, switch (token) { case First(w1): T(w1); break; ... case First(wn): T(wn); break; CS485, Lecture 5 Syntactic Analysis
5 Switch (cont.): If some wi is empty, use: default: break; Otherwise default: error(token); • 6 If w = [ w' ], rewrite as ( | w' ) and use rule 4. • 7 If w = X1 ... Xn, T(w) = • T(X1); ... T(Xn); CS485, Lecture 5 Syntactic Analysis
Augmentation Production • Gets first token • Calls method corresponding to original start symbol • Checks to see if final token is end token E.g.: end of file token CS485, Lecture 5 Syntactic Analysis
private void match (int t) { if (token.type() == t) token = lexer.next(); else error(t); } CS485, Lecture 5 Syntactic Analysis
private void error(int tok) { System.err.println( “Syntax error: expecting” + tok + “; saw: ” + token); System.exit(1); } CS485, Lecture 5 Syntactic Analysis
private void assignment( ) { // Assignment → Identifier= // Expression ; match(Token.Identifier); match(Token.Assign); expression( ); match(Token.Semicolon); } CS485, Lecture 5 Syntactic Analysis
private void expression( ) { // Expression → Term { AddOp Term // } term( ); while (isAddOp()) { token = lexer.next( ); term( ); } } CS485, Lecture 5 Syntactic Analysis
Linking Syntax and Semantics • Output: parse tree is inefficient/complicated • many nonterminals convey no information • Shape of parse tree that is important CS485, Lecture 5 Syntactic Analysis
Discard from Parse Tree • Separator/punctuation terminal symbols • All trivial root nonterminals • Replace remaining nonterminal with leaf terminal CS485, Lecture 5 Syntactic Analysis
Assignment→ IDENTIFIEREQExpression ; { Assignement } • Expression → Term{AddOpTerm} • AddOp → PLUS | MINUS • Term → Factor{MulOpFactor} • MulOp → TIMES | DIVISON • Factor → [UnaryOp]Primary • UnaryOp → MINUS | NOT • Primary → IDENTIFIER | LITERAL | LEFTPARExpressionRIGHTPAR |NUMBER CS485, Lecture 5 Syntactic Analysis
Abstract Syntax Example • Assignment = Variabletarget; Expressionsource • Expression = Variable | Value | Binary | Unary • Binary = Operatorop; Expression term1, term2 • Unary = Operatorop; Expressionterm • Variable = Stringid • Value = Integervalue • Operator = + | - | * | / | ! CS485, Lecture 5 Syntactic Analysis
UML Class Diagram • UML Class Diagram is an important notation to represent the structure of an OO program Class Name Attributes Operations CS485, Lecture 5 Syntactic Analysis
The Corresponding C# program class Flight { int flightNumber; Date departureTime; Minutes flightDuration; public Date delayFlight(int numberOfMinutes); public Date getArrivalTime(); } CS485, Lecture 5 Syntactic Analysis
UML Relations • Generalization: inheritance • An association connects two UML Classes Person Employee Employee Department CS485, Lecture 5 Syntactic Analysis
UML CD for AST of Assignment Assignment source target term2 Expression Variable term term1 String id Value Unary Binary op op Operator CS485, Lecture 5 Syntactic Analysis
abstract class Expression { } class Binary extends Expression { Operator op; Expression term1, term2; } class Unary extends Expression { Operator op; Expression term; } CS485, Lecture 5 Syntactic Analysis
Modify T(A → w) to Return AST • Make A the return type of function, as defined by abstract syntax. • If w is a nonterminal, assign returned value. • If w is a non-punctuation terminal, capture its value. • Add a return statement that constructs the appropriate object. CS485, Lecture 5 Syntactic Analysis
private Assignment assignment( ) { // Assignment → Identifier = Expression ; Variable target = match(Token.Identifier); match(Token.Assign); Expression source = expression( ); match(Token.Semicolon); return new Assignment(target, source); } CS485, Lecture 5 Syntactic Analysis
private String match (int t) { String value = Token.value( ); if (token.type( ) == t) token = lexer.next( ); else error(t); return value; } CS485, Lecture 5 Syntactic Analysis