280 likes | 315 Views
Learn about the process of syntactic analysis in programming, including parsing, recognizing source structure, and identifying syntax errors. Explore recursive descent parsing and the linkage between syntax and semantics.
E N D
Lecture 5: Syntactic Analysis (Revised based on the Tucker’s slides) CS485, Lecture 5 Syntactic Analysis
Compilation Process ID EQ ID PLUS INT TIMES ID Intermediate Code (IC) Intermediate Code (IC) Abstract Syntax Tokens z = x + 2 * y Machine Code Lexical Analyzer Syntactic Analyzer Code Optimizer Code Generator Semantic Analyzer Find syntax errors Find semantic errors CS485, Lecture 5 Syntactic Analysis
Syntactic Analysis • Purpose is to recognize source structure • Phase also known as: parser • Input: tokens • Output: parse tree or abstract syntax tree • A recursive descent parser is one in which each nonterminal in the grammar is converted to a function which recognizes input derivable from the nonterminal. CS485, Lecture 5 Syntactic Analysis
Program Structure consists of: • Expressions: x + 2 * y • Assignment Statement: z = x + 2 * y • Loop Statements: • while (i < n) a[i++] = 0; • Function definitions • Declarations: int i; CS485, Lecture 5 Syntactic Analysis
SiEtSS’ | a S’ eS | e E b How to detect Syntax Error? • What the following strings are correct? • ibta • ibtb CS485, Lecture 5 Syntactic Analysis
First Set • First(X) is set of leftmost terminal symbols derivable from X First(X) = { a T | X →* a w, w (N T)* } • Since there are no production rules for terminal symbols: First(a) = a, a T CS485, Lecture 5 Syntactic Analysis
SiEtSS’ | a S’ eS | e E b First(X) is the set of terminals that begins the strings derived from X S S’ i E t S …… i b t b a CS485, Lecture 5 Syntactic Analysis
More on First • Sometimes we need to consider First (ω), where ω = X1X2…Xi…Xn, 1<=i<=n • First(X1X2…Xi…Xn )=First(X1) U…U First(Xi) where X1…Xi-1 are nullable and Xi is not nullable. • A is nullable if A =>* CS485, Lecture 5 Syntactic Analysis
Recursive Descent Parsing • Method/function for each nonterminal • Recognize longest sequence of tokens derivable from the nonterminal • Need an algorithm for converting productions to code • Based on EBNF CS485, Lecture 5 Syntactic Analysis
T(EBNF) = Code: A → w • 1 If w is nonterminal, call it. • 2 If w is terminal, match it against given token. • 3 If w is { w' }: while (token in First(w')) T(w') • 4 If w is: w1 | ... | wn, switch (token) { case First(w1): T(w1); break; ... case First(wn): T(wn); break; CS485, Lecture 5 Syntactic Analysis
5 Switch (cont.): If some wi is empty, use: default: break; Otherwise default: error(token); • 6 If w = [ w' ], rewrite as ( | w' ) and use rule 4. • 7 If w = X1 ... Xn, T(w) = • T(X1); ... T(Xn); CS485, Lecture 5 Syntactic Analysis
Augmentation Production • Gets first token • Calls method corresponding to original start symbol • Checks to see if final token is end token E.g.: end of file token CS485, Lecture 5 Syntactic Analysis
private void match (int t) { if (token.type() == t) token = lexer.next(); else error(t); } CS485, Lecture 5 Syntactic Analysis
private void error(int tok) { System.err.println( “Syntax error: expecting” + tok + “; saw: ” + token); System.exit(1); } CS485, Lecture 5 Syntactic Analysis
private void assignment( ) { // Assignment → Identifier= // Expression ; match(Token.Identifier); match(Token.Assign); expression( ); match(Token.Semicolon); } CS485, Lecture 5 Syntactic Analysis
private void expression( ) { // Expression → Term { AddOp Term // } term( ); while (isAddOp()) { token = lexer.next( ); term( ); } } CS485, Lecture 5 Syntactic Analysis
Linking Syntax and Semantics • Output: parse tree is inefficient/complicated • many nonterminals convey no information • Shape of parse tree that is important CS485, Lecture 5 Syntactic Analysis
Discard from Parse Tree • Separator/punctuation terminal symbols • All trivial root nonterminals • Replace remaining nonterminal with leaf terminal CS485, Lecture 5 Syntactic Analysis
Assignment→ IDENTIFIEREQExpression ; { Assignement } • Expression → Term{AddOpTerm} • AddOp → PLUS | MINUS • Term → Factor{MulOpFactor} • MulOp → TIMES | DIVISON • Factor → [UnaryOp]Primary • UnaryOp → MINUS | NOT • Primary → IDENTIFIER | LITERAL | LEFTPARExpressionRIGHTPAR |NUMBER CS485, Lecture 5 Syntactic Analysis
Abstract Syntax Example • Assignment = Variabletarget; Expressionsource • Expression = Variable | Value | Binary | Unary • Binary = Operatorop; Expression term1, term2 • Unary = Operatorop; Expressionterm • Variable = Stringid • Value = Integervalue • Operator = + | - | * | / | ! CS485, Lecture 5 Syntactic Analysis
UML Class Diagram • UML Class Diagram is an important notation to represent the structure of an OO program Class Name Attributes Operations CS485, Lecture 5 Syntactic Analysis
The Corresponding C# program class Flight { int flightNumber; Date departureTime; Minutes flightDuration; public Date delayFlight(int numberOfMinutes); public Date getArrivalTime(); } CS485, Lecture 5 Syntactic Analysis
UML Relations • Generalization: inheritance • An association connects two UML Classes Person Employee Employee Department CS485, Lecture 5 Syntactic Analysis
UML CD for AST of Assignment Assignment source target term2 Expression Variable term term1 String id Value Unary Binary op op Operator CS485, Lecture 5 Syntactic Analysis
abstract class Expression { } class Binary extends Expression { Operator op; Expression term1, term2; } class Unary extends Expression { Operator op; Expression term; } CS485, Lecture 5 Syntactic Analysis
Modify T(A → w) to Return AST • Make A the return type of function, as defined by abstract syntax. • If w is a nonterminal, assign returned value. • If w is a non-punctuation terminal, capture its value. • Add a return statement that constructs the appropriate object. CS485, Lecture 5 Syntactic Analysis
private Assignment assignment( ) { // Assignment → Identifier = Expression ; Variable target = match(Token.Identifier); match(Token.Assign); Expression source = expression( ); match(Token.Semicolon); return new Assignment(target, source); } CS485, Lecture 5 Syntactic Analysis
private String match (int t) { String value = Token.value( ); if (token.type( ) == t) token = lexer.next( ); else error(t); return value; } CS485, Lecture 5 Syntactic Analysis