
Lecture 5: Syntactic Analysis


mhardin


Presentation Transcript


  1. Lecture 5: Syntactic Analysis (Revised based on Tucker’s slides) CS485, Lecture 5 Syntactic Analysis

  2. Compilation Process
     Source (z = x + 2 * y) → Lexical Analyzer → Tokens (ID EQ ID PLUS INT TIMES ID) → Syntactic Analyzer (finds syntax errors) → Abstract Syntax → Semantic Analyzer (finds semantic errors) → Intermediate Code (IC) → Code Optimizer → Intermediate Code (IC) → Code Generator → Machine Code

  3. Syntactic Analysis
     • Purpose: recognize the structure of the source program
     • The phase is also known as the parser
     • Input: tokens
     • Output: parse tree or abstract syntax tree
     • A recursive descent parser is one in which each nonterminal in the grammar is converted to a function that recognizes input derivable from that nonterminal.

  4. Program Structure consists of:
     • Expressions: x + 2 * y
     • Assignment statements: z = x + 2 * y
     • Loop statements: while (i < n) a[i++] = 0;
     • Function definitions
     • Declarations: int i;

  5. How to Detect a Syntax Error?
     Grammar:
       S → i E t S S' | a
       S' → e S | ε
       E → b
     • Which of the following strings are correct?
       • ibta
       • ibtb
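To make the question concrete, here is a minimal sketch of a recognizer for this grammar. It assumes the grammar reads S → i E t S S' | a, S' → e S | ε, E → b, and that every character of the input is one token. The class name `IfGrammarRecognizer` and its helper methods are hypothetical, not part of the lecture code.

```java
// Hypothetical recognizer for the slide's grammar; each character is one token.
//   S  → i E t S S' | a
//   S' → e S | ε
//   E  → b
final class IfGrammarRecognizer {
    private final String input;
    private int pos = 0;

    private IfGrammarRecognizer(String input) { this.input = input; }

    /** Returns true when the whole string is derivable from S. */
    static boolean accepts(String s) {
        IfGrammarRecognizer r = new IfGrammarRecognizer(s);
        return r.parseS() && r.pos == s.length();
    }

    // '$' stands in for "end of input" when peeking past the last character.
    private char peek() { return pos < input.length() ? input.charAt(pos) : '$'; }

    // Consumes the current token if it is the expected one.
    private boolean match(char expected) {
        if (pos >= input.length() || input.charAt(pos) != expected) return false;
        pos++;
        return true;
    }

    private boolean parseS() {
        switch (peek()) {
            case 'i': return match('i') && parseE() && match('t') && parseS() && parseSPrime();
            case 'a': return match('a');
            default:  return false;   // no production of S starts with this token
        }
    }

    private boolean parseSPrime() {
        if (peek() == 'e') return match('e') && parseS();
        return true;                  // S' → ε consumes nothing
    }

    private boolean parseE() { return match('b'); }

    public static void main(String[] args) {
        System.out.println("ibta: " + accepts("ibta"));
        System.out.println("ibtb: " + accepts("ibtb"));
    }
}
```

Under these assumptions `ibta` is accepted (i, then E → b, then t, then S → a, then S' → ε), while `ibtb` is rejected because after `t` the parser needs an S, and no production of S begins with `b`.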

  6. First Set
     • First(X) is the set of leftmost terminal symbols derivable from X:
       First(X) = { a ∈ T | X ⇒* a w, w ∈ (N ∪ T)* }
     • Since there are no production rules for terminal symbols:
       First(a) = {a}, a ∈ T

  7. First(X) is the set of terminals that begin the strings derived from X
     Grammar:
       S → i E t S S' | a
       S' → e S | ε
       E → b
     [Figure: derivation tree for S; node labels S, S', i, E, t, S, ……; leaf labels i, b, t, b, a]

  8. More on First
     • Sometimes we need to consider First(ω), where ω = X1 X2 … Xi … Xn, 1 <= i <= n
     • First(X1 X2 … Xi … Xn) = First(X1) ∪ … ∪ First(Xi), where X1 … Xi-1 are nullable and Xi is not nullable (if every Xj is nullable, the union runs over all of X1 … Xn)
     • A is nullable if A ⇒* ε
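The definitions of nullable and First can be turned into a small fixed-point computation. The sketch below is one possible way to do it, not the lecture's code. It assumes the earlier grammar reads S → i E t S S' | a, S' → e S | ε, E → b; it encodes nonterminals as uppercase letters, with `P` standing in for S', and the empty string standing for ε. The class name `FirstSets` is hypothetical.

```java
import java.util.*;

final class FirstSets {
    // Assumed encoding: uppercase = nonterminal ('P' stands in for S'),
    // lowercase = terminal, "" = ε.
    static final Map<Character, List<String>> GRAMMAR = new LinkedHashMap<>();
    static {
        GRAMMAR.put('S', List.of("iEtSP", "a"));
        GRAMMAR.put('P', List.of("eS", ""));
        GRAMMAR.put('E', List.of("b"));
    }

    /** A is nullable if some right-hand side consists only of nullable symbols. */
    static Set<Character> nullable() {
        Set<Character> result = new HashSet<>();
        boolean changed = true;
        while (changed) {                       // repeat until nothing new is added
            changed = false;
            for (Map.Entry<Character, List<String>> rule : GRAMMAR.entrySet()) {
                for (String rhs : rule.getValue()) {
                    boolean allNullable = rhs.chars().allMatch(c -> result.contains((char) c));
                    if (allNullable && result.add(rule.getKey())) changed = true;
                }
            }
        }
        return result;
    }

    /** First(X1...Xn) = First(X1) ∪ ... ∪ First(Xi), stopping at the first non-nullable Xi. */
    static Map<Character, Set<Character>> first() {
        Set<Character> nullable = nullable();
        Map<Character, Set<Character>> first = new LinkedHashMap<>();
        for (Character nt : GRAMMAR.keySet()) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Character, List<String>> rule : GRAMMAR.entrySet()) {
                Set<Character> target = first.get(rule.getKey());
                for (String rhs : rule.getValue()) {
                    for (char x : rhs.toCharArray()) {
                        if (GRAMMAR.containsKey(x)) {
                            if (target.addAll(first.get(x))) changed = true;
                        } else if (target.add(x)) {
                            changed = true;     // First(a) = {a} for a terminal a
                        }
                        if (!nullable.contains(x)) break;
                    }
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println("nullable = " + nullable());
        System.out.println("First = " + first());
    }
}
```

For this grammar the computation yields First(S) = {i, a}, First(S') = {e}, and First(E) = {b}, with S' as the only nullable nonterminal.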

  9. Recursive Descent Parsing
     • One method/function for each nonterminal
     • Each recognizes the longest sequence of tokens derivable from its nonterminal
     • Needs an algorithm for converting productions to code
     • Based on EBNF

  10. T(EBNF) = Code: A → w
     1. If w is a nonterminal, call it.
     2. If w is a terminal, match it against the given token.
     3. If w is { w' }:
          while (token in First(w')) T(w')
     4. If w is w1 | ... | wn:
          switch (token) {
            case First(w1): T(w1); break;
            ...
            case First(wn): T(wn); break;

  11. T(EBNF) = Code (cont.)
     5. Switch (cont.): if some wi is empty, use
          default: break;
        otherwise
          default: error(token);
     6. If w = [ w' ], rewrite it as ( | w' ) and use rule 4.
     7. If w = X1 ... Xn, T(w) =
          T(X1); ... T(Xn);

  12. Augmentation Production
     • Gets the first token
     • Calls the method corresponding to the original start symbol
     • Checks that the final token is the end token, e.g., the end-of-file token
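As an illustration of the translation rules and the augmentation step, the sketch below applies them to a tiny grammar invented for this example: List → Item { ',' Item }, Item → [ '-' ] Atom, Atom → 'x' | 'y'. The grammar, the class name `EbnfRulesDemo`, and the single-character tokens are all assumptions made to keep the example self-contained.

```java
// Hypothetical grammar, one character per token:
//   List → Item { ',' Item }
//   Item → [ '-' ] Atom
//   Atom → 'x' | 'y'
final class EbnfRulesDemo {
    private final String input;
    private int pos = 0;

    private EbnfRulesDemo(String input) { this.input = input; }

    // Current token; '$' is a stand-in for the end token.
    private char token() { return pos < input.length() ? input.charAt(pos) : '$'; }

    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException(
            "Syntax error: expecting " + expected + "; saw: " + token());
    }

    // Rule 2: a terminal is matched against the current token.
    private void match(char t) {
        if (pos >= input.length() || input.charAt(pos) != t) throw error("'" + t + "'");
        pos++;
    }

    // Rule 7 for the sequence, rule 3 for the braces: First(',' Item) = { ',' }.
    private void list() {
        item();
        while (token() == ',') {
            match(',');
            item();
        }
    }

    // Rule 6: [ '-' ] becomes ( | '-' ); rule 5: the empty alternative gives "default: break".
    private void item() {
        switch (token()) {
            case '-': match('-'); break;
            default: break;
        }
        atom();
    }

    // Rule 4, with the error default of rule 5 because no alternative is empty.
    private void atom() {
        switch (token()) {
            case 'x': match('x'); break;
            case 'y': match('y'); break;
            default: throw error("'x' or 'y'");
        }
    }

    // Augmentation: start at the first token, call the start symbol's method,
    // then check that the whole input has been consumed.
    static boolean accepts(String s) {
        EbnfRulesDemo p = new EbnfRulesDemo(s);
        try {
            p.list();
            return p.pos == s.length();
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With this sketch, `EbnfRulesDemo.accepts("x,-y,x")` succeeds, while `"x,"` fails inside `atom()` and `"xy"` fails the final end-of-input check.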

  13. private void match (int t) {
          if (token.type() == t)
              token = lexer.next();
          else
              error(t);
      }

  14. private void error(int tok) {
          System.err.println("Syntax error: expecting " + tok + "; saw: " + token);
          System.exit(1);
      }

  15. private void assignment( ) {
          // Assignment → Identifier = Expression ;
          match(Token.Identifier);
          match(Token.Assign);
          expression( );
          match(Token.Semicolon);
      }

  16. private void expression( ) {
          // Expression → Term { AddOp Term }
          term( );
          while (isAddOp()) {
              token = lexer.next( );
              term( );
          }
      }

  17. Linking Syntax and Semantics
     • Output: a parse tree is inefficient/complicated
     • Many nonterminals convey no information
     • It is the shape of the parse tree that is important

  18. Discard from Parse Tree
     • Separator/punctuation terminal symbols
     • All trivial root nonterminals
     • Replace each remaining nonterminal with a leaf terminal

  19. Assignment → IDENTIFIER EQ Expression ; { Assignment }
     • Expression → Term { AddOp Term }
     • AddOp → PLUS | MINUS
     • Term → Factor { MulOp Factor }
     • MulOp → TIMES | DIVISION
     • Factor → [ UnaryOp ] Primary
     • UnaryOp → MINUS | NOT
     • Primary → IDENTIFIER | LITERAL | LEFTPAR Expression RIGHTPAR | NUMBER

  20. Abstract Syntax Example
     • Assignment = Variable target; Expression source
     • Expression = Variable | Value | Binary | Unary
     • Binary = Operator op; Expression term1, term2
     • Unary = Operator op; Expression term
     • Variable = String id
     • Value = Integer value
     • Operator = + | - | * | / | !

  21. UML Class Diagram
     • A UML class diagram is an important notation for representing the structure of an OO program
     [Figure: a class box with three compartments: Class Name, Attributes, Operations]

  22. The Corresponding C# Program
      class Flight {
          int flightNumber;
          Date departureTime;
          Minutes flightDuration;
          public Date delayFlight(int numberOfMinutes);
          public Date getArrivalTime();
      }

  23. UML Relations
     • Generalization: inheritance
     • An association connects two UML classes
     [Figure: two class pairs, Person and Employee, and Employee and Department]

  24. UML CD for AST of Assignment
     [Figure: UML class diagram with classes Assignment, Expression, Variable, Value, Unary, Binary, Operator, and String; association labels source, target, term, term1, term2, op, and id]

  25. abstract class Expression { }

      class Binary extends Expression {
          Operator op;
          Expression term1, term2;
      }

      class Unary extends Expression {
          Operator op;
          Expression term;
      }
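The abstract syntax can be carried over to Java classes in the same way for the remaining categories. The following is a sketch under stated assumptions: `Operator` is simplified to a `String`, and the constructors, the `toString` methods, and the wrapper class `AstDemo` are additions for illustration that do not appear in the slides.

```java
// Hypothetical wrapper; the nested classes mirror the abstract syntax.
final class AstDemo {
    static abstract class Expression { }

    static final class Variable extends Expression {
        final String id;
        Variable(String id) { this.id = id; }
        @Override public String toString() { return id; }
    }

    static final class Value extends Expression {
        final int value;
        Value(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Binary extends Expression {
        final String op;                 // Operator simplified to a String (assumption)
        final Expression term1, term2;
        Binary(String op, Expression term1, Expression term2) {
            this.op = op; this.term1 = term1; this.term2 = term2;
        }
        @Override public String toString() { return "(" + op + " " + term1 + " " + term2 + ")"; }
    }

    static final class Unary extends Expression {
        final String op;
        final Expression term;
        Unary(String op, Expression term) { this.op = op; this.term = term; }
        @Override public String toString() { return "(" + op + " " + term + ")"; }
    }

    static final class Assignment {
        final Variable target;
        final Expression source;
        Assignment(Variable target, Expression source) {
            this.target = target; this.source = source;
        }
        @Override public String toString() { return "(= " + target + " " + source + ")"; }
    }

    /** Builds the tree for z = x + 2 * y by hand. */
    static Assignment sample() {
        return new Assignment(
            new Variable("z"),
            new Binary("+", new Variable("x"),
                new Binary("*", new Value(2), new Variable("y"))));
    }

    public static void main(String[] args) {
        System.out.println(sample());    // prints (= z (+ x (* 2 y)))
    }
}
```

The tree contains no punctuation and no Term or Factor nodes. Operator precedence is carried by the shape alone: the `*` node sits below the `+` node.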

  26. Modify T(A → w) to Return AST
     • Make A the return type of the function, as defined by the abstract syntax.
     • If w is a nonterminal, assign the returned value.
     • If w is a non-punctuation terminal, capture its value.
     • Add a return statement that constructs the appropriate object.

  27. private Assignment assignment( ) {
          // Assignment → Identifier = Expression ;
          Variable target = new Variable(match(Token.Identifier));
          match(Token.Assign);
          Expression source = expression( );
          match(Token.Semicolon);
          return new Assignment(target, source);
      }

  28. private String match (int t) {
          String value = token.value( );
          if (token.type( ) == t)
              token = lexer.next( );
          else
              error(t);
          return value;
      }
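Putting the pieces together, the sketch below is one way a complete AST-returning parser could look. It is not the course's implementation and rests on several simplifying assumptions: tokens are separated by whitespace, operators are plain strings, the grammar is reduced to Assignment, Expression, Term, and Factor without unary operators, and errors are reported by throwing `IllegalArgumentException` instead of calling `System.exit`. All names other than `match`, `assignment`, `expression`, and `term` are hypothetical.

```java
final class AstParser {
    static abstract class Expression { }
    static final class Variable extends Expression {
        final String id;
        Variable(String id) { this.id = id; }
        @Override public String toString() { return id; }
    }
    static final class Value extends Expression {
        final int value;
        Value(int value) { this.value = value; }
        @Override public String toString() { return String.valueOf(value); }
    }
    static final class Binary extends Expression {
        final String op; final Expression term1, term2;
        Binary(String op, Expression t1, Expression t2) { this.op = op; term1 = t1; term2 = t2; }
        @Override public String toString() { return "(" + op + " " + term1 + " " + term2 + ")"; }
    }
    static final class Assignment {
        final Variable target; final Expression source;
        Assignment(Variable target, Expression source) { this.target = target; this.source = source; }
        @Override public String toString() { return "(= " + target + " " + source + ")"; }
    }

    private final String[] tokens;   // assumption: tokens are whitespace-separated
    private int pos = 0;

    private AstParser(String source) { tokens = source.trim().split("\\s+"); }

    private String token() { return pos < tokens.length ? tokens[pos] : "<eof>"; }
    private boolean isIdentifier() { return token().matches("[A-Za-z_]\\w*"); }
    private boolean isLiteral() { return token().matches("\\d+"); }

    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException("Syntax error: expecting " + expected + "; saw: " + token());
    }

    // Returns the text of the matched token, then advances.
    private String match(String expected) {
        if (pos >= tokens.length || !tokens[pos].equals(expected)) throw error(expected);
        return tokens[pos++];
    }

    /** Augmentation: parse one assignment, then require the end of the input. */
    static Assignment parse(String source) {
        AstParser p = new AstParser(source);
        Assignment a = p.assignment();
        if (p.pos != p.tokens.length) throw p.error("end of input");
        return a;
    }

    // Assignment → Identifier = Expression ;
    private Assignment assignment() {
        if (!isIdentifier()) throw error("identifier");
        Variable target = new Variable(match(token()));
        match("=");
        Expression source = expression();
        match(";");
        return new Assignment(target, source);
    }

    // Expression → Term { AddOp Term }
    private Expression expression() {
        Expression e = term();
        while (token().equals("+") || token().equals("-")) {
            String op = match(token());
            e = new Binary(op, e, term());
        }
        return e;
    }

    // Term → Factor { MulOp Factor }
    private Expression term() {
        Expression e = factor();
        while (token().equals("*") || token().equals("/")) {
            String op = match(token());
            e = new Binary(op, e, factor());
        }
        return e;
    }

    // Factor → Identifier | Literal | ( Expression )
    private Expression factor() {
        if (isIdentifier()) return new Variable(match(token()));
        if (isLiteral()) return new Value(Integer.parseInt(match(token())));
        match("(");
        Expression e = expression();
        match(")");
        return e;
    }

    public static void main(String[] args) {
        System.out.println(parse("z = x + 2 * y ;"));   // prints (= z (+ x (* 2 y)))
    }
}
```

Each parsing method returns the node for its nonterminal, punctuation tokens such as `=`, `;`, and parentheses are matched and then discarded, and the loops in `expression` and `term` build left-associative `Binary` nodes.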
