4. Phase 2 : Syntax Analysis Part II • The unit directory. • What you must do. • Example run. • syner.cxx • The lookahead convention. • Error detection and recovery. • Symbol table lookup. • Parsing declarations. • Parsing statements. • Parsing expressions.
The Unit Directory • The unit directory for phase 2 is : /usr/users/staff/aosc/cm049icp/phase2 • Among other things, it contains the following : • synprog.cxx : the test bed program for phase 2. • syner.template : A template file for your phase 2 program. • syner.h : header file for phase2. • AST : Abstract syntax tree data structure. • SymTab : Symbol table data structure. • syntax : Array of syntax error messages. • statics : Array of static semantic error messages. • type : Array of type error messages. • makefile : The makefile for phase 2. • syner : An executable for my phase 2 program.
The Unit Directory II • tests/test*.c-- : Testing programs for the demo. • printers.cxx : Printing subprograms.
  void printAST(AST *ast,   // AST
                int &line,  // Line number
                int indent) // Indentation
  void printST(SymTab *st)  // Symbol table
• utilities.cxx : General utilities.
  void error(int number,          // Error no.
             LexToken lexToken)   // Token
  bool lookup(LexToken lexToken,  // Token
              SymTab *st,         // Sym. tab.
              SymTab *&match)     // Entry
synprog.cxx • The test bed program is as follows (you must write the synAnal subprogram it calls) :
  #include ".../phase2/syner.h"

  int main()
  {
    SymTab *st = NULL ;
    AST *ast = NULL ;
    int line = 1 ;
    int indent = 0 ;
    int label = 0 ;

    synAnal(st, ast, label) ;
    printAST(ast, line, indent) ;
    printST(st) ;
    return 0 ;
  }
• printAST : Pretty prints the AST. • printST : Pretty prints the symbol table. • They ensure that your program’s output format is the same as mine.
What You Must Do • Your implementation of synAnal must be in a file called syner.cxx in your directory. • Take a copy of makefile and syner.template. • Print out a copy of syner.h. • Print out a copy of utilities.cxx. • Print out a copy of printers.cxx. • Useful commands : testphase2, demophase2. • These are shell scripts for running the phase 2 demo. • They rely on your synAnal using the printing subprograms from printers.cxx. • You must #include printers.cxx and utilities.cxx in your syner.cxx file.
Example Run • Assuming we have the following valid C-- program in a file called prog.c-- :
  int i = 10 ;
  {
    while (i > 0)
    {
      cout << i ;
    } ;
  }
• Make and run syner :
  jaguar> make syner
  jaguar> syner < prog.c--
  1 -- {
  2 -- while (i > 0) /* Start label : 0 */
  3 -- {
  4 -- cout << i ;
  5 -- } /* End label : 1 */ ;
  6 -- }
  Name  Type  Constant  Initial Value
  i     int   False     10
  jaguar>
• Output format may change. Make sure printers.cxx is #included in your syner.cxx.
Example Run II • Assuming we have the following invalid C-- program in a file called prog.c-- :
  bool b = true ;
  {
    cin >> b ;
  }
• Run syner :
  jaguar> syner < prog.c--
  Type error 218. Attempt to input to a non-integer variable.
  IDENTIFIER : b
  jaguar>
Top Level Structure For syner.cxx • Contents of syner.template :
  #include <iostream.h>
  #include <iomanip.h>
  #include <ctype.h>
  #include <stddef.h>
  #include <stdlib.h>
  #include ".../lib/cstring.h"
  #include ".../phase2/syner.h"
  #include ".../phase1/lexer.h"
  #include ".../phase2/printers.cxx"
  #include ".../phase2/utilities.cxx"

  void synDec(SymTab *&st, LexToken &lexToken)
  {
    cout << "synDec\n" ;
  }

  void synDeclarations(SymTab *&st, LexToken &lexToken)
  {
    cout << "synDeclarations\n" ;
  }
Top Level Structure For syner.cxx II
  // Forward declaration.
  void synExpression(SymTab *st, Expression *&expr, LexToken &lexToken, DataType &type) ;

  void synFactor(SymTab *st, Factor *&fact, LexToken &lexToken)
  {
    cout << "synFactor\n" ;
  }

  void synTerm(SymTab *st, Term *&term, LexToken &lexToken, DataType &type)
  {
    cout << "synTerm\n" ;
  }

  void synBasicExp(SymTab *st, BasicExp *&bexp, LexToken &lexToken, DataType &type)
  {
    cout << "synBasicExp\n" ;
  }
Top Level Structure For syner.cxx III
  void synExpression(SymTab *st, Expression *&expr, LexToken &lexToken, DataType &type)
  {
    cout << "synExpression\n" ;
  }

  // Forward declaration.
  void synStatements(SymTab *st, AST *&ast, int &label, LexToken &lexToken) ;

  void synIfSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synIfSt\n" ;
  }

  void synWhileSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synWhileSt\n" ;
  }
Top Level Structure For syner.cxx IV
  void synCinSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synCinSt\n" ;
  }

  void synCoutSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synCoutSt\n" ;
  }

  void synAssignSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synAssignSt\n" ;
  }
Top Level Structure For syner.cxx V
  void synStatement(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synStatement\n" ;
  }

  void synStatements(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    cout << "synStatements\n" ;
  }
Top Level Structure For syner.cxx VI
  void synAnal(SymTab *&st, AST *&ast, int &label)
  {
    LexToken lexToken ; // Current token

    skipWhiteComments() ;
    lexAnal(lexToken) ;
    st = NULL ;
    ast = NULL ;
    synDeclarations(st, lexToken) ;
    synStatements(st, ast, label, lexToken) ;
  }
• Note the strange way pointers are passed as reference parameters (e.g. SymTab *&st) : assigning to st or ast inside synAnal changes the caller’s pointer, so the symbol table and AST built here are visible in main. • label holds the number of the next M68K label to be used. All labels are of the form L0, L1, L2, etc.
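To see why the pointers are passed by reference, compare a pointer passed by value with one passed by reference. This is a minimal sketch using a hypothetical Node type, not the SymTab or AST types from syner.h; both functions attach a node to the front of a list, but only the reference version updates the caller's pointer.

```cpp
#include <cassert>

// Hypothetical list node standing in for SymTab or AST; not the syner.h type.
struct Node {
  int value;
  Node *next;
};

// Pointer passed by value: 'list' is a local copy, so the assignment below
// does not change the caller's pointer.
void attachByValue(Node *list, Node *node) {
  node->next = list;
  list = node;  // no effect outside this function
}

// Pointer passed by reference (Node *&): 'list' is the caller's pointer itself.
void attachByRef(Node *&list, Node *node) {
  node->next = list;
  list = node;  // the caller's pointer now points at node
}
```

With Node n{1, nullptr} and Node *head = nullptr, the call attachByValue(head, &n) leaves head equal to nullptr, whereas attachByRef(head, &n) makes head point at n. That is the effect synAnal relies on when it fills in st and ast for main.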
Lookahead Conventions • C-- has an LL(1) grammar. • Sometimes we must read one token beyond the end of a syntactic construct. • For example, when parsing an if statement, to determine whether or not there’s an else part. • Conventions : • 1. All the syntax analysis subprograms assume that they will be passed their first token as a parameter. • 2. All the syntax analysis subprograms will read one token beyond the end of the syntactic construct they are attempting to parse and will pass that token back to their caller. • Exception : synStatements does not follow convention 2. • Otherwise it would read past the end of the input file. • Luckily, synStatements does not need to look ahead.
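The two conventions can be tried out on a toy token stream. In this sketch the tokens are plain strings and TokenStream::next is a hypothetical stand-in for lexAnal; parseBlock behaves like synStatements (it stops on its closing "}"), while parseIf follows both conventions and uses one token of lookahead to decide whether an else part follows.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token source standing in for lexAnal; returns "EOF" at the end.
struct TokenStream {
  std::vector<std::string> tokens;
  std::size_t pos = 0;
  std::string next() { return pos < tokens.size() ? tokens[pos++] : "EOF"; }
};

// Check that tok is the expected token, then read the following one.
void expect(TokenStream &ts, std::string &tok, const std::string &want) {
  if (tok != want) throw std::runtime_error("expected " + want + ", got " + tok);
  tok = ts.next();
}

// Like synStatements: called with "{" in tok and does NOT read past "}".
// For brevity this toy accepts only an empty block "{ }".
void parseBlock(TokenStream &ts, std::string &tok) {
  expect(ts, tok, "{");
  if (tok != "}") throw std::runtime_error("expected }, got " + tok);
}

// Conventions 1 and 2: called with "if" in tok, returns with the first token
// after the whole statement in tok. Returns true if an else part was found.
bool parseIf(TokenStream &ts, std::string &tok) {
  expect(ts, tok, "if");
  expect(ts, tok, "(");
  tok = ts.next();              // skip the (toy, single-token) condition
  expect(ts, tok, ")");
  parseBlock(ts, tok);          // leaves "}" in tok
  tok = ts.next();              // lookahead: is the next token "else"?
  if (tok == "else") {
    tok = ts.next();
    parseBlock(ts, tok);
    tok = ts.next();            // read one token beyond the else part
    return true;
  }
  return false;
}
```

In both cases the caller gets back the token after the statement (here ";"), which is exactly what synStatements then checks for TERMINATOR.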
Error Detection & Recovery • Detecting errors is one thing that we’ll award marks for. • RTFC for syner.cxx to see what errors you have to detect. • Syntax errors. 27 = A ; • Type errors. bool b ; { cin >> b ; } • Static semantic errors. string s ; • As usual, upon detection of an error simply call error with an error number and the offending lexical token. error(101, lexToken) ; • error prints an error message and calls exit.
Symbol Table Lookup • One of the most common static semantic errors is the use of an undeclared variable. • The following subprogram is in utilities.cxx :
  bool lookup(LexToken lexToken, SymTab *st, SymTab *&match)
• lookup inspects the symbol table to see if the identifier held in lexToken has already been declared. • If it has not been declared, lookup returns false and sets match to NULL. • If it has been declared, lookup returns true and sets match to a copy of the Entry for the identifier.
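A lookup of this kind can be sketched as a linear search of a singly linked list. This is a minimal sketch, not the code in utilities.cxx: the SymTab fields shown, the DataType enum and the use of a std::string instead of a LexToken are assumptions for the example, and this version sets match to point at the entry it finds rather than to a copy.

```cpp
#include <cassert>
#include <string>

enum DataType { VOIDDATA, INTDATA, BOOLDATA, STRINGDATA };

// Hypothetical, simplified symbol table entry; the real one is in syner.h.
struct SymTab {
  std::string ident;
  DataType type;
  bool constFlag;
  SymTab *next;
};

// Linear search from the front of the list. On success match points at the
// entry found and true is returned; otherwise match is nullptr and false.
bool lookup(const std::string &ident, SymTab *st, SymTab *&match) {
  for (SymTab *p = st; p != nullptr; p = p->next) {
    if (p->ident == ident) {
      match = p;
      return true;
    }
  }
  match = nullptr;
  return false;
}
```

After a successful call the caller can inspect match->type or match->constFlag, which is how the type and constant checks in the statement parsers are made.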
Symbol Table Lookup II • lookup is also useful for checking that identifiers have the correct type. • Note that lookup is a value-returning subprogram which may also assign a value to a reference parameter. • i.e. it is a side-effecting ‘function’. • This is disgusting programming practice. • It’s also standard programming practice in this instance. • Don’t do this kind of thing yourself unless you really want a zero mark.
Code For synDeclarations
  void synDeclarations(SymTab *&st, LexToken &lexToken)
  {
    while (lexToken.tag != LBRACE) {
      synDec(st, lexToken) ;
      if (lexToken.tag != TERMINATOR)
        error(3, lexToken) ;
      else
        lexAnal(lexToken) ;
    }
  }
• error is in utilities.cxx. • It prints out the error message corresponding to the number. • 0..99 : Syntax errors. • 100..199 : Static semantic errors. • 200..299 : Type errors. • It prints out the offending token (using writeToken). • It calls exit to terminate the program.
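The numbering scheme means the category of an error can be read straight off its number. The helper below is hypothetical (it is not part of utilities.cxx) and only classifies the number; the real error also prints the message and the offending token and then calls exit.

```cpp
#include <cassert>
#include <string>

// Map an error number to its category, using the ranges listed above.
// Hypothetical helper: it does not print anything or terminate the program.
std::string errorKind(int number) {
  if (number >= 0 && number <= 99) return "Syntax error";
  if (number >= 100 && number <= 199) return "Static semantic error";
  if (number >= 200 && number <= 299) return "Type error";
  return "Unknown error";
}
```

For example, the error(3, lexToken) call in synDeclarations is a syntax error, while the number 218 reported in the second example run falls in the type error range.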
Top Level Code For synDec
  void synDec(SymTab *&st, LexToken &lexToken)
  {
    SymTab *newEntry ;  // For this declaration
    SymTab *dummy ;     // For lookup
    LexToken idToken ;  // Var/const name

    newEntry = new SymTab ;
    // Initialise newEntry (set type to VOIDDATA, constFlag to false, initialise to NULL).
    if (lexToken.tag == CONST) {
      newEntry->constFlag = true ;
      lexAnal(lexToken) ;
    }
    // lexToken is now the type. Set the type field; if it is not a valid type, raise an error.
    lexAnal(lexToken) ;   // Lex the identifier token.
    idToken = lexToken ;  // Copy it to idToken for better error reporting.
    if (lexToken.tag == IDENT)
      newEntry->ident = lexToken.ident ;
    else
      error(1, idToken) ;
    if (lookup(lexToken, st, dummy))
      error(101, idToken) ;
Top Level Code For synDec II
    lexAnal(lexToken) ;  // Lex the next token.
    if (lexToken.tag == ASSIGN) {
      newEntry->initialise = new Factor ;
      lexAnal(lexToken) ;
      if (lexToken.tag == BOOLLIT) {
        if (newEntry->type != BOOLDATA)
          error(217, idToken) ;
        newEntry->initialise->literal = true ;
        newEntry->initialise->type = BOOLDATA ;
        newEntry->initialise->litBool = lexToken.boolLit ;
      }
      else if (lexToken.tag == STRINGLIT) {
        // As above but for a string.
      }
      else if (lexToken.tag == INTLIT) {
        // As above but for an int.
      }
      else {
        // Not a literal, so call error.
      }
      lexAnal(lexToken) ;
    }
Top Level Code For synDec III
    if ((newEntry->constFlag) && (newEntry->initialise == NULL))
      error(103, idToken) ;
    if ((newEntry->type == STRINGDATA) && (!newEntry->constFlag))
      error(104, idToken) ;
    // Add newEntry to the front of st.
  } // synDec
• Note that this will build the symbol table in the reverse of the order in which the declarations appear. • This is no problem. • When parsing some languages (e.g. C, C++) it’s actually an advantage : a search from the front of the table finds the most recent (innermost) declaration of a name first.
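Adding each new entry to the front is a constant-time operation and naturally reverses the declaration order. The sketch below uses a hypothetical Entry type (not the real SymTab) to show both effects: the order of the list, and a front-to-back search returning the most recently added entry for a repeated name. C-- itself rejects a redeclaration with error 101, so the repeated-name case only matters for languages with nested scopes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal entry; the real SymTab type is in syner.h.
struct Entry {
  std::string ident;
  int value;
  Entry *next;
};

// Add newEntry to the front of st: O(1), no need to walk to the end.
void addToFront(Entry *&st, Entry *newEntry) {
  newEntry->next = st;
  st = newEntry;
}

// Collect the identifiers in list order (front to back).
std::vector<std::string> names(const Entry *st) {
  std::vector<std::string> out;
  for (const Entry *p = st; p != nullptr; p = p->next) out.push_back(p->ident);
  return out;
}

// Return the first matching entry, i.e. the most recently added one.
const Entry *findLatest(const Entry *st, const std::string &ident) {
  for (const Entry *p = st; p != nullptr; p = p->next)
    if (p->ident == ident) return p;
  return nullptr;
}
```

Declaring a and then b yields the list b, a; if a second a were later added (as an inner-scope declaration would be in C), findLatest would return that newer entry.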
Top Level Code For synStatements
  void synStatements(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    AST *newast = NULL ; // Next statement
    AST *temp1 = NULL ;  // For reversing
    AST *temp2 = NULL ;  // the statement
    AST *temp3 = NULL ;  // list

    if (lexToken.tag != LBRACE)
      error(4, lexToken) ;
    lexAnal(lexToken) ;  // Lex the next token.
    while (lexToken.tag != RBRACE) {
      newast = new AST ;
      synStatement(st, newast, label, lexToken) ;
      if (lexToken.tag != TERMINATOR)
        error(8, lexToken) ;
Top Level Code For synStatements II
      newast->next = ast ;
      ast = newast ;
      lexAnal(lexToken) ;  // Lex the next token.
    }
    temp1 = ast ;
    while (temp1 != NULL) {
      temp3 = temp1->next ;
      temp1->next = temp2 ;
      temp2 = temp1 ;
      temp1 = temp3 ;
    }
    ast = temp2 ;
  } // synStatements
• The statement list is built in reverse order, so it must be reversed again after it has been parsed.
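The reversal loop at the end of synStatements is the standard in-place reversal of a singly linked list. Here it is in isolation, on a hypothetical Stmt type standing in for AST; the comments map each local variable to temp1, temp2 and temp3 above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an AST statement node.
struct Stmt {
  std::string text;
  Stmt *next;
};

// Reverse a singly linked list in place and return the new head.
Stmt *reverseList(Stmt *list) {     // list plays the role of temp1
  Stmt *prev = nullptr;             // temp2: the part reversed so far
  while (list != nullptr) {
    Stmt *rest = list->next;        // temp3: remember the unreversed rest
    list->next = prev;              // point the current node backwards
    prev = list;
    list = rest;
  }
  return prev;
}
```

Each node is visited once, so the cost is linear in the number of statements, and no extra nodes are allocated.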
synStatement • synStatement will call the following subprograms : • synIfSt, synWhileSt, synCinSt, synCoutSt, synAssignSt. • synStatement decides which to call with an if statement that examines lexToken.tag : • IF token : call synIfSt. • WHILE token : call synWhileSt. • CIN token : call synCinSt. • COUT token : call synCoutSt. • IDENT token : call synAssignSt. • None of the above : call error. • synIfSt and synWhileSt will call synStatements to syntax analyse their compound statements. • Recursive descent parsing : a set of mutually recursive subprograms, one per production rule in the EBNF syntax.
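Because the grammar is LL(1), the first token of a statement is enough to pick the right parsing subprogram. The sketch below expresses that choice as a switch over a hypothetical Tag enum (the real tags are defined in lexer.h); it returns the name of the routine to call so the decision can be checked on its own, whereas synStatement would make the call directly.

```cpp
#include <cassert>
#include <string>

// Hypothetical token tags; the real ones are defined in lexer.h.
enum Tag { IF, WHILE, CIN, COUT, IDENT, RBRACE, INTLIT };

// Choose the statement parser from the statement's first token.
// A switch is used here in place of the if chain described above.
std::string chooseParser(Tag tag) {
  switch (tag) {
    case IF:    return "synIfSt";
    case WHILE: return "synWhileSt";
    case CIN:   return "synCinSt";
    case COUT:  return "synCoutSt";
    case IDENT: return "synAssignSt";
    default:    return "error";  // no statement starts with this token
  }
}
```

Either form works; a switch simply makes it easy to see that each starting token maps to exactly one production.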
Parsing Input Statements • By far the easiest of the statement parsing subprograms are synCinSt and synCoutSt :
  void synCinSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    ast->tag = CINST ;
    ast->cinst = new CinSt ;
    lexAnal(lexToken) ;
    if (lexToken.tag != INOP)
      error(9, lexToken) ;
    lexAnal(lexToken) ;
    if (lexToken.tag != IDENT)
      error(11, lexToken) ;
Parsing Input Statements II
    if (!lookup(lexToken, st, ast->cinst->invar))
      error(102, lexToken) ;
    if (ast->cinst->invar->type != INTDATA)
      error(216, lexToken) ;
    if (ast->cinst->invar->constFlag)
      error(105, lexToken) ;
    lexAnal(lexToken) ;
  }
• Most of the code checks for syntax, static semantic and type errors.
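The order of the checks in synCinSt can be tried out in isolation with the toy function below. It is a hypothetical sketch: tokens are plain strings, the symbol table is a std::map rather than the SymTab list, the identifier test is simplified, and instead of calling error it returns the error number the code above would pass (0 meaning the statement is valid).

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

enum DataType { INTDATA, BOOLDATA, STRINGDATA };

// Hypothetical symbol table record; the real SymTab is defined in syner.h.
struct VarInfo {
  DataType type;
  bool constFlag;
};

// Simplified identifier test: starts with a letter or an underscore.
bool isIdent(const std::string &s) {
  return !s.empty() &&
         (std::isalpha(static_cast<unsigned char>(s[0])) || s[0] == '_');
}

// Check the tokens of "cin >> name" (toks[0] is assumed to be "cin") in the
// same order as synCinSt. Returns the first error number found, or 0.
int checkCin(const std::vector<std::string> &toks,
             const std::map<std::string, VarInfo> &st) {
  if (toks.size() < 2 || toks[1] != ">>") return 9;     // missing >>
  if (toks.size() < 3 || !isIdent(toks[2])) return 11;  // not an identifier
  auto it = st.find(toks[2]);
  if (it == st.end()) return 102;                       // undeclared variable
  if (it->second.type != INTDATA) return 216;           // not an int variable
  if (it->second.constFlag) return 105;                 // input to a constant
  return 0;
}
```

Syntax is checked first, then the declaration, then the type, and the constant check comes last, so each faulty statement is reported under the first rule it breaks.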
Parsing if-else Statements • This is slightly more complex : • Need to use lookahead. • Must handle the label field. • Must parse the conditional expression. • Must parse the enclosed statements.
  void synIfSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
  {
    DataType type = VOIDDATA ;
    Expression *expr = NULL ;
    AST *thenpart = NULL ;
    AST *elsepart = NULL ;
Parsing if-else Statements II
    ast->tag = IFST ;
    ast->ifst = new IfSt ;
    ast->ifst->elselabel = label++ ;
    ast->ifst->endlabel = label++ ;
    lexAnal(lexToken) ;
    if (lexToken.tag != LPAREN)
      error(6, lexToken) ;
    lexAnal(lexToken) ;
    synExpression(st, expr, lexToken, type) ;
    if (type != BOOLDATA)
      error(202, lexToken) ;
    if (lexToken.tag != RPAREN)
      error(7, lexToken) ;
Parsing if-else Statements III
    lexAnal(lexToken) ;
    synStatements(st, thenpart, label, lexToken) ;
    lexAnal(lexToken) ;
    if (lexToken.tag == ELSE) {
      lexAnal(lexToken) ;
      synStatements(st, elsepart, label, lexToken) ;
      lexAnal(lexToken) ;
    }
    ast->ifst->condition = expr ;
    ast->ifst->thenstats = thenpart ;
    ast->ifst->elsestats = elsepart ;
  } // synIfSt
• synStatements doesn’t lex ahead, so we must lex another token after each call to it.
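synIfSt reserves its two labels before it parses the enclosed statements, and because label is passed by reference every nested statement continues from the updated counter. The sketch below isolates that bookkeeping; IfLabels, allocateIfLabels and labelName are hypothetical names, not part of syner.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of the two labels an if statement needs.
struct IfLabels {
  int elseLabel;  // target when the condition is false
  int endLabel;   // target at the end of the then part
};

// Reserve two fresh labels, as synIfSt does with label++.
IfLabels allocateIfLabels(int &label) {
  IfLabels l;
  l.elseLabel = label++;
  l.endLabel = label++;
  return l;
}

// Labels are written as L0, L1, L2, ...
std::string labelName(int n) { return "L" + std::to_string(n); }
```

Starting from label = 0, an outer if gets L0 and L1 and an if nested in its then part gets L2 and L3, so no label is ever used twice.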
Expressions • synIfSt must call synExpression to parse the conditional expression. • So must synWhileSt and synAssignSt. • The next lecture covers how to write synExpression. • For now, assume that all expressions are simply literal constants, e.g. :
  if (true)     while (false)     x = 42 ;
  { ... } ;     { ... } ;
• Code to parse literals is on slide 20 (Top Level Code For synDec II). • Handling expressions can be a bit fiddly. • Not difficult, just fiddly.
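An interim, literal-only synExpression only has to recognise one literal token and report its type. The sketch below is a self-contained approximation under stated assumptions: Token and Literal are hypothetical simplifications (only litBool, literal and type appear in the code on the slides; intLit, stringLit, litInt and litString are invented names), and it returns false where the real code would call error.

```cpp
#include <cassert>
#include <string>

// Hypothetical token tags and types; the real ones are in lexer.h and syner.h.
enum Tag { BOOLLIT, INTLIT, STRINGLIT, IDENT };
enum DataType { VOIDDATA, INTDATA, BOOLDATA, STRINGDATA };

struct Token {
  Tag tag;
  bool boolLit = false;
  int intLit = 0;
  std::string stringLit;
};

// Simplified literal node, loosely modelled on the Factor fields in synDec.
struct Literal {
  DataType type = VOIDDATA;
  bool litBool = false;
  int litInt = 0;
  std::string litString;
};

// Accept a single literal token and report its type through 'type'.
// Returns false for anything else (the real parser would call error).
bool parseLiteralExpression(const Token &tok, Literal &lit, DataType &type) {
  switch (tok.tag) {
    case BOOLLIT:   lit.type = BOOLDATA;   lit.litBool = tok.boolLit;     break;
    case INTLIT:    lit.type = INTDATA;    lit.litInt = tok.intLit;       break;
    case STRINGLIT: lit.type = STRINGDATA; lit.litString = tok.stringLit; break;
    default:        return false;
  }
  type = lit.type;
  return true;
}
```

To fit convention 2, the real stub would also call lexAnal once after the literal, so that synIfSt, synWhileSt and synAssignSt receive the token that follows the expression.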
Summary • Copy syner.template, makefile and syner (renamed to dhsyner) into your directory. • Print out syner.h, utilities.cxx and printers.cxx. • Rename syner.template to syner.cxx. • Complete the stubs in syner.cxx in the following order : • synDeclarations, synDec, synStatements, synStatement, synCinSt, synCoutSt, synAssignSt, synWhileSt, synIfSt. • For now, assume all expressions are simply literal constants. • synExpression can reuse the literal-parsing code from slide 20.