Problem 2 A Scanner / Parser for Simple C
Outline
• Language syntax for SC
• Requirements for the scanner
• Requirements for the parser
• Companion files
• Main classes
Language Syntax for SC
0. Comments
1. Data types
2. Literals and identifiers
3. Operators
4. Control statements
5. Functions
6. Program syntax
7. Forward function declarations
8. Built-in library functions
9. Nested lexical scoping
0. Comments
• // starts a comment; everything up to the end of the line is ignored.
1. Data types
• int
• void
• There can be arrays of int. Ex: int x, y[10];
• There are no boolean variables, but an expression has type boolean if it is the result of a comparison (e.g., x > y).
2. Literals and identifiers
• IDENTIFIER:
  • used for variable or function names
  • the type names 'int' and 'void' are keywords
  • format: (letter or _) (letter or _ or digit)*
• (integer) CONSTANT:
  • non-negative decimal number (< 2^31)
  • legal: 23, 54, 0
  • illegal: -10, 01, 001
  • note: -10 is regarded as two tokens, - and 10
• STRING_LITERAL: a " followed by a sequence of characters in which " and \ must be escaped by a preceding \, and finally followed by a matching ".
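The three token formats above can be sketched as regular expressions. This is a minimal illustration in plain Java rather than a JLex specification; the class and method names are hypothetical, the keyword list beyond int and void is an assumption based on the statements SC defines, and rejecting a raw newline inside a string is also an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the provided files.
class ScTokenPatterns {
    // Assumption: besides int and void, the statement words are reserved too.
    static final Set<String> KEYWORDS = new HashSet<>(
            Arrays.asList("int", "void", "if", "else", "while", "return"));

    // IDENTIFIER: (letter or _) (letter or _ or digit)*
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");

    // CONSTANT: either 0, or a nonzero digit followed by digits (no leading zeros).
    static final Pattern CONSTANT = Pattern.compile("0|[1-9][0-9]*");

    // STRING_LITERAL: a quote, then ordinary characters or backslash escapes, then a quote.
    // Ordinary characters exclude the quote, the backslash, and (by assumption) a newline.
    static final Pattern STRING_LITERAL = Pattern.compile("\"([^\"\\\\\\n]|\\\\.)*\"");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches() && !KEYWORDS.contains(s);
    }

    static boolean isConstant(String s) {
        if (!CONSTANT.matcher(s).matches() || s.length() > 10) {
            return false; // wrong shape, or too many digits to be below 2^31
        }
        return Long.parseLong(s) < (1L << 31); // range check: value < 2^31
    }

    static boolean isStringLiteral(String s) {
        return STRING_LITERAL.matcher(s).matches();
    }
}
```

A string whose closing quote is missing fails isStringLiteral, which is the case the scanner is expected to report as an unterminated string.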
3. Operators
• arithmetic: +, -, *, /
• relational: ==, !=, <, >
• logical: &&, ||, !
• All binary arithmetic and logical operators are left-associative.
• NOT (!) and UNARY_MINUS (-) are not associative, i.e., (- - 2) is illegal; you must use -(-2).
• Arithmetic operators require integer operands; logical operators require boolean operands.
• The relational operators == and != work on both bool and int; < and > work only on int. All relational operators return a boolean.
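The operand rules above amount to a small type-checking table. The sketch below is one way to write it down in Java; the Type enum and both method names are hypothetical (the provided code uses constants such as sym.INT and sym.BOOL instead), and requiring both operands of == and != to have the same type is an assumption.

```java
// Hypothetical helper, not part of the provided files.
class ScOperatorTypes {
    enum Type { INT, BOOL, ERROR }

    // Result type of "left op right", or ERROR if the operands are not allowed.
    static Type binaryResult(String op, Type left, Type right) {
        switch (op) {
            case "+": case "-": case "*": case "/":
                // Arithmetic: integer operands, integer result.
                return (left == Type.INT && right == Type.INT) ? Type.INT : Type.ERROR;
            case "<": case ">":
                // Ordering comparisons: integer operands, boolean result.
                return (left == Type.INT && right == Type.INT) ? Type.BOOL : Type.ERROR;
            case "==": case "!=":
                // Equality: int or bool operands (assumed to be of the same type).
                return (left == right && left != Type.ERROR) ? Type.BOOL : Type.ERROR;
            case "&&": case "||":
                // Logical: boolean operands, boolean result.
                return (left == Type.BOOL && right == Type.BOOL) ? Type.BOOL : Type.ERROR;
            default:
                return Type.ERROR;
        }
    }

    // Result type of a unary operator applied to one operand.
    static Type unaryResult(String op, Type operand) {
        switch (op) {
            case "-": return operand == Type.INT ? Type.INT : Type.ERROR;
            case "!": return operand == Type.BOOL ? Type.BOOL : Type.ERROR;
            default:  return Type.ERROR;
        }
    }
}
```

Returning an ERROR type instead of throwing lets a caller report one message and keep checking the rest of the expression.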
Operator Precedence
• From highest to lowest:
    Highest   - (unary minus)
              * /
              + -
              == != > <
              !
    Lowest    && ||
• You have to assign the appropriate precedences in CUP.
• All math operators (+, -, *, /) and comparisons (==, !=, <, >) are defined for only integers.
• Logical operators (&&, ||, !) apply only to results of comparisons (booleans).
4. Control Statements
• Assignment: lhs = Expression ;
• Function call: foo(…);
• Return: return [ expr ];
• blockStatement: { statement_list }
• statement_list : statement_list statement | ε
• SC has three control statements: if, if-else, and while.
  • if ( condition ) { statement_list }
  • if ( condition ) { statement_list } else { statement_list }
  • while ( condition ) { statement_list }
Control Statements
• These control statements can be nested to any depth. Curly braces are required for all SC control statements.
• Ex: the code
    if (x != 0) a = b + c;
  should produce a syntax error. The correct syntax would be
    if (x != 0) { a = b + c; }
• There is no implicit type casting in conditional statements.
• For instance, "if (x)" is not valid if x is an int. You need to write "if (x != 0)".
5. Functions
• All functions must be declared with type int or void.
• Function calls in SC can appear either on the RHS of an assignment or as a statement:
    x = a + foo() + 10;   // foo must return an integer
    foo();                // foo may return either integer or void
• A function call will have either 0 or 1 arguments.
• Functions may have 'return' statements. There can be one or more return statements in the statement list. The syntax for the return statement is:
    return;       // when the function returns void
    return exp;   // when the function returns an integer
6. Program Syntax
• A valid SC program consists of zero or more global variable declarations, followed by one or more function definitions.
• The body of a function may contain local variable declarations, followed by the code.
• The function structure:
    [ int | void ] function_name( 0 or 1 parameters )
    {
        local variable declarations;   // optional
        statement list;
    }
7. Forward Function Declarations
• A function has to be declared or defined before you can call it.
• Ex: the following code should cause bar() to be marked as an undefined function.
    void foo() { bar(1); }
    void bar(int x) { y = x; }
• Here, we need a forward function declaration for bar() before we can call it.
    void bar( int x );
    void foo() { bar(1); }
    void bar( int y ) { y = 1; }
• Forward declarations specify the function return type and parameter type. They must match the actual function declaration. The name of the parameter in the forward declaration is not relevant.
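The two checks described above (a call must follow a declaration, and a definition must match its forward declaration) can be sketched as follows. This is an illustration only: the class, the Signature type, and the error strings are hypothetical, types are plain strings for brevity, and duplicate definitions are deliberately not detected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of the provided files.
class ForwardDeclChecker {
    // Return type plus the single optional parameter type (null means no parameter).
    // The parameter name is left out because it is not relevant for matching.
    static final class Signature {
        final String returnType;
        final String paramType;

        Signature(String returnType, String paramType) {
            this.returnType = returnType;
            this.paramType = paramType;
        }

        boolean matches(Signature other) {
            return returnType.equals(other.returnType)
                    && Objects.equals(paramType, other.paramType);
        }
    }

    private final Map<String, Signature> declared = new HashMap<>();

    // Records a forward declaration or a definition.
    // Returns null on success, or an error message on a mismatch.
    String declare(String name, Signature sig) {
        Signature earlier = declared.get(name);
        if (earlier != null && !earlier.matches(sig)) {
            return "declaration of " + name + " does not match its forward declaration";
        }
        declared.put(name, sig);
        return null;
    }

    // Checks a call site. Returns null if the function is known, else an error message.
    String call(String name) {
        return declared.containsKey(name) ? null : "undefined function " + name;
    }
}
```

Processing the program top to bottom, call("bar") fails inside foo() in the first example and succeeds in the second, because declare("bar", ...) has already run for the forward declaration.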
8. Built-in functions
• There are three built-in void functions: printInt(int), printString(StringLiteral), printLine().
• The compiler will automatically generate code for these functions if they are used. printString() takes a literal constant string as its argument.
• Ex:
    int x;
    printInt(1);
    printInt(x);
    printInt(x+1);
    printString("foo");
    printLine();
9. Scoping rules
• SC applies nested lexical scoping, where every pair of curly brackets creates a new nested scope (and can include new variable declarations).
• Variables visible include those declared locally and in enclosing scopes. For instance:
    A:  int x;                 // x is global
        foo() {
    B:    int x, y;
          if (x == 1)          // refers to x declared at B:
          {
    C:      int x, z;
            x = y + z;         // x, z declared at C:; y declared at B:
          } else {
            int i, j;
            x = 3;             // x declared at B:
          }
        }                      // x, z at C: are not visible
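The lookup rule in this example (search the current scope first, then each enclosing scope) can be sketched as a chain of tables linked by a parent pointer. The class below is a hypothetical illustration and not the provided SymTab.java; its names and its use of string labels for scopes are assumptions made so the result can be compared with the A:, B:, C: labels above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of nested lexical scoping; not the provided SymTab class.
class ScopeSketch {
    private final String label;          // e.g. "A", "B", "C" as in the slide
    private final ScopeSketch parent;    // enclosing scope, or null for the global scope
    private final Map<String, String> symbols = new HashMap<>(); // name -> type

    ScopeSketch(String label, ScopeSketch parent) {
        this.label = label;
        this.parent = parent;
    }

    void declare(String name, String type) {
        symbols.put(name, type);
    }

    // Returns the label of the innermost scope that declares the name, or null if none does.
    String resolve(String name) {
        for (ScopeSketch s = this; s != null; s = s.parent) {
            if (s.symbols.containsKey(name)) {
                return s.label;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopeSketch global = new ScopeSketch("A", null);
        global.declare("x", "int");
        ScopeSketch body = new ScopeSketch("B", global);
        body.declare("x", "int");
        body.declare("y", "int");
        ScopeSketch thenBlock = new ScopeSketch("C", body);
        thenBlock.declare("x", "int");
        thenBlock.declare("z", "int");
        ScopeSketch elseBlock = new ScopeSketch("else", body);
        elseBlock.declare("i", "int");
        elseBlock.declare("j", "int");

        System.out.println(thenBlock.resolve("x")); // C
        System.out.println(thenBlock.resolve("y")); // B
        System.out.println(elseBlock.resolve("x")); // B
        System.out.println(elseBlock.resolve("z")); // null: z at C: is not visible here
    }
}
```

Because the else block's parent is the function body and not the then block, names declared at C: are never reached from it.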
Requirements for the lexical analyzer
• Identify the keywords, operators, identifiers, strings, constants, and other necessary characters correctly.
• Use JLex to generate the lexical analyzer.
• You may need to explicitly look for carriage returns on PC-based systems if the %notunix directive to the lexer is not working. You can do this by specifying "\r\n" instead of "\n" as the newline.
Goals:
1. find all legal tokens
2. handle comments
3. report unterminated strings
Tasks:
• add comments
• extend strings
• extend numbers
Requirements for the SC parser
• Use CUP to generate the parser.
• Report simple syntax errors, and attempt to recover.
Goals:
1. accept all legal SC syntax
2. report simple syntax errors using the "error" token
   Exs: "illegal statement", "illegal expression", "illegal declaration", "missing semicolon"
Tasks:
• void functions
• array declarations
• multiple variable declarations
• nested lexical scoping
• arithmetic, logical, and relational operators
• forward function declarations
Implementation
• You have been given a toy parser which parses a subset of the SC language.
• You need to enhance it to parse the full SC language.
• The files are listed below, with a brief description:
• mysc.lex:
  • The JLex specification file for implementing the scanner. Most of the basic tokens have been added for you. All you need to do is extend it for comments, strings, and better handling of identifiers and numbers.
• mysc.cup:
  • The CUP specification file. You need to extend this file in order to implement the parser. Most of your changes should be to this file. A lot of the grammar and actions have been provided.
• SymTabEntry.java:
  • Code for implementing symbol table entries. You should not need to modify this file.
• SymTab.java:
  • Code for implementing the symbol table. You should not need to modify this file.
• ExpNode.java:
  • Code for storing information with expression nodes. You should not need to modify this file.
• Yylex.java, Parser.java, sym.java:
  • Files created by JLex and CUP.
• go.bat: Script file for compiling mycc.
• goAll.bat: Script for compiling mycc and testing it on test*.in
• goTest.bat: Script for testing mycc on test*.in
• toy*.sc: Some sample input SC programs handled by the toy front end
• test*.sc: Some sample input SC programs you should try to parse
• test*.log: Log files created by mysc when using the goAll.bat script
• test*.out: Some sample output files from a fully implemented front end
Main classes
1. Parser
• Generated by CUP, this is the class which performs the actual parse. All action code in your CUP grammar is executed as methods of class "parser".
• Important public fields include globalSymTab, currSymTab, and curType.
• These fields are used to store information to be transferred between different actions in CUP.
• The "parser" class also contains main(), the starting point of the user code.
2. sym
• Generated by CUP; contains the constants for all the tokens in the grammar. Also used by JLex for the scanner.
• Useful constants: sym.INT, sym.BOOL, and sym.VOID.
Main classes (continued)
3. Yylex
• Generated by JLex; implements the scanner.
4. SymTab
• Stores all the symbols in a scope as a HashMap.
• The symbol table also keeps track of its parent.
• All local nested scopes are kept in a children list, except for function scopes, which are stored in the entry for the function symbol.
Main classes (continued)
5. SymTabEntry
• Stores information for each symbol.
• Generic information includes the name (as a String) and the type (sym.INT, etc.). Also includes information specific to arrays and functions.
• Note: The SymTabEntry for a function stores the SymTab for that function.
6. ExpNode
• Stores the type and value of an expression.
• Helpful for keeping track of intermediate expressions (e.g., 1+2*4).