CS 153: Concepts of Compiler Design September 3 Class Meeting Department of Computer Science San Jose State University Fall 2014 Instructor: Ron Mak www.cs.sjsu.edu/~mak
Project Teams • Projects will be done by small project teams. • Projects may be broken up into assignments. • Form your own teams of 4 members each. • Choose your team members wisely! • Be sure you’ll be able to meet and communicate with each other and work together well. • No moving from team to team. • Each team member will receive the same score on each team assignment and team project.
Project Teams, cont’d • Each team sends an email to ron.mak@sjsu.edu by Friday, August 29, containing: • Your team name • A list of team members and email addresses • Subject line: CS 153 Team followed by your team name • Example: CS 153 Team Super Coders
Compiler Project • Each team develops a working compiler for a procedure-oriented source language. • A more complete subset of Pascal. • Any other procedure-oriented language. • Invent your own language. • No Lisp-like languages. • Start thinking about and planning for your project early in the semester!
Conceptual Design (Version 2) • We can architect a compiler with three major parts: a front end, an intermediate tier, and a back end.
Conceptual Design (Version 3) • A compiler and an interpreter can both use the same front end and intermediate tier.
Three Java Packages • [Figure: UML package and class diagrams, with FROM and TO panels. Legend: package, class.]
Front End Class Relationships • These four framework classes should be source language-independent. • [Figure: UML class diagram. Legend: class, abstract class, field, “owns a” relationship, transient relationship.]
The Abstract Parser Class • Fields iCode and symTab refer to the intermediate code and the symbol table. • Field scanner refers to the scanner. • Abstract parse() and getErrorCount() methods. • To be implemented by language-specific parser subclasses. • “Convenience methods” currentToken() and nextToken() simply call the currentToken() and nextToken() methods of Scanner.
The Abstract Scanner Class • Private field currentToken refers to the current token, which protected method currentToken() returns. • Method nextToken() calls abstract method extractToken(). • To be implemented by language-specific scanner subclasses. • Convenience methods currentChar() and nextChar() call the corresponding methods of Source.
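The way nextToken() delegates to the abstract extractToken() can be sketched as below. This is a minimal illustration rather than the course's actual framework code: the names SketchToken, SketchScanner, and OneCharScanner are hypothetical, and a plain String stands in for the Source class.

```java
// Hypothetical stand-in for the framework's Token class.
class SketchToken {
    final String text;
    SketchToken(String text) { this.text = text; }
}

// Sketch of a language-independent scanner (assumed names, not the real API).
abstract class SketchScanner {
    static final char EOF = (char) 0;

    protected final String source;   // a String stands in for the Source object
    protected int position = 0;
    private SketchToken currentToken;

    SketchScanner(String source) { this.source = source; }

    // Return the token most recently extracted.
    SketchToken currentToken() { return currentToken; }

    // Framework method: delegates the real work to the subclass.
    SketchToken nextToken() {
        currentToken = extractToken();
        return currentToken;
    }

    // Implemented by each language-specific scanner subclass.
    protected abstract SketchToken extractToken();

    // Convenience methods over the source text.
    protected char currentChar() {
        return position < source.length() ? source.charAt(position) : EOF;
    }

    protected char nextChar() {
        ++position;
        return currentChar();
    }
}

// A trivial language-specific subclass: every character is its own token.
class OneCharScanner extends SketchScanner {
    OneCharScanner(String source) { super(source); }

    @Override
    protected SketchToken extractToken() {
        char ch = currentChar();
        if (ch == EOF) {
            return new SketchToken("<EOF>");
        }
        nextChar();  // consume the character
        return new SketchToken(Character.toString(ch));
    }
}
```

Callers only ever see SketchScanner, so swapping in a different subclass requires no change to the calling code.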
An Apt Quote? Before I came here, I was confused about this subject. Having listened to your lecture, I am still confused, but on a higher level. Enrico Fermi, physicist, 1901-1954
Pascal-Specific Front End Classes • PascalParserTD is a subclass of Parser and implements the parse() and getErrorCount() methods for Pascal. • TD for “top down” • PascalScanner is a subclass of Scanner and implements the extractToken() method for Pascal. • Strategy Design Pattern
The Pascal Parser Class • The initial version of method parse() does hardly anything, but it forces the scanner into action and serves our purpose of doing end-to-end testing.
    public void parse()
        throws Exception
    {
        Token token;
        long startTime = System.currentTimeMillis();

        while (!((token = nextToken()) instanceof EofToken)) {}

        // Send the parser summary message.
        float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
        sendMessage(new Message(PARSER_SUMMARY,
                                new Number[] {token.getLineNumber(),
                                              getErrorCount(),
                                              elapsedTime}));
    }
• What does this while loop do?
The Pascal Scanner Class • The initial version of method extractToken() doesn’t do much either, other than create and return either a default token or the EOF token.
    protected Token extractToken()
        throws Exception
    {
        Token token;
        char currentChar = currentChar();

        // Construct the next token. The current character determines the
        // token type.
        if (currentChar == EOF) {
            token = new EofToken(source);
        }
        else {
            token = new Token(source);
        }

        return token;
    }
• Remember that the Scanner method nextToken() calls the abstract method extractToken(). Here, the Scanner subclass PascalScanner implements method extractToken().
The Token Class • The Token class’s default extract() method extracts just one character from the source. • This method will be overridden by the various token subclasses. It serves our purpose of doing end-to-end testing.
    protected void extract()
        throws Exception
    {
        text = Character.toString(currentChar());
        value = null;

        nextChar();  // consume current character
    }
The Token Class, cont’d • A character (or a token) is “consumed” after it has been read and processed, and the next one is about to be read. • If you forget to consume, you will loop forever on the same character or token.
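A small sketch shows why consuming matters. The class ConsumeDemo and its method countNonBlanks are hypothetical names invented for this illustration; the only point is the increment at the bottom of the loop.

```java
class ConsumeDemo {
    // Count the nonblank characters in a line, consuming each
    // character after it has been read and processed.
    static int countNonBlanks(String line) {
        int position = 0;
        int count = 0;

        while (position < line.length()) {
            char ch = line.charAt(position);  // read the current character
            if (ch != ' ') {
                ++count;                      // process it
            }
            ++position;                       // consume it
        }

        return count;
    }
}
```

If the `++position` line were removed, the loop condition would never change and the method would examine the same character forever.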
A Front End Factory Class • A language-specific parser goes together with a scanner for the same language. • But we don’t want the framework classes to be tied to a specific language. Framework classes should be language-independent. • We use a factory class to create a matching parser-scanner pair. • Factory Method Design Pattern
A Front End Factory Class, cont’d • Good: Parser parser = FrontendFactory.createParser( … ); • Arguments to the createParser() method enable it to create and return a parser bound to an appropriate scanner. • Variable parser doesn’t have to know what kind of parser subclass the factory created. • Once again, the idea is to maintain loose coupling: “coding to the interface.”
A Front End Factory Class, cont’d • Good: Parser parser = FrontendFactory.createParser( … ); • Bad: PascalParserTD parser = new PascalParserTD( … ) • Why is this bad? • Now variable parser is tied to a specific language.
A Front End Factory Class, cont’d
    public static Parser createParser(String language, String type,
                                      Source source)
        throws Exception
    {
        if (language.equalsIgnoreCase("Pascal") &&
            type.equalsIgnoreCase("top-down"))
        {
            Scanner scanner = new PascalScanner(source);
            return new PascalParserTD(scanner);
        }
        else if (!language.equalsIgnoreCase("Pascal")) {
            throw new Exception("Parser factory: Invalid language '" +
                                language + "'");
        }
        else {
            throw new Exception("Parser factory: Invalid type '" +
                                type + "'");
        }
    }
Initial Back End Subclasses • The CodeGenerator and Executor subclasses will only be (do-nothing) stubs for now. • Strategy Design Pattern
The Code Generator Class • All the process() method does for now is send the COMPILER_SUMMARY message. • number of instructions generated (none for now) • code generation time (nearly no time at all for now)
    public void process(ICode iCode, SymTab symTab)
        throws Exception
    {
        long startTime = System.currentTimeMillis();
        float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
        int instructionCount = 0;

        // Send the compiler summary message.
        sendMessage(new Message(COMPILER_SUMMARY,
                                new Number[] {instructionCount,
                                              elapsedTime}));
    }
The Executor Class • All the process() method does for now is send the INTERPRETER_SUMMARY message. • number of statements executed (none for now) • number of runtime errors (none for now) • execution time (nearly no time at all for now)
    public void process(ICode iCode, SymTab symTab)
        throws Exception
    {
        long startTime = System.currentTimeMillis();
        float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
        int executionCount = 0;
        int runtimeErrors = 0;

        // Send the interpreter summary message.
        sendMessage(new Message(INTERPRETER_SUMMARY,
                                new Number[] {executionCount,
                                              runtimeErrors,
                                              elapsedTime}));
    }
A Back End Factory Class
    public static Backend createBackend(String operation)
        throws Exception
    {
        if (operation.equalsIgnoreCase("compile")) {
            return new CodeGenerator();
        }
        else if (operation.equalsIgnoreCase("execute")) {
            return new Executor();
        }
        else {
            throw new Exception("Backend factory: Invalid operation '" +
                                operation + "'");
        }
    }
End-to-End: Program Listings • Here’s the heart of the main Pascal class’s constructor:
    source = new Source(new BufferedReader(new FileReader(filePath)));
    source.addMessageListener(new SourceMessageListener());

    parser = FrontendFactory.createParser("Pascal", "top-down", source);
    parser.addMessageListener(new ParserMessageListener());

    backend = BackendFactory.createBackend(operation);
    backend.addMessageListener(new BackendMessageListener());

    parser.parse();

    iCode = parser.getICode();
    symTab = parser.getSymTab();

    backend.process(iCode, symTab);
    source.close();
• The front end parser creates the icode and the symtab of the intermediate tier. • The back end processes the icode and the symtab.
Listening to Messages • Class Pascal has inner classes that implement the MessageListener interface.
    private static final String SOURCE_LINE_FORMAT = "%03d %s";

    private class SourceMessageListener implements MessageListener
    {
        public void messageReceived(Message message)
        {
            MessageType type = message.getType();
            Object body[] = (Object []) message.getBody();

            switch (type) {

                case SOURCE_LINE: {
                    int lineNumber = (Integer) body[0];
                    String lineText = (String) body[1];

                    System.out.println(String.format(SOURCE_LINE_FORMAT,
                                                     lineNumber, lineText));
                    break;
                }
            }
        }
    }
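The listener mechanism itself can be sketched in a few lines. The names LineListener, LineProducer, and ListingDemo are hypothetical and much simpler than the course's Message and MessageListener classes; the sketch only shows how a producer notifies registered listeners without knowing what they do with each line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface: notified once per source line.
interface LineListener {
    void lineRead(int lineNumber, String text);
}

// Hypothetical producer: knows nothing about what its listeners do.
class LineProducer {
    private final List<LineListener> listeners = new ArrayList<>();

    void addListener(LineListener listener) {
        listeners.add(listener);
    }

    // Notify every registered listener about each line, numbering from 1.
    void readAll(String[] lines) {
        for (int i = 0; i < lines.length; ++i) {
            for (LineListener listener : listeners) {
                listener.lineRead(i + 1, lines[i]);
            }
        }
    }
}

class ListingDemo {
    // Build a program listing using the same "%03d %s" format as the slide.
    static List<String> listing(String[] lines) {
        List<String> out = new ArrayList<>();
        LineProducer producer = new LineProducer();
        producer.addListener(
            (number, text) -> out.add(String.format("%03d %s", number, text)));
        producer.readAll(lines);
        return out;
    }
}
```

For example, `ListingDemo.listing(new String[] {"BEGIN", "END."})` yields the two lines `001 BEGIN` and `002 END.`.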
Is it Really Worth All this Trouble? • Major software engineering challenges: • Managing change. • Managing complexity. • To help manage change, use the open-closed principle. • Close the code for modification. Open the code for extension. • Closed: The language-independent framework classes. • Open: The language-specific subclasses.
Is it Really Worth All this Trouble? cont’d • Techniques to help manage complexity: • Partitioning • Loose coupling • Incremental development • Always build upon working code. • Good object-oriented design • Use design patterns.
Source Files from the Book • Download the Java source code from each chapter of the book: http://www.apropos-logic.com/wci/ • You will not survive this course if you use a simple text editor like Notepad to view and edit the Java code. • The complete Pascal interpreter in Chapter 12 contains 127 classes and interfaces.
Integrated Development Environment (IDE) • You can use either Eclipse or NetBeans. • Eclipse is preferred because there is a JavaCC plug-in. • Learn how to create projects, edit source files, single-step execution, set breakpoints, examine variables, read stack dumps, etc.
The Payoff • Now that we have … • Source language-independent framework classes • Pascal-specific subclasses • Mostly just placeholders for now • An end-to-end test (the program listing generator) • … we can work on the individual components • Without worrying (too much) about breaking the rest of the code.
PascalTokenType • Each token type is an enumerated value.
    public enum PascalTokenType implements TokenType
    {
        // Reserved words.
        AND, ARRAY, BEGIN, CASE, CONST, DIV, DO, DOWNTO, ELSE, END,
        FILE, FOR, FUNCTION, GOTO, IF, IN, LABEL, MOD, NIL, NOT,
        OF, OR, PACKED, PROCEDURE, PROGRAM, RECORD, REPEAT, SET,
        THEN, TO, TYPE, UNTIL, VAR, WHILE, WITH,

        // Special symbols.
        PLUS("+"), MINUS("-"), STAR("*"), SLASH("/"), COLON_EQUALS(":="),
        DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("'"),
        EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),
        GREATER_EQUALS(">="), GREATER_THAN(">"), LEFT_PAREN("("), RIGHT_PAREN(")"),
        LEFT_BRACKET("["), RIGHT_BRACKET("]"), LEFT_BRACE("{"), RIGHT_BRACE("}"),
        UP_ARROW("^"), DOT_DOT(".."),

        IDENTIFIER, INTEGER, REAL, STRING,
        ERROR, END_OF_FILE;
        ...
    }
PascalTokenType, cont’d • The static set RESERVED_WORDS contains all of Pascal’s reserved word strings in lower case: "and", "array", "begin", etc.
    // Set of lower-cased Pascal reserved word text strings.
    public static HashSet<String> RESERVED_WORDS = new HashSet<String>();
    static {
        PascalTokenType values[] = PascalTokenType.values();
        for (int i = AND.ordinal(); i <= WITH.ordinal(); ++i) {
            RESERVED_WORDS.add(values[i].getText().toLowerCase());
        }
    }
• We can test whether a token is a reserved word: if (RESERVED_WORDS.contains(text.toLowerCase())) …
PascalTokenType, cont’d • Static hash table SPECIAL_SYMBOLS contains all of Pascal’s special symbols. • Each entry’s key is the string, such as "<", "=", "<=" • Each entry’s value is the corresponding enumerated value.
    // Hash table of Pascal special symbols.
    // Each special symbol's text is the key to its Pascal token type.
    public static Hashtable<String, PascalTokenType> SPECIAL_SYMBOLS =
        new Hashtable<String, PascalTokenType>();
    static {
        PascalTokenType values[] = PascalTokenType.values();
        for (int i = PLUS.ordinal(); i <= DOT_DOT.ordinal(); ++i) {
            SPECIAL_SYMBOLS.put(values[i].getText(), values[i]);
        }
    }
PascalTokenType, cont’d • We can test whether a token is a special symbol: • if (PascalTokenType.SPECIAL_SYMBOLS .containsKey(Character.toString(currentChar))) …
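The two lookup tables can be tried out on a cut-down enumeration. The sketch below is an illustration only: MiniTokenType and classifyWord are hypothetical names, the enum holds just a handful of the Pascal constants, and the no-argument constructor that derives the text from the constant's name is an assumption about how getText() gets its value for reserved words.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, cut-down version of a token type enumeration.
enum MiniTokenType {
    // Reserved words.
    BEGIN, END, IF, THEN,

    // Special symbols.
    PLUS("+"), LESS_THAN("<"), LESS_EQUALS("<="), COLON_EQUALS(":="),

    IDENTIFIER;

    private final String text;

    // Assumption: a constant without explicit text uses its own name.
    MiniTokenType() { this.text = name().toLowerCase(); }
    MiniTokenType(String text) { this.text = text; }

    String getText() { return text; }

    // Lower-cased reserved word strings.
    static final Set<String> RESERVED_WORDS = new HashSet<>();

    // Special symbol text mapped to its token type.
    static final Map<String, MiniTokenType> SPECIAL_SYMBOLS = new HashMap<>();

    static {
        MiniTokenType[] values = MiniTokenType.values();
        for (int i = BEGIN.ordinal(); i <= THEN.ordinal(); ++i) {
            RESERVED_WORDS.add(values[i].getText().toLowerCase());
        }
        for (int i = PLUS.ordinal(); i <= COLON_EQUALS.ordinal(); ++i) {
            SPECIAL_SYMBOLS.put(values[i].getText(), values[i]);
        }
    }

    // Decide whether a word is a reserved word or an identifier.
    static MiniTokenType classifyWord(String word) {
        if (RESERVED_WORDS.contains(word.toLowerCase())) {
            return valueOf(word.toUpperCase());
        }
        return IDENTIFIER;
    }
}
```

Because the loops walk a range of ordinals, the order of the enumeration constants matters: inserting a new constant in the wrong group would silently put it in the wrong table.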
Pascal-Specific Token Classes • Each class PascalWordToken, PascalNumberToken, PascalStringToken, PascalSpecialSymbolToken, and PascalErrorToken is a subclass of class PascalToken. • PascalToken is a subclass of class Token. • Each Pascal token subclass overrides the default extract() method of class Token. • The default method could only create single-character tokens. • Loosely coupled. Highly cohesive.
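How a subclass replaces the single-character default can be sketched as follows. The names CharCursor, BaseToken, and WordTokenSketch are hypothetical stand-ins for Source, Token, and a word token class, and the rule that a word consists of letters and digits is a simplifying assumption.

```java
// Hypothetical stand-in for the Source class.
class CharCursor {
    static final char EOF = (char) 0;

    private final String text;
    private int position = 0;

    CharCursor(String text) { this.text = text; }

    char currentChar() {
        return position < text.length() ? text.charAt(position) : EOF;
    }

    char nextChar() {
        ++position;
        return currentChar();
    }
}

// Default token: extracts exactly one character.
class BaseToken {
    protected final CharCursor cursor;
    protected String text;

    BaseToken(CharCursor cursor) {
        this.cursor = cursor;
        extract();  // subclasses override extract() to change what is read
    }

    protected void extract() {
        text = Character.toString(cursor.currentChar());
        cursor.nextChar();  // consume current character
    }

    String getText() { return text; }
}

// Word token: overrides extract() to read a whole word.
class WordTokenSketch extends BaseToken {
    WordTokenSketch(CharCursor cursor) { super(cursor); }

    @Override
    protected void extract() {
        StringBuilder buffer = new StringBuilder();
        char ch = cursor.currentChar();

        // Copy characters up to but not including the first non-word character.
        while (Character.isLetterOrDigit(ch)) {
            buffer.append(ch);
            ch = cursor.nextChar();
        }

        text = buffer.toString();
    }
}
```

After a WordTokenSketch is built from a cursor over `index >= 10`, its text is `index` and the cursor's current character is the blank that follows.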
How to Scan for Tokens • Suppose the source line contains IF (index >= 10) THEN • The scanner skips over the leading blanks. The current character is I, so the next token must be a word. • The scanner extracts a word token by copying characters up to but not including the first character that is not valid for a word, which in this case is a blank. The blank becomes the current character. • The scanner determines that the word is a reserved word.
How to Scan for Tokens, cont’d • The scanner skips over any blanks between tokens. The current character is (. The next token must be a special symbol. • After extracting the special symbol token, the current character is i. The next token must be a word. • After extracting the word token, the current character is a blank.
How to Scan for Tokens, cont’d • Skip the blank. The current character is >. • Extract the special symbol token. The current character is a blank. • Skip the blank. The current character is 1, so the next token must be a number. • After extracting the number token, the current character is ).
How to Scan for Tokens, cont’d • Extract the special symbol token. The current character is a blank. • Skip the blank. The current character is T, so the next token must be a word. • Extract the word token. • Determine that it’s a reserved word. • The current character is \n, so the scanner is done with this line.
Basic Scanning Algorithm • Skip any blanks until the current character is nonblank. • In Pascal, a comment and the end-of-line character each should be treated as a blank. • The current (nonblank) character determines what the next token is and becomes that token’s first character. • Extract the rest of the next token by copying successive characters up to but not including the first character that does not belong to that token. • Extracting a token consumes all the source characters that constitute the token. After extracting a token, the current character is the first character after the last character of that token.
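The steps above can be sketched as a small stand-alone scanner. This is an illustration under simplifying assumptions, not the course's PascalScanner: the class MiniScanner and its token labels are hypothetical, comments are not handled, numbers are unsigned integers only, and only a few two-character special symbols are recognized.

```java
import java.util.ArrayList;
import java.util.List;

class MiniScanner {
    // Scan one source line and return its tokens as labeled strings.
    static List<String> scan(String line) {
        List<String> tokens = new ArrayList<>();
        int n = line.length();
        int i = 0;

        while (true) {
            // Skip blanks until the current character is nonblank.
            while (i < n && Character.isWhitespace(line.charAt(i))) {
                ++i;
            }
            if (i >= n) {
                break;  // done with this line
            }

            // The current character determines what the next token is.
            char first = line.charAt(i);
            int start = i;

            if (Character.isLetter(first)) {
                while (i < n && Character.isLetterOrDigit(line.charAt(i))) {
                    ++i;
                }
                tokens.add("WORD:" + line.substring(start, i));
            }
            else if (Character.isDigit(first)) {
                while (i < n && Character.isDigit(line.charAt(i))) {
                    ++i;
                }
                tokens.add("NUMBER:" + line.substring(start, i));
            }
            else {
                ++i;  // consume the first symbol character
                if (i < n) {
                    String two = line.substring(start, i + 1);
                    if (two.equals(">=") || two.equals("<=") || two.equals("<>")
                            || two.equals(":=") || two.equals("..")) {
                        ++i;  // consume the second symbol character
                    }
                }
                tokens.add("SYMBOL:" + line.substring(start, i));
            }
            // Here i indexes the first character after the extracted token.
        }

        return tokens;
    }
}
```

Scanning `   IF (index >= 10) THEN` with this sketch produces seven tokens: the words IF, index, and THEN, the number 10, and the special symbols `(`, `>=`, and `)`.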