Using Ada 95 in a Compiler Course • SIGAda 2001 • S. Tucker Taft, CTO, AverCom Corp., a Titan Company • October 3, 2001 • Bloomington, MN
Outline • What are we trying to teach in a compiler course? • How can Ada help? • The Approach Used In This Course • The Ada Package Structure and What the Student Builds • Conclusion
What are we trying to teach? • Compiler Theory and Compiler Construction Techniques • Tackling a large, complex problem and reducing it to manageable pieces (and getting it all to work!) • Using the data structures and algorithms learned in earlier courses in creative ways; choosing the right ones to use in each circumstance • Using Object-Oriented Programming Techniques in a large, “real-world” problem • Using Ada in a large, “real-world” problem
How Can Ada Help? • Minimize time wasted in debugging • Emphasize high-level (package / subsystem) structure and interfaces • Support an approach where the Professor provides the (visible part of the) package spec, while students provide the package (private part and) body • Illustrate the value of abstracting even simple integral types like line numbers, hash codes, lexical levels
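The last bullet can be illustrated with a minimal sketch. The type names, ranges, and values below are assumptions chosen for illustration, not the declarations used in the course:

```ada
with Ada.Text_IO;

procedure Distinct_Types_Demo is
   --  Hypothetical names and ranges. Each declaration creates a
   --  distinct type, not an alias for Integer.
   type Line_Number   is range 1 .. 1_000_000;
   type Lexical_Level is range 0 .. 255;
   type Hash_Code     is mod 2**16;

   Line  : Line_Number   := 41;
   Level : Lexical_Level := 0;
   Hash  : Hash_Code     := 65_535;
begin
   Line  := Line + 1;    --  ordinary arithmetic within one type
   Level := Level + 1;   --  e.g. entering a nested scope
   Hash  := Hash + 1;    --  modular type: wraps around to 0

   --  Level := Line;    --  rejected at compile time: different types

   Ada.Text_IO.Put_Line (Line_Number'Image (Line));
   Ada.Text_IO.Put_Line (Lexical_Level'Image (Level));
   Ada.Text_IO.Put_Line (Hash_Code'Image (Hash));
end Distinct_Types_Demo;
```

Uncommenting the assignment of a line number to a lexical level makes the compiler reject the program, which is the kind of mistake that would otherwise surface only during debugging.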
The Approach Used in This Course • Focus on phases and their abstractions • Understand high-level structure (phase: input => output): • Lexing: Source => Lexemes • Parsing: Lexemes (leaves) => AST • Semantics: AST => AAST/SymTab • IR Generation: AAST/SymTab => IR • Optional Flow Optimization: IR => (better) IR • Instruction Selection: IR => Pseudo Asm • Register Allocation: Pseudo Asm => Real Asm
Package / Subsystem Structure • Package/Subsystem for each Phase • Package/Subsystem for each Abstraction • [Diagram; box labels: Lexer, Parser, Sem, Interp, IR Gen, Flow, Inst Sel, RegAlc, Source, StringTab, Lexemes, ASTs, SymTab, IR, PAsm, Asm, Output; regions: Language-Specific, Machine-Specific]
Lexer and Parser Phases • Lexer: • Abstractions • Source File (Abstract Stream of Characters) • Source position (File, Line, Column) • Lexeme/Token (Tagged Type Hierarchy) • String/Identifier/Reserved Word (Hash) Table • Error/Warning Message Generation • Processing • Token Building (Finite State Automaton) • Parser: • Abstractions • Abstract Syntax Tree (AST) -- Building Routines • Processing • LALR Parsing and AST Building • Example source: Fruit := Apple + Pear;
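Token building with a small hand-written automaton might look like the sketch below, applied to the example statement from the slide. It is a simplification under stated assumptions: an enumeration stands in for the tagged Lexeme hierarchy, the source is a string constant rather than an abstract character stream, and every name (Token_Kind, Next_Token, Lexical_Error, and so on) is hypothetical.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Lexer_Demo is
   use Ada.Characters.Handling;

   --  Hypothetical token kinds; the course uses a tagged type hierarchy.
   type Token_Kind is (Identifier, Assign, Plus, Semicolon, End_Of_Input);

   Lexical_Error : exception;

   Source : constant String := "Fruit := Apple + Pear;";
   Pos    : Natural := Source'First;   --  index of the next unread character

   Tok_Kind  : Token_Kind;
   Tok_First : Positive;
   Tok_Last  : Natural;

   --  Scan one token starting at Pos; Source (First .. Last) is its text.
   procedure Next_Token
     (Kind : out Token_Kind; First : out Positive; Last : out Natural) is
   begin
      --  State: skipping blanks
      while Pos <= Source'Last and then Source (Pos) = ' ' loop
         Pos := Pos + 1;
      end loop;

      First := Pos;
      Last  := Pos - 1;
      if Pos > Source'Last then
         Kind := End_Of_Input;
         return;
      end if;

      if Is_Letter (Source (Pos)) then
         --  State: inside an identifier
         while Pos <= Source'Last
           and then (Is_Alphanumeric (Source (Pos))
                     or else Source (Pos) = '_')
         loop
            Pos := Pos + 1;
         end loop;
         Kind := Identifier;
      elsif Source (Pos) = ':' and then Pos < Source'Last
        and then Source (Pos + 1) = '='
      then
         Pos  := Pos + 2;
         Kind := Assign;
      elsif Source (Pos) = '+' then
         Pos  := Pos + 1;
         Kind := Plus;
      elsif Source (Pos) = ';' then
         Pos  := Pos + 1;
         Kind := Semicolon;
      else
         --  A real lexer would report the source position here.
         raise Lexical_Error;
      end if;
      Last := Pos - 1;
   end Next_Token;

begin
   loop
      Next_Token (Tok_Kind, Tok_First, Tok_Last);
      exit when Tok_Kind = End_Of_Input;
      Ada.Text_IO.Put_Line
        (Token_Kind'Image (Tok_Kind) & " """
         & Source (Tok_First .. Tok_Last) & """");
   end loop;
end Lexer_Demo;
```

Run as written, the sketch prints six tokens in order: IDENTIFIER "Fruit", ASSIGN ":=", IDENTIFIER "Apple", PLUS "+", IDENTIFIER "Pear", SEMICOLON ";".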
Semantics Phase • Abstractions: • Annotations for AST (Tagged Type Hierarchy) • Lexical Visibility Stack (LVS) of Symbol Tables • Tables Hashed on String ID • Tables Stored as Annotations on Program Unit • Entries refer to Annotations on Individual Declarations • Processing: • Walk AST • Build LVS, Symbol Tables, and Annotations • Look Up All Identifier References • Implemented As Dispatching Operations of AST Node Type
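A lexical visibility stack can be sketched in a few lines. To stay short, this version keeps all symbols in one array and searches it linearly from the top, where the slide's design uses per-unit tables hashed on string ID. All names (Define, Lookup, Scope_Base, and so on) and the size limits are assumptions for illustration.

```ada
with Ada.Text_IO;

procedure LVS_Demo is
   type Lexical_Level is range 0 .. 7;
   subtype Name is String (1 .. 8);   --  blank-padded; at most 8 characters

   type Symbol is record
      Id    : Name;
      Level : Lexical_Level;
   end record;

   Max_Symbols : constant := 32;
   type Symbol_Array is array (1 .. Max_Symbols) of Symbol;

   --  One array acts as the whole stack: inner scopes sit on top.
   Symbols    : Symbol_Array;
   Count      : Natural := 0;
   Scope_Base : array (Lexical_Level) of Natural := (others => 0);
   Current    : Lexical_Level := 0;

   function Pad (S : String) return Name is
      Result : Name := (others => ' ');
   begin
      Result (1 .. S'Length) := S;
      return Result;
   end Pad;

   procedure Enter_Scope is
   begin
      Current := Current + 1;
      Scope_Base (Current) := Count;   --  remember where this scope starts
   end Enter_Scope;

   procedure Leave_Scope is
   begin
      Count   := Scope_Base (Current); --  pop every symbol of this scope
      Current := Current - 1;
   end Leave_Scope;

   procedure Define (Ident : String) is
   begin
      Count := Count + 1;
      Symbols (Count) := (Id => Pad (Ident), Level => Current);
   end Define;

   --  Search from the top so inner declarations hide outer ones.
   --  Returns the declaring level, or -1 for an undeclared identifier.
   function Lookup (Ident : String) return Integer is
   begin
      for I in reverse 1 .. Count loop
         if Symbols (I).Id = Pad (Ident) then
            return Integer (Symbols (I).Level);
         end if;
      end loop;
      return -1;
   end Lookup;

begin
   Define ("Fruit");                   --  declared at level 0
   Enter_Scope;
   Define ("Apple");                   --  declared at level 1
   Define ("Fruit");                   --  hides the outer Fruit
   Ada.Text_IO.Put_Line (Integer'Image (Lookup ("Fruit")));  --  level 1
   Leave_Scope;
   Ada.Text_IO.Put_Line (Integer'Image (Lookup ("Fruit")));  --  level 0
   Ada.Text_IO.Put_Line (Integer'Image (Lookup ("Apple")));  --  -1
end LVS_Demo;
```

The three lookups show hiding inside the nested scope, the outer declaration becoming visible again after the scope is left, and an identifier that is no longer declared.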
Interpreter • Abstractions: • Run-Time Display (Analogous to LVS) • Run-Time Value (Tagged Type Hierarchy) • Processing: • Walk AST • Build/Use Display of Values • Follow into Subprogram Bodies • Analogous to Inlining at Compile Time • Invoke Builtin RTS Subprograms (E.g. Put_Line) • Implemented As Dispatching Ops on AST Node Type
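The phrase "dispatching ops on AST node type" can be made concrete with a tiny expression evaluator. This is a sketch under assumptions: the node types, the Eval operation, and the Integer result (in place of a tagged run-time value hierarchy) are all hypothetical, and there is no display or subprogram call.

```ada
with Ada.Text_IO;

procedure Dispatch_Demo is

   package AST is
      type Node is abstract tagged null record;
      type Node_Ptr is access all Node'Class;
      --  One dispatching operation per phase; here, interpretation.
      function Eval (N : Node) return Integer is abstract;

      type Literal is new Node with record
         Value : Integer;
      end record;
      function Eval (N : Literal) return Integer;

      type Binary_Op is new Node with record
         Op          : Character;
         Left, Right : Node_Ptr;
      end record;
      function Eval (N : Binary_Op) return Integer;
   end AST;

   package body AST is
      function Eval (N : Literal) return Integer is
      begin
         return N.Value;
      end Eval;

      function Eval (N : Binary_Op) return Integer is
         --  Class-wide operands, so these calls dispatch on the tag.
         L : constant Integer := Eval (N.Left.all);
         R : constant Integer := Eval (N.Right.all);
      begin
         case N.Op is
            when '+'    => return L + R;
            when '*'    => return L * R;
            when others => raise Program_Error;
         end case;
      end Eval;
   end AST;

   use AST;

   --  AST for 2 + 3 * 4
   Tree : constant Node_Ptr :=
     new Binary_Op'
       (Op    => '+',
        Left  => new Literal'(Value => 2),
        Right => new Binary_Op'
                   (Op    => '*',
                    Left  => new Literal'(Value => 3),
                    Right => new Literal'(Value => 4)));
begin
   Ada.Text_IO.Put_Line (Integer'Image (Eval (Tree.all)));   --  prints 14
end Dispatch_Demo;
```

Adding a new node type means adding one more Eval body next to the type. Adding a new phase means adding one more primitive operation to every node type, which is the tradeoff the later slides on the Visitor pattern discuss.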
Intermediate Representation (IR) Generator • Abstractions: • Low-Level LVS and IR Symbol Table • Analogous to Run-Time Display • IR (Tagged Type Hierarchy) -- Building Routines • IR Stream -- For Declarations, Statements, and Side-Effects • IR Trees -- For “Pure” Expression Evaluation • Processing: • Walk Annotated AST (AAST) • Build/Use Low-Level LVS and SymTab • Generate IR • Implemented as Dispatching Ops on AST Node Type
Instruction Selection • Abstractions: • “Temp” (Virtual Register) Table • Pseudo Assembly Instructions (Tagged Type Hierarchy) and Stream Thereof • Database of IR (Tree) Patterns and Corresponding Pseudo Assembly Sequences • Processing: • Walk IR • Match IR Trees Against “Database” of Patterns • Maximal Munch (Top-Down) or Dynamic Programming (Bottom-Up) • Maximal Munch Can Be Implemented As Dispatching Ops of IR Tree Node Type • Generate Instructions and Create/Use “Temps” (Virtual Registers)
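Maximal munch as a dispatching operation of the IR tree node type could be sketched as follows. Everything here is an assumption for illustration: the two node types, the three tiles (li, addi, add), the pseudo-assembly syntax, and a simple counter in place of a real "Temp" table. At each node the operation tries the largest tile first and falls back to smaller ones.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Munch_Demo is
   type Temp_Id is range 0 .. 99;
   Next_Temp : Temp_Id := 0;

   function New_Temp return Temp_Id is
   begin
      Next_Temp := Next_Temp + 1;
      return Next_Temp;
   end New_Temp;

   function Img (T : Temp_Id) return String is
      S : constant String := Temp_Id'Image (T);
   begin
      return "t" & S (S'First + 1 .. S'Last);   --  drop the leading blank
   end Img;

   package IR is
      type Node is abstract tagged null record;
      type Node_Ptr is access all Node'Class;
      --  Emits pseudo assembly; returns the temp holding the result.
      function Munch (N : Node) return Temp_Id is abstract;

      type Const is new Node with record
         Value : Integer;
      end record;
      function Munch (N : Const) return Temp_Id;

      type Add is new Node with record
         Left, Right : Node_Ptr;
      end record;
      function Munch (N : Add) return Temp_Id;
   end IR;

   package body IR is
      function Munch (N : Const) return Temp_Id is
         T : constant Temp_Id := New_Temp;
      begin
         Put_Line ("li   " & Img (T) & "," & Integer'Image (N.Value));
         return T;
      end Munch;

      function Munch (N : Add) return Temp_Id is
         L : constant Temp_Id := Munch (N.Left.all);
         T : Temp_Id;
      begin
         if N.Right.all in Const'Class then
            --  Largest tile: ADD (x, CONST) becomes one add-immediate.
            T := New_Temp;
            Put_Line ("addi " & Img (T) & ", " & Img (L) & ","
                      & Integer'Image (Const (N.Right.all).Value));
         else
            --  Fallback tile: munch the right subtree separately.
            declare
               R : constant Temp_Id := Munch (N.Right.all);
            begin
               T := New_Temp;
               Put_Line ("add  " & Img (T) & ", " & Img (L)
                         & ", " & Img (R));
            end;
         end if;
         return T;
      end Munch;
   end IR;

   use IR;

   --  IR tree for 1 + (2 + 3)
   Tree : constant Node_Ptr :=
     new Add'(Left  => new Const'(Value => 1),
              Right => new Add'(Left  => new Const'(Value => 2),
                                Right => new Const'(Value => 3)));
   Result : Temp_Id;
begin
   Result := Munch (Tree.all);
   Put_Line ("result in " & Img (Result));
end Munch_Demo;
```

For this tree the inner addition is covered by the add-immediate tile and the outer one by the plain add, so four instructions are emitted and the result ends up in t4.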
Register Allocation • Abstractions: • Basic Blocks (Flow Graph) • Live Sets (Temps alive at entry/exit of Basic Blocks) • Register Map • Temp => Physical Register or Spill Location • Processing: • Compute Live Sets at entry/exit of Basic Blocks • Iterate until they stabilize • Instance of more general iterative flow graph algorithms: • Start with ideal case (e.g. nothing alive) • Iterate away from that until the sets stabilize • Iterate until no more spills: • Generate Conflict/Affinity Matrix • Perform Register Coloring/Spilling/Coalescing • Produce Register Map • Use Register Map to Produce “Real” Assembly Code
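The live-set computation is the easiest part of this phase to show in isolation. The sketch below assumes a made-up three-block flow graph with hand-written use/def sets; the names (Succ, Used, Defined, Live_In, Live_Out) are hypothetical. It starts from the ideal case of nothing alive and iterates the standard equations, out(B) = union of in(S) over successors S and in(B) = use(B) or (out(B) minus def(B)), until no set changes.

```ada
with Ada.Text_IO;

procedure Liveness_Demo is
   type Temp  is (T1, T2, T3);
   type Block is (B1, B2, B3);

   type Temp_Set  is array (Temp) of Boolean;
   type Block_Set is array (Block) of Boolean;
   type Live_Map  is array (Block) of Temp_Set;

   Empty : constant Temp_Set := (others => False);

   --  Hypothetical flow graph: B1 -> B2, B2 -> B2 (a loop), B2 -> B3
   Succ : constant array (Block) of Block_Set :=
     (B1 => (B2 => True, others => False),
      B2 => (B2 | B3 => True, others => False),
      B3 => (others => False));

   --  B1: T1 := ..; T2 := ..    B2: T2 := T2 + T1    B3: T3 := T2
   Used : constant Live_Map :=
     (B1 => Empty,
      B2 => (T1 | T2 => True, others => False),
      B3 => (T2 => True, others => False));
   Defined : constant Live_Map :=
     (B1 => (T1 | T2 => True, others => False),
      B2 => (T2 => True, others => False),
      B3 => (T3 => True, others => False));

   --  Ideal starting point: nothing is alive anywhere.
   Live_In  : Live_Map := (others => Empty);
   Live_Out : Live_Map := (others => Empty);
   Changed  : Boolean;
begin
   loop
      Changed := False;
      for B in reverse Block loop      --  backward problem: visit B3 first
         declare
            New_Out : Temp_Set := Empty;
            New_In  : Temp_Set;
         begin
            for S in Block loop
               if Succ (B) (S) then
                  New_Out := New_Out or Live_In (S);
               end if;
            end loop;
            New_In := Used (B) or (New_Out and not Defined (B));
            if New_In /= Live_In (B) or else New_Out /= Live_Out (B) then
               Changed := True;
            end if;
            Live_In (B)  := New_In;
            Live_Out (B) := New_Out;
         end;
      end loop;
      exit when not Changed;           --  the sets have stabilized
   end loop;

   for B in Block loop
      Ada.Text_IO.Put (Block'Image (B) & " live-in:");
      for T in Temp loop
         if Live_In (B) (T) then
            Ada.Text_IO.Put (" " & Temp'Image (T));
         end if;
      end loop;
      Ada.Text_IO.New_Line;
   end loop;
end Liveness_Demo;
```

On this graph the loop settles with nothing live into B1, T1 and T2 live into B2, and only T2 live into B3. Because T1 and T2 are alive at the same time in B2, they would conflict in the matrix built by the coloring step.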
A Couple of Pedagogical Issues • Use Dispatching Ops or “Visitor” Pattern? • How Much and What Code To Provide?
Use Dispatching Ops or “Visitor” Pattern? • Semantics, Interpreter, IRGen All Implementable As Dispatching Ops of AST Node • Maximal Munch Instruction Selection Implementable As Dispatching Op of IR Node • Visitor Pattern more complicated, and more work • Single Tree Walk Dispatching Op Takes Visitor Parameter • Create Visitor Type Extension for Each Phase • Use (Compile-Time) Overloading to select Dispatching Operation of Visitor Object to Call • OO Moral Equivalent of Switch/Case Statement?
Answer: • Let Students Experiment and Choose • Interesting Lesson in Tradeoffs between Simplicity, Flexibility, Maintainability • Dispatching Operations are Simpler • Visitor Pattern allows new phase to be added without touching AST abstraction • But… Add a new AST node, and must track down all Phases and make sure Pre/Post-Visit operations are updated • Reminiscent of Switch/Case maintenance problems • But Hopefully many fewer of them to find
FYI: Visitor Example (short quiz next period)

   package AST is
      type AST_Node is abstract tagged …
      type Visitor_Root is abstract tagged null record;
      --  Single tree-walk dispatching op; takes the phase's visitor as a parameter
      procedure Walk(Node : access AST_Node; Visitor : access Visitor_Root'Class) is abstract;
      ...

   with AST.Exprs, AST.Stmts, AST.Decls;
   package AST.Visitors is
      type AST_Visitor is abstract new AST.Visitor_Root with null record;
      --  One Pre_Visit/Post_Visit pair per node type, selected by compile-time overloading
      procedure Pre_Visit(Visitor : access AST_Visitor; Tree : access AST.Exprs.Binary_Op);
      procedure Post_Visit(Visitor : access AST_Visitor; Tree : access AST.Exprs.Binary_Op);
      procedure Pre_Visit(Visitor : access AST_Visitor; Tree : access AST.Exprs.Unary_Op);
      procedure Post_Visit(Visitor : access AST_Visitor; Tree : access AST.Exprs.Unary_Op);
      procedure Pre_Visit(Visitor : access AST_Visitor; Tree : access AST.Stmts.Asgn_Stmt);
      procedure Post_Visit(Visitor : access AST_Visitor; Tree : access AST.Stmts.Asgn_Stmt);
      …

   with AST.Visitors, AST.Exprs, AST.Stmts, AST.Decls;
   package Interpreter is
      --  Visitor type extension for the interpreter phase
      type Interp_Visitor is new AST.Visitors.AST_Visitor with …
      procedure Post_Visit(Visitor : access Interp_Visitor; Tree : access AST.Exprs.Binary_Op);
      procedure Post_Visit(Visitor : access Interp_Visitor; Tree : access AST.Exprs.Unary_Op);
      ...
How Much and What Code to Provide? • Names and Explanations of Phases and Abstractions • Actual Package Specs • Package Specs and Sample Code for some of the operations • Depends on Phase or Abstraction
How Much and What Code to Provide (cont’d) • Abstractions => Provide Package Specs • Processing => Explain algorithms • Visitor Pattern vs. Disp. Op Experiment => Provide Sample Code as well
Conclusions • A Compiler Course is a Treasure Trove of Learning Experiences • Ada 95 is an excellent language for teaching a compiler course • Package / Subsystem Structure helps to reinforce Compiler phase/abstraction structure • Compile-time and run-time checks dramatically reduce debugging time • Readability should make the Professor Happy ;-) • Someday real soon now... • There will be a simple compiler written in Ada 95 available for use in teaching