CSCE 531 Compiler Construction, Ch. 7: Code Generation. Spring 2008. Marco Valtorta, mgv@cse.sc.edu
Acknowledgment
• The slides are based on the textbook and other sources, including slides from Bent Thomsen’s course at the University of Aalborg in Denmark and several other fine textbooks.
• The three main other compiler textbooks I considered are:
  • Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, & Tools, 2nd ed. Addison-Wesley, 2007. (The “dragon book”)
  • Appel, Andrew W. Modern Compiler Implementation in Java, 2nd ed. Cambridge, 2002. (Editions in ML and C also available; the “tiger books”)
  • Grune, Dick, Henri E. Bal, Ceriel J.H. Jacobs, and Koen G. Langendoen. Modern Compiler Design. Wiley, 2000.
What This Lecture Is About
A compiler translates a program from a high-level language into an equivalent program in a low-level language.
[Figure: Triangle program -> compile -> TAM program -> run -> result]
Programming Language Specification
• A language specification has (at least) three parts:
  • Syntax of the language: usually formal (EBNF)
  • Contextual constraints:
    • scope rules (often written in English, but can be formal)
    • type rules (formal or informal)
  • Semantics:
    • defined by the implementation
    • informal descriptions in English
    • formal, using operational, axiomatic, or denotational semantics
The “Phases” of a Compiler
[Figure: source program -> Syntax Analysis -> abstract syntax tree -> Contextual Analysis -> decorated abstract syntax tree -> Code Generation (Chapter 7) -> object code. Syntax analysis and contextual analysis also produce error reports.]
Multi-Pass Compiler
A multi-pass compiler makes several passes over the program. The output of each phase is stored in a data structure and used by subsequent phases.
[Figure: dependency diagram of a typical multi-pass compiler. The compiler driver calls the syntactic analyzer (input: source text, output: AST), then the contextual analyzer (input: AST, output: decorated AST), then the code generator (Chapter 7; input: decorated AST, output: object code).]
Issues in Code Generation
• Code selection: deciding which sequence of target machine instructions will be used to implement each phrase in the source language.
• Storage allocation: deciding the storage address for each variable in the source program (static allocation, stack allocation, etc.).
• Register allocation (for register-based machines): how to use registers efficiently to store intermediate results. We use a stack-based machine, so this is not an issue for us.
Code Generation
Source program:
let var n: Integer; var c: Char
in begin c := '&'; n := n+1 end
Target program:
PUSH 2
LOADL 38
STORE 1[SB]
LOAD 0[SB]
LOADL 1
CALL add
STORE 0[SB]
POP 2
HALT
(n lives at 0[SB] and c at 1[SB]; 38 is the character code of '&'.)
Source and target program must be “semantically equivalent”. The semantic specification of the source language is structured in terms of its phrases: expressions, commands, etc. => Code generation follows the same “inductive” structure.
Q: Can you see the connection with denotational semantics?
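To make “semantically equivalent” a little more concrete, here is a minimal sketch of a stack-machine interpreter that runs the target program above. It is not the real TAM: the class names MiniTam and MiniTamDemo are hypothetical, instructions are plain strings, only the handful of operations used above are supported, add is treated as a built-in primitive, the static base SB is assumed to be 0, and uninitialized words are assumed to start at 0.

```java
import java.util.List;

// Hypothetical, heavily simplified TAM-like interpreter (not the real TAM).
class MiniTam {
    final int[] stack = new int[64]; // data store; SB is assumed to be 0
    int st = 0;                      // ST: index of the first free word

    void run(List<String> program) {
        for (String line : program) {
            String[] p = line.trim().split("\\s+");
            switch (p[0]) {
                case "PUSH":  st += Integer.parseInt(p[1]); break;          // reserve words
                case "POP":   st -= Integer.parseInt(p[1]); break;          // discard words
                case "LOADL": stack[st++] = Integer.parseInt(p[1]); break;  // push a literal
                case "LOAD":  stack[st++] = stack[addr(p[1])]; break;       // push a variable
                case "STORE": stack[addr(p[1])] = stack[--st]; break;       // pop into a variable
                case "CALL": {
                    if (!p[1].equals("add")) throw new IllegalArgumentException("unsupported: " + line);
                    int right = stack[--st];
                    int left = stack[--st];
                    stack[st++] = left + right;
                    break;
                }
                case "HALT": return;
                default: throw new IllegalArgumentException("unsupported: " + line);
            }
        }
    }

    // Accepts "d[SB]" or a plain "d" and returns the offset d from SB.
    private static int addr(String operand) {
        int bracket = operand.indexOf('[');
        return Integer.parseInt(bracket < 0 ? operand : operand.substring(0, bracket));
    }
}

class MiniTamDemo {
    public static void main(String[] args) {
        MiniTam m = new MiniTam();
        m.run(List.of("PUSH 2", "LOADL 38", "STORE 1[SB]", "LOAD 0[SB]", "LOADL 1",
                      "CALL add", "STORE 0[SB]", "POP 2", "HALT"));
        System.out.println("n = " + m.stack[0] + ", c = " + (char) m.stack[1]); // prints "n = 1, c = &"
    }
}
```

Because POP in this sketch only moves ST, the two words still hold n = 1 and c = '&' after HALT, which is exactly what the source program specifies.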
“Inductive” Code Generation
“Inductive” means: code generation for a “big” structure is defined by putting together chunks of code that correspond to its substructures.
Example: sequential command code generation
• Semantic specification: the sequential command C1;C2 is executed as follows: first C1 is executed, then C2 is executed.
• Code generation function:
    execute : Command -> Instruction*
    execute [C1;C2] =
        execute [C1]     (instructions for C1)
        execute [C2]     (instructions for C2)
  Together, the two chunks are the instructions for C1;C2.
“Inductive” Code Generation
Example: assignment command code generation
• Code generation function:
    execute [I:=E] =
        evaluate [E]        instructions for E; yield the value of E on top of the stack
        STORE address[I]    instruction to store the result into the variable
These “pictures” of the code layout for a particular source-language construct are called code templates. Inductive means: a code template specifies the object code to which a phrase is translated, in terms of the object code to which its subphrases are translated.
“Inductive” Code Generation
Example: code generation for a larger phrase in terms of its subphrases
execute [f := f*n; n := n-1] =
    execute [f := f*n] =
        LOAD f
        LOAD n
        CALL mult
        STORE f
    execute [n := n-1] =
        LOAD n
        CALL pred
        STORE n
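The inductive structure maps directly onto recursive functions that concatenate the instruction lists of the subphrases. The following is a minimal sketch, assuming a tiny hypothetical AST (the records Lit, Var, Bin, Assign, Seq) and symbolic instructions such as LOAD f in place of real addresses; sub and mult are assumed to be the TAM primitive routines for - and *.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini AST. Instructions are symbolic strings such as "LOAD f";
// a real code generator would use addresses instead of variable names.
interface Expr {}
record Lit(int value) implements Expr {}
record Var(String name) implements Expr {}
record Bin(Expr left, String op, Expr right) implements Expr {}

interface Cmd {}
record Assign(String target, Expr e) implements Cmd {}
record Seq(Cmd c1, Cmd c2) implements Cmd {}

class Templates {
    // evaluate [E]: code that leaves the value of E on top of the stack
    static List<String> evaluate(Expr e) {
        List<String> code = new ArrayList<>();
        if (e instanceof Lit l) {
            code.add("LOADL " + l.value());
        } else if (e instanceof Var v) {
            code.add("LOAD " + v.name());
        } else if (e instanceof Bin b) {
            code.addAll(evaluate(b.left()));   // instructions for E1
            code.addAll(evaluate(b.right()));  // instructions for E2
            code.add("CALL " + primitive(b.op()));
        } else {
            throw new IllegalArgumentException("unknown expression: " + e);
        }
        return code;
    }

    // execute [C]: code for a command is glued together from code for its parts
    static List<String> execute(Cmd c) {
        List<String> code = new ArrayList<>();
        if (c instanceof Assign a) {           // execute [I := E]
            code.addAll(evaluate(a.e()));
            code.add("STORE " + a.target());
        } else if (c instanceof Seq s) {       // execute [C1 ; C2]
            code.addAll(execute(s.c1()));
            code.addAll(execute(s.c2()));
        } else {
            throw new IllegalArgumentException("unknown command: " + c);
        }
        return code;
    }

    static String primitive(String op) {
        switch (op) {
            case "+": return "add";
            case "-": return "sub";
            case "*": return "mult";
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}

class TemplatesDemo {
    public static void main(String[] args) {
        Cmd prog = new Seq(new Assign("f", new Bin(new Var("f"), "*", new Var("n"))),
                           new Assign("n", new Bin(new Var("n"), "-", new Lit(1))));
        Templates.execute(prog).forEach(System.out::println);
    }
}
```

With only these general templates, n := n-1 becomes LOAD n, LOADL 1, CALL sub, STORE n; the shorter CALL pred in the slide's listing comes from a special-case template introduced later in this chapter.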
Specifying Code Generation with Code Templates
• For each “phrase class” P in the abstract syntax of the source language:
  • Define a code function fP : P -> Instruction* that translates each phrase in class P to object code.
We specify the function fP by code templates. Typically they look like:
    fP […Q…R…] = … fQ [Q] … fR [R] …
Note: a “phrase class” typically corresponds to a nonterminal of the abstract syntax. This in turn corresponds to an abstract class among the Java classes that implement the AST nodes (for example Expression, Command, Declaration).
Specifying Code Generation with Code Templates
Example: code template specification for Mini Triangle
RECAP: the Mini Triangle AST
Program     ::= Command                        Program
Command     ::= V-name := Expression           AssignCmd
              | let Declaration in Command     LetCmd
              ...
Expression  ::= Integer-Literal                IntegerExp
              | V-name                         VnameExp
              | Operator Expression            UnaryExp
              | Expression Op Expression       BinaryExp
Declaration ::= ...
V-name      ::= Identifier                     SimpleVName
Specifying Code Generation with Code Templates
The code generation functions for Mini Triangle:
Phrase class | Function | Effect of the generated code
Program | run P | Run program P, then halt. Starts and finishes with an empty stack.
Command | execute C | Execute command C. May update variables but does not shrink or grow the stack.
Expression | evaluate E | Evaluate E; the net result is pushing the value of E onto the stack.
V-name | fetch V | Push the value of the constant or variable V onto the stack.
V-name | assign V | Pop a value from the stack and store it in variable V.
Declaration | elaborate D | Elaborate declaration D; make space on the stack for the constants and variables in D.
Code Generation with Code Templates
The code generation functions for Mini Triangle
Programs:
• run [C] =
    execute [C]
    HALT
Commands:
• execute [V:=E] =
    evaluate [E]
    assign [V]
• execute [I(E)] =
    evaluate [E]
    CALL p    where p is the address of the routine named I
Code Generation with Code Templates
Commands:
• execute [C1;C2] =
    execute [C1]
    execute [C2]
• execute [if E then C1 else C2] =
        evaluate [E]
        JUMPIF(0) g
        execute [C1]
        JUMP h
    g:  execute [C2]
    h:
JUMPIF(0) g pops the value of E and jumps to g if it is 0 (false); JUMP h skips over the else part.
Code Generation with Code Templates
Commands:
• execute [while E do C] =
        JUMP h
    g:  execute [C]
    h:  evaluate [E]
        JUMPIF(1) g
Alternative while command code template:
• execute [while E do C] =
    g:  evaluate [E]
        JUMPIF(0) h
        execute [C]
        JUMP g
    h:
The first template executes one jump instruction per iteration (JUMPIF); the alternative executes two (JUMPIF and JUMP).
Code Generation with Code Templates
Repeat command code template:
• execute [repeat C until E] =
    g:  execute [C]
        evaluate [E]
        JUMPIF(0) g
Let command code template:
• execute [let D in C] =
        elaborate [D]
        execute [C]
        POP(0) s    if s > 0, where s = amount of storage allocated by D
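The repeat and let templates can be sketched the same way. In this minimal sketch the class LoopAndBlock, its methods, and the record Elaborated are hypothetical; instructions are symbolic strings, labels are emitted as lines such as L0:, and each declaration is assumed to report how many words it leaves on the stack so that s can be summed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: symbolic instructions, labels written as "L0:" lines.
class LoopAndBlock {
    private int labelCount = 0;

    private String newLabel() { return "L" + labelCount++; }

    // execute [repeat C until E] = g: execute [C]; evaluate [E]; JUMPIF(0) g
    List<String> repeatCommand(List<String> execC, List<String> evalE) {
        String g = newLabel();
        List<String> code = new ArrayList<>();
        code.add(g + ":");
        code.addAll(execC);
        code.addAll(evalE);
        code.add("JUMPIF(0) " + g);   // E is false (0): go round again
        return code;
    }

    // Code for one declaration plus the number of words it leaves on the stack.
    record Elaborated(List<String> code, int words) {}

    // execute [let D in C] = elaborate [D]; execute [C]; POP(0) s if s > 0
    static List<String> letCommand(List<Elaborated> decls, List<String> execC) {
        List<String> code = new ArrayList<>();
        int s = 0;
        for (Elaborated d : decls) {
            code.addAll(d.code());
            s += d.words();
        }
        code.addAll(execC);
        if (s > 0) code.add("POP(0) " + s);   // release the block's storage
        return code;
    }

    public static void main(String[] args) {
        // let const n ~ 7; var i : Integer in i := n*n, using only general templates
        List<String> code = letCommand(
            List.of(new Elaborated(List.of("LOADL 7"), 1), new Elaborated(List.of("PUSH 1"), 1)),
            List.of("LOAD n", "LOAD n", "CALL mult", "STORE i"));
        code.forEach(System.out::println);
    }
}
```

Computing s while elaborating D is what lets the POP be omitted entirely for a block that allocates nothing.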
Code Generation with Code Templates
Expressions:
• evaluate [IL] =          (IL is an integer literal)
    LOADL v    where v = the integer value of IL
• evaluate [V] =           (V is a variable name)
    fetch [V]
• evaluate [O E] =         (O is a unary operator)
    evaluate [E]
    CALL p     where p = address of the routine for O
• evaluate [E1 O E2] =     (O is a binary operator)
    evaluate [E1]
    evaluate [E2]
    CALL p     where p = address of the routine for O
Code Generation with Code Templates
Variables (note: Mini Triangle only needs static allocation. Q: Why is that?):
• fetch [V] =
    LOAD d[SB]     where d = address of V relative to SB
• assign [V] =
    STORE d[SB]    where d = address of V relative to SB
Code Generation with Code Templates
Declarations:
• elaborate [const I ~ E] =
    evaluate [E]
• elaborate [var I : T] =
    PUSH s     where s = size of T
• elaborate [D1;D2] =
    elaborate [D1]
    elaborate [D2]
THE END: these are all the code templates for Mini Triangle. Now let’s put them to use in an example.
Example of Mini Triangle Code Generation
execute [while i>0 do i:=i+2] =
        JUMP h
    g:  LOAD i
        LOADL 2
        CALL add
        STORE i
    h:  LOAD i
        LOADL 0
        CALL gt
        JUMPIF(1) g
Here LOAD i / LOADL 2 / CALL add is evaluate [i+2]; together with STORE i it forms execute [i:=i+2]. LOAD i / LOADL 0 / CALL gt is evaluate [i>0].
Note: the picture shows a few steps but not all.
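The listing above can be reproduced by a function that implements the first while template and splices in the code for the condition and the body. A minimal sketch, assuming symbolic instructions and labels that the caller supplies (the class WhileTemplate is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the first while template with symbolic labels.
class WhileTemplate {
    // execute [while E do C] = JUMP h; g: execute [C]; h: evaluate [E]; JUMPIF(1) g
    static List<String> whileCommand(List<String> evalE, List<String> execC, String g, String h) {
        List<String> code = new ArrayList<>();
        code.add("JUMP " + h);        // test the condition first
        code.add(g + ":");
        code.addAll(execC);           // loop body
        code.add(h + ":");
        code.addAll(evalE);           // condition
        code.add("JUMPIF(1) " + g);   // E is true (1): run the body again
        return code;
    }

    public static void main(String[] args) {
        List<String> evalE = List.of("LOAD i", "LOADL 0", "CALL gt");              // evaluate [i>0]
        List<String> execC = List.of("LOAD i", "LOADL 2", "CALL add", "STORE i");  // execute [i:=i+2]
        whileCommand(evalE, execC, "g", "h").forEach(System.out::println);
    }
}
```

Keeping labels symbolic sidesteps the question of jump addresses; generating real addresses is the backpatching problem discussed under control structures.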
Special Case Code Templates
There are often several ways to generate code for an expression, command, etc. The templates we defined work, but sometimes we can get more efficient code for special cases => special-case code templates.
Example:
• What we get with the general code templates:
    evaluate [i+1] =
        LOAD i
        LOADL 1
        CALL add
• More efficient code for the special case “+1”:
    evaluate [i+1] =
        LOAD i
        CALL succ
Special Case Code Templates
Example: some special-case code templates for “+1”, “-1”, …
• evaluate [E + 1] =
    evaluate [E]
    CALL succ
• evaluate [1 + E] =
    evaluate [E]
    CALL succ
• evaluate [E - 1] =
    evaluate [E]
    CALL pred
A special-case code template is one that is applicable to phrases of a special form. Such phrases are also covered by a more general template.
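One way to apply special-case templates is to test for the special forms first and fall back to the general binary-operator template otherwise. A minimal sketch, assuming a hypothetical AST (the records Num, Id, BinOp) and symbolic instructions; the succ, pred, add, sub, mult, and gt routine names follow the slides' usage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini AST for expressions.
interface Exp {}
record Num(int value) implements Exp {}
record Id(String name) implements Exp {}
record BinOp(Exp left, String op, Exp right) implements Exp {}

class SpecialCaseEvaluator {
    static List<String> evaluate(Exp e) {
        List<String> code = new ArrayList<>();
        if (e instanceof Num n) {
            code.add("LOADL " + n.value());
        } else if (e instanceof Id v) {
            code.add("LOAD " + v.name());
        } else if (e instanceof BinOp b) {
            if (b.op().equals("+") && isOne(b.right())) {         // E + 1
                code.addAll(evaluate(b.left()));
                code.add("CALL succ");
            } else if (b.op().equals("+") && isOne(b.left())) {   // 1 + E
                code.addAll(evaluate(b.right()));
                code.add("CALL succ");
            } else if (b.op().equals("-") && isOne(b.right())) {  // E - 1
                code.addAll(evaluate(b.left()));
                code.add("CALL pred");
            } else {                                               // general template
                code.addAll(evaluate(b.left()));
                code.addAll(evaluate(b.right()));
                code.add("CALL " + primitive(b.op()));
            }
        } else {
            throw new IllegalArgumentException("unknown expression: " + e);
        }
        return code;
    }

    private static boolean isOne(Exp e) {
        return e instanceof Num n && n.value() == 1;
    }

    private static String primitive(String op) {
        switch (op) {
            case "+": return "add";
            case "-": return "sub";
            case "*": return "mult";
            case ">": return "gt";
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new BinOp(new Id("i"), "+", new Num(1))));
        System.out.println(evaluate(new BinOp(new Id("i"), "+", new Num(2))));
    }
}
```

Order matters only in that special cases are checked before the general case; because i+1 also matches the general form, the fallback guarantees every phrase still gets correct code.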
Special Case Code Templates
Example: “inlining” known constants.
execute [let const n ~ 7; var i: Integer in i := n*n] =
    LOADL 7
    PUSH 1
    LOAD n
    LOAD n
    CALL mult
    STORE i
    POP(0) 2
Here LOADL 7 is elaborate [const n ~ 7], PUSH 1 is elaborate [var i: Integer], and LOAD n through STORE i is execute [i := n*n].
This is what the code looks like with no special-case templates.
Special Case Code Templates
Example: “inlining” known constants. Special-case templates for inlining literals:
• elaborate [const I ~ IL] =
    (no code)
• fetch [I] =     (special case if I is a known literal constant)
    LOADL v    where v is the known value of I
execute [let const n ~ 7; var i: Integer in i := n*n] =
    PUSH 1
    LOADL 7
    LOADL 7
    CALL mult
    STORE i
    POP(0) 1
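Inlining requires the code generator to remember, per constant, whether its value is a known literal. A minimal sketch with a hypothetical ConstantInliner class that keeps a name-to-value map, addresses variables symbolically by name, and tracks the words reserved in the current block for the final POP:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: remembers which constants are bound to literals so that
// fetch can inline them. Variables are addressed symbolically by name.
class ConstantInliner {
    private final Map<String, Integer> knownConstants = new HashMap<>();
    final List<String> code = new ArrayList<>();
    private int allocated = 0; // words reserved in the current block

    // elaborate [const I ~ IL]: no code, just remember the value
    void elaborateConstLiteral(String name, int value) {
        knownConstants.put(name, value);
    }

    // elaborate [var I : T]: reserve size(T) words
    void elaborateVar(int size) {
        code.add("PUSH " + size);
        allocated += size;
    }

    // fetch [I]: special case when I is a known literal constant
    void fetch(String name) {
        Integer v = knownConstants.get(name);
        code.add(v != null ? "LOADL " + v : "LOAD " + name);
    }

    // assign [V]
    void assign(String name) {
        code.add("STORE " + name);
    }

    // end of let: POP(0) s if s > 0
    void endBlock() {
        if (allocated > 0) code.add("POP(0) " + allocated);
        allocated = 0;
    }

    public static void main(String[] args) {
        // let const n ~ 7; var i : Integer in i := n*n
        ConstantInliner gen = new ConstantInliner();
        gen.elaborateConstLiteral("n", 7);
        gen.elaborateVar(1);
        gen.fetch("n");
        gen.fetch("n");
        gen.code.add("CALL mult");
        gen.assign("i");
        gen.endBlock();
        gen.code.forEach(System.out::println);
    }
}
```

Since the inlined constant occupies no storage, only the variable's word is counted, which is why the block ends with POP(0) 1 instead of POP(0) 2.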
Code Generation Algorithm
The code templates specify how code is to be generated, and so they determine the code generation algorithm. Generating code means traversing the AST and emitting instructions one by one; the code templates determine the order of the traversal and the instructions to be emitted. We will now look at how to implement a Mini Triangle code generator in Java.
Representation of Object Program: Instructions
public class Instruction {
    // int fields: a Java byte holds only -128..127, and calls such as
    // emit(Instruction.LOADLop, 0, 0, v) pass int literals, which Java
    // does not implicitly narrow to byte in a method call.
    public int op; // op-code (0..15)
    public int r;  // register field (0..15)
    public int n;  // length field (0..255)
    public int d;  // operand field (-32767..+32767)
    public static final int // op-codes
        LOADop = 0, LOADAop = 1, ...;
    public static final int // register numbers
        CBr = 0, CTr = 1, ..., SBr = 4, STr = 5, ...;
    public Instruction(int op, int n, int r, int d) { ... }
}
Representation of Object Program: Emitting Code
public class Encoder {
    private Instruction[] code = new Instruction[1024];
    private short nextInstrAddr = 0;
    private void emit(int op, int n, int r, int d) {
        code[nextInstrAddr++] = new Instruction(op, n, r, d);
    }
    ... lots of other stuff in here of course ...
}
Developing a Code Generator “Visitor”
Phrase class | Visitor method | Behavior of the visitor method
Program | visitProgram | Generate code as specified by run [P]
Command | visit…Command | Generate code as specified by execute [C]
Expression | visit…Expression | Generate code as specified by evaluate [E]
V-name | visit…Vname | Return the “entity description” for the visited variable or constant name
Declaration | visit…Declaration | Generate code as specified by elaborate [D]
Type-Den | visit…TypeDen | Return the size of the type
Developing a Code Generator “Visitor”
For variables we have two distinct code generation functions: fetch and assign. => They are not implemented as visitor methods but as separate methods:
public void encodeFetch(Vname name) {
    ... as specified by fetch template ...
}
public void encodeAssign(Vname name) {
    ... as specified by assign template ...
}
Developing a Code Generator “Visitor”
public class Encoder implements Visitor {
    ...
    /* Generating code for entire Program */
    public Object visitProgram(Program prog, Object arg) {
        prog.C.visit(this, arg);
        emit(Instruction.HALTop, 0, 0, 0); // emit a halt instruction
        return null;
    }
    ...
}
Developing a Code Generator “Visitor”
RECAP:
• execute [V:=E] =
    evaluate [E]
    assign [V]
/* Generating code for commands */
public Object visitAssignCommand(AssignCommand com, Object arg) {
    com.E.visit(this, arg);
    encodeAssign(com.V);
    return null;
}
Developing a Code Generator “Visitor”
• execute [I(E)] =
    evaluate [E]
    CALL p    where p is the address of the routine named I
public Object visitCallCommand(CallCommand com, Object arg) {
    com.E.visit(this, arg);
    short p = ...; // address of the primitive routine for name com.I
    emit(Instruction.CALLop, Instruction.SBr, Instruction.PBr, p);
    return null;
}
Developing a Code Generator “Visitor”
• execute [C1;C2] =
    execute [C1]
    execute [C2]
public Object visitSequentialCommand(SequentialCommand com, Object arg) {
    com.C1.visit(this, arg);
    com.C2.visit(this, arg);
    return null;
}
LetCommand, IfCommand, and WhileCommand come later:
- LetCommand is more complex: memory allocation and addresses.
- IfCommand and WhileCommand: complications with jumps.
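The calls com.E.visit(this, arg) rely on double dispatch: each AST node's visit method calls back the encoder method for its own class. A minimal, self-contained sketch of that wiring, assuming a hypothetical three-method Visitor interface, String variable names as symbolic addresses, and symbolic instructions; field names mirror the slides (V, E, C1, C2), and none of this is the actual Triangle source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down AST and Visitor; field names mirror the slides.
interface Visitor {
    Object visitAssignCommand(AssignCommand com, Object arg);
    Object visitSequentialCommand(SequentialCommand com, Object arg);
    Object visitIntegerExpression(IntegerExpression expr, Object arg);
}

abstract class AST {
    // Double dispatch: each node calls back the visitor method for its own class.
    abstract Object visit(Visitor v, Object arg);
}

class IntegerExpression extends AST {
    final String spelling;
    IntegerExpression(String spelling) { this.spelling = spelling; }
    Object visit(Visitor v, Object arg) { return v.visitIntegerExpression(this, arg); }
}

class AssignCommand extends AST {
    final String V; // variable name, used here as a symbolic address
    final AST E;
    AssignCommand(String V, AST E) { this.V = V; this.E = E; }
    Object visit(Visitor v, Object arg) { return v.visitAssignCommand(this, arg); }
}

class SequentialCommand extends AST {
    final AST C1, C2;
    SequentialCommand(AST C1, AST C2) { this.C1 = C1; this.C2 = C2; }
    Object visit(Visitor v, Object arg) { return v.visitSequentialCommand(this, arg); }
}

class SketchEncoder implements Visitor {
    final List<String> code = new ArrayList<>();

    public Object visitAssignCommand(AssignCommand com, Object arg) {
        com.E.visit(this, arg);        // evaluate [E]
        code.add("STORE " + com.V);    // assign [V]
        return null;
    }

    public Object visitSequentialCommand(SequentialCommand com, Object arg) {
        com.C1.visit(this, arg);       // execute [C1]
        com.C2.visit(this, arg);       // execute [C2]
        return null;
    }

    public Object visitIntegerExpression(IntegerExpression expr, Object arg) {
        code.add("LOADL " + Short.parseShort(expr.spelling)); // evaluate [IL]
        return null;
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        AST prog = new SequentialCommand(new AssignCommand("x", new IntegerExpression("1")),
                                         new AssignCommand("y", new IntegerExpression("2")));
        SketchEncoder enc = new SketchEncoder();
        prog.visit(enc, null);
        enc.code.forEach(System.out::println);
    }
}
```

The encoder never tests node types itself; the traversal order comes entirely from the order of the visit calls inside each visitor method, which is how the code templates fix the traversal.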
Developing a Code Generator “Visitor”
• evaluate [IL] =
    LOADL v    where v is the integer value of IL
/* Expressions */
public Object visitIntegerExpression(IntegerExpression expr, Object arg) {
    short v = valuation(expr.IL.spelling);
    emit(Instruction.LOADLop, 0, 0, v);
    return null;
}
public short valuation(String s) {
    return Short.parseShort(s); // convert string to integer value; throws NumberFormatException if out of range
}
Developing a Code Generator “Visitor”
• evaluate [E1 O E2] =
    evaluate [E1]
    evaluate [E2]
    CALL p    where p is the address of the routine for O
public Object visitBinaryExpression(BinaryExpression expr, Object arg) {
    expr.E1.visit(this, arg);
    expr.E2.visit(this, arg);
    short p = ...; // address for the expr.O operation
    emit(Instruction.CALLop, Instruction.SBr, Instruction.PBr, p);
    return null;
}
The remaining expression visitors are developed in a similar way.
Control Structures
We have yet to discuss code generation for IfCommand and WhileCommand.
• execute [while E do C] =
        JUMP h
    g:  execute [C]
    h:  evaluate [E]
        JUMPIF(1) g
A complication is generating the correct addresses for the jump instructions. We can determine the address of each instruction by incrementing a counter while emitting instructions. Backward jumps are easy, but forward jumps are harder. Q: Why?
Control Structures
• Backward jumps are easy:
  • The “address” of the target has already been generated and is known.
• Forward jumps are harder:
  • When the jump is generated, the target is not yet generated, so its address is not (yet) known.
• There is a solution known as backpatching:
  1) Emit the jump with a “dummy” address (e.g. simply 0).
  2) Remember the address where the jump instruction occurred.
  3) When the target label is reached, go back and patch the jump instruction.
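The if-then-else template needs two forward jumps (to g and to h), so it exercises all three backpatching steps twice. A minimal sketch, assuming a hypothetical Instr class (op name, n, d) instead of the real Instruction, and Runnable callbacks standing in for visiting E, C1, and C2:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal instruction: op name, n field, and operand d.
class Instr {
    final String op;
    final int n;
    int d; // mutable so that a forward jump can be patched
    Instr(String op, int n, int d) { this.op = op; this.n = n; this.d = d; }
    @Override public String toString() { return op + "(" + n + ") " + d; }
}

class PatchingEncoder {
    final List<Instr> code = new ArrayList<>();

    int nextInstrAddr() { return code.size(); }

    // Emits an instruction and returns its address.
    int emit(String op, int n, int d) {
        code.add(new Instr(op, n, d));
        return code.size() - 1;
    }

    // Step 3: overwrite the dummy operand once the target address is known.
    void patch(int jumpAddr, int target) { code.get(jumpAddr).d = target; }

    // execute [if E then C1 else C2] with backpatching
    void ifThenElse(Runnable evalE, Runnable execC1, Runnable execC2) {
        evalE.run();
        int jumpToElse = emit("JUMPIF", 0, 0);  // steps 1-2: dummy target, remember address
        execC1.run();
        int jumpToEnd = emit("JUMP", 0, 0);     // second forward jump, also a dummy
        patch(jumpToElse, nextInstrAddr());     // g: first instruction of C2
        execC2.run();
        patch(jumpToEnd, nextInstrAddr());      // h: first instruction after the if
    }

    public static void main(String[] args) {
        PatchingEncoder enc = new PatchingEncoder();
        enc.ifThenElse(() -> enc.emit("LOAD", 0, 4), () -> enc.emit("LOADL", 0, 1),
                       () -> enc.emit("LOADL", 0, 2));
        for (int a = 0; a < enc.code.size(); a++) System.out.println(a + ": " + enc.code.get(a));
    }
}
```

Returning the address from emit plays the role of the slides' short j = nextInstrAddr; the jump to h may target the address just past the last instruction emitted so far, which is where the next instruction will go.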
Backpatching Example
public Object visitWhileCommand(WhileCommand com, Object arg) {
    short j = nextInstrAddr;
    emit(Instruction.JUMPop, 0, Instruction.CBr, 0);   // dummy address
    short g = nextInstrAddr;
    com.C.visit(this, arg);
    short h = nextInstrAddr;
    code[j].d = h;                                     // backpatch
    com.E.visit(this, arg);
    emit(Instruction.JUMPIFop, 1, Instruction.CBr, g);
    return null;
}
• execute [while E do C] =
        JUMP h
    g:  execute [C]
    h:  evaluate [E]
        JUMPIF(1) g
Constants and Variables
We have not yet discussed code generation for LetCommand. This is the place in Mini Triangle where declarations occur.
• execute [let D in C] =
        elaborate [D]
        execute [C]
        POP(0) s    if s > 0, where s = amount of storage allocated by D
• fetch [V] = LOAD d[SB]     where d = address of V relative to SB
• assign [V] = STORE d[SB]   where d = address of V relative to SB
How do we know s and d? They are calculated during code generation for elaborate [D].
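Constants and Variables
Example: accessing known values and known addresses
let const b ~ 10; var i: Integer
in i := i*b
elaborate [const b ~ 10; var i: Integer] =
    PUSH 1
execute [i := i*b] =
    LOAD 4[SB]
    LOADL 10
    CALL mult
    STORE 4[SB]
The constant b occupies no storage: its known value 10 is used directly by LOADL 10.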
Constants and Variables
Example: accessing an unknown value. Not all constants have values known at compile time.
let var x: Integer
in let const y ~ 365 + x
   in putint(y)
The value of y depends on the variable x, so it is not known at compile time. When visiting declarations, the code generator must decide whether to represent a constant in memory or as a literal value => we have to remember the address or the value somehow.
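Constants and Variables
Example: accessing an unknown value
let var x: Integer
in let const y ~ 365 + x
   in putint(y)
elaborate [var x: Integer] =
    PUSH 1
elaborate [const y ~ 365 + x] =
    LOADL 365
    LOAD 4[SB]
    CALL add
execute [putint(y)] =
    LOAD 5[SB]
    CALL putint
As the template elaborate [const I ~ E] = evaluate [E] prescribes, the value of 365 + x stays on the stack at 5[SB], which becomes the address of y.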
Constants and Variables
Entity descriptions: when the code generator visits a declaration, it
1) decides whether to represent it as a known value or a known address;
2) if it is an address, emits code to reserve space;
3) makes an entity description: an object that describes the variable or constant (its value or address, and its size);
4) puts a link in the AST that points to the entity description.
Example and picture on next slide.
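Constants and Variables
let const b ~ 10; var i: Integer
in i := i*b
RECAP: applied occurrences of identifiers point to their declarations.
[Figure: AST with LetCommand -> SequentialDeclaration -> ConstDecl (b ~ 10) and VarDecl (i). The ConstDecl for b carries the entity description “known value, size = 1, value = 10”; the VarDecl for i carries “known address, address = 4, size = 1”. The applied occurrences of i and b in i := i*b point to these declarations.]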
Constants and Variables
let var x: Integer
in let const y ~ 365 + x
   in putint(y)
[Figure: AST with LetCommand; the VarDecl for x carries the entity description “known address, address = 4, size = 1”; the ConstDecl for y carries “unknown value, address = 5, size = 1”.]
Note: there are also unknown addresses. More about these later. Q: When do unknown addresses occur?
Static Storage Allocation
Example 1: global variables
let var a: Integer; var b: Boolean; var c: Integer; var d: Integer
in ...
TAM addresses: a 0[SB], b 1[SB], c 2[SB], d 3[SB]
Note: in this example all globals have the same size, 1. This is not always the case.
Static Storage Allocation
Example 2: static allocation with nested blocks: overlays
let var a: Integer
in begin
     ...
     let var b: Boolean; var c: Integer
     in begin ... end;
     ...
     let var d: Integer
     in begin ... end;
     ...
   end
TAM addresses: a 0[SB], b 1[SB], c 2[SB], d 1[SB]
b and d get the same address!
Q: Why can b and d share the same address?
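Overlays fall out of a simple discipline: remember the next free address on block entry and restore it on block exit, so sibling blocks reuse the same words. A minimal sketch with a hypothetical OverlayAllocator; it keys addresses by name, so names must be unique in this sketch, whereas a real code generator would attach each address to the declaration's entity description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of static allocation with overlays for nested blocks.
class OverlayAllocator {
    private int nextFree = 0;                               // next free offset from SB
    private final Deque<Integer> saved = new ArrayDeque<>(); // nextFree at each block entry
    final Map<String, Integer> addresses = new LinkedHashMap<>();

    void enterBlock() { saved.push(nextFree); }
    void exitBlock()  { nextFree = saved.pop(); }           // the block's words become free again

    int declare(String name, int size) {
        int address = nextFree;
        addresses.put(name, address);
        nextFree += size;
        return address;
    }

    public static void main(String[] args) {
        OverlayAllocator al = new OverlayAllocator();
        al.enterBlock();
        al.declare("a", 1);
        al.enterBlock(); al.declare("b", 1); al.declare("c", 1); al.exitBlock();
        al.enterBlock(); al.declare("d", 1); al.exitBlock();
        al.exitBlock();
        al.addresses.forEach((name, addr) -> System.out.println(name + " " + addr + "[SB]"));
    }
}
```

The inner blocks are never active at the same time, so restoring nextFree on exit is safe: by the time d is allocated, b and c are no longer accessible.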
Static Storage Allocation: In the Code Generator
Entity descriptions:
public abstract class RuntimeEntity {
    public short size;
    ...
}
public class KnownValue extends RuntimeEntity {
    public short value;
    ...
}
public class UnknownValue extends RuntimeEntity {
    public short address;
    ...
}
public class KnownAddress extends RuntimeEntity {
    public short address;
    ...
}
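The entity description attached to a name is what lets encodeFetch and encodeAssign pick the right instruction. A minimal sketch under these assumptions: the classes are re-declared with constructors and int fields for brevity, the methods take the entity description directly instead of a Vname, and instructions are symbolic strings. Fetching an UnknownValue like a variable follows the const y ~ 365 + x example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-declarations of the entity classes, with constructors.
abstract class RuntimeEntity {
    final int size;
    RuntimeEntity(int size) { this.size = size; }
}

class KnownValue extends RuntimeEntity {
    final int value;
    KnownValue(int size, int value) { super(size); this.value = value; }
}

class UnknownValue extends RuntimeEntity {
    final int address;
    UnknownValue(int size, int address) { super(size); this.address = address; }
}

class KnownAddress extends RuntimeEntity {
    final int address;
    KnownAddress(int size, int address) { super(size); this.address = address; }
}

class EntityEncoder {
    final List<String> code = new ArrayList<>();

    // fetch [V]: the entity description decides between inlining and loading
    void encodeFetch(RuntimeEntity e) {
        if (e instanceof KnownValue kv) {
            code.add("LOADL " + kv.value);              // constant with a known value
        } else if (e instanceof UnknownValue uv) {
            code.add("LOAD " + uv.address + "[SB]");    // constant stored in memory
        } else if (e instanceof KnownAddress ka) {
            code.add("LOAD " + ka.address + "[SB]");    // variable
        } else {
            throw new IllegalArgumentException("unexpected entity: " + e);
        }
    }

    // assign [V]: only variables (known addresses) can be assigned to
    void encodeAssign(RuntimeEntity e) {
        if (!(e instanceof KnownAddress ka)) throw new IllegalArgumentException("not a variable");
        code.add("STORE " + ka.address + "[SB]");
    }

    public static void main(String[] args) {
        // let const b ~ 10; var i: Integer in i := i*b
        RuntimeEntity b = new KnownValue(1, 10), i = new KnownAddress(1, 4);
        EntityEncoder enc = new EntityEncoder();
        enc.encodeFetch(i);
        enc.encodeFetch(b);
        enc.code.add("CALL mult");
        enc.encodeAssign(i);
        enc.code.forEach(System.out::println);
    }
}
```

Putting the decision into the declaration's entity description means applied occurrences never need to re-examine the declaration itself; they only follow the link in the AST.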