Code Generation
Compiler
Baojian Hua
bjhua@ustc.edu.cn
Front End
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

Back End
IR -> instruction selector -> Assem -> register allocator -> TempMap -> instruction scheduler -> Assem
Code Generation
• Generating code for some ISA
  • this course uses x86
• Many components
  • instruction selection, register allocation, scheduling, …
• Many different strategies
  • for now, we concentrate on a simple one: the stack machine
  • later in this course, we will turn to more advanced (and sophisticated) ones
What’s a stack machine?
• A stack machine has only an operand stack and no (or few) registers
  • all computation is performed on the operand stack
  • the architecture is very simple and uniform
• Long history:
  • dates back at least to the 1970s
  • industry interest has been renewed in recent years
    • Sun’s JVM and Microsoft’s CLR, etc.
Stack Machine ISA: s86
prog  -> instr prog
      ->                // empty
instr -> push v
      -> pop id
      -> add
      -> sub
      -> times
      -> divide
v     -> num
      -> id

// Sample Program
push 8
push 2
push x
times
sub
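To make the meaning of the s86 instructions concrete, here is a minimal interpreter sketch in Python. It is not part of the original slides: the name `run_s86`, the `memory` dictionary used for variables, and the use of floor division for `divide` are all assumptions made for illustration.

```python
import operator
from typing import Dict, List

# Assumption: each binary instruction pops the right operand first, then the left.
BINARY_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "times": operator.mul,
    "divide": operator.floordiv,  # assumption: integer (floor) division
}


def run_s86(program: List[str], memory: Dict[str, int]) -> List[int]:
    """Execute s86 instructions and return the final operand stack."""
    stack: List[int] = []
    for line in program:
        parts = line.split()
        op = parts[0]
        if op == "push":
            arg = parts[1]
            # v -> num | id: a literal is pushed as is, an id is loaded from memory
            stack.append(int(arg) if arg.isdigit() else memory[arg])
        elif op == "pop":
            memory[parts[1]] = stack.pop()
        elif op in BINARY_OPS:
            right = stack.pop()  # pushed last, so it is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


sample = ["push 8", "push 2", "push x", "times", "sub"]
print(run_s86(sample, {"x": 3}))  # [2], since 8 - 2*3 = 2
```

Note that `sub` computes "second from top minus top", which is what makes the sample program evaluate 8-2*x rather than 2*x-8.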
The simple expression lang’
// recall our simple
// expression language
exp -> num
    -> id
    -> exp + exp
    -> exp - exp
    -> exp * exp
    -> exp / exp
    -> (exp)

// Sample Program
8-2*x

// or in ML
datatype exp
  = Int of int
  | Id of string
  | Add of exp * exp
  | Sub of exp * exp
  | Times of exp * exp
  | Divide of exp * exp
Code gen’ from exp to s86
C (num)     = push num
C (id)      = push id
C (e1 + e2) = C (e1); C (e2); add
C (e1 - e2) = C (e1); C (e2); sub
C (e1 * e2) = C (e1); C (e2); times
C (e1 / e2) = C (e1); C (e2); divide
Code gen’ from exp to s86
// or in ML
fun C (e) =
  case e of
    Int i => push i
  | Id s => push s
  | Add (e1, e2) => (C (e1); C (e2); add)
  | … => (* similar *)
Example
C (8-2*x) = C(8); C(2*x); sub
          = push 8; C(2*x); sub
          = push 8; C(2); C(x); times; sub
          = …
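The equations for C can be written almost verbatim as a recursive function. The Python sketch below is an illustration only: the class names `Int`, `Id` and `BinOp` and the function `compile_exp` are hypothetical, and the four binary constructors of the ML datatype are folded into a single `BinOp` node to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Int:
    value: int


@dataclass
class Id:
    name: str


@dataclass
class BinOp:
    op: str  # one of "+", "-", "*", "/"
    left: "Exp"
    right: "Exp"


Exp = Union[Int, Id, BinOp]

# Maps each source operator to the s86 instruction that implements it.
OPCODES = {"+": "add", "-": "sub", "*": "times", "/": "divide"}


def compile_exp(e: Exp) -> List[str]:
    """C: translate an expression into a list of s86 instructions."""
    if isinstance(e, Int):
        return [f"push {e.value}"]
    if isinstance(e, Id):
        return [f"push {e.name}"]
    # C (e1 op e2) = C (e1); C (e2); op
    return compile_exp(e.left) + compile_exp(e.right) + [OPCODES[e.op]]


# 8-2*x
program = compile_exp(BinOp("-", Int(8), BinOp("*", Int(2), Id("x"))))
print(program)  # ['push 8', 'push 2', 'push x', 'times', 'sub']
```

The output is a post-order walk of the syntax tree, which is why no temporaries or registers are needed.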
Moral
• Code generation for a stack machine is dirt simple
  • a recursive equation from the point of view of math
  • a recursive function from the point of view of CS
  • think before you hack!
• But we’d have more to say about:
  • variable storage
  • more language features
    • statements, declarations, functions, etc.
Address space
• The address space is how a program sees and uses memory
  • highly architecture and OS dependent
  • below is the typical layout of 32-bit x86/Linux

0xffffffff
    OS
0xc0000000
    stack
    heap
    data
    text
0x08048000
0x00100000
    BIOS, VGA
0x00000000
Static Storage
• Static storage is an area of space in the data section
  • a typical use is to hold C/C++ file-scope (static) variables and extern (global) variables
• Exp lang’ has only static variables, all of which can be stored in the static section
  • so a pass is required to collect all variables
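The collecting pass mentioned above is a plain tree walk. A minimal Python sketch follows, assuming expressions are encoded as tuples such as `("int", 8)`, `("id", "x")` and `("-", e1, e2)`; this encoding and the name `collect_vars` are hypothetical and chosen only to keep the example self-contained.

```python
from typing import List


def collect_vars(e: tuple) -> List[str]:
    """Return the variable names used in e, in first-use order, without duplicates."""
    found: List[str] = []

    def walk(node: tuple) -> None:
        tag = node[0]
        if tag == "id":
            if node[1] not in found:
                found.append(node[1])
        elif tag != "int":
            # any other tag is a binary operator with two sub-expressions
            walk(node[1])
            walk(node[2])

    walk(e)
    return found


# 8-2*x
expr = ("-", ("int", 8), ("*", ("int", 2), ("id", "x")))
print(collect_vars(expr))  # ['x']
```

Each collected name then gets one cell in the data section.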
Declarations
// scale exp a bit
prog -> decs exp
decs -> int id; decs
     ->                 // empty
exp  -> …

// Sample Program
int x;
8-2*x;

// or in ML
datatype decs
  = T of {var: string, ty: tipe} list
Code gen’ rules
D (int id; decs) = id: .int 0
                   D (decs)
D ( )            =
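The two D rules translate directly into a recursive function over the declaration list. In this Python sketch a declaration is assumed to be a `(name, type)` pair, and `compile_decs` is a hypothetical name; only the `int` type from the slides is handled.

```python
from typing import List, Tuple


def compile_decs(decs: List[Tuple[str, str]]) -> List[str]:
    """D: emit one zero-initialised .int cell per declared variable."""
    if not decs:  # D ( ) =
        return []
    (name, ty), rest = decs[0], decs[1:]
    if ty != "int":
        raise ValueError(f"unsupported type: {ty}")
    # D (int id; decs) = id: .int 0, followed by D (decs)
    return [f"{name}: .int 0"] + compile_decs(rest)


print(compile_decs([("x", "int"), ("y", "int")]))  # ['x: .int 0', 'y: .int 0']
```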
Statement
// scale the exp lang’ a bit by adding the following:
s -> id = e;
  -> if (e) s else s

// compile:
CS (id = e;) = C (e); pop id
Statement, cont’
// s86 must also be extended (jz, jmp, and labels)!
// compile:
CS (if (e) s1 else s2) =
        C (e)
        jz .Lfalse
.Ltrue:
        CS (s1)
        jmp .Lend
.Lfalse:
        CS (s2)
.Lend:
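One practical detail the rule leaves implicit is that the labels must be unique, otherwise nested or consecutive if statements would define `.Lfalse` and `.Lend` more than once. The Python sketch below shows one possible way to handle this with a counter. The function name `compile_if` and the numbered label scheme are assumptions, as is the reading of `jz` as "pop the top of the operand stack and jump if it is zero". The `.Ltrue` label is omitted because nothing jumps to it.

```python
import itertools
from typing import Iterator, List


def compile_if(cond: List[str], then_code: List[str], else_code: List[str],
               counter: Iterator[int]) -> List[str]:
    """CS for if: takes already-compiled code for e, s1 and s2."""
    n = next(counter)  # a fresh number keeps this statement's labels unique
    false_label, end_label = f".Lfalse{n}", f".Lend{n}"
    return (
        cond
        + [f"jz {false_label}"]
        + then_code
        + [f"jmp {end_label}", f"{false_label}:"]
        + else_code
        + [f"{end_label}:"]
    )


counter = itertools.count()
# if (x) { if (y) z = 1; else z = 2; } else z = 3;
inner = compile_if(["push y"], ["push 1", "pop z"], ["push 2", "pop z"], counter)
outer = compile_if(["push x"], inner, ["push 3", "pop z"], counter)
print("\n".join(outer))
```

Passing the counter explicitly keeps the function free of global state; a compiler would typically hide it behind a label generator.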
Moral
• It’s also straightforward to translate other control structures in this style
  • while, for, switch, etc.
• This kind of code generation is called recursive descent
  • may be done at parsing time
  • adopted in many compilers
    • read the offered article on Borland Turbo Pascal 3.0
    • you may safely ignore the Pascal-specific features
From s86 to x86
• How do we run the generated s86 code?
  • design a virtual machine
    • as we did in lab #1
    • this is also the way of JVM or CLR
  • translate to native code and then exec’ it
    • so-called just-in-time (JIT)
    • the dominant approach for OO languages today…
• Next, we discuss the 2nd method
  • by mapping s86 to x86
Operand Stack
// x86 does not have a dedicated operand stack?
// Solution 1: use the control stack: ebp, esp
//   left to you.
// Solution 2: make a fake operand stack, as in:
        .set PAGE, 4096
        .data
opStack:
        .space PAGE, 0xcc
top:
        .int opStack+PAGE
// “top” points to the stack top, and the stack grows
// down to lower addresses
Instructions
// map fake s86 instructions to x86’s:
.macro s86push x
        sub dword ptr [top], 4
        mov ebx, [top]
        mov eax, \x
        mov [ebx], eax
.endm
// others are similar.
// Care must be taken to take account of the
// machine constraints. For instance, mem-mem
// move is illegal on x86.
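As an illustration of what "others are similar" might look like, the Python sketch below expands the s86 `add` and `sub` instructions into x86 text that uses the fake operand stack and the `top` cell from the slides. This expansion is a guess at a reasonable instruction sequence, not the course's actual macros; the name `expand_binop` is hypothetical, and the output assumes the same Intel-style syntax as the `s86push` macro.

```python
from typing import List


def expand_binop(op: str) -> List[str]:
    """Expand s86 add/sub into x86 instructions on the fake operand stack."""
    x86_op = {"add": "add", "sub": "sub"}[op]  # other instructions are not handled here
    return [
        "mov ebx, [top]",
        "mov eax, [ebx]",          # eax = right operand (current stack top)
        "add dword ptr [top], 4",  # pop it: the stack grows down, so popping adds 4
        "mov ebx, [top]",          # ebx = address of the left operand, the new top
        f"{x86_op} dword ptr [ebx], eax",  # left = left op right, result stays on top
    ]


print("\n".join(expand_binop("sub")))
```

The right operand goes through `eax` because x86 has no memory-to-memory arithmetic, the same constraint the slide points out for moves.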