Memory Layout

Compiler
Baojian Hua
bjhua@ustc.edu.cn

Middle and Back End
    AST -> (translation) -> IR1 -> (translation) -> IR2 -> (more IRs and translation) -> asm

Sources and IRs
CODE:
• Procedures
• Control Flow
• Statements
• Data Access
DATA:
• Global Static Variables
• Global Dynamic Data
• Local Variables
• Temporaries
• Parameter Passing
• Read-only Data

A code generator should…
• Translate all “CODE” to machine (or assembly) instructions
  • target-dependent
• Allocate space for variables, etc. (“DATA”)
• Respect the calling conventions and other constraints
• To do all this, it must know the details of modern processors
  • and their impact on code generation

Overview of a modern processor
• ALU
• Control
• Memory
• Registers

Arithmetic and Logic Unit
• Performs most arithmetic and logic operations
    addl %eax, %ebx
    incl 4(%ecx)
• Operands:
  • immediate
  • register
  • memory

Arithmetic and Logic Unit
• Operations may have constraints
  • how to perform a division?
    cltd; idivl ...
• Operations may raise exceptions
  • idivl with a zero divisor
• Operations come in variants for different operand sizes
  • addb, addw, addl, addq

Control
• Executing instructions
  • instructions are in memory (pointed to by PC)
    for (;;) {
      instruction = *PC;
      PC++;
      execute (instruction);
    }
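The fetch-execute loop above can be made concrete with a toy machine. This is only a sketch: the three-instruction set (OP_INC, OP_DEC, OP_HALT), the single accumulator register, and the function name run are invented for illustration and do not correspond to any real processor.

```c
/* Hypothetical toy instruction set, invented for this sketch. */
enum opcode { OP_INC, OP_DEC, OP_HALT };

/* Run `program` until OP_HALT and return the final accumulator value. */
int run(const enum opcode *program)
{
    const enum opcode *pc = program;  /* program counter: points into memory */
    int acc = 0;                      /* the machine's single register */

    for (;;) {
        enum opcode instruction = *pc;  /* fetch */
        pc++;                           /* advance to the next instruction */
        switch (instruction) {          /* execute */
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        }
    }
}
```

For example, running the program { OP_INC, OP_INC, OP_DEC, OP_HALT } leaves 1 in the accumulator.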

Registers
• Limited in number but high-speed
  • 8 on x86, more on RISC
• Most are general-purpose, but some have special uses

Memory
• The address space is the way programs see and use memory
  • highly architecture- and OS-dependent
• Typical layout on 32-bit x86/Linux, from high to low addresses:
    0xffffffff
        OS
    0xc0000000
        stack
        heap
        data
        text
    0x08048000
    0x00100000
        BIOS, VGA
    0x00000000
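One informal way to see these regions is to print a sample address from each. The sketch below assumes a hosted C environment; the names print_layout, global_initialized and greeting are hypothetical, and the addresses printed depend on the OS, the compiler, and address-space randomization, so they will generally not match the 32-bit figures on the slide.

```c
#include <stdio.h>
#include <stdlib.h>

int global_initialized = 1;            /* expected in the data segment */
const char *const greeting = "hello";  /* the literal itself is read-only data */

/* Print one sample address per region; returns 0 on success, -1 if malloc fails. */
int print_layout(void)
{
    int local = 0;                           /* lives in the current stack frame */
    int *dynamic = malloc(sizeof *dynamic);  /* lives on the heap */
    if (dynamic == NULL)
        return -1;

    /* Casting a function pointer to void * is a common extension, not ISO C. */
    printf("code      %p\n", (void *)print_layout);
    printf("read-only %p\n", (void *)greeting);
    printf("data      %p\n", (void *)&global_initialized);
    printf("heap      %p\n", (void *)dynamic);
    printf("stack     %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

On a typical Linux build the stack address is the largest and the code address among the smallest, but that ordering is a platform convention rather than something the C language guarantees.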

Read Only Data
    // C code
    char *s = "hello";
    void f () { printf(s); }

    // x86 code
    .text
    f:
        pushl $s
        call printf
    s:
        .string "hello"
• Here the string is placed in the text segment

Global Static Variables
    // C code
    int d = 1;
    void f () { d++; }

    // x86 code
    .text
    f:
        movl d, %eax
        incl %eax
        movl %eax, d
    .data
    d:
        .int 1
• The variable d is placed in the data segment

Global Dynamic Data
    // C code
    void f () { malloc(4); }

    // x86 code
    .text
    f:
        pushl $4
        call malloc
        movl %eax, %ebx
• The allocated block lives on the heap

Function, or Procedure, or method, or …
• High-level abstraction of code
  • logically grouped
• Good for many things:
  • design and abstraction
  • development, testing, maintenance and evolution
  • …
• Implementation?
  • we start with C-style functions, and deal with more advanced forms later

API & ABI
• Application Programming Interface
  • interfaces between source programs
• Application Binary Interface
  • contracts between binary programs
    • even when compiled from different languages by different compilers
  • conventions on low-level details:
    • how to pass arguments?
    • how to return values?
    • how to make use of registers?
    • …
  • we posted the x86 ABI document on the course page

Parameter Passing

Parameter passing
• Must answer two questions:
  • what to pass?
    • call-by-value
    • call-by-reference
    • call-by-need
    • …
  • how to pass?
    • calling convention
    • http://en.wikipedia.org/wiki/X86_calling_conventions

Call-by-reference
    // C++ style reference:
    int f (int &x, int y) {
      x = 3;
      y = 4;
      return 0;
    }
    // a call
    f (a, b);
• Found in languages such as C++
• Arguments escape (their addresses are taken)
  • so they cannot be constants?
• Actual arguments and formal parameters are aliases

Simulating call-by-reference
    // original C++ code:
    int f (int &x, int y) {
      x = 3;
      y = 4;
      return 0;
    }
    // a call
    f (a, b);

    // simulated:
    int f (int *x, int y) {
      *x = 3;
      y = 4;
      return 0;
    }
    // the call becomes:
    f (&a, b);

Moral
• Call-by-reference is widely considered a design mistake in C++
  • the code is inherently inefficient (every access to x goes through a pointer)
  • the code is ambiguous in nature
    • x = 4; (does this write a local variable or the caller's?)
• A variant of this is the so-called call-by-value/result
  • looks like call-by-value, but with an effect visible to the caller

Call-by-value/result
    // code:
    int f (int @x, int y) {
      x = 3;
      y = 4;
      return 0;
    }
    // a call
    f (a, b);
• Upon call, the actual arguments are copied
• The callee only modifies its local copy
• Upon exit, the callee copies the local version back to the actual arguments
  • so, unlike call-by-reference, formal parameters and actual arguments are not aliases during the call

Simulating call-by-value/result
    // original code:
    int f (int @x, int y) {
      x = 3;
      y = 4;
      return 0;
    }
    // a call
    f (a, b);

    // simulated:
    int f (int *x, int y) {
      int temp = *x;
      temp = 3;
      y = 4;
      *x = temp;
      return 0;
    }
    // the call becomes:
    f (&a, b);
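The two mechanisms can be told apart when the same variable is passed for two parameters. The following sketch is not from the slides: the functions bump_ref and bump_vr are hypothetical, and both parameters are passed by the mechanism under test so that aliasing becomes observable.

```c
/* Call-by-reference: x and y are accessed through pointers on every use. */
int bump_ref(int *x, int *y)
{
    *x = *x + 1;
    return *x + *y;   /* if x and y alias, *y already sees the increment */
}

/* Call-by-value/result: copy in, work on locals, copy out on exit. */
int bump_vr(int *x, int *y)
{
    int tx = *x, ty = *y;   /* copy in */
    tx = tx + 1;
    int result = tx + ty;   /* ty is unaffected by the update to tx */
    *x = tx;                /* copy out, in parameter order */
    *y = ty;
    return result;
}
```

With a = 1, bump_ref(&a, &a) returns 4 and leaves a == 2, while bump_vr(&a, &a) returns 3 and leaves a == 1, because the copy-out of y overwrites the copy-out of x. When the arguments are distinct variables, both functions behave identically.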

Moral
• What’s the difference between call-by-value and call-by-value/result?
• Is call-by-value/result more efficient than call-by-reference? Why or why not?
• We will come back to a more interesting optimization called register promotion
  • same idea: pull values into registers

Call-by-name
    // code:
    int f (int name x, int y) {
      if (y) return x;
      else return 0;
    }
    // a call
    f (a, b);
• Some languages, such as Algol 60, use call-by-name (Haskell uses the memoized variant, call-by-need)
• Arguments are not evaluated until they are actually needed in the callee
• For each argument, create a function, called a thunk

Simulating call-by-name
    // original code:
    int f (int name x, int y) {
      if (y) return x;
      else return 0;
    }
    // a call
    f (a, b);

    // simulated:
    int f (fX: unit -> int, int y) {
      if (y) return fX ();
      else return 0;
    }
    // the call becomes:
    f (fn () => a, b);
    // note: the function fn () => a is not closed! (a is free in it)
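Since C has neither closures nor a `name` parameter mode, one possible encoding of a thunk is a function pointer paired with an environment that holds the thunk's free variables, which also addresses the "not closed" problem. The names struct thunk, force_thunk, f_by_name, struct div_env and div_code are hypothetical, and the argument expression a / b is chosen only to make non-evaluation observable.

```c
/* A thunk: code plus the environment holding its free variables. */
struct thunk {
    int (*code)(void *env);
    void *env;
};

/* Evaluate the delayed argument expression. */
int force_thunk(const struct thunk *t)
{
    return t->code(t->env);
}

/* int f (int name x, int y) { if (y) return x; else return 0; } */
int f_by_name(struct thunk x, int y)
{
    if (y)
        return force_thunk(&x);   /* x is evaluated only on this path */
    return 0;
}

/* Hypothetical argument expression `a / b`; a and b live in the caller. */
struct div_env {
    int a, b;
    int evaluations;   /* counts how often the expression was evaluated */
};

int div_code(void *env)
{
    struct div_env *e = env;
    e->evaluations++;
    return e->a / e->b;
}
```

A caller builds `struct div_env e = {6, 0, 0}` and `struct thunk t = {div_code, &e}`. The call f_by_name(t, 0) returns 0 without ever performing the division by zero, because the thunk is never forced.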

Moral
• A serious problem with call-by-name is that the arguments may be evaluated many times
• A better solution is to memoize the evaluation result
• This method is called call-by-need, or sometimes lazy evaluation

Simulating call-by-need
    // original code:
    int f (int need x, int y) {
      if (y) return x + x;
      else return 0;
    }
    // a call
    f (a, b);

    // simulated:
    int f (fX: unit -> int, int y) {
      if (y) return fX() + fX();
      else return 0;
    }
    // the call becomes:
    val xMemoize = ref NONE
    f (fn () => case !xMemoize
                of NONE => let val v = a in xMemoize := SOME v; v end  // evaluate once and store
                 | SOME i => i,                                        // reuse the stored value
       b);
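The memoization step can be sketched in C by adding a flag and a cached value to the thunk. The names struct lazy, force_lazy, f_by_need, struct counter_env and counted_code are hypothetical; the evaluation counter exists only so that the "evaluated once" property can be observed.

```c
#include <stdbool.h>

/* A memoizing thunk: code, environment, and a cache for the result. */
struct lazy {
    int (*code)(void *env);
    void *env;
    bool evaluated;   /* false until the first force */
    int value;        /* valid only when evaluated is true */
};

/* Evaluate on first use, then return the cached value. */
int force_lazy(struct lazy *t)
{
    if (!t->evaluated) {
        t->value = t->code(t->env);
        t->evaluated = true;
    }
    return t->value;
}

/* int f (int need x, int y) { if (y) return x + x; else return 0; } */
int f_by_need(struct lazy *x, int y)
{
    if (y)
        return force_lazy(x) + force_lazy(x);   /* second force hits the cache */
    return 0;
}

/* Hypothetical argument expression that counts its own evaluations. */
struct counter_env {
    int a;
    int evaluations;
};

int counted_code(void *env)
{
    struct counter_env *e = env;
    e->evaluations++;
    return e->a;
}
```

With an environment holding a = 21, f_by_need returns 42 while the argument expression runs only once. Under plain call-by-name the same body would evaluate it twice.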

Where to pass the parameters?
• Different calling conventions:
  • pass them in registers
  • pass them on the stack (typically: the call stack)
  • a combination of the two
    • some in registers, the rest on the stack
• The choice depends not only on the ISA, but also on the language

Sample Calling Conventions for C on x86 (from Wiki)

Registers

Register usage
• Must be careful with register usage
  • caller-save: the callee is free to destroy these registers
    • eax, ecx, edx, eflags, fflags
    • [and also all FP registers]
  • callee-save: the callee must restore these registers before returning to the caller
    • ebp, esp, ebx, esi, edi
    • [and also the FP register stack top]

Register usage
• Should a value reside in a caller-save or a callee-save register?
  • not so easy to determine
  • and there are no general rules
  • must be very careful with language features such as longjmp, goto or exceptions
• We will come back to this issue later, in the register allocation part

The Call Stack

Stack on x86
• Two dedicated registers: %ebp and %esp
• The stack grows down, toward lower addresses
• A frame is also called an activation record
    high address
        frame 0
        frame 1
        frame 2     (current frame, between %ebp and %esp)
    low address

Stack Frame
    int f (int arg0, int arg1, …) {
      int local1;
      int local2;
      …;
    }

    high address
        …
        arg1
        arg0
        ret addr
        old ebp     <- %ebp
        local1
        local2
        …           <- %esp
    low address

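Put these together
    // C code
    int f(int x);   // declared up front so each call sees a prototype
    int g(int x);
    int main(void) { return f(8)+1; }
    int f(int x) { return g(x); }
    int g(int x) { return x+3; }

    // x86 code
    main:
        pushl %ebp          # save the caller's frame pointer
        movl %esp, %ebp     # set up main's frame
        pushl $8            # argument for f
        call f
        incl %eax           # f(8) + 1, result in %eax
        leave               # restore %esp and %ebp
        ret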

Put these together
    // C code
    int f(int x) { return g(x); }

    // x86 code
    f:
        pushl %ebp
        movl %esp, %ebp
        pushl 8(%ebp)       # pass x on to g
        call g
        leave
        ret

Put these together
    // C code
    int g(int x) { return x+3; }

    // x86 code
    g:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %eax  # load x
        addl $3, %eax       # x + 3, result in %eax
        leave
        ret

Implementation
• Design a frame (activation record) data structure
  • the frame size
  • garbage collection info
  • detailed layout, etc.
• This hides the machine-related details
  • good for retargeting the compiler

Interface
    signature FRAME =
    sig
      type t
      (* allocate space for a variable in frame *)
      val allocVar: unit -> unit
      (* create a new frame *)
      val new: unit -> t
      (* current size of the frame *)
      val size: unit -> int
    end
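A rough C counterpart of this interface might look as follows. It is a sketch under several assumptions that are not in the slides: the frame is passed explicitly instead of being implicit, every variable occupies one 4-byte word, and the allocation function returns the variable's offset from %ebp. The names struct frame, frame_new, frame_alloc_var and frame_size are hypothetical.

```c
#include <stdlib.h>

#define WORD_SIZE 4   /* assumption: 32-bit x86, one word per variable */

struct frame {
    int size;   /* bytes reserved for locals so far */
};

/* Create a new, empty frame; returns NULL if allocation fails. */
struct frame *frame_new(void)
{
    struct frame *f = malloc(sizeof *f);
    if (f != NULL)
        f->size = 0;
    return f;
}

/* Reserve one word for a variable and return its offset from %ebp.
 * Offsets are negative because locals sit below the saved %ebp. */
int frame_alloc_var(struct frame *f)
{
    f->size += WORD_SIZE;
    return -f->size;
}

/* Current size of the frame's local area, in bytes. */
int frame_size(const struct frame *f)
{
    return f->size;
}
```

Under these assumptions the first two allocations yield the offsets -4 and -8, that is, the slots addressed as -4(%ebp) and -8(%ebp) for local1 and local2. A code generator for another target would change only this module.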

Frame on stack
• Both function arguments and locals have a LIFO lifetime, matching function calls and returns
  • so one can put the frame on the call stack
• But later, we will see other possibilities
  • e.g.: higher-order nested functions

Nested Function

Nested Functions
    int f (int x, int y) {
      int m;
      int g (int z) {
        int h () { return m+z; }
        return 1;
      }
      return 0;
    }
• Functions declared in the body of another function
• So the inner one can refer to the variables of the outer ones
  • such functions are called open

Nested Functions
• How does an inner function (such as h, which uses m and z) access variables of the outer functions?
• Three classical methods:
  • lambda lifting
  • static link
  • display

Lambda lifting
• In lambda lifting, the program is translated into a form in which all procedures are closed
• The translation starts with the innermost procedures and works its way outwards

Lambda lifting example
    // original:
    int f (int x, int y) {
      int m;
      int g (int z) {
        int h () { return m+z; }
        return 1;
      }
      return 0;
    }

    // step 1: close h by passing its free variables m and z
    int f (int x, int y) {
      int m;
      int g (int z) {
        int h (int &m, int &z) { return m+z; }
        return 1;
      }
      return 0;
    }

Lambda lifting example
    // original:
    int f (int x, int y) {
      int m;
      int g (int z) {
        int h () { return m+z; }
        return 1;
      }
      return 0;
    }

    // step 2: close g by passing its free variable m
    int f (int x, int y) {
      int m;
      int g (int &m, int z) {
        int h (int &m, int &z) { return m+z; }
        return 1;
      }
      return 0;
    }

Lambda lifting example
    // original:
    int f (int x, int y) {
      int m;
      int g (int z) {
        int h () { return m+z; }
        return 1;
      }
      return 0;
    }

    // step 3: flatten, since every function is now closed
    int f (int x, int y) { int m; return 0; }
    int g (int &m, int z) { return 1; }
    int h (int &m, int &z) { return m+z; }
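Standard C has neither nested functions nor references, so the flattened result can be approximated with pointer parameters. The sketch below deliberately departs from the slide's example in one respect: it makes f call g and g call h, and initializes m, so that the threading of m and z is actually exercised. The names lifted_f, lifted_g and lifted_h are hypothetical.

```c
/* Lifted h: its free variables m and z arrive as explicit pointer parameters. */
int lifted_h(int *m, int *z)
{
    return *m + *z;
}

/* Lifted g: m is threaded through only so that g can hand it on to h.
 * (Modified from the slide: here g returns h's result instead of 1.) */
int lifted_g(int *m, int z)
{
    return lifted_h(m, &z);   /* z escapes: its address is taken */
}

/* Top-level f owns m and passes its address down.
 * (Modified from the slide: m is initialized and g is actually called.) */
int lifted_f(int x, int y)
{
    int m = x + y;
    return lifted_g(&m, y);   /* m escapes as well */
}
```

For instance, lifted_f(1, 2) sets m to 3 and returns 3 + 2 = 5. Note how both m and z have their addresses taken, which is the "all variables escape" cost of the technique.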

Moral
• Pros:
  • easy to implement as a source-to-source translation
  • can be done even before code generation
• Cons:
  • all captured variables escape
  • extra argument passing
  • on some architectures, additional arguments are passed in memory, so it’s inefficient
