460 likes | 581 Views
COMP 40: Machine Structure and Assembly Language Programming – Spring 2014. Machine/Assembler Language Control Flow & Compiling Function Calls. Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah. Goals for today – learn:.
E N D
COMP 40: Machine Structure and Assembly Language Programming – Spring 2014 Machine/Assembler LanguageControl Flow & Compiling Function Calls Noah Mendelsohn Tufts UniversityEmail: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah
Goals for today – learn: • Review last week’s intro to machine & assembler lang • More examples • Function calls • Control flow
General Purpose Registers 63 0 %rax %eax %ax %ah $al
General Purpose Registers 63 0 %rax %eax %ax %ah $al mov $123,%rax
General Purpose Registers 63 0 %rax %eax %ax %ah $al mov $123,%ax
General Purpose Registers 63 0 %rax %eax %ax %ah $al mov $123,%eax
X86-64 /AMD 64 / IA 64 General Purpose Registers 0 63 31 0 63 %eax %ah %ch %dh %bh $bl $al $dl $cl %dx %bx %ax %cx %rax %r8 %rcx %ecx %r9 %rdx %r10 %edx %rbx %r11 %ebx %si %rsi %r12 %esi %di %rdi %r13 %edi %sp %rsp %r14 %esp %bp %rbp %r15 %ebp
Classes of AMD 64 registers new since last lecture • General purpose registers • 16 registers, 64 bits each • Used to compute integer and pointer values • Used for integer call/return values to functions • XMM registers • 16 Registers, 128 bits each • Used to compute float/double values, and for parallel integer computation • Used to pass double/float call/return values • X87 Floating Point registers • 8 registers, 80 bits each • Used to compute, pass/return long double
Machine code (typical) • Simple instructions – each does small unit of work • Stored in memory • Bitpacked into compact binary form • Directly executed by transistor/hardware logic* * We’ll show later that some machines execute user-provided machine code directly, some convert it to an even lower level machine code and execute that.
Here’s the machine code for our function int times16(inti) { return i * 16; } Remember: This is what’s really in memory and what the machine executes! 89 f8 c1 e0 04 c3
Here’s the machine code for our function int times16(inti) { return i * 16; } But what does it mean?? 89 f8 c1 e0 04 c3 Does it really implement the times16 function?
Consider a simple function in C int times16(inti) { return i * 16; } 0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
Consider a simple function in C Load i into result register %eax int times16(inti) { return i * 16; } 0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
Consider a simple function in C Shifting left by 4 is quick way to multiply by 16. int times16(inti) { return i * 16; } 0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq
Consider a simple function in C Return to caller, which will look for result in %eax int times16(inti) { return i * 16; } 0: 89 f8 mov %edi,%eax 2: c1 e0 04 shl $0x4,%eax 5: c3 retq REMEMBER: you can see the assembler code for any C program by running gcc with the –S flag. Do it!!
INTERPRETER Software or hardware that does what the instructions say COMPILER Software that converts a program to another language ASSEMBLER Like a compiler, but the input assembler language is (mostly)1-to-1 with machine instructions
Very simplified view of computer Memory Cache
Very simplified view of computer Memory 89 f8 c1 e0 04 c3 Cache
Instructions fetched and decoded Memory 89 f8 c1 e0 04 c3 Cache
Instructions fetched and decoded Memory 89 f8 c1 e0 04 c3 Cache ALUArithmetic and Logic Unit executes instructions like add and shift updating registers.
Remember We will teach some highlights of machine/assembler code here in class but you musttake significant time to learn and practice on your own! Suggestions for teaching yourself: • Read Bryant and O’Hallaroncarefully • Remember that 64 bit section was added later • Look for online reference material • Some linked from HW5 • Write examples, compile with –S, figure out resulting .s file!
Examples %rax // contents of rax is data (%rax) // data pointed to by rax 0x10(%rax) // get *(16 + rax) $0x4089a0(, %rax, 8) // Global array index // of 8-byte things (%ebx, %ecx, 2) // Add base and scaled index 4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4
Examples %rax // contents of rax is data (%rax) // data pointed to by rax 0x10(%rax) // get *(16 + rax) $0x4089a0(, %rax, 8) // Global array index // of 8-byte things (%ebx, %ecx, 2) // Add base and scaled index 4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4 movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1)) leal(%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))
movl Moves the data at the address:similar to: char *ebxp; char edx = ebxp[ecx] Examples %rax // contents of rax is data (%rax) // data pointed to by rax 0x10(%rax) // get *(16 + rax) $0x4089a0(, %rax, 8) // Global array index // of 8-byte things (%ebx, %ecx, 2) // Add base and scaled index 4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4 movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1)) leal(%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))
leal Moves the address itself:similar to: char *ebxp; char *edxp = &(ebxp[ecx]); Examples %rax // contents of rax is data (%rax) // data pointed to by rax 0x10(%rax) // get *(16 + rax) $0x4089a0(, %rax, 8) // Global array index // of 8-byte things (%ebx, %ecx, 2) // Add base and scaled index 4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4 movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1)) leal(%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1))
scale factors support indexing larger types similar to: int *ebxp; • intedxp = ebxp[ecx]; Examples %rax // contents of rax is data (%rax) // data pointed to by rax 0x10(%rax) // get *(16 + rax) $0x4089a0(, %rax, 8) // Global array index // of 8-byte things (%ebx, %ecx, 2) // Add base and scaled index 4(%ebx, %ecx, 2) // Add base and scaled // index plus offset 4 movl (%ebx, %ecx, 1), %edx // edx <- *(ebx + (ecx * 1)) leal (%ebx, %ecx, 1), %edx // edx <- (ebx + (ecx * 1)) movl (%ebx, %ecx, 4), %edx // edx <- *(ebx + (ecx * 4)) leal (%ebx, %ecx, 4), %edx // edx <- (ebx + (ecx * 4))
Simple jumps .L4: …code here… j .L4 // jump back to L4
Conditional jumps .L4: movq (%rdi,%rdx), %rcx leaq (%rax,%rcx), %rsi testq %rcx, %rcx cmovg %rsi, %rax addq $8, %rdx cmpq %r8, %rdx jne .L4 // conditional: jump iff %r8 != %rdf …code here… This technique is the key to compiling if statements, for loops, while loops, etc.
Why have a standard “linkage” for calling functions? • Functions are compiled separately and linked together • We need to standardize enough that function calls will succeed! • Note: optimizing compilers may “cheat” when caller and callee are in the same source file • More on this later
An interesting example int fact(int n) { if (n == 0) return 1; else return n * fact(n - 1); } See course notes on “Calls and Returns”
The process memory illusion • Process thinks it's running in a private space • Separated into segments, from address 0 • Stack: memory for executing subroutines • Heap: memory for malloc/new • Global static variables • Text segment: where program lives Now we’ll learn how your program usesthese!
The process memory illusion argv, environ Stack • Process thinks it's running in a private space • Separated into segments, from address 0 • Stack: memory for executing subroutines • Heap: memory for malloc/new • Global static variables • Text segment: where program lives Heap (malloc’d) Static uninitialized Static initialized Loaded with your program Text(code) 0
The process memory illusion argv, environ Stack • Process thinks it's running in a private space • Separated into segments, from address 0 • Stack: memory for executing subroutines • Heap: memory for malloc/new • Global static variables • Text segment: where program lives We’re about to study in depth how function calls use the stack. Heap (malloc’d) Static uninitialized Static initialized Loaded with your program Text(code) 0
Function calls on Linux/AMD 64 • Caller “pushes” return address on stack • Where practical, arguments passed in registers • Exceptions: • Structs, etc. • Too many • What can’t be passed in registers is at known offsets from stack pointer! • Return values • In register, typically %rax for integers and pointers • Exception: structures • Each function gets a stack frame • Leaf functions that make no calls may skip setting one up
The stack – general case Before call After callq If callee needs frame Arguments Arguments Arguments %rsp Return address ???? ???? Return address %rsp Callee vars framesize Args to next call? %rsp sub $0x{framesize},%rsp
Arguments/return values in registers Arguments and return values passed in registers when types are suitable and when there aren’t too many Return values usually in %rax, %eax, etc.Callee may change these and some other registers! MMX and FP 87 registers used for floating pointRead the specifications for full details!
int fact(int n) { if (n == 0) return 1; else return n * fact(n - 1); } Factorial Revisited fact: .LFB2: pushq %rbx .LCFI0: movq %rdi, %rbx movl $1, %eax testq %rdi, %rdi je .L4 leaq -1(%rdi), %rdi call fact imulq %rbx, %rax .L4: popq %rbx ret
Function calls on Linux/AMD 64 (cont.) • Much of what you’ve seen can be skipped for “leaf” functions in the call tree • Inlining: • If small procedure is in same source file (or included header): just merge the code • Unless function is static (I.e. private to source file, the compiler still needs to create the normal version, in case anyone outside calls it! • Optimizing compilers “cheat” • Don’t build full stack • Leave return address for called function (last thing A calls B; last think B does is call C B leaves return address on stack and branches to C instead of calling it…when C does normal return, it goes straigt back to A!) • Many other wild optimizations, always done so other functions can’t tell anything unusual is happening!
int fact(int n) { if (n == 0) return 1; else return n * fact(n - 1); } What happened to the recursion?!?!? Optimized version fact: .LFB2: pushq %rbx .LCFI0: movq %rdi, %rbx movl $1, %eax testq %rdi, %rdi je .L4 leaq -1(%rdi), %rdi call fact imulq %rbx, %rax .L4: popq %rbx ret fact: .LFB2: testq %rdi, %rdi movl $1, %eax je .L6 .p2align 4,,7 .L5: imulq %rdi, %rax subq $1, %rdi jne .L5 .L6: rep ; ret This version doesn’t need to create a stack frame either! Lightly optimized = O1(what we saw before) Heavily optimized = O2
Getting the details on function call “linkages” • Bryant and O’Halloran has excellent introduction • Watch for differences between 32 and 64 bit • The official specification: • System V Application Binary Interface: AMD64 Architecture Processor Supplement • Find it at: http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf • See especially sections 3.1 and 3.2
Summary • C code compiled to assembler • Data moved to registers for manipulation • Conditional and jump instructions for control flow • Stack used for function calls • Compilers play all sorts of tricks when compiling code