560 likes | 573 Views
This lecture explores data and program representations in the x86-64 or x64 architecture, focusing on data representation, encoding, and machine-level operations. It also discusses the evolution of the Intel x86 architecture and the assembly language used to program it.
E N D
Machine-Level Representations Prior lectures • Data representation This lecture • Program representation • Encoding is architecture dependent • We will focus on the Intel x86-64 or x64 architecture • Prior edition used IA32
Intel x86 Evolutionary design starting in 1978 with 8086 • i386 in 1986: First 32-bit Intel CPU (IA32) • Virtual memory fully supported • Pentium4E in 2004: First 64-bit Intel CPU (x86-64) • Adopted from AMD Opteron (2003) • Core 2 in 2006: First multi-core Intel CPU • New features and instructions added over time • Vector operations for multimedia • Memory protection for security • Conditional data movement instructions for performance • Expanded address space for scaling • But, many obsolete features Complex Instruction Set Computer (CISC) • Many different instructions with many different formats • But we’ll only look at a small subset
2015 • Core i7 Broadwell
How do you program it? Initially, no compilers or assemblers Machine code generated by hand! • Error-prone • Time-consuming • Hard to read and write • Hard to debug
Assemblers Assign mnemonics to machine code • Assembly language for specifying machine instructions • Names for the machine instructions and registers • movq %rax, %rcx • There is no standard for x86 assemblers • Intel assembly language • AT&T Unix assembler • Microsoft assembler • GNU uses Unix style with its assembler gas Even with the advent of compilers, assembly still used • Early compilers made big, slow code • Operating Systems were written mostly in assembly, into the 1980s • Accessing new hardware features before compiler has a chance to incorporate them
Then, via C void sumstore(long x, long y, long *D) { long t = plus(x, y); *D = t; } sumstore: pushq%rbx movq%rdx, %rbx call plus movq%rax, (%rbx) popq%rbx ret
Assembly Programmer’s View CPU Memory Visible State to Assembly Program • RIP • Instruction Pointer or Program Counter • Address of next instruction • Register File • Heavily used program data • Condition Codes • Store status information about most recent arithmetic or logical operation • Used for conditional branching Addresses RIP (PC) Registers Object Code Program Data OS Data Data Condition Codes Instructions Stack Memory • Byte addressable array • Code, user data, OS data • Includes stack used to support procedures
64-bit memory map • 48-bit canonical addresses to make page-tables smaller • Kernel addresses have high-bit set 0x7ffe96110000 user stack (created at runtime) %esp (stack pointer) 0xffffffffffffffff memory mapped region for shared libraries reserved for kernel (code, data, heap, stack) 0x7f81bb0b5000 0xffff800000000000 brk run-time heap (managed by malloc) memory invisible to user code read/write segment (.data, .bss) loaded from the executable file read-only segment (.init, .text, .rodata) 0x00400000 unused 0 cat /proc/self/maps
Registers Special memory not part of main memory • Located on CPU • Used to store temporary values • Typically, data is loaded into registers, manipulated or used, and then written back to memory
x86-64 Integer Registers %rax %r8 %eax %r8d %rbx %r9 %ebx %r9d %rcx %r10 %ecx %r10d %rdx %r11 %edx %r11d %rsi %r12 %esi %r12d %rdi %r13 %edi %r13d %rsp %r14 %esp %r14d %rbp %r15 %ebp %r15d Format different since registers added with x86-64
64-bit registers Multiple access sizes %rax, %rbx, %rcx, %rdx %ah, %al : low order bytes (8 bits) %ax : low word (16 bits) %eax : low “double word” (32 bits) %rax : quad word (64 bits) 31 15 7 0 63 %ax %rax %eax %ah %al Similar access for %rdi, %rsi, %rbp, %rsp
64-bit registers Multiple access sizes %r8, %r9, … , %r15 %r8b : low order byte (8 bits) %r8w : low word (16 bits) %r8d : low “double word” (32 bits) %r8 : quad word (64 bits) 31 15 7 0 63 %r8w %r8 %r8d %r8b
Register evolution The x86 architecture initially “register poor” • Few general purpose registers (8 in IA32) • Initially, driven by the fact that transistors were expensive • Then, driven by the need for backwards compatibility for certain instructions pusha (push all) and popa (pop all) from 80186 • Other reasons • Makes context-switching amongst processes easy (less register-state to store) • Fast caches easier to add to than more registers (L1, L2, L3 etc.)
Instructions A typical instruction acts on 2 or more operands of a particular width • addq %rcx, %rdxadds the contents of rcxto rdx • “addq” stands for add “quad word” • Size of the operand denoted in instruction • Why “quad word” for 64-bit registers? • Baggage from 16-bit processors Now we have these crazy terms • 8 bits = byte = addb • 16 bits = word = addw • 32 bits = double or long word = addl • 64 bits = quad word = addq
Instruction operands %rax %rcx Example instruction movqSource, Dest Three operand types • Immediate • Constant integer data (C constant) • Preceded by $ (e.g., $0x400, $-533) • Encoded directly into instructions • Register: One of 16 integer registers • Example: %rax, %r13 • Note %rspreserved for special use • Memory: a memory address • Multiple modes • Simplest example: (%rax) %rdx %rbx %rsi %rdi %rsp %rbp %rN
Immediate mode Immediate has only one mode • Form: $Imm • Operand value: Imm • movq$0x8000,%rax • movq$array,%rax • int array[30]; /* array = global var. stored at 0x8000 */ 0x8000 Main memory %rax %rcx 0x8000 array %rdx
Register mode Register has only one mode • Form: Ea • Operand value: R[Ea] • movq %rcx,%rax Main memory %rax %rcx 0x0030 0x8000 %rdx
Memory modes Memory has multiple modes • Absolute • specify the address of the data • Indirect • use register to calculate address • Base + displacement • use register plus absolute address to calculate address • Indexed • Indexed • Add contents of an index register • Scaled index • Add contents of an index register scaled by a constant
Memory modes Memory mode: Absolute • Form: Imm • Operand value: M[Imm] • movq0x8000,%rax • movqarray,%rax • long array[30]; /* global variable at 0x8000 */ Main memory %rax %rcx 0x8000 array %rdx
Memory modes Memory mode: Indirect • Form: (Ea) • Operand value:M[R[Ea]] • Register Ea specifies the memory address • movq (%rcx),%rax Main memory %rax %rcx 0x8000 0x8000 %rdx
Memory modes Memory mode: Base + Displacement • Form: Imm(Eb) • Used to access structure members • Operand value: M[Imm+R[Eb]] • Register Eb specifies start of memory region • Imm specifies the offset/displacement • movq 16(%rcx),%rax Main memory 0x8018 %rax 0x8010 0x8008 %rcx 0x8000 0x8000 %rdx
Memory modes Memory mode: Scaled indexed • Most general format • Used for accessing structures and arrays in memory • Form: Imm(Eb,Ei,S) • Operand value: M[Imm+R[Eb]+S*R[Ei]] • Register Eb specifies start of memory region • Ei holds index • S is integer scale (1,2,4,8) • movq8(%rdx,%rcx,8),%rax Main memory 0x8028 0x8020 0x8018 %rax 0x8010 0x8008 %rcx 0x03 0x8000 %rdx 0x8000
Operand examples usingmovq Source Destination C Analog • Memory-memory transfers cannot be done with single instruction Reg movq$0x4,%rax temp = 0x4; Imm Mem movq$-147,(%rax) *p = -147; Reg movq %rax,%rdx temp2 = temp1; movq Reg Mem movq %rax,(%rdx) *p = temp; Mem Reg movq (%rax),%rdx temp = *p;
Addressing Mode walkthrough addl 12(%rbp),%ecx movb(%rax,%rcx),%dl subq %rdx,(%rcx,%rax,8) incw0xA(,%rcx,8) Also note: We do not put ‘$’ in front of constants unless they are used to indicate immediate mode. The following are incorrect Add the double word at address rbp+ 12 to ecx Load the byte at address rax+ rcxinto dl Subtract rdxfrom the quad wordat address rcx+(8*rax) Increment the word at address0xA+(8*rcx) addl$12(%rbp),%ecx subq %rdx,(%rcx,%rax,$8) incw$0xA(,%rcx,$8)
Carnegie Mellon Address computation walkthrough • 0xf000 + 0x8 0xf008 • 0xf000 + 0x100 0xf100 • 0xf000 + 4*0x100 0xf400 • 2*0xf000 + 0x80 0x1e080
Register Value Operand Value %rax %rax 0x100 %rcx 0x1 0x108 %rdx 0x3 $0x108 (%rax) Address Value 8(%rax) 0x100 0xFF 13(%rax, %rdx) 0x108 0xAB 260(%rcx, %rdx) 0x110 0x13 0xF8(, %rcx, 8) 0x118 0x11 (%rax, %rdx, 8) Practice Problem 3.1 0x100 0xAB 0x108 0xFF 0xAB 0x13 0xAB 0xFF 0x11
Example: swap() Memory Registers void swap(long *xp, long *yp) { long t0 = *xp; long t1 = *yp; *xp = t1; *yp = t0; } %rdi %rsi %rax %rdx Register Value %rdixp %rsiyp %raxt0 %rdxt1 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret
Understanding swap() Memory Registers Address 0x120 123 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 0x108 %rdx 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret
Understanding swap() Memory Registers Address 0x120 123 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret
Understanding swap() Memory Registers Address 0x120 123 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret
Understanding swap() Memory Registers Address 0x120 456 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret
Understanding swap() Memory Registers Address 0x120 456 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 123 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret
Practice Problem 3.5 A function has this prototype: long decode(long *xp, long *yp, long *zp); Here is the body of the code in assembly language: /* xp in %rdi, yp in %rsi, zp in %rdx */ 1 movq (%rdi), %r8 2 movq (%rsi), %rcx 3 movq (%rdx), %rax 4 movq %r8,(%rsi) 5 movq %rcx,(%rdx) 6 movq %rax,(%rdi) long decode(long *xp, long *yp, long *zp) { long x = *xp; /* Line 1 */ long y = *yp; /* Line 2 */ long z = *zp; /* Line 3 */ *yp = x; /* Line 6 */ *zp = y; /* Line 8 */ *xp = z; /* Line 7 */ return z; } Write C code for this function
Practice walkthrough Suppose an array in C is declared as a global variable: long array[34]; Write some assembly code that: • sets rsito the address of array • sets rbxto the constant 9 • loads array[9] into register rax. Use scaled index memory mode movq$array,%rsi movq$0x9,%rbx movq (%rsi,%rbx,8),%rax
Load address Load Effective Address (Quad) • leaqS, D D ← &S • Loads the address of S in D, not the contents • leaq (%rax),%rdx • Equivalent to movq %rax,%rdx • Destination must be a register • Used to compute addresses without a memory reference • e.g., translation of p = &x[i];
Load address • leaqS, D D ← &S • Commonly used by compiler to do simple arithmetic • If %rdx = x, • leaq 7(%rdx, %rdx, 4), %rdx 5x + 7 • Multiply and add all in one instruction • Example long m12(longx) { return x*12; } Converted to ASM by compiler: leaq(%rdi,%rdi,2), %rax # t <- x+x*2 salq$2, %rax #return t<<2
Practice Problem 3.6 walkthrough %rax= x, %rcx= y Expression Result in %rdx leaq 6(%rax), %rdx x+6 leaq (%rax, %rcx), %rdx x+y leaq (%rax, %rcx, 4), %rdx x+4y leaq 7(%rax, %rax, 8), %rdx 9x+7 leaq 0xA(, %rcx, 4), %rdx 4y+10 leaq 9(%rax, %rcx, 2), %rdx x+2y+9
Carnegie Mellon Two Operand Arithmetic Operations Accumulated operation • Second operand is both a source and destination • A bit like C operators ‘+=‘, ‘-=‘, etc. • Max shift is 64 bits, so k is either an immediate byte, or register (e.g. %clwhere %cl is byte 0 of register %rcx) • FormatComputation addqS, DD= D + S subqS, D D= D S imulqS, D D= D * S salqS, D D= D << S Also called shlq sarqS, D D= D >> S Arithmetic shift right (sign extend) shrqS, D D= D >> SLogical shift right (zero fill) xorqS, D D= D ^ S andqS, D D= D & S orqS, D D= D | S
Carnegie Mellon One Operand Arithmetic Operations • FormatComputation incqDD = D + 1 decqDD= D 1 negqDD= D notqDD= ~D • See book for more instructions
Address Value Register Value %rax 0x100 0x100 0xFF %rcx 0x1 0x108 0xAB %rdx 0x3 0x110 0x13 0x118 0x11 Practice Problem 3.8 Instruction Destination address Result addq %rcx, (%rax) 0x100 0x100 subq %rdx, 8(%rax) 0x108 0xA8 imulq$16, (%rax, %rdx, 8) 0x118 0x110 0x110 0x14 incq 16(%rax) decq %rcx %rcx0x0 subq %rdx, %rax %rax0xFD
Practice Problem 3.9 long shift_left4_rightn(long x, long n) { x <<= 4; x >>= n; return x; } _shift_left4_rightn: movq %rdi, %rax ; get x ; x <<= 4; • movq %rsi, %rcx ; get n ; x >>= n; ret salq $4, %rax sarq %cl, %rax
Carnegie Mellon Arithmetic Expression Example arith: leaq(%rdi,%rsi), %rax # t1 addq%rdx, %rax # t2 leaq(%rsi,%rsi,2), %rdx salq$4, %rdx # t4 leaq4(%rdi,%rdx), %rcx # t5 imulq%rcx, %rax # rval ret long arith (long x, long y, long z) { long t1 = x+y; long t2 = z+t1; long t3 = x+4; long t4 = y * 48; long t5 = t3 + t4; long rval = t2 * t5; return rval; } Compilertrick to generate efficient code
Practice Problem 3.10 What does this instruction do? How might it be different than this instruction? xorq %rdx, %rdx Zeros out register movq $0, %rdx 3-byte instruction versus 7-byte Null bytes encoded in instruction
Exam practice • Chapter 3 Problems (Part 1) • 3.1 x86 operands • 3.2,3.3 instruction operand sizes • 3.4 instruction construction • 3.5 disassemble to C • 3.6 leaq • 3.7 leaq disassembly • 3.8 operations in x86 • 3.9 fill in x86 from C • 3.10 fill in C from x86 • 3.11 xorq
Definitions Architecture or instruction set architecture (ISA) • Instruction specification, registers • Examples: x86 IA32, x86-64, ARM Microarchitecture • Implementation of the architecture • Examples: cache sizes and core frequency Machine code (or object code) • Byte-level programs that a processor executes Assembly code • A text representation of machine code
Disassembling Object Code Disassembled • Disassembler objdump–d sumstore • Useful tool for examining object code • Analyzes bit pattern of series of instructions • Produces approximate rendition of assembly code • Can be run on either a.out (complete executable) or .o file 0000000000400595 <sumstore>: 400595: 53 push %rbx 400596: 48 89 d3 mov%rdx,%rbx 400599: e8 f2 ffffffcallq400590 <plus> 40059e: 48 89 03 mov%rax,(%rbx) 4005a1: 5b pop %rbx 4005a2: c3 retq