1 / 56

x86 Data Access and Operations

This lecture explores data and program representations in the x86-64 or x64 architecture, focusing on data representation, encoding, and machine-level operations. It also discusses the evolution of the Intel x86 architecture and the assembly language used to program it.

arlened
Download Presentation

x86 Data Access and Operations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. x86 Data Access and Operations

  2. Machine-Level Representations Prior lectures • Data representation This lecture • Program representation • Encoding is architecture dependent • We will focus on the Intel x86-64 or x64 architecture • Prior edition used IA32

  3. Intel x86 Evolutionary design starting in 1978 with 8086 • i386 in 1986: First 32-bit Intel CPU (IA32) • Virtual memory fully supported • Pentium4E in 2004: First 64-bit Intel CPU (x86-64) • Adopted from AMD Opteron (2003) • Core 2 in 2006: First multi-core Intel CPU • New features and instructions added over time • Vector operations for multimedia • Memory protection for security • Conditional data movement instructions for performance • Expanded address space for scaling • But, many obsolete features Complex Instruction Set Computer (CISC)‏ • Many different instructions with many different formats • But we’ll only look at a small subset

  4. 2015 • Core i7 Broadwell

  5. How do you program it? Initially, no compilers or assemblers Machine code generated by hand! • Error-prone • Time-consuming • Hard to read and write • Hard to debug

  6. Assemblers Assign mnemonics to machine code • Assembly language for specifying machine instructions • Names for the machine instructions and registers • movq %rax, %rcx • There is no standard for x86 assemblers • Intel assembly language • AT&T Unix assembler • Microsoft assembler • GNU uses Unix style with its assembler gas Even with the advent of compilers, assembly still used • Early compilers made big, slow code • Operating Systems were written mostly in assembly, into the 1980s • Accessing new hardware features before compiler has a chance to incorporate them

  7. Then, via C void sumstore(long x, long y, long *D) { long t = plus(x, y); *D = t; } sumstore: pushq%rbx movq%rdx, %rbx call plus movq%rax, (%rbx) popq%rbx ret

  8. Assembly Programmer’s View CPU Memory Visible State to Assembly Program • RIP • Instruction Pointer or Program Counter • Address of next instruction • Register File • Heavily used program data • Condition Codes • Store status information about most recent arithmetic or logical operation • Used for conditional branching Addresses RIP (PC) Registers Object Code Program Data OS Data Data Condition Codes Instructions Stack Memory • Byte addressable array • Code, user data, OS data • Includes stack used to support procedures

  9. 64-bit memory map • 48-bit canonical addresses to make page-tables smaller • Kernel addresses have high-bit set 0x7ffe96110000 user stack (created at runtime)‏ %esp (stack pointer)‏ 0xffffffffffffffff memory mapped region for shared libraries reserved for kernel (code, data, heap, stack)‏ 0x7f81bb0b5000 0xffff800000000000 brk run-time heap (managed by malloc)‏ memory invisible to user code read/write segment (.data, .bss)‏ loaded from the executable file read-only segment (.init, .text, .rodata)‏ 0x00400000 unused 0 cat /proc/self/maps

  10. Registers Special memory not part of main memory • Located on CPU • Used to store temporary values • Typically, data is loaded into registers, manipulated or used, and then written back to memory

  11. x86-64 Integer Registers %rax %r8 %eax %r8d %rbx %r9 %ebx %r9d %rcx %r10 %ecx %r10d %rdx %r11 %edx %r11d %rsi %r12 %esi %r12d %rdi %r13 %edi %r13d %rsp %r14 %esp %r14d %rbp %r15 %ebp %r15d Format different since registers added with x86-64

  12. 64-bit registers Multiple access sizes %rax, %rbx, %rcx, %rdx %ah, %al : low order bytes (8 bits) %ax : low word (16 bits) %eax : low “double word” (32 bits) %rax : quad word (64 bits) 31 15 7 0 63 %ax %rax %eax %ah %al Similar access for %rdi, %rsi, %rbp, %rsp

  13. 64-bit registers Multiple access sizes %r8, %r9, … , %r15 %r8b : low order byte (8 bits) %r8w : low word (16 bits) %r8d : low “double word” (32 bits) %r8 : quad word (64 bits) 31 15 7 0 63 %r8w %r8 %r8d %r8b

  14. Register evolution The x86 architecture initially “register poor” • Few general purpose registers (8 in IA32) • Initially, driven by the fact that transistors were expensive • Then, driven by the need for backwards compatibility for certain instructions pusha (push all) and popa (pop all) from 80186 • Other reasons • Makes context-switching amongst processes easy (less register-state to store)‏ • Fast caches easier to add to than more registers (L1, L2, L3 etc.)

  15. Instructions A typical instruction acts on 2 or more operands of a particular width • addq %rcx, %rdxadds the contents of rcxto rdx • “addq” stands for add “quad word” • Size of the operand denoted in instruction • Why “quad word” for 64-bit registers? • Baggage from 16-bit processors Now we have these crazy terms • 8 bits = byte = addb • 16 bits = word = addw • 32 bits = double or long word = addl • 64 bits = quad word = addq

  16. C types and x86-64 instructions

  17. Instruction operands %rax %rcx Example instruction movqSource, Dest Three operand types • Immediate • Constant integer data (C constant) • Preceded by $ (e.g., $0x400, $-533) • Encoded directly into instructions • Register: One of 16 integer registers • Example: %rax, %r13 • Note %rspreserved for special use • Memory: a memory address • Multiple modes • Simplest example: (%rax) %rdx %rbx %rsi %rdi %rsp %rbp %rN

  18. Immediate mode Immediate has only one mode • Form: $Imm • Operand value: Imm • movq$0x8000,%rax • movq$array,%rax • int array[30]; /* array = global var. stored at 0x8000 */ 0x8000 Main memory %rax %rcx 0x8000 array %rdx

  19. Register mode Register has only one mode • Form: Ea • Operand value: R[Ea] • movq %rcx,%rax Main memory %rax %rcx 0x0030 0x8000 %rdx

  20. Memory modes Memory has multiple modes • Absolute • specify the address of the data • Indirect • use register to calculate address • Base + displacement • use register plus absolute address to calculate address • Indexed • Indexed • Add contents of an index register • Scaled index • Add contents of an index register scaled by a constant

  21. Memory modes Memory mode: Absolute • Form: Imm • Operand value: M[Imm] • movq0x8000,%rax • movqarray,%rax • long array[30]; /* global variable at 0x8000 */ Main memory %rax %rcx 0x8000 array %rdx

  22. Memory modes Memory mode: Indirect • Form: (Ea)‏ • Operand value:M[R[Ea]] • Register Ea specifies the memory address • movq (%rcx),%rax Main memory %rax %rcx 0x8000 0x8000 %rdx

  23. Memory modes Memory mode: Base + Displacement • Form: Imm(Eb)‏ • Used to access structure members • Operand value: M[Imm+R[Eb]] • Register Eb specifies start of memory region • Imm specifies the offset/displacement • movq 16(%rcx),%rax Main memory 0x8018 %rax 0x8010 0x8008 %rcx 0x8000 0x8000 %rdx

  24. Memory modes Memory mode: Scaled indexed • Most general format • Used for accessing structures and arrays in memory • Form: Imm(Eb,Ei,S)‏ • Operand value: M[Imm+R[Eb]+S*R[Ei]] • Register Eb specifies start of memory region • Ei holds index • S is integer scale (1,2,4,8)‏ • movq8(%rdx,%rcx,8),%rax Main memory 0x8028 0x8020 0x8018 %rax 0x8010 0x8008 %rcx 0x03 0x8000 %rdx 0x8000

  25. Operand examples usingmovq Source Destination C Analog • Memory-memory transfers cannot be done with single instruction Reg movq$0x4,%rax temp = 0x4; Imm Mem movq$-147,(%rax)‏ *p = -147; Reg movq %rax,%rdx temp2 = temp1; movq Reg Mem movq %rax,(%rdx)‏ *p = temp; Mem Reg movq (%rax),%rdx temp = *p;

  26. Addressing Mode walkthrough addl 12(%rbp),%ecx movb(%rax,%rcx),%dl subq %rdx,(%rcx,%rax,8) incw0xA(,%rcx,8) Also note: We do not put ‘$’ in front of constants unless they are used to indicate immediate mode. The following are incorrect Add the double word at address rbp+ 12 to ecx Load the byte at address rax+ rcxinto dl Subtract rdxfrom the quad wordat address rcx+(8*rax)‏ Increment the word at address0xA+(8*rcx)‏ addl$12(%rbp),%ecx subq %rdx,(%rcx,%rax,$8) incw$0xA(,%rcx,$8)

  27. Carnegie Mellon Address computation walkthrough • 0xf000 + 0x8 0xf008 • 0xf000 + 0x100 0xf100 • 0xf000 + 4*0x100 0xf400 • 2*0xf000 + 0x80 0x1e080

  28. Register Value Operand Value %rax %rax 0x100 %rcx 0x1 0x108 %rdx 0x3 $0x108 (%rax)‏ Address Value 8(%rax)‏ 0x100 0xFF 13(%rax, %rdx)‏ 0x108 0xAB 260(%rcx, %rdx)‏ 0x110 0x13 0xF8(, %rcx, 8)‏ 0x118 0x11 (%rax, %rdx, 8)‏ Practice Problem 3.1 0x100 0xAB 0x108 0xFF 0xAB 0x13 0xAB 0xFF 0x11

  29. Example: swap() Memory Registers void swap(long *xp, long *yp) { long t0 = *xp; long t1 = *yp; *xp = t1; *yp = t0; } %rdi %rsi %rax %rdx Register Value %rdixp %rsiyp %raxt0 %rdxt1 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret

  30. Understanding swap() Memory Registers Address 0x120 123 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 0x108 %rdx 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret

  31. Understanding swap() Memory Registers Address 0x120 123 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret

  32. Understanding swap() Memory Registers Address 0x120 123 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret

  33. Understanding swap() Memory Registers Address 0x120 456 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 456 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret

  34. Understanding swap() Memory Registers Address 0x120 456 %rdi 0x120 0x118 %rsi 0x100 0x110 %rax 123 0x108 %rdx 456 123 0x100 swap: movq (%rdi), %rax # t0 = *xp movq (%rsi), %rdx # t1 = *yp movq %rdx, (%rdi) # *xp = t1 movq %rax, (%rsi) # *yp = t0 ret

  35. Practice Problem 3.5 A function has this prototype: long decode(long *xp, long *yp, long *zp); Here is the body of the code in assembly language: /* xp in %rdi, yp in %rsi, zp in %rdx */ 1 movq (%rdi), %r8 2 movq (%rsi), %rcx 3 movq (%rdx), %rax 4 movq %r8,(%rsi)‏ 5 movq %rcx,(%rdx)‏ 6 movq %rax,(%rdi)‏ long decode(long *xp, long *yp, long *zp) { long x = *xp; /* Line 1 */ long y = *yp; /* Line 2 */ long z = *zp; /* Line 3 */ *yp = x; /* Line 6 */ *zp = y; /* Line 8 */ *xp = z; /* Line 7 */ return z; } Write C code for this function

  36. Practice walkthrough Suppose an array in C is declared as a global variable: long array[34]; Write some assembly code that: • sets rsito the address of array • sets rbxto the constant 9 • loads array[9] into register rax. Use scaled index memory mode movq$array,%rsi movq$0x9,%rbx movq (%rsi,%rbx,8),%rax

  37. Arithmetic and Logical Operations

  38. Load address Load Effective Address (Quad)‏ • leaqS, D D ← &S • Loads the address of S in D, not the contents • leaq (%rax),%rdx • Equivalent to movq %rax,%rdx • Destination must be a register • Used to compute addresses without a memory reference • e.g., translation of p = &x[i];

  39. Load address • leaqS, D D ← &S • Commonly used by compiler to do simple arithmetic • If %rdx = x, • leaq 7(%rdx, %rdx, 4), %rdx 5x + 7 • Multiply and add all in one instruction • Example long m12(longx) { return x*12; } Converted to ASM by compiler: leaq(%rdi,%rdi,2), %rax # t <- x+x*2 salq$2, %rax #return t<<2

  40. Practice Problem 3.6 walkthrough %rax= x, %rcx= y Expression Result in %rdx leaq 6(%rax), %rdx x+6 leaq (%rax, %rcx), %rdx x+y leaq (%rax, %rcx, 4), %rdx x+4y leaq 7(%rax, %rax, 8), %rdx 9x+7 leaq 0xA(, %rcx, 4), %rdx 4y+10 leaq 9(%rax, %rcx, 2), %rdx x+2y+9

  41. Carnegie Mellon Two Operand Arithmetic Operations Accumulated operation • Second operand is both a source and destination • A bit like C operators ‘+=‘, ‘-=‘, etc. • Max shift is 64 bits, so k is either an immediate byte, or register (e.g. %clwhere %cl is byte 0 of register %rcx) • FormatComputation addqS, DD= D + S subqS, D D= D  S imulqS, D D= D * S salqS, D D= D << S Also called shlq sarqS, D D= D >> S Arithmetic shift right (sign extend) shrqS, D D= D >> SLogical shift right (zero fill) xorqS, D D= D ^ S andqS, D D= D & S orqS, D D= D | S

  42. Carnegie Mellon One Operand Arithmetic Operations • FormatComputation incqDD = D + 1 decqDD= D 1 negqDD=  D notqDD= ~D • See book for more instructions

  43. Address Value Register Value %rax 0x100 0x100 0xFF %rcx 0x1 0x108 0xAB %rdx 0x3 0x110 0x13 0x118 0x11 Practice Problem 3.8 Instruction Destination address Result addq %rcx, (%rax)‏ 0x100 0x100 subq %rdx, 8(%rax)‏ 0x108 0xA8 imulq$16, (%rax, %rdx, 8)‏ 0x118 0x110 0x110 0x14 incq 16(%rax)‏ decq %rcx %rcx0x0 subq %rdx, %rax %rax0xFD

  44. Practice Problem 3.9 long shift_left4_rightn(long x, long n)‏ { x <<= 4; x >>= n; return x; } _shift_left4_rightn: movq %rdi, %rax ; get x ; x <<= 4; • movq %rsi, %rcx ; get n ; x >>= n; ret salq $4, %rax sarq %cl, %rax

  45. Carnegie Mellon Arithmetic Expression Example arith: leaq(%rdi,%rsi), %rax # t1 addq%rdx, %rax # t2 leaq(%rsi,%rsi,2), %rdx salq$4, %rdx # t4 leaq4(%rdi,%rdx), %rcx # t5 imulq%rcx, %rax # rval ret long arith (long x, long y, long z) { long t1 = x+y; long t2 = z+t1; long t3 = x+4; long t4 = y * 48; long t5 = t3 + t4; long rval = t2 * t5; return rval; } Compilertrick to generate efficient code

  46. Practice Problem 3.10 What does this instruction do? How might it be different than this instruction? xorq %rdx, %rdx Zeros out register movq $0, %rdx 3-byte instruction versus 7-byte Null bytes encoded in instruction

  47. Extra slides

  48. Exam practice • Chapter 3 Problems (Part 1) • 3.1 x86 operands • 3.2,3.3 instruction operand sizes • 3.4 instruction construction • 3.5 disassemble to C • 3.6 leaq • 3.7 leaq disassembly • 3.8 operations in x86 • 3.9 fill in x86 from C • 3.10 fill in C from x86 • 3.11 xorq

  49. Definitions Architecture or instruction set architecture (ISA) • Instruction specification, registers • Examples: x86 IA32, x86-64, ARM Microarchitecture • Implementation of the architecture • Examples: cache sizes and core frequency Machine code (or object code) • Byte-level programs that a processor executes Assembly code • A text representation of machine code

  50. Disassembling Object Code Disassembled • Disassembler objdump–d sumstore • Useful tool for examining object code • Analyzes bit pattern of series of instructions • Produces approximate rendition of assembly code • Can be run on either a.out (complete executable) or .o file 0000000000400595 <sumstore>: 400595: 53 push %rbx 400596: 48 89 d3 mov%rdx,%rbx 400599: e8 f2 ffffffcallq400590 <plus> 40059e: 48 89 03 mov%rax,(%rbx) 4005a1: 5b pop %rbx 4005a2: c3 retq

More Related