Machine-Level Programming II: Basics
Comp 21000: Introduction to Computer Organization & Systems
Instructor: John Barr
* Modified slides from the book “Computer Systems: a Programmer’s Perspective”, Randy Bryant & David O’Hallaron, 2015
Machine Programming I: Basics • History of Intel processors and architectures • C, assembly, machine code • Assembly Basics: Registers, operands, move • Intro to x86-64
IA32/x86-64 Properties • An instruction can reference different operand types • Immediate, register, memory • Arithmetic operations can read/write memory • A memory reference can involve a complex address computation • Rb + S*Ri + D • Useful for arithmetic expressions, too • Instructions can have varying lengths • IA32 instructions can range from 1 to 15 bytes
Features of IA32 instructions • x86-64 instructions can be from 1 to 15 bytes • More commonly used instructions are shorter • Instructions with fewer operands are shorter • Each instruction has an instruction format • Each instruction has a unique byte representation • e.g., instruction pushl %ebp has encoding 55 (hex) • The x86 family started as a 16-bit architecture (the 8086) • So IA32 calls a 16-bit piece of data a “word” • A 32-bit piece of data is a “double word” or a “long word” • A 64-bit piece of data is a “quad word”
Assembly Programmer’s View (review) • Programmer-Visible State • PC: Program counter • Address of next instruction • Called “EIP” (IA32) or “RIP” (x86-64) • Register file • Heavily used program data • 16 named locations, 64-bit values (x86-64) • Condition codes • Store status information about the most recent arithmetic operation • Used for conditional branching • Memory • Byte-addressable array • Code, user data, (some) OS data • Includes stack used to support procedures
[Figure: the CPU (PC, registers, condition codes) sends addresses to memory and exchanges data and instructions with it; memory holds object code, program data, OS data, and the stack]
Instruction format • Assembly language instructions have a very rigid format • For most instructions the format is: movl Source, Dest • mov: instruction name • l: instruction suffix • Source: source of data for the instruction (immediate, register, or memory) • Dest: destination of instruction results (register or memory) • Remember that we use AT&T assembly format (source first, destination second)
Data Representations: IA32 + x86-64 • Sizes of C objects (in bytes)

C Data Type     Generic 32-bit   Intel IA32   x86-64
unsigned        4                4            4
int             4                4            4
long int        4                4            8
char            1                1            1
short           2                2            2
float           4                4            4
double          8                8            8
long double     8                10/12        16
char *          4                4            8
(or any other pointer)
Instruction suffix • Integer operations in GAS take a single-character suffix • Denotes the size of the operand • Example: basic instruction is mov • Can move byte (movb), word (movw), double word (movl), and quad word (movq) • Note that floating-point operations use entirely different instructions
Registers • 16 64-bit general purpose registers • Programmers/compilers can use these • All registers begin with %r • Rest of name is historical: from the 8086 (%r8 to %r15 were added with x86-64) • Registers originally had specific purposes • Few restrictions on the use of registers in instructions • However, some instructions use fixed registers as source/destination • In procedures, different registers follow different conventions for saving/restoring (caller-saved vs. callee-saved) • Two registers have special purposes in procedures • %rbp (frame pointer) • %rsp (stack pointer) • Will discuss all these later
Registers • 16 64-bit general purpose registers • The low-order byte can be independently read or written by byte operation instructions • Done for backward compatibility with the 8008 and 8080 (1970s!) • When a byte of the register is changed, the rest of the register is unaffected • The low-order 2 bytes (16 bits, i.e., a single word) can be independently read or written by word operation instructions • Comes from 8086 16-bit heritage • When a word of the register is changed, the rest of the register is unaffected • The low-order 4 bytes can be read or written by double-word (32-bit) instructions, but on x86-64 a 32-bit write sets the upper 4 bytes to 0 • See next slide!
x86-64 Integer Registers • Can reference low-order 4 bytes (also low-order 1 & 2 bytes)

64-bit   low 4 bytes    64-bit   low 4 bytes
%rax     %eax           %r8      %r8d
%rbx     %ebx           %r9      %r9d
%rcx     %ecx           %r10     %r10d
%rdx     %edx           %r11     %r11d
%rsi     %esi           %r12     %r12d
%rdi     %edi           %r13     %r13d
%rsp     %esp           %r14     %r14d
%rbp     %ebp           %r15     %r15d
History: IA32 Registers

32-bit   16-bit   8-bit       Origin (mostly obsolete)
%eax     %ax      %ah, %al    accumulate
%ecx     %cx      %ch, %cl    counter
%edx     %dx      %dh, %dl    data
%ebx     %bx      %bh, %bl    base
%esi     %si                  source index
%edi     %di                  destination index
%esp     %sp                  stack pointer
%ebp     %bp                  base pointer

• The first six (%eax through %edi) are general purpose • 32-bit registers (%eax, %ecx, …) • 16-bit virtual registers (%ax, %cx, %dx, …) kept for backwards compatibility • 8-bit registers (%ah, %al, %ch, …)
Moving Data • movq Source, Dest • Move 8-byte (“quad”) word • Lots of these in typical code • Operand Types • Immediate: Constant integer data • Example: $0x400, $-533 • Like C constant, but prefixed with ‘$’ • Encoded with 1, 2, or 4 bytes • Register: One of 16 integer registers • Example: %rax, %r13 • But %rsp reserved for special use • Others have special uses for particular instructions • Memory: 8 consecutive bytes of memory at address given by register • Simplest example: (%rax) • Various other “address modes”
movq Operand Combinations

Source   Dest   Src,Dest             C Analog
Imm      Reg    movq $0x4,%rax       temp = 0x4;
Imm      Mem    movq $-147,(%rax)    *p = -147;
Reg      Reg    movq %rax,%rdx       temp2 = temp1;
Reg      Mem    movq %rax,(%rdx)     *p = temp;
Mem      Reg    movq (%rax),%rdx     temp = *p;

• Cannot do memory-memory transfer with a single instruction
Simple Memory Addressing Modes • Pretend that RAM is a big array named “Mem” • Normal (R): Mem[Reg[R]] • Register R specifies memory address • Aha! Pointer dereferencing in C • movq (%rcx),%rax • Displacement D(R): Mem[Reg[R]+D] • Register R specifies start of memory region • Constant displacement D specifies offset • movq 8(%rbp),%rdx
Simple Addressing Modes (cont) • Immediate $Imm: Imm • The value Imm is the value that is used • movq $4096,%rax • Absolute Imm: Mem[Imm] • No dollar sign before the number • The number is the memory address to use • movq 4096,%rdx • The book has more details on addressing modes!
mov instructions Notes: 1. byte movements with a register operand must use a single-byte register (e.g., %al, %dh, %r8b) 2. word movements with a register operand must use a 2-byte register (e.g., %ax, %r8w) 3. movsbl takes a single-byte source, sign-extends it to 32 bits by filling the high-order 24 bits with copies of the source's sign bit, and copies the resulting double word to Dest 4. movzbl takes a single-byte source, zero-extends it to 32 bits by filling the high-order 24 bits with 0s, and copies the resulting double word to Dest
mov instruction example • Assume that %dh = 0x8D and %eax = 0x98765432 at the beginning of each of these instructions

instruction          result
movb %dh, %al        %eax = 0x9876548D
movsbl %dh, %eax     %eax = 0xFFFFFF8D
movzbl %dh, %eax     %eax = 0x0000008D
mov instruction example

instruction              source   dest
movq $0x4050, %rax       Imm      Reg
movq %rbp, %rsp          Reg      Reg
movq (%rcx), %rax        Mem      Reg
movq $-17, (%rsp)        Imm      Mem
movq %rax, -12(%rbp)     Reg      Mem (Displacement)