Instruction Set Architecture (ISA)

Outline
• Introduction
• Classifying Instruction Set Architectures
• Addressing Modes
• Instructions for Control Flow
• Instruction Format
• The Role of Compilers
• The MIPS Architecture
• Conclusion

Introduction
• An instruction set architecture (ISA) is a specification of a standardized, programmer-visible interface to the hardware. It consists of:
    • A set of instructions, with their associated argument fields, assembly syntax, and machine encoding
    • A set of named storage locations (registers, memory)
    • A set of addressing modes (ways to name those locations)

Classifying Architectures
• Architectures are classified by where the processor holds its operands (the type of internal operand storage).
• Stack architecture
    • Operands are implicitly on the top of a stack.
• Accumulator architecture
    • One operand is implicitly the accumulator.
    • Essentially a one-register machine.
• General-purpose register (GPR) architecture
    • Register-memory architectures: one operand can be in memory.
    • Load-store architectures: all ALU operands are registers; only loads and stores access memory.

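Examples: C = A + B on each class of architecture
Figure: operand locations for (a) stack (top of stack, TOS), (b) accumulator, (c) register-memory, and (d) register-register/load-store machines, each showing the processor's operand storage, the ALU, and memory.
(a) Stack: Push A; Push B; Add; Pop C
(b) Accumulator: Load A; Add B; Store C
(c) Register-memory: Load R1,A; Add R1,B; Store R1,C
(d) Load-store: Load R1,A; Load R2,B; Add R3,R1,R2; Store R3,C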

ISA Examples
Machine          Year   Number of GPRs   Architecture
EDSAC            1949    1               accumulator
IBM 701          1953    1               accumulator
CDC 6600         1963    8               load-store
IBM 360          1964   16               register-memory
DEC PDP-11       1970    8               register-memory
DEC VAX          1977   16               register-memory, memory-memory
Motorola 68000   1980   16               register-memory
MIPS             1985   32               load-store
SPARC            1987   32               load-store

Examples of GPR Machines
Number of memory addresses   Maximum number of operands allowed   Examples
0                            3                                    SPARC, MIPS, PowerPC, Alpha
1                            2                                    Intel 80x86, Motorola 68000
2                            2                                    VAX
3                            3                                    VAX

Pros and cons for each ISA type

General Purpose Register ISAs
• Load/Store (0,3)
    • ALU instruction: 0 memory operands, 3 (register) operands
    • Special load/store instructions for accessing memory
    • Easy to pipeline, but higher instruction count (RISC)
• Register-Memory (1,2)
    • ALU instruction: at most 1 memory operand, 2 operands
    • Harder to pipeline, but more compact programs (x86)
• Memory-Memory (2,2) or (3,3)
    • ALU instruction: 2 or 3 memory operands, 2 or 3 operands
    • Hardest to pipeline, most compact programs
    • Not used today (DEC VAX)

Operation Types in the Instruction Set
Operator type            Examples
Arithmetic and logical   Integer arithmetic and logical operations: add, or
Data transfer            Loads/stores (moves on machines with memory addressing)
Control                  Branch, jump, procedure call and return, traps
System                   Operating system call, virtual memory management instructions
Floating point           Floating-point operations: add, multiply
Decimal                  Decimal add, decimal multiply, decimal-to-character conversion
String                   String move, string compare, string search
Graphics                 Pixel operations, compression/decompression operations

Instruction Usage Example: Top 10 Intel x86 Instructions
Rank   Instruction               Integer average (percent of total executed)
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                        8%
6      and                        6%
7      sub                        5%
8      move register-register     4%
9      call                       1%
10     return                     1%
       Total                     96%
Observation (CISC to RISC): simple instructions dominate instruction usage frequency.

Memory Addressing
• How do we specify memory addresses?
• This issue is independent of the type of ISA
• We need to specify:
    • Operand sizes
    • Address alignment
    • Byte ordering for multi-byte operands
    • Addressing modes

Operand Sizes (1)
• Common operand types include (assuming a 64-bit CPU):
    • Character (1 byte)
    • Half word (16 bits)
    • Word (32 bits)
    • Double word (64 bits)
• IEEE standard 754: single-precision floating point (1 word), double-precision floating point (2 words)
• For business applications, some architectures support a decimal format (packed decimal, or binary-coded decimal, BCD)

Type and Size of Operands Distribution of data accesses by size for SPEC CPU2000 benchmark programs

Alignment (2)
• Processors often require data types to be aligned on addresses that are a multiple of their size:
    • address % sizeof(datatype) == 0
    • 4-byte integers are aligned on addresses divisible by 4
    • Bytes can be placed anywhere
• An ISA can require alignment of operands
    • Assume alignment is required unless otherwise specified
• MIPS: all memory operands must be aligned (special two-instruction sequences access unaligned data in memory)
• x86: no alignment is required, but unaligned accesses are slower

Byte Ordering (“Endianness”) (3)
• Layout of multi-byte operands in memory
• Little endian (x86)
    • Least significant byte (LSB) at the lowest address in memory
• Big endian (most other ISAs)
    • Most significant byte (MSB) at the lowest address in memory
    • Assume this ordering unless otherwise specified

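Endians & Alignment
Figure: bytes at increasing addresses 0 to 7, showing an aligned word and a non-aligned word.
• Little-endian byte order (least-significant byte “first”): byte 0 (LSB) is at the lowest address of the word, byte 3 (MSB) at the highest.
• Big-endian byte order (most-significant byte “first”): byte 3 (MSB) is at the lowest address of the word, byte 0 (LSB) at the highest.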

Another view of Endianness
• At word address 100 (assume a 4-byte word): long a = 0x11223344;
• Big-endian (MSB at the word address):
    address    100  101  102  103
    offset      +0   +1   +2   +3
    contents    11   22   33   44
• Little-endian (LSB at the word address):
    address    103  102  101  100
    offset      +3   +2   +1   +0
    contents    11   22   33   44

Addressing Modes (4)
• What is the location of an operand?
• Three basic possibilities:
    • Immediate: the operand is a constant
        • The constant is encoded in the instruction
    • Register: the operand is in a register
        • The register number is encoded in the instruction
    • Memory: the operand is in memory
        • Many addressing modes are possible

Immediate Addressing Mode
• The operand is a constant encoded in the instruction
• Can we have any value as an immediate?
    • x86: yes; the number of bytes used to encode the instruction changes to accommodate it
    • RISC: no; the instruction size is fixed (e.g. 32 bits)
        • Some bits specify the instruction opcode
        • The remaining bits encode the immediate value
        • This is OK: the most frequently needed constants need only a few bits
• MIPS also has a special two-instruction sequence to put a full 32-bit immediate into a register

Distribution of Immediate Values (SPEC CPU2000 on Alpha)

Memory Addressing Modes (A)
• Register indirect
    • Address is in a register
    • LD R1, (R2)
    • Use: access via a pointer or a computed address
• Direct (absolute)
    • Address is a constant
    • LD R1, (100)
    • Use: access to static data
    • Note: the constant is encoded in the instruction

Memory Addressing Modes (B)
• Displacement
    • Address is register + immediate
    • LD R1, 100(R2)
    • The size of the displacement field is a key design concern
    • Note: the displacement is encoded in the instruction

Displacement Distribution (SPEC CPU2000 on Alpha)
• Only 1% of addresses need a displacement larger than 16 bits
• Observation (CISC to RISC): 12 to 16 bits of displacement are needed

Typical Memory Addressing Modes
Addressing mode    Sample instruction     Meaning
Register           Add R4, R3             Regs[R4] ← Regs[R4] + Regs[R3]
Immediate          Add R4, #3             Regs[R4] ← Regs[R4] + 3
Displacement       Add R4, 10(R1)         Regs[R4] ← Regs[R4] + Mem[10 + Regs[R1]]
Indirect           Add R4, (R1)           Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Indexed            Add R3, (R1 + R2)      Regs[R3] ← Regs[R3] + Mem[Regs[R1] + Regs[R2]]
Absolute           Add R1, (1001)         Regs[R1] ← Regs[R1] + Mem[1001]
Memory indirect    Add R1, @(R3)          Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
Autoincrement      Add R1, (R2)+          Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d
Autodecrement      Add R1, -(R2)          Regs[R2] ← Regs[R2] - d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
Scaled             Add R1, 100(R2)[R3]    Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] × d]
(d is the size of the data item being accessed.)

Addressing Mode Usage 3 SPEC89 programs on VAX

Addressing Modes Usage Example
• For 3 programs running on VAX, ignoring register direct mode:
    • Displacement: 42% average (32% to 55%)
    • Immediate: 33% average (17% to 43%)
    • Register deferred (indirect): 13% average (3% to 24%)
    • Scaled: 7% average (0% to 16%)
    • Memory indirect: 3% average (1% to 6%)
    • Misc: 2% average (0% to 3%)
• Displacement and immediate together: 75%
• Displacement, immediate, and register indirect together: 88%
• Observation (CISC to RISC): in addition to register direct, the displacement, immediate, and register-indirect addressing modes are the important ones.

Control Flow Instructions
• Three basic types of control flow:
    • Jumps
    • Conditional branches
    • Procedure call/return
Figure: control flow instruction types in SPEC CPU2000 programs.

Addressing Modes for Control Flow Instructions
• PC-relative (PC + displacement)
    • Most commonly used for branches and jumps
    • Gives position-independent code
    • Target must be known at compile time
• Register indirect (a register holds the target address)
    • Used when the target is not known at compile time
    • e.g. procedure returns, virtual functions and function pointers, case/switch statements

Branch Distance Distribution SPEC CPU2000 on Alpha

Branch Comparison Types SPEC CPU2000 on Alpha

Call/Return Instructions
• Call
    • Minimum: save the return address on the stack (x86) or in a register (MIPS)
    • Can also create a stack frame, save registers, etc.
• Return
    • Jumps to the return address
    • Can also pop the stack frame, restore registers, etc.
• Simpler typically turns out to be better
    • e.g. many functions do not need a stack frame

MIPS Call/Return
• Call: Jump-And-Link (JAL <function>)
    • Puts the return address into R31, then jumps to the target address
• Return: register-indirect jump (JR R31)
    • Jumps to the address in R31 (there is no special RET instruction)
• Stack frames are created and popped with ordinary add/sub instructions (the stack-pointer register is R29)
• Registers are saved and restored with ordinary load/store instructions

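Procedure call essentials (1): Caller/Callee Mechanics (who does what, when?)
• Four places where the work happens:
    1. caller at call time
    2. callee at entry
    3. callee at exit
    4. caller after return

    int bar(int a)
    {
        int temp = 3;        /* 2. callee at entry */
        /* ... */
        return temp + a;     /* 3. callee at exit */
    }

    void foo(void)
    {
        bar(42);             /* 1. caller at call time; 4. caller after return */
        /* ... */
    }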

Procedure call essentials (2): MIPS Registers
Name       R#      Usage                                      Preserved on call
$zero      0       The constant value 0                       n.a.
$at        1       Reserved for assembler                     n.a.
$v0-$v1    2-3     Values for results and expression eval.    no
$a0-$a3    4-7     Arguments                                  no
$t0-$t7    8-15    Temporaries                                no
$s0-$s7    16-23   Saved                                      yes
$t8-$t9    24-25   More temporaries                           no
$k0-$k1    26-27   Reserved for use by OS                     n.a.
$gp        28      Global pointer                             yes
$sp        29      Stack pointer                              yes
$fp        30      Frame pointer                              yes
$ra        31      Return address                             yes

Procedure call essentials (3): Good Strategy
• Caller at call time
    • Put arguments in $a0..$a3
    • Save any caller-saved temporaries it still needs
    • jal (or jalr) to the callee, which sets $ra
• Callee at entry
    • Allocate all stack space
    • Save $ra and the $s registers it uses (e.g. $s0..$s3), if necessary
• Callee at exit
    • Restore $ra and the $s registers, if used
    • Deallocate all stack space
    • Put the return value in $v0
• Caller after return
    • Retrieve the return value from $v0
    • Restore any caller-saved temporaries
• Do most of the work at callee entry/exit

Procedure call essentials (4): Summary
• Caller saves, at the point of call, the registers the convention does not preserve (e.g. the $t and $a registers) whose values it still needs
• Callee saves, at entry, the registers the convention requires it to preserve (the $s registers) that it uses
• Callee restores those saved registers and re-adjusts the stack before returning
• Caller restores its saved registers and re-adjusts the stack before resuming after the call
• Big question: is this clear? I can work through an example if needed.

Instruction Encoding
• An instruction must specify:
    • What is to be done (the opcode)
    • What the operands are
• Three popular formats:
    • Variable format (VAX, x86)
        • The opcode specifies how many operands there are; the operands are listed after the opcode
        • Each operand has an address specifier and an address field
        • The address specifier describes the addressing mode for that operand
    • Fixed format (RISC)
        • All instructions are the same size
        • The opcode specifies the addressing mode for load/store operations
        • All other operations use register operands
    • Hybrid format (IBM 360, some DSP processors)
        • Several (but few) fixed-size instruction formats
        • Some formats have address specifier fields

How is the operation specified?
• Variable: [Operation & # of operands] [Address specifier 1] [Address field 1] … [Address specifier n] [Address field n]
• Fixed: [Operation] [Address field 1] [Address field 2] [Address field 3]
• Hybrid:
    • [Operation] [Address specifier] [Address field]
    • [Operation] [Address specifier 1] [Address specifier 2] [Address field]
    • [Operation] [Address specifier] [Address field 1] [Address field 2]

Instruction Set Encoding
• The encoding affects program size through:
    • The number of operations: size of the opcode field
    • The types of instructions
    • The number of operands
    • The number of registers: size of the operand fields
    • Variable vs. fixed instruction length

Instruction Encoding Tradeoffs
• Decoding vs. programming
    • Fixed formats are easy to decode
    • Variable formats are easier to program (in assembler)
    • But we mostly use compilers now…
• Number of registers
    • More registers help optimization (a lot)
    • Operand fields get smaller with fewer registers
    • In general, we want many (e.g. 32) registers

Helping the Compiler Writers
• Regularity and orthogonality
    • General-purpose registers
    • If an operation works with one data type, it should work with all supported data types
    • If an operation works with one addressing mode, it should work with all of them
• Primitives, not solutions
    • e.g. JAL vs. an elaborate function-call instruction
• Simplify tradeoffs
• Make the frequent cases fast
• Bind constants at compile time

Today’s compilers work like this:
• Front end (one per language)
    • Function: transform the language to a common intermediate form
    • Dependencies: language dependent; machine independent
• (Intermediate representation)
• High-level optimizations
    • Function: e.g. procedure inlining and loop transformations
    • Dependencies: somewhat language dependent; largely machine independent
• Global optimizer
    • Function: global and local optimizations + register allocation
    • Dependencies: small language dependencies; slight machine dependencies (e.g. register counts/types)
• Code generator
    • Function: detailed instruction selection and machine-dependent optimizations (possibly followed by an assembler)
    • Dependencies: highly machine dependent; language independent

Compiler Optimization and Instruction Count Change in instruction count for the programs lucas and mcf from SPEC2000 as compiler optimizations vary.

How the architect can help the compiler writer
• Keep in mind:
    • Most programs are locally simple!
    • Simple translations work just fine
    • Complexity arises because programs require lots of instructions, and those instructions must interact globally
    • It also arises from the interactions among the compiler's multiple passes
• The compiler writer’s corollary/rule/manifesto:
    • Make the frequent cases fast and the rare cases correct!

The DLX Microprocessor
• A generic microprocessor that we’ll use from time to time
• Very similar to a MIPS machine
• Derived by taking the “average” of a number of recent experimental and commercial machines
• Has 32 general-purpose registers (R0, R1, … R31) and floating-point registers
• Data types include:
    • 8-bit bytes
    • 16-bit half words
    • 32-bit words for integer data
    • 32-bit single-precision and 64-bit double-precision floating-point words

DLX addressing modes
• The only data addressing modes are immediate and displacement
• Register deferred and absolute addressing can still be implemented with displacement:
    • Register deferred uses a zero displacement, e.g. 0(R1)
    • Absolute uses R0 (always 0) as the base register, e.g. 1001(R0)
    • In the generic notation used earlier: Add R4, (R1) means R4 ← R4 + Mem[R1]; Add R1, (1001) means R1 ← R1 + Mem[1001]
• DLX memory is byte addressable, in big-endian mode, with a 32-bit address
• DLX uses a load/store architecture, so all memory references are loads or stores between memory and either the GPRs or the FPRs

DLX instruction format
• DLX has 2 addressing modes, which are encoded in the opcode
I-type instruction (field widths in bits):
    Opcode (6) | rs1 (5) | rd (5) | Immediate (16)
• Encodes:
    • Loads and stores of bytes, half words, and words
    • All immediates (rd ← rs1 op immediate)
    • Conditional branch instructions (rs1 is the register, rd is unused)
    • Jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0)
R-type instruction (field widths in bits):
    Opcode (6) | rs1 (5) | rs2 (5) | rd (5) | func (11)
• Register-register ALU operations: rd ← rs1 func rs2
• func encodes the data path operation: Add, Sub, …
• Read/write special registers and moves

DLX instruction format (continued)
J-type instruction (field widths in bits):
    Opcode (6) | Offset added to PC (26)
• Jump and jump and link
• Trap and return from exception

An example MIPS machine
