CS136, Advanced Architecture Instruction Set Architecture CS 136
Types of ISAs • Stack • Implicit operands (top of stack) • Heavy memory traffic • Limited ability to access operands at will • Obsolete • Accumulator • Implicit register operand (“accumulator”) • One memory operand • Insufficient temporaries • Obsolete • General-purpose register • Multiple registers • Several variations
GPR Architectures • Memory-memory • CISC idea • Usually allows any operand to be in register as well • Register-memory • Example: x86 • Can take one operand in a register and one in memory, or both in registers • Register-register • Only design used in modern machines • Lots of registers ⇒ fast flexible operand access • Simplicity of hardware • Compiler has full flexibility in register usage
Five Ways to Do C = A + B
STACK:    PUSH A; PUSH B; ADD; POP C
ACCUM:    LOAD A; ADD B; STORE C
MEM-MEM:  ADD C,A,B
REG-MEM:  LOAD R1,A; ADD R1,B; STORE R1,C
REG-REG:  LOAD R1,A; LOAD R2,B; ADD R3,R1,R2; STORE R3,C
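To make the contrast between implicit and explicit operands concrete, here is a minimal sketch of two toy interpreters, one for the stack sequence and one for the register-register sequence. The function names, the tuple encoding of instructions, and the dictionary used as memory are assumptions made for illustration; they do not model any real machine.

```python
def run_stack(program, memory):
    """Run a stack-machine program; ALU operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "POP":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return memory


def run_reg_reg(program, memory):
    """Run a load/store program; every ALU operand is an explicitly named register."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":
            dest, src1, src2 = args
            regs[dest] = regs[src1] + regs[src2]
        elif op == "STORE":
            reg, addr = args
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown register op: {op}")
    return memory


# Hypothetical encodings of the STACK and REG-REG sequences for C = A + B.
stack_program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
reg_program = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "R3", "C"),
]

print(run_stack(stack_program, {"A": 2, "B": 3})["C"])
print(run_reg_reg(reg_program, {"A": 2, "B": 3})["C"])
```

Both programs compute the same sum; the difference is that the stack ADD names no operands at all, while the register ADD names all three.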
Memory Addressing • Originally just word addressing • 8-bit bytes and byte addressing introduced on IBM 360 series • Brief experiments with bit addressing (bad idea) • Unaligned accesses not worth supporting • Some machines byte-address but only load/store a word at a time • Turned out to be bad design decision • Too many programs do string processing 1 character at a time • May need to revisit in future (32-bit characters?) • Modern RISC designs allow short load/store, but not short arithmetic
Endian-ness • The word is “Endian”, not “Indian” • Reference to Gulliver’s Travels • Little-Endian invented by Digital Equipment on the PDP-11 • Mathematically more elegant • Horrible for humans • “It seemed like a good idea at the time” • Should be banished from the face of the Earth • Some machines can switch endianness with a control bit • This idea is even stupider than the original
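As a small illustration of what the two byte orders look like in memory, the sketch below lays out one 32-bit value both ways using Python's built-in `int.to_bytes` and `int.from_bytes`. The value `0x0A0B0C0D` is an arbitrary example chosen so each byte is recognizable.

```python
value = 0x0A0B0C0D

# Big-endian: most significant byte at the lowest address.
big = value.to_bytes(4, byteorder="big")
# Little-endian: least significant byte at the lowest address.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # bytes in increasing address order
print(little.hex())

# Reading little-endian bytes as if they were big-endian scrambles the value,
# which is the classic bug when data crosses between machines.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))
```

The little-endian dump is the one that reads "backwards" to a human, which is the slide's complaint.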
Addressing Modes • How can an instruction reference memory? • Early days: absolute address in instruction • Led to instruction modification • Improvement: “Indirection” picked up absolute location, used it as final address • Minimum necessary today: follow pointer in register • Clumsy if only option • Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as side effect, either before or after instruction • No machine went quite this far • But VAX came close
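The "fanciest conceivable" mode can be sketched as a single function. This is an illustrative model only: `fancy_load`, its parameter names, and the choice to apply the side effects to the base register alone are assumptions, not a description of the VAX or any other machine.

```python
def fancy_load(memory, regs, base, index, scale, constant, predec=0, postinc=0):
    """Load from *(base + scale*index + constant) with optional side effects.

    predec is subtracted from the base register before the address is formed;
    postinc is added to it after the access. Both default to no side effect.
    """
    regs[base] -= predec                      # autopredecrement (before use)
    address = regs[base] + scale * regs[index] + constant
    value = memory[address]
    regs[base] += postinc                     # autopostincrement (after use)
    return value


# Hypothetical machine state: memory is a sparse dict keyed by address.
memory = {0x110: 42}
regs = {"R1": 0x104, "R2": 2}

# R1 is predecremented by 4 to 0x100, so the address is 0x100 + 4*2 + 8 = 0x110.
print(fancy_load(memory, regs, "R1", "R2", scale=4, constant=8, predec=4))
print(hex(regs["R1"]))
```

With `scale=0`, `constant=0`, and no side effects, the same function degenerates to the "follow pointer in register" minimum.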
Addressing Modes (cont’d) • What’s actually useful? • Need to follow pointers: can restrict to registers • ADD R1,(R2) • Better: LOAD R1,(R2) (like MIPS) • Frequent stack access ⇒ register + constant useful • Immediates needed for built-in constants • Access to globals ⇒ absolute memory addresses • (We’ll see that that’s painful) • PC-relative modes • Used to be needed for data; not in modern systems • Still needed for calls and branches • Absolute addresses no longer needed for branches • Can always emulate with PC-relative, since PC known • Still available on some architectures
Operand Types and Sizes • Type usually implies size • Integers can safely be widened to word size • Shrink again when stored • Takes advantage of two’s-complement representation • Single-precision FP gives different results than double-precision • Necessary to support both widths • Some FPUs can do two SP operations in parallel • Older machines allowed “packed” decimal (2 digits per byte) • x86 supports with DAA (Decimal Adjust after Addition) instruction • Still useful in business world, though dying • 32 bits standard these days, 64 bits coming • 128 some day?
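The claim that integers can be widened for arithmetic and shrunk again on store rests on a property of two's complement: the low bits of a sum do not depend on the high bits of the operands. A minimal sketch, with `sign_extend` and `truncate` as hypothetical helper names:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    value &= mask
    return (value ^ sign_bit) - sign_bit


def truncate(value, bits):
    """Keep only the low `bits` bits, as a narrow store would."""
    return value & ((1 << bits) - 1)


# 8-bit operands are widened, added at full width, then shrunk when stored.
a = sign_extend(0xFF, 8)        # the byte 0xFF is -1
b = sign_extend(0x01, 8)
print(truncate(a + b, 8))       # same low byte as a native 8-bit add

# Overflow also wraps exactly as 8-bit hardware would.
wrapped = truncate(sign_extend(0x7F, 8) + 1, 8)
print(hex(wrapped), sign_extend(wrapped, 8))
```

This is why a machine can offer short loads and stores without also needing short arithmetic.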
Operations Provided • Only one instruction truly needed: SJ • Subtract A from B, giving C; if result is < 0, jump to D • It’s Turing-complete! • Practical machines need a bit more at minimum: • Arithmetic and logical (add, multiply, divide?, and, or, …) • Data movement (load/store, move between registers) • Control (conditional/unconditional branch, procedure call and return, trap to OS) • System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices) • Other builtins can be useful: • Basic floating point • Bad x86 design idea: sin, sqrt, etc.! • Decimal • String • Vector, graphics
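The SJ instruction is simple enough to simulate in a few lines. The sketch below assumes a program held separately from data memory, instructions written as `(A, B, C, D)` tuples of addresses, and a machine that halts when the program counter leaves the program; all of these are illustrative choices, not part of the slide's definition.

```python
def run_sj(program, memory, max_steps=1000):
    """Run a subtract-and-jump program: C = B - A; if C < 0, jump to D."""
    pc = 0
    steps = 0
    while 0 <= pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded (possible infinite loop)")
        a, b, c, d = program[pc]
        memory[c] = memory[b] - memory[a]
        pc = d if memory[c] < 0 else pc + 1
        steps += 1
    return memory


# Hypothetical memory layout: X, Y, a constant zero Z, a temporary T, result R.
X, Y, Z, T, R = 0, 1, 2, 3, 4
memory = [2, 3, 0, 0, 0]

# Addition built from subtraction: T = 0 - X, then R = Y - T = X + Y.
# Each jump target equals the next instruction, so the jump is a no-op.
add_program = [
    (X, Z, T, 1),
    (T, Y, R, 2),
]

print(run_sj(add_program, memory)[R])
```

Even addition takes two instructions and a scratch location, which is the practical argument for providing more than the theoretical minimum.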
Control Flow • Addressing modes are important • PC-relative means code can run at any virtual address • Useful for dynamically linked (shared) libraries • Pointer-following jump needed for returns • Also useful for switch statements, function pointers, virtual functions, and shared libraries • How to specify condition for conditional branches? • Condition code as side effect of every instruction • Boils down to extra register • Spurious dependencies in pipeline • Condition register explicitly set by comparison • Compare as part of branch • Adds delay slots in pipeline
Encodings • Variable-length instructions • Highly efficient (few wasted bits) • Allows complex specifications (e.g., x86 addressing modes) • Usually means misaligned instruction fetch • Greatly complicates fetch/decode units • Fixed-length instructions • May limit number of registers • Usually very few instruction formats • Wastes space but gains speed (e.g., only aligned fetches) • Limits width of immediate operands
The Fight for Bits • How wide should instruction be? • Wider ⇒ can encode more registers, more options • Wider ⇒ bigger programs, more memory bandwidth • Bigger programs ⇒ fewer cache hits • Things you need to encode: • Operation code (16 to 1000 instructions) • Operands (at least one, normally two or three) • Immediate operands • Memory offsets • Branch targets • Branch conditions • Conditional operations (e.g., conditional load, add) • 
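The bit budget can be worked out with simple arithmetic. The helper below is a rough sketch: `bits_remaining` is a hypothetical name, and it assumes a single fixed-width format in which the opcode and each register specifier take the minimum number of bits needed to distinguish the available choices.

```python
def bits_remaining(width, num_opcodes, num_registers, num_operands):
    """Bits left for immediates, offsets, etc. after opcode and register fields."""
    opcode_bits = (num_opcodes - 1).bit_length()     # ceil(log2(num_opcodes))
    register_bits = (num_registers - 1).bit_length() # ceil(log2(num_registers))
    return width - opcode_bits - num_operands * register_bits


# 32-bit instruction, 64 opcodes, 32 registers:
print(bits_remaining(32, 64, 32, 3))    # three register operands
print(bits_remaining(32, 64, 32, 2))    # two register operands

# Quadrupling the register file to 128 squeezes a three-operand format hard.
print(bits_remaining(32, 64, 128, 3))
```

The two-operand case leaves exactly the 16 bits that a MIPS-style I-type format spends on its immediate.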
Two or Three Operands? • In favor of three: • Smaller code size • No clobbered operands ⇒ fewer copies or reloads • Hardwiring R0 to zero means the ALU needs to support fewer operations • In favor of two: • Can address more registers
How to Decide All These Questions? • Slide rules at 50 paces? • Analysis wars • Look at existing designs, existing programs • “Recompile” programs for hypothetical architecture • Analyze size of resulting program • Run through simulator to see how it performs • Impractical approach • Writing compiler back ends is expensive • Simulators are slow • Instead, make projections based on existing object code
Example of Bad Analysis: @-(R2) • DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement • What happened to indirect autopredecrement? • Analyzed output of BLISS compiler on many programs • Language didn’t provide way to express autopredecrement • Concluded it wasn’t necessary • Very different result if they had analyzed C! *--p1 = a[--i];
Example of Difficult Analysis: imm16 • How big should an immediate be? • Easy analysis: examine existing code • Calculate frequency of various widths • Analyze tradeoff of using those bits for other purposes • Problem: analyzed architecture affects frequency of different widths • E.g., Alpha has only 16 bits, so you’ll never see over 16! • Alternative: look for multi-instruction sequences that effectively use more than 16 bits • Hard to find (compiler pipeline scheduling) • Compiler will stand on head, use sneaky tricks to avoid generating extra instructions • Need for wider constants depends on architecture • E.g., MIPS needs them when jumping to shared libraries
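The "easy analysis" amounts to a histogram of the minimum width each constant needs. A minimal sketch follows; `signed_width`, `fraction_fitting`, and the sample constants are made up for illustration and are not measurements from any real program.

```python
from collections import Counter


def signed_width(n):
    """Smallest two's-complement width (in bits) that can hold n."""
    if n >= 0:
        return n.bit_length() + 1
    return (~n).bit_length() + 1


def fraction_fitting(constants, bits):
    """Fraction of constants representable as a signed `bits`-bit immediate."""
    fits = sum(1 for c in constants if signed_width(c) <= bits)
    return fits / len(constants)


# Hypothetical immediates harvested from some object code.
sample = [0, 1, -1, 4, 8, 127, 128, -128, 255, 4096, 32767, 32768, 70000, -40000]

print(sorted(Counter(signed_width(c) for c in sample).items()))
print(fraction_fitting(sample, 8))
print(fraction_fitting(sample, 16))
```

The slide's warning still applies: if the sample comes from a machine with 16-bit immediates, wider constants have already been split into multi-instruction sequences and will not show up in a count like this.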
Interaction with Compilers • Nearly all modern code generated by compilers • Architect must make compiler’s job easier • Lots of registers • Orthogonal instruction set • Few side effects • Instructions and addressing modes matched to language constructs • But do NOT attempt to implement them in detail! • Primitives are better than “solutions” even when solutions are correct • Good support for stack, globals, and pointers • Support for both compile-time and run-time binding • Don’t ask compiler to predict dynamic information (e.g., branch targets) • Don’t provide features language can’t express • Example pro and con: vector architectures
The MIPS64 Architecture • Extension of MIPS32 • Data path widened to 64 bits • Still 32-bit instructions • Still only 32 registers • Most instructions have “D” as prefix to indicate 64-bit version
MIPS Instruction Formats (field widths in bits)
R-Type Instruction: Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I-Type Instruction: Opcode (6) | rs (5) | rt (5) | Immediate (16)
J-Type Instruction: Opcode (6) | Offset inserted into PC (26)
I-Type Instructions
Opcode (6) | rs (5) | rt (5) | Immediate (16)
• Encodes loads, stores (all widths), immediate ALU ops • Also conditional branches (rt unused)
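Packing the I-type fields is a matter of shifts and masks. The sketch below uses hypothetical helper names `encode_i_type` and `decode_i_type`; the example instruction is `lw $ra, 20($sp)` from the MIPS32 encoding, where the `lw` opcode is 0x23, `$sp` is register 29, and `$ra` is register 31.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into a 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def decode_i_type(word):
    """Unpack an I-type word, sign-extending the 16-bit immediate."""
    immediate = word & 0xFFFF
    if immediate & 0x8000:
        immediate -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, immediate


LW = 0x23   # MIPS load-word opcode
SP, RA = 29, 31

word = encode_i_type(LW, rs=SP, rt=RA, immediate=20)   # lw $ra, 20($sp)
print(hex(word))
print(decode_i_type(word))

# Negative offsets survive the round trip because of the sign extension.
print(decode_i_type(encode_i_type(LW, SP, RA, -4)))
```

The 16-bit immediate field is what limits a single load or store to offsets within 32 KB of the base register.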
R-Type Instructions
Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
• Register-register ALU operations • “funct” encodes the ALU operation: add, sub, etc. • Opcode chooses operands, special registers, sizes, etc. • Conditional moves • Handles special registers, floating point, …
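The same shift-and-mask approach covers the R-type layout. In this sketch `encode_r_type` is a hypothetical helper; the example is the MIPS32 instruction `add $t0, $t1, $t2`, which uses opcode 0 and funct 0x20, with `$t0`, `$t1`, `$t2` being registers 8, 9, and 10.

```python
def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Pack opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (
        (opcode << 26)
        | (rs << 21)
        | (rt << 16)
        | (rd << 11)
        | (shamt << 6)
        | funct
    )


def decode_r_type(word):
    """Split a 32-bit word into its six R-type fields."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
        (word >> 6) & 0x1F,
        word & 0x3F,
    )


T0, T1, T2 = 8, 9, 10
ADD_FUNCT = 0x20

word = encode_r_type(0, rs=T1, rt=T2, rd=T0, shamt=0, funct=ADD_FUNCT)
print(hex(word))            # add $t0, $t1, $t2
print(decode_r_type(word))
```

Because the opcode is zero for this whole family, the funct field is what actually selects the ALU operation.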
J-Type Instructions
Opcode (6) | Offset inserted into PC (26)
• Jump, jump and link • Trap, return from exception
MIPS Control Flow • Unconditional jump substitutes low bits of PC • NOT addition! • Exceptionally bad on 64-bit architecture, where 36 bits unchanged • No built-in stack • Subroutine call stores return in register • Callee must save on stack if necessary • Reduces overall cycle time • Ultra-efficient for leaf functions • Conditional branches only test against zero • Complex tests (e.g., <) store Z/NZ result in a register • We’ve seen how this improves the pipeline • Conditional moves can eliminate many branches • Feature of many modern architectures
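The "substitution, not addition" behavior of the unconditional jump can be shown directly. This sketch assumes the usual MIPS rule that the 26-bit field is shifted left by 2 and replaces the low 28 bits of the address of the following instruction; `jump_target` and the example addresses are illustrative.

```python
LOW_BITS = 28                       # 26-bit field shifted left by 2
LOW_MASK = (1 << LOW_BITS) - 1


def jump_target(pc, instr_index):
    """Target of a J-type jump: low 28 bits replaced, upper bits kept."""
    next_pc = pc + 4
    upper = next_pc & ~LOW_MASK     # bits 28 and up are left unchanged
    return upper | ((instr_index & 0x3FFFFFF) << 2)


# A hypothetical 64-bit PC: 64 - 28 = 36 upper bits cannot be changed by a jump.
pc = 0xFFFFFFFF80001000
print(hex(jump_target(pc, 0x100)))

# The same field value reaches a different target from a different 256 MB region.
print(hex(jump_target(0x10000000, 0x100)))
```

A jump therefore cannot leave its current 256 MB region, so reaching arbitrary 64-bit addresses requires a register-indirect jump instead.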
MIPS Floating Point • Floating point was originally coprocessor • Separate FP registers • Special instructions to move to/from integer registers • MIPS64 (but not 32) has paired single operations • Two SP numbers pass through DP ALU simultaneously • MIPS64 also has multiply-add in one instruction • Useful in signal processing (multimedia)
Fallacies and Pitfalls • PITFALL: Instruction designed to support feature in some language • Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK • Why is this bad? • Easy to get wrong (PDP-11 MARK instruction) • Easy to make inefficient (VAX CALLS) • Languages evolve, hardware doesn’t
Fallacies and Pitfalls (2) • FALLACY: Typical programs exist • We wish! • PITFALL: Ignoring the compiler • Designing for better code size based on output of a bad compiler • Good compiler can blow your idea out of the water • FALLACY: Flawed architectures can’t succeed • Ummm, x86? • Every architecture has drawbacks • FALLACY: You (YOU!) can design a flawless architecture • Always tradeoffs • Always something new to learn
Summary • Instruction encoding is important • Don’t forget to provide what the compiler needs • This is NOT what you think the compiler needs! • Addresses will only get wider • Data will only get wider • Including characters • Cleverness to improve bandwidth (e.g., MADD) • RISC is here to stay