Instruction Set Ali Azarpeyvand Advanced Computer Architecture
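Review, #1
• Amdahl’s Law: $$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
• CPI Law: $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
• Execution time is the REAL measure of computer performance!
• Good products are created when we have:
• Good benchmarks
• Good ways to summarize performance
• Die cost goes roughly with (die area)^4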
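The two review formulas can be checked numerically. The following is a minimal sketch; the function names and the workload numbers (40% of the time enhanced 10x, one billion instructions at CPI 1.5 on a 1 GHz clock) are made up for illustration and do not come from the slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical enhancement: 40% of execution time becomes 10x faster.
overall = amdahl_speedup(0.4, 10.0)

# Hypothetical program: 1e9 instructions, CPI of 1.5, 1 GHz clock.
seconds = cpu_time(1_000_000_000, 1.5, 1e9)

print(f"overall speedup = {overall:.4f}")  # 1 / (0.6 + 0.04) = 1.5625
print(f"CPU time = {seconds:.2f} s")       # 1.5e9 cycles / 1e9 Hz = 1.50 s
```

Even a 10x enhancement yields only about 1.56x overall here, because the unenhanced 60% dominates the new execution time.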
Computing Targets (review) • Desktop computing • Servers • Embedded applications
Outline • Taxonomy of instruction set alternatives • Some instruction set measurements • Instruction set architecture of processors not aimed at desktops or servers: digital signal processors (DSPs) and media processors. • A sample RISC architecture
Instruction Set Architecture (ISA)
[Figure: the instruction set shown as the layer between software (above) and hardware (below)]
Organization of an Instruction
• Operations: arithmetic, logical, shift; load (from MM), store (to MM), move (reg-reg), move (MM-MM); I/O instructions if I/O is not memory-mapped
• Control transfer: unconditional (jump), conditional (branch), call, return
• Instruction length: fixed (e.g., MIPS: 4 bytes) or variable (e.g., VAX: 1-37 bytes)
• Operation modifiers: 1) length of operands 2) shift/rotate: direction, amount 3) branch condition
• Number of addresses: 0 address, 1 address, 2 address, 3 address; implied
• Addressing modes: immediate, absolute, computed
• Operand location: instruction, register, memory
Instruction Set Design Objective #1 Code size (code density) : • Depends on: • size of MM/cache • access time of cache (on-chip/off-chip) • CPU-MM bandwidth • Frequently used (written down) instructions should be short • Implies variable-length instructions
Instruction Set Design Objective #2 Execution speed (performance) : • Only frequently executed instructions should be included in the instruction set • Infrequently executed instructions slow down the others • Complex and long instructions tend to be used infrequently • Frequently executed instructions should be fast • Pipelining should be made as easy as possible • Overlapped execution lowers CPI value • Single instruction length, simple instruction formats, and few addressing modes for easy decoding
Instruction Set Design Objective #3 Size and complexity of hardware (ALU, CU) • Implementing infrequently executed instructions ties down hardware that is rarely used, and could be used for some other purpose with greater advantage • Some instructions should not be included in the instruction set
Instruction Set Design Objective #4 Instruction set for programming languages • Needs of a human programmer (less important today) • orthogonality (each operand can be specified independently of the others) • consistency (being able to predict the remainder of an architecture given partial knowledge of the system) • Needs of an optimizing compiler • Simple instructions are more suitable for code optimizations • Optimizing compilers try to find the shortest or fastest code sequence that implements the semantics of a HLL program. To make code reorganization tractable, an instruction set is needed that makes: • the size of each instruction easy to calculate; • the execution time of each instruction easy to calculate; • the interactions between instructions easy to figure out. • ISA features such as complex addressing modes, variable length instructions, special-purpose registers provide too many ways of doing the same thing and lead to combinatorial explosion
Evolution of Instruction Sets • Major advances in computer architecture are typically associated with landmark instruction set designs • Ex: Stack vs GPR (System 360) • Design decisions must take into account: • technology • machine organization • programming languages • compiler technology • operating systems • And they in turn influence these
Classifying • Stack • Accumulator • Register – memory • Register – register (load – store)
Register Advantages • Registers, like other forms of storage internal to the processor, are faster than memory. • Registers are more efficient for a compiler to use than other forms of internal storage. • More importantly, registers can be used to hold variables. • How many registers?
Register Usage (Compiler) • for expression evaluation • for parameter passing • to be allocated to hold variables. • GPR architectures: • Two or three operands • how many of the operands may be memory addresses in ALU instructions
Registers versus Cache • Similarities • Both are small, fast, and expensive (flip-flops) • Both are used to increase execution speed of CPU • Differences • Registers are visible in ISA; caches are not (except for instructions for invalidation, prefetch, or flushing) • Number of registers is fixed by instruction format; size of cache is easily changeable • Registers have higher BW: 3 words/cycle, and are random-access; caches have lower BW: 1 word/cycle, and are associative • Register access time is fixed; cache access time is statistical • Registers require fewer bits to address; caches require full memory addresses • Registers create no I/O problems; caches do
Organization of Registers • One general-purpose set (all interchangeable, “typeless”) • One general-purpose set (a few with dedicated uses) • PDP-11: eight 16-bit registers (R6: stack pointer, R7: PC) • VAX 11/780: sixteen 32-bit registers (four special-purpose, R14: stack pointer, R15: PC) • Two sets • Motorola 68000: eight 32-bit data, eight 32-bit address • IBM 370: sixteen 32-bit integer, four 64-bit FP • DLX, MIPS: 31 32-bit integer, 32 32-bit FP • Three sets • CDC 6600: eight 18-bit integer, eight 18-bit address, eight 60-bit FP • Many registers with dedicated use • Intel 80x86
Notations for Information Representation
• 64 bits = 8 bytes = 2 words = 1 doubleword
• Q: How do we number these various units of information in a consistent manner?
• Example digit string: 9 6 2 1 7 6 6, with the Most Significant Digit (MSD, the “Big End”) on the left and the Least Significant Digit (LSD, the “Little End”) on the right
• “Big End”-ian numbering of the digits, left to right: 0 1 2 3 4 5 6
• “Little End”-ian numbering of the digits, left to right: 6 5 4 3 2 1 0
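The same numbering question applies to the bytes of a word in memory. As a small illustration (the value `0x0A0B0C0D` is an arbitrary example, not taken from the slides), Python's built-in `int.to_bytes` shows which byte lands at the lowest address under each convention:

```python
value = 0x0A0B0C0D  # arbitrary 32-bit example value

# Big-endian: the most significant byte is stored at the lowest address.
big = value.to_bytes(4, byteorder="big")

# Little-endian: the least significant byte is stored at the lowest address.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading the bytes back with the matching convention recovers the value.
roundtrip = int.from_bytes(little, byteorder="little")
```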
Alignment of Words in Memory
[Figure: four 8-bit-wide memory banks (00, 01, 10, 11) connected to a memory controller that delivers 32 bits]
• CPU accesses a 32-bit word of data starting at byte address x…x00
• Such an address (multiple of 32[b]/8[b/B] = 4[B]) is called word-aligned
• Memory controller is simple and fast, data available in one cycle
• CPU accesses a 32-bit word of data starting at byte address 01111
• Byte addresses are 01111, 10000, 10001, 10010 (misaligned address)
• Doubles the access time of the word
• Requiring aligned addresses results in a simpler memory controller and faster execution
• Costs some loss of storage, and adds complexity in code generators
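The cost of the misaligned access at address 01111 can be reproduced with a simplified model. The sketch below assumes that each aligned 4-byte row across the four banks is fetched in one memory cycle; the helper names are hypothetical, and real memory controllers may behave differently.

```python
WORD_BYTES = 4  # 32-bit word, as on the slide


def is_word_aligned(byte_address):
    """True if the address is a multiple of the word size."""
    return byte_address % WORD_BYTES == 0


def rows_touched(byte_address, size=WORD_BYTES):
    """Number of aligned 4-byte rows the access spans (one cycle each in this model)."""
    first_row = byte_address // WORD_BYTES
    last_row = (byte_address + size - 1) // WORD_BYTES
    return last_row - first_row + 1


aligned_addr = 0b01100     # x...x00 form: word-aligned
misaligned_addr = 0b01111  # the slide's misaligned example

print(is_word_aligned(aligned_addr), rows_touched(aligned_addr))        # True 1
print(is_word_aligned(misaligned_addr), rows_touched(misaligned_addr))  # False 2
```

The misaligned word covers byte addresses 01111 through 10010, which straddle two rows, so the model needs two accesses instead of one.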
Sub-Word Accesses
[Figure: four 8-bit-wide memory banks (00, 01, 10, 11) and a memory controller delivering 32 bits to the CPU register file (32 bits)]
• Byte operand in register is usually the rightmost byte of register
• Byte may come from any of the four memory banks
• Source of complications
Memory Addressing • Byte Addressed • Big Endian, Little Endian • Aligned and misaligned access of objects
Addressing Modes • Addresses: • Constants • Registers • Locations in memory • Immediates are also included • What is the effective address? • PC-relative addressing • Effects of addressing modes: • Can reduce instruction count • Add to the complexity of building a computer • May increase average CPI
Addressing Modes • We can’t directly refer to data values, only their addresses • Except for immediate operands • Register deferred and direct addressing modes can be synthesized from displacement addressing mode • Notation: R: the register file; M: the memory address space; d: the size of the data item being accessed (1, 2, 4, 8 bytes)
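The claim that register deferred and direct addressing can be synthesized from displacement addressing can be sketched as follows. The register and memory contents are invented for the example, and the sketch assumes a register (here `r0`) that always reads as zero.

```python
# Hypothetical machine state, using the slide's notation R (registers) and M (memory).
R = {"r0": 0, "r1": 100}        # assumption: r0 always holds zero
M = {100: 7, 108: 42, 200: 99}  # address -> stored value


def displacement(offset, reg):
    """Effective address for displacement mode: offset + R[reg]."""
    return offset + R[reg]


def register_deferred(reg):
    """Register deferred is displacement with a zero offset."""
    return displacement(0, reg)


def direct(address):
    """Direct (absolute) is displacement off the always-zero register."""
    return displacement(address, "r0")


print(M[displacement(8, "r1")])    # M[8 + R[r1]] = M[108] -> 42
print(M[register_deferred("r1")])  # M[R[r1]]     = M[100] -> 7
print(M[direct(200)])              # M[200]                -> 99
```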
Displacement values Range of displacements used?
Conclusions (up to now) • Modes: • displacement, • immediate, • and register indirect. • Displacement mode: • 12 to 16 bits • Immediate field • 8 to 16 bits
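To see what a 12-bit or 16-bit field can hold, the sketch below checks whether a value fits in a field of a given width. It assumes the field is a signed two's-complement number, which the slide does not state; the helper name is hypothetical.

```python
def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of the given width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


# A 12-bit signed displacement covers -2048 .. 2047.
print(fits_signed(2047, 12), fits_signed(2048, 12))    # True False

# A 16-bit signed field covers -32768 .. 32767.
print(fits_signed(-32768, 16), fits_signed(40000, 16))  # True False
```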
Operand Types • Integers: • 2’s complement • Characters: • ASCII • Unicode, UTF-8 • Floating point: • IEEE standard 754 (short seminar on Unicode, IEEE) • Strings • Packed decimals
Control Transfer Instructions Terminology • BTA (Branch Target Address): The destination address of the branch • The BTA is static if it is always the same during execution • The BTA is dynamic if it can vary during a single execution of a program (procedure return, switch statements are major examples) • Branch is taken if next instruction to be executed is at address BTA • Branch is not taken if next instruction to be executed is the one following the branch instruction (“fall-through”) • Branch outcome: whether the branch is taken or not taken • Forward branch: BTA > (PC), where (PC) is the address of the branch instruction • Backward branch: BTA < (PC) • An unconditional branch is always taken
Code Generation Examples for Branches
Register r3 contains y, r4 contains z, r5 contains a, r6 contains b, r7 contains x.

if (x > 0) y += z; else y -= z;

          blez r7, L18
          addu r3, r3, r4
          j    L33
    L18:  subu r3, r3, r4
    L33:

while (a < b) { a++; b--; x++; }

          j    L33
    L34:  addu r5, r5, 1
          addu r6, r6, -1
          addu r7, r7, 1
    L33:  slt  r2, r5, r6
          bne  r2, r0, L34
Classification of Branches • Classifying branches into four groups (forward or backward, taken or not taken) permits us to compute some of the dynamic frequencies if some others have been measured. • Rule of thumb: Backward branches tend to be taken, forward branches tend not to be taken.
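The rule of thumb can be read as a static guess: predict taken when the target lies before the branch, not taken otherwise. The sketch below applies that guess to a small invented trace (a loop-closing backward branch and a forward branch); the addresses and outcomes are hypothetical, not measurements.

```python
def predict_taken(branch_pc, target):
    """Static guess from the rule of thumb: backward branches are predicted taken."""
    return target < branch_pc


# Hypothetical trace entries: (address of branch, branch target address, actually taken?)
trace = [
    (0x40, 0x20, True),   # backward loop branch, taken three times...
    (0x40, 0x20, True),
    (0x40, 0x20, True),
    (0x40, 0x20, False),  # ...then falls through when the loop exits
    (0x10, 0x30, False),  # forward branch, usually not taken
    (0x10, 0x30, True),
    (0x10, 0x30, False),
]

correct = sum(predict_taken(pc, target) == taken for pc, target, taken in trace)
print(f"{correct} of {len(trace)} predictions correct")  # 5 of 7 predictions correct
```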
Computing Branch Frequencies Assume that 75% of all branches are forward, and that 55% of all branches are taken. If 80% of all backward branches are taken, what is the probability that a taken branch is a forward branch?
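One way to work this exercise is to split the taken branches into forward and backward parts. The sketch below uses exact fractions; the variable names are chosen for this example.

```python
from fractions import Fraction

p_forward = Fraction(75, 100)               # given: 75% of branches are forward
p_taken = Fraction(55, 100)                 # given: 55% of branches are taken
p_taken_given_backward = Fraction(80, 100)  # given: 80% of backward branches are taken

p_backward = 1 - p_forward                                  # 1/4
p_taken_and_backward = p_taken_given_backward * p_backward  # 4/5 * 1/4 = 1/5
p_taken_and_forward = p_taken - p_taken_and_backward        # 11/20 - 4/20 = 7/20
p_forward_given_taken = p_taken_and_forward / p_taken       # (7/20) / (11/20) = 7/11

print(p_forward_given_taken, float(p_forward_given_taken))
```

Under these numbers, about 64% of taken branches are forward branches (7/11), even though an individual forward branch is taken less than half the time (7/20 out of 15/20, or 7/15).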
Frequency of Instructions for Control Flow • Conditional branches • Jumps • Procedure calls • Procedure returns
Addressing Modes for Control Instructions • Destination must be specified (compile time) • Absolute • PC-relative (displacement) • Target is usually near, so fewer bits are needed • Position-independent • Target unknown at compile time • Returns • Case or switch • Virtual functions • Higher-order functions
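Both properties of PC-relative addressing, short offsets and position independence, can be illustrated with a small sketch. It makes simplifying assumptions: the offset is measured in bytes from the branch's own address and stored as a two's-complement number, whereas real ISAs often measure from the next instruction and in instruction words. The addresses are invented.

```python
def pc_relative_offset(branch_pc, target):
    """Offset stored in the instruction (simplified: bytes from the branch itself)."""
    return target - branch_pc


def bits_needed(offset):
    """Smallest two's-complement width that can represent the offset."""
    bits = 1
    while not (-(1 << (bits - 1)) <= offset <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


branch_pc, target = 0x1000, 0x1040  # hypothetical nearby target
offset = pc_relative_offset(branch_pc, target)
print(offset, bits_needed(offset))  # 64 8

# Loading the same code at a different base address leaves the offset unchanged.
base = 0x40000
relocated_offset = pc_relative_offset(branch_pc + base, target + base)
print(relocated_offset == offset)   # True
```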
Conditional Branch Options • Typical set of condition codes (e.g., Motorola 680x0) • NegativeResult, ZeroResult, ArithmeticOverflow, CarryOut • Many RISC machines do not use condition codes (e.g., MIPS, Alpha) • Magnitude comparisons are done with explicit COMPARE instructions that put their results into named registers
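The two options can be contrasted in a short sketch: an addition that sets the four condition codes named above, and a MIPS-style set-on-less-than that writes its result to a register instead. The 8-bit width and the function names are assumptions made for the example; this is not a model of any particular processor.

```python
def add_with_flags(a, b, bits=8):
    """Add two unsigned bit patterns and derive N, Z, V, C condition codes."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = (a & mask) + (b & mask)
    result = total & mask
    flags = {
        "N": bool(result & sign),                   # negative result
        "Z": result == 0,                           # zero result
        "V": bool(~(a ^ b) & (a ^ result) & sign),  # signed overflow
        "C": total > mask,                          # carry out
    }
    return result, flags


def slt(rs_value, rt_value):
    """Compare without condition codes: the result (1 or 0) goes to a named register."""
    return 1 if rs_value < rt_value else 0


print(add_with_flags(0x7F, 0x01))  # 127 + 1 overflows the signed range: N and V set
print(add_with_flags(0xFF, 0x01))  # wraps to zero: Z and C set
print(slt(3, 5), slt(5, 3))        # 1 0
```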
Instruction Encoding • Factors: • Registers • Addressing modes • Size of these in instruction • Length of instructions: • Variable • Fixed • Hybrid • The length of 80x86 instructions varies between 1 and 17 bytes. • 16-bit and 32-bit instructions: ARM Thumb and MIPS MIPS16 (code size reduction of up to 40%).
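As an illustration of a fixed-length encoding, the sketch below packs a register-register instruction into one 32-bit word. It assumes the standard MIPS32 R-type layout (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6) and function code 0x21 for `addu`; the helper name is hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields into a single 32-bit instruction word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# addu r3, r3, r4  ->  rd = 3, rs = 3, rt = 4
word = encode_r_type(rs=3, rt=4, rd=3, shamt=0, funct=0x21)
print(f"0x{word:08x}")  # 0x00641821
```

Because every register number takes exactly 5 bits, 32 registers is the most this format can name, which is one way the register count is fixed by the instruction format.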
An Ideal Machine • In Section 2.2—Use general-purpose registers with a load-store architecture. • In Section 2.3—Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect. • In Section 2.5—Support these data sizes and types: 8-, 16-, 32-bit, and 64-bit integers and 64-bit IEEE 754 floating-point numbers. • In Section 2.7—Support these simple instructions: load, store, add, subtract, move register-register, and, shift. • In Section 2.9—Compare equal, compare not equal, compare less, branch (with a PC-relative address at least 8 bits long), jump, call, and return. • In Section 2.10—Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size. • In Section 2.11—Provide at least 16 general-purpose registers, and be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set.
A "Typical" RISC (MIPS64) • 32-bit fixed format instruction (3 formats) • 32 64-bit GPR (R0 contains zero) • 32 double precision floating point registers (F0-F31) • reg-reg arithmetic instruction • Single address mode for load/store: base + displacement • no indirection • Simple branch conditions • Delayed branch see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3