Topic II: Instruction-Set Architecture
• Introduction
• A Case Study: The MIPS Instruction-Set Architecture
\course\cpeg323-05F\Topic2-323.ppt
Reading List
• Slides: Topic2x
• Henn & Patt: Chapter 2
• Other papers as assigned in class or homeworks
The Stored Memory Computer
[Figure: block diagram of INPUT, OUTPUT, CONTROL (sequencer), DATAPATH (arithmetic), and MEMORY]
Five parts of a computer:
• Datapath (channels/changes bits)
• Control (directs operations)
• Memory (places to keep bits)
• Input (get data from outside)
• Output (send data to outside)
Steps in Executing an Instruction
Instruction Fetch: Fetch the next instruction from memory
Instruction Decode: Examine the instruction to determine:
• What operation is performed by the instruction (e.g., addition)
• What operands are required, and where the result goes
Operand Fetch: Fetch the operands
Execution: Perform the operation on the operands
Result Writeback: Write the result to the specified location
Next Instruction: Determine where to get the next instruction
What is Specified in an ISA?
Instruction Decode: How are operations and operands specified?
Operand Fetch: Where can operands be located? How many?
Execution: What operations can be performed? What data types and sizes?
Result Writeback: Where can results be written? How many?
Next Instruction: How can we choose the next instruction?
A Simple ISA: Memory-Memory
• What operations can be performed? Basic arithmetic (for now)
• What data types and sizes? 32-bit integers
• Where can operands and results be located? Memory
• How many operands and results? 2 operands, 1 result
• How are operations and operands specified? OP DEST, SRC1, SRC2
• How can we choose the next instruction? Next in sequence
Memory Model
Think of memory as a large array of n integers, referenced by index (Random Access Memory, or RAM). For instance, M[1] contains the value 3. We can read and write these locations. These are the only locations available to us. All “abstract” locations (such as variables in a C program) must be assigned locations in M.
Address   Contents
0         14
1         3
2         99
...       ...
n - 1     0
Simple Code Translation
Given the C code
A = B + C;
we could decide that variable A uses location 100, B uses 48, and C uses 76. Convert the code above to the following “assembly” code:
ADD M[100], M[48], M[76]
How would we express A = (B + C) * (D + E); ?
Using a Temporary Location
Assume we put A in 100, B in 48, C in 76, D in 20, and E in 32. Now choose an unused memory location (e.g., 84).
ADD M[100], M[48], M[76]     # A = B + C
ADD M[84], M[20], M[32]      # temp = D + E
MUL M[100], M[100], M[84]    # A = A * temp
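The three-instruction sequence above can be sketched as a tiny simulation. This is only an illustration: the `run` helper, the tuple encoding of instructions, and the sample values for B, C, D, and E are assumptions, not part of the slides.

```python
# Minimal memory-memory machine: every operand and result is a memory address.
OPS = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}

def run(program, memory):
    """Execute (op, dest, src1, src2) tuples one after another."""
    for op, dest, src1, src2 in program:
        memory[dest] = OPS[op](memory[src1], memory[src2])
    return memory

# Assumed sample values: B=2, C=3, D=4, E=5. A is at 100, temp at 84.
M = {100: 0, 48: 2, 76: 3, 20: 4, 32: 5, 84: 0}
program = [
    ("ADD", 100, 48, 76),   # A = B + C
    ("ADD", 84, 20, 32),    # temp = D + E
    ("MUL", 100, 100, 84),  # A = A * temp
]
run(program, M)
print(M[100])  # (2 + 3) * (4 + 5) = 45
```

Note that every instruction touches memory three times (two reads, one write), which is the cost the following slides try to reduce.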
Problems with Memory-Memory ISAs
• Main memory is much slower than arithmetic circuits
  - This was as true in 1950 as in 2003!
• It takes a lot of room to specify memory addresses
• Results are often used one or two instructions later
Remember: make the common case fast!
Solution: store temporary or intermediate results in fast memories near the arithmetic units.
Accumulator Machines
An “accumulator” machine keeps a single high-speed buffer (e.g., a set of D latches or flip-flops, one for each data bit) near the arithmetic logic. In the simplest kind, only one operand can be specified; the accumulator is implicit:
“OP operand” means: acc. = acc. OP operand
Example:
LOAD M[48]     # Load B into acc.
ADD M[76]      # Add C to acc. (now has B+C)
STORE M[100]   # Write acc. to A
Accumulator Machine Does A=(B+C)*(D+E)
LOAD M[20]     # Load D into acc.
ADD M[32]      # Add E to acc. (now has D+E)
STORE M[100]   # Write acc. to A (A temporarily holds D+E)
LOAD M[48]     # Load B into acc.
ADD M[76]      # Add C to acc. (now has B+C)
MUL M[100]     # Multiply acc. by A (now has (B+C)*(D+E))
STORE M[100]   # Write (B+C) * (D+E) to A
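A minimal sketch of the same seven-instruction program on a simulated accumulator machine. The `run_accumulator` helper, the (op, address) encoding, and the sample values for B, C, D, and E are assumptions made for illustration.

```python
def run_accumulator(program, memory):
    """Run (op, address) pairs; the accumulator is the implicit operand."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "MUL":
            acc *= memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return acc

# Assumed sample values: B=2, C=3, D=4, E=5. A lives at address 100.
acc_memory = {100: 0, 48: 2, 76: 3, 20: 4, 32: 5}
acc_program = [
    ("LOAD", 20),    # acc = D
    ("ADD", 32),     # acc = D + E
    ("STORE", 100),  # A = D + E (used as a temporary)
    ("LOAD", 48),    # acc = B
    ("ADD", 76),     # acc = B + C
    ("MUL", 100),    # acc = (B + C) * (D + E)
    ("STORE", 100),  # A = (B + C) * (D + E)
]
final_acc = run_accumulator(acc_program, acc_memory)
print(acc_memory[100])  # (2 + 3) * (4 + 5) = 45
```

Each instruction names only one address, but the intermediate value D+E still has to make a round trip through memory.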
Shortcomings of Accumulator Machines
• Still requires storing lots of temporary and intermediate values in memory
• Accumulator only really beneficial for a chain (sequence) of calculations where the result of one is the input to the next.
Still, Accumulator Machines Were Common in Early Computers
• A simple design, and hence popular, especially for
  - Early computers
  - Early microprocessors (4004, 8008)
  - Low-end (cheap) models
• Reason: accumulator logic much more expensive than memory
  - Vacuum tubes vs. core memory
  - D flip-flops vs. DRAM
  - Precious space on processor chip vs. off-chip DRAM
Alternatives to Accumulator Machines
If more hardware resources are available, put more fast storage locations alongside the accumulator:
• Stack machines
• Register machines
  - Special purpose
  - General purpose
Stack Machines
Idea: A pile of fast storage locations with a top and a bottom. An instruction can only get at the top value, or maybe the top two or three values. We can put new values on the top (“push”) or take them off the top (“pop”) but that’s it. We can’t get to locations underneath the top unless we remove everything above.
Position       Contents
top            14
2nd from top   3
3rd from top   99
...            ...
bottom         0
Stack Machine ISA
Basic operations include:
Load: get value from memory and push onto stack
Store: pop value off of stack and put into memory
Arithmetic: pop 1 or 2 values off of stack; push result on stack
Dup: get value at top of stack without removing; push new copy onto stack (why is this useful?)
Stack Machine Does A=(B+C)*(D+E)
Instruction    Stack afterward (top first)
(start)        XXX
LOAD M[20]     (D) XXX
LOAD M[32]     (E) (D) XXX
ADD            (D+E) XXX
LOAD M[48]     (B) (D+E) XXX
(continued next slide)
Stack Machine (cont.)
Instruction    Stack afterward (top first)
LOAD M[76]     (C) (B) (D+E) XXX
ADD            (B+C) (D+E) XXX
MULT           ((B+C)*(D+E)) XXX
STORE M[100]   XXX
Note that the stack is now the same as when we began.
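The stack trace can be reproduced with a short simulation. This is a sketch under assumptions: the `run_stack` helper, the tuple encoding, and the sample values are illustrative, and the string "XXX" stands in for whatever was on the stack before the sequence started.

```python
def run_stack(program, memory, stack):
    """Run stack-machine instructions; the top of the stack is the end of the list."""
    for instr in program:
        op = instr[0]
        if op == "LOAD":        # push a value from memory
            stack.append(memory[instr[1]])
        elif op == "STORE":     # pop the top value into memory
            memory[instr[1]] = stack.pop()
        elif op == "ADD":       # pop two values, push their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MULT":      # pop two values, push their product
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "DUP":       # push a copy of the top value
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown operation: {op}")
    return stack

# Assumed sample values: B=2, C=3, D=4, E=5. A lives at address 100.
stack_memory = {100: 0, 48: 2, 76: 3, 20: 4, 32: 5, 84: 0}
stack = ["XXX"]  # placeholder for prior stack contents
stack_program = [
    ("LOAD", 20), ("LOAD", 32), ("ADD",),   # D + E
    ("LOAD", 48), ("LOAD", 76), ("ADD",),   # B + C
    ("MULT",), ("STORE", 100),              # A = (B + C) * (D + E)
]
run_stack(stack_program, stack_memory, stack)
print(stack_memory[100], stack)  # 45 ['XXX']

# One use for DUP: squaring a value without loading it twice (B * B into address 84).
run_stack([("LOAD", 48), ("DUP",), ("MULT",), ("STORE", 84)], stack_memory, stack)
print(stack_memory[84])  # 2 * 2 = 4
```

Unlike the accumulator version, no temporary memory location is needed: D+E waits on the stack while B+C is computed.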
Where Stack Machines Are Used
• Some early computers
• x87 floating-point unit of the 8086 family (sort of…)
• Java Virtual Machine (JVM)
Register Machines
Idea: Put more storage locations (“registers”) near the accumulator
• Regs have names/numbers and can be used instead of memory
• Accessed much faster than main memory
  - (1-2 CPU cycles vs. roughly tens to 100 cycles)
• Far fewer registers than memory locations
  - MIPS has 32 32-bit registers
  - Fewer regs, smaller addresses, fewer bits to name them
  - A scarce resource – use them carefully!
Special- vs. General-Purpose Registers
• A special-purpose register is used for specific purposes and there may be limitations on which operations can use it
  - Easier on the HW design: put the reg right where it’s needed
  - More difficult for the compiler to use effectively
• A general-purpose register can be used in any operation
  - Datapaths more general, hence routing is more difficult
Special-Purpose Registers: The Z-80 CPU
• Seven 8-bit registers: A, B, C, D, E, H, L (BC, DE, HL can be pairs)
• Three 16-bit registers: SP, IX, IY, plus PC (program counter)
• Add, subtract, shift can only be done to A (8-bit accumulator)
• Increment and decrement can be done to all regs and reg pairs
• Can fetch from memory at address (HL) and put in any 8-bit reg
• A fetch from address (BC) or (DE) can only go to A
• Fetches from (BC), (HL) and (IX) take different numbers of cycles
Anyone want to write a compiler for this?
General Purpose Register (GPR) Machines
The MIPS (and similar processors) has 32 General Purpose Registers (GPRs), each 32 bits long. All can be read or written, except register 0, which is always 0 and can’t be changed. Register access time is uniform.
Register   Contents
$0         0
$1         3
$2         99
...        ...
$31        14
GPR Machine Does A=(B+C)*(D+E)
ADD $1, M[48], M[76]     # $1 = B + C
ADD $2, M[20], M[32]     # $2 = D + E
MUL M[100], $1, $2       # A = $1 * $2
Some Trends
• From hardware technology: the number of registers that can be put on a chip has the potential to grow very fast (Moore’s Law?)
• A very large register set will have a slow access time.
• Instruction sets evolve slowly, so they are slow to accommodate changes in the number of registers
Memory and Data Sizes
So far, we’ve only talked about uniform data sizes. Actual data come in many different sizes:
• Single bits: (“boolean” values, true or false)
• Bytes (8 bits): Characters (ASCII), very small integers
• Halfwords (16 bits): Characters (Unicode), short integers
• Words (32 bits): Long integers, floating-point (FP) numbers
• Double-words (64 bits): Very long integers, double-precision FP
• Quad-words (128 bits): Quad-precision floating-point numbers
Different Data Sizes
How do we handle different data sizes?
• Pick one size to be the unit stored in a single address
• Store a larger datum in a set of contiguous memory locations
• Store a smaller datum in one location; use shift & mask ops
Today, almost all machines (including MIPS) are “byte-addressable”: each addressable location in memory holds 8 bits.
MIPS Memory
On a byte-addressable machine such as the MIPS, if we say a word (32 bits) is stored “at” address 80, we mean it occupies locations 80-83. (The next word would start at 84.)
Normally, multi-byte loads and stores must be “aligned.” The address of an n-byte load/store must be a multiple of n. For instance, halfwords can only be stored at even addresses.
MIPS allows non-aligned loads and stores using special instructions, but they may be slower. (Most processors don’t allow this at all!)
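The alignment rule is just a divisibility check. A small sketch, where the helper names `is_aligned` and `bytes_occupied` are hypothetical:

```python
def is_aligned(address, size):
    """An n-byte access is aligned when its address is a multiple of n."""
    return address % size == 0

def bytes_occupied(address, size):
    """Byte addresses covered by a datum of `size` bytes stored at `address`."""
    return list(range(address, address + size))

print(bytes_occupied(80, 4))  # [80, 81, 82, 83]
print(is_aligned(80, 4))      # True: 80 is a multiple of 4
print(is_aligned(82, 4))      # False: a word cannot start at 82
print(is_aligned(82, 2))      # True: halfwords may start at any even address
```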
Byte-Order (“Endianness”)
• For a multi-byte datum, which part goes in which byte?
• If $1 contains 1,000,000 (F4240H) and we store it into address 80:
  - On a “big-endian” machine, the “big” end goes into address 80
  - On a “little-endian” machine, it’s the other way around
Address         79   80   81   82   83   84
Big-endian      …    00   0F   42   40   …
Little-endian   …    40   42   0F   00   …
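The two layouts can be checked with Python's built-in `int.to_bytes`. This is only a quick illustration of the byte orders, not a model of any particular machine; the variable names are arbitrary.

```python
value = 1_000_000  # 0x000F4240 as a 32-bit word

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

# Pair each byte with the address it would occupy if stored at address 80.
for name, layout in (("big-endian", big), ("little-endian", little)):
    cells = {80 + offset: f"{byte:02X}" for offset, byte in enumerate(layout)}
    print(name, cells)
# big-endian {80: '00', 81: '0F', 82: '42', 83: '40'}
# little-endian {80: '40', 81: '42', 82: '0F', 83: '00'}

# Reading the bytes back with the wrong byte order gives a different number.
print(int.from_bytes(big, byteorder="little"))  # 1078071040, not 1000000
```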
Big-Endian vs. Little-Endian
• Big-endian machines: MIPS, Sparc, 68000
• Little-endian machines: most Intel processors, Alpha, VAX, Intel 8086
• No real reason one is better than the other…
• Compatibility problems transferring multi-byte data between big-endian and little-endian machines – CAREFUL!
[Read Appendix A-43 for more information.]
Addressing Modes
• An ISA’s addressing modes answer the question: “where can operands be located?”
• We have two types of storage in the MIPS (and most other machines): registers and main memory.
• We can go to either or both for operands. A single operand can come from either a register or a memory location, and addressing modes offer various ways of specifying this location.
Simple Addressing Modes
In these modes, a location or datum is given directly in the instruction:
Indirect Addressing Modes
One or more registers are used to produce a memory address:
Advanced Addressing Modes
Extra features to support features in high-level languages or reduce the number of instructions during common memory accesses:
Choices in Addressing Modes
Anything goes: Any addressing mode may be used for any operand at any time
- Easier to map high-level statements directly to instructions
- Hard to design processor, due to all the complexity
Limited addressing: Only allow a few modes, and/or restrict some operands to certain modes
- Harder for compiler/programmer to follow all the rules
- Code may be longer
Frequency of Addressing Modes
3 programs measured on VAX, which supports all kinds of modes:
[Table: mode name vs. frequency of mode (%), with min., avg., and max. columns]
Empirical Data on Addressing Modes
• How big do the displacements need to be?
  - In a study of SPECint92 and SPECfp92, 99% of displacements fell within ±2^15
• How big do the immediates (constants) need to be?
  - Studies show: 50%-60% fit within 8 bits
  - 75%-80% fit within 16 bits
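A displacement within ±2^15 is one that fits in a 16-bit two's-complement field. A minimal sketch of that check, assuming a signed encoding; the helper name `fits_signed` is hypothetical:

```python
def fits_signed(value, bits):
    """True if `value` is representable as a `bits`-wide two's-complement integer."""
    low = -(1 << (bits - 1))        # e.g. -32768 for 16 bits
    high = (1 << (bits - 1)) - 1    # e.g.  32767 for 16 bits
    return low <= value <= high

print(fits_signed(32767, 16))   # True: largest 16-bit displacement
print(fits_signed(-32768, 16))  # True: smallest 16-bit displacement
print(fits_signed(32768, 16))   # False: needs 17 bits
print(fits_signed(100, 8))      # True: small constants fit in 8 bits
print(fits_signed(200, 8))      # False: the signed 8-bit range ends at 127
```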
How Do We Represent Instructions?
• We need some bits to tell what operation is performed (e.g., add, sub, mul, etc.); this is called the opcode.
• We need some bits for each operand and result (3 total, in our case):
  - What type of addressing mode
  - Number of the register, memory address and/or immediate constant
Variable-Length Instructions
Since the VAX allows any mode for any operand, there could be an instruction with three 32-bit addresses (direct addressing): 3 × 4 = 12 bytes of addresses in this one instruction, before counting the opcode.
But registers need only a few bits to specify, so 12 bytes would be wasteful for an instruction using 3 registers only!
Must use variable-length instructions. On the VAX, instructions can vary from 1 to 17 bytes!
Fixed-Length Instructions
If every instruction has the same number of bits (preferably a nice even number like 16 or 32), many components of the processor will be simpler.
But we either waste some space or can’t support all the addressing modes!
Loading Small Integers
• All registers in MIPS are 32 bits
• What if we load a byte or halfword into a reg?
• Load the bits into the lowest 8 or 16 bits of the reg.
Unsigned load: All upper bits set to 0
Signed load: All upper bits set to sign bit (MSB of byte/halfword)
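The difference between the two kinds of load can be sketched with bit operations. The helper names `zero_extend` and `sign_extend` are hypothetical, and results are shown as 32-bit register patterns:

```python
REG_MASK = 0xFFFFFFFF  # a 32-bit register

def zero_extend(value, bits):
    """Unsigned load: keep the low `bits` bits, upper bits become 0."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Signed load: copy the MSB of the byte/halfword into all upper bits."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # Flipping then subtracting the sign bit yields the signed value;
    # masking converts it back to a 32-bit register pattern.
    return ((value ^ sign_bit) - sign_bit) & REG_MASK

print(f"{zero_extend(0x80, 8):08X}")     # 00000080
print(f"{sign_extend(0x80, 8):08X}")     # FFFFFF80
print(f"{sign_extend(0x7F, 8):08X}")     # 0000007F (sign bit is 0)
print(f"{sign_extend(0x8000, 16):08X}")  # FFFF8000
```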
The RISC Approach
In a Reduced Instruction Set Computer:
• All instructions are the same size (32 bits on the MIPS)
• Few addressing modes are supported (only the frequent ones)
• Only a few instruction formats (makes decoding easier!)
• Arithmetic instructions can only work on registers
• Data in memory must be loaded into registers before processing
  - This is called a “load-store” architecture
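To make the load-store idea concrete, here is a sketch of A=(B+C)*(D+E) on a simulated load-store machine. The mnemonics (LW, SW, ADD, MUL), the `run_load_store` helper, the register assignments, and the sample values are simplified assumptions for illustration, not actual MIPS encodings.

```python
def run_load_store(program, memory, num_regs=32):
    """Only LW/SW touch memory; arithmetic works on registers. Register 0 stays 0."""
    regs = [0] * num_regs
    mem_accesses = 0
    for instr in program:
        op = instr[0]
        if op == "LW":                       # load word: reg <- memory
            _, rd, addr = instr
            mem_accesses += 1
            if rd != 0:
                regs[rd] = memory[addr]
        elif op == "SW":                     # store word: memory <- reg
            _, rs, addr = instr
            mem_accesses += 1
            memory[addr] = regs[rs]
        elif op in ("ADD", "MUL"):           # register-to-register arithmetic
            _, rd, rs, rt = instr
            result = regs[rs] + regs[rt] if op == "ADD" else regs[rs] * regs[rt]
            if rd != 0:
                regs[rd] = result
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs, mem_accesses

# Assumed sample values: B=2, C=3, D=4, E=5. A lives at address 100.
ls_memory = {100: 0, 48: 2, 76: 3, 20: 4, 32: 5}
ls_program = [
    ("LW", 1, 48), ("LW", 2, 76), ("ADD", 5, 1, 2),  # $5 = B + C
    ("LW", 3, 20), ("LW", 4, 32), ("ADD", 6, 3, 4),  # $6 = D + E
    ("MUL", 7, 5, 6),                                # $7 = $5 * $6
    ("SW", 7, 100),                                  # A = $7
]
ls_regs, ls_accesses = run_load_store(ls_program, ls_memory)
print(ls_memory[100], ls_accesses)  # 45 5
```

The program is longer (eight instructions), but it makes only five memory accesses, and both intermediate sums stay in registers.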
RISC Criteria [Colwell 85]
• Single cycle operation
• Load/store machine
• Hardwired control
• Relatively few instructions and addressing modes
• Fixed instruction format
• More compile time effort