Princess Sumaya University for Technology. Computer Architecture. Dr. Esam Al_Qaralleh. Instruction Set Architecture (ISA).
Outline
• Introduction
• Classifying instruction set architectures
• Instruction set measurements
• Memory addressing
• Addressing modes for signal processing
• Type and size of operands
• Operations in the instruction set
• Operations for media and signal processing
• Instructions for control flow
• Encoding an instruction set
• MIPS architecture
Basic Issues in Instruction Set Design
• What operations, and how many?
  • Load/store/increment/branch are sufficient to do any computation, but not practical (programs become too long).
• How (and how many) operands are specified?
  • Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← B).
• How are they encoded into an instruction format?
  • Instruction lengths should be multiples of bytes.
• Typical instruction set
  • 32-bit word
  • Basic operand addresses are 32 bits long.
  • Basic operands (such as integers) are 32 bits long.
  • In general, an instruction can refer to 3 operands (A ← B + C).
• Challenge: encode operations in a small number of bits.
Brief Introduction to ISA
[Figure: example 32-bit instruction format: opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)]
• Instruction Set Architecture: a set of instructions
  • Each instruction is directly executed by the CPU's hardware.
• How is it represented?
  • In a binary format, since the hardware understands only bits
• Options: fixed- or variable-length formats
  • Fixed: each instruction is encoded in a field of the same size (typically 1 word)
  • Variable: half-word, whole-word, and multiple-word instructions are possible
What Must Be Specified?
• Instruction format (encoding)
  • How is it decoded?
• Location of operands and result
  • Where, other than memory?
  • How many explicit operands?
  • How are memory operands located?
• Data type and size
• Operations
  • Which are supported?
Example of Program Execution
• Opcodes
  • 1: Load AC from memory
  • 2: Store AC to memory
  • 5: Add to AC from memory
• Program: add the contents of memory location 940 to the contents of location 941 and store the result at 941.
[Figure: fetch and execute steps for each instruction of the program]
Instruction Set Design The instruction set influences everything
Instruction Characteristics
• Usually a simple operation
  • The operation is identified by the op-code field.
• But operations require operands: 0, 1, or 2
  • To identify where they are, they must be addressed.
  • An address refers to some piece of storage.
  • Typical storage options are main memory, registers, or a stack.
• 2 options: explicit or implicit addressing
  • Implicit: the op-code implies the address of the operands.
    • ADD on a stack machine pops the top 2 elements of the stack, then pushes the result.
    • HP calculators work this way.
  • Explicit: the address is specified in some field of the instruction.
    • Note the potential for 3 addresses: 2 operands + the destination.
Classifying Instruction Set Architectures
• Based on CPU internal storage options AND the number of operands
• These choices critically affect the instruction count, CPI, and cycle time.
Code sequence for C = A + B in four ISA classes
• Stack: Push A; Push B; Add (pops the top 2 values of the stack, A and B, and pushes the result onto the stack); Pop C
• Accumulator (AC): Load A; Add B (adds AC, which holds A, to B and stores the result into AC); Store C
• Register (register-memory): Load R1, A; Add R3, R1, B; Store R3, C
• Register (load-store): Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C
Modern Choice – Load-Store Register (GPR) Architecture
• Reasons for choosing a GPR (general-purpose register) architecture
  • Registers (like other CPU-internal storage such as stacks and accumulators) are faster than memory.
  • Registers are easier and more effective for a compiler to use.
    • In (A+B) – (C*D) – (E*F), the subexpressions may be evaluated in any order (e.g., for pipelining reasons).
    • But a stack machine must evaluate them left to right.
  • Registers can be used to hold variables:
    • Reduce memory traffic
    • Speed up programs
    • Improve code density (fewer bits are needed to name a register than a memory address)
• Compiler writers prefer that all registers be equivalent and unreserved.
• Number of GPRs: at least 16
Characteristics That Divide GPR Architectures
• Number of operands
  • Three-operand: 1 result and 2 source operands
  • Two-operand: 1 operand is both source and result, plus 1 source
• How many operands are memory addresses
  • 0 to 3 (two sources + 1 result)
• Resulting classes: load-store, register-memory, memory-memory
Pros and Cons of the Three Most Common GPR Computers
(m, n) = (number of memory operands, total number of operands)
• Register-register (0,3)
  + Simple, fixed-length instruction encoding
  + Simple code-generation model
  + Instructions take a similar number of clocks to execute
  - Higher instruction count
• Memory-memory (3,3)
  + Most compact
  - Instruction size varies
  - Memory access bottleneck
• Register-memory (1,2)
  + Data can be accessed without loading it first
  + Easy to encode, with good code density
  - One operand is destroyed
  - Limited number of registers
Memory Addressing Basics
• All architectures must address memory.
  • What is accessed: a byte, a word, multiple words?
  • Today's machines are byte addressable.
  • Main memory is organized in 32–64 byte lines.
  • Big-endian or little-endian addressing
• Hence there is a natural alignment problem.
  • An access of size s bytes at byte address A is aligned if A mod s = 0.
  • A misaligned access takes multiple aligned memory references.
• The memory addressing mode influences the instruction count (IC) and clock cycles per instruction (CPI).
Byte Ordering
• Idea
  • Bytes in a long word are numbered 0 to 3.
  • Which is most (least) significant?
  • Can cause problems when exchanging binary data between machines
• Big endian: byte 0 is most significant, byte 3 is least
  • IBM 360/370, Motorola 68K, SPARC
• Little endian: byte 0 is least significant, byte 3 is most
  • Intel x86, VAX
• Alpha
  • The chip can be configured to operate either way.
  • DEC workstations are little endian.
  • Cray T3E Alphas are big endian.
Byte Ordering Example
union {
    unsigned char c[8];
    unsigned short s[4];
    unsigned int i[2];
    unsigned long l[1];
} dw;
[Figure: the 8 bytes of dw overlaid by c[0]–c[7] (1 byte each), s[0]–s[3] (2 bytes each), i[0]–i[1] (4 bytes each), and l[0]]
Byte Ordering on Alpha (Little Endian)
[Figure: bytes f0–f7 at increasing addresses; in every element of c, s, i, and l the LSB is at the lowest address]
Print output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]
Byte Ordering on x86 (Little Endian)
[Figure: bytes f0–f7 at increasing addresses; in every element of c, s, i, and l the LSB is at the lowest address]
Print output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]
Byte Ordering on Sun (Big Endian)
[Figure: bytes f0–f7 at increasing addresses; in every element of c, s, i, and l the MSB is at the lowest address]
Print output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
Addressing Modes
• Immediate: Add R4, #3
  Regs[R4] ← Regs[R4] + 3
• Register: Add R4, R3
  Regs[R4] ← Regs[R4] + Regs[R3]
• Register indirect: Add R4, (R1)
  Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Addressing Modes (Cont.)
• Direct: Add R4, (1001)
  Regs[R4] ← Regs[R4] + Mem[1001]
• Memory indirect: Add R4, @(R3)
  Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]
Addressing Modes (Cont.)
• Displacement: Add R4, 100(R1)
  Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
• Scaled: Add R1, 100(R2)[R3]
  Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] * d]
  (d is the scale factor, typically the operand size in bytes)
Use of Memory Addressing Modes (Figure 2.7)
• Measured on a VAX, which supported all of these modes
• Register mode (about 50% of all operand references) is not counted.
Displacement Address Size
• Average of 5 programs from SPECint92 and SPECfp92
• Only 1% of displacements need more than 16 bits.
[Figure: distribution of displacement sizes, integer average vs. FP average]
Immediate Addressing Mode
• Measured on 10 programs from SPECint92 and SPECfp92
Immediate Addressing Mode
• 50% to 60% of immediates fit within 8 bits.
• 75% to 80% fit within 16 bits.
[Figure: distribution of immediate sizes for gcc, spice, and TeX]
Short Summary – Memory Addressing
• Need to support at least three addressing modes
  • Displacement, immediate, and register indirect (register deferred), plus register mode
  • They represent 75%–99% of the addressing modes used in the benchmarks.
• The displacement field should be at least 12–16 bits (covering 75%–99% of displacements).
• The immediate field should be at least 8–16 bits (covering 50%–80% of immediates).
Operand Type & Size
Typical types (assuming word = 32 bits):
• Character: 1 byte, ASCII or EBCDIC (IBM), 4 per word
• Short integer: 2 bytes, two's complement
• Integer: one word, two's complement
• Float: one word, usually IEEE 754 these days
• Double-precision float: 2 words, IEEE 754
• BCD or packed decimal: 4-bit values packed 8 per word
Short Summary – Type and Size of Operands
• The future: as we move to 64-bit machines
  • Larger offsets, immediates, etc. are likely.
  • Usage of 64- and 128-bit values will increase.
• DSPs need accumulating registers wider than the operand size in memory, to aid accuracy in fixed-point arithmetic.
What Operations Are Needed?
• Arithmetic and logical
  • Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
  • Logical operations: AND, OR, XOR, NOT
• Data transfer: copy, load, store
• Control: branch, jump, call, return, trap
• System: OS and memory management
  • We'll ignore these for now, but remember they are needed.
• Floating point
  • Same as arithmetic, but usually with bigger operands
• Decimal
• String: move, compare, search
• Graphics: pixel and vertex operations, compression/decompression
Top 10 Instructions for 80x86 (share of instructions executed)
• load: 22%
• conditional branch: 20%
• compare: 16%
• store: 12%
• add: 8%
• and: 6%
• sub: 5%
• move register-register: 4%
• call: 1%
• return: 1%
The most widely executed instructions are the simple operations of an instruction set. The top 10 instructions for the 80x86 account for 96% of instructions executed. Make them fast, as they are the common case.
Control Instructions Are a Big Deal
• Jumps: unconditional transfer
• Conditional branches
  • How is the condition code set? By a flag, or as part of the instruction
  • How is the target specified? How far away is it?
• Calls
  • How is the target specified? How far away is it?
  • Where is the return address kept?
  • How are the arguments passed? Callee save vs. caller save
• Returns
  • Where is the return address? How far away is it?
  • How are the results passed?
Breakdown of Control Flows
• Calls/returns: integer 19%, FP 8%
• Jumps: integer 6%, FP 10%
• Conditional branches: integer 75%, FP 82%
Branch Address Specification
• For unconditional and conditional branches the target is known at compile time, hence it can be specified in the instruction:
  • as a register containing the target address, or
  • as a PC-relative offset.
• Consider word-length addresses, registers, and instructions:
  • Full address desired? Then pick the register option.
  • BUT setup and effective-address calculation will take longer.
  • If you can live with a smaller offset, PC-relative addressing works.
  • PC-relative code is also position independent, which simplifies the linker's job.
Returns and Indirect Jumps
• The branch target is not known at compile time.
• Need a way to specify the target dynamically
  • Use a register
  • Permit any addressing mode, e.g., register indirect as in Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
• Also useful for
  • case or switch statements
  • dynamically shared libraries
  • higher-order functions or function pointers
Branch Stats - 90% are PC Relative • Call/Return • TeX = 16%, Spice = 13%, GCC = 10% • Jump • TeX = 18%, Spice = 12%, GCC = 12% • Conditional • TeX = 66%, Spice = 75%, GCC = 78%
Condition Testing Options (PSW: program status word)
What Kinds of Compares Do Branches Use?
• A large fraction of comparisons are against zero.
Direction, Frequency, and Real Change
Key points:
• 75% of branches are forward branches.
• Most backward branches are loop branches, taken about 90% of the time.
• Branch statistics are both compiler and application dependent.
• Any loop optimization may have a large effect.
Short Summary – Operations in the Instruction Set
• Branch addressing should be able to reach about 100+ instructions either above or below the branch.
  • This implies a PC-relative branch displacement of at least 8 bits.
• Register-indirect and PC-relative addressing for jump instructions, to support returns as well as many other features of current systems (dynamic allocations)