Instruction Set Principles

• ISA should reflect application characteristics:
  • Desktop computing is compute-intensive, so it favors features that speed up integer and FP operations.
  • Server computing is data-intensive, so it focuses on integers and character strings (FP operations are still standard on servers).
  • Embedded computing is time-sensitive and memory- and power-conscious, so it focuses on code density, real-time behavior, and media data streams.

• Taxonomy of ISA:
  • Stack: both operands are implicit on the top of the stack, a data structure in which items are accessed in a last-in, first-out fashion.
  • Accumulator: one operand is implicit in the accumulator, a special-purpose register.
  • General-Purpose Register: all operands are explicit, in specified registers or memory locations. Depending on where operands are specified and stored, there are three ISA groups:
    • Register-Memory: one operand in a register and one in memory. Examples: IBM 360/370, Intel 80x86 family, Motorola 68000.
    • Memory-Memory: both operands are in memory. Example: VAX.
    • Register-Register (load-store): all operands, except for those in load and store instructions, are in registers. Examples: SPARC (Sun Microsystems), MIPS, Precision Architecture (HP), PowerPC (IBM), Alpha (DEC).

Instruction Set Principles CA+B Taxonomy of ISA: Examples (a) Stack (e) Memory-Memory (d) Reg-Reg/Load-Store (b) Accumulator (c) Register-Memory TOS Reg. Set Reg. Set Stack Accumulator ALU ALU ALU ALU ALU Memory Memory Memory Memory Memory Push A Push B Add Pop C Load A Add B Store C Load R1,A Add R1,B Store R1,C Load R1,A Load R2,B Add R3,R1,R2 Store R3,C Add C,A,B or Add A,B

• Comparisons

• Addressing Memory: how memory addresses are specified and interpreted matters because all data initially reside in memory.
• Interpreting Memory Addresses
  • All computers, except DSPs, are byte-addressed, providing access to bytes, half-words (2 bytes), words (4 bytes), and double words (8 bytes).
  • Ordering bytes within a larger object (8 bytes in a double word):
    • Little Endian: byte numbering 7 6 5 4 3 2 1 0
    • Big Endian: byte numbering 0 1 2 3 4 5 6 7
  • Byte ordering can be a problem when exchanging data between computers with different ordering conventions.
  • Alignment: an access to an object of size s bytes at byte address A is aligned if A mod s = 0. Memory is typically aligned on a multiple of a word or double-word boundary.
  • Misalignment causes extra memory accesses and HW costs.
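Byte ordering and the alignment rule A mod s = 0 can both be checked in a few lines. The sketch below uses Python's standard `struct` module to lay out one 8-byte double word in each order; the sample value and the helper name `is_aligned` are assumptions chosen for illustration.

```python
import struct

value = 0x0102030405060708  # a 64-bit double word

# "<Q" packs an unsigned 64-bit integer in little-endian order,
# ">Q" packs the same integer in big-endian order.
little = struct.pack("<Q", value)  # least significant byte at the lowest address
big = struct.pack(">Q", value)     # most significant byte at the lowest address


def is_aligned(address, size):
    """An access of `size` bytes at `address` is aligned if address mod size == 0."""
    return address % size == 0


print(little.hex())  # 0807060504030201
print(big.hex())     # 0102030405060708
print(is_aligned(8, 8), is_aligned(12, 8), is_aligned(12, 4))  # True False True
```

Address 12 is aligned for a 4-byte word but not for an 8-byte double word, so the same address can be aligned or misaligned depending on the size of the access.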

• Addressing Modes: how an ISA specifies the address of an object to be accessed (fig. 2.6-2.7)
  • Operands: found in registers, memory locations, and the instructions themselves (instruction stream)
  • Effective Address: the actual memory address specified by the addressing mode when an operand is in memory
  • PC-Relative Addressing: addressing modes that depend on the program counter
  • Immediates/Literals: considered memory addressing modes, even though the value they access is in the instruction stream
  • Displacement Mode: the range of displacements must be chosen judiciously (via quantitative studies, fig. 2.8)
  • Immediate/Literal Mode: must decide the level of support (all operations or a subset) and the range of values (fig. 2.9-2.10)
  • Modulo/Circular Mode for DSPs: handling an infinite, continuous stream of data relies on circular buffers
  • Bit-Reverse Mode: used exclusively for FFTs
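The two DSP-specific modes are easy to picture as index arithmetic. The following is a minimal software sketch of what the address generator computes, assuming a power-of-two FFT size; the function names `circular_next` and `bit_reverse` are hypothetical and do not correspond to any particular DSP's instructions.

```python
def circular_next(index, step, length):
    """Modulo addressing: advance an index and wrap at the end of the buffer."""
    return (index + step) % length


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index`, as an FFT reordering needs."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the current low bit
        index >>= 1
    return result


# Stepping by 3 from position 6 in an 8-entry circular buffer wraps to 1.
print(circular_next(6, 3, 8))

# Access order for an 8-point FFT (3 address bits).
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

In hardware these computations are folded into the addressing mode itself, so the program pays no extra instructions for the wrap-around or the bit shuffling.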

• Type and Size of Operands: in all modern computers the opcode encoding designates the operand type; older machines used tags to indicate types
  • Desktop and Server architectures:
    • Character: 8-bit, usually ASCII
    • 16-bit Unicode: used in Java and gaining popularity
    • Integers: almost universally represented as two's complement binary numbers: short integer (half-word), integer (word), long integer (double-word)
    • Single-precision (1-word) and double-precision (2-word) floating point: IEEE Standard 754
  • Architectures supporting business applications:
    • Packed decimal/binary-coded decimal: 4 bits encode each of the values 0-9, and two decimal digits are packed into each byte, so results exactly match decimal arithmetic (some decimal fractions have no exact representation in binary)
  • Frequency of access to each type helps determine which types are most important to support efficiently (fig. 2.12)
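Packing two decimal digits per byte, and the reason exact decimal results matter, can be illustrated as follows. This is a simplified sketch for non-negative integers only: the helper names are hypothetical, and real packed-decimal formats usually also carry a sign, which is omitted here. Python's standard `decimal` module stands in for exact decimal arithmetic.

```python
from decimal import Decimal


def to_packed_bcd(number):
    """Pack a non-negative integer as BCD, two decimal digits per byte."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])  # high nibble, low nibble
        for i in range(0, len(digits), 2)
    )


def from_packed_bcd(data):
    """Recover the integer from packed BCD bytes."""
    value = 0
    for byte in data:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value


print(to_packed_bcd(1234).hex())  # 1234
print(from_packed_bcd(to_packed_bcd(905)))  # 905

# 0.1, 0.2 and 0.3 have no exact binary representation, so the binary sum misses.
print(0.1 + 0.2 == 0.3)  # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

The hexadecimal dump of the packed bytes reads the same as the decimal number, which is the defining property of the encoding.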

• Operands for Media and Signal Processing:
  • Graphics applications deal with 2D and 3D images
    • Vertex: a data structure with four components, usually 32-bit floating-point values, for representing 3D images: x-coordinate, y-coordinate, z-coordinate, and w-coordinate (color or hidden surfaces)
    • Pixel: consists of four 8-bit channels: R (red), G (green), B (blue), and A (transparency of the surface or pixel)
  • DSPs add a unique data type:
    • Fixed point: the binary point sits just to the right of the sign bit, so a value represents a fraction between -1 and +1
    • Blocked floating point: one exponent is shared among many fixed-point variables. A fixed-point word carries no exponent of its own, so the DSP programmer keeps the exponent in a separate variable and ensures that each result is shifted left or right to stay aligned.
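A fixed-point fraction with the binary point just right of the sign bit can be modelled with plain integers. The sketch below assumes a 16-bit word (1 sign bit and 15 fraction bits, often written Q15); the word size and the helper names are assumptions, since the slide does not fix a width.

```python
FRACTION_BITS = 15          # assumed: 16-bit word, 1 sign bit + 15 fraction bits
SCALE = 1 << FRACTION_BITS  # 32768


def to_fixed(x):
    """Convert a real number in [-1, 1) to its fixed-point integer, clamping at the ends."""
    raw = int(round(x * SCALE))
    return max(-SCALE, min(SCALE - 1, raw))


def from_fixed(raw):
    """Interpret a fixed-point integer as a fraction."""
    return raw / SCALE


def fixed_mul(a, b):
    """Multiply two fractions; the double-width product is shifted back into place.

    Note: (-1) * (-1) would overflow the format; this sketch does not saturate.
    """
    return (a * b) >> FRACTION_BITS


half = to_fixed(0.5)
minus_quarter = to_fixed(-0.25)
print(half, minus_quarter)                        # 16384 -8192
print(from_fixed(fixed_mul(half, minus_quarter)))  # -0.125
```

The shift after the multiply is the kind of alignment work that, in blocked floating point, the programmer also performs by hand across a whole block that shares one exponent.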

• Operations in the Instruction Set (fig. 2.15):
  • Rule of thumb: the most widely executed instructions are the simple operations of an instruction set (fig. 2.16)
• Operations for Media and Signal Processing: lower precision and narrower data widths suffice because of the tolerance of human perception
  • Partitioned add: four 16-bit adds performed on a single 64-bit ALU in a single cycle (SIMD or vector instructions, fig. 2.17)
  • Paired operations: one instruction launches two 32-bit operations on operands found side by side in a double-precision register
  • Saturating arithmetic: because of real-time requirements, DSPs cannot take an exception on overflow; instead, a result that overflows is replaced by the largest representable number
  • Multiply-accumulate (MAC): key to dot-product operations for vector and matrix multiplies (MACs/second is the primary peak performance metric for DSPs)
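The three media operations can be sketched in software to show what the hardware computes. The function names below are hypothetical, the lane layout (four 16-bit lanes in a 64-bit word) follows the partitioned-add bullet, and the saturating add assumes signed 16-bit values and clamps at both ends of the range.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def partitioned_add(x, y):
    """Add four 16-bit lanes packed in 64-bit words; carries never cross lanes."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        a = (x >> shift) & LANE_MASK
        b = (y >> shift) & LANE_MASK
        result |= ((a + b) & LANE_MASK) << shift  # drop the carry out of the lane
    return result


def saturating_add(a, b, bits=16):
    """Signed add that clamps to the representable range instead of wrapping."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    return max(lowest, min(highest, a + b))


def dot_product(xs, ys):
    """One multiply-accumulate per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


print(hex(partitioned_add(0x0001_FFFF_0002_0003, 0x0001_0001_0002_0003)))  # 0x2000000040006
print(saturating_add(30000, 10000))        # 32767
print(dot_product([1, 2, 3], [4, 5, 6]))   # 32
```

In the partitioned add, the lane holding 0xFFFF + 1 wraps to 0 without disturbing its neighbour, which is exactly what distinguishes it from an ordinary 64-bit add.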

• Instructions for Control Flow
  • There are four different types of control flow change (fig. 2.19):
    • Conditional branch: 75% (integer) and 82% (FP)
      • How to specify branch conditions? (fig. 2.21-2.22)
    • Jump (or unconditional branch): 6% (integer) and 10% (FP)
    • Procedure calls and procedure returns: 19% (integer) and 8% (FP)
      • Caller saving vs. callee saving
  • Addressing Modes for Control Flow Instructions:
    • PC-relative: advantageous when the target is near the branch instruction; also has the desirable property of position independence (fig. 2.20)
    • Register indirect jumps: if the target is not known at compile time, PC-relative addressing cannot be used; instead, a location (such as a register) specifies the target dynamically
      • Case or switch statements: in most languages
      • Virtual functions or methods: in OO languages
      • Higher-order functions or function pointers: in C or C++
      • Dynamically shared libraries
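Both addressing styles can be mimicked in a few lines. The sketch below assumes 4-byte instructions and an offset counted in instructions from the instruction after the branch (the MIPS convention; other ISAs differ), and it models a register indirect jump as a lookup in a table of functions. All names are hypothetical.

```python
def branch_target(pc, offset, instruction_bytes=4):
    """PC-relative target: offset is counted in instructions from the next instruction."""
    return pc + instruction_bytes + offset * instruction_bytes


def case_zero():
    return "zero"


def case_one():
    return "one"


def default_case():
    return "other"


# A switch statement compiled as a jump table: the selector picks the target at run time.
JUMP_TABLE = [case_zero, case_one]


def dispatch(selector):
    in_range = 0 <= selector < len(JUMP_TABLE)
    handler = JUMP_TABLE[selector] if in_range else default_case
    return handler()  # the indirect jump


print(hex(branch_target(0x1000, 3)))  # 0x1010
print(dispatch(1), dispatch(7))       # one other
```

Position independence shows up in `branch_target`: loading the same code at a different base address shifts the branch and its target by the same amount, so the encoded offset does not change.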

• Encoding an Instruction Set: there are three choices
  • Variable: allows virtually all addressing modes to be used with all operations, enabling the smallest code representation
    • Examples: VAX and Intel 80x86 (1-5 operands, each with 10 addressing modes)
    • Layout: Operation and # of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
  • Fixed: load-store ISA, with only one memory operand and only one or two addressing modes, so the addressing mode can be encoded as part of the opcode
    • Examples: Alpha, ARM, MIPS, PowerPC, SPARC, SuperH
    • Largest code size
    • Layout: Operation | Address field 1 | Address field 2 | Address field 3
  • Hybrid: IBM 360/370, MIPS16, Thumb, TI TMS320C54x (fig. 2.23)
  • Competing forces: number and size of registers and addressing modes, code size, and pipelining

• The Role of Compilers:
  • The Structure of Recent Compilers: multi-phased (fig. 2.24)
  • Difficulties: earlier phases must make gross assumptions about the abilities of later phases, hence the phase-ordering problem. For instance, a phase cannot guarantee that registers will be allocated where they are most desirable.
    • Example: global common subexpression elimination replaces multiple computations of the same expression with a single computation and a temporary location that stores the value. If this temporary is not allocated to a register, the slow memory accesses may negate the gain from the optimization.
  • Register Allocation: plays a central role in compiler optimization, both in speeding up the code and in making other optimizations useful.
    • Graph coloring (works best with ≥16 general-purpose registers) for simple cases and heuristics for more complicated cases
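Register allocation by graph coloring can be sketched with a simple greedy heuristic. This is an illustration under stated assumptions, not the algorithm of any particular compiler: variables are nodes, an edge joins two variables that are live at the same time, and a variable that cannot get a register is marked as spilled to memory. The function name and the sample interference graph are hypothetical.

```python
def color_graph(interference, num_registers):
    """Greedy coloring: returns (variable -> register number, list of spilled variables)."""
    assignment = {}
    spilled = []
    # Visit the most constrained variables (most neighbours) first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for var in order:
        # Registers already taken by neighbours that are live at the same time.
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)  # no register left: keep this variable in memory
    return assignment, spilled


# Hypothetical interference graph: a, b, c are mutually live; d overlaps only with a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

regs3, spills3 = color_graph(graph, 3)
regs2, spills2 = color_graph(graph, 2)
print(regs3, spills3)
print(regs2, spills2)
```

With three registers every variable fits; with two, one of the mutually live variables has to be spilled, which is the situation in which an optimization that introduces extra temporaries can end up costing more than it saves.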

• Impact of Optimizations on Performance:
  • Major types of optimizations and examples in each class
  • Change in instruction count for the programs lucas and mcf from SPEC2000 as compiler optimization levels vary:
    • Level 0: unoptimized
    • Level 1: local optimizations, code scheduling, and local register allocation
    • Level 2: global optimizations, loop transformations, and global register allocation
    • Level 3: procedure integration

• The Impact of Compiler Technology on the Architect's Decisions:
  • How are variables allocated and addressed?
  • How many registers are needed to allocate variables appropriately?
    • Stack: grows on procedure calls and shrinks on returns; holds activation records; register allocation is most effective for stack-allocated objects
    • Global data area: statically declared objects such as arrays or aggregate data structures; difficult, if not impossible, to allocate to registers if the objects are aliased
    • Heap: dynamic objects, accessed through pointers and typically non-scalar; almost impossible to allocate to registers because of the pointers
  • Because of aliasing, a compiler must be conservative: it cannot know what a pointer may refer to or, conversely, which pointers refer to a given object.

• How the Architect Can Help the Compiler Writer:
  • Guiding principle for the compiler writer: make the frequent cases fast and the rare cases correct.
  • Other guidelines:
    • Regularity: orthogonality (independence among the three components of an ISA: operation, data type, and addressing mode) helps the compiler make decisions early and correctly
    • Provide primitives, not solutions: support for HLLs should not be language dependent
    • Simplify trade-offs among alternatives: help the compiler writer understand the costs of the various alternatives (optimization objectives)
    • Provide instructions that bind the quantities known at compile time as constants
    • It is better to err on the side of simplicity: less is more!

• The MIPS Architecture:
  • MIPS is a simple 64-bit load-store architecture.
  • 32 64-bit general-purpose registers: R0, R1, … R31 (integer registers); the value of R0 is always 0.
  • 32 64-bit floating-point registers: F0, F1, … F31.
  • Data types:
    • 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words for integers
    • 32-bit single precision and 64-bit double precision for floating point
  • Addressing modes:
    • Register; immediate and displacement, each with a 16-bit field
    • Byte-addressable memory, with a mode bit that lets software select either Big Endian or Little Endian
  • Instruction encoding: fixed
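Displacement addressing with a 16-bit field and a hard-wired zero register can be sketched as follows. The register file is modelled as a plain Python list and the helper names are hypothetical; the sign extension of the 16-bit displacement and the 64-bit wrap-around are assumptions consistent with a 64-bit load-store machine.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(registers, base, displacement):
    """Displacement mode: contents of the base register plus the signed 16-bit field."""
    return (registers[base] + sign_extend16(displacement)) & MASK64


registers = [0] * 32     # R0 .. R31; R0 stays 0
registers[2] = 0x1000

print(hex(effective_address(registers, 2, 8)))       # 0x1008
print(hex(effective_address(registers, 2, 0xFFFC)))  # 0xffc  (displacement of -4)
print(hex(effective_address(registers, 0, 0x0200)))  # 0x200  (R0 as base: absolute address)
print(hex(effective_address(registers, 2, 0)))       # 0x1000 (zero displacement: register indirect)
```

Because R0 always reads as 0, using it as the base gives a 16-bit absolute address, and a zero displacement gives register indirect addressing, so one mode covers several.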

• The MIPS Instruction Format:
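As a rough illustration of a fixed 32-bit encoding with a 16-bit immediate field, the sketch below packs the classic MIPS I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) and R-type layout (6-bit opcode, rs, rt, rd, 5-bit shift amount, 6-bit function code). The helper names are hypothetical, the opcode and function values are those of the 32-bit `addi` and `add` instructions, and no range checking is done on the fields.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """I-type: opcode[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """R-type: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_i_type(word):
    """Pull the I-type fields back out of a 32-bit instruction word."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


addi = encode_i_type(opcode=0x08, rs=0, rt=8, immediate=5)   # addi r8, r0, 5
add = encode_r_type(rs=9, rt=10, rd=8, shamt=0, funct=0x20)  # add r8, r9, r10

print(f"{addi:08x}")  # 20080005
print(f"{add:08x}")   # 012a4020
print(decode_i_type(addi))
```

Every field sits at the same bit position in every instruction of a given type, which is what lets the hardware read the register numbers before it has finished decoding the opcode.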

• The MIPS Operations:
  • Load and store instructions
  • ALU instructions
  • Control flow instructions

• MIPS Example

• MIPS/DLX Example