Computer Architecture

Chapter 2: Instruction Sets
Prof. Jerry Breecher
CSCI 240, Fall 2001

Outline: Chap. 2 - Instruction Sets
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Bonus: RISC versus CISC

Introduction
(Figure: software / instruction set / hardware, with the instruction set as the interface between the two.)
The Instruction Set Architecture is the portion of the machine visible to the assembly-level programmer or to the compiler writer.
• What are the advantages and disadvantages of various instruction set alternatives?
• How do languages and compilers affect the ISA?
• Use the DLX architecture as an example of a RISC architecture.

Classifying Instruction Set Architectures
Classifications can be by:
• Stack/accumulator/register
• Number of memory operands
• Number of total operands

Basic ISA Classes
Accumulator:
   1 address      add A        acc ← acc + mem[A]
   1+x address    addx A       acc ← acc + mem[A + x]
Stack:
   0 address      add          tos ← tos + next
General Purpose Register:
   2 address      add A B      EA(A) ← EA(A) + EA(B)
   3 address      add A B C    EA(A) ← EA(B) + EA(C)
   ALU instructions can have two or three operands.
Load/Store:
   0 memory       load R1, Mem1
                  load R2, Mem2
                  add R1, R2
   1 memory       add R1, Mem2
   ALU instructions can have 0, 1, 2, or 3 memory operands; shown here are the cases of 0 and 1.

Basic ISA Classes
The differences among the address classes are easiest to see in the examples here (figure not reproduced), all of which implement the sequence for C = A + B. General-purpose registers are the class that won out, and in general, the more registers on the CPU, the better.
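Since the figure with the four C = A + B sequences is missing, here is a toy C simulation that mimics each class. The instruction sequences in the comments follow the usual textbook pattern (Push/Add/Pop for the stack, Load/Add/Store for the accumulator, and so on); the function names, the three-word memory layout, and the register numbers are hypothetical.

```c
/* Toy simulation of C = A + B on four ISA classes.
   Memory layout and register numbers are hypothetical. */
enum { ADDR_A = 0, ADDR_B = 1, ADDR_C = 2 };

/* Stack: Push A; Push B; Add; Pop C */
int stack_c_eq_a_plus_b(int mem[3]) {
    int stack[4];
    int tos = -1;                                  /* index of top of stack */
    stack[++tos] = mem[ADDR_A];                    /* Push A */
    stack[++tos] = mem[ADDR_B];                    /* Push B */
    stack[tos - 1] = stack[tos - 1] + stack[tos];  /* Add: replace top two with their sum */
    tos--;
    mem[ADDR_C] = stack[tos--];                    /* Pop C */
    return mem[ADDR_C];
}

/* Accumulator: Load A; Add B; Store C */
int accumulator_c_eq_a_plus_b(int mem[3]) {
    int acc = mem[ADDR_A];       /* Load A */
    acc = acc + mem[ADDR_B];     /* Add B: acc <- acc + mem[B] */
    mem[ADDR_C] = acc;           /* Store C */
    return mem[ADDR_C];
}

/* Register-memory: Load R1, A; Add R3, R1, B; Store R3, C */
int reg_mem_c_eq_a_plus_b(int mem[3]) {
    int r[4] = {0};
    r[1] = mem[ADDR_A];          /* Load R1, A */
    r[3] = r[1] + mem[ADDR_B];   /* Add R3, R1, B: one operand comes from memory */
    mem[ADDR_C] = r[3];          /* Store R3, C */
    return mem[ADDR_C];
}

/* Load-store: Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C */
int load_store_c_eq_a_plus_b(int mem[3]) {
    int r[4] = {0};
    r[1] = mem[ADDR_A];          /* Load R1, A */
    r[2] = mem[ADDR_B];          /* Load R2, B */
    r[3] = r[1] + r[2];          /* Add R3, R1, R2: ALU operands are registers only */
    mem[ADDR_C] = r[3];          /* Store R3, C */
    return mem[ADDR_C];
}
```

The load-store version uses the most instructions, but every ALU operation touches only registers, which is what makes it easy to pipeline.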

Intel 80x86 Integer Registers
(Figure: the 80x86 integer register set, not reproduced.)
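One feature of the 80x86 register set worth seeing concretely is that the registers overlap: the low 16 bits of EAX are AX, and AX is split into AH (high byte) and AL (low byte); EBX, ECX, and EDX follow the same pattern, while ESI, EDI, EBP, and ESP have only 16-bit halves (SI, DI, BP, SP). A minimal C sketch of those views using shifts and masks (the helper names are hypothetical):

```c
#include <stdint.h>

/* Views of the 32-bit EAX register: AX = bits 15..0, AH = bits 15..8, AL = bits 7..0. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL changes only bits 7..0 of EAX; the upper 24 bits are preserved. */
uint32_t write_al(uint32_t eax, uint8_t al) { return (eax & ~0xFFu) | al; }
```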

Memory Addressing
Sections include:
• Interpreting Memory Addresses
• Addressing Modes
• Displacement Address Mode
• Immediate Address Mode

Interpreting Memory Addresses
What object is accessed as a function of the address and length?
• Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.
• Little Endian: puts the byte whose address is xx00 at the least significant position in the word.
• Big Endian: puts the byte whose address is xx00 at the most significant position in the word.
• Alignment: an object of size s bytes is aligned if its address is a multiple of s. On many architectures, a misaligned access causes an alignment fault that must be handled by the operating system; others, such as the 80x86, allow misaligned accesses at some cost in performance.
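The byte-order and alignment rules can be written down directly. This C sketch (helper names are hypothetical) lays out a 32-bit word in a four-byte buffer where bytes[0] plays the role of address xx00, and checks alignment for power-of-two sizes. It computes the layouts explicitly, so the result does not depend on the host's own byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Little endian: the byte at xx00 (bytes[0]) holds the least significant byte. */
void store_little_endian(uint32_t w, uint8_t bytes[4]) {
    for (int i = 0; i < 4; i++)
        bytes[i] = (uint8_t)(w >> (8 * i));
}

/* Big endian: the byte at xx00 (bytes[0]) holds the most significant byte. */
void store_big_endian(uint32_t w, uint8_t bytes[4]) {
    for (int i = 0; i < 4; i++)
        bytes[i] = (uint8_t)(w >> (8 * (3 - i)));
}

/* An access of `size` bytes (a power of two) is aligned iff addr is a multiple of size. */
bool is_aligned(uintptr_t addr, uintptr_t size) {
    return (addr & (size - 1)) == 0;
}
```

For example, 0x01020304 is stored as 04 03 02 01 little endian and 01 02 03 04 big endian; a 4-byte word at address 6 is misaligned, while a 2-byte half word there is aligned.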

Addressing Modes
This table (not reproduced) shows the most common modes. A more complete set is in Figure 2.5.
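Since the table is missing, here is a sketch of how each common mode finds its operand, written as C functions over a toy machine. The mode names and assembly-style examples in the comments are the standard textbook ones; the Machine struct, its sizes, and the function names are hypothetical. Because this toy memory is indexed by word rather than by byte, the step d in autoincrement, autodecrement, and scaled modes is counted in words.

```c
#include <stdint.h>

/* Toy machine: word-indexed memory and 8 registers (sizes are hypothetical). */
typedef struct {
    int32_t reg[8];
    int32_t mem[64];
} Machine;

/* Each function returns the source operand that "Add R4, <operand>" would add. */
int32_t op_register(const Machine *m, int r)          { return m->reg[r]; }                       /* R3        */
int32_t op_immediate(int32_t imm)                     { return imm; }                             /* #3        */
int32_t op_displacement(const Machine *m, int32_t disp, int r)
                                                      { return m->mem[disp + m->reg[r]]; }        /* 100(R1)   */
int32_t op_register_indirect(const Machine *m, int r) { return m->mem[m->reg[r]]; }               /* (R1)      */
int32_t op_indexed(const Machine *m, int rb, int ri)  { return m->mem[m->reg[rb] + m->reg[ri]]; } /* (R1+R2)   */
int32_t op_direct(const Machine *m, int32_t addr)     { return m->mem[addr]; }                    /* (1001)    */
int32_t op_memory_indirect(const Machine *m, int r)   { return m->mem[m->mem[m->reg[r]]]; }       /* @(R3)     */

/* (R2)+ : use the operand, then advance the register by d. */
int32_t op_autoincrement(Machine *m, int r, int32_t d) {
    int32_t v = m->mem[m->reg[r]];
    m->reg[r] += d;
    return v;
}

/* -(R2) : move the register back by d, then use the operand. */
int32_t op_autodecrement(Machine *m, int r, int32_t d) {
    m->reg[r] -= d;
    return m->mem[m->reg[r]];
}

/* 100(R2)[R3] : displacement + base + index scaled by element size d. */
int32_t op_scaled(const Machine *m, int32_t disp, int rb, int ri, int32_t d) {
    return m->mem[disp + m->reg[rb] + m->reg[ri] * d];
}
```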

Displacement Addressing Mode
How big should the displacement be?
For addresses that fit in the displacement field (on DLX, R0 is always 0, so this acts as an absolute address):
   Add R4, 10000 (R0)
For addresses that don't fit in the displacement field, the compiler must do the following:
   Load R1, address
   Add R4, 0 (R1)
How big the field should be depends on the typical displacements programs use. DLX allocates 16 bits for the displacement; IA-32 allows either an 8-bit or a 32-bit displacement.
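The compiler's choice between the one- and two-instruction sequences comes down to whether the address fits in the signed displacement field. A minimal sketch, assuming a 16-bit signed field as in DLX and emitting the slide's pseudo-assembly; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True if `value` fits in a signed two's complement field of `bits` bits (1..63). */
bool fits_signed(int64_t value, int bits) {
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    return value >= min && value <= max;
}

/* Emit pseudo-assembly for "R4 <- R4 + Mem[address]" into `out`.
   Returns the number of instructions emitted. */
int emit_add_from_memory(int64_t address, char *out, size_t n) {
    if (fits_signed(address, 16)) {
        /* Address fits in the displacement: use R0 (always 0) as the base. */
        snprintf(out, n, "Add R4, %lld (R0)", (long long)address);
        return 1;
    }
    /* Otherwise materialize the address in a register first. */
    snprintf(out, n, "Load R1, %lld\nAdd R4, 0 (R1)", (long long)address);
    return 2;
}
```

With a 16-bit field the range is -32768 to 32767, so 10000 takes the single-instruction form while 40000 needs the two-instruction form.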

Immediate Address Mode
Used when we want a numerical value (a constant) encoded in the instruction itself.
At high level:
   a = b + 3;
   if ( a > 17 ) goto Addr
At assembler level (assuming a is in R1 and b is in R3):
   Load R2, 3
   Add R1, R3, R2
   Load R2, 17
   CMPBGT R1, R2
   Load R1, Address
   Jump (R1)
So how would you get a 32-bit value into a register?
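One common answer to the closing question is a two-instruction sequence: load the upper 16 bits with a "load high immediate" instruction, then OR in the lower 16 bits (MIPS uses lui/ori; DLX has an analogous LHI). This C sketch assumes the OR-immediate zero-extends its 16-bit operand, as MIPS ori does; with a sign-extending add instead, the high half would need adjusting whenever the low half is 0x8000 or above. The type and function names are hypothetical.

```c
#include <stdint.h>

/* The two 16-bit immediates for a "load high, then OR low" sequence. */
typedef struct { uint16_t hi; uint16_t lo; } ImmPair;

ImmPair split_constant(uint32_t value) {
    ImmPair p = { (uint16_t)(value >> 16), (uint16_t)(value & 0xFFFFu) };
    return p;
}

/* What the two instructions compute, assuming a zero-extending OR immediate. */
uint32_t rebuild_constant(ImmPair p) {
    uint32_t r1 = (uint32_t)p.hi << 16;   /* load high immediate: R1 = hi << 16 */
    r1 |= p.lo;                           /* OR immediate:        R1 = R1 | lo  */
    return r1;
}
```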

Operations in the Instruction Set
Sections include:
• Detailed information about types of instructions
• Instructions for control flow (conditional branches, jumps)

Operator Types
   Operator type             Examples
   Arithmetic and logical    and, add
   Data transfer             move, load
   Control                   branch, jump, call
   System                    system call, traps
   Floating point            add, mul, div, sqrt
   Decimal                   add, convert
   String                    move, compare
   Multimedia (2D, 3D?)      e.g., Intel MMX and Sun VIS

Control Instructions
Conditional branches are 20% of all instructions!!
Control instruction issues:
• taken or not
• where is the target
• link return address
• save or restore
Instructions that change the PC:
• (conditional) branches, (unconditional) jumps
• function calls, function returns
• system calls, system returns

Control Instructions
There are numerous tradeoffs:
• Condition in general-purpose register
   + no special state, but uses up a register
   - branch condition separate from branch logic in pipeline
• Some data for MIPS:
   • > 80% of branches use immediate data, and > 80% of those use zero
   • 50% of branches use == 0 or <> 0
• Compromise in MIPS:
   • branch == 0, branch <> 0
   • compare instructions for all other compares
• Compare and branch
   + no extra compare, no state passed between instructions
   - requires ALU op, restricts code scheduling opportunities
• Implicitly set condition codes (Z, N, V, C)
   + can be set "for free"
   - constrains code reordering, extra state to save/restore
• Explicitly set condition codes
   + can be set "for free", decouples branch/fetch from pipeline
   - extra state to save/restore
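To make "implicitly set condition codes Z, N, V, C" concrete, this C sketch computes the four flags the way a compare (a - b) might set them, then evaluates branch conditions from the flags alone. That separation is why the flags are extra state that must be saved and restored. The carry flag is defined here as "borrow out" (the 80x86 convention; some ISAs invert it), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool Z, N, V, C; } Flags;

/* Flags as a compare of a and b (computing a - b) might set them. */
Flags compare(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                       /* wraps modulo 2^32 */
    Flags f;
    f.Z = (r == 0);                           /* zero */
    f.N = (r >> 31) & 1;                      /* negative: sign bit of result */
    f.V = (((a ^ b) & (a ^ r)) >> 31) & 1;    /* signed overflow on subtract */
    f.C = (a < b);                            /* borrow out (unsigned a < b) */
    return f;
}

/* Later branches read only the saved flags, not the operands. */
bool branch_eq(Flags f)  { return f.Z; }
bool branch_lt(Flags f)  { return f.N != f.V; }   /* signed less than */
bool branch_ltu(Flags f) { return f.C; }          /* unsigned less than */
```

The V flag matters: comparing 0x80000000 (the most negative 32-bit integer) with 1 produces a positive difference, so signed less-than must be N != V rather than N alone.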

Control Instructions
• Save or restore state:
   • What state?
      function calls: registers
      system calls: registers, flags, PC, PSW, etc.
   • Hardware need not save registers
      Caller can save registers in use
      Callee saves registers it will use
   • Hardware register save
      IBM STM, VAX CALLS
      Faster?
   • Many recent architectures do no register saving
      Or do implicit register saving with register windows (SPARC)
Link return address:
• Implicit register (many recent architectures use this)
   + fast, simple
   - software must save the register before the next call; surprise traps?
• Explicit register
   + may avoid saving the register
   - register must be specified
• Processor stack
   + recursion is direct
   - complex instructions
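The downside of an implicit link register ("software must save the register before the next call") shows up as soon as a function makes a call of its own. In this toy C model, a function f is entered with its caller's return address (say 100) in the link register, then calls g, which makes the hardware overwrite the link register with 200, the address just after the call. DLX's jump-and-link, for instance, writes R31. The addresses, variable names, and function names are hypothetical.

```c
#include <stdint.h>

static uint32_t link_reg;          /* hypothetical implicit link register */
static uint32_t save_stack[16];    /* software-managed save area */
static int sp = 0;

/* Non-leaf f without saving: the call to g clobbers f's return address. */
uint32_t f_without_save(void) {
    link_reg = 200;                /* call g: hardware writes the return address */
    /* g returns to 200 and f resumes there... */
    return link_reg;               /* ...but f now "returns" to 200: wrong */
}

/* Non-leaf f that saves and restores the link register around the call. */
uint32_t f_with_save(void) {
    save_stack[sp++] = link_reg;   /* save caller's return address */
    link_reg = 200;                /* call g */
    link_reg = save_stack[--sp];   /* restore before returning */
    return link_reg;               /* returns to the original caller */
}
```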

Type and Size of Operands
The type of the operand is usually encoded in the opcode: an LDW implies loading a word.
Common sizes are:
• Character (1 byte)
• Half word (16 bits)
• Word (32 bits)
• Single-precision floating point (1 word)
• Double-precision floating point (2 words)
Integers are two's complement binary. Floating point is IEEE 754. Some languages (like COBOL) use packed decimal.
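The three representations named above can be compared side by side in a few lines of C. This sketch (helper names are hypothetical) packs decimal digits two per byte, omitting the sign nibble that real COBOL/IBM packed formats carry; sign-extends a byte as two's complement; and reads the bit pattern of a float, assuming the host's float is IEEE 754 single precision (1 sign, 8 exponent, 23 fraction bits), as it is on essentially all current hardware.

```c
#include <stdint.h>
#include <string.h>

/* Packed decimal: two decimal digits per byte, e.g. 1234 -> 0x1234 (no sign nibble). */
uint32_t to_packed_bcd(uint32_t n) {
    uint32_t bcd = 0;
    for (int shift = 0; n > 0 && shift < 32; shift += 4) {
        bcd |= (n % 10) << shift;     /* lowest remaining digit goes in the next nibble */
        n /= 10;
    }
    return bcd;
}

uint32_t from_packed_bcd(uint32_t bcd) {
    uint32_t n = 0, scale = 1;
    for (; bcd > 0; bcd >>= 4) {
        n += (bcd & 0xFu) * scale;    /* each nibble is one decimal digit */
        scale *= 10;
    }
    return n;
}

/* Two's complement: the byte 0xFF read as a signed 8-bit value is -1. */
int32_t sign_extend_byte(uint8_t b) {
    return (b & 0x80) ? (int32_t)b - 256 : (int32_t)b;
}

/* Raw IEEE 754 single-precision bit pattern of a float. */
uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

For example, 1.0f is 0x3F800000 and -2.0f is 0xC0000000.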

Encoding an Instruction Set
This section covers how an assembly-level instruction is encoded into binary. Ultimately, it's the binary that is read and interpreted by the machine. We will be using the Intel instruction set, which is defined at: http://developer.intel.com/design/Pentium4/manuals. Volume 2 describes the instruction set.

80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with the debugging option, so it is not optimized.
for ( index = 0; index < iterations; index++ )
   0040D3AF  C7 45 F0 00 00 00 00  mov dword ptr [ebp-10h],0
   0040D3B6  EB 09                 jmp main+0D1h (0040d3c1)
   0040D3B8  8B 4D F0              mov ecx,dword ptr [ebp-10h]
   0040D3BB  83 C1 01              add ecx,1
   0040D3BE  89 4D F0              mov dword ptr [ebp-10h],ecx
   0040D3C1  8B 55 F0              mov edx,dword ptr [ebp-10h]
   0040D3C4  3B 55 F8              cmp edx,dword ptr [ebp-8]
   0040D3C7  7D 15                 jge main+0EEh (0040d3de)
long_temp = (*alignment + long_temp) % 47;
   0040D3C9  8B 45 F4              mov eax,dword ptr [ebp-0Ch]
   0040D3CC  8B 00                 mov eax,dword ptr [eax]
   0040D3CE  03 45 EC              add eax,dword ptr [ebp-14h]
   0040D3D1  99                    cdq
   0040D3D2  B9 2F 00 00 00        mov ecx,2Fh
   0040D3D7  F7 F9                 idiv eax,ecx
   0040D3D9  89 55 EC              mov dword ptr [ebp-14h],edx
   0040D3DC  EB DA                 jmp main+0C8h (0040d3b8)
This code was produced using Visual Studio.
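To reproduce listings like these, the loop can be wrapped in a small self-contained function. Only the loop statement comes from the listing; the function name run_loop, the parameter types, and the initial value of long_temp are assumptions. Compiling it with and without optimization (for example gcc -O0 -S versus gcc -O2 -S) gives the same kind of contrast as the listings here, though the exact instructions depend on the compiler, its version, and the target (for instance, long is 4 bytes in the 32-bit listings but usually 8 bytes on 64-bit Linux).

```c
/* Self-contained version of the disassembled loop.
   Declarations and the initial value of long_temp are assumptions. */
long run_loop(const long *alignment, int iterations) {
    long long_temp = 0;   /* assumed starting value */
    int index;
    for (index = 0; index < iterations; index++)
        long_temp = (*alignment + long_temp) % 47;
    return long_temp;
}
```

With *alignment equal to 5, three iterations give 15, and ten iterations give 50 % 47 = 3.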

80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with optimization.
for ( index = 0; index < iterations; index++ )
   00401000  8B 0D 40 54 40 00  mov ecx,dword ptr ds:[405440h]
   00401006  33 D2              xor edx,edx
   00401008  85 C9              test ecx,ecx
   0040100A  7E 14              jle 00401020
   0040100C  56                 push esi
   0040100D  57                 push edi
   0040100E  8B F1              mov esi,ecx
long_temp = (*alignment + long_temp) % 47;
   00401010  8D 04 11           lea eax,[ecx+edx]
   00401013  BF 2F 00 00 00     mov edi,2Fh
   00401018  99                 cdq
   00401019  F7 FF              idiv eax,edi
   0040101B  4E                 dec esi
   0040101C  75 F2              jne 00401010
   0040101E  5F                 pop edi
   0040101F  5E                 pop esi
   00401020  C3                 ret
This code was produced using Visual Studio.

80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with optimization.
for ( index = 0; index < iterations; index++ )
   0x804852f <main+143>:  add $0x10,%esp
   0x8048532 <main+146>:  lea 0xfffffff8(%ebp),%edx
   0x8048535 <main+149>:  test %esi,%esi
   0x8048537 <main+151>:  jle 0x8048543 <main+163>
   0x8048539 <main+153>:  mov %esi,%eax
   0x804853b <main+155>:  nop
   0x804853c <main+156>:  lea 0x0(%esi,1),%esi
long_temp = (*alignment + long_temp) % 47;
   0x8048540 <main+160>:  dec %eax
   0x8048541 <main+161>:  jne 0x8048540 <main+160>
   0x8048543 <main+163>:  add $0xfffffff4,%esp
This code was produced using gcc and gdb. For details, see Lab 2.1. Note that the representation of the code depends on the compiler/debugger!

80x86 Instruction Encoding
A morass of disjoint encodings!! (This is Figure D.8; field widths in bits.)
   ADD (4) | Reg (3) | W (1) | Disp. (8)
   SHL (6) | V/w (2) | postbyte (8) | Disp. (8)
   TEST (7) | W (1) | postbyte (8) | Immediate (8)
   JE (4) | Cond (4) | Disp. (8)
   CALLF (8) | Offset (16) | Segment Number (16)
   MOV (6) | D/w (2) | postbyte (8) | Disp. (8)
   PUSH (5) | Reg (3)
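These field layouts can be checked against the listings above. The short conditional jumps follow the JE pattern: a 4-bit opcode 0111, a 4-bit condition (E = 0100, NE = 0101, GE = 1101, LE = 1110), and an 8-bit displacement. So 7D 15 is jge with displacement 15h, which lands at 0040D3C9 + 15h = 0040D3DE, exactly as the listing shows. PUSH is 01010 followed by a 3-bit register number, so push esi (register 6) is 56 and push edi (register 7) is 57. A minimal C sketch that packs fields most-significant first from their widths; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack instruction fields most-significant first, given their bit widths.
   Each width is assumed to be 1..31, and the total width at most 32. */
uint32_t pack_fields(const unsigned *widths, const uint32_t *values, size_t n) {
    uint32_t word = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = (1u << widths[i]) - 1u;          /* keep only this field's bits */
        word = (word << widths[i]) | (values[i] & mask);
    }
    return word;
}
```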

80x86 Instruction Encoding
Here's the instruction that we had several pages ago:
   0040D3AF  C7 45 F0 00 00 00 00  mov dword ptr [ebp-10h],0
It is described in: http://developer.intel.com/design/Pentium4/manuals/24547103.pdf (I found it on page 472, but this is obviously version dependent.)
   C7 /0   MOV r/m32,imm32   Move an immediate 32-bit data item to a register or to memory.
Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword.
In our case, the "C7" opcode tells us this is the sub-flavor of MOV that puts an immediate value into a register or memory; the next byte (45) selects memory at EBP plus an 8-bit displacement.
   C7           Op code for MOV immediate
   45           Target register (EBP) + use next 8 bits as displacement
   F0           Displacement: this is -10 hex
   00 00 00 00  Immediate: 32 bits of 0
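The byte-by-byte breakdown can be checked mechanically. This C sketch (helper names are hypothetical) splits the ModR/M byte into its mod (2 bits), reg (3 bits), and r/m (3 bits) fields, sign-extends the 8-bit displacement, and assembles the little-endian 32-bit immediate. For 45 it yields mod = 01 (memory operand with an 8-bit displacement), reg = 000 (the /0 in C7 /0), and r/m = 101 (EBP).

```c
#include <stdint.h>

/* Fields of the 80x86 ModR/M byte: mod (bits 7..6) | reg (5..3) | r/m (2..0). */
typedef struct { unsigned mod, reg, rm; } ModRM;

ModRM decode_modrm(uint8_t b) {
    ModRM m = { (unsigned)(b >> 6) & 3u, (unsigned)(b >> 3) & 7u, (unsigned)b & 7u };
    return m;
}

/* An 8-bit displacement is sign-extended: 0xF0 means -0x10. */
int32_t disp8(uint8_t b) {
    return (b & 0x80) ? (int32_t)b - 256 : (int32_t)b;
}

/* The 32-bit immediate is stored little endian, lowest-addressed byte first. */
uint32_t imm32_le(const uint8_t p[4]) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The same little-endian rule explains B9 2F 00 00 00 (mov ecx,2Fh) in the earlier listing.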

The Role of Compilers
Compiler goals:
• All correct programs execute correctly
• Most compiled programs execute fast (optimizations)
• Fast compilation
• Debugging support

The Role of Compilers: Steps in Compilation
• Parsing → intermediate representation
• Jump optimization
• Loop optimizations
• Register allocation
• Code generation → assembly code
Optimizations along the way include common subexpression elimination, procedure in-lining, constant propagation, strength reduction, and pipeline scheduling.
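As an illustration of one optimization in that list, strength reduction replaces an expensive operation inside a loop (here a multiply) with a cheaper one (an add) that maintains the same value incrementally. The two C functions below are hand-written before/after versions of what a compiler might do to an array walk with a stride; the function names are hypothetical, and a real compiler applies this to its intermediate representation rather than to source.

```c
/* Before: the index i * stride is recomputed with a multiply every iteration. */
long sum_strided_mul(const long *a, int n, int stride) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* After strength reduction: the multiply becomes a running addition. */
long sum_strided_add(const long *a, int n, int stride) {
    long sum = 0;
    int offset = 0;                 /* invariant: offset == i * stride */
    for (int i = 0; i < n; i++) {
        sum += a[offset];
        offset += stride;
    }
    return sum;
}
```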


The Role of Compilers
What compiler writers want:
• regularity
• orthogonality
• composability
Compilers perform a giant case analysis
• too many choices make it hard
Orthogonal instruction sets
• operation, addressing mode, data type
One solution or all possible solutions
• 2 branch conditions: eq, lt
• or all six: eq, ne, lt, gt, le, ge
• not 3 or 4
There are advantages to having instructions that are primitives.
• Let the compiler put the instructions together to make more complex sequences.
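The "2 branch conditions (eq, lt)" option works because the other four comparisons follow from those two by swapping operands and/or negating the result, which is exactly the kind of composition a compiler can do. A small C illustration; the function names are hypothetical.

```c
#include <stdbool.h>

/* The two primitive conditions. */
bool eq(int a, int b) { return a == b; }
bool lt(int a, int b) { return a < b; }

/* The other four, built by swapping operands and/or negating. */
bool ne(int a, int b) { return !eq(a, b); }
bool gt(int a, int b) { return lt(b, a); }    /* a > b  is  b < a      */
bool le(int a, int b) { return !lt(b, a); }   /* a <= b is  !(b < a)   */
bool ge(int a, int b) { return !lt(a, b); }   /* a >= b is  !(a < b)   */
```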

The DLX Architecture
DLX (pronounced "Deluxe") is an instruction set introduced by Hennessy and Patterson in the 1st edition of this text. DLX is strongly RISC oriented and will be used for many examples throughout the course.

The DLX Architecture: DLX Characteristics
• RISC, strongly related to MIPS
• 32-bit byte addresses, aligned
• Load/store; only displacement addressing
• Standard data types
• 3 fixed-length formats
• 32 32-bit GPRs (r0 = 0)
• 16 64-bit (32 32-bit) FPRs
• FP status register
• No condition codes
• Data transfer
   • load/store word, load/store byte/halfword (signed?)
   • load/store FP single/double
   • moves between GPRs and FPRs
• ALU
   • add/subtract (signed? immediate?)
   • multiply/divide (signed?)
   • and, or, xor (immediate?); shifts: ll, rl, ra (immediate?)
   • sets (immediate?)

The DLX Architecture: DLX Characteristics (continued)
• Control
   • branches == 0, <> 0
   • conditional branch testing FP bit
   • jump, jump register
   • jump & link, jump & link register
   • trap, return from exception
• Floating point
   • add/sub/mul/div
   • single/double
   • FP converts, FP set
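Because DLX has no condition codes, a branch on a general comparison is split into a set instruction that writes 1 or 0 into a general-purpose register, followed by a branch on == 0 or <> 0. The sketch below simulates that pairing in C; the register choices are hypothetical, and slt, beqz, and bnez stand in for DLX's set-less-than and branch-on-zero/nonzero instructions.

```c
#include <stdbool.h>
#include <stdint.h>

static int32_t R[32];   /* general-purpose registers; R[0] is always 0 */

/* SLT rd, rs1, rs2: rd = 1 if rs1 < rs2 (signed), else 0. Writes to R0 are ignored. */
static void slt(int rd, int rs1, int rs2) {
    if (rd != 0) R[rd] = (R[rs1] < R[rs2]) ? 1 : 0;
}
static bool bnez(int rs) { return R[rs] != 0; }   /* BNEZ rs, label: taken? */
static bool beqz(int rs) { return R[rs] == 0; }   /* BEQZ rs, label: taken? */

/* if (a < b) goto L;   =>   SLT R1, R2, R3 ; BNEZ R1, L */
bool branch_taken_if_less(int32_t a, int32_t b) {
    R[2] = a; R[3] = b;
    slt(1, 2, 3);
    return bnez(1);
}

/* if (a >= b) goto L;  =>   SLT R1, R2, R3 ; BEQZ R1, L */
bool branch_taken_if_ge(int32_t a, int32_t b) {
    R[2] = a; R[3] = b;
    slt(1, 2, 3);
    return beqz(1);
}
```

The condition lives in an ordinary register, so there is no extra flag state to save on a trap, at the cost of using up a register.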

The DLX Architecture: The DLX Encoding
   Register-Register:   Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0]
   Register-Immediate:  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
   Branch:              Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
   Jump / Call:         Op [31:26] | target [25:0]
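Fixed field positions make encoding and decoding a matter of shifts and masks. This C sketch follows the bit positions of the DLX formats; the helper names and the opcode values used with them are hypothetical, and the 16-bit immediate is assumed to be sign-extended when used, as for displacements.

```c
#include <stdint.h>

/* Register-Register: Op[31:26] Rs1[25:21] Rs2[20:16] Rd[15:11] Opx[10:0] */
uint32_t encode_rr(unsigned op, unsigned rs1, unsigned rs2, unsigned rd, unsigned opx) {
    return ((op & 0x3Fu) << 26) | ((rs1 & 0x1Fu) << 21) | ((rs2 & 0x1Fu) << 16)
         | ((rd & 0x1Fu) << 11) | (opx & 0x7FFu);
}

/* Register-Immediate: Op[31:26] Rs1[25:21] Rd[20:16] immediate[15:0] */
uint32_t encode_ri(unsigned op, unsigned rs1, unsigned rd, int32_t imm) {
    return ((op & 0x3Fu) << 26) | ((rs1 & 0x1Fu) << 21) | ((rd & 0x1Fu) << 16)
         | ((uint32_t)imm & 0xFFFFu);
}

/* Jump / Call: Op[31:26] target[25:0] */
uint32_t encode_jump(unsigned op, int32_t target) {
    return ((op & 0x3Fu) << 26) | ((uint32_t)target & 0x3FFFFFFu);
}

unsigned field_op(uint32_t w)    { return w >> 26; }
unsigned field_rs1(uint32_t w)   { return (w >> 21) & 0x1Fu; }
unsigned field_rd_ri(uint32_t w) { return (w >> 16) & 0x1Fu; }   /* Rd in the immediate format */

/* Sign-extend the 16-bit immediate field. */
int32_t field_imm16(uint32_t w) {
    uint32_t v = w & 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}
```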

RISC versus CISC (Bonus)
The comparison combines 3 features:
• architecture
• implementation
• compilers and OS
It argues that:
• implementation effects are second order
• compilers are similar
• RISCs are better than CISCs: fair comparison?
• NEEDS MORE WORK

RISC versus CISC (Bonus)
RISC factor: $\frac{CPI_{VAX} \times Instr_{VAX}}{CPI_{MIPS} \times Instr_{MIPS}}$
   Benchmark   Instruction Ratio   CPI MIPS   CPI VAX   CPI Ratio   RISC factor
   li          1.6                 1.1        6.5       6.0         3.7
   eqntott     1.1                 1.3        4.4       3.5         3.3
   fpppp       2.9                 1.5        15.2      10.5        2.7
   tomcatv     2.9                 2.1        17.5      8.2         2.9
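As a worked check of the formula on the li row, reading Instruction Ratio as $Instr_{MIPS}/Instr_{VAX}$ (an interpretation; the table header does not say which way the ratio runs), the RISC factor is the CPI ratio divided by the instruction ratio:

```latex
\text{RISC factor}
  = \frac{CPI_{VAX} \times Instr_{VAX}}{CPI_{MIPS} \times Instr_{MIPS}}
  = \frac{CPI_{VAX} / CPI_{MIPS}}{Instr_{MIPS} / Instr_{VAX}}
  = \frac{6.5 / 1.1}{1.6}
  \approx \frac{5.9}{1.6}
  \approx 3.7
```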

RISC versus CISC (Bonus)
Factors favoring MIPS:
• Operand specifier decoding
• Number of registers
• Separate floating point unit
• Simple branches/jumps (lower latency)
• No complex instructions
• Instruction scheduling
• Translation buffer
• Branch displacement size
Compensating factors:
• Increase VAX CPI but decrease VAX instruction count
• Increase MIPS instruction count
   e.g. 1: loads/stores versus operand specifiers
   e.g. 2: necessary complex instructions: loop branches
Factors favoring VAX:
• Big immediate values
• Not-taken branches incur no delay

