Alternative Architectures Arithmetic/Logic Operations 18 September 2013

CDA 3101 Fall 2013: Introduction to Computer Organization

Concept Review • Variable allocation • Stack frames (caller/callee), static, heap • Pointers • Memory addresses • Necessary (to pass arguments: arrays, structures) • Efficient (pointer arithmetic) • Problems • Reference wrong memory location (segmentation fault) • Memory leakage

Overview • ISA design alternatives • Design principles (tradeoffs) • CPU performance equation • RISC vs. CISC • Historical perspective • PowerPC and 80x86

Computer Architectures • Accumulator • Hardware expensive => only one register • Accumulator: one of the operands and result • Memory-based operand addressing mode • Stack • No registers (simple compilers, compact encoding) • Special purpose registers (e.g. 8086) • General purpose registers • Register-memory • Register-register (load-store) • HLL computer architecture

Instruction Set Architectures
• Example statement: A = B + C;
• Accumulator
    Load  AddressB
    Add   AddressC
    Store AddressA
• Stack
    Push AddressC
    Push AddressB
    Add
    Pop  AddressA
• Load-store
    Load  R1, AddressB
    Load  R2, AddressC
    Add   R3, R1, R2
    Store R3, AddressA
• Memory-memory
    Add AddressA, AddressB, AddressC
• CPUtime = IC * CPI * Cycle_time
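The accumulator and stack sequences for A = B + C can be traced with a minimal Python sketch. The tuple encoding of instructions, the function names `run_accumulator` and `run_stack`, and the dictionary standing in for memory are assumptions made for illustration; they are not part of any real ISA.

```python
def run_accumulator(program, memory):
    """Accumulator machine: one implicit register is both an operand and the result."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = memory[addr]
        elif op == "Add":
            acc += memory[addr]
        elif op == "Store":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


def run_stack(program, memory):
    """Stack machine: Add takes its operands from the top of the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "Push":
            stack.append(memory[instr[1]])
        elif op == "Add":
            stack.append(stack.pop() + stack.pop())
        elif op == "Pop":
            memory[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# The two instruction sequences from the slide (hypothetical encoding).
ACC_PROGRAM = [("Load", "B"), ("Add", "C"), ("Store", "A")]
STACK_PROGRAM = [("Push", "C"), ("Push", "B"), ("Add",), ("Pop", "A")]

if __name__ == "__main__":
    print(run_accumulator(ACC_PROGRAM, {"A": 0, "B": 2, "C": 3}))
    print(run_stack(STACK_PROGRAM, {"A": 0, "B": 2, "C": 3}))
```

The instruction counts (3 for the accumulator machine, 4 for the stack machine) are the IC term of the CPU time equation; CPI and cycle time are not modelled here.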

ISA Trends
• Hardware and compiler technology trends
• Hardware / software boundary swings back and forth
[Figure: timeline EDSAC, CISC, RISC, Post-RISC (FISC), EPIC, Multi-Core, plotted between compiler support and hardware support]

RISC Architecture • Reduced Instruction Set Computer • Design philosophy • Load-store • Fixed-length instructions • Three-address architecture • Plenty of registers • Simple addressing modes • Instruction pipelining • Many ideas used in modern computers have been taken from CDC 6600 (1963)

PowerPC
• Similar to MIPS: 32 registers, 32-bit instructions, RISC
• Differences (tradeoffs: simplicity vs. common case)
  • Indexed addressing
    • Example: lw $t1,$a0+$s3 # $t1 = Memory[$a0+$s3]
    • MIPS: add $t0, $a0, $s3; lw $t1, 0($t0)
  • Update addressing
    • Update a register as part of load (for marching through arrays)
    • Example: lwu $t0,4($s3) # lw $t0,4($s3); addi $s3,$s3,4
• Unique instructions
  • Load multiple/store multiple: up to 32 words in a single instruction
  • Special counter register
    • bc Loop, $ctr!=0 # decrement counter, if not 0 goto Loop
    • MIPS: addi $t0, $t0, -1; bne $t0, $zero, Loop

80x86 Milestones
• 1978: 8086, 16-bit architecture (64KB), no GPRs
• 1980: 8087 FP coprocessor, 60+ instructions, 80-bit stack, no GPRs
• 1982: 80286, 24-bit address space, protection model
• 1985: 80386, 32 bits, new addressing modes, 8 GPRs
• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (designed for higher performance)
• 1997: MMX +57 instructions (SIMD)
• 1999: PIII +70 multimedia instructions
• 2000: P4 +144 multimedia instructions
Golden handcuffs of upward compatibility => architecture is difficult to explain and impossible to love

X86 Architecture
• Two-address architecture
  • The destination is also one of the sources
      add $s1,$s0 # s0=s0+s1 (C: a += b;)
  • Benefit: smaller instructions => smaller code => faster
• Register-memory architecture
  • One operand can be in memory; other operand is register
      add 12(%gp),%s0 # s0=s0+Mem[12+gp]
  • Benefit: fewer instructions => smaller code
• Variable-length instructions (1 to 17 bytes)
  • Small code size (30% smaller)
  • Better instruction cache hit rates
  • Instructions can include 8- or 32-bit immediates

X86 Features • Operating modes: real (8088), virtual, and protected • Four protection levels • Memory • Address space: 16,384 segments (4GB) • Little endian • 8 32-bit Registers (16-bit 8086 names with e prefix): • eax, ecx, edx, ebx, esp, ebp, esi, edi • Data types • Signed/unsigned integers (8, 16, and 32 bits) • Binary coded decimal integers • Floating point (32 and 64 bits) • Floating point uses a separate stack

X86 Registers
• eax: main arithmetic register
• ebx: pointers (memory addresses)
• ecx: loops
• edx: multiplication and division
• esi: pointer to source string
• edi: pointer to destination string
• ebp: base of the current stack frame ($fp)
• esp: stack pointer
• Segment registers (cs, ss, ds, es, fs, gs): support for the 8088 attempt to address 2^20 bytes using 16-bit addresses
• eip: program counter
• eflags: processor state word

X86 Instruction Formats • Highly complex and irregular • Six variable-length fields • Five fields are optional

Examples of X86 Instruction Formats

Integer Instructions
• Arithmetic: ADD, SUB; CMP; SHL, SHR, RCR; CBW; TEST; INC, DEC; OR, XOR
• String: MOVS; LODS
• Control: JNZ, JZ; JMP; CALL; RET; LOOP
• Data Transfer: MOV; PUSH, POP; LES

Examples of X86 Instructions
• leal (load effective address)
  • Calculate address like a load, but load address into register
  • Load 32-bit address: leal -4000000(%ebp),%esi # esi = ebp - 4000000
• Memory stack is part of instruction set
  • call label (esp -= 4; M[esp] = eip+5; eip = label)
  • push places value onto stack, decrements esp
  • pop gets value from stack, increments esp
• incl, decl (increment, decrement)
    incl %edx # edx = edx + 1
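The stack behaviour implied by the call line (esp -= 4; M[esp] = eip+5; eip = label) can be sketched as a small Python model. The stack grows toward lower addresses, so a push lowers esp and a pop raises it. The class name `X86StackModel`, the dictionary used as memory, and the starting addresses are assumptions made for illustration; the 5-byte call length is taken from the slide.

```python
MASK32 = 0xFFFFFFFF


class X86StackModel:
    """Toy model of esp/eip behaviour for push, pop, call and ret (illustrative only)."""

    def __init__(self, esp=0x1000, eip=0):
        self.esp = esp
        self.eip = eip
        self.mem = {}  # word-addressed by byte address, one 32-bit value per entry

    def push(self, value):
        self.esp = (self.esp - 4) & MASK32  # stack grows downward
        self.mem[self.esp] = value & MASK32

    def pop(self):
        value = self.mem[self.esp]
        self.esp = (self.esp + 4) & MASK32
        return value

    def call(self, label, length=5):
        # Return address = address of the instruction after the call.
        self.push(self.eip + length)
        self.eip = label

    def ret(self):
        self.eip = self.pop()


if __name__ == "__main__":
    cpu = X86StackModel(esp=0x1000, eip=0x400)
    cpu.call(0x800)
    print(hex(cpu.esp), hex(cpu.eip), hex(cpu.mem[cpu.esp]))
```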

Addressing Modes Encoding • Highly irregular, non-orthogonal addressing modes • Instruction in 16-bit or 32-bit mode? • Not all modes apply to all instructions • Not all registers can be used in all modes

Addressing Modes
• Base reg + offset (like MIPS)
    movl -8000044(%ebp), %eax
• Base reg + index reg (2 regs form addr.)
    movl (%eax,%ebx),%edi # edi = Mem[ebx + eax]
• Scaled reg + index (one reg multiplied by 1, 2, 4, or 8, i.e. shifted left by 0 to 3)
    movl (%eax,%edx,4),%ebx # ebx = Mem[edx*4 + eax]
• Scaled reg + index + offset
    movl 12(%eax,%edx,4),%ebx # ebx = Mem[edx*4 + eax + 12]
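All four modes above are special cases of one address computation: displacement + base + index * scale. A minimal Python sketch follows; the function name `effective_address` and the register dictionary are assumptions for illustration.

```python
def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """Compute disp + base + index*scale, wrapped to 32 bits.

    regs maps register names to 32-bit values. x86 allows only the
    scale factors 1, 2, 4 and 8.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    addr = disp
    if base is not None:
        addr += regs[base]
    if index is not None:
        addr += regs[index] * scale
    return addr & 0xFFFFFFFF


if __name__ == "__main__":
    regs = {"eax": 0x1000, "ebx": 0x20, "edx": 3, "ebp": 0x00A00000}
    # movl 12(%eax,%edx,4),%ebx reads Mem[edx*4 + eax + 12]
    print(hex(effective_address(regs, base="eax", index="edx", scale=4, disp=12)))
```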

Branch Support • Rather than compare registers, x86 uses special 1-bit registers called “condition codes” that are set as a side-effect of ALU operations • S - Sign Bit • Z - Zero (result is all 0) • C - Carry Out • P - Parity: set to 1 if even number of ones in rightmost 8 bits of operation • Conditional Branch instructions then use condition flags for all comparisons: <, <=, >, >=, ==, !=
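How the four condition codes listed above fall out of an addition can be sketched in Python. The function name `add_flags` and the dictionary of flags are assumptions for illustration; only the S, Z, C and P flags from the slide are modelled.

```python
def add_flags(a, b, bits=32):
    """Add two bit patterns and return (result, flags) for S, Z, C and P."""
    mask = (1 << bits) - 1
    full = (a & mask) + (b & mask)  # may need bits+1 bits
    result = full & mask
    flags = {
        "S": (result >> (bits - 1)) & 1,  # sign bit of the result
        "Z": int(result == 0),            # result is all zeros
        "C": int(full > mask),            # carry out of the top bit
        # parity: 1 if the low 8 bits contain an even number of ones
        "P": int(bin(result & 0xFF).count("1") % 2 == 0),
    }
    return result, flags


if __name__ == "__main__":
    print(add_flags(0xFFFFFFFF, 1))
```

A conditional branch such as jz or jnz then only has to inspect these bits, which is why x86 needs no register operands on its branch instructions.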

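While Loop
    while (save[i]==k)
        i = i + j;
• MIPS (i,j,k => $s3, $s4, $s5)
    Loop: sll  $t1, $s3, 2      # $t1 = 4 * i
          add  $t1, $t1, $s6    # $t1 = address of save[i]
          lw   $t0, 0($t1)      # $t0 = save[i]
          bne  $t0, $s5, Exit   # exit if save[i] != k
          add  $s3, $s3, $s4    # i = i + j
          j    Loop
    Exit:
• X86 (i,j,k => %edx, %esi, %ebx)
           leal -400(%ebp),%eax      # eax = address of save
    .Loop: cmpl %ebx,(%eax,%edx,4)   # compare save[i] with k
           jne  .Exit
           addl %esi,%edx            # i = i + j
           jmp  .Loop
    .Exit: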

PIII, P4, and AMD
• PC World magazine, Nov. 20, 2000
• WorldBench 2000 benchmark (business applications)
  • P4 score @ 1.5 GHz: 164 (higher is better)
  • PIII score @ 1.0 GHz: 167
  • AMD Athlon @ 1.2 GHz: 180
  • (Media applications do better on P4 vs. PIII)
• Why? => CPU performance equation
  • Time = Instruction count x CPI x 1/Clock rate
  • Instruction count is the same for x86
  • Clock rates: P4 > Athlon > PIII
  • How can P4 be slower?
  • Average CPI of P4 must be worse than Athlon, PIII
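The CPI conclusion can be made quantitative with a short Python sketch. Since Time = IC x CPI / Clock rate, equal instruction counts give CPI_a / CPI_b = (Time_a / Time_b) x (Clock_a / Clock_b). The sketch additionally assumes that execution time is inversely proportional to the benchmark score, which the source does not state; the function name `relative_cpi` is likewise hypothetical.

```python
def relative_cpi(score_a, clock_a_ghz, score_b, clock_b_ghz):
    """Return CPI_a / CPI_b.

    Assumptions (not from the benchmark itself): both machines execute
    the same instruction count, and time is proportional to 1/score.
    """
    time_ratio = score_b / score_a           # Time_a / Time_b
    clock_ratio = clock_a_ghz / clock_b_ghz  # Clock_a / Clock_b
    return time_ratio * clock_ratio


if __name__ == "__main__":
    print("P4 vs PIII:  ", round(relative_cpi(164, 1.5, 167, 1.0), 2))
    print("P4 vs Athlon:", round(relative_cpi(164, 1.5, 180, 1.2), 2))
```

Under these assumptions the P4's average CPI on this benchmark comes out roughly 1.5 times that of the PIII, which is how a higher clock rate can still lose.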

Summary • Instruction complexity is only one variable • lower instruction count vs. higher CPI / lower clock rate • Design Principles: • simplicity favors regularity • smaller is faster • good design demands compromise • make the common case fast • Instruction set architecture • a very important abstraction!

New Topic – Arithmetic/Logic Ops
• Arithmetic and logic unit (ALU)
  • Core of a computer
  • Performs arithmetic and logical operations on data
• Computer arithmetic issues
  • Number representation
    • Integers and floating point
  • Finite precision (overflow / underflow)
  • Algorithms used for the basic operations
• Properties of number representation
  • One zero
  • As many positive numbers as negative numbers
  • Efficient hardware implementation of algorithms
• 2’s complement: to negate a number, invert all of its bits and add one

Review

Overview
[Figure: only the labels "I-instruction" and "32-bit memory address" survive]

Addition
• 5ten + 6ten
      0000 0000 0000 0000 0000 0000 0000 0101 ( 5ten)
    + 0000 0000 0000 0000 0000 0000 0000 0110 ( 6ten)
    = 0000 0000 0000 0000 0000 0000 0000 1011 (11ten)
• Carries, shown in parentheses, for the rightmost five bit positions
      . . . (0)  (1)  (0)  (0)  (0)    Carries
      . . .  0    0    1    0    1
    + . . .  0    0    1    1    0
    = . . .  0  (0)1 (1)0 (0)1 (0)1

Subtraction
• 12ten - 5ten
      0000 0000 0000 0000 0000 0000 0000 1100 (12ten)
    - 0000 0000 0000 0000 0000 0000 0000 0101 ( 5ten)
    = 0000 0000 0000 0000 0000 0000 0000 0111 ( 7ten)
• 12ten - 5ten = 12ten + (-5ten)
      0000 0000 0000 0000 0000 0000 0000 1100 (12ten)
    + 1111 1111 1111 1111 1111 1111 1111 1011 (-5ten)
    = 0000 0000 0000 0000 0000 0000 0000 0111 ( 7ten)
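Subtraction by adding the two's complement can be sketched in Python on 32-bit patterns. The helper names `negate32`, `sub32` and `fmt32` are assumptions for illustration; Python integers are unbounded, so the mask stands in for the 32-bit register width.

```python
MASK32 = 0xFFFFFFFF


def negate32(x):
    """Two's complement negation: invert all bits, then add one."""
    return ((~x) + 1) & MASK32


def sub32(a, b):
    """Compute a - b as a + (-b), discarding any carry out of bit 31."""
    return (a + negate32(b)) & MASK32


def fmt32(x):
    """Format a 32-bit pattern in groups of four bits, as on the slide."""
    bits = format(x & MASK32, "032b")
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


if __name__ == "__main__":
    print(fmt32(12), "(12)")
    print(fmt32(negate32(5)), "(-5)")
    print(fmt32(sub32(12, 5)), "(12 - 5)")
```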

Overflow
• Computer arithmetic is not closed w.r.t. + - * /
• Overflow
  • The result cannot be expressed with 32 bits
• Overflow cannot occur
  • Addition: if the operands have different signs
  • Subtraction: if the operands have the same sign
• Overflow detection
  • Result needs 33 bits
  • Addition: a carry out occurs into the sign bit
  • Subtraction: a borrow occurs from the sign bit

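Examples
• 4 bits (instead of 32 in MIPS) => can represent integers in [-8 : 7]
• 7 + 6
      0 1 1 1 ( 7ten)
    + 0 1 1 0 ( 6ten)
    = 1 1 0 1 (13ten does not fit; the 4-bit pattern reads as -3ten: overflow)
• -7 + -6
      1 0 0 1 (-7ten)
    + 1 0 1 0 (-6ten)
    = 0 0 1 1 (-13ten does not fit; the 4-bit pattern reads as 3ten: overflow)
• -7 – 6
      1 0 0 1 (-7ten)
    - 0 1 1 0 ( 6ten)
    = 0 0 1 1 (-13ten does not fit; the 4-bit pattern reads as 3ten: overflow)
• -7 – 6 = -7 + -6
      1 0 0 1 (-7ten)
    + 1 0 1 0 (-6ten)
    = 0 0 1 1 (same pattern as above: overflow)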

Overflow Conditions
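The overflow conditions for signed addition and subtraction can be sketched in Python: addition overflows only when both operands share a sign and the result's sign differs; subtraction overflows only when the operands have different signs and the result's sign differs from the first operand's. The function names and the `bits` parameter are assumptions for illustration; the 4-bit width matches the worked examples.

```python
def to_signed(pattern, bits):
    """Interpret a bit pattern of the given width as a two's complement integer."""
    sign = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign else pattern


def add_overflows(a, b, bits=32):
    """Return (wrapped result, overflow?) for signed a + b."""
    mask = (1 << bits) - 1
    result = to_signed((a + b) & mask, bits)
    # Same-sign operands, result with the opposite sign.
    overflow = ((a >= 0) == (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow


def sub_overflows(a, b, bits=32):
    """Return (wrapped result, overflow?) for signed a - b."""
    mask = (1 << bits) - 1
    result = to_signed((a - b) & mask, bits)
    # Different-sign operands, result sign differs from the first operand.
    overflow = ((a >= 0) != (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow


if __name__ == "__main__":
    for a, b in [(7, 6), (-7, -6), (7, -6)]:
        print(a, "+", b, "->", add_overflows(a, b, bits=4))
    print(-7, "-", 6, "->", sub_overflows(-7, 6, bits=4))
```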

MIPS Support • MIPS raises an Exception when overflow occurs • Exceptions (or interrupts) act like procedure calls • Register EPC stores address of offending instruction • mfc0 $t1, $epc # moves contents of EPC to $t1 • No conditional branch to test overflow • Two’s complement arithmetic (add, addi, and sub) • Exception on overflow • Unsigned arithmetic (addu and addiu) • No exception on overflow • Used for address arithmetic • Compilers • C ignores overflows (always uses addu, addiu, subu) • Fortran uses the appropriate instructions

Conditional branch on overflow
• Signed addition
    addu $t0, $t1, $t2      # add but do not trap
    xor  $t3, $t1, $t2      # check if signs differ
    slt  $t3, $t3, $0       # $t3 = 1 if signs differ
    bne  $t3, $0, NO_OVFL   # signs of t1, t2 different => no overflow
    xor  $t3, $t0, $t1      # sign of sum (t0) different?
    slt  $t3, $t3, $0       # $t3 = 1 if sum has different sign
    bne  $t3, $0, OVFL      # go to overflow
• Unsigned addition (range = [0 : 2^32 - 1] => need $t1 + $t2 <= 2^32 - 1)
    addu $t0, $t1, $t2      # $t0 contains the sum
    nor  $t3, $t1, $0       # bitwise complement: $t3 = NOT $t1 = 2^32 - 1 - t1
    sltu $t3, $t3, $t2      # 2^32 - 1 - t1 < t2?
    bne  $t3, $0, OVFL      # t1 + t2 > 2^32 - 1 => overflow
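The two instruction sequences can be checked by mirroring them step by step in Python on 32-bit patterns. This is a sketch, not a MIPS simulator: the function names are hypothetical, and each Python line stands in for the instruction named in its comment.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def is_negative(pattern):
    """Models slt $t3, $t3, $0: 1 if the 32-bit pattern has its sign bit set."""
    return 1 if pattern & SIGN32 else 0


def signed_add_overflows(t1, t2):
    """Return (sum pattern, overflow?) following the signed sequence."""
    t0 = (t1 + t2) & MASK32      # addu: add without trapping
    if is_negative(t1 ^ t2):     # xor + slt + bne NO_OVFL
        return t0, False         # operand signs differ: no overflow possible
    t3 = is_negative(t0 ^ t1)    # xor + slt: does the sum's sign differ?
    return t0, bool(t3)          # bne OVFL


def unsigned_add_overflows(t1, t2):
    """Return (sum pattern, overflow?) following the unsigned sequence."""
    t0 = (t1 + t2) & MASK32      # addu
    t3 = ~(t1 | 0) & MASK32      # nor $t3, $t1, $0  => 2^32 - 1 - t1
    return t0, t3 < t2           # sltu + bne OVFL


if __name__ == "__main__":
    print(signed_add_overflows(0x7FFFFFFF, 1))
    print(unsigned_add_overflows(0xFFFFFFFF, 1))
```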

Registers $k0 and $k1
• Exception handling procedure will use registers
• Procedure calling conventions do not work
• Reserve $k0 $k1 for the operating system
[Figure: Registers, Text, Data and Stack regions; the offending instruction "add $t0, $t1, $t2" in the offending procedure, the exception handling procedure, and EPC]

Logical Operations • Operations on fields of bits within a 32-bit word • Characters (8 bits) • Bit fields (in C) • Logical operations to pack/unpack bits into words • sll shift left • srl shift right • and, andi bitwise AND • or, ori bitwise OR • Bitwise operators treat operand as vector of bits

C Bit Fields
    struct {
        unsigned int ready: 1;
        unsigned int enable: 1;
        unsigned int receivedByte: 8;
    } receiver;
    int data = receiver.receivedByte;
    receiver.ready = 0;
    receiver.enable = 1;
• Layout of receiver in $s1: bits 31-10 unused, bits 9-2 receivedByte, bit 1 enable (e), bit 0 ready (r)
    # $s0: data; $s1: receiver
    sll  $s0, $s1, 22       # move receivedByte to bits 31-24
    srl  $s0, $s0, 24       # move it down to bits 7-0: $s0 = receivedByte
    andi $s1, $s1, 0xfffe   # receiver.ready = 0
    ori  $s1, $s1, 0x0002   # receiver.enable = 1
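The shift-and-mask sequence can be traced in Python. The sketch assumes the layout drawn on the slide (ready in bit 0, enable in bit 1, receivedByte in bits 9-2); the C standard leaves bit-field layout to the implementation, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_receiver(ready, enable, received_byte):
    """Build the 32-bit word for the assumed layout."""
    return ((received_byte & 0xFF) << 2) | ((enable & 1) << 1) | (ready & 1)


def unpack_received_byte(receiver):
    """Mirror sll by 22 then srl by 24 to isolate bits 9-2."""
    s0 = (receiver << 22) & MASK32  # sll: discard the bits above bit 9
    return s0 >> 24                 # srl: discard enable and ready


def clear_ready_set_enable(receiver):
    """Mirror andi 0xfffe then ori 0x0002."""
    s1 = receiver & 0xFFFE          # andi zero-extends its 16-bit immediate
    return s1 | 0x0002


if __name__ == "__main__":
    word = pack_receiver(ready=1, enable=0, received_byte=0xAB)
    print(hex(word), hex(unpack_received_byte(word)), hex(clear_ready_set_enable(word)))
```

Because andi zero-extends its immediate, the mask also clears bits 31-16; that is harmless here only because all three fields live in the low 10 bits.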

Conclusions
• ISA supports architectural development
  • Hardware/Software, RISC/CISC emphasis
  • Technology driven
• ALU = core of computer
• ALU problem = overflow
  • Exception handling
• Think: Weekend!
