
Alternative Architectures Arithmetic/Logic Operations 18 September 2013

Presentation Transcript


  1. CDA 3101 Fall 2013 Introduction to Computer Organization • Alternative Architectures • Arithmetic/Logic Operations • 18 September 2013

  2. Concept Review • Variable allocation • Stack frames (caller/callee), static, heap • Pointers • Memory addresses • Necessary (to pass arguments: arrays, structures) • Efficient (pointer arithmetic) • Problems • Reference wrong memory location (segmentation fault) • Memory leakage

  3. Overview • ISA design alternatives • Design principles (tradeoffs) • CPU performance equation • RISC vs. CISC • Historical perspective • PowerPC and 80x86

  4. Computer Architectures • Accumulator • Hardware expensive => only one register • Accumulator: one of the operands and result • Memory-based operand addressing mode • Stack • No registers (simple compilers, compact encoding) • Special purpose registers (e.g. 8086) • General purpose registers • Register-memory • Register-register (load-store) • HLL computer architecture

  5. Instruction Set Architectures • Example: A = B + C; • Accumulator: Load AddressB; Add AddressC; Store AddressA • Stack: Push AddressC; Push AddressB; Add; Pop AddressA • Load-store: Load R1, AddressB; Load R2, AddressC; Add R3, R1, R2; Store R3, AddressA • Memory-memory: Add AddressA, AddressB, AddressC • CPUtime = IC * CPI * Cycle_time
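
The stack sequence for A = B + C can be traced with a toy interpreter. This is only a sketch: the `StackMachine` type, the `op_*` helpers, and the three-cell memory standing in for AddressA, AddressB, and AddressC are hypothetical names chosen for illustration, not part of the lecture.

```c
#include <stdint.h>

/* Hypothetical memory cells standing in for AddressA, AddressB, AddressC. */
enum { ADDR_A, ADDR_B, ADDR_C, MEM_WORDS };

typedef struct {
    int32_t mem[MEM_WORDS];
    int32_t stack[8];
    int sp; /* number of values currently on the operand stack */
} StackMachine;

/* Push: copy a memory cell onto the operand stack. */
static void op_push(StackMachine *m, int addr) { m->stack[m->sp++] = m->mem[addr]; }

/* Add: replace the top two stack entries with their sum (no operands named). */
static void op_add(StackMachine *m) {
    int32_t rhs = m->stack[--m->sp];
    int32_t lhs = m->stack[--m->sp];
    m->stack[m->sp++] = lhs + rhs;
}

/* Pop: store the top of the stack into a memory cell. */
static void op_pop(StackMachine *m, int addr) { m->mem[addr] = m->stack[--m->sp]; }

/* A = B + C as the four-instruction stack program from the slide. */
int32_t run_a_equals_b_plus_c(int32_t b, int32_t c) {
    StackMachine m = { .mem = { 0, b, c }, .sp = 0 };
    op_push(&m, ADDR_C);
    op_push(&m, ADDR_B);
    op_add(&m);
    op_pop(&m, ADDR_A);
    return m.mem[ADDR_A];
}
```

Only Push and Pop carry an address; Add is encoded with no operands at all, which is where the compact encoding of stack machines comes from.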

  6. ISA Trends • Hardware and compiler technology trends • Hardware / software boundary swings back and forth • [Figure: timeline EDSAC, CISC, RISC, Post RISC (FISC), EPIC, Multi-Core, plotted between compiler support and hardware support]

  7. RISC Architecture • Reduced Instruction Set Computer • Design philosophy • Load-store • Fixed-length instructions • Three-address architecture • Plenty of registers • Simple addressing modes • Instruction pipelining • Many ideas used in modern computers have been taken from CDC 6600 (1963)

  8. PowerPC • Similar to MIPS: 32 registers, 32-bit instructions, RISC • Differences (tradeoffs: simplicity vs. common case) • Indexed addressing • Example: lw $t1,$a0+$s3 # $t1 = Memory[$a0+$s3] • MIPS: add $t0, $a0, $s3; lw $t1,0($t0) • Update addressing • Update a register as part of load (for marching through arrays) • Example: lwu $t0,4($s3) # lw $t0,4($s3); addi $s3,$s3,4 • Unique instructions • Load multiple/store multiple: up to 32 words in a single instruction • Special counter register • bc Loop, $ctr!=0 # decrement counter, if not 0 goto Loop • MIPS: addi $t0, $t0, -1; bne $t0, $zero, Loop

  9. 80x86 Milestones • 1978: 8086, 16 bit architecture (64KB), no GPRs • 1980: 8087 FP coprocessor, 60+ instructions, 80-bit stack, no GPRs • 1982: 80286, 24-bit address space, protection model • 1985: 80386, 32 bits, new addressing modes, 8 GPRs • 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (designed for higher performance) • 1997: MMX +57 instructions (SIMD) • 1999: PIII +70 multimedia instructions • 2000: P4 +144 multimedia instructions Golden handcuffs of upward compatibility => architecture is difficult to explain and impossible to love

  10. X86 Architecture • Two-address architecture • The destination is also one of the sources add $s1,$s0 # s0=s0+s1 (C: a += b;) • Benefit: smaller instructions => smaller code => faster • Register-memory architecture • One operand can be in memory; other operand is register add 12(%gp),%s0 # s0=s0+Mem[12+gp] • Benefit: fewer instructions => smaller code • Variable-length instructions (1 to 17 bytes) • Small code size (30% smaller) • Better instruction cache hit rates • Instructions can include 8- or 32-bit immediates

  11. X86 Features • Operating modes: real (8088), virtual, and protected • Four protection levels • Memory • Address space: 16,384 segments (4GB) • Little endian • 8 32-bit Registers (16-bit 8086 names with e prefix): • eax, ecx, edx, ebx, esp, ebp, esi, edi • Data types • Signed/unsigned integers (8, 16, and 32 bits) • Binary coded decimal integers • Floating point (32 and 64 bits) • Floating point uses a separate stack

  12. X86 Registers • eax: main arithmetic register • ebx: pointers (memory addresses) • ecx: loops • edx: multiplication and division • esi: pointer to source string • edi: pointer to destination string • ebp: base of the current stack frame ($fp) • esp: stack pointer • Segment registers (cs, ss, ds, es, fs, gs): support for the 8088 attempt to address 2^20 bytes using 16-bit addresses • eip: program counter • eflags: processor state word

  13. X86 Instruction Formats • Highly complex and irregular • Six variable-length fields • Five fields are optional

  14. Examples of X86 Instruction Formats

  15. Integer Instructions • Arithmetic • ADD, SUB • CMP • SHL, SHR, RCR • CBW • TEST • INC, DEC • OR, XOR • String • MOVS • LODS • Control • JNZ, JZ • JMP • CALL • RET • LOOP • Data Transfer • MOV • PUSH, POP • LES

  16. Examples of X86 Instructions • leal (load effective address) • Calculate address like a load, but load address into register • Load 32-bit address: leal -4000000(%ebp),%esi # esi = ebp - 4000000 • Memory Stack is part of instruction set • call label (esp -= 4; M[esp] = eip + 5; eip = label) • push places value onto stack, decrements esp • pop gets value from stack, increments esp • incl, decl (increment, decrement) incl %edx # edx = edx + 1
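
The stack discipline can be modeled in a few lines of C. Consistent with the call description on this slide (esp -= 4 before the store), the stack grows toward lower addresses: push moves esp down by 4 and then stores, pop loads and then moves esp back up. The `Cpu` struct, the 64-byte toy memory, and the function names are assumptions made for this sketch, and there is deliberately no bounds checking.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* toy memory that holds only the stack */
    uint32_t esp;              /* byte offset of the current top of stack */
} Cpu;

/* An empty stack has esp just past the highest address. */
void cpu_init(Cpu *c) { c->esp = STACK_BYTES; }

/* push: esp -= 4; M[esp] = value */
void push32(Cpu *c, uint32_t value) {
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &value, sizeof value);
}

/* pop: value = M[esp]; esp += 4 */
uint32_t pop32(Cpu *c) {
    uint32_t value;
    memcpy(&value, &c->mem[c->esp], sizeof value);
    c->esp += 4;
    return value;
}
```

A call is then a push of the return address followed by a jump, which is exactly the three-step description given for `call label`.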

  17. Addressing Modes Encoding • Highly irregular, non-orthogonal addressing modes • Instruction in 16-bit or 32-bit mode? • Not all modes apply to all instructions • Not all registers can be used in all modes

  18. Addressing Modes • Base reg + offset (like MIPS) • movl -8000044(%ebp), %eax • Base reg + index reg (2 regs form addr.) • movl (%eax,%ebx),%edi # edi = Mem[ebx + eax] • Base reg + scaled index (index reg multiplied by 1, 2, 4, or 8) • movl (%eax,%edx,4),%ebx # ebx = Mem[edx*4 + eax] • Base reg + scaled index + offset • movl 12(%eax,%edx,4),%ebx # ebx = Mem[edx*4 + eax + 12]
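
All four modes are special cases of one address computation, base + index * scale + offset. A minimal sketch in C follows; the function name and parameter names are illustrative and not taken from any assembler or manual.

```c
#include <stdint.h>

/*
 * Address formed by the operand "disp(base, index, scale)":
 *   base + index * scale + disp, wrapping at 32 bits.
 * Pass index = 0 and scale = 1 for the plain base + offset mode.
 */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp) {
    return base + index * scale + (uint32_t)disp;
}
```

For example, `movl 12(%eax,%edx,4),%ebx` with eax = 1000 and edx = 3 reads from `effective_address(1000, 3, 4, 12)`, which is 1024. The `leal` instruction computes the same value but stores the address itself instead of loading from it.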

  19. Branch Support • Rather than compare registers, x86 uses special 1-bit registers called “condition codes” that are set as a side-effect of ALU operations • S - Sign Bit • Z - Zero (result is all 0) • C - Carry Out • P - Parity: set to 1 if even number of ones in rightmost 8 bits of operation • Conditional Branch instructions then use condition flags for all comparisons: <, <=, >, >=, ==, !=
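
To make the side effect concrete, here is a sketch of how the four flags listed on this slide could be derived from a 32-bit addition. The `Flags` struct and `flags_after_add` are hypothetical names for illustration; real x86 hardware also maintains other flags (such as signed overflow) that are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool s; /* sign: copy of bit 31 of the result */
    bool z; /* zero: result is all 0 */
    bool c; /* carry out of bit 31 */
    bool p; /* parity: even number of ones in the low 8 bits */
} Flags;

Flags flags_after_add(uint32_t a, uint32_t b) {
    uint32_t result = a + b; /* wraps modulo 2^32 */
    Flags f;
    f.s = ((result >> 31) & 1u) != 0;
    f.z = (result == 0);
    f.c = (result < a);      /* the sum wrapped, so a carry left bit 31 */

    int ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (int)((result >> i) & 1u);
    }
    f.p = (ones % 2 == 0);
    return f;
}
```

A conditional branch such as JZ then only needs to inspect `z`; no second comparison instruction is required when the preceding ALU operation already set the flags.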

  20. While Loop • C: while (save[i]==k) i = i + j; • MIPS (i,j,k => $s3, $s4, $s5): Loop: sll $t1, $s3, 2; add $t1, $t1, $s6; lw $t0, 0($t1); bne $t0, $s5, Exit; add $s3, $s3, $s4; j Loop; Exit: • X86 (i,j,k => %edx, %esi, %ebx): leal -400(%ebp),%eax; .Loop: cmpl %ebx,(%eax,%edx,4); jne .Exit; addl %esi,%edx; jmp .Loop; .Exit:

  21. PIII, P4, and AMD • PC World magazine, Nov. 20, 2000 • WorldBench 2000 benchmark (business applications) • P4 score @ 1.5 GHz: 164 (higher is better) • PIII score @ 1.0 GHz: 167 • AMD Athlon @ 1.2 GHz: 180 • (Media applications do better on P4 vs. PIII) • Why? => CPU performance equation • Time = Instruction count x CPI x 1/Clock rate • Instruction count is the same for x86 • Clock rates: P4 > Athlon > PIII • How can P4 be slower? • Average CPI of P4 must be worse than Athlon, PIII
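
The conclusion about CPI can be checked numerically with the performance equation. The sketch below rests on two assumptions that should be stated loudly: the instruction count is identical across the three processors (as this slide says), and the benchmark score is inversely proportional to run time (an assumption made here, since the slide only says higher is better). The function names are illustrative.

```c
/* Time = Instruction count x CPI x 1/Clock rate */
double cpu_time_seconds(double instruction_count, double cpi, double clock_hz) {
    return instruction_count * cpi / clock_hz;
}

/*
 * Ratio CPI_a / CPI_b implied by two benchmark results.
 * ASSUMPTIONS: equal instruction counts, and score = constant / time.
 * From CPI = Time * clock / IC:
 *   CPI_a / CPI_b = (time_a / time_b) * (clock_a / clock_b)
 *                 = (score_b / score_a) * (clock_a / clock_b)
 */
double implied_cpi_ratio(double score_a, double clock_a,
                         double score_b, double clock_b) {
    return (score_b / score_a) * (clock_a / clock_b);
}
```

With the figures quoted above, `implied_cpi_ratio(164, 1.5e9, 167, 1.0e9)` is about 1.53, so under these assumptions the P4 spends roughly 50% more cycles per instruction than the PIII on this workload.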

  22. Summary • Instruction complexity is only one variable • lower instruction count vs. higher CPI / lower clock rate • Design Principles: • simplicity favors regularity • smaller is faster • good design demands compromise • make the common case fast • Instruction set architecture • a very important abstraction!

  23. New Topic – Arithmetic/Logic Ops • Arithmetic and logic unit (ALU) • Core of a computer • Performs arithmetic and logical operations on data • Computer arithmetic issues • Number representation • Integers and floating point • Finite precision (overflow / underflow) • Algorithms used for the basic operations • Properties of number representation • One zero • As many positive numbers as negative numbers • Efficient hardware implementation of algorithms • 2’s complement: to negate a number, invert all of its bits and add one
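
The two's complement rule is a one-liner in C when applied to unsigned 32-bit words, where wraparound is well defined. The function name below is illustrative.

```c
#include <stdint.h>

/* Two's complement negation of a 32-bit pattern: invert every bit, then add one. */
uint32_t twos_complement_negate(uint32_t x) {
    return ~x + 1u;
}
```

Negating 5 gives 0xFFFFFFFB, the bit pattern for -5, and negating 0 gives 0 again, so there is exactly one zero. Negating 0x80000000 (the most negative value) returns the same pattern, which shows that two's complement has one more negative number than positive numbers.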

  24. Review

  25. Overview • [Figure; surviving labels: I-instruction, 32-bit memory address]

  26. Addition • 5ten + 6ten
        0000 0000 0000 0000 0000 0000 0000 0101 (5ten)
      + 0000 0000 0000 0000 0000 0000 0000 0110 (6ten)
      = 0000 0000 0000 0000 0000 0000 0000 1011 (11ten)
      • Low-order bits, with the carry into each position in parentheses
        carries: . . . (0) (1) (0) (0) (0)
                 . . .  0   0   1   0   1
               + . . .  0   0   1   1   0
               = . . .  0   1   0   1   1

  27. Subtraction • 12ten - 5ten
        0000 0000 0000 0000 0000 0000 0000 1100 (12ten)
      - 0000 0000 0000 0000 0000 0000 0000 0101 ( 5ten)
      = 0000 0000 0000 0000 0000 0000 0000 0111 ( 7ten)
      • 12ten - 5ten = 12ten + (-5ten)
        0000 0000 0000 0000 0000 0000 0000 1100 (12ten)
      + 1111 1111 1111 1111 1111 1111 1111 1011 (-5ten)
      = 0000 0000 0000 0000 0000 0000 0000 0111 ( 7ten)
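
The additions on slides 26 and 27 can be reproduced one bit at a time, the way a ripple-carry adder works. This is a sketch for illustration only; `ripple_add` is a hypothetical name and real ALUs typically use faster carry schemes.

```c
#include <stdint.h>

/*
 * Add two 32-bit words one bit position at a time.
 * Each position is a full adder: sum = x XOR y XOR carry_in,
 * carry_out = (x AND y) OR (carry_in AND (x XOR y)).
 * The carry out of bit 31 is returned through carry_out.
 */
uint32_t ripple_add(uint32_t a, uint32_t b, uint32_t *carry_out) {
    uint32_t sum = 0;
    uint32_t carry = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t x = (a >> i) & 1u;
        uint32_t y = (b >> i) & 1u;
        sum |= (x ^ y ^ carry) << i;
        carry = (x & y) | (carry & (x ^ y));
    }
    *carry_out = carry;
    return sum;
}
```

Calling it with 5 and 6 yields 11 with no carry out. Calling it with 12 and 0xFFFFFFFB (the pattern for -5) yields 7; the carry out of bit 31 is 1 and is simply discarded, which is why the same adder serves for subtraction.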

  28. Overflow • Computer arithmetic is not closed w.r.t. + - * / • Overflow • The result cannot be expressed with 32 bits • Overflow cannot occur • Addition: if the operands have different signs • Subtraction: if the operands have the same sign • Overflow detection • Result needs 33 bits • Addition: a carry out occurs into the sign bit • Subtraction: a borrow occurs from the sign bit

  29. Examples • 4 bits (instead of 32 in MIPS) => can represent integers in [-8 : 7] • 7 + 6: 0111 (7ten) + 0110 (6ten) = 1101, which reads as -3ten rather than 13ten (overflow) • -7 + -6: 1001 (-7ten) + 1010 (-6ten) = 0011, which reads as 3ten rather than -13ten (overflow) • -7 - 6: 1001 (-7ten) - 0110 (6ten) = 0011, which reads as 3ten rather than -13ten (overflow) • -7 - 6 = -7 + -6: 1001 (-7ten) + 1010 (-6ten) = 0011 (same result, same overflow)
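
The 4-bit cases can be replayed in C by keeping only the low four bits of each sum and comparing against the exact result. The helper names are hypothetical, and the range check is a direct (not hardware-style) way of detecting overflow, used here only to confirm the examples.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpret the low 4 bits of v as a two's complement number in [-8, 7]. */
int from_4bit(uint8_t v) {
    v &= 0xF;
    return (v & 0x8) ? (int)v - 16 : (int)v;
}

/*
 * Add two 4-bit patterns. The wrapped 4-bit sum is stored in *result;
 * the return value is true when the exact sum does not fit in [-8, 7].
 */
bool add_4bit(uint8_t a, uint8_t b, uint8_t *result) {
    *result = (uint8_t)((a + b) & 0xF);
    int exact = from_4bit(a) + from_4bit(b);
    return exact < -8 || exact > 7;
}
```

For 0111 + 0110 the stored pattern is 1101, which `from_4bit` reads as -3, and the function reports overflow. For 1001 + 1010 the pattern is 0011 (that is, 3), again with overflow.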

  30. Overflow Conditions

  31. MIPS Support • MIPS raises an Exception when overflow occurs • Exceptions (or interrupts) act like procedure calls • Register EPC stores address of offending instruction • mfc0 $t1, $epc # moves contents of EPC to $t1 • No conditional branch to test overflow • Two’s complement arithmetic (add, addi, and sub) • Exception on overflow • Unsigned arithmetic (addu and addiu) • No exception on overflow • Used for address arithmetic • Compilers • C ignores overflows (always uses addu, addiu, subu) • Fortran uses the appropriate instructions

  32. Conditional branch on overflow • Signed addition
        addu $t0, $t1, $t2     # add but do not trap
        xor  $t3, $t1, $t2     # check if signs differ
        slt  $t3, $t3, $0      # $t3 = 1 if signs differ
        bne  $t3, $0, NO_OVFL  # signs of t1, t2 different
        xor  $t3, $t0, $t1     # sign of sum (t0) different?
        slt  $t3, $t3, $0      # $t3 = 1 if sum has different sign
        bne  $t3, $0, OVFL     # go to overflow
      • Unsigned addition (range = [0 : 2^32 - 1] => $t1 + $t2 <= 2^32 - 1)
        addu $t0, $t1, $t2     # $t0 contains the sum
        nor  $t3, $t1, $0      # negate $t1 ($t3 = NOT $t1)
        sltu $t3, $t3, $t2     # 2^32 - 1 - t1 < t2?
        bne  $t3, $0, OVFL     # t1 + t2 > 2^32 - 1 => overflow
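
The same two tests translate almost line for line into C on unsigned 32-bit words. This is a sketch mirroring the MIPS sequences on this slide; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed addition: a and b hold two's complement bit patterns. */
bool signed_add_overflows(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;          /* like addu: wraps, never traps */
    if (((a ^ b) >> 31) != 0) {
        return false;              /* operand signs differ: no overflow possible */
    }
    return ((sum ^ a) >> 31) != 0; /* sum's sign differs from the operands' sign */
}

/* Unsigned addition: overflow when (2^32 - 1 - a) < b, i.e. NOT a < b. */
bool unsigned_add_overflows(uint32_t a, uint32_t b) {
    return (uint32_t)~a < b;
}
```

The XOR of two words has its top bit set exactly when their sign bits differ, which is what the `xor` followed by `slt ... $0` pair tests in the assembly version.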

  33. Registers $k0 and $k1 • Exception handling procedure will use registers • Procedure calling conventions do not work • Reserve $k0 $k1 for the operating system • [Figure: Registers, Text, Data, and Stack; the offending instruction (add $t0, $t1, $t2) inside the offending procedure, EPC, and the exception handling procedure]

  34. Logical Operations • Operations on fields of bits within a 32-bit word • Characters (8 bits) • Bit fields (in C) • Logical operations to pack/unpack bits into words • sll shift left • srl shift right • and, andi bitwise AND • or, ori bitwise OR • Bitwise operators treat operand as vector of bits

  35. C Bit Fields
      struct { unsigned int ready: 1; unsigned int enable: 1; unsigned int receivedByte: 8; } receiver;
      int data = receiver.receivedByte; receiver.ready = 0; receiver.enable = 1;
      • Layout of $s1 (receiver): bits 9-2 receivedByte, bit 1 enable (e), bit 0 ready (r)
      • MIPS ($s0: data; $s1: receiver)
        sll  $s0, $s1, 22      # move receivedByte to the top of the word
        srl  $s0, $s0, 24      # move it down to bits 7-0: data = receivedByte
        andi $s1, $s1, 0xfffe  # ready = 0
        ori  $s1, $s1, 0x0002  # enable = 1
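
The shift-and-mask sequence can be written in C directly on a 32-bit word, which makes the packing explicit. The layout (bit 0 ready, bit 1 enable, bits 9-2 receivedByte) is taken from the slide's diagram; how a given compiler actually lays out a bit-field struct is implementation-defined, so this sketch works on a raw word and the function names are illustrative.

```c
#include <stdint.h>

/* Extract receivedByte (bits 9..2): shift left 22, then logical shift right 24. */
uint32_t get_received_byte(uint32_t receiver) {
    return (receiver << 22) >> 24;
}

/* ready = 0: clear bit 0. Like andi, the 16-bit mask also clears bits 31..16. */
uint32_t clear_ready(uint32_t receiver) {
    return receiver & 0xFFFEu;
}

/* enable = 1: set bit 1. */
uint32_t set_enable(uint32_t receiver) {
    return receiver | 0x0002u;
}
```

With receivedByte = 0xAB, enable = 0, and ready = 1, the word is 0x2AD; `get_received_byte` returns 0xAB, and applying `clear_ready` then `set_enable` produces 0x2AE.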

  36. Conclusions • ISA supports architectural development • Hardware/Software, RISC/CISC emphasis • Technology driven • ALU = core of computer • ALU problem = overflow • Exception handling • Think: Weekend!
