Arithmetic / Logic Unit – ALU Design Presentation F

CSE 675.02: Introduction to Computer Architecture. Arithmetic / Logic Unit – ALU Design, Presentation F. Slides by Gojko Babić

32-bit ALU
[Figure: 32-bit ALU block with 32-bit inputs A and B, ALU Control input, 32-bit Result output, and Zero, Overflow and Carry out outputs]
• Our ALU should be able to perform the following functions:
  - logical and
  - logical or
  - arithmetic add
  - arithmetic subtract
  - arithmetic slt (set-less-than)
  - logical nor
• ALU control lines define the function to be performed on A and B.

Functioning of 32-bit ALU
[Figure: 32-bit ALU block with 32-bit inputs A and B, 4 ALU Control lines, 32-bit Result output, and Zero, Overflow and Carry out outputs]
• Result lines provide the result of the chosen function applied to the values of A and B.
• Since this ALU operates on 32-bit operands, it is called a 32-bit ALU.
• Zero output indicates whether all Result lines have value 0.
• Overflow indicates a signed-integer overflow of the add and subtract functions; for unsigned integers, this overflow indicator does not provide any useful information.
• Carry out indicates carry out and unsigned-integer overflow.

Designing 32-bit ALU: Beginning
• Let us start with the and function.
• Let us now add the or function.
[Figure: bit slices 0 to 31, each with inputs a_i and b_i and a multiplexer (inputs 0 and 1) producing Result_i; Operation = 0 selects and, Operation = 1 selects or]

Designing 32-bit ALU: Principles
[Figure: bit slices 0 to 31, each with and and or functions on a_i and b_i and a multiplexer (inputs 0 and 1) producing Result_i; Operation = 0 selects and, Operation = 1 selects or]
• A number of functions are performed internally, but only one result is chosen for the output of the ALU.
• The 32-bit ALU is built out of 32 identical 1-bit ALUs.
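The principle on this slide can be sketched in a few lines of Python. This is only a behavioral illustration, not the gate-level design: the function names `alu_1bit` and `alu_32bit` are made up for this sketch, and a tuple index stands in for the multiplexer.

```python
def alu_1bit(a: int, b: int, operation: int) -> int:
    """1-bit ALU slice: compute every function, then select one result."""
    and_result = a & b
    or_result = a | b
    # The tuple index plays the role of the 2-to-1 multiplexer:
    # Operation = 0 selects and, Operation = 1 selects or.
    return (and_result, or_result)[operation]


def alu_32bit(a: int, b: int, operation: int) -> int:
    """32-bit ALU assembled from 32 identical 1-bit slices."""
    result = 0
    for i in range(32):
        bit = alu_1bit((a >> i) & 1, (b >> i) & 1, operation)
        result |= bit << i
    return result


print(bin(alu_32bit(0b1100, 0b1010, 0)))  # and
print(bin(alu_32bit(0b1100, 0b1010, 1)))  # or
```

Both results are always computed inside each slice; the Operation value only decides which one reaches the output.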

Designing Adder
• A 32-bit adder is built out of 32 1-bit adders.
[Figure B.5.2: 1-bit adder, with its input/output truth table]
• From the truth table and after minimization, we can have this design for CarryOut:
[Figure B.5.5: CarryOut logic]
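Since the truth table and the CarryOut figure are not reproduced here, the following sketch regenerates them. It assumes the usual minimized form CarryOut = a·b + a·CarryIn + b·CarryIn and Sum = a xor b xor CarryIn; the function name `full_adder` is chosen only for this example.

```python
from itertools import product


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one bit position."""
    total = a ^ b ^ carry_in
    # Minimized sum-of-products form: carry out is 1 when at least
    # two of the three inputs are 1.
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return total, carry_out


# Print the 1-bit adder truth table and check it against integer addition.
print("a b cin | cout sum")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s
    print(f"{a} {b} {cin}   | {cout}    {s}")
```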

32-bit Adder
[Figure: 32 1-bit adders for bits 0 to 31; each takes a_i, b_i and Cin and produces sum_i and Cout; the Cin of bit 0 is "0", each Cout feeds the next Cin, and the last Cout is Carry out]
• This is a ripple carry adder.
• The key to speeding up addition is determining the carry out in the higher-order bits sooner. Result: carry look-ahead adder.
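A minimal behavioral sketch of the ripple carry adder follows. The name `ripple_carry_add` and the `width` parameter are assumptions of this example; the loop mirrors the way each carry has to wait for the previous bit, which is exactly what a carry look-ahead adder avoids.

```python
def ripple_carry_add(a: int, b: int, carry_in: int = 0, width: int = 32):
    """Add two width-bit patterns one bit at a time; return (sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry
        # The carry into bit i+1 depends on the carry into bit i,
        # so it "ripples" from the least to the most significant bit.
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(5, 7))
print(ripple_carry_add(0xFFFFFFFF, 1))
```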

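32-bit ALU With 3 Functions
[Figure B.5.6: 1-bit ALU with and, or and add (+) functions and carry out]
[Figure B.5.7: 32-bit ALU built from 1-bit ALUs, with the carry in of bit 0 = 0 and CarryOut from the last bit]
• Operation = 00 selects and, = 01 selects or, = 10 selects add.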

32-bit Subtractor
• $A - B = A + (-B) = A + \overline{B} + 1$, where $\overline{B}$ is B with every bit inverted
[Figure: 32 1-bit adders for bits 0 to 31 producing Result0 to Result31 and CarryOut; the Cin of bit 0 is "1" instead of "0"]

32-bit Adder / Subtractor
[Figure: 32 1-bit adders for bits 0 to 31 producing Result0 to Result31 and CarryOut; a multiplexer (inputs 0 and 1) in each bit slice is controlled by Binvert]
• Binvert = 0 selects addition, Binvert = 1 selects subtraction.
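The adder/subtractor can be modeled behaviorally as below. This sketch assumes that Binvert both inverts every bit of B and supplies the "+1" as the carry in of bit 0, following the identity A - B = A + (inverted B) + 1; the name `add_sub` is hypothetical.

```python
MASK = 0xFFFFFFFF  # 32 bits


def add_sub(a: int, b: int, binvert: int) -> tuple[int, int]:
    """Return (32-bit result, carry out). binvert = 0 adds, 1 subtracts."""
    b_in = (b & MASK) ^ (MASK if binvert else 0)  # invert every bit of B
    # Assumption: Binvert also drives the carry in of bit 0 (the "+1").
    total = (a & MASK) + b_in + binvert
    return total & MASK, total >> 32


print(add_sub(7, 5, 0))  # 7 + 5
print(add_sub(7, 5, 1))  # 7 - 5
print(add_sub(5, 7, 1))  # 5 - 7, i.e. the 32-bit pattern of -2
```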

32-bit ALU With 4 Functions
[Figure B.5.8: 1-bit ALU with a, b and CarryIn inputs, Binvert and Operation control lines, and Result and CarryOut outputs]
[Figure: 32-bit ALU built from ALU0 to ALU31; each ALUi takes a_i, b_i and CarryIn and produces Result_i and CarryOut; all slices share the Binvert and Operation control lines, and ALU31 produces Carry Out]

2's Complement Overflow
[Figure: 1-bit ALU for the most significant bit, with a, b, CarryIn and Less inputs, Binvert and Operation control lines, Result and Carry Out outputs, and an overflow detection block producing Overflow]
• 2's complement overflow happens:
  - if the sum of two positive numbers results in a negative number
  - if the sum of two negative numbers results in a positive number
• Other 1-bit ALUs, i.e. non-most significant bit ALUs, are not affected.
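The two overflow conditions above reduce to one test on the sign bits: the operands have the same sign and the result has a different one. The sketch below applies that test to 32-bit patterns; `add_with_overflow` is a name invented for this example, and the actual overflow detection block may be wired differently.

```python
def add_with_overflow(a: int, b: int, width: int = 32) -> tuple[int, bool]:
    """Add two width-bit two's complement patterns; return (result, overflow)."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = (a + b) & mask
    # Overflow: a and b agree in sign (a ^ b has sign bit 0) while
    # a and the result disagree in sign (a ^ result has sign bit 1).
    overflow = bool(~(a ^ b) & (a ^ result) & sign)
    return result, overflow


print(add_with_overflow(0x7FFFFFFF, 1))           # positive + positive
print(add_with_overflow(0x80000000, 0x80000000))  # negative + negative
print(add_with_overflow(1, 0xFFFFFFFF))           # 1 + (-1), no overflow
```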

32-bit ALU With 4 Functions and Overflow
[Figure: ALU0 to ALU31 chained through CarryIn and CarryOut, sharing the Binvert and Operation control lines; ALU31 also produces Overflow and Carry Out]
• Missing: slt and nor functions and Zero output.
• Add correction for CarryOut.

Set Less Than (slt) Function
• The slt function is defined as:
    A slt B = 000 … 001 if A < B, i.e. if A - B < 0
    A slt B = 000 … 000 if A ≥ B, i.e. if A - B ≥ 0
• Thus each 1-bit ALU should have an additional input (called “Less”) that will provide results for the slt function. This input has value 0 for all but the 1-bit ALU for the least significant bit.
• For the least significant bit, the Less value should be the sign of A - B.
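As a rough illustration of the definition above, the sketch below computes A - B as a 32-bit pattern and copies its sign bit into bit 0 of the result. The name `slt` and the variable `set_bit` are choices of this example. Note the simplification: when the subtraction itself overflows, the sign of the 32-bit difference is not the true comparison result, and this sketch does not correct for that.

```python
MASK = 0xFFFFFFFF  # 32 bits


def slt(a: int, b: int) -> int:
    """Set-less-than on 32-bit two's complement patterns (no overflow fix)."""
    # A - B computed as A + (inverted B) + 1, kept to 32 bits.
    diff = ((a & MASK) + ((b & MASK) ^ MASK) + 1) & MASK
    set_bit = diff >> 31  # sign of A - B
    # Bits 31..1 of the result are 0; bit 0 receives the sign.
    return set_bit


print(slt(3, 5), slt(5, 3), slt(5, 5))
```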

32-bit ALU With 5 Functions
[Figure: 1-bit ALU for non-most significant bits, with a, b, CarryIn and Less inputs, Binvert and Operation control lines, a multiplexer (inputs 0 to 3) producing Result, and CarryOut]
[Figure: 1-bit ALU for the most significant bit, which additionally produces Set and, through overflow detection, Overflow]
[Figure: 32-bit ALU built from ALU0 to ALU31; the Less input is 0 for ALU1 to ALU31; ALU31 produces Set, Overflow and Carry Out]
• Operation = 3 and Binvert = 1 for the slt function.
• Add correction for CarryOut.

32-bit ALU with 5 Functions and Zero
[Figure: 32-bit ALU with Binvert and Operation control lines, Zero output and Carry Out]
• Add correction for CarryOut.

32-bit ALU with 6 Functions
• $A \text{ nor } B = \overline{A} \text{ and } \overline{B}$
[Figure B.5.10 (Top) and Figure B.5.12: ALU diagrams with Binvert, adder (+) and Carry Out]
• Add correction for CarryOut.
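Putting the six functions together, a behavioral model might look like the sketch below. It is an illustration under assumptions, not the circuit itself: the `ainvert` input is a hypothetical control line added here so that nor can be formed as (inverted A) and (inverted B), slt ignores subtraction overflow, and only the Result and Zero outputs are modeled.

```python
MASK = 0xFFFFFFFF  # 32 bits


def alu(a: int, b: int, ainvert: int, binvert: int, operation: int):
    """Return (result, zero) for a 32-bit ALU with 6 functions."""
    a_in = (a & MASK) ^ (MASK if ainvert else 0)
    b_in = (b & MASK) ^ (MASK if binvert else 0)
    add_result = (a_in + b_in + binvert) & MASK  # binvert doubles as carry in
    results = (
        a_in & b_in,       # Operation 0: and (nor when both inputs inverted)
        a_in | b_in,       # Operation 1: or
        add_result,        # Operation 2: add, or subtract when binvert = 1
        add_result >> 31,  # Operation 3: slt, sign of A - B (binvert = 1)
    )
    result = results[operation]
    zero = int(result == 0)  # Zero output: all Result lines are 0
    return result, zero


print(alu(6, 3, 0, 0, 0))  # and
print(alu(6, 3, 0, 0, 1))  # or
print(alu(6, 3, 0, 0, 2))  # add
print(alu(5, 5, 0, 1, 2))  # subtract, result 0 sets Zero
print(alu(3, 5, 0, 1, 3))  # slt
print(alu(0b1100, 0b1010, 1, 1, 0))  # nor
```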

32-bit ALU Elaboration
• We have now accounted for all but one of the arithmetic and logic functions for the core MIPS instruction set. The 32-bit ALU with 6 functions omits support for shift instructions.
• It would be possible to widen the 1-bit ALU multiplexer to include a 1-bit shift left and/or a 1-bit shift right.
• Hardware designers created the circuit called a barrel shifter, which can shift from 1 to 31 bits in no more time than it takes to add two 32-bit numbers. Thus, shifting is normally done outside the ALU.
• We now consider integer multiplication (but not division).

Multiplication
• Multiplication is more complicated than addition:
  - accomplished via shifting and addition
• More time and more area required
• Let's look at 3 versions based on the elementary school algorithm
• Example of unsigned multiplication:
            10001    5-bit multiplicand, $10001_2 = 17_{10}$
          × 10011    5-bit multiplier, $10011_2 = 19_{10}$
            10001
           10001
          00000
         00000
        10001
        ---------
        101000011    $101000011_2 = 323_{10}$
• But this algorithm is very impractical to implement in hardware

Multiplication: Example
• The multiplication can be done with intermediate additions.
• The same example:
         10001    multiplicand
       × 10011    multiplier
    0000000000    intermediate product
         10001    add since multiplier bit = 1
    0000010001    intermediate product
        10001     shift multiplicand and add since multiplier bit = 1
    0000110011    intermediate product
                  shift multiplicand, no addition since multiplier bit = 0
                  shift multiplicand, no addition since multiplier bit = 0
     10001        shift multiplicand and add since multiplier bit = 1
    0101000011    final result
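The worked example can be replayed with a short sketch that prints the intermediate product after each multiplier bit. The function name `multiply_trace` and the `width` parameter are assumptions made for this illustration.

```python
def multiply_trace(multiplicand: int, multiplier: int, width: int = 5) -> int:
    """Unsigned multiply with intermediate additions, printing each step."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            # Multiplier bit i is 1: add the multiplicand shifted left i places.
            product += multiplicand << i
        print(f"after bit {i}: {product:0{2 * width}b}")
    return product


result = multiply_trace(0b10001, 0b10011)
print(result)
```

With the slide's operands the last line printed by the loop should match the final result shown above, 0101000011.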

Multiplication Hardware: 1st Version
[Figures 3.5 and 3.6: hardware with a 64-bit Multiplicand register (shift left), a 64-bit ALU, a 32-bit Multiplier register (shift right), a 64-bit Product register (write) and a control test unit; flowchart of the algorithm]
Flowchart steps:
  Start
  1. Test Multiplier0.
     1a. If Multiplier0 = 1: add multiplicand to product and place the result in the Product register.
         If Multiplier0 = 0: skip the addition.
  2. Shift the Multiplicand register left 1 bit.
  3. Shift the Multiplier register right 1 bit.
  32nd repetition? No (< 32 repetitions): repeat from step 1. Yes (32 repetitions): Done.
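The flowchart steps of the first version translate almost line by line into the sketch below. Register widths are imitated with masks; the name `multiply_v1` and the placement of the multiplicand in the right half of its 64-bit register at the start are assumptions of this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def multiply_v1(multiplicand: int, multiplier: int) -> int:
    """First-version multiplier: 64-bit multiplicand, ALU and product."""
    mcand = multiplicand & MASK32  # 64-bit register, value in the right half
    mplier = multiplier & MASK32   # 32-bit register
    product = 0                    # 64-bit register
    for _ in range(32):
        if mplier & 1:                          # 1.  test Multiplier0
            product = (product + mcand) & MASK64  # 1a. add and write product
        mcand = (mcand << 1) & MASK64           # 2.  shift multiplicand left
        mplier >>= 1                            # 3.  shift multiplier right
    return product


print(multiply_v1(17, 19))
```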

Multiplication Hardware: 2nd Version
[Figure: hardware with a 32-bit Multiplicand register, a 32-bit ALU, a 32-bit Multiplier register (shift right), a 64-bit Product register (shift right, write) and a control test unit; flowchart of the algorithm]
Flowchart steps:
  Start
  1. Test Multiplier0.
     1a. If Multiplier0 = 1: add multiplicand to the left half of the product and place the result in the left half of the Product register.
         If Multiplier0 = 0: skip the addition.
  2. Shift the Product register right 1 bit.
  3. Shift the Multiplier register right 1 bit.
  32nd repetition? No (< 32 repetitions): repeat from step 1. Yes (32 repetitions): Done.

Multiplication Hardware: 3rd Version
[Figure 3.7: hardware with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write) and a control test unit; flowchart of the algorithm]
Flowchart steps:
  Start
  1. Test Product0.
     1a. If Product0 = 1: add multiplicand to the left half of the product and place the result in the left half of the Product register.
         If Product0 = 0: skip the addition.
  2. Shift the Product register right 1 bit.
  32nd repetition? No (< 32 repetitions): repeat from step 1. Yes (32 repetitions): Done.
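A sketch of the third version is given below. It assumes, as testing Product0 suggests, that the multiplier is loaded into the right half of the Product register before the first step, and that the carry out of the 32-bit addition is shifted into the top of the register; the name `multiply_v3` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def multiply_v3(multiplicand: int, multiplier: int) -> int:
    """Third-version multiplier: 32-bit ALU, shared 64-bit Product register."""
    mcand = multiplicand & MASK32
    # Assumption: the multiplier starts in the right half of Product.
    product = multiplier & MASK32
    for _ in range(32):
        if product & 1:  # 1. test Product0
            # 1a. add multiplicand to the left half; the sum may need
            # 33 bits, the extra bit being the ALU's carry out.
            high = (product >> 32) + mcand
            product = (high << 32) | (product & MASK32)
        # 2. shift Product right 1 bit (the carry out moves into bit 63).
        product >>= 1
    return product


print(multiply_v3(17, 19))
```

The design choice here is that each right shift discards a multiplier bit that has already been tested, which frees space for the growing product.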

Multiplication of Signed Integers
• A simple algorithm:
  - Convert any negative operand to a positive integer and remember the original signs
  - Perform multiplication of unsigned numbers using the existing algorithm and hardware
  - Negate the product if the original signs disagree
• This algorithm is not simple to implement in hardware, since it has to:
  - take the signs into account in advance,
  - if needed, convert from negative to positive numbers,
  - if needed, convert back to a negative integer at the end
• Fast multiplication algorithms.
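The three steps of the simple algorithm can be sketched as follows. Python integers stand in for the registers, so this shows the sign handling only; `signed_multiply` is a name chosen for the example.

```python
def signed_multiply(a: int, b: int) -> int:
    """Signed multiply via unsigned shift-and-add on the magnitudes."""
    # Step 1: remember the signs and convert operands to positive integers.
    negative = (a < 0) != (b < 0)
    mag_a, mag_b = abs(a), abs(b)
    # Step 2: unsigned multiplication by shifting and adding.
    product = 0
    while mag_b:
        if mag_b & 1:
            product += mag_a
        mag_a <<= 1
        mag_b >>= 1
    # Step 3: negate the product if the original signs disagree.
    return -product if negative else product


print(signed_multiply(-17, 19), signed_multiply(-17, -19))
```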

Real Numbers
• Conversion from real binary to real decimal:
  $-1101.1011_2 = -13.6875_{10}$ since:
  $1101_2 = 2^3 + 2^2 + 2^0 = 13_{10}$ and
  $0.1011_2 = 2^{-1} + 2^{-3} + 2^{-4} = 0.5 + 0.125 + 0.0625 = 0.6875_{10}$
• Conversion from real decimal to real binary:
  $+927.45_{10} = +1110011111.01\,1100\,1100\,1100\ldots_2$
  In the integer column a remainder of ½ gives bit 1 and a remainder of 0 gives bit 0, starting from the LSB. In the fraction column the integer part of each product gives the next bit.

  Integer part (divide by 2)      Fraction part (multiply by 2)
  927/2 = 463 + ½  (LSB)          0.45 × 2 = 0.9
  463/2 = 231 + ½                 0.9 × 2 = 1.8
  231/2 = 115 + ½                 0.8 × 2 = 1.6
  115/2 = 57 + ½                  0.6 × 2 = 1.2
  57/2 = 28 + ½                   0.2 × 2 = 0.4
  28/2 = 14 + 0                   0.4 × 2 = 0.8
  14/2 = 7 + 0                    0.8 × 2 = 1.6
  7/2 = 3 + ½                     0.6 × 2 = 1.2
  3/2 = 1 + ½                     0.2 × 2 = 0.4
  1/2 = 0 + ½                     0.4 × 2 = 0.8
                                  ……
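The two conversion procedures (repeated division by 2 for the integer part, repeated multiplication by 2 for the fraction) can be sketched as below. Exact rational arithmetic from the standard `fractions` module is used so that 0.45 is not disturbed by binary rounding; the name `to_binary` and its parameters are assumptions of this example.

```python
from fractions import Fraction


def to_binary(integer_part: int, fraction: Fraction, fraction_bits: int) -> str:
    """Convert a non-negative real number to a binary digit string."""
    int_bits = ""
    n = integer_part
    while n:
        n, remainder = divmod(n, 2)  # remainders come out LSB first
        int_bits = str(remainder) + int_bits
    frac_bits = ""
    f = fraction
    for _ in range(fraction_bits):
        f *= 2
        bit = int(f)  # integer part of the product is the next bit
        frac_bits += str(bit)
        f -= bit
    return (int_bits or "0") + "." + frac_bits


print(to_binary(927, Fraction(45, 100), 14))
print(to_binary(13, Fraction(6875, 10000), 4))
```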

Floating Point Number Formats
• The term floating point number refers to the representation of real binary numbers in computers.
• The IEEE 754 standard defines formats for floating point representations.
• Single precision (32 bits):
    bit 31: s | bits 30-23: E | bits 22-0: Fraction
• Double precision (64 bits):
    bit 63: s | bits 62-52: E | bits 51-32: Fraction
    bits 31-0: Fraction (continued)

Converting to Floating Point
1. Normalize the binary real number, i.e. put it into the normalized form $(-1)^s \times 1.\text{Fraction} \times 2^{\text{Exp}}$:
   $-1101.1011_2 = (-1)^1 \times 1.1011011 \times 2^3$
   $+1110011111.011100_2 = (-1)^0 \times 1.110011111011100 \times 2^9$
2. Load the fields of the single or double precision format with values from the normalized form, but with the adjustment for the E field:
   $E = \text{Exp} + 127_{10} = \text{Exp} + 01111111_2$ for single precision
   $E = \text{Exp} + 1023_{10} = \text{Exp} + 01111111111_2$ for double precision
• E is called a biased exponent.
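The two steps can be followed by hand in a small sketch. It handles only nonzero values whose exponent stays in the normal range, truncates the fraction instead of rounding, and uses exact rationals for the input; `encode_single_truncated` is a hypothetical helper, not a library function.

```python
from fractions import Fraction


def encode_single_truncated(value: Fraction) -> tuple[int, int, int]:
    """Return (s, E, Fraction field) for single precision, truncating."""
    sign = 1 if value < 0 else 0
    mag = abs(Fraction(value))
    exp = 0
    # Step 1: normalize to 1.Fraction x 2^Exp (value must be nonzero).
    while mag >= 2:
        mag /= 2
        exp += 1
    while mag < 1:
        mag *= 2
        exp -= 1
    # Step 2: bias the exponent and keep the first 23 fraction bits.
    biased = exp + 127
    fraction_field = int((mag - 1) * 2**23)  # int() truncates
    return sign, biased, fraction_field


s, e, f = encode_single_truncated(Fraction("-13.6875"))
print(f"|{s}|{e:08b}|{f:023b}|")
```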

Floating Point: Example 1
• Find the single and double precision representations of $-13.6875_{10}$.
  Normalized form: $(-1)^1 \times 1.1011011 \times 2^3$
• Single precision: $E = 11_2 + 01111111_2 = 10000010_2$
  |1|10000010|10110110000000000000000|
• Double precision: $E = 11_2 + 01111111111_2 = 10000000010_2$
  |1|10000000010|10110110000000000000|
  |00000000000000000000000000000000|

Floating Point: Example 2
• Find the single and double precision representations of $+927.45_{10}$.
  Normalized form: $(-1)^0 \times 1.110011111011100 \times 2^9$
• Single precision: $E = 1001_2 + 01111111_2 = 10001000_2$
  |0|10001000|11001111101110011001100|1100...
  truncation: |0|10001000|11001111101110011001100|
  rounding:   |0|10001000|11001111101110011001101|
• Double precision: $E = 1001_2 + 01111111111_2 = 10000001000_2$
  |0|10000001000|11001111101110011001|
  |10011001100110011001100110011001|1001100…
  truncation: |10011001100110011001100110011001|
  rounding:   |10011001100110011001100110011010|
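One way to cross-check worked examples like these is to let Python's standard `struct` module pack a value and then split the bits into the three fields. The helper names `single_fields` and `double_fields` are invented for this sketch; packing rounds to the nearest representable value, so the output corresponds to the "rounding" variant rather than the "truncation" one.

```python
import struct


def single_fields(x: float) -> str:
    """Return 's|E|Fraction' for the single precision encoding of x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign, exp, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    return f"{sign}|{exp:08b}|{frac:023b}"


def double_fields(x: float) -> str:
    """Return 's|E|Fraction' for the double precision encoding of x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign, exp, frac = bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
    return f"{sign}|{exp:011b}|{frac:052b}"


for value in (-13.6875, 927.45):
    print(single_fields(value))
    print(double_fields(value))
```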

Converting to Floating Point: Conclusion
• Rules for biased exponents in single precision apply only to real exponents in the range [-126, 127]; thus we can have biased exponents only in the range [1, 254].
• The number 0.0 is represented as S=0, E=0 and Fraction=0. Infinity is represented with E=255 and Fraction=0. There are some additional rules that are outside our scope.
• Find the largest (non-infinite) real binary number (by magnitude) which can be represented in single precision.
  - Floating point overflow
• Find the smallest (non-zero) real binary number (by magnitude) which can be represented in single precision.
  - Floating point underflow
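One possible way to explore the two exercises is to build the extreme bit patterns allowed by the rules above (biased exponent 254 with an all-ones fraction, and biased exponent 1 with a zero fraction) and decode them. The helper `from_fields` is hypothetical. The sketch stays within the normalized range described on this slide; the additional rules that are out of scope here (denormalized numbers) allow even smaller non-zero magnitudes.

```python
import struct


def from_fields(sign: int, biased_exp: int, fraction: int) -> float:
    """Decode single precision fields (1, 8 and 23 bits) into a float."""
    bits = (sign << 31) | (biased_exp << 23) | fraction
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


# Largest normalized magnitude: E = 254, Fraction = all ones.
largest = from_fields(0, 254, 0x7FFFFF)
# Smallest normalized magnitude: E = 1, Fraction = 0.
smallest = from_fields(0, 1, 0)

print(largest, smallest)
```

Results with a magnitude above `largest` cause floating point overflow; non-zero results with a magnitude below the smallest representable value cause floating point underflow.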

Floating Point Addition
[Figure 3.16]

Arithmetic Unit for Floating Point Addition
[Figure 3.17]

Conclusion
• We can build an ALU to support the MIPS instruction set
  - key idea: use a multiplexor to select the output we want
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  - all of the gates are always working
  - the speed of a gate is affected by the number of inputs to the gate
  - the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
• Our primary focus is comprehension; however,
  - clever changes to organization can improve performance (similar to using better algorithms in software)
  - we saw this in multiplication; let's look at addition now
