Lecture 13: Integer Arithmetic and Floating Point cont. CS 2011 Fall 2014, Dr. Rozier
Midterm II
• November 13th
• Plan for remaining time
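Fractional Binary Numbers
• Representation
  • Bits to the right of the “binary point” represent fractional powers of 2
  • The bit string $b_i b_{i-1} \cdots b_1 b_0 \,.\, b_{-1} b_{-2} \cdots b_{-j}$ represents the rational number $\sum_{k=-j}^{i} b_k \times 2^k$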
Representable Numbers
• Limitation
  • Can only exactly represent numbers of the form $x/2^k$
  • Other rational numbers have repeating bit representations

  Value   Representation
  1/3     0.0101010101[01]…₂
  1/5     0.001100110011[0011]…₂
  1/10    0.0001100110011[0011]…₂
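The limitation above can be observed directly in C. The following is a minimal sketch, assuming `double` is an IEEE 754 double; the helper names `sum_tenths` and `sum_eighths` are made up for this illustration.

```c
/* Adds 0.1 to itself n times. 1/10 has a repeating binary expansion,
   so each stored 0.1 is slightly off and the error accumulates. */
double sum_tenths(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += 0.1;
    }
    return s;
}

/* Adds 0.125 to itself n times. 1/8 = 1/2^3 has the form x/2^k,
   so it is stored exactly and small sums of it stay exact. */
double sum_eighths(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += 0.125;
    }
    return s;
}
```

Under that assumption, ten tenths do not compare equal to 1.0, while eight eighths do.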
Floating Point Standard
• Defined by IEEE Std 754-1985
• Developed in response to divergence of representations
  • Portability issues for scientific code
• Now almost universally adopted
• Two representations
  • Single precision (32-bit)
  • Double precision (64-bit)
IEEE Floating-Point Format
  Fields: S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)
• S: sign bit (0 = non-negative, 1 = negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
  • Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  • Significand is Fraction with the “1.” restored
• Exponent: excess representation: actual exponent + Bias
  • Ensures exponent is unsigned
  • Single: Bias = 127; Double: Bias = 1023
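To make the field layout concrete, here is a small sketch that splits a single-precision value into its three fields. It assumes `float` is a 32-bit IEEE 754 single; the struct `float_fields` and the function `decode_float` are names invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 non-negative, 1 negative */
    uint32_t exponent;  /* 8 bits, stored with Bias = 127 */
    uint32_t fraction;  /* 23 bits, hidden leading 1 is not stored */
} float_fields;

float_fields decode_float(float f) {
    uint32_t bits;
    float_fields r;
    /* Copy the raw bytes; memcpy avoids pointer-aliasing problems. */
    memcpy(&bits, &f, sizeof bits);
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For example, 1.0f is 1.0 × 2^0, so the stored exponent is 0 + 127 = 127 and the fraction is 0. The value -0.75f is -1.1₂ × 2^-1, so the stored exponent is 126 and only the top fraction bit is set.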
Floating-Point Addition
• Consider a 4-digit decimal example
  • $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
• 1. Align decimal points
  • Shift number with smaller exponent
  • $9.999 \times 10^{1} + 0.016 \times 10^{1}$
• 2. Add significands
  • $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
• 3. Normalize result & check for over/underflow
  • $1.0015 \times 10^{2}$
• 4. Round and renormalize if necessary
  • $1.002 \times 10^{2}$
Floating-Point Addition
• Now consider a 4-digit binary example
  • $1.000_2 \times 2^{-1} + (-1.110_2 \times 2^{-2})$, that is, 0.5 + (–0.4375)
• 1. Align binary points
  • Shift number with smaller exponent
  • $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1})$
• 2. Add significands
  • $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
• 3. Normalize result & check for over/underflow
  • $1.000_2 \times 2^{-4}$, with no over/underflow
• 4. Round and renormalize if necessary
  • $1.000_2 \times 2^{-4}$ (no change) = 0.0625
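The four steps can be sketched in C on a toy format with a 4-bit significand. Everything here is an assumption made for illustration: the `toy_fp` struct, the `toy_add` function, the `GUARD` bits kept during alignment, and the round-half-up rule (real IEEE hardware defaults to round-to-nearest-even). The exponent range is unbounded, so the over/underflow check of step 3 is left out.

```c
#define GUARD 8  /* extra low-order bits kept while aligning (sketch assumption) */

typedef struct {
    int sign;      /* 0 = non-negative, 1 = negative */
    unsigned sig;  /* significand scaled by 8: 1.000_2 is 8, 1.110_2 is 14 */
    int exp;       /* unbiased exponent */
} toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b) {
    toy_fp r = {0, 0, 0};
    long ma, mb, sum, mag;
    int shift;

    /* Step 1: align binary points; shift the operand with the smaller exponent. */
    if (a.exp < b.exp) { toy_fp t = a; a = b; b = t; }
    shift = a.exp - b.exp;
    if (shift > 16) shift = 16;  /* b is negligible; also keeps the shift in range */
    ma = (long)a.sig << GUARD;
    mb = ((long)b.sig << GUARD) >> shift;

    /* Step 2: add the signed significands. */
    sum = (a.sign ? -ma : ma) + (b.sign ? -mb : mb);
    if (sum == 0) return r;      /* exact zero */
    r.sign = sum < 0;
    mag = sum < 0 ? -sum : sum;
    r.exp = a.exp;

    /* Step 3: normalize so that 1.000_2 <= significand < 10.000_2. */
    while (mag >= (16L << GUARD)) { mag >>= 1; r.exp++; }
    while (mag < (8L << GUARD)) { mag <<= 1; r.exp--; }

    /* Step 4: round to 4 bits (half up), renormalize if rounding carried out. */
    mag += 1L << (GUARD - 1);
    r.sig = (unsigned)(mag >> GUARD);
    if (r.sig == 16) { r.sig = 8; r.exp++; }
    return r;
}
```

Calling `toy_add` with {0, 8, -1} and {1, 14, -2}, the two operands of the slide's example, yields sign 0, significand 8 and exponent -4, which is $1.000_2 \times 2^{-4}$.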
FP Adder Hardware
[Figure: FP adder hardware diagram, with regions labeled Step 1, Step 2, Step 3, Step 4]
Multiplication
• Start with long-multiplication approach

        1000    multiplicand
      × 1001    multiplier
        ----
        1000
       0000
      0000
     1000
     -------
     1001000    product

• Length of product is the sum of operand lengths. Why?
1000 × 1001: partial products 1000, 0000, 0000, 1000; product 1001000
• How could we implement this in a better way?
• What is unique about binary multiplication?
Multiplication Hardware
[Figure: multiplier hardware diagram. Annotation: “Initially 0”]
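Multiplying 1000 × 1001 (animation frames)
• Frame 1: multiplier 1001, multiplicand register 1000. Add.
• Frame 2: Shift! Multiplier 100, multiplicand register 10000. Add. Shift!
• Frame 3: Shift! Multiplier 10, multiplicand register 100000. Add. Shift!
• Frame 4: Shift! Multiplier 1, multiplicand register 1000000. Add. Shift! Product 1001000.
• Frame 5: Shift! Multiplier exhausted. Done! Product 1001000.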
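The add/shift sequence shown in these frames can be written as a short loop. This is a sketch of the idea in software rather than a model of the hardware; the function name `shift_add_multiply` is an assumption.

```c
#include <stdint.h>

/* Multiplies two 32-bit values using only add and shift,
   producing the full 64-bit product. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;           /* product register starts at 0 */
    uint64_t mcand = multiplicand;  /* widened so left shifts lose no bits */

    while (multiplier != 0) {
        if (multiplier & 1u) {
            product += mcand;       /* Add: low multiplier bit is 1 */
        }
        mcand <<= 1;                /* Shift multiplicand left */
        multiplier >>= 1;           /* Shift multiplier right */
    }
    return product;
}
```

What makes binary special is that each partial product is either 0 or the shifted multiplicand, so no digit-by-digit multiplication is needed. With 1000₂ = 8 and 1001₂ = 9 the loop returns 72, which is 1001000₂.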
Optimized Multiplier
• Perform steps in parallel: add/shift
• One cycle per partial-product addition
  • That’s ok, if frequency of multiplications is low
Faster Multiplier
• Uses multiple adders
  • Cost/performance tradeoff
• Can be pipelined
  • Several multiplications performed in parallel
Multiplication
• Computing exact product of w-bit numbers x, y
  • Either signed or unsigned
• Ranges
  • Unsigned: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
    • Up to 2w bits
  • Two’s complement min: $x \cdot y \ge (-2^{w-1}) \cdot (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
    • Up to 2w–1 bits
  • Two’s complement max: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
    • Up to 2w bits, but only for $(\mathrm{TMin}_w)^2$
• Maintaining exact results
  • Would need to keep expanding word size with each product computed
  • Done in software by “arbitrary precision” arithmetic packages
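The three bounds can be checked by brute force for a small word size. The sketch below fixes w = 8 as an assumption and uses a made-up function name, `check_ranges_w8`; it compares the extreme products actually observed against the closed-form expressions.

```c
/* Returns 1 if the extreme products of 8-bit operands match the
   closed-form bounds for w = 8, and 0 otherwise. */
int check_ranges_w8(void) {
    const int w = 8;
    const long umax = (1L << (2 * w)) - (1L << (w + 1)) + 1;  /* 2^(2w) - 2^(w+1) + 1 */
    const long tmin = -(1L << (2 * w - 2)) + (1L << (w - 1)); /* -2^(2w-2) + 2^(w-1) */
    const long tmax = 1L << (2 * w - 2);                      /* 2^(2w-2) */
    long seen_umax = 0, seen_tmin = 0, seen_tmax = 0;
    long x, y;

    /* Unsigned operands: 0 .. 2^w - 1 */
    for (x = 0; x < 256; x++) {
        for (y = 0; y < 256; y++) {
            if (x * y > seen_umax) seen_umax = x * y;
        }
    }
    /* Two's complement operands: -2^(w-1) .. 2^(w-1) - 1 */
    for (x = -128; x <= 127; x++) {
        for (y = -128; y <= 127; y++) {
            if (x * y < seen_tmin) seen_tmin = x * y;
            if (x * y > seen_tmax) seen_tmax = x * y;
        }
    }
    return seen_umax == umax && seen_tmin == tmin && seen_tmax == tmax;
}
```

For w = 8 the bounds evaluate to 65025, -16256 and 16384, and the signed maximum is reached only by (-128) × (-128).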
Unsigned Multiplication in C
  Operands (w bits each): u, v
  True product (2w bits): u · v
  Discard high-order w bits; result (w bits): UMult_w(u, v)
• Standard multiplication function
  • Ignores high-order w bits
• Implements modular arithmetic
  • $\mathrm{UMult}_w(u, v) = u \cdot v \bmod 2^w$
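A short sketch for w = 32 shows the truncation. The names `umult32` and `true_product` are invented for this example, and it assumes `unsigned int` is 32 bits wide so that no integer promotion changes the arithmetic.

```c
#include <stdint.h>

/* The standard C multiply on 32-bit unsigned operands:
   keeps only the low 32 bits, i.e. u * v mod 2^32. */
uint32_t umult32(uint32_t u, uint32_t v) {
    return u * v;
}

/* The true 64-bit product, obtained by widening one operand first. */
uint64_t true_product(uint32_t u, uint32_t v) {
    return (uint64_t)u * v;
}
```

For u = v = 2^16 the true product is 2^32, which needs 33 bits, so `umult32` returns 0. In every case `umult32(u, v)` equals the low 32 bits of `true_product(u, v)`.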
Code Security Example #2
• SUN XDR library
  • Widely used library for transferring data between machines

    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);

[Figure: ele_src and the destination buffer obtained from malloc(ele_cnt * ele_size)]
XDR Code

    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
        /*
         * Allocate buffer for ele_cnt objects, each of ele_size bytes
         * and copy from locations designated by ele_src
         */
        void *result = malloc(ele_cnt * ele_size);
        if (result == NULL)
            /* malloc failed */
            return NULL;
        void *next = result;
        int i;
        for (i = 0; i < ele_cnt; i++) {
            /* Copy object i to destination */
            memcpy(next, ele_src[i], ele_size);
            /* Move pointer to next memory region */
            next += ele_size;
        }
        return result;
    }
XDR Vulnerability

    malloc(ele_cnt * ele_size)

• What if:
  • ele_cnt = 2^20 + 1
  • ele_size = 4096 = 2^12
  • Allocation = ??
• How can I make this function secure?
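One possible answer, sketched under assumptions: with a 32-bit `size_t`, (2^20 + 1) × 2^12 = 2^32 + 2^12 wraps to 4096, so only 4096 bytes are allocated while the loop copies 2^20 + 1 objects of 4096 bytes each. The helper `alloc_size_32bit` reproduces that wraparound, and `copy_elements_checked` is a hypothetical hardened variant (not the actual XDR fix) that rejects requests whose byte count would overflow.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Byte count as computed with 32-bit arithmetic (the vulnerable case). */
uint32_t alloc_size_32bit(uint32_t ele_cnt, uint32_t ele_size) {
    return ele_cnt * ele_size;  /* wraps modulo 2^32 */
}

/* Hypothetical hardened variant: returns NULL instead of under-allocating. */
void *copy_elements_checked(void *ele_src[], int ele_cnt, size_t ele_size) {
    size_t cnt, i;
    char *next;
    void *result;

    if (ele_cnt < 0) {
        return NULL;            /* negative counts make no sense */
    }
    cnt = (size_t)ele_cnt;
    /* Reject byte counts that would wrap around SIZE_MAX. */
    if (ele_size != 0 && cnt > SIZE_MAX / ele_size) {
        return NULL;
    }
    result = malloc(cnt * ele_size);
    if (result == NULL) {
        return NULL;            /* malloc failed */
    }
    next = result;              /* char * keeps pointer arithmetic standard C */
    for (i = 0; i < cnt; i++) {
        memcpy(next, ele_src[i], ele_size);
        next += ele_size;
    }
    return result;
}
```

The division-based test is one common way to detect the overflow before it happens; allocating with `calloc(cnt, ele_size)` is another option, since implementations are expected to fail rather than wrap.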
Signed Multiplication in C
  Operands (w bits each): u, v
  True product (2w bits): u · v
  Discard high-order w bits; result (w bits): TMult_w(u, v)
• Standard multiplication function
  • Ignores high-order w bits
  • Some of which are different for signed vs. unsigned multiplication
  • Lower bits are the same
Power-of-2 Multiply with Shift
  Operands (w bits): u and $2^k$
  True product (w+k bits): $u \cdot 2^k$
  Discard k bits; result (w bits): UMult_w(u, 2^k), TMult_w(u, 2^k)
• Operation
  • u << k gives $u \cdot 2^k$
  • Both signed and unsigned
• Examples
  • u << 3 == u * 8
  • (u << 5) - (u << 3) == u * 24
  • On most machines, shift and add are faster than multiply
    • Compiler generates this code automatically
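The two shift identities can be written as functions. This is a small sketch with invented names (`times8`, `times24`) on unsigned 32-bit values, where wraparound is well defined. The parentheses matter: in C, `-` binds tighter than `<<`, so `u << 5 - u << 3` would parse as `(u << (5 - u)) << 3`.

```c
#include <stdint.h>

/* u * 8 as a single shift. */
uint32_t times8(uint32_t u) {
    return u << 3;
}

/* u * 24 as 32u - 8u, using two shifts and a subtract. */
uint32_t times24(uint32_t u) {
    return (u << 5) - (u << 3);
}
```

Both results agree with the ordinary multiply modulo 2^32, including when the product overflows.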
Multiply on ARM

    MUL{<cond>}{S} Rd, Rm, Rs        Rd = Rm * Rs
    MLA{<cond>}{S} Rd, Rm, Rs, Rn    Rd = Rm * Rs + Rn
Division

                    1001      quotient
    divisor 1000 ) 1001010    dividend
                  -1000
                      10
                      101
                      1010
                     -1000
                        10    remainder

  n-bit operands yield n-bit quotient and remainder
• Check for 0 divisor
• Long division approach
  • If divisor ≤ dividend bits: 1 bit in quotient, subtract
  • Otherwise: 0 bit in quotient, bring down next dividend bit
• Restoring division
  • Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
  • Divide using absolute values
  • Adjust sign of quotient and remainder as required
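The restoring scheme can be sketched in C for unsigned 32-bit operands. The struct `udiv_result` and the function `restoring_divide` are names made up for this illustration; the loop brings down one dividend bit per step, subtracts, and adds the divisor back whenever the remainder goes negative.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} udiv_result;

/* Returns 0 on success, or -1 if the divisor is 0 (out is left untouched). */
int restoring_divide(uint32_t dividend, uint32_t divisor, udiv_result *out) {
    int64_t rem = 0;  /* signed and wide so a negative remainder is visible */
    uint32_t q = 0;
    int i;

    if (divisor == 0) {
        return -1;                                  /* check for 0 divisor */
    }
    for (i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* bring down next dividend bit */
        rem -= divisor;                             /* do the subtract */
        if (rem < 0) {
            rem += divisor;                         /* restore: quotient bit is 0 */
        } else {
            q |= (uint32_t)1 << i;                  /* quotient bit is 1 */
        }
    }
    out->quotient = q;
    out->remainder = (uint32_t)rem;
    return 0;
}
```

Dividing 1001010₂ = 74 by 1000₂ = 8 gives quotient 1001₂ = 9 and remainder 10₂ = 2, matching the worked long division.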
Division Hardware
[Figure: divider hardware diagram. Annotations: “Initially divisor in left half”, “Initially dividend”]
Optimized Divider
• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
  • Same hardware can be used for both
Faster Division
• Can’t use parallel hardware as in multiplier
  • Subtraction is conditional on sign of remainder
• Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  • Still require multiple steps
Division in ARM
• ARMv6 has no DIV instruction.
• N = D × Q + R with 0 ≤ |R| < |D|
• N/D = Q + R/D
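Without a divide instruction, the quotient and remainder have to be computed in software. The following is a rough sketch of signed division by absolute values, with invented names (`sdiv_result`, `signed_divide`). It assumes a nonzero divisor and operands other than `INT_MIN`, and it gives the remainder the sign of the dividend, which matches C's `/` and `%`.

```c
#include <limits.h>

typedef struct {
    int quotient;
    int remainder;
} sdiv_result;

/* Sketch only: caller must pass d != 0 and avoid INT_MIN operands. */
sdiv_result signed_divide(int n, int d) {
    unsigned un = n < 0 ? 0u - (unsigned)n : (unsigned)n;  /* |N| */
    unsigned ud = d < 0 ? 0u - (unsigned)d : (unsigned)d;  /* |D| */
    unsigned uq = 0, ur = 0;
    sdiv_result r;
    int i;

    /* Shift-and-subtract on the absolute values, one quotient bit per step. */
    for (i = (int)(sizeof(unsigned) * CHAR_BIT) - 1; i >= 0; i--) {
        ur = (ur << 1) | ((un >> i) & 1u);
        if (ur >= ud) {
            ur -= ud;
            uq |= 1u << i;
        }
    }
    /* Adjust signs: quotient is negative when the operand signs differ,
       remainder takes the sign of the dividend. */
    r.quotient = ((n < 0) != (d < 0)) ? -(int)uq : (int)uq;
    r.remainder = (n < 0) ? -(int)ur : (int)ur;
    return r;
}
```

For N = -7 and D = 2 this yields Q = -3 and R = -1, and D × Q + R = -6 - 1 = -7 as the identity requires.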
For next time
• Homework Exercises: 3.4.2, 3.4.4, 3.10.1 – 3.10.5
  • Due Tuesday 11/4
• Read Chapter 4.1-4.4