1 / 28

Information Representation (Level ISA3)

Information Representation (Level ISA3). Floating point numbers. Floating point storage. Two’s complement notation can’t be used to represent floating point numbers because there is no provision for the radix point A form of scientific notation is used. Floating point storage.

theodora
Download Presentation

Information Representation (Level ISA3)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Information Representation (Level ISA3) Floating point numbers

  2. Floating point storage • Two’s complement notation can’t be used to represent floating point numbers because there is no provision for the radix point • A form of scientific notation is used

  3. Floating point storage • Since the radix is 2, we use a binary point instead of a decimal point • The bits allocated for storing the number are broken into 3 parts: a sign bit, an exponent, and a mantissa (aka significand)

  4. Floating point storage • The number of bits used for exponent and significand depend on what we want to optimize for: • For greater range of magnitude, we would allocate more bits to the exponent • For greater precision, we would allocate more bits to the significand • For the next several slides, we’ll use a 14-bit model with a sign bit, a 5-bit exponent, and an 8-bit significand

  5. Example • Suppose we want to store the number 17 using our 14-bit model: • Using decimal scientific notation, 17 is: 17.0 x 100 or 1.7 x 101 or .17 x 102 • In binary, 17 is 10001, which is: 10001.0 x 20 or 1000.1 x 21 or 100.01 x 22 or, finally, .10001 x 25 • We will use this last value for our model: 0 0 0 1 0 1 1 0 0 0 1 0 0 0 sign exponent significand

  6. Negative exponents • With the existing model, we can only represent numbers with positive exponents • Two possible solutions: • Reserve a sign bit for the exponent (with a resulting loss in range) • Use a biased exponent; explanation on next slide

  7. Bias values • With bias values, every integer in a given range is converted into a non-negative by adding an offset (bias) to every value to be stored • The bias value should be at or near the middle of the range of numbers • Since we are using 5 bits for the exponent, the range of values is 0 .. 31 • If we choose 16 as our bias value, we can consider every number less than 16 a negative number, every number greater than 16 a positive number, and 16 itself can represent 0 • This is called excess-16 representation, since we have to subtract 16 from the stored value to get the actual value • Note: exponents of all 0s or all 1s are usually reserved for special numbers (such as 0 or infinity)

  8. Examples Using excess-16 notation, our initial example (1710) becomes: 0 0 0 1 0 1 1 0 0 0 1 0 0 0 sign exponent significand If we wanted to store 0.2510 = 1.0 x 2-2 we would have: 0 0 1 1 1 0 1 0 0 0 0 0 0 0 sign exponent significand

  9. Synonymous forms • There is still one problem with this representation scheme; there are several synonymous forms for a particular value • For example, all of the following could represent 17: 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 sign exponent significand sign exponent significand 0 1 0 1 1 1 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 sign exponent significand sign exponent significand

  10. Normalization • In order to avoid the potential chaos of synonymous forms, a convention has been established that the leftmost bit of the significand will always be 1 • An additional advantage of this convention is that, since we always know the 1 is supposed to be present, we never need to actually store it; thus we get an extra bit of precision for the significand; your text refers to this as the hidden bit

  11. Example Express 0.0312510 in normalized floating-point form with excess-16 bias 0.0312510 = 0.000012 x 20 = 1.0 x 2-5 Applying the bias, the exponent field is 16 – 5 = 11 0 0 1 1 0 0 0 0 0 0 0 0 0 0 sign exponent significand

  12. Floating-point arithmetic • If we wanted to add decimal numbers expressed in scientific notation, we would change one of the numbers so that both of them are expressed in the same power of the base • Example: 1.5 x 102 + 3.5 x 103 = .15 x 103 + 3.5 x 103 = 3.65 x 103

  13. Floating-point arithmetic Example: add the following binary numbers as represented in normalized 14-bit format with a bias of 16: 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 0 Aligning the operands around the binary point, we have: 11.001000 + 0.10011010 ------------------- 11.10111010 Renormalizing, retaining the larger exponent and truncating the low-order bit, we have: 0 1 0 0 1 0 1 1 1 0 1 1 1 0

  14. Floating-point errors • Any two real numbers have an infinite number of values between them • Computers have a finite amount of memory in which to represent real numbers • Modeling an infinite system using a finite system means the model is only an approximation • The more bits used, the more accurate the approximation • Always some element of error, no matter how many bits used

  15. Floating-point errors • Errors can be blatant or subtle (or unnoticed) • Blatant errors: overflow or underflow; blatant because they cause crashes • Subtle errors: can lead to wildly erroneous results, but are often hard to detect (may go unnoticed until they cause real problems)

  16. Example • In our simple model, we can express normalized numbers in the range -.111111112 x 215 … .11111111 x 215 • Certain limitations are clear; we know we can’t store 2-19 or 2128 • Not so obvious that we can’t accurately store 256.5: • Binary equivalent is 10000000.1, which is 10 bytes wide • The low-order bit would be dropped (or rounded into next bit) • Either way, we have introduced an error

  17. Error propogation • The relative error in the previous example can be found by taking the ratio of the absolute value of the error to the true value of the number: 256.5 – 256 ---------------- = .003906 (about .39%) 256 • In a lengthy calculation, such errors can propogate, causing substantial loss of precision • The next slide illustrates such a situation; it shows the first five iterations of a floating-point product loop. Eventually the error will be 100%, since the product will go to 0.

  18. Error propogation example Multiplier Multiplicand 14-bit product Real product Error 1000.001 0.11101000 1110.1001 14.7784 1.46% (16.125) (0.90625) (14.5625) 1110.1001 0.11101000 1101.0011 13.4483 1.94% (14.5625) (13.1885) 1101.0011 0.11101000 1011.1111 12.2380 2.46% (13.1885) (11.9375) 1011.1111 0.11101000 1010.1101 11.1366 2.91% (11.9375) (10.8125) 1010.1101 0.11101000 1001.1100 10.1343 3.79% (10.8125) (9.75) 1001.1100 0.11101000 1000.1101 8.3922 4.44% (9.75) (8.8125)

  19. Special values • Zero is an example of a special floating-point value • Can’t be normalized; no 1 in its binary version • Usually represented as all 0s in both exponent and significand regions • It is common to have both positive and negative versions of 0 in floating-point representation

  20. Real number line with 0 as only special value (with 3-bit exponent, 4-bit significand)

  21. More special values • Infinity: bit pattern used when a result falls in the overflow region • Uses exponent field of all 1s, significand of all 0s • Not a number (NaN): used to indicate illegal floating point operations • Uses exponent of all 1s, any significand • Use of these bit patterns for special values restricts the range of numbers that can be represented

  22. Denormalized numbers • There is no value to represent infinity in the underflow region analogous to infinity in the overflow region • Denormalized numbers exist in the underflow regions, shortening the gap between small positive values and 0

  23. Denormalized numbers • When the exponent field of a number is all 0s and the significand contains at least a single 1, special rules apply to the representation: • The hidden bit is assumed to be 0 instead of 1 • The exponent is stored in excess n-1 (where excess n is used for normalized numbers)

  24. Floating point storage and C • In the C programming language, the standard output function, printf(), prints floating-point numbers with a default precision of 6 (7 digits, including the whole part, using either fixed or scientific notation) • We can reason backwards from this fun fact to discover C’s default size for floating point numbers

  25. Floating point values and C • To represent some number n with d decimal digits, we need log2n x d bits: • Let L10 = log10(n); then 10L10 = n • So log2(10L10) = log2(n), • and L10 * log2(10) = log2(n) • which means log2(n) = log10(n) * log2(10) • The last value above (log2(10)) is a constant, about 3.322 • So, the number of bits needed to represent a decimal number with 7 significant digits is 3.322 * 7, or 23.254; 24 bits, in other words

  26. The IEEE 754 floating point standard • Specifies uniform standard for single and double-precision floating-point numbers • Single-precision: 32 bits • 23-bit significand • 8-bit using excess 127 bias • Double-precision: 64 bits • 52-bit significand • 11-bit exponent with excess 1023 bias

  27. Binary-coded decimal representation • Numeric coding system used mainly in IBM mainframe & midrange systems • BCD encodes each digit of a decimal number to a 4-bit binary form • In an 8-bit byte, only the lower nibble represents the digit’s value; the upper nibble (called the zone) is used to hold the sign, which can be positive (1100), negative (1101), or unsigned (1111)

  28. BCD • Commonly used in banking and other financial settings where: • Most data is monetary • High level of I/O activity • BCD is easier to convert to decimal for printed reports • Circuitry for arithmetic operations usually slower than for unsigned binary

More Related