Fixed-point and floating-point numbers CS370 Fall 2003
Representations of numbers • Unsigned integers • Signed integers – 1’s and 2’s complement representation • To represent very large and very small numbers, and real numbers in general: • Fixed-point numbers • Floating-point numbers
Base-10 (decimal) arithmetic • Uses the ten digits 0 through 9 • Each column represents a power of 10
Standard binary representation • Uses the two digits 0 and 1 • Every column represents a power of 2
Fixed-point representation • Uses the two digits 0 and 1 • Every column represents a power of 2 • Columns to the right of the binary point represent negative powers of 2 (1/2, 1/4, 1/8, ...)
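The column weights of a fixed-point number can be checked with a short Python sketch. The function name fixed_point_value and the sample inputs are illustrative choices, not part of the original slides.

```python
def fixed_point_value(bits: str) -> float:
    """Evaluate a binary fixed-point string such as '10.01'."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Columns left of the binary point weigh 2^0, 2^1, 2^2, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** power
    # Columns right of the binary point weigh 2^-1, 2^-2, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -power
    return value


print(fixed_point_value("10.01"))  # 2.25
print(fixed_point_value("1101"))   # 13.0
```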
Addition Base-10 Base-2
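Base-2 addition works column by column with carries, exactly as in base 10. The sketch below is one way to write it in Python; the helper name add_binary and the operands are assumptions chosen for illustration, since the operands on the original slide are not available.

```python
def add_binary(a: str, b: str) -> str:
    """Add two unsigned binary strings column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        digits.append(str(total % 2))  # digit written in this column
        carry = total // 2             # carry into the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_binary("1011", "0110"))  # 10001  (11 + 6 = 17)
```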
Scientific notation (1) • One billion • 1,000,000,000 • 1 x 10^9 • significand or mantissa: 1 • base or radix: 10 • exponent: 9
Scientific notation (2) • 1999 • 1.999 x 10^3 • significand or mantissa: 1.999 (digits 1999) • base or radix: 10 • exponent: 3 • Equivalent, non-normalized forms: 19.99 x 10^2 • 199.9 x 10^1
Practice (base 10) • 258 = 2.58 x 10^2 Mantissa = 2.58 (digits 258) Radix = 10 Exponent = 2 • 24.25 = 2.425 x 10^1 Mantissa = 2.425 (digits 2425) Radix = 10 Exponent = 1
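The practice answers can be reproduced with Python's decimal module, which keeps decimal digits exact. This is a minimal sketch; the function name to_scientific is a hypothetical helper, not something defined in the slides.

```python
from decimal import Decimal


def to_scientific(text: str):
    """Split a decimal string into (significand, exponent) in base 10."""
    value = Decimal(text)
    exponent = value.adjusted()            # power of 10 of the leading digit
    significand = value.scaleb(-exponent)  # move the point next to that digit
    return significand, exponent


for text in ("258", "24.25", "1999"):
    s, e = to_scientific(text)
    print(f"{text} = {s} x 10^{e}")
```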
Base-2 scientific notation • 2.25_ten • 10.01_two • 10.01_two x 2^0 • 1.001_two x 2^1 (normalized) • Numbers are usually normalized, which means that the leading bit is always a 1.
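Normalization in base 2 can be sketched with math.frexp from the Python standard library, which returns a mantissa in [0.5, 1); doubling it and lowering the exponent by one gives the 1.xxx form used here. The names normalize_base2 and significand_bits are illustrative, and the sketch assumes a positive input.

```python
import math


def normalize_base2(x: float):
    """Return (significand, exponent) with 1 <= significand < 2, for x > 0."""
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent
    return mantissa * 2, exponent - 1


def significand_bits(significand: float, places: int = 3) -> str:
    """Write a significand in [1, 2) as a binary string such as '1.001'."""
    frac = significand - 1
    digits = []
    for _ in range(places):
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return "1." + "".join(digits)


s, e = normalize_base2(2.25)
print(f"{significand_bits(s)} x 2^{e}")  # 1.001 x 2^1
```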
Improvements • Bias the exponent • Always subtract a fixed amount, e.g., 3, from the stored exponent • Allows representation of negative exponents • Implicit one • The leading one in a phone number such as 1-619-556-0231 is redundant. • A normalized significand always starts with 1, so why use a bit for the leading one?
8-bit floating-point format (2) • Exponent (3 bits) is biased by 3 • The leading one of significand is implicit • Zero is represented by all zeros
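A decoder for such a format might look like the Python sketch below. The slide gives only the 3-bit exponent, the bias of 3, the implicit one, and the all-zeros encoding of zero; the field layout (1 sign bit, then 3 exponent bits, then 4 fraction bits) and the name decode_float8 are assumptions made for illustration, and no special values such as infinity are modeled.

```python
def decode_float8(byte: int) -> float:
    """Decode an 8-bit float with ASSUMED layout s eee ffff."""
    if byte == 0:
        return 0.0                          # zero is all zeros
    sign = -1.0 if byte >> 7 else 1.0       # assumed: sign in the top bit
    exponent = ((byte >> 4) & 0b111) - 3    # remove the bias of 3
    fraction = (byte & 0b1111) / 16         # assumed: 4 fraction bits
    return sign * (1 + fraction) * 2.0 ** exponent  # implicit leading one


print(decode_float8(0b0_100_0010))  # 2.25   = 1.0010_two x 2^1
print(decode_float8(0b0_110_1010))  # 13.0   = 1.1010_two x 2^3
print(decode_float8(0b1_011_0100))  # -1.25  = -1.0100_two x 2^0
```

Under these assumptions the smallest positive value is 1.0000_two x 2^-3 = 0.125, which is why zero needs its own encoding.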
IEEE standard floating-point • Single precision (32 bits): sign 1 bit, exponent 8 bits, significand 23 bits, bias 127 • Double precision (64 bits): sign 1 bit, exponent 11 bits, significand 52 bits, bias 1023
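The single-precision fields can be inspected with Python's struct module. This is a small sketch; the function name float32_fields is an illustrative choice.

```python
import struct


def float32_fields(x: float):
    """Return (sign, biased exponent, fraction) of x as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF      # 23 bits, leading one is implicit
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(13.0)
print(sign, exponent - 127, format(fraction, "023b"))
# 0 3 10100000000000000000000
```

Here 13 is stored as 1.101 x 2^3: the exponent field holds 3 + 127 = 130 and the fraction holds the bits after the implicit one.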
Practice (base 10) • 13 = 1.3 x 10^1 = 1.101 x 2^3 • 1.25 = 1.25 x 10^0 = 1.010 x 2^0