CS231 Spring 2006 Computer Arithmetic: Advanced Topics in Number Representation
Arithmetic Data Types
There are three important categories of arithmetic data types used in various kinds of computers:
• Integers
  • Two’s Complement, One’s Complement, Signed Magnitude
  • (We have already seen these.)
• Fixed Point Fractions
  • The data types used by signal and media processors
  • The subject of this slide set.
• Floating Point
  • IEEE 754 Single, Double, Extended
  • See additional slides on this data type
Data Types By Application
• Integer
  • Business, Systems Programming, User Interfaces, General Purpose Computation
• Floating Point
  • Scientific Computation
• Fixed Point Fractions
  • Communications, Media Processing
• This is a simplification; in reality all these data types can be used in many different applications
Properties of the Types
• Integer
  • Represent positive and negative whole numbers
  • All the integers between a MIN and MAX can be represented
• Fixed Point Fractions
  • Signed: represent values in the range -1 to (almost) 1
  • Unsigned: represent values in the range 0 to (almost) 1
• Floating Point
  • Represent numbers in a huge range
  • Numbers very close to 0 and numbers very far from 0, e.g. magnitudes from $10^{-100}$ up to $10^{100}$, on both sides of 0
  • But unevenly distributed (many holes in the range): the gap between adjacent representable numbers grows with their magnitude
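To make the "holes" concrete, here is a small Python sketch that prints the gap between a double and the next larger double at a few magnitudes. It assumes Python 3.9+ (for `math.ulp`) and that Python floats are IEEE 754 doubles, as they are on common platforms; the helper name `gap_after` is hypothetical.

```python
import math

def gap_after(x: float) -> float:
    """Distance from a positive double x to the next larger double (its ULP)."""
    return math.ulp(x)

if __name__ == "__main__":
    for x in (1e-100, 1.0, 1e100):
        print(f"{x:g}: gap to next double = {gap_after(x):.3g}")
    # 1.0 is far smaller than half the gap near 1e100, so it is rounded away.
    print(1e100 + 1.0 == 1e100)  # True
```

The gaps near 1e100 are astronomically larger than the gaps near 1e-100, which is exactly the uneven distribution described on the slide.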
Types for Signal and Media Processing
• Signal and media processing uses almost every arithmetic type you can think of:
  • Fixed point
    • Integers (signed and unsigned)
    • Fractions (signed and unsigned)
    • Mixed integers / fractions (e.g., accumulator types)
  • Floating point
    • Of various precisions. Even 16-bit floating point!
  • Complex numbers <real, imaginary>
    • Fixed point
    • Floating point
Signal Data
• Signal data often originates from A/D conversion
• An analog signal is sampled at fixed time intervals (see clock below)
• A stream of numbers emerges from the A/D converter
• These numbers lie in a fixed range and represent the instantaneous amplitude of the signal
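A minimal sketch of an idealized A/D converter: it samples a sine wave at fixed time intervals and quantizes each amplitude into a signed 1.15 code. The function names (`quantize_q15`, `sample`) and the round-then-clamp policy are illustrative assumptions; real converters differ in range, resolution, and rounding behavior. Assumes Python 3.9+.

```python
import math

Q15_MAX = (1 << 15) - 1   # 0111...1 = 1 - 2**-15, the largest 1.15 value
Q15_MIN = -(1 << 15)      # 1000...0 = -1.0, the most negative 1.15 value

def quantize_q15(v: float) -> int:
    """Map an amplitude (nominally in [-1, 1)) to a 1.15 code: round, then clamp."""
    code = round(v * (1 << 15))
    return max(Q15_MIN, min(Q15_MAX, code))

def sample(freq_hz: float, rate_hz: float, count: int) -> list[int]:
    """Sample sin(2*pi*f*t) at t = n / rate_hz for n = 0 .. count-1."""
    return [quantize_q15(math.sin(2 * math.pi * freq_hz * n / rate_hz))
            for n in range(count)]

if __name__ == "__main__":
    # A 1 kHz tone sampled at 8 kHz: 8 samples per cycle.
    print(sample(1000.0, 8000.0, 8))
```

Note that a full-scale positive peak (+1.0) cannot be represented in 1.15 and is clamped to `Q15_MAX`.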
Unsigned Fixed Point Fractions
• “0.16” = 16 bits of fractional magnitude, binary point fully at the left
• “0.32” = 32 bits of fractional magnitude
• Addition and multiplication work exactly as for unsigned integers
  • .1000 × .1000 = .01000000
  • But: discard the lower bits if the final value has the same precision as the inputs!
  • .1000 + .1000 = 1.0000 (overflow)
  • If saturating arithmetic is used, 1.0000 is clamped to the nearest representable value, .1111 in this case.
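The 4-bit examples on this slide can be reproduced with plain integer operations. In this Python sketch the "0.4" format and the helper names (`mul_u`, `add_u_wrap`, `add_u_sat`) are hypothetical; a code `c` stands for the fraction c / 16, and the multiply simply truncates the discarded low bits.

```python
FRAC_BITS = 4                      # a tiny "0.4" format, as in the slide's examples
MAX_CODE = (1 << FRAC_BITS) - 1    # .1111, the largest value below 1

def mul_u(a: int, b: int) -> int:
    """0.4 x 0.4 -> 0.8 full product, then drop the low 4 bits to return to 0.4."""
    full = a * b                   # .1000 x .1000 = .01000000
    return full >> FRAC_BITS       # keep .0100 (truncating the discarded bits)

def add_u_wrap(a: int, b: int) -> int:
    """Ordinary addition: the carry out of the top bit (the overflow) is lost."""
    return (a + b) & MAX_CODE

def add_u_sat(a: int, b: int) -> int:
    """Saturating addition: an overflowing sum is clamped to .1111."""
    return min(a + b, MAX_CODE)

if __name__ == "__main__":
    half = 0b1000                        # .1000 = 1/2
    print(bin(mul_u(half, half)))        # 0b100   (.0100 = 1/4)
    print(bin(add_u_wrap(half, half)))   # 0b0     (1.0000 wrapped to .0000)
    print(bin(add_u_sat(half, half)))    # 0b1111  (.1111)
```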
Signed Fixed Point Fractions
• “1.15” = 1 sign bit, 15 fraction bits
• “1.31” = 1 sign bit, 31 fraction bits
• Examples in 4-bit (1.3) format:
  • 0100 = ½
  • 1100 = -½
• Most positive value = 0111 … 11 (just under 1)
• Most negative value = 1000 … 00 (-1)
• Multiplication produces two sign bits:
  • 0.100 × 0.100 = 00.010000
  • One is normally shifted away to yield 1.N format (shift result left by one bit)
  • This is usually done by the multiply instruction (or multiply-accumulate instruction)
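A Python sketch of a 1.15 × 1.15 multiply: the raw product is in 2.30 format (two sign bits), and one left shift brings it to 1.31. The helper names are hypothetical, and saturating the single overflow case (-1 × -1 = +1) is an assumption that matches what many DSP multiply instructions do, not something the slide specifies.

```python
Q31_MAX = (1 << 31) - 1

def from_bits16(bits: int) -> int:
    """Interpret a 16-bit pattern as a signed 1.15 code (two's complement)."""
    return bits - (1 << 16) if bits & 0x8000 else bits

def q15_mul_q31(a: int, b: int) -> int:
    """1.15 x 1.15 -> 2.30 (two sign bits), then shift left one bit -> 1.31."""
    product_2_30 = a * b
    q31 = product_2_30 << 1        # shift away the redundant sign bit
    # (-1) x (-1) = +1 is the one product that does not fit in 1.31;
    # this sketch assumes it is saturated to the largest 1.31 value.
    return min(q31, Q31_MAX)

if __name__ == "__main__":
    half = 0x4000                        # 0100 0000 0000 0000 = 1/2
    minus_half = from_bits16(0xC000)     # 1100 0000 0000 0000 = -1/2
    print(hex(q15_mul_q31(half, half)))            # 0x20000000 = 1/4 in 1.31
    print(q15_mul_q31(minus_half, half) / 2**31)   # -0.25
```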
Multiplication
• Multiplication of M bits by N bits produces M + N bits.
  • E.g., 32 × 16 → 48
• Subsequent truncation, rounding, saturation, etc., is a separate operation (conceptually)
  • If you keep these discrete steps straight in your head, you won’t be confused about multiplication.
• Note that 1.15 × 0.16 yields 1.31. No redundant sign bit in this case.
• When we multiply 32 × 32 in a C program, which 32 bits are discarded from the 64-bit product?
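The width rule and the 1.15 × 0.16 case can be checked directly in Python. Both function names below are hypothetical; `max_unsigned_product_bits` measures the width of the largest possible unsigned product, which never exceeds M + N.

```python
def max_unsigned_product_bits(m: int, n: int) -> int:
    """Bits needed for the largest product of an m-bit and an n-bit unsigned value."""
    return (((1 << m) - 1) * ((1 << n) - 1)).bit_length()

def q15_times_u16(a_q15: int, b_u16: int) -> int:
    """Signed 1.15 x unsigned 0.16 -> 1.31 directly: 15 + 16 = 31 fraction bits,
    and only one operand is signed, so there is no redundant sign bit to shift out."""
    assert -(1 << 15) <= a_q15 < (1 << 15), "not a 1.15 code"
    assert 0 <= b_u16 < (1 << 16), "not a 0.16 code"
    return a_q15 * b_u16

if __name__ == "__main__":
    print(max_unsigned_product_bits(32, 16))   # 48
    print(max_unsigned_product_bits(32, 32))   # 64
    # 1/2 (as 1.15) x 1/2 (as 0.16) = 1/4, already aligned as 1.31
    print(q15_times_u16(0x4000, 0x8000) == 1 << 29)   # True
```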
Multiplication Continued
• A 32 × 32 multiply yields 64 bits
• If we are performing integer arithmetic and want a 32-bit result, we discard the upper 32 bits of the result
  • Works great unless there is an overflow!
  • Rather disastrous if there is an undetected overflow, however, because we are discarding the most significant bits of the result
• In signal processing and when working with fractional types, we discard the LSBs of a product
  • This is much safer, of course
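The two conventions differ only in which half of the 64-bit product is kept. This Python sketch (hypothetical helper names) models the integer case as an unsigned 32-bit multiply that wraps modulo 2**32, which is what C's `uint32_t` multiplication gives on typical platforms, and the fractional case as unsigned 0.32 values.

```python
MASK32 = (1 << 32) - 1

def int_mul_low32(a: int, b: int) -> int:
    """Integer convention: keep the LOW 32 bits of the 64-bit product (wraps mod 2**32)."""
    return (a * b) & MASK32

def frac_mul_high32(a: int, b: int) -> int:
    """Unsigned 0.32 fraction convention: keep the HIGH 32 bits, dropping LSBs."""
    return (a * b) >> 32

if __name__ == "__main__":
    print(int_mul_low32(3, 5))               # 15: fine, no overflow
    print(int_mul_low32(1 << 16, 1 << 16))   # 0: 2**32 overflowed, silently
    half = 1 << 31                           # 0.5 as an unsigned 0.32 fraction
    print(frac_mul_high32(half, half) == 1 << 30)   # True: 0.25, only LSBs lost
```

The integer overflow wipes out the entire value, while the fractional product only loses precision below 2**-32.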
Accumulator Types
• Guard bits are initialized by sign-extending the 1.31 value, i.e., every guard bit starts as a copy of the sign bit
• 1.31 values are sign-extended to 9.31 in order to add them to the accumulator
• Only when 8 bits of overflow have occurred are we at risk of overflowing the accumulator.
• By bounding the number of adds that are done to the accumulator, we can guarantee against overflow (with 8 guard bits, up to $2^8 = 256$ additions of 1.31 values are always safe).
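A Python sketch of a 9.31 accumulator modeled as a 40-bit two's-complement register. The names and the wrap-on-overflow behavior are illustrative assumptions (some DSPs saturate the accumulator instead). It shows that 256 additions of -1.0 fit exactly, while a 257th wraps around.

```python
ACC_BITS = 40   # 9.31 accumulator: 8 guard bits + 1 sign bit + 31 fraction bits

def sign_extend_q31(bits: int) -> int:
    """Interpret a 32-bit pattern as 1.31; the guard bits become copies of the sign bit."""
    bits &= 0xFFFF_FFFF
    return bits - (1 << 32) if bits & 0x8000_0000 else bits

def wrap_acc(x: int) -> int:
    """Keep only ACC_BITS bits of x, as two's complement, like a hardware register."""
    x &= (1 << ACC_BITS) - 1
    return x - (1 << ACC_BITS) if x >> (ACC_BITS - 1) else x

def accumulate(words_q31) -> int:
    acc = 0
    for w in words_q31:
        acc = wrap_acc(acc + sign_extend_q31(w))
    return acc

if __name__ == "__main__":
    minus_one = 0x8000_0000                               # -1.0 in 1.31
    print(accumulate([minus_one] * 256) == -(1 << 39))    # True: exactly -256.0, fits
    print(accumulate([minus_one] * 257) > 0)              # True: wrapped to a positive value
```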
Unsigned Accumulator Types
• In the case of unsigned fractions, the guard bits are initialized with zero
• A 32-bit fraction is zero-extended to prepare it for addition to the accumulator
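For the unsigned case, a Python sketch assuming a hypothetical 40-bit "8.32" layout (8 zero-initialized guard bits above a 0.32 fraction); the slide does not fix the accumulator width. Zero-extension keeps a value like 0xFFFFFFFF (just under 1.0) positive, and the guard bits absorb 256 such additions.

```python
UACC_BITS = 40   # assumed layout: 8 guard bits + 32 fraction bits ("8.32" unsigned)

def zero_extend_u32(bits: int) -> int:
    """Unsigned 0.32: the guard bits above the fraction are simply zeros."""
    return bits & 0xFFFF_FFFF

def accumulate_unsigned(words_u32) -> int:
    acc = 0
    for w in words_u32:
        acc = (acc + zero_extend_u32(w)) & ((1 << UACC_BITS) - 1)   # wraps only past 40 bits
    return acc

if __name__ == "__main__":
    almost_one = 0xFFFF_FFFF       # 1 - 2**-32 as an unsigned 0.32 fraction
    total = accumulate_unsigned([almost_one] * 256)
    print(total == 256 * almost_one)   # True: 8 zero guard bits absorb 256 adds
```

Sign-extending the same pattern instead would treat it as a 1.31 value of -2**-31, which is why unsigned fractions use zero-extension.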
Accumulators and Precision
• A 1.15 value must be padded with 16 zero bits in the LSBs (i.e., shifted left by 16) to prepare it for addition to a 1.31 value or a 9.31 accumulator
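Aligning the binary points is a single shift. In this Python sketch (hypothetical helper name), shifting left by 16 multiplies by 2**16, which gives the right 1.31 code for negative values as well.

```python
def q15_to_q31(a_q15: int) -> int:
    """Pad a 1.15 value with 16 zero LSBs so its binary point lines up with 1.31 / 9.31."""
    return a_q15 << 16             # multiplies by 2**16; correct for negative codes too

if __name__ == "__main__":
    half_q15 = 0x4000              # 0.5 in 1.15
    quarter_q31 = 1 << 29          # 0.25 in 1.31
    acc = q15_to_q31(half_q15) + quarter_q31
    print(hex(acc), acc / 2**31)   # 0x60000000 0.75
```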
Saturation versus Rounding
• Saturation is a means of discarding MSBs (upper bits) while causing minimum damage to the value
  • 1.0000 becomes .1111 (5-bit unsigned intermediate value saturated to 4 bits)
• Rounding is a means of discarding LSBs (lower bits) while causing minimum damage to the value
  • .00111111 becomes .0100 (8-bit unsigned fraction rounded to 4-bit unsigned fraction)
• Truncation refers to simply dropping bits when reducing precision. It is purely an operation on the representation of a number and does not have very good mathematical properties (e.g., .00111111 truncates to .0011).
• In the examples above, the discarded bits are the leading 1 of 1.0000 and the low four bits (1111) of .00111111
• Saturation and rounding solve different problems
  • Saturation prevents radical error
    • E.g., change of sign
    • E.g., wrapping from MAXINT to 0
  • Rounding prevents smaller errors (but they can accumulate)
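The three ways of reducing precision can be compared on the slide's own examples. This Python sketch works on unsigned codes only; the helper names are hypothetical, and round-to-nearest with ties rounding up is an assumed rounding mode (hardware may instead use, e.g., round-half-to-even).

```python
def saturate_u(x: int, bits: int) -> int:
    """Discard MSBs with minimum damage: clamp to the largest 'bits'-wide unsigned code."""
    return min(x, (1 << bits) - 1)

def round_u(x: int, drop: int) -> int:
    """Discard 'drop' >= 1 LSBs, rounding to nearest (ties round up)."""
    return (x + (1 << (drop - 1))) >> drop

def truncate_u(x: int, drop: int) -> int:
    """Simply drop 'drop' LSBs (for unsigned values this rounds toward zero)."""
    return x >> drop

if __name__ == "__main__":
    print(bin(saturate_u(0b10000, 4)))      # 0b1111: 1.0000 -> .1111
    print(bin(round_u(0b00111111, 4)))      # 0b100:  .00111111 -> .0100
    print(bin(truncate_u(0b00111111, 4)))   # 0b11:   .00111111 -> .0011
    # Rounding can carry into a new MSB, so it may need saturation afterwards:
    print(bin(saturate_u(round_u(0b11111111, 4), 4)))   # 0b1111
```

The last line illustrates why the two operations solve different problems: rounding .11111111 carries out to 1.0000, and only saturation brings it back into range.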