
CS231 Spring 2006


byron-pena

Presentation Transcript


  1. CS231 Spring 2006 Computer Arithmetic: Advanced Topics in Number Representation

  2. Arithmetic Data Types There are three important categories of arithmetic data types used in various kinds of computers: • Integers • Two’s Complement, One’s Complement, Signed Magnitude • (We have already seen these.) • Fixed Point Fractions • The data types used by signal and media processors • The subject of this slide set. • Floating Point • IEEE 754 Single, Double, Extended • See additional slides on this data type

  3. Data Types By Application • Integer • Business, Systems Programming, User Interfaces, General Purpose Computation • Floating Point • Scientific Computation • Fixed Point Fractions • Communications, Media Processing • This is a simplification; in reality all these data types can be used in many different applications

  4. Properties of the Types • Integer • Represent positive and negative whole numbers • All the integers between a MIN and MAX can be represented • Fixed Point Fractions • Signed: represent values in the range -1 to (almost) 1 • Unsigned: values in the range 0 to (almost) 1 • Floating Point • Represent numbers in a huge range • Numbers very close to 0, and numbers very far from 0 • E.g., magnitudes from about 10^-100 up to 10^100, both positive and negative • But unevenly distributed (many holes in the range)
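The uneven distribution of floating point values can be made concrete by measuring the gap between a number and its nearest larger neighbor. The following is a minimal sketch, assuming IEEE 754 double precision (Python's `float`); the helper names `next_up` and `gap` are hypothetical and not part of the slides.

```python
import struct

def next_up(x: float) -> float:
    """Return the next representable double above a positive, finite x.

    For positive doubles, adding 1 to the raw 64-bit pattern steps to the
    adjacent representable value.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits + 1))[0]

def gap(x: float) -> float:
    """Distance from x to the next representable double above it."""
    return next_up(x) - x

if __name__ == "__main__":
    # The gap grows with the magnitude of the number.
    for x in (1e-100, 1.0, 1e100):
        print(f"x = {x:g}, gap to next value = {gap(x):g}")
```

Near 1.0 the gap is 2^-52, while near 10^100 neighboring values are astronomically far apart. An integer or fixed point type, by contrast, has the same spacing across its whole range.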

  5. Types for Signal and Media Processing • Signal and media processing uses almost every arithmetic type you can think of: • Fixed point • Integers (signed and unsigned) • Fractions (signed and unsigned) • Mixed integers / fractions (e.g., accumulator types) • Floating point • Of various precisions. Even 16-bit floating point! • Complex numbers <real, imaginary> • Fixed point • Floating point

  6. Signal Data • Signal data often originates from A/D conversion • An analog signal is sampled at fixed time intervals (see clock below) • A stream of numbers emerges from the A/D converter • These numbers lie in a fixed range, and represent the instantaneous amplitude of the signal
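As a rough illustration of how sampled amplitudes end up as numbers in a fixed range, the sketch below quantizes a sine wave to signed 16-bit samples. The choice of a 16-bit converter, the sample rate, and the names `quantize_q15` and `sample_sine` are assumptions made for this example, not details from the slides.

```python
import math

def quantize_q15(x: float) -> int:
    """Map an amplitude in [-1, 1) to a signed 16-bit sample.

    The sample is scaled by 2**15 and clamped to the representable range,
    so +1.0 becomes the largest code, 32767.
    """
    n = round(x * 32768)
    return max(-32768, min(32767, n))

def sample_sine(freq_hz: float, rate_hz: float, count: int) -> list:
    """Sample a unit-amplitude sine at fixed time intervals of 1/rate_hz."""
    return [
        quantize_q15(math.sin(2 * math.pi * freq_hz * k / rate_hz))
        for k in range(count)
    ]

if __name__ == "__main__":
    # One period of a 1 kHz tone sampled at 8 kHz.
    print(sample_sine(1000, 8000, 8))
```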

  7. Fixed Point Types

  8. Unsigned Fixed Point Fractions • “0.16” = 16 bits of fractional magnitude, binary point fully at left • “0.32” = 32 bits of fractional magnitude • Addition and multiplication work exactly as for twos-complement integers • .1000 X .1000 = .01000000 • But: discard the lower bits if the final value has the same precision as the inputs! • .1000 + .1000 = 1.0000 (overflow) • If saturating arithmetic is used, 1.0000 is clamped to the nearest representable value, .1111 in this case.
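The two operations on this slide can be sketched for the 0.16 format as follows. Python integers stand in for the raw 16-bit patterns, and the names `mul_u016` and `add_sat_u016` are hypothetical helpers chosen for this example.

```python
FRAC_BITS = 16
MAX_U016 = (1 << FRAC_BITS) - 1  # 0xFFFF, the value just below 1.0

def mul_u016(a: int, b: int) -> int:
    """Multiply two 0.16 fractions.

    The full product is a 0.32 value; dropping its lower 16 bits
    (truncation) returns to 0.16 precision.
    """
    return (a * b) >> FRAC_BITS

def add_sat_u016(a: int, b: int) -> int:
    """Add two 0.16 fractions, clamping to the largest value on overflow."""
    return min(a + b, MAX_U016)

if __name__ == "__main__":
    half = 0x8000  # .1000... = 1/2
    print(hex(mul_u016(half, half)))      # 0x4000, i.e. 1/4
    print(hex(add_sat_u016(half, half)))  # 0xffff, saturated just below 1
```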

  9. Signed Fixed Point Fractions • “1.15” = 1 sign bit, 15 magnitude bits • “1.31” = 1 sign bit, 31 magnitude bits • 0100 = ½ • 1100 = -½ • Most positive value = 0111 … 11 • Most negative value = 1000 … 00 • Multiplication produces two sign bits: • 0.100 X 0.100 = 00.010000 • One is normally shifted away to yield 1.N format (shift result left by one bit) • This is usually done by the multiply instruction (or multiply-accumulate instruction)
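A minimal sketch of the 1.15 multiply described above, with the redundant sign bit shifted away before the result is narrowed. The function names are hypothetical, and the final clamp for the single overflowing case (-1 × -1) is an assumption of this example; the slide does not say how that case is handled.

```python
def to_signed16(raw: int) -> int:
    """Interpret a raw 16-bit pattern as a two's-complement integer."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw

def mul_q15(a: int, b: int) -> int:
    """Multiply two 1.15 values given as signed integers in [-32768, 32767]."""
    product = a * b         # 2.30 format: two sign bits, 30 fraction bits
    product <<= 1           # shift one sign bit away: now 1.31
    result = product >> 16  # keep the upper 16 bits: back to 1.15 (truncated)
    # Assumption: saturate the one case that does not fit, -1 x -1 = +1.
    return max(-32768, min(32767, result))

if __name__ == "__main__":
    half = to_signed16(0x4000)        # 0100... = 1/2
    minus_half = to_signed16(0xC000)  # 1100... = -1/2
    print(mul_q15(half, half) / 32768)        # 0.25
    print(mul_q15(minus_half, half) / 32768)  # -0.25
```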

  10. Multiplication • Multiplication of M bits by N bits produces M + N bits. • E.g., 32 × 16 → 48 • Subsequent truncation, rounding, saturation, etc., is a separate operation (conceptually) • If you keep these discrete steps straight in your head you won’t be confused about multiplication. • Note that 1.15 × 0.16 yields 1.31. No redundant sign bit in this case. • When we multiply 32 × 32 in a C program, which 32 bits are discarded from the 64-bit product?

  11. Multiplication Continued • 32 × 32 multiply yields 64 bits • If we are performing integer arithmetic and want a 32-bit result, we discard the upper 32 bits of the result • Works great unless there is an overflow! • Rather disastrous if there is an undetected overflow however, because we are discarding the most significant bits of the result • In signal processing and when working with fractional types, we discard LSBs of a product • This is much safer of course
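The difference between the two conventions can be sketched with unsigned 32-bit operands. Python integers are unbounded, so the 64-bit product is always available and the discarded half is chosen explicitly; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def mul_int32_wrap(a: int, b: int) -> int:
    """Integer convention: keep the low 32 bits of the 64-bit product.

    This mirrors what unsigned 32-bit multiplication does in C.
    """
    return (a * b) & MASK32

def mul_frac32_high(a: int, b: int) -> int:
    """Fractional (0.32) convention: keep the high 32 bits of the product."""
    return (a * b) >> 32

if __name__ == "__main__":
    # Integer multiply: fine for small values, silently wrong on overflow.
    print(mul_int32_wrap(3, 5))                   # 15
    print(mul_int32_wrap(0x10000, 0x10000))       # 0, the true product 2**32 is lost
    # Fractional multiply: only low-order precision is lost.
    print(hex(mul_frac32_high(0x80000000, 0x80000000)))  # 0x40000000, i.e. 1/4
```

In the integer case an overflow destroys the most significant bits, while in the fractional case the worst outcome is an error smaller than one unit in the last kept bit.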

  12. Accumulator Types • Guard bits are initialized by sign-extending the 1.31 form. I.e., guard bits = sign bit initially • 1.31 values are sign-extended to 9.31 in order to add them to the accumulator • Only when 8 bits of overflow have occurred are we at risk of overflowing the accumulator. • By bounding the number of adds that are done to the accumulator we can guarantee against overflow.
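The guard-bit mechanism can be sketched on raw bit patterns. This example assumes the 9.31 accumulator is 40 bits wide (8 guard bits above a 1.31 value), which is inferred from the format name; the helper names are hypothetical.

```python
ACC_BITS = 40                    # 9.31: 8 guard bits + 1.31 (assumed width)
ACC_MASK = (1 << ACC_BITS) - 1

def sign_extend(raw: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a raw two's-complement pattern to a wider pattern."""
    sign = 1 << (from_bits - 1)
    value = (raw & (sign - 1)) - (raw & sign)  # signed interpretation
    return value & ((1 << to_bits) - 1)        # re-encode in to_bits bits

def acc_add(acc_raw: int, x_raw_q31: int) -> int:
    """Add a raw 1.31 value to a raw 9.31 accumulator (wraps at 40 bits)."""
    return (acc_raw + sign_extend(x_raw_q31, 32, ACC_BITS)) & ACC_MASK

def acc_value(acc_raw: int) -> float:
    """Numeric value of a raw 9.31 accumulator."""
    if acc_raw >> (ACC_BITS - 1):
        acc_raw -= 1 << ACC_BITS
    return acc_raw / (1 << 31)

def acc_sum(raw_values) -> int:
    """Accumulate a sequence of raw 1.31 values starting from zero."""
    acc = 0
    for x in raw_values:
        acc = acc_add(acc, x)
    return acc

if __name__ == "__main__":
    print(hex(sign_extend(0x80000000, 32, 40)))  # 0xff80000000: guard bits copy the sign
    print(acc_value(acc_sum([0x40000000] * 4)))  # 2.0, outside the 1.31 range but held safely
```

Under the 40-bit assumption, each 1.31 addend has magnitude at most 1 and the accumulator spans -256 to just under 256, so up to 2^8 = 256 additions cannot overflow it.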

  13. Unsigned Accumulator Types • In the case of unsigned fractions, the guard bits are initialized with zero • A 32-bit fraction is zero-extended to prepare it for addition to the accumulator

  14. Accumulators and Precision • A 1.15 value must be zero-filled (in the LSBs) in order to prepare it for addition to a 1.31 value or to the 9.31 accumulator
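Filling the low-order bits amounts to a left shift by the difference in fraction bits. A minimal sketch, using signed Python integers for the values and a hypothetical helper name:

```python
def q15_to_q31(x: int) -> int:
    """Widen a signed 1.15 value to 1.31 by appending 16 zero LSBs.

    The binary point stays aligned, so the numeric value is unchanged.
    """
    return x << 16

if __name__ == "__main__":
    half_q15 = 0x4000
    half_q31 = q15_to_q31(half_q15)
    print(hex(half_q31))                            # 0x40000000
    print(half_q15 / 2**15 == half_q31 / 2**31)     # True: both are 1/2
```

The widened value can then be sign-extended into the guard bits exactly like any other 1.31 value.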

  15. Saturation versus Rounding • Saturation is a means of discarding MSBs (upper bits) while causing minimum damage to the value • 1.0000 becomes .1111 (5 bit unsigned intermediate value saturated to 4 bits) • Rounding is a means of discarding LSBs (lower bits) while causing minimum damage to the value • .00111111 becomes .0100 (8 bit unsigned fraction rounded to 4 bit unsigned fraction) • Truncation refers to the simple dropping of bits when reducing precision. This is simply an operation on the representation of a number and naturally does not have very good mathematical properties. • The bits discarded above are the leading 1 of 1.0000 and the trailing 1111 of .00111111 • Saturation and rounding solve different problems • Saturation prevents radical error • E.g., change of sign • E.g., wrapping from MAXINT to 0 • Rounding prevents smaller errors (but they can accumulate)
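The three operations can be sketched on small unsigned bit patterns, reproducing the slide's two examples. The function names are hypothetical, and round-half-up is one common rounding rule chosen for this sketch; the slides do not specify which rule is intended.

```python
def saturate_unsigned(value: int, bits: int) -> int:
    """Discard MSBs safely: clamp to the range of an unsigned `bits`-bit field."""
    return min(max(value, 0), (1 << bits) - 1)

def truncate(value: int, drop_bits: int) -> int:
    """Discard LSBs by simply dropping them."""
    return value >> drop_bits

def round_half_up(value: int, drop_bits: int) -> int:
    """Discard LSBs after adding half of the last kept bit (assumed rule)."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits

if __name__ == "__main__":
    # 1.0000 (5 bits) saturated to 4 bits gives .1111; plain masking wraps to 0.
    print(format(saturate_unsigned(0b10000, 4), "04b"))  # 1111
    print(format(0b10000 & 0b1111, "04b"))               # 0000
    # .00111111 (8 bits) reduced to 4 bits.
    print(format(round_half_up(0b00111111, 4), "04b"))   # 0100
    print(format(truncate(0b00111111, 4), "04b"))        # 0011
```

Rounding can itself carry out of the top bit (for example, .11111111 rounds up to 1.0000), so a rounded result may still need to be saturated afterwards.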

  16. Accumulator Extraction
