The Power of Streaming Table Look-Up Fred Brooks University of North Carolina at Chapel Hill http://www.cs.unc.edu/~brooks brooks@cs.unc.edu Thanks to ONR Virte, NIH NIBIB, DoE, NSF
von Neumann Computers • Designed to do many operations on each datum • Hence data stays (mostly) still, while instructions flow past • Substantial set of different operations, but each has fixed function
Data-Streaming Computers • Designed to do same operation on many data; serially vs SIMD || • So operation stays still (set up), and the data flows past • Want very powerful vector operations, so as to flow the data few times • A whole different way of programming • APL exemplifies how to think
Problem: Conditionals • Stencil (logical vector) calculation: <>=≠ • Input/output masking by stencil; • Some operators effect conditions • Absolute value, clamping • Max, min, match, merge (two streams) • Table Look-Up • Table Look-Up with Table Change
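One way to picture the stencil idea above is the following minimal Python sketch. It is an illustration only, not code from the talk: the helper names `stencil`, `merge`, `stream_abs`, and `stream_clamp` are hypothetical, and plain lists stand in for data streams. The point is that the program never branches on the data; it computes a logical vector, then selects between two streams element by element.

```python
def stencil(data, predicate):
    """Build a logical vector (the stencil) from a stream."""
    return [bool(predicate(x)) for x in data]

def merge(mask, when_true, when_false):
    """Merge two streams under control of a stencil."""
    return [t if m else f for m, t, f in zip(mask, when_true, when_false)]

def stream_abs(data):
    """Absolute value as stencil + merge: pick the negated stream where x < 0."""
    negated = [-x for x in data]
    return merge(stencil(data, lambda x: x < 0), negated, data)

def stream_clamp(data, low, high):
    """Clamping as elementwise max and min against constant streams."""
    raised = [max(x, low) for x in data]
    return [min(x, high) for x in raised]

if __name__ == "__main__":
    print(stream_abs([3, -1, 0, -7]))
    print(stream_clamp([3, -1, 0, -7, 12], 0, 5))
```

Each helper makes one pass over the whole stream, which matches the "operation stays still, data flows past" model at the cost of a few extra passes.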
GPUs • Are data-streaming computers • Have some fixed operations, e.g., vertex transformation • The powerful, custom-tailorable ops are done by streaming table lookup, • and streaming to-memory operations, such as Z-buffering • I’m eager to hear Bill Dally on this
So a Quick Look at the Past • Get some ideas for programming • Get some ideas for generalizing the GPUs • Avoid some mistakes of previous designs
CONVERT in IBM 709 (1957) • Three ops in a standard op set • Amdahl, 709’s architect, invented them • General table lookups on 6-bit bytes, but designed for specific applications: • Translate character codes, 1-for-1 • Radix conversion—add result to ACC • Decimal (BCD) addition • Suppress left zeros, etc. for printing
CONVERT AND REPLACE step • Six-bit byte argument is added to table base address • Returns a 36-bit function: • 6 bits replace the argument in stream • 15 bits are added mod 2^15 to table address for next lookup! • A finite-state machine!
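The step described above can be sketched in Python as a simplified model. This is not a faithful emulation of the 709 instruction: the memory layout, the `(replacement, delta)` entry format, the `BLANK` code, and the two table bases are all assumptions chosen for illustration. The example task, suppressing left zeros, is one of the applications named on the previous slide. The table base address acts as the state of a finite-state machine, because each lookup returns an increment for it.

```python
ADDR_MASK = (1 << 15) - 1   # table addresses wrap mod 2**15
BLANK = 0o60                # assumed 6-bit code for a blank (illustrative)
LEADING, AFTER = 0, 64      # hypothetical base addresses of the two tables

def build_tables():
    """Return a memory of (replacement byte, address delta) entries."""
    memory = {}
    for byte in range(64):
        # Default entry: pass the byte through, stay in the same table.
        memory[LEADING + byte] = (byte, 0)
        memory[AFTER + byte] = (byte, 0)
    # While still in the leading part, a zero digit is replaced by a blank.
    memory[LEADING + 0] = (BLANK, 0)
    # The first nonzero digit passes through and switches tables.
    for digit in range(1, 10):
        memory[LEADING + digit] = (digit, AFTER - LEADING)
    return memory

def convert_and_replace(stream, memory, base=LEADING):
    """Stream bytes through the tables, updating the base after each lookup."""
    out = []
    for byte in stream:
        replacement, delta = memory[(base + byte) & ADDR_MASK]
        out.append(replacement)                 # 6 bits replace the argument
        base = (base + delta) & ADDR_MASK       # delta selects the next table
    return out

if __name__ == "__main__":
    print(convert_and_replace([0, 0, 4, 0, 7], build_tables()))
```

With these assumed tables, the digits 0 0 4 0 7 come out as two blanks followed by 4 0 7: the zero after the 4 is kept because the lookup has already moved to the second table.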
The IBM 7950 (“Harvest”) • 1961; Jim Pomerene was chief architect • A “plug-in processor” for Stretch • Like GPUs, bigger than host • A pure data streaming machine • Delivered to NSA, ran decades • For byte-by-byte cryptanalysis
Harvest Programming Model • [Figure, from Blaauw & Brooks, Computer Architecture, 1997]
What Can Such a Machine Do? • Flow bytes through multiple permutation tables • Sort, merge, collate like crazy • Count, incidence in memory • Detect low-probability sequences, by Bayes • Determine the language of a text, by Bayes • Convert Roman numerals to Arabic • Buchholz, ed., Planning a Computer System, 1962 • Randomly create valid hymn tunes, by Markov
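As a small illustration of one item in the list above, Roman-to-Arabic conversion can be written as a single pass of table lookups with one word of carried state. This is a Python sketch under stated assumptions, not a reconstruction of how Harvest did it: the function name `roman_to_arabic` and the use of a dictionary as the lookup table are hypothetical, and input is assumed to be a well-formed numeral.

```python
# Lookup table: one value per Roman numeral symbol.
VALUE = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_arabic(numeral):
    """Convert a well-formed Roman numeral in one left-to-right pass."""
    total = 0
    previous = 0  # value of the previous symbol: the only carried state
    for symbol in numeral:
        value = VALUE[symbol]          # the table lookup
        total += value
        if previous < value:
            # The previous symbol was subtractive (as in IV or CM):
            # it was already added once, so take it back out twice.
            total -= 2 * previous
        previous = value
    return total

if __name__ == "__main__":
    print(roman_to_arabic("MCMXCIV"))  # 1994
```

The carried `previous` value plays the same role as the table address that changes from one lookup to the next: the meaning of each byte depends on what flowed past just before it.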
Lessons and Ideas for GPUs • Real problems not uniform as model ones • Going back to host is a killer • Today, texture memory size cramps • Just wait • Adding TLU result to table address gives a very powerful capability • Two streams interacting >> one • Instance stream in the NV 40?