Computing Media and Languages for Space-Oriented Computations (opening remarks)
Big Idea: Matter Computes
• Our physical world implements computations
• Double implication:
  • Computing landscape determined by the laws of the physical world
  • Understand our physical world in terms of the computation it performs
  • Control our physical world by programming the computation it performs
Convergence of Concerns
• Dealing with space as a physical issue when implementing modern computing components and systems: DSM ICs, sublithographic effects
• Realizing shapes/behaviors/properties using computation: distributed robotics, programmable matter
• Programming physical systems: self-assembly, protein networks, …
Viewpoint
• Traditional/mainstream abstractions, models, algorithms, and languages have not adequately dealt with spatial issues
  • Either as an optimization or as a computational goal
• Now have several communities approaching this from different perspectives
Monday
• 9:00am Opening Remarks
• 9:15am DeHon – Spatial Computing
• 11:15am Coore – Amorphous Computing
• 12:15pm LUNCH
• 1:30pm Goldstein – Programmable Matter
• 2:30pm Gruau – Blob Computing
• 3:30pm Coffee/Cake
• 4:00pm Giavitto – Data Structures as Space
Challenges and Opportunities for Spatial Computing
André DeHon <andre@acm.org>
Message
• Opportunity
  • Large and capable computing systems
  • Continued scaling, primarily in spatial capacity
  • Performance capabilities from parallelism
  • Dynamically (re)programmable/adaptive
• Spatial challenges
  • Distance = delay
  • Communications take up space and energy
• Demands
  • New models/abstractions/algorithms
Outline
• Scaling
• Spatial vs. temporal computation
• Grounding spatial examples: FPGAs, nanoPLA
• Spatial challenges
  • Scaling, interconnect delay and requirements
  • Defects, faults, lifetime effects
• Opportunities: capacity, parallelism, scaling, adaptation
• Why not C, VHDL, …
• Design patterns
• System architectures
Spatial Capacity Scaling Continues
• ~Tλ² (10¹² λ²) by the end of the silicon roadmap (another decade): over a billion gates
• Molecular scale promises two orders of magnitude more than that
• All of this is 2D; we still have the third dimension to exploit [paper at NanoNets 2006 in 2 weeks]
Implication
• Qualitatively not in the same world we were in from 1945 to 1985
• Orders-of-magnitude shift in resources suggests dramatic changes in strategy
• We have been on an exponential curve, but up to about 1990 we were shrinking the same kind of computers down to one chip
Example
• Compute: Y = Ax² + Bx + C
Temporal Implementation
• Single operator, reused in time
• Store instructions; store intermediates
• Communication across time
• One cycle per operation
Spatial Implementation
• One operator for every operation
• Instruction per operator
• Communication in space
• Computation in a single cycle
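To make the temporal/spatial contrast concrete, here is a minimal C sketch (not from the talk; the function names and the cycle accounting in the comments are illustrative). The temporal version models a single shared operator reused across cycles with stored intermediates; the spatial version stands in for a circuit with one operator per operation that settles in a single cycle.

#include <stdio.h>

/* Temporal: one multiplier and one adder reused; each statement models
 * one cycle, with intermediates held in registers t1..t4. */
int poly_temporal(int A, int B, int C, int x) {
    int t1 = x * x;    /* cycle 1: the single multiplier */
    int t2 = A * t1;   /* cycle 2: multiplier reused */
    int t3 = B * x;    /* cycle 3: multiplier reused */
    int t4 = t2 + t3;  /* cycle 4: the single adder */
    return t4 + C;     /* cycle 5: adder reused */
}

/* Spatial: in hardware all five operators exist at once as a
 * combinational circuit; a single C expression only suggests this. */
int poly_spatial(int A, int B, int C, int x) {
    return A * x * x + B * x + C;  /* one operator per operation */
}

int main(void) {
    printf("%d %d\n", poly_temporal(1, 2, 3, 4), poly_spatial(1, 2, 3, 4));
    return 0;
}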
Conventional Processors
• Have a temporal organization
• Large instruction and data memory per active processing element
• Economize on instruction memory with a word-wide SIMD organization (one instruction drives a w-wide row of ops)
Conventional Processors
• Economize on area: pack a large computation into a small area
• Store the description of the computation compactly; reuse a small number of processing elements in time
• Trade time for area
• Absolutely the right thing for 1985 silicon (and for pre-integrated-circuit machines)
Early Challenge
• How do I make my large program fit on an economical computer?
• What can I compute with 10K vacuum tubes?
• How does it fit in caches that hold 100 instructions, in a 64K address space?
• Heavy sequentialization was a good engineering solution for 1945–1990
Field-Programmable Gate Array (FPGA)
• K-LUT (typical K=4); LUT = Look-Up Table
• Compute block: LUT with optional output flip-flop
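As an illustration (a hypothetical sketch, not vendor code): a K-input LUT is nothing more than a 2^K-bit truth table addressed by its inputs, so a 4-LUT's entire "instruction" is 16 configuration bits.

#include <stdint.h>
#include <stdio.h>

/* A 4-LUT: the 16-bit truth table is the LUT's configuration;
 * bit i of the table is the output when the inputs encode i. */
typedef struct {
    uint16_t truth_table;
} lut4;

static int lut4_eval(lut4 l, int a, int b, int c, int d) {
    int index = (d << 3) | (c << 2) | (b << 1) | a;
    return (l.truth_table >> index) & 1;
}

int main(void) {
    /* Configure as a 4-input AND: only input pattern 1111 (index 15)
     * produces a 1. */
    lut4 and4 = { .truth_table = (uint16_t)(1u << 15) };
    printf("%d %d\n", lut4_eval(and4, 1, 1, 1, 1),
                      lut4_eval(and4, 1, 0, 1, 1));
    return 0;
}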
Field-Programmable Gate Arrays
• Have a spatial computing organization
• Small instruction area per active operator: pack more computation on the die
• Bit-level control: use more of the available ops
Field-Programmable Gate Arrays
• Put more area into computing
• Have more compute elements per die; support more computation per cycle
• Trade area for time
• With more capacity, more applications fit spatially
• More appropriate for 2000 and beyond
Component Example (each a single die in 0.35 μm)
• XC4085XL-09 (FPGA): 3,136 CLBs, 4.6 ns cycle, 682 bit ops/ns
• Alpha (1996): 2 × 64b ALUs, 2.3 ns cycle, 55.7 bit ops/ns
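The density figures follow directly from the listed resources and cycle times, assuming the slide counts one bit operation per CLB per cycle:

\frac{3136\ \text{bit ops}}{4.6\ \text{ns}} \approx 682\ \frac{\text{bit ops}}{\text{ns}},
\qquad
\frac{2 \times 64\ \text{bit ops}}{2.3\ \text{ns}} \approx 55.7\ \frac{\text{bit ops}}{\text{ns}}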
[Figure: empirical raw density comparison; computational density (ALU bit ops per λ²·s) plotted over time]
Spatial Computing
• Enabled by high capacity; has a density advantage
• Now have sufficient capacity to hold a large range of interesting problems
• 100,000 bit-level operators on a single chip, with more on the way
• Can exploit the kinds of capacities now becoming available
Today's FPGAs
• 100,000s of LUTs
• Embedded blocks: many small distributed memories, megabits of memory
• Data rates ~10 Tb/s (10–100× over a μP)
• Operate at 100s of MHz
• Easily scale up spatially (step and repeat)
Simple Nanowire-Based PLA
• NOR-NOR = AND-OR PLA logic [FPGA 2004]
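One quick way to see the NOR-NOR = AND-OR identity is to brute-force it. This sketch (illustrative only) assumes complemented literals are available at the first NOR plane and a complemented output at the second, and checks that the cascade computes the sum-of-products a·b + c·d on all 16 input combinations.

#include <stdio.h>

static int nor(int x, int y) { return !(x | y); }

int main(void) {
    for (int v = 0; v < 16; v++) {
        int a = v & 1, b = (v >> 1) & 1, c = (v >> 2) & 1, d = (v >> 3) & 1;
        int p1  = nor(!a, !b);   /* first NOR plane on complemented literals: a AND b */
        int p2  = nor(!c, !d);   /* first NOR plane: c AND d */
        int sop = !nor(p1, p2);  /* complemented second plane: p1 OR p2 */
        if (sop != ((a && b) || (c && d))) { puts("mismatch"); return 1; }
    }
    puts("NOR-NOR matches AND-OR on all 16 inputs");
    return 0;
}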
Tile into Arrays [FPGA 2005]
nanoPLA Capacity
• 10 μm × 5 μm subarrays: millions on a single-layer, modest die
• 100 product terms per subarray
• Include memory blocks
• Stack in 3D
Interconnect Challenge
• With 100,000 processing elements cooperating on a task (achievable with today's FPGAs), the elements must communicate
• Interconnect becomes dominant: area, delay, energy
• Interconnect replaces memory as the communication medium, yet is less heavily studied
Large Memories
• Build a larger memory
• Simple model: multiplex together more cells
Delay vs. Memory (1)
• How does delay grow with memory size N?
• Tmem = Tdecode + Tcell + Tmux
• Tmux = log₄(N) · Tmux4 (depth of a tree of 4-input muxes)
Delay vs. Memory (2)
• Tmem = O(log N): does this make sense for large N? Speed of light?
• Tmem = Tlogic + Twire
• 2D memory: Twire = O(√N)
• Tmem = C₁·log(N) + C₂·√N
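A toy calculation makes the crossover visible; the constants C1 and C2 below are assumed purely to show the trend, not measured. The logarithmic mux-tree term dominates for small N, while the O(√N) wire term dominates for large N.

#include <math.h>
#include <stdio.h>

int main(void) {
    const double C1 = 1.0, C2 = 0.01;  /* assumed constants, not measured */
    for (double n = 1e3; n <= 1e9; n *= 1e3) {
        double t_logic = C1 * log2(n);  /* mux-tree depth term */
        double t_wire  = C2 * sqrt(n);  /* 2D wire-length term */
        printf("N=%.0e  logic=%6.1f  wire=%8.1f\n", n, t_logic, t_wire);
    }
    return 0;
}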
Chips >> Cycles
• Chips are growing
• Gate delays are shrinking
• Wire delays aren't scaling down
• It will take many cycles to cross a chip
Clock Cycle Radius
• Radius of logic one can reach in one cycle (at 45 nm): about 10 PE widths
• A few hundred PEs within that radius
• Chip side: 600–700 PEs, so 400–500 thousand PEs per chip
• 100s of cycles to cross the chip
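Back-of-envelope arithmetic behind the slide's round numbers, taking the PE pitch as the unit of distance (an assumption about the accounting):

\pi r^2 \approx \pi \cdot 10^2 \approx 3 \times 10^2 \ \text{PEs within one cycle-radius},
\qquad
650^2 \approx 4.2 \times 10^{5} \ \text{PEs per chip},
\qquad
650 / 10 = 65 \ \text{cycle-radii per edge-to-edge crossing}

With routing overhead and round trips, chip-crossing communication lands in the 100s of cycles.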
Communication Expensive
• What if we just built a crossbar? Interconnect area scales as N²
• Must exploit the typical locality in a design to reduce area
• Rent's Rule: IO = c·N^p, with 0.5 ≤ p ≤ 0.75 typical
• How well can we engineer low p?
• Where does this show up in algorithm/computation design?
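A small sketch (constants assumed for illustration) shows how strongly the Rent exponent p controls interconnect demand compared with the N² wiring of a full crossbar:

#include <math.h>
#include <stdio.h>

int main(void) {
    const double c = 1.0;    /* assumed Rent constant */
    const double n = 1e5;    /* ~100,000 PEs, as above */
    const double p[] = {0.5, 0.75};
    for (int i = 0; i < 2; i++)
        printf("p=%.2f  IO = c*N^p = %.0f\n", p[i], c * pow(n, p[i]));
    printf("crossbar wires ~ N^2 = %.0e\n", n * n);
    return 0;
}

For N = 100,000 this gives roughly 316 IOs at p = 0.5 versus about 5,600 at p = 0.75, against 10^10 crossbar crosspoints: engineering a low p is worth orders of magnitude.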
Optimizing
• Must exploit physical locality (placement)
• Reduce the wire requirement (reduce p)
• Reduce the distance traveled over wires: a new meaning for spatial locality
• Interconnect must show up in our design, in run-time management, and in algorithms
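To illustrate what placement buys (all data here made up): the total Manhattan wirelength of the same 4-net ring drops sharply when communicating cells are placed adjacently.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int x, y; } pos;

/* Sum of Manhattan distances over all two-point nets. */
static int wirelength(const pos p[], const int nets[][2], int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        pos a = p[nets[i][0]], b = p[nets[i][1]];
        total += abs(a.x - b.x) + abs(a.y - b.y);
    }
    return total;
}

int main(void) {
    int nets[4][2] = { {0,1}, {1,2}, {2,3}, {3,0} };  /* a ring of 4 cells */
    pos good[4] = { {0,0}, {0,1}, {1,1}, {1,0} };     /* ring laid out as a square */
    pos bad[4]  = { {0,0}, {3,3}, {0,3}, {3,0} };     /* cells scattered */
    printf("good placement: %d\n", wirelength(good, nets, 4));  /* 4 */
    printf("bad placement:  %d\n", wirelength(bad, nets, 4));   /* 18 */
    return 0;
}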
Clock Cycle Scaling Has Ended
[Figure: probability distribution of path delay, old vs. new technology, relative to the clock cycle]
• Up to ~2000, we scaled down the clock cycle
• Architecture scaling: fewer gates per clock; now down to ~10 gates/clock
• Energy-limited computation: could run a few devices faster, but not all of them
• Variation at the nanoscale diminishes clock frequencies
• Future scaling is spatial
Atomic-Scale Physical Effects
• As our devices approach the atomic scale, we must deal with statistical effects governing the placement and behavior of individual atoms and electrons
Three Atomic-Scale "Problems"
• Defects: manufacturing imperfections
  • Occur before operation; persistent
  • Shorts, breaks, bad contacts
• Faults
  • Occur during operation; transient
  • A node's value flips: crosstalk, ionizing particles, bad timing, tunneling, thermal noise
• Operational/lifetime defects
  • Parts become bad during the operational lifetime: fatigue, electromigration, burnout, …
  • Or become slower: NBTI, hot-carrier effects