Computing Without Processors Thesis Proposal. Mihai Budiu July 30, 2001. Thesis Committee: Seth Goldstein, chair Todd Mowry Peter Lee Babak Falsafi, ECE Nevin Heintze, Agere Systems. This presentation uses TeXPoint by George Necula. Four Types of Research. Solve nonexistent problems
Computing Without Processors: Thesis Proposal. Mihai Budiu, July 30, 2001. Thesis Committee: Seth Goldstein (chair), Todd Mowry, Peter Lee, Babak Falsafi (ECE), Nevin Heintze (Agere Systems). This presentation uses TeXPoint by George Necula.
Four Types of Research • Solve nonexistent problems • Solve past problems • Solve current problems • Solve future problems
The Law (source: Intel)
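The Crossover Phenomenon [Figure: plot labeled "technology" against time]
Example Crossover [Figure: access speed (ns) against time for CPU and DRAM; marks at 200 ns and 1980; regions labeled "no caches" and "caches"]
Signal Propagation [Figure: distance (mm) against time; curves for die size and distance covered in 1 clock; marks at 20 mm and "now"]
Reliability & Yield [Figure: defects/chip against time; curves labeled "occurring" and "tolerable"; marks at "new process" and "now"]
Energy [Figure: power against time; curves for CPU consumption and thermal dissipation; marks at 100 W and "now"]
Instruction-Level Parallelism (ILP) [Figure: instructions against time; curves labeled "fetch" and "commit"; mark at "now"]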
Premises of this Research • We will have lots of gates • Moore’s law continues • Nanotechnology • Contemporary architectures do not scale
Outline • Motivation • ASH: Application-Specific Hardware • The spatial model of computation • CASH: Compiling for ASH • Evolutionary path • Conclusions • Future work
ASH: Application-Specific Hardware [Figure: an HLL program goes through a Compiler to a Circuit on Reconfigurable hardware]
ASH: A Scalable Architecture (Thesis Statement) • Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture. • We can provide scalable compilers for translating high-level languages into hardware.
Example int f(void) { int i=0, j = 0; for (; i < 10; i++) j += i; return j; }
ASH and Nanotechnology • Build reconfigurable hardware using nanotechnology • Low power: 10^10 gates use less than 2 W • Low cost: nanocents/gate • High density: 10^5x over CMOS [Figure: nano-RAM cell; in yellow: a CMOS RAM cell; label: huge structures]
A Limit Study of Performance • A graph of the whole program execution [Figure legend: basic block, control-flow transfer, memory word, memory read, memory write]
Typical Program Graph (g721_e) [Figure labels: memcpy, memory reads, control-flow transfer, 100% code cluster, 100% memory cluster]
How Time Is Spent • No caches: reads expensive • No speculation
Lesson The spatial model of computation has different properties.
CASH: Compiling for ASH • Program to circuits • Memory partitioning • Interconnection net
Compilation • 1. Program: int reverse(int x) { int k, r = 0; for (k = 0; k < 32; k++) { r = r << 1; r |= x & 1; x = x >> 1; } return r; } • 2. Split-phase Abstract Machines: computations & local storage; unknown-latency ops • 3. Configurations placed independently • 4. Placement on chip [Side label: Reliability]
Split-phase Abstract Machines [Figure: a CFG partitioned into SAM 1, SAM 2, and SAM 3; side label: Power]
Hyperblock => SAM • Single-entry, multiple exit • May contain loops
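SAM => FSM [Figure: state machine with Start, Loop, and two Exit states, connected to Local memory and Remote memory]
The SAM FSM [Figure: Computation block of combinational logic with args in and results out through a Register; Predicates (control) drive the start and exit signals]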
Computation = Dataflow • Programs => circuits: x = a & 7; ... y = x >> 2; [Figure: a and 7 feed an & node producing x; x and 2 feed a >> node; wires labeled as signals] • Variables => wires + tokens • No token store; no token matching • Local communication only
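A minimal software sketch of the "variables => wires + tokens" idea, using the slide's fragment x = a & 7; y = x >> 2. This is an illustration only, not the CASH implementation: the names wire_t, wire_const, wire_empty, node_and, node_shr, and circuit are hypothetical, and the token is assumed to be a single valid bit that travels with the data.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a wire carries a value plus a "valid" bit acting as the token. */
typedef struct {
    int  data;
    bool valid;
} wire_t;

/* Hypothetical helper: a constant source whose token is always present. */
static wire_t wire_const(int v) {
    wire_t w = { v, true };
    return w;
}

/* Hypothetical helper: a wire whose token has not arrived yet. */
static wire_t wire_empty(void) {
    wire_t w = { 0, false };
    return w;
}

/* AND node: produces an output token only when both input tokens are present. */
static wire_t node_and(wire_t a, wire_t b) {
    wire_t out = { 0, false };
    if (a.valid && b.valid) {
        out.data  = a.data & b.data;
        out.valid = true;
    }
    return out;
}

/* Shift-right node: same firing rule as the AND node. */
static wire_t node_shr(wire_t a, wire_t b) {
    wire_t out = { 0, false };
    if (a.valid && b.valid) {
        assert(b.data >= 0 && b.data < 32); /* keep the shift amount well defined */
        out.data  = a.data >> b.data;
        out.valid = true;
    }
    return out;
}

/* The slide's fragment as a two-node circuit: x = a & 7; y = x >> 2; */
static wire_t circuit(wire_t a) {
    wire_t x = node_and(a, wire_const(7));
    wire_t y = node_shr(x, wire_const(2));
    return y;
}
```

Each node talks only to its direct producers and consumers, and nothing is looked up in a central token store, which is the property the bullets above describe. Calling circuit(wire_empty()) yields a wire with no token, since the & node never fires.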
Tokens & Synchronization • Tokens signal operation completion • Possible implementations: Local, Global, Static [Figure: three signaling schemes built from data, valid, ack, and reset lines]
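ILP and Eager Muxes • if (x > 0) y = -x; else y = b*x; • Speculation: both arms are computed • Static-Single Assignment implemented in hardware [Figure: computation nodes for -x and b*x (the multiply marked slow) and predicate nodes for x > 0 and its negation feed a mux that produces y]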
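One way to picture an eager mux in software, for the slide's statement if (x > 0) y = -x; else y = b*x. This is a sketch under assumptions, not the circuit the compiler emits: wire_t, pred_t, eager_mux2, and select_y are hypothetical names, and the parameter mul_done is an invented stand-in for "the slow multiplier has finished".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: data and predicates both carry a valid bit (their token). */
typedef struct { int  data;  bool valid; } wire_t;
typedef struct { bool value; bool valid; } pred_t;

/* Eager 2-input mux: it fires as soon as one input has a true predicate and
 * valid data, without waiting for the other (possibly slow) input. */
static wire_t eager_mux2(pred_t p0, wire_t v0, pred_t p1, wire_t v1) {
    wire_t out = { 0, false };
    if (p0.valid && p0.value && v0.valid) {
        out = v0;
    } else if (p1.valid && p1.value && v1.valid) {
        out = v1;
    }
    return out;
}

/* Both arms are computed speculatively; the predicates only select a result.
 * mul_done is a hypothetical flag saying whether b*x is available yet. */
static wire_t select_y(int x, int b, bool mul_done) {
    pred_t p     = { x > 0, true };
    pred_t not_p = { !(x > 0), true };
    wire_t neg   = { -x, true };                        /* fast arm */
    wire_t prod  = { mul_done ? b * x : 0, mul_done };  /* slow arm */
    return eager_mux2(p, neg, not_p, prod);
}
```

With x = 5 the mux can deliver -5 even while the multiplier is still busy; with x = -3 it has to wait until the product arrives. That asymmetry is the point of evaluating the mux eagerly.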
Predicates • Guard side-effects: memory access, procedure calls (e.g. *q = 2;) • Control looping: decide exit branch • Select variable definition (x=... on two paths, then ...=x)
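The first use of predicates, guarding side effects, can be sketched as a store that only takes effect when its predicate is true. The helper name store_if is hypothetical, and the example assumes the store *q = 2 from the slide sits on a conditionally executed path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Guarded store: values may be computed speculatively, but the memory write
 * itself happens only when the predicate is true. */
static void store_if(bool pred, int *addr, int value) {
    assert(addr != NULL);
    if (pred) {
        *addr = value;
    }
}
```

For example, store_if(x > 0, q, 2) leaves *q untouched when x is not positive, so speculation elsewhere in the circuit cannot leak into memory.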
Computing Predicates [Figure: control-flow graph fragment with nodes s, t, and b] • Correct for irreducible graphs • Correct even when speculatively computed • Can be eagerly computed
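Loops + Dataflow = Pipelining • for (i=0; i < 10; i++) a[i] += i; [Figure: dataflow circuit in which i starts at 0 and is incremented by 1, the address is formed from &a[0] and i, and load, +, and store nodes work on a[0] through a[3] at the same time]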
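A small cycle-by-cycle simulation can make the pipelining claim concrete for the loop for (i=0; i < 10; i++) a[i] += i. The model is an assumption made for illustration: three stages (load, add, store), each taking one cycle, with a latch between stages. The names slot_t and run_pipelined are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define N 10

/* One in-flight loop iteration held in a latch between two stages. */
typedef struct {
    bool valid;
    int  i;      /* induction variable for this iteration */
    int  value;  /* a[i] as loaded, then a[i] + i after the add stage */
} slot_t;

/* Runs a[i] += i for i in [0, N) on an assumed 3-stage pipeline and returns
 * the number of cycles used. Iterations touch distinct elements, so they
 * can overlap safely. */
static int run_pipelined(int a[N]) {
    slot_t load_out = { false, 0, 0 };  /* latch: load -> add  */
    slot_t add_out  = { false, 0, 0 };  /* latch: add  -> store */
    int next_i = 0, stored = 0, cycles = 0;

    while (stored < N) {
        /* Stages are evaluated back to front so that each one reads the
         * latch contents written in the previous cycle. */
        if (add_out.valid) {                 /* store stage */
            a[add_out.i] = add_out.value;
            stored++;
        }
        add_out = load_out;                  /* add stage */
        if (add_out.valid) {
            add_out.value += add_out.i;
        }
        if (next_i < N) {                    /* load stage */
            load_out.valid = true;
            load_out.i     = next_i;
            load_out.value = a[next_i];
            next_i++;
        } else {
            load_out.valid = false;
        }
        cycles++;
    }
    return cycles;
}
```

Under this model the ten iterations finish in N + 2 = 12 cycles, where running the three stages strictly one after another would take 3 * N = 30.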
Evolutionary Path [Figure: path from Microprocessors to ASH] • The problem with ASH: Resources
CPU+ASH [Figure: the CPU handles support computation + OS + VM, ASH handles the core computation; both connect to Memory]
Scalable Performance [Figure: performance against time; curves for ASH and CPU; mark at "now"]
Summary • Contemporary CPU architecture faces lots of problems • Application-Specific Hardware (ASH) provides a scalable technology • Compiling HLL into hardware dataflow machines is an effective solution
Timeline (marks at 06/01, 09/01, 12/01, 04/02, 06/02, 09/02, 12/02; "now" near the start) • CASH core • Explore architectural/compiler trade-offs • Hw/sw partitioning (ASH + CPU) • Loop parallelization • Memory partitioning • Write thesis • Cost models • ASH simulation
Extras • Related work • Reconfigurable hardware • Other cross-over phenomena • A CPU + ASH study • More about predicates
Related Work • Hardware synthesis from HLL • Reconfigurable hardware • Predicated execution • Dataflow machines • Speculative execution • Predicated SSA
Reconfigurable Hardware [Figure: an interconnection network with programmable switches linking universal gates and/or storage elements]
Main RH Ingredient: RAM Cell • Universal gate = RAM [Figure: a RAM holding 0 0 0 1, addressed by inputs a0 and a1, implements an & gate] • Switch controlled by a 1-bit RAM cell [Figure: data in, control cell holding 0]
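The "universal gate = RAM" idea can be sketched as a 2-input lookup table: four 1-bit cells addressed by the inputs, with the stored contents deciding which gate it is. The names lut2_t, lut2_eval, LUT_AND, and LUT_XOR are hypothetical, and the bit layout (cell k holds the output for address k = 2*a1 + a0) is an assumption chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* A 2-input LUT: the low four bits of cells are the four 1-bit RAM cells. */
typedef struct {
    uint8_t cells;
} lut2_t;

/* Evaluate the gate by using the inputs as the RAM address. */
static int lut2_eval(lut2_t lut, int a0, int a1) {
    assert((a0 == 0 || a0 == 1) && (a1 == 0 || a1 == 1));
    unsigned addr = ((unsigned)a1 << 1) | (unsigned)a0;
    return (lut.cells >> addr) & 1;
}

/* Cell contents 0,0,0,1 for addresses 0..3 give an AND gate. */
static const lut2_t LUT_AND = { 0x8 };
/* Reprogramming the same cells to 0,1,1,0 gives an XOR gate instead. */
static const lut2_t LUT_XOR = { 0x6 };
```

Changing the stored bits changes the gate without touching the wiring, which is what makes the substrate reconfigurable.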