Alviso • Rick McGeer (HP), Erik Rubow (Ericsson), Stephen Lonergan (U Vic), Amin Vahdat (UCSD)
Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions
Alviso Motivation • Make NetFPGA programming accessible to network designers • NetFPGA: FPGA-based 4-port switch board • Key to building high-speed software-defined networks • Typical NetFPGA designer knows software routers, not VLSI design
NetFPGA • Basic building block is a Xilinx FPGA (Virtex-II) • Programming tools are Verilog (simulator-based HDL), Synopsys/Cadence/Xilinx synthesis tools for FPGA • Problems • Verilog is a very low-level design tool • Many details of hardware design must be mastered by the designer • No high-level network-based design environment • Why is this interesting? • Software on GP processors still can’t keep up with modern switching equipment • Modern high-performance software routers require very substantial hardware (typically, GPGPUs)
Alviso • C-like language whose modules can be easily realized as either hardware or software • Some restrictions • No memory allocation • No functions or recursion • Built-in parallelism • No forks
Alviso Elements • Module: Basic unit of design • Roughly equivalent to an Object in software design • Container of processes (see below) and variables • No shared variables across module boundaries! • All communications into/out of modules through “ports” (similar to software parameters) • Exactly equivalent to hardware ports
Alviso Elements • Process: Basic element of computation • Roughly equivalent to a thread in software • Equivalent to a block of logic in hardware • Begins immediately on load • Runs to completion
Alviso Elements • Port: Variable explicitly written or read by a process in a module • Sole means of communication into/out of a module • Equivalent to a hardware port • Always latched (see below) • Roughly equivalent (in software) to a public object variable with a get/set method (read = get, write = set)
The Problem of Parallel Design • The central assumption of design: the finite state model of computation • Every variable is a little FSM • Quiescent unless explicitly perturbed by an instruction • Example: x=2; x=x+1; so x = 3…right? • But parallel design breaks this model for shared variables • proc thread1() { x=2; x=x+1; } proc thread2() { x=x*100; } • Value of x is indeterminate
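The indeterminacy can be made concrete by enumerating every interleaving of the two threads. This is only a sketch in Python, not Alviso: it assumes each statement executes atomically and that x starts at 0, neither of which the slide states.

```python
from itertools import combinations

# Each statement is modelled as an atomic function on x (an assumption;
# a real machine may interleave at a finer grain).
THREAD1 = [lambda x: 2, lambda x: x + 1]   # x=2; x=x+1;
THREAD2 = [lambda x: x * 100]              # x=x*100;

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def final_values(initial_x=0):
    """Collect the final value of x over all possible schedules."""
    results = set()
    for schedule in interleavings(THREAD1, THREAD2):
        x = initial_x
        for step in schedule:
            x = step(x)
        results.add(x)
    return results

print(sorted(final_values()))  # [3, 201, 300]
```

Three schedules give three different answers, so the result depends entirely on the scheduler.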
All the Problems in Parallel Design Break Down into Solving This • How do we recover a semantically consistent, deterministic model of design with efficient communications? • A key to efficient multicore programming, hardware/software codesign,…. • There are other problems, but without solving this one they are all built on sand…
Historical Answer: Restrict Communications • Problem is fundamentally one of communication • Unrestricted asynchronous communication breaks design model • Solution 1: No shared variables between threads • Inefficient: effectively, every thread is in its own address space • Solution 2: Locks and semaphores: restrict ability of other threads to play with state during computation • Deadlock! • Locks themselves become a nondeterministic, asynchronous communication channel….
Requirements of our Solution • Semantics independent of external systems (e.g., a thread scheduler) • Efficient communication between threads • Designs fully implementable in either hardware or software • Module behavior identical independent of hardware or software realization – semantics independent of implementation • A caveat: mixed hardware/software systems will vary in behavior, depending on mix of hardware/software components • Hardware components are much faster than software components
Lessons From Hardware • Hardware Design is… • Highly parallel • Efficient • Deterministic • Independent of mysteries such as thread scheduling…. • How did those guys do that? • And, more to the point, how can we?
Classic Hardware Design • Banks of acyclic “combinational” logic, separated by clocked latches • Data flows unidirectionally in logic, latches update at clock edge • [Figure: banks of logic separated by latches]
Means… • Acyclic logic: logic banks compute in fixed time – length of longest path through the circuit • Latches update only on clock edges: value of logic inputs stable during computation • Computation divided into “cycles” of fixed length: no communication between logic blocks during computation
Mapping Alviso to Hardware • [Figure: latch and logic diagram labeled with Ports, Variables, and Process(es)]
Adapting Hardware to Languages • Shared Variables == Latches • Logic Blocks == Threads • Threads run for fixed block of time, then “wait” for next cycle of computation • Shared variables only update when all threads are waiting • No interrupts, no locks, no semaphores….
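A rough Python sketch of the latch discipline described above: every thread reads the same latched snapshot, and shared variables change only after all threads have finished the cycle. The names `run_cycle`, `copy_b_to_a`, and `copy_a_to_b` are illustrative, not part of Alviso, and conflicting writes to the same variable are not handled here.

```python
def run_cycle(latched, threads):
    """Run one cycle. Each thread reads only the latched snapshot and returns
    a dict of writes; writes are committed after every thread has finished."""
    pending = {}
    for thread in threads:
        pending.update(thread(latched))
    next_state = dict(latched)
    next_state.update(pending)
    return next_state

def copy_b_to_a(state):
    return {"a": state["b"]}

def copy_a_to_b(state):
    return {"b": state["a"]}

state = {"a": 1, "b": 2}
forward = run_cycle(state, [copy_b_to_a, copy_a_to_b])
backward = run_cycle(state, [copy_a_to_b, copy_b_to_a])
print(forward, backward)
```

Both execution orders swap a and b, because neither thread can observe the other's write during the cycle. With immediate updates, the result would depend on which thread ran first.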
Alviso • Synchronous/Reactive Language • Computation in “zero” time, communication takes time “one” • Means: no communication while computing • Follows: Esterel, Lustre, ReactiveC, Signal, V++, SMV • C-like syntax • Major new innovation: “wait” statement • “wait”: halt computation and wait for variables to update • Each thread must execute a wait statement within a fixed period of time • Means: each cyclic computation graph (aka, loop) must contain a wait statement
A quick example • proc thread1 { x = 2; while(true) { x++; wait; } } • proc thread2 { wait; while(true) { x <<= 1; wait; } } • [Annotations on the code: x = 3, x = 4] • But what about after the wait?
Answer: Deterministic Priority • What happens with conflicts on shared variable updates? • No effect on computation: updates only visible after wait • But x can only have one value…which should we choose? • Answer: priority. Processes have deterministic priority (total order on processes). In the event of conflict, higher-priority process wins
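The priority rule can be tried out on the two-process example with a small Python simulation. This is a sketch under stated assumptions rather than the Alviso implementation: `wait` is modelled as a generator `yield`, a process sees its own writes within a cycle, and the `View` and `simulate` names are invented here. Which of the two processes has the higher priority is not given in the slides, so both orders are run.

```python
class View:
    """One process's view of shared state during a cycle: the latched values
    plus its own pending writes, but nobody else's."""
    def __init__(self):
        self.latched, self.pending = {}, {}
    def begin_cycle(self, latched):
        self.latched, self.pending = latched, {}
    def read(self, name):
        return self.pending.get(name, self.latched[name])
    def write(self, name, value):
        self.pending[name] = value

def thread1(v):
    v.write("x", 2)                       # x = 2;
    while True:
        v.write("x", v.read("x") + 1)     # x++;
        yield                             # wait;

def thread2(v):
    yield                                 # wait;
    while True:
        v.write("x", v.read("x") << 1)    # x <<= 1;
        yield                             # wait;

def simulate(procs_by_priority, cycles, initial):
    """procs_by_priority lists generator functions, highest priority first.
    Returns the latched value of x after each cycle."""
    latched = dict(initial)
    views = [View() for _ in procs_by_priority]
    running = [p(v) for p, v in zip(procs_by_priority, views)]
    history = []
    for _ in range(cycles):
        for view, proc in zip(views, running):
            view.begin_cycle(latched)
            next(proc, None)              # run until the next wait
        next_state = dict(latched)
        # Commit lowest priority first so higher-priority writes win.
        for view in reversed(views):
            next_state.update(view.pending)
        latched = next_state
        history.append(latched["x"])
    return history

print(simulate([thread1, thread2], 3, {"x": 0}))  # [3, 4, 5]
print(simulate([thread2, thread1], 3, {"x": 0}))  # [3, 6, 12]
```

Either order is fully deterministic; only the choice of winner differs.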
Alviso Computational Graph • wait statements lead to a computation graph that is a forest of DAGs • Roots of the DAGs: initial statements of processes and statements immediately following wait statements • Leaves: final statements of processes and wait statements • Computation terminates at a leaf on each cycle • Starts on the next cycle at the subsequent root • Computation in cycle is traversal of the DAG from root to leaf
Interprocess Zero-Delay Signaling • Sometimes, you just have to break the rules • Occasionally, processes need to signal each other in the same cycle • To gain exclusive access to a shared variable, for example • Multi-cycle locking is too inefficient • Almost every S/R language eventually incorporates some form of zero-delay interprocess signaling • Exceptions: V++, ReactiveC • Almost always makes a hash of the semantics • Question: How can we do interprocess zero-delay signaling without making a mess?
Answer: Go Back to Hardware • Zero-delay signaling is OK: what makes a mess is zero-delay loops • Hardware: run zero-delay wires in only one direction • Software: impose a priority order on processes • High-priority processes execute “first” • Higher-priority processes can signal lower-priority processes (but not vice versa) • Concrete realization: Mutex
Mutex • Single-bit shared variable • Two states: “locked” and “unlocked” • Usage: mutex foo; if (foo.lock()) { …execute guarded code… } • lock() operation • Only succeeds (returns 1) if mutex is unlocked • Prevents any subsequent lock on foo from succeeding until unlock() is executed • unlock() releases lock at the beginning of next cycle • So, e.g., if (foo.lock()) foo.unlock() holds lock for this cycle
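The lock/unlock timing can be modelled in a few lines of Python. This is a sketch of the semantics as described on the slide, not Alviso's implementation: the `begin_cycle` hook, the `run` driver, and the process names are assumptions, and processes are simply run in priority order within each cycle.

```python
class Mutex:
    """Single-bit mutex whose unlock() takes effect at the next cycle start."""
    def __init__(self):
        self.locked = False
        self.release_pending = False

    def lock(self):
        """Succeed only if the mutex is currently unlocked."""
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        """Request release; the lock stays held for the rest of this cycle."""
        self.release_pending = True

    def begin_cycle(self):
        """Hypothetical hook called by the scheduler at each cycle boundary."""
        if self.release_pending:
            self.locked = False
            self.release_pending = False

def run(cycles, process_names):
    """Each process does `if (foo.lock()) foo.unlock()` once per cycle.
    process_names is ordered highest priority first."""
    foo = Mutex()
    winners = []
    for _ in range(cycles):
        foo.begin_cycle()
        for name in process_names:
            if foo.lock():
                winners.append(name)
                foo.unlock()
    return winners

print(run(2, ["high", "low"]))  # ['high', 'high']
```

Because the release is deferred, the lower-priority process cannot acquire the lock in the same cycle, even though the higher-priority one has already called unlock().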
Implementing Mutex Safely • Hardware: no issue • Arrange blocks of logic corresponding to processes in priority order • Mutex signals flow from high-priority to low-priority process • Arbitration on variable write works the same way • Software: same idea • Run processes in priority order • High-priority processes run before lower-priority processes • Mutex locks in high-priority process automatically visible to lower-priority process • But price is very high: conceptually, we have serialized a parallel computation
Recovering Parallelism With Mutexes • Recall: Each process defines a forest of DAGs • Call each such DAG a fiber • Each Mutex defines a partial order among fibers • F_A > F_B iff • Fiber A and Fiber B both lock Mutex F • A is higher priority than B • At every cycle, exactly one fiber per process will run • For this cycle, choose any schedule consistent with partial orders on runnable fibers • Optimization: locked mutexes don’t affect schedule (all lock operations in cycle will fail, and only successful locks introduce dependency) • Therefore: disregard partial orders imposed by locked mutexes
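One way to pick a schedule consistent with these partial orders is a topological sort over the runnable fibers. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The input format, the `parallel_schedule` name, and the batching into parallel groups are illustrative assumptions, not the Alviso scheduler.

```python
from graphlib import TopologicalSorter

def parallel_schedule(fibers, locked_at_start):
    """fibers maps a fiber name to (priority, set of mutexes it locks);
    a lower number means higher priority. locked_at_start is the set of
    mutexes already locked when the cycle begins, which are disregarded.
    Returns batches of fibers; fibers in one batch may run in parallel."""
    sorter = TopologicalSorter()
    for name in fibers:
        sorter.add(name)
    users_by_mutex = {}
    for name, (priority, mutexes) in fibers.items():
        for mutex in mutexes - locked_at_start:
            users_by_mutex.setdefault(mutex, []).append((priority, name))
    for users in users_by_mutex.values():
        users.sort()
        # Chain fibers in priority order; transitivity covers the rest.
        for (_, higher), (_, lower) in zip(users, users[1:]):
            sorter.add(lower, higher)     # lower must run after higher
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

fibers = {
    "A": (0, {"F"}),
    "B": (1, {"F"}),
    "C": (2, {"G"}),
    "D": (3, set()),
}
print(parallel_schedule(fibers, set()))   # [['A', 'C', 'D'], ['B']]
print(parallel_schedule(fibers, {"F"}))   # [['A', 'B', 'C', 'D']]
```

In the first call only B has to wait for A, since both lock F. In the second call F is already locked, so its ordering is disregarded and all four fibers can run together.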
Alviso Status And Conclusion • Hardware synthesis chain written and tested on a few sample designs • Need for zero-delay intermodule communication noted • Arbitration on memory interface • Software interpreter written and tested • XML intermediate form under development • Planned first release April 2011 • Is it perfect? Far from it… • Need users to help us figure out how to make it better • Contact: erik.rubow@ericsson.com, rick.mcgeer@hp.com