
Alviso

Alviso. Rick McGeer (HP) Erik Rubow (Ericsson) Stephen Lonergan (U Vic) Amin Vahdat (UCSD). Outline. Motivation A Quick Tour of Alviso The Problem of Unrestricted Communication in Parallel Systems Lessons from Hardware: Restricted Communication between modules


Presentation Transcript


  1. Alviso Rick McGeer (HP) Erik Rubow (Ericsson) Stephen Lonergan (U Vic) Amin Vahdat (UCSD)

  2. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  3. Alviso Motivation • Make NetFPGA programming accessible to network designers • NetFPGA: FPGA-based 4-port switch board • Key to building high-speed software-defined networks • Typical NetFPGA designer knows software routers, not VLSI design

  4. NetFPGA • Basic building block is a Xilinx FPGA (Virtex-II) • Programming tools are Verilog (simulator-based HDL) and Synopsys/Cadence/Xilinx synthesis tools for FPGA • Problems • Verilog is a very low-level design tool • Many details of hardware design must be mastered by the designer • No high-level network-based design environment • Why is this interesting? • Software on general-purpose processors still can’t keep up with modern switching equipment • Modern high-performance software routers require very substantial hardware (typically, GPGPUs)

  5. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  6. Alviso • C-like language whose modules can be easily realized as either hardware or software • Some restrictions • No memory allocation • No functions or recursion • Built-in parallelism • No forks

  7. Alviso Elements • Module: Basic unit of design • Roughly equivalent to an Object in software design • Container of processes (see below) and variables • No shared variables across module boundaries! • All communications into/out of modules through “ports” (similar to software parameters) • Exactly equivalent to hardware ports

  8. Alviso Elements • Process: Basic element of computation • Roughly equivalent to a thread in software • Equivalent to a block of logic in hardware • Begins immediately on load • Runs to completion

  9. Alviso Elements • Port: Variable explicitly written or read by a process in a module • Sole means of communication into/out of a module • Equivalent to a hardware port • Always latched (see below) • Roughly equivalent (in software) to a public object variable with a get/set method (read = get, write = set)

  10. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  11. The Problem of Parallel Design • The central assumption of design: the finite state model of computation • Every variable is a little FSM • Quiescent unless explicitly perturbed by an instruction • Sequential example: x=2; x=x+1; gives x = 3…right? • But parallel design breaks this model for shared variables • Parallel example: proc thread1() { x=2; x=x+1; } proc thread2() { x=x*100; } • Value of x is indeterminate
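
To make the indeterminacy on slide 11 concrete, the sketch below enumerates every interleaving of the two threads and collects the possible final values of x. It is written in Python rather than Alviso, and it assumes each statement executes atomically and that x starts at 0; neither assumption comes from the slides, and the helper names are made up for illustration.

```python
from itertools import combinations

# Each step maps the current value of x to its new value.
thread1 = [lambda x: 2, lambda x: x + 1]   # x = 2; x = x + 1;
thread2 = [lambda x: x * 100]              # x = x * 100;

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def final_values(x0):
    """Return the set of final values of x over all interleavings."""
    results = set()
    for schedule in interleavings(thread1, thread2):
        x = x0
        for step in schedule:
            x = step(x)
        results.add(x)
    return results
```

Under these assumptions the three possible schedules give three different answers, so the result depends entirely on the scheduler.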

  12. All the Problems in Parallel Design Break Down into solving this • How do we recover a semantically-consistent deterministic model of design with efficient communications? • A key to efficient multicore programming, hardware/software codesign,…. • There are other problems, but without solving this one they are all built on a house of sand…

  13. Historical Answer: Restrict Communications • Problem is fundamentally one of communication • Unrestricted asynchronous communication breaks design model • Solution 1: No shared variables between threads • Inefficient: effectively, every thread is in its own address space • Solution 2: Locks and semaphores: restrict ability of other threads to play with state during computation • Deadlock! • Locks themselves become a nondeterministic, asynchronous communication channel….

  14. Requirements of our Solution • Semantics independent of external systems (e.g., a thread scheduler) • Efficient communication between threads • Designs fully implementable in either hardware or software • Module behavior identical independent of hardware or software realization – semantics independent of implementation • A caveat: mixed hardware/software systems will vary in behavior, depending on mix of hardware/software components • Hardware components are much faster than software components

  15. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Synchronous Languages • A Practical Realization • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  16. Lessons From Hardware • Hardware Design is… • Highly parallel • Efficient • Deterministic • Independent of mysteries such as thread scheduling…. • How did those guys do that? • And, more to the point, how can we?

  17. Classic Hardware Design • Banks of acyclic “combinational” logic, separated by clocked latches • Data flows unidirectionally in logic, latches update at clock edge [Figure: blocks of logic separated by latches]

  18. Means… • Acyclic logic: logic banks compute in fixed time – length of longest path through the circuit • Latches update only on clock edges: value of logic inputs stable during computation • Computation divided into “cycles” of fixed length: no communication between logic blocks during computation
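
The "fixed time" on slide 18 is the delay of the longest path through the acyclic logic. A minimal Python sketch of that calculation follows; the gate names, the delay values, and the function name are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

def critical_path_delay(inputs_of, delay):
    """Longest input-to-output delay of an acyclic logic bank.

    inputs_of maps each gate to the gates that feed it;
    delay maps each gate to its own propagation delay.
    Raises graphlib.CycleError if the logic contains a loop.
    """
    arrival = {}
    # static_order() yields every gate after all of the gates feeding it.
    for gate in TopologicalSorter(inputs_of).static_order():
        latest_input = max((arrival[p] for p in inputs_of.get(gate, ())), default=0)
        arrival[gate] = latest_input + delay[gate]
    return max(arrival.values())

# Hypothetical four-gate circuit.
gates = {"and1": [], "or1": [], "xor1": ["and1", "or1"], "out": ["xor1"]}
delays = {"and1": 2, "or1": 3, "xor1": 1, "out": 1}
```

Because the graph must be acyclic for the sort to succeed, the same check that yields the cycle time also rejects combinational loops.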

  19. Mapping Alviso to Hardware [Figure: the latch/logic diagram from slide 17, annotated with the labels Ports, Process(es), and Variables]

  20. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  21. Adapting Hardware to Languages • Shared Variables == Latches • Logic Blocks == Threads • Threads run for fixed block of time, then “wait” for next cycle of computation • Shared variables only update when all threads are waiting • No interrupts, no locks, no semaphores….

  22. Alviso • Synchronous/Reactive Language • Computation in “zero” time, communication takes time “one” • Means: no communication while computing • Follows: Esterel, Lustre, ReactiveC, Signal, V++, SMV • C-like syntax • Major new innovation: “wait” statement • “wait”: halt computation and wait for variables to update • Each thread must execute a wait statement within a fixed period of time • Means: each cyclic computation graph (aka, loop) must contain a wait statement

  23. A quick example • proc thread1 { x = 2; while(true) { x++; wait; } } • proc thread2 { wait; while(true) { x <<= 1; wait; } } • Slide annotations: x = 3, x = 4 • But what about after the wait?

  24. Answer: Deterministic Priority • What happens with conflicts on shared variable updates? • No effect on computation: updates only visible after wait • But x can only have one value…which should we choose? • Answer: priority. Processes have deterministic priority (total order on processes). In the event of conflict, higher-priority process wins
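
The wait and priority rules on slides 21 to 24 can be simulated in a few lines of Python. This is a sketch under stated assumptions, not the Alviso implementation: each process works on a private copy of the shared variables during a cycle, a `yield` stands in for `wait`, writes are committed only once every process is waiting, and the initial value of x and the priority order passed to `run` are chosen arbitrarily. `View` and `run` are hypothetical names.

```python
class View(dict):
    """Private per-cycle copy of the shared variables that records writes."""

    def __init__(self):
        super().__init__()
        self.written = set()

    def __setitem__(self, key, value):
        self.written.add(key)
        super().__setitem__(key, value)

    def begin_cycle(self, shared):
        self.clear()
        self.update(shared)
        self.written.clear()   # loading the snapshot does not count as a write


def thread1(v):
    v["x"] = 2
    while True:
        v["x"] += 1
        yield                  # wait


def thread2(v):
    yield                      # wait
    while True:
        v["x"] <<= 1
        yield                  # wait


def run(procs, cycles, init):
    """procs is listed from highest to lowest priority."""
    shared = dict(init)
    views = [View() for _ in procs]
    gens = [proc(view) for proc, view in zip(procs, views)]
    history = []
    for _ in range(cycles):
        for view in views:
            view.begin_cycle(shared)
        for gen in gens:
            next(gen)                    # run until the next wait
        for view in reversed(views):     # lowest priority first, so higher overwrites
            for key in view.written:
                shared[key] = view[key]
        history.append(dict(shared))
    return history
```

With thread1 given the higher priority and x starting at 0, x is 3, 4, 5 after the first three cycles; with the priorities swapped it is 3, 6, 12. Either way the outcome is fixed by the program, not by a scheduler.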

  25. Alviso Computational Graph • wait statements lead to a computation graph that is a forest of DAGs • Roots of the DAGs: initial statements of processes and statements immediately following wait statements • Leaves: final statements of processes and wait statements • Computation terminates at a leaf on each cycle • Starts on the next cycle at the subsequent root • Computation in cycle is traversal of the DAG from root to leaf

  26. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  27. Interprocess Zero-Delay Signaling • Sometimes, you just have to break the rules • Occasionally, processes need to signal each other in the same cycle • To gain exclusive access to a shared variable, for example • Multi-cycle locking too inefficient • Almost every S/R language eventually incorporates some form of zero-delay interprocess signaling • Exceptions: V++, ReactiveC • Almost always makes hash of the semantics • Question: How can we do interprocess zero-delay signaling without making a mess?

  28. Answer: Go Back to Hardware • Zero-delay signaling is OK: what makes a mess is zero-delay loops • Hardware: run zero-delay wires in only one direction • Software: impose a priority order on processes • High-priority processes execute “first” • Higher-priority processes can signal lower-priority processes (but not vice-versa) • Concrete realization: Mutex

  29. Mutex • Single-bit shared variable • Two states: “locked” and “unlocked” • Usage: mutex foo; if (foo.lock()) { …execute guarded code… } • lock() operation • Only succeeds (returns 1) if mutex is unlocked • Prevents any subsequent lock on foo from succeeding until unlock() is executed • unlock() releases lock at the beginning of next cycle • So, e.g., if (foo.lock()) foo.unlock() holds lock for this cycle

  30. Implementing Mutex Safely • Hardware: no issue • Arrange blocks of logic corresponding to processes in priority order • Mutex signals flow from high-priority to low-priority process • Arbitration on variable write works the same way • Software: same idea • Run processes in priority order • High-priority processes run before lower-priority processes • Mutex locks in high-priority process automatically visible to lower-priority process • But price is very high: conceptually, we have serialized a parallel computation
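
The software scheme on slides 29 and 30 can be sketched in Python as follows. The `Mutex` class mirrors the lock()/unlock() behavior described on the slides, with unlock() taking effect at the start of the next cycle; the `begin_cycle` hook, the `run_cycles` driver, and the process bodies are hypothetical scaffolding for the illustration.

```python
class Mutex:
    """Single-bit lock whose release takes effect at the start of the next cycle."""

    def __init__(self):
        self.locked = False
        self.release_pending = False

    def lock(self):
        if self.locked:
            return False           # already held: the lock attempt fails
        self.locked = True
        return True

    def unlock(self):
        self.release_pending = True

    def begin_cycle(self):
        if self.release_pending:
            self.locked = False
            self.release_pending = False


def grab_and_release(foo):
    # Mirrors: if (foo.lock()) foo.unlock();  (holds the lock for this cycle only)
    if foo.lock():
        foo.unlock()
        return True
    return False


def run_cycles(processes, cycles):
    """processes: (name, body) pairs listed from highest to lowest priority."""
    foo = Mutex()
    winners = []
    for _ in range(cycles):
        foo.begin_cycle()
        # Running bodies in priority order makes a high-priority lock
        # visible to every lower-priority process in the same cycle.
        winners.append([name for name, body in processes if body(foo)])
    return winners
```

When two processes both run `grab_and_release`, the higher-priority one wins the mutex in every cycle, which shows both the determinism and the serialization cost noted on the slide.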

  31. Recovering Parallelism With Mutexes • Recall: Each process defines a forest of DAGs • Call each such DAG a fiber • Each Mutex defines a partial order among fibers • F_A > F_B iff • Fiber A and Fiber B both lock Mutex F • A is higher priority than B • At every cycle, exactly one fiber per process will run • For this cycle, choose any schedule consistent with partial orders on runnable fibers • Optimization: locked mutexes don’t affect schedule (all lock operations in cycle will fail, and only successful locks introduce dependency) • Therefore: disregard partial orders imposed by locked mutexes
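
One way to picture the scheduling rule on slide 31 is to build the partial order for a cycle and peel it off in batches of fibers that may run in parallel. The Python sketch below does this with the standard-library topological sorter; the input representation (a priority-ordered list of fibers with the mutexes each one locks) and the function name are assumptions made for the example, not part of Alviso.

```python
from graphlib import TopologicalSorter
from itertools import combinations

def parallel_batches(fibers, locked):
    """Group this cycle's runnable fibers into batches that can run in parallel.

    fibers: (name, set_of_mutex_names) pairs, highest priority first.
    locked: mutexes already locked at the start of the cycle; they impose
            no ordering, because every lock attempt on them will fail.
    """
    must_follow = {name: set() for name, _ in fibers}
    # combinations() keeps list order, so the first fiber of each pair
    # is the higher-priority one.
    for (high, high_mutexes), (low, low_mutexes) in combinations(fibers, 2):
        if (high_mutexes & low_mutexes) - locked:
            must_follow[low].add(high)
    sorter = TopologicalSorter(must_follow)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

# Hypothetical cycle: A and B contend for mutex f, C uses g, D uses none.
example = [("A", {"f"}), ("B", {"f"}), ("C", {"g"}), ("D", set())]
```

For `example` with no mutex locked, A, C, and D can run together and B must follow A. If f is already locked at the start of the cycle, all four fibers fall into a single batch, which is the optimization described in the last two bullets.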

  32. Outline • Motivation • A Quick Tour of Alviso • The Problem of Unrestricted Communication in Parallel Systems • Lessons from Hardware: Restricted Communication between modules • Alviso: A Synchronous Language • Restricted combinational communication • Motivation • The Mutex statement • Strict priority on processes • Recovering maximal parallelism • Status and Conclusions

  33. Alviso Status And Conclusion • Hardware synthesis chain written and tested on a few sample designs • Need for zero-delay intermodule communication noted • Arbitration on memory interface • Software interpreter written and tested • XML intermediate form under development • Planned first release April 2011 • Is it perfect? Far from it… • Need users to help us figure out how to make it better • Contact: erik.rubow@ericsson.com, rick.mcgeer@hp.com
