Industrial Semantics, Or How to Stop the Maths Getting in the Way of the Marketing • Joe Stoy, Founder and Principal Engineer, Bluespec, Inc. (with help from many at Bluespec) • APPSEM05 Workshop, 15 September 2005
Basic Message • An industrial tool needs good semantics • Robust • Simple • Conforming to users’ model • But the theory must be “under the covers” • Learning curve • Perceived learning curve
Outline • A technology based on Term-Rewriting Systems • A superstructure based on functional programming semantics • Semantic issues
Tool and market • For designing chips (ASICs, FPGAs, ...) • currently low-level with Verilog or VHDL • chip complexity rising (millions of gates) • For chip designers, verification engineers, system architects • ASICs have huge NREs ($500K–$1M) • mistakes (respins) cost another NRE • tools run into millions of $$$ per team, form a significant fraction of a company’s budget (e.g., ~10%) • tools tend to run on UNIX (Solaris, Linux)
History of this technology • Research@MIT on high-level synthesis & verification (Prof. Arvind et al.) • Technology productization within industry; major pilot project: arbiter for 160 Gb/s router (1.5M gates, 200 MHz, 1.3m) • Bluespec, Inc.: VC funding, high-level synth. tool, product available [Timeline figure with markers ~1996, 2000, 2003, 2004; "Technology" arrows lead from one stage to the next]
Design Flow [Figure: Bluespec SystemVerilog source (transaction level; DAL, Design Assertion Level) goes through Bluespec synthesis to Verilog RTL, then RTL synthesis to a netlist; Bluesim gives cycle-accurate simulation, and event-based simulation is also shown]
Cosim: Typical Use Models • Bluespec integrated into SystemC/Verilog, where Bluespec is: • …integrated into Verilog System-on-Chip (SoC) design • …back-annotated into SystemC model • …part of mixed SystemC/Bluespec model • SystemC/Verilog integrated into Bluespec, where Bluespec is designed with: • …existing Verilog IP re-used • …a SystemC model awaiting the Verilog
A technology based on term-rewriting systems
Term-rewriting systems • while some rule is applicable: • choose an applicable rule • apply it to the term (or a subcomponent) • FP’s standard operational semantics • Our rewritings are less free-wheeling (don’t change structure of term) • maybe “state transition system” is a better name
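The loop on this slide can be sketched in a few lines of Python. This is only an illustrative model, not Bluespec code: the `run` helper, the dictionary-shaped state and the two GCD rules are assumptions made up for the example and do not come from the slides. The state keeps the same keys throughout, which mirrors the remark that the rewritings do not change the structure of the term.

```python
import random

def run(state, rules, rng=None, max_steps=1000):
    """Apply one applicable rule at a time until no rule applies (or the step budget runs out)."""
    rng = rng or random.Random(0)
    for _ in range(max_steps):
        applicable = [(guard, action) for (guard, action) in rules if guard(state)]
        if not applicable:
            break
        _, action = rng.choice(applicable)   # "choose an applicable rule"
        state = action(state)                # "apply it": the whole update happens at once
    return state

# Hypothetical example: Euclid's GCD as two guarded rules over registers a and b.
gcd_rules = [
    # swap: enabled when a > b and b is non-zero
    (lambda s: s["a"] > s["b"] and s["b"] != 0,
     lambda s: {"a": s["b"], "b": s["a"]}),
    # subtract: enabled when a <= b and b is non-zero
    (lambda s: s["a"] <= s["b"] and s["b"] != 0,
     lambda s: {"a": s["a"], "b": s["b"] - s["a"]}),
]

final = run({"a": 15, "b": 6}, gcd_rules)
```

When no guard holds any more (here, when `b` reaches 0), the loop stops and register `a` holds the result.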
Clocked synchronous hardware • The compiler translates BSV source code into Verilog RTL [Figure: a collection of state elements holding S; transition logic takes the inputs I and the current state S and produces the outputs O and the “next” S]
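Scheduling and control logic [Figure: from the modules’ current state, each rule 1…n computes a condition p1…pn (“CAN_FIRE”) and an action value d1…dn; the scheduler turns p1…pn into f1…fn (“WILL_FIRE”), which control the muxing of d1…dn into the modules’ next state]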
Term-rewriting systems while some rule is applicable • choose an applicable rule • apply it to the term (or a subcomponent) • FP’s standard operational semantics • Our rewritings are less free-wheeling (don’t change structure of term) • Rules are constructed from guarded atomic actions with interfaces
Guarded Atomic Actions • Actions are guarded fifo.enq(x); • if fifo full, cannot happen • hides lots of tedious bureaucracy • Actions are atomic
Atomicity • atomic • ατομος • a-tomic • a- = not: asymmetric, atypical, amoral • -tomic = cut: microtome, tomography, tome (of a multi-volume book)
Atomicity • Rules are atomic • “Not cut” • Whenever they run, they run to completion • never interrupted • No other activities are interleaved with them • This greatly simplifies design • avoids many race conditions
Guarded Atomic Actions • Actions can be composed a1;a2 • resulting action atomic • guarded by guards of a1 and a2
Guarded Atomic Actions • Conditionals: if (b) a1; • Guarded by “b implies (a1’s guards)” • Another conditional: when (b) a1; • guarded by b (and a1’s guards) • Yet another perhaps-if (b) a1; • unguarded • a1’s guards conjoined to b • Nice algebra • (separate question — which to have in BSV)
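The guard algebra on the last two slides can be written out as a small Python model. It is a sketch under stated assumptions rather than BSV semantics: the `Action` record, the combinator names `both`, `if_`, `when`, `perhaps_if`, and the one-place `enq` example are all hypothetical, and all updates are computed from the state before the action fires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

State = Dict[str, int]

@dataclass(frozen=True)
class Action:
    guard: Callable[[State], bool]      # may the action happen at all?
    updates: Callable[[State], State]   # register writes, computed from the old state

def both(a1: Action, a2: Action) -> Action:
    """a1;a2: atomic, guarded by the guards of a1 and a2 (no double-write check here)."""
    return Action(lambda s: a1.guard(s) and a2.guard(s),
                  lambda s: {**a1.updates(s), **a2.updates(s)})

def if_(b, a: Action) -> Action:
    """if (b) a: guarded by 'b implies a's guard'."""
    return Action(lambda s: (not b(s)) or a.guard(s),
                  lambda s: a.updates(s) if b(s) else {})

def when(b, a: Action) -> Action:
    """when (b) a: guarded by b and a's guard."""
    return Action(lambda s: b(s) and a.guard(s), a.updates)

def perhaps_if(b, a: Action) -> Action:
    """perhaps-if (b) a: always enabled; a's guard is conjoined to the condition b."""
    return Action(lambda s: True,
                  lambda s: a.updates(s) if b(s) and a.guard(s) else {})

def fire(a: Action, s: State) -> Optional[State]:
    """Return the next state, or None when the guard blocks the action."""
    return {**s, **a.updates(s)} if a.guard(s) else None

# Hypothetical one-place FIFO: enq is guarded by "not full".
enq = Action(lambda s: s["full"] == 0, lambda s: {"full": 1})
want = lambda s: s["req"] == 1
```

With a full FIFO and a pending request, `if_(want, enq)` and `when(want, enq)` are both blocked, while `perhaps_if(want, enq)` fires and changes nothing. With a full FIFO and no request, only `when` is blocked.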
… with interfaces • A BSV design is structured by modules • Modules communicate only through interfaces
Modules and interfaces [Figure: a module containing state and rules, reached from outside through its interface]
An example

   module mkTest ();
      int n = 15; // constant
      Reg#(int) state <- mkReg(0);
      NumFn f <- mkFact2();

      rule go (state == 0);
         f.start (n);
         state <= 1;
      endrule

      rule finish (state == 1);
         $display ("Result is %d", f.result());
         state <= 2;
      endrule
   endmodule: mkTest

   interface NumFn;
      method Action start (int n);
      method int result ();
   endinterface

   module mkFact2 (NumFn);
      Reg#(int) x <- mkReg(?);
      Reg#(int) j <- mkReg(0);

      rule step (j > 0);
         x <= x * j;
         j <= j - 1;
      endrule

      method start (n) if (j == 0);
         x <= 1;
         j <= n;
      endmethod

      method result () if (j == 0);
         return x;
      endmethod
   endmodule: mkFact2
Method invocations fit into rules

   module mkTest ();
      …
      Fact f <- mkFact2();
      …
      rule finish (state == 1);
         $display("…%d", f.result());
         state <= 2;
      endrule
   endmodule

   module mkFact2 (Fact);
      Reg#(int) x <- mkReg(?);
      Reg#(int) j <- mkReg(0);
      …
      method result () if (j == 0);
         return x;
      endmethod
   endmodule

• Rule condition is: state==1 && j==0 • That is, the explicit condition and all implicit conditions of all method calls in the rule • Thus, a part of the rule’s condition (j == 0) and a part of the rule’s computation (reading x) are in a different module, via a method invocation
Modularizing rules

   module mkTest ();
      …
      Fact f <- mkFact2();
      rule go (state == 0);
         f.start(15);
         state <= 1;
      endrule
      …
   endmodule

   module mkFact2 (Fact);
      Reg#(int) x <- mkReg(?);
      Reg#(int) j <- mkReg(0);
      …
      method start (int n) if (j == 0);
         x <= 1;
         j <= n;
      endmethod
   endmodule

• Rule condition: state==0 && j==0 • Rule actions: state<=1, x<=1 and j<=15 • Thus, a part of the rule’s action is in a different module
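To see how a rule's explicit condition combines with the implicit conditions of the methods it calls, here is a small Python model of the factorial example. It is a sketch, not generated hardware: the class and function names are made up, one rule fires per step in an assumed fixed preference order (go, finish, step), the don't-care initial value of `x` is modelled as 0, and `n = 5` is used instead of the slide's 15 to keep the numbers small.

```python
class Fact2:
    """Model of mkFact2: two registers, one rule, two guarded methods."""
    def __init__(self):
        self.x = 0   # '?' (don't care) on the slide; 0 is an arbitrary choice here
        self.j = 0

    # rule step (j > 0)
    def can_step(self):
        return self.j > 0

    def step(self):
        self.x, self.j = self.x * self.j, self.j - 1   # both writes use the old values

    # method start (n) if (j == 0)
    def start_ready(self):
        return self.j == 0

    def start(self, n):
        self.x, self.j = 1, n

    # method result () if (j == 0)
    def result_ready(self):
        return self.j == 0

    def result(self):
        return self.x


def simulate(n, max_steps=100):
    """Model of mkTest: fire one enabled rule per step and record which one."""
    f, state, shown, trace = Fact2(), 0, [], []
    for _ in range(max_steps):
        if state == 0 and f.start_ready():       # rule go: state==0 && j==0
            f.start(n)
            state = 1
            trace.append("go")
        elif state == 1 and f.result_ready():    # rule finish: state==1 && j==0
            shown.append(f.result())
            state = 2
            trace.append("finish")
        elif f.can_step():                       # rule step inside Fact2
            f.step()
            trace.append("step")
        else:
            break                                # no rule is applicable
    return shown, trace

shown, trace = simulate(5)
```

In this model `finish` cannot fire while `j > 0`, even though its explicit condition `state == 1` already holds, so the result is only displayed once the `step` rule has run to the end.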
Order of Evaluation • Not lazy (unlike, e.g., Haskell’s) • Schedule as many rules as possible in each clock cycle • (patented technology: James Hoe et al.)
Clocked synchronous hardware • The compiler translates BSV source code into Verilog RTL [Figure: a collection of state elements holding S; transition logic takes the inputs I and the current state S and produces the outputs O and the “next” S]
Rule semantics mapped to hardware semantics [Figure: a sequence of rule steps Ri, Rj, Rk, … grouped into HW clock cycles] • The effect of each cycle is as if a sequence of rules was executed one at a time • Consequence: the HW state can never result from an interleaving of actions from different rules • Rule atomicity (therefore, correctness) is preserved
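Scheduling and control logic [Figure: from the modules’ current state, each rule 1…n computes a condition p1…pn (“CAN_FIRE”) and an action value d1…dn; the scheduler turns p1…pn into f1…fn (“WILL_FIRE”), which control the muxing of d1…dn into the modules’ next state]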
… leads to pragmatic constraints on rule combination • Initial set: • A rule fires within a clock cycle • A rule fires at most once in a clock cycle • A rule’s effect is only visible in the next clock • We only combine rules in a certain fixed order within a cycle • All rules which read a register must precede any which write it • We only consider rules enabled at the start of the clock cycle • Each rule is independent of previous rules executing in the same cycle • The logic path delay depends on individual rule paths, and not on combinations of rules • … • (Some since relaxed)
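A rough Python model of one clock cycle under these constraints is given below. It is a simplification, not the compiler's scheduling algorithm: the rule records, the `conflicts` test and the three example rules are assumptions for illustration, and the pairwise conflict check would not detect an ordering cycle that only shows up across three or more rules.

```python
def conflicts(a, b):
    """Two rules cannot share a cycle if they write the same register, or if each
    writes something the other reads (no reads-before-writes order exists)."""
    if a["writes"] & b["writes"]:
        return True
    return bool(a["writes"] & b["reads"]) and bool(b["writes"] & a["reads"])

def cycle(state, rules):
    """One clock cycle; `rules` is listed in descending urgency."""
    can_fire = [r for r in rules if r["guard"](state)]     # enabled at start of cycle
    will_fire = []
    for r in can_fire:                                     # each rule at most once
        if not any(conflicts(r, chosen) for chosen in will_fire):
            will_fire.append(r)
    updates = {}
    for r in will_fire:
        updates.update(r["action"](state))                 # all reads see the old state
    return {**state, **updates}, [r["name"] for r in will_fire]

# Hypothetical rules over registers a and b.
rules = [
    {"name": "copy", "guard": lambda s: True, "reads": {"a"}, "writes": {"b"},
     "action": lambda s: {"b": s["a"]}},
    {"name": "inc", "guard": lambda s: True, "reads": {"a"}, "writes": {"a"},
     "action": lambda s: {"a": s["a"] + 1}},
    {"name": "back", "guard": lambda s: True, "reads": {"b"}, "writes": {"a"},
     "action": lambda s: {"a": s["b"]}},
]

next_state, fired = cycle({"a": 1, "b": 0}, rules)
```

Here `copy` and `inc` fire together, and the outcome equals running `copy` then `inc` one at a time, since the reader of `a` comes before its writer. `back` is left out of the cycle because it writes `a` and mutually depends on `copy`.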
Benefits of atomic-action semantics
Consider this example • Process 0 increments register x • Process 1 transfers a unit from register x to register y • Process 2 decrements register y • This is an abstraction of some real applications: • Bank account: 0 = deposit to checking, 1 = transfer from checking to savings, 2 = withdraw from savings • Packet processor: 0 = packet arrives, 1 = packet is processed, 2 = packet departs • … [Figure: process 0 applies +1 to x; process 1 applies -1 to x and +1 to y; process 2 applies -1 to y]
Concurrency in the example • Process j (= 0,1,2) only updates under condition condj • Only one process at a time can update a register. Note: • Process 0 and 2 can run concurrently if process 1 is not running • Both of process 1’s updates must happen “indivisibly” (else inconsistent state) • Suppose we want to prioritize process 2 over process 1 over process 0 (process priority: 2 > 1 > 0) [Figure: the x/y diagram, with processes 0, 1, 2 labelled cond0, cond1, cond2]
Is either correct? (Process priority: 2 > 1 > 0)

First attempt:

   always @(posedge CLK) // process 0
      if (!cond1 && cond0)
         x <= x + 1;

   always @(posedge CLK) // process 1
      if (!cond2 && cond1) begin
         y <= y + 1;
         x <= x - 1;
      end

   always @(posedge CLK) // process 2
      if (cond2)
         y <= y - 1;

Annotation on the slide: if ((!cond1 || cond2) && cond0)

Second attempt:

   always @(posedge CLK) begin
      if (!cond2 && cond1)
         x <= x - 1;
      else if (cond0)
         x <= x + 1;

      if (cond2)
         y <= y - 1;
      else if (cond1)
         y <= y + 1;
   end

• Where’s the error? • Which of these solutions are correct, if any? • What’s required to verify that they’re correct? • Now, what if I changed the priorities: 1 > 2 > 0? • And, what if the processes are in different modules?
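One way to answer the question is to enumerate all eight condition combinations and compare each Verilog attempt with a reference model. The Python below is such a check, under a stated assumption: the reference fires as many processes as possible in priority order 2 > 1 > 0, skipping a process only when a higher-priority process that fired writes one of its registers. The function names are made up, and the nonblocking assignments are modelled by computing every new value from the old `x` and `y`.

```python
from itertools import product

def reference(x, y, c0, c1, c2):
    """Assumed intent: maximal concurrency, priority 2 > 1 > 0, atomic processes."""
    fire2 = c2
    fire1 = c1 and not fire2      # process 1 shares y with process 2
    fire0 = c0 and not fire1      # process 0 shares x with process 1
    nx, ny = x, y
    if fire2:
        ny = y - 1
    if fire1:
        ny, nx = y + 1, x - 1
    if fire0:
        nx = x + 1
    return nx, ny

def first_attempt(x, y, c0, c1, c2):
    """The three separate always blocks."""
    nx, ny = x, y
    if (not c1) and c0:
        nx = x + 1
    if (not c2) and c1:
        ny, nx = y + 1, x - 1
    if c2:
        ny = y - 1
    return nx, ny

def second_attempt(x, y, c0, c1, c2):
    """The single always block with one if/else chain per register."""
    nx, ny = x, y
    if (not c2) and c1:
        nx = x - 1
    elif c0:
        nx = x + 1
    if c2:
        ny = y - 1
    elif c1:
        ny = y + 1
    return nx, ny

def mismatches(variant, x=10, y=10):
    """Condition triples (cond0, cond1, cond2) where the variant departs from the reference."""
    return [conds for conds in product([False, True], repeat=3)
            if variant(x, y, *conds) != reference(x, y, *conds)]

first_diff = mismatches(first_attempt)
second_diff = mismatches(second_attempt)
```

Under this reference, the first attempt differs in exactly one case, when all three conditions hold: process 1 is blocked by process 2, yet process 0 is still held back by `!cond1`. That is consistent with the slide's annotated condition `(!cond1 || cond2) && cond0`. The second attempt matches the reference in all eight cases.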
With Bluespec, design is direct (Process priority: 2 > 1 > 0)

   (* descending_urgency = "proc2, proc1, proc0" *)

   rule proc0 (cond0);
      x <= x + 1;
   endrule

   rule proc1 (cond1);
      y <= y + 1;
      x <= x - 1;
   endrule

   rule proc2 (cond2);
      y <= y - 1;
   endrule

• Functional correctness follows directly from rule semantics • Related actions are grouped naturally with their conditions, so they are easy to change • Interactions between rules are managed by the compiler (scheduling, muxing, control) • Same hardware as the RTL
Reorder Buffer Verification-centric design
Example from CPU design • Speculative, out-of-order • Many, many concurrent activities [Figure: block diagram with Fetch, Decode, Re-Order Buffer (ROB), Register File, ALU Unit, MEM Unit, Branch, Instruction Memory and Data Memory, connected by FIFOs] Nirav Dave, MEMOCODE, 2004
ROB actions [Figure: the Re-Order Buffer drawn as a table with Head and Tail pointers; each entry holds State, Instruction, Operand 1, Operand 2, Result. Entry states: E = Empty, W = Waiting, Di = Dispatched, K = Killed, Do = Done. Surrounding units: Register File, Decode Unit, ALU Unit(s), MEM Unit(s).] • Insert an instr into ROB • Get operands for instr • Get a ready ALU instr • Put ALU instr results in ROB • Get a ready MEM instr • Put MEM instr results in ROB • Writeback results • Resolve branches
But, what about all the potential race conditions? • Reading from the register file at the same time a separate instruction is writing back to the same location • Which value to read? • An instruction is being inserted into the ROB simultaneously with a dependent upstream instruction’s result coming back from an ALU • Put a tag or the value in the operand slot? • An instruction is being inserted into the ROB simultaneously with a branch mis-prediction • must kill the mis-predicted instructions and restore a “consistent state” across many modules
Rule Atomicity • Lets you code each operation in isolation • Eliminates the nightmare of race conditions (“inconsistent state”) under such complex concurrency conditions • All behaviors are explicable as a sequence of atomic actions on the state • The operations: • Dispatch Instr: mark instruction dispatched; forward to appropriate unit • Insert Instr in ROB: put instruction in first available slot; increment tail pointer; get source operands (RF or prev instr) • Write Back Results to ROB: write back results to instr result; write back to all waiting tags; set to done • Commit Instr: write results to register file (or allow memory write for store); set to Empty; increment head pointer • Branch Resolution: … • … • …
Performance Semantics Another processor example
Rule-based Specifications • Each pipeline stage is described as a set of atomic rules:

   rule Execute Add:
      when (bD.first == (Ri = va + vb)) ==>
      begin
         result = va + vb;      // compute addition
         bE.enq (Ri = result);  // enqueue result into bE
         bD.deq;                // dequeue instruction from bD
      end

• Any legal behavior can be understood in terms of applying one rule at a time [Figure: pipeline stages IF, Dec, Exe, Mem, Wb connected by FIFOs bI, bD, bE, bW, with RF, iMem, dMem and bypasses; annotations R1 = 2 + 3 and R1 = 5]
Performance Concerns • The designer wants to make sure that one instruction executes every cycle • FIFOs must support both enq and deq in each cycle [Figure: the same pipeline (IF, Dec, Exe, Mem, Wb with FIFOs bI, bD, bE, bW, plus RF, iMem, dMem), captioned “A cycle in slow motion”, with instructions I0 to I5 in flight]
What are the semantics of FIFOs?
Example from a commercially available FIFO IP component [Figure: FIFO block with ports data_in, push_req_n, pop_req_n, clk, rstn, data_out, full, empty, annotated with usage constraints] • These constraints are taken from several paragraphs of documentation, spread over many pages, interspersed with other text
A FIFO interface in BSV

   interface FIFOQueue #(type aType);
      method Action push (aType val);
      method aType first();
      method ActionValue#(aType) pop();
      method Action clear();
   endinterface
Methods as ports • push: n-bit argument; has side effect (action) • first: n-bit result; has no side effect • pop: n-bit result; has side effect (action) • clear: no argument; has side effect (action) [Figure: FIFOQueue module ports. push: n-bit data, enab, rdy (not full). first: n-bit data, rdy (not empty). pop: n-bit data, enab, rdy (not empty). clear: enab, rdy (always true)]
FIFO semantics: types

   interface FIFOQueue #(type aType);
      method Action push (aType val);
      method aType first();
      method ActionValue#(aType) pop();
      method Action clear();
   endinterface
FIFO semantics: laws • Algebra of enq, deq, etc • Not at present part of BSV • Though SVA assertions are • Needed for formal verification work • So far, all in atomic-action world
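As an indication of what such an algebra might look like, here is a Python model of the FIFOQueue interface with the rdy conditions from the “Methods as ports” slide, plus a few candidate laws checked by enumeration. The laws themselves are assumptions chosen for illustration, not laws stated in the talk or defined by BSV; the model is purely functional, so each method returns a new queue.

```python
from itertools import product

class FIFOQueue:
    """Bounded FIFO; each method is guarded by its rdy condition."""
    def __init__(self, depth, items=()):
        self.depth, self.items = depth, tuple(items)

    def push_rdy(self):            # rdy = not full
        return len(self.items) < self.depth

    def first_rdy(self):           # rdy = not empty
        return len(self.items) > 0

    pop_rdy = first_rdy            # pop has the same guard as first

    def push(self, v):
        assert self.push_rdy(), "push on a full FIFO cannot happen"
        return FIFOQueue(self.depth, self.items + (v,))

    def first(self):
        assert self.first_rdy(), "first on an empty FIFO cannot happen"
        return self.items[0]

    def pop(self):
        assert self.pop_rdy(), "pop on an empty FIFO cannot happen"
        return self.items[0], FIFOQueue(self.depth, self.items[1:])

    def clear(self):               # rdy = always true
        return FIFOQueue(self.depth)


def check_laws(depth=2, values=(1, 2)):
    """Check the candidate laws on every queue up to `depth`; return the number of cases."""
    checked = 0
    for n in range(depth + 1):
        for items in product(values, repeat=n):
            q = FIFOQueue(depth, items)
            assert q.clear().items == ()                      # clear always empties
            for v in values:
                if not q.push_rdy():
                    continue
                pushed = q.push(v)
                # push never changes the head of a non-empty queue
                assert pushed.first() == (q.first() if q.first_rdy() else v)
                if q.first_rdy():
                    # push and pop commute when the queue is neither empty nor full
                    assert pushed.pop()[1].items == q.pop()[1].push(v).items
                else:
                    # push then pop on an empty queue returns the value and empties it
                    value, rest = pushed.pop()
                    assert value == v and rest.items == ()
                checked += 1
    return checked

cases = check_laws()
```

Laws of this shape only describe one method call at a time, which fits the slide's remark that so far everything lives in the atomic-action world.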