On Controllers, Soft Connections, and Logical Topologies
Michael Pellauer, MIT CSAIL
Angshuman Parashar, Michael Adler, Joel Emer, Intel VSSAD
The Setup
• (For both our HAsim simulator and the talk)
• Virtex5 110t on HiTechGlobal PCIe accelerator
• Future: FSB-based accelerators. Larrabee?
• Use HAsim’s Remote-Request-Response (RRR)
  • Protocol of communication between SW/HW
  • Allows calls from one to the other
[Diagram: FPGA and Host Processor connected over PCIe; example calls: run program, emulate instr, translate address, dump stats]
The Problem of the Day
• Just because you can talk doesn’t mean you have anything interesting to say!
• We must control higher-level interactions between software and hardware
• Example: “Dump Stats” command
  • Transmit requests intra-FPGA, aggregate responses
• Future: think about multiple-FPGA setup
[Diagram labels: dump stats, PCIe Interface, RRR, Controller, FPGA modules (Cache, Branch Pred, …)]
The HAsim Controller
• Software sees it as…
  [Diagram: Host Software reaches the Controller over RRR: run, pause, …; setParam; dump stats]
• Hardware sees it as…
  [Diagram: the Controller offers run, pause, …; setParam; dump stats; enable events; debug; assertion fail]
• Different modules access different services
• Which modules use which service is very fluid
Problem: HDLs’ Inflexible Interfaces
[Diagram: HW module instantiation hierarchy; labels: Simulator, Core, Controller, Front End, RRR, Fetch, PCIe, Branch Pred]
• Branch Predictor has a bug
• Want to send some debug info to the Controller
• Fundamental Problem: HDLs allow communication only up and down hierarchy
• Verilog OOMRs are not an acceptable solution
• Gets worse if we have alternative modules
Our Solution: Soft Connections
• Goal: “soften” rigid communication hierarchy
• Users separately instantiate named endpoints
• Can read and write as if they were half of a guarded FIFO (FI and FO)
• Instantiator’s interface does not change
• Bluespec standard ModuleCollect library
[Diagram: mkSend “fet2dec” (send()) and mkRecv “fet2dec” (recv()); the link between them is added during the static elaboration phase of the Bluespec compiler]
Review: Static Elaboration Phase
[Diagram: Software toolflow: source, compile, .exe, run w/ params, giving run1, run2, run3, …]
[Diagram: Hardware toolflow: source, elaborate w/ params, giving design1, design2, design3; each design is then run w/ params, giving run1.1, run2.1, run3.1, …]
• Inline function calls and datatypes as combinational logic
• Instantiate modules with specific parameters
• Resolve polymorphism/overloading
Elaboration-Time Algorithm
    let (sends, recvs) = getCollection()    // Get from ModuleCollect
    foreach s in sends do
        let rs = matchByName(s.name, recvs)
        if rs == {} then
            if not s.optional then
                error("Unmatched Send: " + s.name)
        else if rs == {r} then
            connect(s, r)                    // instantiate buffering
        else
            error("Multiple Receives connected to: " + s.name)
        recvs = recvs - rs                   // remove matched recvs
    foreach r in recvs do
        error("Unmatched Receive: " + r.name)
• Open Question: Can we do this in SystemVerilog as well?
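The matching pass above can be sketched in ordinary software. The following Python is an illustrative model only, not the Bluespec ModuleCollect code: the `Send`, `Recv`, `ElaborationError`, and `connect_all` names are hypothetical, and "connecting" is reduced to recording a pair where the real flow would instantiate buffering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Send:
    name: str
    optional: bool = False  # an optional send may stay unconnected


@dataclass(frozen=True)
class Recv:
    name: str


class ElaborationError(Exception):
    """Raised where the slide's pseudocode calls error()."""


def connect_all(sends, recvs):
    """Pair each send with the single receive that carries the same name."""
    remaining = list(recvs)
    connections = []
    for s in sends:
        matches = [r for r in remaining if r.name == s.name]
        if not matches:
            if not s.optional:
                raise ElaborationError(f"Unmatched Send: {s.name}")
        elif len(matches) == 1:
            # The hardware flow would instantiate buffering here.
            connections.append((s, matches[0]))
        else:
            raise ElaborationError(f"Multiple Receives connected to: {s.name}")
        # Remove matched receives so leftovers can be reported afterwards.
        remaining = [r for r in remaining if r.name != s.name]
    for r in remaining:
        raise ElaborationError(f"Unmatched Receive: {r.name}")
    return connections
```

For example, `connect_all([Send("fet2dec")], [Recv("fet2dec")])` returns one pair, while a non-optional send with no matching receive raises `ElaborationError`.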
“Multicast” Connections
• A one-to-many Send (broadcast)
  [Diagram: one mkBcast “start_prog” (broadcast()) feeding several standard receive modules (mkRecv “start_prog”, recv()); now multiple recvs are no longer an error]
• A many-to-one Recv (listener)
  [Diagram: several standard send modules (mkSend “debug_out”, send()) feeding one mkListener “debug_out” (listen()), which receives ID + data]
Building 2-Way Communication
• More complex abstractions from primitives
• Client/Server
  [Diagram: a pair of normal send and recv; mkClient “mem_load” (makeReq(), getResp()) connected to mkServer “mem_load” (getReq(), makeResp())]
• “Multicast” Client/Server
  [Diagram: several standard Client modules (mkClient “mem_load”) connected to one mkServer “mem_load” that sees ID + data]
  [Diagram: one mkClient “stats_count” (broadcastReq(), getResp() with ID + data) connected to several standard Server modules (mkServer “stats_count”)]
Controller Services: Revisited
• Which should get which type of soft connection?
• Commands/Params:
  • Receive from software, send to many modules
  • One-to-Many Broadcast
  • Can make a nice abstraction for local commands, params
• Events/Stats:
  • Receive from software, send to many modules, aggregate responses
  • Many-to-one Client
• Assertions/Debug:
  • Receive from many modules, send to software
  • Many-to-one Receive
Case Study: span
• span(c) = number of instantiation boundaries crossed between sender and receiver
• Roughly, the pain of changing a communication path
• In HAsim, 118/217 connections are to/from Controller
• We start to worry about the massive fan-in
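One way to make the span metric concrete is to compute it from hierarchical instance paths. The sketch below is an assumption about how the count could be taken, not HAsim tooling: the `span` function and the "/"-separated paths are hypothetical, and each path names the chain of module instances that encloses an endpoint.

```python
def span(sender_path, receiver_path):
    """Count instantiation boundaries crossed between two endpoints.

    Walk up from the sender to the deepest common ancestor module,
    then down to the receiver; every level climbed or descended is
    one boundary.
    """
    a = sender_path.split("/")
    b = receiver_path.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


# Hypothetical hierarchy, loosely modeled on the Branch Pred example:
# four levels up to Simulator, one level down into Controller.
print(span("Simulator/Core/FrontEnd/Fetch/BranchPred", "Simulator/Controller"))  # 5
```

Under this reading, two endpoints in the same module have span 0, and a debug path from a deeply nested module to the Controller is exactly the kind of connection that would touch many interfaces without soft connections.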
Logical Topology vs Physical Topology
• We described the “logical” communication topology
• Could be implemented with different physical topology
• Could use Rings/Trees/Grids to offset massive fan-in
• Implemented: Rings and Trees
• So far no improvement over physical point-to-point
• Connection interface does not change!
[Diagram: send “foo” and recv “foo” endpoints attached to a network of stations. Station routing tables are made at elaboration: the sending station has an address for “foo” (#5), the receiving station has to know #5 means “foo”, and a station without the endpoint doesn’t have #5]
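The station idea can be illustrated with a small software model. This is a sketch under stated assumptions, not the implemented HAsim network: it assumes a unidirectional ring, assigns numeric addresses by sorting connection names, and the `build_tables` and `route` helpers are hypothetical.

```python
def build_tables(stations):
    """Build routing state once, as elaboration would.

    stations: list of (send_names, recv_names) sets in ring order.
    Returns a global name-to-address map plus, for each station,
    the set of addresses it terminates locally.
    """
    names = sorted({n for sends, recvs in stations for n in sends | recvs})
    name_to_id = {n: i for i, n in enumerate(names)}
    local_ids = [{name_to_id[n] for n in recvs} for _, recvs in stations]
    return name_to_id, local_ids


def route(src, name, name_to_id, local_ids):
    """Forward a message around the ring until a station owns its address.

    Returns (destination_station, hops).
    """
    msg_id = name_to_id[name]  # the sender only needs the address
    n = len(local_ids)
    for hops in range(1, n + 1):
        station = (src + hops) % n
        if msg_id in local_ids[station]:
            return station, hops
    raise LookupError(f"no station receives {name!r}")


# Three stations: 0 sends "foo", 1 has no endpoints, 2 receives "foo".
stations = [({"foo"}, set()), (set(), set()), (set(), {"foo"})]
name_to_id, local_ids = build_tables(stations)
print(route(0, "foo", name_to_id, local_ids))  # (2, 2)
```

The endpoints still refer to “foo” by name; only the tables know the numeric address, which mirrors the point that the connection interface does not change when the physical topology does.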
Take Aways
• FPGA-as-accelerator model is rapidly maturing
  • The FPGA-as-raw-fabric model is not ideal
• Something like HAsim’s Controller helps
  • Coordinates interaction between FPGA/SW
• Need different Hardware-design techniques for FPGA accelerators
  • More flexibility needed: reconfigurations common
• Soft Connections bring flexibility to interfaces
  • Make it easier to have a fluid set of modules which interact with the controller
• Logical topology != Physical topology
  • Designer needs help with both
Thank You! pellauer@csail.mit.edu
The Controller’s Services
• Commands:
  • Receive “start” or “pause” from software
  • Controller distributes to all interested hardware modules
• Params:
  • Receive dynamic command line values
  • Controller distributes to interested hardware modules
• Events:
  • Software can enable, disable
  • Controller aggregates, sends to software
• Stats:
  • Software requests dump periodically
  • Controller passes on request, aggregates responses
• Assertions:
  • Controller passes failures on to software
• Debug:
  • Controller passes info on to software
Making “Gateware” more like Software
• Ultimately we want many distributed “services” throughout the FPGA talking to software
• They communicate at different rates
• It makes sense for the variable/rare services to share the same interconnect on the FPGA
• Flexibility of communication == Easier development
• Today: Development plan and issues
Review: Soft Connections
• Point-to-Point
  [Diagram: mkSend “fet2dec” (send()) connected to mkRecv “fet2dec” (recv())]
• Client/Server
  [Diagram: mkClient “funcp_fet” (makeReq(), getResp()) connected to mkServer “funcp_fet” (getReq(), makeResp())]
• “Smart” Synthesis Boundaries
  [Diagram: modules A and B (mkB); a send “fet2dec” inside mkB is exposed at the synthesis boundary through the outg interface, using try_xfer() and xfer_ack()]
  addDanglingSend(mkB.outg[3], “fet2dec”, “Inst”);
  Compiler Log: “Dangling Send fet2dec [3] {Inst}”
Proposed Primitive: One-To-Many
• A “Broadcast” Send
[Diagram: one mkBcast “start_prog” (broadcast()) feeding four standard receive modules (mkRecv “start_prog”, recv())]
• One rule per receiver i (i = 0..3):
    when (r[i] == 0):
        try_xfer(q.first())
        if (ack) r[i] <= 1
• One completion rule:
    rule when (all r == 1):
        all r <= 0
        q.deq()
• All rules and registers inserted during static elaboration (don’t know how many receivers during instantiation)
• Tougher alternative: many FIFOs
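The per-receiver flags can be exercised in a cycle-level software model. The following Python is a simplified sketch, not generated hardware: the `Broadcast` class is hypothetical, the `ready` list stands in for each receiver's ack, and the completion rule is given its own cycle for clarity.

```python
from collections import deque


class Broadcast:
    """One queue, one 'delivered' flag per receiver (the r[i] registers)."""

    def __init__(self, n_receivers):
        self.q = deque()
        self.done = [False] * n_receivers
        self.delivered = [[] for _ in range(n_receivers)]

    def send(self, value):
        self.q.append(value)

    def step(self, ready):
        """Advance one cycle; ready[i] models receiver i acking a transfer."""
        if not self.q:
            return
        if all(self.done):
            # Completion rule: every receiver has the head, so retire it.
            self.done = [False] * len(self.done)
            self.q.popleft()
            return
        head = self.q[0]
        for i, ok in enumerate(ready):
            # Per-receiver rule: offer the head only to receivers still waiting.
            if not self.done[i] and ok:
                self.delivered[i].append(head)
                self.done[i] = True
```

Because the head of the queue is retired only after every flag is set, a slow receiver stalls the broadcast but never misses or duplicates a message.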
Proposed Primitive: Many-to-One
• A “listener” receive
[Diagram: four standard send modules (mkSend “debug_out”, send()) feeding one mkListener “debug_out” (listen()), which receives ID + data]
• One rule per sender i (i = 0..3), with its own queue qi:
    rule when (qi.notEmpty):
        try_xfer(qi.first(), i)
        if (ack) qi.deq()
• All rules inserted during static elaboration (don’t know IDs during instantiation)
• Is a fairness guarantee needed?
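If a fairness guarantee does turn out to be needed, one option is a round-robin pick among the sender queues. The slide leaves the question open, so the sketch below is only one possible answer; the `Listener` class and its round-robin pointer are hypothetical and are not taken from the proposal.

```python
from collections import deque


class Listener:
    """Many senders, one receiver; each message arrives tagged with its sender ID."""

    def __init__(self, n_senders):
        self.queues = [deque() for _ in range(n_senders)]
        self.next_id = 0  # round-robin pointer (an assumption, see lead-in)

    def send(self, sender_id, data):
        self.queues[sender_id].append(data)

    def listen(self):
        """Return (sender_id, data) from the next non-empty queue, or None."""
        n = len(self.queues)
        for offset in range(n):
            i = (self.next_id + offset) % n
            if self.queues[i]:
                # Start the next search just past the sender served now.
                self.next_id = (i + 1) % n
                return i, self.queues[i].popleft()
        return None
```

With this policy a sender that floods its queue cannot starve the others: after it is served once, every other non-empty queue gets a turn before it is served again.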
Proposed Primitive: Hub Servers
• Hub Server, Distributed Clients
• 1 Many-to-One Connection
• Reverse is many One-to-One connections
• Remove the ID and send it to the appropriate destination
[Diagram: several standard Client modules (mkClient “mem_load”, makeReq(), getResp()) connected to one mkHubServer “mem_load” (getReq(), makeResp()), which sees ID + data]
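The request and response paths of a hub server can be modeled in a few lines. This Python is an illustrative sketch rather than the proposed primitive itself: the `HubServer` class, its method names, and the `handler` callback are hypothetical stand-ins for the server logic behind getReq() and makeResp().

```python
from collections import deque


class HubServer:
    """One shared request channel in, one private response channel per client out."""

    def __init__(self, n_clients, handler):
        self.requests = deque()  # many-to-one: (client_id, request)
        self.responses = [deque() for _ in range(n_clients)]  # one-to-one each
        self.handler = handler

    def make_req(self, client_id, request):
        # The connection, not the client, attaches the ID.
        self.requests.append((client_id, request))

    def serve_one(self):
        """Handle one request; return False when nothing is pending."""
        if not self.requests:
            return False
        client_id, request = self.requests.popleft()
        # Strip the ID and route the bare response to that client's channel.
        self.responses[client_id].append(self.handler(request))
        return True

    def get_resp(self, client_id):
        q = self.responses[client_id]
        return q.popleft() if q else None
```

A client only ever sees its own responses, so from its side the connection looks like an ordinary point-to-point client/server pair.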
Proposed Primitive: Hub Client
• Hub Client, Distributed Servers
• 1 One-to-Many Connection
• 1 Many-to-One Connection
• Ability to send to individuals as well?
[Diagram: one mkHubClient “stats_count” (broadcastReq(), getResp() with ID + data) connected to several standard Server modules (mkServer “stats_count”, getReq(), makeResp())]