Bluespec SystemVerilog™ Design Example A DMA Controller with a Socket Interface
Overview
• A DMA controller using Bluespec is presented
• Including several examples showing one possible refinement flow
• First model shows a simple 1 channel, 1 port model (222 lines)
• Final model shows a 2 channel, 2 port model, with pipelined and concurrent transactions (308 lines)
• Testbenches included for all models

General Setup
[Figure: block diagram of the Testbench, the DMA, and the Target, connected through master and slave socket ports]

DMA
• All DMA models contain a configuration port (slave), to read & write configuration and status registers
• One or two memory ports are included
• Configuration/Status registers
  • Source and destination address register
  • Transfer count
  • Enable and current status
  • Port selection

File Organization
• DMA.bsv – The DMA model
• TestBench.bsv – The testbench
• sysTestBenchBench.out.expected – expected simulation results
• Socket_IFC.bsv – defines a simple socket protocol, interfaces, structures, and utility functions
• EdgeFIFOs.bsv – some specialized FIFOs used throughout the designs
• Targets.bsv – A simple target module used for testing
• Makefile – usual

Socket Interface
• We use a simple Socket interface which is similar to OCP
• Requests and responses are decoupled
• Several requests may be outstanding (pipelined)

Socket Interface
[Figure: Master interface (Initiator) connected to Slave interface (Target). Request side signals: reqOp, reqAddr, reqData, reqInfo, reqAccept. Response side signals: respOp, respAddr, respData, respInfo, respAccept.]

Socket_Ifc.bsv
• Defines structures for Requests and Responses
• Interfaces for the Master and Slave

    typedef struct {
       RespOp   respOp;
       RespInfo respInfo;
       RespAddr respAddr;
       RespData respData;
    } Socket_Resp;

    interface Socket_master_req_ifc;
       method ReqOp   getReqOp ();
       method ReqInfo getReqInfo ();
       method ReqAddr getReqAddr ();
       method ReqData getReqData ();
       method Action  reqAccept ();
    endinterface

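The request structure is not shown on the slide, but it can be sketched by mirroring Socket_Resp. This is only an illustration: the field names follow the request-side signal names, RD and WR are the operations that appear in the traces, and the NOP value and every bit width below are assumptions rather than the definitions in the actual file.

```bsv
// Hypothetical field types. RD and WR appear in the slides;
// NOP and all bit widths are assumptions made for this sketch.
typedef enum { NOP, WR, RD } ReqOp deriving (Bits, Eq);
typedef Bit#(12) ReqInfo;
typedef Bit#(24) ReqAddr;
typedef Bit#(64) ReqData;

// Request structure with the same layout as Socket_Resp:
// operation, info tag, address, data.
typedef struct {
   ReqOp   reqOp;
   ReqInfo reqInfo;
   ReqAddr reqAddr;
   ReqData reqData;
} Socket_Req deriving (Bits, Eq);
```

Deriving Bits lets the structure be stored in FIFOs and registers, which is what the DMA rules later in the deck rely on.
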
Socket_Ifc.bsv
• Conversion functions from FIFO interfaces to socket interfaces
• Utilities to connect master and slave interfaces
• Convenience functions for debug

    function Socket_master_ifc fifos_to_master_ifc (FIFOF#(Socket_Req) reqs,
                                                    FIFOF#(Socket_Resp) resps);

Edge FIFOs
• Specialized FIFOs
• Pipeline FIFOs – a pipeline register with a FIFO interface
• Bypass FIFOs – a 1 element FIFO which allows non-registered operations
• Unguarded versions – disable Bluespec’s implicit conditions; needed for the socket protocol to provide data every cycle

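The difference between these FIFO flavors can be observed in a small simulation. The sketch below does not use EdgeFIFOs.bsv; it substitutes the standard library's mkPipelineFIFOF and mkBypassFIFOF (package SpecialFIFOs) and mkUGFIFOF (package FIFOF), which behave along the same lines. The module name, rule names, and data width are illustrative assumptions.

```bsv
import FIFOF::*;
import SpecialFIFOs::*;

// Illustrative testbench-style module; all names here are hypothetical.
(* synthesize *)
module mkEdgeFifoSketch (Empty);
   // Pipeline FIFO: a single register; when full it can accept an enq
   // in the same cycle in which a deq fires.
   FIFOF#(Bit#(8)) pipeF   <- mkPipelineFIFOF;
   // Bypass FIFO: when empty, an enqueued value can be dequeued
   // combinationally in the same cycle.
   FIFOF#(Bit#(8)) bypassF <- mkBypassFIFOF;
   // Unguarded FIFO: methods carry no implicit conditions, so the
   // rules below must test notFull and notEmpty explicitly.
   FIFOF#(Bit#(8)) ugF     <- mkUGFIFOF;

   Reg#(Bit#(8)) cycle <- mkReg(0);

   rule tick;
      cycle <= cycle + 1;
      if (cycle == 8) $finish(0);
   endrule

   rule feedPipe (cycle < 4);
      pipeF.enq(cycle);
   endrule

   rule drainPipe;
      $display("cycle %0d: pipeline FIFO yields %0d", cycle, pipeF.first);
      pipeF.deq;
   endrule

   rule feedBypass (cycle < 4);
      bypassF.enq(cycle);
   endrule

   rule drainBypass;
      $display("cycle %0d: bypass FIFO yields %0d", cycle, bypassF.first);
      bypassF.deq;
   endrule

   rule feedUnguarded (cycle < 4 && ugF.notFull);
      ugF.enq(cycle);
   endrule

   rule drainUnguarded (ugF.notEmpty);
      ugF.deq;
   endrule
endmodule
```

In simulation the pipeline FIFO should deliver each value one cycle after it was enqueued, while the bypass FIFO should deliver it in the cycle of the enqueue.
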
Targets
• Contains just a “dummy” target to act like a “memory” for testing
• Minimum 2 cycle latency
[Figure labels: requests; Rules for Read and Write requests; Slave; responses]

Model development details
• V0 – Simple FSM based DMA controller. 1 channel, 1 bus
• V1 – Rule-based DMA controller, allowing pipelined requests
• V2 – 2 memory ports, allowing pipelined concurrent read and write requests
• V3 – Modified version of V2 showing Bluespec’s elaboration features
• V4 – 2 channels, 2 ports, concurrent and pipelined transactions

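V0 – simple DMA
[Figure: block diagram. FSM with state labels Idle, Read, Finish, Write, Finish. Config side: rules for reading and writing the config/status registers, with config requests and config responses through the Slave port. Memory side: mmu requests and responses through the Master port, using a Pipeline FIFO and a Bypass FIFO. Diagram note: each non-idle arc writes an mmu request or reads a response.]
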
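An FSM of this shape can be sketched as one rule per state. This is a minimal sketch under stated assumptions, not the V0 source: the state names, the simplified SimpleReq type, the register names, the initial values, and the idea that writes return an acknowledgement that must be consumed are all hypothetical.

```bsv
import FIFOF::*;

// Hypothetical, simplified request type (the real design uses Socket_Req).
typedef struct {
   Bool     isWrite;
   Bit#(16) addr;
   Bit#(32) data;
} SimpleReq deriving (Bits, Eq);

// State names are assumptions loosely based on the diagram labels.
typedef enum { Idle, ReadIssue, ReadFinish, WriteIssue, WriteFinish } DmaState
   deriving (Bits, Eq);

module mkDmaFsmSketch #(FIFOF#(SimpleReq) mmuReqF,
                        FIFOF#(Bit#(32))  mmuRespF) (Empty);
   Reg#(DmaState) state   <- mkReg(Idle);
   Reg#(Bit#(16)) srcAddr <- mkReg('h1000);
   Reg#(Bit#(16)) dstAddr <- mkReg('h5000);
   Reg#(Bit#(8))  count   <- mkReg(4);   // words left to move
   Reg#(Bit#(32)) dataR   <- mkReg(0);   // word currently in flight

   rule leaveIdle (state == Idle && count != 0);
      state <= ReadIssue;
   endrule

   rule issueRead (state == ReadIssue);
      mmuReqF.enq(SimpleReq { isWrite: False, addr: srcAddr, data: 0 });
      state <= ReadFinish;
   endrule

   // Waits (via the implicit condition on first/deq) for the read data.
   rule finishRead (state == ReadFinish);
      dataR <= mmuRespF.first;
      mmuRespF.deq;
      state <= WriteIssue;
   endrule

   rule issueWrite (state == WriteIssue);
      mmuReqF.enq(SimpleReq { isWrite: True, addr: dstAddr, data: dataR });
      state <= WriteFinish;
   endrule

   // Assumes the target acknowledges writes with a response.
   rule finishWrite (state == WriteFinish);
      mmuRespF.deq;
      srcAddr <= srcAddr + 1;
      dstAddr <= dstAddr + 1;
      count   <= count - 1;
      state   <= (count == 1) ? Idle : ReadIssue;
   endrule
endmodule
```

Because only one state is active at a time, at most one request is outstanding, which is why a structure like this cannot overlap reads with writes.
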
V0 DMA behavior
• 8 cycles to move each word
• 2 cycles in memory
• 2 cycles in DMA (enqueue request, and grab data)

    cycle: 25
    Target mem: Socket_Req{RD, 001, 001001, 0000000000000000}}
    cycle: 26
    cycle: 27
    cycle: 28
    cycle: 29
    Target mem: Socket_Req{WR, 002, 005001, 0000100100001001}}
    cycle: 30
    cycle: 31
    cycle: 32
    cycle: 33
    Target mem: Socket_Req{RD, 001, 001002, 0000000000000000}}

V0 thoughts
• Most cycles are spent waiting for the mmu to respond
• Read requests cannot overlap with write requests
• V1 decouples all these activities
• Read and write requests start anytime one is needed (and all pre-conditions are met)
• Responses are taken and acted upon when they arrive

DMA V1
• Each read request passes the write address to the write side via a FIFO
• Reads or Writes can start at any time

    rule startRead (dmaEnabledR && readCntrR > currentReadR);
       let req = Socket_Req {reqAddr : readAddrR,
                             reqData : 0,
                             reqOp   : RD,
                             reqInfo : 1};
       mmuReqF.enq( req );
       // Enqueue the Write destination address
       destAddrF.enq( destAddrR );
       // increment the addresses and the count of reads issued
       readAddrR    <= readAddrR + 1;
       currentReadR <= currentReadR + 1;
       destAddrR    <= destAddrR + 1;
    endrule

V1 DMA write rule
• Implicit conditions check FIFO states as needed
• Rule urgency puts writes before reads

    (* descending_urgency = "startWrite, startRead" *)
    rule startWrite ( True );
       let wreq = Socket_Req {reqAddr : destAddrF.first,
                              reqData : responseDataF.first,
                              reqOp   : WR,
                              reqInfo : 2 };   // tag info with 2
       // enqueue the request.
       mmuReqF.enq( wreq );
       // remove wdata from the fifos
       destAddrF.deq;
       responseDataF.deq;
    endrule

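The slides show the two request rules but not the rule that fills responseDataF. A possible response side is sketched below. It assumes that the target echoes the request's reqInfo tag in respInfo (1 for reads, 2 for writes, matching the tags used in the request rules) and that writes are acknowledged with a response; neither is confirmed by the slides. The enum values, bit widths, and the names mkResponseSketch and mmuRespF are hypothetical.

```bsv
import FIFOF::*;

// Hypothetical field types; the real ones are defined in the socket package.
typedef enum { RespNone, RespOk, RespErr } RespOp deriving (Bits, Eq);
typedef Bit#(12) RespInfo;
typedef Bit#(24) RespAddr;
typedef Bit#(64) RespData;

typedef struct {
   RespOp   respOp;
   RespInfo respInfo;
   RespAddr respAddr;
   RespData respData;
} Socket_Resp deriving (Bits, Eq);

module mkResponseSketch #(FIFOF#(Socket_Resp) mmuRespF,
                          FIFOF#(RespData)    responseDataF) (Empty);
   // Read responses (assumed tag 1): hand the data to the write side.
   rule takeReadResponse (mmuRespF.first.respInfo == 1);
      responseDataF.enq(mmuRespF.first.respData);
      mmuRespF.deq;
   endrule

   // Write responses (assumed tag 2): nothing to forward, just retire them.
   rule takeWriteResponse (mmuRespF.first.respInfo == 2);
      mmuRespF.deq;
   endrule
endmodule
```

The implicit conditions on first, deq, and enq mean that neither rule fires until a response is present and, for reads, until responseDataF has room.
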
V1 Behavior
• Fully pipelined behavior
• Achieves maximum throughput - 2 cycles per word

    Target mem: Socket_Req{RD, 001, 001000, 0000000000000000}}
    cycle: 18
    Target mem: Socket_Req{RD, 001, 001001, 0000000000000000}}
    cycle: 19
    Target mem: Socket_Req{RD, 001, 001002, 0000000000000000}}
    cycle: 20
    Target mem: Socket_Req{RD, 001, 001003, 0000000000000000}}
    cycle: 21
    Target mem: Socket_Req{WR, 002, 005000, 0000100000001000}}
    cycle: 22
    Target mem: Socket_Req{WR, 002, 005001, 0000100100001001}}
    cycle: 23
    Target mem: Socket_Req{WR, 002, 005002, 0000100200001002}}
    cycle: 24
    Target mem: Socket_Req{WR, 002, 005003, 0000100300001003}}
    cycle: 25
    Target mem: Socket_Req{RD, 001, 001004, 0000000000000000}}

V2 – A second memory port
• Port can be second memory, bus, or peripheral, separate read/write ports
• Hardware additions:
  • Second master interface to DMA
  • New FIFOs for interface
  • Configuration register to mark port
  • Duplicated rules for each port
• Bluespec’s rule analysis ensures safe use of shared hardware – MMUs, FIFOs, etc.
• Muxes and control logic added automatically

V2 DMA behavior
• Concurrent reads and writes on different memories
• Peak throughput – 1 word per cycle
• Pipeline behavior maintained

    cycle: 23
    Target memA: Socket_Req{WR, 002, 005000, 0000100000001000}}
    cycle: 24
    Target memB: Socket_Req{RD, 001, 001004, 0000000000000000}}
    Target memA: Socket_Req{WR, 002, 005001, 0000100100001001}}
    cycle: 25
    Target memB: Socket_Req{RD, 001, 001005, 0000000000000000}}
    Target memA: Socket_Req{WR, 002, 005002, 0000100200001002}}
    cycle: 26
    Target memB: Socket_Req{RD, 001, 001006, 0000000000000000}}
    Target memA: Socket_Req{WR, 002, 005003, 0000100300001003}}
    cycle: 27
    Target memB: Socket_Req{RD, 001, 001007, 0000000000000000}}

V3 – Bluespec Elaboration
• Bluespec allows manipulation of most objects (e.g., Rules, FIFOs) during elaboration
• Reduces cut & paste code, allows better reuse
• V3 defines a function to generate rules for each mmu port

    function Rules generatePortDMARules (Bool rdPortCond,
                                         Bool wrPortCond,
                                         FIFOF#(Socket_Req) requestF,
                                         FIFOF#(Socket_Resp) responseF );

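The body of such a rule-generating function is not shown on the slide. The sketch below illustrates the general technique with a `rules ... endrules` expression and `addRules`, under several assumptions: it uses a simplified SimpleReq type instead of Socket_Req, it omits the response FIFO parameter and the response rules, and the register names, initial values, and port-selection registers (readOnA, writeOnA) are hypothetical.

```bsv
import FIFOF::*;

// Hypothetical, simplified request type (the real design uses Socket_Req).
typedef struct {
   Bool     isWrite;
   Bit#(16) addr;
   Bit#(32) data;
} SimpleReq deriving (Bits, Eq);

module mkRuleGenSketch #(FIFOF#(SimpleReq) reqFA,          // port A requests
                         FIFOF#(SimpleReq) reqFB,          // port B requests
                         FIFOF#(Bit#(32))  responseDataF)  // read data, filled elsewhere
                        (Empty);
   Reg#(Bool)       readOnA   <- mkReg(False);   // assumed port-select config
   Reg#(Bool)       writeOnA  <- mkReg(True);
   Reg#(Bit#(16))   readAddrR <- mkReg('h1000);
   Reg#(Bit#(16))   destAddrR <- mkReg('h5000);
   Reg#(Bit#(8))    remaining <- mkReg(4);
   FIFOF#(Bit#(16)) destAddrF <- mkFIFOF;

   // Builds one copy of the read and write rules for a given port.
   function Rules genPortRules (Bool rdPortCond, Bool wrPortCond,
                                FIFOF#(SimpleReq) requestF);
      return (rules
         rule startRead (rdPortCond && remaining != 0);
            requestF.enq(SimpleReq { isWrite: False, addr: readAddrR, data: 0 });
            destAddrF.enq(destAddrR);
            readAddrR <= readAddrR + 1;
            destAddrR <= destAddrR + 1;
            remaining <= remaining - 1;
         endrule

         rule startWrite (wrPortCond);
            requestF.enq(SimpleReq { isWrite: True,
                                     addr: destAddrF.first,
                                     data: responseDataF.first });
            destAddrF.deq;
            responseDataF.deq;
         endrule
      endrules);
   endfunction

   // Instantiate the rule set once per port; the conditions keep the
   // two read rules (and the two write rules) mutually exclusive.
   addRules(genPortRules( readOnA,  writeOnA, reqFA));
   addRules(genPortRules(!readOnA, !writeOnA, reqFB));
endmodule
```

When reads and writes are configured onto the same port, the two rules for that port compete for one request FIFO; without an urgency attribute the compiler is expected to pick an order and report it as a warning.
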
V3 Results
• Same behavior as V2
• Real difference is about 26 lines of code out of 227 lines, roughly 11%, for a second mmu port

V4 – multiple channels
• Each channel is a separate DMA engine – it has its own read/write address, config/status registers, etc.
• V4 model changes constructs from scalar to vector, e.g.

    // The destination address registers
    Vector#(NumChannels,Reg#(ReqAddr)) destAddrRs <- replicateM( mkReg(0) );

• Rule generation function used to create a second set of rules for the second channel
• Bluespec’s rules manage concurrency

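The scalar-to-vector change can be illustrated with a static for loop that creates one rule per channel during elaboration. This is a minimal sketch and not the V4 source: it uses Bit#(16) in place of ReqAddr, and the countRs vector, the rule names, and the initial values are assumptions.

```bsv
import Vector::*;

typedef 2 NumChannels;

(* synthesize *)
module mkChannelSketch (Empty);
   // One destination address register and one word counter per channel.
   Vector#(NumChannels, Reg#(Bit#(16))) destAddrRs <- replicateM(mkReg(0));
   Vector#(NumChannels, Reg#(Bit#(8)))  countRs    <- replicateM(mkReg(4));

   // The loop is unrolled at elaboration time: one rule per channel.
   for (Integer ch = 0; ch < valueOf(NumChannels); ch = ch + 1) begin
      Bit#(8) chId = fromInteger(ch);
      rule stepChannel (countRs[ch] != 0);
         $display("channel %0d uses destination address %h", chId, destAddrRs[ch]);
         destAddrRs[ch] <= destAddrRs[ch] + 1;
         countRs[ch]    <= countRs[ch] - 1;
      endrule
   end

   // Stop the simulation once both channels have finished.
   rule allDone (countRs[0] == 0 && countRs[1] == 0);
      $finish(0);
   endrule
endmodule
```

Since each generated rule touches only its own channel's registers, the rules do not conflict and can fire in the same cycle.
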
V4 Behavior
• Multiple channels, multiple ports, fully pipelined, concurrent reads and writes across channels

    Target memB: Socket_Req{RD, 000, 001026, 0000000000000000}}
    Target memA: Socket_Req{WR, 001, 005023, 0000102300001023}}
    cycle: 135
    Target memB: Socket_Req{RD, 000, 001027, 0000000000000000}}
    Target memA: Socket_Req{RD, 002, 002017, 0000000000000000}}
    cycle: 136
    Target memB: Socket_Req{WR, 003, 007016, 0000201600002016}}
    Target memA: Socket_Req{WR, 001, 005024, 0000102400001024}}

Summary
• Concurrency is automatically analyzed and control logic is automatically synthesized
• 4 unique DMA architectures with minimal development effort – compare lines of code
• Allows rapid exploration and analysis of different architectures