RAMP Stats and Monitoring
Derek Chiou, Bill Reinhart, Nikhil Patil with Krste Asanovic and Joel Emer
Goals/Requirements
• Provide functionality equivalent to software-based simulators at RAMP speeds
  • Full observability
  • Monitoring for events
    • Triggers for breakpoints, dumping state, etc.
  • Trace (lossy and lossless)
  • Aggregate statistics
• Baseline functionality automatically included
• Resource efficient
• Flexible
  • Dynamic and static configurability
• Integrated with other infrastructure (component interfaces)
At Least Three Levels of Debug/Monitoring/Stats
• Platform/Unmodel level
  • Bringing up the BEE3/ACP system independent of RAMP code
  • There may be strange bugs that only get exercised by the RAMP usage model
• Simulator (model) level
  • Simulator may model the target incorrectly
  • Monitor simulator bandwidth requirements
    • Could be very different from the target machine (e.g., a cache of the target cache)
• Target level
  • The target machine may have been implemented correctly, but the target design itself is incorrect
  • Stats/tracing of a working target
• We focus on the simulator (model)/target level, but hopefully some of this will be useful at the platform level as well
Statistics/Monitoring Philosophy
Bill Reinhart, Nikhil A Patil
• Instrument simulator communication (e.g., RAMP channels)
  • Communication mechanisms are logically connected to the command network
  • Can export/examine/change anything being communicated
  • No need to add additional code if that is sufficient
  • Turn off to save resources when possible
• Introduce additional communication to export where communication does not already exist
  • Use standard simulator communication (channel) interfaces
  • Automatically provides target timing information
  • Connected to a null end-point that logically dumps
    • Pipe to /dev/null
  • Potentially have a non-timed interface, but a time reference point is still needed
Simple Example
(Figure: pipeline stages F, D, E, M, W and State, with two compressors attached)
Required Support
• Endpoint support
• Channel support
• Transport (network)
• Naming
User vs Simulator Initiated
• Precise user-initiated
  • Function call to read/write a value at a specific target time
  • Can be implemented through timed channels
    • Commands live in target time
  • Can be handled logically as a compressor
    • Discard data unless there is a command
  • How far ahead in target time should the pull command be issued?
    • Too close impacts performance but enables precise control
    • Too far makes reacting to an event difficult
• Imprecise user-initiated
  • Issue a read of state, perform it whenever, report back the target time
• Simulator-initiated
  • Dump everything, filter later
  • Can be slow if bandwidth, storage, or filtering capacity is limited
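The "handled logically as a compressor" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `CommandCompressor`, its methods, and the demo values are hypothetical and do not come from the RAMP code base. The compressor sees every value on a timed channel and discards it unless a read command was issued for that target cycle.

```python
from dataclasses import dataclass, field


@dataclass
class CommandCompressor:
    """Hypothetical compressor: drop channel data unless a command asks for it."""
    pending: set = field(default_factory=set)     # target cycles with a read command
    reported: list = field(default_factory=list)  # (target_cycle, value) pairs exported

    def issue_read(self, target_cycle):
        # The command lives in target time: it names the cycle to sample.
        self.pending.add(target_cycle)

    def observe(self, target_cycle, value):
        # Called for every value crossing the channel.
        if target_cycle in self.pending:
            self.pending.discard(target_cycle)
            self.reported.append((target_cycle, value))
        # Otherwise the value is silently discarded.


def run_precise_read_demo():
    comp = CommandCompressor()
    comp.issue_read(3)             # command issued ahead of target cycle 3
    for cycle in range(6):
        comp.observe(cycle, cycle * 10)
    return comp.reported
```

In this sketch the command must arrive before the channel reaches the requested cycle, which is the "how far ahead" trade-off the slide raises.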
Required Support: Endpoint
• Provide state connected to the command network
  • Same interface as a register, drop-in replacement
  • Stats counters, monitor points, control points, etc.
• Provide default compressors/filters
  • Output every n cycles
  • Output on rollover
  • Output toggled on signal
  • Etc.
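Two of the default compressors listed above (output every n cycles, output on rollover) can be illustrated with a small software model. This is a sketch under assumptions, not the endpoint's actual design: `StatsCounter`, its parameters, and the tuple layout of the output records are hypothetical.

```python
class StatsCounter:
    """Hypothetical stats endpoint: a fixed-width counter with two default filters."""

    def __init__(self, width_bits=8, every_n=None):
        self.mask = (1 << width_bits) - 1
        self.every_n = every_n   # emit every n target cycles when set
        self.value = 0
        self.out = []            # (target_cycle, reason, value) records

    def tick(self, target_cycle, increment=0):
        total = self.value + increment
        self.value = total & self.mask          # wrap like a hardware register
        if total > self.mask:
            # Output on rollover, so the host can extend the count in software.
            self.out.append((target_cycle, "rollover", self.value))
        if self.every_n and target_cycle % self.every_n == 0:
            # Output every n cycles.
            self.out.append((target_cycle, "periodic", self.value))


def run_counter_demo():
    counter = StatsCounter(width_bits=2, every_n=3)
    for cycle in range(1, 7):
        counter.tick(cycle, increment=1)
    return counter.out
```

A narrow counter combined with rollover output keeps the FPGA-side register small, at the cost of occasional traffic on the stats network.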
Required Support: Channel
• Optional connection to the control network
• Use internal buffering to look back in time
  • Channels are implemented as circular buffers in BRAM
  • Far more storage than needed (in general)
  • Can look back in time
  • Can save bandwidth by only exporting when needed
(Figure: circular buffer with tail and head pointers)
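The look-back behaviour of a circular-buffer channel can be modelled as follows. This is a minimal sketch, assuming a single write pointer (`head`) and ignoring the consumer side of the channel; the class name `LookbackChannel` and its methods are hypothetical.

```python
class LookbackChannel:
    """Hypothetical channel model: a circular buffer that retains recent history."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.head = 0    # next write position
        self.count = 0   # number of valid entries, at most depth

    def send(self, target_cycle, payload):
        # Overwrites the oldest entry once the buffer is full.
        self.slots[self.head] = (target_cycle, payload)
        self.head = (self.head + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def look_back(self, n):
        """Return up to n of the most recent entries, oldest first."""
        n = min(n, self.count)
        start = (self.head - n) % self.depth
        return [self.slots[(start + i) % self.depth] for i in range(n)]


def run_lookback_demo():
    chan = LookbackChannel(depth=4)
    for cycle in range(6):
        chan.send(cycle, "msg%d" % cycle)
    return chan.look_back(3)
```

Because the history already sits in the buffer, nothing needs to be exported until a trigger fires, which is how the slide's bandwidth saving would arise.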
Required Support: Transport
• Transport
  • To units: commands, configuration, state changes, etc.
  • From units: extract target/host state, statistics, etc.
• Could be virtual channel(s) on a common physical network
• Lossy network?
  • Lossless for now, support lossy at the endpoint
• QoS?
• A ring or a ring of rings for simplicity
• An ordered network is simpler
  • Helps reconstruction of data outside
  • But could result in less efficiency
Required Support: Naming/Tagging
• Naming of the source of data
• Command
  • "read P1.iCache.num_hits": the stats register name is translated to the actual register
• Returned data/trace entry
  • Needs to be tagged to indicate what the data is
  • Each stats entry also includes at least
    • Target time
    • Potentially platform/host time for platform/simulator-level debugging
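A small sketch of the naming and tagging scheme described above. Everything here is assumed for illustration: the register addresses, the `REGISTER_MAP` table, the `StatsEntry` layout, and the helper names are hypothetical; only the name `P1.iCache.num_hits` appears on the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table translating hierarchical stats names to register addresses.
REGISTER_MAP = {
    "P1.iCache.num_hits": 0x0040,
    "P1.iCache.num_misses": 0x0044,
}


@dataclass(frozen=True)
class StatsEntry:
    tag: int                         # identifies which register the data came from
    target_time: int                 # target cycle at which the value was sampled
    value: int
    host_time: Optional[int] = None  # optional, for platform/simulator-level debugging


def encode_read(name):
    """Translate a symbolic read command into a (command, address) pair."""
    return ("read", REGISTER_MAP[name])


def decode_entry(entry):
    """Map a returned, tagged entry back to its symbolic name."""
    names = {addr: name for name, addr in REGISTER_MAP.items()}
    return names[entry.tag], entry.target_time, entry.value
```

For example, `encode_read("P1.iCache.num_hits")` yields the address to put on the command network, and `decode_entry` recovers the name from the tag on the returned data.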
FPGA Debug
Hari Angepat, Chris Craik and Derek Chiou
Electrical and Computer Engineering
University of Texas at Austin
Introduction
• FPGA simulators offer order-of-magnitude speedups
• However, they can suffer from the traditional hardware issues of limited visibility and debugging challenges
• RAMP simulators face additional complexity due to scalability requirements that may prevent instrumenting every signal in the simulator
Challenge
• How to bring software-level debugging visibility to RAMP simulators without dramatically increasing resources or affecting timing closure
• Revisit the idea of FPGA state readback in combination with gdb-style debug interfaces
Our Technique
• 1) Leverage the FPGA readback mechanism to exploit as much free visibility as possible
  • FPGA frame readback exists in V2Pro, V4, V5
  • Can sample flip-flop state dynamically
  • Can sample BRAM/LUT (notes on this later)
  • Can use JTAG hardware for a latency-tolerant, low-resource physical link
Our Technique
• 2) Provide a GDB interface that can debug a software process and an FPGA fabric simultaneously
  • Can display FPGA netlist symbols alongside software symbols
  • Can allow for hybrid CPU/FPGA platform debugging (i.e., X86-FSB-FPGA)
FPGADBG Toolflow
(Figure: toolflow diagram with the labels Software Sources (C/C++/…), Hardware Sources (Verilog/VHDL/…), Compiler, Hierarchy Name Preservation Constraints, Debug Flags (-g -Ox), Synthesis, FPGA Implementation, Symbol Table, ASCII Disassembly, Binary Executable, Logic Allocation Map, PAR Netlist, FPGA Bitstream, Software Debugger (GDB))
• FPGADBG: interactive extension that enables non-intrusive debugging of software running on FPGA (GDB-Py)
Architecture
• Designed as a set of C/Python libraries
  • GDB interface (plugin)
  • Netlist frontend (parsing, mapping)
  • FPGA backend (board comm, readback)
  • Hardware library (step control, ICAP readback)
• GDB frontend allows connecting to software-based portions of a simulator
• Assumes design-level support for step
  • Allows the design to ensure consistent state before sampling
Architecture
(Figure: block diagram with the labels Target Application, User Logic, Target OS, Target Virtual Machine, GDB, GDB Plugin Bindings (Python), Domain Step Control, Readback Engine (ICAP), FPGADBG Core (Python), FPGA Chip Comm (C), FPGA Readback (C), Netlist Parser (Python), IO Logic (Transport Layer), FPGA Fabric, HW/SW Simulation Platform)
Netlist Parsing
(Figure: hierarchy tree with the nodes Top, myREG, regOut, dout)
Logic allocation map entries:
  Bit 6597758 0x005e0200 5758 Block=SLICE_X88Y18 Latch=XQ Net=dout(3)
  Bit 6597838 0x005e0200 5838 Block=SLICE_X88Y16 Latch=XQ Net=dout(1)
  Bit 6604350 0x005e0400 5758 Block=SLICE_X88Y18 Latch=YQ Net=dout(2)
  Bit 6604430 0x005e0400 5838 Block=SLICE_X88Y16 Latch=YQ Net=dout(0)
PAR netlist entries:
  inst "regOut(1)" "SLICE",placed R72C45 SLICE_X88Y16 ,
    cfg " BXINV::BX BXOUTUSED::#OFF BYINV::BY BYINVOUTUSED::#OFF BYOUTUSED::#OFF ...
          DXMUX::0 DYMUX::0 F::#OFF F5USED::#OFF
          FFX:myREG/dout_1:#FF FFX_INIT_ATTR::INIT0 FFX_SR_ATTR::SRLOW
          FFY:myREG/dout_0:#FF FFY_INIT_ATTR::INIT0 FFY_SR_ATTR::SRLOW ... ";
  inst "regOut(3)" "SLICE",placed R71C45 SLICE_X88Y18 ,
    cfg " BXINV::BX BXOUTUSED::#OFF BYINV::BY BYINVOUTUSED::#OFF BYOUTUSED::#OFF ...
          DXMUX::0 DYMUX::0 F::#OFF F5USED::#OFF
          FFX:myREG/dout_3:#FF FFX_INIT_ATTR::INIT0 FFX_SR_ATTR::SRLOW
          FFY:myREG/dout_2:#FF FFY_INIT_ATTR::INIT0 FFY_SR_ATTR::SRLOW ... ";
Netlist Parsing
• The FPGA toolflow introduces optimizations and naming issues
(Figure: processing flow with the stages Physical Netlist, Alias Detection, Vector Merger, Hierarchy Construction, Frame Address Mapping, Symbolic Netlist, FPGA Cmd Generator, Readback Cmd Parser, Bitstream Reorder, FPGA Board Communication, Readback Bitstream)
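The vector-merging step can be illustrated with the `Bit ...` entries shown on the earlier Netlist Parsing slide. The sketch below is not FPGADBG's parser. It assumes the three numeric fields are an absolute bit offset, a frame address, and an offset within the frame, and it stands in for a readback bitstream with a plain dictionary from bit offset to sampled value. The function names are hypothetical, and device-specific steps such as bitstream reordering are ignored.

```python
import re
from collections import defaultdict

# The four entries from the slide, one per line.
SAMPLE = """\
Bit 6597758 0x005e0200 5758 Block=SLICE_X88Y18 Latch=XQ Net=dout(3)
Bit 6597838 0x005e0200 5838 Block=SLICE_X88Y16 Latch=XQ Net=dout(1)
Bit 6604350 0x005e0400 5758 Block=SLICE_X88Y18 Latch=YQ Net=dout(2)
Bit 6604430 0x005e0400 5838 Block=SLICE_X88Y16 Latch=YQ Net=dout(0)
"""

ENTRY = re.compile(
    r"Bit\s+(?P<offset>\d+)\s+(?P<frame>0x[0-9a-fA-F]+)\s+(?P<frame_offset>\d+)\s+"
    r"Block=(?P<block>\S+)\s+Latch=(?P<latch>\S+)\s+"
    r"Net=(?P<net>[^\s()]+(?:\(\d+\))?)"
)
INDEXED_NET = re.compile(r"^(?P<base>.+)\((?P<index>\d+)\)$")


def parse_bits(text):
    """Group single-bit entries into vectors: {net_name: {bit_index: bit_offset}}."""
    vectors = defaultdict(dict)
    for m in ENTRY.finditer(text):
        indexed = INDEXED_NET.match(m["net"])
        if indexed:
            base, index = indexed["base"], int(indexed["index"])
        else:
            base, index = m["net"], 0   # scalar net: treat as a 1-bit vector
        vectors[base][index] = int(m["offset"])
    return dict(vectors)


def read_vector(vectors, name, readback_bits):
    """Assemble an integer value for a merged vector from sampled bits."""
    value = 0
    for index, offset in vectors[name].items():
        if readback_bits.get(offset, 0):
            value |= 1 << index
    return value
```

With `vectors = parse_bits(SAMPLE)`, the four scattered flip-flops are presented as one 4-bit symbol `dout`, which is the form a debugger user would expect to print.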
Limitations
• Hardware readback has limitations:
  • RAMs require offline readback due to resource contention issues
  • FPGA frames span large vertical stripes, potentially restricting visibility if some logic cannot be disabled during sampling
  • Hierarchy must be preserved during synthesis to ensure understandable net names
  • Step control requires design-level support
Status & Future Work
• Current prototype implements board communication with the XUP Virtex2Pro30 with JTAG-based frame readback
• Frontend netlist parser supports hierarchical node generation, bit vector merging, and some support for aliased signals
• Full GDB shell expected to be released in Q1-2009 with support for Virtex5{110/330}