An Integrated Debugging Environment for Reprogrammable Hardware Systems • Kevin Camera, Hayden So, Bob Brodersen • Berkeley Wireless Research Center, University of California, Berkeley • AADEBUG 2005
Outline • Motivation • Existing platform • Existing design/verification flow • Proposed solution • Environment features • Walkthrough • Implementation strategy
Application Domain • Direct-mapped, reprogrammable hardware systems • FPGA-based signal processing and supercomputing arrays
FPGA Computing Benefits • Better power, computation, and cost efficiency than any processor-based solution, due to direct mapping of algorithms (Chang, Wawrzynek, Brodersen; ISCA ’05)
BEE2: 2nd Berkeley Emulation Engine • (5) Xilinx V2P100 per board • ~100K logic cells • 2 PowerPC405 cores • 444 dedicated multipliers • 1MB on-chip SRAM • 3.125Gb/s duplex links • (4) DDR2 banks per FPGA • 72 bits per bank with ECC • Up to 12.8 (DDR400) or 17 (DDR533) GB/s bandwidth • Up to 4GB capacity
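The two bandwidth figures on this slide can be reproduced with simple arithmetic. The sketch below assumes that 64 of the 72 bits per bank carry data (the other 8 being ECC) and that all four banks transfer at once; the function name and constants are illustrative, not taken from the BEE2 documentation.

```python
# Peak DDR2 bandwidth per FPGA under two stated assumptions:
#  - each 72-bit bank carries 64 data bits plus 8 ECC bits
#  - all four banks transfer concurrently
BANKS_PER_FPGA = 4
DATA_BITS_PER_BANK = 64  # assumption: 72 bits = 64 data + 8 ECC


def peak_bandwidth_gb_s(mega_transfers_per_s: float) -> float:
    """Return aggregate peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = DATA_BITS_PER_BANK // 8
    return BANKS_PER_FPGA * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


print(peak_bandwidth_gb_s(400))  # DDR400: about 12.8 GB/s
print(peak_bandwidth_gb_s(533))  # DDR533: about 17 GB/s
```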
BEE Design Flow • Design entry is in the Matlab/Simulink environment • Graphical, library based; also allows custom HDL • Typical FPGA path to physical implementation • HDL synthesis and place and route • Hierarchy is flattened in each pass (non-modular flow)
Design Verification Methods • High-level functional simulation • HDL/RTL simulation • Native FPGA execution • [Figure axis label: Complexity, Accuracy]
High-level Functional Simulation • Design execution in Matlab/Simulink • Intended to be correct by construction • Fastest software-based simulation • Powerful and convenient algorithm exploration
Drawbacks of High-level Simulation • Even with high level of abstraction, vastly slower than hardware • Trend is worsening with increased FPGA capacity • Doesn’t cover any side-effects or requirements of the backend tool chain
HDL/RTL Simulation • Varying levels of accuracy • Access to arbitrary internal signals • But, simulation speed is even slower • Parameterization/Iteration is much harder
Native FPGA Execution • Runs at full speed of hardware • Three tools for on-FPGA testing: • Xilinx ChipScope Pro • System Generator HW-in-the-loop • Good old-fashioned signal probing
Xilinx ChipScope Pro • Inserts BRAM cores into design and binds to JTAG • Captures selected signals and provides trigger conditions • Signals of interest must be chosen in advance • Captured state is limited by available BRAM • Any changes require tool flow re-iteration
System Generator HW-in-the-loop • Allows hardware itself to accept and process data from Simulink via JTAG • Arbitrary number of data elements can be accessed as “ports” • Very powerful tool, but features limited process control
Hands-on Hardware Debugging • Most accurate method for finding timing-related bugs in a “production” system • Tradeoffs are all too well-known: • Complex equipment • Limited probing pins • A priori signal output • Limited input options
Drawback of On-FPGA Execution • Place and route time is a major bottleneck • Complete run is needed for every design change • Increasingly problematic due to larger FPGA capacity
Proposed Solution • Enable extensive debugging and design exploration functionality directly on the hardware platform • Vastly superior execution time for today’s large-scale computing challenges • Exploit the spatial resources of the hardware to assist in debugging • Essentially a -g switch to the hardware design flow • Minimize or eliminate iterations through implementation flow
Caveats • Final timing of design will not be preserved • Critical path will definitely be increased, but 10^6 is a lot of headroom • Timing-driven implementation still needed once verification is complete • Significantly more FPGA capacity and memory will be needed • Acceptable for scalable BEE-like platforms and for modular, tiled algorithms
Essential Features of Environment • Robustly parameterized library components with soft configuration • Design exploration without tool iterations • Readily accessible variable contents • Reading and writing of any values by user • Complete user-driven control over process execution • Single-step, bursts, breakpoints, assertions
1: Parameterized Library • Number of bits • Saturate / Wrap • Binary point position • Microarchitecture • Library components provide configuration parameters as inputs, which can be set by variables • Allows runtime modification of function properties, including precision, range, and latency • Enables design-space exploration at hardware speed, plus correction of configuration errors without re-implementation
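To make the idea of soft configuration concrete, here is a minimal software sketch of a block whose word length, binary point, and overflow mode are runtime inputs rather than compile-time constants. `FixedPointConfig` and `quantize` are hypothetical names chosen for illustration; the actual library components are hardware blocks, and microarchitecture and latency are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class FixedPointConfig:
    """Hypothetical soft-configuration record for one library block."""
    n_bits: int         # total word length, including the sign bit
    binary_point: int   # number of fractional bits
    saturate: bool      # True: clamp on overflow; False: wrap (two's complement)


def quantize(value: float, cfg: FixedPointConfig) -> float:
    """Quantize value to the signed fixed-point format described by cfg."""
    scale = 1 << cfg.binary_point
    raw = int(round(value * scale))  # Python's round: nearest, ties to even
    lo = -(1 << (cfg.n_bits - 1))
    hi = (1 << (cfg.n_bits - 1)) - 1
    if cfg.saturate:
        raw = max(lo, min(hi, raw))
    else:
        span = 1 << cfg.n_bits
        raw = (raw - lo) % span + lo  # fold back into [lo, hi]
    return raw / scale


cfg = FixedPointConfig(n_bits=8, binary_point=4, saturate=True)
print(quantize(10.0, cfg))   # clamps to the largest representable value
cfg.saturate = False         # "reconfigure" at runtime, no rebuild needed
print(quantize(10.0, cfg))   # wraps around instead
```

Changing a field of `cfg` between calls plays the role of writing a configuration variable on the running hardware.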
2: Data Management • Ability to dynamically observe any variable’s value at the user’s request • Ability to overwrite a variable’s value at runtime and continue operation • Ability to rewind system state within the bounds of buffer capacity
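The three abilities listed here (observe, overwrite, rewind within buffer capacity) can be sketched in software as a bounded history per variable. This is only an illustrative model under assumed names (`VariableHistory`, `record`, `rewind`); it does not describe how the hardware buffers are actually built.

```python
from collections import deque


class VariableHistory:
    """Bounded per-variable history (hypothetical software model)."""

    def __init__(self, capacity: int):
        # Once full, appending drops the oldest sample automatically.
        self._samples = deque(maxlen=capacity)

    def record(self, cycle: int, value: int) -> None:
        """Store the value the variable held at the given cycle."""
        self._samples.append((cycle, value))

    def current(self):
        """Observe the most recent (cycle, value) pair."""
        return self._samples[-1]

    def overwrite(self, value: int) -> None:
        """Replace the most recent value; execution would continue from it."""
        cycle, _ = self._samples[-1]
        self._samples[-1] = (cycle, value)

    def rewind(self, steps: int):
        """Discard the newest `steps` samples and return the restored one."""
        if steps >= len(self._samples):
            raise ValueError("rewind exceeds buffered history")
        for _ in range(steps):
            self._samples.pop()
        return self._samples[-1]


hist = VariableHistory(capacity=4)
for cycle in range(6):
    hist.record(cycle, cycle * 10)
print(hist.current())   # newest sample
print(hist.rewind(2))   # state two cycles earlier
```

The `ValueError` marks the "bounds of buffer capacity": samples older than the buffer depth cannot be restored.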
2: Data Management Requirements • Too expensive to re-implement the hardware to expose new data • All variables are streamed into local and off-chip storage, such as DRAM and disks • Unlike software, hardware is highly parallel, and often deeply pipelined • Memory requirements could be extreme • Can be offset by hierarchical memory architecture and/or periodic sampling
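A rough estimate shows why the memory requirements could be extreme and how periodic sampling offsets them. The variable count and width below are made-up illustrative numbers, and the 10 MHz clock is just the upper end of the user clock range shown elsewhere in the deck; only the formula itself is meant to carry over.

```python
def stream_rate_bytes_per_s(n_variables: int,
                            bits_per_variable: int,
                            clock_hz: float,
                            sample_period: int = 1) -> float:
    """Bytes per second produced when every variable is captured
    once every `sample_period` clock cycles."""
    bits_per_s = n_variables * bits_per_variable * clock_hz / sample_period
    return bits_per_s / 8


# Illustrative only: 1000 variables, 16 bits each, 10 MHz user clock.
every_cycle = stream_rate_bytes_per_s(1000, 16, 10e6)
every_fourth = stream_rate_bytes_per_s(1000, 16, 10e6, sample_period=4)
print(every_cycle / 1e9, "GB/s when capturing every cycle")
print(every_fourth / 1e9, "GB/s when capturing every 4th cycle")
```

Under these assumed numbers, capturing every cycle would exceed the 12.8 GB/s DDR400 peak quoted for BEE2, while sampling every fourth cycle would fit.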
3: Process Control • Inherit the most useful features of software debuggers like GDB • Cycle-by-cycle (single-step) execution • Breakpoints (either state dependent, or fixed cycle count) • Implemented using multiple clock domains and clock buffer control • Already available for use on BEE2
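The debugger-style controls listed above can be modeled as a gate on the design clock. The following is a behavioral sketch only: `ClockGate` and its methods are hypothetical names, and the real mechanism uses clock domains and clock buffer control rather than software.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


class ClockGate:
    """Hypothetical model of a debugger that gates the user-design clock."""

    def __init__(self, design_step: Callable[[State], None], state: State):
        self.design_step = design_step   # advances the design by one cycle
        self.state = state
        self.cycle = 0
        self.cycle_breakpoints: set = set()                         # fixed cycle counts
        self.state_breakpoints: List[Callable[[State], bool]] = []  # state dependent

    def _hit(self) -> bool:
        return (self.cycle in self.cycle_breakpoints
                or any(cond(self.state) for cond in self.state_breakpoints))

    def step(self) -> None:
        """Single-step: release exactly one clock edge."""
        self.design_step(self.state)
        self.cycle += 1

    def run(self, max_cycles: int) -> bool:
        """Burst: run up to max_cycles; return True if a breakpoint halted it."""
        for _ in range(max_cycles):
            self.step()
            if self._hit():
                return True
        return False


def accumulate(state: State) -> None:
    state["acc"] += 3   # stand-in for the user design


gate = ClockGate(accumulate, {"acc": 0})
gate.cycle_breakpoints.add(5)
print(gate.run(100), gate.cycle, gate.state["acc"])
```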
Walkthrough: Design • Use specialized libraries to provide soft configuration • Integrates directly into the existing BEE2 tool flow
Walkthrough: Tagging • User tags signals of interest with debugging testpoints • Defines a variable name • Defines other parameters of interest for data observation • Also includes breakpoints and assertions
Walkthrough: Stitching • “Stitcher” updates the design before entering back-end tool flow • Inserts logic as needed for debug functions • Instantiates PowerPC core and master controller • Adds underlying connections to route data
Walkthrough: Runtime • User can monitor variables and control process execution from remote client • Embedded PowerPC software provides a thin service layer • Client is fully integrated with Matlab and Simulink input description
Control Architecture on BEE2 • [Figure: block diagram. Labels: Control FPGA, Network, PPC, Clock Buffer Logic, Clock domains (100 MHz; user defined, ~1-10 MHz; single-step), Breakpoint interrupt, Control, DRAM, User FPGA, Inserted Logic, User Design]
Stitching • Stitcher traverses the design hierarchy and: • Replaces debugging component placeholders with necessary logic • Creates a simple route from all variables to off-chip storage devices • During execution, the stitcher records: • A mapping between variable names and their physical variable unit in hardware • The latency within the variable routing network
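The hierarchy traversal described here might look roughly like the sketch below. The `Block` structure, the `"testpoint"` and `"vcu"` kind strings, and the sequential unit ids are all assumptions made for illustration; the routing network and its latency bookkeeping are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """Hypothetical node in the design hierarchy."""
    name: str
    kind: str = "logic"   # "testpoint" marks a debugging placeholder
    children: List["Block"] = field(default_factory=list)


def stitch(root: Block) -> Dict[str, int]:
    """Replace testpoint placeholders in place and return a mapping
    from hierarchical variable name to an assigned unit id."""
    mapping: Dict[str, int] = {}

    def visit(block: Block, path: str) -> None:
        full = f"{path}/{block.name}" if path else block.name
        if block.kind == "testpoint":
            block.kind = "vcu"            # swap placeholder for inserted logic
            mapping[full] = len(mapping)  # next free unit id
        for child in block.children:
            visit(child, full)

    visit(root, "")
    return mapping


design = Block("top", children=[
    Block("fir", children=[Block("acc", kind="testpoint")]),
    Block("out", kind="testpoint"),
])
print(stitch(design))
```

The returned dictionary stands in for the name-to-hardware mapping that the runtime client would later use to look up a variable.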
Variable Control Unit (VCU) • Inserted in place of each variable block in design • Automatically implied for every state variable in a state machine • Combination of local buffers and off-chip DRAM • Exact memory allocation is subject to experimentation
Debug Controller (DC) • Interface between all variable and assertion instances, the runtime user shell, and process control “services” • Regulates the system clock both for exceptions and to prevent variable storage overflows
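Regulating the clock to prevent storage overflow amounts to backpressure: the design clock is held whenever the next cycle's samples would not fit in the buffer. The sketch below is a simplified model with hypothetical parameter names and one abstract "tick" per drain opportunity; it is not the controller's actual logic.

```python
def run_with_backpressure(total_cycles: int,
                          buffer_capacity: int,
                          drain_per_tick: int,
                          samples_per_cycle: int = 1):
    """Return (ticks_elapsed, stalled_ticks) for a design clock that is
    gated whenever the variable buffer lacks room for another cycle."""
    if drain_per_tick <= 0 or samples_per_cycle > buffer_capacity:
        raise ValueError("design could never make progress")
    buffered = done = ticks = stalled = 0
    while done < total_cycles:
        ticks += 1
        buffered = max(0, buffered - drain_per_tick)   # off-chip storage drains
        if buffered + samples_per_cycle <= buffer_capacity:
            buffered += samples_per_cycle              # clock enabled: one design cycle
            done += 1
        else:
            stalled += 1                               # clock held: no sample is lost
    return ticks, stalled


print(run_with_backpressure(10, buffer_capacity=2, drain_per_tick=1, samples_per_cycle=2))
print(run_with_backpressure(10, buffer_capacity=2, drain_per_tick=2, samples_per_cycle=2))
```

When the drain rate matches the capture rate the design never stalls; when it is lower, the design slows down instead of dropping data.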
Future Work • Complete infrastructure for BEE2 • Extensive experiments with variable memory • Efficient methods for variable routing • Storage requirements and hierarchy • Time/Space tradeoffs for periodic sampling • Generalize framework to define concepts such as variable priorities, multiple debug levels, and extensions to text-based languages