RAMP Infrastructure

Krste Asanovic, UC Berkeley
RAMP Tutorial, ISCA/FCRC, San Diego, June 10, 2007

RAMP: An infrastructure to build simulators using FPGAs

[Figure: a target model (four CPUs, an interconnect network, and DRAM) is run on a host platform; running the target model on the host platform is the hard work.]

Reduce, Reuse, Recycle
• Reduce effort to build target models
  • Users just build components; infrastructure handles connections (The RDL Compiler)
• Reuse components by having good abstractions
  • Across different target models
  • Across different host platforms
    • XUP, Calinx, BEE2, BEE3, also Altera (see Greg)
• Recycle existing IP for use as simulation models
  • Commercial processor RTL is its own model

RAMP Target Models
[Figure: Units A, B, and C connected by a pipeline channel and a FIFO channel.]
Units
• Relatively large chunks of functionality
  • e.g., processor + L1 cache
• User-written in some HDL or software
Channels
• Point-to-point, unidirectional, two kinds:
  • FIFO channel: flow-controlled interface
  • Pipeline channel: simple shift register; bits drop off the end
• Generated by RAMP infrastructure

Target FIFO Channel Parameters
[Figure: a FIFO channel of width Datawidth between an ENQ/RDY sender interface and a DEQ/RDY receiver interface, annotated with Buffering, Forward Latency, and Reverse Latency.]
• Need buffering of at least (Forward + Reverse) latency to get full bandwidth over the link
• RAMP infrastructure instantiates the channel with the desired parameters
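To see why buffering must cover the round trip, here is a minimal Python sketch. It assumes the FIFO channel uses credit-based flow control, that the receiver drains every arriving item immediately, and that both latencies are at least one cycle; `fifo_channel_throughput` and its parameters are hypothetical names. Each credit can be reused only once per (Forward + Reverse) latency, so having fewer credits than that caps the delivered bandwidth.

```python
from collections import deque


def fifo_channel_throughput(buffering: int, fwd_latency: int,
                            rev_latency: int, cycles: int = 1000) -> float:
    """Fraction of cycles that deliver an item over a credit-flow-controlled
    FIFO channel whose receiver drains every arrival immediately (assumed)."""
    if fwd_latency < 1 or rev_latency < 1:
        raise ValueError("latencies must be at least one cycle")
    credits = buffering                          # free receiver slots known to the sender
    data_pipe = deque([False] * fwd_latency)     # items in flight towards the receiver
    credit_pipe = deque([False] * rev_latency)   # credits in flight back to the sender
    delivered = 0
    for _ in range(cycles):
        if credit_pipe.popleft():                # a credit returns to the sender
            credits += 1
        arrived = data_pipe.popleft()            # an item reaches the receiver
        if arrived:
            delivered += 1
        credit_pipe.append(arrived)              # receiver dequeues at once, returns the credit
        send = credits > 0                       # ENQ allowed only while RDY (credit held)
        if send:
            credits -= 1
        data_pipe.append(send)
    return delivered / cycles
```

With forward and reverse latencies of 3 cycles, `fifo_channel_throughput(6, 3, 3)` comes out at 0.997 (only the initial 3-cycle fill is lost), whereas `fifo_channel_throughput(2, 3, 3)` delivers only about a third of the link bandwidth.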

Target Pipeline Channel Parameters
[Figure: a pipeline channel of width Datawidth built from a chain of registers (D), annotated with Forward Latency.]
• Only recommended for expert use in target models
  • (Target designs should use FIFO channels and latency-insensitive protocols instead)
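A pipeline channel has no RDY/ENQ/DEQ handshake, so the receiver must accept a value every target cycle or lose it. The Python sketch below models that behavior; `PipelineChannel` and `tick` are hypothetical names, and `None` stands for a bubble (no valid data that cycle).

```python
from collections import deque
from typing import Any, Optional


class PipelineChannel:
    """Hypothetical model of a target pipeline channel: a shift register with
    `latency` stages and no flow control."""

    def __init__(self, latency: int) -> None:
        if latency < 1:
            raise ValueError("latency must be at least 1")
        # A deque with maxlen discards its oldest entry on append once full.
        self._stages = deque([None] * latency, maxlen=latency)

    def tick(self, value: Optional[Any]) -> Optional[Any]:
        """Advance one target cycle: shift `value` in (None = bubble) and
        return the value that falls off the end this cycle."""
        out = self._stages[0]
        self._stages.append(value)
        return out


ch = PipelineChannel(latency=3)
outputs = [ch.tick(v) for v in [1, 2, 3, 4, 5]]
print(outputs)  # [None, None, None, 1, 2]
```

Values 3, 4, and 5 are still inside the register chain after five ticks; if the receiver ignored the channel on some cycle, the value emerging then would simply be gone.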

RAMP Description Language (RDL) [Greg Gibeling, UCB]
• User describes target model topology, channel parameters, and (manual) mapping to host platform FPGAs using RDL
• RDL Compiler (RDLC) generates configurations
[Figure: the target (Units A, B, and C connected by channels) is processed by RDLC into a host design spread across FPGA1 and FPGA2, with generated unit wrappers around each unit and generated links carrying the channels between FPGAs.]
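RDL has its own syntax, which is not reproduced here. As a hypothetical Python stand-in, the sketch below records the same kind of information (units, channel parameters, and a manual unit-to-FPGA mapping) and shows one job a compiler like RDLC has to do: finding which channels cross FPGA boundaries and therefore must be carried over generated links. The class, the function, and the particular unit placement are illustrative assumptions, not the figure's exact mapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Channel:
    name: str
    src: str       # sending unit
    dst: str       # receiving unit
    kind: str      # "fifo" or "pipeline"
    width: int     # datawidth in bits
    latency: int   # forward latency in target cycles


def inter_fpga_links(channels: List[Channel],
                     placement: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Group channels whose endpoints sit on different FPGAs by (from, to)
    FPGA pair; each group would be carried over one generated host link."""
    links: Dict[Tuple[str, str], List[str]] = {}
    for ch in channels:
        if ch.kind not in ("fifo", "pipeline"):
            raise ValueError(f"unknown channel kind: {ch.kind}")
        src_fpga, dst_fpga = placement[ch.src], placement[ch.dst]
        if src_fpga != dst_fpga:
            links.setdefault((src_fpga, dst_fpga), []).append(ch.name)
    return links


# Hypothetical target: A -> B (FIFO), B -> C (pipeline), A -> C (FIFO).
channels = [
    Channel("a_to_b", "A", "B", "fifo", 32, 2),
    Channel("b_to_c", "B", "C", "pipeline", 16, 1),
    Channel("a_to_c", "A", "C", "fifo", 64, 1),
]
placement = {"A": "FPGA1", "B": "FPGA2", "C": "FPGA1"}   # manual mapping
print(inter_fpga_links(channels, placement))
# {('FPGA1', 'FPGA2'): ['a_to_b'], ('FPGA2', 'FPGA1'): ['b_to_c']}
```

The channel between A and C stays on FPGA1, so it needs no link; only the unit wrappers would change if the placement were edited.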

Virtual Target Clock

Virtualized RTL Improves FPGA Resource Usage
• RAMP allows units to run at varying target-host clock ratios to optimize area and overall performance
• Example 1: Multiported register file
  • E.g., Sun Niagara has 3 read ports and 2 write ports to 6KB of register storage
  • If the RTL is mapped directly, it requires 48K flip-flops
    • Slow cycle time, large area
  • If mapped into block RAMs (one read + one write per cycle), it takes 3 host cycles and 3x2KB block RAMs
    • Faster cycle time (~3X) and far fewer resources
• Example 2: Large L2/L3 caches
  • Current FPGAs only have ~1MB of on-chip SRAM
  • Use on-chip SRAM to build a cache of the active piece of the L2/L3 cache; stall the target cycle if an access misses, and fetch the data from off-chip DRAM
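For scale, 6KB of register storage is 6 × 1024 × 8 = 49,152 bits, which is where the ~48K flip-flops of a direct mapping come from. The Python sketch below models the virtualized alternative under stated assumptions: a hypothetical `VirtualizedRegFile` serves the 3 reads and 2 writes of one target cycle from a single memory that does one read and one write per host cycle, taking 3 host cycles per target cycle. Deferring writes to the next target cycle and forwarding from them is one possible design chosen for this sketch, not necessarily how a RAMP unit does it.

```python
from typing import List, Tuple

READ_PORTS = 3     # target read ports (port counts from the Niagara example)
WRITE_PORTS = 2    # target write ports
HOST_CYCLES = 3    # host cycles per target cycle with a 1R1W memory


class VirtualizedRegFile:
    """Hypothetical 3R/2W target register file time-multiplexed onto one
    memory doing one read and one write per host cycle.

    Writes from target cycle n are committed during target cycle n + 1;
    reads forward from still-pending writes, so they always see the state
    at the start of their own target cycle.
    """

    def __init__(self, num_regs: int) -> None:
        self.mem = [0] * num_regs                  # stands in for block RAM
        self.pending: List[Tuple[int, int]] = []   # (addr, value) not yet in mem
        self.host_cycles = 0

    def target_cycle(self, read_addrs: List[int],
                     writes: List[Tuple[int, int]]) -> List[int]:
        if len(read_addrs) > READ_PORTS or len(writes) > WRITE_PORTS:
            raise ValueError("too many accesses for one target cycle")
        results = []
        for i in range(HOST_CYCLES):
            if i < len(read_addrs):                # read port: one read per host cycle
                addr = read_addrs[i]
                forwarded = [v for a, v in self.pending if a == addr]
                results.append(forwarded[-1] if forwarded else self.mem[addr])
            if self.pending:                       # write port: one commit per host cycle
                addr, value = self.pending.pop(0)
                self.mem[addr] = value
            self.host_cycles += 1
        self.pending = list(writes)                # committed next target cycle
        return results


rf = VirtualizedRegFile(num_regs=8)
print(rf.target_cycle([1, 2, 3], [(1, 10), (2, 20)]))   # [0, 0, 0]
print(rf.target_cycle([1, 2, 3], []))                   # [10, 20, 0]
print(rf.host_cycles)                                   # 6
```

Because at most two writes are pending and each target cycle spans three host cycles, the single write port always drains the previous cycle's writes in time.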

Start/Done Timing Interface Wrapper
[Figure: a unit with inputs In1 and In2, output Out, and Start/Done timing signals.]
• Wrapper generated by RDL asserts “Start” on the physical FPGA cycle when the inputs to the unit are ready for the next target cycle
• Unit asserts “Done” when it finishes the target cycle and its outputs are ready
• Unit can take a variable amount of time
• Unvirtualized RTL unit can connect “Done” to “Start” (but must not clock until “Start”)
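The Python sketch below shows why a unit may take a variable number of host cycles per target cycle, using the L2/L3 cache example: a hit in the on-chip copy finishes in one host cycle, while a miss must first fetch from DRAM. `CacheUnit`, `run_wrapper`, the 1- and 10-host-cycle costs, and FIFO replacement are all assumptions for illustration, and this wrapper assumes each input is ready as soon as the previous target cycle is Done.

```python
from collections import deque
from typing import List, Tuple


class CacheUnit:
    """Hypothetical unit modeling one target cache access per target cycle."""

    HIT_HOST_CYCLES = 1     # assumed: hit in the on-chip copy
    MISS_HOST_CYCLES = 10   # assumed: includes the off-chip DRAM fetch

    def __init__(self, capacity_lines: int) -> None:
        if capacity_lines < 1:
            raise ValueError("capacity must be at least one line")
        self.capacity = capacity_lines
        self.lines: deque = deque()   # resident line addresses, FIFO replacement
        self.busy = 0                 # host cycles left in the current target cycle

    def start(self, addr: int) -> None:
        """Start: inputs for the next target cycle are ready."""
        if addr in self.lines:
            self.busy = self.HIT_HOST_CYCLES
        else:
            self.busy = self.MISS_HOST_CYCLES
            if len(self.lines) == self.capacity:
                self.lines.popleft()
            self.lines.append(addr)

    def clock(self) -> bool:
        """Advance one host cycle; return True when Done is asserted."""
        self.busy -= 1
        return self.busy == 0


def run_wrapper(unit: CacheUnit, inputs: List[int]) -> Tuple[int, int]:
    """Assert Start whenever the unit is idle and an input is ready; return
    (target cycles completed, host cycles used)."""
    pending = deque(inputs)
    idle, target_cycles, host_cycles = True, 0, 0
    while pending or not idle:
        if idle and pending:
            unit.start(pending.popleft())
            idle = False
        host_cycles += 1
        if unit.clock():
            target_cycles += 1
            idle = True
    return target_cycles, host_cycles


print(run_wrapper(CacheUnit(capacity_lines=4), [0, 0, 1]))   # (3, 21)
```

The three target cycles take 21 host cycles (two misses at 10 host cycles each, one hit). The target-visible behavior is unaffected; only simulation speed changes.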

Distributed Timing Models

Distributed Timing Example
[Figure: Target: Unit A sends to Unit B over a pipeline channel (D) of latency L. Host: Units A and B, each with Start/Done signals, connected through a FIFO with ENQ/DEQ and RDY signals.]
• Pipeline target channel implemented as a distributed FIFO with at least L buffers
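One way to see why a latency-L pipeline channel can be a distributed host FIFO: preload the FIFO with L tokens representing the stages' reset contents, let Unit A Start only when the FIFO has space and Unit B Start only when a token is ready, and the target-visible behavior becomes independent of host timing. In the hypothetical Python sketch below, random choices stand in for the two units running at different, varying host speeds; `distributed_pipeline` and its parameters are illustrative names.

```python
import random
from collections import deque
from typing import List, Optional


def distributed_pipeline(latency: int, fifo_capacity: int, n_target_cycles: int,
                         seed: int = 0) -> List[Optional[int]]:
    """Unit A -> latency-L pipeline target channel -> Unit B, run on a host
    where each unit advances its own target cycle whenever it can."""
    if fifo_capacity < max(latency, 1):
        raise ValueError("need at least L (and at least one) host buffers")
    rng = random.Random(seed)
    # The L initial tokens are the L pipeline stages' contents at reset.
    fifo = deque([None] * latency)
    a_target_cycle = 0
    received: List[Optional[int]] = []
    while len(received) < n_target_cycles:
        # Unit B: Start when its input token is ready; Done dequeues it.
        # The random draw models B being slow on some host cycles.
        if fifo and rng.random() < 0.5:
            received.append(fifo.popleft())
        # Unit A: Start when the FIFO has space; Done enqueues its output,
        # here simply the number of the target cycle that produced it.
        if len(fifo) < fifo_capacity and rng.random() < 0.7:
            fifo.append(a_target_cycle)
            a_target_cycle += 1
    return received


# Whatever the host-level interleaving, B sees A's output delayed by exactly L.
for seed in range(3):
    print(distributed_pipeline(latency=3, fifo_capacity=3, n_target_cycles=8, seed=seed))
# each line: [None, None, None, 0, 1, 2, 3, 4]
```

Extra host buffering beyond L only lets Unit A run further ahead in host time; it does not change what Unit B observes in target time.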

Timing Target FIFO Channel
[Figure: a target FIFO channel of latency L built from data registers (D) carrying data forwards and credits backwards, with credit control between the ENQ/RDY and DEQ/RDY interfaces.]
• Can build a timed credit-based flow control (CBFC) FIFO inside the target model, using pipeline channels to communicate data forwards and credits backwards
• But this puts two CBFCs in series (one in the target unit, one hidden in the host implementation of the pipeline channels)
• RDL can generate a unified FIFO that merges both of these behind the FIFO interface
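The target-level construction in the first bullet can be sketched in Python as follows: a credit counter at the sender, data forwards and credits backwards over two fixed-latency pipeline channels, and a receiver that drains at a random rate. The names and the random drain model are assumptions. The sketch checks that the receiver's buffer never holds more than `buffer_slots` items and that data arrive in order; it models only the target-level CBFC, not the second one a host implementation of the pipeline channels would add.

```python
import random
from collections import deque
from typing import Any, List, Optional, Tuple


class PipelineChannel:
    """Fixed-latency shift register with no flow control."""

    def __init__(self, latency: int) -> None:
        if latency < 1:
            raise ValueError("latency must be at least 1")
        self._stages = deque([None] * latency, maxlen=latency)

    def tick(self, value: Optional[Any]) -> Optional[Any]:
        out = self._stages[0]
        self._stages.append(value)
        return out


def cbfc_fifo(buffer_slots: int, latency: int, n_items: int,
              drain_prob: float, seed: int = 0) -> Tuple[List[int], int]:
    """Target-level credit-based FIFO; returns (items received, peak occupancy)."""
    if buffer_slots < 1 or not 0.0 < drain_prob <= 1.0:
        raise ValueError("need buffer_slots >= 1 and drain_prob in (0, 1]")
    rng = random.Random(seed)
    data_pipe, credit_pipe = PipelineChannel(latency), PipelineChannel(latency)
    credits = buffer_slots                 # receiver slots the sender may still fill
    rx_buffer: deque = deque()
    next_item, max_occupancy = 0, 0
    received: List[int] = []
    while len(received) < n_items:
        send = None
        if credits > 0 and next_item < n_items:   # ENQ only while holding a credit
            send = next_item
            next_item += 1
            credits -= 1
        arriving = data_pipe.tick(send)
        if arriving is not None:
            rx_buffer.append(arriving)
        max_occupancy = max(max_occupancy, len(rx_buffer))
        credit = None
        if rx_buffer and rng.random() < drain_prob:   # DEQ frees a slot...
            received.append(rx_buffer.popleft())
            credit = 1                                # ...and returns one credit
        if credit_pipe.tick(credit) is not None:
            credits += 1
    return received, max_occupancy


items, peak = cbfc_fifo(buffer_slots=4, latency=3, n_items=50, drain_prob=0.3, seed=1)
print(items == list(range(50)), peak <= 4)   # True True
```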

Other Automatically Generated Networks
• Control network has the workstation as master and every unit as a slave device
  • Memory-mapped interface with block transfers
  • Used for initialization, stats gathering, debugging, and monitoring
• Units can connect to DRAM resources outside of timed target channels
  • Used to support emulation and virtualization state
• Units can communicate with each other outside of timed target channels
  • Supports arbitrary communication, e.g., for distributed stats gathering
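A minimal Python sketch of the control network's memory-mapped view, assuming each unit exposes one contiguous address range and that a block transfer stays within a single unit; `ControlNetwork`, `attach`, `read_block`, `write_block`, and the addresses are hypothetical, not a RAMP API.

```python
from typing import List, Tuple


class ControlNetwork:
    """Hypothetical model: workstation as master, each unit a memory-mapped slave."""

    def __init__(self) -> None:
        self._regions: List[Tuple[int, int, bytearray, str]] = []

    def attach(self, name: str, base: int, size: int) -> bytearray:
        """Map a unit's control space at [base, base + size)."""
        for b, s, _, other in self._regions:
            if base < b + s and b < base + size:
                raise ValueError(f"{name} overlaps {other}")
        space = bytearray(size)
        self._regions.append((base, size, space, name))
        return space

    def _locate(self, addr: int, length: int) -> Tuple[bytearray, int]:
        for base, size, space, _ in self._regions:
            if base <= addr and addr + length <= base + size:
                return space, addr - base
        raise ValueError(f"no unit maps [{addr:#x}, {addr + length:#x})")

    def write_block(self, addr: int, data: bytes) -> None:
        """Block write from the workstation, e.g. to initialize a unit."""
        space, off = self._locate(addr, len(data))
        space[off:off + len(data)] = data

    def read_block(self, addr: int, length: int) -> bytes:
        """Block read, e.g. to gather statistics or inspect debug state."""
        space, off = self._locate(addr, length)
        return bytes(space[off:off + length])


net = ControlNetwork()
cpu0 = net.attach("cpu0", base=0x1000, size=0x100)
net.write_block(0x1010, bytes([1, 2, 3, 4]))
print(net.read_block(0x1010, 4), cpu0[0x10])   # b'\x01\x02\x03\x04' 1
```

Because this traffic bypasses the timed target channels, reads and writes here do not advance or perturb target time in this model.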

Wide Variety of RAMP Simulators

Simulator Design Choices
• Structural analog versus highly virtualized
• Functional-only versus functional+timing
• Timing via (virtual) RTL design versus separate functional and timing models
• Hybrid software/hardware simulators
We're trying to build layers of abstractions that are useful to all types of simulators.
We're also trying to make modules in different styles interoperate.
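To illustrate the "separate functional and timing models" style, here is a toy Python sketch: the functional model executes a made-up three-instruction ISA and emits a trace, and a separate timing model charges assumed per-instruction latencies to that trace. The ISA, the `LATENCY` table, and the function names are hypothetical; a real timing model would track pipelines, caches, and contention rather than summing fixed costs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Inst:
    op: str        # "addi", "load", or "bnez"
    rd: int = 0    # destination register
    rs: int = 0    # source register
    imm: int = 0   # immediate / branch offset


def functional_model(program: List[Inst], memory: List[int],
                     max_steps: int = 1000) -> Tuple[List[int], List[str]]:
    """Compute architectural results only; record which ops executed."""
    regs = [0] * 4
    pc, trace = 0, []
    while pc < len(program) and len(trace) < max_steps:
        inst = program[pc]
        trace.append(inst.op)
        if inst.op == "addi":
            regs[inst.rd] = regs[inst.rs] + inst.imm
        elif inst.op == "load":
            regs[inst.rd] = memory[regs[inst.rs] + inst.imm]
        elif inst.op == "bnez":
            if regs[inst.rs] != 0:
                pc += inst.imm
                continue
        else:
            raise ValueError(f"unknown op {inst.op}")
        pc += 1
    return regs, trace


LATENCY: Dict[str, int] = {"addi": 1, "load": 3, "bnez": 2}   # assumed target cycles


def timing_model(trace: List[str]) -> int:
    """Charge target cycles to the functional trace (no overlap modeled)."""
    return sum(LATENCY[op] for op in trace)


program = [
    Inst("addi", rd=1, rs=0, imm=3),    # r1 = 3
    Inst("addi", rd=1, rs=1, imm=-1),   # r1 -= 1
    Inst("bnez", rs=1, imm=-1),         # loop while r1 != 0
    Inst("load", rd=2, rs=0, imm=0),    # r2 = mem[0]
]
regs, trace = functional_model(program, memory=[42])
print(regs[2], len(trace), timing_model(trace))   # 42 8 13
```

Changing `LATENCY` changes the reported cycle count without touching the functional model, which is the appeal of the split.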

Effective Abstractions Hide Details

…But Provide Interoperability

Work in Progress: Stay Tuned
