
SCORE

Stream Computations Organized for Reconfigurable Execution (SCORE). Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek, University of California, Berkeley – BRASS group. André DeHon, California Institute of Technology – Dept. Computer Science. http://brass.cs.berkeley.edu/SCORE/


Presentation Transcript


  1. Stream Computations Organized for Reconfigurable Execution (SCORE) • Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek, University of California, Berkeley – BRASS group • André DeHon, California Institute of Technology – Dept. Computer Science • http://brass.cs.berkeley.edu/SCORE/

  2. Goal: Software Survival • Software for microprocessors survives on new devices • Binary compatibility • Automatic improvement • Software for reconfigurable devices does not • Substantial effort to port/redeploy FPL 2000 (8/30/00)

  3. Outline • Problem: Software Survival • A New Compute Model • SCORE Components • Preliminary Results • Future Work

  4. Why Can’t Reconfig. Software Survive? • Resource constraints/sizes are exposed: • to programmer • in low-level representation (netlist) • Design revolves around device size • Algorithmic structure • Exploited parallelism

  5. The SCORE Approach • A compute model with unbounded resources • Efficient hardware virtualization • Demand paging

  6. Page-Compatible Devices • Family of devices with: • Common page definition • Varying number of pages • Binary Compatibility • Automatic Performance Improvement

  7. Virtualizing a Netlist (is bad) • Netlist is sensitive to timing • Disallow asynchronous features (e.g. busses) • Synchronous • WASMII [Ling+Amano, FCCM ’93] • Page I/O via registers • Execute each cycle of every page • Huge reconfiguration overhead! [Figure: page execution timeline; labels: Execute, Reconfigure, time]

  8. Previous Attempts at Virtualization • Multi-context • DPGA [DeHon, FPGA ’94] • TM-FPGA [Xilinx, FCCM ’97] • Configuration Cache • Striped • PipeRench [CMU, FPGA ’98] • Pipelined reconfiguration • Restricted to feed-forward pipelines

  9. Streams • Stream is: • Unidirectional page-to-page link • FIFO queue of data tokens • Unbounded depth • Goal • Less frequent reconfiguration • Batch process block of inputs • Amortize reconfiguration cost over large data set
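The amortization goal on slide 9 can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the one-cycle-per-token default are assumptions, and the 5000-cycle reconfiguration cost is borrowed from the functional simulation parameters on slide 17.

```python
def reconfig_overhead(reconfig_cycles: int, tokens_per_batch: int,
                      cycles_per_token: int = 1) -> float:
    """Fraction of total time spent reconfiguring when a page is loaded
    once and then processes a whole batch of buffered tokens.

    cycles_per_token=1 is an assumption made for illustration.
    """
    compute_cycles = tokens_per_batch * cycles_per_token
    return reconfig_cycles / (reconfig_cycles + compute_cycles)


if __name__ == "__main__":
    # Larger batches spread the same reconfiguration cost over more work.
    for batch in (1, 100, 10_000, 1_000_000):
        print(batch, round(reconfig_overhead(5000, batch), 4))
```

With a batch of one token the page spends almost all of its time reconfiguring, which is the netlist-virtualization problem from slide 7; with a large buffered batch the overhead becomes a small fraction of the run.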

  10. Stream Implementation • Only one endpoint (page) loaded • Stream = memory buffer • Desire distributed, on-chip memory • Both endpoints (pages) loaded • Stream = wire
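A minimal sketch of the rule on slide 10, written as a software model rather than hardware. The class and method names are hypothetical, and treating the case where neither endpoint is loaded as a memory buffer is an assumption the slide does not state.

```python
from collections import deque


class Stream:
    """Unidirectional FIFO link between a producer page and a consumer page."""

    def __init__(self, producer: str, consumer: str):
        self.producer = producer
        self.consumer = consumer
        self.tokens = deque()  # unbounded depth, as in the compute model

    def write(self, token):
        self.tokens.append(token)

    def read(self):
        return self.tokens.popleft()

    def implementation(self, loaded_pages) -> str:
        """Pick the physical realization from the set of resident pages."""
        both_loaded = (self.producer in loaded_pages
                       and self.consumer in loaded_pages)
        # Assumption: with fewer than two endpoints resident, tokens must be
        # held in memory until the missing endpoint is loaded.
        return "wire" if both_loaded else "memory buffer"
```

For example, `Stream("DCT", "Zig-zag").implementation({"DCT"})` yields `"memory buffer"`, while passing both page names yields `"wire"`.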

  11. Execution Example: Spatial [Figure: pipeline stages DCT, Zig-zag, Quantize / ZLE, Huffman Enc.]

  12. Execution Example: Time-Multiplexed [Figure: pipeline stages DCT, Zig-zag, Quant / ZLE, Huffman Enc.]
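The two execution examples on slides 11 and 12 can be contrasted with a toy model. Everything below is a sketch under stated assumptions: the four lambdas are arithmetic placeholders, not real DCT, zig-zag, quantization, or Huffman operators, and the time-multiplexed case assumes a single physical page that runs each stage to completion over the buffered inputs.

```python
from collections import deque

# Placeholder operators standing in for the four pipeline stages.
STAGES = [
    ("DCT", lambda x: x * 2),
    ("Zig-zag", lambda x: x + 1),
    ("Quantize / ZLE", lambda x: x // 2),
    ("Huffman Enc.", lambda x: x - 1),
]


def run_spatial(stages, inputs):
    """All stages resident at once: each token flows straight through."""
    results = []
    for x in inputs:
        for _name, op in stages:
            x = op(x)
        results.append(x)
    return results


def run_time_multiplexed(stages, inputs):
    """One physical page: load a stage, drain its input buffer, move on."""
    buffer = deque(inputs)
    reconfigurations = 0
    for _name, op in stages:
        reconfigurations += 1        # load this stage's configuration
        out = deque()
        while buffer:                # batch-process every buffered token
            out.append(op(buffer.popleft()))
        buffer = out                 # output stream feeds the next stage
    return list(buffer), reconfigurations
```

Both functions produce the same results; the time-multiplexed run pays one reconfiguration per stage per batch, independent of how many tokens are in the batch.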

  13. SCORE Components • Graph-based Compute Model • Scheduler • Run-time Support • Hardware Support

  14. SCORE Compute Model • Computation = graph of compute nodes • Concretely: compute pages • Abstractly: operators with local state (FSM) • Communication = streaming data flow • Storage = • Streams • Memory segments, accessed through streams
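To illustrate "operators with local state (FSM)" communicating through streams, here is a hedged sketch of a single operator. The zero-run encoder is chosen only for illustration and is not taken from the SCORE sources; the class name, the `fire` method, and the use of `deque` objects as streams are all assumptions.

```python
from collections import deque


class ZeroRunEncoder:
    """Operator with local state: counts zeros seen since the last
    nonzero token and emits (zero_run_length, value) pairs."""

    def __init__(self, inp: deque, out: deque):
        self.inp = inp      # input stream (FIFO of tokens)
        self.out = out      # output stream
        self.zeros = 0      # local FSM state

    def fire(self) -> bool:
        """Consume one input token if available; report whether progress was made."""
        if not self.inp:
            return False
        token = self.inp.popleft()
        if token == 0:
            self.zeros += 1
        else:
            self.out.append((self.zeros, token))
            self.zeros = 0
        return True


if __name__ == "__main__":
    src, dst = deque([0, 0, 5, 0, 3, 7]), deque()
    op = ZeroRunEncoder(src, dst)
    while op.fire():
        pass
    print(list(dst))
```

Because the operator touches only its own state and its streams, it can be suspended between any two firings, which is what makes swapping the page in and out feasible.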

  15. SCORE Hardware Model • Paged FPGA • Compute Page (CP) • Fixed-size slice of RC hardware • Fixed number of I/O ports • Distributed, on-chip memory • Configurable Memory Block (CMB) • Stream access • High-level interconnect • Microprocessor • Run-time support + user code

  16. SCORE Run-Time Support • Mechanics of run-time reconfiguration • Page swap [context save/load] • Reconfigure interconnect • Page Scheduling • Which page to run where, when • Static … Dynamic
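The "which page to run where, when" question can be sketched with the simplest possible policy. The round-robin rule below is an assumption made for illustration and is not the SCORE scheduler; a realistic scheduler would also weigh stream occupancy and reconfiguration cost. The function name and parameters are hypothetical.

```python
from collections import deque


def schedule_round_robin(virtual_pages, physical_pages: int, timeslices: int):
    """Return, for each timeslice, the list of virtual pages made resident.

    Pages are taken from the front of a ready queue and re-queued at the
    back after their slice, so every page gets a turn.
    """
    ready = deque(virtual_pages)
    plan = []
    for _ in range(timeslices):
        count = min(physical_pages, len(ready))
        resident = [ready.popleft() for _ in range(count)]
        plan.append(resident)
        ready.extend(resident)   # swapped-out pages wait for their next turn
    return plan
```

With three virtual pages on two physical pages, each timeslice swaps in a different pair; when the device has at least as many physical pages as the design has virtual pages, the same set stays resident in every slice and no swapping is needed.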

  17. Functional Simulation • FPGA based on HSRA [Berkeley, FPGA ’99] • CP: 512 4-LUTs • CMB: 2Mbit DRAM • Area for CP-CMB pair: • 0.25 µm: 12.9 mm² (1/9 of PII-450) • 0.18 µm: 6.7 mm² (1/16 of PIII-600) • Page reconfiguration: 5000 cycles (from CMB) • Synchronous operation (same clock speed as processor) • x86 microprocessor • Page Scheduler task • Swap on timer interrupt (every 250,000 cycles) • Fully dynamic scheduling

  18. Applications • Multimedia processing applications • Hand-partitioned into 512-LUT pages • Good applications • Primarily feed-forward (feedback loops fit in HW) • Bad applications • Large, tight feedback loops (e.g. ADPCM)

      Application       Pages   Segments
      JPEG Encode       13      6
      JPEG Decode       13      4
      MPEG Encode       45      102
      Wavelet Encode    14      6
      Wavelet Decode    15      6

  19. Application: JPEG Encode

  20. Scaling Results: JPEG Encode [Figure: total time (makespan, in millions of cycles) plotted against the number of physical compute pages]

  21. Summary • SCORE enables software survival on reconfigurable systems • Binary compatibility • Automatic performance scaling • Virtual Hardware • Requirements: • Graph-based compute model • Paged FPGA hardware • Run-time support for RTR/Scheduling

  22. Future Work • Compilation/CAD • Partitioning FSM operators into pages • Study architectural parameters • Page size • CMB size • Tolerable reconfiguration time • Scheduling • Static scheduling

  23. More Info on the Web • SCORE project: • http://brass.cs.berkeley.edu/SCORE/ • Tutorial: • http://brass.cs.berkeley.edu/documents/score_tutorial.html
