
Configurable Soft Processor Arrays Using the OpenFire Processor

Stephen Craven, Cameron Patterson, Peter Athanas. Configurable Computing Lab, Virginia Tech.



Presentation Transcript

  1. Configurable Soft Processor Arrays Using the OpenFire Processor • Stephen Craven, Cameron Patterson, Peter Athanas • Configurable Computing Lab, Virginia Tech

  2. Outline • Motivation • Single Chip Multi-Processors • Application-Specific Instruction set Processors • OpenFire Processor • Features and Configurability • Performance • Configurable Array Example: Median Image Filtering • Optimizations • Performance Comparisons

  3. Motivation: SCMP • Moving towards Single Chip Multi-Processors (SCMP) because: • Underutilized silicon budget • Diminishing ROI on Instruction Level Parallelism • Design and verification too costly • SCMPs more energy efficient • SCMPs can leverage existing IP • SCMPs by nature are easily scalable • Fast, on-chip inter-processor communication • SCMP is fashionable (Cell, Pentium D, Athlon x2) • Hard and soft processors in Xilinx and Altera FPGAs

  4. Motivation: ASIP • Application-Specific Instruction set Processors (ASIP) allow: • Optimum match of instruction set to application • Performance benefits approaching ASICs while retaining programmability • Architectural features customized to application • Datapath width sizing • Memory and cache hierarchy tuning • Available commercially through Tensilica • Complete design flows and generated custom toolsets • $$$ • Academic/Research use through ASIPMeister • Closed source • GUI only

  5. Motivation: Configurable Arrays • Merging SCMP with ASIP combines benefits of both: • Reduced design time utilizing existing IP • Programmability of SCMP with performance improvements of ASIP • FPGAs ideal platform for configurable array research and implementation • Rapid prototyping • Mature tool chains • Xilinx and Altera offer devices with embedded processing cores (PPC and ARM)

  6. OpenFire • Configurable 32-bit RISC processor • Specialized for processor arrays • Instructions based on Xilinx MicroBlaze • Uses MicroBlaze tool chain (mb-gcc, XPS, etc.) • Can execute a subset of MicroBlaze code without modification • All MicroBlaze instructions supported except for division, barrel shifting, and status-register and cache-related instructions • Not burdened by features unused in arrays (interrupts, exceptions, caches, interfaces) • Open source • Released under MIT license • Support utilities provided (C simulator, BRAM loaders, etc.) • Differs from previously available MicroBlaze clone aeMB: • Works correctly and is extensively documented

  7. Performance • Cycle accurate with MicroBlaze except for: • Multiply has 5-cycle latency (3 for MicroBlaze) • Single-cycle instruction fetches (2 cycles for MicroBlaze) • At 100 MHz on a Xilinx Virtex-II Pro 30, speed grade 6: • OpenFire: 641 slices, 58.47 DMIPS • MicroBlaze: 734 slices, 58.98 DMIPS* • Performance varies with configuration: • 16-bit datapath implementation reduces area to 402 slices and raises clock speed to 106 MHz • * Minimal MicroBlaze implementation (no OPB, division unit, barrel shifter, or cache) at 100 MHz
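The slice counts and DMIPS figures on slide 7 can be combined into a rough performance-per-area number. The following is only an illustrative sketch; the helper name `dmips_per_slice` is hypothetical and is not part of the OpenFire utilities.

```c
#include <assert.h>

/* Hypothetical helper: throughput per unit area, expressed as
   Dhrystone MIPS divided by the number of FPGA slices occupied. */
double dmips_per_slice(double dmips, unsigned slices)
{
    assert(slices > 0);              /* avoid division by zero */
    return dmips / (double)slices;
}
```

With the numbers quoted on the slide, `dmips_per_slice(58.47, 641)` is about 0.0912 for OpenFire and `dmips_per_slice(58.98, 734)` is about 0.0804 for the minimal MicroBlaze, so OpenFire delivers roughly 13.5% more DMIPS per slice at the same clock.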

  8. Extensibility • [Figure: OpenFire datapath block diagram. Labels, in extraction order: ALU, PC, Register File, 32x32, Mult*, Add, PC, MSB, Imm, Data Mem, Bit Fns, Compare] • Additional instructions, including multicycle operations, can be easily added inside ALU without affecting critical path • Potential for at least 10 new 2-operand instructions in instruction space

  9. Extensibility • OpenFire datapath customizable from 32 bits downwards • Instructions are always 32 bits wide • Custom datapath widths limit program size • Program Counter is treated the same as any data word • 8-bit datapath => 64-instruction program • 16-bit datapath => 16,384-instruction program • Planned extensions include: • Increasing number of Fast Simplex Link (FSL) bus I/Os • Fast ALU-to-FSL and FSL-to-ALU operations • Additional debugging capabilities
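The program-size limits on slide 9 follow from simple arithmetic once two assumptions are made explicit: program memory is byte-addressed, and the program counter is exactly as wide as the datapath. Under those assumptions (which match the slide's numbers but are not spelled out on it), a hedged sketch with a hypothetical helper `max_instructions` is:

```c
#include <assert.h>

/* Assumption: every instruction occupies 4 bytes (32 bits) of
   byte-addressed program memory. */
enum { INSTRUCTION_BYTES = 4 };

/* Hypothetical helper: largest program, in instructions, that a program
   counter of `datapath_bits` bits can address. */
unsigned long long max_instructions(unsigned datapath_bits)
{
    assert(datapath_bits >= 2 && datapath_bits <= 32);
    unsigned long long addressable_bytes = 1ULL << datapath_bits;
    return addressable_bytes / INSTRUCTION_BYTES;
}
```

Here `max_instructions(8)` gives 64 and `max_instructions(16)` gives 16,384, matching the two cases listed on the slide.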

  10. Case Study: Image Filtering • 3x3 Median Image Filter written in C • Soft Processor Arrays created • Master node – MicroBlaze with DDR SDRAM • Slave nodes – OpenFires connected in ring network with master
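The slides do not include the filter source, so the following is only a minimal sketch of what a 3x3 median filter in C can look like. It is not the authors' code: the function names, the 8-bit grayscale row-major image layout, and the choice to copy border pixels unchanged are all assumptions made for illustration, and the work distribution across the ring of slave nodes is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of nine samples via a partial selection sort: once the five
   smallest values are in place, v[4] holds the median. Reorders v. */
static uint8_t median9(uint8_t v[9])
{
    for (int i = 0; i < 5; ++i) {
        int min = i;
        for (int j = i + 1; j < 9; ++j) {
            if (v[j] < v[min]) {
                min = j;
            }
        }
        uint8_t tmp = v[i];
        v[i] = v[min];
        v[min] = tmp;
    }
    return v[4];
}

/* 3x3 median filter over a row-major 8-bit grayscale image.
   Assumption: border pixels are copied through unfiltered. */
void median_filter_3x3(const uint8_t *src, uint8_t *dst,
                       size_t width, size_t height)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            if (y == 0 || x == 0 || y + 1 == height || x + 1 == width) {
                dst[y * width + x] = src[y * width + x];
                continue;
            }
            /* Gather the 3x3 neighborhood centered on (x, y). */
            uint8_t window[9];
            size_t k = 0;
            for (size_t dy = 0; dy < 3; ++dy) {
                for (size_t dx = 0; dx < 3; ++dx) {
                    window[k++] = src[(y - 1 + dy) * width + (x - 1 + dx)];
                }
            }
            dst[y * width + x] = median9(window);
        }
    }
}
```

Because each output pixel depends only on its own 3x3 neighborhood in the source image, rows can be processed independently, which is what makes this kind of filter a natural fit for an array of identical slave processors.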

  11. Array Creation Process • Automated flow for array creation • Edit DEFINE.V to set processor parameters • Create C code for master MicroBlaze and slave OpenFires • Verification of C code available through XMD simulator and simple OpenFire C simulator • Makefile-based flow automatically: • Creates ring network of desired size • Compiles programs and initializes BRAMs • Runs the EDK tool flow to generate a bitstream • FSL debugging bus on the OpenFire provides observability into the processor during operation

  12. Array Results • Slave processor area reduced 45% by downsizing datapath to 16 bits • Required only slight modifications to original C code • Allows more OpenFires on chip, increasing throughput • Near-linear speedup with increasing array size

  13. Future Directions • Research goal: Automated flow for creating optimized heterogeneous arrays of soft processors • Input – Parallel HLL description of application • Optimizations: datapath sizing, instruction removal / addition, dual-issue processor cores, ALU-to-network and network-to-ALU operations, microcode controller, full datapath implementations • Optimization objective: Maximize array throughput by • Increasing individual node throughput • Reducing area to add additional nodes

  14. Conclusion • Configurable soft processor arrays offer the best of SCMPs and ASIPs • Simplified design • Improved performance • OpenFire processor designed for use in processor arrays • Excellent performance / area • Highly configurable • Datapath width adjustment can produce noticeable performance improvement

  15. References • OpenFire source code and utilities: http://www.ccm.ece.vt.edu/~scraven/ • James-Roxby, P., Schumacher, P., and Ross, C., “A Single Program Multiple Data Parallel Processing Platform for FPGAs,” FCCM’04
