Configurable Soft Processor Arrays Using the OpenFire Processor
Stephen Craven, Cameron Patterson, Peter Athanas
Configurable Computing Lab, Virginia Tech
Outline
• Motivation
  • Single Chip Multi-Processors
  • Application-Specific Instruction set Processors
• OpenFire Processor
  • Features and Configurability
  • Performance
• Configurable Array Example: Median Image Filtering
  • Optimizations
  • Performance Comparisons
Motivation: SCMP
• Moving towards Single Chip Multi-Processors (SCMP) because:
  • Underutilized silicon budget
  • Diminishing returns on Instruction Level Parallelism
  • Design and verification too costly
  • SCMPs more energy efficient
  • SCMPs can leverage existing IP
  • SCMPs by nature are easily scalable
  • Fast, on-chip inter-processor communication
• SCMP is fashionable (Cell, Pentium D, Athlon 64 X2)
• Hard and soft processors in Xilinx and Altera FPGAs
Motivation: ASIP
• Application-Specific Instruction set Processors (ASIP) allow:
  • Optimum match of instruction set to application
  • Performance benefits approaching ASICs while retaining programmability
  • Architectural features customized to application
    • Datapath width sizing
    • Memory and cache hierarchy tuning
• Available commercially through Tensilica
  • Complete design flows and generated custom toolsets
  • Costly ($$$)
• Academic/Research use through ASIPMeister
  • Closed source
  • GUI only
Motivation: Configurable Arrays
• Merging SCMP with ASIP combines benefits of both:
  • Reduced design time utilizing existing IP
  • Programmability of SCMP with performance improvements of ASIP
• FPGAs ideal platform for configurable array research and implementation
  • Rapid prototyping
  • Mature tool chains
  • Xilinx and Altera offer devices with embedded processing cores (PPC and ARM)
OpenFire
• Configurable 32-bit RISC processor
• Specialized for processor arrays
• Instructions based on Xilinx MicroBlaze
  • Uses MicroBlaze tool chain (mb-gcc, XPS, etc.)
  • Can execute a subset of MicroBlaze code without modification
  • All MicroBlaze instructions supported except for division, barrel shifting, and status-register and cache-related instructions
  • Not burdened by features unused in arrays (interrupts, exceptions, caches, interfaces)
• Open source
  • Released under MIT license
  • Support utilities provided (C simulator, BRAM loaders, etc.)
• Differs from previously available MicroBlaze clone aeMB:
  • Works correctly and is extensively documented
Performance
• Cycle accurate with MicroBlaze except for:
  • Multiply has 5-cycle latency (3 for MicroBlaze)
  • Single-cycle instruction fetches (2 cycles for MicroBlaze)
• 100 MHz on a Xilinx Virtex-II Pro 30, speed grade 6

  Processor    Area         Performance
  OpenFire     641 slices   58.47 DMIPS
  MicroBlaze   734 slices   58.98 DMIPS*

• Performance variable depending on configuration:
  • 16-bit datapath implementation reduces area to 402 slices, speed increases to 106 MHz

* Minimal MicroBlaze implementation (no OPB, division unit, barrel shifter, or cache) at 100 MHz
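The area and DMIPS figures above can be combined into a single performance-per-area number. The following is a minimal sketch of that arithmetic, not part of the OpenFire distribution; the function name `dmips_per_slice` is hypothetical.

```c
#include <assert.h>

/* Performance per unit area: DMIPS delivered per occupied FPGA slice.
 * Hypothetical helper for illustration; not part of the OpenFire utilities. */
double dmips_per_slice(double dmips, unsigned slices)
{
    assert(slices > 0);   /* a design must occupy at least one slice */
    return dmips / (double)slices;
}
```

Using the slide's figures, `dmips_per_slice(58.47, 641)` is roughly 0.091 for the OpenFire and `dmips_per_slice(58.98, 734)` is roughly 0.080 for the minimal MicroBlaze.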
Extensibility
[Figure: OpenFire datapath block diagram. Labels: ALU, PC, Register File, 32x32 Mult*, Add, PC, MSB, Imm, Data Mem, Bit Fns, Compare]
• Additional instructions, including multicycle operations, can be easily added inside the ALU without affecting the critical path
• Potential for at least 10 new 2-operand instructions in the instruction space
Extensibility
• OpenFire datapath customizable from 32 bits downwards
  • Instructions are a constant 32 bits wide
• Custom datapath widths limit program size
  • Program Counter is treated the same as any data word
  • 8-bit datapath => 64-instruction program
  • 16-bit datapath => 16,384-instruction program
• Planned extensions include:
  • Increasing number of Fast Simplex Link (FSL) bus I/Os
  • Fast ALU-to-FSL and FSL-to-ALU operations
  • Additional debugging capabilities
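The program-size limits follow from the Program Counter being as wide as the datapath. The sketch below reproduces the two figures on the slide under one assumption that the slide does not state outright: the PC holds a byte address, so each 32-bit instruction consumes four addresses. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Every instruction is 32 bits wide regardless of datapath width. */
#define BYTES_PER_INSTRUCTION 4u

/* Largest program, in instructions, that a PC of the given width can address.
 * Assumes the PC holds a byte address (an assumption consistent with the
 * slide's 64 and 16,384 figures). */
uint64_t max_program_instructions(unsigned datapath_bits)
{
    assert(datapath_bits >= 2 && datapath_bits <= 32);
    uint64_t addressable_bytes = (uint64_t)1 << datapath_bits;
    return addressable_bytes / BYTES_PER_INSTRUCTION;
}
```

For an 8-bit datapath this gives 2^8 / 4 = 64 instructions, and for a 16-bit datapath 2^16 / 4 = 16,384 instructions, matching the slide.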
Case Study: Image Filtering
• 3x3 Median Image Filter written in C
• Soft Processor Arrays created
  • Master node – MicroBlaze with DDR SDRAM
  • Slave nodes – OpenFires connected in ring network with master
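For readers unfamiliar with the kernel, a 3x3 median filter replaces each pixel with the median of its nine-pixel neighborhood. The following is a generic, single-processor sketch in C, not the authors' code; the function names, the 8-bit grayscale row-major layout, and the choice to copy border pixels unchanged are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of nine samples: a partial selection sort that places the five
 * smallest values in order, so v[4] ends up holding the median. */
static uint8_t median9(uint8_t v[9])
{
    for (int i = 0; i <= 4; i++) {
        int min = i;
        for (int j = i + 1; j < 9; j++)
            if (v[j] < v[min])
                min = j;
        uint8_t t = v[i];
        v[i] = v[min];
        v[min] = t;
    }
    return v[4];
}

/* 3x3 median filter over an 8-bit grayscale image stored row-major.
 * Border pixels are copied unchanged (an assumption; the slides do not
 * say how borders were handled). */
void median_filter_3x3(const uint8_t *src, uint8_t *dst,
                       size_t width, size_t height)
{
    assert(src != dst);   /* filtering in place would corrupt neighborhoods */
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            size_t idx = y * width + x;
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                dst[idx] = src[idx];
                continue;
            }
            uint8_t window[9];
            int n = 0;
            for (size_t dy = 0; dy < 3; dy++)
                for (size_t dx = 0; dx < 3; dx++)
                    window[n++] = src[(y - 1 + dy) * width + (x - 1 + dx)];
            dst[idx] = median9(window);
        }
    }
}
```

The kernel uses only comparisons, loads, and stores on 8-bit data, with no division or barrel shifts, which is the kind of workload that fits a reduced-width OpenFire datapath.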
Array Creation Process
• Automated flow for array creation
  • Edit DEFINE.V to set processor parameters
  • Create C code for master MicroBlaze and slave OpenFires
  • Verification of C code available through XMD simulator and simple OpenFire C simulator
  • Makefile-based flow automatically:
    • Creates ring network of desired size
    • Compiles programs and initializes BRAMs
    • Runs the EDK tool flow to generate a bitstream
• FSL debugging bus on the OpenFire provides observability into the processor during operation
Array Results
• Slave processor area reduced 45% by downsizing datapath to 16 bits
  • Required only slight modifications to original C code
  • Allows more OpenFires on chip, increasing throughput
• Near-linear speedup with increasing array size
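The link between node area and array size can be sketched with simple integer arithmetic. This is an illustrative model only: the function name and the slice budget are hypothetical, the per-node slice counts are the standalone figures from the Performance slide (641 for 32-bit, 402 for 16-bit), and the model ignores the master node, the ring interconnect, and BRAM limits.

```c
#include <assert.h>

/* Number of identical slave processors that fit in a given slice budget.
 * Hypothetical helper; ignores the master node, interconnect, and BRAM. */
unsigned slaves_that_fit(unsigned slice_budget, unsigned slices_per_slave)
{
    assert(slices_per_slave > 0);
    return slice_budget / slices_per_slave;   /* integer division rounds down */
}
```

With a hypothetical budget of 10,000 slices, `slaves_that_fit(10000, 641)` gives 15 nodes with the 32-bit datapath and `slaves_that_fit(10000, 402)` gives 24 nodes with the 16-bit datapath. If speedup is near-linear in array size, as the slide reports, the extra nodes translate almost directly into throughput.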
Future Directions
• Research goal: Automated flow for creating optimized heterogeneous arrays of soft processors
  • Input – Parallel HLL description of application
  • Optimizations: datapath sizing, instruction removal / addition, dual-issue processor cores, ALU-to-network and network-to-ALU operations, microcode controller, full datapath implementations
  • Optimization objective: Maximize array throughput by
    • Increasing individual node throughput
    • Reducing area to add additional nodes
Conclusion
• Configurable soft processor arrays offer the best of SCMPs and ASIPs
  • Simplified design
  • Improved performance
• OpenFire processor designed for use in processor arrays
  • Performance per area on par with or better than a minimal MicroBlaze (58.47 DMIPS in 641 slices vs. 58.98 DMIPS in 734 slices)
  • Highly configurable
  • Datapath width adjustment shrinks each node and raises clock speed, allowing more nodes and higher array throughput
References
• OpenFire source code and utilities: http://www.ccm.ece.vt.edu/~scraven/
• James-Roxby, P., Schumacher, P., and Ross, C. “A Single Program Multiple Data Parallel Processing Platform for FPGAs,” FCCM ’04