
Application-Specific Customization of FPGA Soft-core Processors

Journal Paper Presentation. Application-Specific Customization of FPGA Soft-core Processors. Presented by: Ahmad Sghaier Course Instructor: Dr. Shawki Areibi Course: ENGG 6090*6 – Winter07 Date: Apr. 5 th , 2007. Outlines. Introduction. Parameterized Soft-cores.


Presentation Transcript


  1. Journal Paper Presentation Application-Specific Customization of FPGA Soft-core Processors Presented by: Ahmad Sghaier Course Instructor: Dr. Shawki Areibi Course: ENGG 6090*6 – Winter07 Date: Apr. 5th, 2007

  2. Outline • Introduction. • Parameterized Soft-cores. • Micro-architectural Trade-offs and ISA Subsetting. • Fast Application-specific Customization. • Conclusion.

  3. Resources • P. Yiannacouras, J. Steffan and J. Rose, “Exploration and Customization of FPGA-Based Soft Processors,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007. • D. Sheldon, R. Kumar, R. Lysecky, F. Vahid and D. Tullsen, “Application-Specific Customization of Parameterized FPGA Soft-Core Processors,” IEEE/ACM Int. Conf. on Computer-Aided Design, Nov. 2006.

  4. Soft-core vs. Hard-core • A hard-core processor is laid out on the chip next to the FPGA’s configurable logic fabric. • A soft-core processor is synthesized onto the FPGA’s fabric, just like any other circuit. • Soft-core processor advantages: • Uses standard mass-produced FPGA parts. • Enables a custom number of microprocessors. • Soft-core processor disadvantages: • Reduced processor performance. • Higher power consumption. • Larger size.

  5. Commercial Soft-cores • Xilinx MicroBlaze • A 32-bit soft-core processor. • A single-issue, in-order execution processor. • Five configurable components: multiplier, barrel shifter, divider, floating-point unit (FPU), and data cache. • Altera Nios II. It has three mostly unparameterized variations: • Nios II/e, a small unpipelined processor at 6 cycles per instruction (CPI), with serial shifter and software multiplication; • Nios II/s, a five-stage pipeline with multiplier-based barrel shifter, hardware multiplication, and instruction cache; • Nios II/f, a large six-stage pipeline with dynamic branch prediction, and instruction and data caches.

  6. Parameterized Soft-cores • Configurability. • Application Specific. • Size, performance and power constraints. • Configurable Parameters: • Instantiating Functional Units (0,1). • Unit-Specific Parameters (Cache type/size). • Instruction Set Architecture. • Pipelining (Depth).

  7. Exploration and Customization of FPGA-Based Soft Processors • Exploration of the micro-architectural tradeoffs for soft processors • A set of customization techniques: • Tuning the micro-architecture to the application. • Subsetting the ISA • Hybrid approach • To improve the performance/area of a soft processor for a specific application. • A CAD Tool.

  8. Approach • Developing a customization tool that will generate the most customized soft-core. • SPREE (soft-processor rapid exploration environment). • Targeting functional unit customization and ISA subsetting.

  9. SPREE • Input: textual description (ISA & datapath). • ISA & datapath verification. • Datapath construction. • Control generation. • Output: synthesizable RTL (Verilog).

  10. Framework • Altera Stratix I. • Comparison with Nios II variations (e, s and f). • MIPS instruction set. • Performance Metrics • Area in logic elements (LEs) • Performance in MIPS • Efficiency in MIPS/LE • Equal weight for performance and area • Benchmark • 20 varied applications (FIR, FFT, DES, CRC, QSORT, Bubble-sort)
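The efficiency metric on this slide can be illustrated with a short calculation. This is only a sketch: the instruction count, cycle count, clock frequency, and area below are made-up numbers, not measurements from the paper, and the helper names `mips` and `efficiency` are hypothetical.

```python
def mips(instructions, cycles, clock_mhz):
    """Millions of instructions per second.

    Runtime is cycles / (clock_mhz * 1e6) seconds, so
    MIPS = instructions / runtime / 1e6 = instructions / cycles * clock_mhz.
    """
    return instructions / cycles * clock_mhz


def efficiency(mips_value, area_le):
    """Performance per area in MIPS per logic element (LE)."""
    return mips_value / area_le


# Hypothetical processor: CPI of 2 at 80 MHz, occupying 1000 LEs.
perf = mips(instructions=1_000_000, cycles=2_000_000, clock_mhz=80.0)
eff = efficiency(perf, area_le=1000)
print(f"{perf:.1f} MIPS, {eff:.3f} MIPS/LE")
```

Dividing performance by area gives both quantities equal weight, so a design that doubles MIPS while doubling LEs scores the same as the original.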

  11. SPREE vs. Nios

  12. Micro-architecture Exploration (1) • Functional Units • Shifter Implementation (serial, shared multiplier) • Multiplication (SW, HW).

  13. Micro-architecture Exploration (2) • Pipelining • Depth • Organization

  14. Micro-architecture Customization • 6 micro-architectural axes • Exhaustive search for the generated solutions.
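An exhaustive search over micro-architectural axes can be sketched as follows. The three axes, their values, and the cost model `toy_model` are invented for illustration (the slide mentions six axes and real synthesis results); only the search loop itself reflects the idea on the slide.

```python
from itertools import product

# Hypothetical axes; the paper explores six, with real measurements.
AXES = {
    "shifter": ["serial", "multiplier"],
    "multiply": ["software", "hardware"],
    "pipeline_depth": [2, 3, 4, 5],
}


def toy_model(cfg):
    """Made-up stand-in for synthesis + benchmarking: returns (mips, area_le)."""
    mips = 20.0 + 5.0 * cfg["pipeline_depth"]
    area = 600 + 100 * cfg["pipeline_depth"]
    if cfg["multiply"] == "hardware":
        mips += 10.0
        area += 200
    if cfg["shifter"] == "serial":
        mips -= 5.0
        area -= 50
    return mips, area


def exhaustive_search(axes, model):
    """Evaluate every combination and keep the best MIPS/LE score."""
    names = list(axes)
    best = None
    for values in product(*(axes[n] for n in names)):
        cfg = dict(zip(names, values))
        mips, area = model(cfg)
        score = mips / area
        if best is None or score > best[0]:
            best = (score, cfg)
    return best


best_score, best_cfg = exhaustive_search(AXES, toy_model)
print(best_cfg, round(best_score, 4))
```

The number of evaluations is the product of the axis sizes (16 here), which is why exhaustive search is only practical for a small design space.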

  15. ISA Subsetting • Eliminate the unused instructions. • Simplify the control unit → reduce area. • Less than 50% utilization of the ISA.
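The first step of ISA subsetting, finding which instructions an application never uses, can be sketched like this. The 12-instruction `ISA` set and the four-line program are hypothetical, and the parsing assumes one instruction per line with the mnemonic first and no labels.

```python
# Hypothetical 12-instruction ISA, for illustration only.
ISA = {"add", "sub", "and", "or", "sll", "srl",
       "lw", "sw", "beq", "bne", "mult", "div"}


def isa_subset(program, isa):
    """Return (used, unused) mnemonics for a list of assembly lines."""
    used = {line.split()[0] for line in program if line.strip()}
    unknown = used - isa
    if unknown:
        raise ValueError(f"instructions not in ISA: {sorted(unknown)}")
    return used, isa - used


program = [
    "lw $t0, 0($a0)",
    "add $t1, $t0, $t0",
    "sw $t1, 4($a0)",
    "bne $t1, $zero, -4",
]

used, unused = isa_subset(program, ISA)
utilization = len(used) / len(ISA)
print(sorted(used), f"utilization = {utilization:.0%}")
```

The instructions in `unused` are the candidates whose datapath and control logic could be dropped from the generated processor.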

  16. Impact of ISA subsetting • [Figure: impact on performance] • [Figure: impact on area]

  17. Results • Fine Customization Environment • An improvement in performance per area of 14.1% on average across all benchmarks. • The combined approach improved performance per area by 24.5% on average across all applications.

  18. Application-Specific Customization of Parameterized FPGA Soft-Core Processors • A methodology for fast application-specific customization of a parameterized FPGA soft core. • Targeting 1-2 hours Runtime • Near-optimal Results • Traditional CAD with 0-1 Knapsack Algorithm • Synthesis-in-the-loop exploration.

  19. Framework • Xilinx MicroBlaze (MB) on Virtex-II Pro FPGA • Comparison with Base and Full MB • Performance Metrics • Area in equivalent LUTs • Performance as application runtime (ms) • Benchmark • 11 applications from EEMBC

  20. Justification

  21. Approach-1 • Traditional CAD Approach • 0-1 knapsack problem • Maximize performance • Constraint on area • 6 synthesis/execution runs
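The 0-1 knapsack formulation can be sketched with the standard dynamic-programming solution. This assumes each component's area cost and runtime saving were measured independently and add up when combined; the component costs and gains below are invented numbers, not results from the paper.

```python
def knapsack(items, area_budget):
    """0-1 knapsack by dynamic programming.

    items: list of (name, area_cost, gain) with integer area_cost.
    Returns (best_total_gain, chosen_names).
    """
    n = len(items)
    # best[i][a] = max gain using the first i items within area a.
    best = [[0] * (area_budget + 1) for _ in range(n + 1)]
    for i, (_, cost, gain) in enumerate(items, 1):
        for a in range(area_budget + 1):
            best[i][a] = best[i - 1][a]
            if cost <= a and best[i - 1][a - cost] + gain > best[i][a]:
                best[i][a] = best[i - 1][a - cost] + gain
    # Walk back through the table to recover which items were taken.
    chosen, a = [], area_budget
    for i in range(n, 0, -1):
        if best[i][a] != best[i - 1][a]:
            name, cost, _ = items[i - 1]
            chosen.append(name)
            a -= cost
    return best[n][area_budget], chosen[::-1]


# Hypothetical (component, extra area in LUTs, runtime saved in ms).
components = [
    ("multiplier", 300, 40),
    ("barrel_shifter", 200, 30),
    ("divider", 400, 10),
    ("fpu", 1500, 60),
    ("dcache", 600, 50),
]

gain, chosen = knapsack(components, area_budget=1200)
print(gain, chosen)
```

With one measurement per component plus one for the base processor, the inputs to this formulation need only a handful of synthesis/execution runs, which is consistent with the six runs listed on the slide if five components are considered.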

  22. Approach-2 • Synthesis-in-the-loop • Pre-determines the impact each parameter individually has on the design metrics. • Then searches the parameters in sequence, ordered from highest impact to lowest. • Two orders (fixed-ordered and impact-ordered)
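The two steps on this slide can be sketched as a greedy search. Several things here are assumptions rather than the paper's method: impact is defined as runtime saved per unit of added area, a parameter is kept only if it fits the area budget and improves runtime, and `toy_evaluate` is a made-up stand-in for a real synthesis and execution run.

```python
# Hypothetical (runtime saved in ms, extra area in LUTs) per parameter.
TOY = {
    "multiplier": (40, 300),
    "barrel_shifter": (30, 200),
    "divider": (10, 400),
    "dcache": (50, 600),
}


def toy_evaluate(config):
    """Stand-in for synthesis + execution: returns (runtime_ms, area_luts)."""
    runtime = 200 - sum(TOY[p][0] for p in config)
    area = 1000 + sum(TOY[p][1] for p in config)
    return runtime, area


def impact_ordered_search(params, evaluate, area_budget):
    base_time, base_area = evaluate(frozenset())

    # Step 1: measure each parameter on its own.
    def impact(p):
        t, a = evaluate(frozenset([p]))
        return (base_time - t) / max(a - base_area, 1)

    order = sorted(params, key=impact, reverse=True)

    # Step 2: visit parameters from highest to lowest impact,
    # re-evaluating the full design each time (synthesis in the loop).
    config, best_time = frozenset(), base_time
    for p in order:
        trial = config | {p}
        t, a = evaluate(trial)
        if a <= area_budget and t < best_time:
            config, best_time = trial, t
    return order, config, best_time


order, config, best_time = impact_ordered_search(
    list(TOY), toy_evaluate, area_budget=2100)
print(order, sorted(config), best_time)
```

Because every candidate is re-evaluated in combination with the parameters already chosen, this kind of search can account for interactions that a purely additive model would miss, at the cost of more evaluation runs.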

  23. Results • Exhaustive search took 11 hours. • The fixed impact-ordered tree approach had the fastest runtime, 108 minutes. • The knapsack algorithm gave results similar to the fixed impact-ordered tree approach. • Results were similar for a 50% constraint. • [Figure panels: no constraint; fixed, 80% constraint; per application, 80% constraint]

  24. Results • Reimplementation on Spartan2 FPGA • 1.5 hours runtime for the fixed-order impact-ordered tree • 200 minutes for the application-specific impact-ordered tree

  25. Scalability • Increasing the number of parameters increases the runtime. • Fixed-order impact-ordered tree and knapsack scale well.
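A rough count of evaluation runs shows why exhaustive search scales poorly. This sketch assumes every parameter is binary (present or absent) and that a one-at-a-time measurement scheme needs one run per parameter plus one for the base processor; the function names are hypothetical.

```python
def exhaustive_runs(n_params):
    """Every on/off combination of n binary parameters."""
    return 2 ** n_params


def one_at_a_time_runs(n_params):
    """Base configuration plus each parameter measured alone."""
    return n_params + 1


for n in (5, 10, 20):
    print(n, exhaustive_runs(n), one_at_a_time_runs(n))
```

Under these assumptions five parameters already need 32 runs exhaustively but only 6 when measured individually, and the gap grows exponentially as parameters are added.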

  26. Conclusion • Impact of customization on performance and area. • Emphasis on performance. • Customizable parameters span the micro-architecture and the ISA. • Use of near-optimal solutions to save on runtime. • Possibility to look for finer customization, but scalability has to be addressed. • Finer customization might consider 0-1 parameters or multi-valued parameters.

  27. THANK YOU Q&A
