
Application-Specific Customization of Soft Processor Microarchitecture

Application-Specific Customization of Soft Processor Microarchitecture. Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Edward S. Rogers Sr. Department of Electrical and Computer Engineering. Processors and FPGA Systems. We seek improvement through customization.


Presentation Transcript


  1. Application-Specific Customization of Soft Processor Microarchitecture
     Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose
     University of Toronto, Edward S. Rogers Sr. Department of Electrical and Computer Engineering

  2. Processors and FPGA Systems
     We seek improvement through customization
     • Processors lie at the “heart” of FPGA systems
       [Diagram labels: UART, Soft Processor, Memory Interface, Custom Logic, Ethernet]
     • Performs coordination and even computation
     • Better processors => less hardware to design

  3. Motivating Application-Specific Customizations of Soft Processors
     We want to evaluate effectiveness of specialization
     • FPGA Configurability
       • Can consider unlimited processor variants
     • A soft processor might be used to run either:
       • A single application
       • A single class of applications
       • Many applications, but can be reconfigured
     • Applications differ in architectural requirements
       • Can specialize architecture for each application

  4. Research Goals
     Measure efficiency gained through real implementations
     • To investigate the potential for
       • “Application-tuning”
         • Tune processor microarchitecture to favour an application
         • Preserve general purpose functionality
       • “Instruction-set Subsetting”
         • Sacrifice general purpose functionality
         • Eliminate hardware not required by application
       • Combination of both methods

  5. SPREE System (Soft Processor Rapid Exploration Environment)
     [Diagram labels: Processor Description (ISA, Datapath), SPREE, RTL]
     • Input: Processor description
       • Made of hand-coded components
     • SPREE System
       • Verify ISA against datapath
       • Datapath Instantiation
       • Control Generation
         • Multi-cycle/variable-cycle FUs
         • Multiplexer select signals
         • Interlocking
         • Branch handling
     • Output: Synthesizable Verilog

  6. Back-End Infrastructure
     We can measure area/performance/energy accurately
     [Diagram labels: RTL; Benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc); Modelsim RTL Simulator; Quartus II 4.2 CAD Software; Stratix 1S40C5]
     1. Cycle Count
     2. Resource Usage
     3. Clock Frequency
     4. Power

  7. Comparison to Altera’s Nios II
     • Has three variations:
       • Nios II/e – unpipelined, no HW multiplier
       • Nios II/s – 5-stage, with HW multiplier
       • Nios II/f – 6-stage, dynamic branch prediction

  8. Architectural Parameters Used in SPREE
     We focus on core microarchitecture
     • Multiplication Support
       • Hardware FU or software routine
     • Shifter implementation
       • Flipflops, multiplier, or LUTs
     • Pipelining
       • Depth (2-7 stages)
       • Organization
       • Forwarding
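The parameters on slide 8 define a small design space. As a rough sketch of how such a space could be enumerated, the Python below builds every combination of multiplication support, shifter implementation, and pipeline depth. The class name `ProcessorConfig`, the option strings, and the assumption that every combination is legal are illustrative only and are not taken from SPREE; pipeline organization and forwarding are left out for brevity.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical option names; the choices themselves come from slide 8.
MULTIPLY_OPTIONS = ("hardware_fu", "software_routine")
SHIFTER_OPTIONS = ("flipflops", "multiplier", "luts")
PIPELINE_DEPTHS = range(2, 8)  # 2 to 7 stages inclusive


@dataclass(frozen=True)
class ProcessorConfig:
    """One point in the (assumed) design space."""
    multiply: str
    shifter: str
    pipeline_depth: int


def enumerate_configs():
    """Return every combination of the three parameters.

    Assumption: all combinations are valid. A real flow may need to
    reject some of them.
    """
    return [
        ProcessorConfig(m, s, d)
        for m, s, d in product(MULTIPLY_OPTIONS, SHIFTER_OPTIONS, PIPELINE_DEPTHS)
    ]


configs = enumerate_configs()
print(len(configs))  # 2 * 3 * 6 combinations
```

Each resulting configuration would then be generated, synthesized, and benchmarked to obtain the area and performance figures used in the experiments.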

  9. SPREE vs Nios II
     • 3-stage pipe
     • HW multiply
     • Multiply-based shifter
     [Plot axis annotations: faster, smaller]

  10. Exploration of Soft Processor Architectural Customizations
     • Architectural-tuning
     • Instruction-set subsetting
     • Combination (Arch-tuning + Subsetting)

  11. 1. Architectural Tuning Experiment
     • Vary the same parameters
       • Multiplication Support
       • Shifter implementation
       • Pipelining
     • Determine
       • Best overall (general purpose) processor
       • Best per application (application-tuned)
     • Metric: Performance per Area (MIPS/LE)
       • Basically inverse of Area-Delay product
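The tuning experiment on slide 11 can be sketched in a few lines of Python: compute MIPS/LE for each processor on each benchmark, then pick the best processor overall and the best processor per application. The measurements below are made up for illustration, the processor and benchmark names are hypothetical, and the use of an arithmetic mean to rank the "best overall" processor is an assumption, since the slides do not say how results are averaged.

```python
from statistics import mean


def mips(instructions, cycles, clock_mhz):
    """Millions of instructions per second for one benchmark run."""
    return instructions * clock_mhz / cycles


def mips_per_le(instructions, cycles, clock_mhz, area_les):
    """Performance per area: MIPS divided by logic elements (LEs)."""
    return mips(instructions, cycles, clock_mhz) / area_les


# Made-up MIPS/LE values, efficiency[processor][benchmark], for illustration only.
efficiency = {
    "proc_a": {"bench_x": 0.050, "bench_y": 0.030},
    "proc_b": {"bench_x": 0.040, "bench_y": 0.045},
}


def best_overall(eff):
    """General-purpose choice: highest mean MIPS/LE across benchmarks (assumed averaging)."""
    return max(eff, key=lambda p: mean(eff[p].values()))


def best_per_application(eff):
    """Application-tuned choice: best processor for each benchmark separately."""
    benchmarks = next(iter(eff.values())).keys()
    return {b: max(eff, key=lambda p: eff[p][b]) for b in benchmarks}


def tuning_gain(eff):
    """Mean relative gain of the per-application choice over the overall choice."""
    general = best_overall(eff)
    tuned = best_per_application(eff)
    ratios = [eff[tuned[b]][b] / eff[general][b] for b in tuned]
    return mean(ratios) - 1.0


print(best_overall(efficiency))
print(best_per_application(efficiency))
print(round(tuning_gain(efficiency), 3))
```

Because MIPS/LE divides throughput by area, maximizing it is equivalent to minimizing area multiplied by time per instruction, which is why the slide calls it basically the inverse of the area-delay product.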

  12. Performance per Area of All Processors
     [Chart annotations: 32%, 14.1%]

  13. 2. Instruction-set Subsetting
     • SPREE automatically removes
       • Unused connections
       • Unused components
     • Reduce processor by reducing the ISA
     • Can create application-specific processor
       • Eliminate unused parts of the ISA
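The idea of subsetting on slide 13 can be illustrated with a minimal Python sketch: given which instructions an application actually executes, keep only the datapath components that at least one of those instructions needs. The component names, the instruction mnemonics, and the mapping between them are hypothetical and are not SPREE's actual component library or its removal algorithm.

```python
# Hypothetical mapping from datapath component to the instructions that use it.
COMPONENT_USERS = {
    "multiplier": {"mult", "multu"},
    "shifter": {"sll", "srl", "sra"},
    "adder": {"add", "addu", "sub", "lw", "sw", "beq"},
    "data_memory_port": {"lw", "sw"},
}


def subset_datapath(component_users, used_instructions):
    """Split components into those still needed and those that can be removed.

    A component is kept if at least one instruction the application uses
    depends on it; otherwise it is a candidate for removal.
    """
    kept = {
        component
        for component, users in component_users.items()
        if users & used_instructions
    }
    removed = set(component_users) - kept
    return kept, removed


# Hypothetical application that never multiplies or shifts.
app_instructions = {"add", "lw", "sw", "beq"}
kept, removed = subset_datapath(COMPONENT_USERS, app_instructions)
print(sorted(kept))
print(sorted(removed))
```

In this toy example the multiplier and shifter are dropped, which mirrors the slide's point that reducing the ISA reduces the processor.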

  14. Instruction-set Usage of Benchmarks
     Strong potential for hardware reduction
     • Applications do not use complete ISA

  15. Area Reduction from Subsetting
     Area reduced by 60% in some, 23% on average
     Similar reductions for energy, small impact on performance
     [Chart: axis "Fraction of Area"; annotation: 23%]

  16. 3. Combining Application Tuning and Instruction-set Subsetting
     • Subsetting is effective on its own
     • Can apply subsetting on top of tuning
     • Compare different customization methods
       • Tuning
       • Subsetting
       • Tuning + Subsetting

  17. Combining Application Tuning and Instruction-set Subsetting
     Tuning reduces the waste that subsetting eliminates
     [Chart annotations: 25%, 16%, 14%]

  18. Summary of Presented Architectural Conclusions
     • Application tuning
       • 14% average efficiency gain
       • Will increase with more architectural axes
     • Instruction-set Subsetting
       • Up to 60% area & energy savings
       • 16% average efficiency gain
     • Combined Tuning & Subsetting
       • 25% average efficiency gain

  19. Future Work
     • Consider other promising architectural axes
       • Branch prediction, aggressive forwarding
       • ISA changes
       • Datapaths (e.g. VLIW)
       • Caches and memory hierarchy
     • Compiler assistance
       • Can improve tuning & subsetting
