Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling

Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling • Seonil Choi, Ronald Scrofano, and Viktor K. Prasanna • University of Southern California • MAPLD 2002, September 2002 • Funded by the DARPA Power-aware Computing and Communications program

Outline • Motivation • Design Methodology • Example Matrix Multiplication Designs • Results • MILAN

FPGAs: Current Trends • Large FPGAs (40M+ gates) • Embedded multipliers, processors • Military and commercial systems using FPGAs • Digital Signal Processing: matrix operations, FFT, window operations, filtering • Image processing • Internet • Performance metrics • Energy, Latency, and Area

Mapping Kernel Applications onto FPGAs • FPGAs lack a fixed structure comparable to that of general purpose processors • Too fine-grained to model at a high level • Very large design space • Many degrees of freedom • Cannot simulate all designs at a low level • Energy-efficient designs • Analyze efficiency early in design cycle • Analyze effect of algorithm changes • Consider energy efficiency vs. area and latency • Energy consumed by configurable logic blocks and routing • Look-up tables, flip flops, registers, RAM • Various length interconnects

Design Methodology: Overview • Use domain-specific modeling • Explore the design space at a high level • Verify chosen designs at a low level • Select a set of designs • Steps: 1. Domain Selection 2. Domain-Specific Modeling 3. Tradeoff Analysis and Manual Design Space Exploration 4. Low-Level Simulation of Candidate Designs [Figure labels: Kernel Application, Energy-Efficient Design]

Domain • A family of architectures and algorithms for a given kernel application • E.g. matrix multiplication on a linear array • Fixes architecture of FPGA • FPGA too fine-grained to model at a high level • No fixed structure comparable to that of a general purpose processor • Difficult to model at a high level • Domain imposes high-level structure • Facilitates high-level modeling and high-level performance analysis [Figure labels: Architecture, FPGA]

1. Domain Selection • Choose domains by analyzing algorithms and architectures for a given kernel • Tradeoffs in Energy, Area, Latency [Figure: Kernel; Various Architecture Families (Domain 1, Domain 2, ..., Domain n); for each domain: Domain-Specific Modeling, System-wide Energy Function, Design Space Exploration and Optimizations]

2. Domain-Specific Modeling (1) • High-level model • Model parameters are specific to the domain • Identify only those parameters that make a significant impact on energy consumption • Others need not be studied • Design is abstracted to allow easier (but coarse) tradeoff analysis and design space exploration • Benefit: Rapid evaluation of architectures and algorithms without low-level simulation • Identify candidate designs that meet requirements [Figure labels: Domain-Specific Model (parameterized); Domain (fixed architecture); FPGA (flexible architecture)]

Domain-Specific Modeling (2) [Figure labels: Domain; Components (RModules, Interconnects); Component-specific parameters (n, pe, f, sa); Function Estimation; Component-specific power function; Component power state matrices; System-wide energy function; Specific design in the domain; System-wide energy]
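The labels on this slide suggest that per-component power functions and per-component power state information are combined into one system-wide energy function. A minimal sketch of that idea follows, assuming energy is the sum over components and power states of power times time in that state. The function names, the two power states, and every coefficient are invented placeholders, not values from the talk.

```python
# Hypothetical sketch: system-wide energy as the sum over components and
# power states of (power in that state) x (time spent in that state).
# All coefficients below are invented placeholders, not measured values.

def mac_power(precision_bits, state):
    """Illustrative power function (mW) for a multiply-accumulate RModule."""
    active = 0.5 * precision_bits  # assumed linear in precision
    return {"on": active, "off": 0.0}[state]

def link_power(precision_bits, state):
    """Illustrative power function (mW) for an interconnect between PEs."""
    return {"on": 0.1 * precision_bits, "off": 0.0}[state]

def system_energy(components, precision_bits, clock_mhz):
    """Return energy in nJ.

    components: list of (power_function, {state: cycles_in_state}).
    cycles / MHz gives microseconds, and mW * microseconds gives nJ.
    """
    total = 0.0
    for power_fn, cycles_by_state in components:
        for state, cycles in cycles_by_state.items():
            time_us = cycles / clock_mhz
            total += power_fn(precision_bits, state) * time_us
    return total

# One MAC and one link, each with an assumed cycle count per power state.
design = [
    (mac_power, {"on": 27, "off": 3}),
    (link_power, {"on": 9, "off": 21}),
]
energy_nj = system_energy(design, precision_bits=8, clock_mhz=100)
print(f"estimated energy: {energy_nj:.3f} nJ")
```

Because the function is cheap to evaluate, changing a parameter such as precision or the cycles spent in each state gives a new estimate immediately, without rerunning low-level tools.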

Estimation of Power Functions (3) • Using sample implementations • VHDL coding, simulation, measuring power using XPower • Estimation method • Generate random input vectors for estimation • Repeat experiments for statistical significance [Figure labels: Architecture, parameters with ranges of a component; VHDL code for sample designs; MILAN Model Interpreters; Low-level Simulators (XPower, ModelSim, ...); Power estimates; Power function builder (curve fitting ...); Component-specific power function]
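The slide mentions building a power function from sample power estimates by curve fitting. As a rough illustration only, the sketch below fits a straight line by ordinary least squares. The linear form, the choice of precision as the parameter, and the sample values are assumptions. In the flow described here the samples would come from XPower runs on sample designs.

```python
# Hypothetical sketch of a "power function builder": fit P(x) = a*x + b
# to power samples by ordinary least squares. The sample values are
# invented stand-ins for low-level power estimates.

def fit_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

precisions = [4, 8, 12, 16]          # parameter values of the sample designs
measured_mw = [2.1, 4.0, 6.1, 7.9]   # placeholder power estimates (mW)
slope, intercept = fit_linear(precisions, measured_mw)

def power_fn(bits):
    """Fitted component power function (mW) for a given precision."""
    return slope * bits + intercept

print(f"P(bits) = {slope:.4f} * bits + {intercept:.4f}")
```

A different functional form (quadratic, piecewise) may fit a given component better. The point is only that a handful of low-level runs can be turned into a reusable function.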

Low-Level Simulation of Components (4) • Accurate power estimates for RModules and Interconnects • Randomly generated test input waveforms • Switching activity is a consideration • Results can be reused [Figure labels: Component VHDL File; Xilinx XST Synthesis; Netlist; Xilinx Place&Route; .ncd file; .ncd VHDL; Waveforms; ModelSim; .vcd file; XPower; Power]

3. Tradeoff Analysis and Manual Design Space Exploration • Vary model parameters to see the effect on performance. • Analyze tradeoffs • Weed out designs that are not promising
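A small sketch of what varying a model parameter and weeding out unpromising designs could look like in code. The cost model below is a toy placeholder, not the model from the talk, and treating "not promising" as "dominated in energy, latency, and area" is an assumption.

```python
# Hypothetical sketch: sweep one model parameter and keep only designs
# that are not dominated in (energy, latency, area). The cost model is
# a toy placeholder, not the model from the talk.

def evaluate(num_pes, n=16):
    """Return toy estimates for an n x n multiplication on num_pes PEs."""
    latency = n ** 3 / num_pes + num_pes          # cycles: split work + fill
    area = 100 * num_pes                          # arbitrary area units
    energy = 2.0 * n ** 3 + 5.0 * num_pes * latency
    return {"pes": num_pes, "energy": energy, "latency": latency, "area": area}

def dominates(a, b, metrics=("energy", "latency", "area")):
    """True if a is no worse than b on every metric and better on at least one."""
    return (all(a[m] <= b[m] for m in metrics)
            and any(a[m] < b[m] for m in metrics))

def pareto_front(designs):
    """Drop every design that some other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

candidates = [evaluate(p) for p in (1, 2, 4, 8, 16, 32, 64, 128)]
front = pareto_front(candidates)
for d in front:
    print(d)
```

In this toy model the 128-PE design is no faster than the 32-PE design yet costs more area and energy, so it is dropped. The remaining designs represent genuine tradeoffs.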

4. Low-Level Simulation of Candidate Designs • Verify high-level estimation of energy and area for a design • Select the best design within the range of the estimation error among candidate designs • Similar to low-level simulation of components [Figure labels: Candidate Designs; VHDL File; Xilinx XST Synthesis; Netlist; Xilinx Place&Route; .ncd file; .ncd VHDL; Waveforms; ModelSim; .vcd file; XPower; Energy]
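One possible reading of "select the best design within the range of the estimation error": designs whose high-level estimates differ by less than the model's error cannot be ranked reliably at the high level, so they are all simulated at the low level and the winner is chosen from those results. The sketch below follows that reading. The relative error bound, design names, and energies are all invented.

```python
# Hypothetical sketch: shortlist designs whose high-level energy estimate
# is within the model's error margin of the best estimate, then pick the
# winner by low-level result. All numbers are invented.

def shortlist(estimates, rel_error):
    """estimates: {name: estimated energy}.

    Keep designs that could still be the best if every estimate may be
    off by +/- rel_error (relative).
    """
    best_upper = min(e * (1 + rel_error) for e in estimates.values())
    return sorted(name for name, e in estimates.items()
                  if e * (1 - rel_error) <= best_upper)

def pick_best(names, low_level_energy):
    """Choose the shortlisted design with the lowest simulated energy."""
    return min(names, key=lambda name: low_level_energy[name])

estimates = {"A": 100.0, "B": 108.0, "C": 140.0}
candidates = shortlist(estimates, rel_error=0.10)
simulated = {"A": 109.0, "B": 104.0}  # stand-in for low-level simulation results
best = pick_best(candidates, simulated)
print(candidates, best)
```

Only the shortlisted designs need the expensive synthesis, place-and-route, and simulation steps.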

Example Problem: Matrix Multiplication • Multiply two n × n matrices as efficiently, in terms of energy, as possible • No hard area or latency constraints • Area and latency considered, but no specific constraints • Why matrix multiplication? • Fundamental to many applications in DSP • LU Decomposition • CFAR detection requires matrix-vector multiplication

Optimized Design from Xilinx • Provides baseline for comparison • 3 × 3 block matrix multiplication • Low area [Figure source: xilinx.com]

Design 1: Uniprocessor Architecture • Same area as Xilinx design • Block matrix multiplication • Single processing element (PE) • Model Parameters: cache size, precision, power states [Figure labels: FPGA; PE; Matrix Entries; Cache; MAC]
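For readers unfamiliar with block matrix multiplication, here is a small software sketch of the computation a single PE with a local cache and one MAC unit might step through. It illustrates the algorithm only. The loop order, the block size argument, and the MAC counter are assumptions, and nothing here models the actual hardware design.

```python
# Hypothetical sketch of block matrix multiplication as a single
# processing element might perform it: one block pair is worked on at a
# time, and every scalar update is one multiply-accumulate (MAC).

def block_matmul(a, b, block):
    """Multiply square matrices a and b block by block.

    Returns (product, number_of_mac_operations).
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    macs = 0
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Multiply block A[i0.., k0..] by block B[k0.., j0..]
                # and accumulate into block C[i0.., j0..].
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + block, n)):
                            acc += a[i][k] * b[k][j]
                            macs += 1
                        c[i][j] = acc
    return c, macs

n = 6
a = [[i + j for j in range(n)] for i in range(n)]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
product, mac_count = block_matmul(a, identity, block=3)
print(mac_count)
```

The block size plays the role of the cache-size parameter: larger blocks mean more local storage but fewer block loads, which is the kind of tradeoff the model parameters are meant to expose.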

Design 2: Linear Array Architecture • Low latency • Array of processing elements (PEs) • Model parameters: number of PEs, precision, power states

High-Level Comparisons (1)

High-Level Comparisons (2)

Experimental Procedure for Low-Level Simulations • Code designs in VHDL • Synthesize • Place and route • Simulate with test input waveforms • Measure power dissipation with XPower

Low-Level Simulation for Statistical Analysis of Energy Dissipation • Dependency of energy on input data switching activity • Simulation for statistical significance • 50 randomly generated sets of input matrices • Comparison with Xilinx design for 3 × 3 matrix multiplication • Confidence intervals give range around experimental value in which, with given confidence level, true value lies • With 95% confidence, our design consumes 32% less energy compared to the Xilinx design [Table labels: Linear Array; Difference; Xilinx]
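A brief sketch of how a confidence interval for the energy difference between two designs could be computed from repeated runs. The data below is synthetic (seeded random numbers with made-up means), and the normal approximation is an assumption. The talk does not say which interval formula was used.

```python
# Hypothetical sketch: 95% confidence interval for the mean energy
# difference between two designs over repeated random-input runs.
# The per-run energies are synthetic, not results from the talk.
import random
import statistics

def confidence_interval(samples, confidence=0.95):
    """Return (mean, half_width) using a normal approximation.

    For small sample counts a Student-t quantile would be more appropriate.
    """
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err

rng = random.Random(0)
runs = 50  # one run per randomly generated input set
baseline = [rng.gauss(100.0, 5.0) for _ in range(runs)]  # made-up energies
proposed = [rng.gauss(70.0, 5.0) for _ in range(runs)]   # made-up energies

# Pair the runs and look at the per-run saving.
savings = [b - p for b, p in zip(baseline, proposed)]
mean_saving, half_width = confidence_interval(savings)
print(f"saving: {mean_saving:.2f} +/- {half_width:.2f} (95% CI)")
```

If the whole interval lies above zero, the saving is unlikely to be an artifact of which input matrices happened to be drawn.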

Accuracy of the High-Level Model • Xilinx ISE 4.1i and XPower are used to measure the system-wide energy • Xilinx Virtex-II XC2V1500 device is used

MILAN Objectives • MILAN is a model-based, extensible simulation framework • It provides a unified environment capable of: • modeling a large class of embedded systems and applications • driving design space exploration tools for rapid evaluation of a large design space • seamlessly integrating different widely-used simulators into a single framework for hierarchical simulation • enabling rapid evaluation of different performance metrics such as energy, latency, and throughput

The MILAN Architecture

Design Flow Using MILAN [Figure labels: Application (Task Graph); Hardware Resources; Application Model; Resource Model; Constraints; Generic Modeling Environment (GME 2000); Design Space; Design Space Exploration (analytical technique); Offline Estimates; High-level Perf. Estimator; Identify a set of designs; Hierarchical Simulation (Instruction Level Simulator, Cycle Accurate Simulator, RT-level Simulator); Final Design; axes: Accuracy, Level of abstraction]

Concluding Remarks • Design methodology based on domain-specific modeling • High-level energy estimation • High-level tradeoff analysis • Matrix multiplication example • Improved energy-efficiency compared to Xilinx design (baseline) • Further References • “Energy-Efficient Matrix Multiplication on FPGAs” (FPL 2002) • “Energy Efficiency of FPGAs and Programmable Processors for Matrix Multiplication” (Manuscript)
