David Sheldon, Frank Vahid * Department of Computer Science and Engineering

Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods
David Sheldon, Frank Vahid*
Department of Computer Science and Engineering
University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine

Parameterized Component: Cache Line Concatenation
• 40% avg savings
[Figure: cache (W1, 16 bytes) with counter, bus, and off-chip memory; annotation: 4 physical lines filled when line size is 32 bytes]
[Zhang/Vahid/Najjar, ISCA 2003, ISVLSI 2003, TECS 2005]

FPGA Systems are Often Built from Parameterized Components
• Parameterized components include:
  • Cache (e.g., size, associativity, line size)
  • Processors
  • Co-processors
  • Buses (e.g., bit width, network-on-chip structure)
[Figure: FPGA containing a cache, uP, MPEG encoder, bus, and DSP, each with its own config]
David Sheldon, UC Riverside

Microblaze Soft-Core Processor – Design Space due to Parameters
Pareto points: points where no point exists that is better in all metrics.
• 520 points
• Over 10 days
• ~35 min per point
  • <1 min to execute
  • Remaining time was in synthesis and place and route
[Figure: design space plot; axes: Cycles, Equivalent LUTs]
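The Pareto-point definition above can be made concrete with a small filter. This is a minimal sketch, not the tool used for the slides. It assumes both metrics are minimized and uses the common dominance test (no worse in every metric, strictly better in at least one), which is slightly stricter than the one-line definition on the slide. The (cycles, equivalent LUTs) values are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every metric and strictly better in one.

    Assumes lower is better for every metric.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_points(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cycles, equivalent LUTs) pairs, not measured data.
designs = [(100, 900), (120, 700), (150, 650), (130, 800), (100, 950), (200, 650)]
front = pareto_points(designs)
print(front)
```

The filter compares every pair of points, which is fine for a few hundred configurations; the expensive part in the slide's setting is producing each point, not filtering them.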

Pareto Points Differ Per Application and Per Criteria
[Figure: panels (a) and (b) plot energy vs. time for configurations c1, c2, c3 of one platform running App a1 and App a2, with the Pareto points marked; labels: Designer A, Designer B]

Previous Work
• Platune [Givargis/Vahid 2002]:
  • Introduced parameter interdependency graph
  • Edges: parameters are dependent
  • Nodes not connected: independent
  • Search dependent parameters exhaustively; compose local Pareto points into global points
  • Greatly reduces search space if parameters are independent
  • Good results, 44 hours
• Randomized approaches
  • Pareto Simulated Annealing (PSA) [Talarico 2006]
    • Good results, 6 hours
  • Genetic Algorithms [Ascia 2005]
    • Good results, 4 hours
[Figure: Platune's architecture (MIPS, I$, D$, CPU–I$ bus, CPU–D$ bus, $-MEM bus, MEM) and its parameter interdependency graph over supply voltage, bus codes, and cache size, associativity, and line size]
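The search-space reduction from an interdependency graph can be illustrated with a short sketch. It groups parameters into connected clusters and compares the size of a fully exhaustive search with the sum of the per-cluster exhaustive searches. The parameter names, value counts, and edges are hypothetical and are not taken from Platune.

```python
from math import prod
from typing import Dict, List, Tuple

def clusters(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Return the connected components of an undirected dependency graph."""
    adj: Dict[str, set] = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, result = set(), []
    for n in nodes:
        if n in seen:
            continue
        seen.add(n)
        stack, comp = [n], []
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for nb in sorted(adj[cur]):
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(sorted(comp))
    return result

# Hypothetical number of legal values per parameter.
values = {"i$ size": 4, "i$ assoc": 3, "d$ size": 4, "d$ assoc": 3, "supply voltage": 5}
# Hypothetical dependencies: each cache's size interacts with its associativity.
edges = [("i$ size", "i$ assoc"), ("d$ size", "d$ assoc")]

groups = clusters(list(values), edges)
exhaustive = prod(values.values())                           # every combination
clustered = sum(prod(values[p] for p in g) for g in groups)  # local searches only
print(groups, exhaustive, clustered)
```

The clustered count covers only the local exhaustive searches; composing local Pareto points into global points requires further evaluations, so the real saving is smaller than this comparison suggests.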

Our Approach
• We developed:
  • A Design-of-Experiments (DoE)-based technique to automatically generate a parameter interdependency graph
    • Relieves designer of burden
  • A technique to generate Pareto points via a parameter interdependency graph edge-weight-based algorithm
    • Improves speed versus Platune
• Called DoE-Based Pareto-Point Generator (DPG)
[Figure: plot with axes Performance and Time]

Design of Experiments (DoE)
• DoE generates a set of orthogonal experiments that allows for statistical analysis of the search space
[Figure: Platune architecture annotated with experiment settings (surviving value labels: 2k, 8k, 8, 32, 4.1) and the parameter list: supply voltage, bus codes, i$ and d$ size, associativity, and line size]
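As an illustration of what "orthogonal experiments" means, the sketch below builds a two-level half-fraction design, in which each parameter is set to a low (-1) or high (+1) level, and checks that every pair of columns is orthogonal. The slides do not say which DoE design DPG uses, so the half-fraction construction here is an assumption chosen for brevity.

```python
from itertools import product
from typing import List

def half_fraction(k: int) -> List[List[int]]:
    """Two-level 2^(k-1) design: the last factor is the product of the others."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs

def is_orthogonal(design: List[List[int]]) -> bool:
    """True if the dot product of every pair of columns is zero."""
    k = len(design[0])
    return all(
        sum(run[i] * run[j] for run in design) == 0
        for i in range(k)
        for j in range(i + 1, k)
    )

design = half_fraction(4)  # 8 experiments instead of the 16 of a full factorial
for run in design:
    print(run)
print("orthogonal:", is_orthogonal(design))
```

Each row is one experiment to synthesize or simulate; a level of -1 or +1 would be mapped to a concrete parameter value, such as a small or large cache size.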

DPG Algorithm
• Subsequent DoE analysis determines main effects of parameters
[Figure: Platune architecture and the parameter nodes derived from it: supply voltage, bus codes, and i$/d$ size, associativity, and line size]
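A main effect in a two-level design is usually estimated as the mean response at a parameter's high level minus the mean response at its low level. The sketch below shows that calculation on a synthetic response; the parameter names and the response formula are illustrative assumptions, not data from the experiments.

```python
from itertools import product
from statistics import mean
from typing import List

def main_effects(design: List[List[int]], responses: List[float]) -> List[float]:
    """Mean response at level +1 minus mean response at level -1, per factor."""
    effects = []
    for i in range(len(design[0])):
        high = mean(y for run, y in zip(design, responses) if run[i] == 1)
        low = mean(y for run, y in zip(design, responses) if run[i] == -1)
        effects.append(high - low)
    return effects

names = ["i$ size", "i$ assoc", "d$ size"]                  # hypothetical parameters
design = [list(r) for r in product((-1, 1), repeat=3)]      # full 2^3 factorial
# Synthetic "energy" response standing in for a simulation result.
responses = [10 - 3 * a + 0.5 * b - 1 * c for a, b, c in design]

effects = dict(zip(names, main_effects(design, responses)))
print(effects)
```

Because the design is orthogonal, each effect is estimated independently of the others: the first parameter's effect is twice its coefficient in the synthetic formula, regardless of the other terms.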

DPG Algorithm (cont.)
• Compute a weight for each pair of nodes
• Sort edges by decreasing weight
  • DK (I$ assoc, CPU–I$ address code)
  • DI (I$ assoc, CPU–I$ code)
  • IK (CPU–I$ code, CPU–I$ address code)
  • IQ (CPU–I$ code, $-MEM address code)
  • KQ (CPU–I$ address code, $-MEM address code)
  • ...
[Figure: Platune architecture with its parameter nodes: supply voltage, bus codes, and cache size, associativity, and line size]
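The slide does not state how an edge weight is computed. One plausible choice, assumed here purely for illustration, is the magnitude of the two-factor interaction effect estimated from the DoE runs. The sketch below computes that weight for every pair of four nodes and sorts the edges in decreasing order. The node letters follow the slide, but the response formula is synthetic and its coefficients were picked only so that the ordering resembles the slide's list.

```python
from itertools import combinations, product
from statistics import mean
from typing import List

def interaction_weight(design: List[List[int]], responses: List[float],
                       i: int, j: int) -> float:
    """|mean response where factors i, j share a level - mean where they differ|."""
    same = mean(y for run, y in zip(design, responses) if run[i] * run[j] == 1)
    diff = mean(y for run, y in zip(design, responses) if run[i] * run[j] == -1)
    return abs(same - diff)

names = ["D", "I", "K", "Q"]                             # node labels from the slide
design = [list(r) for r in product((-1, 1), repeat=4)]   # full 2^4 factorial
# Synthetic response with made-up interaction strengths (not measured data).
responses = [
    5 + d + 2.0 * d * k + 1.5 * d * i + 1.0 * i * k + 0.5 * i * q + 0.25 * k * q
    for d, i, k, q in design
]

edges = sorted(
    ((interaction_weight(design, responses, a, b), names[a] + names[b])
     for a, b in combinations(range(len(names)), 2)),
    key=lambda edge: edge[0],
    reverse=True,
)
for weight, edge in edges:
    print(edge, weight)
```

An edge with weight near zero, such as DQ in this synthetic example, suggests the two parameters can be tuned independently.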

DPG Algorithm (cont.)
• Pairwise merge of nodes
  • Creates a sparse set of Pareto points
• The designer can direct the tool to fill in the regions of interest
[Figure: energy vs. time plot showing original Pareto points and filled-in Pareto points]
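A pairwise merge can be sketched as follows: take the partial configurations held by two nodes, form every combination, evaluate each one, and keep only the non-dominated results. This is a hedged illustration of the idea rather than DPG's actual implementation. The `evaluate` function is a hypothetical stand-in for simulation or synthesis, and the parameter values are invented.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Score = Tuple[float, float]  # (time, energy), both minimized

def dominates(a: Score, b: Score) -> bool:
    """a is no worse than b in both metrics and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def merge(node_a: List[Config], node_b: List[Config],
          evaluate: Callable[[Config], Score]) -> List[Config]:
    """Combine two nodes' partial configurations and keep the Pareto-optimal ones."""
    combined = [{**a, **b} for a, b in product(node_a, node_b)]
    scored = [(cfg, evaluate(cfg)) for cfg in combined]
    return [cfg for cfg, s in scored if not any(dominates(t, s) for _, t in scored)]

def evaluate(cfg: Config) -> Score:
    """Hypothetical cost model: larger caches run faster but use more energy."""
    time = 100 / cfg["size"] + 20 / cfg["assoc"]
    energy = cfg["size"] + 2 * cfg["assoc"]
    return (time, energy)

size_node = [{"size": 2}, {"size": 8}]
assoc_node = [{"assoc": 1}, {"assoc": 4}]
merged = merge(size_node, assoc_node, evaluate)
print(merged)
```

In a full run, this merge would be repeated edge by edge in decreasing weight order, with each merged node carrying its surviving configurations into the next merge. Only surviving configurations are propagated, which is one way the resulting set of Pareto points can end up sparse.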

Platune – Pareto Graph with Fill-in
[Figure: Pareto graph for the jpeg benchmark]

Platune – Pareto Graph with Fill-in
[Figure: Pareto graph for the b1_histogram benchmark]

Interdependency Graph Comparison: Manual vs. Automated
[Figure: interdependency graphs for the jpeg, b1_histogram, and g3fax benchmarks]

Platune Results
• DPG is 30x faster than Platune
• 2.5x faster than Genetic Algorithms
[Figure: results chart; surviving label: 44]

Xilinx Microblaze Soft-Core Processor
• Tuned the Microblaze for various benchmarks
• Exhaustive data generated for 12 benchmarks for comparison
• The Microblaze also has a configurable cache, which allows for over 3,000 configurations
• For these tests we used previously generated results, which cover only 64 configurations
[Figure: Microblaze with configurable options mul, bs, MSR, FPU, PCMP, div]
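The figure of 64 configurations is consistent with six independent on/off options, since 2^6 = 64. The sketch below enumerates such a space. Treating each of the six labels in the figure as a binary option is an assumption; the slide does not state the legal values of each parameter.

```python
from itertools import product

# Option labels as they appear in the slide's figure; binary on/off is assumed.
options = ["mul", "bs", "MSR", "FPU", "PCMP", "div"]

configs = [
    dict(zip(options, bits))
    for bits in product((False, True), repeat=len(options))
]
print(len(configs))
print(configs[0])
print(configs[-1])
```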

Network on Chip – Results
• DPG also works on larger design spaces

DPG Scales Well

Conclusion
• DoE-Based Pareto-Point Generation (DPG) algorithm quickly finds good Pareto points
• Results were better and obtained faster than previous Platune or randomized techniques
• Approach is easier to use: no designer knowledge of parameter interdependencies is needed
• Useful for FPGAs as well as other parameterized systems, such as SOCs synthesized to ASICs, parameterized SOCs, etc.
