This paper presents an automated technique, the DoE-Based Pareto-Point Generator (DPG), that uses statistical analysis to generate Pareto points for design space exploration. DPG runs faster and produces better results than previous methods. The results show that it scales and is effective across several design spaces, including FPGA systems and cache configurations.
Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods
David Sheldon, Frank Vahid*
Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
Parameterized Component: Cache Line Concatenation
• 4 physical lines filled when line size is 32 bytes
• 40% avg savings
[Figure: cache way W1 with 16-byte lines, counter, bus, and off-chip memory]
[Zhang/Vahid/Najjar, ISCA 2003, ISVLSI 2003, TECS 2005]
FPGA Systems are Often Built from Parameterized Components
• Parameterized components include:
  • Cache (e.g., size, associativity, line size)
  • Processors
  • Co-processors
  • Buses (e.g., bit width, network-on-chip structure)
[Figure: FPGA containing a uP, cache, bus, MPEG encoder, and DSP, each with its own config]
David Sheldon, UC Riverside
Microblaze Soft-Core Processor – Design Space due to Parameters
• Pareto points: points for which no other point is better in all metrics
• 520 points
• Over 10 days
• ~35 min per point
• <1 min to execute
• Remaining time was in synthesis and place and route
[Figure: design-space plot with axes Cycles and Equivalent LUTs]
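The Pareto-point definition on this slide can be sketched as a simple filter. This is a minimal illustration, not the authors' tool: it assumes lower is better for every metric and uses the common dominance test (no worse in every metric, strictly better in at least one), which is slightly stricter than the slide's wording. The function names and the sample (cycles, equivalent LUTs) values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]  # one value per metric; lower is better (assumption)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_points(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates, in sorted order."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Hypothetical (cycles, equivalent LUTs) measurements, not data from the slides.
designs = [(100, 900), (120, 700), (120, 950), (150, 650), (160, 700)]
front = pareto_points(designs)
print(front)
```

A filter like this needs every point to be measured first. At roughly 35 minutes per point, 520 points come to about 12.6 days (520 × 35 / 60 / 24), which is consistent with the "over 10 days" figure and is the cost the rest of the talk tries to avoid.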
Pareto Points Differ Per Application and Per Criteria
[Figure: two Energy vs. Time plots for one platform with configurations c1, c2, c3; panel (a) shows App a1 and panel (b) shows App a2, with Pareto points marked; labels Designer A and Designer B]
Previous Work: Parameter Interdependency Graph
• Platune [Givargis/Vahid 2002]:
  • Introduced parameter interdependency graph
  • Edges – parameters are dependent
  • Nodes not connected – independent
  • Search dependent parameters exhaustively; compose local Pareto points into global points
  • Greatly reduces search space if parameters are independent
  • Good results, 44 hours
• Randomized Approaches
  • Pareto Simulated Annealing (PSA) [Talarico 2006]
    • Good results, 6 hours
  • Genetic Algorithms [Ascia 2005]
    • Good results, 4 hours
[Figure: Platune's architecture: MIPS CPU, I$ and D$ (size, assoc., line size), CPU–I$ bus, CPU–D$ bus, $–MEM bus, MEM, bus codes, supply voltage]
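Why independence reduces the search space can be seen by counting configurations. The sketch below is illustrative only: the parameter names, the number of settings per parameter, and the grouping into clusters are hypothetical and are not taken from Platune. It compares a fully exhaustive search (product over all parameters) with exhaustive search inside each cluster of interdependent parameters (sum of per-cluster products).

```python
from math import prod
from typing import Dict, List

# Hypothetical number of settings per parameter (not taken from the slides).
values: Dict[str, int] = {
    "i$ size": 4, "i$ assoc": 3, "i$ line": 3,
    "d$ size": 4, "d$ assoc": 3, "d$ line": 3,
    "voltage": 5,
}

# Hypothetical clusters: parameters inside a cluster are interdependent,
# parameters in different clusters are assumed independent.
clusters: List[List[str]] = [
    ["i$ size", "i$ assoc", "i$ line"],
    ["d$ size", "d$ assoc", "d$ line"],
    ["voltage"],
]

# Every combination of every parameter.
exhaustive = prod(values.values())

# Exhaustive search within each cluster only.
per_cluster = sum(prod(values[p] for p in cluster) for cluster in clusters)

print(exhaustive, per_cluster)
```

The per-cluster count covers only the local searches; composing local Pareto points into global points adds further evaluations, so the real saving is smaller than this ratio suggests.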
Our Approach
• We developed
  • Design-of-Experiments (DoE)-based technique to automatically generate a parameter interdependency graph
    • Relieves designer of burden
  • Technique to generate Pareto points via a parameter interdependency graph edge-weight-based algorithm
    • Improves speed versus Platune
• Called DoE-Based Pareto-Point Generator (DPG)
[Figure: plot with axes Performance and Time]
Design of Experiments (DoE)
• DoE generates a set of orthogonal experiments that allows for statistical analysis of the search space
[Figure: Platune architecture parameters (supply voltage; I$ and D$ size, assoc., line size; CPU–I$, CPU–D$, and $–MEM bus codes) shown with example settings]
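One common way to obtain orthogonal experiments is a two-level fractional factorial design. The slides do not say which design DPG uses, so the sketch below is an assumption: it builds a half-fraction for `k` factors (levels coded as -1 and +1, last column generated as the product of the others) and checks that every pair of columns is orthogonal. The function names are hypothetical.

```python
from itertools import product
from typing import List


def half_fraction(k: int) -> List[List[int]]:
    """Two-level design for k >= 3 factors in 2**(k-1) runs.

    The first k-1 columns form a full factorial over levels -1/+1;
    the last column is the product of the others (a standard generator).
    """
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs


def is_orthogonal(design: List[List[int]]) -> bool:
    """True if every pair of columns has a zero dot product."""
    cols = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )


design = half_fraction(4)  # 4 factors in 8 runs instead of 16
print(len(design), is_orthogonal(design))
```

Each row is one experiment, that is, one configuration to synthesize or simulate; orthogonal columns let the effect of each parameter be estimated separately from the same small set of runs.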
DPG Algorithm
• Subsequent DoE analysis determines main effects of parameters
[Figure: Platune architecture with its parameters (supply voltage; i$ size, i$ assoc; d$ size, d$ assoc, d$ line; m-i$ code, m-i$ a code, $-m code)]
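A main effect in a two-level design is usually estimated as the mean response at a parameter's high level minus the mean response at its low level. The sketch below shows that calculation under stated assumptions: a full factorial over three factors, and a hypothetical `measure` function standing in for the synthesis or simulation run. It is not the authors' implementation.

```python
from itertools import product
from typing import List, Sequence


def main_effects(design: Sequence[Sequence[int]],
                 responses: Sequence[float]) -> List[float]:
    """Mean response at level +1 minus mean response at level -1, per factor."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


def measure(run: Sequence[int]) -> float:
    """Hypothetical metric: factor A matters most, B a little, C not at all."""
    a, b, _c = run
    return 10.0 + 3.0 * a - 1.0 * b


design = [list(run) for run in product((-1, 1), repeat=3)]
responses = [measure(run) for run in design]
effects = main_effects(design, responses)
print(effects)
```

Parameters with large absolute effects are the ones whose settings move the metric most; a parameter with an effect near zero, like the third factor here, has little influence over the tested range.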
DPG Algorithm (cont.)
• Compute weight of each pair of nodes
• Sort edges in decreasing weight
  • DK, (I$ assoc, CPU–I$ address code)
  • DI, (I$ assoc, CPU–I$ code)
  • IK, (CPU–I$ code, CPU–I$ address code)
  • IQ, (CPU–I$ code, $–MEM address code)
  • KQ, (CPU–I$ address code, $–MEM address code)
  • ...
[Figure: Platune architecture with its parameters]
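The weighting and sorting steps can be sketched as follows. The slides do not state how a pair's weight is computed; this example assumes the weight is the magnitude of the two-factor interaction effect estimated from a two-level design, which is one plausible choice rather than the confirmed DPG formula. The node labels D, I, K come from the slide, while `measure` and its coefficients are hypothetical.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple


def interaction_weight(design: Sequence[Sequence[int]],
                       responses: Sequence[float], i: int, j: int) -> float:
    """Assumed weight: |mean(y | xi*xj = +1) - mean(y | xi*xj = -1)|."""
    same = [y for run, y in zip(design, responses) if run[i] * run[j] == 1]
    diff = [y for run, y in zip(design, responses) if run[i] * run[j] == -1]
    return abs(sum(same) / len(same) - sum(diff) / len(diff))


def measure(run: Sequence[int]) -> float:
    """Hypothetical metric with interactions of different strengths."""
    d, i, k = run
    return 20.0 + 4.0 * d * k + 2.0 * d * i + 0.5 * i * k


names = ["D", "I", "K"]
design = [list(run) for run in product((-1, 1), repeat=3)]
responses = [measure(run) for run in design]

# One weighted edge per pair of nodes, sorted by decreasing weight.
edges: List[Tuple[float, str]] = sorted(
    ((interaction_weight(design, responses, a, b), names[a] + names[b])
     for a, b in combinations(range(len(names)), 2)),
    reverse=True,
)
order = [label for _, label in edges]
print(edges)
```

With these made-up coefficients the sorted order is DK, DI, IK, so the most strongly coupled pair is handled first.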
DPG Algorithm (cont.)
• Pairwise merge of nodes
  • Creates a sparse set of Pareto points
• The designer can direct the tool to fill in the regions of interest
[Figure: Energy vs. Time plot showing original Pareto points and filled-in Pareto points]
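A single pairwise merge might look like the sketch below. It assumes each node carries a short list of partial configurations, that merging evaluates every combination of the two lists, and that only non-dominated combinations survive. The slides do not give these details, so `merge_nodes`, the `evaluate` callback, and the cost model are hypothetical stand-ins.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Metrics = Tuple[float, float]  # (time, energy); lower is better (assumption)


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if a is no worse than b in both metrics and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def merge_nodes(left: List[Config], right: List[Config],
                evaluate: Callable[[Config], Metrics]) -> List[Config]:
    """Combine every pair of partial configs; keep the non-dominated ones."""
    merged = [{**l, **r} for l, r in product(left, right)]
    scored = [(evaluate(cfg), cfg) for cfg in merged]
    return [cfg for m, cfg in scored
            if not any(dominates(n, m) for n, _ in scored)]


def evaluate(cfg: Config) -> Metrics:
    """Hypothetical cost model: bigger caches run faster but use more energy."""
    size = cfg.get("size", 1)
    assoc = cfg.get("assoc", 1)
    return (10.0 / (size * assoc), 1.0 * size + 0.5 * assoc)


left = [{"size": 1}, {"size": 2}]      # candidates kept for node "size"
right = [{"assoc": 1}, {"assoc": 2}]   # candidates kept for node "assoc"
front = merge_nodes(left, right, evaluate)
print(front)
```

Here the combination size 2, assoc 1 is dropped because size 1, assoc 2 is equally fast and uses less energy. Repeating such merges along the sorted edge list would yield a small set of candidate points, which matches the slide's remark that the result is sparse before fill-in.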
Platune – Pareto Graph with Fill-in
[Figure: Pareto graph for the jpeg benchmark]
Platune – Pareto Graph with Fill-in
[Figure: Pareto graph for the b1_histogram benchmark]
Interdependency Graph Comparison: Manual vs. Automated
[Figure: interdependency graphs for the jpeg, b1_histogram, and g3fax benchmarks]
Platune Results
• DPG is 30x faster than Platune
• DPG is 2.5x faster than Genetic Algorithms
Xilinx Microblaze Soft-Core Processor
• Tuned the Microblaze for various benchmarks
• Exhaustive data generated for 12 benchmarks for comparison
• The Microblaze also has a configurable cache, which allows for over 3,000 configurations.
• For these tests we used previously generated results, which gave us only 64 configurations.
[Figure: Microblaze with configurable units mul, bs, MSR, FPU, PCMP, div]
Network on Chip – Results
• DPG also works on larger design spaces
DPG Scales Well
[Figure: scalability results]
Conclusion
• DoE-Based Pareto-Point Generation (DPG) algorithm quickly finds good Pareto points
• Results were better and obtained faster than previous Platune or randomized techniques
• Approach is easier to use – no designer knowledge of parameter interdependencies is needed
• Useful for FPGAs as well as other parameterized systems, such as SOCs synthesized to ASICs, parameterized SOCs, etc.