Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods. David Sheldon, Frank Vahid*, Department of Computer Science and Engineering, University of California, Riverside

Presentation Transcript


  1. Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods
     David Sheldon, Frank Vahid*
     Department of Computer Science and Engineering, University of California, Riverside
     *Also with the Center for Embedded Computer Systems at UC Irvine

  2. Parameterized Component: Cache Line Concatenation [Zhang/Vahid/Najjar, ISCA 2003, ISVLSI 2003, TECS 2005]
     • 4 physical lines filled when line size is 32 bytes
     • 40% avg savings
     [Figure labels: W1, 16 bytes, bus, counter, off-chip memory]

  3. FPGA Systems are Often Built from Parameterized Components
     • Parameterized components include:
       • Cache (e.g., size, associativity, line size)
       • Processors
       • Co-processors
       • Buses (e.g., bit width, network-on-chip structure)
     [Figure: FPGA with Cache, uP, MPEG Enc, Bus, and DSP blocks, each labeled "config"]
     David Sheldon, UC Riverside

  4. Microblaze Soft-Core Processor – Design Space due to Parameters
     • Pareto points: points for which no other point is better in all metrics
     • 520 points
     • Over 10 days
     • ~35 min per point
       • <1 min to execute
       • Remaining time was in synthesis and place and route
     [Figure: design-space plot; axes: Cycles, Equivalent LUTs]
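The Pareto-point definition on this slide can be illustrated with a small filter. This is a minimal sketch, not the authors' tool: it assumes both metrics (cycles and equivalent LUTs) are to be minimized, it uses the standard dominance test (no worse in every metric, strictly better in at least one), and the `designs` values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cycles, equivalent LUTs); lower is better for both


def dominates(q: Point, p: Point) -> bool:
    """True if q is no worse than p in every metric and strictly better in one."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def pareto_points(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, de-duplicated and sorted by the first metric."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Hypothetical measurements, not taken from the slides.
designs = [(100, 900), (120, 700), (150, 650), (130, 800), (100, 950)]
print(pareto_points(designs))
```

Here (130, 800) is dropped because (120, 700) beats it in both metrics, and (100, 950) is dropped because (100, 900) matches its cycle count with fewer LUTs.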

  5. Pareto Points Differ Per Application and Per Criteria
     [Figure: one platform with configurations c1, c2, c3, ...; energy vs. time plots with Pareto points for (a) App a1 (Designer A) and (b) App a2 (Designer B), with c1, c2, and c3 marked in each plot]

  6. Previous Work: Parameter Interdependency Graph
     • Platune [Givargis/Vahid 2002]:
       • Introduced parameter interdependency graph
         • Edges – parameters are dependent
         • Nodes not connected – independent
       • Search dependent parameters exhaustively; compose local Pareto points into global points
       • Greatly reduces search space when parameters are independent
       • Good results, 44 hours
     • Randomized Approaches
       • Pareto Simulated Annealing (PSA) [Talarico 2006]
         • Good results, 6 hours
       • Genetic Algorithms [Ascia 2005]
         • Good results, 4 hours
     [Figure: Platune's architecture (MIPS CPU, I$, D$, CPU–I$ bus, CPU–D$ bus, $-MEM bus, MEM) annotated with parameters: supply voltage; bus code and address code; cache size, assoc., and line size]
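To see why independence shrinks the search, the sketch below groups parameters into connected components of an interdependency graph and compares evaluation counts. It is an illustration only: the parameter names, level counts, and edges are hypothetical, and the count ignores the cost of composing local Pareto points into global ones.

```python
from math import prod
from typing import Dict, List, Set, Tuple


def clusters(params: List[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the interdependency graph (depth-first search)."""
    adj: Dict[str, Set[str]] = {p: set() for p in params}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    out: List[Set[str]] = []
    for p in params:
        if p in seen:
            continue
        stack, comp = [p], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        out.append(comp)
    return out


def evaluations(levels: Dict[str, int], comps: List[Set[str]]) -> Tuple[int, int]:
    """(exhaustive evaluations, evaluations when each cluster is searched separately)."""
    exhaustive = prod(levels.values())
    clustered = sum(prod(levels[p] for p in c) for c in comps)
    return exhaustive, clustered


# Hypothetical parameters and level counts, for illustration only.
levels = {"i$ size": 4, "i$ assoc": 3, "d$ size": 4, "d$ assoc": 3, "voltage": 5}
edges = [("i$ size", "i$ assoc"), ("d$ size", "d$ assoc")]
comps = clusters(list(levels), edges)
print(evaluations(levels, comps))
```

With these made-up numbers, three independent clusters need 12 + 12 + 5 evaluations instead of the 720 required by a full exhaustive sweep.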

  7. Our Approach
     • We developed
       • Design-of-Experiments (DoE)-based technique to automatically generate a parameter interdependency graph
         • Relieves designer of burden
       • Technique to generate Pareto points via a parameter interdependency graph edge-weight-based algorithm
         • Improves speed versus Platune
     • Called DoE-Based Pareto-Point Generator (DPG)
     [Figure labels: Performance, Time]

  8. Design of Experiments (DoE)
     • DoE generates a set of orthogonal experiments that allows for statistical analysis of the search space
     [Figure: Platune architecture with example parameter settings (Bi, 2k, 8k, 8, 32, 4.1) and the parameter list: supply voltage, m-i$ a code, $-m code, m-i$ code, d$ line, i$ assoc, d$ size, d$ assoc, i$ size]
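As a rough illustration of what "orthogonal experiments" means, the sketch below builds a two-level design coded as -1 (low setting) and +1 (high setting) and checks that every pair of parameter columns has a zero dot product. The slides do not say which design DPG uses; the full factorial and the half fraction shown here are assumptions chosen for simplicity.

```python
from itertools import product
from math import prod
from typing import List, Tuple

Run = Tuple[int, ...]


def two_level_design(k: int) -> List[Run]:
    """All 2**k runs of a two-level full factorial, coded -1 (low) / +1 (high)."""
    return list(product((-1, 1), repeat=k))


def half_fraction(design: List[Run]) -> List[Run]:
    """Keep the runs whose coded levels multiply to +1 (half of the full design)."""
    return [run for run in design if prod(run) == 1]


def is_orthogonal(design: List[Run]) -> bool:
    """True if every pair of parameter columns has a zero dot product."""
    cols = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )


full = two_level_design(3)
half = half_fraction(full)
print(len(full), is_orthogonal(full), len(half), is_orthogonal(half))
```

The half fraction keeps the columns orthogonal with half the runs, which is the kind of saving that makes DoE attractive when each run takes a long synthesis.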

  9. DPG Algorithm
     • Subsequent DoE analysis determines main effects of parameters
     [Figure: Platune architecture with the parameter list: supply voltage, m-i$ a code, $-m code, m-i$ code, i$ assoc, d$ line, i$ size, d$ assoc, d$ size]
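In a two-level design, the main effect of a parameter is commonly estimated as the mean response at its high setting minus the mean response at its low setting. The sketch below applies that textbook estimate to a made-up response; the response function and the use of a full factorial are assumptions, not data from the presentation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def main_effects(design: Sequence[Tuple[int, ...]], responses: Sequence[float]) -> List[float]:
    """Mean response at the high level minus mean response at the low level, per parameter."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# Hypothetical response: parameter a matters most, b slightly, c not at all.
design = list(product((-1, 1), repeat=3))
responses = [10 + 3 * a - b for a, b, c in design]
print(main_effects(design, responses))
```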

  10. DPG Algorithm (cont.)
     • Compute weight of each pair of nodes
     • Sort edges in decreasing weight
       • DK, (I$ assoc, CPU-I$ address code)
       • DI, (I$ assoc, CPU-I$ code)
       • IK, (CPU-I$ code, CPU-I$ address code)
       • IQ, (CPU-I$ code, $-MEM address code)
       • KQ, (CPU-I$ address code, $-MEM address code)
       • ...
     [Figure: Platune architecture with parameter annotations]
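The slide does not state how an edge weight is computed. One plausible choice, assumed here purely for illustration, is the magnitude of the two-factor interaction effect estimated from the DoE runs. The sketch below computes that weight for every pair of parameters and sorts the edges in decreasing weight; the names `A`, `B`, `C` and the response function are hypothetical.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple


def interaction_weight(design: Sequence[Tuple[int, ...]], responses: Sequence[float],
                       i: int, j: int) -> float:
    """Assumed weight: |mean response where levels agree - mean where they differ|."""
    same = [y for run, y in zip(design, responses) if run[i] * run[j] == 1]
    diff = [y for run, y in zip(design, responses) if run[i] * run[j] == -1]
    return abs(sum(same) / len(same) - sum(diff) / len(diff))


def sorted_edges(names: Sequence[str], design: Sequence[Tuple[int, ...]],
                 responses: Sequence[float]) -> List[Tuple[str, str, float]]:
    """All parameter pairs with their weights, heaviest first."""
    edges = [
        (names[i], names[j], interaction_weight(design, responses, i, j))
        for i, j in combinations(range(len(names)), 2)
    ]
    return sorted(edges, key=lambda e: e[2], reverse=True)


# Hypothetical response with a strong A-B interaction and a weak B-C interaction.
names = ["A", "B", "C"]
design = list(product((-1, 1), repeat=3))
responses = [10 + 3 * a + 4 * a * b - b * c for a, b, c in design]
edges = sorted_edges(names, design, responses)
print(edges)
```

Pairs with near-zero weight behave as independent, so an ordering like this suggests which pairs are worth searching jointly first.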

  11. DPG Algorithm (cont.)
     • Pairwise merge of nodes
       • Creates a sparse set of Pareto points
     • The designer can direct the tool to fill in the regions of interest
     [Figure: energy vs. time plot showing original Pareto points and filled-in Pareto points]
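One way to picture a pairwise merge is sketched below. It assumes each node carries a local Pareto set of (time, energy) pairs and that the metrics of two merged nodes simply add; neither assumption is confirmed by the slides, and the numbers are hypothetical. The merge combines every pair of local points and keeps only the non-dominated results.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time, energy); lower is better for both


def pareto(points: List[Point]) -> List[Point]:
    """Non-dominated points, de-duplicated and sorted by time."""
    def dominated(p: Point) -> bool:
        return any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    return sorted({p for p in points if not dominated(p)})


def merge(front_a: List[Point], front_b: List[Point]) -> List[Point]:
    """Combine two local Pareto sets, assuming time and energy are additive."""
    combined = [(ta + tb, ea + eb) for ta, ea in front_a for tb, eb in front_b]
    return pareto(combined)


# Hypothetical local Pareto sets for two nodes.
node_a = [(1, 6), (2, 2)]
node_b = [(1, 3), (4, 2)]
print(merge(node_a, node_b))
```

Of the four combinations, (5, 8) is discarded because (3, 5) is better in both time and energy.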

  12. Platune – Pareto Graph with Fill-in: jpeg
     [Figure: Pareto graph for the jpeg benchmark]

  13. Platune – Pareto Graph with Fill-in: b1_histogram
     [Figure: Pareto graph for the b1_histogram benchmark]

  14. Interdependency Graph Comparison: Manual vs. Automated
     [Figure: interdependency graphs for jpeg, b1_histogram, and g3fax]

  15. Platune Results
     • DPG is 30x faster than Platune
     • 2.5x faster than Genetic Algorithms
     [Figure: results chart; one data label, 44, survives]

  16. Xilinx Microblaze Soft-Core Processor
     • Tuned the Microblaze for various benchmarks
     • Exhaustive data generated for 12 benchmarks for comparison
     • The Microblaze also has a configurable cache, which allows for over 3,000 configurations
     • For these tests we used previously generated results, which gave us only 64 configurations
     [Figure: Microblaze with configurable units: mul, bs, MSR, FPU, PCMP, div]

  17. Network on Chip – Results
     • DPG also works on larger design spaces

  18. DPG Scales Well

  19. Conclusion
     • DoE-Based Pareto-Point Generation (DPG) algorithm quickly finds good Pareto points
     • Results were better and obtained faster than previous Platune or randomized techniques
     • Approach is easier to use – no designer knowledge of parameter interdependencies is needed
     • Useful for FPGAs as well as other parameterized systems, such as SOCs synthesized to ASICs, parameterized SOCs, etc.
