
Presentation Transcript


  1. Coordinated Coarse Grain and Fine Grain Optimizations for High-Level Synthesis. Sumit Gupta, Center for Embedded Computer Systems, University of California, Irvine. http://www.cecs.uci.edu/~spark. Supported by the Semiconductor Research Corporation.

  2. High-Level Synthesis: transform behavioral descriptions to RTL / gate level, going from C to a control-data flow graph (CDFG) to an architecture. Example behavioral fragment:
    x = a + b;
    c = a < b;
    if (c) then d = e - f;
    else g = h + i;
    j = d x g;
    l = e + x;
  [Figure: the corresponding CDFG, with x = a + b and c = a < b feeding an If node on c (T/F branches holding d = e - f and g = h + i), followed by j = d x g and l = e + x; target architecture with memory, control, ALU, and data path.]

  3. Our Approach to HLS. [Figure: tool flow from C input through an original CDFG, source-level compiler transformations (pre-synthesis), and scheduling & binding with scheduling compiler transformations, to an optimized CDFG and VHDL output.]
  • Optimizing and parallelizing compiler transformations applied at the source level (pre-synthesis) and during scheduling
  • Source-level code refinement using pre-synthesis transformations (a small C sketch follows below)
  • Code restructuring by speculative code motions
  • Operation replication to improve concurrency
  • Transformations applied dynamically during scheduling to exploit new opportunities created by code motions
  • Extract a high degree of parallelism using extensive code transformations
  • Improve resource utilization and increase code compaction
  • Reduce the impact of programming style and control constructs on HLS results
  • Our approach is particularly suited to descriptions with nested conditionals and loops
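  To make the pre-synthesis step concrete, here is a minimal hand-written C sketch of the kind of source-level refinement meant above (loop-invariant code motion plus common sub-expression elimination); the function and variable names are illustrative assumptions, not SPARK output.

    /* Illustrative sketch only: source-level (pre-synthesis) refinement.
       Names are hypothetical; such rewrites are applied to the input C
       before scheduling. */

    /* Before: a * b is loop-invariant and computed twice per iteration. */
    void before(int a, int b, int n, int out[])
    {
        for (int i = 0; i < n; i++) {
            out[i]  = a * b + i;     /* a * b recomputed each iteration  */
            out[i] += a * b;         /* ...and a second time             */
        }
    }

    /* After: the invariant product is hoisted out of the loop and the
       common sub-expression is computed once, shrinking the loop body. */
    void after(int a, int b, int n, int out[])
    {
        int ab = a * b;              /* hoisted loop-invariant operation */
        for (int i = 0; i < n; i++) {
            out[i] = (ab + i) + ab;  /* single use of the shared result  */
        }
    }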

  4. Hierarchical Intermediate Representation • We use Hierarchical Task Graphs (HTGs) • Maintain structured view of design description • Consists of hierarchy of basic blocks and HTG nodes • 3 Types of HTG Nodes: • Single: No sub-nodes • Compound: sub-nodes • Loop: Encapsulate loops • Augmented by data dependency graphs • Enable Coarse-Grain transformations
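  A rough C sketch of how such a hierarchical node could be represented (a minimal sketch; the type and field names are assumptions for illustration, not SPARK's actual data structures):

    /* Hypothetical HTG node representation; names are illustrative only. */
    struct basic_block;                   /* straight-line code            */
    struct data_dep_graph;                /* data dependency information   */

    typedef enum { HTG_SINGLE, HTG_COMPOUND, HTG_LOOP } htg_kind;

    typedef struct htg_node {
        htg_kind                kind;          /* single, compound, or loop      */
        struct basic_block     *bb;            /* used when kind == HTG_SINGLE   */
        struct htg_node       **sub_nodes;     /* hierarchy under compound/loop  */
        int                     num_sub_nodes;
        struct data_dep_graph  *ddg;           /* augmenting dependency graph    */
    } htg_node;

  A compound node owns sub-nodes (for example an if-then-else), a loop node encapsulates a loop body, and a single node wraps one basic block; this hierarchy is what lets coarse-grain transformations reason about whole regions at once.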

  5. Trailblazing: Hierarchical Code Motion Technique • Can move operations across large pieces of code without visiting each node in between

  6. Speculative Code Motions: operation movement to reduce the impact of programming style on the quality of HLS results. [Figure: an If node (T/F) with operations a +, b +, and c -, annotated with the supported motions: speculation, reverse speculation, conditional speculation, movement across hierarchical blocks, and early condition execution (conditions are evaluated as soon as possible).]
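  As a concrete illustration of two of these motions, here is a small hand-written C sketch (variable names are made up; the tool performs these moves on the CDFG/HTG during scheduling, not on the C source text):

    /* Original: the addition waits for the condition, and t is computed
       on both paths even though only one path uses it. */
    int original(int cond, int a, int b, int c, int d)
    {
        int t = a * b;                 /* needed only on the true path   */
        int r;
        if (cond) r = t + (a + b);     /* a + b also waits for cond      */
        else      r = c - d;
        return r;
    }

    /* Speculation: a + b is hoisted above the conditional and executed
       unconditionally; its result is simply unused on the false path.
       Reverse speculation: t = a * b is pushed down into the branch that
       actually needs it, freeing a step on the other path. */
    int with_code_motions(int cond, int a, int b, int c, int d)
    {
        int sum_spec = a + b;          /* speculated above the If node   */
        int r;
        if (cond) { int t = a * b; r = t + sum_spec; }
        else      { r = c - d; }
        return r;
    }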

  7. Scheduling Heuristic. [Figure: an HTG of basic blocks BB 0-BB 9 containing additions a, b, c, d; arrows show operations being speculated and moved across the HTG.]
  • Get the available operations (here a, b, c, d)
  • Determine the code motions each operation would require
  • Assign a cost to each operation; the cost is based on its data dependency chain
  • Schedule the operation with the lowest cost (a small C sketch of this selection step follows below)
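  A minimal, self-contained C sketch of the cost-based selection this slide describes; all type and function names are assumptions, and the exact cost formula is not given on the slide (the sketch assumes operations heading longer data-dependency chains get lower cost and are therefore scheduled first):

    /* Hypothetical sketch of the cost-based selection step. */
    #include <stddef.h>

    typedef struct Op {
        int          id;
        int          num_succs;
        struct Op  **succs;          /* data-dependent successor operations */
    } Op;

    /* Length of the longest chain of dependent operations below op. */
    static int chain_length(const Op *op)
    {
        int worst = 0;
        for (int i = 0; i < op->num_succs; i++) {
            int c = 1 + chain_length(op->succs[i]);
            if (c > worst) worst = c;
        }
        return worst;
    }

    /* Cost based on the data dependency chain (assumption: a longer
       chain means a lower cost, so the operation is scheduled earlier). */
    static int op_cost(const Op *op) { return -chain_length(op); }

    /* From the available operations (those whose required code motions
       to this step are legal), pick the one with the lowest cost. */
    static Op *pick_op_to_schedule(Op *available[], int n)
    {
        Op *best = NULL;
        for (int i = 0; i < n; i++)
            if (best == NULL || op_cost(available[i]) < op_cost(best))
                best = available[i];
        return best;
    }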

  8. Scheduling Heuristic (continued). [Figure: the HTG before and after one pass of the heuristic; operation c has been speculated across the HTG from BB 5 into BB 0 alongside a, while b (BB 1) and d (BB 9) remain in place.]

  9. Increasing the Scope of Code Motions. [Figure: resource allocation plus an original design and its scheduled design. Both consist of an If node (BB 0, T/F) with branch blocks BB 1 and BB 2 holding operations a, b, c, d, e, joining at BB 3/BB 4; the schedule needs states S0-S3 and leaves an unbalanced conditional, with one branch longer than the other.]

  10. Insert New Scheduling Step in Shorter Branch. [Figure: the same conditional after branch balancing; a new scheduling step is inserted in the shorter branch so that operation e can be conditionally speculated (duplicated) into both branches, and the schedule shrinks from states S0-S3 to S0-S2.]
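  A hand-written C illustration of the idea (names and the operation mix are assumptions; the actual decision is made on scheduling steps, given a free resource in the inserted step):

    /* Before balancing: the true branch needs two steps, the false branch
       one, and e = p + q is scheduled only after the join. */
    int before_balancing(int c, int x, int y, int z, int p, int q)
    {
        int a = 0, b = 0, d = 0, e;
        if (c) { a = x + y; b = a - z; }   /* longer branch               */
        else   { d = x - y; }              /* shorter branch              */
        e = p + q;                         /* waits for the join          */
        return a + b + d + e;
    }

    /* After: a new step is inserted in the shorter branch, so e = p + q
       can be conditionally speculated into both branches; given a free
       adder, the post-join step disappears. */
    int after_balancing(int c, int x, int y, int z, int p, int q)
    {
        int a = 0, b = 0, d = 0, e;
        if (c) { a = x + y; b = a - z; e = p + q; }
        else   { d = x - y; e = p + q; }   /* fills the inserted step     */
        return a + b + d + e;
    }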

  11. Common Sub-Expression Elimination. C description:
    a = b + c;
    c = b < c;
    if (c) d = b + c;
    else e = g + h;
  [Figure: HTG representation. BB 0 holds a = b + c, followed by an If node (BB 1, T/F) with d = b + c in one branch and e = g + h in the other, joining at BB 4. After CSE, a = b + c dominates the later use, so d = b + c is replaced by d = a.]

  12. New Opportunities for "Dynamic" CSE Due to Speculative Code Motions. [Figure: before scheduling, a = b + c (in BB 1) and d = b + c (in BB 5) lie in different conditional branches, so classical CSE cannot merge them. When the first computation is speculated into BB 0 as dcse = b + c, BB 0 comes to dominate BB 5, and the later operation becomes d = dcse.]
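  In C terms, the effect is roughly the following (a hand-written illustration; the actual transformation happens on the HTG during scheduling):

    /* Before scheduling: the two b + c computations lie in different
       conditional blocks, so classical CSE cannot merge them. */
    int before_sched(int cond1, int cond2, int b, int c)
    {
        int a = 0, d = 0;
        if (cond1) a = b + c;
        if (cond2) d = b + c;
        return a + d;
    }

    /* After speculating the first computation into the common block as
       dcse = b + c, that block dominates the later use, so the second
       b + c is eliminated dynamically, during scheduling. */
    int after_sched(int cond1, int cond2, int b, int c)
    {
        int a = 0, d = 0;
        int dcse = b + c;            /* speculated computation            */
        if (cond1) a = dcse;
        if (cond2) d = dcse;         /* dynamic CSE: reuse, no second add */
        return a + d;
    }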

  13. SPARK High-Level Synthesis Framework

  14. Experimentation
  • Experiments for several transformations: pre-synthesis transformations (loop-invariant code motion, CSE), speculative code motions, and dynamic CSE
  • We have used SPARK to synthesize designs derived from several industrial applications: MPEG-1, MPEG-2, and the GIMP image processing software
  • Scheduling results: number of states in the FSM, cycles on the longest path through the design
  • Logic synthesis of the generated VHDL: critical path length (ns), unit area

  15. Target Applications

  16. Code Motions: Logic Synthesis Results. [Chart: results as code motions are enabled cumulatively: within basic blocks & across hierarchical blocks; + reverse speculation & early condition execution; + speculation; + conditional speculation.]

  17. CSE / Dynamic CSE Results. [Chart: results with all code motions enabled as the baseline, then + only dynamic CSE, + only CSE, and + CSE & dynamic CSE.]

  18. Conclusions
  • Parallelizing code transformations enable a new range of HLS transformations
  • They can provide the improvement in the quality of HLS results needed for HLS to be competitive with manually designed circuits
  • A synthesis-based approach can come to dominate SoC embedded systems design and enable productivity improvements in microelectronic design
  • Built a synthesis system with a range of code transformations: a platform for applying coarse-grain and fine-grain optimizations whose transformations address complex control flow
  • Tool-box approach in which transformations and heuristics can be developed, enabling the right synthesis script to be found for different application domains
  • Performance improvements of 60-70 % across a number of designs; we have also shown the approach's effectiveness on an Intel design

  19. Publications
  • Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. To appear in DATE, March 2003
  • SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. VLSI Design 2003 (Best Paper Award)
  • Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2002
  • Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2002
  • Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001
  • Speculation Techniques for High-Level Synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001
  • Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R.K. Gupta. DATE 2000
  • Synthesis of Testable RTL Designs using Adaptive Simulated Annealing Algorithm. C.P. Ravikumar, S. Gupta, A. Jajoo. Intl. Conf. on VLSI Design, 1998 (Best Student Paper Award)
  Book Chapter
  • ASIC Design. S. Gupta, R.K. Gupta. Chapter 64, The VLSI Handbook, edited by Wai-Kai Chen
