Integrated Hardware-Software Co-Synthesis and High-Level Synthesis for Design of Embedded Systems under Power and Latency Constraints. Alex Doboli VLSI Systems Design Laboratory Department of Electrical and Computer Engineering State University of New York (SUNY) Stony Brook, USA
Integrated Hardware-Software Co-Synthesis and High-Level Synthesis for Design of Embedded Systems under Power and Latency Constraints
Alex Doboli
VLSI Systems Design Laboratory, Department of Electrical and Computer Engineering, State University of New York (SUNY) Stony Brook, USA
e-mail: adoboli@ece.sunysb.edu
Motivation of the Paper
• Challenges for hw/sw co-synthesis of low-power systems:
  - System-level design decisions are critical in determining the power consumption of an implementation
  - It is hard to decide on effective latency/power consumption trade-offs without knowing in detail which hardware resources are used
Our Proposed Solution
• Hw/sw co-synthesis and HLS have to be integrated so that efficient power-saving decisions can be considered at the system level
Presentation Outline
• System Representation for Integrated Co-Synthesis
• Performance Modeling for Co-Synthesis
• Modeling of Design Decisions for Co-Synthesis
• Integrated Co-Synthesis Methodology
• Experimental Results
• Conclusions and Future Work
Related Work
• Co-synthesis traditionally makes the following assumptions:
  - Hardware is abstracted (Ernst 93, Gupta 92, Yen 97) by its capacity to (1) execute operations concurrently and (2) be shared by similar operations
    => Minimizing power consumption needs a detailed view of the hardware
  - System functionality is described as a task graph with data dependencies only (Dave 97, Henkel 99, Yen 97)
    => Control dependencies enable better-quality implementations
• Recently, co-synthesis for low-power embedded systems has emerged:
  - Henkel @ DAC 99 suggests hw/sw partitioning for low-power systems
  - Dave et al. @ DAC 97 propose low-power co-synthesis including resource allocation, scheduling and performance estimation
What do we propose?
• The paper presents an integrated approach to hardware-software co-synthesis and HLS for the design of low-power embedded systems
• Goal: find the hardware-software implementation of a system that minimizes overall power consumption while satisfying a global latency constraint
• Assumption: all available hardware resources are known, i.e. general-purpose processing elements (PEs) and functional units (FUs) such as adders and multipliers
What do we propose?
• The integrated method for co-synthesis and HLS was realized using simulated annealing:
  - Operation clusters are partitioned and scheduled to PEs (as in traditional co-synthesis), and operations are bound and scheduled to FUs (as in HLS)
  - Low-power aspects such as PE shut-down are considered for each solution
• Exploration is guided by Performance Models (PMs). PMs capture how latency and power consumption depend on design decisions, i.e. binding and scheduling
• Our contribution: the integrated approach to co-synthesis and HLS
  1) gives more accurate latency and power estimations
  2) exposes RTL-level decisions for power reduction at the system level
  => More effective performance trade-offs during co-synthesis
System Representation for Integrated Co-Synthesis and HLS
• The Hierarchical Data and Control Dependency Graph (HDCG) includes:
  - Operation nodes (ONs), used for high-level synthesis
  - Cluster nodes (CNs), used for hardware/software co-synthesis
• Data and control dependencies exist among the nodes
• ONs and CNs are annotated with execution time and power consumption
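The representation above can be pictured as a small data structure. The following is only a minimal sketch, not the authors' implementation: the class names `Node` and `HDCG`, the field names, and the choice to store a control dependency as an edge carrying a condition label are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One HDCG node (hypothetical layout)."""
    name: str
    kind: str          # "ON" (operation node) or "CN" (cluster node)
    exec_time: float   # annotated execution time
    power: float       # annotated power consumption
    children: List["Node"] = field(default_factory=list)  # ONs grouped in a CN


@dataclass
class HDCG:
    """Nodes plus dependency edges (hypothetical layout)."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source, target, condition). The condition is None for a
    # data dependency and a label such as "Cond1" for a control dependency.
    edges: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, src: str, dst: str, condition: Optional[str] = None) -> None:
        self.edges.append((src, dst, condition))

    def successors(self, name: str) -> List[Tuple[str, Optional[str]]]:
        """Return (target, condition) pairs for all edges leaving `name`."""
        return [(dst, cond) for src, dst, cond in self.edges if src == name]
```

Keeping the ONs as children of a CN mirrors the hierarchy described on the slide: co-synthesis can work on the CN level while HLS looks inside a cluster.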
System Representation for Integrated Co-Synthesis and HLS
[Figure: example HDCG with nodes numbered 1 to 8. Labels: communication node, cluster nodes, operation nodes (+ and * operations), and control conditions Cond1 and Cond2.]
Performance Modeling for Co-Synthesis
• Motivation: for effective co-synthesis, the performance of an implementation has to be accurately related to its functionality and to design decisions
• Solution: employ Performance Models that accurately capture timing and power consumption
Modeling of Design Decisions
• A performance model includes:
  - A constant part that corresponds to the time/power characteristics of the functional nodes and to the data and control dependencies
  - A variable part that reflects design decisions, i.e. scheduling and binding
• Values for latency and power consumption are obtained by numerically evaluating the PMs
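As an illustration of numerically evaluating a latency PM, the sketch below treats the model as a max/+ expression over node execution times: a node starts when its latest predecessor finishes, and the latency is the largest finish time. The function name `evaluate_latency` and the dictionary-based inputs are assumptions for this example. A scheduling decision is modeled here simply as an extra precedence edge, which is one plausible reading of the "variable part", not necessarily the formulation used in the paper.

```python
def evaluate_latency(exec_time, preds):
    """Evaluate a max/+ latency model on an acyclic dependency graph.

    exec_time: dict mapping node -> execution time (constant part)
    preds:     dict mapping node -> list of predecessor nodes; edges added
               by scheduling decisions go here too (variable part)
    """
    finish = {}

    def finish_time(node):
        if node not in finish:
            # Start time is the max over predecessor finish times (0 if none).
            start = max((finish_time(p) for p in preds.get(node, [])), default=0.0)
            finish[node] = start + exec_time[node]
        return finish[node]

    return max(finish_time(n) for n in exec_time)


# Hypothetical five-node example with made-up execution times.
exec_time = {1: 2.0, 2: 3.0, 3: 1.0, 4: 4.0, 5: 2.0}
data_deps = {2: [1], 3: [1], 4: [2, 3], 5: [3]}
base_latency = evaluate_latency(exec_time, data_deps)  # 9.0

# Scheduling decision: nodes 2 and 3 share a resource, and 2 runs first.
scheduled = dict(data_deps)
scheduled[3] = [1, 2]
scheduled_latency = evaluate_latency(exec_time, scheduled)  # 10.0
```

In this toy case, serializing nodes 2 and 3 lengthens the critical path by one time unit, which is the kind of effect the variable part of a PM is meant to expose.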
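Performance Modeling for Co-Synthesis
[Figure: an HDCG with nodes 1 to 5 and the corresponding latency Performance Model, built from + and max operators over the execution times T1_ex to T5_ex, running from Start (time 0) to End and yielding the latency.]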
Modeling of Design Decisions for Co-Synthesis
• We modeled how the following affect the latency and power consumption Performance Models:
  - Data and control dependencies
  - ON and CN scheduling
  - ON binding and CN partitioning
Integrated Co-Synthesis Methodology
[Figure: methodology flow. Labels: HDCG + latency constraints; Performance Model generation; integrated co-synthesis and HLS, i.e. partitioning (binding) + scheduling (Ri, Ti, Ti_ex); Performance Model; establish resource shut-down points (ij_stop); latency & power consumption.]
Co-Synthesis Methodology
• Partitioning and scheduling are realized with simulated annealing:
  - Neighborhood definition: a new solution differs by the execution order of one pair of CNs/ONs or by the binding of one CN/ON
  - Initial solution: CNs/ONs are uniformly distributed to hardware resources and scheduled using list scheduling
  - Hierarchical exploration: emulated using different probabilities for binding (lower) and scheduling (higher)
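The neighborhood and the two move probabilities described above might look roughly like the sketch below. Everything concrete in it is an assumption: the names `neighbor` and `anneal`, the value `p_bind=0.2`, the geometric cooling schedule, and the caller-supplied `cost` function (which would normally evaluate the PMs and penalize latency violations). For brevity, the order swap does not check data dependencies, which a real implementation would have to do.

```python
import math
import random


def neighbor(binding, order, resources, rng, p_bind=0.2):
    """Return a solution differing by one binding or one swapped pair.

    Binding moves are taken with the lower probability p_bind, scheduling
    moves otherwise. Needs at least two resources and two nodes.
    """
    binding, order = dict(binding), list(order)
    if rng.random() < p_bind:
        node = rng.choice(order)
        choices = [r for r in resources if r != binding[node]]
        binding[node] = rng.choice(choices)
    else:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return binding, order


def anneal(binding, order, resources, cost, rng, t0=10.0, alpha=0.95, steps=500):
    """Plain simulated annealing; returns the best solution seen and its cost."""
    current = (binding, order)
    current_cost = cost(*current)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        cand = neighbor(*current, resources, rng)
        cand_cost = cost(*cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha  # geometric cooling (an assumption)
    return best, best_cost
```

A caller would pass a seeded `random.Random` instance for reproducible runs and a `cost(binding, order)` function returning, for example, estimated power plus a penalty when the latency constraint is exceeded.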
Establishment of Resource Shut-Down Points
• Greedy approach that identifies shut-down points of unused hardware resources based on the resulting power saving, without violating timing constraints:
  - The algorithm is called after each scheduling decision
  - Each possible shut-down point of a resource is found by examining the ON/CN schedules and identifying the idle times of the resource
  - Shut-down points are decided based on the amount of the resulting power savings
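One way to picture the idle-time analysis is the sketch below. It is a simplified stand-in, not the algorithm from the paper: the function name `shutdown_points` and the parameters `idle_power`, `restart_energy` and `restart_time` are hypothetical, and the saving estimate (idle power times gap length minus a fixed restart cost) is an assumed cost model. A gap is accepted only if it is long enough to restart the resource in time, which approximates the timing-constraint check.

```python
def shutdown_points(busy, latency, idle_power, restart_energy, restart_time):
    """Rank the idle gaps of one resource by estimated saving.

    busy:    list of (start, end) intervals in which the resource executes
    latency: end of the schedule
    Returns (gap_start, gap_end, saving) tuples, largest saving first.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(busy):
        if start > cursor:
            gaps.append((cursor, start))   # idle time before this interval
        cursor = max(cursor, end)
    if cursor < latency:
        gaps.append((cursor, latency))     # idle tail of the schedule

    chosen = []
    for begin, end in gaps:
        length = end - begin
        saving = idle_power * length - restart_energy
        # Keep the gap only if shutting down pays off and the resource
        # can be restarted before it is needed again.
        if length >= restart_time and saving > 0:
            chosen.append((begin, end, saving))
    return sorted(chosen, key=lambda g: g[2], reverse=True)


# Hypothetical schedule of one resource, in arbitrary time units.
points = shutdown_points(
    busy=[(0, 2), (3, 4), (10, 12)],
    latency=15,
    idle_power=1.0,
    restart_energy=2.0,
    restart_time=1.5,
)
# points == [(4, 10, 4.0), (12, 15, 1.0)]
```

In this example the one-unit gap between times 2 and 3 is rejected because it is shorter than the assumed restart time, while the two longer gaps are kept and ordered by saving.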
Experimental Results
• Experimental set-up:
  - Considered examples: H261 video coding algorithm and a 4 x 4 determinant calculator
  - Imposed latency constraints: 2000 ms and 4000 ms for H261; 300 ms and 1000 ms for the determinant calculator
  - Distinct resource sets considered for each application
• Quality of designs was studied by comparing the results of the integrated approach with task-level co-synthesis (also using simulated annealing)
Experimental Results
• Power savings for the integrated co-synthesis method range from 1.6% to 27%
• Power savings are on average higher than 10%
• Latencies are also in general smaller for the proposed method
• We observed that the obtained power savings decrease as the latency constraint is relaxed => more functionality is placed into software and power savings are less effective
Conclusions
• We discussed an integrated approach to co-synthesis and HLS that minimizes power consumption while satisfying a latency constraint
• We proposed:
  - Hierarchical Data and Control Dependency Graphs
  - Performance Modeling for latency and power consumption
  - Hierarchical exploration algorithm for co-synthesis and HLS
  - Greedy technique for finding resource shut-down points
• Experiments showed effective latency and power consumption trade-offs for the integrated co-synthesis approach
Future Work
• Extend the integrated co-synthesis methodology by including memory synthesis
• Consider a more detailed communication synthesis process
• Experiment with alternative methods for hierarchical exploration
• Incorporate the shut-down analysis into the combined co-synthesis & HLS methodology