A NEW GRAPH STRUCTURE FOR HARDWARE-SOFTWARE PARTITIONING OF HETEROGENEOUS SYSTEMS G. N. Khan and M. Jin, System-on-Chip Research Group, Electrical & Computer Engineering, Ryerson University, Toronto ON M5B 2K3
Hardware-Software (HW/SW) Co-design Objective: To design hardware and software together early in the design cycle, producing a more reliable, efficient, first-time-right design within a reasonable time.
Hardware-Software Partitioning • Assignment of system parts to heterogeneous implementation units (hardware and software) • Meet constraints (timing) and minimize cost (area, time to market) • Directly affects the cost and performance of the final system
Specification • Traditionally written in plain English • Notations such as MSC, SDL and SystemC were developed • Both textual and graphical representations, such as a DAG (Directed Acyclic Graph), are used to describe a system
What is DADGP? • A Directed Acyclic Data dependency Graph with Precedence (DADGP) is an extension of a DAG • DADGP is a superset of DAG • Two types of edges: 1) Weighted dependency edge 2) Precedence edge
DADGP Example [Figure: example DADGP with nodes A, B, C and D and the labels 1, 3, 5 and 10] • An arrow represents a dependency relationship • A precedence edge is drawn as a plain line • A precedence dependency captures only the order of execution between nodes; such nodes can be executed in parallel • Only necessary parallelism is exposed
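The two edge types can be held in a small container. The following is a minimal sketch, not the authors' implementation: the class name `DADGP`, its methods, and the edges in the usage lines are invented for illustration, since the topology of the example figure is not recoverable here. The weight is treated as an opaque number.

```python
from dataclasses import dataclass, field


@dataclass
class DADGP:
    """Hypothetical container: nodes plus the two DADGP edge types."""
    nodes: set = field(default_factory=set)
    # Weighted dependency edges: (src, dst) -> weight.
    dependency: dict = field(default_factory=dict)
    # Precedence edges: (src, dst) pairs that only fix execution order.
    precedence: set = field(default_factory=set)

    def add_dependency(self, src, dst, weight):
        self.nodes.update((src, dst))
        self.dependency[(src, dst)] = weight

    def add_precedence(self, src, dst):
        self.nodes.update((src, dst))
        self.precedence.add((src, dst))

    def successors(self, node):
        """Nodes reachable from `node` over one edge of either type."""
        deps = {d for (s, d) in self.dependency if s == node}
        prec = {d for (s, d) in self.precedence if s == node}
        return deps | prec


# Illustrative edges only; they do not reproduce the figure.
g = DADGP()
g.add_dependency("A", "B", 1)
g.add_dependency("B", "D", 10)
g.add_precedence("B", "C")
```

With every precedence edge removed, the structure reduces to an ordinary weighted DAG, which matches the statement that DADGP is a superset of DAG.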
Overall System Partitioning Structure [Flowchart: Specification → Profiling → LD Path Search → Mapping → Scheduling → Valid Mapping? (No: back to Mapping; Yes: continue) → Constraint Satisfied? (No: back to LD Path Search; Yes: Finish)]
System Partitioning Algorithm i. Profile the specification and build an initial DADGP ii. Find the LD path (longest delay path) in the DADGP iii. Map LD-path nodes to hardware iv. Schedule; if the mapping is invalid, go to Step iii v. Update the DADGP and calculate the total execution time of the target system vi. If the system constraints (specified by the user) are not met, go to Step ii; otherwise quit
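The control flow of these steps can be sketched as a loop with two back-edges. This is a skeleton under stated assumptions: every callable passed in (`profile`, `find_ld_path`, `map_to_hw`, `schedule`, `total_time`) is a hypothetical hook, the `max_rounds` guard is added for safety, and the toy hooks below exist only to exercise the loop.

```python
def partition(graph, deadline, profile, find_ld_path, map_to_hw, schedule,
              total_time, max_rounds=100):
    """Control-flow skeleton of steps i-vi; all callables are hypothetical hooks."""
    profile(graph)                              # step i
    for _ in range(max_rounds):
        ld_path = find_ld_path(graph)           # step ii
        while True:
            node = map_to_hw(graph, ld_path)    # step iii
            if node is None:
                return False                    # nothing left to move to hardware
            if schedule(graph):                 # step iv: is the mapping valid?
                break
            # A real implementation would undo the rejected mapping here.
        if total_time(graph) <= deadline:       # steps v and vi
            return True
    return False


# Toy hooks, invented for illustration: hardware runs every node in 1 unit.
def toy_map_to_hw(graph, ld_path):
    for node in ld_path:
        if node not in graph["hw"]:
            graph["hw"].add(node)
            return node
    return None


def toy_total_time(graph):
    return sum(1 if n in graph["hw"] else t
               for n, t in graph["sw_time"].items())


toy = {"sw_time": {"a": 5, "b": 3}, "hw": set()}
met = partition(
    toy, deadline=4,
    profile=lambda g: None,
    find_ld_path=lambda g: sorted(g["sw_time"], key=g["sw_time"].get,
                                  reverse=True),
    map_to_hw=toy_map_to_hw,
    schedule=lambda g: True,
    total_time=toy_total_time,
)
```

In the toy run, moving node "a" to hardware is enough to bring the total time from 8 down to the deadline of 4, so the loop stops after one round.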
Profiling The profiler collects the following data: • Execution time • Amount of data transfer • Execution order • Data dependencies between nodes
Longest Delay Path Search • Finding the longest delay path in the DADGP is like finding the bottleneck of the system • Restricting mapping to this path minimizes the search space • The longest delay path is the longest execution path
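A longest path in an acyclic graph can be found in one pass over a topological order. The sketch below is a generic technique rather than the authors' search: it assumes each node carries a single delay, ignores edge weights, and treats dependency and precedence edges alike as predecessor links. The function name and the sample graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_delay_path(delay, preds):
    """Return (path, total_delay) for the longest delay path.

    delay: node -> execution time
    preds: node -> iterable of predecessor nodes
    """
    finish = {}     # longest accumulated delay ending at each node
    best_pred = {}  # predecessor on that longest path, or None
    for node in TopologicalSorter(preds).static_order():
        best = max(preds.get(node, ()), key=lambda p: finish[p], default=None)
        best_pred[node] = best
        finish[node] = delay[node] + (finish[best] if best is not None else 0)

    # Walk back from the node with the largest accumulated delay.
    end = max(finish, key=finish.get)
    total = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_pred[end]
    return path[::-1], total


# Illustrative graph: A feeds B and C, which both feed D.
delay = {"A": 2, "B": 4, "C": 1, "D": 3}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
ld_path, ld_total = longest_delay_path(delay, preds)
```

Here the path through B dominates the path through C, so B is the first candidate for hardware mapping.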
Mapping • Maps a node to hardware • Mapping can change the longest delay path, as well as the DADGP • A mapping is valid if moving that node to hardware yields the shortest longest delay path
Scheduling • Uses a very simple list scheduling approach • Schedules the earliest node first without violating the resource limit • Exposes parallelism and changes the DADGP accordingly
Summary of DADGP Scheduling • Start scheduling from the root of the DADGP • Traverse down the tree and schedule the node with the earliest starting time • If the node is connected by a precedence dependency edge, check whether exposing parallelism can eliminate that edge. When an edge is eliminated, the DADGP may split into two DADGPs. The roots of the two DADGPs are then combined under a dummy root node to form a single DADGP. • In the case of multiple descendants, schedule them forcibly by adding PEs • Update the PE resource (HW-SW) library
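The "earliest node first, within the resource limit" rule can be sketched as a greedy list scheduler. This is a simplified illustration under assumptions: all PEs are identical, both edge types are treated as plain ordering constraints, and the precedence-edge elimination and dummy-root steps are not modelled. The function name and the sample graph are hypothetical.

```python
import heapq


def list_schedule(duration, preds, num_pes):
    """Greedy list scheduling on `num_pes` identical PEs.

    Returns (schedule, makespan) where schedule maps node -> (start, pe).
    """
    succs = {n: [] for n in duration}
    missing = {n: len(preds.get(n, ())) for n in duration}
    for node, ps in preds.items():
        for p in ps:
            succs[p].append(node)

    # Ready list ordered by earliest possible start time.
    ready = [(0, n) for n in sorted(duration) if missing[n] == 0]
    heapq.heapify(ready)
    # PEs ordered by the time at which they become free.
    pe_free = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(pe_free)

    finish, schedule = {}, {}
    while ready:
        earliest, node = heapq.heappop(ready)
        free_at, pe = heapq.heappop(pe_free)
        start = max(earliest, free_at)
        finish[node] = start + duration[node]
        schedule[node] = (start, pe)
        heapq.heappush(pe_free, (finish[node], pe))
        for s in succs[node]:
            missing[s] -= 1
            if missing[s] == 0:  # all predecessors are now scheduled
                ready_at = max(finish[p] for p in preds[s])
                heapq.heappush(ready, (ready_at, s))
    return schedule, max(finish.values())


duration = {"A": 2, "B": 4, "C": 1, "D": 3}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
sched_two, makespan_two = list_schedule(duration, preds, num_pes=2)
sched_one, makespan_one = list_schedule(duration, preds, num_pes=1)
```

With two PEs, B and C overlap and the makespan equals the longest delay path; with one PE they are serialized and the makespan grows by the duration of C.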
Constraints • Deadline and cost constraints are given by the designer • Hardware cost is calculated by gate count • A different granularity level should be explored if no solution is found
Edge Detection Example • A pair of 3x3 masks is convolved with the image to estimate gradients (Gx & Gy) in the x and y directions [Figure: DADGP with nodes Gx, Gy, Gx2, Gy2 and Add, linked to the HW-SW library; legend: precedence dependency, data dependency]
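The five nodes of this example (Gx, Gy, the two squaring steps and Add) can be sketched in plain Python. The slide does not name the masks, so the Sobel masks used here are an assumption, as are the function names and the test image.

```python
# Assumed masks: the Sobel pair. The slide only says "a pair of 3x3 masks".
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_mask(image, mask, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c).

    The mask is applied without flipping; flipping would only change the
    sign of the result, which the squaring step removes.
    """
    return sum(mask[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def gradient_energy(image):
    """Gx^2 + Gy^2 for every interior pixel; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_mask(image, SOBEL_X, r, c)   # node Gx
            gy = apply_mask(image, SOBEL_Y, r, c)   # node Gy
            out[r][c] = gx * gx + gy * gy           # nodes Gx2, Gy2, Add
    return out


# Illustrative image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
energy = gradient_energy(image)
```

Gx and Gy read the same pixels and do not depend on each other, which is the kind of parallelism the partitioner can exploit when both are mapped to separate PEs.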
Edge Detection Solutions [Figure: several alternative partitioning solutions for the edge detection DADGP, each with nodes Gx, Gy, SqX, SqY and Add and edges labelled 0.1]
Conclusion • HW-SW partitioning is an NP-hard problem • Finding the optimal hardware-software partition is very difficult because many factors affect the partitioning decision • The DADGP structure exposes parallelism • The complexity of the DADGP partitioning algorithm is approximately n² log(n)