A Tool for Partitioning and Pipelined Scheduling of Hardware-Software Systems
Karam S Chatha and Ranga Vemuri
Department of ECECS, University of Cincinnati
{kchatha,ranga}@ececs.uc.edu
Organization of Talk
• Introduction
• Overview of Tool
• Codesign partitioner
• Pipelined Scheduler
• Results
• Conclusion
Introduction
• Motivation: The throughput of a loop-oriented HW-SW application can be maximized by obtaining a pipelined implementation.
• Objective: To obtain a pipelined implementation of the application on the codesign architecture such that:
  - The throughput constraint is satisfied
  - The HW area constraint is satisfied
  - The number of pipeline stages is minimized
  - The increase in memory requirement is minimized
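Introduction: Architecture and Task Graph
[Figure: target architecture and example task graph]
• Architecture: a SW processor, a HW co-processor, a shared memory (for SW-HW, HW-SW and HW-HW communication) and a local memory (for SW-SW communication).
• Task graph: tasks A, B, C and D, with 10 data items per dependence.
• Task annotations as listed on the slide:
  - S = 225 ns, H = 175 ns (8 +, …)
  - S = 200 ns, H = 100 ns (4 *, 8 -, …)
  - S = 400 ns, H = 150 ns (4 *, 8 +, …)
  - S = 100 ns, H = 400 ns (3 *, 3 +, …)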
Some Definitions
• A pipelined design is characterized by its initiation interval.
• The initiation interval (II) is the time difference between the start of two consecutive iterations of the steady state.
• Given a partitioned task graph, there exists a theoretical lower bound on the II of its pipelined schedule, called the Minimum Initiation Interval (MII). For a directed acyclic task graph the MII is given by:
  MII = max(Sum_hw, Sum_sw)
  where Sum_hw is the sum of execution times of tasks bound to HW and Sum_sw is the sum of execution times of tasks bound to SW.
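The MII formula above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, the pairing of the four S/H annotations with tasks A to D, and the HW/SW binding are all assumptions made for illustration; the slides give only the formula and do not state which annotation belongs to which task.

```python
def minimum_initiation_interval(tasks, binding):
    """Return MII = max(Sum_hw, Sum_sw) for a partitioned acyclic task graph."""
    # Sum the HW execution times of tasks bound to HW.
    sum_hw = sum(t["hw_ns"] for name, t in tasks.items() if binding[name] == "HW")
    # Sum the SW execution times of tasks bound to SW.
    sum_sw = sum(t["sw_ns"] for name, t in tasks.items() if binding[name] == "SW")
    return max(sum_hw, sum_sw)


# Assumed pairing of the S/H annotations with tasks A to D.
tasks = {
    "A": {"sw_ns": 225, "hw_ns": 175},
    "B": {"sw_ns": 200, "hw_ns": 100},
    "C": {"sw_ns": 400, "hw_ns": 150},
    "D": {"sw_ns": 100, "hw_ns": 400},
}
# Hypothetical partition: B and C in HW, A and D in SW.
binding = {"A": "SW", "B": "HW", "C": "HW", "D": "SW"}

mii = minimum_initiation_interval(tasks, binding)
print(mii)  # Sum_hw = 250, Sum_sw = 325, so MII = 325
```

Under this assumed binding the SW processor is the bottleneck, so no pipelined schedule can start iterations more often than every 325 ns.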
HW-SW Codesign
[Flowchart: overall tool flow]
Inputs: throughput and area constraints, task graph, architecture.
1. Partition the design: satisfy throughput and area constraints.
2. Constraint satisfied? If NO, unable to design with given constraints. If YES, continue.
3. Calculate MII and set II = MII.
4. Obtain a pipelined schedule which executes in II time: satisfy throughput constraints, minimize the number of pipeline stages and minimize the increase in memory requirements.
5. Schedule found? If YES, output successful design. If NO, increase II.
6. II > constraint? If NO, retry scheduling with the new II. If YES, the throughput constraint can no longer be met.
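The scheduling loop of this flow (start at II = MII, increase II until a schedule is found or the constraint is exceeded) might be sketched as follows. The names `find_pipelined_design`, `try_schedule` and `ii_step` are hypothetical; the slides say only "Increase II" and do not give the increment, and returning `None` when II exceeds the constraint is an assumption about that branch.

```python
def find_pipelined_design(mii, ii_constraint, try_schedule, ii_step=1):
    """Search for the smallest II in [mii, ii_constraint] that can be scheduled.

    try_schedule(ii) is a caller-supplied (hypothetical) scheduler that returns
    a schedule object, or None if no schedule executes in ii time.
    """
    ii = mii
    while ii <= ii_constraint:
        schedule = try_schedule(ii)
        if schedule is not None:
            return ii, schedule  # successful design
        ii += ii_step  # "Increase II"; the step size is an assumption
    return None  # II exceeded the throughput constraint


# Stand-in scheduler for illustration: pretend schedules exist only for II >= 340 ns.
def toy_scheduler(ii):
    return {"ii": ii} if ii >= 340 else None


result = find_pipelined_design(mii=325, ii_constraint=400,
                               try_schedule=toy_scheduler, ii_step=5)
print(result)
```

Passing the scheduler in as a callable keeps the sketch independent of how the pipelined scheduler itself is implemented.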
HW-SW Partitioner
• Branch and bound algorithm
• Initial solution tries to minimize MII:
  - Suitability of a task to be assigned to HW is given by: [equation missing in source]
  - Sort tasks in descending order of their suitabilities.
  - Assign tasks to HW and SW alternately from the front and back of the sorted list so that Sum_hw and Sum_sw remain balanced.
• We also apply heuristics to effectively limit the search space of the algorithm.
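One possible reading of the initial-solution steps is sketched below. Two things are assumptions, not taken from the slides: the suitability measure (here the ratio of SW time to HW time, passed in as a parameter because the original formula is not available) and the balancing rule (give the next task to whichever side currently has the smaller sum). The task timings are the same illustrative values used for tasks A to D.

```python
from collections import deque


def initial_partition(tasks, suitability):
    """Build an initial HW/SW binding that keeps Sum_hw and Sum_sw balanced."""
    # Most HW-suitable tasks at the front, least suitable at the back.
    order = deque(sorted(tasks, key=lambda n: suitability(tasks[n]), reverse=True))
    binding, sum_hw, sum_sw = {}, 0, 0
    while order:
        if sum_hw <= sum_sw:
            # HW side is lighter: take the most HW-suitable remaining task.
            name = order.popleft()
            binding[name] = "HW"
            sum_hw += tasks[name]["hw_ns"]
        else:
            # SW side is lighter: take the least HW-suitable remaining task.
            name = order.pop()
            binding[name] = "SW"
            sum_sw += tasks[name]["sw_ns"]
    return binding, sum_hw, sum_sw


tasks = {
    "A": {"sw_ns": 225, "hw_ns": 175},
    "B": {"sw_ns": 200, "hw_ns": 100},
    "C": {"sw_ns": 400, "hw_ns": 150},
    "D": {"sw_ns": 100, "hw_ns": 400},
}


def speedup(task):
    # Hypothetical suitability: SW time divided by HW time.
    return task["sw_ns"] / task["hw_ns"]


binding, sum_hw, sum_sw = initial_partition(tasks, speedup)
print(binding, sum_hw, sum_sw)
```

With these values the sorted order is C, B, A, D, and the sketch binds C and B to HW and D and A to SW, giving Sum_hw = 250 ns and Sum_sw = 325 ns. A branch and bound search would then start from this binding.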
HW-SW Partitioner: Area Estimation
• Resources required by tasks are divided into two types:
  1. Shared: adders, subtractors, multipliers, dividers
  2. Unshared: interconnect and controller
• Shared resource area is estimated by taking the union of the shared resources required by all the HW tasks.
• Unshared resource area is estimated by adding the area associated with the unshared resources of all the HW tasks.
• Total area is estimated by taking the sum of the area requirements of shared and unshared resources.
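The area estimate can be sketched in Python under one assumption about what "union" means: for each shared resource type, the largest count needed by any single HW task. The function name, data layout, unit areas and unshared areas below are all hypothetical values chosen for illustration.

```python
def estimate_hw_area(hw_tasks, unit_area):
    """Estimate total HW area as shared-resource area plus unshared area."""
    # Union of shared resources: per type, the maximum count any HW task needs
    # (assumed interpretation; tasks reuse the same functional units).
    shared = {}
    for task in hw_tasks:
        for kind, count in task["shared"].items():
            shared[kind] = max(shared.get(kind, 0), count)
    shared_area = sum(unit_area[kind] * count for kind, count in shared.items())
    # Unshared resources (interconnect, controller) simply add up.
    unshared_area = sum(task["unshared_area"] for task in hw_tasks)
    return shared_area + unshared_area


# Hypothetical per-unit areas and task requirements (arbitrary area units).
unit_area = {"add": 10, "sub": 10, "mul": 100}
hw_tasks = [
    {"shared": {"mul": 4, "sub": 8}, "unshared_area": 30},
    {"shared": {"mul": 4, "add": 8}, "unshared_area": 50},
]

total = estimate_hw_area(hw_tasks, unit_area)
print(total)  # shared: 4*100 + 8*10 + 8*10 = 560; unshared: 80; total: 640
```

Because the four multipliers are counted once rather than twice, moving a second multiplier-heavy task into HW costs far less area than the first.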
Pipelined Scheduling
[Flowchart: pipelined scheduler]
1. Try to obtain a task schedule which executes in II time (use list scheduling).
2. Schedule found? If yes, success.
3. If no, select a dependency to retime (use RECOD Step 1).
4. Dependency found? If no, failure. If yes, apply the retiming transformation (use RECOD Step 2) and return to step 1.
RECOD Step 1: Select a dependency to retime
1. Dependency is an intra loop dependency (ILD).
2. Dependency between tasks bound to heterogeneous processors.
3. Dependency whose predecessor task belongs to the longer constraining path.
4. Dependency representing the least number of data items transferred.
[Figure: example task graph with tasks A to I bound to SW or HW, dependencies labeled d = 0 or d = 1, and annotations Var = 20 and Var = 10.]
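The four selection criteria could be combined as in the sketch below. The slide does not say how the criteria interact, so this is only one plausible reading: criteria 1 and 2 are treated as filters (with d = 0 assumed to mark an intra-loop dependency), and criteria 3 and 4 as tie-breakers in that order. All names, the edge records and the `on_longer_path` set are hypothetical.

```python
def select_dependency(deps, binding, on_longer_path):
    """Pick a dependency to retime, or return None if no candidate exists."""
    candidates = [
        dep for dep in deps
        # Criterion 1: intra-loop dependency (assumed to mean distance d == 0).
        if dep["d"] == 0
        # Criterion 2: endpoints bound to different kinds of processor.
        and binding[dep["src"]] != binding[dep["dst"]]
    ]
    if not candidates:
        return None
    # Criterion 3 first (predecessor on the longer constraining path sorts
    # earlier because False < True), then criterion 4 (fewest data items).
    return min(
        candidates,
        key=lambda dep: (dep["src"] not in on_longer_path, dep["items"]),
    )


binding = {"A": "SW", "B": "HW", "C": "HW", "D": "SW"}
deps = [
    {"src": "A", "dst": "B", "d": 0, "items": 20},
    {"src": "A", "dst": "C", "d": 0, "items": 10},
    {"src": "B", "dst": "D", "d": 0, "items": 5},
    {"src": "C", "dst": "D", "d": 1, "items": 10},
]

chosen = select_dependency(deps, binding, on_longer_path={"A"})
print(chosen)
```

Here the edge from C to D is filtered out because it is already an inter-iteration dependency, and the edge from A to C wins over the edge from B to D because its predecessor lies on the longer path, even though it transfers more data.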
RECOD Step 2: Partition to minimize increase in memory requirements
• Cost function for the partitioner: [equation missing in source]
• Cutset retiming transformation
[Figure: the task graph A to I divided into Set P, Set R and Set S.]
JPEG Case Study
• We specified the JPEG image compression algorithm as a task graph with 12 tasks.
• We then obtained pipelined codesign implementations by specifying different constraints on the II and HW area.
Execution Time
• We evaluated the runtime of the tool by invoking it for 50 random task graphs and searching for optimal HW-SW partitions.
Percentage deviation of initial solution from final
• We calculated the percentage deviation in initiation interval of the initial partition from the final partition.
• The average percentage deviation was 8.4%.
Conclusion
• The tool can optimize the throughput, area, pipeline stages and memory requirements of a pipelined HW-SW system.
• The tool can obtain solutions for task graphs with up to 30 nodes within a short period of time.
• Although it assumes a single SW processor and a single HW coprocessor, the technique can be extended to multiple-processor architectures.
• The limitation of the tool is its inability to handle large task graphs (> 30 nodes) in a reasonable amount of time.
• A time-out option with the branch and bound partitioner can overcome this limitation.