6. Application mapping (part 2)
6.3 HW/SW partitioning
6.4 Mapping to heterogeneous multi-processors
6.3 HW/SW partitioning
6.3.1 Introduction
• By hardware/software partitioning we mean the mapping of task graph nodes to either hardware or software. Hardware/software partitioning lets us decide which parts must be implemented in hardware and which in software.
6.3.2 COOL (COdesign toOL)
• For COOL, the input consists of three parts:
• Target technology: This part of the input to COOL comprises information about the available hardware platform components. The type of the processors used must be included in this part of the input.
• Design constraints: The second part of the input comprises design constraints such as the required throughput, latency, maximum memory size, or maximum area for application-specific hardware.
• Behavior: The third part of the input describes the required overall behavior. Hierarchical task graphs are used for this. COOL uses two kinds of edges: communication edges and timing edges.
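The three parts of COOL's input can be pictured as a small data model. The sketch below is a hypothetical Python representation, not COOL's actual input format; every class and field name (`TargetTechnology`, `DesignConstraints`, `Edge`, `TaskNode`, `CoolInput`) and the processor name `example_cpu` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The two edge kinds distinguished by COOL's behavioral input.
EDGE_KINDS = ("communication", "timing")


@dataclass
class TargetTechnology:
    # Available platform components; the processor type must be part of this input.
    processor_type: str
    hardware_components: List[str]


@dataclass
class DesignConstraints:
    # In this sketch, a constraint left as None means "unconstrained".
    throughput: Optional[float] = None
    latency: Optional[float] = None
    max_memory_size: Optional[int] = None
    max_hw_area: Optional[float] = None


@dataclass
class Edge:
    src: str
    dst: str
    kind: str

    def __post_init__(self) -> None:
        if self.kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {self.kind}")


@dataclass
class TaskNode:
    # A node with children is hierarchical; a node without children is a leaf.
    name: str
    children: List["TaskNode"] = field(default_factory=list)


@dataclass
class CoolInput:
    target: TargetTechnology
    constraints: DesignConstraints
    behavior: TaskNode  # root of the hierarchical task graph
    edges: List[Edge] = field(default_factory=list)


# Hypothetical instance: one processor type, two hardware components, a latency bound.
spec = CoolInput(
    target=TargetTechnology("example_cpu", ["H1", "H2"]),
    constraints=DesignConstraints(latency=100),
    behavior=TaskNode("top", [TaskNode("a"), TaskNode("b")]),
    edges=[Edge("a", "b", "communication")],
)
```

Rejecting unknown edge kinds in `Edge` keeps the model limited to the two kinds of edges COOL actually distinguishes.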
For partitioning, COOL uses the following steps:
• Translation of the behavior into an internal graph model.
• Translation of the behavior of each node from VHDL into C.
• Compilation of all C programs for the selected target processor type, computation of the resulting program size, and estimation of the resulting execution time.
• Synthesis of hardware components: For each leaf node, application-specific hardware is synthesized.
• Flattening the hierarchy: The next step is to extract a flat graph from the hierarchical flow graph.
• Generating and solving a mathematical model of the optimization problem: COOL uses integer linear programming (ILP) to solve the optimization problem.
• Iterative improvements: In order to work with good estimates of the communication time, adjacent nodes mapped to the same hardware component are now merged.
• Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.
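Two of these steps, flattening the hierarchy and merging adjacent nodes mapped to the same component, can be illustrated with a minimal sketch. It assumes that flattening only needs the leaf nodes (redirecting edges to leaves is omitted) and that "adjacent" means joined by an edge; `Node`, `flatten` and `merge_adjacent` are hypothetical names, not COOL's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[str]:
    """Return the leaf nodes of a hierarchical graph in depth-first order."""
    if not node.children:
        return [node.name]
    leaves: List[str] = []
    for child in node.children:
        leaves.extend(flatten(child))
    return leaves


def merge_adjacent(edges: List[Tuple[str, str]], mapping: Dict[str, str]) -> List[List[str]]:
    """Group nodes connected by edges whose endpoints share a component.

    Union-find is used, so a chain of such edges collapses into one group.
    """
    parent = {v: v for v in mapping}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if mapping[u] == mapping[v]:
            parent[find(u)] = find(v)

    groups: Dict[str, List[str]] = {}
    for v in mapping:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


root = Node("top", [Node("A", [Node("a1"), Node("a2")]), Node("b")])
print(flatten(root))  # ['a1', 'a2', 'b']
print(merge_adjacent([("a1", "a2"), ("a2", "b")], {"a1": "P", "a2": "P", "b": "H1"}))
```

Here `a1` and `a2` both run on `P` and are connected, so they form one merged node, while `b` on `H1` stays separate.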
The following index sets will be used in the description of the ILP model:
• Index set $V$ denotes task graph nodes. Each $v \in V$ corresponds to one task graph node.
• Index set $L$ denotes task graph node types. Each $l \in L$ corresponds to one task graph node type.
• Index set $M$ denotes hardware component types. Each $m \in M$ corresponds to one hardware component type.
• For each of the hardware components, there may be multiple copies, or “instances”. Each instance is identified by an index $j \in J$.
• Index set $KP$ denotes processors. Each $k \in KP$ identifies one of the processors.
• The following decision variables are required by the model:
• $X_{v,m}$: this variable will be 1 if node $v$ is mapped to hardware component type $m \in M$, and 0 otherwise.
• $Y_{v,k}$: this variable will be 1 if node $v$ is mapped to processor $k \in KP$, and 0 otherwise.
• $NY_{l,k}$: this variable will be 1 if at least one node of type $l$ is mapped to processor $k \in KP$, and 0 otherwise.
• $Type: V \to L$ is a mapping from task graph nodes to their corresponding types.
The cost function accumulates the total cost of all hardware units:
C = processor costs + memory costs + cost of application-specific hardware
• We can now present a brief description of some of the constraints of the ILP model:
• Operation assignment constraints: These constraints guarantee that each operation is implemented either in hardware or in software:
$$\forall v \in V: \sum_{m \in M} X_{v,m} + \sum_{k \in KP} Y_{v,k} = 1$$
• Additional constraints ensure that decision variables $X_{v,m}$ and $Y_{v,k}$ have 1 as an upper bound and, hence, are in fact 0/1-valued variables:
$$\forall v \in V, \forall m \in M: X_{v,m} \le 1 \qquad \forall v \in V, \forall k \in KP: Y_{v,k} \le 1$$
• If the functionality of a certain node of type $l$ is mapped to some processor $k$, then the processor's instruction memory must include a copy of the software for this function:
$$\forall l \in L, \forall v: Type(v) = l, \forall k \in KP: NY_{l,k} \ge Y_{v,k}$$
• Additional constraints ensure that decision variables $NY_{l,k}$ are also 0/1-valued variables:
$$\forall l \in L, \forall k \in KP: NY_{l,k} \le 1$$
• Resource constraints
• Precedence constraints
• Design constraints
• Timing constraints
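The operation assignment, 0/1 bound and software-copy constraints can be checked for a candidate assignment with a short sketch. It is a hypothetical checker, not part of COOL: the function name and the dictionary encoding (keys are index pairs, missing entries count as 0) are assumptions, the upper-case parameter names simply mirror the ILP notation, and resource, precedence, design and timing constraints are not checked.

```python
from typing import Dict, Iterable, Tuple

# A decision-variable table: (first index, second index) -> value.
Var = Dict[Tuple[int, int], int]


def satisfies_assignment_constraints(
    V: Iterable[int], M: Iterable[int], KP: Iterable[int],
    Type: Dict[int, int], X: Var, Y: Var, NY: Var,
) -> bool:
    """Check X, Y and NY against the assignment, 0/1 and software-copy constraints."""
    V, M, KP = list(V), list(M), list(KP)
    # All decision variables must be 0/1-valued.
    for table in (X, Y, NY):
        if any(value not in (0, 1) for value in table.values()):
            return False
    for v in V:
        # Operation assignment: exactly one hardware type or one processor per node.
        total = sum(X.get((v, m), 0) for m in M) + sum(Y.get((v, k), 0) for k in KP)
        if total != 1:
            return False
        # Software copy: NY[Type(v), k] >= Y[v, k].
        for k in KP:
            if NY.get((Type[v], k), 0) < Y.get((v, k), 0):
                return False
    return True


# Two nodes of different types: node 1 in hardware type 1, node 2 on processor 1.
ok = satisfies_assignment_constraints(
    V=[1, 2], M=[1, 2], KP=[1], Type={1: 1, 2: 2},
    X={(1, 1): 1}, Y={(2, 1): 1}, NY={(2, 1): 1},
)
print(ok)  # True
```

Leaving out `NY[(2, 1)]` would make the check fail, because node 2 runs on processor 1 without a software copy of its type being present there.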
• Example: In the following, we will show how these constraints can be generated for the task graph in Fig. 6.29.
[Fig. 6.29: task graph with tasks T1 to T5]
• Suppose that we have a hardware component library containing three component types H1, H2 and H3 with costs of 20, 25 and 30 cost units, respectively. Furthermore, suppose that we can also use a processor P of cost 5.
• [Table: execution times of tasks T1 to T5 on the components]
The following operation assignment constraints must be generated, assuming that a maximum of one processor (P1) is to be used:
$X_{1,1} + Y_{1,1} = 1$ (Task 1 mapped either to H1 or to P1)
$X_{2,2} + Y_{2,1} = 1$ (Task 2 mapped either to H2 or to P1)
$X_{3,3} + Y_{3,1} = 1$ (Task 3 mapped either to H3 or to P1)
$X_{4,3} + Y_{4,1} = 1$ (Task 4 mapped either to H3 or to P1)
$X_{5,1} + Y_{5,1} = 1$ (Task 5 mapped either to H1 or to P1)
• Furthermore, assume that the types of tasks T1 to T5 are l = 1, 2, 3, 3 and 1, respectively. Then, the following additional resource constraints are required:
$NY_{1,1} \ge Y_{1,1}$ (6.17)
$NY_{2,1} \ge Y_{2,1}$
$NY_{3,1} \ge Y_{3,1}$
$NY_{3,1} \ge Y_{4,1}$
$NY_{1,1} \ge Y_{5,1}$ (6.18)
• The total cost function is:
$$C = 20 \cdot \#(H1) + 25 \cdot \#(H2) + 30 \cdot \#(H3) + 5 \cdot \#(P)$$
where $\#()$ denotes the number of instances of hardware components. This number can be computed from the variables introduced so far if the schedule is also taken into account.
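Constraint generation for this example is mechanical enough to sketch in a few lines, together with the cost function. The sketch assumes, as in the example, that a task of type l can only be implemented in hardware by component H_l, and it assumes one instance of every component type that is used (so #(c) is 1 or 0); in general, #() also depends on the schedule. The function names and the `COSTS` table are hypothetical.

```python
from typing import Dict, List

# Component costs from the example; "P" stands for the single processor P1.
COSTS = {"H1": 20, "H2": 25, "H3": 30, "P": 5}


def assignment_constraints(types: List[int], k: int = 1) -> List[str]:
    """One constraint X_{v,l} + Y_{v,k} = 1 per task v of type l."""
    return [f"X{v},{l} + Y{v},{k} = 1" for v, l in enumerate(types, start=1)]


def software_copy_constraints(types: List[int], k: int = 1) -> List[str]:
    """One constraint NY_{l,k} >= Y_{v,k} per task v of type l."""
    return [f"NY{l},{k} >= Y{v},{k}" for v, l in enumerate(types, start=1)]


def design_cost(mapping: Dict[str, str], costs: Dict[str, int] = COSTS) -> int:
    """Total cost assuming #(c) = 1 for every component type c that is used."""
    return sum(costs[c] for c in set(mapping.values()))


types = [1, 2, 3, 3, 1]  # types of T1 to T5
for line in assignment_constraints(types) + software_copy_constraints(types):
    print(line)

# T3 and T4 in software, all other tasks in hardware: uses H1, H2 and P.
mapping = {"T1": "H1", "T2": "H2", "T3": "P", "T4": "P", "T5": "H1"}
print(design_cost(mapping))  # 50
```

Under the one-instance assumption, the design that uses H1, H2 and P costs 20 + 25 + 5 = 50 cost units, compared with 75 for an all-hardware design using H1, H2 and H3.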
For a timing constraint of 100 time units, the minimum-cost design comprises components H1, H2 and P. This means that tasks T3 and T4 are implemented in software and all others in hardware.
6.4 Mapping to heterogeneous multi-processors
• The different approaches for this mapping can be classified by two criteria: mapping tools may either assume a fixed execution platform or design such a platform during the mapping, and they may or may not include automatic parallelization of the source code.
• The DOL tools from ETH incorporate:
• Automatic selection of computation templates
• Automatic selection of communication techniques
• Automatic selection of scheduling and arbitration
• The input to DOL consists of a set of tasks together with use cases. The output describes the execution platform and the mapping of tasks to processors, together with task schedules. This output is computed such that constraints are met and objectives are maximized.
DOL problem graph
[Figure: DOL problem graph]
• DOL architecture graph
[Figure: DOL architecture graph with nodes RISC, HWM1, HWM2, PTP bus and shared bus]
DOL specification graph
DOL implementation
• An allocation is a subset of the architecture graph, representing the hardware components allocated (selected) for a particular design.
• A binding: a selected subset of the edges between specification and architecture identifies a relation between the two. Selected edges are called bindings.
• A schedule assigns start times to each node v in the problem graph.
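How allocation, binding and schedule fit together can be sketched as a simple consistency check. This is a simplified, hypothetical model rather than DOL's representation: the allocation is reduced to architecture nodes (buses and communication edges are ignored), the binding is a dictionary so that each problem node is assumed to be bound to exactly one resource, and the function and example names are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def is_valid_implementation(
    problem_nodes: Iterable[str],
    arch_nodes: Set[str],
    spec_edges: Set[Tuple[str, str]],
    allocation: Set[str],
    binding: Dict[str, str],
    schedule: Dict[str, float],
) -> bool:
    """Check that allocation, binding and schedule are mutually consistent."""
    # The allocation must be a subset of the architecture graph (nodes only here).
    if not allocation <= arch_nodes:
        return False
    for v in problem_nodes:
        r = binding.get(v)
        # Each problem node needs a selected specification edge to an allocated resource.
        if r is None or (v, r) not in spec_edges or r not in allocation:
            return False
        # The schedule must assign a (non-negative) start time to every problem node.
        if v not in schedule or schedule[v] < 0:
            return False
    return True


arch = {"RISC", "HWM1", "HWM2", "shared bus"}
edges = {("t1", "RISC"), ("t1", "HWM1"), ("t2", "HWM2")}
print(is_valid_implementation(
    ["t1", "t2"], arch, edges,
    allocation={"RISC", "HWM2"},
    binding={"t1": "RISC", "t2": "HWM2"},
    schedule={"t1": 0, "t2": 5},
))  # True
```

Binding `t1` to `HWM1` instead would be rejected here, because `HWM1` is a possible mapping target in the specification but is not part of the allocation.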