Optimizing Address Assignment for Scheduling Embedded DSPs • Chun Xue, Zili Shao, Dr. Edwin H. M. Sha (Dept. of Computer Science, University of Texas at Dallas) • Dr. Bin Xiao (Dept. of Computing, Hong Kong Polytechnic University)
Outline • Introduction • Motivating Examples • The Algorithms • Experimental Results • Conclusion
Motivation • DSP processors provide dedicated Address Generation Units (AGUs). • An AGU can eliminate address arithmetic instructions by modifying the address register in parallel with the current instruction. • Three modes: auto-increment, auto-decrement, and using a modify register. • Subsuming the address arithmetic instructions into indirect addressing modes improves code size and performance.
AGU Example
• To calculate: C = A + B
• Memory layout (low to high address): A, B, C, with AR0 pointing to A
• Assembly code without AGU: Load *(AR0); ADAR AR0, 1; Add *(AR0); ADAR AR0, 1; Stor *(AR0)
• Assembly code with AGU: Load *(AR0)+; Add *(AR0)+; Stor *(AR0)
• The address arithmetic instructions can be eliminated because the AGU modifies the address register in parallel with the current instruction.
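The saving in the AGU example can be reproduced with a small cost model. The sketch below is an illustration, not code from the slides: the function name `count_address_instructions` and the `reach` parameter are hypothetical, and the model assumes a single address register whose free auto-increment/auto-decrement step is at most `reach` memory words.

```python
def count_address_instructions(layout, accesses, reach=1):
    """Count explicit address-update instructions for one address register.

    layout   -- variables in memory order, lowest address first
    accesses -- the order in which the variables are accessed
    reach    -- largest step the AGU applies for free
                (1 = auto-increment/decrement, 0 = no AGU; an assumption)
    """
    address = {var: i for i, var in enumerate(layout)}
    explicit = 0
    for current, nxt in zip(accesses, accesses[1:]):
        if abs(address[nxt] - address[current]) > reach:
            explicit += 1  # needs a separate instruction such as ADAR
    return explicit


accesses = ["A", "B", "C"]                       # Load A, Add B, Stor C
print(count_address_instructions(["A", "B", "C"], accesses, reach=0))  # 2
print(count_address_instructions(["A", "B", "C"], accesses, reach=1))  # 0
print(count_address_instructions(["A", "C", "B"], accesses, reach=1))  # 1
```

With `reach=0` the model yields the two ADAR instructions of the code without AGU; with auto-increment they disappear. The last call shows that the same access order costs one explicit update under a different memory layout, which is the effect address assignment tries to avoid.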
Address Assignment Optimization • With a careful placement of variables in memory, the total number of address instructions can be reduced. • Both code size and timing performance are improved. • Address assignment: the optimization of the memory layout of program variables. • For single-functional-unit processors, this problem has been studied extensively. • However, little research has been done for multiple-functional-unit architectures such as the TI C6x VLIW processors.
The Previous Work – Single FU Processor • Address assignment was first studied by Bartley and Liao. • They modeled the problem as a graph-theoretic optimization problem. • The problem was proved to be NP-hard. • An efficient algorithm is used to find a maximum-weight path cover.
The Previous Work – Single FU Processor • Leupers and Marwedel proposed a tie-breaking heuristic and a variable partitioning method. • Gebotys modeled the problem as a network flow problem. • All of this work targets single-functional-unit processors with a fixed schedule.
The Previous Work – Single FU Processor • Some work has been done on combining scheduling and address assignment. • Rao et al. suggested modifying the variable access sequence using expression tree transformations. • Choi and Kim proposed an algorithm that tightly couples address assignment and scheduling. • All these algorithms target single-FU processors and cannot be directly applied to multiple-FU processors.
Example for Multiple-FU Processor • (a) An input DAG [figure] • (b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g
Fixed Schedule with the Solve-SOA Address Assignment Optimization
Schedule Length CANNOT Be Reduced • After the Solve-SOA algorithm (by Liao et al.) is applied, the variables are rearranged in memory. • The number of address instructions is reduced. • However, the total schedule length is not reduced as much. • Why? Because of dependency constraints in the fixed schedule.
Address Assignment with Scheduling • In this example, we first obtain a good address assignment, then we schedule based on that address assignment. • Therefore, both the schedule length and the number of address instructions can be reduced.
Our Basic Idea • Address assignment with scheduling for multiple-functional-unit architectures • Construct a good address assignment first • Perform scheduling based on the obtained address assignment • The experimental results show • 14-18% improvement over list scheduling • 7-10% improvement over Solve-SOA
MFSchSOA Algorithm • Get an address assignment with mSOA( ), a modified Solve-SOA that takes a partial access sequence as input and generates an address assignment. • Perform multi-FU list scheduling that minimizes both schedule length and address operations. • Use the length of the longest path from each node to a leaf node as that node's priority. • Schedule based on a weighted bipartite matching graph.
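The priority rule (longest path from a node to a leaf) can be sketched as below. This is an illustration under stated assumptions: the DAG edges here are hypothetical because the edges of the slide's DAG are not recoverable from the text, the function name `longest_path_priorities` is invented for this sketch, and path length is counted in nodes (a leaf has priority 1), which the slides do not specify.

```python
from functools import lru_cache


def longest_path_priorities(successors):
    """Map each node to the number of nodes on its longest path to a leaf.

    successors -- dict: node -> tuple of successor nodes in the DAG
    """
    @lru_cache(maxsize=None)
    def priority(node):
        children = successors.get(node, ())
        # A leaf has no children, so its priority is 1 (assumed convention).
        return 1 + max((priority(child) for child in children), default=0)

    return {node: priority(node) for node in successors}


# Hypothetical DAG, NOT the one drawn on the slides.
dag = {"A": ("B", "F"), "B": ("C",), "C": (), "F": ()}
print(longest_path_priorities(dag))  # {'A': 3, 'B': 2, 'C': 1, 'F': 1}
```

A list scheduler would then pick ready nodes in order of decreasing priority, so that nodes on the critical path are not delayed.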
Partial Access Sequence
• (a) An input DAG [figure]
• (b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g
• (c) Partial access sequence:
Node:     A     | B     | C     | D     | E     | F     | G     | H
Variable: d h a | a h e | e f d | b f b | b e f | a b g | f a h | b g b
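The table follows a simple pattern: for each node, the two operands are listed first and the result last. A minimal sketch of that derivation is given here; the function name `partial_access_sequence` and the restriction to statements of the form `N: x = y + z` are assumptions made for illustration.

```python
import re


def partial_access_sequence(statements):
    """Return {node: [operand1, operand2, result]} for 'N: x = y + z' lines."""
    pattern = re.compile(r"(\w+):\s*(\w+)\s*=\s*(\w+)\s*\+\s*(\w+)")
    sequence = {}
    for line in statements:
        node, result, left, right = pattern.fullmatch(line.strip()).groups()
        # Operands are read before the result is written.
        sequence[node] = [left, right, result]
    return sequence


statements = [
    "A: a = d + h", "B: e = a + h", "C: d = e + f", "D: b = b + f",
    "E: f = b + e", "F: g = a + b", "G: h = f + a", "H: b = b + g",
]
for node, variables in partial_access_sequence(statements).items():
    print(node, " ".join(variables))
```

The sequence is called partial because the order of accesses is fixed only inside a node; the order between nodes is decided later by the scheduler.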
Address Assignment by mSOA Algorithm
• (a) Partial access sequence: d h a | a h e | e f d | b f b | b e f | a b g | f a h | b g b
• (b) Access graph: edge e(u,v) denotes that u and v are adjacent to each other w(e) times in the partial access sequence. Edge weights: a-h 1+1+1=3; b-g 1+1+1=3; b-f 1+1=2; e-f 1+1=2; d-h, e-h, d-f, b-e, a-b, and a-f 1 each.
• (c) The address assignment obtained by the maximum-weight path cover from Solve-SOA [figure]
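The access graph and a path cover can be computed as sketched below. This is a hedged illustration: the slides do not give the details of mSOA, so the cover step uses the commonly described greedy heuristic (take edges by decreasing weight while no variable exceeds degree 2 and no cycle forms), and ties are broken by first appearance, which is an assumption. The function names are hypothetical.

```python
from collections import Counter


def access_graph(partial_sequences):
    """Count how often two distinct variables are adjacent inside one node."""
    weights = Counter()
    for accesses in partial_sequences:
        for u, v in zip(accesses, accesses[1:]):
            if u != v:
                weights[frozenset((u, v))] += 1
    return weights


def greedy_path_cover(weights):
    """Pick heavy edges while every variable keeps degree <= 2 and no cycle forms."""
    degree = Counter()
    parent = {}

    def find(x):  # union-find root lookup, used for cycle detection
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    # sorted() is stable, so edges of equal weight keep first-seen order.
    for edge, _ in sorted(weights.items(), key=lambda item: -item[1]):
        u, v = tuple(edge)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            chosen.append(edge)
    return chosen


text = "d h a | a h e | e f d | b f b | b e f | a b g | f a h | b g b"
sequences = [part.split() for part in text.split(" | ")]
weights = access_graph(sequences)
cover = greedy_path_cover(weights)
print(sum(weights[e] for e in cover), "of", sum(weights.values()))  # 11 of 16
```

On this input the chosen edges are a-h, b-g, e-f, b-f, and d-h, which form the two paths d-h-a and g-b-f-e. Variables on the same path are placed in consecutive memory words, so the covered weight (11 of 16) corresponds to accesses that auto-increment or auto-decrement can serve.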
Scheduling on 2-FU Processor
• (a) An input DAG [figure]
• (b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g
• (c) The access sequence: d, h, a, g, b, f, e
• (d) The priority of each node [figure]
• (e) Ready list [figure]
• (f) The schedule in the first step [figure]
Scheduling on 2-FU Processor
• (a) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g
• (b) The access sequence: d, h, a, g, b, f, e
• (c) Ready list [figure]
• (d) Weighted bipartite graph between FU1, FU2 and the ready nodes G (f a h), D (b f b), F (a b g), C (e f d), A (d h a) [figure]
• (e) The schedule in the second step [figure]
Weighted Bipartite Graph
• The weight W between FUi and the ready node u is calculated as follows:
$$
W = \begin{cases}
Z - 2 & \text{if the distance between } X \text{ and } Y \text{ is } 0 \\
Z - 1 & \text{if the distance between } X \text{ and } Y \text{ is } 1 \\
Z & \text{otherwise}
\end{cases}
$$
• Where: Z = the priority of u; X = the last variable accessed in FUi; Y = the first variable accessed in u
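The weight rule translates directly into a small function. In this sketch, "distance" is assumed to mean the absolute difference between the memory positions of X and Y in the address assignment; the function name `matching_weight`, the example layout, and the example priority are illustrative choices, not values taken from the slides.

```python
def matching_weight(priority, last_var, first_var, address):
    """Weight W of the edge between a functional unit and a ready node.

    priority  -- Z, the priority of the ready node u
    last_var  -- X, the last variable accessed on the functional unit
    first_var -- Y, the first variable accessed by u
    address   -- dict: variable -> memory position (assumed distance metric)
    """
    distance = abs(address[last_var] - address[first_var])
    if distance == 0:
        return priority - 2
    if distance == 1:
        return priority - 1
    return priority


# Illustrative layout: variables in the order d, h, a, g, b, f, e.
address = {var: i for i, var in enumerate("dhagbfe")}
print(matching_weight(3, "h", "h", address))  # 1 (same word)
print(matching_weight(3, "h", "a", address))  # 2 (neighbouring words)
print(matching_weight(3, "h", "e", address))  # 3 (farther apart)
```

The rule changes a node's weight only when its first access is on or next to the word the functional unit touched last, which are exactly the cases that need no explicit address instruction.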
Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 2 functional units [figure]
Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 3 functional units [figure]
Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 4 functional units [figure]
Conclusions • In this paper, we propose an approach to optimize address operations on multi-FU architectures by considering address assignment and scheduling together. • In our approach, we construct a good address assignment first and then perform scheduling based on the obtained address assignment. • The experimental results show that our approach reduces code size and schedule length compared with previous work: • 14-18% improvement over list scheduling • 7-10% improvement over directly using Solve-SOA