300 likes | 391 Views
OPTIMIZING DSP SCHEDULING VIA ADDRESS ASSIGNMENT WITH ARRAY AND LOOP TRANSFORMATION. Chun Xue, Zili Shao, Ying Chen, Edwin H.-M. Sha Department of Computer Science University of Texas at Dallas. Background. DSP processors generally provide dedicated address generation units (AGUs)
E N D
OPTIMIZING DSP SCHEDULING VIA ADDRESS ASSIGNMENT WITH ARRAY ANDLOOP TRANSFORMATION Chun Xue, Zili Shao, Ying Chen, Edwin H.-M. Sha Department of Computer Science University of Texas at Dallas
Background • DSPprocessors generally provide dedicated address generation units (AGUs) • AGUs can reduce address arithmetic instructions by modifying address register in parallel with the current instruction • Three modes: Auto-increment, Auto-decrement, and using Modify Register
Goal • Contrary to the traditional compilers, DSP compilers should carefully determine the relative location of data in memory to achieve compacted object code size and improved performance. • We propose a scheme to exploit Address Assignment and scheduling for loops.
Contribution • The algorithm utilizes the techniques of rotation scheduling,address assignment, and array transformation to minimize both address instructions and schedule length. • Compared to the list scheduling, AIRLS shows an average reduction of 35.4% in schedule length and an average reduction of 38.3% in address instructions.
1 Load *(AR0) 2 Adar AR0, 1 3 Add *(AR0) 4 Adar AR0, 1 5 Stor *(AR0) AGU Example Assembly Code without AGU To Calculate: C = A + B Memory Layout Low A AR0 Assembly Code with AGU B 1 Load *(AR0)+ 2 Add *(AR0)+ 3 Stor *(AR0) C High The address arithmetic instructions can be reduced by modifying address register in parallel with the current instruction by AGU
PROCESSOR MODEL • For each functional unit in a multiple functional units processor, there is an accumulator and an address register. • Memory access can only occur indirectly via address registers, AR0 through ARk. • indirect addressing with post-increment, post-decrement
ADDRESS ASSIGNMENT • With a careful placement of variables in memory, • total number of address instructions can be reduce • Both code size and timing performance is improved • Address assignment – the optimization of memory layout of program variables • For single functional unit processors, this problem has been studied extensively. • However, little research has been done for multiple function units architecture like TI C6x VLIW processors.
PREVIOUS WORK • Address Assignment is first studied by Bartley and Liao. • They modeled the program as a graph theoretic optimization problem. • The problem is proved to be NP-hard. • An efficient algorithm is used to find the Maximum Weighted Path Covering
LOOP SCHEDULING • DFG • Data Flow Graph G=(V, E, OP, d) • Static Schedule • Repeated pattern of an execution of an iteration • Unfolding • A schedule of unfolding factor f can be obtained by unfolding G f times. • Rotation • Is a scheduling technique used to optimize a loop schedule with resource constraints. It transforms a schedule to a more compact one iteratively.
Algorithm Step 1 • Put all nodes in the first row of Schedule S into a set R • Delete the first Row of schedule S • Shift S up by 1 control step • Use L to record schedule length of S
Algorithm Step 2 • Retime each node u in set R • r(u) = r(u) + 1 • Update each node with new computation • B[i] = A[i] + 5 transform into • B[i+1] = A[i+1] + 5
Algorithm Step 3 • Generate a new array transformation assignment based on the new computations from Step 2.
Algorithm Step 4 • Rotate each node u in set R by put u into the location with minimum address operations among all available locations. • Function to calculate number of address operations: • AD(x,y) = 0 if x, y are the same • AD(x,y) = 1 if x, y are next to each other • AD(x,y) = 2 otherwise
EXPERIMENT SETTING • Experiment conduct using filters • Performed on PC with Linux • All running time within 10 seconds
RESULT SUMMARY • Compared to the list scheduling, the reduction in schedule length become 45.3% and the reduction in address instructions become 40.7%. • Compared to the rotation scheduling, the reduction in schedule length become 24.8% and the reduction in address instructions become 40.7%.
FURTHER IMPROVEMENT • When we apply unfolding technique with a unfolding factor of 2, the average reduction in schedule length and number of address instructions are both increased • Higher unfolding factor will give more reduction in both schedule length and address instructions
CONCLUSION • We propose an algorithm, AIRLS • Utilizes array transformation, address assignment, and rotation scheduling techniques to reduce schedule length and address operations for loops on multiple functional units DSP • Can significantly reduce schedule length and address instructions comparing to the previous works.
FUTURE WORK • Consider multiple address registers per functional unit • Consider shared address registers • Consider minimum number of address registers needed