Optimizing Address Assignment for Scheduling Embedded DSPs

Chun Xue, Zili Shao, Dr. Edwin H. M. Sha, Dept. of Computer Science, University of Texas at Dallas; Dr. Bin Xiao, Dept. of Computing, Hong Kong Polytechnic University

Outline • Introduction • Motivating Examples • The Algorithms • Experimental Results • Conclusion

Motivation • DSP processors provide dedicated Address Generation Units (AGUs). • AGUs can reduce address arithmetic instructions by modifying the address register in parallel with the current instruction. • Three modes: auto-increment, auto-decrement, and using a modify register. • Subsuming the address arithmetic instructions into indirect addressing modes improves code size and performance.

AGU Example • To calculate: C = A + B • Memory layout (low to high addresses): A, B, C, with AR0 pointing to A
Assembly code without AGU:
    Load *(AR0)
    ADAR AR0, 1
    Add *(AR0)
    ADAR AR0, 1
    Stor *(AR0)
Assembly code with AGU:
    Load *(AR0)+
    Add *(AR0)+
    Stor *(AR0)
• The AGU removes the address arithmetic instructions by modifying the address register in parallel with the current instruction.
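The saving in the AGU example can be estimated for any memory layout with a small cost model. The sketch below is illustrative only: it assumes a single address register, treats a move of 0 or 1 memory words between consecutive accesses as free (auto-increment or auto-decrement), charges one explicit address instruction for any larger move, and ignores the modify register. The function name `count_address_instructions` is hypothetical.

```python
def count_address_instructions(layout, accesses):
    """Count explicit address-register updates for one address register.

    Assumption: a move of 0 or 1 words between consecutive accesses is
    free (auto-increment/auto-decrement); a larger move costs one
    explicit address instruction. The modify register is not modeled.
    """
    position = {var: i for i, var in enumerate(layout)}
    explicit = 0
    for prev, curr in zip(accesses, accesses[1:]):
        if abs(position[curr] - position[prev]) > 1:
            explicit += 1
    return explicit


# C = A + B: load A, add B, store C
accesses = ["A", "B", "C"]
print(count_address_instructions(["A", "B", "C"], accesses))  # 0
print(count_address_instructions(["A", "C", "B"], accesses))  # 1
```

With the layout A, B, C every step is an auto-increment; swapping B and C forces one explicit update between the accesses to A and B.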

Address Assignment Optimization • With a careful placement of variables in memory, • the total number of address instructions can be reduced • Both code size and timing performance are improved • Address assignment – the optimization of the memory layout of program variables • For single-functional-unit processors, this problem has been studied extensively. • However, little research has been done for multiple-functional-unit architectures such as TI C6x VLIW processors.

The Previous Work – Single FU Processor • Address assignment was first studied by Bartley and Liao. • They modeled the problem as a graph-theoretic optimization problem. • The problem was proved to be NP-hard. • An efficient heuristic algorithm is used to find a maximum-weight path cover

The Previous Work – Single FU Processor • Leupers and Marwedel proposed a tie-breaking heuristic and a variable partitioning method • Gebotys modeled the problem as a network flow problem • All of this work targets single-functional-unit processors with a fixed schedule.

The Previous Work – Single FU Processor • Some work has been done on combining scheduling and address assignment • Rao et al. suggested modifying the variable access sequence using expression tree transformations • Choi and Kim proposed an algorithm that tightly couples address assignment and scheduling. • All these algorithms target single-FU processors and cannot be directly applied to multiple-FU processors.

Example for Multiple-FU Processor
(a) An input DAG (figure)
(b) The computation in each node:
    A: a = d + h
    B: e = a + h
    C: d = e + f
    D: b = b + f
    E: f = b + e
    F: g = a + b
    G: h = f + a
    H: b = b + g

Fixed Schedule with Variables Placed in Alphabetical Order

Fixed Schedule with the Solve-SOA Address Assignment Optimization

Schedule Length Cannot Be Reduced • After the Solve-SOA algorithm (by Liao et al.) is applied, the variables are rearranged in memory • The number of address instructions is reduced • However, the total schedule length is not reduced as much • Why? Because of the dependency constraints in the fixed schedule

Address Assignment + Scheduling

Address Assignment with Scheduling • In this example, • we first obtain a good address assignment • then we schedule based on the obtained address assignment • Therefore, both the schedule length and the number of address instructions can be reduced.

Our Basic Idea • Address assignment with scheduling for multiple-functional-unit architectures • Construct a good address assignment first • Perform scheduling based on the obtained address assignment • The experimental results show • 14-18% improvement over list scheduling • 7-10% improvement over Solve-SOA

MFSchSOA Algorithm • Get an address assignment from mSOA( ), a modified Solve-SOA that takes a partial access sequence as input and generates an address assignment • Perform multi-FU list scheduling that minimizes both schedule length and address operations • Assign each node a priority equal to the longest path from that node to a leaf node • Schedule each step based on a weighted bipartite matching graph
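The priority rule in the list-scheduling step can be sketched as a longest-path computation over the DAG. This is a minimal illustration, not the authors' implementation: it assumes the priority counts the nodes on the longest path down to a leaf (so a leaf has priority 1), and the DAG used below is hypothetical rather than the one shown on the slides.

```python
from functools import lru_cache


def longest_path_priorities(successors):
    """Return {node: number of nodes on the longest path from node to a leaf}.

    `successors` maps every node to a tuple of its direct successors.
    Assumption: path length is counted in nodes, so a leaf gets priority 1.
    """
    @lru_cache(maxsize=None)
    def priority(node):
        children = successors.get(node, ())
        return 1 + max((priority(child) for child in children), default=0)

    return {node: priority(node) for node in successors}


# Hypothetical DAG for illustration only (not the DAG from the slides).
dag = {"P": ("Q", "R"), "Q": ("S",), "R": (), "S": ()}
print(longest_path_priorities(dag))  # {'P': 3, 'Q': 2, 'R': 1, 'S': 1}
```

Nodes with higher values sit on longer dependency chains, which is why list schedulers commonly use this measure to rank ready nodes.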

Partial Access Sequence
(a) An input DAG (figure)
(b) The computation in each node and (c) the partial access sequence:
    Node   Computation   Variables accessed
    A      a = d + h     d h a
    B      e = a + h     a h e
    C      d = e + f     e f d
    D      b = b + f     b f b
    E      f = b + e     b e f
    F      g = a + b     a b g
    G      h = f + a     f a h
    H      b = b + g     b g b
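The partial access sequence can be derived mechanically from the node computations. The sketch below assumes, as the slide's table suggests, that each node reads its two operands in order and then writes its destination; the helper name `partial_access_sequence` and the restriction to statements of the form `x = y + z` are assumptions made for illustration.

```python
import re


def partial_access_sequence(statements):
    """Map each node to its accesses: left operand, right operand, destination."""
    pattern = re.compile(r"\s*(\w+)\s*=\s*(\w+)\s*\+\s*(\w+)\s*")
    sequence = {}
    for node, statement in statements.items():
        dest, left, right = pattern.fullmatch(statement).groups()
        sequence[node] = [left, right, dest]
    return sequence


statements = {
    "A": "a = d + h", "B": "e = a + h", "C": "d = e + f", "D": "b = b + f",
    "E": "f = b + e", "F": "g = a + b", "G": "h = f + a", "H": "b = b + g",
}
sequence = partial_access_sequence(statements)
print(" | ".join(" ".join(accesses) for accesses in sequence.values()))
# d h a | a h e | e f d | b f b | b e f | a b g | f a h | b g b
```

The sequence is "partial" because the order of accesses is fixed only inside a node; the order between nodes is left open until scheduling.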

Address Assignment by mSOA Algorithm
(a) Partial access sequence: d h a | a h e | e f d | b f b | b e f | a b g | f a h | b g b
(b) Access graph: edge e(u,v) has weight w(e) when u and v are adjacent to each other w(e) times in the partial access sequence.
    a-h: 1+1+1 = 3
    b-g: 1+1+1 = 3
    e-f: 1+1 = 2
    b-f: 1+1 = 2
    d-h, e-h, d-f, b-e, a-b, a-f: 1 each
(c) The address assignment by maximum-weight path cover from Solve-SOA: d, h, a, g, b, f, e
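The access graph and a path cover can be sketched in a few lines. This is not the authors' mSOA; it is a plain greedy heuristic in the spirit of Solve-SOA (take edges in descending weight, keep every vertex at degree 2 or less, and reject edges that would close a cycle). The alphabetical tie-break and all function names are assumptions, so the resulting layout may orient or order the paths differently from the slide while covering the same total weight.

```python
from collections import Counter


def access_graph(partial_sequence):
    """Count how often each unordered pair of distinct variables is adjacent."""
    weights = Counter()
    for accesses in partial_sequence:
        for u, v in zip(accesses, accesses[1:]):
            if u != v:
                weights[frozenset((u, v))] += 1
    return weights


def greedy_path_cover(weights):
    """Pick heavy edges while keeping degree <= 2 and avoiding cycles."""
    degree, parent, chosen = Counter(), {}, []

    def find(x):  # union-find root lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Descending weight; ties broken alphabetically (an arbitrary assumption).
    for edge, w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        u, v = sorted(edge)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            chosen.append((u, v, w))
    return chosen


def layout_from_paths(chosen, variables):
    """Walk each path from an endpoint and concatenate the paths."""
    neighbours = {v: [] for v in variables}
    for u, v, _ in chosen:
        neighbours[u].append(v)
        neighbours[v].append(u)
    layout, seen = [], set()
    for start in variables:
        if start in seen or len(neighbours[start]) > 1:
            continue  # start only from path endpoints or isolated variables
        prev, node = None, start
        while node is not None:
            layout.append(node)
            seen.add(node)
            onward = [n for n in neighbours[node] if n != prev]
            prev, node = node, (onward[0] if onward else None)
    return layout


sequence = ["dha", "ahe", "efd", "bfb", "bef", "abg", "fah", "bgb"]
weights = access_graph(sequence)
chosen = greedy_path_cover(weights)
variables = list(dict.fromkeys(v for accesses in sequence for v in accesses))
print(sum(w for _, _, w in chosen))          # 11
print(layout_from_paths(chosen, variables))  # ['d', 'h', 'a', 'e', 'f', 'b', 'g']
```

The cover consists of the paths d-h-a and e-f-b-g with total weight 11, the same two paths as the slide's assignment d, h, a, g, b, f, e, with the second path reversed.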

Scheduling on 2-FU Processor
(a) An input DAG (figure)
(b) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g
(c) The memory layout of the variables: d, h, a, g, b, f, e
(d) The priority of each node (figure)
(e) Ready list: C, G, D, F, with priorities of 3 or 4
(f) The schedule in the first step: step 1 runs G and C

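Scheduling on 2-FU Processor
(a) The computation in each node: A: a = d + h; B: e = a + h; C: d = e + f; D: b = b + f; E: f = b + e; F: g = a + b; G: h = f + a; H: b = b + g
(b) The memory layout of the variables: d, h, a, g, b, f, e
(c) Ready list: D, F, A, each with priority 3
(d) Weighted bipartite graph between the functional units and the ready nodes: FU1 last ran G (f a h) and FU2 last ran C (e f d); the ready nodes are D (b f b), F (a b g), and A (d h a); the edge weights are 3, 2, 2, 3, 3, 1
(e) The schedule in the second step: step 1 runs G and C; step 2 runs F and A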

Weighted Bipartite Graph • The weight W between FUi and the ready node u is calculated as follows:

W = \begin{cases} Z - 2 & \text{if the distance between } X \text{ and } Y \text{ is } 0 \\ Z - 1 & \text{if the distance between } X \text{ and } Y \text{ is } 1 \\ Z & \text{otherwise} \end{cases}

where Z = the priority of u, X = the last variable accessed in FUi, and Y = the first variable accessed in u.
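The weight rule can be tried out on the second scheduling step of the 2-FU example. The sketch below is a guess at how the rule might be applied, not the authors' code: it assumes "distance" means the difference in memory position under the layout d, h, a, g, b, f, e, assumes that FU1 last accessed h and FU2 last accessed d, and assumes the matching minimizes total weight, since the slides do not state the direction. It brute-forces the assignment with `itertools.permutations` rather than using a dedicated matching algorithm, and it expects at least as many ready nodes as functional units.

```python
from itertools import permutations


def edge_weight(priority, last_var, first_var, position):
    """W = Z - 2 at distance 0, Z - 1 at distance 1, Z otherwise."""
    distance = abs(position[last_var] - position[first_var])
    if distance == 0:
        return priority - 2
    if distance == 1:
        return priority - 1
    return priority


def best_assignment(fu_last_var, ready, position):
    """Return (total weight, {fu: node}) with minimum total weight.

    `ready` maps node -> (priority Z, first variable Y).
    Assumption: the minimum-weight matching is the one wanted.
    """
    fus = list(fu_last_var)
    best = None
    for nodes in permutations(ready, len(fus)):
        total = sum(
            edge_weight(ready[n][0], fu_last_var[fu], ready[n][1], position)
            for fu, n in zip(fus, nodes)
        )
        if best is None or total < best[0]:
            best = (total, dict(zip(fus, nodes)))
    return best


layout = ["d", "h", "a", "g", "b", "f", "e"]
position = {var: i for i, var in enumerate(layout)}
fu_last_var = {"FU1": "h", "FU2": "d"}                    # after G (f a h) and C (e f d)
ready = {"D": (3, "b"), "F": (3, "a"), "A": (3, "d")}     # node: (Z, Y)

for fu, x in fu_last_var.items():
    print(fu, {n: edge_weight(z, x, y, position) for n, (z, y) in ready.items()})
# FU1 {'D': 3, 'F': 2, 'A': 2}
# FU2 {'D': 3, 'F': 3, 'A': 1}
print(best_assignment(fu_last_var, ready, position))
# (3, {'FU1': 'F', 'FU2': 'A'})
```

Under these assumptions the six edge weights are 3, 2, 2, 3, 3, 1, and the cheapest matching places F on FU1 and A on FU2, which appears to agree with the second step of the slide's schedule.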

Experimental Results • Comparison of schedule length for MFSchSOA, simSOA, and list scheduling with 2 functional units (figure).

Conclusions • In this paper, we propose an approach to optimize address operations on multi-FU architectures by considering address assignment and scheduling together. • In our approach, we construct a good address assignment first and then perform scheduling based on the obtained address assignment • The experimental results show that our approach can greatly reduce code size and schedule length compared with the previous work. • 14-18% improvement over list scheduling • 7-10% improvement over directly using Solve-SOA
