A Revisit to the Primal-Dual Based Clock Skew Scheduling Algorithm Min Ni and Seda Ogrenci Memik EECS Department, Northwestern University
Agenda • Introduction • Related Work • The Primal-Dual Algorithm • The existing primal-dual approach • Our enhanced implementation • Experimental Results • Conclusion
Introduction • The Problem of Clock Skew Scheduling • Minimize the clock period P subject to the timing constraints, which are represented as a constraint graph
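The formulation shown on this slide did not survive as text. As a hedged sketch, the linear program for clock skew scheduling is commonly written as below. Apart from L_i and P, the notation is an assumption and not taken from the slides: D_ij and d_ij stand for the maximum and minimum combinational delay from register i to register j, and t_setup and t_hold for the setup and hold times.

```latex
\begin{aligned}
\text{minimize}\quad   & P \\
\text{subject to}\quad & L_i - L_j \le P - D_{ij} - t_{\mathrm{setup}}
                         && \text{(long path, setup)} \\
                       & L_j - L_i \le d_{ij} - t_{\mathrm{hold}}
                         && \text{(short path, hold)}
\end{aligned}
\qquad \text{for every sequentially adjacent register pair } i \to j .
```

Each inequality is a difference constraint between two clock arrival times, so it corresponds to one weighted edge of the constraint graph.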
Related Work • Existing Approaches for Solving Clock Skew Scheduling • Linear programming • Binary search with iterative shortest path problem • O(|V||E|log(C/n)) • Primal-dual based algorithm (Burns’) • O(|V|^2|E|)
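The binary search approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited work: the tuple encoding of constraints and the names `feasible` and `min_period` are hypothetical, and the feasibility test is a plain Bellman-Ford negative cycle check.

```python
# Sketch only: constraint encoding and function names are assumptions.
# Each constraint (i, j, c, k) means  L[i] - L[j] <= c + k * P,  k in {0, 1}.

def feasible(n, constraints, period, eps=1e-12):
    """Bellman-Ford check: True if the difference constraints have a solution."""
    dist = [0.0] * n  # all-zero start plays the role of a virtual source
    for _ in range(n):
        changed = False
        for i, j, c, k in constraints:
            bound = dist[j] + c + k * period
            if bound < dist[i] - eps:
                dist[i] = bound
                changed = True
        if not changed:
            return True   # converged: no negative cycle
    return False          # still relaxing after n passes: negative cycle


def min_period(n, constraints, lo, hi, tol=1e-6):
    """Binary search between an infeasible period lo and a feasible period hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(n, constraints, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Two registers; setup constraints depend on P (k = 1), hold constraints do not.
example = [
    (0, 1, -5.0, 1),  # L0 - L1 <= P - 5
    (1, 0, -3.0, 1),  # L1 - L0 <= P - 3
    (1, 0, 2.0, 0),   # L1 - L0 <= 2
    (0, 1, 1.0, 0),   # L0 - L1 <= 1
]
best = min_period(2, example, lo=0.0, hi=5.0)
```

In this toy instance the two setup constraints form a cycle that requires 2P - 8 >= 0, so the search converges toward P = 4, while a zero-skew schedule would need P = 5.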
The Primal-Dual Approach • Theory of the Primal-Dual Algorithm • The PRIMAL is stated on the primal variables and the DUAL on the dual variables • Complementary slackness theorem: starting from a feasible solution of the PRIMAL, find a feasible solution of the DUAL; the two solutions are optimal if the complementary slackness conditions are met.
Primal-Dual Approach • The complementary slackness conditions • General format: each dual variable times the slack of its constraint equals zero • If the dual variable of an edge is > 0, then the slack of that edge must be zero; edges whose slack = 0 are called admissible edges. • Starting from a feasible solution {Li, P}, if we can also find a feasible solution for the dual variables to the above system of linear equations, the feasible solution is optimal.
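The equations on this slide are not preserved. As a hedged reconstruction, assume every timing constraint is written as L_i - L_j <= c_e + k_e P for an edge e = (i, j) of the constraint graph, with k_e = 1 for constraints that depend on the period and k_e = 0 otherwise. The symbols c_e, k_e, s_e and f_e are assumptions introduced here for illustration. The dual problem and the complementary slackness conditions then take the following form:

```latex
\begin{aligned}
\text{slack:}\quad
  & s_e = c_e + k_e P - L_i + L_j \;\ge\; 0 \\[4pt]
\text{dual:}\quad
  & \text{maximize } -\sum_{e} c_e f_e \\
  & \text{subject to}\quad
    \sum_{e=(v,\cdot)} f_e \;=\; \sum_{e=(\cdot,v)} f_e \quad \forall v, \qquad
    \sum_{e} k_e f_e = 1, \qquad f_e \ge 0 \\[4pt]
\text{complementary slackness:}\quad
  & f_e \, s_e = 0 \quad \forall e
\end{aligned}
```

Under these assumptions, any edge with f_e > 0 must have s_e = 0, which is why only zero-slack (admissible) edges can carry a nonzero dual variable.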
Restricted Dual Problem • Solve the system of linear equations on only the admissible edges • This is equivalent to solving a restricted dual problem • If its minimum is 0, then we are done • However, it is still not straightforward to solve because it is formulated on the dual variables
Restricted Primal Problem • Check the Restricted Primal Problem instead • It can be proved that this problem has an optimal value of 0 if there exists a cycle in the admissible graph Ga (consisting of admissible edges only).
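The cycle test on the admissible graph can be illustrated with a short sketch. The constraint encoding and the names `admissible_edges` and `has_cycle` are assumptions made for this example; the cycle check is an ordinary depth-first search, not the routine used by the authors.

```python
# Sketch only: names and the constraint encoding are assumptions.
# Each constraint (i, j, c, k) means  L[i] - L[j] <= c + k * P.

def admissible_edges(L, P, constraints, eps=1e-9):
    """Return the constraints whose slack is numerically zero."""
    tight = []
    for i, j, c, k in constraints:
        slack = c + k * P - L[i] + L[j]
        if abs(slack) <= eps:
            tight.append((i, j, c, k))
    return tight


def has_cycle(n, edges):
    """Depth-first search for a directed cycle among the given edges."""
    succ = [[] for _ in range(n)]
    for i, j, _c, _k in edges:
        succ[i].append(j)
    state = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[u] == 0 and visit(u) for u in range(n))


constraints = [
    (0, 1, -5.0, 1),  # L0 - L1 <= P - 5
    (1, 0, -3.0, 1),  # L1 - L0 <= P - 3
    (1, 0, 2.0, 0),   # L1 - L0 <= 2
    (0, 1, 1.0, 0),   # L0 - L1 <= 1
]
ga_start = admissible_edges([0.0, 0.0], 5.0, constraints)  # zero skew, P = 5
ga_final = admissible_edges([0.0, 1.0], 4.0, constraints)  # skewed, P = 4
```

With zero skew and P = 5 only one constraint is tight, so `has_cycle(2, ga_start)` is false. At P = 4 with L = [0, 1] both setup constraints are tight and form a cycle. This sketch reports any directed cycle of tight constraints and does not check whether the cycle contains a period-dependent constraint.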
Primal-Dual Algorithm • Starting from an empty admissible graph, incrementally reduce the clock period value until a cycle emerges in the admissible graph. • Two main tasks in the while loop: find THETA; maintain Ga. • The effect of reducing P is that more edges become admissible, and those edges are inserted into the admissible graph Ga.
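The loop described above can be sketched generically as follows. This is an illustration of a primal-dual iteration under stated assumptions, not Burns' implementation and not the authors' enhanced one: the constraint encoding, the function name `primal_dual_min_period`, and the use of longest-path labels as update rates are all choices made for this example.

```python
# Sketch only: a generic primal-dual iteration, not Burns' or the authors' code.
# Each constraint (i, j, c, k) means  L[i] - L[j] <= c + k * P,  k in {0, 1}.

def primal_dual_min_period(n, constraints, L, P, eps=1e-9, max_iters=10_000):
    L = list(L)
    for _ in range(max_iters):
        # Admissible graph Ga: constraints with zero slack.
        tight = [(i, j, k) for i, j, c, k in constraints
                 if c + k * P - L[i] + L[j] <= eps]

        # Rates r with r[j] >= r[i] + k on Ga keep admissible slacks >= 0
        # while P shrinks; they are computed as longest-path labels.
        r = [0] * n
        for _ in range(n):
            changed = False
            for i, j, k in tight:
                if r[i] + k > r[j]:
                    r[j] = r[i] + k
                    changed = True
            if not changed:
                break
        else:
            # Labels never settle: Ga contains a cycle through a
            # period-dependent constraint, so P cannot be reduced further.
            return P, L

        # THETA: largest decrease of P before another slack reaches zero.
        theta = None
        for i, j, c, k in constraints:
            rate = k + r[i] - r[j]  # slack lost per unit decrease of P
            if rate > 0:
                step = (c + k * P - L[i] + L[j]) / rate
                theta = step if theta is None else min(theta, step)
        if theta is None:
            raise ValueError("period is unbounded below")

        P -= theta
        L = [L[v] + r[v] * theta for v in range(n)]
    raise RuntimeError("iteration limit reached")


constraints = [
    (0, 1, -5.0, 1),  # L0 - L1 <= P - 5
    (1, 0, -3.0, 1),  # L1 - L0 <= P - 3
    (1, 0, 2.0, 0),   # L1 - L0 <= 2
    (0, 1, 1.0, 0),   # L0 - L1 <= 1
]
P_opt, L_opt = primal_dual_min_period(2, constraints, [0.0, 0.0], 5.0)
```

On this two-register instance the first iteration finds THETA = 1, lowers P from 5 to 4 and shifts L1 by 1; the second iteration finds a cycle of admissible edges and stops. Note that the sketch rebuilds Ga and rescans every constraint in each iteration.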
Primal-Dual: Burns’ Implementation • Different strategies for maintaining the admissible graph Ga and updating THETA values result in different efficiency.
An Example • [Table columns: edge that becomes admissible, theta value, skew] • 5 iterations are needed to find the minimum clock period P by updating the admissible graph and the theta value.
Enhanced implementation • Two major sources of overhead in the existing implementation • Scan through all edges (|E|) in the graph to create admissible graph Ga from scratch in each iteration; • Calculate theta values for all edges (|E|) in the graph and find the minimum one;
Maintaining admissible graph • Theorem: If exactly one minimum theta value edge (i, j) is added into the admissible graph Ga, then Ga is a forest until a cycle is generated. • Add the new admissible edge and remove edges that become non-admissible • No need to call a negative cycle detection routine; a parent list is maintained instead • Complexity is O(|V|), compared with O(|E|) for the same step in Burns’ implementation
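A parent list makes the cycle check a simple walk toward the root. The sketch below is an assumption about how such a list could be used, not the authors' code: the function name `add_admissible_edge` is hypothetical, an edge u -> v is stored as `parent[v] = u`, and any previous parent of v is assumed to have become non-admissible and is overwritten.

```python
# Sketch only: the representation parent[v] = u for edge u -> v is an assumption.

def add_admissible_edge(parent, u, v):
    """Insert edge u -> v into the forest; return True if it closes a cycle.

    Walks from u toward its root, which visits at most |V| nodes.
    """
    node = u
    while node is not None:
        if node == v:
            return True      # v is an ancestor of u: the new edge closes a cycle
        node = parent[node]
    parent[v] = u            # still a forest: record the new tree edge
    return False


parent = [None] * 4          # four nodes, empty admissible graph
r1 = add_admissible_edge(parent, 0, 1)   # 0 -> 1
r2 = add_admissible_edge(parent, 1, 2)   # 1 -> 2
r3 = add_admissible_edge(parent, 0, 3)   # 0 -> 3
r4 = add_admissible_edge(parent, 2, 0)   # 2 -> 0 would close 0 -> 1 -> 2 -> 0
```

Because the graph stays a forest until the final step, the check never needs to look at more than one root path, which is where the O(|V|) bound per insertion comes from.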
Efficient calculation of theta • Similar to Dijkstra’s shortest path algorithm, where a set of edges is maintained as candidates for shortest path tree edges; • In our problem, we need to find the minimum theta edge to add into Ga; • In Burns’ implementation, all edges are scanned during each iteration and theta values are recalculated for all edges; • We maintain a much smaller set of candidates in a heap; theta values are only recalculated for a subset of this small candidate set. • O(log|V|) for maintaining the heap;
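A candidate heap of this kind can be sketched with Python's standard `heapq` module. The class name `ThetaHeap` and the lazy-deletion scheme are assumptions made for illustration; the slides do not say how the authors' heap handles updated theta values.

```python
import heapq

# Sketch only: a min-heap of candidate edges keyed by theta, with lazy deletion.

class ThetaHeap:
    def __init__(self):
        self._heap = []      # entries (theta, edge); may contain stale entries
        self._current = {}   # edge -> most recent theta value

    def update(self, edge, theta):
        """Insert an edge or record a recalculated theta for it: O(log n)."""
        self._current[edge] = theta
        heapq.heappush(self._heap, (theta, edge))

    def pop_min(self):
        """Return (edge, theta) with the smallest current theta, or None."""
        while self._heap:
            theta, edge = heapq.heappop(self._heap)
            if self._current.get(edge) == theta:
                del self._current[edge]
                return edge, theta
            # otherwise the entry is stale (theta was recalculated): skip it
        return None


heap = ThetaHeap()
heap.update((0, 1), 3.0)
heap.update((1, 2), 1.5)
heap.update((0, 1), 0.5)     # theta of edge (0, 1) recalculated
first = heap.pop_min()
second = heap.pop_min()
third = heap.pop_min()
```

Only the edges whose theta actually changes are pushed again, so the cost per change is logarithmic in the heap size instead of a full scan over all |E| edges.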
Asymptotic runtime improvement • Our implementation has a lower asymptotic runtime than Burns’ implementation, which is O(|V|^2|E|); • The improvement is very similar to that from the Bellman-Ford algorithm (O(|V||E|)) to Dijkstra’s algorithm (O((|V| + |E|) log|V|) with a binary heap) for the shortest path problem.
Experimental setup • Benchmark circuits • ISCAS89 large circuit • ITC99 • Delay data • Resynthesis in Synopsys Design Compiler (VHDL) • Delay is exported from Standard Delay Format (SDF) file • Comparison between Burns’ and ours • Same graph data structure • Same graph manipulating subroutines • Same routine for calculating theta values
Conclusion • A much more efficient implementation of Burns’ primal-dual based clock skew scheduling algorithm • Superior in both theoretical and practical runtime efficiency • On average 95X speedup on 20 test circuits