Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis and ASIP Design
Philip Brisk, Ajay K. Verma, Paolo Ienne
International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 6, 2007
Outline
• Interprocedural Register Allocation
• Related Work
• Contribution
  • Optimal algorithm for interprocedural register allocation
  • Runs in polynomial time; is scalable
• Experimental Results
  • Optimal algorithm runs faster than heuristics
• Conclusion
Interprocedural Register Allocation
• Register allocation in HLS/ASIP design
  • How many registers to physically allocate?
  • Interprocedural version: consider the whole program
• Each scalar variable is stored in a register
• Variables whose lifetimes overlap require distinct registers
• Goal: minimize the number of registers allocated
• Modeled as a graph coloring problem
  • NP-complete for general graphs
  • Polynomial for certain classes of graphs
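The modeling step on this slide can be illustrated with a small sketch. It is not the authors' algorithm: the lifetimes, variable names, and the greedy ordering by start point are assumptions chosen for illustration only.

```python
from itertools import combinations

def interference_edges(lifetimes):
    """Pairs of variables whose half-open lifetimes [start, end) overlap."""
    edges = set()
    for (a, (sa, ea)), (b, (sb, eb)) in combinations(sorted(lifetimes.items()), 2):
        if sa < eb and sb < ea:
            edges.add((a, b))
    return edges

def greedy_coloring(variables, edges, order):
    """Give each variable, in the given order, the smallest color unused by its neighbors."""
    neighbors = {v: set() for v in variables}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    color = {}
    for v in order:
        used = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical lifetimes, measured in instruction indices.
lifetimes = {"a": (0, 4), "b": (2, 6), "c": (5, 8), "d": (7, 9)}
edges = interference_edges(lifetimes)
order = sorted(lifetimes, key=lambda v: lifetimes[v][0])  # by start point
coloring = greedy_coloring(lifetimes, edges, order)
num_registers = max(coloring.values()) + 1
print(edges, coloring, num_registers)
```

Each color corresponds to one physical register, so the number of colors used is the number of registers allocated for this toy input.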
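Interferences
• Local Interferences
  • Variables in the same procedure
  • Overlapping lifetimes
• Global Interferences
  • Variables live across procedure calls
  • Interferences are transitive along call chains: a variable live across a call to P also interferes with variables of procedures that P calls
[Figure: example with procedures Main, P, and Q and variables V, X, Y, Z. Main defines V, calls P, then uses V; P calls Q.]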
Related Work
• Interprocedural interference graph (IIG): G = (V, E)
  • V is all variables in the program
  • E includes both local and global interferences
  • Goal: find a minimum coloring of G
• Color IIG with heuristic [Vemuri et al., TODAES ’02]
• Scalable approach [Beidas and Zhu, ASP-DAC ’05]
  • Color each procedure individually with heuristic
  • Propagate global interferences at call points
  • Only build local interference graph for each procedure
Contribution
• Chordal graphs can be colored in O(|V| + |E|) time [Gavril, SIAM J. Comput., ’72]
• Local interference graph for a procedure in SSA Form is chordal
  • [Brisk et al., IWLS ’05, TCAD ’06]
  • [Hack et al., Info. Proc. Letters, ’06]
  • [Bouchez et al., MS Thesis, ENS-Lyon ’05]
• Contribution:
  • New SSA-based representation
  • Theorem: The IIG is chordal and can be colored optimally with a scalable algorithm
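To make the chordal-coloring claim concrete, here is a minimal sketch. It swaps in maximum cardinality search (a well-known alternative to Gavril's original procedure) and uses a simple quadratic-time implementation rather than a linear-time one; the example graph is hypothetical. On a chordal graph, greedy coloring in the order vertices are visited uses as many colors as the largest clique, which is optimal.

```python
def mcs_order(adj):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    that has the most already-visited neighbors."""
    weight = {v: 0 for v in adj}  # unvisited vertices and their counts
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in adj[v]:
            if u in weight:
                weight[u] += 1
    return order

def color_chordal(adj):
    """Greedy coloring in MCS order; optimal when the graph is chordal."""
    color = {}
    for v in mcs_order(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical chordal graph: triangles a-b-c and b-c-d, plus pendant vertex e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
color = color_chordal(adj)
num_colors = len(set(color.values()))
print(color, num_colors)
```

The largest clique in this example has three vertices, so three colors are both necessary and sufficient.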
Recursive Calls
• How to handle variables live across calls in a recursive chain?
  • Pushed onto stack
  • Cannot use registers
• Call graph becomes a DAG
  • Strongly connected components: O(|V| + |E|)
  • Collapse each SCC into a single node
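The SCC collapse can be sketched as follows. This uses Kosaraju's algorithm as one possible linear-time choice (the slides do not say which SCC algorithm is used), and the call graph and procedure names are hypothetical.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps each procedure to its callees.
    Returns a dict mapping each procedure to a representative of its SCC."""
    visited, finish = set(), []

    def dfs(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        finish.append(v)  # record post-order

    for v in graph:
        if v not in visited:
            dfs(v)

    reverse = {v: [] for v in graph}
    for v in graph:
        for w in graph[v]:
            reverse[w].append(v)

    component = {}

    def assign(v, root):
        component[v] = root
        for w in reverse[v]:
            if w not in component:
                assign(w, root)

    for v in reversed(finish):
        if v not in component:
            assign(v, v)
    return component

def collapse(graph):
    """Collapse each SCC into one node; the result is a DAG."""
    component = strongly_connected_components(graph)
    dag = {root: set() for root in set(component.values())}
    for v in graph:
        for w in graph[v]:
            if component[v] != component[w]:
                dag[component[v]].add(component[w])
    return component, dag

# Hypothetical call graph in which f and g are mutually recursive.
call_graph = {"main": ["f"], "f": ["g"], "g": ["f", "h"], "h": []}
component, dag = collapse(call_graph)
print(component, dag)
```

The recursive helpers keep the sketch short; a deep call graph would need an iterative traversal to avoid Python's recursion limit.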
Optimal Coloring Algorithm
1. Allocate “global registers”
  • Hold variables live across procedure calls (and local variables, when free)
  • Max-weighted path in call graph: O(|V| + |E|) time for DAGs
2. Represent each procedure in pruned SSA Form
  • Local interference graph for each procedure is chordal
  • Copy local variables live across function calls to global registers
3. Color Assignment
  • Top-down color palette propagation [Beidas and Zhu, ASP-DAC ’05]
  • Color each SSA-form procedure without building an interference graph [Brisk et al., TCAD ’06; Hack and Goos, IPL ’06]
Launch and Landing Pads
• When Pi is called
  • The maximum stack size is m = δi
  • Global registers T1…Tm store variables live across calls in the chain
• Pi calls Pj at call point ck
  • L(ck): set of variables live across the call
  • Let n = |L(ck)| be the number of variables
• Launch and Landing Pads
  • Parallel copy (T_{m+1}, …, T_{m+n}) ← ψ(L(ck)) inserted before the call
  • Parallel copy L(ck) ← ψ⁻¹(T_{m+1}, …, T_{m+n}) inserted after the call
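The two parallel copies can be sketched as plain data. This is an illustration under assumptions: ψ is modeled as nothing more than a fixed ordering of L(ck), and the function name, the (destination, source) pair encoding, and the example variables are hypothetical.

```python
def launch_and_landing_pads(m, live_across):
    """Build the parallel copies around one call point.

    m           -- global registers T1..Tm already occupied by the call chain
    live_across -- ordered list of the variables in L(ck); the ordering
                   stands in for the mapping psi (an assumption)
    Returns (launch, landing), each a list of (destination, source) pairs.
    """
    n = len(live_across)
    global_regs = [f"T{m + i}" for i in range(1, n + 1)]  # T_{m+1}..T_{m+n}
    launch = list(zip(global_regs, live_across))   # inserted before the call
    landing = list(zip(live_across, global_regs))  # inserted after the call
    return launch, landing

# Hypothetical call point with m = 2 and three variables live across the call.
launch, landing = launch_and_landing_pads(2, ["x", "y", "z"])
print(launch)
print(landing)
```

Because the copies are parallel, all sources are read before any destination is written; a real code generator would have to sequentialize them.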
Theorem: IIG is chordal
• Global registers T1…T6 form a clique (N = 6)
• Tj interferes with each local variable in Gi
• Gi is chordal
[Figure: clique T1…T6 attached to the local interference graphs G1…G6, labeled δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5]
Experiments
• Applications taken from MediaBench and MiBench
• Compiled using Machine SUIF
• Comparison
  • Optimal color assignment
  • Color palette propagation
    • Heuristic: cannot guarantee optimality
    • Top-down, Bottom-up [Beidas and Zhu, ASP-DAC ’05]
    • Smallest-last ordering heuristic for coloring [Matula and Beck, JACM ’83]
Registers Allocated
[Figure: results chart]
Conclusion
• Interprocedural register allocation
  • Optimal, polynomial-time algorithm
  • Runs efficiently in practice
• SSA Form with Launch and Landing Pads
  • IIG is chordal
  • Color IIG with a scalable algorithm
• Experiments
  • Optimal algorithm is faster than sub-optimal heuristics
  • Optimal algorithm never builds an interference graph
Example
• Index i names a procedure Pi (i = 1 to 6) or a call point ci (i = 7 to 14); δ1 = 0
• Number of variables live across each call:

  ci        c7  c8  c9  c10  c11  c12  c13  c14
  |L(ci)|   1   2   3   2    5    3    3    2

• Call points:
  δ7 = |L(c7)| + δ1 = 1 + 0 = 1
  δ8 = |L(c8)| + δ1 = 2 + 0 = 2
  δ9 = |L(c9)| + δ1 = 3 + 0 = 3
  δ10 = |L(c10)| + δ1 = 2 + 0 = 2
  δ11 = |L(c11)| + δ1 = 5 + 0 = 5
  δ12 = |L(c12)| + δ2 = 3 + 2 = 5
  δ13 = |L(c13)| + δ3 = 3 + 3 = 6
  δ14 = |L(c14)| + δ4 = 2 + 2 = 4
• Procedures:
  δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2
  δ3 = MAX{δ9} = MAX{3} = 3
  δ4 = MAX{δ10} = MAX{2} = 2
  δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6
  δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
[Figure: call graph over P1…P6 with call points c7…c14, annotated step by step with the δ values]
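The recurrences in this example can be replayed with a short sketch. The caller and callee of each call point are read off the recurrences themselves (for instance, δ12 = |L(c12)| + δ2 and δ5 = MAX{δ12, δ13} place c12 in P2 calling P5); the table layout and function names are assumptions made for illustration.

```python
from functools import lru_cache

# call point -> (caller procedure, callee procedure, |L(c)|),
# inferred from the delta recurrences in the example.
CALL_POINTS = {
    "c7": (1, 2, 1), "c8": (1, 2, 2), "c9": (1, 3, 3), "c10": (1, 4, 2),
    "c11": (1, 6, 5), "c12": (2, 5, 3), "c13": (3, 5, 3), "c14": (4, 6, 2),
}

@lru_cache(maxsize=None)
def delta_proc(p):
    """Delta of a procedure: maximum over the call points that call it,
    or 0 for the root procedure, which has no incoming calls."""
    incoming = [c for c, (_, callee, _) in CALL_POINTS.items() if callee == p]
    return max((delta_call(c) for c in incoming), default=0)

@lru_cache(maxsize=None)
def delta_call(c):
    """Delta of a call point: variables live across it plus the caller's delta."""
    caller, _, live = CALL_POINTS[c]
    return live + delta_proc(caller)

proc_deltas = {p: delta_proc(p) for p in range(1, 7)}
call_deltas = {c: delta_call(c) for c in CALL_POINTS}
print(proc_deltas)
print(call_deltas)
```

Memoization means each procedure and call point is evaluated once, which is the usual way to compute a max-weighted path on a DAG. The largest value, 6, agrees with the six global registers T1…T6 in the theorem figure.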
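Reducing the Chromatic Number

Without launch and landing pads (Chromatic Number = 3):
  Procedure: A
    V ← …
    Call B
    W ← …
    … ← V
    X ← …
    … ← W
    Y ← …
    … ← X
    Call B
    … ← Y
  Procedure: B
    Z ← …
    … ← Z

With launch and landing pads (Chromatic Number = 2):
  Procedure: A
    V ← …
    T1 ← ψ(V)
    Call B
    V ← ψ⁻¹(T1)
    W ← …
    … ← V
    X ← …
    … ← W
    Y ← …
    … ← X
    T1 ← ψ(Y)
    Call B
    Y ← ψ⁻¹(T1)
    … ← Y
  Procedure: B
    Z ← …
    … ← Z

[Figure: interference graphs over V, W, X, Y, Z (without pads) and V, W, X, Y, Z, T1 (with pads)]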
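The two chromatic numbers on this slide can be checked by brute force. The edge lists below are not given in the slide text; they are an assumption, read off the two versions of procedure A under the usual convention that the source and destination of a copy do not interfere.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Smallest k for which a proper k-coloring exists (exhaustive search,
    suitable only for very small graphs)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, assignment))
            if all(color[a] != color[b] for a, b in edges):
                return k
    return 0  # empty graph

# Without pads: V and Y are each live across a call to B, so both interfere
# with Z. Together with the local overlaps this gives the 5-cycle V-W-X-Y-Z-V.
before_edges = [("V", "W"), ("W", "X"), ("X", "Y"), ("V", "Z"), ("Y", "Z")]
before = chromatic_number("VWXYZ", before_edges)

# With pads: only T1 is live across the calls, so only T1 interferes with Z.
after_edges = [("V", "W"), ("W", "X"), ("X", "Y"), ("T1", "Z")]
after = chromatic_number(["V", "W", "X", "Y", "Z", "T1"], after_edges)

print(before, after)
```

Under these assumed edges the graph without pads is a chordless 5-cycle, which is not chordal, while the graph with pads has no cycles at all.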