This paper presents an optimal, polynomial-time algorithm for interprocedural register allocation in high-level synthesis using SSA form. The algorithm is scalable and utilizes a novel SSA-based intermediate representation for chordal color assignment. Experimental results show its effectiveness.
Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form • Philip Brisk, Ajay K. Verma, Paolo Ienne
Outline • Register Allocation Overview • Interprocedural Register Allocation • Related Work • SSA Form With Launch and Landing Pads • Optimal Solution • Experimental Results • Conclusion
Modeling Register Allocation • For Procedure Pi… • Build interference graph Gi = (Vi, Ei) • Vi – One vertex for each variable • Ei – Edge between each pair of interfering variables • Two variables interfere if their lifetimes overlap • Compute the chromatic number χ(Gi) • Color assignment = Register assignment • NP-Complete in general
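A minimal sketch of the modeling step above, assuming each variable's lifetime is a single half-open interval of program points (true for straight-line code, a simplification in general). The function name and the sample lifetimes are hypothetical.

```python
from itertools import combinations

def build_interference_graph(live_ranges):
    """live_ranges maps variable -> (start, end), a half-open interval [start, end)."""
    graph = {v: set() for v in live_ranges}
    for a, b in combinations(live_ranges, 2):
        (sa, ea), (sb, eb) = live_ranges[a], live_ranges[b]
        if sa < eb and sb < ea:  # the two half-open intervals overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical lifetimes: X overlaps Y, Y overlaps Z, X and Z are disjoint.
example = build_interference_graph({"X": (0, 3), "Y": (2, 5), "Z": (4, 6)})
```

Each edge of the result is one pair of variables that cannot share a register.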
Local Interferences • Local Interferences – Single Procedure • Overlapping lifetimes • Static Single Assignment (SSA) Form • Interference graph is chordal • [Figure: example interference graph over variables X, Y, Z]
Global Interferences • Global Interferences • Variable V is live across a call to procedure P • V interferes with EVERY local variable in P • And all variables in all procedures reachable from P • Must consider all paths through the Call Graph • Example: in Main, V is live across Call P, and P calls Q (call graph Main → P → Q), so V interferes with every variable in P and Q
Global Interferences and Recursion • Fact: • No register can hold a local variable across a recursive function call • Runtime stack is required • Some exceptions (e.g. static local variables) • Ignored here • Call Graph • Compute strongly connected components (SCCs) • Collapse each SCC into a single node • Resulting “Augmented Component Graph” is acyclic
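The SCC collapse can be sketched as follows. This uses Tarjan's algorithm, which is one standard choice; the slides do not say which SCC algorithm the authors use. The call graph at the bottom is hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps node -> iterable of successors."""
    index, low, on_stack, stack = {}, {}, set(), []
    components = []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                low[node] = min(low[node], low[succ])
            elif succ in on_stack:
                low[node] = min(low[node], index[succ])
        if low[node] == index[node]:  # node is the root of an SCC
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(frozenset(component))

    for node in graph:
        if node not in index:
            visit(node)
    return components

def collapse(graph):
    """Return the acyclic component graph: SCC -> set of successor SCCs."""
    components = strongly_connected_components(graph)
    owner = {n: c for c in components for n in c}
    dag = {c: set() for c in components}
    for node, succs in graph.items():
        for succ in succs:
            if owner[node] != owner[succ]:
                dag[owner[node]].add(owner[succ])
    return dag

# Hypothetical call graph: P and Q are mutually recursive.
calls = {"Main": ["P"], "P": ["Q"], "Q": ["P", "R"], "R": []}
dag = collapse(calls)
```

Here P and Q end up in one collapsed node, and the resulting graph Main → {P, Q} → R has no cycles.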
Interprocedural Register Allocation • Interprocedural Interference Graph (IIG) • Undirected graph G = (V, E) • V – All variables in all procedures • E – Local AND global interferences • Compute chromatic number χ(G)
Related Work • Interprocedural Register Allocation in HLS • Color IIG with heuristic [Vemuri et al., TODAES ’02] • IIG is large • Polynomial heuristics are still slow • Scalable Approach [Beidas and Zhu, ASP-DAC ’05] • Color each procedure individually • Use any heuristic you want • Use any intermediate representation you want • Propagate global interferences at call points • IIG is never built
Contribution • Interprocedural register allocation • Optimal, polynomial-time algorithm • Scalable • IIG is never built • If built, it would be chordal • Each Procedure colored individually • SSA Form – interference graph is chordal • Special case of [Beidas and Zhu, ASP-DAC ’05] • Top-down color propagation • Novel SSA-based intermediate representation • Chordal color assignment (with offset)
Preallocation of Global Registers • Global registers hold variables that are live across procedure calls • How many do we need? • Notation: P – set of procedures in the application • Pi – a procedure • ck – a call point (e.g. ck: Call Pj, inside Pi) • L(ck) – set of variables live across ck
Preallocation of Global Registers • Compute: δ – number of variables live at the entry of a procedure or across a call point • For a call point ck in procedure Pi (δi is known): δk = δi + |L(ck)| • For a procedure Pi called from call points c1 … cm: δi = MAX{δk}, 1 ≤ k ≤ m (i.e. over all call points that call Pi)
Example • Call graph: P1 calls P2 at c7 and c8, P3 at c9, P4 at c10, P6 at c11; P2 calls P5 at c12; P3 calls P5 at c13; P4 calls P6 at c14 • |L(ci)|: c7 = 1, c8 = 2, c9 = 3, c10 = 2, c11 = 5, c12 = 3, c13 = 3, c14 = 2 • δ1 = 0 • δ7 = |L(c7)| + δ1 = 1 + 0 = 1 • δ8 = |L(c8)| + δ1 = 2 + 0 = 2 • δ9 = |L(c9)| + δ1 = 3 + 0 = 3 • δ10 = |L(c10)| + δ1 = 2 + 0 = 2 • δ11 = |L(c11)| + δ1 = 5 + 0 = 5 • δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2 • δ3 = MAX{δ9} = MAX{3} = 3 • δ4 = MAX{δ10} = MAX{2} = 2 • δ12 = |L(c12)| + δ2 = 3 + 2 = 5 • δ13 = |L(c13)| + δ3 = 3 + 3 = 6 • δ14 = |L(c14)| + δ4 = 2 + 2 = 4 • δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6 • δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
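The recurrence can be checked on this example with a short sketch. The function name and data layout are hypothetical; the call-point table and the |L(ci)| values are the ones from the example. It assumes an acyclic call graph with a single root whose δ is 0.

```python
def compute_deltas(call_points, live_across, root):
    """call_points maps call point -> (caller, callee);
    live_across maps call point -> |L(c)|.
    Returns one dict holding delta for procedures and for call points."""
    callers_of = {}
    for c, (_, callee) in call_points.items():
        callers_of.setdefault(callee, []).append(c)

    delta = {root: 0}

    def delta_of_procedure(p):
        if p not in delta:
            # delta_i = MAX over all call points that call P_i
            delta[p] = max(delta_of_call(c) for c in callers_of[p])
        return delta[p]

    def delta_of_call(c):
        if c not in delta:
            caller, _ = call_points[c]
            # delta_k = delta_i + |L(c_k)| for call point c_k inside P_i
            delta[c] = delta_of_procedure(caller) + live_across[c]
        return delta[c]

    for _, callee in call_points.values():
        delta_of_procedure(callee)
    return delta

call_points = {
    "c7": ("P1", "P2"), "c8": ("P1", "P2"), "c9": ("P1", "P3"),
    "c10": ("P1", "P4"), "c11": ("P1", "P6"), "c12": ("P2", "P5"),
    "c13": ("P3", "P5"), "c14": ("P4", "P6"),
}
live_across = {"c7": 1, "c8": 2, "c9": 3, "c10": 2,
               "c11": 5, "c12": 3, "c13": 3, "c14": 2}

delta = compute_deltas(call_points, live_across, "P1")
num_global_registers = max(delta[p] for p in ("P1", "P2", "P3", "P4", "P5", "P6"))
```

The memoized recursion visits each procedure and call point once, so the work is linear in the size of the call graph.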
Preallocation of Global Registers • When procedure Pi is called, at most δi variables are live across the calls leading to Pi • This holds for every path in the call graph • How do we ensure that all variables live across calls leading to Pi are assigned to the right registers? • N = MAX{δi} – number of global registers allocated • T = {T1, …, TN}
Launch and Landing Pads • Procedure Pi calls Pj; m = δi • Assign variables live across calls leading to Pi to T1…Tm • Let ck be the call point; n = |L(ck)| • Launch Pad: parallel copy placed before the call, (Tm+1…Tm+n) ← ψ(L(ck)) • Landing Pad: copy the values back after the call, L(ck) ← ψ((Tm+1…Tm+n))
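A rough sketch of how the two pads for one call point might be generated. The function name is hypothetical, and ψ is modeled simply as pairing the live variables with the next free global registers in list order, which is an assumption rather than the authors' definition.

```python
def make_pads(delta_i, live_across_call):
    """Return (launch, landing) for one call point in a procedure with m = delta_i.
    Each entry is a (destination, source) pair; each list is one parallel copy."""
    m = delta_i
    # Global registers T(m+1) .. T(m+n), where n = |L(c_k)|.
    slots = [f"T{m + k}" for k in range(1, len(live_across_call) + 1)]
    launch = list(zip(slots, live_across_call))   # placed before the call
    landing = list(zip(live_across_call, slots))  # placed after the call
    return launch, landing

# Hypothetical call point with three live variables in a procedure with delta_i = 2.
launch, landing = make_pads(2, ["a", "b", "c"])
```

Registers T1 and T2 stay untouched here because they already hold values live across the calls that led to this procedure.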
Theoretical Consequences of Launch and Landing Pads • Theorem: • All global interferences involve at least one global register • Corollary: • Local variables in distinct procedures do not interfere • Corollary: • No local variable in “main” has a global interference • Theorem: • Every variable defined locally in Pi (m = δi) • Interferes with global registers T1…Tm • Does NOT interfere with global registers Tm+1, … TN => Can assign local vars in Pi to global registers Tm+1, … TN
Reducing the Chromatic Number • Procedure A: V ← …; Call B; W ← …; … ← V; X ← …; … ← W; Y ← …; … ← X; Call B; … ← Y • Procedure B: Z ← …; … ← Z • [Figure: interference graph over V, W, X, Y, Z] • Chromatic Number = 3
Reducing the Chromatic Number • Procedure A: V ← …; T1 ← Ψ(V); Call B; V ← Ψ⁻¹(T1); W ← …; … ← V; X ← …; … ← W; Y ← …; … ← X; T1 ← Ψ(Y); Call B; Y ← Ψ⁻¹(T1); … ← Y • Procedure B: Z ← …; … ← Z • [Figure: interference graph over V, W, X, Y, Z, T1] • Chromatic Number = 2
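The two chromatic numbers can be reproduced by brute force. The edge lists below are one reading of the lifetimes in procedures A and B, not taken from the original figures: without pads, V and Y are live across the calls to B and so interfere with Z; with pads, only T1 does.

```python
from itertools import product

def chromatic_number(vertices, edges):
    """Smallest k admitting a proper k-coloring (brute force, tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            assignment = dict(zip(vertices, colors))
            if all(assignment[a] != assignment[b] for a, b in edges):
                return k
    return 0  # empty graph

# Assumed edges without pads: a 5-cycle V-W-X-Y-Z-V.
before = chromatic_number(
    "VWXYZ",
    [("V", "Z"), ("V", "W"), ("W", "X"), ("X", "Y"), ("Y", "Z")],
)

# Assumed edges with pads: T1 carries the value across each call to B.
after = chromatic_number(
    ["V", "W", "X", "Y", "Z", "T1"],
    [("T1", "Z"), ("V", "W"), ("W", "X"), ("X", "Y")],
)
```

Under this reading the graph without pads is a chordless 5-cycle, which is why it needs three colors and is not chordal.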
Characterizing the IIG • Theorem: • T is a clique in the IIG • Theorem: • IIG is chordal • Theorem: • Chromatic Number of the IIG is: R = MAX{δi + χ(Gi)}
Example • N = 6 • Global registers T1, T2, T3, T4, T5, T6 form a CLIQUE • Local interference graphs G1, G2, G3, G4, G5, G6 • δ1 = 0, δ2 = 2, δ3 = 3, δ4 = 2, δ5 = 6, δ6 = 5 • Global interference: Tj interferes with each local variable in Gi, for 1 ≤ j ≤ δi
Coloring Algorithm • Use SSA+LLP Form, but DON’T build the IIG • For Pi, colors in the range 1..δi are unavailable • Color the local (chordal) interference graph Gi of Pi • Complexity: O(|Vi| + |Ei|) • For each vertex in Pi, replace color c with c + δi • Complexity: O(|Vi|)
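A minimal sketch of the per-procedure step, assuming the local graph Gi is chordal. It colors greedily in maximum cardinality search (MCS) order, a standard way to color chordal graphs optimally; the slides do not say which chordal coloring routine is used. This simple version is quadratic rather than linear, and the function names and sample graph are hypothetical.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    with the most already-visited neighbors."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        del weight[v]
        order.append(v)
        for n in graph[v]:
            if n in weight:
                weight[n] += 1
    return order

def color_procedure(graph, delta_i):
    """Greedy coloring in MCS order, then replace each color c with c + delta_i.
    Colors are 1-based, so locals land in delta_i + 1 .. delta_i + chi(G_i)."""
    color = {}
    for v in mcs_order(graph):
        used = {color[n] for n in graph[v] if n in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return {v: c + delta_i for v, c in color.items()}

# Hypothetical chordal graph: triangle X, Y, Z plus W adjacent to Z; delta_i = 2.
local_graph = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y", "W"}, "W": {"Z"}}
colors = color_procedure(local_graph, 2)
```

For this graph χ(Gi) = 3, so the highest color used is δi + χ(Gi) = 5, and colors 1 and 2 remain reserved for the global registers live on entry.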
Experiments • Applications taken from Mediabench and MiBench • Written in C • Compiled Using Machine SUIF • Optimal color assignment • Compare to heuristics • Color Palette Propagation • Top-Down, Bottom-Up [Beidas and Zhu, ASP-DAC’05] • Heuristic Color Assignment [Matula and Beck, JACM ’83]
Limitations • Global Variables • Interfere with all variables in the program • Lifetime can still be analyzed • Static Local Variables • Initialized on first access • Hold their values across function calls • Function Pointers • Resolution is NP-Complete
Conclusion • Interprocedural register allocation in HLS • Optimal, polynomial-time algorithm • Uses SSA Form + Launch/Landing Pads • IIG is a chordal graph • Scalable – no need to build IIG • Significantly faster than sub-optimal heuristics • A few limitations • Global variables, local static variables • Function pointers • Resolution is NP-Complete
Related Work • Register Allocation in HLS • Clique Partitioning/Coloring Problem • [Tseng and Siewiorek, ’86] • Scheduled DFGs – Interval Graphs • [Kurdahi and Parker, ’87] • Scheduled Cyclic DFGs – Circular Arc Graphs • (NP-Complete) • [Stok, ’92] • Restrictions on Variable Lifetimes – Chordal Graphs • [Springer and Thomas, ’94] • Static Single Assignment Form – Chordal Graphs • [Brisk et al. 2005/6], [Hack and Goos, 2005/6], [Bouchez et al. 2005]