Recursion Unrolling for Divide and Conquer Programs
Radu Rugina and Martin Rinard
Presented by: Cristian Petrescu-Prahova

Divide and Conquer
• Idea:
  • Divide the problem into smaller subproblems and solve each in turn
  • Use recursion as the primary control structure
  • A base case computation terminates the recursion once the problem size is small enough
  • Combine the results to generate the solution of the original problem
• Interesting properties:
  • Lots of inherent parallelism; the recursion generates concurrency naturally
  • Good cache performance; naturally fits cache hierarchies
• In practice:
  • Potentially too much time is spent in the divide/combine phases
  • Increasing the size of the base case alleviates the problem
  • But the simplest and least error-prone coding style reduces the problem to a minimum size (typically one)
• Solution: recursion unrolling

Example: Divide and Conquer Array Increment

    void dcInc (int * p, int n) {
      if (n == 1) {
        *p += 1;                    /* base case */
      } else {
        dcInc (p, n/2);             /* divide */
        dcInc (p + n/2, n/2);
      }
    }
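
To make the divide overhead of dcInc concrete, here is a minimal sketch that instruments the same procedure with a call counter. The names `calls`, `dcIncCounted`, and `countCalls` are illustrative assumptions, not part of the slides, and the sketch assumes n is a power of two.

```c
#include <assert.h>

/* Hypothetical instrumentation: counts every invocation of the procedure. */
static long calls = 0;

/* Same logic as dcInc; n is assumed to be a power of two (n >= 1). */
void dcIncCounted(int *p, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    calls++;
    if (n == 1) {
        *p += 1;                          /* base case */
    } else {
        dcIncCounted(p, n / 2);           /* divide: left half */
        dcIncCounted(p + n / 2, n / 2);   /* divide: right half */
    }
}

/* Resets the counter, runs the increment, and returns the number of calls. */
long countCalls(int *p, int n) {
    calls = 0;
    dcIncCounted(p, n);
    return calls;
}
```

With a one-element base case, an array of n elements costs 2n - 1 procedure calls, so nearly two calls are paid per increment. For example, `countCalls(a, 8)` on an 8-element array returns 15.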
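
Inlining Recursive Calls

    void dcIncI (int * p, int n) {
      if (n == 1) {
        *p += 1;                              /* base case */
      } else {
        /* inlined body of the first recursive call */
        if (n/2 == 1) {
          *p += 1;                            /* base case */
        } else {
          dcIncI (p, n/2/2);                  /* divide */
          dcIncI (p + n/2/2, n/2/2);
        }
        /* inlined body of the second recursive call */
        if (n/2 == 1) {
          *(p + n/2) += 1;                    /* base case */
        } else {
          dcIncI (p + n/2, n/2/2);            /* divide */
          dcIncI (p + n/2 + n/2/2, n/2/2);
        }
      }
    }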

Conditional Fusion

The two inlined conditionals in dcIncI test the same condition (n/2 == 1), so they can be merged into a single conditional:

    void dcIncF (int * p, int n) {
      if (n == 1) {
        *p += 1;                              /* base case */
      } else {
        if (n/2 == 1) {
          *p += 1;                            /* fused base case: 2 elements */
          *(p + n/2) += 1;
        } else {
          dcIncF (p, n/2/2);                  /* divide */
          dcIncF (p + n/2/2, n/2/2);
          dcIncF (p + n/2, n/2/2);
          dcIncF (p + n/2 + n/2/2, n/2/2);
        }
      }
    }
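
As a sanity check on the transformation, the following sketch compares the original dcInc with the fused dcIncF on the same input. It assumes n is a power of two and that the fused procedure calls itself recursively; `MAX_N` and `sameResult` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64   /* hypothetical upper bound for the test buffers */

/* Original procedure from the example slide. */
void dcInc(int *p, int n) {
    if (n == 1) {
        *p += 1;
    } else {
        dcInc(p, n / 2);
        dcInc(p + n / 2, n / 2);
    }
}

/* Procedure after one inlining step and conditional fusion. */
void dcIncF(int *p, int n) {
    if (n == 1) {
        *p += 1;
    } else if (n / 2 == 1) {
        *p += 1;                 /* fused two-element base case */
        *(p + n / 2) += 1;
    } else {
        dcIncF(p, n / 2 / 2);
        dcIncF(p + n / 2 / 2, n / 2 / 2);
        dcIncF(p + n / 2, n / 2 / 2);
        dcIncF(p + n / 2 + n / 2 / 2, n / 2 / 2);
    }
}

/* Returns 1 if both procedures produce the same array for size n. */
int sameResult(int n) {
    int a[MAX_N], b[MAX_N];
    assert(n >= 1 && n <= MAX_N && (n & (n - 1)) == 0);
    for (int i = 0; i < n; i++) {
        a[i] = b[i] = i;
    }
    dcInc(a, n);
    dcIncF(b, n);
    return memcmp(a, b, (size_t)n * sizeof(int)) == 0;
}
```

Calling `sameResult` for n = 1, 2, 4, ..., 64 exercises every branch of the fused procedure, including the sizes where the recursion lands on the one-element base case.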

Second Unrolling Iteration and Rerolling

After the second unrolling iteration:

    void dcInc2 (int * p, int n) {
      if (n == 1) {
        *p += 1;
      } else {
        if (n/2 == 1) {
          *p += 1;
          *(p + n/2) += 1;
        } else {
          if (n/2/2 == 1) {
            *p += 1;
            *(p + n/2/2) += 1;
            *(p + n/2) += 1;
            *(p + n/2 + n/2/2) += 1;
          } else {
            dcInc2 (p, n/2/2/2);
            dcInc2 (p + n/2/2/2, n/2/2/2);
            dcInc2 (p + n/2/2, n/2/2/2);
            dcInc2 (p + n/2/2 + n/2/2/2, n/2/2/2);
            dcInc2 (p + n/2, n/2/2/2);
            dcInc2 (p + n/2 + n/2/2/2, n/2/2/2);
            dcInc2 (p + n/2 + n/2/2, n/2/2/2);
            dcInc2 (p + n/2 + n/2/2 + n/2/2/2, n/2/2/2);
          }
        }
      }
    }

After rerolling:

    void dcIncR (int * p, int n) {
      if (n == 1) {
        *p += 1;
      } else {
        if (n/2 == 1) {
          *p += 1;
          *(p + n/2) += 1;
        } else {
          if (n/2/2 == 1) {
            *p += 1;
            *(p + n/2/2) += 1;
            *(p + n/2) += 1;
            *(p + n/2 + n/2/2) += 1;
          } else {
            dcIncR (p, n/2);
            dcIncR (p + n/2, n/2);
          }
        }
      }
    }

We need rerolling to ensure that the largest unrolled base case is always executed. In dcInc2 the recursive calls shrink n by a factor of 8, so for some sizes (e.g. n = 16) the recursion skips past the 4-element base case and lands on the 2-element one. dcIncR restores the original recursive calls, which halve n, so for n >= 4 the recursion always reaches the 4-element base case.
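
The effect of rerolling can be observed by counting how many elements are handled by the 4-element base case. The sketch below is an illustration under stated assumptions: n is a power of two no larger than 64, the counter `big` and the helpers `bigAfterUnrolling` and `bigAfterRerolling` are hypothetical, and the eight recursive calls of dcInc2 are written as a loop for brevity (equivalent to the eight explicit calls when n is a power of two).

```c
#include <assert.h>

/* Hypothetical counter: elements handled by the 4-element base case. */
static int big = 0;

/* Twice-unrolled procedure; recursion shrinks n by a factor of 8. */
void dcInc2(int *p, int n) {
    if (n == 1) {
        *p += 1;
    } else if (n / 2 == 1) {
        *p += 1;
        *(p + n / 2) += 1;
    } else if (n / 2 / 2 == 1) {
        *p += 1;
        *(p + n / 2 / 2) += 1;
        *(p + n / 2) += 1;
        *(p + n / 2 + n / 2 / 2) += 1;
        big += 4;
    } else {
        int e = n / 2 / 2 / 2;               /* size of each of the 8 parts */
        for (int k = 0; k < 8; k++) {
            dcInc2(p + k * e, e);
        }
    }
}

/* Rerolled procedure; recursion halves n as in the original dcInc. */
void dcIncR(int *p, int n) {
    if (n == 1) {
        *p += 1;
    } else if (n / 2 == 1) {
        *p += 1;
        *(p + n / 2) += 1;
    } else if (n / 2 / 2 == 1) {
        *p += 1;
        *(p + n / 2 / 2) += 1;
        *(p + n / 2) += 1;
        *(p + n / 2 + n / 2 / 2) += 1;
        big += 4;
    } else {
        dcIncR(p, n / 2);
        dcIncR(p + n / 2, n / 2);
    }
}

/* Elements processed by the 4-element base case without rerolling. */
int bigAfterUnrolling(int n) {
    int a[64] = {0};
    assert(n >= 1 && n <= 64 && (n & (n - 1)) == 0);
    big = 0;
    dcInc2(a, n);
    return big;
}

/* Elements processed by the 4-element base case with rerolling. */
int bigAfterRerolling(int n) {
    int a[64] = {0};
    assert(n >= 1 && n <= 64 && (n & (n - 1)) == 0);
    big = 0;
    dcIncR(a, n);
    return big;
}
```

For n = 16, `bigAfterUnrolling` returns 0 because the recursion jumps from 16 straight to 2, while `bigAfterRerolling` returns 16 because every element passes through the 4-element base case.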

Algorithm

    Algorithm RecursionUnrolling (Proc f, Int m)
      f_unroll[0] = clone (f);
      for (i = 1; i <= m; ++i) {
        f_unroll[i] = RecursionInline (f_unroll[i-1], f);
        f_unroll[i] = ConditionalFusion (f_unroll[i]);
      }
      f_reroll[m] = RecursionRerolling (f_unroll[m], f);
      return f_reroll[m]

Implementation details
• Recursion unrolling
  • Standard procedure inlining
  • Increases the code size exponentially; must be used with care
• Conditional fusion
  • Bottom-up traversal of the HTG + conditional match
• Recursion rerolling
  • Replaces the recursion block of the unrolled procedure with the recursion block of the rolled procedure, provided the conditional sequence of the unrolled procedure implies the conditional sequence of the rolled procedure
• Simple transformations!
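
The code growth can be estimated with a small calculation. The sketch below assumes that each inlining iteration replaces every recursive call site with a copy of the original body, so a procedure with k recursive calls has k^(m+1) call sites after m iterations; the function name `callSites` is hypothetical. This matches the dcInc example (4 calls after one iteration, 8 after two) but is only a rough model of code size.

```c
#include <assert.h>

/* Recursive call sites after m inlining iterations for a procedure that
 * contains k recursive calls. Each iteration multiplies the count by k.
 * Assumption: no call sites are removed by later transformations. */
long callSites(int k, int m) {
    assert(k >= 1 && m >= 0);
    long sites = k;                 /* call sites in the original procedure */
    for (int i = 0; i < m; i++) {
        sites *= k;                 /* every call site expands into k sites */
    }
    return sites;
}
```

For a procedure with 8 recursive calls, `callSites(8, 2)` already gives 512 call sites, which is why the unrolling depth has to be chosen with care.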

Experiments
• Programs:
  • Mul: divide and conquer matrix multiplication
    • 1 recursive procedure with 8 recursive calls
    • Base case size: 1 element
  • LU: divide and conquer LU decomposition
    • 4 mutually recursive procedures; the main procedure has 8 recursive calls
    • Base case size: 1 element
• Implementation:
  • C-to-C transformations in SUIF
• Comparison:
  • Hand-coded divide and conquer programs from the Cilk benchmark set (designed for thread parallelization)
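
For readers unfamiliar with the shape of the Mul benchmark, here is a minimal sketch of a divide and conquer matrix multiplication with one recursive procedure, 8 recursive calls, and a one-element base case. It is not the benchmark's code: the name `dcMul`, the row-major layout with row stride `ld`, and the accumulate semantics (C += A * B) are assumptions made for this illustration, and n is assumed to be a power of two.

```c
#include <assert.h>

/* Computes C += A * B for n x n blocks stored row-major with row stride ld.
 * Assumption: n is a power of two; A, B do not alias C. */
void dcMul(const double *a, const double *b, double *c, int n, int ld) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    if (n == 1) {
        *c += *a * *b;                       /* base case: 1 element */
        return;
    }
    int h = n / 2;                           /* quadrant size */
    const double *a11 = a, *a12 = a + h;
    const double *a21 = a + h * ld, *a22 = a + h * ld + h;
    const double *b11 = b, *b12 = b + h;
    const double *b21 = b + h * ld, *b22 = b + h * ld + h;
    double *c11 = c, *c12 = c + h;
    double *c21 = c + h * ld, *c22 = c + h * ld + h;

    /* divide: 8 recursive calls, two per quadrant of C */
    dcMul(a11, b11, c11, h, ld);
    dcMul(a12, b21, c11, h, ld);
    dcMul(a11, b12, c12, h, ld);
    dcMul(a12, b22, c12, h, ld);
    dcMul(a21, b11, c21, h, ld);
    dcMul(a22, b21, c21, h, ld);
    dcMul(a21, b12, c22, h, ld);
    dcMul(a22, b22, c22, h, ld);
}
```

With a one-element base case, every scalar multiply-add sits behind a chain of procedure calls, which is the overhead that recursion unrolling targets.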

Conclusion
• Recursion unrolling is similar to loop unrolling
• Basic recursion unrolling reduces procedure call overhead
• Extra optimizations:
  • Conditional fusion: simplifies the control flow
  • Recursion rerolling: ensures the largest unrolled base case is always executed
• The performance of the optimized programs is close to that of hand-coded programs