This presentation gives an overview of design-driven compilation for parallelizing programs, using a divide and conquer sort as the running example. It walks through the sorting and merging code, shows how subproblems identified by pointers into the middle of arrays can be solved recursively in parallel, and explains the points-to and region information needed to do so. Two potential solutions are considered: a fully automatic analysis and a design-driven approach in which the programmer states this information and the analysis verifies it.
Design-Driven Compilation
Radu Rugina and Martin Rinard
Laboratory for Computer Science, Massachusetts Institute of Technology
Overview
• Goal: Parallelization
• Computation + Analysis Problems: Points-to Analysis, Region Analysis
• Two Potential Solutions: Fully Automatic, Design Driven
• Evaluation
Example - Divide and Conquer Sort
Input:   7 4 6 1 3 5 8 2
Divide:  7 4 | 6 1 | 3 5 | 8 2
Conquer: 4 7 | 1 6 | 3 5 | 2 8
Combine: 1 4 6 7 | 2 3 5 8
Result:  1 2 3 4 5 6 7 8
Divide and Conquer Algorithms
• Lots of Recursively Generated Concurrency
• Recursively Solve Subproblems in Parallel
• Combine Results in Parallel
“Sort n Items in d, Using t as Temporary Storage”
    void sort(int *d, int *t, int n) {
      if (n > CUTOFF) {
        sort(d,t,n/4);
        sort(d+n/4,t+n/4,n/4);
        sort(d+n/2,t+n/2,n/4);
        sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
        merge(d,d+n/4,d+n/2,t);
        merge(d+n/2,d+3*(n/4),d+n,t+n/2);
        merge(t,t+n/2,t+n,d);
      } else insertionSort(d,d+n);
    }
“Recursively Sort Four Quarters of d”
        sort(d,t,n/4);
        sort(d+n/4,t+n/4,n/4);
        sort(d+n/2,t+n/2,n/4);
        sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
• Divide array into subarrays and recursively sort subarrays
• Subproblems Identified Using Pointers Into Middle of Array: d, d+n/4, d+n/2, d+3*(n/4)
• Sorted Results Written Back Into Input Array
d before: 7 4 | 6 1 | 3 5 | 8 2
d after:  4 7 | 1 6 | 3 5 | 2 8
“Merge Sorted Quarters of d Into Halves of t”
        merge(d,d+n/4,d+n/2,t);
        merge(d+n/2,d+3*(n/4),d+n,t+n/2);
d (sorted quarters):                 4 7 | 1 6 | 3 5 | 2 8
t (sorted halves, at t and t+n/2):   1 4 6 7 | 2 3 5 8
“Merge Sorted Halves of t Back Into d”
        merge(t,t+n/2,t+n,d);
t (sorted halves, at t and t+n/2):   1 4 6 7 | 2 3 5 8
d (fully sorted):                    1 2 3 4 5 6 7 8
“Use a Simple Sort for Small Problem Sizes”
      } else insertionSort(d,d+n);
Pointers d and d+n delimit the range handed to insertionSort.
Array frames shown on the slides:
7 4 6 1 3 5 8 2
7 4 1 6 3 5 8 2
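The walkthrough above can be collected into one compilable unit. The following is a minimal sketch rather than the authors' code: CUTOFF is given an assumed value of 4 (the slides do not specify one), the quarter boundaries q1, q2, q3 are hypothetical local names, and the recursive calls use explicit lengths (q2 - q1, q3 - q2) so that the four pieces line up with the merge boundaries even when n is not a multiple of 4. The insertion sort is also rewritten so that it never forms a pointer before l.

```c
#define CUTOFF 4 /* assumed value; the slides leave CUTOFF unspecified */

/* Sort the half-open range [l, h) in place. */
void insertionSort(int *l, int *h) {
    if (h - l < 2) return;              /* empty or single-element range */
    for (int *p = l + 1; p < h; p++) {
        int k = *p;
        int *q = p;
        while (q > l && k < *(q - 1)) { /* shift larger items right */
            *q = *(q - 1);
            q--;
        }
        *q = k;
    }
}

/* Merge sorted ranges [l1, m) and [m, h2) into the buffer starting at d. */
void merge(int *l1, int *m, int *h2, int *d) {
    int *h1 = m, *l2 = m;
    while (l1 < h1 && l2 < h2) {
        if (*l1 < *l2) *d++ = *l1++;
        else           *d++ = *l2++;
    }
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

/* Sort n items in d, using t (also n items) as temporary storage. */
void sort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        int q1 = n / 4, q2 = n / 2, q3 = 3 * (n / 4); /* hypothetical names */
        sort(d,      t,      q1);
        sort(d + q1, t + q1, q2 - q1);
        sort(d + q2, t + q2, q3 - q2);
        sort(d + q3, t + q3, n - q3);
        merge(d,      d + q1, d + q2, t);       /* quarters of d -> first half of t  */
        merge(d + q2, d + q3, d + n,  t + q2);  /* quarters of d -> second half of t */
        merge(t,      t + q2, t + n,  d);       /* halves of t   -> d                */
    } else {
        insertionSort(d, d + n);
    }
}
```

Calling sort(d, t, 8) on the example array 7 4 6 1 3 5 8 2, with t an 8-element scratch buffer, leaves d holding 1 through 8 in order.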
Parallel Sort
    void sort(int *d, int *t, int n) {
      if (n > CUTOFF) {
        spawn sort(d,t,n/4);
        spawn sort(d+n/4,t+n/4,n/4);
        spawn sort(d+n/2,t+n/2,n/4);
        spawn sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
        sync;
        spawn merge(d,d+n/4,d+n/2,t);
        spawn merge(d+n/2,d+3*(n/4),d+n,t+n/2);
        sync;
        merge(t,t+n/2,t+n,d);
      } else insertionSort(d,d+n);
    }
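The spawn and sync keywords on this slide are Cilk-style and are not standard C. As a rough stand-in, the same structure can be sketched with OpenMP tasks, where each `#pragma omp task` plays the role of spawn and `#pragma omp taskwait` the role of sync. This is a substitution made for illustration, not the authors' generated code. The names psort and parallel_sort, the CUTOFF value, and the boundary variables q1, q2, q3 are assumptions. With GCC or Clang the pragmas take effect under `-fopenmp`; without it they are ignored and the code runs sequentially.

```c
#define CUTOFF 4 /* assumed value; the slides leave CUTOFF unspecified */

/* Sort the half-open range [l, h) in place. */
void insertionSort(int *l, int *h) {
    if (h - l < 2) return;
    for (int *p = l + 1; p < h; p++) {
        int k = *p;
        int *q = p;
        while (q > l && k < *(q - 1)) { *q = *(q - 1); q--; }
        *q = k;
    }
}

/* Merge sorted ranges [l1, m) and [m, h2) into the buffer starting at d. */
void merge(int *l1, int *m, int *h2, int *d) {
    int *h1 = m, *l2 = m;
    while (l1 < h1 && l2 < h2) {
        if (*l1 < *l2) *d++ = *l1++;
        else           *d++ = *l2++;
    }
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

/* Task-parallel sort: tasks between two taskwaits touch disjoint ranges. */
void psort(int *d, int *t, int n) {
    if (n > CUTOFF) {
        int q1 = n / 4, q2 = n / 2, q3 = 3 * (n / 4); /* hypothetical names */
        #pragma omp task
        psort(d, t, q1);
        #pragma omp task
        psort(d + q1, t + q1, q2 - q1);
        #pragma omp task
        psort(d + q2, t + q2, q3 - q2);
        #pragma omp task
        psort(d + q3, t + q3, n - q3);
        #pragma omp taskwait            /* corresponds to the first sync */
        #pragma omp task
        merge(d, d + q1, d + q2, t);
        #pragma omp task
        merge(d + q2, d + q3, d + n, t + q2);
        #pragma omp taskwait            /* corresponds to the second sync */
        merge(t, t + q2, t + n, d);
    } else {
        insertionSort(d, d + n);
    }
}

/* Entry point: one thread creates the root task, the team executes tasks. */
void parallel_sort(int *d, int *t, int n) {
    #pragma omp parallel
    #pragma omp single
    psort(d, t, n);
}
```

The version is only safe because of the facts the following slides spell out: d and t are different blocks, and the calls running between two synchronization points access disjoint regions of them.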
What Do You Need To Know To Exploit This Form of Parallelism?
• Points-to Information (data blocks that pointers point to)
• Region Information (accessed regions within data blocks)
Information Needed To Exploit Parallelism
• d and t point to different memory blocks
• Calls to sort access disjoint parts of d and t
• Together, calls access [d,d+n-1] and [t,t+n-1]
        sort(d,t,n/4);
        sort(d+n/4,t+n/4,n/4);
        sort(d+n/2,t+n/2,n/4);
        sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
[Diagram: for each call, the accessed parts of [d,d+n-1] and [t,t+n-1]]
Information Needed To Exploit Parallelism
• d and t point to different memory blocks
• First two calls to merge access disjoint parts of d, t
• Together, calls access [d,d+n-1] and [t,t+n-1]
        merge(d,d+n/4,d+n/2,t);
        merge(d+n/2,d+3*(n/4),d+n,t+n/2);
        merge(t,t+n/2,t+n,d);
[Diagram: for each call, the accessed parts of [d,d+n-1] and [t,t+n-1]]
Information Needed To Exploit Parallelism
• Calls to insertionSort access [d,d+n-1]
        insertionSort(d,d+n);
[Diagram: accessed range from d to d+n-1]
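To make the disjointness requirement concrete, the regions accessed by the four recursive sort calls can be written as inclusive offset ranges and checked for a given n. The sketch below is only a numeric check for one concrete size, not the symbolic region analysis the slides describe. The Region type and all function names are hypothetical, and the ranges are taken directly from the call arguments shown above (start offset and length of each call).

```c
/* Inclusive range of element offsets within a block; empty when lo > hi. */
typedef struct { int lo, hi; } Region;

/* Regions of d (and equally of t) accessed by the four calls to sort. */
void sort_call_regions(int n, Region r[4]) {
    r[0] = (Region){0,           n / 4 - 1};          /* sort(d, t, n/4)            */
    r[1] = (Region){n / 4,       n / 4 + n / 4 - 1};  /* sort(d+n/4, ..., n/4)      */
    r[2] = (Region){n / 2,       n / 2 + n / 4 - 1};  /* sort(d+n/2, ..., n/4)      */
    r[3] = (Region){3 * (n / 4), n - 1};              /* sort(d+3*(n/4), ..., rest) */
}

/* Two regions overlap if both are non-empty and share an offset. */
int regions_overlap(Region a, Region b) {
    if (a.lo > a.hi || b.lo > b.hi) return 0;
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* Returns 1 if no two of the four calls access a common element. */
int sort_calls_disjoint(int n) {
    Region r[4];
    sort_call_regions(n, r);
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++)
            if (regions_overlap(r[i], r[j])) return 0;
    return 1;
}

/* Returns 1 if the four regions, in order, tile [0, n-1] exactly. */
int sort_calls_cover(int n) {
    Region r[4];
    sort_call_regions(n, r);
    int next = 0;
    for (int i = 0; i < 4; i++) {
        if (r[i].lo != next) return 0;
        next = r[i].hi + 1;
    }
    return next == n;
}
```

For n = 8 both checks succeed. For n = 6 the offsets as written give [3,3] for the third call and [3,5] for the fourth, so sort_calls_disjoint(6) returns 0. The slide code therefore lines up as described when n is a multiple of 4, and integer division is one of the things an automatic analysis has to reason about.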
What Do You Need To Know To Exploit This Form of Parallelism?
• Points-to Information (d and t point to different data blocks)
• Symbolic Region Information (accessed regions within d and t blocks)
How Hard Is It To Figure These Things Out? Challenging
How Hard Is It To Figure These Things Out?
    void insertionSort(int *l, int *h) {
      int *p, *q, k;
      for (p = l+1; p < h; p++) {
        for (k = *p, q = p-1; l <= q && k < *q; q--)
          *(q+1) = *q;
        *(q+1) = k;
      }
    }
Not immediately obvious that insertionSort(l,h) accesses [l,h-1]
How Hard Is It To Figure These Things Out?
    void merge(int *l1, int *m, int *h2, int *d) {
      int *h1 = m;
      int *l2 = m;
      while ((l1 < h1) && (l2 < h2))
        if (*l1 < *l2) *d++ = *l1++;
        else *d++ = *l2++;
      while (l1 < h1) *d++ = *l1++;
      while (l2 < h2) *d++ = *l2++;
    }
Not immediately obvious that merge(l,m,h,d) accesses [l,h-1] and [d,d+(h-l)-1]
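One informal way to gain confidence in the two region claims is to surround the ranges with sentinel cells and confirm they survive a call. The sketch below is a spot check on single inputs, assuming a sentinel value (GUARD) that does not occur in the data. It catches stray writes only, not stray reads, and it proves nothing in general, which is the gap the analyses in this talk are meant to close. The two check functions are hypothetical helpers; insertionSort and merge are copied from the slides.

```c
#include <string.h>

#define GUARD (-12345) /* hypothetical sentinel, assumed absent from the data */

void insertionSort(int *l, int *h) {
    int *p, *q, k;
    for (p = l + 1; p < h; p++) {
        for (k = *p, q = p - 1; l <= q && k < *q; q--)
            *(q + 1) = *q;
        *(q + 1) = k;
    }
}

void merge(int *l1, int *m, int *h2, int *d) {
    int *h1 = m;
    int *l2 = m;
    while ((l1 < h1) && (l2 < h2))
        if (*l1 < *l2) *d++ = *l1++;
        else *d++ = *l2++;
    while (l1 < h1) *d++ = *l1++;
    while (l2 < h2) *d++ = *l2++;
}

/* Returns 1 if insertionSort(l,h) sorted [l,h-1] and left both guards intact. */
int insertion_stays_in_region(void) {
    int buf[8] = {GUARD, 7, 4, 6, 1, 3, 5, GUARD};
    insertionSort(buf + 1, buf + 7);
    for (int i = 2; i < 7; i++)
        if (buf[i - 1] > buf[i]) return 0;
    return buf[0] == GUARD && buf[7] == GUARD;
}

/* Returns 1 if merge(l,m,h,d) wrote only [d,d+(h-l)-1] and left the source unchanged. */
int merge_stays_in_region(void) {
    int src[4]  = {4, 7, 1, 6};               /* two sorted halves */
    int copy[4];
    int dst[6]  = {GUARD, 0, 0, 0, 0, GUARD};
    int want[4] = {1, 4, 6, 7};
    memcpy(copy, src, sizeof src);
    merge(src, src + 2, src + 4, dst + 1);
    return dst[0] == GUARD && dst[5] == GUARD
        && memcmp(dst + 1, want, sizeof want) == 0
        && memcmp(src, copy, sizeof src) == 0;
}
```

Both functions return 1 on these inputs. The guards sit inside the same buffer as the data, so the p-1 pointer formed by insertionSort still points into the array.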
Issues
• Heavy Use of Pointers
    • Pointers into Middle of Arrays
    • Pointer Arithmetic
    • Pointer Comparison
• Multiple Procedures
    • sort(int *d, int *t, int n)
    • insertionSort(int *l, int *h)
    • merge(int *l, int *m, int *h, int *t)
• Recursion
Fully Automatic Solution
• Whole-program pointer analysis
    • Context-sensitive, flow-sensitive
    • Rugina and Rinard, PLDI 1999
• Whole-program region analysis
    • Symbolic constraint systems
    • Solve by reducing to linear programs
    • Rugina and Rinard, PLDI 2000
Key Complication
Need for sophisticated interprocedural analyses
• Pointer analysis
    • Propagate analysis results through call graph
    • Fixed-point algorithm for recursive programs
• Region analysis
    • Formulation avoids fixed-point algorithms
    • Single constraint system for each strongly connected component
• Need to have whole program in analyzable form
Bigger Picture
• Points-to and region information is (implicitly) part of the interface of each procedure
• Programmer understands procedure interfaces
• Programmer knows
    • Points-to relationships on entry
    • Effect of procedure on points-to relationships
    • Regions of memory blocks that procedure accesses
Idea
Enhance procedure interface to make points-to and region information explicit
• Points-to language
    • Points-to graphs at entry and exit
    • Effect on points-to relationships
• Region language
    • Symbolic specification of accessed regions
• Programmer provides information
• Analysis verifies that it is correct
Points-to Language
    f(p, q, n) {
      context {
        entry: p->_a, q->_b;
        exit: p->_a, _a->_c, q->_b, _b->_d;
      }
      context {
        entry: p->_a, q->_a;
        exit: p->_a, _a->_c, q->_a;
      }
    }
[Diagram: Contexts for f(p,q,n), showing the entry and exit points-to graphs of each context]
Verifying Points-to Information
    f(p,q,n) { . . . }
• One (flow sensitive) analysis per context
• Start with entry points-to graph
• Analyze procedure
• Check result against exit points-to graph
• Similarly for other context
[Diagram: Contexts for f(p,q,n), showing the entry and exit points-to graphs of each context and the graph computed by the analysis]
Analysis of Call Statements
    g(r,n) {
      . .
      f(r,s,n);
      . .
    }
• Analysis produces points-to graph (over r and s) before call
• Retrieve declared contexts from callee
• Find context with matching entry graph
[Diagram: points-to graph for r and s at the call site next to the entry and exit graphs of the contexts for f(p,q,n)]
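The matching step can be illustrated with a much simplified model. The sketch below assumes that each pointer formal points to exactly one block, so that an entry graph is just an array of block labels (one per formal), and that a context matches when the formals alias in the same pattern as the actuals at the call site. These are assumptions made for illustration, and the names NFORMALS, same_alias_pattern, and find_context are hypothetical. The slides do not spell out the actual matching rule.

```c
#define NFORMALS 2 /* pointer formals of f(p, q, n): p and q */

/* Entry graph and call-site graph agree if the same pairs of pointers alias. */
int same_alias_pattern(const int *entry, const int *actual) {
    for (int i = 0; i < NFORMALS; i++)
        for (int j = i + 1; j < NFORMALS; j++)
            if ((entry[i] == entry[j]) != (actual[i] == actual[j]))
                return 0;
    return 1;
}

/* Returns the index of the first declared context whose entry graph matches
 * the blocks the actuals point to, or -1 if no declared context applies. */
int find_context(int entries[][NFORMALS], int nctx, const int *actual) {
    for (int c = 0; c < nctx; c++)
        if (same_alias_pattern(entries[c], actual))
            return c;
    return -1;
}
```

With the two contexts declared for f encoded as {0, 1} (p->_a, q->_b) and {0, 0} (p->_a, q->_a), a call f(r,s,n) where r and s point to different blocks selects the first context, and a call where they point to the same block selects the second.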