
Lecture 7-2 : Distributed Algorithms for Sorting



  1. Lecture 7-2 : Distributed Algorithms for Sorting Courtesy : Michael J. Quinn, Parallel Programming in C with MPI and OpenMP (chapter 14)

  2. Definitions of “Sorted” • Definition 1 • Sorted list held in memory of a single machine • Definition 2 (distributed memory algorithm) • Portion of list in every processor’s memory is sorted • Value of last element on Pi’s list is less than or equal to value of first element on Pi+1’s list • We adopt Definition 2 • Allows problem size to scale with number of processors
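The distributed definition (Definition 2) can be verified at run time. The routine below is a minimal sketch of such a check, not something from the book: it assumes MPI, a non-empty local segment on every process, and an illustrative function name.

```c
/* Minimal sketch: check Definition 2.  Assumes every process holds n >= 1
 * elements.  Each process verifies its own segment is sorted, then compares
 * its last element with the first element on the next-ranked process. */
#include <mpi.h>

int is_sorted_definition2(const int *local, int n, MPI_Comm comm)
{
    int rank, p, ok = 1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    for (int i = 1; i < n; i++)          /* Definition 1, applied locally */
        if (local[i - 1] > local[i]) ok = 0;

    /* Ship the first element one rank "down" so P_i can check the boundary
     * against P_{i+1}; the wrap-around pair P_{p-1}/P_0 is ignored below. */
    int next_first = 0;
    MPI_Sendrecv(&local[0], 1, MPI_INT, (rank + p - 1) % p, 0,
                 &next_first, 1, MPI_INT, (rank + 1) % p, 0,
                 comm, MPI_STATUS_IGNORE);
    if (rank < p - 1 && local[n - 1] > next_first) ok = 0;

    int all_ok;                          /* combine the per-process verdicts */
    MPI_Allreduce(&ok, &all_ok, 1, MPI_INT, MPI_LAND, comm);
    return all_ok;
}
```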

  3. Parallel Quicksort • We consider the case of distributed memory • Each process holds a segment of the unsorted list • The unsorted list is evenly distributed among the processes • Desired result of a parallel quicksort algorithm: • The list segment stored on each process is sorted • The last element on process i's list is smaller than or equal to the first element on process i + 1's list

  4. Parallel Quicksort • We randomly choose a pivot on one of the processes and broadcast it to every process • Each process divides its unsorted list into two lists: those smaller than (or equal to) the pivot, and those greater than the pivot • Each process in the upper half of the process list sends its “low list” to a partner process in the lower half of the process list and receives a “high list” in return • Now the upper-half processes hold only values greater than the pivot, and the lower-half processes hold only values smaller than or equal to the pivot. • Thereafter, the processes divide themselves into two groups and the algorithm recurses within each group. • After log P recursions, every process has an unsorted list of values completely disjoint from the values held by the other processes. • The largest value on process i will be smaller than the smallest value held by process i+1 • Each process then sorts its list using sequential quicksort (an MPI sketch of one recursion level follows below)
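A minimal MPI sketch of one recursion level of this scheme is shown below. It is an illustration under stated assumptions rather than the book's implementation: p is assumed to be a power of 2, the helper names (parallel_quicksort, partition_by_pivot) are mine, and error handling is omitted.

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* In-place partition: values <= pivot come first; returns size of the low part. */
static int partition_by_pivot(int *a, int n, int pivot)
{
    int lo = 0;
    for (int i = 0; i < n; i++)
        if (a[i] <= pivot) { int t = a[lo]; a[lo] = a[i]; a[i] = t; lo++; }
    return lo;
}

/* One recursion level: broadcast a pivot, partition locally, swap halves with a
 * partner process, then recurse within the lower/upper group of processes. */
static void parallel_quicksort(int **list, int *n, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);
    if (p == 1) return;                       /* caller finishes with a local sort */

    int pivot = 0;
    if (rank == 0 && *n > 0) pivot = (*list)[rand() % *n];   /* random pivot */
    MPI_Bcast(&pivot, 1, MPI_INT, 0, comm);

    int n_low  = partition_by_pivot(*list, *n, pivot);
    int n_high = *n - n_low;

    int half    = p / 2;                      /* p assumed to be a power of 2 */
    int lower   = rank < half;
    int partner = lower ? rank + half : rank - half;

    /* Lower-half ranks keep the low part and ship the high part; upper-half
     * ranks do the opposite.  Counts are exchanged first to size the buffer. */
    int send_n    = lower ? n_high : n_low;
    int *send_buf = lower ? *list + n_low : *list;
    int recv_n    = 0;
    MPI_Sendrecv(&send_n, 1, MPI_INT, partner, 0,
                 &recv_n, 1, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);

    int keep_n = lower ? n_low : n_high;
    int *kept  = lower ? *list : *list + n_low;
    int *next  = malloc((size_t)(keep_n + recv_n) * sizeof(int));
    memcpy(next, kept, (size_t)keep_n * sizeof(int));
    MPI_Sendrecv(send_buf, send_n, MPI_INT, partner, 1,
                 next + keep_n, recv_n, MPI_INT, partner, 1,
                 comm, MPI_STATUS_IGNORE);
    free(*list);
    *list = next;
    *n    = keep_n + recv_n;

    MPI_Comm subcomm;                         /* split into halves and recurse */
    MPI_Comm_split(comm, lower, rank, &subcomm);
    parallel_quicksort(list, n, subcomm);
    MPI_Comm_free(&subcomm);
}
```

MPI_Comm_split is one convenient way to form the lower and upper process groups at each level; because ranks renumber from 0 in each sub-communicator, the pivot chooser is always the group's first process.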

  5. Parallel Quicksort 75, 91, 15, 64, 21, 8, 88, 54 P0 50, 12, 47, 72, 65, 54, 66, 22 P1 83, 66, 67, 0, 70, 98, 99, 82 P2 20, 40, 89, 47, 19, 61, 86, 85 P3

  6. Parallel Quicksort 75, 91, 15, 64, 21, 8, 88, 54 P0 50, 12, 47, 72, 65, 54, 66, 22 P1 83, 66, 67, 0, 70, 98, 99, 82 P2 20, 40, 89, 47, 19, 61, 86, 85 P3 Process P0 chooses and broadcasts randomly chosen pivot value

  7. Parallel Quicksort 75, 91, 15, 64, 21, 8, 88, 54 P0 50, 12, 47, 72, 65, 54, 66, 22 P1 83, 66, 67, 0, 70, 98, 99, 82 P2 20, 40, 89, 47, 19, 61, 86, 85 P3 Exchange “lower half” and “upper half” values

  8. Parallel Quicksort 75, 15, 64, 21, 8, 54, 66, 67, 0, 70 P0 Lower “half” 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61 P1 83, 98, 99, 82, 91, 88 P2 Upper “half” 89, 86, 85 P3 After exchange step

  9. Parallel Quicksort 75, 15, 64, 21, 8, 54, 66, 67, 0, 70 P0 Lower “half” 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61 P1 83, 98, 99, 82, 91, 88 P2 Upper “half” 89, 86, 85 P3 Processes P0 and P2 choose and broadcast randomly chosen pivots

  10. Parallel Quicksort 75, 15, 64, 21, 8, 54, 66, 67, 0, 70 P0 Lower “half” 50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61 P1 83, 98, 99, 82, 91, 88 P2 Upper “half” 89, 86, 85 P3 Exchange values

  11. Parallel Quicksort 15, 21, 8, 0, 12, 20, 19 Lower “half” of lower “half” P0 Upper “half” of lower “half” 50, 47, 72, 65, 54, 66, 22, 40, 47, 61, 75, 64, 54, 66, 67, 70 P1 Lower “half” of upper “half” 83, 82, 91, 88, 89, 86, 85 P2 Upper “half” of upper “half” 98, 99 P3 After exchange step

  12. Parallel Quicksort 0, 8, 12, 15, 19, 20, 21 Lower “half” of lower “half” P0 Upper “half” of lower “half” 22, 40, 47, 47, 50, 54, 54, 61, 64, 65, 66, 66, 67, 70, 72, 75 P1 Lower “half” of upper “half” 82, 83, 85, 86, 88, 89, 91 P2 Upper “half” of upper “half” 98, 99 P3 Each processor sorts values it controls

  13. Analysis of Parallel Quicksort • Execution time is dictated by when the last process completes • The algorithm is likely to do a poor job of balancing the number of elements sorted by each process • The pivot value cannot be expected to be the true median • A better pivot value can be chosen

  14. Hyperquicksort • Each process starts with a sequential quicksort on its local list • Now there is a better chance of choosing a pivot that is close to the true median • The process responsible for choosing the pivot can pick the median of its local list • The next three steps of hyperquicksort are the same as in parallel quicksort • broadcast • division into “low list” and “high list” • swap between partner processes • The next step is different in hyperquicksort • Each process merges the kept half of its local list and the received half-list into a sorted local list (see the merge sketch below) • Recursion continues within the upper-half processes and the lower-half processes
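The merge step is what distinguishes hyperquicksort: the kept half and the received half are each already sorted, so a linear-time merge (rather than a re-sort) produces the new sorted local list. The helper below is a small illustrative sketch with names of my choosing; the pivot itself can simply be the middle element of the responsible process's sorted local list.

```c
#include <string.h>

/* Merge two already-sorted arrays into 'out', which must hold nk + nr ints. */
void merge_sorted(const int *kept, int nk, const int *recvd, int nr, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < nk && j < nr)
        out[k++] = (kept[i] <= recvd[j]) ? kept[i++] : recvd[j++];
    /* copy whatever remains of the list that was not exhausted */
    memcpy(out + k,            kept  + i, (size_t)(nk - i) * sizeof(int));
    memcpy(out + k + (nk - i), recvd + j, (size_t)(nr - j) * sizeof(int));
}
```

Because both inputs are sorted, the merge costs only nk + nr comparisons, which is where hyperquicksort gains over re-sorting the whole local list after every exchange.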

  15. Hyperquicksort 75, 91, 15, 64, 21, 8, 88, 54 P0 50, 12, 47, 72, 65, 54, 66, 22 P1 83, 66, 67, 0, 70, 98, 99, 82 P2 20, 40, 89, 47, 19, 61, 86, 85 P3 Number of processors is a power of 2

  16. Hyperquicksort 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 19, 20, 40, 47, 61, 85, 86, 89 P3 Each process sorts values it controls

  17. Hyperquicksort 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 19, 20, 40, 47, 61, 85, 86, 89 P3 Process P0 broadcasts its median value

  18. Hyperquicksort 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 19, 20, 40, 47, 61, 85, 86, 89 P3 Processes will exchange “low” and “high” lists

  19. Hyperquicksort 0, 8, 15, 21, 54 P0 12, 19, 20, 22, 40, 47, 47, 50, 54 P1 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99 P2 61, 65, 66, 72, 85, 86, 89 P3 Processes merge kept and received values.

  20. Hyperquicksort 0, 8, 15, 21, 54 P0 12, 19, 20, 22, 40, 47, 47, 50, 54 P1 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99 P2 61, 65, 66, 72, 85, 86, 89 P3 Processes P0 and P2 broadcast median values.

  21. Hyperquicksort 0, 8, 15, 21, 54 P0 12, 19, 20, 22, 40, 47, 47, 50, 54 P1 64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99 P2 61, 65, 66, 72, 85, 86, 89 P3 Communication pattern for second exchange
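When p is a power of 2, the exchange partners at each level follow a hypercube pattern. The one-line helper below is an illustrative way to compute them, not something taken from the slides:

```c
/* Partner of 'rank' at recursion level d (d = 0 is the first exchange);
 * p is assumed to be a power of 2.  With p = 4: level 0 pairs P0<->P2 and
 * P1<->P3, level 1 pairs P0<->P1 and P2<->P3, matching the patterns shown. */
int exchange_partner(int rank, int p, int d)
{
    return rank ^ (p >> (d + 1));
}
```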

  22. Hyperquicksort 0, 8, 12, 15 P0 19, 20, 21, 22, 40, 47, 47, 50, 54, 54 P1 61, 64, 65, 66, 66, 67, 70, 72, 75, 82 P2 83, 85, 86, 88, 89, 91, 98, 99 P3 After exchange-and-merge step

  23. Observations about hyperquicksort • log P steps are needed in the recursion • The expected number of times a value is passed from one process to another is (log P)/2, so there is still considerable communication overhead • The median value chosen from a local segment may still be quite different from the true median of the entire list • Load imbalance may still arise, although the balance is better than with the plain parallel quicksort algorithm

  24. Another Scalability Concern • Our analysis assumes lists remain balanced • As p increases, each processor’s share of the list decreases • Hence, as p increases, the likelihood of lists becoming unbalanced increases • Unbalanced lists lower efficiency • It would be better to get sample values from all processes before choosing the median

  25. Parallel Sorting by Regular Sampling (PSRS Algorithm) • Each process sorts its share of elements • Each process selects a regular sample of its sorted list • One process gathers and sorts the samples, chooses pivot values from the sorted sample list, and broadcasts these pivot values • Each process partitions its list into p pieces, using the pivot values • Each process sends its partitions to the other processes • Each process merges the p partitions it receives (an MPI outline of these steps follows below)
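A hedged MPI outline of these steps is given below. The sample and pivot index choices are one common convention and need not match the slides' example exactly; helper names are mine, and the final all-to-all exchange and p-way merge (steps 5 and 6) are only indicated by comments.

```c
#include <mpi.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Outline of PSRS on 'local' (n elements per process). */
void psrs_outline(int *local, int n, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    /* 1. Each process sorts its share. */
    qsort(local, n, sizeof(int), cmp_int);

    /* 2. Each process picks p regular samples from its sorted list. */
    int *samples = malloc((size_t)p * sizeof(int));
    for (int i = 0; i < p; i++)
        samples[i] = local[(i * n) / p];

    /* 3. One process gathers the p*p samples, sorts them, and chooses p-1
     *    pivots, which are then broadcast to every process. */
    int *all = (rank == 0) ? malloc((size_t)p * p * sizeof(int)) : NULL;
    MPI_Gather(samples, p, MPI_INT, all, p, MPI_INT, 0, comm);
    int *pivots = malloc((size_t)(p - 1) * sizeof(int));
    if (rank == 0) {
        qsort(all, (size_t)(p * p), sizeof(int), cmp_int);
        for (int i = 1; i < p; i++)
            pivots[i - 1] = all[i * p - 1];   /* one common pivot position */
    }
    MPI_Bcast(pivots, p - 1, MPI_INT, 0, comm);

    /* 4. Each process splits its sorted list into p pieces using the pivots. */
    int *sendcounts = calloc((size_t)p, sizeof(int));
    for (int i = 0, part = 0; i < n; i++) {
        while (part < p - 1 && local[i] > pivots[part]) part++;
        sendcounts[part]++;
    }

    /* 5. Exchange counts (MPI_Alltoall) and then partitions (MPI_Alltoallv),
     *    so that piece j from every process lands on process j.
     * 6. Each process merges the p sorted pieces it receives. */
}
```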

  26. PSRS Algorithm 75, 91, 15, 64, 21, 8, 88, 54 P0 50, 12, 47, 72, 65, 54, 66, 22 P1 83, 66, 67, 0, 70, 98, 99, 82 P2 Number of processors does not have to be a power of 2.

  27. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 Each process sorts its list using quicksort.

  28. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 Each process chooses p regular samples.

  29. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 15, 54, 75, 22, 50, 65, 66, 70, 83 One process collects the p² regular samples.

  30. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 15, 22, 50, 54, 65, 66, 70, 75, 83 One process sorts the p² regular samples.

  31. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 15, 22, 50, 54, 65, 66, 70, 75, 83 One process chooses p-1 pivot values.

  32. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 15, 22, 50, 54, 65, 66, 70, 75, 83 One process broadcasts the p-1 pivot values.

  33. PSRS Algorithm 8, 15, 21, 54, 64, 75, 88, 91 P0 12, 22, 47, 50, 54, 65, 66, 72 P1 0, 66, 67, 70, 82, 83, 98, 99 P2 Each process divides its list, based on the pivot values.

  34. PSRS Algorithm 8, 15, 21 12, 22, 47, 50 0 P0 54, 64 54, 65, 66 66 P1 75, 88, 91 72 67, 70, 82, 83, 98, 99 P2 Each process sends its partitions to the correct destination processes.

  35. PSRS Algorithm 0, 8, 12, 15, 21, 22, 47, 50 P0 54, 54, 64, 65, 66, 66 P1 67, 70, 72, 75, 82, 83, 88, 91, 98, 99 P2 Each process merges its p partitions.

  36. Assumptions • Each process ends up merging close to n/p elements • Experimental results show this is a valid assumption • Processor interconnection network supports p simultaneous message transmissions at full speed

  37. Advantages of PSRS • Better load balance (although perfect load balance cannot be guaranteed) • Repeated communication of the same values is avoided • The number of processes does not have to be a power of 2, which parallel quicksort and hyperquicksort require

  38. Summary • Three distributed algorithms based on quicksort • Keeping list sizes balanced • Parallel quicksort: poor • Hyperquicksort: better • PSRS algorithm: excellent
