CSCI-455/552

CSCI-455/552 Introduction to High Performance Computing Lecture 11

Bucket Sort One “bucket” assigned to hold numbers that fall within each region. Numbers in each bucket sorted using a sequential sorting algorithm. Sequential sorting time complexity: O(nlog(n/m). Works well if the original numbers uniformly distributed across a known interval, say 0 to a - 1. 4.9

Parallel Version of Bucket Sort Simple approach Assign one processor for each bucket. 4.10

Further Parallelization Partition sequence into m regions, one region for each processor. Each processor maintains p “small” buckets and separates numbers in its region into its own small buckets. Small buckets then emptied into p final buckets for sorting, which requires each processor to send one small bucket to each of the other processors (bucket i to processor i). 4.11

Another Parallel Version of Bucket Sort Introduces new message-passing operation-all-to-all broadcast. 4.12

“all-to-all” Broadcast Routine Sends data from each process to every other process 4.13

“all-to-all” routine actually transfers rows of an array to columns: Transposes a matrix. 4.14

Parallel Bucket and Sample Sort • The critical aspect of the above algorithm is one of assigning ranges to processors. This is done by suitable splitter selection. • The splitter selection method divides the n elements into p blocks of size n/p each, and sorts each block by using quicksort. • From each sorted block it chooses p – 1 evenly spaced elements. • The p(p – 1) elements selected from all the blocks represent the sample used to determine the buckets. • This scheme guarantees that the number of elements ending up in each bucket is uniformed (less than 2n/p).

Parallel Bucket and Sample Sort An example of the execution of sample sort on an array with 24 elements on three processes.

Parallel Bucket and Sample Sort • The splitter selection scheme can itself be parallelized. • Each processor generates the p – 1 local splitters in parallel. • All processors share their splitters using a single all-to-all broadcast operation. • Each processor sorts the p(p – 1) elements it receives and selects p – 1 uniformly spaces splitters from them.

Parallel Complexity Analysis 4.11

Numerical Integration Using Rectangles Each region calculated using an approximation given by rectangles: Aligning the rectangles: 4.15

Numerical Integration Using Trapezoidal Method May not be better! 4.16

Adaptive Quadrature Solution adapts to shape of curve. Use three areas, A, B, and C. Computation terminated when largest of A and B sufficiently close to sum of remain two areas . 4.17

Adaptive Quadrature with False Termination. Some care might be needed in choosing when to terminate. Might cause us to terminate early, as two large regions are the same (i.e., C = 0). 4.18

CSCI-455/552

CSCI-455/552

Presentation Transcript

Welcome to CSCI 480, Computer Graphics

Job Scheduling: Market-based and Proportional Sharing

Working with Human Subjects

Introduction to C++ Programming

Models of Human Performance

Arrays and Strings

Foundations IV: Ontology Evolution and Knowledge Management Class Session 8

Starting Out with Java: Early Objects Third Edition

CSCI 4717/5717 Computer Architecture

OpenGL 1 Angel: Chapter 2 OpenGL Programming and Reference Guides, Angel, other sources.

Engineering Technology as a Career

CSCI 6365

CSci 6971: Image Registration Lecture 26: BSpline Transforms April 20, 2004

Information Visualization

The UNIX OS

CSCI-2500: Computer Organization

CG Architectures, Image Formation, and Models Angel, Chapter 1

CSCI 6360/4360

CSCI 6360/4360

Data Workflow Management, Data Preservation and Stewardship

CSCI 6365