
Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)




Presentation Transcript


  1. Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)
  Alejandro Salinger, Cheriton School of Computer Science, University of Waterloo
  Joint work with Alejandro López-Ortiz and Reza Dorrigiv

  2. Multicore Challenge
  • The RAM model will no longer accurately reflect the architecture on which algorithms are executed.
  • The PRAM facilitates design and analysis, however:
    • It is unrealistic.
    • It is difficult to derive work-optimal algorithms for Θ(n) processors.
  • 2, 4, or 8 cores per chip: low-degree parallelism.
  • Thread-based parallelism.

  3. Multicore Challenge
  • Design a model that:
    • Reflects the available degree of parallelism.
    • Is multi-threaded.
    • Allows easy theoretical analysis.
    • Is easy to program.
  • “Programmability has now replaced power as the number one impediment to the continuation of Moore’s law” [Gartner]

  4. The LoPRAM Model
  • The number of cores is not a constant: it is modeled as O(log n).
  • This is analogous to bit-level parallelism in the word RAM, where the word size is w = O(log n) bits.
  • LoPRAM: a PRAM with p = O(log n) processors running in MIMD mode.
    • Concurrent Read, Exclusive Write (CREW).
    • In its simplest form: high-level, thread-based parallelism.
    • Semaphores and automatic serialization are available and transparent to the programmer.
  • Note that p = O(log n), not p = Θ(log n): algorithms must achieve optimal speedup for any number of processors up to O(log n).

  5. PAL-threads

void mergeSort(int numbers[], int temp[], int array_size) {
    m_sort(numbers, temp, 0, array_size - 1);
}

void m_sort(int numbers[], int temp[], int left, int right) {
    int mid = (right + left) / 2;
    if (right > left) {
        m_sort(numbers, temp, left, mid);
        m_sort(numbers, temp, mid + 1, right);
        merge(numbers, temp, left, mid + 1, right);
    }
}

  6. PAL-threads

void mergeSort(int numbers[], int temp[], int array_size) {
    m_sort(numbers, temp, 0, array_size - 1);
}

void m_sort(int numbers[], int temp[], int left, int right) {
    int mid = (right + left) / 2;
    if (right > left) {
        palthreads {    // do in parallel if possible
            m_sort(numbers, temp, left, mid);
            m_sort(numbers, temp, mid + 1, right);
        }               // implicit join
        merge(numbers, temp, left, mid + 1, right);
    }
}

[Diagram: thread states: pending, active, waiting]

  7. Work-Optimal Algorithms: Divide & Conquer
  • Recursive divide-and-conquer algorithms with running time given by a recurrence of the form T(n) = a T(n/b) + f(n).
  • The solution follows from the master theorem.
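The equations on this slide were images in the original; for reference, the standard recurrence and master-theorem cases it refers to can be written as:

```latex
T(n) = a\,T(n/b) + f(n), \qquad a \ge 1,\ b > 1,
```

```latex
T(n) =
\begin{cases}
\Theta\!\left(n^{\log_b a}\right) & \text{if } f(n) = O\!\left(n^{\log_b a - \epsilon}\right) \text{ for some } \epsilon > 0,\\[2pt]
\Theta\!\left(n^{\log_b a} \log^{k+1} n\right) & \text{if } f(n) = \Theta\!\left(n^{\log_b a} \log^{k} n\right),\ k \ge 0,\\[2pt]
\Theta\!\left(f(n)\right) & \text{if } f(n) = \Omega\!\left(n^{\log_b a + \epsilon}\right) \text{ and } a f(n/b) \le c f(n) \text{ for some } c < 1.
\end{cases}
```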

  8. Divide & Conquer

  9. Divide & Conquer
  • Parallel master theorem on the LoPRAM: if we assume parallel merging, the third case becomes Tp(n) = Θ(f(n)/p).
  • Optimal speedup [i.e. Tp(n) = Θ(T(n)/p)] holds as long as p = O(log n).
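A sketch of the parallel version of the three cases, under the assumption (consistent with the slide's claim of optimal speedup, but my paraphrase rather than the paper's exact statement) that the work in each case divides evenly across the p processors:

```latex
T_p(n) =
\begin{cases}
\Theta\!\left(n^{\log_b a}/p\right) & \text{(case 1)},\\[2pt]
\Theta\!\left(n^{\log_b a} \log^{k+1} n \,/\, p\right) & \text{(case 2)},\\[2pt]
\Theta\!\left(f(n)/p\right) & \text{(case 3, assuming parallel merging)}.
\end{cases}
```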

  10. Matrix Multiplication
  • Strassen’s algorithm: T(n) = 7T(n/2) + O(n²), so T(n) = O(n^{log₂ 7}) ≈ O(n^2.81).
  • On the LoPRAM: Tp(n) = O(n^2.81/p).

  11. Dynamic Programming
  • Generic parallel algorithm that exploits the parallelism available in the dependency DAG of the dynamic program: subproblems with no dependencies between them can be computed concurrently.

  12. Conclusions
  • Today’s computers have a small number of processors.
  • The assumption that p = O(log n), or even O(log² n), will remain valid for a while.
  • Designing work-optimal algorithms for a small number of processors is easy.
