
Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder


Presentation Transcript


  1. Chapter 1: Introduction. Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder

  2. Types of parallelism: like an assembly line, a call center, building a house

  3. ILP – instruction-level parallelism: (a+b)*(c+d)
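
  The two additions in (a+b)*(c+d) have no data dependence on each other, so a superscalar processor can issue them in the same cycle; only the multiply must wait for both results. A minimal C sketch of that dependence structure (variable names and values are illustrative, not from the book):

      #include <stdio.h>

      int main(void) {
          int a = 1, b = 2, c = 3, d = 4;   /* illustrative values */
          int t1 = a + b;                   /* independent of t2 */
          int t2 = c + d;                   /* independent of t1 */
          int r  = t1 * t2;                 /* must wait for both sums */
          printf("(a+b)*(c+d) = %d\n", r);
          return 0;
      }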

  4. Computer types: supercomputers, clusters, cloud servers, grid computers

  5. Concurrency vs. parallelism: similar but different. Parallelism can enhance sequential execution.

  6. Figure 1.1

  7. Figure 1.1 (cont.)

  8. Figure 1.2 Summing in sequence. The order of combining a sequence of numbers (7, 3, 15, 10, 13, 18, 6, 4) when adding them to an accumulation variable.
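
  A minimal C sketch of the sequential accumulation in Figure 1.2 (the sequence is from the figure; the code itself is illustrative, not the book's). Each addition depends on the running total, so the additions form a chain and cannot overlap:

      #include <stdio.h>

      int main(void) {
          int x[] = {7, 3, 15, 10, 13, 18, 6, 4};
          int n = sizeof x / sizeof x[0];
          int sum = 0;                 /* accumulation variable */
          for (int i = 0; i < n; i++)
              sum += x[i];             /* each add waits for the previous one */
          printf("sum = %d\n", sum);   /* prints 76 */
          return 0;
      }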

  9. Figure 1.3 Summing in pairs. The order of combining a sequence of numbers (7, 3, 15, 10, 13, 18, 6, 4) by (recursively) combining pairs of values, then pairs of results, and so on.
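
  A minimal C sketch of summing in pairs as in Figure 1.3 (illustrative, not the book's code). The two halves at each level of the recursion are independent of one another, so the partial sums could in principle be computed in parallel:

      #include <stdio.h>

      /* Sum x[lo..hi) by splitting it in half and combining the two
       * partial sums; the two recursive calls are independent. */
      static int pairsum(const int *x, int lo, int hi) {
          if (hi - lo == 1)
              return x[lo];
          int mid = lo + (hi - lo) / 2;
          return pairsum(x, lo, mid) + pairsum(x, mid, hi);
      }

      int main(void) {
          int x[] = {7, 3, 15, 10, 13, 18, 6, 4};
          printf("sum = %d\n", pairsum(x, 0, 8));   /* prints 76 */
          return 0;
      }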

  10. Figure 1.4

  11. Figure 1.5 Organization of a multi-core computer system on which the experiments are run. Each processor has a private L1 cache; it shares an L2 cache with its “chip-mate” and shares an L3 cache with the other processors.

  12. Figure 1.6 Schematic diagram of data allocation to threads. Allocations are consecutive indices.
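
  A rough C illustration of the block allocation in Figure 1.6 (the length and thread count here are assumptions, not taken from the book): thread id owns the consecutive index range from id*(length/t) through (id+1)*(length/t) - 1.

      #include <stdio.h>

      #define LENGTH 16     /* illustrative array length */
      #define T      4      /* illustrative number of threads */

      int main(void) {
          for (int id = 0; id < T; id++) {
              int start = id * (LENGTH / T);          /* first index owned by thread id */
              int end   = start + (LENGTH / T) - 1;   /* last index owned by thread id */
              printf("thread %d: indices %d..%d\n", id, start, end);
          }
          return 0;
      }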

  13. Figure 1.7 The first try at a Count 3s solution using threads.

  14. Figure 1.7 The first try at a Count 3s solution using threads. (cont.)
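
  In the spirit of Figure 1.7, a pthreads sketch (a reconstruction, not the book's exact code) in which every thread scans its block of the array and increments a single shared count with no synchronization. Because count++ is a read-modify-write, concurrent increments can be lost, which is the race condition Figure 1.8 illustrates:

      #include <pthread.h>
      #include <stdio.h>

      #define LENGTH 1000000
      #define T      4

      int array[LENGTH];     /* input data */
      int count = 0;         /* shared accumulator -- UNPROTECTED (race) */

      void *count3s_thread(void *arg) {
          long id = (long)arg;
          int len = LENGTH / T;
          int start = id * len;
          for (int i = start; i < start + len; i++)
              if (array[i] == 3)
                  count++;                  /* unsynchronized read-modify-write */
          return NULL;
      }

      int main(void) {
          pthread_t tid[T];
          for (int i = 0; i < LENGTH; i++)
              array[i] = i % 10;            /* every tenth element is a 3 */
          for (long id = 0; id < T; id++)
              pthread_create(&tid[id], NULL, count3s_thread, (void *)id);
          for (int id = 0; id < T; id++)
              pthread_join(tid[id], NULL);
          printf("count = %d (expected %d)\n", count, LENGTH / 10);
          return 0;
      }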

  15. Figure 1.8 One of several possible interleavings of references to the unprotected variable count, illustrating a race condition.

  16. Figure 1.9 The second try at a Count 3s solution showing the count3s_thread() with mutex protection for the count variable.
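
  A sketch of the change described in Figure 1.9 (again a reconstruction, not the book's exact code): the shared count is now protected by a pthread mutex, so only one thread updates it at a time. Only the thread function and the mutex are shown; the create/join driver is the same as in the first-try sketch above.

      #include <pthread.h>

      #define LENGTH 1000000
      #define T      4

      int array[LENGTH];
      int count = 0;
      pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

      void *count3s_thread(void *arg) {
          long id = (long)arg;
          int len = LENGTH / T;
          int start = id * len;
          for (int i = start; i < start + len; i++)
              if (array[i] == 3) {
                  pthread_mutex_lock(&m);     /* serialize updates to count */
                  count++;
                  pthread_mutex_unlock(&m);
              }
          return NULL;
      }

  Locking on every 3 keeps the count correct, but it forces the threads to take turns at the mutex; the measurements in Figure 1.10 examine what that costs.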

  17. Figure 1.10 Performance of our second Count 3s solution.

  18. Figure 1.11 The count3s_thread() for our third Count 3s solution using private_count array elements.
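
  A sketch of the third solution's thread body (a reconstruction, not the book's exact code): each thread increments its own element of a private_count array, so no lock is needed inside the loop, and the main thread sums the elements after joining.

      #include <pthread.h>

      #define LENGTH 1000000
      #define T      4

      int array[LENGTH];
      int private_count[T];     /* one counter per thread */

      void *count3s_thread(void *arg) {
          long id = (long)arg;
          int len = LENGTH / T;
          int start = id * len;
          for (int i = start; i < start + len; i++)
              if (array[i] == 3)
                  private_count[id]++;   /* no lock: each thread owns one element */
          return NULL;
      }

      /* After joining the threads, the driver adds up private_count[0..T-1]. */

  Because the private_count elements sit next to each other in memory, they can still land on the same cache line; the padding in the fourth solution (Figure 1.14) addresses exactly that.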

  19. Figure 1.12 Performance results for our third Count 3s solution.

  20. Figure 1.13

  21. Figure 1.14 The count3s_thread() for our fourth solution to the Count 3s computations; the private count elements are padded to force them to be allocated to different cache lines.
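
  A sketch of the padding idea in Figure 1.14 (a reconstruction; the 64-byte cache-line size is an assumption). Each per-thread counter is padded out to a full cache line, so two threads never write to the same line and false sharing is avoided:

      #include <pthread.h>

      #define LENGTH   1000000
      #define T        4
      #define LINESIZE 64                       /* assumed cache-line size in bytes */

      int array[LENGTH];

      struct padded_int {
          int  value;
          char pad[LINESIZE - sizeof(int)];     /* fill out the rest of the line */
      } private_count[T];

      void *count3s_thread(void *arg) {
          long id = (long)arg;
          int len = LENGTH / T;
          int start = id * len;
          for (int i = start; i < start + len; i++)
              if (array[i] == 3)
                  private_count[id].value++;    /* stays on this thread's cache line */
          return NULL;
      }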

  22. Figure 1.15

  23. Figure 1.16 Performance for our fourth solution to the Count 3s problem on an array that does not contain any 3s suggests that memory bandwidth limitations are preventing performance gains for eight processors.
