Chapter 1: Introduction. Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder
Types of parallelism: like an assembly line; a call center; building a house.
ILP (instruction-level parallelism): in (a+b)*(c+d), the two additions are independent of each other and can be executed at the same time.
Computer types: supercomputers, clusters, cloud servers, grid computers.
Concurrency vs. parallelism: similar but different. Parallelism can enhance sequential execution.
Figure 1.2 Summing in sequence. The order of combining a sequence of numbers (7, 3, 15, 10, 13, 18, 6, 4) when adding them to an accumulation variable.
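The sequential order described in this caption can be sketched in C as below. This is an illustrative sketch, not the book's code; the function name `sum_sequential` is an assumption made for this example.

```c
#include <stddef.h>

/* Add each element to a single accumulation variable, left to right.
 * Every addition depends on the previous one, so the chain of n
 * additions cannot overlap. (Hypothetical helper for illustration.) */
long sum_sequential(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

For the sequence in the caption (7, 3, 15, 10, 13, 18, 6, 4), the accumulator passes through 7, 10, 25, 35, 48, 66, 72 and ends at 76.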
Figure 1.3 Summing in pairs. The order of combining a sequence of numbers (7, 3, 15, 10, 13, 18, 6, 4) by (recursively) combining pairs of values, then pairs of results, and so on.
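A minimal recursive sketch of pairwise summing in C follows. It is not taken from the book; the name `sum_pairwise` is assumed for illustration. The code itself runs sequentially; its point is that the two recursive calls touch disjoint halves of the array and are therefore independent, which is what would allow them to be evaluated in parallel.

```c
#include <stddef.h>

/* Sum by splitting the range in half, summing each half, and then
 * combining the two partial results. The two recursive calls share
 * no data, so they could run at the same time.
 * (Hypothetical helper for illustration.) */
long sum_pairwise(const int *a, size_t n)
{
    if (n == 0)
        return 0;
    if (n == 1)
        return a[0];
    size_t half = n / 2;
    return sum_pairwise(a, half) + sum_pairwise(a + half, n - half);
}
```

With the eight values from the caption, the first level combines (7+3), (15+10), (13+18), (6+4), the next level combines 10+25 and 31+10, and the final addition 35+41 gives 76, the same total as the sequential order.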
Figure 1.5 Organization of a multi-core computer system on which the experiments are run. Each processor has a private L1 cache; it shares an L2 cache with its “chip-mate” and shares an L3 cache with the other processors.
Figure 1.6 Schematic diagram of data allocation to threads. Allocations are consecutive indices.
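One way to compute such consecutive allocations is sketched below. The type `range_t` and the function `thread_range` are assumptions for this example, and spreading the remainder over the first threads is one possible choice; the book's figure may divide the array differently.

```c
#include <stddef.h>

/* Half-open index range [start, end) owned by one thread.
 * (Hypothetical type for illustration.) */
typedef struct {
    size_t start;
    size_t end;
} range_t;

/* Give thread `id` (0-based) a block of consecutive indices out of
 * `length` elements shared by `t` threads. When length is not a
 * multiple of t, the first (length % t) threads get one extra element. */
range_t thread_range(size_t length, size_t t, size_t id)
{
    size_t per = length / t;
    size_t rem = length % t;
    range_t r;
    r.start = id * per + (id < rem ? id : rem);
    r.end = r.start + per + (id < rem ? 1 : 0);
    return r;
}
```

For example, 16 elements over 4 threads gives thread 1 the indices 4 through 7, and 10 elements over 4 threads gives block sizes 3, 3, 2, 2.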
Figure 1.7 The first try at a Count 3s solution using threads.
Figure 1.7 The first try at a Count 3s solution using threads. (cont.)
Figure 1.8 One of several possible interleavings of references to the unprotected variable count, illustrating a race condition.
Figure 1.9 The second try at a Count 3s solution showing the count3s_thread() with mutex protection for the count variable.
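The figure itself is not reproduced here. A minimal sketch of what a mutex-protected `count3s_thread()` might look like with POSIX threads is given below; the globals, `NUM_THREADS`, and the driver `count3s()` are assumptions for this example rather than the book's exact code.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4               /* assumed thread count */

static const int *array;            /* shared input */
static size_t length;
static long count;                  /* shared result */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *count3s_thread(void *arg)
{
    size_t id = (size_t)(uintptr_t)arg;
    size_t per = length / NUM_THREADS;
    size_t start = id * per;
    /* The last thread also takes any leftover elements. */
    size_t end = (id == NUM_THREADS - 1) ? length : start + per;

    for (size_t i = start; i < end; i++) {
        if (array[i] == 3) {
            /* Only one thread at a time may update count. */
            pthread_mutex_lock(&count_lock);
            count++;
            pthread_mutex_unlock(&count_lock);
        }
    }
    return NULL;
}

/* Hypothetical driver: start the threads, wait, return the total. */
long count3s(const int *a, size_t n)
{
    pthread_t tid[NUM_THREADS];
    array = a;
    length = n;
    count = 0;
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, count3s_thread, (void *)(uintptr_t)t);
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    return count;
}
```

The lock removes the race on `count`, but every 3 found costs a lock and unlock, so threads can spend much of their time contending for the mutex.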
Figure 1.11 The count3s_thread() for our third Count 3s solution using private_count array elements.
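A sketch of the private-count idea follows, assuming POSIX threads. Each thread tallies into its own `private_count` element and takes the mutex only once, when it adds its tally to the shared total. Apart from `count3s_thread()` and `private_count`, the names here (`NUM_THREADS`, `count3s`, `count_lock`) are assumptions for this example.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4               /* assumed thread count */

static const int *array;
static size_t length;
static long count;
static long private_count[NUM_THREADS];   /* one tally per thread */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *count3s_thread(void *arg)
{
    size_t id = (size_t)(uintptr_t)arg;
    size_t per = length / NUM_THREADS;
    size_t start = id * per;
    size_t end = (id == NUM_THREADS - 1) ? length : start + per;

    /* No lock needed here: each thread writes only its own element. */
    for (size_t i = start; i < end; i++)
        if (array[i] == 3)
            private_count[id]++;

    /* One locked update per thread instead of one per 3 found. */
    pthread_mutex_lock(&count_lock);
    count += private_count[id];
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Hypothetical driver: reset state, run the threads, return the total. */
long count3s(const int *a, size_t n)
{
    pthread_t tid[NUM_THREADS];
    array = a;
    length = n;
    count = 0;
    for (size_t t = 0; t < NUM_THREADS; t++)
        private_count[t] = 0;
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, count3s_thread, (void *)(uintptr_t)t);
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    return count;
}
```

Because the `private_count` elements are adjacent in memory, several of them are likely to share one cache line, so writes by different threads can still interfere with each other through the cache (false sharing).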
Figure 1.12 Performance results for our third Count 3s solution.
Figure 1.14 The count3s_thread() for our fourth solution to the Count 3s computations; the private count elements are padded to force them to be allocated to different cache lines.
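The padding technique can be sketched as below, assuming POSIX threads and a 64-byte cache line. The line size, the struct name `padded_count`, and the driver `count3s()` are assumptions for this example; the real line size depends on the machine.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4               /* assumed thread count */
#define CACHE_LINE 64               /* assumed cache line size in bytes */

/* Each element is padded out to a full cache line, so the value
 * fields of neighbouring elements are CACHE_LINE bytes apart and
 * cannot fall in the same line. */
struct padded_count {
    long value;
    char padding[CACHE_LINE - sizeof(long)];
};

static const int *array;
static size_t length;
static long count;
static struct padded_count private_count[NUM_THREADS];
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *count3s_thread(void *arg)
{
    size_t id = (size_t)(uintptr_t)arg;
    size_t per = length / NUM_THREADS;
    size_t start = id * per;
    size_t end = (id == NUM_THREADS - 1) ? length : start + per;

    /* Each thread writes only its own, separately cached, tally. */
    for (size_t i = start; i < end; i++)
        if (array[i] == 3)
            private_count[id].value++;

    pthread_mutex_lock(&count_lock);
    count += private_count[id].value;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Hypothetical driver: reset state, run the threads, return the total. */
long count3s(const int *a, size_t n)
{
    pthread_t tid[NUM_THREADS];
    array = a;
    length = n;
    count = 0;
    for (size_t t = 0; t < NUM_THREADS; t++)
        private_count[t].value = 0;
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, count3s_thread, (void *)(uintptr_t)t);
    for (size_t t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    return count;
}
```

The padding trades a small amount of memory for the removal of false sharing between the per-thread tallies.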
Figure 1.16 Performance for our fourth solution to the Count 3s problem on an array that does not contain any 3s suggests that memory bandwidth limitations are preventing performance gains for eight processors.