Data Parallel Pattern
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, Oct 22, 2012
Data Parallel Computations
Same operation performed on different data elements simultaneously, i.e., in parallel. Fully synchronous: all processes operate in synchronism.
Particularly convenient because:
• Ease of programming (essentially only one program).
• Can scale easily to larger problem sizes.
• Many numeric and some non-numeric problems can be cast in a data parallel form.
Used in vector supercomputer designs in the 1970s. Versions appear in Intel processors as the SSE extensions. Currently the basis of GPU operations, see later.
Example
To add the same constant to each element of an array:

    for (i = 0; i < n; i++)
        a[i] = a[i] + k;

The statement a[i] = a[i] + k; could be executed simultaneously by multiple processors, each using a different index i (0 <= i < n). Vector supercomputers were designed to operate this way under the single instruction multiple data (SIMD) model.
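As a concrete illustration of the SIMD idea (and the SSE extensions mentioned on the previous slide), the same loop can be written with SSE2 intrinsics so that four 32-bit integers are added per instruction. This is a minimal sketch, not part of the original slides; for brevity it assumes n is a multiple of 4 (a scalar loop would handle any remainder).

    #include <emmintrin.h>                 /* SSE2 intrinsics */

    /* Add constant k to every element of a[0..n-1], four at a time.
       Assumes n is a multiple of 4. */
    void add_k_sse(int *a, int n, int k)
    {
        __m128i vk = _mm_set1_epi32(k);    /* k replicated into 4 lanes */
        for (int i = 0; i < n; i += 4) {
            __m128i va = _mm_loadu_si128((__m128i *)&a[i]); /* load 4 elements */
            va = _mm_add_epi32(va, vk);                     /* 4 adds at once */
            _mm_storeu_si128((__m128i *)&a[i], va);         /* store 4 back */
        }
    }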
Using the forall construct for the data parallel pattern
forall could be used to specify data parallel operations:

    forall (i = 0; i < n; i++)
        a[i] = a[i] + k;

However, forall is more general: it states that the n instances of the body can be executed simultaneously or in any order (not necessarily at the same time). We shall see that a GPU implementation of data parallel patterns does not necessarily allow all instances to execute at the same time. Note that forall does imply synchronism at its end: all instances must complete before execution continues, which will be true in GPUs.
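C has no forall, but an OpenMP parallel for behaves much the same way: iterations may run simultaneously or in any order, and the loop ends with an implicit barrier, matching the end-synchronism just described. A sketch under those assumptions (the function name is illustrative; compile with an OpenMP flag such as -fopenmp, otherwise the pragma is ignored and the loop simply runs sequentially):

    /* Data parallel add: each iteration is an independent instance. */
    void forall_add(int *a, int n, int k)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)    /* any order, possibly simultaneous */
            a[i] = a[i] + k;
        /* Implicit barrier here: all n instances have completed
           before execution continues, as forall requires. */
    }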
Data Parallel Example: Prefix Sum Problem
Given a list of numbers x0, x1, …, xn-1, compute all the partial summations:

    x0 + x1
    x0 + x1 + x2
    x0 + x1 + x2 + x3
    x0 + x1 + x2 + x3 + x4
    …

For example, the prefix sums of 1, 2, 3, 4 are 1, 3, 6, 10. Prefix sum can also be defined with associative operations other than addition. Widely studied, with practical applications in areas such as processor allocation, data compaction, sorting, and polynomial evaluation.
Sequential code (2^j denotes 2 raised to the power j):

    for (j = 0; j < log2(n); j++)        // at each step
        for (i = n - 1; i >= 2^j; i--)   // accumulate sum, downward so each
            x[i] = x[i] + x[i - 2^j];    //  step reads the previous step's values

Parallel code using forall notation:

    for (j = 0; j < log2(n); j++)        // at each step
        forall (i = 0; i < n; i++)       // all instances update together,
            if (i >= 2^j)                //  so old values are read
                x[i] = x[i] + x[i - 2^j];
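The sequential version can be checked directly. A runnable C sketch (the array contents are chosen here purely for illustration), with the step size held in a variable s = 2^j:

    #include <stdio.h>

    int main(void)
    {
        int x[8] = {1, 2, 3, 4, 5, 6, 7, 8};   /* illustrative input */
        int n = 8;

        for (int s = 1; s < n; s *= 2)         /* s = 2^j at step j */
            for (int i = n - 1; i >= s; i--)   /* downward: read old values */
                x[i] = x[i] + x[i - s];

        for (int i = 0; i < n; i++)            /* prints 1 3 6 10 15 21 28 36 */
            printf("%d ", x[i]);
        printf("\n");
        return 0;
    }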
Matrix Multiplication
Easy to make a data parallel version. Change the for's to forall's:

    forall (i = 0; i < n; i++)          // for each row of A
        forall (j = 0; j < n; j++) {    // for each column of B
            c[i][j] = 0;
            for (k = 0; k < n; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }

Here the data parallel definition is extended to multiple sequential operations on data items: each instance of the body is a separate thread, and within each instance the statements execute in sequential order.
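The nested foralls might be realized in C with OpenMP as below; since every (i, j) instance is independent, both outer loops can be shared among threads with a collapse clause. This is a sketch only; the matrix size and element type are illustrative assumptions, and it needs an OpenMP-enabled compile (e.g., -fopenmp).

    #define N 512   /* illustrative size */

    /* Data parallel matrix multiply: each (i,j) body instance is
       independent, so the two outer loops are parallelized together. */
    void matmul(double a[N][N], double b[N][N], double c[N][N])
    {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++)          /* for each row of A */
            for (int j = 0; j < N; j++) {    /* for each column of B */
                c[i][j] = 0.0;
                for (int k = 0; k < N; k++)  /* sequential inner product */
                    c[i][j] += a[i][k] * b[k][j];
            }
    }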
We will explore the data parallel pattern using GPUs for high performance computing, see next.
Questions so far?