Design of parallel algorithms. Linear equations. Jari Porras. Linear equations: $a_{0,0}x_0 + \dots + a_{0,n-1}x_{n-1} = b_0$, ..., $a_{n-1,0}x_0 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$, written $Ax = b$. Usually solved in 2 stages: reduce into an upper triangular system $Ux = y$, then back-substitution for $x_{n-1} \dots x_0$.
Design of parallel algorithms Linear equations Jari Porras 1/26
Linear equations $a_{0,0}x_0 + \dots + a_{0,n-1}x_{n-1} = b_0$, ..., $a_{n-1,0}x_0 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$ • $Ax = b$ • Usually solved in 2 stages • reduce into upper triangular system $Ux = y$ • back-substitution $x_{n-1} \dots x_0$ • Gaussian elimination 2/26
Gaussian elimination 3/26
Gaussian elimination 4/26
Gaussian elimination • Gaussian elimination requires approximately • $n^2/2$ divisions (line 6) • $(n^3/3) - (n^2/2)$ subtractions and multiplications (line 12) • Sequential run time $2n^3/3$ • How is Gaussian elimination performed in parallel? 5/26
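The listing that the slide's line numbers refer to is not reproduced in this transcript. As a stand-in, here is a minimal Python sketch of sequential Gaussian elimination without pivoting, assuming a dense list-of-lists matrix whose pivots are all nonzero. The function name and the two counters are illustrative and not taken from the slides. For the loops as written here, the exact counts are n(n-1)/2 row-normalization divisions and (n-1)n(2n-1)/6 multiply-subtract pairs, which agree with the slide's figures to leading order.

```python
def gaussian_eliminate(A, b):
    """Reduce Ax = b to a unit upper triangular system Ux = y (no pivoting).

    Returns (U, y, divisions, mult_subs). The counters only track the
    operations on matrix elements, not those on the right-hand side.
    """
    n = len(A)
    U = [row[:] for row in A]  # work on copies, leave the inputs untouched
    y = list(b)
    divisions = 0
    mult_subs = 0
    for k in range(n):
        pivot = U[k][k]        # assumed nonzero in this sketch
        for j in range(k + 1, n):
            U[k][j] /= pivot   # division step: normalize the pivot row
            divisions += 1
        y[k] /= pivot
        U[k][k] = 1.0
        for i in range(k + 1, n):
            factor = U[i][k]
            for j in range(k + 1, n):
                U[i][j] -= factor * U[k][j]  # elimination step
                mult_subs += 1
            y[i] -= factor * y[k]
            U[i][k] = 0.0
    return U, y, divisions, mult_subs
```

Calling `gaussian_eliminate([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]], [7.0, 19.0, 49.0])` returns the triangular system together with the two operation counts for n = 3.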
Parallel Gaussian elimination • Row/column striping vs. checkerboarding? • Block vs. cyclic striping? • Number of processors p < n, p = n, p > n • Active processors? • Required steps? 6/26
Analysis • 1st step • kth iteration requires $n - k - 1$ divisions at processor $P_k$ • 2nd step • $(t_s + t_w(n - k - 1)) \log n$ time on hypercube • 3rd step • kth iteration requires $n - k - 1$ multiplications and subtractions at all processors $P_i$ • $T_p = \frac{3}{2} n(n-1) + t_s n \log n + \frac{1}{2} t_w n(n-1) \log n$ 8/26
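The closed form for T_p can be checked by summing the three per-iteration costs over k = 0 .. n-1. The sketch below does that in Python. It assumes that the 3rd step counts as 2(n - k - 1) operations (one multiplication and one subtraction per element), that log means log base 2, and that n is a power of two. The function names and the sample values of t_s and t_w are illustrative only.

```python
import math


def parallel_time(n, t_s, t_w):
    """Sum the per-iteration costs with one matrix row per processor."""
    log_n = math.log2(n)
    total = 0.0
    for k in range(n):
        division = n - k - 1                           # 1st step, at P_k
        broadcast = (t_s + t_w * (n - k - 1)) * log_n  # 2nd step, on a hypercube
        elimination = 2 * (n - k - 1)                  # 3rd step: multiply + subtract
        total += division + broadcast + elimination
    return total


def closed_form(n, t_s, t_w):
    """T_p = 3/2 n(n-1) + t_s n log n + 1/2 t_w n(n-1) log n."""
    log_n = math.log2(n)
    return 1.5 * n * (n - 1) + t_s * n * log_n + 0.5 * t_w * n * (n - 1) * log_n
```

For example, `parallel_time(8, 10, 2)` and `closed_form(8, 10, 2)` should agree, since the sum of (n - k - 1) over all k is n(n-1)/2.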
Analysis • Not cost-optimal since $pT_p = \Theta(n^3 \log n)$ • What is the main reason? • Inefficient parallelization? • What could be done? 9/26
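A small numeric illustration of why the cost is not optimal: the Python sketch below divides the cost p T_p (with p = n) by the sequential run time 2n^3/3. It assumes unit communication parameters t_s = t_w = 1, which are arbitrary placeholder values, and the function name is hypothetical. If the cost were optimal the ratio would stay bounded; here it keeps growing with log n.

```python
import math


def cost_ratio(n, t_s=1.0, t_w=1.0):
    """Ratio of parallel cost p*T_p (p = n) to the sequential time 2n^3/3."""
    log_n = math.log2(n)
    t_p = 1.5 * n * (n - 1) + t_s * n * log_n + 0.5 * t_w * n * (n - 1) * log_n
    cost = n * t_p                 # p = n processors
    sequential = 2 * n ** 3 / 3
    return cost / sequential


# Ratios for n = 16, 64, 256, 1024
ratios = [cost_ratio(2 ** e) for e in (4, 6, 8, 10)]
```

The communication term with the log n factor is what drives the growth in `ratios`.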
Analysis • Pipelined operation • all n iterations are executed in parallel (overlapped) • the last iteration starts in the nth step and is completed in constant time (changes only the bottom right corner element) • $\Theta(n)$ steps • Each step takes $O(n)$ time • Thus parallel run time $O(n^2)$ and cost $\Theta(n^3)$ • Cost-optimal !! 12/26
p < n ? • Block striping • several rows / processor • Does the activity change ? • Block vs. cyclic striping 13/26
Analysis • With block striping • a processor with all of its rows in the active part performs $(n - k - 1)n/p$ multiplications and subtractions • if the pipelined version is used, the number of arithmetic operations ($2(n-k-1)n/p$) is higher than the number of words communicated ($n-k-1$) • computation dominates • parallel run time approximately $n^3/p$ 16/26
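The block vs. cyclic question can be explored with a small load model. The Python sketch below is an illustration under stated assumptions, not the analysis from the slides: p divides n, only the elimination work is counted, and each iteration is as slow as its busiest processor. All function names are hypothetical.

```python
def block_owner(row, n, p):
    """Block striping: processor q holds rows q*(n/p) .. (q+1)*(n/p) - 1."""
    return row // (n // p)


def cyclic_owner(row, n, p):
    """Cyclic striping: row i is stored on processor i mod p."""
    return row % p


def critical_path_work(n, p, owner):
    """Sum over iterations k of the heaviest per-processor elimination load."""
    total = 0
    for k in range(n):
        load = [0] * p
        for i in range(k + 1, n):              # rows still active in iteration k
            load[owner(i, n, p)] += n - k - 1  # multiply-subtract pairs for row i
        total += max(load)                     # the busiest processor sets the pace
    return total
```

Comparing `critical_path_work(8, 4, block_owner)` with `critical_path_work(8, 4, cyclic_owner)` shows the effect: with block striping, processors holding the top rows fall idle early while the last processor keeps its full load, whereas cyclic striping spreads the remaining active rows more evenly.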
Checkerboard partitioning • Use n x n mesh • Same approach as before, but • requires two broadcasts (rowwise and columnwise) • Analyse the cost-optimality • How about the pipelining? 17/26
Pipelined checkerboard 19/26
Pipelined checkerboard 20/26
$p < n^2$ • Map the matrix onto a $\sqrt{p} \times \sqrt{p}$ mesh by using block checkerboard partitioning • Remember the effect of active processors !! • Number of multiplications and subtractions $n^2/p$ and $n/\sqrt{p}$ words communicated • computation dominates ! 21/26
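The effect of active processors under block checkerboard partitioning can be made concrete by counting how many processors still own part of the trailing submatrix in iteration k. The following Python sketch assumes p is a perfect square and that the square root of p divides n. The function name is hypothetical, and "active" here means only "updates at least one element in the elimination step".

```python
import math


def active_processors(n, p, k):
    """Count processors that own an element (i, j) with i > k and j > k."""
    q = math.isqrt(p)            # mesh is q x q
    assert q * q == p and n % q == 0
    block = n // q               # each processor holds a block x block submatrix
    active = set()
    for i in range(k + 1, n):
        for j in range(k + 1, n):
            active.add((i // block, j // block))  # mesh coordinates of the owner
    return len(active)
```

For an 8 x 8 matrix on 16 processors, `[active_processors(8, 16, k) for k in range(8)]` shows the active set shrinking as the elimination proceeds, so a growing share of the mesh sits idle in later iterations.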
Partial pivoting • Basic algorithm fails if any element on the diagonal is zero • Partial pivoting helps • select the row that has the largest element (in absolute value) in the wanted column and exchange rows • What is the effect on the partitioning strategy? • How about pipelining? 24/26
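A minimal sequential sketch of the row-exchange idea in Python, assuming a dense list-of-lists matrix. The function name is illustrative and the code is not taken from the slides. In each iteration it picks the row at or below the diagonal with the largest absolute value in column k and swaps it into the pivot position before eliminating.

```python
def eliminate_with_partial_pivoting(A, b):
    """Reduce Ax = b to unit upper triangular Ux = y using row exchanges."""
    n = len(A)
    U = [row[:] for row in A]  # work on copies
    y = list(b)
    for k in range(n):
        # Pick the row (from k downwards) with the largest |entry| in column k.
        r = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[r][k] == 0:
            raise ValueError("matrix is singular")
        if r != k:
            U[k], U[r] = U[r], U[k]  # exchange rows
            y[k], y[r] = y[r], y[k]
        pivot = U[k][k]
        for j in range(k, n):
            U[k][j] /= pivot         # normalize the pivot row
        y[k] /= pivot
        for i in range(k + 1, n):
            factor = U[i][k]
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]  # eliminate column k from row i
            y[i] -= factor * y[k]
    return U, y
```

With `A = [[0.0, 2.0], [1.0, 1.0]]` the basic algorithm would divide by zero in the first iteration, while this version swaps the two rows first. In a row-striped parallel version the exchange moves a whole row between processors, which is one reason the choice of partitioning matters here.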
Back-substitution • The second stage of solving linear equations • Back-substitution is used to determine vector x • Complexity $n^2$ • use a partitioning scheme that is suitable for Gaussian elimination 25/26
Back substitution 26/26
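The content of this slide did not survive in the transcript. As a hedged illustration of the second stage, the Python sketch below solves an upper triangular system Ux = y from the last unknown to the first. The function name is hypothetical. It divides by the diagonal entry so that it also works when U does not have a unit diagonal. The two nested loops give the n^2 complexity mentioned on the previous slide.

```python
def back_substitute(U, y):
    """Solve Ux = y for x, where U is upper triangular with a nonzero diagonal."""
    n = len(U)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):   # x_{n-1} first, x_0 last
        s = y[k]
        for j in range(k + 1, n):
            s -= U[k][j] * x[j]      # remove the already known unknowns
        x[k] = s / U[k][k]
    return x
```

For example, `back_substitute([[1.0, 0.5, 0.5], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]], [3.5, 5.0, 3.0])` recovers the solution of the corresponding 3 x 3 system.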