Sorting
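Quick Sort Example
S = {6, 5, 9, 2, 4, 3, 5, 1, 7, 5, 8}
[Figure: recursion tree of Quick Sort on S. The first partition around the median 5 gives 2, 4, 3, 1, 5 | 5 | 6, 9, 7, 8, 5; further levels show 2, 1, 3, 4, 5 and 1, 2 | 3 | 4, 5 on the left, and 5, 6, 7, 8, 9 on the right.]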
Quick Sort
Step 1. If n = 1 then return.                                   O(1)
Step 2. Find the median m of the input array A.                 O(n)
Step 3. Use m to partition A into two subsequences B and C.     O(n)
Step 4. Quick sort B.                                           t(n/2)
Step 5. Quick sort C.                                           t(n/2)
The time complexity of Quick Sort is t(n) = O(1) + O(n) + O(n) + 2t(n/2) = cn + 2t(n/2) = O(n log n).
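A minimal Python sketch of Steps 1-5, assuming Step 2 uses the median-of-medians selection rule to find m in O(n) worst-case time (the slides only say that the median is found). The names `select` and `quick_sort` are hypothetical. As a small variation, copies of m are kept in a middle group instead of being placed in B or C; this keeps |B| and |C| at most n/2 even when S contains duplicates, such as the three 5s in the example.

```python
def select(a, k):
    """Return the k-th smallest element of a (k is 0-based) in worst-case O(n)
    time, using the median-of-medians rule (an assumed selection method)."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Median of each group of (at most) five elements.
    medians = []
    for i in range(0, len(a), 5):
        group = sorted(a[i:i + 5])
        medians.append(group[len(group) // 2])
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in a if x < pivot]
    highs = [x for x in a if x > pivot]
    n_equal = len(a) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + n_equal:
        return pivot
    return select(highs, k - len(lows) - n_equal)


def quick_sort(a):
    n = len(a)
    if n <= 1:                                   # Step 1 (also covers n = 0)
        return list(a)
    m = select(a, (n - 1) // 2)                  # Step 2: the median, in O(n)
    # Step 3: partition around m; copies of m form a middle group,
    # so that |B| <= n/2 and |C| <= n/2.
    b = [x for x in a if x < m]
    c = [x for x in a if x > m]
    middle = [m] * (n - len(b) - len(c))
    return quick_sort(b) + middle + quick_sort(c)   # Steps 4 and 5


if __name__ == "__main__":
    print(quick_sort([6, 5, 9, 2, 4, 3, 5, 1, 7, 5, 8]))
```

For the example S, this prints `[1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9]`.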
Sorting Networks
Merging Networks
(1, 1)-merger (comparator): [Figure: a single comparator]
(2, 2)-merger: [Figure]
Running Time
$s(2^i)$: the time required in the ith stage. There are log n stages in all.
$$s(2) = 1 \ (i = 1), \qquad s(2^i) = s(2^{i-1}) + 1 \ (i > 1), \qquad \text{so } s(2^i) = i$$
$$t(n) = s(2^1) + s(2^2) + \dots + s(2^{\log n}) = 1 + 2 + \dots + \log n = O(\log^2 n)$$
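Number of Processors
$q(2^i)$: the number of comparators in one $(2^{i-1}, 2^{i-1})$-merger, the merger used in the ith stage.
$$q(2) = 1 \ (i = 1), \qquad q(2^i) = 2q(2^{i-1}) + 2^{i-1} - 1 \ (i > 1), \qquad \text{so } q(2^i) = (i-1)2^{i-1} + 1$$
$$\begin{aligned}
q(2^i) &= 2q(2^{i-1}) + 2^{i-1} - 1 \\
&= 2^2 q(2^{i-2}) + 2^{i-1} - 2 + 2^{i-1} - 1 \\
&= 2^3 q(2^{i-3}) + 2^{i-1} - 2^2 + 2^{i-1} - 2^1 + 2^{i-1} - 2^0 \\
&\;\;\vdots \\
&= 2^{i-1} q(2^1) + \underbrace{2^{i-1} + \dots + 2^{i-1}}_{i-1} - (1 + 2 + \dots + 2^{i-2}) \\
&= 2^{i-1} + (i-1)2^{i-1} - 2^{i-1} + 1 \\
&= (i-1)2^{i-1} + 1
\end{aligned}$$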
$$p(n) = 2^{(\log n)-1} q(2^1) + 2^{(\log n)-2} q(2^2) + \dots + 2^0 q(2^{\log n}) = O(n \log^2 n)$$
Cost:
$$c(n) = p(n) \cdot t(n) = O(n \log^2 n) \cdot O(\log^2 n) = O(n \log^4 n)$$
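To check these counts, here is a Python sketch that builds the comparators of Batcher's odd-even merge sort network, assuming the mergers on these slides are Batcher's odd-even mergers (their comparator counts q(2) = 1, q(4) = 3, q(8) = 9 agree). The generator follows the common recursive formulation; all function names are hypothetical, and the number of wires n is assumed to be a power of two. Depth is measured by scheduling each comparator as early as its two wires allow.

```python
from itertools import product


def odd_even_merge(lo, hi, r):
    """Yield comparators (a, b) of an odd-even merger on wires lo..hi
    (inclusive) with stride r; the two halves are assumed to be sorted."""
    step = r * 2
    if step < hi - lo:
        yield from odd_even_merge(lo, hi, step)      # merge the even subsequence
        yield from odd_even_merge(lo + r, hi, step)  # merge the odd subsequence
        for i in range(lo + r, hi - r, step):        # final row of comparators
            yield (i, i + r)
    else:
        yield (lo, lo + r)                           # base case: a single comparator


def odd_even_merge_sort(lo, hi):
    """Yield the comparators that sort wires lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from odd_even_merge_sort(lo, mid)
        yield from odd_even_merge_sort(mid + 1, hi)
        yield from odd_even_merge(lo, hi, 1)


def network_size_and_depth(n):
    """Return (number of comparators, number of parallel steps) for n wires."""
    comparators = list(odd_even_merge_sort(0, n - 1))
    level = [0] * n  # last time step at which each wire was used
    for a, b in comparators:
        level[a] = level[b] = max(level[a], level[b]) + 1
    return len(comparators), max(level)


def p_formula(n):
    """p(n) = sum over stages i of 2^(log n - i) * q(2^i), q(2^i) = (i-1)2^(i-1) + 1."""
    k = n.bit_length() - 1  # log n, since n is a power of two
    return sum(2 ** (k - i) * ((i - 1) * 2 ** (i - 1) + 1) for i in range(1, k + 1))


def apply_network(n, xs):
    """Run the n-wire network on xs, one comparator at a time."""
    xs = list(xs)
    for a, b in odd_even_merge_sort(0, n - 1):
        if xs[a] > xs[b]:
            xs[a], xs[b] = xs[b], xs[a]
    return xs


def sorts_all_01_inputs(n):
    """0-1 principle: a comparator network sorts every input iff it sorts all 0-1 inputs."""
    return all(apply_network(n, bits) == sorted(bits) for bits in product((0, 1), repeat=n))
```

For n = 8, `network_size_and_depth(8)` gives `(19, 6)`, matching p(8) = 4·1 + 2·3 + 1·9 = 19 and t(8) = 1 + 2 + 3 = 6.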
Procedure ODD-EVEN TRANSPOSITION (S)
for j = 1 to ⌈n/2⌉ do
  (1) for i = 1, 3, …, 2⌊n/2⌋ - 1 do in parallel
        if x_i > x_{i+1} then x_i ↔ x_{i+1} end if
      end for
  (2) for i = 2, 4, …, 2⌊(n-1)/2⌋ do in parallel
        if x_i > x_{i+1} then x_i ↔ x_{i+1} end if
      end for
end for
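The procedure performs ⌈n/2⌉ iterations of two odd-even steps each (about n odd-even steps) on p(n) = n processors.
Time: t(n) = O(n). Cost: $c(n) = p(n) \cdot t(n) = O(n^2)$.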
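A sequential Python simulation of ODD-EVEN TRANSPOSITION, as a sketch: the list index plays the role of the processor, indices are 0-based (so step (1) compares positions (0, 1), (2, 3), … and step (2) compares (1, 2), (3, 4), …), and the inner loop stands in for the parallel compare-exchanges, which touch disjoint pairs. The function name is hypothetical; it also returns the number of odd-even steps performed.

```python
def odd_even_transposition(xs):
    """Simulate ODD-EVEN TRANSPOSITION; x[i] is the element held by processor i
    (0-based). Returns the sorted list and the number of odd-even steps."""
    x = list(xs)
    n = len(x)
    steps = 0
    for _ in range((n + 1) // 2):          # j = 1 .. ceil(n/2)
        for first in (0, 1):               # step (1), then step (2)
            # The pairs (first, first+1), (first+2, first+3), ... are disjoint,
            # so on the linear array all these compare-exchanges run in parallel.
            for i in range(first, n - 1, 2):
                if x[i] > x[i + 1]:
                    x[i], x[i + 1] = x[i + 1], x[i]
            steps += 1
    return x, steps
```

On a reversed list of 10 elements, `odd_even_transposition(list(range(10, 0, -1)))` returns the sorted list after 10 steps, in line with t(n) = O(n).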
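There are two ways to reduce the cost: (i) reduce the running time, or (ii) reduce the number of processors. Reducing the running time is hopeless, since the lower bound for sorting on a linear array of n processors is Ω(n): an element held by P1 may belong in Pn, and it can move only one processor per step.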
Suppose N processors are available, N < n. Each processor stores n/N data elements.
Stage 1: Each processor sorts its own elements sequentially, in O((n/N) log(n/N)) time.
Stage 2: Odd-even transposition sort, in which each comparison-exchange is replaced with a merge-split. With a sequential merge, each merge-split takes O(n/N) time, and there are ⌈N/2⌉ iterations, so Stage 2 takes ⌈N/2⌉ * O(n/N) = O(n) time.
t(n) = O((n/N) log(n/N)) + O(n)
c(n) = p(n) * t(n) = N * [O((n/N) log(n/N)) + O(n)] = O(n log(n/N) + nN)
The algorithm is cost optimal when N ≤ log n.
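The two-stage scheme can be sketched in Python as follows, assuming for simplicity that N divides n. `sorted` stands in for the sequential sort of Stage 1, and `heapq.merge` performs the O(n/N) sequential merge inside each merge-split; the function names are hypothetical.

```python
from heapq import merge


def merge_split(left, right):
    """Merge two sorted blocks sequentially in O(|left| + |right|) time; the
    smaller half stays on the left processor, the larger half goes right."""
    merged = list(merge(left, right))
    return merged[:len(left)], merged[len(left):]


def block_odd_even_sort(xs, N):
    """Sort xs on N simulated processors, each holding n/N elements.
    Assumes N divides n (a simplification for this sketch)."""
    n = len(xs)
    if N < 1 or n % N:
        raise ValueError("this sketch needs N >= 1 dividing len(xs)")
    size = n // N
    # Stage 1: every processor sorts its own block sequentially.
    blocks = [sorted(xs[p * size:(p + 1) * size]) for p in range(N)]
    # Stage 2: odd-even transposition on blocks, with merge-split
    # in place of compare-exchange.
    for _ in range((N + 1) // 2):
        for first in (0, 1):
            for p in range(first, N - 1, 2):
                blocks[p], blocks[p + 1] = merge_split(blocks[p], blocks[p + 1])
    return [v for block in blocks for v in block]
```

For example, `block_odd_even_sort(list(range(12, 0, -1)), 4)` returns `[1, 2, ..., 12]` after four merge-split steps.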
CRCW Sort
[Figure: example of CRCW SORT on S = (5, 2, 4, 5)]
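Procedure CRCW SORT (S)
Step 1. for i = 1 to n do in parallel                                O(1)
          for j = 1 to n do in parallel
            if (s_i > s_j) or (s_i = s_j and i > j)
              then P(i, j) writes 1 in c_i
              else P(i, j) writes 0 in c_i
            end if
          end for
        end for
Step 2. for i = 1 to n do in parallel                                O(1)
          P(i, 1) stores s_i in position 1 + c_i of S
        end for
All n processors P(i, j) with the same i write into c_i at once; the write conflict is resolved by storing the sum of the values written, so c_i equals the number of elements that must precede s_i.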
$p(n) = n^2$, $t(n) = O(1)$, $c(n) = O(n^2)$
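A sequential Python simulation of CRCW SORT, assuming the write conflicts on c_i are resolved by summing the written values. The generator inside `sum` stands in for the n processors P(i, 1), …, P(i, n) of one row, and `crcw_sort` is a hypothetical name.

```python
def crcw_sort(s):
    """Simulate CRCW SORT. Assumes concurrent writes into the same c[i] are
    combined by summing the written values. Returns the sorted list and c."""
    n = len(s)
    # Step 1: P(i, j) writes 1 into c[i] if s[j] must precede s[i], else 0;
    # summing the n writes gives the number of elements that precede s[i].
    c = [sum(1 if (s[i] > s[j] or (s[i] == s[j] and i > j)) else 0 for j in range(n))
         for i in range(n)]
    # Step 2: P(i, 1) stores s[i] in position 1 + c[i] (index c[i] when 0-based).
    out = [None] * n
    for i in range(n):
        out[c[i]] = s[i]
    return out, c
```

On S = (5, 2, 4, 5) it computes c = (2, 0, 1, 3) and outputs (2, 4, 5, 5); the tie rule i > j keeps the two 5s in their original order, so no two processors write to the same position in Step 2.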
CREW Sort (using CREW MERGE)
Example: S = {2, 8, 5, 10, 15, 1, 12, 6, 14, 3, 11, 7, 9, 4, 13, 16}, N = 4
Step 1. P1: {2, 8, 5, 10}  P2: {15, 1, 12, 6}  P3: {14, 3, 11, 7}  P4: {9, 4, 13, 16}
Step 2. P1: {2, 5, 8, 10}  P2: {1, 6, 12, 15}  P3: {3, 7, 11, 14}  P4: {4, 9, 13, 16}
Step 3. P1, P2: {1, 2, 5, 6, 8, 10, 12, 15}  P3, P4: {3, 4, 7, 9, 11, 13, 14, 16}
        P1, P2, P3, P4: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
Procedure CREW SORT (S)
Step 1. for i = 1 to N do in parallel                         O((n/N) log(n/N))
          Processor P_i
            1.1 reads a distinct subsequence S_i of S of size n/N
            1.2 QUICKSORT(S_i)
            1.3 S(i, 1) ← S_i
            1.4 P(i, 1) ← {P_i}
        end for
Step 2. u ← 1
        v ← N
        while v > 1 do
          for m = 1 to ⌊v/2⌋ do in parallel
            P(u+1, m) ← P(u, 2m-1) ∪ P(u, 2m)
            The processors in the set P(u+1, m) perform CREW MERGE(S(u, 2m-1), S(u, 2m), S(u+1, m))
          end for
          if v is odd then
            P(u+1, ⌈v/2⌉) ← P(u, v)
            S(u+1, ⌈v/2⌉) ← S(u, v)
          end if
          u ← u + 1
          v ← ⌈v/2⌉
        end while
Each iteration of the while loop takes O(n/N + log n) time, and there are ⌈log N⌉ iterations, so Step 2 takes ⌈log N⌉ * O(n/N + log n) time in total.
t(n) = O((n/N) log(n/N)) + O((n/N) + log n) * log N
     = O((n/N) log n - (n/N) log N) + O((n/N) log N + log n log N)
     = O((n/N) log n + log n log N)
c(n) = p(n) * t(n) = N * O((n/N) log n + log n log N) = O(n log n + N log n log N)
The algorithm is cost optimal when N ≤ n/log N.
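The structure of CREW SORT (N sorted runs, then ⌈log N⌉ rounds of pairwise merging, with an odd run carried up unchanged) can be sketched sequentially in Python. This is only a simulation under stated assumptions: `sorted` stands in for QUICKSORT, `heapq.merge` stands in for CREW MERGE, the merges of one round would run in parallel on the PRAM, and the function name is hypothetical.

```python
from heapq import merge


def crew_sort_sketch(s, N):
    """Sequential simulation of CREW SORT's structure with N >= 1 processors.
    Returns the sorted list and the number of merge rounds (iterations of Step 2)."""
    size = -(-len(s) // N)  # ceil(n / N) elements per processor
    # Step 1: P_i sorts its own subsequence S_i (sorted() stands in for QUICKSORT).
    runs = [sorted(s[i * size:(i + 1) * size]) for i in range(N)]
    rounds = 0
    # Step 2: merge runs 2m-1 and 2m for every m (in parallel on the PRAM);
    # if the number of runs is odd, the last run is carried up unchanged.
    while len(runs) > 1:
        merged = [list(merge(runs[m], runs[m + 1])) for m in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:
            merged.append(runs[-1])
        runs = merged
        rounds += 1
    return runs[0], rounds
```

On the 16-element example with N = 4 it returns 1, 2, …, 16 after 2 rounds; with N = 5 it needs ⌈log 5⌉ = 3 rounds.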
Sorting on the EREW Model
Simulating Procedure CREW SORT: replace every concurrent read with MULTIPLE BROADCAST, which multiplies the running time by a factor of O(log N).
t(n) = O((n/N) log n + log n log N) * O(log N) = O([(n/N) + log N] log n log N)
c(n) = O([(n/N) + log N] log n log N) * N = O((n + N log N) log n log N), which is not cost optimal.
Sorting by Conflict-Free Merging
Replace CREW MERGE with EREW MERGE. Each EREW MERGE takes O((n/N) + log n log N) time and is performed log N times, so the total time is
$t(n) = O((n/N)\log(n/N)) + O((n/N) + \log n \log N) \cdot \log N = O([(n/N) + \log^2 N]\log n)$
$c(n) = O((n + N\log^2 N)\log n)$, which is cost optimal when $N \le n/\log^2 N$.
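Parallel Quicksort
Example: S = {5, 9, 12, 16, 18, 2, 10, 13, 17, 4, 7, 18, 18, 11, 3, 17, 20, 19, 14, 8, 5, 17, 1, 11, 15, 10, 6}, n = 27.
The algorithm uses $N = n^{1-x}$ processors; with $x \approx 0.5$, $N = 27^{1-0.5} \approx 5$.
Divide the n data elements into $k = 2^{1/x}$ blocks of $n/2^{1/x}$ elements each (the reason is explained later): $k = 2^{1/0.5} = 4$ and $n/k = 27/4 \approx 7$.
So PARALLEL SELECT finds the 7th, 14th, and 21st smallest elements, denoted m1, m2, and m3, respectively: m1 = 6, m2 = 11, m3 = 17.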
Partitioning S around m1, m2, and m3 gives
S1 = {5, 2, 4, 3, 5, 1, 6}, S2 = {9, 10, 7, 8, 10, 11, 11}, S3 = {12, 16, 13, 14, 15, 17, 17}, S4 = {18, 18, 18, 20, 19, 17}.
S1 and S2 are then sorted recursively in parallel: P1, P2 sort S1 and P3, P4 sort S2. Each has n = 7, so $N = 7^{1-0.5} \approx 2$, and each is again divided into 4 blocks:
P1, P2: S = {5, 2, 4, 3, 5, 1, 6}; m1 = 2, m2 = 4, m3 = 5; S1 = {1, 2}, S2 = {3, 4}, S3 = {5, 5}, S4 = {6}
P3, P4: S = {9, 10, 7, 8, 10, 11, 11}; m1 = 8, m2 = 10, m3 = 11; S1 = {7, 8}, S2 = {9, 10}, S3 = {10, 11}, S4 = {11}
Next, S3 and S4 are sorted recursively in parallel: P1, P2 sort S3 and P3, P4 sort S4 (again $N \approx 2$):
P1, P2: S = {12, 16, 13, 14, 15, 17, 17}; m1 = 13, m2 = 15, m3 = 17; S1 = {12, 13}, S2 = {14, 15}, S3 = {16, 17}, S4 = {17}
P3, P4: S = {18, 18, 18, 20, 19, 17}; m1 = 18, m2 = 18, m3 = 20; S1 = {17, 18}, S2 = {18, 18}, S3 = {19, 20}, S4 = { }
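procedure EREW SORT (S)
if |S| ≤ k then QUICKSORT (S)
else (1) for i = 1 to k-1 do
           PARALLEL SELECT (S, ⌈i|S|/k⌉)   {Obtain m_i}
         end for
     (2) S_1 ← {s ∈ S : s ≤ m_1}
     (3) for i = 2 to k-1 do
           S_i ← {s ∈ S : m_{i-1} ≤ s ≤ m_i}
         end for
     (4) S_k ← {s ∈ S : s ≥ m_{k-1}}
     (5) for i = 1 to k/2 do in parallel
           EREW SORT (S_i)
         end for
     (6) for i = (k/2)+1 to k do in parallel
           EREW SORT (S_i)
         end for
end if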
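A sequential Python sketch of the recursion in EREW SORT with k = 4 (that is, x = 0.5). Assumptions, all hypothetical: a simple quickselect stands in for PARALLEL SELECT, `sorted` stands in for QUICKSORT, ties are broken by original position so that every S_i receives at most ⌈|S|/k⌉ elements, and the calls of steps (5) and (6) run one after another instead of in parallel. The names `quickselect`, `splitters`, `sort_keys`, and `erew_sort` are illustrative only.

```python
def quickselect(keys, r):
    """Return the r-th smallest (1-based) of a list of distinct keys.
    Sequential stand-in for PARALLEL SELECT."""
    pivot = keys[len(keys) // 2]
    lows = [x for x in keys if x < pivot]
    highs = [x for x in keys if x > pivot]
    if r <= len(lows):
        return quickselect(lows, r)
    if r == len(lows) + 1:
        return pivot
    return quickselect(highs, r - len(lows) - 1)


def splitters(keys, k):
    """Step (1): m_i is the ceil(i*|S|/k)-th smallest key, for i = 1, ..., k-1."""
    n = len(keys)
    return [quickselect(keys, -(-i * n // k)) for i in range(1, k)]


def sort_keys(keys, k):
    if len(keys) <= k:
        return sorted(keys)                  # stand-in for QUICKSORT(S)
    bounds = [None] + splitters(keys, k) + [None]
    # Steps (2)-(4): S_1 = {s <= m_1}, S_i = {m_(i-1) < s <= m_i}, S_k = {s > m_(k-1)};
    # keys are distinct (value, position) pairs, so every key lands in exactly one S_i.
    blocks = [
        [x for x in keys
         if (bounds[i] is None or x > bounds[i]) and (bounds[i + 1] is None or x <= bounds[i + 1])]
        for i in range(k)
    ]
    # Steps (5)-(6): the parallel algorithm sorts S_1..S_(k/2) at once, then the rest;
    # here the recursive calls simply run one after another.
    return [x for block in blocks for x in sort_keys(block, k)]


def erew_sort(s, k=4):
    """k plays the role of 2^(1/x); k = 4 corresponds to x = 0.5."""
    keys = [(v, idx) for idx, v in enumerate(s)]   # break ties by position
    return [v for v, _ in sort_keys(keys, k)]
```

On the 27-element example, the top-level splitters are 6, 11, and 17, as on the slides.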
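Why $k = 2^{1/x}$? In step (5) or (6), each of the k/2 subsequences sorted in parallel has about $n/k$ elements and uses $(n/k)^{1-x}$ processors, so together they use $(k/2)(n/k)^{1-x} = \frac{k^x}{2}\, n^{1-x} = n^{1-x}$ processors, because $k^x = (2^{1/x})^x = 2$. This is exactly the $N = n^{1-x}$ processors available.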
Time: $t(n) = O(n^x \log n)$. Cost: $c(n) = p(n) \cdot t(n) = n^{1-x} \cdot O(n^x \log n) = O(n \log n)$, which is cost optimal.