CS 235102 Data Structures, Lecture 7: Sorting, Spring 2012
Why do we need sorting algorithms?
• As an aid in searching
• As a means for matching entries in lists
 • E.g., matching the tax records reported by employees against those reported by employers
• A sorting algorithm is stable if it generates a permutation with the following property:
 • If e_i = e_j and e_i precedes e_j in the input list, then e_i precedes e_j in the sorted list
Two categories
• Internal sort:
 • All data is in main memory
 • Purpose: reduce CPU time
 • E.g., insertion sort, merge sort, heap sort, radix sort
• External sort:
 • The data is in secondary memory
 • Purpose: reduce block access time
 • E.g., merge sort
Insertion sort
• Assume we are given a sequence A[1], A[2], …, A[n]
• Idea:
 • Divide the sequence into two parts: the left part is the sequence sorted so far, the right part is the unsorted part
 • Take one element from the right part and insert it into its correct position in the left part
Insertion sort (Cont’d)
E.g. (sorted left part | unsorted right part):
 44 55 12 42 94 18 6 67
 44 | 55 12 42 94 18 6 67
 44 55 | 12 42 94 18 6 67
 12 44 55 | 42 94 18 6 67
 12 42 44 55 | 94 18 6 67
 …
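Insertion sort (Cont’d)
/* Sorts A[1..n] (1-based as on the slides; A[0] unused).
   Index j marks the partition: A[1..j-1] is already sorted. */
void insertion_sort(int A[], int n)
{
    int i, j, key;
    for (j = 2; j <= n; j++) {
        i = j - 1;
        key = A[j];
        while (i > 0 && key < A[i]) {   /* shift larger elements right */
            A[i + 1] = A[i];
            i = i - 1;
        }
        A[i + 1] = key;                 /* insert key into place */
    }
}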
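Insertion sort (Cont’d)
• Worst-case running time: outer loop O(n), inner loop O(j), so in total $\sum_{j=2}^{n} O(j) = O(n^2)$
• Average-case running time: O(n^2)
• Stable sort (the test key < A[i] is strict, so an element never moves past an equal one)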
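To illustrate stability, here is a minimal sketch in C that insertion-sorts records by key while a tag records each record's input order. The `record` type and `insertion_sort_records` are hypothetical names for this illustration, and indexing is 0-based. The inner loop only shifts records whose key is strictly greater, so two records with equal keys never swap places.

```c
#include <stddef.h>

/* Hypothetical record type: a sort key plus a tag marking input order. */
typedef struct { int key; char tag; } record;

/* Insertion sort on r[0..n-1] by key. The strict comparison
   r[i - 1].key > cur.key stops at an equal key, so the sort is stable. */
void insertion_sort_records(record r[], size_t n)
{
    for (size_t j = 1; j < n; j++) {
        record cur = r[j];              /* next record from the unsorted part */
        size_t i = j;
        while (i > 0 && r[i - 1].key > cur.key) {
            r[i] = r[i - 1];            /* shift a larger record right */
            i--;
        }
        r[i] = cur;                     /* insert into the sorted left part */
    }
}
```

Sorting the records (44,a), (55,b), (12,c), (44,d), (6,e) gives the keys 6, 12, 44, 44, 55. The two 44s stay in their input order, a before d.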
Quick sort
• Idea: divide and conquer
 • Pick an element A[r] at random as the splitting element
 • Divide A[1] … A[n] into two subsequences: the elements A[i] ≤ A[r] and the elements A[j] > A[r] (the two sublists themselves are not sorted)
 • Sort the two sublists recursively
• Questions:
 • How to pick the splitting element? Pick the first element as the splitting element
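Quick sort (Cont’d)
• How to divide the sequence into two sublists? Move i to the right past elements ≤ the splitting element and j to the left past elements > it. When both stop, swap A[i] and A[j]. Once i > j, stop and swap the splitting element with A[j].
• E.g. (splitting element 26):
 26 5 37 1 61 11 59 15 48 19
 26 5 19 1 61 11 59 15 48 37 (swap 37 and 19)
 26 5 19 1 15 11 59 61 48 37 (swap 61 and 15)
 11 5 19 1 15 26 59 61 48 37 (i > j: stop; swap 26 and 11)
 Sublist 1: 11 5 19 1 15; Sublist 2: 59 61 48 37
• Recursively sort Sublist 1 and Sublist 2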
Quick sort (Cont’d)
void quick_sort(int A[], int left, int right)
{
    int splitting, i, j, tmp;
    if (left < right) {
        splitting = A[left];              /* first element splits the list */
        i = left + 1;
        j = right;
        do {
            while (A[i] <= splitting && i < right) i = i + 1;
            while (A[j] > splitting && j > left) j = j - 1;
            if (i < j) { tmp = A[i]; A[i] = A[j]; A[j] = tmp; }
        } while (i < j);
        tmp = A[left]; A[left] = A[j]; A[j] = tmp;  /* splitting element to j */
        quick_sort(A, left, j - 1);
        quick_sort(A, j + 1, right);
    }
}
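Quick sort (Cont’d)
• Running time (time for dividing plus time for the subproblems)
 • If the splitting element always lands in the middle:
 $T(n) \le Cn + 2T(n/2) \le Cn + Cn + 4T(n/4) \le \cdots \le Cn\log_2 n + nT(1) = O(n\log n)$
 • Worst case: O(n^2). E.g., for the sorted list 1, 2, 3, 4, 5, 6, 7 the first element is always the smallest, so the subproblems have sizes n-1, n-2, …, and $T(n) \le Cn + T(n-1)$ gives $O(n^2)$
 • Average case: O(n log n)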
Quick sort (Cont’d)
• Choosing the splitting value:
 • Try to pick the median
 • E.g., median{first, middle, last} (median of three)
• Not stable
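The median{first, middle, last} rule can be sketched in C. The median of the three sampled values is moved into A[left], and then the same first-element partition as the quick sort slide runs unchanged. The function names and the int element type are assumptions for illustration.

```c
/* Swap two ints. */
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Order A[left], A[mid], A[right], then move the median into A[left]
   so that the first-element partition uses the median of three. */
static void median_of_three_to_front(int A[], int left, int right)
{
    int mid = left + (right - left) / 2;
    if (A[mid] < A[left])   swap_int(&A[mid], &A[left]);
    if (A[right] < A[left]) swap_int(&A[right], &A[left]);
    if (A[right] < A[mid])  swap_int(&A[right], &A[mid]);
    swap_int(&A[left], &A[mid]);        /* A[mid] held the median */
}

/* Quick sort of A[left..right] with median-of-three splitting. */
void quick_sort_m3(int A[], int left, int right)
{
    if (left >= right) return;
    median_of_three_to_front(A, left, right);
    int splitting = A[left], i = left + 1, j = right;
    do {
        while (A[i] <= splitting && i < right) i++;
        while (A[j] > splitting && j > left) j--;
        if (i < j) swap_int(&A[i], &A[j]);
    } while (i < j);
    swap_int(&A[left], &A[j]);          /* splitting element to its final place */
    quick_sort_m3(A, left, j - 1);
    quick_sort_m3(A, j + 1, right);
}
```

On an already sorted list the median of three is the true middle element, so the splits are balanced. With the plain first-element rule, the same input produces the O(n^2) worst case.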
Heap sort
• Idea:
 • Store all elements in a heap, a special data structure in which finding and deleting the largest element can be done in O(log n) time
 • Repeatedly extract the largest element from the heap
• Heap sort:
 Build max heap;
 for (i = n; i >= 2; i--) {
     swap(A[1], A[i]);
     /* push the root down to its correct position */
     heapify(1, i-1);
 }
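A runnable sketch of the heap sort steps above in C. It assumes int keys in A[1..n] with A[0] unused, which matches the slide's 1-based indexing. The three-argument `heapify(A, root, n)` is a hypothetical signature standing in for the slide's `heapify(1, i-1)`.

```c
/* Sift A[root] down within A[1..n] so the subtree rooted at root becomes
   a max heap, assuming both child subtrees already are max heaps. */
static void heapify(int A[], int root, int n)
{
    int item = A[root];
    int child = 2 * root;                    /* left child of root */
    while (child <= n) {
        if (child < n && A[child] < A[child + 1])
            child++;                         /* take the larger child */
        if (item >= A[child])
            break;                           /* item belongs at child / 2 */
        A[child / 2] = A[child];             /* move the larger child up */
        child *= 2;
    }
    A[child / 2] = item;
}

/* Heap sort of A[1..n] (1-based; A[0] unused). */
void heap_sort(int A[], int n)
{
    for (int i = n / 2; i >= 1; i--)         /* build max heap, bottom up */
        heapify(A, i, n);
    for (int i = n; i >= 2; i--) {
        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;   /* largest goes to A[i] */
        heapify(A, 1, i - 1);                /* restore the heap on A[1..i-1] */
    }
}
```

Building the heap bottom up costs O(n). Each of the n-1 extractions sifts down at most O(log n) levels, which gives the O(n log n) worst case.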
Heap sort (Cont’d)
• Time complexity:
 • Worst case: O(n log n)
 • Average case: O(n log n)
• Not stable
Merge sort
• Idea:
 • Given two sorted lists, e.g., (3, 6, 9, 18) of length M and (2, 4, 5, 10) of length L, combine them into one sorted list in O(M + L) time
 • Use an algorithm similar to PADD (polynomial addition)
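Like polynomial addition, the merge walks both lists with one index each and always copies the smaller front element. Here is a minimal C sketch for int arrays. `merge_lists` is a hypothetical name, not the lecture's own routine.

```c
#include <stddef.h>

/* Merge sorted a[0..m-1] and b[0..l-1] into out[0..m+l-1]. Each step
   copies one element, so the total work is O(M + L). Ties take from a
   first, which keeps the merge stable. */
void merge_lists(const int a[], size_t m, const int b[], size_t l, int out[])
{
    size_t i = 0, j = 0, k = 0;
    while (i < m && j < l)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < m) out[k++] = a[i++];    /* copy the rest of a */
    while (j < l) out[k++] = b[j++];    /* copy the rest of b */
}
```

Merging (3, 6, 9, 18) with (2, 4, 5, 10) yields 2, 3, 4, 5, 6, 9, 10, 18.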
Merge sort: Merge
• The sorted first part A[left..middle] and the sorted second part A[middle+1..right] are merged back into A[left..right].
Merge sort: Merge example
• Copy the two sorted parts into temporary arrays L = 2 3 7 8 and R = 1 4 5 6. Index i scans L, j scans R, and k marks the next position of A. At each step the smaller of L[i] and R[j] is written to A[k].
 k=0, i=0, j=0: take 1 from R → A: 1
 k=1, i=0, j=1: take 2 from L → A: 1 2
 k=2, i=1, j=1: take 3 from L → A: 1 2 3
 k=3, i=2, j=1: take 4 from R → A: 1 2 3 4
 k=4, i=2, j=2: take 5 from R → A: 1 2 3 4 5
 k=5, i=2, j=3: take 6 from R → A: 1 2 3 4 5 6
 k=6, i=2, j=4: R is exhausted; take 7 from L → A: 1 2 3 4 5 6 7
 k=7, i=3, j=4: take 8 from L → A: 1 2 3 4 5 6 7 8
 k=8, i=4, j=4: done
Iterative merge sort
• E.g.: 26 5 77 1 61 11 59 15 48 19
 Pass 1 (runs of length 2): [5, 26] [1, 77] [11, 61] [15, 59] [19, 48]
 Pass 2 (runs of length 4): [1, 5, 26, 77] [11, 15, 59, 61] [19, 48]
 Pass 3 (runs of length 8): [1, 5, 11, 15, 26, 59, 61, 77] [19, 48]
 Pass 4: [1, 5, 11, 15, 19, 26, 48, 59, 61, 77]
• Time complexity: O(n log n) in both the worst case and the average case
• Stable
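The passes above (runs of length 1, 2, 4, 8, …) can be sketched as a bottom-up merge sort in C. This illustrative version assumes int keys and one temporary array of n elements. The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Merge sorted runs src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1].
   Ties take from the left run, so the sort stays stable. */
static void merge_run(const int src[], int dst[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Bottom-up merge sort of a[0..n-1]. Each pass merges adjacent runs of
   length width into runs of length 2 * width. A pass costs O(n), and there
   are about log2 n passes. Returns 0, or -1 if allocation fails. */
int iterative_merge_sort(int a[], size_t n)
{
    int *tmp = malloc((n > 0 ? n : 1) * sizeof *tmp);
    if (tmp == NULL) return -1;
    int *src = a, *dst = tmp;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_run(src, dst, lo, mid, hi);   /* a lone last run is copied */
        }
        int *t = src; src = dst; dst = t;       /* this pass feeds the next */
    }
    if (src != a) memcpy(a, src, n * sizeof *a); /* result ended in tmp */
    free(tmp);
    return 0;
}
```

The source and destination arrays swap roles after every pass, so each element is copied once per pass rather than twice.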
Recursive merge sort
• E.g.: 26 5 77 1 61 11 59 15 48 19
• Split at middle = (l+u)/2 until every sublist has a single element:
 [26 5 77 1 61] [11 59 15 48 19]
 [26 5 77] [1 61] [11 59 15] [48 19]
 [26 5] [77] [1] [61] [11 59] [15] [48] [19]
• Merge back up:
 [5, 26] [77] [1, 61] [11, 59] [15] [19, 48]
 [5, 26, 77] [1, 61] [11, 15, 59] [19, 48]
 [1, 5, 26, 61, 77] [11, 15, 19, 48, 59]
 [1, 5, 11, 15, 19, 26, 48, 59, 61, 77]
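Recursive merge sort (Cont’d)
void rmerge_sort(int A[], int l, int u)
{
    int middle;
    if (l >= u) return;                /* zero or one element: already sorted */
    middle = (l + u) / 2;
    rmerge_sort(A, l, middle);         /* sort A[l..middle] */
    rmerge_sort(A, middle + 1, u);     /* sort A[middle+1..u] */
    listmerge(A, l, middle + 1, u);    /* merge the two sorted halves */
}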
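Here is a self-contained sketch of the recursive version for int arrays. The merge step works as in the merge example: copy the two sorted halves into temporary arrays L and R, then write them back with indices i, j and k. The names `merge_halves` and `merge_sort_recursive` are hypothetical stand-ins for the slide's `listmerge` and `rmerge_sort`.

```c
#include <stdlib.h>
#include <string.h>

/* Merge sorted A[l..m] and A[m+1..u] (inclusive bounds) through temporary
   arrays L and R. Returns 0, or -1 if allocation fails. */
static int merge_halves(int A[], int l, int m, int u)
{
    int n1 = m - l + 1, n2 = u - m;
    int *L = malloc(n1 * sizeof *L), *R = malloc(n2 * sizeof *R);
    if (L == NULL || R == NULL) { free(L); free(R); return -1; }
    memcpy(L, A + l, n1 * sizeof *L);
    memcpy(R, A + m + 1, n2 * sizeof *R);
    int i = 0, j = 0, k = l;
    while (i < n1 && j < n2) A[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
    while (i < n1) A[k++] = L[i++];     /* R exhausted: copy rest of L */
    while (j < n2) A[k++] = R[j++];     /* L exhausted: copy rest of R */
    free(L); free(R);
    return 0;
}

/* Recursive merge sort of A[l..u]. Returns 0, or -1 on allocation failure. */
int merge_sort_recursive(int A[], int l, int u)
{
    if (l >= u) return 0;               /* zero or one element */
    int middle = l + (u - l) / 2;
    if (merge_sort_recursive(A, l, middle) != 0) return -1;
    if (merge_sort_recursive(A, middle + 1, u) != 0) return -1;
    return merge_halves(A, l, middle, u);
}
```

Calling `merge_halves` on 2 3 7 8 1 4 5 6 with l = 0, m = 3, u = 7 reproduces the eight steps of the merge example.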
Bin sort (bucket sort, radix sort)
• Idea: suppose the elements of the sequence to be sorted come from a set of size m, say {1, 2, …, m}. If m is not too large, we can sort as follows:
 1. Create m buckets
 2. Scan the sequence A[1] … A[n] and put element A[i] into bucket A[i]
 3. Concatenate all buckets to get the sorted list
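Bin sort (Cont’d)
• E.g., values from {1, …, 8} (m = 8):
 • Input: A[1] = 1, A[2] = 7, A[3] = 4, A[4] = 3, A[5] = 6
 • Buckets: B[1] = 1, B[2] empty, B[3] = 3, B[4] = 4, B[5] empty, B[6] = 6, B[7] = 7, B[8] empty
 • Concatenate: 1, 3, 4, 6, 7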
Bin sort (Cont’d)
• Time complexity: Step 1 = O(m), Step 2 = O(n), Step 3 = O(m), so O(n + m) in total. If m = O(n), bin sort takes O(n)
• Stable (each bucket keeps its elements in input order)
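The three steps can be sketched in C with each bucket kept as a linked list of input positions. The arrays `head`/`tail` hold one entry per bucket and `next` holds one per element. This sketch assumes int keys in 1..m, and the function name `bin_sort` is hypothetical. It returns the input positions in sorted order, so stability is visible: equal keys come out in increasing position.

```c
#include <stdlib.h>

/* Bin sort for keys in 1..m. Writes to order[0..n-1] the input positions
   in sorted key order. Appending at each bucket's tail keeps equal keys in
   input order (stable). Returns 0, or -1 on a bad key or no memory. */
int bin_sort(const int key[], int n, int m, int order[])
{
    int *head = malloc((size_t)(m + 1) * sizeof *head);
    int *tail = malloc((size_t)(m + 1) * sizeof *tail);
    int *next = malloc((size_t)(n > 0 ? n : 1) * sizeof *next);
    int rc = (head && tail && next) ? 0 : -1;

    for (int i = 0; rc == 0 && i < n; i++)       /* validate keys first */
        if (key[i] < 1 || key[i] > m) rc = -1;

    if (rc == 0) {
        for (int b = 1; b <= m; b++) head[b] = -1;   /* step 1: m empty buckets */
        for (int i = 0; i < n; i++) {                /* step 2: scan the input */
            int b = key[i];
            next[i] = -1;
            if (head[b] == -1) head[b] = i;
            else next[tail[b]] = i;
            tail[b] = i;
        }
        int k = 0;                                   /* step 3: concatenate */
        for (int b = 1; b <= m; b++)
            for (int i = head[b]; i != -1; i = next[i]) order[k++] = i;
    }
    free(head); free(tail); free(next);
    return rc;
}
```

Step 3 here copies positions out one by one, so it costs O(m + n). Splicing the linked buckets together directly would make that step O(m), as on the slide.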
Bin sort (Cont’d)
• Sorting with two keys:
 • Key1 (most significant, MSD): ♣ < ♦ < ♥ < ♠
 • Key2 (least significant, LSD): 2 < 3 < 4 < … < J < Q < K < A
• MSD first:
 • Bin sort using key1
 • Sort each individual pile using the second key
 • Combine the piles by putting them on top of each other
• LSD first:
 • Bin sort using key2
 • Stack the piles (3’s on top of 2’s, …)
 • Bin sort using key1
 • Combine the piles by putting ♠, ♥, ♦, ♣ on top of each other
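The LSD-first procedure can be sketched in C as two stable passes: first on face value, then on suit. The card encoding is an assumption made for this sketch: suits 0..3 stand for ♣ < ♦ < ♥ < ♠, and faces 2..14 use J = 11, Q = 12, K = 13, A = 14. Each pass is written as a counting pass (count, prefix sums, place), which is an array-based way to perform a stable bin sort.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding: suit 0..3 (clubs < diamonds < hearts < spades),
   face 2..14 (J = 11, Q = 12, K = 13, A = 14). */
typedef struct { int suit; int face; } card;

static int key_of(card c, int use_suit) { return use_suit ? c.suit : c.face; }

/* One stable bin-sort pass on a key in 0..15: count the keys, turn the
   counts into starting positions, then place cards in input order. */
static void bin_pass(card a[], card tmp[], size_t n, int use_suit)
{
    size_t start[17] = {0};
    for (size_t i = 0; i < n; i++) start[key_of(a[i], use_suit) + 1]++;
    for (int b = 1; b <= 16; b++) start[b] += start[b - 1];
    for (size_t i = 0; i < n; i++) tmp[start[key_of(a[i], use_suit)]++] = a[i];
    memcpy(a, tmp, n * sizeof *a);
}

/* LSD first: bin sort on face value (key2), then on suit (key1).
   tmp must have room for n cards. */
void sort_cards_lsd(card a[], card tmp[], size_t n)
{
    bin_pass(a, tmp, n, 0);   /* least significant key first */
    bin_pass(a, tmp, n, 1);   /* the stable suit pass keeps face order */
}
```

The second pass must be stable. Otherwise it could undo the face-value order that the first pass established within each suit.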
Bin sort (Cont’d)
• Sorting numbers with a large value range
• Multiple-key bucket sort: treat each decimal digit as a key
• E.g.: 36 25 9 4 1 64 81 16 49 0
• Sort by LSD (units digit):
 Bin 0: 0
 Bin 1: 1, 81
 Bin 4: 4, 64
 Bin 5: 25
 Bin 6: 36, 16
 Bin 9: 9, 49
 (bins 2, 3, 7, 8 are empty)
 Result: 0, 1, 81, 4, 64, 25, 36, 16, 9, 49
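Bin sort (Cont’d)
• Sort by MSD (tens digit), bin sorting the result of the LSD pass:
 Bin 0: 0, 1, 4, 9
 Bin 1: 16
 Bin 2: 25
 Bin 3: 36
 Bin 4: 49
 Bin 6: 64
 Bin 8: 81
 (bins 5, 7, 9 are empty)
 Result: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
• Time complexity: k passes of bin sort, each O(n + r) with r = 10 bins, so O(k(n + r)) = O(n) when the number of digits k is a constant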
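The two passes generalize to an LSD radix sort over any number of decimal digits. This C sketch assumes non-negative ints and at most 9 digits, so that the divisor stays within int range. Each pass is a stable bin sort on one digit, done as a counting pass. The function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* LSD radix sort of non-negative ints a[0..n-1] in base 10 using `digits`
   passes (1 <= digits <= 9; digits = 2 handles 0..99). Each pass is a
   stable bin sort on one digit. Returns 0, or -1 if memory runs out. */
int radix_sort_lsd(int a[], size_t n, int digits)
{
    int *tmp = malloc((n > 0 ? n : 1) * sizeof *tmp);
    if (tmp == NULL) return -1;
    int divisor = 1;
    for (int d = 0; d < digits; d++, divisor *= 10) {
        size_t start[11] = {0};
        for (size_t i = 0; i < n; i++) start[(a[i] / divisor) % 10 + 1]++;
        for (int b = 1; b <= 10; b++) start[b] += start[b - 1];  /* bin starts */
        for (size_t i = 0; i < n; i++)                  /* place in input order */
            tmp[start[(a[i] / divisor) % 10]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
    return 0;
}
```

With `digits = 1`, the example list comes out in the LSD order 0, 1, 81, 4, 64, 25, 36, 16, 9, 49. With `digits = 2` it is fully sorted.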
Lower Bound on Sorting
• How fast can we get?
 Algorithm       Average case   Worst case    Stable
 Insertion sort  O(n^2)         O(n^2)        yes
 Quick sort      O(n log n)     O(n^2)        no
 Merge sort      O(n log n)     O(n log n)    yes
 Heap sort       O(n log n)     O(n log n)    no
 Bin sort        O(n)           O(n)          yes
 (The bin sort bound assumes the key range m is O(n).)
Properties of the sorting algorithms
• Insertion sort is good for small n and when the list is partially sorted
• Merge sort: O(n log n), but it needs extra storage
• Heap sort: O(n log n), but with a large constant factor
• Quick sort has the best average case, but its worst case is O(n^2)
• Radix sort: the running time depends on the size of the key
Properties of the sorting algorithms (Cont’d)
• The lower bound on sorting is Ω(n log n) if sorting is done by key comparisons
• Why, then, does bin sort run in O(n) time? Because bin sort is not a comparison-based algorithm
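One standard way to see the Ω(n log n) bound is the decision-tree argument, sketched here. Model a comparison sort as a binary tree whose internal nodes are comparisons and whose leaves are output orderings. The tree must distinguish all n! orderings of the input, and a binary tree of height h has at most 2^h leaves:

```latex
2^{h} \ge n! \;\Longrightarrow\;
h \ge \log_2 n! = \sum_{k=1}^{n} \log_2 k
  \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k
  \ge \frac{n}{2}\,\log_2\frac{n}{2} = \Omega(n \log n)
```

The last sum has at least n/2 terms, and each of them is at least log2(n/2). Bin sort escapes this bound because it indexes buckets by key value instead of comparing keys.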
External sort
• The list to be sorted is so large that the whole list cannot be held in main memory (M/M)
• Data is read from and written to the disk one block at a time
• Can the following sorting algorithms be used for external sort?
 • Insertion sort: no. We do not know whether the smallest element is the first element, so no element can be output early
 • Quick sort: no
 • Heap sort: no
External sort (Cont’d)
• Only merge sort can be used
 • Phase 1: segments of the input list are sorted using a good internal sort (each sorted segment is called a run)
 • Phase 2: the runs generated in phase 1 are merged together following the merge-tree pattern
• Why does merge sort work? It only requires the leading records of the two runs being merged to be in main memory
External sort (using merge sort)
• E.g.: main memory (M/M) holds 750 records; block size: 250 records; 4500 records (18 blocks) to be sorted
• Internally sort three blocks at a time, producing Run 1 … Run 6 of 750 records each
• M/M is divided into 3 sections (blocks), each capable of holding 250 records: 2 sections serve as input buffers and 1 section as the output buffer
External sort (using merge sort)
[Figure: Run 1 and Run 2 on disk are read into the two input buffers in M/M, merged into the output buffer, and written back to disk as a new run]
• Merge Run 1 and Run 2:
 • The first blocks of Run 1 and Run 2 are read into the input buffers
 • The merged data is written to the output buffer
 • Output buffer full => write it to disk
 • Input buffer empty => read in the next block of that run
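The buffer rules above can be simulated in C. In this sketch the "disk" is plain arrays, main memory holds two input buffers and one output buffer of B records each, and hypothetical counters tally block reads and writes. All names and the MAX_B limit are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_B 256   /* largest block size (records) this sketch supports */

typedef struct { size_t reads, writes; } io_count;

/* One run on "disk" and its input buffer in main memory. */
typedef struct {
    const int *disk; size_t n, next;   /* run contents; next record to read */
    int buf[MAX_B];  size_t len, pos;  /* buffered block; next record to use */
} run_reader;

/* Input buffer empty => read in the next block of this run, if any.
   Returns 1 if a record is available, 0 if the run is exhausted. */
static int refill(run_reader *r, size_t B, io_count *io)
{
    if (r->pos < r->len) return 1;
    if (r->next >= r->n) return 0;
    r->len = (r->n - r->next < B) ? r->n - r->next : B;
    memcpy(r->buf, r->disk + r->next, r->len * sizeof r->buf[0]);
    r->next += r->len;
    r->pos = 0;
    io->reads++;
    return 1;
}

/* Merge sorted runs a[0..na-1] and b[0..nb-1] into out[] using two input
   buffers and one output buffer of B records each. io must start at zero.
   Returns -1 if B is out of range, else 0. */
int merge_runs_buffered(const int *a, size_t na, const int *b, size_t nb,
                        int *out, size_t B, io_count *io)
{
    if (B == 0 || B > MAX_B) return -1;
    run_reader r1 = { .disk = a, .n = na }, r2 = { .disk = b, .n = nb };
    int obuf[MAX_B];
    size_t olen = 0, written = 0;
    for (;;) {
        int has1 = refill(&r1, B, io), has2 = refill(&r2, B, io);
        if (!has1 && !has2) break;
        /* take the smaller leading record; ties go to run a */
        run_reader *src =
            (has1 && (!has2 || r1.buf[r1.pos] <= r2.buf[r2.pos])) ? &r1 : &r2;
        obuf[olen++] = src->buf[src->pos++];
        if (olen == B) {               /* output buffer full => write block */
            memcpy(out + written, obuf, olen * sizeof obuf[0]);
            written += olen; olen = 0; io->writes++;
        }
    }
    if (olen > 0) {                    /* flush the last, partial block */
        memcpy(out + written, obuf, olen * sizeof obuf[0]);
        io->writes++;
    }
    return 0;
}
```

Two runs of three blocks each cost six block reads and six block writes. Only the leading block of each run is ever held in memory.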
[Figure: Run 1 to Run 6 stored on disk]
Time required
• Let t_io = time to input/output one block, t_is = time to internally sort 750 records, and n·t_m = time to merge n records from the input buffers to the output buffer
• Internal sort of each run:
 • Read in 18 blocks: 18 t_io
 • Internal sort: 6 t_is
 • Write out 18 blocks: 18 t_io
• Merge Run 1 with Run 2, Run 3 with Run 4, and Run 5 with Run 6:
 • Read in 18 blocks: 18 t_io
 • Merge: 4500 t_m
 • Write out 18 blocks: 18 t_io
Time required (Cont’d)
• Merge two runs of 6 blocks each: 24 t_io + 3000 t_m
• Merge the run of 12 blocks with the remaining run of 6 blocks: 36 t_io + 4500 t_m
• Total: 132 t_io + 12000 t_m + 6 t_is
 • 132 t_io: I/O time
 • 12000 t_m + 6 t_is: CPU time
 • If I/O and CPU processing overlap, the total time is about 132 t_io
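The totals can be checked by adding up the four phases. Each merge pass reads and writes every block of the runs it merges.

```latex
\begin{aligned}
\text{blocks moved: } & \underbrace{(18+18)}_{\text{create runs}}
  + \underbrace{(18+18)}_{\text{pass 1}}
  + \underbrace{(12+12)}_{\text{pass 2}}
  + \underbrace{(18+18)}_{\text{pass 3}} = 132 \\
\text{records merged: } & 4500 + 3000 + 4500 = 12000
\end{aligned}
```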
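Optimal merging of runs (weighted external path length)
[Figure: two merge trees, (A) and (B), for runs of sizes 2, 4, 5 and 15. Internal nodes are merges and external nodes are runs, e.g., a run of size 2]
• Runs may be of different sizes
• Different merge sequences may result in different costs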
Optimal merging of runs (Cont’d)
• Records in Run 1 take part in 3 merges in (A) but only 2 merges in (B)
• In merge tree A:
 • Cost = (2+4) + (2+4+5) + (2+4+5+15) = 2*3 + 4*3 + 5*2 + 15*1 = 43
• In merge tree B:
 • Cost = 2*2 + 4*2 + 5*2 + 15*2 = 52
Weighted external path length
1. Sort the weights of the leaf nodes
2. Take the two smallest weights and combine them into a tree
3. Update the weight (the new tree's weight is the sum of the two)
4. Repeat steps 2 and 3 until only one tree is left
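Weighted external path length (Cont’d)
• E.g., weights 2, 4, 5, 15: combine 2 and 4 into 6, then 5 and 6 into 11, then 11 and 15 into 26
• Weighted external path length = Σ (weight of each leaf × its path length) = 2*3 + 4*3 + 5*2 + 15*1 = 43
• The total merge time is the weighted external path length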
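The greedy steps can be sketched in C. The function repeatedly combines the two smallest weights and adds up the cost of each merge. That sum equals the weighted external path length of the resulting tree, because each leaf's weight is counted once for every merge above it. The name `optimal_merge_cost` is hypothetical. The sketch selects the minima by linear scans, giving O(n^2) time; a min-heap would reduce this to O(n log n).

```c
#include <stddef.h>

/* Greedy optimal merge pattern on run sizes w[0..n-1] (w is overwritten).
   Returns the total merge cost, i.e. the weighted external path length. */
long optimal_merge_cost(long w[], size_t n)
{
    long total = 0;
    while (n > 1) {
        /* find the indices a and b of the two smallest weights */
        size_t a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (size_t i = 2; i < n; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) b = i;
        }
        long merged = w[a] + w[b];
        total += merged;          /* cost of this merge */
        w[a] = merged;            /* the combined tree takes slot a */
        w[b] = w[n - 1];          /* the last slot fills the hole at b */
        n--;
    }
    return total;
}
```

For the run sizes 2, 4, 5 and 15 the merges cost 6, 11 and 26, for a total of 43. This matches merge tree A and beats the 52 of merge tree B.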