
Sorting




  1. Sorting Sung Yong Shin TC Lab. CS Dept., KAIST

  2. Contents • 1. Introduction • 2. Insertion Sort • 3. Quick Sort • 4. Merge Sort • 5. Heap Sort • 6. Shell Sort • 7. Radix Sort (topics 2-7: internal sorting) • 8. External Sorting (external sorting) • Reading Assignment: pp. 149-222, Baase

  3. 1. Introduction • 25 - 50 % of computing time !!! • Reporting, updating, and inquiring all involve sorting and searching • Rich results • Internal Sorting • The file to be sorted is small enough that the entire sort can be carried out in main memory. (lower time complexity → better) • External Sorting • The file is too large for main memory, so records must move between main memory and external storage. (fewer I/O operations → better)

  4. • SORT: Given a set of n real numbers, rearrange them in increasing order. • More generally: given a file of n records (R_1, R_2, R_3, …, R_n) with keys (k_1, k_2, …, k_n), find a permutation σ of {1, 2, …, n} such that k_σ(1) ≤ k_σ(2) ≤ … ≤ k_σ(n). • note: keys are not necessarily real numbers; any totally ordered key type will do.

  5. • Lower Bound (worst case): any comparison sort corresponds to a decision tree with n! leaves, so T_SORT(n) = Ω(log₂ n!) = Ω(n log₂ n). • Stability: a sorting method is stable if equal keys remain in the same relative order in the sorted list as they were in the original list. • In place: an algorithm is said to be in place if the amount of extra space it uses is constant with respect to the input size. • Time complexity: worst case and average case.
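
The Ω(n log₂ n) bound follows from the decision-tree argument; here is a short worked inequality in LaTeX showing why log₂ n! = Ω(n log₂ n) (a standard step, not spelled out on the slide):

```latex
% Any comparison sort corresponds to a decision tree with at least n! leaves,
% so its height (worst-case number of comparisons) is at least log2(n!).
\log_2 n! \;=\; \sum_{i=1}^{n} \log_2 i
          \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
          \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
          \;=\; \Omega(n \log_2 n).
```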

  6. 2. Insertion Sort

  7. Algorithm (Insertion Sort)
  procedure InsertionSort(var L : array; n : integer);
  var x : Key; xindex, j : Index;
  begin
    for xindex := 2 to n do
      x := L(xindex);
      j := xindex - 1;
      while j > 0 and L(j) > x do
        L(j+1) := L(j);
        j := j - 1;
      end {while}
      L(j+1) := x;
    end {for}
  end
  • Correctness Proof: Exercise. Hint: loop invariant (induction on xindex)
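
For concreteness, here is a C rendering of the slide's pseudocode (a sketch; arrays are 0-indexed here, unlike the 1-indexed slide version, and the function and variable names are mine):

```c
#include <stdio.h>

/* Insertion sort: insert L[i] into the already-sorted prefix L[0..i-1]. */
void insertion_sort(double L[], int n) {
    for (int i = 1; i < n; i++) {
        double x = L[i];
        int j = i - 1;
        while (j >= 0 && L[j] > x) {   /* shift larger keys one slot right */
            L[j + 1] = L[j];
            j--;
        }
        L[j + 1] = x;                  /* drop x into its position in the prefix */
    }
}

int main(void) {
    double L[] = {2.2, 5.3, 4.2, 6.6, 1.9, 3.8};   /* example list from slide 13 */
    int n = (int)(sizeof L / sizeof L[0]);
    insertion_sort(L, n);
    for (int i = 0; i < n; i++) printf("%.1f ", L[i]);
    printf("\n");
    return 0;
}
```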

  8. Recall T_SORT(n) = Ω(n log n). What is T(n) for insertion sort in the worst case? • xindex = 2 → 1 comparison • xindex = 3 → 2 comparisons • xindex = 4 → 3 comparisons • … • xindex = i → i - 1 comparisons • … • xindex = n → n - 1 comparisons • Total # of comparisons = 1 + 2 + … + (n - 1) = n(n-1)/2, so T(n) = O(n²). Far from optimal !!! However, ……

  9. Average Behavior • Assumption: the ith element (1 ≤ i ≤ n) is equally likely to be placed at each of the i possible positions, and keys are distinct. • Observation: P(the ith element ends up in the jth position) = 1/i, for j = 1, 2, …, i • A_i(n) = the average # of comparisons to insert the ith element • A(n) = the sum of the average numbers of comparisons over all i

  10. • A_i(n) = 0 if i = 1, and for i ≥ 2: A_i(n) = (1/i)[ 1 + 2 + … + (i-1) + (i-1) ] = (i+1)/2 - 1/i • A(n) = Σ_{i=2..n} A_i(n) ≈ n²/4 = O(n²) (see page 26)
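
Filling in the sums from slides 9-10 (the closed form below is my reconstruction of a standard computation, consistent with the Θ(n²) conclusion):

```latex
% Inserting the i-th key into a sorted prefix of length i-1: landing in the
% last position costs 1 comparison, the next costs 2, ..., and the two
% front-most positions both cost i-1 (the while-loop stops when j reaches 0).
A_i(n) \;=\; \frac{1}{i}\Bigl(\sum_{k=1}^{i-1} k \;+\; (i-1)\Bigr)
       \;=\; \frac{i+1}{2} - \frac{1}{i}, \qquad i \ge 2.

A(n) \;=\; \sum_{i=2}^{n} A_i(n)
     \;=\; \frac{n^2 + 3n - 4}{4} \;-\; \sum_{i=2}^{n}\frac{1}{i}
     \;\approx\; \frac{n^2}{4} \;=\; \Theta(n^2).
```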

  11. Observation • Assumption: (1) compare adjacent keys; (2) depending on the result, move the current pair of compared keys locally (positions i-1, i, i+1). • Which sorting algorithms fit these two assumptions? • Insertion Sort • Bubble Sort (see the sketch below) • ………… • What is the lower bound on time complexity under these assumptions?
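
As a second example of a sort that only compares adjacent keys and moves them locally, here is a minimal bubble sort sketch in C (bubble sort is only named on the slide; this particular code is mine):

```c
/* Bubble sort: repeatedly compare adjacent keys and swap them locally
 * when they are out of order. Every swap removes exactly one inversion. */
void bubble_sort(double L[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        for (int i = 0; i < n - 1 - pass; i++) {
            if (L[i] > L[i + 1]) {        /* adjacent comparison */
                double tmp = L[i];        /* local move (swap) */
                L[i] = L[i + 1];
                L[i + 1] = tmp;
            }
        }
    }
}
```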

  12. • Given {x_1, x_2, …, x_n}, define a permutation π : {1, 2, …, n} → {1, 2, …, n} such that π(i) < π(j) ⇔ x_i < x_j. • π(i) means that the ith element is placed at the π(i)th position. • That is, π(i), 1 ≤ i ≤ n, is the final position of x_i when the list is sorted.

  13. Now, • π : {1, 2, …, n} → {1, 2, …, n}, where π is defined as • π(i) = j, if x_i is the jth smallest one • Example: for x_1 x_2 x_3 x_4 x_5 x_6 • L = ( 2.2, 5.3, 4.2, 6.6, 1.9, 3.8 ) • ( π(1), π(2), π(3), π(4), π(5), π(6) ) = ( 2, 5, 4, 6, 1, 3 )
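
A small, hypothetical helper in C that computes π for the slide's example list, i.e. the rank of each element (O(n²) pairwise comparison is enough for illustration; keys are assumed distinct):

```c
#include <stdio.h>

/* pi[i] = j means x_i is the j-th smallest key (1-based ranks, as on the slide). */
void rank_permutation(const double x[], int pi[], int n) {
    for (int i = 0; i < n; i++) {
        int rank = 1;
        for (int j = 0; j < n; j++)
            if (x[j] < x[i]) rank++;
        pi[i] = rank;
    }
}

int main(void) {
    double L[] = {2.2, 5.3, 4.2, 6.6, 1.9, 3.8};
    int pi[6];
    rank_permutation(L, pi, 6);
    for (int i = 0; i < 6; i++) printf("%d ", pi[i]);   /* prints: 2 5 4 6 1 3 */
    printf("\n");
    return 0;
}
```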

  14. Def’n: An inversion of the permutation π is an ordered pair (π(i), π(j)) such that i < j and π(i) > π(j). • (π(i), π(j)) is an inversion ⇔ the ith and jth keys are left out of order (LOO). • How many inversions in L? With (π(1), …, π(6)) = (2, 5, 4, 6, 1, 3) for L = (2.2, 5.3, 4.2, 6.6, 1.9, 3.8): (2, -): 1, (5, -): 3, (4, -): 2, (6, -): 2, (1, -): 0, for 8 inversions (LOO’s) in total (a counting sketch follows below). • Given |L| = n, how many possible inversions in the worst case? n(n-1)/2 inversions !!! why?
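
A brute-force inversion counter (a hypothetical helper, not from the slides) that confirms the example's 8 inversions:

```c
#include <stdio.h>

/* Count ordered pairs (i, j) with i < j and x[i] > x[j], i.e. inversions. */
long count_inversions(const double x[], int n) {
    long inv = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (x[i] > x[j]) inv++;
    return inv;
}

int main(void) {
    double L[] = {2.2, 5.3, 4.2, 6.6, 1.9, 3.8};
    printf("%ld\n", count_inversions(L, 6));   /* prints 8 */
    return 0;
}
```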

  15. ((i), (j)) What does an inversion imply in sorting? • xiis required to follow xjin the sorted list !!! • How can you do this? • detection : comparisons (“Local”) • resolving : “Local” moves • How many inversions in the worst case? •  (n2) in worst case •  θ(n2) in the worst case if only local comparisons / moves are allowed.

  16. How about the average case? • Well, … • we need to compute the average # of inversions !!! • π : {1, 2, …, n} → {1, 2, …, n}, with π(i) < π(j) ⇔ x_i < x_j • There are n! such permutations !!! • Assumption: every permutation is equally likely, P(π = π_i) = 1/n!, i = 1, 2, …, n! • Define the transpose (reverse) of π: (π(1), π(2), …, π(n))ᵀ = (π(n), π(n-1), …, π(1)).

  17. For each pair (i, j) with i < j, (π(i), π(j)) is an inversion in exactly one of • (π(1), π(2), …, π(n)) or (π(n), π(n-1), …, π(1)) • How many (π, πᵀ) pairs? n!/2. why? • Example, n = 3: π_1 = (1, 2, 3), π_2 = (1, 3, 2), π_3 = (2, 1, 3), π_4 = (2, 3, 1) = π_2ᵀ, π_5 = (3, 1, 2) = π_3ᵀ, π_6 = (3, 2, 1) = π_1ᵀ, giving 3!/2 = 3 pairs: (π_1, π_6), (π_2, π_4), (π_3, π_5). • How many (i, j) pairs? n(n-1)/2. • How many inversions in total, summed over all n! permutations? (n!/2) · n(n-1)/2.

  18. Average # of inversions = (1/n!) · (n!/2) · n(n-1)/2 = n(n-1)/4 • Θ(n²) inversions in the average case • ⇒ Θ(n²) time in the average case if only local comparisons / moves are allowed.
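
Putting slides 16-18 together, the omitted algebra of the counting argument looks like this (my reconstruction of a standard derivation):

```latex
% Pair each permutation \pi with its reverse \pi^T. For every index pair (i,j),
% i<j, exactly one of \pi, \pi^T has (\pi(i),\pi(j)) as an inversion, so each of
% the n!/2 pairs contributes \binom{n}{2} inversions.
\text{total inversions over all } n! \text{ permutations}
   \;=\; \frac{n!}{2}\binom{n}{2},
\qquad
\text{average \# of inversions}
   \;=\; \frac{1}{n!}\cdot\frac{n!}{2}\cdot\frac{n(n-1)}{2}
   \;=\; \frac{n(n-1)}{4} \;=\; \Theta(n^2).
```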

  19. Sort: In Place? Stable? • Insertion: in place? yes; stable? yes • Selection: in place? yes; stable? no (why?) • Bubble: in place? yes; stable? yes • One comparison is needed to resolve each inversion !!! • Insertion sort takes O(# of inversions) + n time. Why? What if the input data is almost sorted? • easy to implement • good for small input

  20. Stability in Sorting • Input (key, record id): (2, 1) (2, 2) (2, 3) (2, 4) (1, 5) (1, 6) (1, 7) • Unstable result: (1, 6) (1, 5) (1, 7) (2, 3) (2, 2) (2, 4) (2, 1) • Stable result: (1, 5) (1, 6) (1, 7) (2, 1) (2, 2) (2, 3) (2, 4) • The equal keys preserve their relative order !!!

  21. Why is stability related to local swapping? • When comparing adjacent keys, if they are tied, stop there (do not swap). • (to preserve the relative order of equal keys)

  22. 3. QuickSort • Basic Idea: (1) Place x_i in its final position j, so that in ( x_1, x_2, …, x_j, …, x_n ): x_k < x_i for k = 1, 2, …, j-1, x_j = x_i, and x_k > x_i for k = j+1, j+2, …, n. (2) Divide and Conquer !!! T(n) = T(j-1) + T(n-j) + O(n) !!! • (The slide traces one partition step on the list 26, 5, 37, 1, 61, 11, 59, 15, 48, 19 with pivot 26; the column-by-column trace is omitted here. A code sketch follows below.)
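
A minimal quicksort sketch in C. It uses a simple Lomuto-style partition with the first element as pivot; this illustrates the basic idea but is not necessarily the exact partition scheme traced on the slide:

```c
/* Partition L[lo..hi] around the pivot L[lo]; return the pivot's final index. */
static int partition(double L[], int lo, int hi) {
    double pivot = L[lo];
    int j = lo;                        /* L[lo+1..j] holds keys < pivot */
    for (int i = lo + 1; i <= hi; i++) {
        if (L[i] < pivot) {
            j++;
            double tmp = L[i]; L[i] = L[j]; L[j] = tmp;
        }
    }
    /* Put the pivot into its final position j. */
    double tmp = L[lo]; L[lo] = L[j]; L[j] = tmp;
    return j;
}

/* QuickSort: place the pivot in its final position, then recurse on both sides. */
void quicksort(double L[], int lo, int hi) {
    if (lo < hi) {
        int j = partition(L, lo, hi);
        quicksort(L, lo, j - 1);
        quicksort(L, j + 1, hi);
    }
}
```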

  23. Alternative partition method (Textbook) • Scan the list while maintaining two boundaries o and t: the keys before o are < x (the pivot), the keys between o and t are > x, and y is the next unexamined key. • Case 1: y ≥ x. Advance t, so y joins the "> x" region. • Case 2: y < x. Swap y with the first key of the "> x" region and advance both boundaries, so y joins the "< x" region. • (pointer diagrams omitted)

  24. Time Complexity, Worst case (when? when the pivot is always the smallest or largest key, so the split is into subproblems of sizes 0 and n-1) • T(n) = T(n-1) + c(n-1), c > 0, with T(0) = 0 • Unrolling: T(n) = c[(n-1) + (n-2) + … + 1] = c · n(n-1)/2 = Θ(n²)

  25. Average Case • If the pivot turns out to be the ith smallest key (each i equally likely), the two subproblems cost: • i = 1: A(0) + A(n-1) • i = 2: A(1) + A(n-2) • …… • i = n-1: A(n-2) + A(1) • i = n: A(n-1) + A(0) • with A(0) = A(1) = 0. why?
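
Averaging over the n equally likely pivot ranks gives the standard recurrence; the induction bound below fills in, as a sketch, the algebra the slides leave as "why?":

```latex
% Partitioning costs cn; each pivot rank i is equally likely (probability 1/n).
A(n) \;=\; cn + \frac{1}{n}\sum_{i=1}^{n}\bigl(A(i-1) + A(n-i)\bigr)
     \;=\; cn + \frac{2}{n}\sum_{i=0}^{n-1} A(i),
\qquad A(0) = A(1) = 0.

% Induction: assume A(i) \le c' i \ln i for 2 \le i \le k, with c' \ge 2c. Then
A(k+1) \;\le\; c(k+1) + \frac{2c'}{k+1}\sum_{i=2}^{k} i\ln i
       \;\le\; c(k+1) + \frac{2c'}{k+1}\int_{2}^{k+1} x\ln x\,dx
       \;\le\; c(k+1) + c'(k+1)\ln(k+1) - \frac{c'(k+1)}{2}
       \;\le\; c'(k+1)\ln(k+1),
% so A(n) = O(n \ln n) = O(n \log n).
```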

  26. Good Performance !!! • (Practically) • In place? • no!!! why? • Stable? • no!!! why?

  27. Theorem: A(n) = O(n logₑ n) [Proof] From the previous lecture, by induction on n. • n = 1: A(1) = 0. why? • n logₑ n = 1 · logₑ 1 = 0, so the bound holds for n = 1. • Suppose that A(i) ≤ c · i · logₑ i for all i up to some k.

  28. n = k + 1: ⇒ A(k+1) ≤ c(k+1) logₑ(k+1). why?

  29. Alternative Proof • B(n) = O(logₑ n) • ⇒ A(n) = O(n logₑ n)

  30. Comments on QuickSort • After partitioning, L looks like ( <x | x | >x ) with the pivot x at position i of 1, 2, …, n, so T(n) = T(i-1) + T(n-i) + cn: very sensitive to i, and hence to the choice of the pivot x. • (i) Choosing x: pick a random element, or the median of { L(1), L(⌊(n+1)/2⌋), L(n) } (median of three), …………

  31. • (ii) Use QuickSort while n > k₀; switch to another nonrecursive sort for small subproblems (n ≤ k₀) !!! • (iii) Manipulate the recursion stack explicitly. why? • (iv) Put the larger subproblem in the stack and work on the smaller one first !!! why? • (figure: recursion tree with subproblems of sizes i-1 and n-i, cut off at size k₀, omitted; a combined sketch follows below)
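
A C sketch combining points (ii)-(iv). The cutoff K0 and the use of insertion sort for the small pieces are illustrative choices, not values given in the slides; recursing only on the smaller side has the same depth-limiting effect as explicitly stacking the larger subproblem:

```c
#define K0 16   /* cutoff below which the nonrecursive sort takes over (illustrative) */

/* Lomuto-style partition around L[lo]; returns the pivot's final index. */
static int partition(double L[], int lo, int hi) {
    double pivot = L[lo], tmp;
    int j = lo;
    for (int i = lo + 1; i <= hi; i++)
        if (L[i] < pivot) { j++; tmp = L[i]; L[i] = L[j]; L[j] = tmp; }
    tmp = L[lo]; L[lo] = L[j]; L[j] = tmp;
    return j;
}

/* Insertion sort on the subarray L[lo..hi] (good for small inputs). */
static void insertion_sort_range(double L[], int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
        double x = L[i];
        int j = i - 1;
        while (j >= lo && L[j] > x) { L[j + 1] = L[j]; j--; }
        L[j + 1] = x;
    }
}

/* (ii) cut off to insertion sort for small pieces; (iii)/(iv) recurse only on
 * the smaller subproblem and loop on the larger one, keeping the depth O(log n). */
void quicksort_practical(double L[], int lo, int hi) {
    while (hi - lo + 1 > K0) {
        int j = partition(L, lo, hi);
        if (j - lo < hi - j) { quicksort_practical(L, lo, j - 1); lo = j + 1; }
        else                 { quicksort_practical(L, j + 1, hi); hi = j - 1; }
    }
    insertion_sort_range(L, lo, hi);
}
```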

  32. 4. Merge Sort • Basic Idea: “Divide and Conquer” • Divide: • [ 1 11 5 21 3 15 12 17 ] • [ [ 1 11 5 21 ] [ 3 15 12 17 ] ] • [ [ [ 1 11 ] [ 5 21 ] ] [ [ 3 15 ] [ 12 17 ] ] ] • [ [ [ [1] [11] ] [ [5] [21] ] ] [ [ [3] [15] ] [ [12] [17] ] ] ] • Merge: • [ [ [ 1 11 ] [ 5 21 ] ] [ [ 3 15 ] [ 12 17 ] ] ] • [ [ 1 5 11 21 ] [ 3 12 15 17 ] ] • [ 1 3 5 11 12 15 17 21 ]
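
A compact merge sort sketch in C, mirroring the slide's divide-then-merge structure (the scratch buffer makes explicit why mergesort is not in place unless the copying is handled cleverly):

```c
#include <stdlib.h>
#include <string.h>

/* Merge the two sorted runs L[lo..mid] and L[mid+1..hi] using buffer tmp. */
static void merge(double L[], double tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (L[i] <= L[j]) ? L[i++] : L[j++];   /* <= keeps the sort stable */
    while (i <= mid) tmp[k++] = L[i++];
    while (j <= hi)  tmp[k++] = L[j++];
    memcpy(L + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof(double));
}

static void msort(double L[], double tmp[], int lo, int hi) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    msort(L, tmp, lo, mid);        /* divide */
    msort(L, tmp, mid + 1, hi);
    merge(L, tmp, lo, mid, hi);    /* merge the two sorted halves */
}

void merge_sort(double L[], int n) {
    double *tmp = malloc((size_t)n * sizeof(double));
    if (!tmp) return;              /* allocation failed; leave L unchanged */
    msort(L, tmp, 0, n - 1);
    free(tmp);
}
```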

  33. • T(n) = 2T(n/2) + cn, where cn is the time required for dividing and merging • ⇒ T(n) = O(n log n) ⇒ optimal !!! • How about A(n)? • A(n) = O(n log n) • why? • Is the mergesort optimal in the average case? • well, …

  34. Merging • Merging (1, 3, 5, 7, 9, 11) with (2, 4, 6, 8, 10) gives (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11). • How many comparisons? 10 comparisons !!! • n + m - 1 comparisons in general (for lists of lengths n and m, in the worst case) • Theorem: Any algorithm for merging two sorted lists, each containing n entries, does at least 2n - 1 comparisons in the worst case. • [Proof] Take (a_1, a_2, …, a_n) and (b_1, b_2, …, b_n) with a_i < b_i < a_{i+1}, i = 1, 2, …, n-1, so the merged list is (a_1, b_1, a_2, b_2, …, a_n, b_n). • Claim: b_i must be compared with both a_i and a_{i+1} ⇒ 2n - 1 comparisons. Why?

  35. Suppose that b_i is not compared with a_i. Then the algorithm cannot distinguish the two orderings • a_1 < b_1 < … < a_{i-1} < b_{i-1} < a_i < b_i < a_{i+1} < … < a_n < b_n and • a_1 < b_1 < … < a_{i-1} < b_{i-1} < b_i < a_i < a_{i+1} < … < a_n < b_n, so it would produce the same result for both, and one of them would be wrong. # • Similarly, b_i needs to be compared with a_{i+1} !!!

  36. Is the Mergesort stable? • yes !!! • why? • Is the Mergesort in place? • Yes / no, depending on how the “copying” to auxiliary storage is handled.

  37. Lower bound for SORT in the average case • The average number of comparisons equals the average external path length from the root to a leaf of the decision tree. • Def’n: The external path length (epl) of a tree is the sum of the lengths of all paths from the root to all leaves. • Def’n: A binary tree is said to be a 2-tree if every node of the tree has outdegree 0 or 2. • A decision tree is a 2-tree; let l be its # of leaf nodes.

  38. Lemma: Among 2-trees with l leaves, the epl is minimized if all the leaves are on at most two adjacent levels. [Proof] (by contradiction) Suppose that we have a 2-tree with deepest level d that has a leaf x at level k, where k ≤ d - 2. We can always rebuild a 2-tree with the same number of leaves and a lower epl: take a node y at level d - 1 whose two children are leaves at level d, and move those two leaves to become children of x (so y becomes a leaf). The change in epl is -(k + 2d) + (2(k + 1) + (d - 1)) = k + 1 - d < 0. # • (figure of the rebuilt tree omitted; the extreme shapes are the full binary tree and the complete binary tree)

  39. Lemma: The minimum epl of a 2-tree with l leaves is at least l · log₂ l. [Proof] • If l = 2^k for some integer k, then all the leaves are at level k !!! why? ⇒ k = log₂ l and epl = l · log₂ l. • Suppose that l ≠ 2^k for any integer k, and let d = ⌈log₂ l⌉, so 2^{d-1} < l < 2^d. why? • From the previous lemma, all leaves are at level d-1 or d, so the minimum epl = l(d-1) + 2(l - 2^{d-1}) ≥ l · log₂ l. why?

  40. Lemma: The average path length in a 2-tree with l leaves is at least log₂ l. [Proof] It equals epl / l ≥ (l · log₂ l) / l = log₂ l. • Theorem: The average # of comparisons done by an algorithm that sorts n items by comparison of keys is at least log₂ n! = Θ(n log n). [Proof] The decision tree has l ≥ n! leaves, and the average # of comparisons is the average path length ≥ log₂ l ≥ log₂ n!. • ⇒ QuickSort and MergeSort are optimal in the average case (to within a constant factor).
