Basic and Advanced Sorting Algorithms: Ideas and Implementations

Algorithms and data structures
Sorting – some basic ideas, Basic algorithms for table sorting, Advanced algorithms for table sorting, Priority queues, Linear sorting, Order statistics

Sorting
• Order of elements
• Key: the part of the data for which an order is defined
• Problem complexity: the complexity of the best known algorithm
• Sorting of arrays vs. files
• Stability
• Median
• Order statistic, percentile, quartile, decile
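Stability can be illustrated with a short sketch. It assumes records are (name, key) tuples and relies on Python's built-in `sorted`, which is documented to be stable; the sample data and the name `by_key` are made up for this illustration.

```python
# Records with equal keys keep their original relative order in a stable sort.
records = [("Ann", 3), ("Bob", 1), ("Cid", 3), ("Dee", 1)]  # hypothetical data

# Sort by the numeric key only; names are not compared.
by_key = sorted(records, key=lambda rec: rec[1])

# "Bob" still precedes "Dee", and "Ann" still precedes "Cid".
print(by_key)
```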

Indexes
[Figure: a sample array and the index array that describes its sorted order]
• Direct sorting of elements
• Sorting of an index instead of the elements
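Sorting an index instead of the elements can be sketched as follows. This is only an illustration: it assumes the records are tuples whose first field is the key, and the names `records`, `index` and `in_order` are invented for the example.

```python
# Indirect sorting: the records stay in place, only an index array is sorted.
records = [(7, "g"), (2, "b"), (11, "k"), (4, "d")]  # hypothetical data

# index[i] is the position in `records` of the i-th smallest key.
index = sorted(range(len(records)), key=lambda i: records[i][0])

# Reading the records through the index yields them in key order.
in_order = [records[i] for i in index]
```

This is useful when the elements are large and expensive to move, or when several orderings of the same data are needed at once.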

Classic algorithms for arrays
• Insertion Sort: each time, the next element from the unsorted part is inserted into the sorted part (at its proper position)
• Selection Sort: each time, the greatest (smallest) element of the unsorted part is added at the end of the sorted part
• Exchange Sort or Bubble Sort: neighbours that are in reverse order are swapped until no swap is possible
N. Wirth, „Algorytmy + struktury danych = programy”
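Selection sort has no code elsewhere in these slides, so here is a minimal sketch of the idea described above. It picks the smallest element of the unsorted part each time; the function name `selection_sort` is an assumption of this example.

```python
def selection_sort(A):
    """Sort list A in place in ascending order and return it."""
    n = len(A)
    for i in range(n - 1):
        # A[0..i-1] is sorted; find the smallest element of A[i..n-1].
        smallest = i
        for j in range(i + 1, n):
            if A[j] < A[smallest]:
                smallest = j
        # Append it to the sorted part with a single swap.
        if smallest != i:
            A[i], A[smallest] = A[smallest], A[i]
    return A
```

The number of comparisons is always about n*n/2, but at most n-1 swaps are made, which can matter when moving elements is expensive.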

Bubble sort
[Figure: successive passes (I-IV) of the basic variant on a sample array]

Bubble sort
    # basic variant
    for j in range(0, n):
        for i in range(0, n-1):
            if A[i] > A[i+1]:
                A[i], A[i+1] = A[i+1], A[i]

    # improved variant: the inner loop skips the already ordered part
    for j in range(n-1, 0, -1):
        for i in range(0, j):
            if A[i] > A[i+1]:
                A[i], A[i+1] = A[i+1], A[i]

Bubble sort
    # improved variant: stop early when a whole pass makes no swap
    for j in range(n-1, 0, -1):
        change = False
        for i in range(0, j):
            if A[i] > A[i+1]:
                A[i], A[i+1] = A[i+1], A[i]
                change = True
        if not change:
            break

Bubble sort
Properties:
• Simple implementation
• Worst-case performance O(n^2)
• Short code, small constant factors
• Stable
• Usually the slowest of the simple methods

Insertion Sort
[Figure: successive elements x = 1, 7, 2, 8, 11, 4 inserted into the sorted part of a sample array]
• Stable
• Worst-case performance O(N^2)
• Fewer writes than bubble sort
    for i in range(1, n):
        x = A[i]
        p = find_place_for(x)   # pseudocode: position of x within the sorted part
        A[p] = x

Insertion Sort
    def InsertionSort(A, n):
        for i in range(1, n):
            x = A[i]
            j = i
            # shift larger elements one place to the right
            while j > 0 and x < A[j-1]:
                A[j] = A[j-1]
                j = j - 1
            A[j] = x

Shell Sort
• Improved insertion sort
• Pick an initial increment k = k_p
• Perform insertion sort on the k subarrays (1, k+1, 2k+1, ...), (2, k+2, 2k+2, ...), ...
• Decrease k until it reaches 1
• Performance is determined by k_p and by the way k is decreased
• k_p = 1: basic insertion sort
• For k_i = 2^i - 1 (..., 31, 15, 7, 3, 1) the running time is reported as about O(N^1.2); the proven worst-case bound for this sequence is O(N^1.5)
• For k_i = 2^r 3^q (r, q ∈ N) (..., 18, 16, 12, 9, 8, 6, 4, 3, 2, 1) the complexity is O(N log^2 N)
• Not stable: equal elements in different subarrays can overtake each other

Shell Sort
[Figure: a sample array sorted first with k = 7, then with k = 3]
    for k in [..., 31, 15, 7, 3, 1]:
        insertion_sort_for_k_subarrays(A)
        # A[0], A[k], A[2k], ...
        # A[1], A[k+1], A[2k+1], ...
        # ...
        # A[k-1], A[2k-1], A[3k-1], ...
• Not stable
• Running time about O(N^1.2) for these increments

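Shell Sort
    def ShellSort(A, n):
        k = 2 ** (n.bit_length() - 1) - 1   # k = 2^floor(log2 n) - 1
        while k >= 1:
            # insertion sort of the k interleaved subarrays
            for i in range(k, n):
                x = A[i]
                j = i
                while j >= k and x < A[j-k]:
                    A[j] = A[j-k]
                    j = j - k
                A[j] = x
            k = (k + 1) // 2 - 1            # ..., 31, 15, 7, 3, 1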
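The two increment sequences mentioned for Shell sort can be generated as sketched below and passed to a Shell sort that takes the increments as a parameter. The function names `hibbard_gaps`, `pratt_gaps` and `shell_sort` are assumptions of this example, not part of the slides.

```python
def hibbard_gaps(n):
    """Increments 2^i - 1 smaller than n, largest first."""
    gaps = []
    k = 1
    while k < n:
        gaps.append(k)
        k = 2 * k + 1
    return gaps[::-1]

def pratt_gaps(n):
    """Increments 2^r * 3^q smaller than n, largest first."""
    gaps = set()
    p = 1
    while p < n:
        q = p
        while q < n:
            gaps.add(q)
            q *= 3
        p *= 2
    return sorted(gaps, reverse=True)

def shell_sort(A, gaps):
    """Sort A in place; gaps must be decreasing and end with 1."""
    for k in gaps:
        for i in range(k, len(A)):
            x = A[i]
            j = i
            while j >= k and x < A[j - k]:
                A[j] = A[j - k]
                j -= k
            A[j] = x
    return A
```

For example, `shell_sort(data, pratt_gaps(len(data)))` sorts `data` with the 2^r 3^q increments.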

Quick Sort
Improved exchange sort
• Split array A[p..r] into two parts A1[p..q] and A2[q+1..r], where each element of A1 is not greater than any element of A2
• Sort A1 and A2

Quick Sort - implementation
    def QuickSort(A, p, r):
        if p < r:
            q = Partition(A, p, r)
            QuickSort(A, p, q)
            QuickSort(A, q+1, r)

    QuickSort(Tablica, 0, N-1)   # initial call; Tablica is the array to sort

Partition
• Pick an element called the pivot
• Going up from the left end, find the first element >= pivot
• Going down from the right end, find the first element <= pivot
• Swap the two elements found
• Finish when the two scans meet or pass each other
• What if the pivot is the min/max?

Partition - Implementation
    def Partition(A, l, r):
        x = A[l]            # pivot
        l_m = l - 1
        r_m = r + 1
        while True:
            while True:     # scan up for an element >= pivot
                l_m = l_m + 1
                if A[l_m] >= x:
                    break
            while True:     # scan down for an element <= pivot
                r_m = r_m - 1
                if A[r_m] <= x:
                    break
            if l_m < r_m:
                A[l_m], A[r_m] = A[r_m], A[l_m]
            else:
                return r_m

Randomized partition
• It prevents an input from being deliberately prepared as the worst case (although such data can still occur by chance)
    import random

    def RandomPartition(A, l, r):
        i = random.randint(l, r)   # random index from l..r, both ends included
        A[i], A[l] = A[l], A[i]
        return Partition(A, l, r)

Quick Sort - properties
• Average performance: O(N log N)
• Worst-case performance: O(N^2)
• Simple algorithm, small overhead
• Worst case: Partition every time splits a k-element array into subarrays of sizes k-1 and 1
• Randomized Partition: not vulnerable to deliberately prepared worst-case input
• To limit the memory requirements (stack, worst case ~N), it is a good idea to sort the smaller subarray first and then the bigger one; if the bigger one is handled in a loop, the stack depth is at most log N
• Stability depends on the Partition algorithm and its implementation
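The stack-limiting idea from the list above can be sketched as follows: recurse only into the smaller part and continue with the larger part in a loop. This is one possible arrangement, not the only one; the lower-case names `partition` and `quick_sort` are assumptions of this example, and the partition is a Hoare-style scheme with a random pivot.

```python
import random

def partition(A, l, r):
    """Hoare-style partition around A[l]; returns q with A[l..q] <= A[q+1..r]."""
    x = A[l]
    i, j = l - 1, r + 1
    while True:
        i += 1
        while A[i] < x:
            i += 1
        j -= 1
        while A[j] > x:
            j -= 1
        if i >= j:
            return j
        A[i], A[j] = A[j], A[i]

def quick_sort(A, p, r):
    """Sort A[p..r] in place with recursion depth of about log2(r - p + 1)."""
    while p < r:
        # Random pivot: move a randomly chosen element to the front.
        k = random.randint(p, r)
        A[p], A[k] = A[k], A[p]
        q = partition(A, p, r)
        if q - p < r - q:
            quick_sort(A, p, q)      # left part is smaller: recurse into it
            p = q + 1                # continue with the right part in the loop
        else:
            quick_sort(A, q + 1, r)  # right part is smaller (or equal)
            r = q                    # continue with the left part in the loop
```

Each recursive call receives at most half of the current range, which is what bounds the depth.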

Heap
[Figure: a sample heap drawn as a binary tree and stored in an array]
Every element is not less (not greater) than its children: A[Parent(i)] >= A[i]
    def Parent(i): return i // 2
    def Left(i): return i * 2
    def Right(i): return i * 2 + 1
WARNING: indexes run from 1 to size

Heapify
Assumption: the heap property is satisfied for all the elements except the root
1) Start from the root (W = root)
2) If W is not smaller than its children: stop
   else: swap W with its greater child and repeat 2) for W (in its new place)

Heapify
[Figure: successive steps of Heapify on a sample heap]

Heapify - implementation
    def Heapify(A, i, size):
        L = Left(i)
        R = Right(i)
        # find the largest of A[i] and its children (1-based indexes)
        if L <= size and A[L-1] > A[i-1]:
            maxps = L
        else:
            maxps = i
        if R <= size and A[R-1] > A[maxps-1]:
            maxps = R
        if maxps != i:
            A[i-1], A[maxps-1] = A[maxps-1], A[i-1]
            Heapify(A, maxps, size)

Heap Sort
• Building the heap
  • Starting from the end, perform Heapify for successive sub-heaps
• Repeatedly remove the largest element (the root) from the heap and put it into the sorted part
  • The root is swapped with the last element of the current heap
  • The size of the heap is decreased
  • Heapify is executed

Building of Heap
[Figure: successive steps of building a heap from a sample array]

Sorting of Heap
[Figure: successive steps of sorting the sample heap]

Heap Sort - implementation
    def BuildHeap(A, size):
        for i in range(Parent(size), 0, -1):
            Heapify(A, i, size)

    def HeapSort(A, size):
        BuildHeap(A, size)
        for i in range(size, 1, -1):
            A[i-1], A[0] = A[0], A[i-1]   # move the current maximum to the sorted part
            Heapify(A, 1, i-1)

Merge Sort
• Does not require random access (suitable e.g. for files, lists)
Split phase: A -> B, C
Merge: B + C -> sequence of pairs
Split of the sequence of pairs -> B2, C2
Merge: B2 + C2 -> sequence of fours
Split of the sequence of fours -> B4, C4
Merge: B4 + C4 -> sequence of eights
...

Merge Sort
input: 40, 60, 10, 45, 90, 20, 09, 72
split: (40) (60) (10) (45) | (90) (20) (09) (72)
merge: (40 90) (20 60) (09 10) (45 72)
split: (40 90) (20 60) | (09 10) (45 72)
merge: (09 10 40 90) (20 45 60 72)
split: (09 10 40 90) | (20 45 60 72)
merge: (09 10 20 40 45 60 72 90)
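The split and merge phases traced above can be sketched in Python as follows. Python lists stand in for the sequential files B and C; the function names `merge` and `merge_sort` are assumptions of this example, and only sequential access to the runs is used.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort(seq):
    """Return a sorted copy of seq using repeated split and merge phases."""
    runs = [[x] for x in seq]            # runs of length 1
    while len(runs) > 1:
        half = (len(runs) + 1) // 2
        B, C = runs[:half], runs[half:]  # split phase
        merged = [merge(b, c) for b, c in zip(B, C)]  # merge phase
        if len(B) > len(C):              # an unpaired run is carried over
            merged.append(B[-1])
        runs = merged
    return runs[0] if runs else []
```

Called on the input from the example, `merge_sort([40, 60, 10, 45, 90, 20, 9, 72])` goes through the same intermediate runs as the trace.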

Priority Queue
Example of usage: processing tasks, orders
Required operations:
• Add a new element with priority X
• Get (remove) the element with the greatest priority
• Peek at the element with the greatest priority (without removing it)
Implementation with a heap:
• Put: add the new element to the heap
• Get: return A[0] and rebuild the heap
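The three operations can be tried out with Python's standard `heapq` module, as sketched below. This is an alternative to the hand-written heap of these slides, not the same implementation: `heapq` maintains a min-heap, so the sketch stores negated priorities to get the greatest priority first. The task names are made up.

```python
import heapq

queue = []  # heapq keeps the smallest tuple at queue[0]

# Put: add elements with priorities 2, 5 and 1 (negated for a max-queue).
heapq.heappush(queue, (-2, "write report"))
heapq.heappush(queue, (-5, "fix outage"))
heapq.heappush(queue, (-1, "tidy desk"))

# Peek: look at the element with the greatest priority without removing it.
peeked = queue[0][1]

# Get: remove and return the element with the greatest priority.
first = heapq.heappop(queue)[1]
second = heapq.heappop(queue)[1]
```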

Adding to Heap
1) add_new_element_W_at_the_end_of_heap
2) if heap_property_is_satisfied_for_W_and_its_father or W_is_the_root:
       stop
   else:
       swap_W_with_its_father
       repeat_2_for_W_in_the_new_place

Adding to Heap
[Figure: successive steps of adding new elements to a sample heap]

Get
1) tmp = root
2) root = last_element and size = size-1
3) Heapify(Root)
4) return tmp

Priority Queue - implementation
    def HeapInsert(A, size, newElement):
        if size >= MAXSIZE:              # MAXSIZE: capacity of the array A
            raise OverflowError("heap overflow")
        size = size + 1
        i = size
        # move smaller ancestors down to make room for the new element
        while i > 1 and A[Parent(i)-1] < newElement:
            A[i-1] = A[Parent(i)-1]
            i = Parent(i)
        A[i-1] = newElement
        return size

Priority Queue - implementation
    def HeapGetMax(A, size):
        if size < 1:
            raise IndexError("heap is empty")
        max = A[0]
        A[0] = A[size-1]                 # the last element becomes the root
        size = size - 1
        Heapify(A, 1, size)              # restore the heap property from the root
        return max, size

Linear sort?
Sometimes additional knowledge about the data allows sorting in linear time

Counting Sort
• Keys are 0..MaxKey
• MaxKey is reasonably small (in comparison with the number of elements)
1) Count (in an array A) the number of occurrences of each key
2) Write out the sequence of keys (based on A)
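The two steps above can be sketched as follows for plain integer keys. The function name `counting_sort` and the parameter `max_key` are assumptions of this example, and the counts array plays the role of the array A from the slide.

```python
def counting_sort(keys, max_key):
    """Return the integer keys (each in 0..max_key) in ascending order."""
    # 1) Count the number of occurrences of each key.
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1
    # 2) Write out each key as many times as it occurred.
    out = []
    for k, c in enumerate(counts):
        out.extend([k] * c)
    return out
```

The running time is O(N + MaxKey), which is why MaxKey has to be reasonable in comparison with the number of elements.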

Bucket Sort
• Performance: O(N)?
• Assumption: keys are uniformly distributed in the range x0 .. x1
• Split the key domain into n subintervals of length p = (x1 - x0) / n, where the i-th subinterval is [x0 + (i-1)*p, x0 + i*p) for i = 1..n-1, and the n-th is [x1 - p, x1]
• Create n lists (buckets)
• Distribute all the elements among the lists; each list is associated with one subinterval
• Sort each list (with any algorithm)
• Write out the contents of the lists in order 1, 2, ..., n
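A minimal sketch of these steps for numeric keys is given below. It assumes every value lies in the range x0 .. x1, uses as many buckets as there are values, and sorts each bucket with Python's built-in `sorted`; the function name `bucket_sort` is an assumption of this example.

```python
def bucket_sort(values, x0, x1):
    """Return values (all assumed to lie in x0..x1) in ascending order."""
    n = len(values)
    if n == 0 or x0 == x1:
        return list(values)          # nothing to distribute
    p = (x1 - x0) / n                # length of one subinterval
    buckets = [[] for _ in range(n)]
    for v in values:
        # Bucket number of v; x1 itself goes to the last bucket.
        i = min(int((v - x0) / p), n - 1)
        buckets[i].append(v)
    out = []
    for b in buckets:                # buckets are visited in key order
        out.extend(sorted(b))
    return out
```

With uniformly distributed keys each bucket holds about one element on average, which is where the expected linear time comes from.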

Max (or Min)
    def Minimum(A, size):
        m = A[0]
        for i in range(1, size):
            if m > A[i]:
                m = A[i]
        return m

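Max and Min
    def MinMax(A, size):
        mi = A[0]
        ma = A[0]
        # take elements in pairs: about 3 comparisons per 2 elements
        for i in range(2, size, 2):
            if A[i] < A[i-1]:
                mi = min(A[i], mi)
                ma = max(A[i-1], ma)
            else:
                mi = min(A[i-1], mi)
                ma = max(A[i], ma)
        # for an even size the last element has no partner
        if size % 2 == 0:
            mi = min(A[size-1], mi)
            ma = max(A[size-1], ma)
        return mi, ma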

Returning multiple values (C++)
    struct MM {
        ELEMENT min, max;
        MM(ELEMENT emin, ELEMENT emax) {  // constructor
            min = emin;
            max = emax;
        }
    };

    MM Minmax(ELEMENT A[], INDEX size) {
        .....
        return MM(min, max);
    }

I-th order statistic
• Average performance: O(n)
• Worst-case performance: O(n^2)
• RandomizedPartition could be used as well
    def SearchStatistics(A, i, l, r):
        if l == r:
            return A[l]
        q = Partition(A, l, r)
        k = q - l + 1                # number of elements in A[l..q]
        if i <= k:
            return SearchStatistics(A, i, l, q)
        else:
            return SearchStatistics(A, i - k, q + 1, r)

I-th order statistic
• Worst-case performance: O(n)
ModifiedPartition
• Split elements l..r into n/5 groups of 5 elements: O(n)
• Compute the median of each of the n/5 groups by sorting it (O(1) per group). If the last group contains an even number of elements, pick the greater of its two medians: O(n)
• X = the median of the medians, computed recursively
• Execute the modified partition with pivot = X: O(n)

Median of median of median ...
Notes:
• At least half of the medians (of the 5-element groups) are greater than or equal to x
• Hence at least 0.5 * n/5 groups contain 3 elements greater than or equal to x
• One of these groups contains x itself, and one may contain fewer than 5 elements
• Thus the number of elements greater than or equal to x is at least 3 * (0.5 * n/5 - 2) >= 3n/10 - 6
• For elements smaller than or equal to x the proof is similar
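The worst-case linear selection can be sketched as below. For readability this version builds new lists instead of partitioning the array in place, so it illustrates the choice of pivot rather than the ModifiedPartition procedure itself; the function name `select` is an assumption of this example.

```python
def select(A, i):
    """Return the i-th smallest element of list A (i = 1 .. len(A))."""
    if len(A) <= 5:
        return sorted(A)[i - 1]
    # Split into groups of 5 and take the median of each group; for a group
    # with an even number of elements this picks the greater of the two medians.
    groups = [sorted(A[j:j + 5]) for j in range(0, len(A), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # Pivot x = median of the medians, computed recursively.
    x = select(medians, (len(medians) + 1) // 2)
    # Partition around x.
    smaller = [a for a in A if a < x]
    equal = [a for a in A if a == x]
    larger = [a for a in A if a > x]
    if i <= len(smaller):
        return select(smaller, i)
    if i <= len(smaller) + len(equal):
        return x
    return select(larger, i - len(smaller) - len(equal))
```

Because a constant fraction of the elements is guaranteed to fall on each side of x, neither recursive call can receive almost the whole input.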
