Distribution Sorts
Records are partially sorted into “clusters” (sometimes called “bins” or “buckets”), which are then sorted and concatenated into the sorted file.

Merge Sort
First sort each subfile, then merge (“shuffle”) the subfiles in the right order, two at a time. Variations arise from how the initial subfiles are chosen.
Straight Merge: the initial subfiles are the individual records.
Natural Merge (one-way, two-way): the initial subfiles are the largest runs of already ordered records in the given file.
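The difference between the two choices of initial subfiles can be sketched in a few lines. This is a minimal illustration, not taken from the slides; the function names `natural_runs` and `straight_runs` are hypothetical.

```python
def natural_runs(records):
    """Split records into maximal non-decreasing runs.

    These runs play the role of the initial subfiles of a natural merge.
    (Hypothetical helper, written for illustration only.)
    """
    runs = []
    current = []
    for x in records:
        # A drop in value ends the current run and starts a new one.
        if current and x < current[-1]:
            runs.append(current)
            current = []
        current.append(x)
    if current:
        runs.append(current)
    return runs


def straight_runs(records):
    """Initial subfiles of a straight merge: one record per subfile."""
    return [[x] for x in records]
```

For the file 3 5 8 2 4 1 9, `natural_runs` yields the three runs 3 5 8, 2 4 and 1 9, while `straight_runs` yields seven one-record subfiles.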
Radix Sort
Keys are assumed to be strings of the same length. Subfiles are chosen by lexicographic ordering, and the process is iterated through the last symbol in the key, after which the array is sorted.

Merge Sort
A subclass of Divide and Conquer:
1. Easy division (in half)
2. Hard combination (the merge!)
Steps:
A/ Divide into two halves (A and B)
B/ Sort both halves
C/ Merge the two ordered lists.
Then the original merge problem for (m+k) keys is reduced to one for (m+k-1) keys, and so on by recursion.
==> The algorithm is complete.
MEMORY: extra space is needed to copy the merged output to C!
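The divide, sort and merge steps can be sketched as follows. This is a minimal version under stated assumptions: keys are compared with `<=`, the merged output goes to a new list `c` (the extra space mentioned above), and the comparison counter is added purely for illustration.

```python
def merge(a, b):
    """Merge two sorted lists into a new list c; also count key comparisons."""
    c = []
    i = j = 0
    comparisons = 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # One list is exhausted: the rest is copied without comparisons.
    c.extend(a[i:])
    c.extend(b[j:])
    return c, comparisons


def merge_sort(keys):
    """Sort by halving, sorting each half recursively, then merging."""
    if len(keys) <= 1:
        return list(keys)
    mid = len(keys) // 2
    left = merge_sort(keys[:mid])
    right = merge_sort(keys[mid:])
    merged, _ = merge(left, right)
    return merged
```

Merging the fully interleaved lists 1 3 5 7 and 2 4 6 8 takes 7 comparisons, one for each of the 8 keys except the last.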
Merge Sort (contd)
Worst case of the merge: W(n) = n-1 (each key placed in C costs 1 comparison, except the last).
This is also optimal in a sense: when merging 2 arrays of the same size, k = m = n/2, any algorithm does at least n-1 comparisons in the worst case.
Merge Sort:
W(n) = 2W([n/2]) + (n-1) = (n-1) + 2[(n/2 - 1) + 2W([n/4])] = ... (check)
==> order n log n
NB: the worst case of Merge Sort is of the same order as the average case of Quick Sort (check).
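The "(check)" can be carried out by unrolling the recurrence. The sketch below assumes n = 2^k is a power of two and W(1) = 0; both are simplifying assumptions not stated on the slide.

```latex
\begin{align*}
W(n) &= 2\,W(n/2) + (n-1) \\
     &= 4\,W(n/4) + (n-2) + (n-1) \\
     &= \dots \\
     &= 2^{k}\,W(1) + \sum_{i=0}^{k-1} \bigl(n - 2^{i}\bigr) \\
     &= k\,n - (2^{k} - 1) \\
     &= n \log_2 n - n + 1 .
\end{align*}
```

As a sanity check, n = 4 gives W(4) = 2W(2) + 3 = 2(1) + 3 = 5, which agrees with 4(2) - 4 + 1 = 5.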
COMPARE
Quick Sort average: A(n) ~ 1.386 n log n - 2.84n
Merge Sort worst: W(n) ~ n log n - 0.91n
W_MS(n) is about 70% of A_QS(n), i.e. the Merge Sort worst case needs roughly 30% fewer comparisons than the Quick Sort average case.
OPTIMALITY - Lower Bounds
Is O(n log n) the best possible for (comparison-based) sorting?
Associate to each sorting algorithm a binary decision tree (one node per comparison). The leaves are labeled with permutations of the keys. A run on an instance is a path from the root to the leaf that indicates the permutation needed to sort the file.
Example (figure): the tree for three keys X1, X2, X3.
There must be n! leaves. We can assume that paths never followed do not exist (check).
Thus Cn = max # of comparisons = depth d of the tree. Since the # of leaves in a binary tree of depth d is <= 2^d, it follows that n! <= 2^d, i.e. d >= log2(n!), which is of order n log n (the equation is provided below the figure).
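The bound d >= log2(n!) is easy to evaluate for small n. The sketch below is an illustration only; `comparison_lower_bound` is a hypothetical name, and the integer `bit_length` trick is used to avoid floating-point rounding.

```python
import math


def comparison_lower_bound(n):
    """Smallest depth d with 2**d >= n!, i.e. ceil(log2(n!)).

    For an integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    """
    return (math.factorial(n) - 1).bit_length()
```

For three keys X1, X2, X3 there are 3! = 6 permutations, so any comparison-based sort needs at least 3 comparisons in the worst case; for n = 10 the bound is 22.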
HEAP SORT
A heap of size n is an almost complete binary tree of n nodes such that the content of each node is less than or equal to the content of its parent. Thus the root of the heap holds the largest element of all nodes.
The heap sort proceeds in two phases:
1. Build a heap whose nodes contain the input file in the induced “array order”.
2. To arrive at a heap that contains the sorted file in “array order”, successively:
2.1 Swap the largest nonsorted element (at the root) with the highest-indexed nonsorted record.
2.2 Readjust the resulting tree so that it remains a heap.
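The two phases can be sketched as follows. This is a minimal in-place version, assuming the usual array layout in which the children of index i sit at 2i+1 and 2i+2; the helper name `sift_down` is an assumption, not from the slides.

```python
def sift_down(a, root, end):
    """Readjust the subtree rooted at `root` so a[:end] is a heap again."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # Pick the larger of the two children.
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort the list a in place and return it."""
    n = len(a)
    # Phase 1: build a heap, starting from the last node that has a child.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    # Phase 2: swap the root (largest nonsorted element) with the
    # highest-indexed nonsorted record, then readjust the smaller heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

No extra array is needed here: the sorted part grows from the high end of the same array that holds the heap.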
BUCKET SORT
Desired: UNIFORM DISTRIBUTION OF BUCKETS: N1 = N2 = ... = Nn
Then =
RADIX SORTING (introduction)
Until now: the elementary operation is the comparison of 2 keys.
This section: additional info is available! Ex: range of keys: 26 letters, 10 digits...
STEPS (BUCKET SORT):
A. Distribute keys (k buckets) ~ Theta(n)
B. Sort buckets separately (equation)
C. Combine buckets ~ O(n)
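Steps A to C can be sketched as below. The sketch assumes integer keys known to lie in the range lo..hi (the "additional info") and uses Python's built-in list sort for step B; the parameter names `k`, `lo` and `hi` are illustrative assumptions.

```python
def bucket_sort(keys, k, lo, hi):
    """Sort integer keys in the known range lo..hi using k buckets."""
    width = hi - lo + 1
    buckets = [[] for _ in range(k)]
    # A. Distribute: one pass over the keys, Theta(n).
    for x in keys:
        index = (x - lo) * k // width  # always in 0..k-1 for lo <= x <= hi
        buckets[index].append(x)
    # B. Sort each bucket separately, then
    # C. combine the buckets in order, O(n).
    result = []
    for bucket in buckets:
        bucket.sort()
        result.extend(bucket)
    return result
```

With keys spread evenly over the range, each bucket receives about n/k keys, which is the uniform case the bucket sort slide asks for.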
RADIX SORT (contd)
- keys are strings of digits
- sort digit by digit
- start from the end!
- # of buckets: radix or base number.
Analysis:
- distribution into buckets ~ n
- combine ~ n
- sort ~ n
------------------
~ linear in n
- Extra space: Theta(n)
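The digit-by-digit process, starting from the last digit, can be sketched as follows. This minimal version assumes the keys are non-negative integers with at most `width` digits in base `radix`; both parameter names are illustrative.

```python
def radix_sort(keys, width, radix=10):
    """Sort non-negative integers of at most `width` digits, last digit first."""
    for position in range(width):
        divisor = radix ** position
        # One bucket per possible digit value: `radix` buckets in all.
        buckets = [[] for _ in range(radix)]
        for x in keys:
            digit = (x // divisor) % radix
            buckets[digit].append(x)
        # Combining in bucket order keeps earlier passes intact (stability).
        keys = [x for bucket in buckets for x in bucket]
    return keys
```

Each pass distributes and recombines the n keys once, so for a fixed number of digits the total work is linear in n, at the cost of Theta(n) extra space for the buckets.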