Analyzing sorting algorithms • Recall that T(n) represents the time required for an algorithm to process input of size n. • For a sorting algorithm, T(n) represents the time needed to sort. • this time may be the worst-case time, the average-case time, or something else
Comparison-based sorting algorithms • Many sorting algorithms are comparison based – they depend on being able to compare pairs of elements. • this constraint is naturally represented in Java by the Comparable and Comparator interfaces • In this case we often let T(n) be the number of element comparisons.
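As an illustration (not from the slides), the number of comparisons can be counted by wrapping a Comparator; the class name CountingComparator is made up for this sketch:

    import java.util.Comparator;

    // Hypothetical helper: wraps another Comparator and counts how many
    // times compare() is called, so T(n) can be measured directly as the
    // number of element comparisons.
    class CountingComparator<T> implements Comparator<T> {
        private final Comparator<T> inner;
        private long count = 0;
        CountingComparator(Comparator<T> inner) { this.inner = inner; }
        public int compare(T a, T b) {
            count++;                        // one more element comparison
            return inner.compare(a, b);
        }
        public long count() { return count; }
    }

Passing such a comparator to Arrays.sort and reading count() afterwards measures T(n) for one run.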
Divide-and-conquer sorting algorithms • For divide-and-conquer, we define T(n) in terms of the value of T on smaller arguments. • For example, selection sort applies divide-and-conquer to the output, as follows: • find a piece of size 1 of the output sequence (find the smallest item and swap with the first item) • sort the remaining n-1 elements • Here T(n) = n-1 + T(n-1), where T(1) = 0 • since the first phase uses n-1 comparisons, and the second phase takes time T(n-1)
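A minimal selection sort sketch in Java (an illustration of the two phases above, not code from the slides):

    // Selection sort: find the smallest remaining item and swap it with
    // the first unsorted position, then repeat on the remaining n-1 items.
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++)  // n-1-i comparisons this pass
                if (a[j] < a[min]) min = j;
            int tmp = a[i]; a[i] = a[min]; a[min] = tmp;  // swap into place
        }
    }

The first pass makes n-1 comparisons, matching the n-1 term of the recurrence.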
Insertion sort • The insertion sort algorithm has the following form: • insert a piece of size 1 (usually, the last element) • into the result of sorting a piece of size n-1 • This gives the same constraint as above for T(n) • if T(n) is interpreted as the worst-case number of comparisons • note that this worst-case can be achieved
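The corresponding iterative insertion sort sketch (again an illustration, not slide code):

    // Insertion sort: insert a[i] into the already-sorted prefix a[0..i-1]
    // by shifting larger items one position to the right.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > x) {  // worst case: i comparisons
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

A reverse-sorted input forces every comparison, achieving the worst case.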
Time complexity of insertion sort and selection sort • We need to solve the recurrence relation T(n) = n-1 + T(n-1); T(1) = 0 • Unrolling gives T(n) = (n-1) + (n-2) + … + 1 + 0, the sum of the first n nonnegative integers • cf. the first Σ expression on p. 5 of Weiss, but with index bounds 0 and n-1 • So T(n) = (n-1)n/2 • And selection sort is Θ(n²) • And insertion sort is Θ(n²) in the worst case
Other divide-and-conquer sorting algorithms • Mergesort: • sort two pieces of size n/2 • merge the sorted pieces • Quicksort constructs the output in a divide-and-conquer way: • preprocess the input so that small items are to the left and large items are to the right • sort both pieces
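A top-down mergesort sketch in Java (an illustration; the shared tmp array is discussed on the space-use slide below):

    // Mergesort: sort the two halves recursively, then merge them.
    static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) return;          // pieces of size 0 or 1 are sorted
        int mid = (lo + hi) / 2;
        mergeSort(a, tmp, lo, mid);       // sort left piece of size ~n/2
        mergeSort(a, tmp, mid, hi);       // sort right piece of size ~n/2
        merge(a, tmp, lo, mid, hi);       // merge the two sorted pieces
    }

    // Merge two adjacent sorted runs a[lo..mid) and a[mid..hi) through tmp.
    static void merge(int[] a, int[] tmp, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        for (k = lo; k < hi; k++) a[k] = tmp[k];  // copy the merged run back
    }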
Sorting algorithms we’ve seen • Binary search tree (BST) sort • insert all items into a BST, and traverse • time complexity is Θ(n²) in worst case • and Θ(n log n) in best case • Heapsort • build a heap, then delete all items • building a heap takes time Θ(n) • deletions take time Θ(n log n) in the worst case • so sorting also has worst-case time Θ(n log n)
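A heapsort sketch using the library heap (an illustration; note that java.util.PriorityQueue builds by repeated insertion in Θ(n log n) time, whereas the Θ(n) bound above assumes bottom-up heap construction):

    import java.util.PriorityQueue;

    // Heapsort: build a heap from all items, then delete them in order.
    static int[] heapSort(int[] a) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : a) heap.add(x);      // build phase
        int[] out = new int[a.length];
        for (int i = 0; i < out.length; i++)
            out[i] = heap.poll();         // n deleteMin operations
        return out;
    }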
Nonrecursive sorting algorithms • Insertion sort, selection sort, and mergesort are easy to formulate nonrecursively. • Quicksort can be formulated nonrecursively by using an explicit stack, as sketched below • some optimization is possible by doing so.
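A sketch of the explicit-stack formulation (an illustration only; partition() is the function sketched under the Weiss slides below, and it needs pieces of size at least 3):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Nonrecursive quicksort: an explicit stack of (lo, hi) subarray
    // bounds replaces the recursion.
    static void quickSort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] { 0, a.length - 1 });
        while (!stack.isEmpty()) {
            int[] r = stack.pop();
            int lo = r[0], hi = r[1];
            if (hi <= lo) continue;                 // size 0 or 1: done
            if (hi - lo == 1) {                     // size 2: sort directly
                if (a[hi] < a[lo]) { int t = a[lo]; a[lo] = a[hi]; a[hi] = t; }
                continue;
            }
            int p = partition(a, lo, hi);
            stack.push(new int[] { lo, p - 1 });
            stack.push(new int[] { p + 1, hi });
        }
    }

One optimization the slide alludes to: pushing the larger piece and looping on the smaller bounds the stack depth at O(log n).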
Bottom-up mergesort • Mergesort is easy to state in a bottom-up manner. • Initial sorted subsequences (runs) may be created in any convenient manner • or may simply be taken to be sequences of size 1 • A single pass merges pairs of adjacent runs into larger runs • passes may copy data alternately into and out of a temporary array
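A bottom-up sketch (illustration), reusing the merge() helper from the top-down sketch above:

    // Bottom-up mergesort: each pass doubles the run width, merging
    // pairs of adjacent runs until a single sorted run remains.
    static void bottomUpMergeSort(int[] a) {
        int n = a.length;
        int[] tmp = new int[n];
        for (int width = 1; width < n; width *= 2) {   // one pass per run size
            for (int lo = 0; lo + width < n; lo += 2 * width) {
                int mid = lo + width;
                int hi  = Math.min(lo + 2 * width, n);
                merge(a, tmp, lo, mid, hi);            // merge adjacent runs
            }
        }
    }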
Space use in mergesort • Top-down: all recursive calls can share the same temporary array • those subarrays that overlap in time don’t overlap in space • In the bottom-up version, passes may copy data alternately into and out of a temporary array
Time complexity of mergesort • In the bottom-up version, there are Θ(log n) passes, each requiring time Θ(n), for Θ(n log n) time altogether. • The data flow here (and also in the top-down case) may be modeled by a binary merge tree of height Θ(log n) • where each level takes time Θ(n) to process • The relevant recursion T(n) = 2T(n/2) + cn has solution T(n) = Θ(n log n)
Merge tree (bottom up, n = 19)

                    XXXXXXXXXXXXXXXXXXX
                   /                   \
          XXXXXXXXXXXXXXXX             XXX
          /              \              |
      XXXXXXXX       XXXXXXXX          XXX
      /      \       /      \           |
    XXXX    XXXX   XXXX    XXXX        XXX
    /  \    /  \   /  \    /  \       /   \
   XX  XX  XX XX  XX  XX  XX  XX     XX    X
Merge tree (top down, n = 19)

                     XXXXXXXXXXXXXXXXXXX
                    /                   \
           XXXXXXXXX                    XXXXXXXXXX
          /         \                  /          \
       XXXX        XXXXX            XXXXX        XXXXX
       /  \        /   \            /   \        /   \
      XX  XX     XX    XXX        XX    XXX    XX    XXX
                       /  \             /  \         /  \
                      X    XX          X    XX      X    XX
Quicksort • Recall that for quicksort, a preprocessing step is needed • to get small elements to the left of the array • and large elements to the right of the array • A partition function performs this step. • Partition compares each array element to a pivot element. • pivot elements usually come from the input array • If so, partition can put them between the small and large items
Quicksort details • Small input needn’t be sorted recursively • another sorting algorithm can be used • “small” means of size less than about 10 or 20 • Partition typically works in terms of two index variables i and j • i starts at the left and moves right, looking for large values to move right • j starts at the right and moves left, looking for small values to move left • the loops moving i and j stop when i and j cross
Quicksort issues • how to choose the pivot • the pivot element should be unlikely to be large or small (even for nonrandom input) • how to initialize i and j • what if i or j finds a copy of the pivot? • how to keep i and j from passing the end of the array • where does the pivot element go?
Weiss suggests: • letting the pivot be the median of the left, center, and right elements • sorting these three values in place • swapping the pivot with the element in position right-1
Weiss also says: • i and j should be advanced first, and then referenced • When either i or j sees the pivot element, it should stop • Explicit tests shouldn’t be needed to keep i and j from running off the end of the array • instead, sentinels should be available • The pivot element should be swapped into position i
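A partition sketch following these recommendations (an illustration in Weiss's style, not his exact code; it assumes the piece has at least three elements, smaller pieces being handled separately):

    // Median-of-three partition: sort a[lo], a[center], a[hi] in place,
    // hide the pivot at hi-1 so a[lo] and a[hi] serve as sentinels, then
    // advance i and j before each comparison.
    static int partition(int[] a, int lo, int hi) {
        int center = (lo + hi) / 2;
        if (a[center] < a[lo]) swap(a, lo, center);
        if (a[hi] < a[lo])     swap(a, lo, hi);
        if (a[hi] < a[center]) swap(a, center, hi);
        swap(a, center, hi - 1);            // pivot moves to position hi-1
        int pivot = a[hi - 1];
        int i = lo, j = hi - 1;
        for (;;) {
            while (a[++i] < pivot) { }      // advance first, then reference
            while (a[--j] > pivot) { }      // both stop on a copy of the pivot
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, hi - 1);                 // pivot swapped into position i
        return i;
    }

    static void swap(int[] a, int x, int y) { int t = a[x]; a[x] = a[y]; a[y] = t; }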
Bucket sort and radix sort • There's an important family of sorting algorithms that don’t depend on comparing pairs of elements • If the elements being sorted needn't all be distinct, these algorithms can run in time faster than n log n
Conditions for bucket sort • Bucket sort can be used when there is a function f that assigns indices to input elements so that if A <= B, f(A) <= f(B). • Here f is similar to a hash function. • It's used as an index into a table, where the table locations are called buckets. • However f is supposed to preserve regularity, while a hash function is supposed to destroy it.
Two special cases • For a character string s, f(s) can be the first character of s • or its character code • For an integer i, f(i) can be the leftmost digit of i • provided that integers are padded with leading 0s.
Bucket sort • The top-down bucket sort algorithm is then very simple: • assign elements to buckets • sort the buckets (perhaps recursively) • append the sorted buckets • For both strings and integers, recursive sorting of the buckets is possible • by ignoring the first character(s) or digit(s)
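A sketch of the special case f(s) = first character, for nonempty lowercase strings (an illustration; for brevity the buckets are sorted here with the library sort rather than recursively):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Top-down bucket sort: distribute, sort each bucket, append.
    static List<String> bucketSort(List<String> input) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 26; i++) buckets.add(new ArrayList<>());
        for (String s : input)
            buckets.get(s.charAt(0) - 'a').add(s);   // assign by f(s)
        List<String> out = new ArrayList<>();
        for (List<String> b : buckets) {
            Collections.sort(b);                     // sort the bucket
            out.addAll(b);                           // append sorted buckets
        }
        return out;
    }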
Radix sort • There’s also a bottom-up version of bucket sort called radix sort, which is easiest to state for character strings of the same length p: • for i from p down to 1 • for each string s, assign s to the bucket corresponding to its ith character • concatenate the buckets into an output list • clear each bucket • For b buckets, the time is Θ(b+n) per iteration and thus Θ(p(b+n)) overall
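A radix sort sketch under the same assumptions (equal-length lowercase strings, b = 26; an illustration, reusing the java.util imports from the bucket sort sketch):

    // LSD radix sort: one stable distribution pass per character
    // position, from the rightmost position p-1 down to 0.
    static List<String> radixSort(List<String> input, int p) {
        List<String> current = new ArrayList<>(input);
        for (int i = p - 1; i >= 0; i--) {
            List<List<String>> buckets = new ArrayList<>();
            for (int k = 0; k < 26; k++) buckets.add(new ArrayList<>());
            for (String s : current)
                buckets.get(s.charAt(i) - 'a').add(s);  // stable: keeps order
            current = new ArrayList<>();
            for (List<String> b : buckets) current.addAll(b);  // concatenate
        }
        return current;
    }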
Radix sort details • Concatenation is easiest if linked lists are used for the individual buckets. • It is important that distribution into buckets be stable – elements should appear in the buckets in the order of the original input. • If strings have different lengths, they can be padded (explicitly or implicitly) with nulls on the right
Radix sort analysis • Note that if p and b are independent of n, then radix sort has Θ(n) time complexity • However if p is independent of n, then there can be at most b^p distinct strings. • So if all strings are distinct, then n is O(b^p), so p is Ω(log n). • And thus the time complexity is Ω(n log n)
Selection using bucket sort • Top-down bucket sort can easily be converted to a selection algorithm • To find the kth smallest item, distribute the items into buckets, counting the number of items in each bucket • Then select recursively from the appropriate bucket, replacing k by a value that depends on the counts of the preceding buckets
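A selection sketch under the equal-length-string assumptions above (an illustration; k is 0-based and must satisfy 0 <= k < input.size(), and the initial call is select(items, k, 0, p)):

    // Select the kth smallest by distributing into buckets and recursing
    // only into the bucket that contains position k.
    static String select(List<String> input, int k, int pos, int p) {
        if (pos == p) return input.get(k);   // all remaining items are equal
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 26; i++) buckets.add(new ArrayList<>());
        for (String s : input)
            buckets.get(s.charAt(pos) - 'a').add(s);
        for (List<String> b : buckets) {
            if (k < b.size()) return select(b, k, pos + 1, p);
            k -= b.size();                   // subtract preceding buckets' counts
        }
        throw new IllegalArgumentException("k out of range");
    }

Only one bucket is examined at each level, so the work is much less than a full sort.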
Radix sort example • To sort: • 123, 12, 313, 321, 212, 112, 221, 132, 131 • Pass 1 assignment to buckets: • 0: • 1: 321, 221, 131 • 2: 12, 212, 112, 132 • 3: 123, 313 • Concatenated result • 321, 221, 131, 12, 212, 112, 132, 123, 313
Pass 2 • From previous pass • 321, 221, 131, 12, 212, 112, 132, 123, 313 • Pass 2 assignment to buckets: • 0: • 1: 12, 212, 112, 313 • 2: 321, 221, 123 • 3: 131, 132 • Concatenated result • 12, 212, 112, 313, 321, 221, 123, 131, 132
Pass 3 • From previous pass • 12, 212, 112, 313, 321, 221, 123, 131, 132 • Pass 3 assignment to buckets: • 0: 12 • 1: 112, 123, 131, 132 • 2: 212, 221 • 3: 313, 321 • Concatenated result • 12, 112, 123, 131, 132, 212, 221, 313, 321