240 likes | 473 Views
CSE 326: Data Structures Lecture 23 Spring Quarter 2001 Sorting, Part 1. David Kaplan davek@cs. Outline. Outline. Sorting: The Problem Space Sorting by Comparison Lower bound for comparison sorts Insertion Sort Heap Sort Merge Sort Quick Sort External Sorting
E N D
CSE 326: Data StructuresLecture 23Spring Quarter 2001Sorting, Part 1 David Kaplan davek@cs
Outline Outline • Sorting: The Problem Space • Sorting by Comparison • Lower bound for comparison sorts • Insertion Sort • Heap Sort • Merge Sort • Quick Sort • External Sorting • Comparison of Sorting by Comparison
Sorting: The Problem Space • General problem Given a set of N orderable items, put them in order • Without (significant) loss of generality, assume: • Items are integers • Ordering is (Most sorting problems map to the above in linear time.)
Lower Bound for Sorting by Comparison • Sorting by Comparison • Only information available to us is the set of N items to be sorted • Only operation available to us is pairwise comparison between 2 items • What is the best running time we can possibly achieve?
Max depth of decision tree • How many permutations are there of N numbers? • How many leaves does the tree have? • What’s the shallowest tree with a given number of leaves? • What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?
Lower Bound for log(n!) Stirling’s approximation:
Insertion Sort Basic idea After kth pass, ensure that first k+1 elements are sorted On kth pass, swap (k+1)th element to left as necessary Start After Pass 1 After Pass 2 After Pass 3 What if array is initially sorted? What if array is initially reverse sorted?
Why Insertion Sort is Slow • Inversion: a pair (i,j) such that i<j butArray[i] > Array[j] • Array of size N can have (N2) inversions • average number of inversions in a random set of elements is N(N-1)/4 • Insertion Sort only swaps adjacent elements • only removes 1 inversion!
HeapSortSorting via Priority Queue (Heap) Worst Case: Best Case: Basic idea: Shove items into a priority queue, take them out smallest to largest.
MergeSort (Table [1..n]) Split Table in half Recursively sort each half Merge two sorted halves together MergeSort Merge (T1[1..n],T2[1..n]) i1=1, i2=1 Whilei1<n, i2<n IfT1[i1] < T2[i2] Next is T1[i1] i1++ Else Next is T2[i2] i2++ End If End While Merging Cars by key [Aggressiveness of driver]. Most aggressive goes first.
MergeSort Analysis • Running Time • Worst case? • Best case? • Average case? • Other considerations besides running time?
QuickSort Picture from PhotoDisc.com Basic idea: Pick a “pivot”. Divide into less-than & greater-than pivot. Sort each side recursively.
QuickSort Partition Pick pivot: Partition with cursors < > 2 goes to less-than < >
QuickSort Partition (cont’d) 6, 8 swap less/greater-than < > 3,5 less-than 9 greater-than Partition done. Recursively sort each side.
Analyzing QuickSort • Picking pivot: constant time • Partitioning: linear time • Recursion: time for sorting left partition (say of size i) + time for right (size N-i-1) T(1) = b T(N) = T(i) + T(N-i-1) + cN where i is the number of elements smaller than the pivot
QuickSort Worst case Pivot is always smallest element. T(N) = T(i) + T(N-i-1) + cN T(N) = T(N-1) + cN = T(N-2) + c(N-1) + cN = T(N-k) + = O(N2)
Optimizing QuickSort • Choosing the Pivot • Randomly choose pivot • Good theoretically and practically, but call to random number generator can be expensive • Pick pivot cleverly • “Median-of-3” rule takes Median(first, middle, last). Works well in practice. • Cutoff • Use simpler sorting technique below a certain problem size (Weiss suggests using insertion sort, with a cutoff limit of 5-20)
QuickSort Best Case Pivot is always middle element. T(N) = T(i) + T(N-i-1) + cN T(N) = 2T(N/2 - 1) + cN
QuickSortAverage Case • Assume all size partitions equally likely, with probability 1/N details: Weiss pg 278-279
External Sorting • When you just ain’t got enough RAM … • e.g. Sort 10 billion numbers with 1 MB of RAM. • Databases need to be very good at this • MergeSort Good for Something! • Basis for most external sorting routines • Can sort any number of records using a tiny amount of main memory • in extreme case, only need to keep 2 records in memory at any one time!
External MergeSort • Split input into two tapes • Each group of 1 records is sorted by definition, so merge groups of 1 to groups of 2, again split between two tapes • Merge groups of 2 into groups of 4 • Repeat until data entirely sorted log N passes
Better External MergeSort • Suppose main memory can hold M records. • Initially read in groups of M records and sort them (e.g. with QuickSort). • Number of passes reduced to log(N/M) • k-way mergesort reduces number of passes to logk(N/M) • Requires 2k output devices (e.g. mag tapes) But wait, there’s more … • Polyphase merge does a k-way mergesort using only k+1 output devices (plus kth-order Fibonacci numbers!)
Sorting by ComparisonSummary • Sorting algorithms that only compare adjacent elements are (N2) worst case – but may be (N) best case • HeapSort - (N log N) both best and worst case • Suffers from two test-ops per data move • MergeSort - (N log N) running time • Suffers from extra-memory problem • QuickSort - (N2) worst case, (N log N) best and average case • In practice, median-of-3 almost always gets us (N log N) • Big win comes from {sorting in place, one test-op, few swaps}! • Any comparison-based sorting algorithm is (N log N) • External sorting: MergeSort with (log N/M) passes