
CSE 326: Data Structures Lecture 23 Spring Quarter 2001 Sorting, Part 1

CSE 326: Data Structures, Lecture 23, Spring Quarter 2001: Sorting, Part 1. David Kaplan, davek@cs. Outline: Sorting: The Problem Space; Sorting by Comparison; Lower bound for comparison sorts; Insertion Sort; Heap Sort; Merge Sort; Quick Sort; External Sorting.


Presentation Transcript


  1. CSE 326: Data Structures, Lecture 23, Spring Quarter 2001. Sorting, Part 1. David Kaplan, davek@cs

  2. Outline • Sorting: The Problem Space • Sorting by Comparison • Lower bound for comparison sorts • Insertion Sort • Heap Sort • Merge Sort • Quick Sort • External Sorting • Comparison of Sorting by Comparison

  3. Sorting: The Problem Space • General problem: given a set of N orderable items, put them in order • Without (significant) loss of generality, assume: • Items are integers • Ordering is $\le$ (Most sorting problems map to the above in linear time.)

  4. Lower Bound for Sorting by Comparison • Sorting by Comparison • Only information available to us is the set of N items to be sorted • Only operation available to us is pairwise comparison between 2 items • What is the best running time we can possibly achieve?

  5. Decision Tree Analysis of Sorting by Comparison

  6. Max depth of decision tree • How many permutations are there of N numbers? • How many leaves does the tree have? • What’s the shallowest tree with a given number of leaves? • What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?
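The questions on this slide can be checked numerically. N distinct items have N! permutations, so the decision tree needs at least N! leaves, and a binary tree with that many leaves has depth at least ceil(log2(N!)). The sketch below computes that bound; the function name `min_comparisons` is an illustrative choice, not something from the lecture.

```python
import math

def min_comparisons(n: int) -> int:
    """Lower bound on worst-case comparisons needed to sort n distinct items."""
    leaves = math.factorial(n)  # one leaf per possible permutation
    # A binary tree of depth d has at most 2**d leaves, so d >= ceil(log2(leaves)).
    # (leaves - 1).bit_length() is an exact integer ceil(log2(leaves)) for leaves >= 1.
    return (leaves - 1).bit_length()

# Compare the exact bound with N * log2(N) for a few sizes.
for n in (3, 5, 10):
    print(n, min_comparisons(n), round(n * math.log2(n), 1))
```

The bound grows in step with N log N, which is the point of the decision-tree argument.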

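  7. Lower Bound for log(n!) • Stirling’s approximation: $n! \approx \sqrt{2\pi n}\,(n/e)^n$ • Taking logarithms: $\log_2(n!) = n\log_2 n - n\log_2 e + O(\log n) = \Omega(n \log n)$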

  8. Insertion Sort • Basic idea: after the kth pass, ensure that the first k+1 elements are sorted • On the kth pass, swap the (k+1)th element to the left as necessary • [Figure: example array at Start, After Pass 1, After Pass 2, After Pass 3] • What if the array is initially sorted? • What if the array is initially reverse sorted?
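A minimal sketch of the idea on this slide, assuming a Python list as the array. The name `insertion_sort` and the sample input are illustrative only.

```python
def insertion_sort(items):
    """Sort a list in place by swapping each new element leftward; returns the list."""
    for k in range(1, len(items)):
        j = k
        # Swap the (k+1)th element to the left while it is smaller than its left neighbor.
        while j > 0 and items[j - 1] > items[j]:
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return items

print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

On an already sorted array the inner loop never runs, so the work is N-1 comparisons. On a reverse-sorted array every element travels all the way left, for N(N-1)/2 swaps.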

  9. Why Insertion Sort is Slow • Inversion: a pair (i,j) such that i<j but Array[i] > Array[j] • An array of size N can have $\Theta(N^2)$ inversions • The average number of inversions in a random set of elements is N(N-1)/4 • Insertion Sort only swaps adjacent elements • Each adjacent swap removes only 1 inversion!
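The inversion argument can be checked directly: since each adjacent swap removes exactly one inversion, the number of swaps insertion sort performs equals the number of inversions in the input. The helper names below are hypothetical, and the brute-force counter is quadratic on purpose, for clarity rather than speed.

```python
from itertools import combinations

def count_inversions(a):
    """Brute-force count of pairs (i, j) with i < j and a[i] > a[j]."""
    return sum(1 for i, j in combinations(range(len(a)), 2) if a[i] > a[j])

def insertion_sort_swaps(a):
    """Run insertion sort on a copy of a and return the number of adjacent swaps."""
    a = list(a)
    swaps = 0
    for k in range(1, len(a)):
        j = k
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            swaps += 1
            j -= 1
    return swaps

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(count_inversions(data), insertion_sort_swaps(data))
```

A reverse-sorted array of N elements reaches the maximum of N(N-1)/2 inversions, which is why the worst case is quadratic.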

  10. HeapSort: Sorting via Priority Queue (Heap) • Basic idea: shove items into a priority queue, take them out smallest to largest • Worst case: $\Theta(N \log N)$ • Best case: $\Theta(N \log N)$
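A short sketch of the priority-queue idea, assuming Python's standard `heapq` module as the binary min-heap. This version returns a new list rather than sorting in place, which is a simplification and not necessarily how the lecture implements it.

```python
import heapq

def heap_sort(items):
    """Return a new sorted list by pushing everything into a min-heap and popping."""
    heap = list(items)
    heapq.heapify(heap)  # build the heap in O(N)
    # Each pop removes the current smallest item in O(log N).
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([9, 4, 7, 1, 8, 2]))
```

N pops at O(log N) each give the N log N total regardless of the input order.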

  11. MergeSort
      MergeSort (Table [1..n])
        Split Table in half
        Recursively sort each half
        Merge two sorted halves together
      Merge (T1[1..n], T2[1..n])
        i1 = 1, i2 = 1
        While i1 <= n and i2 <= n
          If T1[i1] < T2[i2]
            Next is T1[i1]
            i1++
          Else
            Next is T2[i2]
            i2++
          End If
        End While
        Append whatever remains of T1 or T2
      [Figure: merging cars by key (aggressiveness of driver). Most aggressive goes first.]
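The MergeSort and Merge steps on this slide could be written in Python roughly as follows. This is a sketch with 0-based lists and halves of possibly different lengths; it uses `<=` in the comparison so that equal keys keep their original order, which is an added choice rather than something the slide specifies.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; copy whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort(table):
    """Return a sorted copy of table: split in half, sort each half, merge."""
    if len(table) <= 1:
        return list(table)
    mid = len(table) // 2
    return merge(merge_sort(table[:mid]), merge_sort(table[mid:]))

print(merge_sort([8, 3, 5, 1, 9, 2, 7]))
```

The `merged` list is the extra memory the algorithm needs: merging is not done in place.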

  12. MergeSort Analysis • Running Time • Worst case? • Best case? • Average case? • Other considerations besides running time?

  13. QuickSort • Basic idea: Pick a “pivot”. Divide into less-than & greater-than pivot. Sort each side recursively.

  14. QuickSort Partition • Pick pivot • Partition with two cursors (< and >) • 2 goes to less-than • [Figure: example array with the < and > cursors]

  15. QuickSort Partition (cont’d) • 6, 8 swap less/greater-than • 3, 5 less-than • 9 greater-than • Partition done. Recursively sort each side. • [Figure: example array with the < and > cursors]
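A sketch of partitioning with two cursors that move toward each other and swap out-of-place pairs. It follows Hoare's partition scheme with the first element as pivot, which is an assumption: the slide's figure may place the pivot or move the cursors slightly differently. The names `partition` and `quick_sort` are illustrative.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[lo] with two cursors; return the split index."""
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:  # left cursor skips items already on the less-than side
            i += 1
        j -= 1
        while a[j] > pivot:  # right cursor skips items already on the greater-than side
            j -= 1
        if i >= j:           # cursors met or crossed: partition done
            return j
        a[i], a[j] = a[j], a[i]  # swap a misplaced pair

def quick_sort(a, lo=0, hi=None):
    """Sort list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p)      # recursively sort each side
        quick_sort(a, p + 1, hi)
    return a

print(quick_sort([7, 2, 6, 8, 3, 5, 9]))
```

Each call to `partition` touches every element of its range a constant number of times, so partitioning is linear.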

  16. Analyzing QuickSort • Picking pivot: constant time • Partitioning: linear time • Recursion: time for sorting left partition (say of size i) + time for right (size N-i-1) • Recurrence: $T(1) = b$, $T(N) = T(i) + T(N-i-1) + cN$, where i is the number of elements smaller than the pivot

  17. QuickSort Worst Case • Pivot is always the smallest element, so i = 0. • $T(N) = T(i) + T(N-i-1) + cN$ becomes $T(N) = T(N-1) + cN = T(N-2) + c(N-1) + cN = T(N-k) + c\sum_{j=0}^{k-1}(N-j) = O(N^2)$

  18. Optimizing QuickSort • Choosing the Pivot • Randomly choose pivot • Good theoretically and practically, but call to random number generator can be expensive • Pick pivot cleverly • “Median-of-3” rule takes Median(first, middle, last). Works well in practice. • Cutoff • Use simpler sorting technique below a certain problem size (Weiss suggests using insertion sort, with a cutoff limit of 5-20)
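The two optimizations on this slide might be combined as in the sketch below. The cutoff value of 10 is an assumed choice inside the 5-20 range the slide quotes, and all function names are hypothetical. Partitioning again uses Hoare's two-cursor scheme, which is one possible choice and not necessarily the one in the lecture or in Weiss.

```python
CUTOFF = 10  # assumed value within the 5-20 range quoted on the slide

def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    candidates = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return candidates[1][1]

def insertion_sort_range(a, lo, hi):
    """Insertion sort restricted to a[lo..hi]."""
    for k in range(lo + 1, hi + 1):
        j = k
        while j > lo and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1

def quick_sort(a, lo=0, hi=None):
    """Sort list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:          # small problem: use the simpler sort
        insertion_sort_range(a, lo, hi)
        return a
    m = median_of_three(a, lo, hi)
    a[lo], a[m] = a[m], a[lo]          # move the chosen pivot to the front
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:                        # two-cursor partition
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    quick_sort(a, lo, j)
    quick_sort(a, j + 1, hi)
    return a
```

With median-of-3, already sorted input no longer triggers the worst case, because the middle element becomes the pivot.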

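  19. QuickSort Best Case • Pivot is always the middle (median) element, so the two partitions are roughly equal in size. • $T(N) = T(i) + T(N-i-1) + cN$ with $i \approx N/2$ gives $T(N) = 2T(N/2 - 1) + cN$, which solves to $O(N \log N)$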

  20. QuickSort Average Case • Assume all partition sizes are equally likely, each with probability 1/N • Details: Weiss, pp. 278-279
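As a sketch of where that assumption leads (the standard argument, not a transcription of the pages cited): averaging the recurrence $T(N) = T(i) + T(N-i-1) + cN$ over the N equally likely values of i gives

```latex
\begin{aligned}
T(N) &= \frac{1}{N}\sum_{i=0}^{N-1}\bigl[T(i) + T(N-i-1)\bigr] + cN \\
     &= \frac{2}{N}\sum_{i=0}^{N-1} T(i) + cN ,
\end{aligned}
\qquad\text{which solves to } T(N) = O(N \log N).
```

The second line holds because $T(i)$ and $T(N-i-1)$ run over the same set of values as i goes from 0 to N-1.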

  21. External Sorting • When you just ain’t got enough RAM … • e.g. Sort 10 billion numbers with 1 MB of RAM. • Databases need to be very good at this • MergeSort Good for Something! • Basis for most external sorting routines • Can sort any number of records using a tiny amount of main memory • in extreme case, only need to keep 2 records in memory at any one time!

  22. External MergeSort • Split input into two tapes • Each group of 1 record is sorted by definition, so merge groups of 1 into groups of 2, again split between two tapes • Merge groups of 2 into groups of 4 • Repeat until the data is entirely sorted: log N passes

  23. Better External MergeSort • Suppose main memory can hold M records. • Initially read in groups of M records and sort them (e.g. with QuickSort). • Number of passes reduced to $\log(N/M)$ • k-way mergesort reduces number of passes to $\log_k(N/M)$ • Requires 2k output devices (e.g. mag tapes) • But wait, there’s more … • Polyphase merge does a k-way mergesort using only k+1 output devices (plus kth-order Fibonacci numbers!)
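The pass counts above can be illustrated with a small simulation. This sketch keeps the "tapes" as in-memory Python lists, which is an assumption made purely for illustration; a real external sort would stream runs to and from files. `external_merge_sort` is a hypothetical name, and `heapq.merge` from the standard library does the k-way merging.

```python
import heapq

def external_merge_sort(records, m, k=2):
    """Simulate external mergesort: sorted runs of m records, then repeated k-way merges.

    Returns (sorted_records, number_of_merge_passes).
    """
    # Initial pass: read m records at a time and sort each run in memory.
    runs = [sorted(records[i:i + m]) for i in range(0, len(records), m)]
    passes = 0
    while len(runs) > 1:
        # One merge pass: each group of k runs becomes a single longer run.
        runs = [list(heapq.merge(*runs[i:i + k])) for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes

data = list(range(100, 0, -1))
print(external_merge_sort(data, m=10, k=2)[1])  # merge passes with 2-way merging
print(external_merge_sort(data, m=10, k=4)[1])  # merge passes with 4-way merging
```

With N = 100 and M = 10 there are 10 initial runs, so the number of merge passes is the ceiling of $\log_k 10$.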

  24. Sorting by Comparison: Summary • Sorting algorithms that only compare adjacent elements are $\Theta(N^2)$ worst case, but may be $\Theta(N)$ best case • HeapSort: $\Theta(N \log N)$ both best and worst case • Suffers from two test-ops per data move • MergeSort: $\Theta(N \log N)$ running time • Suffers from extra-memory problem • QuickSort: $\Theta(N^2)$ worst case, $\Theta(N \log N)$ best and average case • In practice, median-of-3 almost always gets us $\Theta(N \log N)$ • Big win comes from {sorting in place, one test-op, few swaps}! • Any comparison-based sorting algorithm is $\Omega(N \log N)$ • External sorting: MergeSort with $\Theta(\log(N/M))$ passes
