Sorting

Sorting Chapter 7

Motivation of Sorting
• The term "list" here means a collection of records.
• Each record has one or more fields.
• Each record has a key that distinguishes it from the other records.
• For example, a phone directory is a list. The name, phone number, or even address can serve as the key, depending on the application.

Sorting
• Two ways to store a collection of records:
  • Sequential
  • Non-sequential
• Assume a sequential list f. To retrieve a record with key f[i].key from such a list, we can examine the keys in the following order:
  • f[n].key, f[n-1].key, …, f[1].key => sequential search
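The scan order on this slide can be sketched as follows. This is a minimal illustration, not the course's code: the `Record` type, the `SeqSearch` name, and the convention of returning 0 for "not found" are assumptions made here, and slot 0 of the vector is left unused so that indices match the 1-based notation f[1..n].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type: the slides only say that a record has a key field.
struct Record {
    int key;
};

// Sequential search over f[1..n], scanning from f[n] down to f[1].
// Returns the index of the first match found, or 0 if the key is absent.
// f[0] is unused so that indices follow the 1-based slide notation.
int SeqSearch(const std::vector<Record>& f, int n, int k) {
    assert(n >= 0 && static_cast<std::size_t>(n) < f.size());
    for (int i = n; i >= 1; --i) {
        if (f[i].key == k) return i;
    }
    return 0;
}
```

In the worst case the loop inspects all n keys, which is the cost that sorting the list first (and then searching it more cleverly) is meant to reduce.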

Insertion Sort
template <class T>
void InsertionSort(T *a, const int n)
{ // Sort a[1:n] into increasing (nondecreasing) order.
    for (int j = 2; j <= n; j++) {
        T temp = a[j];
        Insert(temp, a, j - 1);
    }
}
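The driver on this slide calls `Insert`, which the slide does not show. A plausible sketch is given below, assuming (this is an assumption, not stated on the slide) that a[0] is a free slot that may be overwritten as a sentinel; the driver is repeated so the example is self-contained.

```cpp
#include <cassert>

// Insert e into the sorted run a[1:i] so that a[1:i+1] ends up sorted.
// Assumption: a[0] holds no data and may be overwritten as a sentinel.
template <class T>
void Insert(const T& e, T* a, int i) {
    assert(i >= 0);
    a[0] = e;                 // sentinel: the loop below always stops at index 0
    while (e < a[i]) {
        a[i + 1] = a[i];      // shift each larger record one slot to the right
        --i;
    }
    a[i + 1] = e;
}

// Sort a[1:n] into nondecreasing order by repeated insertion.
template <class T>
void InsertionSort(T* a, const int n) {
    for (int j = 2; j <= n; ++j) {
        T temp = a[j];        // copy first: a[j] may be overwritten by the shifts
        Insert(temp, a, j - 1);
    }
}
```

For instance, calling `InsertionSort(a, 6)` on an array holding 5, 2, 4, 6, 1, 3 in a[1..6] leaves 1, 2, 3, 4, 5, 6 there. Because equal keys are never moved past each other, this version is stable.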

Insertion sort
• Sorted "in place", i.e., it uses n + Θ(1) space.
• Example: 5, 2, 4, 6, 1, 3 … 1, 2, 4, 5, 6, 3 (key)

Insertion Sort Variations
• Binary insertion sort:
  • Uses binary search to reduce the number of comparisons in an insertion sort.
  • The number of record moves remains the same.
• List insertion sort:
  • The number of record moves becomes zero because only the link fields require adjustment.
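The binary variation can be sketched as below. The function name and the choice to insert after any equal keys (which keeps the sort stable) are assumptions for illustration; a[0] is unused to match the 1-based indexing of the slides.

```cpp
#include <cassert>

// Binary insertion sort on a[1:n]; a[0] is unused.
template <class T>
void BinaryInsertionSort(T* a, const int n) {
    assert(n >= 0);
    for (int j = 2; j <= n; ++j) {
        T temp = a[j];
        // Binary search in the sorted run a[1:j-1] for the first position
        // whose record is greater than temp; the answer lies in [lo, hi].
        int lo = 1, hi = j;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (temp < a[mid]) hi = mid;
            else lo = mid + 1;            // equal keys stay on the left
        }
        // The record moves are the same as in plain insertion sort.
        for (int k = j; k > lo; --k) a[k] = a[k - 1];
        a[lo] = temp;
    }
}
```

Each insertion now needs O(lg j) comparisons, but the shifting loop still costs O(j) moves, so the overall worst case stays quadratic.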

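Quick Sort
• The basic idea of quicksort (Hoare, 1962): pick an element Ap at random from A1, A2, …, An and use it as the pivot element. Move every element smaller than Ap to its left and every element larger than Ap to its right:
  (smaller than Ap) ……… Ap ……… (larger than Ap)
  Then recursively sort the left and right subsequences and combine them to obtain the sorted sequence.
• Hoare also devised a way to carry out quicksort "in place".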

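Example
i → 5, 3, 2, 6, 4, 1, 3, 7 ← j
Pivot element x = Ap = 5
• Move j toward the front. Skip every element larger than Ap = 5 and pause at the first element smaller than Ap.
• Once j pauses, move i toward the back. Skip every element smaller than Ap and pause at the first element larger than or equal to Ap.
• Swap the elements that i and j point to; then it is j's turn to move again.
• Repeat this cycle until i and j meet; the partition is then complete.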
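The two-pointer scheme in this example can be sketched with the common textbook form of Hoare's partition. It is a variant rather than a literal transcription of the slide: here both pointers also stop on elements equal to the pivot, bounds are 0-based and inclusive, and the function names are chosen for illustration.

```cpp
#include <cassert>
#include <utility>

// Hoare-style partition of a[lo..hi] around the pivot x = a[lo].
// Returns j such that every element of a[lo..j] is <= every element of a[j+1..hi].
template <class T>
int HoarePartition(T* a, int lo, int hi) {
    assert(lo <= hi);
    T x = a[lo];
    int i = lo - 1, j = hi + 1;
    while (true) {
        do { --j; } while (x < a[j]);   // j moves left past elements larger than x
        do { ++i; } while (a[i] < x);   // i moves right past elements smaller than x
        if (i >= j) return j;           // the pointers have met: partition is done
        std::swap(a[i], a[j]);
    }
}

// In-place quicksort of a[lo..hi] built on the partition above.
template <class T>
void QuickSort(T* a, int lo, int hi) {
    if (lo >= hi) return;
    int p = HoarePartition(a, lo, hi);
    QuickSort(a, lo, p);                // note: p, not p - 1, with this partition
    QuickSort(a, p + 1, hi);
}
```

On the input 5, 3, 2, 6, 4, 1, 3, 7 the first partition performs two swaps and splits the array into 3, 3, 2, 1, 4 and 6, 5, 7, which are then sorted recursively.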

Example 7.3

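Space complexity: O(n)
Time complexity?
• If the first element is always taken as the pivot, the worst case is T(n) = T(n-1) + Θ(n), which occurs when the input is already sorted => O(n^2).
How can the worst case be avoided? How can the algorithm be sped up?
• Try to find a pivot that is close to the middle element.
• For example, pick three to five samples from the sequence and use their median as the pivot element.
• For example, spend some effort finding the median of the whole sequence and use it as the pivot element.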
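One way to realize the "median of a few samples" idea is the median-of-three rule sketched below. The choice of the first, middle, and last elements as the three samples is an assumption made for this illustration; the slide does not say which samples to take.

```cpp
#include <cassert>

// Returns the index (one of lo, mid, hi) that holds the median of
// a[lo], a[mid], a[hi], where mid is the middle position of a[lo..hi].
template <class T>
int MedianOfThree(const T* a, int lo, int hi) {
    assert(lo <= hi);
    int mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) {
        if (a[hi] < a[mid]) return mid;        // hi < mid < lo
        return (a[hi] < a[lo]) ? hi : lo;      // mid <= hi < lo, or mid < lo <= hi
    }
    if (a[hi] < a[lo]) return lo;              // hi < lo <= mid
    return (a[hi] < a[mid]) ? hi : mid;        // lo <= hi < mid, or lo <= mid <= hi
}
```

A quicksort that swaps the returned element into the pivot position before partitioning no longer degrades on already sorted input, since the middle element is then chosen as the pivot.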

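Partition around a "Random" Element
• The hope is for good performance on "average".
• The pivot element does not actually need to be the exact middle; it is enough that each side keeps a fixed proportion of the elements.
• For example, if every partition gives
  (99% of n) ……… Ap ……… (1% of n)
  then T(n) = T(0.99n) + T(0.01n) + n => T(n) = O(n lg n).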
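A recursion-tree sketch of why the 99%/1% split still gives O(n lg n): every level of the tree costs at most n in total, and the deepest path shrinks the problem by a factor of 0.99 per level. The constant 69 is approximate.

```latex
\begin{align*}
T(n) &= T(0.99n) + T(0.01n) + n \\
\text{cost of each level} &\le n \\
\text{depth of the deepest path} &= \log_{100/99} n
  = \frac{\lg n}{\lg(100/99)} \approx 69 \lg n \\
T(n) &\le n \left( \log_{100/99} n + 1 \right) = O(n \lg n)
\end{align*}
```

The constant factor is large, but it is still a constant; any fixed split ratio gives the same asymptotic bound.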

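Randomized Version of Quick Sort
• When choosing the pivot element, use a random number generator to pick p at random from 1 to n, and use Ap as the pivot element.
• The average-case time complexity can then be analyzed under the assumption that all n! orderings of A1, …, An are equally likely.
• In effect, this is achieved by making the pivot element Ap equally likely to be any of the Ai, i = 1 to n.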
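A minimal sketch of the randomized version, assuming 0-based inclusive bounds and using `std::mt19937` with `std::uniform_int_distribution` as the random number generator (the slide does not name one). The partition loop is the usual Hoare-style scheme with the random element first swapped to the front.

```cpp
#include <cassert>
#include <random>
#include <utility>

// Randomized quicksort of a[lo..hi] (inclusive bounds).
template <class T>
void RandomizedQuickSort(T* a, int lo, int hi, std::mt19937& rng) {
    if (lo >= hi) return;
    // Pick p uniformly in [lo, hi] and move a[p] to the front as the pivot.
    std::uniform_int_distribution<int> pick(lo, hi);
    std::swap(a[lo], a[pick(rng)]);
    T x = a[lo];
    int i = lo - 1, j = hi + 1;
    while (true) {
        do { --j; } while (x < a[j]);   // skip elements larger than the pivot
        do { ++i; } while (a[i] < x);   // skip elements smaller than the pivot
        if (i >= j) break;
        std::swap(a[i], a[j]);
    }
    RandomizedQuickSort(a, lo, j, rng);
    RandomizedQuickSort(a, j + 1, hi, rng);
}
```

With a random pivot, no particular input (such as an already sorted one) reliably triggers the quadratic worst case; the worst case still exists but depends on the random choices rather than on the input order.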

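Decision Tree Model
[Figure: decision tree for sorting three elements a1, a2, a3. The root compares a1:a2; each internal node compares a pair ai:aj and branches on ≤ or >; the six leaves are the permutations <1,2,3>, <1,3,2>, <3,1,2>, <2,1,3>, <2,3,1>, <3,2,1>.]
• How many leaves does a decision tree have? Here, 3!.
• Every comparison-based sorting algorithm can be pictured as a corresponding decision tree.
• The worst-case number of comparisons of the algorithm is the height of that decision tree.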

Theorem
Any decision tree that sorts n elements has height Ω(n lg n).
Proof: There are n! possible outcomes, so the decision tree has at least n! leaves. A binary tree of height h has at most 2^h leaves. Thus 2^h ≥ n!, and h ≥ lg(n!) ≥ lg((n/e)^n) = n lg n − n lg e = Ω(n lg n). (The second inequality follows from Stirling's approximation: n! > (n/e)^n.)

Merge Sort
template <class T>
void MergeSort(T *a, const int n)
{ // Sort the array a[1:n] into nondecreasing order.
    T *tempList = new T[n + 1];
    // l is the length of the sublists currently being merged.
    for (int l = 1; l < n; l *= 2) {
        MergePass(a, tempList, n, l);
        l *= 2;
        MergePass(tempList, a, n, l); // interchange the roles of a and tempList
    }
    delete [] tempList;
}
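`MergeSort` relies on `MergePass`, which this slide does not show. The sketch below fills it in under stated assumptions: a helper `Merge` (a name chosen here) merges two adjacent sorted runs, and `MergePass` merges adjacent pairs of runs of length s from one array into the other. The driver is repeated so the example compiles on its own.

```cpp
#include <algorithm>
#include <cassert>

// Merge the sorted runs init[l:m] and init[m+1:n] into merged[l:n].
template <class T>
void Merge(T* init, T* merged, const int l, const int m, const int n) {
    int i1 = l, i2 = m + 1, out = l;
    for (; i1 <= m && i2 <= n; ++out) {
        if (init[i2] < init[i1]) merged[out] = init[i2++];
        else merged[out] = init[i1++];      // ties take the left run: stable
    }
    // At most one of these two copies moves any records.
    std::copy(init + i1, init + m + 1, merged + out);
    std::copy(init + i2, init + n + 1, merged + out);
}

// Merge adjacent pairs of runs of length s from init[1:n] into result[1:n].
template <class T>
void MergePass(T* init, T* result, const int n, const int s) {
    int i;
    for (i = 1; i <= n - 2 * s + 1; i += 2 * s)
        Merge(init, result, i, i + s - 1, i + 2 * s - 1);
    if (i + s - 1 < n) Merge(init, result, i, i + s - 1, n);  // one full run plus a short one
    else std::copy(init + i, init + n + 1, result + i);        // a single leftover run
}

// Iterative merge sort of a[1:n]; a[0] is unused.
template <class T>
void MergeSort(T* a, const int n) {
    assert(n >= 0);
    T* tempList = new T[n + 1];
    for (int l = 1; l < n; l *= 2) {
        MergePass(a, tempList, n, l);
        l *= 2;
        MergePass(tempList, a, n, l);       // results always end up back in a
    }
    delete[] tempList;
}
```

Doing two passes per loop iteration guarantees that the final result is in `a` even when the number of passes would otherwise be odd; the second pass degenerates to a plain copy once l ≥ n.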

Example 7.5

Recursive Merge Sort
template <class T>
int rMergeSort(T* a, int* link, const int left, const int right)
{ // Sort a[left:right]. link[i] is initialized to 0 for all i.
  // rMergeSort returns the index of the first element of the sorted chain.
    if (left >= right) return left;
    int mid = (left + right) / 2;
    return ListMerge(a, link,
                     rMergeSort(a, link, left, mid),       // sort the left half
                     rMergeSort(a, link, mid + 1, right)); // sort the right half
}
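`rMergeSort` depends on `ListMerge`, which is not shown on this slide. A possible sketch follows, assuming that index 0 marks the end of a chain and that link[0] may be used as a temporary list header; both conventions are assumptions consistent with "link[i] is initialized to 0" but not spelled out on the slide.

```cpp
#include <cassert>

// Merge the two sorted chains starting at start1 and start2.
// Returns the index of the first record of the merged chain.
// Index 0 terminates a chain; link[0] is used as a temporary header.
template <class T>
int ListMerge(T* a, int* link, const int start1, const int start2) {
    int last = 0;                          // last record appended so far
    int i1 = start1, i2 = start2;
    while (i1 != 0 && i2 != 0) {
        if (a[i2] < a[i1]) { link[last] = i2; last = i2; i2 = link[i2]; }
        else               { link[last] = i1; last = i1; i1 = link[i1]; }
    }
    link[last] = (i1 == 0) ? i2 : i1;      // attach whatever chain remains
    return link[0];
}

// Sort a[left:right] by relinking; no records are moved.
template <class T>
int rMergeSort(T* a, int* link, const int left, const int right) {
    assert(left >= 1);                     // index 0 is reserved as the chain terminator
    if (left >= right) return left;
    int mid = (left + right) / 2;
    return ListMerge(a, link,
                     rMergeSort(a, link, left, mid),
                     rMergeSort(a, link, mid + 1, right));
}
```

After the call, the sorted order is read by starting at the returned index and following `link` until it reaches 0; the array `a` itself is left untouched.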

Natural Merge Sort

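Heap Sort
• Heap sort contributes many ideas to theoretical analysis. In practice it is not only a useful sorting algorithm but also provides a useful data structure.
• Heap: an array A viewed as a nearly complete binary tree with the "heap property" A[Parent(i)] ≥ A[i], so each parent is at least as large as its two children.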

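A = 16 14 10 8 7 9 3 2 4 1
• Implementing a heap is therefore much simpler in practice: there is no need to build a binary tree, because a single array is enough to represent the heap.
• The array A above is a heap.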

[Figure: the heap A = 16 14 10 8 7 9 3 2 4 1 drawn as a binary tree with nodes numbered 1 to 10.]
• Once an array forms a heap, some operations become easy to carry out (meaning they finish within O(lg n) time steps), and sorting with the heap is fast as well. For example:
[Figure: heap sort steps (a) to (j). In each step the root is exchanged with the last element of the heap, the heap shrinks by one, and HEAPIFY restores the heap property; the panels show 16, then 14, then 10 reaching their final positions at the end of the array.]
• The result is sorted, and the sort is "in place". Question: is it stable?

[Figure: Heapify(A,2) on A = 16 4 10 14 7 9 3 2 8 1 exchanges A[2] = 4 with A[4] = 14; Heapify(A,4) then exchanges 4 with A[9] = 8; Heapify(A,9) makes no further change, leaving A = 16 14 10 8 7 9 3 2 4 1.]
• Time complexity of HEAPIFY(A, n)? Intuitively, the worst case is O(height of the tree), i.e., O(lg n).
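The sift-down step and the sort built on it can be sketched as follows. The slides write Heapify(A, i); the extra heap-size parameter n, the iterative (rather than recursive) form, and the bottom-up build inside `HeapSort` are choices made for this sketch. Indices are 1-based with a[0] unused.

```cpp
#include <cassert>
#include <utility>

// Restore the max-heap property at node i of the heap stored in a[1:n],
// assuming both subtrees of i are already heaps.
template <class T>
void Heapify(T* a, int i, const int n) {
    assert(i >= 1);
    while (true) {
        int largest = i, l = 2 * i, r = 2 * i + 1;
        if (l <= n && a[largest] < a[l]) largest = l;
        if (r <= n && a[largest] < a[r]) largest = r;
        if (largest == i) return;          // heap property holds at i
        std::swap(a[i], a[largest]);
        i = largest;                       // continue one level further down
    }
}

// In-place heap sort of a[1:n] into nondecreasing order.
template <class T>
void HeapSort(T* a, const int n) {
    for (int i = n / 2; i >= 1; --i) Heapify(a, i, n);  // build the heap bottom-up
    for (int last = n; last >= 2; --last) {
        std::swap(a[1], a[last]);          // move the current maximum to its final slot
        Heapify(a, 1, last - 1);           // re-heapify the shrunken heap
    }
}
```

The loop in `Heapify` descends one level per iteration, which is where the O(lg n) bound comes from. The long-distance swaps in `HeapSort` can reorder equal keys, so heap sort is not stable.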

Build Heap
Index: 1 2 3 4 5 6 7 8 9 10
A:     4 1 3 2 16 9 10 14 8 7
• Top-down (left to right): first make A[1] a heap, then A[1], A[2]; once A[1], A[2], A[3] are in order, continue with A[1], A[2], A[3], A[4].
• Then A[1] to A[5]: A[5] turns out to be larger, so the larger element "floats up".
[Figure: trees for each prefix, with i marking the newly added node; after A[5] = 16 floats up, the first five elements are 16 4 3 1 2.]

• Next, A[6] is added and floats up, giving 16 4 9 1 2 3.
• A[7] is added and floats up, giving 16 4 10 1 2 3 9.
• A[8] is added and floats up; then A[9] is added and floats up; then A[10] is added and floats up.
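The top-down construction walked through on these slides can be sketched as a loop that adds A[2], A[3], …, A[n] one at a time and lets each new element float up. The function name is hypothetical; indices are 1-based with a[0] unused.

```cpp
#include <cassert>
#include <utility>

// Build a max-heap in a[1:n] top-down: a[1:k-1] is already a heap
// when a[k] is added, and a[k] floats up until its parent is not smaller.
template <class T>
void BuildHeapTopDown(T* a, const int n) {
    assert(n >= 0);
    for (int k = 2; k <= n; ++k) {
        int i = k;
        while (i > 1 && a[i / 2] < a[i]) {
            std::swap(a[i / 2], a[i]);     // the larger element floats up one level
            i /= 2;
        }
    }
}
```

Each insertion can climb the full height of the tree, so this construction takes O(n lg n) time in the worst case, whereas building the heap bottom-up with sift-down steps takes O(n).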

Radix Sort
• In radix sort, we decompose the sort key using some radix r.
• The number of bins needed is r.
• Assume the records R1, R2, …, Rn are to be sorted based on a radix of r. Each key has d digits in the range 0 to r-1.

Radix Sort
• Assume each record has a link field. Then the records in the same bin are linked together into a chain:
  • f[i], 0 ≤ i < r (the pointer to the first record in bin i)
  • e[i], 0 ≤ i < r (the pointer to the last record in bin i)
• The chain operates as a queue.
• Each record is assumed to have an array key[d], with 0 ≤ key[i] < r for 0 ≤ i < d.

Radix Sort
Idea: Sort the least significant digit first.
Input   After 1s digit   After 10s digit   After 100s digit
329     720              720               329
457     355              329               355
657     436              436               436
839     457              839               457
436     657              355               657
720     329              457               720
355     839              657               839
• The sort on each column must be a stable sort!
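The least-significant-digit-first idea can be sketched as below. For brevity this version keeps each bin in a `std::vector` used as a queue instead of the linked chains with f[i] and e[i] described earlier, and it assumes the keys are non-negative integers with at most d digits in radix r; both are simplifications made for this sketch.

```cpp
#include <cassert>
#include <vector>

// LSD radix sort of non-negative integer keys with at most d digits in radix r.
void RadixSort(std::vector<int>& keys, const int d, const int r) {
    assert(r >= 2 && d >= 1);
    std::vector<std::vector<int>> bins(r);
    int divisor = 1;                       // r^pass, selects the current digit
    for (int pass = 0; pass < d; ++pass) {
        // Distribute: appending preserves arrival order, so each pass is stable.
        for (int k : keys) bins[(k / divisor) % r].push_back(k);
        // Collect: concatenate bins 0, 1, ..., r-1 back into keys.
        keys.clear();
        for (auto& bin : bins) {
            keys.insert(keys.end(), bin.begin(), bin.end());
            bin.clear();
        }
        divisor *= r;
    }
}
```

Calling `RadixSort(keys, 3, 10)` on 329, 457, 657, 839, 436, 720, 355 reproduces the three passes of the example and ends with 329, 355, 436, 457, 657, 720, 839. Each pass costs O(n + r), for O(d(n + r)) overall.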
