Learn about external sorting, a classic problem in computer science, used to sort data larger than memory and minimize disk accesses. Understand the principles and cost of multi-pass sorting algorithms.
External Sorting Chapter 13
Why Sort? • A classic problem in computer science! • Data is often requested in sorted order • e.g., find students in increasing GPA order • Sorting is the first step in bulk loading a B+ tree index. • Sorting is useful for eliminating duplicate copies in a collection of records (Why?) • The sort-merge join algorithm involves sorting. • Problem: sort 1 GB of data with 1 MB of RAM. • Why not virtual memory?
External Sort • Definition: sorting data that is larger than available memory • External sorting cannot be done in one step • Goal: minimize the number of disk accesses! • As a result, most in-memory sort algorithms are not effective for external sorting.
Principle • Multiple steps • Split the data into pieces that fit into main memory • Sort the pieces with conventional sorting algorithms • Merge these so-called runs to build the completely sorted data set
2-Way External Merge Sort: Memory? • Pass 0: Read a page, sort it, write it. • How many buffer pages are needed? • Pass 1, 2, 3, …, etc.: • How many buffer pages are needed? (Figure: two input buffers and one output buffer in main memory; the result of pass k is the input to pass k+1)
2-Way Sort: Requires 3 Buffers • Pass 0: Read a page, sort it, write it. • Only one buffer page is used. • Pass 1, 2, …, etc.: • Read two pages, merge them, and write out the merged pages. • Three buffer pages are used. (Figure: two input buffers feed one output buffer, between reads from and writes to disk)
2-Way External Merge Sort: Passes? • Assume the file is N pages • Run = sorted subfile • What happens in pass 0? • How many runs are produced? • What is the cost in I/Os? • What happens in pass 1? • What happens in pass i? • How many passes are required? • What is the total cost?
Two-Way External Merge Sort: Cost • Each pass we read + write each page in the file. • N pages in the file => the number of passes is ⌈log₂ N⌉ + 1. • So total cost is 2N(⌈log₂ N⌉ + 1). • Idea: divide and conquer: sort subfiles and merge. (Figure, N = 7 pages. Input file: [3,4][6,2][9,4][8,7][5,6][3,1][2]. Pass 0, 1-page runs: [3,4][2,6][4,9][7,8][5,6][1,3][2]. Pass 1, 2-page runs: [2,3,4,6][4,7,8,9][1,3,5,6][2]. Pass 2, 4-page runs: [2,3,4,4,6,7,8,9][1,2,3,5,6]. Pass 3, 8-page run: [1,2,2,3,3,4,4,5,6,6,7,8,9])
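The passes above can be simulated in a few lines. This is a sketch, not a real disk-based implementation: each "page" is a small Python list, and the function name `two_way_external_sort` is hypothetical.

```python
import heapq

def two_way_external_sort(pages):
    """Simulate 2-way external merge sort; each 'page' is a small list."""
    runs = [sorted(p) for p in pages]    # Pass 0: sort each page -> 1-page runs
    passes = 1
    while len(runs) > 1:                 # Passes 1, 2, ...: merge runs pairwise
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes

# The 7-page example file from the slide:
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
data, passes = two_way_external_sort(pages)
# 7 one-page runs -> 4 -> 2 -> 1 run: ceil(log2(7)) + 1 = 4 passes in total
```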
General External Merge Sort • More than 3 buffer pages. How can we utilize them? • Key Insight #1: We can merge more than 2 input buffers at a time… affects fanout -> base of log! • Key Insight #2: The output buffer is generated incrementally, so only one buffer page is needed for any size of run!
General External Merge Sort • More than 3 buffer pages. How can we utilize them? • To sort a file with N pages using B buffer pages: • Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B pages each. • Pass 1, 2, …, etc.: merge B-1 runs at a time. (Figure: B-1 input buffers and one output buffer fill the B main memory buffers)
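The two key insights above can be sketched together. Again this is an in-memory simulation with a hypothetical name (`external_sort`); `heapq.merge` stands in for the (B-1)-way merge with one page reserved for output.

```python
import heapq

def external_sort(pages, B):
    """Sketch of general external merge sort; each 'page' is a list."""
    # Pass 0: read B pages at a time, sort in memory -> runs of B pages each.
    runs = [sorted(x for page in pages[i:i + B] for x in page)
            for i in range(0, len(pages), B)]
    # Passes 1, 2, ...: merge B-1 runs at a time (one buffer holds the output).
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + B - 1]))
                for i in range(0, len(runs), B - 1)]
    return runs[0]
```

With B = 3 buffers, pass 0 produces 3-page runs and later passes are 2-way merges, which is exactly the 2-way scheme with a bigger pass 0.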
Cost of External Merge Sort • Number of passes: 1 + ⌈log_{B-1} ⌈N/B⌉⌉ • Cost = 2N * (# of passes) • E.g., with 5 buffer pages, to sort a 108-page file: • Number of passes is 1 + ⌈log₄ ⌈108/5⌉⌉ = 1 + ⌈log₄ 22⌉ = 4 • Cost is 2(108)*4 = 864 I/Os • Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages) • Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages) • Pass 2: 2 sorted runs, 80 pages and 28 pages • Pass 3: sorted file of 108 pages
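The pass count is easiest to compute by just iterating the merge, which sidesteps rounding questions in the closed-form logarithm. A small sketch reproducing the 108-page example:

```python
from math import ceil

def num_passes(N, B):
    """1 + ceil(log_{B-1}(ceil(N/B))), computed by iterating the merges."""
    runs = ceil(N / B)            # runs produced by pass 0
    passes = 1
    while runs > 1:               # each later pass is a (B-1)-way merge
        runs = ceil(runs / (B - 1))
        passes += 1
    return passes

def sort_cost(N, B):
    return 2 * N * num_passes(N, B)   # each pass reads and writes every page

# Slide example: N = 108 pages, B = 5 buffers -> runs 22 -> 6 -> 2 -> 1,
# so 4 passes and 2 * 108 * 4 = 864 I/Os.
```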
How much memory is needed to sort a file in one or two passes? • For example, if pages are 4KBytes, a 4GByte file can be sorted in Two Passes with ? buffers.
How much memory is needed to sort a file in one or two passes? • N = number of data pages, B = memory pages available • To sort in one pass: N ≤ B • To sort in two passes: 1 + ⌈log_{B-1} ⌈N/B⌉⌉ ≤ 2, i.e., ⌈N/B⌉ ≤ (B-1)¹ • Approximating B-1 by B, this yields N ≤ B² • For example, if pages are 4 KBytes, a 4 GByte file can be sorted in two passes with ? buffers.
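Under the N ≤ B² approximation, the example works out as follows (a sketch; the variable names are illustrative):

```python
from math import ceil, sqrt

PAGE = 4 * 1024                  # 4 KB pages
N = (4 * 1024**3) // PAGE        # a 4 GB file = 2**20 = 1,048,576 pages
B = ceil(sqrt(N))                # two-pass condition N <= B^2  =>  B >= sqrt(N)
MEMORY = B * PAGE                # 1024 buffer pages = 4 MB of RAM
```

So a 4 GB file needs only about 1024 buffer pages (4 MB of memory) to sort in two passes.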
Sorting in 2 passes: graphical proof • Picture the file as a rectangle: B pages wide and x pages high, so N = xB. • Each run produced in pass 0 is one row of 1×B pages. • Merge the x runs in pass 1. • x ≤ B since we must merge x runs in B pages of memory. • So N = xB ≤ B·B, or N ≤ B².
Memory required for 2-pass sort Assuming page size of 4K
Can we always sort in 1 or 2 passes? • Assume only one query is active, and there is at least 1 GB of physical RAM. • Yes: DB2, Oracle, SQL Server, MySQL • They allocate all available memory to queries • Tricky to manage memory allocation as queries need more memory during execution • No: Postgres • Memory allocated to each query, for sort and other purposes, is fixed by a config parameter. • Sort memory is typically a few megabytes, in case there may be many queries executing.
Extremes of Sorting • One pass: N ≤ B; quicksort the original data in memory to produce the sorted data; cost = N (the final output write is not counted). • Two pass: N ≤ B²; quicksort the data into B runs of length B, then merge the runs; cost = 3N.
Extreme Problems • Most sorting of large data falls between these two extremes. • If we apply the intergalactic-standard sort-merge algorithm found in every textbook, the cost for any dataset with B < N ≤ B² will be 3N. • Must we always pay that large price? • Might there be an algorithm that is a hybrid of the two extremes?
Hybrid Sort when N ≈ 3B • The key idea of hybrid sort is: don't waste memory. • Here is an example of the hybrid sort algorithm when N is approximately 3B. • One+ pass: N ≈ 3B, 2-pass sort, cost = 3N − 2B. • Quicksort the data into runs of length B, leaving the last run in memory; then merge the runs on disk with the run in memory to produce the sorted data.
Hybrid Sort in general • Let k = ⌈N/B⌉, the number of runs. • Quicksort into runs of length B, leaving the last run in memory; arrange the runs so the in-memory run is B−k pages (the merge needs the other k pages of memory as input and output buffers for the k−1 runs on disk). • Merge the runs on disk with the run in memory. • Cost is N + 2(N − (B−k)) = 3N − 2B + 2⌈N/B⌉ = 3N − 2(B − N/B). • When N = B, the cost is ≈ N. When N = B², the cost is 3N.
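The cost formula can be checked numerically. A sketch with a hypothetical `hybrid_cost` function:

```python
from math import ceil

def hybrid_cost(N, B):
    """I/O cost of hybrid sort: the last run (B - k pages) never hits disk.
    Read everything once, then write and re-read only the on-disk runs."""
    k = ceil(N / B)                   # number of runs
    return N + 2 * (N - (B - k))      # = 3N - 2(B - k)

# At the extremes: hybrid_cost(B, B) is about N (one pass),
# and hybrid_cost(B*B, B) == 3N, the standard two-pass cost.
```

For any N strictly between the extremes, the hybrid cost is below the textbook 3N, e.g. hybrid_cost(30, 10) = 76 versus 3N = 90.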
Maximizing initial runs • Defn: making initial runs as long as possible. • Why is it helpful to maximize initial runs? • Conventional run creation: the maximum size of a run is limited by the available main memory. • Modification (replacement selection): records are replaced in memory to form runs longer than the available memory allows. • If the initial run size is doubled, what is the time savings? • How can you maximize initial runs?
Internal Sort Algorithm • Quicksort is a fast way to sort in memory. • An alternative is "tournament sort" (a.k.a. "heapsort") • Top: read in B blocks • Output: move the smallest record to the output buffer • Read in a new record r • Insert r into the "heap" • If r is not the smallest, then GOTO Output • Else remove r from the "heap", output the "heap" in order, and GOTO Top
More on Heapsort • Fact: the average length of a run in heapsort is 2B • The "snowplow" analogy • Worst case: • What is the minimum length of a run? • How does this arise? • Best case: • What is the maximum length of a run? • How does this arise? • Quicksort is faster, but ...
Imagine a snowplow moving around a circular track on which snow falls at a steady rate. • At any instant, there is a certain amount of snow B on the track. Some falling snow comes in front of the plow, some behind. • During the next revolution of the plow, all of this is removed, plus 1/2 of what falls during that revolution. • Thus, the plow removes 2B amount of snow.
• The rectangle is cut in half by the line representing the actual snow level. • The level of existing snow represents the records in main memory. • The volume of snow removed in one revolution (i.e., the length of a run) is twice the amount present on the track at any time.
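The snowplow behavior can be demonstrated directly. This is a sketch of replacement selection with a hypothetical name; lists stand in for disk-resident runs, and records that sort before the last value written are "frozen" for the next run.

```python
import heapq
import random
from itertools import islice

def replacement_selection(records, B):
    """Form initial runs with a B-record heap (replacement selection)."""
    it = iter(records)
    heap = [(0, r) for r in islice(it, B)]   # entries are (run number, record)
    heapq.heapify(heap)
    runs = []
    while heap:
        run, r = heapq.heappop(heap)
        if run == len(runs):
            runs.append([])                  # first record of a new run
        runs[run].append(r)
        nxt = next(it, None)
        if nxt is not None:
            # nxt can extend the current run only if it sorts after r;
            # otherwise it is held back for the next run.
            heapq.heappush(heap, (run if nxt >= r else run + 1, nxt))
    return runs

# On random input the average run length approaches 2B (snowplow argument),
# so 50,000 records with B = 250 should yield roughly 100 runs, not 200.
random.seed(42)
data = [random.random() for _ in range(50_000)]
runs = replacement_selection(data, 250)
```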
I/O for External Merge Sort • … longer runs often mean fewer passes! • Actually, do I/O a page at a time • In fact, read a block of pages sequentially! • Suggests we should make each buffer (input/output) be a block of pages. • But this will reduce fan-out during merge passes! • In practice, most files are still sorted in 2-3 passes.
Number of Passes of Optimized Sort • Block size = 32, initial pass produces runs of size 2B.
External Sorting Double Buffering • It is often the case that a system will maintain at least two input and two output buffers: • While one input buffer is being read, the previous one is being processed • While one output buffer is being filled, the other is being written
Double Buffering • To reduce wait time for an I/O request to complete, prefetch into a `shadow block'. • Potentially more passes; in practice, most files are still sorted in 2-3 passes. (Figure: B main memory buffers for a k-way merge; each of the k inputs and the output has a second shadow buffer of b pages, the block size)
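The idea can be sketched with two threads: a reader fills one buffer while the main thread processes the other. Everything here is illustrative (the function name `prefetching_reader`, lists standing in for disk blocks); a bounded queue of size 1 plus the block currently being processed gives exactly two in-flight buffers.

```python
import threading
from queue import Queue

def prefetching_reader(blocks, process):
    """Double buffering: overlap 'disk reads' with processing."""
    buf = Queue(maxsize=1)           # one shadow buffer
    SENTINEL = object()

    def reader():
        for b in blocks:             # stands in for a blocked disk read
            buf.put(b)
        buf.put(SENTINEL)

    threading.Thread(target=reader, daemon=True).start()
    out = []
    while (b := buf.get()) is not SENTINEL:
        out.append(process(b))       # overlaps with the next read
    return out
```

In a real DBMS the "process" step is the merge itself, and the reader issues asynchronous block-sized I/Os.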
Sorting Records! • Sorting has become a blood sport! • Parallel sorting is the name of the game ... • Datamation: Sort 1M records of size 100 bytes • Typical DBMS: 15 minutes • World record: 3.5 seconds • 12-CPU SGI machine, 96 disks, 2GB of RAM • New benchmarks proposed: • Minute Sort: How many can you sort in 1 minute? • Dollar Sort: How many can you sort for $1.00?
Using B+ Trees for Sorting • Scenario: the table to be sorted has a B+ tree index on the sorting column(s). • Idea: retrieve records in order by traversing the leaf pages. • Is this a good idea? • Cases to consider: • B+ tree is clustered: good idea! • B+ tree is not clustered: could be a very bad idea!
Clustered B+ Tree Used for Sorting • Cost: traverse from the root to the left-most leaf, then retrieve all leaf pages (Alternative 1). • If Alternative 2 is used? Additional cost of retrieving data records: each page is fetched just once. • Always better than external sorting! (Figure: the index directs search to the data entries, the "sequence set", which lead to the data records in order)
Unclustered B+ Tree Used for Sorting • Alternative (2) for data entries; each data entry contains the rid of a data record. In general, one I/O per data record! (Figure: the data entries in the "sequence set" point to data records scattered across pages)
External Sorting vs. Unclustered Index • p: # of records per page • B=1,000 and block size=32 for sorting • p=100 is the more realistic value.
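The gap between the two approaches can be made concrete with a back-of-the-envelope sketch. The file size N below is an illustrative assumption; B = 1,000 and p = 100 are the slide's values, and the cost model charges one random I/O per record for the unclustered scan.

```python
from math import ceil

def num_passes(N, B):
    """Passes of external merge sort: pass 0 plus (B-1)-way merge passes."""
    runs, passes = ceil(N / B), 1
    while runs > 1:
        runs, passes = ceil(runs / (B - 1)), passes + 1
    return passes

N, B, p = 1_000_000, 1_000, 100       # 1M data pages, 1,000 buffers, 100 rec/page

sort_ios = 2 * N * num_passes(N, B)   # mostly sequential I/O
unclustered_ios = p * N               # ~one random I/O per record

# 1,000 initial runs need a 999-way merge plus a final pass: 3 passes,
# so 6M sequential I/Os versus 100M random I/Os for the unclustered scan.
```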
Summary • External sorting is important; a DBMS may dedicate part of its buffer pool to sorting! • External merge sort minimizes disk I/O cost: • Pass 0: produces sorted runs of size B (# buffer pages). Later passes: merge runs. • # of runs merged at a time depends on B and block size. • Larger block size means less I/O cost per page. • Larger block size means smaller # of runs merged. • In practice, # of passes rarely more than 2 or 3.
Summary, cont. • Choice of internal sort algorithm may matter: • Quicksort: Quick! • Heap/tournament sort: slower (2x), longer runs • The best sorts are wildly fast: • Despite 40+ years of research, we’re still improving! • Clustered B+ tree is good for sorting; unclustered tree is usually very bad.