Lecture 6: External Sorting
Bong-Soo Sohn, Assistant Professor
School of Computer Science and Engineering, Chung-Ang University
External Sorting
• A sorting algorithm that can handle massive amounts of data by using external memory
• Required when the data do not fit into main memory
• An out-of-core algorithm (it works on data held in external memory), as opposed to an in-core algorithm (all data held in main memory)
Motivation
• Sometimes the data to sort are too large to fit in main memory. (Why not simply rely on virtual memory?)
• Use external memory (disk) instead
• Disk performance is determined by:
  • seek time (the major factor)
  • rotational latency
  • transfer time
• Primary rule for disk access: minimize the number of disk accesses
• Assume that external (secondary) memory is divided into equal-sized blocks (e.g., 1 KB, 4 KB, …)
• Block: the unit in which data is stored and retrieved
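To make the block model concrete, here is a minimal Python sketch, assuming a hypothetical helper `count_block_accesses` and an in-memory `io.BytesIO` stream standing in for a disk file. Data is read one fixed-size block at a time, so a sequential scan costs roughly the file size divided by the block size, rounded up.

```python
import io
from typing import BinaryIO, Iterator


def read_blocks(stream: BinaryIO, block_size: int) -> Iterator[bytes]:
    """Yield successive fixed-size blocks; only the last block may be shorter."""
    while True:
        block = stream.read(block_size)
        if not block:  # end of file reached
            break
        yield block


def count_block_accesses(data: bytes, block_size: int) -> int:
    """Hypothetical helper: count block reads for one sequential scan of `data`.

    io.BytesIO stands in for a disk file so the example is self-contained.
    """
    return sum(1 for _ in read_blocks(io.BytesIO(data), block_size))


if __name__ == "__main__":
    # 10,000 bytes with 4 KB blocks: 4096 + 4096 + 1808 bytes -> 3 block reads
    print(count_block_accesses(b"x" * 10_000, 4096))  # 3
```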
External Merge Sort: Idea
• Example: sorting 900 MB of data using only 100 MB of RAM
1. Read 100 MB of the data into main memory and sort it with a conventional in-memory method (usually quicksort).
2. Write the sorted data (a sorted chunk) to disk.
3. Repeat steps 1 and 2 until all of the data has been sorted into 100 MB chunks (nine in total), which now need to be merged into one single output file.
4. Read the first 10 MB of each sorted chunk into main memory as its input buffer (9 × 10 MB = 90 MB in total) and allocate the remaining 10 MB to an output buffer.
5. Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer is full, write it to the final sorted file. Whenever one of the 9 input buffers becomes empty, refill it with the next 10 MB of its associated 100 MB sorted chunk; if that chunk has no more data, mark the buffer as exhausted and exclude it from further merging.
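The two phases above (forming sorted runs, then a single k-way merge) can be sketched in Python as follows. This is an illustrative sketch, not the lecture's implementation: the function names are hypothetical, records are assumed to be integers stored one per line in text files, Python's built-in `sorted` (Timsort) replaces quicksort for the in-memory step, and the standard library's `heapq.merge` performs the k-way merge, with ordinary buffered file I/O standing in for the explicit 10 MB input and output buffers.

```python
import heapq
import os
import tempfile


def _write_run(sorted_chunk, directory, index):
    """Hypothetical helper: write one sorted chunk (a run) to its own file, one int per line."""
    path = os.path.join(directory, f"run{index}.txt")
    with open(path, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_chunk)
    return path


def _read_run(path):
    """Hypothetical helper: lazily stream ints back from a run file (buffered reads act as the input buffer)."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, chunk_size, out_path):
    """Sort an iterable of ints, holding at most `chunk_size` items in memory while forming runs.

    Returns the number of sorted runs that were merged.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Phase 1 (steps 1-3): fill a chunk, sort it in memory, write it out as a run.
        run_paths, chunk = [], []
        for v in values:
            chunk.append(v)
            if len(chunk) == chunk_size:
                run_paths.append(_write_run(sorted(chunk), tmp, len(run_paths)))
                chunk = []
        if chunk:  # last, possibly shorter, chunk
            run_paths.append(_write_run(sorted(chunk), tmp, len(run_paths)))

        # Phase 2 (steps 4-5): one k-way merge over all runs, k = number of runs.
        with open(out_path, "w") as out:
            for x in heapq.merge(*(_read_run(p) for p in run_paths)):
                out.write(f"{x}\n")
    return len(run_paths)


if __name__ == "__main__":
    # Same ratio as the slide: 900 items, 100 in memory at a time -> 9 runs, 9-way merge.
    data = [(i * 7919) % 1000 for i in range(900)]
    with tempfile.TemporaryDirectory() as d:
        print(external_sort(data, chunk_size=100, out_path=os.path.join(d, "sorted.txt")))  # 9
```

Because `heapq.merge` keeps only one pending item per run in its heap, memory use during the merge grows with the number of runs rather than with the total data size, which is the property the 9-way merge in step 5 relies on.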
2-way merge sort (figure): the 20 initial runs R1 to R20 are merged pairwise into S1 to S10, then T1 to T5, then U1 to U3, then V1 and V2, and finally into a single run W1
• # of passes: 5
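One way to check the pass count in the 2-way example is to simulate how many runs remain after each pass. The sketch below assumes, as the figure shows with T5 becoming U3, that a leftover run with no partner is carried into the next pass unchanged; `runs_per_pass` is a hypothetical name.

```python
import math


def runs_per_pass(initial_runs: int, k: int) -> list[int]:
    """Hypothetical helper: number of runs before the first pass and after each k-way merge pass."""
    counts = [initial_runs]
    while counts[-1] > 1:
        # Every group of up to k runs becomes one run; a partial group still yields a run.
        counts.append(math.ceil(counts[-1] / k))
    return counts


if __name__ == "__main__":
    counts = runs_per_pass(20, 2)
    print(counts)           # [20, 10, 5, 3, 2, 1]
    print(len(counts) - 1)  # 5 merge passes
```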
5-way merge sort (figure): the 20 initial runs R1 to R20 are merged five at a time into S1 to S4, then into a single run T1
• # of passes: 2, so a higher merge order lets us reduce the number of passes
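In general, merging N initial runs k at a time needs ⌈log_k N⌉ merge passes, which agrees with both figures. If we additionally assume that every pass (including the initial run-formation pass) reads and writes each of the file's B blocks exactly once, the total disk I/O is roughly 2B(1 + ⌈log_k N⌉), so moving from 2-way to 5-way merging halves the I/O in this example:

```latex
\text{passes}(N, k) = \lceil \log_k N \rceil, \qquad
\lceil \log_2 20 \rceil = 5 \;\; (2^4 = 16 < 20 \le 32 = 2^5), \qquad
\lceil \log_5 20 \rceil = 2 \;\; (5^1 = 5 < 20 \le 25 = 5^2)

\text{block I/Os} \approx 2B\,\bigl(1 + \lceil \log_k N \rceil\bigr):
\quad k = 2:\; 2B(1 + 5) = 12B, \qquad k = 5:\; 2B(1 + 2) = 6B
```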