Learning Objectives
• Co-sequential processing
• matching
• merging
• K-way merge
• Overlapping I/O with processing
• heap sort
CPSC 231 Co-sequential processing (D.H.)
Co-sequential processing
• Co-sequential operations are applied to two or more sorted input files to produce one or more output files.
• Typical problems involve:
• union
• intersection
• other, more complex operations
Matching as an example of co-sequential processing
• Suppose that we want to create a file that consists of the names common to two lists of names (or two secondary index files).
List 1: Beethoven, Chopin, Dvorak, Mozart, Prokofiev, Rimsky-Korsakov
List 2: Beethoven, Brahms, Corea, Prokofiev, Stravinsky
Matching example
• What should be the output file for the above example?
List 1: Beethoven, Chopin, Dvorak, Mozart, Prokofiev, Rimsky-Korsakov
List 2: Beethoven, Brahms, Corea, Prokofiev, Stravinsky
Output: Beethoven, Prokofiev
An algorithm for matching two sorted lists
• Initialize item1 from list 1 and item2 from list 2.
• Then repeat the following comparison:
• If item1 is less than item2, get the next item from list 1.
• If item1 is greater than item2, get the next item from list 2.
• If the items are the same, output the item and get the next item from each of the two lists.
Matching algorithm
• When should the above algorithm terminate?
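The matching steps can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function name `match_sorted` and the use of in-memory lists instead of files are assumptions. It also shows one possible answer to the termination question: the loop stops as soon as either list runs out, because no further match is possible.

```python
def match_sorted(list1, list2):
    """Return the items common to two sorted lists (hypothetical helper)."""
    out = []
    i, j = 0, 0
    # Stop when either list is exhausted: nothing left can match.
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            i += 1                      # item1 is smaller: advance list 1
        elif list1[i] > list2[j]:
            j += 1                      # item1 is larger: advance list 2
        else:
            out.append(list1[i])        # same item: output once, advance both
            i += 1
            j += 1
    return out


list1 = ["Beethoven", "Chopin", "Dvorak", "Mozart", "Prokofiev", "Rimsky-Korsakov"]
list2 = ["Beethoven", "Brahms", "Corea", "Prokofiev", "Stravinsky"]
print(match_sorted(list1, list2))  # ['Beethoven', 'Prokofiev']
```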
Merging as an example of co-sequential processing
• Suppose that we want to create a file that consists of the union of the names in the two lists of names.
Merging example
List 1: Beethoven, Chopin, Dvorak, Mozart, Prokofiev, Rimsky-Korsakov
List 2: Beethoven, Brahms, Corea, Prokofiev, Stravinsky
Output: Beethoven, Brahms, Chopin, Corea, Dvorak, Mozart, Prokofiev, Rimsky-Korsakov, Stravinsky
An algorithm for merging two sorted lists
• Initialize item1 from list 1 and item2 from list 2.
• Then repeat the following comparison:
• If item1 is less than item2, output item1 and get the next item from list 1.
• If item1 is greater than item2, output item2 and get the next item from list 2.
• If the items are the same, output the item once and get the next item from each of the two lists.
Merging algorithm
• When should the merging algorithm terminate?
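The merging steps can be sketched the same way. Again this is only an illustration under assumptions: `merge_sorted` is a hypothetical name and the lists are held in memory. Unlike matching, the merge cannot stop when one list runs out; in this sketch the remainder of the other list is copied to the output, so the merge ends only when both lists are exhausted.

```python
def merge_sorted(list1, list2):
    """Return the sorted union of two sorted lists (hypothetical helper)."""
    out = []
    i, j = 0, 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            out.append(list1[i])        # item1 is smaller: output it
            i += 1
        elif list1[i] > list2[j]:
            out.append(list2[j])        # item2 is smaller: output it
            j += 1
        else:
            out.append(list1[i])        # same item: output once, advance both
            i += 1
            j += 1
    # One list is exhausted; copy whatever remains of the other.
    out.extend(list1[i:])
    out.extend(list2[j:])
    return out


list1 = ["Beethoven", "Chopin", "Dvorak", "Mozart", "Prokofiev", "Rimsky-Korsakov"]
list2 = ["Beethoven", "Brahms", "Corea", "Prokofiev", "Stravinsky"]
print(merge_sorted(list1, list2))
```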
Co-sequential processing: essential components
• Initialization (start from the records with the lowest key values).
• Main loop (a single main loop runs for as long as records are available).
• Inside the loop: compare the keys of the items read, write output as needed, and advance the pointers in the files as needed.
A K-way Merge
• The most common application of co-sequential processing requiring more than two input files is a K-way merge.
• A K-way merge is a merge in which K input files are merged to produce one output file.
A K-way Merge Algorithm: Assumptions
• Suppose that we want to merge k lists: list0, list1, list2, ..., listk-1.
• Lists list0 ... listk-1 are sorted.
A K-way Merge Algorithm
Initialize each list to point to its first item (call this item item(listi))
Loop while there are items to be processed
    Select the minimum item (minItem) among the current items of the k lists
    Output this item
    Loop from i = 0 to k-1
        if minItem = item(listi) then advance listi
    EndLoop
EndLoop
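The K-way merge loop above could look like this in Python. It is a sketch under assumptions: the function name `k_way_merge`, the `pos` array of read positions, and the use of in-memory lists in place of files are all illustrative choices. The minimum is found by a plain linear scan over the current items.

```python
def k_way_merge(lists):
    """Merge k sorted lists, emitting each key shared across lists once."""
    pos = [0] * len(lists)              # pos[i] is the current item of list i
    out = []
    while True:
        # Lists that still have an unread item.
        active = [i for i in range(len(lists)) if pos[i] < len(lists[i])]
        if not active:
            break                       # every list is exhausted
        # Linear scan for the minimum among the current items.
        min_item = min(lists[i][pos[i]] for i in active)
        out.append(min_item)
        # Advance every list whose current item equals the minimum.
        for i in active:
            if lists[i][pos[i]] == min_item:
                pos[i] += 1
    return out


print(k_way_merge([[1, 4, 7], [2, 4, 8], [3, 4, 9]]))  # [1, 2, 3, 4, 7, 8, 9]
```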
K-way Merge Performance
• The K-way merge performs well if K is no larger than about 8.
• Merging more than about 8 lists may prove very time-consuming, because finding the minimum for every output item takes a sequence of comparisons across all K lists.
• Solution:
• Use a selection tree (see figure 8.15, p.311)
Selection tree
• The use of a selection tree is an example of the classic time-versus-space trade-off.
• We reduce the time required to find the key with the lowest value by using a data structure to save information about the relative key values.
• Since the selection tree is a binary tree, its depth is about log2 K.
Finding the Minimum by Using a Selection Tree
• Thus, once the tree is built, the number of comparisons needed to find the next minimum using a selection tree is about log2 K.
• How many comparisons are needed on average if the search is linear?
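To illustrate the logarithmic cost, here is a sketch that swaps in Python's standard `heapq` module (a binary heap) for the selection tree. It is not the selection tree from the textbook figure, but it gives the same kind of bound: each removal or insertion costs on the order of log2 K comparisons when K lists are being merged. The function name `k_way_merge_heap` is an assumption, and this version also collapses repeated keys inside a single list.

```python
import heapq


def k_way_merge_heap(lists):
    """Merge k sorted lists, keeping the k current items in a binary heap."""
    # Heap entries are (item, list index, position in that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        item, i, p = heapq.heappop(heap)        # O(log K) per removal
        if not out or out[-1] != item:
            out.append(item)                    # skip keys already output
        if p + 1 < len(lists[i]):
            # Replace the removed item with the next one from the same list.
            heapq.heappush(heap, (lists[i][p + 1], i, p + 1))
    return out


print(k_way_merge_heap([[1, 4, 7], [2, 4, 8], [3, 4, 9]]))  # [1, 2, 3, 4, 7, 8, 9]
```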
Sorting Performed Sequentially
• In the past, when we discussed sorting of files in main memory, we described three separate steps:
• reading the entire file from disk into main memory
• sorting the file in main memory
• writing the file back to the disk
Overlapping I/O and Processing
• The total time taken to sort the entire file is the sum of the times for the above three steps.
• Can we improve on the time this memory sort takes by overlapping the file reading and the file writing with the in-memory sorting?
• Yes, for example by using heap sort.
Heap Sort Properties - Review
• Heap sort keeps all keys in a binary tree called a heap.
• Each node in a heap has a single key, and that key is >= the key at its parent node.
• A heap is a complete binary tree (the leaves are on at most two levels, and the leaves on the lowest level are in the leftmost positions).
• A heap is kept as an array: the root node has index 1, and the children of node i have indexes 2i and 2i+1.
• See the example of a heap in fig. 8.16, p.313
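The array layout can be checked with a few lines of Python. This is a sketch with made-up keys (not the heap from the textbook figure); slot 0 of the Python list is left unused so that the root sits at index 1, as described above, and `is_min_heap` is a hypothetical helper name.

```python
def is_min_heap(a):
    """Check the heap property on a 1-based array (a[0] is unused)."""
    n = len(a) - 1
    # Every node except the root must be >= its parent at index i // 2.
    return all(a[i] >= a[i // 2] for i in range(2, n + 1))


heap = [None, 1, 3, 2, 7, 4, 5, 9]      # made-up keys, root at index 1
i = 2
print(heap[i], heap[2 * i], heap[2 * i + 1])  # node 2 and its children: 3 7 4
print(is_min_heap(heap))                       # True
```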
Using a Heap for Sorting
• Heap sort requires the following steps:
• A loop for building a heap by inserting nodes into it.
• A loop for removing the item at the root of the heap and rearranging the heap.
• Can these steps partially be performed while we are reading or writing the file?
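The two loops can be sketched as follows, using the 1-based array layout (index 0 unused). The helper names `heap_insert`, `heap_remove_min`, and `heap_sort` are assumptions made for this illustration, not names from the course.

```python
def heap_insert(heap, key):
    """Append key at the end, then move it up while it is below its parent."""
    heap.append(key)
    i = len(heap) - 1
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2


def heap_remove_min(heap):
    """Remove and return the root, then restore the heap property."""
    root = heap[1]
    last = heap.pop()                   # last leaf will replace the root
    n = len(heap) - 1                   # number of keys still in the heap
    if n >= 1:
        heap[1] = last
        i = 1
        while 2 * i <= n:
            child = 2 * i
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1              # pick the smaller of the two children
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return root


def heap_sort(keys):
    heap = [None]                       # slot 0 unused, root at index 1
    for k in keys:                      # loop 1: build the heap by insertion
        heap_insert(heap, k)
    out = []
    while len(heap) > 1:                # loop 2: repeatedly remove the root
        out.append(heap_remove_min(heap))
    return out


print(heap_sort([5, 3, 8, 1, 9, 2]))    # [1, 2, 3, 5, 8, 9]
```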
Building the Heap While Reading the File
• How to overlap file reading and building the heap?
Read the first block of records from the file into an input buffer (the beginning of the array)
Loop
    Start reading the next block of records into the next buffer (the next part of the array)
    While that read is in progress, insert each record of the previous block into the heap
End Loop
(see fig. 8.19, p.316)
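One way to sketch this overlap in Python is to hand the block reads to a background thread while the main thread inserts the previous block into the heap. Everything here is an assumption made for illustration: the function name `build_heap_while_reading`, the `read_block` callback (which is taken to return the next list of keys, or an empty list at end of file), and the use of the standard `heapq` module for the inserts. In CPython, threads can overlap with blocking file reads, so this pattern gives a rough idea of the technique rather than a tuned implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def build_heap_while_reading(read_block):
    """Insert each block into the heap while the next block is being read."""
    heap = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_block)          # start reading block 1
        while True:
            block = pending.result()               # wait for the read to finish
            if not block:
                break                              # empty block: end of input
            pending = pool.submit(read_block)      # start reading the next block
            for key in block:                      # meanwhile, insert this block
                heapq.heappush(heap, key)
    return heap


# In-memory stand-in for a file read block by block.
blocks = iter([[5, 3], [8, 1], [9, 2], []])
heap = build_heap_while_reading(lambda: next(blocks))
print(heap[0])  # smallest key is at the root: 1
```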
Writing a Sorted List to the File While Reordering the Heap
• Similarly, we can remove an item from the heap and write it to the file while the rest of the heap is being reordered.
• We first write to an output buffer, and once the buffer is full we write it to the disk. This disk write can overlap with writing the next part of the list to another output buffer.
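The output side can be sketched with the same pattern: fill one output buffer from the heap while a background thread writes the previous buffer. The names `write_sorted_while_reordering` and `write_block` and the `block_size` parameter are assumptions for this illustration, and the heap is a plain Python list managed with the standard `heapq` module.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def write_sorted_while_reordering(heap, block_size, write_block):
    """Pop keys in sorted order, writing full buffers in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        buffer = []
        while heap:
            buffer.append(heapq.heappop(heap))     # remove root, reorder heap
            if len(buffer) == block_size or not heap:
                if pending is not None:
                    pending.result()               # previous write must finish
                pending = pool.submit(write_block, buffer)
                buffer = []                        # switch to a fresh buffer
        if pending is not None:
            pending.result()                       # wait for the last write


heap = [5, 3, 8, 1, 9, 2]
heapq.heapify(heap)
written = []                                       # stand-in for the output file
write_sorted_while_reordering(heap, 4, written.append)
print(written)  # [[1, 2, 3, 5], [8, 9]]
```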