
Chapter 7: Cosequential Processing and the Sorting of Large Files


Presentation Transcript


  1. Chapter 7: Cosequential Processing and the Sorting of Large Files

  2. Cosequential Operations • Coordinated processing of two or more sequential lists to produce a single output list • Kinds of Operations • merging, union • matching • intersection • combination of above

  3. Matching Operation • Output the names common to the two lists • Matching, or an intersection • Four steps 1. Initializing 2. Synchronizing 3. Handling end-of-file conditions 4. Recognizing errors

  4. Matching Operation (2) • Algorithm • p261 Figure 7.2 • three-way conditional statement: if NAME_1 < NAME_2, read the next name from LIST_1; else if NAME_1 > NAME_2, read the next name from LIST_2; else output the name and read the next name from both lists

  5. Matching Operation (3) • Key of algorithm • always return to the head of the main loop • End-of-file condition • test MORE_NAMES_EXIST flag • loop until either of the two lists reaches end-of-file
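
The three-way comparison and the end-of-file test described in the slides above can be sketched in Python. This is a minimal illustration, not the textbook's Figure 7.2: the function name `match` and the use of in-memory lists in place of files are assumptions, and step 4 (recognizing out-of-order input) is left out.

```python
def match(list_1, list_2):
    """Return the names common to two sorted lists (an intersection)."""
    out = []
    i = j = 0
    # The loop condition plays the role of the MORE_NAMES_EXIST flag:
    # it turns false as soon as either list reaches its end.
    while i < len(list_1) and j < len(list_2):
        name_1, name_2 = list_1[i], list_2[j]
        if name_1 < name_2:
            i += 1                 # read the next name from LIST_1
        elif name_1 > name_2:
            j += 1                 # read the next name from LIST_2
        else:
            out.append(name_1)     # names match: output once
            i += 1                 # and read the next name from both lists
            j += 1
    return out


print(match(["ADAMS", "CARTER", "DAVIS"],
            ["BROWN", "CARTER", "DAVIS", "EVANS"]))
# prints ['CARTER', 'DAVIS']
```

Every branch falls through to the head of the loop, so the end-of-file test lives in exactly one place.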

  6. Merging Two Lists • Based on matching operation • p264 Figure 7.5 • Difference • must read each of the lists completely • change MORE_NAMES_EXIST behavior • HIGH_VALUE • comes after all legal input values in the file’s ordered sequence
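
A merge differs from a match mainly in how it ends. The sketch below is a hedged illustration rather than the textbook's Figure 7.5: it assumes names are plain strings and uses the hypothetical sentinel `"\uffff"` as HIGH_VALUE, on the assumption that no legal name sorts after it.

```python
HIGH_VALUE = "\uffff"  # assumption: sorts after every legal name


def next_name(iterator):
    """Return the next name, or HIGH_VALUE once the list is exhausted."""
    return next(iterator, HIGH_VALUE)


def merge(list_1, list_2):
    """Merge two sorted lists; a name present in both is output once."""
    it_1, it_2 = iter(list_1), iter(list_2)
    name_1, name_2 = next_name(it_1), next_name(it_2)
    out = []
    # Keep going until BOTH lists have been read completely.
    while name_1 != HIGH_VALUE or name_2 != HIGH_VALUE:
        if name_1 < name_2:
            out.append(name_1)
            name_1 = next_name(it_1)
        elif name_1 > name_2:
            out.append(name_2)
            name_2 = next_name(it_2)
        else:
            out.append(name_1)
            name_1 = next_name(it_1)
            name_2 = next_name(it_2)
    return out


print(merge(["ADAMS", "CARTER"], ["BROWN", "CARTER", "DAVIS"]))
# prints ['ADAMS', 'BROWN', 'CARTER', 'DAVIS']
```

Once one list is exhausted its current name is HIGH_VALUE, so the comparison automatically drains the other list without any special end-of-file branch.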

  7. Cosequential Processing Model • Assumptions • Two or more input files are processed in a parallel fashion • Each file is sorted • Comments • Output may be the same as one of the input files • Not necessary that all files have the same record structures

  8. Cosequential Processing Model (2) • Assumptions • A high key value and a low key value must exist • Records are processed in logical sorted order • Comments • Not strictly necessary, but it decreases complexity • Physical ordering can have a large impact on processing

  9. Cosequential Processing Model (3) • Assumptions • For each file, there is only one current record • Records should be manipulated only in internal memory • Comments • Looking ahead or looking back is not prohibited, but such operations should be restricted to subprocedures • A program cannot alter a record in place on secondary storage

  10. Cosequential Processing Model (4) • Components • Initialization • read the first record from each of the files • Synchronization loop • runs as long as relevant records remain • Selection in main synchronization loop • Use high values as end-of-file condition • no special code to deal with end-of-file

  11. Cosequential Processing Model (5) • Components - cont’d • I/O and error detection are to be relegated to subprocesses • hide details • Simple and robust • Example: General Ledger Program • pp. 268~276

  12. Multiway Merging • K-way merge • merge K input lists to create a single, ordered output list • p277 Figure 7.16 • works well when K is less than 8 or so
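
The simple K-way merge can be sketched as a linear scan over the current item of each list. This is an illustrative sketch, not a transcription of Figure 7.16: the name `k_way_merge` is hypothetical, lists stand in for files, and duplicates are kept rather than collapsed.

```python
def k_way_merge(lists):
    """Merge K sorted lists by scanning the K current items for the minimum."""
    positions = [0] * len(lists)   # index of the current item in each list
    out = []
    while True:
        best = None                # index of the list holding the smallest item
        for k, lst in enumerate(lists):
            if positions[k] >= len(lst):
                continue           # this list is exhausted
            if best is None or lst[positions[k]] < lists[best][positions[best]]:
                best = k
        if best is None:           # every list is exhausted
            return out
        out.append(lists[best][positions[best]])
        positions[best] += 1


print(k_way_merge([[1, 4, 7], [2, 5], [3, 6, 9]]))
# prints [1, 2, 3, 4, 5, 6, 7, 9]
```

Each output item costs up to K comparisons, which is why this form is only attractive for small K.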

  13. Multiway Merging (2) • Selection Tree • for a large K-way merge, the set of comparisons becomes expensive • time vs space trade-off • a kind of tournament tree • each higher-level node represents the winner of the two descendant keys • the depth of the tree is log₂ K

  14. Selection Tree
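
A selection (tournament) tree can be sketched as an array in which each internal node stores the index of the winning list. The code below is one possible layout under stated assumptions: keys are finite numbers, `float("inf")` serves as the high value for exhausted lists, and K is padded up to a power of two with empty slots. The names `tournament_merge`, `current`, and `play` are hypothetical.

```python
INF = float("inf")  # assumed high value: marks an exhausted or padding list


def tournament_merge(lists):
    """K-way merge using a winner tree; about log2(K) comparisons per output."""
    k = len(lists)
    if k == 0:
        return []
    size = 1
    while size < k:                # pad the number of leaves to a power of two
        size *= 2
    pos = [0] * k                  # current position in each list

    def current(i):
        """Current key of list i, or INF if it is exhausted or padding."""
        if i < k and pos[i] < len(lists[i]):
            return lists[i][pos[i]]
        return INF

    # tree[n] is the index of the list that wins at node n.
    # Leaves occupy tree[size .. 2*size-1]; the overall winner is tree[1].
    tree = [0] * (2 * size)
    for i in range(size):
        tree[size + i] = i

    def play(n):
        left, right = tree[2 * n], tree[2 * n + 1]
        tree[n] = left if current(left) <= current(right) else right

    for n in range(size - 1, 0, -1):   # build the tree bottom-up
        play(n)

    out = []
    while current(tree[1]) != INF:
        winner = tree[1]
        out.append(lists[winner][pos[winner]])
        pos[winner] += 1
        n = (size + winner) // 2       # replay only the winner's path to the root
        while n >= 1:
            play(n)
            n //= 2
    return out


print(tournament_merge([[1, 4, 7], [2, 5], [3, 6, 9]]))
# prints [1, 2, 3, 4, 5, 6, 7, 9]
```

The extra array is the space side of the trade-off: after each output only the nodes on one leaf-to-root path are recomputed, instead of all K current keys being compared again.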

  15. Sorting in RAM • Can we improve on the time of a RAM sort? • perform some parts of the work in parallel • a selection tree is good but cannot be used to sort an entire file • Heapsort • sorting and reading can occur in parallel • keeps all of the keys in a heap

  16. Heapsort • Heap • each child node is greater than or equal to its parent node • the children of node i are nodes 2i and 2i+1 • Fig 7.20, Fig 7.21 • Processing overlap with I/O • use more than one buffer • p284 Figure 7.22 • fill buffer while building heap • Procedure for outputting : Fig 7.23
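
The heap properties listed above (each child is at least as large as its parent; node i has children 2i and 2i+1) map directly onto a 1-based array. The following is a minimal sketch, not the code of Figures 7.20 to 7.23; the function names are hypothetical and index 0 is left unused so that the 2i and 2i+1 arithmetic holds.

```python
def heap_insert(heap, key):
    """Insert key into a 1-based min-heap (heap[0] is an unused placeholder)."""
    heap.append(key)
    i = len(heap) - 1
    # Sift up: swap with the parent (node i // 2) while the key is smaller.
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2


def heap_remove_min(heap):
    """Remove and return the smallest key (the root, node 1)."""
    smallest = heap[1]
    last = heap.pop()
    n = len(heap) - 1              # number of keys still in the heap
    if n >= 1:
        heap[1] = last             # move the last key to the root
        i = 1
        while 2 * i <= n:          # sift down toward the smaller child
            child = 2 * i
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


def heapsort(keys):
    heap = [None]
    for key in keys:               # keys can be inserted one at a time as they arrive
        heap_insert(heap, key)
    return [heap_remove_min(heap) for _ in range(len(keys))]


print(heapsort(list("FDCGHIBEA")))
# prints ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
```

Because `heap_insert` only needs the keys seen so far, heap building can proceed on one buffer of input while the next buffer is still being read, which is the overlap the slide refers to.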

  17. Sorting Large Files on Disk • Keysort shortcomings • cost of seeking • cannot sort really large files • all key/pointer pairs must fit in RAM • Multiway merge algorithm • run: sorted subfile

  18. Sorting Large Files on Disk (2)

  19. Sorting Large Files on Disk (3) • Multiway merging • can be extended to files of any size • reading during the run creation step • no seeking due to sequential reading • reading and writing during merging • sequential • I/O overlap using heapsort • tape can be used
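
The two phases (create sorted runs, then merge them) can be sketched with temporary files. This is a toy illustration under stated assumptions: the input is a Python list standing in for the large file, records are newline-free strings, `run_size` stands in for the amount of RAM, and the standard-library `heapq.merge` is swapped in for a hand-written multiway merge. The function names are hypothetical.

```python
import heapq
import os
import tempfile


def make_runs(lines, run_size, directory):
    """Sort run_size records at a time in memory; write each run to its own file."""
    paths = []
    for start in range(0, len(lines), run_size):
        run = sorted(lines[start:start + run_size])
        path = os.path.join(directory, f"run{len(paths)}.txt")
        with open(path, "w") as f:
            f.writelines(line + "\n" for line in run)
        paths.append(path)
    return paths


def merge_runs(paths):
    """Read every run sequentially and merge them into one sorted list."""
    files = [open(p) for p in paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()


def external_sort(lines, run_size):
    with tempfile.TemporaryDirectory() as directory:
        return merge_runs(make_runs(lines, run_size, directory))


print(external_sort(["pear", "apple", "fig", "kiwi", "date"], 2))
# prints ['apple', 'date', 'fig', 'kiwi', 'pear']
```

Each run file is written once and read once, front to back, which is the sequential access pattern the slide emphasizes.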

  20. How Much Time Does a Merge Sort Take? • Merge Sort vs Key Sort • pp. 287~290 (10 minutes vs 5 hours) • 4 Steps • reading records and forming runs • writing sorted runs • reading sorted runs for merging • writing sorted file

  21. Sorting a Very Large File • Kinds of I/O • sort phase • sequential if using heapsort • no improvement • merge phase • random access (proportional to the number of runs) • Ways to improve performance • cut down the number of random accesses in the merge phase

  22. Cost of Increasing the File Size • For a K-way merge of K runs, • the buffer size for each of the runs is (1/K) × size of RAM = (1/K) × size of each run • the merge operation requires K² seeks (K seeks for each of the K runs) • Merge sort is an O(K²) operation
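
The quadratic growth is easy to check with a little arithmetic. The sketch below assumes each initial run is exactly one RAM-load and counts one seek per buffer refill; the function names and the sizes used (an 800-unit file and 10 units of RAM) are illustrative assumptions, not figures from the slides.

```python
def number_of_runs(file_size, ram_size):
    """Runs produced when each run is one RAM-load (ceiling division)."""
    return -(-file_size // ram_size)


def merge_seeks(file_size, ram_size):
    """Seeks for a single K-way merge of K runs."""
    k = number_of_runs(file_size, ram_size)
    seeks_per_run = k      # each buffer holds 1/K of a run, so K refills per run
    return k * seeks_per_run


print(merge_seeks(800, 10))    # 80 runs  -> prints 6400
print(merge_seeks(1600, 10))   # 160 runs -> prints 25600
```

Doubling the file size with RAM held fixed doubles K and therefore quadruples the number of seeks.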

  23. Cost of Increasing the File Size (2) • Ways to reduce time • more hardware • merge in more than one step • reducing the order of each merge • increasing the buffer size for each run • Increase the length of the initial sorted runs • Overlap I/O operations

  24. Hardware-based Improvements • Possible configuration • increasing the amount of RAM • increasing the number of disk drives • increasing the number of I/O channels

  25. Multiple-Step Merging • Break the original set of runs into small groups and merge the runs in these groups separately • Fewer seeks, but extra transmission time in second pass • Read every record twice • to form the intermediate runs and to form the final sorted file

  26. Multiple-Step Merging (2) • Essence of multiple-step merging • increase the available buffer space for each run • trade-off: an extra pass over the data vs fewer random accesses • More than two steps? • reduced seek and rotational times vs increased transmission times
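
The seek savings of a two-step merge can be estimated with the same counting model: one seek per buffer refill, initial runs of one RAM-load each. The sketch below is a back-of-the-envelope calculation under those assumptions; the function names are hypothetical, and the 800 runs and group size of 32 are illustrative values that assume the run count divides evenly into groups.

```python
def one_step_seeks(total_runs):
    """Single K-way merge: K refills for each of K runs."""
    return total_runs * total_runs


def two_step_seeks(total_runs, group_size):
    """Merge groups of runs first, then merge the intermediate runs."""
    groups = total_runs // group_size          # assumes an exact division
    # Step 1: each group is a group_size-way merge of RAM-sized runs.
    step_1 = groups * group_size * group_size
    # Step 2: each intermediate run is group_size RAM-loads long and is read
    # through a buffer of 1/groups of RAM, so it needs group_size * groups refills.
    step_2 = groups * (group_size * groups)
    return step_1 + step_2


print(one_step_seeks(800))        # prints 640000
print(two_step_seeks(800, 32))    # prints 45600
```

The price of the reduction is that every record is read and written one extra time, so transmission time rises while seek time falls.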

  27. Increasing Run Lengths • A longer initial run • fewer total runs • bigger buffers • fewer seeks • Replacement selection

  28. Replacement Selection • Idea • always select the key from memory that has the lowest value • output the key • replace it with a new key from the input list • Implementation: p299 • p300 Figure 7.27

  29. Replacement Selection (2) • What about a key arriving in memory too late to be output into its proper position? • use a second heap • p301 Figure 7.28
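
Replacement selection with a second heap can be sketched using Python's `heapq`. This is an illustrative sketch rather than the textbook's implementation: the name `replacement_selection`, the `pending` heap, and the `_DONE` sentinel are assumptions, `p` is the number of keys that fit in memory, and the input keys are sample values chosen for the demonstration.

```python
import heapq
from itertools import islice

_DONE = object()  # hypothetical sentinel marking the end of the input


def replacement_selection(keys, p):
    """Split keys into sorted runs using p memory locations."""
    it = iter(keys)
    heap = list(islice(it, p))     # fill memory with the first p keys
    heapq.heapify(heap)
    pending = []                   # second heap: keys held back for the next run
    runs, run = [], []
    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)       # output the lowest key in memory
        new_key = next(it, _DONE)
        if new_key is not _DONE:
            if new_key >= smallest:
                heapq.heappush(heap, new_key)      # still fits in the current run
            else:
                heapq.heappush(pending, new_key)   # arrived too late: next run
        if not heap:               # current run is finished
            runs.append(run)
            run = []
            heap, pending = pending, []
    return runs


keys = [16, 47, 5, 12, 67, 21, 7, 17, 14, 58, 24, 18, 33]
print(replacement_selection(keys, 3))
# prints [[5, 12, 16, 21, 47, 67], [7, 14, 17, 18, 24, 33, 58]]
```

With only three memory locations, the 13 keys fall into two runs of six and seven keys, both longer than the three-key runs an in-memory sort of each RAM-load would give.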

  30. Replacement Selection (4) • Two questions • Given P locations in memory, how long a run can we expect replacement selection to produce, on the average? • pp. 301~302 • What are the costs of using replacement selection? • pp. 303~304 • less than 1/3 as many seeks as RAM sorting
