
15.8 Algorithms using more than two passes

15.8 Algorithms using more than two passes. Presented by: Seungbeom Ma (ID 125). Professor: Dr. T. Y. Lin, Computer Science Department, San Jose State University. Multipass Algorithms. Previously, most of the algorithms required two passes. In some cases, we need more than two passes.

Presentation Transcript


  1. 15.8 Algorithms using more than two passes Presented By: Seungbeom Ma (ID 125) Professor: Dr. T. Y. Lin Computer Science Department San Jose State University

  2. Multipass Algorithms • Previously, most of the algorithms required two passes. • In some cases, we need more than two passes. • Case: the data is too big to be handled in main memory even with a two-pass algorithm. • We then have to hash or sort the relation with multipass algorithms.

  3. Agenda • 1. Multipass Sort-Based Algorithm • 2. Multipass Hash-Based Algorithm

  4. Multipass sort-based algorithm. • M: number of memory buffers • R: relation • B(R): number of blocks holding relation R • BASIS: • 1. If R fits in M blocks (B(R) <= M): • 2. Read R into main memory. • 3. Sort R in main memory with any sorting algorithm. • 4. Write the sorted relation to disk.

  5. Multipass sort-based algorithm. • INDUCTION (B(R) > M): • 1. If R does not fit into main memory, partition the blocks holding R into M groups, called R1, R2, …, RM. • 2. Recursively sort each Ri, for i = 1 to M. • 3. Once the groups are sorted, merge the M sorted sublists.
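The BASIS and INDUCTION steps of slides 4 and 5 can be sketched as a short recursive function. This is a minimal in-memory sketch, not a real DBMS implementation: it assumes a relation is modeled as a Python list of blocks (each block a list of sortable records), performs no actual disk I/O, and does not model the output buffer used during the merge. The names `multipass_sort`, `to_blocks`, and `block_size` are hypothetical.

```python
import heapq
from typing import List

# Hypothetical in-memory model: a relation is a list of blocks, and each
# block is a list of sortable records. No real disk I/O is performed.
Block = List[int]


def to_blocks(records: List[int], block_size: int) -> List[Block]:
    """Pack a flat list of records back into blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def multipass_sort(blocks: List[Block], M: int, block_size: int) -> List[Block]:
    """Sort relation R using at most M memory blocks per step (needs M >= 2)."""
    if M < 2:
        raise ValueError("need at least two memory blocks")
    if len(blocks) <= M:
        # BASIS: B(R) <= M, so read R into memory, sort it, and write it back.
        return to_blocks(sorted(r for b in blocks for r in b), block_size)
    # INDUCTION: partition the blocks of R into at most M groups R1..RM.
    group_size = -(-len(blocks) // M)  # ceiling division
    groups = [blocks[i:i + group_size] for i in range(0, len(blocks), group_size)]
    # Recursively sort each group into a sorted sublist.
    sublists = [multipass_sort(g, M, block_size) for g in groups]
    # Merge the sorted sublists, reading one record stream per sublist.
    streams = [(r for b in s for r in b) for s in sublists]
    return to_blocks(list(heapq.merge(*streams)), block_size)


R = [[5, 3], [9, 1], [7, 2], [8, 6], [4, 0]]
print(multipass_sort(R, M=2, block_size=2))  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

With M = 2 and five blocks, the relation is split into groups of three and two blocks; the three-block group is split once more before the basis applies, so this input takes three levels of recursion.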

  6. Performance: Multipass Sort-Based Algorithms 1) Each pass of a sorting algorithm: 1. reads the data from the disk, 2. sorts the data with any sorting algorithm, 3. writes the data back to the disk. 2-1) Since each pass reads and writes all B(R) blocks, a k-pass sorting algorithm needs 2k B(R) disk I/O's. 2-2) A k-pass sort-based algorithm for a binary operation on R and S needs A + B disk I/O's: A: 2(k-1)(B(R) + B(S)) [disk I/O's to sort the sublists in the first k-1 passes] B: B(R) + B(S) [disk I/O's to read the sorted sublists in the final pass] Total: (2k-1)(B(R) + B(S)) disk I/O's
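The cost formulas on slide 6 are simple arithmetic, so a small calculator makes the A + B breakdown concrete. This is a sketch; the function names are hypothetical, and the block counts in the example are made-up inputs, not measurements.

```python
def sort_io_cost(b_r: int, k: int) -> int:
    """Disk I/O's for a k-pass sort of R: every pass reads and writes B(R) blocks."""
    return 2 * k * b_r


def binary_op_io_cost(b_r: int, b_s: int, k: int) -> int:
    """Disk I/O's for a k-pass sort-based binary operation on R and S."""
    a = 2 * (k - 1) * (b_r + b_s)  # first k-1 passes: read and write the sublists
    b = b_r + b_s                  # final pass: only read the sorted sublists
    return a + b                   # equals (2k - 1)(B(R) + B(S))


# Example with B(R) = 1000, B(S) = 500, k = 3:
print(sort_io_cost(1000, 3))            # 6000
print(binary_op_io_cost(1000, 500, 3))  # 6000 + 1500 = 7500
```

The final pass saves B(R) + B(S) I/O's relative to two full sorts because its output feeds the operation directly instead of being written back to disk.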

  7. Multipass Hash-Based Algorithms • 1. Hash the relations into M-1 buckets, where M is the number of memory buffers. • 2. Unary case: • Apply the operation to each bucket individually. • 1. Duplicate elimination (δ) and grouping (γ). • 1) Grouping (γ): groups the tuples and computes aggregates such as MIN, MAX, COUNT, SUM, and AVG for each group. • 2) Duplicate elimination (δ): DISTINCT. Basis: if the relation fits in M memory blocks, read it into memory and perform the operation. • 3. Binary case: apply the operation to each corresponding pair of buckets. • Query operations: union, intersection, difference, and join. • If either relation fits in M-1 memory blocks: • -> Read that relation into M-1 main-memory blocks. • -> Read the other relation one block at a time into the M-th block. • Then perform the operation.
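The binary-case basis on slide 7 (hold the smaller relation in M-1 blocks, stream the other one block at a time through the M-th block) can be sketched for a join. This is a minimal sketch under assumptions: relations are Python lists of blocks of tuples, the join is a natural join on the first attribute only, and an in-memory dictionary stands in for the M-1 buffered blocks. The function name `one_pass_join` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple
Relation = List[List[Row]]  # hypothetical model: a list of blocks of tuples


def one_pass_join(r_blocks: Relation, s_blocks: Relation, M: int) -> List[Row]:
    """Basis for a binary operation: join R and S on their first attribute."""
    # Pick whichever relation fits in M-1 blocks as the in-memory side.
    if len(r_blocks) <= M - 1:
        small, large, small_is_r = r_blocks, s_blocks, True
    elif len(s_blocks) <= M - 1:
        small, large, small_is_r = s_blocks, r_blocks, False
    else:
        raise ValueError("neither relation fits in M-1 blocks; hash and recurse")
    # Read the small relation into memory, indexed by the join attribute.
    index: Dict[object, List[Row]] = defaultdict(list)
    for block in small:
        for t in block:
            index[t[0]].append(t)
    out: List[Row] = []
    # Read the other relation one block at a time (the M-th buffer).
    for block in large:
        for t in block:
            for m in index.get(t[0], []):
                r_t, s_t = (m, t) if small_is_r else (t, m)
                out.append(r_t + s_t[1:])  # R's attributes, then S's non-key ones
    return out


R = [[(1, "a"), (2, "b")]]
S = [[(1, "x"), (3, "y")], [(2, "z"), (1, "w")]]
print(sorted(one_pass_join(R, S, M=2)))  # [(1, 'a', 'w'), (1, 'a', 'x'), (2, 'b', 'z')]
```

Only the smaller relation's size matters here, which is why hash-based binary operations are limited by the smaller argument.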

  8. INDUCTION • If the relation (unary case) does not fit, or neither relation (binary case) fits, into the main-memory buffers: • Hash each relation into M-1 buckets. • Recursively perform the operation on each bucket or on each corresponding pair of buckets. • Accumulate the output from each bucket or pair.
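For the unary case, the basis and induction can be combined into a recursive duplicate elimination (δ). This is a sketch under stated assumptions: relations are lists of blocks of hashable tuples, Python's built-in `hash` salted with the recursion level stands in for the DBMS hash function, and `max_depth` is a hypothetical safeguard that is not on the slides. Without it, a relation dominated by one repeated value would never shrink its bucket and the recursion would not terminate; when the guard trips, the sketch falls back to in-memory processing and ignores the M-block limit.

```python
from typing import Hashable, List

Relation = List[List[Hashable]]  # hypothetical model: a list of blocks


def to_blocks(records: List[Hashable], block_size: int) -> Relation:
    """Pack a flat list of records into blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def hash_distinct(blocks: Relation, M: int, block_size: int,
                  level: int = 0, max_depth: int = 32) -> List[Hashable]:
    """Multipass hash-based duplicate elimination using M buffers (M >= 3)."""
    if M < 3:
        raise ValueError("need M >= 3 so that M-1 buckets can split the data")
    if len(blocks) <= M or level >= max_depth:
        # BASIS: the relation fits in M blocks, so remove duplicates in memory.
        # (level >= max_depth is the hypothetical skew guard described above.)
        return list(dict.fromkeys(t for b in blocks for t in b))
    # INDUCTION: hash every tuple into one of M-1 buckets. Salting with the
    # level makes each pass split the data differently from the previous one.
    buckets: List[List[Hashable]] = [[] for _ in range(M - 1)]
    for b in blocks:
        for t in b:
            buckets[hash((level, t)) % (M - 1)].append(t)
    # Equal tuples always land in the same bucket, so per-bucket results
    # can simply be accumulated.
    out: List[Hashable] = []
    for bucket in buckets:
        out.extend(hash_distinct(to_blocks(bucket, block_size), M, block_size,
                                 level + 1, max_depth))
    return out


data = to_blocks([i % 10 for i in range(60)], block_size=2)  # 30 blocks
print(sorted(hash_distinct(data, M=4, block_size=2)))  # [0, 1, 2, ..., 9]
```

Grouping (γ) follows the same pattern, with the basis computing aggregates per group instead of keeping one copy of each tuple.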

  9. Hash-Based Algorithms: Unary Operators

  10. Performance: Hash-Based Algorithms • R: relation. • Operations such as δ and γ. • M: number of memory buffers. • u(M, k): number of blocks in the largest relation that a k-pass hashing algorithm can handle.

  11. Performance: Induction Induction: 1. Assume that the first step divides relation R into M-1 equal buckets. 2. The buckets for the next pass must be small enough to handle in k-1 passes, i.e., each bucket can have at most u(M, k-1) blocks. 3. Since R is divided into M-1 buckets, we have u(M, k) = (M-1)u(M, k-1).
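The recurrence (M-1)u(M, k-1) needs a basis to be evaluated. Assuming the basis u(M, 1) = M (a relation that fits in M blocks is handled in one pass), the recurrence unrolls to u(M, k) = M(M-1)^(k-1). A small sketch that checks this; the function name `u` simply mirrors the slide's notation, and the example values of M are arbitrary.

```python
def u(M: int, k: int) -> int:
    """Largest relation (in blocks) a k-pass hash algorithm handles for δ or γ."""
    if k == 1:
        return M                    # BASIS: one pass works when R fits in M blocks
    return (M - 1) * u(M, k - 1)    # INDUCTION: M-1 equal buckets, k-1 passes each


print(u(101, 1))  # 101
print(u(101, 2))  # 101 * 100 = 10100
print(u(101, 3))  # 101 * 100 * 100 = 1010000
```

The closed form shows that each extra pass multiplies the maximum relation size by roughly M-1.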

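  12. Sort-Based VS Hash-Based 1. Sort-based algorithms can produce output in sorted order, which later operations may be able to exploit. Also, sorted sublists can be written to consecutive disk blocks, which can reduce rotational latency and seek time. 2. Hash-based algorithms depend on the buckets being of roughly equal size. For binary operations, the memory requirement of hash-based algorithms depends only on the size of the smaller relation, so hash-based algorithms can be preferable when one relation is much smaller than the other.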

  13. THANKS
