100 likes | 202 Views
Query execution hashing based Two pass algorithms. —Gaurang Patel— (205). Agenda. Terminology Hash Basics Partitioning relation by hashing Hash-based Grouping and Aggregation Union, Intersection and Difference Hash-Join Algorithm Conclusion. Terminology. Query optimization
E N D
Query executionhashing based Two pass algorithms —Gaurang Patel— (205)
Agenda • Terminology • Hash Basics • Partitioning relation by hashing • Hash-based Grouping and Aggregation • Union, Intersection and Difference • Hash-Join Algorithm • Conclusion
Terminology • Query optimization -- Logical Query plan -- Physical plan • Query execution: -- Query processor- group of DBMS components -- Converts user queries into database operations • Operation • Relation- arguments of operation
Basics of Hash • Large data • Hash functions to store large relations Memory buffers • Gain factor of M in the size of relations
Partitioning Relations by Hashing Algorithm:
Hash-based Grouping and Aggregation • Tuples from same block foes to same bucket • Hash key depends on grouping attributes • First pass: Process each bucket in turn. • Second pass: Only one record per group.
Union, Intersection and Difference • Binary operation- same hash function for both arguments • Union: R U S • First Pass -- 2(M-1) buckets -- Avoid duplicates • Same for R ∩ S, R – S..
Union, Intersection and Difference • I/O operations needed: -- B(R) + B(S) -- 2 more for hashing -- Total: 3(B(R) + B(S)) • For, two pass algorithm: -- min(B(R),B(S)) ≤ M2
Hash-Join Algorithm • R(X,Y) ►◄ S(Y,Z) • Same as other binary operations • Only difference in hash key, Y • I/O operations: -- 3(B(R)+B(S)) -- Two pass require min(B(R),B(S)) ≤ M2 -- Further techniques to reduce number of I/O operations