260 likes | 382 Views
Generalized Hash Teams for Join and Group-By. Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany. Outline. Motivating Example Standard Hash Teams Generalized Hash Teams for Joins Generalized Hash Teams for Joins/Grouping False Drops Analysis
E N D
Generalized Hash Teams for Join and Group-By Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany
Outline • Motivating Example • Standard Hash Teams • Generalized Hash Teams for Joins • Generalized Hash Teams for Joins/Grouping • False Drops Analysis • Application Examples (TPC-D) • Performance Evaluation A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Traditional Join Plan R T S R A Result S A T A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
T R S A A A A Traditional Hash Team Join Plan[Graefe, Bunker, Cooper: VLDB 98] R A R.A Result S A S.A T T.A A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
T R S A B Generalized Hash Teams A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
T T R R S S A A B B Generalized Hash Teams 6 mod 5 =1 Partition on B odd: yellow even: green A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Generalized Hash Team for Grouping/Aggregation • select c.City, sum(o.Value)from Customer c, Order owhere c.C# = o.C#group by c.City Join and grouping team Agg Agg Ptn on City Bit- maps (BM) Ptn on BM Ptn on C# Ptn on C# Ptn on City Order Customer Customer Order A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
City C# Group (Customer Order ) Partition on City and generate bitmaps for C# City Customer C# Partition with bitmaps for C# Order C# A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Group (Customer Order Lineitem) City C# O# Partition on City and generate bitmaps for C# City Customer C# Partition with bitmaps for C# and generate bitmaps for O# C# Order O# O# Partition with bitmaps for O# Lineitem A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
T T R R S S A A B B False Drops A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
(Customer Order Lineitem) C# O# Overlapping Partitions Partition on B and generate bitmaps for A T Customer Partition on C# and generate bitmaps for O# S Order Partition based on the bitmaps for A Lineitem R Partition with Bitmaps A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Applicability ofGeneralized Hash Teams Partition on B • for partitioning hierarchical structures A B • but it is also correct for non-strict hierarchies A B (but performance deteriorates) Partition on bitmaps for A A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
T S R T T R R S S A A B B Non-strict hierarchyA B A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
False Drops Estimation b: cardinality of the bitmaps n: number of partitions probability that some s sets a bit leading to a false drop of an r into a particular partition: total number of false drops: conservative approximation: A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Implementation Details:Fine Tuning the Partitioning Bloom-Filter [Bratbergsengen] [Valduriez] Bitmaps A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
City C# Implementation Details:Teaming up Join and Grouping Group (Customer Order ) Partition on City and generate bitmaps for C# City Customer C# C# Order Partition with bitmaps for C# A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Teaming Up Join and Grouping: Build Phase 5 25 13 23 PA M A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Teaming Up Join and Grouping: Probe Phase 5 25 13 23 10 1 PA M A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
City C# Performance Comparison: Group (Customer Order ) Memory [MB] A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
False Drops Estimation and Measurement A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Group (Customer Order Lineitem) City C# O# Performance Comparison: A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams Memory [MB]
False Drops Estimation and Measurement A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Conclusion and Future Work • Look-Ahead Partitioning for Joins and Grouping • Applicable for hierarchical data structures • correctness does not depend on strict hierarchies • Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18 • Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ) A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
TPC-D Q5 SELECT N_NAME, SUM(L_EXTENDEDPRICE * ( 1 - L_DISCOUNT)) AS REVENUE FROM CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = '[region]' AND O_ORDERDATE >= DATE '[date]' AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR GROUP BY N_NAME ORDER BY REVENUE DESC; A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
TPC-D Q10 SELECT C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE, C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT FROM CUSTOMER, ORDER, LINEITEM, NATION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE >= DATE '[date]' AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH AND L_RETURNFLAG = 'R' AND C_NATIONKEY = N_NATIONKEY GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT ORDER BY REVENUE DESC; A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
Indirectly Partitioning a Hierarchical Structure Customer City C# C# Order O# O# Lineitem Partition 1 Partition 2 Partition 3 A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams