450 likes | 646 Views
A First Look at Materialized Query Table (MQT) in DB2 LUW. Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com. Agenda. Motivation Sample scenario How MQT can help Different types of MQTs & Maintenance MQT Matching – what queries can benefit from MQTs Design Considerations Other Uses of MQT.
E N D
A First Look at Materialized Query Table (MQT) in DB2 LUW Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com
Agenda • Motivation • Sample scenario • How MQT can help • Different types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT
Database Schema for Our Sample Scenario STORE Star Schema Store_id Region_id … PRODUCT SALES Product_id Class_id Group_id Family_id Line_id Division_id … Product_id Store_id Date_id Amount Quantity … TIME Date_id Month_id Quarter_id Year_id
Dimensions Hierarchy Product Dimension Division Level 5 Line Level 4 Time Dimension Family Level 3 Year Quarter Group Level 2 Store Dimension Region Month Class Level 1 Day Store Product Level 0 Sales Fact
Sample Query Q9: Total sales of each product group at each store for December 2008. SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = 200812 GROUP BY t.month_id, f.store_id, p.group_id
Q9 Q9 Month=200812 Month=200812 Joe’s Query Joe’s Q Bob’s Q Group=Laser Printer Group=Laser Printer Store=McLean Compute Once & Reuse GB GB GB GB JOIN JOIN JOIN JOIN JOIN JOIN JOIN Dim2 Dim2 Dim2 JOIN Dim2 Bob’s Query Fact Fact Fact Dim1 Dim1 Dim1 Store=McLean MQT Fact Dim1 Benefits of MQTs
What is an MQT? • MQT stands for Materialized Query Table. • Basic Idea: • Query results caching - pre-compute and store (in database tables) the results of frequently issued queries • Query optimizer automatically and transparently replace portions of future queries with direct references to the pre-computed (i.e., materialized) results. This is referred to as MQT routing or matching. • Standard technique in data warehousing, also known as aggregate tables and aggregate (re-)routing, or alternatively, as materialized views.
Usages of MQTs – At a High Level • Aggregate data over one or more dimensions • Joins and aggregate data over a group of tables • Data from a commonly accessed subset of data, that is, from a "hot" horizontal or vertical database partition • Repartitioned data from a table, or part of a table, in a partitioned database environment
How query can be routed to MQT? MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, f.store_id, p.group_id Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = 200812 GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = 200812 Route Query to MQT
MQT Creation CREATE TABLE MQT_SalesByMthStrGrp AS • (SELECT t.month_id, • f.store_id, • p.group_id, • sum(amount) as total_sales • FROM sales f, product p, time t • WHERE f.date_id = t.date_id AND • f.product_id = p.product_id • GROUP BY t.month_id, f.store_id, p.group_id) DATA INITIALLY DEFERRED REFRESH DEFERRED; REFRESH TABLE MQT_SalesByMthStrGrp; RUNSTATS ON TABLE petrus.MQT_SalesByMthStrGrp WITH DISTRIBUTION AND DETAILED INDEXES ALL; -- Confirm that MQT is being used by -- (a) setting CURRENT REFRESH AGE to ANY -- (b) explaining the query used as -- the MQT’s definition • MQTs are created using the CREATE TABLE statement • Immediately after creation, the MQT is empty and not accessible. It is populated using the REFRESH TABLE statement • After running the REFRESH, the MQT becomes accessible
Agenda • Motivation • Sample scenario • How MQT can help • Types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT
Types of MQTs MQT User Maintained System Maintained Refresh DEFERRED Refresh IMMEDIATE WITHOUT staging table WITH staging table
MQT Maintenance • Process of synchronizing the MQT data with changes to the underlying tables • Incremental Maintenance • Immediate Propagate Immediate Applied (IPIA) • Full Maintenance • Hybrid • Immediate Propagate Deferred Applied (IPDA)
Combine old and new values Pair Delta to MQT base base base BaseTables tables tables tables Incremental Maintenance Immediate Propagate Immediate Apply • the ability to maintain the MQT without the need to access the modified base table, only using the delta information (newly inserted/updated/deleted data) and the MQT itself. • REFRESH IMMEDIDATE MQTs are maintained incrementally during: • INSERT/UPDATE/DELETE of base table processing • REFRESH TABLE processing after LOAD INSERT into the base table. • Immediate Propagate, Immediate Apply • For performance reasons, only incrementally maintainable MQTs can be defined as REFRESH IMMEDIATE (hence more restrictions in MQT definition) DeltaAggregate MQT Delta select/join Primary-key: grouping columns
base base base BaseTables tables tables tables Full Maintenance MQT • MQTs are fully rebuilt by re-evaluating the query in the MQT definition • REFRESH DEFERRED MQTs (without staging table): • not maintained during INSERT/UPDATE/DELETE operations on the base table. • synchronized with base table using the REFRESH TABLE statement Aggregate select/join
Combine old and new values Pair Delta to AST base base tables tables Hybrid - Immediate Propagate, Deferred Applied Immediate Propagate Deferred Apply • REFRESH DEFERRED MQTs with staging table • Delta changes to MQT are computed upon INSERT/UPDATE/DELETE of base tables • Delta aggregate is recorded in a separate staging table • Subsequent REFRESH of MQT can be done incrementally • Immediate Propagate, Deferred Apply • One staging table per MQT CREATE TABLE SALES_BY_YEAR_STG FOR SALES_BY_YEAR PROPAGATE IMMEDIATE; DeltaAggregate AST Net-effect Delta select/join BaseTables base tables Primary-key: grouping columns StagingTables Staging
User Maintained MQT • User is responsible for maintaining MQT content • REFRESH TABLE disallowed • Insert/update/delete on MQT allowed • Must be REFRESH DEFERRED • Scenarios where user maintained MQTs are • For efficiency reasons, when the user is convinced that (s)he can implement MQT maintenance far more efficiently than the mechanisms used by DB2. • For leveraging existing “user maintained” summaries, where the user wants DB2 to automatically consider them for optimization for new ad hoc queries being executed against the base tables.
Agenda • Motivation • Sample scenario • How MQT can help • Types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT
MQT Matching MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id • Process to automatically use the MQT to derive results for a query for improved performance Q9: Original Query SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND t.month_id = 200812 GROUP BY t.month_id, f.store_id, p.group_id Q9: Rewritten Query SELECT m.month_id, m.store_id, m.group_id, m.total_sales FROM MQT_SalesByMthStrGrp m WHERE m.month_id = 200812 Route Query to MQT
What queries can benefit from an MQT? • Queries • that matches an MQT perfectly! • with predicates that are a subset of those in MQT • with more table joins than MQT (re-join) • with fewer table joins than MQT (need RI constraint) • with columns not in MQT definition (need functional dependencies) • at a higher aggregation level than an MQT
Q9 Q9 Month=200812 Month=200812 Joe’s Query Joe’s Q Bob’s Q Group=Laser Printer Group=Laser Printer Store=McLean Compute Once & Reuse GB GB GB GB JOIN JOIN JOIN JOIN JOIN JOIN JOIN Dim2 Dim2 Dim2 JOIN Dim2 Bob’s Query Fact Fact Fact Dim1 Dim1 Dim1 Store=McLean MQT Fact Dim1 Query with predicates that are a subset of those in MQT
Query with more table joins than MQT If the extra table join is through a column in group-by list, match and re-join with tables not in MQT Rewritten Query: GB GB Original Query: Re-join Store MQT Store Time GB MQT: Sales Product Time Sales Product
Query with more table joins than MQT (Example) Rewritten Query SELECT m.month_id, s.region_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, store s WHERE m.stored_id = s.store_id GROUP BY m.month_id, s.region_id, m.group_id Original Query SELECT t.month_id, s.region_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t, store s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, s.region_id, p.group_id Route Query to MQT MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id
Queries with fewer joins than MQT • In general, we can’t route queries to an MQT that has more table joins because the extra joins affects the MQT content • However, if the extra joins in MQT are RI-joins, i.e. joins with RI parent table with join predicate of the form: child.fk = parent.pk They are “look-up” joins that do not add or delete rows • RI-join is common between Fact and Dimension tables. • Need to declare RI constraints (even just informational constraints) to tell DB2 about it GB MQT: Query: GB RI-join Product Sales Time Sales Time
Queries with fewer joins than MQT (Example) Rewritten Query SELECT m.month_id, m.store_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m GROUP BY m.month_id, m.store_id Original Query SELECT t.month_id, f.store_id, sum(amount) as total_sales FROM sales f, time t WHERE f.date_id = t.date_id GROUP BY t.month_id, f.store_id Route Query to MQT MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id alter table sales add constraint fk_product foreign key (product_id) references product; Sales (product_id int NOT NULL)
Importance of NOT NULL Constraint on Foreign Key Columns • Declaring NOT NULL constraint on the foreign key column of the child table is important! • If a row of column SALES.product_id has null value: • the row will not qualify the join predicate f.product_id = p.product_id. • The MQT would be missing some rows that are needed by the query, hence incorrect to route. MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id
Queries at higher aggregation level than an MQT Product Dimension • MQT for the Store-Month-Group slice • Queries that access data at or above that slice can be satisfied by the MQT with some additional aggregating. • Querying more detailed data below the slice, such as Day or Product, cannot be routed to this MQT. Level 5 Division Level 4 Line Time Dimension Level 3 Family Year Level 2 Quarter Group Store Dimension Level 1 Region Month Class Day Store Level 0 Product Sales Fact
Queries at a higher level aggregation - Example MQT_SalesByMthStrGrp SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.month_id, s.store_id, p.group_id Functional Dependency TIME (month_id→quarter_id) Original Query SELECT t.quarter_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_id GROUP BY t.quarter_id, f.store_id, p.group_id Rewritten Query SELECT t.quarer_id, m.store_id, m.group_id, sum(m.total_sales) FROM MQT_SalesByMthStrGrp m, time t WHERE m.month_id = t.month_id GROUP BY t.quarter_id, m.store_id, m.group_id Route Query to MQT
Functional Dependencies Product Dimension • X functionally determines Y iff each X value is associated with precisely one Y value. • By definition, all columns of a relation are functionally dependent on the unique/primary key. • De-normalization introduces functional dependencies between non-key columns. • Between ID and descriptive columns • Example: group_id → group_desc • Between different levels of the hierarchy • Example: group_id → (family_id, line_id, division_id) Division Line Family Group Class Product
Functional Dependencies in DB2 • User can help DB2 identifies the functional dependencies among non-key columns by declaring NOT ENFORCED FD constraints: ALTER TABLE PRODUCT ADD CHECK((group_desc) DETERMINED BY group_id) NOT ENFORCED ENABLE QUERY OPTIMIZATION • During MQT routing, DB2 will re-join with the table to pick up the “dependents”
(year-region-division) GB FD re-join (month_id, year_id) TIME FD re-join (group_id, division_desc) FD re-join (store_id, region_desc) MQT (month-store-group) PRODUCT PRODUCT Functional Dependencies - Example • Query at year-region-division level: • Select t.year_id, s.region_desc, p.division_desc, • sum(amount) as total_sales • ... • GROUP BY t.year_id, s.region_desc, p.division_desc; MQT at the month-store- group aggregate level: • SELECT t.month_id, • t.year_id, • f.store_id, • s.region_id, s.region_desc, • p.group_id, • p.division_id, p.division_desc, • sum(amount) as total_sales • FROM sales f, product p, time t • ,stored s • WHERE f.date_id = t.date_id • AND f.product_id = p.product_id • AND f.store_id = s.store_id • GROUP BY t.month_id, • t.year_id, • s.store_id, • s.region_id, s.region_desc, • p.group_id • ,p.division_id, p.division_desc FD: TIME: month_id -> year_id UK: STORE store_id -> (region_id, region_desc) FD: PRODUCT: group_id -> (division_id, division_desc)
Functional Dependencies in MQT • It is a good practice to declare functional dependencies to keep the minimum amount of dimensional information in the MQT. This will: • keep the size of the MQT small • Keep the MQT unaffected by dimension table updates - minimize the need for maintenance
MQT Matching – Insurance Policy SQL Query • MQT matching performed during Query Rewrite • Rewrite transformations mostly rule/heuristic based • Insurance Policy: • Optimizer Cost based comparison between: • Access plan with MQT routing • Access plan WITHOUT MQT routing • Cheapest plan wins - against choosing a wrong heuristic Parser Query Graph Semantics With and without Aggregate MQT Query Rewrite Optimizer Code Generator
Agenda • Motivation • Sample scenario • How MQT can help • Types of MQTs & Maintenance • MQT Matching – what queries can benefit from MQTs • Design Considerations • Other Uses of MQT
Product Dim. Level 5 Division Level 4 Line Time Dim. Level 3 Family Year Level 2 Quarter Group Store Dim. Region Level 1 Month Class Day Store Level 0 Product Sales Fact Design Considerations • On usage of MQT • Space requirement – MQTs are materialized tables • On MQT definition • Level of aggregation vs. query processing time • Too high level – few queries can route to it • Too low level – costly compensation time • Workload-based vs. Model-based • Numbers of MQTs
Refreshing with finer granularity MQTs • DB2 9 allows the use of existing MQTs to be used by higher level MQTs during REFRESH • Matching done during REFRESH • It is very important to aggregate in the right order • Example : • MQT1 : GROUP BY DATE, MONTH, YEAR • MQT2 : GROUP BY MONTH, YEAR • REFRESH MQT1 • REFRESH MQT2 uses MQT1
Store Store Store Another Use of Replicated MQTs • In partitioned database, colocation of rows involved in different tables => efficient join (colocated joins) • In a typical star schema setup, the fact table can be co-located with at most one dimension table. • The remaining (smaller) dimension tables are frequently replicated to all nodes of the fact table • Replicated MQTs are used to achieve co-location of data for efficient join processing in partitioned database environment Store Product ProdID3 Product ProdID2 Product ProdID1 Sales ProdID3 Sales ProdID2 Sales ProdID1
Summary • An MQT is a table that stores pre-computed query results. • Queries can be matched to an MQT to improve performance. • Different types of MQTs and their maintenance: • immediate, deferred (with/without staging table), user maintained • replicated • Queries can be matched to an MQT that has: • Predicates that are superset of those in queries • Finer grained aggregation level • More/few table joins • Not null constraint, foreign key constraints and functional dependencies can be defined to help MQT matching
References • Redbooks (http://www.redbooks.ibm.com) • DB2 UDB's High Function Business Intelligence in e-Business (SG24-6546-00) • Related Papers • Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta Cochrane: Maintenance of Automatic Summary Tables. SIGMOD Conference 2000: 512-513 • Markos Zaharioudakis, Roberta Cochrane, George Lapis, Hamid Pirahesh, Monica Urata: Answering Complex SQL Queries Using Automatic Summary Tables. SIGMOD Conference 2000: 105-116