R* Optimizer Validation and Performance Evaluation for Distributed Queries. Lothar F. Mackert, Guy M. Lohman, IBM Almaden Research Center, 1986. Presented by: KORHAN KOCABIYIK
Paper Layout • Overview of Distributed Compilation and Optimization • Instrumentation • General Measurements • Distributed Join Results • Suggestions for Improving Their Performance • Conclusions • Suggested Readings
About the Paper • Evaluates the performance of R* on distributed queries (queries whose tables reside at multiple sites) • Measures the message, I/O, and CPU resources R* actually consumes and compares them with the corresponding costs estimated by the optimizer • A large number of tests were run over a wide variety of dynamically created test databases, SQL queries, and system parameters • Networks of different speeds give different performance for distributed queries
What is System R? • System R is a database system built as a research project at IBM's San Jose Research Laboratory (the predecessor of the Almaden Research Center) in the 1970s • System R introduced the SQL language and demonstrated that a relational system could provide good transaction-processing performance • Non-procedural user interface: users specify only what data is desired, leaving the system's optimizer to choose how to access that data • The optimizer therefore plays a central role in system performance
R* • R* inherits the optimization algorithms of System R and extends them to a distributed environment • Each site is as autonomous as possible • No central scheduler, no central deadlock detection, no central catalog, etc. • R* supports snapshot data: a "copy of a relation(s) in which the data is consistent but not necessarily up to date", used to provide a static copy of the database • The unit of distribution in R* is a table, and each table is stored at one and only one site
Questions Optimizer Validation Should Answer • Under what circumstances (regions of the parameter space) does the optimizer choose a suboptimal plan, or, worse, a particularly bad plan? • To which parameters is the actual performance most sensitive? • Are these parameters being modeled accurately by the optimizer? • What is the impact of variations from the optimizer's simplifying assumptions? • Is it possible to simplify the optimizer's model (by using heuristics, for example) to speed up optimization? • What are the best database statistics to support optimization?
Distributed Compilation and Optimization • Distributed Query: any SQL data manipulation statement that references tables at sites other than the query site • Query Site: the site to which the application program is submitted for compilation • Master Site: the site that coordinates the optimization of all SQL statements embedded in that program • Apprentice Site: for each query, any site other than the master site that stores a table referenced in the query • Join Site: the site at which each join takes place. There are two methods for transferring a copy of the inner table to the join site: • Ship Whole: a copy of the entire table is transferred • Fetch Matches: only matching tuples are transferred
Fetch Matches • 1) Project the outer-table tuple onto its join column and ship this value to the site of the inner table • 2) Find the tuples in the inner table that match the value sent and project them onto the needed columns • 3) Ship a copy of the projected matching inner tuples back to the join site • 4) Join the matches to the outer table
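A minimal sketch of this fetch-matches flow, assuming tables are represented as lists of dicts and the inter-site shipping is simulated by ordinary function calls (the names here are hypothetical, not R* code):

```python
def fetch_matches_join(outer, inner, outer_col, inner_col, needed_cols):
    """Simulate a fetch-matches join with the outer table held at the join site."""
    result = []
    for o in outer:
        value = o[outer_col]                        # step 1: project and "ship" the outer join value
        matches = [{c: t[c] for c in needed_cols}   # step 2: find matching inner tuples, project columns
                   for t in inner if t[inner_col] == value]
        for m in matches:                           # step 3: matches are "shipped" back to the join site
            result.append({**o, **m})               # step 4: join them with the outer tuple
    return result
```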
Optimization in R* • Compilation, and hence optimization, is truly distributed in R* • The master's optimizer makes all inter-site decisions, such as the site at which each inter-site join takes place and the method and order of transfers between sites • Intra-site decisions are only suggested by the master planner
R* Cost Structure • The optimizer tries to minimize a cost function that is a linear combination of four components: CPU, I/O, the number of messages, and the total number of bytes transferred in all messages • Cost = W_CPU · (#instructions) + W_I/O · (#I/Os) + W_MSG · (#messages) + W_BYT · (#bytes) • W_MSG is a constant per-message penalty for sending a communication over the network • W_BYT is the penalty per byte of message length
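A sketch of this weighted cost function in Python; the weight values below are illustrative placeholders, not the actual constants used by R*:

```python
# Illustrative weights; the real values depend on CPU speed, disk, and network.
W_CPU, W_IO, W_MSG, W_BYT = 0.001, 1.0, 5.0, 0.01

def plan_cost(n_instructions, n_ios, n_msgs, n_bytes):
    """Linear combination of the four resource components the optimizer minimizes."""
    return W_CPU * n_instructions + W_IO * n_ios + W_MSG * n_msgs + W_BYT * n_bytes
```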
CPU Cost • CPU cost was found to be important for both local and distributed queries • CPU costs are high in sorts: 1. Allocating temporary disk space for partially sorted strings 2. Encoding and decoding sorted columns 3. Quicksorting individual pages in memory • CPU costs are also high in scans • Although "CPU cost is significant . . . [it] is not enough to affect the choice of the optimizer"
Instrumentation • 1. Added three statements to the SQL language for test control and performance monitoring (EXPLAIN, COLLECT COUNTERS, FORCE OPTIMIZER) • 2. Developed pre-compiled applications for automatically (a) testing queries using the SQL statements of (1) and (b) analyzing the data collected by step (a) • 3. Stored the output of the SQL statements of (1) and the application programs of (2) in database tables to establish an interface between (1), (2a), and (2b)
Distributed EXPLAIN • The EXPLAIN command writes to a user-owned PLAN_TABLE information describing the access plan chosen by the optimizer for a given SQL statement • No single site has the complete access plan for a distributed query • The master site has the inter-site access plan and each apprentice has its local access plan • The R* EXPLAIN command was augmented to store each apprentice site's plan in a local PLAN_TABLE, and the test application program was altered to retrieve that information from each apprentice's PLAN_TABLE
Distributed COLLECT COUNTERS • This new SQL statement collects and stores in a user-owned table the values of 40 internal RSS* counters (counts of disk reads, writes, and lookups) and some newly implemented counters of the communications component DC* • A process run at the master site and child processes run at the apprentices collect these counters and insert them into the COUNTER_TABLE at the master site
Distributed FORCE OPTIMIZER • To measure the cost of suboptimal plans, they had to force the optimizer to use a specified plan • The FORCE OPTIMIZER statement dictates the plan for the next SQL statement • Apprentice optimization can be forced by simply telling each apprentice to use the optimization decisions recommended by the master site
Experimental Environment • Two identical IBM 4381 machines, each with 40 buffer pages of 4 KB • 3000 random integer values for the join column • 4 integer and 5 fixed-size character columns • Each tuple is exactly 66 bytes long • The system stored 50 tuples per page • Join tests produced result sets of 100 to 6000 tuples
Costs Measured
Distributed Join • Focused on distributed joins rather than single-table queries • In R*, n-table joins are executed as a sequence of n-1 two-table joins • Intermediate results of these joins are called composite tables • For simplicity, they used only two-table joins
Distributed Joins (Cont’) • There are two different join methods: • Merge Scan Joins (-M-) • Nested Loop Joins (-N-) • There are two different transfer methods: • Whole table transfer (W) • Fetch only match (F) • AIW-M-B means Indexed outer table A is wholly shipped to inner table B with merge scan sort KORHAN KOCABIYIK
Inner Table Transfer Strategy • If we ship the inner table B as a whole, the best plan is AI-M-BIW • If we fetch only the matching inner tuples, the best plan is AI-M-BIF
Transfer Trade-offs • Option W: transfer the whole inner table • Negatives • No indexes can be shipped with it • May ship inner tuples that have no matching outer tuples • Positives • Predicates applied before shipping may reduce the size • Only one transfer is needed for the join, which may result in lower overall network costs
Transfer Trade-offs (Cont.) • Option F: fetch matching tuples only • Idea: send the outer tuples' join-column values, match them against the inner tuples, then send those inner tuples over the network • Negatives • Multiple rounds of sending congest the network • May end up sending the whole table anyway, in which case W is better • Positives • The tables may have few actual join values in common, which eliminates the need to send many inner tuples
Use the W or F Option? • With W, network costs were only 2.9% of the total • Strategy F is advantageous when: • The cardinality of the outer table is less than 0.5 times the number of messages required to ship the inner table as a whole • Idea behind this rule: beat plan W by sending the few join values from the outer table that will weed out most inner tuples • The join cardinality is less than the inner cardinality • Idea behind the 2nd rule: since most inner tuples will not be needed, we can beat plan W by sending only the outer join values and eliminating most inner tuples • A sketch of this rule of thumb follows below
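A small sketch of the rule of thumb above, assuming the optimizer already has estimates for the outer cardinality, the number of messages needed to ship the whole inner table, and the join and inner cardinalities (function and parameter names are illustrative):

```python
def prefer_fetch_matches(outer_card, inner_ship_msgs, join_card, inner_card):
    """Heuristic: fetch-matches (F) can only beat ship-whole (W) when both conditions hold."""
    few_outer_values = outer_card < 0.5 * inner_ship_msgs  # few join values to send out
    few_matches = join_card < inner_card                   # most inner tuples would be wasted by W
    return few_outer_values and few_matches
```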
Optimizer Performance (Local) • The optimizer has trouble modeling unclustered indexes on smaller tables. In such cases, the optimizer actually picks the worst plan thinking it is the best, and considers the best plan the worst • Why? • Tuple order is unimportant to a nested-loop join, and an index on the outer table may clutter the buffer • A highly selective predicate may eliminate the need for a scan, in which case the index is important • Adding an index can increase the cost: the optimizer models each table independently, ignoring competition for buffer space between the two tables being joined
Distributed vs. Local Joins • Total resources consumed by distributed joins are higher than for local joins • Response time of distributed joins is lower than for local joins • What does this mean? • More machines do the work in a distributed join, so they can work in parallel: more total work is done, but since more machines share it, the result arrives sooner
Distributed vs. Local Joins (Cont.) • The response-time improvement for distributed queries has two causes • 1. Parallelism • 2. Reduced contention: accesses to tables through unclustered indexes benefit greatly from larger aggregate buffer space (n machines = n times the buffer) • Negative for distributed joins: slow network speeds reverse this, and local joins become faster
Alternative Join Methods • R* also introduced some alternative join methods • Joins using dynamically created indexes • Semijoins • Joins using hashing (Bloom) filters (Bloomjoins) • Setup: S is a table at site 1 and T is a table at site 2 • Join predicate S.a = T.b, where a is a column of S and b is a column of T • Two cases • Both S and T have an (unclustered) index on their join column(s) • Neither has an (unclustered) index on its join column(s)
Dynamically Create a Temporary Index on the Inner Table • Since we cannot ship an index, we can try to build one • The cost may be high • Scan the entire inner table and send it to site 1 • Store the table and create a temporary index on it at site 1 • Execute the best local join plan
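A sketch of the temporary-index idea, with the index modeled as a hash map from join-column value to the shipped inner tuples (the representation and names are assumptions for illustration, not R*'s actual index structure):

```python
from collections import defaultdict

def build_temp_index(shipped_inner, join_col):
    """At the join site, index the wholly shipped inner table on its join column."""
    index = defaultdict(list)
    for t in shipped_inner:
        index[t[join_col]].append(t)
    return index

def indexed_join(outer, temp_index, outer_col):
    """Run the local join plan using the temporary index for inner-table lookups."""
    return [{**o, **i} for o in outer for i in temp_index.get(o[outer_col], [])]
```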
Semijoin • Sort S and T on the join column, producing S' and T' • Send S' 's join-column values to T's site • Match them against T' and send the matching T' tuples back to S's site • Merge-join T' 's tuples with S' 's tuples to get the answer
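A compact sketch of those semijoin steps, with Python lists standing in for the sorted relations and a set for the shipped join-column values (the final join is shown as a simple nested loop rather than a true merge-join, for brevity):

```python
def semijoin(S, T, s_col, t_col):
    S_sorted = sorted(S, key=lambda r: r[s_col])               # produce S'
    T_sorted = sorted(T, key=lambda r: r[t_col])               # produce T'
    s_values = {r[s_col] for r in S_sorted}                    # ship only S' join values to T's site
    t_matches = [r for r in T_sorted if r[t_col] in s_values]  # matching T' tuples shipped back to S
    return [{**s, **t} for s in S_sorted for t in t_matches if s[s_col] == t[t_col]]
```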
Bloom Join • Use a Bloom filter: a bit string, somewhat like a hash table, in which each bit represents a bucket. All bits start at 0; if a value hashes to bit x, set bit x to 1 • Generate a Bloom filter for table S and send it to T's site • Hash T's join values with the same hash function and ship any tuples that hash to a 1 in S's Bloom filter • At site S, join T's shipped tuples with table S
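A minimal Bloom-join sketch using a single hash function and an arbitrarily sized bit array (real Bloom filters typically use several hash functions; the filter size and names here are illustrative assumptions):

```python
FILTER_BITS = 4096  # illustrative filter size

def build_bloom_filter(S, s_col):
    """At S's site: hash each join value and set the corresponding bit."""
    bits = [False] * FILTER_BITS
    for r in S:
        bits[hash(r[s_col]) % FILTER_BITS] = True
    return bits

def bloom_join(S, T, s_col, t_col):
    bits = build_bloom_filter(S, s_col)                             # ship the small filter, not the data
    shipped = [r for r in T if bits[hash(r[t_col]) % FILTER_BITS]]  # only candidate T tuples cross the network
    return [{**s, **t} for s in S for t in shipped if s[s_col] == t[t_col]]  # exact join back at S's site
```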
Comparison of Join Methods • Bloom joins generally outperform the other methods • Semijoins are advantageous when both the data and the (unclustered) index pages of the inner table fit into the buffer, so that efficient reuse of those pages keeps semijoin processing costs low. If not, constant paging of the unclustered index results in poor performance
Why Are Bloom Joins Better? • The message costs of semijoins and Bloom joins are comparable • The semijoin incurs higher local processing costs because it performs a "second join": it first sends S' 's join column to T', joins there, then sends the resulting T' tuples back and joins them with S'
Conclusions • Shipping the entire inner table to the join site and storing it there dominates the fetch-matches strategy • The R* optimizer's modeling of message costs was very accurate (a necessary condition for picking the correct join site) • Even for medium-speed, long-haul networks linking geographically dispersed hosts, local costs for CPU and I/O are significant • Bloom join dominates the other distributed join methods in all cases investigated • They did not test joins of very large tables or of more than two tables, so those may behave differently • They used an environment in which all sites have the same capabilities and a uniform network link, so adapting the optimizer to real life may be rather difficult
Questions? Thanks! Presented by: KORHAN KOCABIYIK