170 likes | 330 Views
GSLPI: a Cost-based Query Progress Indicator. Jiexing Li # , Rimma Nehme * , Jeff Naughton # # University of Wisconsin-Madison * Microsoft Jim Gray Systems Lab. Progress indicator (PI).
E N D
GSLPI: a Cost-based Query Progress Indicator Jiexing Li#, RimmaNehme*, Jeff Naughton# #University of Wisconsin-Madison *Microsoft Jim Gray Systems Lab
Progress indicator (PI) • A PI provides feedback to users on how much of the task has been completedor when the task will finish. • It is useful for workload management, admission control, skew handling, etc.
Main results • Existing PIs generate inaccurate results, if a query has phases with different performance characteristics. • A new PI that provides more accurate predictions by using • optimizer’s cost estimates, and • deeper analysis of query pipelines.
Execution plan • Blocking operator: no outputs until the operator has consumed at least one of its inputs (e.g., sort and hash match) • Pipeline: a subtree of concurrently executing operators delimited by blocking operators Hash Match Hash Match Pipeline: P2 Filter Filter Table Scan [B] Table Scan [B] Pipeline: P1 Table Scan [A] Table Scan [A] An execution plan
Basic ideas of previous PIs • Pipelines are processed one after another. • Each of them processes a certain number of tuples. • Speed: how fast tuples can be processed. • Estimated cardinalities • Current speed (future speed is unknown) Time P1 Pn P2 … Cardinality N1 Cardinality Nn Cardinality N2 Estimated remaining time
Limitations of previous PIs • select count(distinct orderkey) from lineitem Compute Scalar P2 Hash Match (orderkey) Table Scan [lineitem] P1 P1 P2 Execution plan Test results for TPC-H 10GB data
Limitations of Previous PIs (cont.) • select count(distinct orderkey) from lineitem • Before P2 starts, only the processing speed for P1 is known. • Over estimates the execution time for P2 by using the current speed of P1 for future pipeline P2. P1 P2
GSLPI improvement • Different phases have different processing speeds. • How does GSLPI estimate the processing speeds for future pipelines? P2 P1 Cardinality N1 Cardinality N2 Speed S1 Speed S2 ?
GSLPI: a cost-based progress indicator P1 Pn P2 … Cardinality N1 Cardinality Nn Cardinality N2 Further division (uniform speed) P1 P2 Pn P’2 … N’2 N2 N1 Nn ? Cost: work to process a tuple Cost C1 Cost Cn Cost C’2 Cost C2 ? Speed: adjusts based on the cost Speed S1 Speed S2 Speed S’2 Speed Sn Estimates the remaining time Estimated T1 Estimated T2 Estimated T’2 Estimated Tn
GSLPI: Cost • Redistributes the cost for operators on the boundaries. • P1 scans a tuple and inserts it into the hash table. • P2 fetches a tuple from the hash table and counts. • Splits the cost for Hash Match operator. • Identifies bottleneck cost. • Due to concurrent execution. select count(distinct orderkey) from lineitem Estimated [CPU, IO] Compute Scalar [0, 0] P2 Hash Match (orderkey) [275, 0] Table Scan [lineitem] P1 [66, 767] Execution plan
GSLPI: Speed • select count(distinct orderkey) from lineitem • P2 should be able to process its tuples much faster than P1. • Adjusts the speeds for future pipelines based on how difficult it is to process a tuple. Compute Scalar P2 Hash Match (orderkey) Table Scan [lineitem] P1 Execution plan
Experimental evaluation • Database: 10GB TPC-H. • Setup: Intel Core 2 Duo CPU and 4GB RAM. • Tested on 22 TPC-H queries. • They usually contain 6 to 16 operators, which may be divided into 3 to 9 different pipelines. • Estimated the remaining time in every 10 seconds.
Example of progress estimation P5 P4 P6, P7 P1, P2, P3 Experiment results for TPC-H Q5
Error metric • Similar to Q5, 20 out of 22 TPC-H queries process tuples at different speeds during their executions. • Estimated percentage of completion: • Error: the distance between fi and the actual percentage of completion. • For each query, we calculated the average error and the maximum error. fi
GSLPI accuracy Due to cardinality estimation error!
Conclusions • Investigated the limitations with previous PIs. • Developed a new progress indicator: GSLPI. • Evaluated their performance. • More accurate than previous PIs. • Remaining progress estimation error for TPC-H queries is mostly due to cardinality estimation error!