Update-Pattern-Aware Modeling and Processing of Continuous Queries
Lukasz Golab, University of Waterloo, Canada, lgolab@uwaterloo.ca
Joint work with M. Tamer Özsu
Introduction
• Relational algebra and queries
  • Each operator consumes one or more relation instances and outputs a relation instance
  • Blocking computations
  • Some operators have non-blocking variants
    • aggregation, join
What is a continuous query?
• Expression composed of non-blocking "relational" operators that operate on streams
• Streams may be bounded by sliding windows
• Q(t) = answer of a continuous query Q at time t
  = output of the corresponding one-time relational query Q′ whose inputs are the current states of the streams, windows, and tables referenced in Q
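The definition of Q(t) above can be read operationally: take the current state of each input and run the one-time query on it. The following is a minimal sketch of that reading for a single time-based window. The names `window_state`, `continuous_answer`, and `total`, the sample data, and the convention that a tuple is live during [ts, ts + window_size) are assumptions made for illustration, not part of the talk.

```python
def window_state(stream, t, window_size):
    """Current state of a time-based window at time t.

    Assumed convention: a tuple with timestamp ts is live while
    ts <= t < ts + window_size.
    """
    return [(ts, v) for ts, v in stream if ts <= t < ts + window_size]


def continuous_answer(one_time_query, stream, t, window_size):
    """Q(t): the one-time query Q' evaluated over the current window state."""
    return one_time_query(window_state(stream, t, window_size))


def total(state):
    """One-time query Q': SUM over the value attribute."""
    return sum(v for _, v in state)


# Hypothetical stream of (timestamp, value) pairs.
stream = [(1, 10.0), (2, 5.0), (4, 2.5), (7, 1.0)]
answers = {t: continuous_answer(total, stream, t, 3) for t in (2, 4, 7)}
```

Re-evaluating from scratch at every t is only a way to state the semantics; an actual engine would maintain the answer incrementally.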
Example of a continuous query
[Figure: query plan leading from the inputs to the output, with two selection (σ) operators and a SUM operator]
What is an update pattern?
• Update pattern does not refer to individual tuples
  • stream = append-only
• Update pattern refers to changes in the answer of a continuous query (insertions/deletions)
• Deletions? Aren’t streams append-only?
  • Queries over an append-only database don’t necessarily produce append-only output
Non-append-only output
• Select stocks whose price this hour is greater than their price in the previous hour
  • Example quotes: Company X 8am $1.00; Company X 9am $1.50; Company X 10am $1.25
  • Update pattern?
• Select all stock prices reported in the last 5 minutes
  • FIFO update pattern
[Figure: three timelines with ticks 1 to 12]
Monotonic queries
• Query Q is monotonic (over an append-only database) if Q(t) ⊆ Q(t′) for all t ≤ t′
• Queries over sliding windows are non-monotonic because all of their results eventually expire as the windows slide forward
• Some queries are non-monotonic even over an append-only database (stream)
  • Stock quotes whose price is higher than last hour
• Others become non-monotonic only because of windowing
  • Select all stock quotes: monotonic over an infinite stream, non-monotonic over a window
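The subset condition can be checked on a sample of answers. The sketch below compares "select all quotes" over an unbounded stream with the same query over a time-based window. The helper names, the sample quotes, and the window convention [ts, ts + size) are illustrative assumptions. A check on finitely many time points can refute monotonicity but cannot prove it.

```python
def is_monotonic_on(answers):
    """True if Q(t1) ⊆ Q(t2) ⊆ ... for answers listed in time order.

    Subset is transitive, so comparing consecutive answers is enough.
    """
    return all(a <= b for a, b in zip(answers, answers[1:]))


# Hypothetical quotes: (timestamp, symbol, price).
quotes = [(1, "X", 1.00), (2, "Y", 2.00), (7, "X", 1.50)]


def select_all(t):
    """Select all quotes over the unbounded (append-only) stream."""
    return {q for q in quotes if q[0] <= t}


def select_all_windowed(t, size=5):
    """Select all quotes over a time-based window of the given size."""
    return {q for q in quotes if q[0] <= t < q[0] + size}


times = [1, 2, 7]
unbounded = [select_all(t) for t in times]
windowed = [select_all_windowed(t) for t in times]

unbounded_ok = is_monotonic_on(unbounded)  # answers only grow
windowed_ok = is_monotonic_on(windowed)    # old quotes expire by t = 7
```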
Problem definition
• Motivation
  • Two possible reasons for non-monotonic behaviour of continuous queries
• Problem statement
  • Divide non-monotonic queries into classes
  • Analyze the update patterns of each class
  • Use the knowledge of update patterns in query processing and optimization
Outline
• Update patterns of sliding window queries
  • Classification
• Advantages of update-pattern awareness
  • Modeling (query semantics)
  • Processing (query execution)
Sliding window operators
• When a tuple falls out of its window, it also expires from the output and from operator state
[Figure: two examples, a DISTINCT operator over a window and an operator over windows on streams S1 and S2, with the labels "oldest" and "undo"]
Calculating expiration times
• Time-based windows: predictable expiration times
  • Assign a timestamp, ts, upon arrival
  • Expiration time = ts + window_size (FIFO)
  • For joins: min(expiration times of the joined tuples)
    • Predictable, but is it still FIFO?
• Count-based windows, non-monotonic queries over infinite streams: unpredictable
  • Expiration time depends on stream arrival rates or on the data arriving on the stream (need negative tuples)
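The two expiration rules for time-based windows can be sketched as follows. The class `Timed`, the helpers `arrive` and `join`, the window size, and the sample arrivals are assumptions chosen for illustration. The example also suggests an answer to the question on the slide: join results have predictable expiration times, but a result generated later can expire earlier, so the order is not FIFO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timed:
    value: tuple  # attribute values carried by the tuple
    ts: int       # time at which the tuple was generated
    exp: int      # time at which the tuple expires


def arrive(value, ts, window_size):
    """Base tuple: expiration time = ts + window_size."""
    return Timed((value,), ts, ts + window_size)


def join(left, right, now):
    """Join result: expires as soon as either joined tuple expires."""
    return Timed(left.value + right.value, now, min(left.exp, right.exp))


W = 10                   # hypothetical window size on both inputs
a = arrive("a", 1, W)    # stream 1, expires at 11
d = arrive("d", 8, W)    # stream 1, expires at 18
b = arrive("b", 9, W)    # stream 2, expires at 19
c = arrive("c", 10, W)   # stream 2, expires at 20

r1 = join(d, b, now=9)   # generated first, expires at 18
r2 = join(a, c, now=10)  # generated second, expires at 11

exps = [r1.exp, r2.exp]          # expiration times in generation order
is_fifo = exps == sorted(exps)   # FIFO would require non-decreasing order
```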
Classification of update patterns
• Monotonic: answers never expire
  • selection, join, duplicate elimination, over infinite streams
• Weakest non-monotonic: answers expire in FIFO order, negative tuples are not necessary
  • operators over time-based windows that don’t reorder incoming tuples during processing
• Weak non-monotonic: order is not FIFO, but negative tuples are not needed
  • time-based window join, duplicate elimination
• Strict non-monotonic: unpredictable expiration order
  • negation, queries over count-based windows
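The four classes can be written down as a small lookup. This is a sketch that encodes only the examples listed on the slide; the enum `Pattern`, the function `classify`, and the string labels are hypothetical names. Treating selection as the representative operator that does not reorder tuples over a time-based window is an assumption consistent with the FIFO example earlier in the talk.

```python
from enum import Enum


class Pattern(Enum):
    MONOTONIC = "monotonic"
    WEAKEST = "weakest non-monotonic"
    WEAK = "weak non-monotonic"
    STRICT = "strict non-monotonic"


OPERATORS = {"selection", "join", "distinct", "negation"}
INPUTS = {"stream", "time_window", "count_window"}


def classify(operator, input_kind):
    """Update pattern of a single operator over a given kind of input."""
    if operator not in OPERATORS or input_kind not in INPUTS:
        raise ValueError(f"unknown combination: {operator!r}, {input_kind!r}")
    # Negation and count-based windows have unpredictable expiration order.
    if operator == "negation" or input_kind == "count_window":
        return Pattern.STRICT
    # Selection, join, and duplicate elimination over infinite streams.
    if input_kind == "stream":
        return Pattern.MONOTONIC
    # Time-based windows: FIFO only if the operator does not reorder tuples.
    return Pattern.WEAKEST if operator == "selection" else Pattern.WEAK
```

For example, `classify("join", "time_window")` yields `Pattern.WEAK`, while `classify("join", "count_window")` yields `Pattern.STRICT`.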
Outline
• Update patterns of sliding window queries
  • Classification
• Advantages of update-pattern awareness
  • Modeling (query semantics)
  • Processing (query execution)
Update-pattern-aware semantics of continuous queries
• How are updates of relational tables different from insertions and deletions caused by the movement of the windows?
  • Join of two infinite streams is monotonic
  • Join of two windows is weak non-monotonic
  • Join of a window and a table: weakest (easier), weak (same), or strict non-monotonic (harder)?
Update-pattern-aware modeling of continuous queries, cont.
• Harder: allow arbitrary table updates
  • Strict non-monotonic because we can’t predict when and how the table will be changed
• Easier: don’t allow retroactive updates
  • Non-retroactive relation (NRR): table updates don’t affect previously arrived stream tuples
  • Weakest non-monotonic
Example
• Stream: stock quotes
• Table: mapping between stock symbols and company names
• Query: select company name and price over a (time-based) window
  • Company goes bankrupt: delete its previous quotes (relation) or not (NRR)
  • Company changes name: update the name in previous quotes (relation) or not (NRR)
  • New company: no prior stock quotes
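The difference between the two semantics in this example can be sketched as two small join functions over one interleaved event sequence. Everything named here (`apply_table_update`, `nrr_join`, `relation_join`, the event encoding, and the company names) is hypothetical, and window expiration is left out to keep the focus on table updates.

```python
def apply_table_update(table, symbol, name):
    """Insert or rename a company; name=None deletes it (e.g. bankruptcy)."""
    if name is None:
        table.pop(symbol, None)
    else:
        table[symbol] = name


def nrr_join(events):
    """NRR semantics: a quote joins with the table as it was on arrival."""
    table, out = {}, []
    for kind, symbol, payload in events:
        if kind == "table":
            apply_table_update(table, symbol, payload)
        elif symbol in table:
            out.append((table[symbol], payload))  # never revised later
    return out


def relation_join(events):
    """Relation semantics: all quotes join with the current table state."""
    table, quotes = {}, []
    for kind, symbol, payload in events:
        if kind == "table":
            apply_table_update(table, symbol, payload)
        else:
            quotes.append((symbol, payload))
    return [(table[s], price) for s, price in quotes if s in table]


events = [
    ("table", "X", "Acme"),
    ("quote", "X", 1.00),
    ("table", "X", "Acme Corp"),  # company changes name
    ("quote", "X", 1.50),
    ("table", "Y", "Yoyo"),       # new company, no prior quotes
    ("quote", "Y", 2.00),
    ("table", "Y", None),         # company goes bankrupt
]

nrr_result = nrr_join(events)
relation_result = relation_join(events)
```

Under NRR semantics the earlier results keep the old name and the bankrupt company's quote survives; under relation semantics the rename is applied retroactively and the bankrupt company's quote disappears, which is the kind of unpredictable deletion that makes the query strict non-monotonic.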
Update-pattern-aware query processing
• Annotate query plan with update patterns of each sub-query
• Use appropriate data structures for storing state
• Use appropriate physical operators
[Figure: two DISTINCT operators, one labeled strict non-monotonic and one labeled weakest or weak non-monotonic, with Insert and Delete operations on state that is partitioned by expiration time]
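One way to read "appropriate data structures" is sketched below: a plain queue suffices when results expire in FIFO order, while state partitioned by expiration time handles results that expire out of order without negative tuples. The class names, the bucket width, and the convention that a tuple is expired once exp <= now are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict, deque


class FifoState:
    """State for weakest non-monotonic input: expire from the front only."""

    def __init__(self):
        self._queue = deque()

    def insert(self, item, exp):
        self._queue.append((exp, item))

    def expire(self, now):
        expired = []
        while self._queue and self._queue[0][0] <= now:
            expired.append(self._queue.popleft()[1])
        return expired


class PartitionedState:
    """State for weak non-monotonic input: buckets keyed by expiration time."""

    def __init__(self, width):
        self._width = width                # time span covered by one bucket
        self._buckets = defaultdict(list)

    def insert(self, item, exp):
        self._buckets[exp // self._width].append((exp, item))

    def expire(self, now):
        expired = []
        due = sorted(k for k in self._buckets if k <= now // self._width)
        for key in due:
            bucket = self._buckets[key]
            expired.extend(item for exp, item in bucket if exp <= now)
            remaining = [(exp, item) for exp, item in bucket if exp > now]
            if remaining:
                self._buckets[key] = remaining
            else:
                del self._buckets[key]
        return expired


# Two results generated in this order but expiring out of order.
partitioned = PartitionedState(width=5)
fifo = FifoState()
for state in (partitioned, fifo):
    state.insert("r1", 18)
    state.insert("r2", 11)

partitioned_at_11 = partitioned.expire(11)  # finds r2 in its bucket
fifo_at_11 = fifo.expire(11)                # blocked behind r1 at the front
partitioned_at_18 = partitioned.expire(18)
```

The queue misses the out-of-order expiration, which is why it is only suitable when the update pattern is known to be FIFO. Strict non-monotonic input would additionally need deletion by tuple value in response to negative tuples, which neither structure here provides.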
Update-pattern-aware query optimization
• Cost model
  • Per-unit-time cost of executing operators, maintaining state, and processing negative tuples
• Update-pattern-aware heuristic
  • Strict NM pull-up, weakest NM push-down
  • Operator and state implementations are simpler with weakest and weak NM
Update-pattern-aware query optimization, cont.
[Figure: two alternative query plans over Stream 1, Stream 2, and Stream 3, with selection (σ) operators and edges annotated STR, WK, and WKS]
Summary
• Monotonic vs. non-monotonic classification is not precise enough
  • Fails to distinguish between predictable (due to windowing) and unpredictable update patterns
• Our update-pattern classification
  • Clarifies the semantics of continuous queries that reference tables alongside streams/windows
  • Forms the basis of our update-pattern-aware query processor
Future work
• Extend update-pattern-aware query optimization
• Investigate the update patterns of periodically re-executed queries
• Sub-divide queries over count-based windows
  • For now, strict non-monotonic