PSoup Kevin Menard CS 561 4/11/2005
Slides are modified versions of the following original presentation: Streaming Queries over Streaming Data Sirish Chandrasekaran UC Berkeley August 20, 2002 with Michael J. Franklin VLDB 2002
PSoup Insight #1
• Queries and data are duals
• Store new queries, apply to data that arrived earlier
• Store new data, apply to queries that arrived earlier
• Multiquery processing = "join" of query and data
• Supports all three types of queries: queries over the past, (landmark and sliding window) continuous, and hybrid
[Diagram labels: Query, Data, Result; Index over Data; Index over Queries]
Motivation?
• Why another model for continuous queries?
• What is wrong with how Aurora and STREAM supply responses?
Motivation: Disconnected Operation
• Previous solutions stream out answers immediately; this is not feasible or suitable for all applications
• Intermittent connectivity: e.g., applications on hand-held devices (as in this morning's keynote address)
• Even if connected: not always interested in streaming answers
PSoup Insight #2
• Separate computation from delivery
• Query answers continuously generated in background
• Apply windows on demand to transmit "current" results
• Efficient support for disconnected operation
• Low response time; shared computation and storage across invocations
[Diagram labels: Register and Invoke; Queries (ID, Predicate); Data (ID, R.a, R.b); Results Structure, a grid of T/F entries indexed by data and query]
PSoup Query Model
    SELECT select_list
    FROM from_list
    WHERE where_clause
    BEGIN begin_time
    END end_time
• WHERE clause: conjunction of boolean factors
• BEGIN-END clause: system clock or sequence numbers
• (begin_time, end_time):
    (constant, constant): snapshot query
    (constant, variable): landmark window query
    (variable, variable): sliding window query
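The three (begin_time, end_time) combinations above can be illustrated with a small classifier. This is only a sketch: the `Const` and `Var` types, the idea of expressing a variable bound as an offset from the current time, and the function names are assumptions made for illustration, not part of PSoup.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """A fixed time or sequence number (hypothetical encoding)."""
    value: int


@dataclass(frozen=True)
class Var:
    """A bound that moves with the clock, stored as an offset from NOW
    (hypothetical encoding)."""
    offset: int


Bound = Union[Const, Var]


def window_kind(begin: Bound, end: Bound) -> str:
    """Classify a BEGIN-END clause by which of its bounds are constant."""
    if isinstance(begin, Const) and isinstance(end, Const):
        return "snapshot"
    if isinstance(begin, Const) and isinstance(end, Var):
        return "landmark"
    if isinstance(begin, Var) and isinstance(end, Var):
        return "sliding"
    raise ValueError("(variable, constant) is not one of the listed forms")


def resolve(bound: Bound, now: int) -> int:
    """Turn a bound into a concrete time, given the current time."""
    return bound.value if isinstance(bound, Const) else now + bound.offset


kinds = [
    window_kind(Const(10), Const(20)),
    window_kind(Const(10), Var(0)),
    window_kind(Var(-5), Var(0)),
]
```

Under this encoding a sliding window over the last five time units is `(Var(-5), Var(0))`, and both bounds are resolved against the clock each time the query is invoked.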
Query Registration
    SELECT select_list FROM from_list WHERE where_clause    } Standing Query Clause (SQC): to the Symmetric Join
    BEGIN begin_time END end_time                            } to the Windows_Table
• QueryID: handle for future query invocations
Selections over Single Stream: Arrival of New Query Specification

(a) Initial state

Query Store                  Data Store
ID  Predicate                ID  R.a  R.b
20  0<R.a<=5                 48  4    3
21  R.a>4 and R.b=3          49  7    3
22  0>R.b>4                  50  3    8
23  R.a=4 and R.b=3          51  0    0
                             52  8    4

(b) Arrival of new query: Select * From R Where R.a<=4 and R.b>=3
(c) BUILD: the new query is added to the Query Store as ID 24 with predicate R.a<=4 and R.b>=3
(d) PROBE: the new query probes the Data Store; tuples 48 (4, 3) and 50 (3, 8) match
(e) Inserting results: column 24 of the Results Structure is filled in

Results Structure, column for query 24
Data ID  Result
48       T
49       F
50       T
51       F
52       F
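The build, probe, and insert steps for a newly arrived query can be sketched as follows. This is a minimal illustration, assuming predicates are plain Python callables and the Data Store is scanned linearly; the slides describe indexed stores, which this sketch does not model. The names `register_query`, `data_store`, `query_store`, and `results` are hypothetical.

```python
# Data Store from the example: data ID -> (R.a, R.b)
data_store = {48: (4, 3), 49: (7, 3), 50: (3, 8), 51: (0, 0), 52: (8, 4)}
query_store = {}   # query ID -> predicate over (R.a, R.b)
results = {}       # (data ID, query ID) -> bool, a stand-in for the Results Structure


def register_query(qid, predicate):
    """Store a new query, then apply it to data that arrived earlier."""
    query_store[qid] = predicate                  # BUILD into the Query Store
    for did, (a, b) in data_store.items():        # PROBE the Data Store
        results[(did, qid)] = predicate(a, b)     # insert one result per tuple


# New query 24: R.a<=4 and R.b>=3
register_query(24, lambda a, b: a <= 4 and b >= 3)

column_24 = [results[(did, 24)] for did in sorted(data_store)]
# column_24 is [True, False, True, False, False], i.e. T F T F F for tuples 48..52
```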
Selections over Single Stream: Arrival of New Data

(a) Initial state

Query Store                  Data Store
ID  Predicate                ID  R.a  R.b
20  0<R.a<=5                 48  4    3
21  R.a>4 and R.b=3          49  7    3
22  0>R.b>4                  50  3    8
23  R.a=4 and R.b=3          51  0    0
24  R.a<=4 and R.b>=3        52  8    4

(b) Arrival of new data: tuple 53 with R.a=3, R.b=6
(c) BUILD: tuple 53 is added to the Data Store
(d) PROBE: tuple 53 probes the Query Store; queries 20 (0<R.a<=5) and 24 (R.a<=4 and R.b>=3) match
(e) Inserting results: row 53 of the Results Structure is filled in

Results Structure, row for data tuple 53
Query ID  20  21  22  23  24
Result    T   F   F   F   T
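The symmetric case, where a new data tuple is stored and then applied to the queries that arrived earlier, can be sketched the same way. As before, this assumes predicates are Python callables and uses a linear scan in place of the query index; `insert_tuple` and the store names are hypothetical. Query 22 is transcribed exactly as it appears on the slide, so it can never be satisfied.

```python
# Query Store from the example: query ID -> predicate over (R.a, R.b)
query_store = {
    20: lambda a, b: 0 < a <= 5,
    21: lambda a, b: a > 4 and b == 3,
    22: lambda a, b: 0 > b > 4,        # as written on the slide; always False
    23: lambda a, b: a == 4 and b == 3,
    24: lambda a, b: a <= 4 and b >= 3,
}
data_store = {}    # data ID -> (R.a, R.b)
results = {}       # (data ID, query ID) -> bool


def insert_tuple(did, a, b):
    """Store a new data tuple, then apply the stored queries to it."""
    data_store[did] = (a, b)                       # BUILD into the Data Store
    for qid, predicate in query_store.items():     # PROBE the Query Store
        results[(did, qid)] = predicate(a, b)      # insert one result per query


insert_tuple(53, 3, 6)

row_53 = [results[(53, qid)] for qid in sorted(query_store)]
# row_53 is [True, False, False, False, True], i.e. T F F F T for queries 20..24
```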
Query Invocation
• System returns the results corresponding to the current value of the BEGIN-END clause

Results Structure (entries shown on the slide)
Data ID  20  21  22  23  24
48                       T
49                       F
50                       T
51                       F
52                       F
53       T   F   F   F   T

[A brace labelled "Current Window" marks the data rows that fall inside the BEGIN-END range at invocation time.]
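Invocation then reduces to reading one query's column of the Results Structure and keeping only the rows inside the current window. The sketch below assumes data IDs double as sequence numbers and that the window is an inclusive range; both are illustrative assumptions, and `invoke` is a hypothetical name.

```python
# Column of the Results Structure for query 24, taken from the example:
# data ID -> whether the tuple satisfies the query
results = {
    24: {48: True, 49: False, 50: True, 51: False, 52: False, 53: True},
}


def invoke(qid, begin, end):
    """Return IDs of matching tuples whose sequence number lies in [begin, end].

    No predicate is evaluated here: the matches were computed in the
    background when the query or the data arrived.
    """
    column = results[qid]
    return [did for did in sorted(column) if begin <= did <= end and column[did]]


window_50_53 = invoke(24, 50, 53)   # [50, 53]
window_51_52 = invoke(24, 51, 52)   # []
```

Because only the window is applied at invocation time, repeated invocations of the same query reuse the stored results.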
Joins over R and S: Arrival of New Query Specification

(a) Initial state

Query Store
ID  Predicate
20  R.a=5 and R.b<S.b
21  R.a>4 and R.b<S.b and S.a<10
22  R.b=4 and R.a+5>S.a and S.b>2

R-Data Store        S-Data Store
ID  R.a  R.b        ID  S.a  S.b
10  2    5          21  2    2
14  3    3          25  3    3
31  4    1          36  4    4
48  9    7          49  5    5

(b) Arrival of new query: 23  R.a<5 and R.a>S.a and S.b>1
(c) BUILD: query 23 is added to the Query Store
(d) PROBE: query 23 probes the R-Data Store; R tuples 10, 14, and 31 match
(e) Constructing hybrid structs from the matches

Hybrid Structs
R.ID  Q.ID  Q.Predicate
10    23    2>S.a and S.b>1
14    23    3>S.a and S.b>1
31    23    4>S.a and S.b>1

(f) PROBE: the hybrid structs probe the S-Data Store

Results (R, S, Q)
14, 21, 23
31, 21, 23
31, 25, 23
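The join case can be sketched by splitting the new query into the part that mentions only R and a residual predicate over S with the R values substituted in, which plays the role of the hybrid struct. The split into `r_only` and `residual` is done by hand here and is an assumption for illustration; how PSoup derives it is not shown on the slides.

```python
# Stores from the example: ID -> (a, b)
r_store = {10: (2, 5), 14: (3, 3), 31: (4, 1), 48: (9, 7)}
s_store = {21: (2, 2), 25: (3, 3), 36: (4, 4), 49: (5, 5)}

QID = 23  # new query: R.a<5 and R.a>S.a and S.b>1


def r_only(ra, rb):
    """The factor of query 23 that mentions only R."""
    return ra < 5


def residual(ra, rb):
    """Remaining factors with R's values bound, e.g. 3>S.a and S.b>1."""
    return lambda sa, sb: ra > sa and sb > 1


# Probe the R-Data Store and build one hybrid struct per matching R tuple.
hybrids = [
    (rid, QID, residual(ra, rb))
    for rid, (ra, rb) in r_store.items()
    if r_only(ra, rb)
]

# Probe the S-Data Store with each hybrid struct.
results = [
    (rid, sid, qid)
    for rid, qid, pred in hybrids
    for sid, (sa, sb) in s_store.items()
    if pred(sa, sb)
]
# results is [(14, 21, 23), (31, 21, 23), (31, 25, 23)]
```

R tuple 10 produces a hybrid struct but no result, since no S tuple has S.a below 2.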
Joins over R and S: Arrival of New Data

(a) Initial state

Query Store
ID  Predicate
20  R.a=5 and R.b<S.b
21  R.a>4 and R.b<S.b and S.a<10
22  R.b=4 and R.a+5>S.a and S.b>2
23  R.a<4 and R.b<S.b

R-Data Store        S-Data Store
ID  R.a  R.b        ID  S.a  S.b
47  4    3          48  4    4
50  5    3          49  5    3
51  3    8          52  3    2

(b) Arrival of new data: R tuple 53 with R.a=5, R.b=4
(c) BUILD: tuple 53 is added to the R-Data Store
(d) PROBE: tuple 53 probes the Query Store; queries 20, 21, and 22 match
(e) Constructing hybrid structs from the matches

Hybrid Structs
R.ID  Q.ID  Q.Predicate
53    20    4<S.b
53    21    4<S.b and S.a<10
53    22    10>S.a and S.b>2

(f) PROBE: the hybrid structs probe the S-Data Store

Results (R, S, Q)
53, 48, 22
53, 49, 22
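When the new arrival is an R tuple rather than a query, the same two probes happen in the other order: first the Query Store, then the S-Data Store. In this sketch each stored query is hand-split into an R-only test and a function that builds the residual S predicate; that representation and the name `insert_r` are assumptions for illustration.

```python
# Each query: (test on R alone, builder for the residual predicate over S)
queries = {
    20: (lambda ra, rb: ra == 5, lambda ra, rb: lambda sa, sb: rb < sb),
    21: (lambda ra, rb: ra > 4, lambda ra, rb: lambda sa, sb: rb < sb and sa < 10),
    22: (lambda ra, rb: rb == 4, lambda ra, rb: lambda sa, sb: ra + 5 > sa and sb > 2),
    23: (lambda ra, rb: ra < 4, lambda ra, rb: lambda sa, sb: rb < sb),
}
s_store = {48: (4, 4), 49: (5, 3), 52: (3, 2)}   # S ID -> (S.a, S.b)


def insert_r(rid, ra, rb):
    """Probe the Query Store with a new R tuple, then probe S with the hybrids."""
    hybrids = [
        (rid, qid, make_residual(ra, rb))
        for qid, (r_test, make_residual) in queries.items()
        if r_test(ra, rb)
    ]
    matched = [qid for _, qid, _ in hybrids]
    results = [
        (r, sid, qid)
        for r, qid, pred in hybrids
        for sid, (sa, sb) in s_store.items()
        if pred(sa, sb)
    ]
    return matched, results


matched, results = insert_r(53, 5, 4)
# matched is [20, 21, 22]; results is [(53, 48, 22), (53, 49, 22)]
```

Queries 20 and 21 both reduce to requiring S.b above 4, which no stored S tuple satisfies, so only query 22 contributes results.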
Other Queries
• N-way joins
    Similar to 2-way joins
    Probe, generate hybrid structs, repeat
    Can be executed without intermediate tables
• Aggregations
    Performed at query invocation
    Uses n-ary ranked tree, clustered on time
Telegraph Background: CACQ
• CACQ [MSHR02]
    Shared execution of multiple queries with one Eddy
    Tuple lineage
    Query indices
• Queries and data treated very differently
• Only landmark continuous queries
• No support for disconnected operation
PSoup in Telegraph
• Leverage SteMs to store and index queries
• Changes to Eddies
• Encode queries as tuples
    Break WHERE clause into individual boolean factors (BF)
    Encode each BF as R.a relop [R.b|S.b] [+|-] constant
• Stream prefix consistency
    A new query or data tuple is completely processed before any other tuple: no holes in the Results Structure
• Results Structure: buffers the results
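The encoding of a WHERE clause as individual boolean factors can be illustrated with a small parser for the form R.a relop [R.b|S.b] [+|-] constant. The `BooleanFactor` tuple layout, the regular expression, and the function names are assumptions made for this sketch; the slides do not show the actual tuple format used in Telegraph.

```python
import re
from typing import List, NamedTuple, Optional


class BooleanFactor(NamedTuple):
    """One factor: left relop (right + constant). Hypothetical layout."""
    left: str             # attribute on the left-hand side, e.g. "R.a"
    relop: str            # one of <, <=, >, >=, =
    right: Optional[str]  # optional right-hand attribute, e.g. "S.b"
    constant: int         # signed constant; 0 when absent


_BF = re.compile(
    r"^\s*([A-Za-z]\w*\.\w+)\s*(<=|>=|<|>|=)\s*"
    r"(?:([A-Za-z]\w*\.\w+)\s*(?:([+-])\s*(\d+))?|([+-]?\d+))\s*$"
)


def parse_factor(text: str) -> BooleanFactor:
    """Parse a single factor such as 'R.a > 4' or 'R.a < S.b + 3'."""
    m = _BF.match(text)
    if m is None:
        raise ValueError(f"not in 'attr relop [attr] [+|-] constant' form: {text!r}")
    left, relop, right, sign, after_attr, bare = m.groups()
    if right is None:
        return BooleanFactor(left, relop, None, int(bare))
    constant = int(after_attr or 0)
    return BooleanFactor(left, relop, right, -constant if sign == "-" else constant)


def encode_where(where: str) -> List[BooleanFactor]:
    """Split a conjunctive WHERE clause into its boolean factors."""
    return [parse_factor(f) for f in re.split(r"\s+and\s+", where, flags=re.IGNORECASE)]


factors = encode_where("R.a>4 and R.b<S.b and S.a<10")
```

A factor written as R.a+5>S.a would first have to be rewritten as R.a>S.a-5 to fit this form; the sketch rejects it rather than normalizing it.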
Experiments and Results
• Alternatives
    NoMat: no background processing
    PSoup-Partial: background processing; the current window is applied on invocation
    PSoup-Complete: current windows are also continuously applied in the background
• Experimental parameters
    Unloaded server with two 666 MHz Intel Pentium III processors and 768 MB RAM
    Data arrives as fast as possible, with values in the domain [0,255]
    Selection queries of the form R.a relop C, where C is in [0,255]
    Join queries of the form R.a relop S.b +/- C
Experiments: Response Time vs. Window Size
• Interval predicates, selection queries
[Plot not reproduced]

Experiments: Response Time vs. Window Size
• Equality predicates, selection queries
[Plot not reproduced]

Experiments: Max Data Arrival Rate vs. #SQCs
• Window size = 1000 tuples
[Plot not reproduced]
PSoup in a Traditional Query Processor
• PSoup = SQL query over data and client query streams?
• Joins = expression evaluators
• Notes
    Conventional QPs do not have tuple lineage
    Conventional QPs always use intermediate tables
Conclusions
• Treating queries and data the same
    Combines approaches for previously studied queries: queries over the past and continuous queries
    Allows new functionality: hybrid queries
• Separating result generation and delivery
    Makes disconnected operation feasible
    Efficient support for repeated query invocations