Semantics and Evaluation Techniques for Window Aggregates in Data Streams. Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter Tucker. Presented by: Venkatesh Raghvan and Charudatta Wad, CS 525 class discussion
Overview • Background • Problem Statement • Window semantics • WID approach • Discussion
Background • Disorder Handling: Punctuations. • Aggregate Queries: • In SQL? • In CQL (without WIDs)? • In sliding windows, what causes an output?
Problem Statement • Lack of explicit window semantics. • Implementation efficiency. • Out-of-order arrival of data.
Running Example • Consider the example from the paper: • Schema <seg-id, speed, ts> • Query: SELECT seg-id, max(speed), min(speed) FROM Traffic [RANGE 300 seconds SLIDE 60 seconds WATTR ts] GROUP BY seg-id.
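A minimal Python sketch of what the running-example query computes, assuming each window w covers the half-open interval [w*60, w*60 + 300) seconds and that window ids start at 0 (the paper's exact extent definition may differ). The names `window_ids` and `max_min_speed` and the tuple layout are hypothetical. Each tuple is assigned to every window that contains it, and one (max, min) pair is kept per (window, seg-id); the sketch does not decide when a window is complete, which a streaming operator would also have to do.

```python
RANGE, SLIDE = 300, 60  # seconds, from the running-example query

def window_ids(ts):
    # Assumption: window w covers [w*SLIDE, w*SLIDE + RANGE); ids start at 0.
    first = max(0, (ts - RANGE) // SLIDE + 1)
    return range(first, ts // SLIDE + 1)

def max_min_speed(tuples):
    """tuples: iterable of (seg_id, speed, ts) with integer ts in seconds.
    Returns {(wid, seg_id): (max_speed, min_speed)}."""
    state = {}
    for seg_id, speed, ts in tuples:
        # Arrival order does not matter: each tuple updates every window it falls in.
        for wid in window_ids(ts):
            key = (wid, seg_id)
            if key in state:
                hi, lo = state[key]
                state[key] = (max(hi, speed), min(lo, speed))
            else:
                state[key] = (speed, speed)
    return state

traffic = [(1, 50, 10), (1, 65, 70), (2, 40, 80), (1, 55, 310)]  # (seg_id, speed, ts)
print(max_min_speed(traffic)[(1, 1)])  # (65, 55): window 1 covers [60, 360)
```

With RANGE 300 and SLIDE 60, every tuple with ts of at least 300 lands in RANGE/SLIDE = 5 windows, so each tuple updates several aggregates at once.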
Running Example • Figure taken from the paper itself.
Big Picture • Mapping tuples to window extents, and window extents back to tuples. • New window semantics. • Window specifications: RANGE, SLIDE and WATTR.
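The two mappings can be sketched as a pair of functions. This assumes window w covers [w*SLIDE, w*SLIDE + RANGE) on the WATTR attribute and that ids start at 0; `window_ids` and `window_extent` are hypothetical names, not the paper's notation.

```python
def window_ids(ts: int, range_: int, slide: int) -> range:
    """Hypothetical helper: ids of every window whose extent contains ts.

    Assumes window w covers the half-open interval [w*slide, w*slide + range_)
    and that window ids start at 0.
    """
    first = max(0, (ts - range_) // slide + 1)  # smallest w with w*slide + range_ > ts, clamped at 0
    last = ts // slide                          # largest w with w*slide <= ts
    return range(first, last + 1)

def window_extent(wid: int, range_: int, slide: int) -> tuple:
    """Inverse mapping: the half-open [start, end) interval covered by window wid."""
    start = wid * slide
    return (start, start + range_)

print(list(window_ids(400, 300, 60)))  # [2, 3, 4, 5, 6]
print(window_extent(2, 300, 60))       # (120, 420)
```

Because both directions are plain arithmetic on the window specification, neither needs to buffer tuples: a tuple's window ids are known the moment it arrives, regardless of arrival order.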
Window specification • Time-based query: • Count the vehicles in each segment over the past hour, updating the result every 20 minutes. SELECT seg-id, count(*) FROM Traffic [RANGE 60 minutes SLIDE 20 minutes WATTR ts] GROUP BY seg-id.
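One possible answer to the Background question of what causes an output is to let punctuations close windows. The sketch below evaluates the time-based query (timestamps in minutes), assuming window w covers [20w, 20w + 60) and that a punctuation p promises no later tuple has ts < p; the class `CountPerSegment` and its methods are hypothetical.

```python
from collections import Counter

RANGE, SLIDE = 60, 20  # minutes, from the time-based query

def window_ids(ts):
    # Assumption: window w covers [w*SLIDE, w*SLIDE + RANGE); ids start at 0.
    return range(max(0, (ts - RANGE) // SLIDE + 1), ts // SLIDE + 1)

class CountPerSegment:
    """Hypothetical operator: count(*) per seg-id for each window, emitted on punctuation."""

    def __init__(self):
        self.counts = {}  # wid -> Counter mapping seg_id to tuple count

    def on_tuple(self, seg_id, ts):
        # Late tuples are fine: a tuple only increments the counters of its windows.
        for wid in window_ids(ts):
            self.counts.setdefault(wid, Counter())[seg_id] += 1

    def on_punctuation(self, p):
        # Punctuation p promises that no later tuple has ts < p, so every window
        # whose extent ends at or before p is complete: emit it and drop its state.
        closed = sorted(w for w in self.counts if w * SLIDE + RANGE <= p)
        return [(w, dict(self.counts.pop(w))) for w in closed]

op = CountPerSegment()
for seg_id, ts in [(1, 5), (2, 30), (1, 50), (1, 15)]:  # ts = 15 arrives out of order
    op.on_tuple(seg_id, ts)
print(op.on_punctuation(60))  # [(0, {1: 3, 2: 1})]
```

The late tuple with ts = 15 is still counted in window 0 because it arrives before the punctuation that closes that window. Windows that receive no tuples produce no output in this sketch.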
Window specification • Tuple-based query: • Count the vehicles in each segment over the past 100 rows, updating the result every 10 rows. SELECT seg-id, count(*) FROM Traffic [RANGE 100 rows SLIDE 10 rows WATTR row-num] GROUP BY seg-id.
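With row-num as the windowing attribute, the same window arithmetic applies to row positions instead of timestamps. A minimal sketch, assuming row-num numbers all tuples of the stream from 0 in arrival order (not per segment) and that window w covers rows [10w, 10w + 100); `count_per_segment` is a hypothetical name.

```python
from collections import Counter

RANGE, SLIDE = 100, 10  # rows, from the tuple-based query

def window_ids(row_num):
    # Same arithmetic as the time-based case, with row_num as the windowing attribute.
    # Assumption: window w covers rows [w*SLIDE, w*SLIDE + RANGE), numbered from 0.
    return range(max(0, (row_num - RANGE) // SLIDE + 1), row_num // SLIDE + 1)

def count_per_segment(seg_ids):
    """seg_ids: seg-id of each arriving tuple, in arrival order.
    Returns {wid: Counter(seg_id -> count)} for every window touched so far."""
    windows = {}
    for row_num, seg_id in enumerate(seg_ids):  # row-num assigned on arrival
        for wid in window_ids(row_num):
            windows.setdefault(wid, Counter())[seg_id] += 1
    return windows

windows = count_per_segment([1, 2] * 60)  # 120 rows, alternating segments
print(dict(windows[3]))  # {1: 45, 2: 45}
```

Window 3 covers rows 30 to 129, but only rows 30 to 119 exist yet, so its counts are partial; a complete operator would emit a window only after its last row has arrived.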
Window specification • Can RANGE and SLIDE be specified on different attributes? • Yes: RATTR gives the attribute for RANGE and SATTR the attribute for SLIDE. SELECT seg-id, count(*) FROM Traffic [RANGE 300 seconds SLIDE 10 rows RATTR ts SATTR row-num] GROUP BY seg-id.
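A sketch of one plausible reading of this mixed query: every 10 rows a window closes, and it covers the tuples whose ts lies in the 300 seconds ending at the ts of the closing row. The sketch assumes ts is non-decreasing in arrival order and uses a hypothetical function `mixed_windows`; the paper's precise semantics may differ.

```python
from collections import Counter, deque

RANGE_SECONDS, SLIDE_ROWS = 300, 10  # RATTR ts, SATTR row-num

def mixed_windows(tuples):
    """tuples: (seg_id, ts) in arrival order; ts assumed non-decreasing.
    Every SLIDE_ROWS rows a window closes, covering tuples with ts in
    (t_end - RANGE_SECONDS, t_end], where t_end is the closing row's ts.
    Returns one Counter(seg_id -> count) per closed window."""
    buffer = deque()  # (ts, seg_id) of tuples that may still fall in a future window
    results = []
    for row_num, (seg_id, ts) in enumerate(tuples, start=1):
        buffer.append((ts, seg_id))
        if row_num % SLIDE_ROWS == 0:
            # Evict tuples older than the time range; safe because ts never decreases.
            while buffer and buffer[0][0] <= ts - RANGE_SECONDS:
                buffer.popleft()
            results.append(Counter(seg for _, seg in buffer))
    return results

tuples = [(i % 2 + 1, 40 * i) for i in range(20)]  # one tuple every 40 s
print([sum(c.values()) for c in mixed_windows(tuples)])  # [8, 8]
```

Here the slide decides when an output happens (row count), while the range decides which tuples it covers (time), which is exactly the split that RATTR and SATTR express.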
WID Approach • Explained by Venky.