190 likes | 315 Views
Windows in Niagara. Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte. Overview. Make Windows Explicit Tag tuples with a window id Standard operators don’t know about different kinds of windows - work with window ID attribute Use Punctuation Infrastructure
E N D
Windows in Niagara Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte
Overview • Make Windows Explicit • Tag tuples with a window id • Standard operators don’t know about different kinds of windows - work with window ID attribute • Use Punctuation Infrastructure • Punctuation signals end of window • No need for specialized window operators – just use punctuate-aware operators • Flexible • Window on system time, external time or tuple-based • Data can arrive and be processed out of order
Niagara Control Structure • Push-based (pipelined) system. • Each operator is a thread. • Operators are connected by queues of tuples. • Operators wait on input queue, when tuple is ready, it is processed and result is inserted in output queue select unnest (path expr) streamscan
bid bidderid:501 price: $5.00 bid bidderid:501 price: $5.00 Niagara Query Execution Query: Find all bids that bidder with id = 501 has made. select (bidderid = 501) Uses a path expression to extract matching elements from input tuples. unnest (bid.bidderid) streamscan Reads and parses data from a stream. <bid> <bidderid> 501 </bidderid> <itemid> 42 </itemid> <price> $10.00 </price> </bid>
speaker slide_number * Kristin
bid auctionid bidderid price datetime auctionsite NEXMark Schema Streams: auction itemname seller category id description reserve expires auctionsite Note: bid.datetime and auction.expires are time generated at the source sites.
Three Example Queries • All three queries are window aggregates, specifically, time-based window count • Query 1: use internal system time and internal punctuations • Query 2: use external timestamp and internal punctuations • Query 3: use external timestamp and external punctuations
Query 1: Select the number of bids on each item in the past five minutes. Update the results every minute. Groupby(B1.auctionid, B.wid, count(*)) Bucketizer Add window ranges to tuples Punctuate end of window unnest (auctionid) Punctuator/Timestamper Add timestamp field to tuple Punctuate at end of minute Timer Timestamp = CURRENT_TIME streamscan (Bid) B1 SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid WindowGroupby(B1.auctionid, count(*)) unnest (auctionid) streamscan (Bid) B1
Query 1 Details auctionid count 10 1 15 2 * * * 1-1 5:00 10 T1 * (,5:00] * 5:00 5:00 10 15 1-5 T3 T1 * ( ,5:00] 5:00 5:00 15 T2 T2 5:00 5:00 15 15 1-5 1-5 T3 T2 5:00 T3 5:00 T1 T3 T2 T1 SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid T1 T2 T3 groupby(B1.auctionid, B1.winId, count(*)) bucketizer unnest (auctionid) punctuator/timestamper 5:01 5:00 5:01 5:00 TS = timer Timestamp = CURRENT_TIME streamscan (Bid) B1
Query 1 vs. Query 2 Query 1: SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid Select the number of bids on each item in the past five minutes. Update the results every minute. Query 2: SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite Select the number of bids made at each auction site in the past five minutes. Update the results every minute. “CQL2004”
Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute. groupby(B1.auctionsite, B.winId, count(*)) bucketizer Add window ranges to tuples Punctuate end of window SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite unnest (auctionsite, datetime) punctuator/enforcer Enforce datetime > current timestamp Punctuate at end of minute timer Timestamp = CURRENT_TIME – 5 MINUTES streamscan (Bid) B1
Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute. groupby(B1.auctionsite, B.winId, count(*)) bucketizer Add window ranges to tuples Punctuate end of window SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite punctuator/enforcer Enforce datetime > current timestamp Punctuate at end of minute timer Timestamp = CURRENT_TIME – 5 MINUTES unnest (auctionsite, datetime) streamscan (Bid) B1
Query 2 vs. Query 3 Query 2: SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite Select the number of bids made at each auction site in the past five minutes. Update the results every minute. Query 3: SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.
T1 T2 Query 3: Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes. groupby(B1.auctionsite, B.winId, count(*)) bucketizer Add window ranges to tuples Punctuate end of window SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite unnest (auctionsite, datetime) streamscan (Bid) B1 Site A … … … … … Site B Site C
Auctionsite, count, wid A A * * 2-2 1-1 A B B A 5:04 5:02 5:01 5:07 2-2 1-1 1-1 1-1 T2A T1A T1B T2B A (,5:05] A A B B 5:01 5:04 5:02 5:07 T2B T1B T2A T1A A (,5:10] T2 T1 window 1, site A: T1A Query 3 Details A, 1, 1 A, 1, 2 window 1, site B: T1B window 1, site B: T2B window 2, site A: T2A groupby(B1.auctionsite, B.winId, count(*)) SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite bucketizer Add window ranges to tuples Punctuate end of window unnest (auctionsite, datetime) Legend: Window 1: 5:00 – 5:05 Window 2: 5:05 – 5:10 streamscan (Bid) B1 Site A … … … … … Site B Site C
Discussion • Bucketizer • Apply a function to the stream • Encapsulate window semantics • Punctuate-Aware e.g. punctuation on time -> punctuation on wid • Wid is used as a grouping/join attribute • Punctuator • Adds timestamp as an attribute - optional • Enforce punctuations - optional • Converts stream semantics to punctuations • Outputs punctuations • Punctuations signal the end of windows, results are output and state is purged
Conclusions • Process window queries without specialized window operators • Flexible window semantics • Use punctuate-aware operators, introduce minimum number of new operators
Future Work • Semantics of window operators • Performance of different implementations • Study affect of disorder • Groupby ? Window
Questions? • … …