
Windows in Niagara

Windows in Niagara. Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte. Overview: make windows explicit; tag tuples with a window id; standard operators don’t know about different kinds of windows and work with the window ID attribute; use the punctuation infrastructure.


Presentation Transcript


  1. Windows in Niagara Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte

  2. Overview
     • Make Windows Explicit
       • Tag tuples with a window id
       • Standard operators don’t know about different kinds of windows; they work with the window ID attribute
     • Use Punctuation Infrastructure
       • Punctuation signals end of window
       • No need for specialized window operators: just use punctuate-aware operators
     • Flexible
       • Window on system time, external time or tuple-based
       • Data can arrive and be processed out of order

  3. Niagara Control Structure
     • Push-based (pipelined) system.
     • Each operator is a thread.
     • Operators are connected by queues of tuples.
     • Each operator waits on its input queue; when a tuple is ready, it is processed and the result is inserted in the output queue.
     Example plan (bottom to top): streamscan → unnest (path expr) → select
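
The control structure above can be sketched in a few lines of Python. This is only an illustration of the thread-per-operator, queue-connected design, not Niagara's actual code; the names `operator`, `run_pipeline` and the `END` marker are hypothetical.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of Niagara


def operator(fn, inbox, outbox):
    """Run one operator as a thread: wait on the input queue, process, push downstream."""
    def run():
        while True:
            item = inbox.get()          # block until a tuple is ready
            if item is END:
                outbox.put(END)         # propagate shutdown downstream
                return
            for result in fn(item):     # an operator may emit zero or more tuples
                outbox.put(result)
    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(source, stages):
    """Connect the stages with queues and collect what the last stage emits."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [operator(fn, queues[i], queues[i + 1]) for i, fn in enumerate(stages)]
    for item in source:
        queues[0].put(item)
    queues[0].put(END)
    for thread in threads:
        thread.join()
    results = []
    while True:
        item = queues[-1].get()
        if item is END:
            return results
        results.append(item)
```

Each stage is a function from one input tuple to a list of output tuples, so a filter returns an empty list for tuples it rejects.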

  4. Niagara Query Execution
     Query: Find all bids that bidder with id = 501 has made.
     Plan (bottom to top):
     • streamscan: Reads and parses data from a stream.
     • unnest (bid.bidderid): Uses a path expression to extract matching elements from input tuples.
     • select (bidderid = 501)
     Input: <bid> <bidderid> 501 </bidderid> <itemid> 42 </itemid> <price> $10.00 </price> </bid>
     Output: bid bidderid:501 price: $5.00
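
A minimal sketch of the same three-operator plan, using Python generators and the standard `xml.etree.ElementTree` module rather than Niagara itself. The `<bids>` wrapper element and the second bid (bidder 777) are made up for illustration; the function names mirror the operators on the slide but are otherwise hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the first bid is the one on the slide, the second is invented.
STREAM = """<bids>
<bid><bidderid> 501 </bidderid><itemid> 42 </itemid><price> $10.00 </price></bid>
<bid><bidderid> 777 </bidderid><itemid> 42 </itemid><price> $12.00 </price></bid>
</bids>"""


def streamscan(text):
    """Parse the input and yield one tuple per <bid> element."""
    for bid in ET.fromstring(text).iter("bid"):
        yield {"bid": bid}


def unnest(tuples, path, attr):
    """Follow a path expression from the bid element; attach each match as a new attribute."""
    for tup in tuples:
        for match in tup["bid"].findall(path):
            yield {**tup, attr: match.text.strip()}


def select(tuples, attr, value):
    """Keep only the tuples whose attribute equals the given value."""
    for tup in tuples:
        if tup[attr] == value:
            yield tup


def bids_by(text, bidder):
    """Plan: streamscan -> unnest(bidderid) -> select(bidderid = bidder)."""
    plan = select(unnest(streamscan(text), "bidderid", "bidderid"), "bidderid", bidder)
    return [ET.tostring(t["bid"], encoding="unicode").strip() for t in plan]
```

Calling `bids_by(STREAM, "501")` returns the serialized `<bid>` elements made by bidder 501.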


  6. NEXMark Schema
     Streams:
     • bid: auctionid, bidderid, price, datetime, auctionsite
     • auction: itemname, seller, category, id, description, reserve, expires, auctionsite
     Note: bid.datetime and auction.expires are times generated at the source sites.

  7. Three Example Queries
     • All three queries are window aggregates, specifically time-based window counts
     • Query 1: use internal system time and internal punctuations
     • Query 2: use external timestamp and internal punctuations
     • Query 3: use external timestamp and external punctuations

  8. Query 1: Select the number of bids on each item in the past five minutes. Update the results every minute.
     SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid
     Plan with a single window-aware operator (bottom to top):
     • streamscan (Bid) B1
     • unnest (auctionid)
     • WindowGroupby(B1.auctionid, count(*))
     Plan with explicit windows (bottom to top):
     • streamscan (Bid) B1
     • Punctuator/Timestamper: add timestamp field to tuple; punctuate at end of minute (driven by Timer, Timestamp = CURRENT_TIME)
     • unnest (auctionid)
     • Bucketizer: add window ranges to tuples; punctuate end of window
     • Groupby(B1.auctionid, B1.wid, count(*))
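
The bucketizer's "add window ranges to tuples" step can be sketched as plain arithmetic. The slides do not spell out how window ids are numbered, so this sketch assumes that window k covers the half-open interval (k*slide - range, k*slide], with times in seconds; `window_range` and `bucketize` are hypothetical names.

```python
def window_range(timestamp, range_, slide):
    """Return (first_wid, last_wid) of the sliding windows that contain `timestamp`.

    Assumption: window k covers (k*slide - range_, k*slide], and all values
    are integers in the same unit (seconds here).
    """
    first = -(-timestamp // slide)                  # ceil(timestamp / slide)
    last = -(-(timestamp + range_) // slide) - 1    # last k with k*slide - range_ < timestamp
    return first, last


def bucketize(tuples, range_, slide):
    """Tag each (timestamp, payload) tuple with its window-id range."""
    for timestamp, payload in tuples:
        yield payload, window_range(timestamp, range_, slide)
```

With RANGE 5 MINUTES and SLIDE 1 MINUTE, a tuple stamped 30 seconds into the stream gets `window_range(30, 300, 60) == (1, 5)`: it belongs to five consecutive windows, so the downstream groupby counts it once per window id.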

  9. Query 1 Details
     SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid
     Plan (bottom to top): streamscan (Bid) B1 → punctuator/timestamper (TS = timer, Timestamp = CURRENT_TIME) → unnest (auctionid) → bucketizer → groupby(B1.auctionid, B1.winId, count(*))
     [Figure: tuples T1, T2 and T3 (auctionid 10, 15 and 15) flow through the plan. They are timestamped 5:00, tagged with window range 1-5, and followed by the punctuation (,5:00], which becomes a punctuation on window 1-1. The timer ticks at 5:00 and 5:01.]
     Result (auctionid, count): (10, 1), (15, 2)

  10. Query 1 vs. Query 2
     Query 1: Select the number of bids on each item in the past five minutes. Update the results every minute.
     SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid
     Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
     SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
     “CQL2004”

  11. Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
     SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
     Plan (bottom to top):
     • streamscan (Bid) B1
     • punctuator/enforcer: enforce datetime > current timestamp; punctuate at end of minute (driven by timer, Timestamp = CURRENT_TIME - 5 MINUTES)
     • unnest (auctionsite, datetime)
     • bucketizer: add window ranges to tuples; punctuate end of window
     • groupby(B1.auctionsite, B1.winId, count(*))

  12. Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
     SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
     Plan (bottom to top):
     • streamscan (Bid) B1
     • unnest (auctionsite, datetime)
     • punctuator/enforcer: enforce datetime > current timestamp; punctuate at end of minute (driven by timer, Timestamp = CURRENT_TIME - 5 MINUTES)
     • bucketizer: add window ranges to tuples; punctuate end of window
     • groupby(B1.auctionsite, B1.winId, count(*))
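
One way to read the punctuator/enforcer with SLACK is sketched below. It is an interpretation of the slide, not Niagara's implementation: the class `Enforcer`, its method names and the event tuples are hypothetical, times are integer seconds, and late tuples are assumed to be dropped.

```python
class Enforcer:
    """Sketch of a punctuator/enforcer with a fixed slack; times are in seconds."""

    def __init__(self, slack, period=60):
        self.slack = slack
        self.period = period
        self.next_tick = None   # system time at which the next punctuation is due

    def on_tuple(self, event_time, now):
        """Return the output events for one input tuple seen at system time `now`."""
        events = self._punctuate(now)
        if event_time > now - self.slack:        # enforce datetime > current timestamp
            events.append(("tuple", event_time))
        return events                            # tuples that arrive too late are dropped

    def _punctuate(self, now):
        """Emit ("punct", bound) for every period boundary passed since the last call."""
        if self.next_tick is None:
            self.next_tick = (now // self.period + 1) * self.period
        events = []
        while now >= self.next_tick:
            # After this point no tuple with event_time <= bound is passed on.
            events.append(("punct", self.next_tick - self.slack))
            self.next_tick += self.period
        return events
```

Because a tuple is only passed on when its datetime is later than the current time minus the slack, every punctuation emitted earlier remains true for all later output, which is what lets downstream operators close windows safely.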

  13. Query 2 vs. Query 3
     Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
     SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
     Query 3: Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.
     SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite

  14. Query 3: Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.
     SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite
     Plan (bottom to top):
     • Sources: Site A, Site B, Site C, …
     • streamscan (Bid) B1
     • unnest (auctionsite, datetime)
     • bucketizer: add window ranges to tuples; punctuate end of window
     • groupby(B1.auctionsite, B1.winId, count(*))

  15. Query 3 Details
     SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite
     Plan (bottom to top): Site A, Site B, Site C, … → streamscan (Bid) B1 → unnest (auctionsite, datetime) → bucketizer (add window ranges to tuples; punctuate end of window) → groupby(B1.auctionsite, B1.winId, count(*))
     Legend: Window 1: 5:00 - 5:05; Window 2: 5:05 - 5:10
     [Figure: bids T1A and T2A from site A and T1B and T2B from site B, with datetimes 5:01, 5:02, 5:04 and 5:07, are tagged with window ids 1-1 or 2-2. Figure labels: window 1, site A: T1A; window 1, site B: T1B; window 1, site B: T2B; window 2, site A: T2A. Site A sends the punctuations A (,5:05] and A (,5:10], which the bucketizer turns into punctuations on window ids 1-1 and 2-2.]
     Result (auctionsite, count, wid): (A, 1, 1), (A, 1, 2)
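
For the tumbling windows of Query 3, both the window id of a tuple and the translation of a site's time punctuation into a punctuation on wid reduce to integer division. The sketch below measures time in whole minutes past 5:00 and assumes window k covers ((k-1)*size, k*size]; the legend does not say which end of each window is closed, and both function names are hypothetical.

```python
def tumbling_wid(minute, size=5):
    """Window id of a tuple, assuming window k covers ((k-1)*size, k*size]."""
    return -(-minute // size)          # ceil(minute / size)


def punctuate_wid(site, bound, size=5):
    """Translate a site's punctuation on time, (, bound], into the last wid it closes."""
    return site, bound // size         # every window up to this id is complete for `site`
```

Under these assumptions a bid at 5:04 falls in window 1 and a bid at 5:07 in window 2, and site A's punctuation (,5:05] becomes `("A", 1)`: window 1 is complete for site A even if other sites are still sending data for it.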

  16. Discussion
     • Bucketizer
       • Applies a function to the stream
       • Encapsulates window semantics
       • Punctuate-aware, e.g. punctuation on time -> punctuation on wid
       • Wid is used as a grouping/join attribute
     • Punctuator
       • Adds timestamp as an attribute (optional)
       • Enforces punctuations (optional)
       • Converts stream semantics to punctuations
       • Outputs punctuations
     • Punctuations signal the end of windows: results are output and state is purged
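
The last point, output and purge on punctuation, can be illustrated with a small punctuation-aware count. This is a sketch under simplifying assumptions, not the Niagara groupby: the punctuation is a single global bound on wid rather than per-site, and the class and method names are hypothetical.

```python
from collections import defaultdict


class GroupByCount:
    """count(*) grouped by (key, wid); a punctuation on wid triggers output and purge."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_tuple(self, key, first_wid, last_wid):
        """Count the tuple once in every window of its window-id range."""
        for wid in range(first_wid, last_wid + 1):
            self.counts[(key, wid)] += 1

    def on_punctuation(self, wid):
        """All windows with id <= wid are complete: emit their counts and drop their state."""
        done = sorted(k for k in self.counts if k[1] <= wid)
        return [(key, w, self.counts.pop((key, w))) for key, w in done]
```

Note that the operator knows nothing about time or window shapes. It treats wid as an ordinary grouping attribute and relies on punctuation alone to decide when a group is finished.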

  17. Conclusions
     • Process window queries without specialized window operators
     • Flexible window semantics
     • Use punctuate-aware operators; introduce a minimum number of new operators

  18. Future Work
     • Semantics of window operators
     • Performance of different implementations
     • Study effect of disorder
     • Groupby ? Window

  19. Questions? • … …
