Distributed event aggregation for content-based Publish/Subscribe systems

Navneet Kumar Pandey (1), Stéphane Weiss (1), Roman Vitenberg (1), Kaiwen Zhang (2), Hans-Arno Jacobsen (2)
(1) University of Oslo, (2) University of Toronto

Motivation: Intelligent Transport System (ITS)
• Non-aggregate subscriptions
  • Accident reports
  • Traffic violation reports
• Aggregate subscriptions
  • Count number of cars passing a street light per hour
  • Average speed of cars on a road segment per day
• Information providers: road sensors, crowdsourced mobile apps
• Information seekers: commuters, police, first responders, radio networks, etc.
Image: http://www.wired.com/images_blogs/autopia/2012/08/12A914.jpg

Aggregation in pub/sub
• Pub/sub is well known for efficient content filtering and dissemination for distributed event sources and sinks.
• However, pub/sub does not support aggregation, which is required in emerging applications.
• Our primary objective is to retain the traditional pub/sub focus on low communication cost, while adding support for aggregation.

Contributions: aggregation in pub/sub
• We propose a framework and baseline approaches for aggregation in content-based pub/sub systems (CBPS).
• We show how the relative performance of the baseline approaches varies with workload properties.
• We propose a per-broker distributed adaptive approach.

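Advertisement-based pub/sub model
[Figure: broker overlay with publishers, a subscriber, and brokers Bp, Bq, BI, and BS. Labels shown: advertisement A[val, >, 4], publication P[val, 8], subscription S[val, >, 3], and the Subscription Delivery Tree (SDT).]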

Comparison with stream processing

Proposed aggregation framework: publication filtering procedure (PFP)
• Subscription: { RoadID = 101, speed > 10, op = 'avg', duration (ω) = 2 hours, shift size (δ) = 1 hour }
• Notification window ranges (NWR)
[Figure: timeline from 0 to 3 showing publications Pub1, Pub2, Pub3 and the subscription's overlapping windows NWR1, NWR2, NWR3.]
A single publication can participate in several NWRs, even for the same subscription.
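The overlap between windows can be made concrete with a small sketch. It assumes, which the slide does not state, that window k covers the half-open interval [k·δ, k·δ + ω) counted from time 0. The function name `nwr_indices` is hypothetical.

```python
import math

def nwr_indices(t, omega, delta):
    """Indices k of the windows [k*delta, k*delta + omega) that contain time t.

    Assumption: windows are half-open and the first one starts at time 0.
    """
    if t < 0:
        return []
    # Last window that has started at or before t.
    k_max = math.floor(t / delta)
    # First window that has not yet closed at t.
    k_min = max(0, math.floor((t - omega) / delta) + 1)
    return list(range(k_min, k_max + 1))

# Subscription from the slide: omega = 2 hours, delta = 1 hour.
print(nwr_indices(1.5, omega=2, delta=1))  # -> [0, 1]
```

Under these assumptions a publication at t = 1.5 h contributes to the first two windows, which is why the same publication may have to be counted more than once.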

Proposed aggregation framework: publication filtering procedure (PFP), initial computation procedure (ICP)
• Notification window ranges (NWR)
[Figure: timeline from 0 to 3 showing Pub1, Pub2, Pub3, the windows NWR1, NWR2, NWR3 of the subscription, and a marked processing start time (x).]
Two alternative sets of outgoing messages are shown:
• { avg(Pub1, Pub2, Pub3), avg(Pub2, Pub3) }
• { avg(Pub1, Pub2), avg(Pub2), Pub3 }
Processing start time presents a trade-off between communication cost and end-to-end delay.

Proposed aggregation framework: publication filtering procedure (PFP), initial computation procedure (ICP), recurrent processing procedure (RPP)
[Figure: brokers Bp, Bq, and BI with partial results avgp and avgq, the combined result avgpq, and the collection delay marked.]
Collection delay is another parameter affecting the delay-communication trade-off.
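Partial averages such as avgp and avgq cannot be merged by simply averaging them again; a broker needs the underlying sums and counts. A minimal sketch, assuming each partial result travels as a (sum, count) pair. The `PartialAvg` type is hypothetical and not taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartialAvg:
    """Partial result for 'avg': the running sum and the number of values."""
    total: float = 0.0
    count: int = 0

    @staticmethod
    def of(values):
        values = list(values)
        return PartialAvg(sum(values), len(values))

    def merge(self, other):
        # Merging is associative and commutative, so brokers may combine
        # partial results in any order along the delivery tree.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def value(self):
        if self.count == 0:
            raise ValueError("no publications in this window")
        return self.total / self.count

avg_p = PartialAvg.of([3, 5])   # values seen at one publisher-edge broker
avg_q = PartialAvg.of([9])      # values seen at the other
avg_pq = avg_p.merge(avg_q)     # combined at the intermediate broker
print(avg_pq.value())
```

Here the merged average is 17/3 (about 5.67), whereas averaging the two partial averages 4 and 9 directly would give 6.5.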

Late aggregation approach
[Figure: stages PFP, ICP, RPP; publishers emit P[val,3], P[val,5], P[val,9], and P[val,2]; the subscriber registers Smin[val,>,2]; brokers Bp, Bq, BI, BS; delivered result P[valmin,3].]
Messages exchanged in Late aggregation: 6
The Late approach aggregates messages at subscriber-edge brokers.

Early aggregation approach
[Figure: stages PFP, ICP, RPP; publishers emit P[val,3], P[val,5], P[val,9], and P[val,2]; the subscriber registers Smin[val,>,2]; brokers Bp, Bq, BI, BS; partial results P[valmin,3] and P[valmin,9]; delivered result P[valmin,3].]
Messages exchanged in Late aggregation: 6
Messages exchanged in Early aggregation: 3
The Early approach aggregates messages at publisher-edge brokers.

Early does not always outperform Late
[Figure: the same publications P[val,3], P[val,5], P[val,9], P[val,2] with three subscriptions Smin[val,>,2], Smax[val,>,2], and Scount[val,>,2]; brokers Bp, Bq, BI, BS; partial results for min, max, and count are exchanged between brokers.]
• Late aggregation, messages exchanged: 6
• Early aggregation, messages exchanged: 9
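The message counts on these slides can be reproduced with a rough counting model. The sketch below makes several assumptions that go beyond the slides: a single notification window, all aggregate subscriptions sharing the filter val > 2, the overlay Bp → BI ← Bq, BI → BS, and a particular placement of publications at Bp and Bq. All names are hypothetical.

```python
# Overlay assumed from the slides: Bp and Bq are publisher-edge brokers,
# BI is an intermediate broker, BS is the subscriber-edge broker.
PARENT = {"Bp": "BI", "Bq": "BI", "BI": "BS"}

def path_to_root(broker):
    """Links (child, parent) from a broker up to the subscriber-edge broker."""
    links = []
    while broker in PARENT:
        links.append((broker, PARENT[broker]))
        broker = PARENT[broker]
    return links

def late_messages(matching_pubs):
    """Late: every matching publication is forwarded once over each link."""
    return sum(n * len(path_to_root(b)) for b, n in matching_pubs.items())

def early_messages(matching_pubs, num_aggregate_subs):
    """Early: one partial result per subscription over each link carrying data."""
    used_links = set()
    for broker, n in matching_pubs.items():
        if n > 0:
            used_links.update(path_to_root(broker))
    return num_aggregate_subs * len(used_links)

# Assumed placement: two matching publications at Bp (3 and 5), one at Bq (9);
# P[val,2] does not satisfy val > 2 and is dropped at its edge broker.
pubs = {"Bp": 2, "Bq": 1}
print(late_messages(pubs), early_messages(pubs, 1))  # one subscription (min): 6 3
print(late_messages(pubs), early_messages(pubs, 3))  # min, max, count: 6 9
```

In this model the cost of Late grows with the number of matching publications, while the cost of Early grows with the number of aggregate subscriptions, which is why neither dominates.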

Comparison between Early and Late
Several parameters affect the performance of our baselines.
Reducing the communication cost requires an adaptive solution.

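Benefits of adaptive aggregation
[Figure: publications P[val,3], P[val,5], P[val,9], P[val,2]; subscriptions Smin[val,>,2] and S[val,>,6]; brokers Bp, Bq, BI, BS, each marked as aggregating (BA) or forwarding (BF); messages shown include P[valmin,3], P[valmin,9], and P[val,9].]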

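Benefits of adaptive aggregation (continued)
[Figure: the same scenario with publications P[val,3], P[val,5], P[val,9], P[val,2] and subscriptions Smin[val,>,2] and S[val,>,6]; brokers Bp, Bq, BI, BS, some marked as aggregating (BA); messages shown include P[valmin,3] and P[val,9].]
Per-broker adaptation reduces communication cost.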

Adaptation process (MAPE-K)
• Monitor
  • Matching publications within sampling period
  • Changes in subscription set
• Analyze
  • Compare the ratio between Pubs vs. NWRs
  • Estimate the notification rate
• Plan
  • Choose the suitable mode
  • Transition between aggregate and forward mode
• Execute
  • Start/stop aggregation at broker
• Knowledge
  • Information at a broker
  • Registered subscriptions
  • Current execution mode
General framework with a parametric cost model
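One way to picture a single pass of this loop at a broker is the sketch below. It is a deliberately simplified stand-in for the parametric cost model mentioned on the slide, not that model itself: it assumes the cost of forward mode is the number of matching publications in the sampling period and the cost of aggregate mode is one notification per aggregate subscription per closed NWR. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BrokerKnowledge:
    """Knowledge kept at a broker: registered subscriptions and current mode."""
    num_aggregate_subs: int
    mode: str = "forward"

def analyze(knowledge, matching_pubs, nwrs_closed):
    """Estimate outgoing messages per sampling period for each mode."""
    forward_cost = matching_pubs
    aggregate_cost = knowledge.num_aggregate_subs * nwrs_closed
    return forward_cost, aggregate_cost

def plan(forward_cost, aggregate_cost):
    """Pick the cheaper mode; ties stay in forward mode (an arbitrary choice)."""
    return "aggregate" if aggregate_cost < forward_cost else "forward"

def execute(knowledge, new_mode):
    """Switch mode and report whether a transition happened."""
    changed = new_mode != knowledge.mode
    knowledge.mode = new_mode
    return changed

def adapt(knowledge, matching_pubs, nwrs_closed):
    """One monitor-analyze-plan-execute pass with the monitored counts as input."""
    forward_cost, aggregate_cost = analyze(knowledge, matching_pubs, nwrs_closed)
    return execute(knowledge, plan(forward_cost, aggregate_cost))

one_sub = BrokerKnowledge(num_aggregate_subs=1)
adapt(one_sub, matching_pubs=2, nwrs_closed=1)
three_subs = BrokerKnowledge(num_aggregate_subs=3)
adapt(three_subs, matching_pubs=2, nwrs_closed=1)
print(one_sub.mode, three_subs.mode)  # aggregate forward
```

Because each broker runs this decision on its own local counts, different brokers on the same delivery tree may end up in different modes.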

Experimental setup
• Implemented in Java over the PADRES framework
• Topology: 16 brokers
  • Combination of publisher-edge only, subscriber-edge only, and mixed brokers
• Real-life datasets:
  • Traffic dataset from the ONE-ITS service (1)
  • Yahoo! Finance Stock dataset
• Metrics:
  • Number of messages exchanged
  • Processing overhead
  • End-to-end delay
(1) http://one-its-webapp1.transport.utoronto.ca

Results (Stock dataset)
[Figure panels: varying publications per second; varying number of subscriptions.]
The adaptation decision becomes more accurate when sufficient information is available.
• Early performs better at high publication rates, whereas Late is better with a large number of subscriptions.
• Adaptive aggregation performs close to the better of Early and Late in all settings.

Results (Traffic dataset)
[Figure panels: varying publications per second; varying number of subscriptions.]
Per-broker adaptation can cause individual brokers to make incorrect decisions.

Processing overhead (Stock)
[Figure panels: aggregation-related overhead; predicate matching cost.]
Adaptation overhead dominates the aggregation overhead.

Conclusions
• We provide an aggregation framework for CBPS with baseline solutions.
• We demonstrate that neither baseline is dominant; which one performs better depends on workload parameters.
• We provide a generic adaptive aggregation framework.
• We experimentally demonstrate that our distributed adaptive solution performs close to the best baseline across all settings.

Thank you! For questions and comments Contact: navneet@ifi.uio.no

Motivation: stock market application
• Information providers: stock exchanges
• Information seekers: brokers, buyers
• Aggregate subscriptions:
  • Stock market indicators (e.g., MACD)
• Non-aggregate subscriptions:
  • Stock value updates
Image: http://opinion-forum.com/index/wp-content/uploads/2012/08/stock_market.jpg

Aggregation semantics
• Window parameters
  • Window shift size (δ)
  • Duration (ω)
• Examples
  • Sliding window (δ < ω, e.g. ω = 2 hours, δ = 1 hour): Moving average of the number of cars passing a street light per hour.
  • Tumbling window (δ = ω, e.g. ω = δ = 2 hours): Average speed of cars on a road segment.
  • Hopping window (δ > ω, e.g. ω = 2 hours, δ = 24 hours): Number of cars crossing during rush hour.
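The three window kinds differ only in how the shift size δ compares with the duration ω: sliding when δ < ω, tumbling when δ = ω, and hopping when δ > ω. A minimal sketch of that classification follows; the function name `window_type` is hypothetical and both parameters are assumed to be in the same time unit.

```python
def window_type(omega, delta):
    """Classify a window by comparing shift size (delta) with duration (omega)."""
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    if delta < omega:
        return "sliding"    # consecutive windows overlap
    if delta == omega:
        return "tumbling"   # consecutive windows are adjacent
    return "hopping"        # gaps between consecutive windows

# Parameter settings from the slide, in hours.
for omega, delta in [(2, 1), (2, 2), (2, 24)]:
    print(omega, delta, window_type(omega, delta))
```

Only in the sliding case can a single publication fall into more than one window of the same subscription.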

Challenges of adaptive deployment
• Data flow is hard to predict:
  • Irregular event rates at the publishers
  • Dynamic number of subscriptions
  • Coupled with dynamic content matching
• Brokers function autonomously
• Compatible solution:
  • Congruent with pub/sub routing standards
  • Minimum impact on QoS for regular publications

Other experiments
For the following experiments, please refer to our full paper:
• End-to-end delay
• Sensitivity to the sampling period
• Sensitivity to the collection delay

Sensitivity analysis: Collection delay
Increasing the collection delay reduces the number of messages but delays the delivery of results.

Publication process flow
[Flowchart: decision nodes "Any regular subscription matched?", "Matched for aggregation?", and "Is broker aggregating?", each with Yes/No branches; action nodes "Send", "Tag as aggregated", "Timestamp publication if not", and "Enqueue for aggregation computation".]

Aggregation Basics
[Figure: timeline from 0 to 7 showing matching publications and the notification window ranges (NWR) of three subscriptions: sub3 with a sliding window, sub2 with a tumbling window, and sub1 with a sampling window.]

Motivation
• Pub/sub is well known for efficient content filtering and dissemination for distributed event sources and sinks.
• Content-based pub/sub does not support time-based aggregation.

Pub/Sub systems: a popular communication paradigm
• RSS filtering [1]
• Social interaction [2]
• Stock-market monitoring [3]
• Business process [4]
• Workflow management [5]
• Network monitoring and management [6]
Research on pub/sub has traditionally focused on performance rather than on extending functionality.

Event distribution systems such as ITS demand aggregation filters
• Moving average of the number of cars passing a street light per hour.
• Average speed of cars on a road segment.
• Number of cars crossing a highway during rush hour.

Scope of our solution
• Acyclic overlay
• Broker-federated pub/sub
• Advertisement-based forwarding model
• Time-based aggregation
