Talk at the 29th Australasian Computer Science Conference (ACSC2006): Pruning Subscriptions in Distributed Publish/Subscribe Systems. Sven Bittner and Annika Hinze, 18 January 2006.
Motivation: Publish/Subscribe
• Subscribers register subscriptions
• Publishers send event messages
• System informs using notifications
[Figure labels: EBay, TradeMe, Distributed Pub/Sub System, Filtering, User, Subscription, pub(item, price, timeLeft, …), pub(item, ...), Notify about items of interest]
Annika Hinze – Pruning Subscriptions in Distributed Publish/Subscribe Systems
Motivation: Subscription Example
• A subscriber is interested in books whose title contains the phrase “Harry Potter”.
• According to the condition of the copy of the book (new, used), she wants to pay at most NZ$10.0 or NZ$15.0.
• To avoid unnecessary notifications, the subscriber will be notified not earlier than one day before the auction ends.
Subscription tree:
AND
  title like “Harry Potter”
  endingWithin < 1 day
  OR
    AND
      price < 10.0
      condition = NEW
    AND
      price < 15.0
      condition = USED
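The subscription tree on this slide can be sketched as a small data structure that is evaluated against an incoming event. This is a minimal illustration only: the class names `Pred` and `Op`, the function `matches`, and the event keys (`title`, `ending_within_days`, `price`, `condition`) are hypothetical and do not come from the talk, and the grouping of leaves under the two inner AND nodes follows the order in which they appear on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Event = Dict[str, object]


@dataclass
class Pred:
    """Leaf node: a single predicate on one event attribute."""
    name: str
    test: Callable[[Event], bool]


@dataclass
class Op:
    """Inner node: a Boolean operator ("AND" or "OR") over child nodes."""
    kind: str
    children: List["Node"]


Node = Union[Pred, Op]


def matches(node: Node, event: Event) -> bool:
    """Evaluate the subscription tree against one event message."""
    if isinstance(node, Pred):
        return node.test(event)
    results = (matches(child, event) for child in node.children)
    return all(results) if node.kind == "AND" else any(results)


# Example subscription (attribute names are assumptions for illustration).
subscription = Op("AND", [
    Pred("title like 'Harry Potter'", lambda e: "Harry Potter" in e["title"]),
    Pred("endingWithin < 1 day", lambda e: e["ending_within_days"] < 1),
    Op("OR", [
        Op("AND", [
            Pred("price < 10.0", lambda e: e["price"] < 10.0),
            Pred("condition = NEW", lambda e: e["condition"] == "NEW"),
        ]),
        Op("AND", [
            Pred("price < 15.0", lambda e: e["price"] < 15.0),
            Pred("condition = USED", lambda e: e["condition"] == "USED"),
        ]),
    ]),
])

event = {
    "title": "Harry Potter and the Goblet of Fire",
    "ending_within_days": 0.5,
    "price": 12.0,
    "condition": "USED",
}
print(matches(subscription, event))  # True
```

A used copy at 12.0 matches through the second conjunction; the same event with condition NEW would not match, because 12.0 is not below 10.0.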
Motivation: Problem Sizes
• Online auctions
  - Subscriptions: > 10^6 (no. of users)
  - Events: > 20 / sec (new items and bids)
  - Notifications: not time-critical, but events must be processed permanently
• Facility management
  - Subscriptions: > 50,000 (today’s systems)
  - Events: > 1,000 / sec (from sensors, switches)
  - Notifications: delay < 0.1 sec
Time and space efficiency required
Structure
• Motivation
• Subscription Pruning
• Selectivity Estimation
• Evaluation of Subscription Pruning
• Summary and Outlook
Current Optimizations
• Work on conjunctive subscriptions only (−)
  - Restricted subscription language
  - Not applicable for general-purpose systems
• Strong assumptions (−)
  - Similarities/relationships among subscriptions
  - Evaluations too simplistic for high-level applications
  - We cannot generalize evaluation results
Subscription Generalization
• Routing optimization for arbitrary Boolean subscriptions (+)
• Optimizes subscriptions independently of each other (+)
Optimization potential regardless of individual and collective subscription structure
Favourable routing optimization for general-purpose pub/sub systems
Generalization by Pruning
• Goals of pruning
  - Remove parts of the subscription tree of non-local subscribers
  - Create a more general subscription
Fewer predicates to filter on (+)
Less complex subscriptions (+)
More events to filter (−)
More time- and space-efficient filtering
Application of Pruning (1)
• Forwarding of subscriptions for selective routing
[Figure: un-optimized routing; subscriber and four routing tables]
Application of Pruning (2)
• Forwarding of subscriptions for selective routing
• Less complex subscriptions
• More time- and space-efficient filtering
[Figure: subscriber and four routing tables]
Application of Pruning (3)
• Forwarding of subscriptions for selective routing
• But more general subscriptions
• More forwarded event messages
• More event messages to filter
[Figure: subscriber and four routing tables]
Example of Pruning (1)
• Valid pruning - Remove child of conjunction
[Figure: the example subscription tree before and after pruning the predicate price < 10.0 from its conjunction, followed by the step “Remove unary operators”]
Example of Pruning (2)
• Invalid pruning - Remove child of disjunction
No filtering of used books anymore!
[Figure: the example subscription tree before and after removing a child of the disjunction, followed by the step “Remove unary/summarize consecutive operators”]
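The two pruning examples can be sketched in code as follows. This is a rough illustration under stated assumptions, not the authors' implementation: trees are nested `(operator, children)` tuples with predicate strings as leaves, the functions `prune` and `simplify` are hypothetical names, subscriptions are assumed to contain only AND and OR (no negation), and the grouping of leaves follows the order shown on the subscription example slide.

```python
def simplify(node):
    """Remove unary operators and merge consecutive operators of the same kind."""
    if isinstance(node, str):
        return node
    op, children = node
    flat = []
    for child in (simplify(c) for c in children):
        if not isinstance(child, str) and child[0] == op:
            flat.extend(child[1])  # summarize consecutive operators
        else:
            flat.append(child)
    # An operator with a single child is replaced by that child.
    return flat[0] if len(flat) == 1 else (op, flat)


def prune(node, path):
    """Return a copy of the tree with the child at `path` removed.

    `path` is a tuple of child indices from the root. Only children of
    AND nodes may be removed; removing a child of an OR node would make
    the subscription more restrictive instead of more general.
    """
    op, children = node
    index, rest = path[0], path[1:]
    if rest:
        replacement = [prune(children[index], rest)]
    elif op != "AND":
        raise ValueError("invalid pruning: cannot remove a child of a disjunction")
    elif len(children) < 2:
        raise ValueError("cannot remove the only child of a conjunction")
    else:
        replacement = []
    return (op, children[:index] + replacement + children[index + 1:])


tree = ("AND", [
    "title like 'Harry Potter'",
    "endingWithin < 1 day",
    ("OR", [
        ("AND", ["price < 10.0", "condition = NEW"]),
        ("AND", ["price < 15.0", "condition = USED"]),
    ]),
])

# Valid pruning: drop "price < 10.0" (a child of a conjunction), then simplify.
pruned = simplify(prune(tree, (2, 0, 0)))
print(pruned)

# Invalid pruning: dropping a whole branch of the disjunction is rejected.
try:
    prune(tree, (2, 1))
except ValueError as err:
    print(err)
```

Pruning and simplification are kept separate here so that the validity check stays visible; a real system would likely combine them.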
Challenges of Pruning
• Questions
  1. Which subscription should be pruned first?
  2. Which part of a subscription should be pruned?
• Answer
  The subscription (1.) supporting a pruning (2.) that minimally influences the network traffic
Utilize selectivities of subscriptions to determine effects of pruning on network load
Selectivity of Subscriptions
• Calculation of selectivity for
  - Original subscriptions - Counting
  - Predicates - Counting/Approximation
  - Pruned subscriptions - No suitable method
Estimate selectivities for subscriptions
Selectivity Estimation: Idea
• Using three easily computable estimates
  - Minimal sel_min: worst case, the smallest possible selectivity value for all distributions of events
  - Average sel_avg: average case, assuming uniform distribution of all possible event messages and independent predicates in subscriptions
  - Maximal sel_max: best case, the largest possible selectivity value for all distributions of events
Selectivity Estimation: Example
• Selectivity of predicates via counting
• Selectivity of subscriptions via estimation
Subscription tree annotated with selectivities; triples are (sel_min, sel_avg, sel_max):
AND (0.0, 7.7e-4, 0.01)
  title like “Harry Potter”: 0.01
  endingWithin < 1 day: 0.1
  OR (0.7, 0.77, 1.0)
    AND (0.13, 0.19, 0.2)
      price < 10.0: 0.2
      condition = NEW: 0.93
    AND (0.7, 0.72, 0.8)
      price < 15.0: 0.9
      condition = USED: 0.8
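The slide does not spell out how the triples are computed. The sketch below uses one set of combination rules that reproduces the slide's numbers: for a conjunction, max(0, sum − (n − 1)) as the lower bound, the product as the average (independence), and the minimum as the upper bound; for a disjunction, the maximum as the lower bound, 1 − Π(1 − s) as the average, and min(1, sum) as the upper bound. These rules are an assumption on my part, as are the function name `estimate` and the assignment of the predicate values 0.2, 0.93, 0.9 and 0.8 to leaves in slide order.

```python
from math import prod

# Predicate selectivities as read from the slide (obtained there via counting).
SELECTIVITY = {
    "title like 'Harry Potter'": 0.01,
    "endingWithin < 1 day": 0.1,
    "price < 10.0": 0.2,
    "condition = NEW": 0.93,
    "price < 15.0": 0.9,
    "condition = USED": 0.8,
}


def estimate(node):
    """Return (sel_min, sel_avg, sel_max) for a predicate or an (op, children) node."""
    if isinstance(node, str):
        s = SELECTIVITY[node]
        return (s, s, s)
    op, children = node
    mins, avgs, maxs = zip(*(estimate(child) for child in children))
    if op == "AND":
        return (
            max(0.0, sum(mins) - (len(children) - 1)),  # smallest possible overlap
            prod(avgs),                                  # independent predicates
            min(maxs),                                   # largest possible overlap
        )
    if op == "OR":
        return (
            max(mins),                                   # children fully overlap
            1.0 - prod(1.0 - a for a in avgs),           # independent predicates
            min(1.0, sum(maxs)),                         # children are disjoint
        )
    raise ValueError(f"unknown operator: {op}")


tree = ("AND", [
    "title like 'Harry Potter'",
    "endingWithin < 1 day",
    ("OR", [
        ("AND", ["price < 10.0", "condition = NEW"]),
        ("AND", ["price < 15.0", "condition = USED"]),
    ]),
])

sel_min, sel_avg, sel_max = estimate(tree)
print(f"{sel_min:.2f}, {sel_avg:.2e}, {sel_max:.2f}")
```

Under these assumed rules the root evaluates to roughly (0.0, 7.7e-4, 0.01), and the inner nodes to roughly (0.13, 0.19, 0.2), (0.7, 0.72, 0.8) and (0.7, 0.77, 1.0), in line with the values on the slide.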
Selectivity Degradation
• Absolute degradation when pruning s_x to s_y
• Describes expected influence on network load
• Maximal difference between three components
max( sel_min(s_y) - sel_min(s_x), sel_avg(s_y) - sel_avg(s_x), sel_max(s_y) - sel_max(s_x) )
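As a rough illustration, the degradation measure can be used to rank candidate prunings and pick the one with the smallest expected influence on network load. The sketch below is not the authors' implementation: the helper `estimate` uses assumed combination rules (bounds for the minimal and maximal values, independence for the average), leaves are given directly as predicate selectivities, and the two candidate prunings are chosen for illustration only.

```python
from math import prod


def estimate(node):
    """Return (sel_min, sel_avg, sel_max); leaves are predicate selectivities (floats)."""
    if isinstance(node, float):
        return (node, node, node)
    op, children = node
    mins, avgs, maxs = zip(*(estimate(child) for child in children))
    if op == "AND":
        return (max(0.0, sum(mins) - (len(children) - 1)), prod(avgs), min(maxs))
    return (max(mins), 1.0 - prod(1.0 - a for a in avgs), min(1.0, sum(maxs)))


def degradation(sel_x, sel_y):
    """Maximal component-wise increase in selectivity when pruning s_x to s_y."""
    return max(y - x for x, y in zip(sel_x, sel_y))


# Example subscription: title (0.01) AND endingWithin (0.1) AND
# ((price < 10.0 (0.2) AND NEW (0.93)) OR (price < 15.0 (0.9) AND USED (0.8))).
used_branch = ("AND", [0.9, 0.8])
original = ("AND", [0.01, 0.1, ("OR", [("AND", [0.2, 0.93]), used_branch])])

# Two hypothetical candidate prunings, each removing one child of a conjunction.
candidates = {
    "drop price < 10.0": ("AND", [0.01, 0.1, ("OR", [0.93, used_branch])]),
    "drop title": ("AND", [0.1, ("OR", [("AND", [0.2, 0.93]), used_branch])]),
}

base = estimate(original)
scores = {name: degradation(base, estimate(tree)) for name, tree in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.6f}")
print("preferred pruning:", best)
```

Under these assumptions, dropping the price predicate degrades selectivity by about 2e-4, while dropping the highly selective title predicate degrades it by 0.09, so the former would be preferred.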
Experiments: Goal
• Evaluate general setting
  - Real-world subscriptions
  - Real-world attribute domains
• Initial set of experiments
  - Evaluation of memory usage and real selectivity
  - Real selectivity shows expected network load
Experiments: Setup
• E-commerce setting (online book auctions)
  - Ten attributes, e.g., author, format and price
• Events
  - Analysis of real-world distributions
  - Average for 1,000,000 messages
• Subscriptions
  - Three typical classes involving conjunctions and disjunctions
  - 10,000 registered subscriptions
Experiments: Results (1)
• Setting involving all three subscription classes
[Figure: expected increase in network load and memory usage, with the cut-off point marked]
Experiments: Results (2)
• At cut-off point (Column 4)
  - Slight increase in selectivity (Column 2)
  - Strong reduction in memory usage (Column 3)

Subscription class   Increase in selectivity   Relief in memory   Cut-off point at percent of prunings
Class 1              0.009                     0.667              0.750
Class 2              0.012                     0.833              0.875
Class 3              0.016                     0.368              0.525
Class 1–3            0.026                     0.663              0.771
Summary (1)
• Motivation
  - Publish/subscribe (pub/sub) systems
  - Routing and optimizations in pub/sub
• Subscription pruning
  - Drawbacks of current optimizations
  - Prune/remove parts of subscription trees
  - Pruning has to result in more general subscription
Summary (2)
• Selectivity estimation
  - Three easily computable values
  - Degradation measures the predicted influence of prunings
• Practical analysis
  - Evaluation of real-world scenario
  - Setting with all subscriptions
  - Space usage decreased by 66% of maximal reduction
  - Only slight increase in network load
Future Work
• Integrate pruning in pub/sub prototype
• Extended experiments
  - Measure network load, throughput and memory usage
  - Other real-world scenarios
Thank you for your attention! Contact: Annika Hinze a.hinze@cs.waikato.ac.nz