Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang Microsoft Research DISC 2002 Toulouse, France
Motivation • The increasing popularity of event notification • Yahoo! Alerts, MSN Mobile, AOL Anywhere, InfoSpace, … • Complements the traditional polling model on the Web • Examples: stock quotes, sports scores, weather, news, … • Event Distribution Network (EDN) • Distributed and scalable event distribution • Parallels the idea of a Content Distribution Network (CDN), applied to event distribution • Built on top of a self-configuring overlay network of servers • Content-based publish/subscribe through in-network processing of aggregated subscription filters • Versus simply extending topic-based pub/sub with all filter processing at end servers
Subscription Partitioning • Basic idea: similarity-based clustering for reducing total event traffic • Event Space Partitioning (ESP) • Filter Set Partitioning (FSP)
Equality Predicates • Hash predicates to get a uniform distribution • Treat the hashed domain as the event space • Use Event Space Partitioning • A subscription is a point; it does not intersect multiple sub-spaces • Use over-partitioning for better load balancing • Use an offline greedy algorithm to assign buckets to servers for load balancing • Use an indirection table to dynamically map buckets to servers for load re-balancing • Use Bloom filters to further reduce traffic • Fast detection of true negatives at the expense of a (very low) false-positive rate
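The Bloom filter bullet can be illustrated with a minimal sketch. It assumes each server summarizes the keys of its equality subscriptions (for example, stock symbols) in a Bloom filter, and that an event is forwarded only if the filter reports a possible match. The class name, the salted SHA-256 hashing scheme, and the `should_forward` helper are illustrative assumptions, not details from the slides.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over string keys (hypothetical helper)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent" (a true negative);
        # True means "possibly present" (may be a false positive).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def should_forward(event_key: str, server_filter: BloomFilter) -> bool:
    """Forward an event only if some subscription on the server may match."""
    return server_filter.might_contain(event_key)
```

A Bloom filter never yields a false negative, so no matching event is dropped; the cost is an occasional unnecessary forward, whose probability falls as `num_bits` grows relative to the number of stored keys.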
Simulation Results • Actual Notification Money log • 1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols • Zipf-like distribution
Simulation Results (Cont.) • Simulate 100M new subscriptions from 43,734 symbols • Scaled-up Zipf-like distribution • Perturbation and permutation • Uniform distribution • 50 servers with over-partitioning ratio = 10 • Without load re-balancing • Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case) • With imbalance threshold of 2.0 • Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
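The over-partitioning setup in this experiment can be sketched as follows, assuming a bucket count of ratio × servers (for example, 10 × 50 = 500), a largest-first greedy assignment of buckets to servers, and a max/min imbalance check against a threshold. The slides do not spell out the greedy rule or the re-balancing policy, so `greedy_assign` and `rebalance_once` are hypothetical stand-ins rather than the authors' algorithms.

```python
import hashlib
import heapq


def bucket_of(key: str, num_buckets: int) -> int:
    """Hash an equality-predicate value (e.g. a stock symbol) to a bucket."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def greedy_assign(bucket_loads, num_servers):
    """Build an indirection table: table[bucket] = server.

    Assumed rule: take buckets from heaviest to lightest and give each
    to the currently least-loaded server.
    """
    heap = [(0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    table = [0] * len(bucket_loads)
    for b in sorted(range(len(bucket_loads)), key=lambda i: -bucket_loads[i]):
        load, server = heapq.heappop(heap)
        table[b] = server
        heapq.heappush(heap, (load + bucket_loads[b], server))
    return table


def server_loads(bucket_loads, table, num_servers):
    loads = [0] * num_servers
    for b, server in enumerate(table):
        loads[server] += bucket_loads[b]
    return loads


def imbalance(loads):
    """Load imbalance as the max/min ratio used on the slide."""
    return max(loads) / min(loads) if min(loads) > 0 else float("inf")


def rebalance_once(bucket_loads, table, num_servers, threshold=2.0):
    """Move one bucket from the hottest to the coldest server (assumed policy).

    Returns True if the indirection table was changed.
    """
    loads = server_loads(bucket_loads, table, num_servers)
    if imbalance(loads) <= threshold:
        return False
    hot, cold = loads.index(max(loads)), loads.index(min(loads))
    gap = loads[hot] - loads[cold]
    # Only buckets smaller than the gap can narrow it.
    candidates = [b for b, s in enumerate(table)
                  if s == hot and 0 < bucket_loads[b] < gap]
    if not candidates:
        return False
    best = min(candidates, key=lambda b: abs(gap - 2 * bucket_loads[b]))
    table[best] = cold  # only this bucket's subscriptions migrate
    return True
```

Because events and subscriptions are routed through the table rather than by hashing straight to a server, re-balancing changes a few table entries and migrates only the affected buckets.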
Range Predicates • Use Filter Set Partitioning • K-means clustering • Use the center point to represent a rectangle • R-tree-based clustering • R-tree: dynamic index structure for multi-dimensional data rectangles • Offline R-tree algorithm • Exhaustively and recursively search for partitions that minimize the sum of bounding rectangle volumes • Online R-tree algorithm • Insert from the root down the path that greedily minimizes the increase in bounding rectangle volume • Simulation results • Offline R-tree > Online R-tree > K-means > Random
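The online rule above can be sketched in a flattened, single-level form: each partition keeps one bounding rectangle, and a new subscription goes to the partition whose bounding rectangle would grow the least. This is a simplification of R-tree insertion (there is no tree descent, node splitting, or load-balancing constraint), and the rectangle representation and function names are assumptions made for illustration.

```python
from math import prod

# A rectangle is a tuple of (low, high) intervals, one per dimension.


def volume(rect):
    return prod(high - low for low, high in rect)


def cover(a, b):
    """Smallest rectangle containing both a and b."""
    return tuple((min(al, bl), max(ah, bh))
                 for (al, ah), (bl, bh) in zip(a, b))


def choose_partition(bounds, sub):
    """Index of the partition whose bounding rectangle grows the least."""
    best, best_cost = 0, None
    for i, box in enumerate(bounds):
        if box is None:
            cost = volume(sub)  # empty partition: box becomes the subscription
        else:
            cost = volume(cover(box, sub)) - volume(box)
        if best_cost is None or cost < best_cost:
            best, best_cost = i, cost
    return best


def insert(bounds, sub):
    """Assign a subscription rectangle and update that partition's box."""
    i = choose_partition(bounds, sub)
    bounds[i] = sub if bounds[i] is None else cover(bounds[i], sub)
    return i
```

For example, with bounding rectangles `((0, 10), (0, 10))` and `((100, 110), (100, 110))`, a subscription `((1, 2), (1, 2))` lands in the first partition because it causes no growth there.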
Related Work • Pub/Sub systems • Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, … • Clustering in pub/sub • All previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]
Summary • Proposed two subscription partitioning and routing approaches • Event Space Partitioning • Filter Set Partitioning • Evaluated performance via simulations • Subscription partitioning reduces network traffic • Over-partitioning helps to achieve good load balancing dynamically • Bloom filter further reduces event traffic
Simulation Results • 10,000 random subscriptions per server on average • Offline R-tree performs the best; reduces event traffic by 20% to 60%
Model of Content-based Pub/Sub • Content-based filtering • Event schema with d attributes, supporting equality and range predicates • Event: a point in the d-dimensional space • Subscription: a rectangle in that space • Match: a rectangle contains the point • Content-based routing • Based on a subset of attributes • Consider d’-dimensional points and rectangles where d’ ≤ d
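The matching model can be written down directly. The sketch below assumes closed intervals and encodes an equality predicate as a degenerate range with equal bounds; both choices, like the function names, are assumptions for illustration.

```python
def matches(subscription, event):
    """True if the event point lies inside the subscription rectangle.

    subscription: tuple of (low, high) intervals, one per attribute.
    event: tuple of attribute values of the same length.
    """
    return all(low <= x <= high
               for (low, high), x in zip(subscription, event))


def project(values, dims):
    """Keep only the attributes used for routing (the d' dimensions)."""
    return tuple(values[i] for i in dims)
```

Routing on a subset of attributes is conservative: if an event matches a subscription in all d dimensions, it also matches after both are projected onto any d’ of them, so routing on the projection can only add false positives, never drop a true match.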
EDN Network Architecture • Submit subscriptions • Subscription routing • Content-based route updates • Peer exchange of route updates • Content-based event routing • Notification delivery [Figure: an event source, EDN nodes, Notification Routing Services, and a subscriber, with numbered labels 1 to 6 marking the steps]
• Optimize various performance metrics, subject to load-balancing constraints • Minimize total event traffic • Volume of union of rectangles • Maximize overall system throughput • Minimize end-to-end latency [Figure: subscription rectangles with a precise summary and an imprecise summary]
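As a worked example of the "volume of union of rectangles" metric, the sketch below computes the exact union area of 2-D subscription rectangles by coordinate compression and compares it with the area of a single bounding rectangle. Treating the bounding rectangle as the imprecise summary is an assumption made for illustration; the slides only label the two kinds of summary in a figure.

```python
def union_area(rects):
    """Exact area of the union of axis-aligned 2-D rectangles.

    Each rectangle is ((x1, x2), (y1, y2)). The plane is cut into grid
    cells at every rectangle edge; a cell counts if any rectangle covers it.
    """
    xs = sorted({x for (x1, x2), _ in rects for x in (x1, x2)})
    ys = sorted({y for _, (y1, y2) in rects for y in (y1, y2)})
    area = 0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            if any(x1 <= xa and xb <= x2 and y1 <= ya and yb <= y2
                   for (x1, x2), (y1, y2) in rects):
                area += (xb - xa) * (yb - ya)
    return area


def bounding_area(rects):
    """Area of the single bounding rectangle (assumed imprecise summary)."""
    x_low = min(x1 for (x1, _), _ in rects)
    x_high = max(x2 for (_, x2), _ in rects)
    y_low = min(y1 for _, (y1, _) in rects)
    y_high = max(y2 for _, (_, y2) in rects)
    return (x_high - x_low) * (y_high - y_low)
```

For the two overlapping squares `((0, 2), (0, 2))` and `((1, 3), (1, 3))`, the union area is 7 while the bounding rectangle has area 9. If events are uniformly distributed over the space, the difference is the share of events that reach the server without matching any subscription.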
The EDN Optimization Problem [Figure: centralized and distributed architectures linking event sources, the Notification Routing Service, servers, and subscribers; numbered steps include partition existing subscriptions, route events, summary reporting, and route new subscriptions]
Three Research Directions • Theoretical Study • Optimal or approximation algorithms for simplified versions • System Design and Simulation • Subscription partitioning for reducing event traffic • Summary-based routing for enhancing system throughput • Indigo-based Implementation • Extensible routing & pub/sub architecture
System Design and Simulation: Summary-based Routing • Basic idea: summary precision-based load balancing for enhancing system throughput
• If the dispatcher is not the bottleneck, use a precise summary. • Otherwise, reduce summary precision until either the outgoing link or the servers are about to become the bottleneck. • Throughput increasing • Further reduction of summary precision would generate excessive false-positive traffic and throttle back the dispatcher • Throughput decreasing
Simulation results • Imprecise summaries enhance throughput
Imprecise summaries combined with R-tree-based partitioning further enhance throughput
Dispatcher-to-link and dispatcher-to-server bottleneck ratios
EDN on Herald • Piggyback subscription routing & summary reporting on the multicast tree-forming process • Need to additionally consider notification traffic (because subscribers are now part of the multicast tree) [Figure: subscription routing to a subscriber]
Indigo-based Implementation • Indigo M2 routing & pub/sub architecture was not extensible • EDN used M2 messaging and built a WS-compliant, extensible routing & pub/sub architecture on top of it • Close collaboration with Indigo • Extensibility proposals to Indigo • Some appeared in M3 • But most sealed for security for now • Some being considered for M4
EDN Extensible Routing and Pub/Sub [Figure: architecture diagram with components: Namespace Binding Layer; EDN Subscription Manager; EDN Route Manager; MS Route Manager; WS-Eventing Subscription Manager; WS-Routing Route Manager; EDN R-tree Matcher; XPath Filter Matcher; Indigo Messaging]
Other XML-Messaging/Indigo interactions • State dependency management • Design tool for new features involving “state transplant” • E.g., System Restore (across time), Intellimirror (across space) • Repair tool providing consistent undo • System Restore + rollback of “atomic units” • GoBack3 + roll-forward of “atomic units” • Troubleshooting tool • Trace-diff & state-diff approaches • Our automatic, bottom-up, black-box discovery approach complements their manual, top-down, logical declaration approach (TravisM) • Install-time and run-time information augments the authoring-time information • Targeted problem spaces help identify things to declare for manageability