Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang Microsoft Research DISC 2002 Toulouse, France
Motivation • Phenomenal growth in Web usage • Future trends • Switch from polling to notifications • Examples: stock quotes, sports scores, weather, news, … • Yahoo! Alerts, MSN Mobile, AOL Anywhere, InfoSpace, … • Complements the traditional polling model on the Web • Event Distribution Network (EDN) • Distributed and scalable event distribution • Parallels the idea of a Content Distribution Network (CDN), applied to event distribution • Built on top of a self-configuring overlay network of servers • Content-based publish/subscribe systems through in-network processing of aggregated subscription filters
Model of Content-based Pub/Sub • Content-based filtering/routing • Event schema with d attributes, supporting equality and range predicates • Event: a point in the d-dimensional space • Subscription: a rectangle in that space • Match: a rectangle contains the point
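The matching model above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `Subscription` and `matches` are hypothetical, attributes are assumed to be numeric, and intervals are assumed to be closed, so an equality predicate is an interval whose two ends coincide.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Subscription:
    # One (low, high) interval per attribute (assumed closed on both ends).
    # An equality predicate is represented as low == high.
    bounds: Tuple[Tuple[float, float], ...]


def matches(sub: Subscription, event: Sequence[float]) -> bool:
    """Return True if the event point lies inside the subscription rectangle."""
    if len(event) != len(sub.bounds):
        raise ValueError("event and subscription must have the same dimension")
    return all(lo <= x <= hi for (lo, hi), x in zip(sub.bounds, event))


# Two attributes: a range predicate on the first, an equality predicate on the second.
sub = Subscription(bounds=((10.0, 20.0), (5.0, 5.0)))
inside = matches(sub, (15.0, 5.0))
outside = matches(sub, (25.0, 5.0))
```

Here `inside` is true because both coordinates fall within their intervals, while `outside` is false because the first coordinate exceeds the range.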
Subscription Partitioning • Basic idea: similarity-based clustering for reducing total event traffic • Event Space Partitioning (ESP) • Filter Set Partitioning (FSP)
Equality Predicates • Hash predicates to get uniform distribution • Treat the hashed domain as the event space • Use Event Space Partitioning • Subscription is a point; does not intersect multiple sub-spaces • Use over-partitioning for better load balancing • Use offline greedy algorithm to assign buckets to servers for load balancing • Use indirection table to dynamically map buckets to servers for load re-balancing • Use Bloom filters to further reduce traffic • Fast detection of true negatives at the expense of a (very low) false-positive rate
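The equality-predicate pipeline above can be sketched as follows. Several details are assumptions made for illustration, since the slides do not specify them: SHA-256 as the hash, a "heaviest bucket first, least-loaded server next" rule as the greedy heuristic, and a single Bloom filter consulted before routing. All names (`bucket_of`, `greedy_assign`, `BloomFilter`, `route`) are hypothetical.

```python
import hashlib
import heapq
from typing import Iterator, List, Optional


def bucket_of(value: str, num_buckets: int) -> int:
    """Hash an equality-predicate value (e.g. a stock symbol) to a bucket."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def greedy_assign(bucket_loads: List[int], num_servers: int) -> List[int]:
    """Assumed greedy rule: take buckets heaviest first and give each one to
    the currently least-loaded server. Returns the indirection table, where
    table[bucket] is the server that owns that bucket."""
    table = [0] * len(bucket_loads)
    heap = [(0, s) for s in range(num_servers)]  # (current load, server id)
    heapq.heapify(heap)
    for b in sorted(range(len(bucket_loads)), key=lambda i: -bucket_loads[i]):
        load, server = heapq.heappop(heap)
        table[b] = server
        heapq.heappush(heap, (load + bucket_loads[b], server))
    return table


class BloomFilter:
    """Bit array with k hash positions per value; may report false positives
    but never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(value))


# Toy setup: 4 servers, over-partitioning ratio 10, so 40 buckets.
NUM_SERVERS, RATIO = 4, 10
NUM_BUCKETS = NUM_SERVERS * RATIO
subscribed = ["MSFT", "IBM", "MSFT", "ORCL", "MSFT", "IBM"]

loads = [0] * NUM_BUCKETS
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for symbol in subscribed:
    loads[bucket_of(symbol, NUM_BUCKETS)] += 1
    bloom.add(symbol)
table = greedy_assign(loads, NUM_SERVERS)


def route(symbol: str) -> Optional[int]:
    """Return the server for an event, or None if no subscription can match."""
    if not bloom.might_contain(symbol):
        return None  # true negative: the event need not be forwarded
    return table[bucket_of(symbol, NUM_BUCKETS)]
```

Because events are routed through `table` rather than hashed directly to a server, a bucket can later be moved by rewriting one table entry, which is what makes dynamic re-balancing cheap in this sketch.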
Simulation Results • Actual Notification Money log • 1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols • Zipf-like distribution
Simulation Results (Cont.) • Simulate 100M new subscriptions from 43,734 symbols • Scaled-up Zipf-like distribution • Perturbation and permutation • Uniform distribution • 50 servers with over-partitioning ratio = 10 • Without load re-balancing • Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case) • With imbalance threshold of 2.0 • Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
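The imbalance metric and threshold-triggered re-balancing reported above can be illustrated with a small sketch. The max/min ratio and the threshold come from the slide; the rule for choosing which bucket to move (the bucket on the busiest server whose load is closest to half the gap to the idlest server) is an assumption, as are the function names. If the over-partitioning ratio means buckets per server, 50 servers at ratio 10 would correspond to 500 buckets; the toy example below uses far fewer.

```python
from typing import List


def server_loads(table: List[int], bucket_loads: List[int], num_servers: int) -> List[int]:
    """Sum bucket loads per server using the indirection table."""
    loads = [0] * num_servers
    for bucket, server in enumerate(table):
        loads[server] += bucket_loads[bucket]
    return loads


def imbalance(loads: List[int]) -> float:
    """Load imbalance as the max/min ratio; infinite if a server is empty."""
    lowest = min(loads)
    return float("inf") if lowest == 0 else max(loads) / lowest


def rebalance(table: List[int], bucket_loads: List[int],
              num_servers: int, threshold: float = 2.0) -> List[int]:
    """Re-assign buckets in place until the imbalance is within the threshold
    or no helpful move remains. Returns the list of buckets that were moved."""
    moved: List[int] = []
    while True:
        loads = server_loads(table, bucket_loads, num_servers)
        if imbalance(loads) <= threshold:
            break
        src, dst = loads.index(max(loads)), loads.index(min(loads))
        gap = loads[src] - loads[dst]
        # Only buckets lighter than the gap strictly improve the balance.
        candidates = [b for b, s in enumerate(table)
                      if s == src and 0 < bucket_loads[b] < gap]
        if not candidates:
            break
        best = min(candidates, key=lambda b: abs(bucket_loads[b] - gap / 2))
        table[best] = dst  # only the indirection table entry changes
        moved.append(best)
    return moved


# Toy example: three buckets on server 0, one on server 1.
table = [0, 0, 0, 1]
bucket_loads = [4, 3, 2, 1]
before = imbalance(server_loads(table, bucket_loads, 2))
moved = rebalance(table, bucket_loads, 2, threshold=2.0)
after = imbalance(server_loads(table, bucket_loads, 2))
```

In this toy run the initial loads are 9 and 1, moving the single bucket of load 4 yields loads of 5 and 5, and only that bucket's subscriptions would need to migrate.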
Range Predicates • Use Filter Set Partitioning • K-means clustering • Use center point to represent a rectangle • R-tree-based clustering • R-tree: dynamic index structure for multi-dimensional data rectangles • Offline R-tree algorithm • Exhaustively and recursively search for partitions that minimize sum of bounding rectangle volumes • Online R-tree algorithm • Insert from root down the path that greedily minimizes the increase in bounding rectangle volume • Simulation results • Offline R-tree > Online R-tree > K-means > Random
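The greedy rule behind the online R-tree algorithm can be shown with a deliberately simplified, single-level sketch: each partition keeps one bounding rectangle, and a new subscription goes to the partition whose bounding rectangle grows least in volume. A real R-tree applies this choice at every level from the root down and also splits overflowing nodes; both are omitted here. The names `volume`, `enlarge`, `choose_partition` and `insert` are hypothetical.

```python
from math import prod
from typing import List, Optional, Sequence, Tuple

# A rectangle is one (low, high) interval per dimension.
Rect = Tuple[Tuple[float, float], ...]


def volume(r: Rect) -> float:
    """Volume of a d-dimensional rectangle."""
    return prod(hi - lo for lo, hi in r)


def enlarge(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    return tuple((min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))


def choose_partition(bounding: Sequence[Optional[Rect]], new: Rect) -> int:
    """Index of the partition whose bounding rectangle grows least in volume."""
    def growth(i: int) -> float:
        box = bounding[i]
        if box is None:  # empty partition: its box would become `new`
            return volume(new)
        return volume(enlarge(box, new)) - volume(box)
    return min(range(len(bounding)), key=growth)


def insert(bounding: List[Optional[Rect]], new: Rect) -> int:
    """Add a subscription rectangle to the best partition, updating its box."""
    i = choose_partition(bounding, new)
    bounding[i] = new if bounding[i] is None else enlarge(bounding[i], new)
    return i


# Two partitions with far-apart bounding boxes in a 2-dimensional event space.
boxes: List[Optional[Rect]] = [
    ((0.0, 1.0), (0.0, 1.0)),
    ((10.0, 11.0), (10.0, 11.0)),
]
chosen = insert(boxes, ((0.5, 1.5), (0.5, 1.5)))
```

The new rectangle overlaps the first box, so adding it there increases the volume by 1.25, against 109.25 for the second box; `chosen` is therefore 0.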
Related Work • Pub/Sub systems • Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, … • Clustering in pub/sub • All previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]
Summary • Proposed two subscription partitioning and routing approaches • Event Space Partitioning • Filter Set Partitioning • Evaluated performance via simulations • Subscription partitioning reduces network traffic • Over-partitioning helps to achieve good load balancing dynamically • Bloom filter further reduces event traffic