
Constructing Scalable Overlays for Pub/Sub With Many Topics

Problems, Algorithms, and Evaluation. G. Chockler, R. Melamed, Y. Tock (IBM Haifa Research Lab); R. Vitenberg (University of Oslo).


Presentation Transcript


  1. Constructing Scalable Overlays for Pub/Sub With Many Topics: Problems, Algorithms, and Evaluation. G. Chockler, R. Melamed, Y. Tock (IBM Haifa Research Lab); R. Vitenberg (University of Oslo)

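  2. Publish/Subscribe (Pub/Sub)
     [Figure: nodes N1 to N5 attached to a message bus, with Subscription(N1) = {B,C,D} and the other nodes subscribed to {A,B,C,E}, {A,D}, {A,X}, and {A,B,X}; Publish(M1, A) sends message M1 over the bus to the nodes subscribed to topic A.]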
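
A minimal sketch of the publish/subscribe model pictured on this slide may help. The `MessageBus` class is hypothetical and purely in-memory. Only Subscription(N1) = {B,C,D} is stated on the slide; the pairing of the other four subscription sets with N2 to N5 is an assumption read off the order of the labels.

```python
from collections import defaultdict

# Assumed node-to-subscription pairing; only N1's set is explicit on the slide.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


class MessageBus:
    """Hypothetical in-memory bus: a message reaches every subscriber of its topic."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # topic -> set of node names
        self.inbox = defaultdict(list)       # node  -> list of (message, topic)

    def subscribe(self, node, topic):
        self.subscribers[topic].add(node)

    def publish(self, message, topic):
        """Deliver the message and return the sorted list of receivers."""
        receivers = sorted(self.subscribers[topic])
        for node in receivers:
            self.inbox[node].append((message, topic))
        return receivers


bus = MessageBus()
for node, topics in SUBSCRIPTIONS.items():
    for topic in topics:
        bus.subscribe(node, topic)

receivers = bus.publish("M1", "A")  # Publish(M1, A) from the slide
```

Under the assumed pairing, every node except N1 subscribes to A, so N1's inbox stays empty.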

  3. Scalability of Pub/Sub
     • Most traditional pub/sub systems are geared towards small-scale deployment
       • E.g., Isis MDS, TIB, MQSeries, Gryphon
     • New generation of applications…
       • Large data centers: Amazon, Google, Yahoo, eBay, …
       • RSS, feed/news readers, on-line stock trading and banking
       • Web 2.0, Second Life
     • …drive dramatic growth in scale
       • 10,000s of nodes, 1000s of topics, Internet-wide distribution
     • Emerging systems address this trend using P2P techniques

  4. Overlay-Based Pub/Sub
     • SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA, ...
     [Figure: message (M1, A) forwarded hop by hop across an overlay of nodes N1 to N5, with one node labeled "Relay"; subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}.]

  5. Overlay Topologies for Pub/Sub
     • A "good" overlay allows for efficient and simple publication routing
       • Small routing tables, low load on relays, low latency
     • Ideally, the overlay is topic-connected, i.e., each topic-induced sub-graph forms a single connected component
     • Most existing implementations construct topic-connected overlays

  6. Topic-Connectivity
     • Topics B, C, X, E are connected
     • Topics A and D are disconnected
     [Figure: overlay over nodes N1 to N5 with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}.]
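
Topic-connectivity can be checked mechanically: for each topic, keep only the nodes subscribed to it and test whether the remaining sub-graph is connected. The sketch below does that with a depth-first search. The edges of the slide's figure are not recoverable from the transcript, so `EDGES` is a hypothetical overlay chosen to reproduce the stated outcome, and the node-to-subscription pairing is likewise an assumption.

```python
# Assumed pairing of the slide's subscription sets with nodes N1..N5.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

# Hypothetical overlay edges (not taken from the slide's figure).
EDGES = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]


def topic_connected(members, edges):
    """True if the sub-graph induced by one topic's subscribers is connected."""
    members = set(members)
    if len(members) <= 1:
        return True  # a lone subscriber is trivially connected
    adjacency = {n: set() for n in members}
    for u, v in edges:
        if u in members and v in members:  # keep only topic-induced edges
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == members


def disconnected_topics(interest, edges):
    """Sorted list of topics whose induced sub-graph is not connected."""
    topics = set().union(*interest.values())
    return sorted(
        t for t in topics
        if not topic_connected([n for n, s in interest.items() if t in s], edges)
    )


broken = disconnected_topics(INTEREST, EDGES)
```

With these assumed inputs, `broken` contains A (N3 has no link to another A-subscriber) and D (N1 and N3 are not adjacent), matching the slide's statement.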

  7. Topic-Connectivity: Simple Solution
     • Node degree grows linearly with the subscription size
       • Roughly twice as big as the average subscription size for rings/trees
     [Figure: overlay over nodes N1 to N5 with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}.]
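
To make the degree growth concrete, here is a sketch of one reading of the simple solution: a separate ring for every topic. The function names are hypothetical, the node-to-subscription pairing is assumed, and each ring visits its members in sorted order, which is an arbitrary choice.

```python
from itertools import chain

# Assumed pairing of the slide's subscription sets with nodes N1..N5.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def ring_per_topic(interest):
    """Connect the subscribers of every topic in a ring; return distinct edges."""
    edges = set()
    topics = set(chain.from_iterable(interest.values()))
    for topic in sorted(topics):
        members = sorted(n for n, subs in interest.items() if topic in subs)
        for i, node in enumerate(members):
            neighbor = members[(i + 1) % len(members)]
            if neighbor != node:  # a single subscriber needs no edge
                edges.add(frozenset((node, neighbor)))
    return edges


def degrees(interest, edges):
    """Number of overlay neighbors of every node."""
    return {n: sum(1 for e in edges if n in e) for n in interest}


ring_edges = ring_per_topic(INTEREST)
ring_degrees = degrees(INTEREST, ring_edges)
average_degree = sum(ring_degrees.values()) / len(INTEREST)
```

Each node has at most two ring neighbors per topic, so its degree is bounded by twice its subscription size. The exact edge count depends on the order in which each ring visits its members, because rings of different topics may or may not share edges.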

  8. Scalability of the Simple Solution
     • Negative impact on performance due to
       • CPU load: neighbor monitoring, message processing
       • Connection maintenance and header overhead
       • Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.
     • Scalability barrier for large systems offering a wide range of subscription choices
     Can we do better?

  9. The Min-TCO Problem
     • Minimum Topic-Connected Overlay (Min-TCO) problem:
       • For a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
       • Construct a topic-connected overlay G with the minimum possible number of edges (or average degree)
     • TCO (decision version):
       • Decide whether there is a topic-connected overlay consisting of k edges (for a given k)
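
For very small inputs the decision version can be answered by exhaustive search, which makes the definition concrete. The sketch below is not an algorithm from the talk. It tries every set of k edges, so it is exponential and only usable on toy instances. Function names and the node-to-subscription pairing are assumptions.

```python
from itertools import combinations

# Assumed pairing of the slides' subscription sets with nodes N1..N5.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def is_topic_connected(interest, edges):
    """True if every topic-induced sub-graph of the overlay is connected."""
    for topic in set().union(*interest.values()):
        members = {n for n, subs in interest.items() if topic in subs}
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for u, v in edges:
                if cur in (u, v):
                    other = v if cur == u else u
                    if other in members and other not in seen:
                        seen.add(other)
                        stack.append(other)
        if seen != members:
            return False
    return True


def tco(interest, k):
    """Decision version: does a topic-connected overlay with k edges exist?"""
    all_edges = list(combinations(sorted(interest), 2))
    size = min(k, len(all_edges))  # extra edges never break connectivity
    return any(is_topic_connected(interest, chosen)
               for chosen in combinations(all_edges, size))


def min_tco_size(interest):
    """Smallest k for which tco(interest, k) holds (brute force)."""
    max_edges = len(interest) * (len(interest) - 1) // 2
    return next(k for k in range(max_edges + 1) if tco(interest, k))


optimum = min_tco_size(INTEREST)
```

On the assumed five-node input no 4-edge overlay is topic-connected, while a 5-edge one exists, so the optimum is 5 edges (average degree 2).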

  10. Complexity of TCO
     • Lemma: TCO(V, T, Interest, k) ∈ NP
       • Proof: Topic-connectivity is verifiable in polynomial time
     • Lemma: TCO(V, T, Interest, k) is NP-hard
       • Proof:
         • Define an auxiliary problem, Single Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is ≤ d
         • Set Cover is polynomially reducible to SN-TCO
         • SN-TCO is polynomially reducible to TCO
     • Theorem: TCO is NP-complete
     [Figure: nodes N1 to N5 with subscriptions {B,C,D}, {A,B}, {A,D}, {A,B,C,D}, {A,C}.]

  11. Approximating Min-TCO
     • The idea: exploiting subscription overlaps
       • Connecting nodes with overlapping interests improves the connectivity of several topics at once
     • Greedy Merge (GM) algorithm:
       • Start from a singleton connected component for each (v, t) ∈ V × T
       • At each iteration: add an edge that reduces the number of connected components for the largest number of topics
       • Stop once there is a single connected component for each topic
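
The three GM steps above can be sketched as follows. This is a straightforward, unoptimized rendering that assumes one union-find structure per topic and breaks ties by taking the first heaviest edge in sorted order. Neither choice comes from the slides, and the node-to-subscription pairing is also assumed.

```python
from itertools import combinations

# Assumed pairing of the slides' subscription sets with nodes N1..N5.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


class DisjointSets:
    """Union-find over the subscribers of one topic."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def greedy_merge(interest):
    """Return overlay edges chosen greedily until every topic is connected."""
    topics = set().union(*interest.values())
    # Step 1: a singleton component for each (node, topic) pair.
    components = {t: DisjointSets([n for n in interest if t in interest[n]])
                  for t in topics}

    def weight(u, v):
        # Number of topics whose component count drops if edge (u, v) is added.
        return sum(1 for t in interest[u] & interest[v]
                   if components[t].find(u) != components[t].find(v))

    candidates = list(combinations(sorted(interest), 2))
    overlay = []
    while True:
        # Step 2: pick the edge that merges components for the most topics.
        best = max(candidates, key=lambda e: weight(*e), default=None)
        if best is None or weight(*best) == 0:
            break  # Step 3: every topic already has a single component.
        overlay.append(best)
        for t in interest[best[0]] & interest[best[1]]:
            components[t].union(*best)
    return overlay


overlay = greedy_merge(INTEREST)
average_degree = 2 * len(overlay) / len(INTEREST)
```

On the assumed input this sketch selects five edges, i.e. an average degree of 2.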

  12-16. Greedy Merge
     [Figure, repeated across five slides: step-by-step run of GM on nodes N1 to N5 with subscriptions {B,C,D}, {A,B,C,E}, {A,D}, {A,X}, {A,B,X}.]

  17. Greedy Merge
     • Average degree of 2 vs. almost 3 for ring-per-topic!
     [Figure: final GM overlay on nodes N1 to N5 with subscriptions {B,C,D}, {A,B,C,E}, {A,D}, {A,X}, {A,B,X}.]

  18. GM Running Time
     • O(|V|^4 |T|)
       • At most |V|^2 iterations
       • At most |V|^2 edges inspected at each iteration
       • At most |T| steps to inspect an edge
     • Can be optimized to run in O(|V|^2 |T|)
       • For each e ∈ V × V, weight(e) = the number of connected components merged by e
       • At each iteration, output the heaviest edge and adjust the other edge weights accordingly
       • Stop once there are no more edges with weight > 0
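
The unoptimized bound is just the product of the three per-level costs listed on the slide. Written out, with T_GM as an assumed name for the running time of the naive algorithm:

```latex
\[
T_{\mathrm{GM}}
  \;\le\; \underbrace{|V|^{2}}_{\text{iterations}}
  \cdot \underbrace{|V|^{2}}_{\text{edges inspected per iteration}}
  \cdot \underbrace{|T|}_{\text{steps per edge}}
  \;=\; O\!\left(|V|^{4}\,|T|\right)
\]
```

The iteration count is bounded by |V|^2 because every iteration adds one new edge and there are fewer than |V|^2 node pairs.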

  19. Approximability Results
     • Lemma: The number of edges in the overlay constructed by GM is ≤ log(|V||T|) · OPT
       • Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
       • There exists an input on which GM's output meets this ratio
     • Theorem: No algorithm can approximate Min-TCO within a constant factor (unless P=NP)
       • Proof: The existence of such an algorithm would imply the existence of a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)
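
For readers unfamiliar with the greedy Set Cover algorithm that the proof refers to, a minimal sketch follows. It repeatedly picks the subset covering the most still-uncovered elements, much as GM picks the edge that merges components for the most topics. The function name and the toy instance are illustrative assumptions, not material from the talk.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets greedily until every element of the universe is covered.

    `subsets` maps a name to a set of elements; returns the chosen names.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Subset covering the most uncovered elements (ties: smallest name).
        name = max(sorted(subsets),
                   key=lambda s: len(subsets[s] & uncovered),
                   default=None)
        if name is None or not subsets[name] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(name)
        uncovered -= subsets[name]
    return chosen


# Toy instance (illustrative only).
UNIVERSE = {1, 2, 3, 4, 5}
SUBSETS = {
    "S1": {1, 2, 3},
    "S2": {2, 4},
    "S3": {3, 4},
    "S4": {4, 5},
}

cover = greedy_set_cover(UNIVERSE, SUBSETS)
```

Here the greedy rule first takes S1 (three new elements) and then S4 (two new elements), which covers the whole universe.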

  20. Practical Benefits

  21. More Overlay Design Problems
     • Filtering: Given an upper bound d on the node degree, minimize the number of relays used to connect each topic
       • Captures the cases when full topic-connectivity is infeasible because of resource constraints
     • Diameter: Given an upper bound d on the node degree, minimize the diameter of each topic in the overlay
       • Latency-optimal routing under resource constraints
     • …

  22. Conclusions
     • Initiated a formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
     • Defined a representative problem (Min-TCO) capturing the cost of constructing topic-connected overlays
       • NP-completeness, polynomial approximation, inapproximability results
     • Empirical evaluation showed the effectiveness of our approximation algorithm on practical inputs

  23. Future Directions
     • Study dynamic case
     • Investigate other overlay design problems
     • Study distributed case
     • Partial knowledge of other nodes' interests
     • Dynamically changing interest assignments

  24. Thank You!
