Minimum Maximum Degree Publish-Subscribe Overlay Network Design
Melih Onus
TOBB Ekonomi ve Teknoloji Üniversitesi (TOBB University of Economics and Technology), 28 May 2009
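Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 connected through a central message bus. Subscription(N1) = {B,C,D}; the other nodes subscribe to {A,B,C,E}, {A,D}, {A,X} and {A,B,X}. Publish(M1, A) hands message M1 on topic A to the bus, which delivers it to the subscribers of A.]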
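The centralized model in the figure can be sketched in a few lines. This is a toy illustration, not any real broker's API: the `MessageBus` class and its `subscribe`/`publish` methods are hypothetical names, and since the slide labels only Subscription(N1), the assignment of the other four subscriptions to N2 to N5 is our assumption.

```python
from collections import defaultdict


class MessageBus:
    """Hypothetical centralized pub/sub bus, for illustration only."""

    def __init__(self):
        # topic -> set of subscribed node names
        self._subscribers = defaultdict(set)

    def subscribe(self, node, topics):
        """Register `node` as interested in every topic in `topics`."""
        for topic in topics:
            self._subscribers[topic].add(node)

    def publish(self, message, topic):
        """Deliver `message` to every subscriber of `topic`; returns node -> message."""
        return {node: message for node in sorted(self._subscribers[topic])}


# Subscriptions from the slide; only N1's set is labeled there, the rest are assumed.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

bus = MessageBus()
for node, topics in SUBSCRIPTIONS.items():
    bus.subscribe(node, topics)
deliveries = bus.publish("M1", "A")  # corresponds to Publish(M1, A)
print(sorted(deliveries))  # ['N2', 'N3', 'N4', 'N5']
```

Every node interested in A receives M1 and nothing else does; the scalability problem in the next slides comes from replacing this single bus with peer-to-peer links.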
Scalability of Pub/Sub
• Most traditional pub/sub systems are geared towards small-scale deployment
  • E.g., Isis MDS, TIB, MQSeries, Gryphon
• New generation of applications…
  • Large data centers: Amazon, Google, Yahoo, eBay,…
  • RSS, feed/news readers, on-line stock trading and banking
  • Web 2.0, Second Life
• …drive dramatic growth in scale
  • 10,000s of nodes, 1000s of topics, Internet-wide distribution
• Emerging systems address this trend using P2P techniques
Overlay-Based Pub/Sub
[Figure: the same five nodes without a message bus; message (M1, A) is relayed from node to node over overlay links]
• SCRIBE
• Corona
• FeedTree
• Sub-2-Sub
• TERA
• ...
Overlay Topologies for Pub/Sub
• A “good” overlay allows efficient and simple publication routing
  • Small routing tables, low load on relays, low latency
• Ideally, the overlay is topic-connected, i.e., for each topic, the subgraph induced by the nodes subscribed to that topic is connected (a single connected component)
• Most existing implementations construct topic-connected overlays
Topic-Connectivity
[Figure: an example overlay on the five nodes with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}]
• Topics B, C, X, E are connected
• Topics A and D are disconnected
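Checking topic-connectivity amounts to counting, for each topic, the connected components of the subgraph induced by that topic's subscribers. The sketch below does this with a small union-find; the function names, the node-to-subscription assignment and the three overlay edges are hypothetical, the edges being chosen so that the verdict matches the slide (A and D disconnected).

```python
def count_components(members, edges):
    """Number of connected components of the subgraph induced by `members`."""
    parent = {v: v for v in members}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in members and v in members:  # edge lies inside the topic-induced subgraph
            parent[find(u)] = find(v)
    return len({find(v) for v in members})


def disconnected_topics(subscriptions, edges):
    """Sorted list of topics whose subscribers do not form one connected component."""
    edges = list(edges)
    topics = set().union(*subscriptions.values())
    return sorted(
        t for t in topics
        if count_components({v for v, ts in subscriptions.items() if t in ts}, edges) > 1
    )


# Hypothetical assignment of the slide's subscriptions to nodes.
SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# Hypothetical overlay edges.
EDGES = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]
print(disconnected_topics(SUBSCRIPTIONS, EDGES))  # ['A', 'D']
```

Note that the edge N1-N2 does not help topic A: N1 is not subscribed to A, so it may not relay A's traffic.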
Topic-Connectivity: Simple Solution
[Figure: a separate ring per topic on the same five nodes]
• Node degree grows linearly with the subscription size
  • Roughly twice the subscription size for rings/trees
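A minimal sketch of the ring-per-topic construction shows why degree tracks subscription size: each ring contributes at most two links per subscribed topic. Placing subscribers on each ring in sorted order is an arbitrary choice of ours, as are the function names and the node-to-subscription assignment.

```python
from collections import Counter


def ring_per_topic(subscriptions):
    """Simple solution: an independent ring over the subscribers of each topic.

    Subscribers are placed on each ring in sorted order (an arbitrary choice).
    Returns the overlay as a set of undirected edges (frozensets).
    """
    topics = set().union(*subscriptions.values())
    edges = set()
    for t in sorted(topics):
        ring = sorted(v for v, ts in subscriptions.items() if t in ts)
        if len(ring) < 2:
            continue  # a lone subscriber needs no links
        for i, u in enumerate(ring):
            edges.add(frozenset((u, ring[(i + 1) % len(ring)])))
    return edges


def degrees(edges):
    """Node degree in the overlay; a link shared by several topics counts once."""
    return Counter(v for e in edges for v in e)


SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
ring_edges = ring_per_topic(SUBSCRIPTIONS)
deg = degrees(ring_edges)
print(len(ring_edges), max(deg.values()))  # 8 4
```

Each node's degree is bounded by twice its number of topics; links shared by several topics (here N1-N2 for B and C) are what keep it below that bound.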
Scalability of the Simple Solution
• Negative impact on performance due to
  • CPU load: neighbor monitoring, message processing
  • Connection maintenance and header overhead
  • Memory overhead: per-link state associated with the routing and/or compression schemes in use, etc.
• Scalability barrier for large systems offering a wide range of subscription choices
Can we do better?
The MinMax-TCO Problem
• Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem:
  • Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
  • Construct a topic-connected overlay G with the minimum possible maximum degree
• TCO (decision version):
  • Decide whether there is a topic-connected overlay with maximum degree at most k (for a given k)
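Written out formally, the two problems read as follows. The notation $V_t$ (subscribers of topic $t$), $G[V_t]$ (subgraph of $G$ induced by $V_t$) and $\deg_G$ is introduced here for convenience and is not taken from the slides.

```latex
\begin{aligned}
&\textbf{Input: } V,\; T,\; \mathit{Interest}: V \times T \to \{\mathrm{true}, \mathrm{false}\},
  \qquad V_t = \{\, v \in V : \mathit{Interest}(v, t) = \mathrm{true} \,\} \\
&\textbf{MinMax-TCO: } \min_{G = (V, E)} \; \max_{v \in V} \deg_G(v)
  \quad \text{s.t. } G[V_t] \text{ is connected for every } t \in T \\
&\textbf{TCO}(V, T, \mathit{Interest}, k): \; \exists\, G = (V, E) \text{ with } G[V_t]
  \text{ connected } \forall t \in T \text{ and } \max_{v \in V} \deg_G(v) \le k\,?
\end{aligned}
```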
GM Algorithm
• The GM algorithm can produce an overlay with maximum degree Θ(n) even when an overlay with constant maximum degree exists.
Complexity of TCO
Lemma: TCO(V, T, Interest, k) ∈ NP
Proof: Topic-connectivity (and the degree bound) can be verified in polynomial time.
Lemma: TCO(V, T, Interest, k) is NP-hard
Proof:
• Define an auxiliary problem, Single Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is at most d
• Set Cover is polynomially reducible to SN-TCO
• SN-TCO is polynomially reducible to TCO
Theorem: TCO is NP-complete
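The NP-membership lemma rests on a polynomial-time check of a candidate overlay. A minimal sketch of such a certificate verifier follows, using one breadth-first search per topic restricted to that topic's subscribers; `verify_tco`, the edge-list input and the example overlays are hypothetical.

```python
from collections import defaultdict, deque


def verify_tco(subscriptions, edges, k):
    """Check a certificate for TCO(V, T, Interest, k): max degree <= k and topic-connected."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Degree bound: O(|V| + |E|).
    if any(len(adj[v]) > k for v in subscriptions):
        return False
    # Topic-connectivity: one BFS per topic over its subscribers, O(|T| (|V| + |E|)).
    for t in set().union(*subscriptions.values()):
        members = {v for v, ts in subscriptions.items() if t in ts}
        start = next(iter(members))
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in members and w not in seen:  # never relay through non-subscribers
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False
    return True


SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
RING_EDGES = [("N2", "N3"), ("N3", "N4"), ("N4", "N5"), ("N2", "N5"),
              ("N1", "N2"), ("N2", "N4"), ("N1", "N4"), ("N1", "N3")]
STAR_EDGES = [("N2", v) for v in ("N1", "N3", "N4", "N5")]
print(verify_tco(SUBSCRIPTIONS, RING_EDGES, 4), verify_tco(SUBSCRIPTIONS, RING_EDGES, 3))
```

The star around N2 fails even with a generous degree bound: N1 and N3 share topic D but are joined only through N2, which is not subscribed to D.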
Approximating Min-TCO
• The idea: exploit subscription overlaps
  • Connecting nodes with overlapping interests improves the connectivity of several topics at once
• Overlay Design Algorithm (ODA):
  • Start from a singleton connected component for each (v, t) ∈ V × T with Interest(v, t) = true
  • At each iteration: among the edges whose addition increases the maximum degree the least, add the one that reduces the number of connected components for the largest number of topics
  • Stop once there is a single connected component for each topic
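The ODA loop can be sketched directly from the steps above. This is a straightforward, unoptimized reading, not the author's implementation: candidate edges are ranked first by the maximum degree the overlay would have after adding them, then by how many topics they merge, and remaining ties go to the first pair in sorted order (an assumption of ours). The node-to-subscription assignment is again hypothetical.

```python
from itertools import combinations


def oda(subscriptions):
    """Greedy sketch of ODA; returns a topic-connected overlay as a set of frozenset edges."""
    nodes = sorted(subscriptions)
    topics = sorted(set().union(*subscriptions.values()))
    # comp[t][v] = component label of v among the subscribers of t (singletons at start).
    comp = {t: {v: v for v in nodes if t in subscriptions[v]} for t in topics}
    degree = {v: 0 for v in nodes}
    edges = set()

    def weight(u, v):
        """Number of topics in which edge (u, v) would merge two components."""
        return sum(1 for t in topics
                   if u in comp[t] and v in comp[t] and comp[t][u] != comp[t][v])

    while True:
        scored = [(u, v, weight(u, v)) for u, v in combinations(nodes, 2)]
        candidates = [c for c in scored if c[2] > 0]
        if not candidates:
            return edges  # every topic is down to a single component
        top = max(degree.values())
        # Smallest resulting maximum degree first, then the most topics merged.
        u, v, _ = min(candidates,
                      key=lambda c: (max(top, degree[c[0]] + 1, degree[c[1]] + 1), -c[2]))
        edges.add(frozenset((u, v)))
        degree[u] += 1
        degree[v] += 1
        for t in topics:  # relabel v's component as u's in every topic the edge merges
            labels = comp[t]
            if u in labels and v in labels and labels[u] != labels[v]:
                old, new = labels[v], labels[u]
                for node in [n for n, lab in labels.items() if lab == old]:
                    labels[node] = new


SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
overlay = oda(SUBSCRIPTIONS)
print(sorted(tuple(sorted(e)) for e in overlay))
# [('N1', 'N2'), ('N1', 'N3'), ('N2', 'N4'), ('N3', 'N5'), ('N4', 'N5')]
```

On this example the sketch stops after five edges with every node at degree 2. Each iteration scans all |V|² pairs and each pair costs |T| steps, which is the cost analyzed on the running-time slide.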
Greedy Merge
[Figure: the greedy merge adding edges one at a time on the five example nodes N1 to N5 with subscriptions {B,C,D}, {A,B,C,E}, {A,D}, {A,X}, {A,B,X}]
Greedy Merge
[Figure: final overlay on the same five nodes]
• Maximum degree of 2 vs. almost 4 for ring-per-topic!
ODA Running Time
• O(|V|⁴·|T|)
  • At most |V|² iterations
  • At most |V|² edges inspected at each iteration
  • At most |T| steps to inspect an edge
• Can be optimized to run in O(|V|²·|T|)
  • For each e ∈ V × V, weight(e) = the number of connected components merged by e
  • At each iteration, output the heaviest edge and adjust the other edge weights accordingly
  • Stop once there are no more edges with weight > 0
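The unoptimized bound is simply the product of the three per-level costs listed above:

```latex
\underbrace{|V|^2}_{\text{iterations}} \cdot
\underbrace{|V|^2}_{\text{edges inspected per iteration}} \cdot
\underbrace{|T|}_{\text{steps per edge}}
= O\!\left(|V|^4\,|T|\right)
```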
Approximability Results
Lemma: The number of edges in the overlay constructed by GM is at most log(|V||T|) · OPT
Proof: Similar to the proof of the approximation ratio of the greedy algorithm for Set Cover
• Uses maximum weighted matching
• Uses edge coloring
Theorem: No algorithm can approximate MinMax-TCO within a constant factor (unless P = NP)
Proof: The existence of such an algorithm would imply a constant-factor approximation for Set Cover, which is known to be impossible (unless P = NP)
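Both proofs lean on Set Cover. For reference, here is a sketch of the classic greedy Set Cover algorithm, whose well-known H(n) ≤ 1 + ln n approximation analysis the lemma's proof is said to follow. The function name and the small instance are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Classic greedy Set Cover: repeatedly take the subset covering the most uncovered elements.

    `subsets` maps a name to a set; ties go to the first name in dict order.
    """
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("subsets do not cover the universe")
        cover.append(best)
        uncovered -= gained
    return cover


SUBSETS = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4}, "S4": {4, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5}, SUBSETS))  # ['S1', 'S4']
```

GM plays the same game with edges in place of subsets: each added edge "covers" the component merges it performs, which is why its edge count inherits the logarithmic factor.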
Experimental Results I
[Figure: maximum node degree; #topics: 100, #subscriptions: 10, uniform distribution]
Experimental Results II
[Figure: average node degree; #topics: 100, #subscriptions: 10, uniform distribution]
Experimental Results III
[Figure: maximum node degree; #topics: 100, #nodes: 100, uniform distribution]
Constant Diameter Overlays
• Constant Diameter Topic-Connected Overlay (CD-TCO) problem:
  • Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
  • Construct a topic-connected, constant-diameter overlay G with the minimum possible average degree
• The GM algorithm can produce an overlay with diameter Θ(n), where n is the number of nodes in the pub/sub system.
Constant Diameter Overlay Algorithm
• The idea: adding stars
  • Make topics connected with star structures
• Constant Diameter Overlay Design Algorithm:
  • Start from a singleton connected component for each (v, t) ∈ V × T with Interest(v, t) = true
  • At each iteration:
    • Add a star that connects the maximum number of nodes
    • Remove the topics that are connected by the star
  • Stop once there is a single connected component for each topic
Number of neighbors of node u:
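The star-based procedure can be sketched under one possible reading, which is our assumption rather than the slide's definition: a star centered at node c links c to every other subscriber of each still-unconnected topic of c, "connects the maximum number of nodes" means the star touching the most distinct nodes wins (ties broken by the number of topics it connects, then by node order), and the center's topics are then removed. Because each topic's subscribers end up around a common center, every topic-induced subgraph has diameter at most 2.

```python
def star_overlay(subscriptions):
    """Hypothetical star-based sketch; returns a set of frozenset edges."""
    members = {}
    for v, ts in subscriptions.items():
        for t in ts:
            members.setdefault(t, set()).add(v)
    uncovered = set(members)  # topics not yet connected by a star
    edges = set()

    def reach(c):
        """Nodes a star centered at c would link to via c's still-unconnected topics."""
        return set().union(*(members[t] for t in subscriptions[c] & uncovered)) - {c}

    while uncovered:
        # Largest star first; ties broken by the number of topics it connects.
        center = max(sorted(subscriptions),
                     key=lambda c: (len(reach(c)), len(subscriptions[c] & uncovered)))
        for leaf in reach(center):
            edges.add(frozenset((center, leaf)))
        uncovered -= subscriptions[center]
    return edges


SUBSCRIPTIONS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
stars = star_overlay(SUBSCRIPTIONS)
print(sorted(tuple(sorted(e)) for e in stars))
```

On the example this picks N2 as the first center (covering A, B, C, E), then adds N1-N3 for D and N4-N5 for X, trading a higher degree at N2 for short paths inside every topic.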
Experimental Results
[Figure: average node degree; #topics: 100, #nodes: 100, uniform distribution]
• Only 2.3 times more edges
Conclusions
• Formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
• Defined a problem (MinMax-TCO) capturing the cost of constructing topic-connected overlays
  • NP-completeness, polynomial approximation, and inapproximability results
  • Empirical evaluation showed the effectiveness of our approximation algorithm on practical inputs
• Defined the CD-TCO problem and presented empirical results
Future Directions
• Study the dynamic case
• Investigate other overlay design problems
• Study the distributed case
• Partial knowledge of other nodes' interests
• Dynamically changing interest assignments