Overlay Network Construction in Highly Decentralized Networks. Melih Onus, PhD Thesis Defense. Committee: Andrea W. Richa (Chair), Goran Konjevod, Rida Bazzi, Christian Scheideler. April 14, 2009, Arizona State University.
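Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 attached to a message bus. Subscriptions: Subscription(N1) = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X}. Publish(M1, A) sends message M1 over the bus to the subscribers of topic A.]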
Scalability of Pub/Sub
• Most traditional pub/sub systems are geared towards small-scale deployment
  • E.g., Isis MDS, TIB, MQSeries, Gryphon
• New generation of applications…
  • Large data centers: Amazon, Google, Yahoo, eBay, …
  • RSS, feed/news readers, on-line stock trading and banking
  • Web 2.0, Second Life
• …drive dramatic growth in scale
  • 10,000s of nodes, 1000s of topics, Internet-wide distribution
• Emerging systems address this trend using P2P techniques
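Overlay-Based Pub/Sub
[Figure: overlay over nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}. The publication (M1, A) is forwarded from node to node along overlay links; one node is labeled "Relay".]
Example systems: • SCRIBE • Corona • Feedtree • Sub-2-Sub • TERA • ...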
Overlay Topologies for Pub/Sub
• A “good” overlay allows efficient and simple publication routing
  • Small routing tables, low load on relays, low latency
• Ideally, the overlay is topic-connected, i.e., the sub-graph induced by each topic forms one connected component
• Most existing implementations construct topic-connected overlays
Topic-Connectivity
[Figure: example overlay over nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}.]
• Topics B, C, X, E are connected
• Topics A and D are disconnected
Topic-Connectivity: Simple Solution
[Figure: example overlay over nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}.]
• Node degree grows linearly with the subscription size
• Roughly twice as big as the subscription size for rings/trees
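To make the degree growth concrete, here is a minimal sketch that builds one ring per topic over the example subscriptions and counts node degrees. The helper names (`ring_per_topic`, `degrees`) and the choice to order each ring by node name are assumptions made for illustration; the slides do not specify how the rings are laid out.

```python
from collections import Counter

# Example subscriptions taken from the slides.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

def ring_per_topic(interest):
    """Connect the subscribers of every topic in a ring (hypothetical helper).

    Ring order is simply the sorted node names, which is an assumption.
    """
    topics = sorted(set().union(*interest.values()))
    edges = set()
    for t in topics:
        members = sorted(v for v in interest if t in interest[v])
        if len(members) == 2:
            # Two subscribers need a single link, not a ring.
            edges.add((members[0], members[1]))
        elif len(members) > 2:
            for i, u in enumerate(members):
                w = members[(i + 1) % len(members)]
                edges.add((min(u, w), max(u, w)))
    return edges

def degrees(edges):
    """Count how many overlay links touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

if __name__ == "__main__":
    overlay = ring_per_topic(INTEREST)
    print(sorted(overlay))
    print(dict(degrees(overlay)))
```

Links shared by several topic rings are counted once, which is why the degree stays below twice the subscription size in small examples like this one.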
Scalability of the Simple Solution
• Negative impact on performance due to
  • CPU load: neighbor monitoring, message processing
  • Connection maintenance and header overhead
  • Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.
• Scalability barrier for large systems offering a wide range of subscription choices
Can we do better?
Outline • Minimum Maximum Degree Publish-Subscribe Overlay Network Design • Parameterized Maximum and Average Degrees in Publish-Subscribe Overlay Network Design • Constant Diameter Publish-Subscribe Overlay Network Design
The MinMax-TCO Problem
• Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem:
  • Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
  • Construct a topic-connected overlay G with the minimum possible maximum degree
• TCO (decision version):
  • Decide whether there is a topic-connected overlay with maximum degree at most k (for a given k)
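The problem statement can be illustrated with a small checker: given an interest assignment and a candidate overlay, it tests topic-connectivity and the degree bound k. This is only a sketch; the function names and the dictionary representation of Interest are assumptions, and it verifies a given overlay rather than searching for one.

```python
from collections import deque

def is_topic_connected(interest, edges):
    """True if, for every topic, its subscribers induce a connected sub-graph."""
    adj = {v: set() for v in interest}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for t in set().union(*interest.values()):
        members = {v for v in interest if t in interest[v]}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Only follow links that stay inside the topic's subscriber set.
            for w in adj[u] & members:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False
    return True

def max_degree(interest, edges):
    """Largest number of overlay links at any node."""
    deg = {v: 0 for v in interest}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg.values())

def satisfies_tco(interest, edges, k):
    """Check a candidate overlay against the TCO decision version."""
    return is_topic_connected(interest, edges) and max_degree(interest, edges) <= k

if __name__ == "__main__":
    interest = {
        "N1": {"B", "C", "D"}, "N2": {"A", "B", "C", "E"},
        "N3": {"A", "D"}, "N4": {"A", "B", "X"}, "N5": {"A", "X"},
    }
    overlay = [("N1", "N2"), ("N4", "N5"), ("N2", "N4"), ("N1", "N3"), ("N3", "N5")]
    print(satisfies_tco(interest, overlay, 2))
```

Each topic is checked with one breadth-first search, so the whole check runs in time polynomial in |V| and |T|.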
GM Algorithm
• The GM algorithm can produce an overlay with maximum degree Θ(n) even when an overlay with constant maximum degree exists.
Complexity of MinMax-TCO
Lemma: MinMax-TCO(V, T, Interest, k) ∈ NP
Proof: Topic-connectivity is verifiable in polynomial time
Lemma: MinMax-TCO(V, T, Interest, k) is NP-hard
Proof:
• Define an auxiliary problem, Single Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is at most d
• Set Cover is polynomially reducible to SN-TCO
• SN-TCO is polynomially reducible to TCO
Theorem: MinMax-TCO is NP-complete
Approximating MinMax-TCO
• The idea: exploit subscription overlaps
  • Connecting nodes with overlapping interests improves the connectivity of several topics at once
• Overlay Design Algorithm (ODA):
  • Start from a singleton connected component for each (v, t) ∈ V × T
  • At each iteration: among the edges that increase the maximum degree minimally, add the one that reduces the number of connected components for the largest number of topics
  • Stop once there is a single connected component for each topic
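The greedy rule above can be sketched as follows. This is an unoptimized illustration under stated assumptions, not the thesis implementation: Interest is a dictionary from node to topic set, per-topic components are tracked with plain labels, and ties are broken by node-name order (the slides do not say how ties are resolved).

```python
from itertools import combinations

def oda(interest):
    """Greedy overlay construction in the spirit of ODA (illustrative sketch)."""
    nodes = sorted(interest)
    topics = set().union(*interest.values())
    # comp[t][v] is the label of v's connected component within topic t.
    comp = {t: {v: v for v in nodes if t in interest[v]} for t in topics}
    degree = {v: 0 for v in nodes}
    edges = []

    def weight(u, v):
        # Number of topics whose components the edge (u, v) would merge.
        return sum(1 for t in interest[u] & interest[v] if comp[t][u] != comp[t][v])

    while True:
        candidates = [e for e in combinations(nodes, 2) if weight(*e) > 0]
        if not candidates:
            break  # every topic already forms a single component
        current_max = max(degree.values())

        def key(e):
            # First minimize the resulting maximum degree, then maximize weight.
            resulting_max = max(current_max, degree[e[0]] + 1, degree[e[1]] + 1)
            return (resulting_max, -weight(*e))

        u, v = min(candidates, key=key)
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1
        for t in interest[u] & interest[v]:
            old, new = comp[t][v], comp[t][u]
            if old != new:
                for x in comp[t]:
                    if comp[t][x] == old:
                        comp[t][x] = new
    return edges

if __name__ == "__main__":
    interest = {
        "N1": {"B", "C", "D"}, "N2": {"A", "B", "C", "E"},
        "N3": {"A", "D"}, "N4": {"A", "B", "X"}, "N5": {"A", "X"},
    }
    print(oda(interest))
```

On the five-node example from the slides, this sketch ends with five edges and a maximum degree of 2. Every added edge merges at least one pair of components, so the loop terminates once each topic has a single component.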
Overlay Design Algorithm
[Figure: step-by-step run of ODA on the example nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}, shown over six slides.]
• Maximum degree of 2 vs. almost 4 for ring-per-topic!
ODA Running Time
• O(|V|^4 |T|)
  • At most |V|^2 iterations
  • At most |V|^2 edges inspected at each iteration
  • At most |T| steps to inspect an edge
• Can be optimized to run in O(|V|^2 |T|)
  • For each e ∈ V × V, weight(e) = the number of connected components merged by e
  • At each iteration, output the heaviest edge and adjust the other edge weights accordingly
  • Stop once there are no more edges with weight > 0
Approximability Results
Lemma: The number of edges in the overlay constructed by GM is at most log(|V||T|) · OPT
Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
• Uses Maximum Weighted Matching
• Uses Edge Coloring
Theorem: No polynomial-time algorithm can approximate MinMax-TCO within a constant factor (unless P=NP)
Proof: Existence of such an algorithm would imply the existence of a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)
Experimental Results I: Maximum Node Degree [Figure: plot. Setup: #topics: 100, #subscriptions: 10, uniform distribution.]
Experimental Results II: Average Node Degree [Figure: plot. Setup: #topics: 100, #subscriptions: 10, uniform distribution.]
Experimental Results III: Maximum Node Degree [Figure: plot. Setup: #topics: 100, #nodes: 100, uniform distribution.]
Outline • Minimum Maximum Degree Publish-Subscribe Overlay Network Design • Parameterized Maximum and Average Degrees in Publish-Subscribe Overlay Network Design • Constant Diameter Publish-Subscribe Overlay Network Design
ODA Algorithm
[Figure: three diagrams over nodes v1, v2, v3, …, vn-1, vn.]
• The ODA algorithm can produce an overlay with average degree Θ(n) even when an overlay with constant average degree exists.
ODA and GM Algorithms
• GM Algorithm: choose the edge with maximum benefit
  • Average degree: O(log nt) approximation
  • Maximum degree: O(n) approximation
• ODA Algorithm: choose the edge with maximum benefit among the ones that increase the maximum degree minimally
  • Average degree: O(n) approximation
  • Maximum degree: O(log nt) approximation
How to approximate both average and maximum degree?
Parameterized Algorithm
• e1: edge with maximum benefit
• e2: edge with maximum benefit among the ones that increase the maximum degree minimally
• If w(e2) > w(e1) / k, choose e2
• Otherwise, choose e1
• Parameter: 1 < k < n
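The selection rule can be written as a small function. The sketch below assumes each candidate edge arrives as a tuple (edge, weight, resulting maximum degree); that tuple layout and the name `pick_edge` are hypothetical, chosen only to isolate the comparison between e1 and e2.

```python
def pick_edge(candidates, k):
    """Choose between the GM-style edge e1 and the ODA-style edge e2.

    candidates: list of (edge, weight, resulting_max_degree) tuples
                (an assumed representation, not from the slides).
    k:          trade-off parameter with 1 < k < n.
    """
    # e1: edge with maximum benefit overall.
    e1 = max(candidates, key=lambda c: c[1])
    # e2: edge with maximum benefit among those that raise the max degree least.
    least = min(c[2] for c in candidates)
    e2 = max((c for c in candidates if c[2] == least), key=lambda c: c[1])
    chosen = e2 if e2[1] > e1[1] / k else e1
    return chosen[0]

if __name__ == "__main__":
    cands = [(("a", "b"), 6, 3), (("c", "d"), 2, 2)]
    print(pick_edge(cands, 2))  # weight 2 is not above 6/2, so e1 is kept
    print(pick_edge(cands, 4))  # weight 2 is above 6/4, so e2 is chosen
```

Since w(e2) can never exceed w(e1), values of k close to 1 make the rule behave like GM, while larger k favors the degree-preserving edge more often, as ODA does.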
Algorithms
• GM Algorithm:
  • Average degree: O(log nt) approximation
  • Maximum degree: O(n) approximation
• ODA Algorithm:
  • Average degree: O(n) approximation
  • Maximum degree: O(log nt) approximation
• P-ODA Algorithm:
  • Average degree: O(k · log nt) approximation
  • Maximum degree: O((n/k) · log nt) approximation
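One way to read the P-ODA bounds is to ask which k balances the two ratios. The short calculation below is an illustration derived from the two stated bounds, not a result quoted from the slides; it uses n for the number of nodes and t for the number of topics, as in the bounds above.

```latex
k \log(nt) = \frac{n}{k}\,\log(nt)
\;\Longrightarrow\; k^2 = n
\;\Longrightarrow\; k = \sqrt{n},
\qquad\text{so both ratios become } O\!\left(\sqrt{n}\,\log(nt)\right).
```

Choosing k below the square root of n tilts the guarantee towards average degree, and choosing it above tilts it towards maximum degree.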
Outline • Minimum Maximum Degree Publish-Subscribe Overlay Network Design • Parameterized Maximum and Average Degrees in Publish-Subscribe Overlay Network Design • Constant Diameter Publish-Subscribe Overlay Network Design
Constant Diameter Overlays
• Constant Diameter Topic-Connected Overlay (CD-TCO) problem:
  • Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
  • Construct a topic-connected, constant-diameter overlay G with the minimum possible average degree
• The GM algorithm can produce an overlay with diameter Θ(n), where n is the number of nodes in the pub/sub system.
Constant Diameter Overlay Algorithm
• Constant Diameter Overlay Design Algorithm:
  • At each iteration:
    • Find the number of neighbors of each node
    • Add a star which connects the maximum number of nodes
    • Remove the topics which are connected by the star
  • Stop once there is a single connected component for each topic
Number of neighbors of node u: [formula not preserved]
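The star-based iteration can be sketched as below. Because the slide's formula for the number of neighbors is not preserved, the sketch assumes that the neighbors of u are the nodes sharing at least one not-yet-connected topic with u; that definition, the name `cd_oda`, and the name-order tie-breaking are all assumptions.

```python
def open_neighbors(interest, remaining, u):
    """Nodes sharing at least one still-unconnected topic with u (assumed definition)."""
    open_topics = interest[u] & remaining
    return {v for v in interest if v != u and interest[v] & open_topics}

def cd_oda(interest):
    """Repeatedly add the star that reaches the most nodes (illustrative sketch)."""
    remaining = set().union(*interest.values())
    edges = set()
    while remaining:
        # Only nodes that still have an unconnected topic can be star centers.
        active = [u for u in sorted(interest) if interest[u] & remaining]
        center = max(active, key=lambda u: len(open_neighbors(interest, remaining, u)))
        for v in open_neighbors(interest, remaining, center):
            edges.add(tuple(sorted((center, v))))
        # Every topic of the center is now connected through the star.
        remaining -= interest[center]
    return edges

if __name__ == "__main__":
    interest = {
        "N1": {"B", "C", "D"}, "N2": {"A", "B", "C", "E"},
        "N3": {"A", "D"}, "N4": {"A", "B", "X"}, "N5": {"A", "X"},
    }
    print(sorted(cd_oda(interest)))
```

Under this assumption each topic ends up connected through a single star center, so any two subscribers of a topic are at most two hops apart within that topic's sub-graph.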
Constant Diameter Overlay Algorithm I
• Constant Diameter Overlay Design Algorithm I:
  • At each iteration:
    • Find the weight of each node
    • Add a star which connects the node with maximum weight
    • Remove the topics which are connected by the star
  • Stop once there is a single connected component for each topic
Weight of node u: [formula not preserved]
Constant Diameter Overlay Algorithm II
• Constant Diameter Overlay Design Algorithm II:
  • At each iteration:
    • Find the number of neighbors of each node
    • Add a star which connects the node with maximum density
    • Remove the topics which are connected by the star
  • Stop once there is a single connected component for each topic
Density of node u: [formula not preserved]
Experimental Results I: Average Node Degree, varying #nodes [Figure: plot. Setup: #topics: 100, #subscriptions: 10, uniform distribution.] Only 2.3 times more edges.
Experimental Results II: Average Node Degree, varying #topics [Figure: plot. Setup: #nodes: 100, #subscriptions: 20, uniform distribution.] Only 1.9 times more edges.
Experimental Results III: Average Node Degree, varying #subscriptions [Figure: plot. Setup: #nodes: 100, #topics: 100, uniform distribution.] Only 1.8 times more edges.
Conclusions
• Formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
• Defined the problem (MinMax-TCO) capturing the cost of constructing topic-connected overlays
• NP-completeness, polynomial approximation, inapproximability results
• Empirical evaluation showed the effectiveness of our approximation algorithm on practical inputs
• Parameterized algorithm with low maximum and average degree
• Defined the problem (CD-TCO), empirical results
Future Directions
• Study the dynamic case
• Investigate other overlay design problems
• Study the distributed case
• Partial knowledge of other nodes' interests
• Dynamically changing interest assignments
• Prove diameter results theoretically
Publications
• Parameterized Maximum and Average Degrees in Topic-based Publish-Subscribe Overlay Network Design, M. Onus and A. W. Richa, submitted to 21st Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), August 2009.
• Minimum Maximum Degree Publish-Subscribe Overlay Network Design, M. Onus and A. W. Richa, 28th Annual IEEE Conference on Computer Communications (INFOCOM), April 2009, Rio De Janeiro, Brazil.
• Distributed Coloring with O(log n) bits, K. Kothapalli, M. Onus, C. Scheideler and C. Schindelhauer, to appear in Journal of Parallel and Distributed Computing (JPDC), 2008.
• Linearization: Locally Self Stabilizing Sorting in Graphs, M. Onus, A. W. Richa and C. Scheideler, Workshop on Algorithm Engineering & Experiments (ALENEX), January 2007, New Orleans, Louisiana.
• A Scalable Multilevel Algorithm for Community Structure Detection, H. Djidjev and M. Onus, 4th Workshop on Algorithms and Models for the Web-Graph (WAW), November 2006, Banff, Alberta.
• Heuristics for Minimum Brauer Chain Problem, F. Gelgi and M. Onus, 21st International Symposium on Computer and Information Sciences (ISCIS), Springer LNCS 4263, November 2006, Istanbul, Turkey.
• Distributed Coloring with O(log n) bits, K. Kothapalli, C. Scheideler, M. Onus and C. Schindelhauer, 20th IEEE Parallel & Distributed Processing Symposium (IPDPS), April 2006, Rhodes Island, Greece.
• Efficient Broadcasting and Gathering in Wireless Ad-Hoc Networks, M. Onus, A. W. Richa, K. Kothapalli and C. Scheideler, International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN), December 2005, Las Vegas, Nevada.
• Constant Density Spanners for Wireless Ad-Hoc Networks, K. Kothapalli, C. Scheideler, M. Onus and A. W. Richa, 17th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), July 2005, Las Vegas, Nevada.