
Divide and Conquer Algorithms for Pub/Sub Overlay Design

Divide and Conquer Algorithms for Pub/Sub Overlay Design. Chen Chen (1), joint work with Hans-Arno Jacobsen (1,2) and Roman Vitenberg (3). (1) Department of Electrical and Computer Engineering, University of Toronto; (2) Department of Computer Science, University of Toronto; (3) Department of Informatics, University of Oslo.


Presentation Transcript


  1. Divide and Conquer Algorithms for Pub/Sub Overlay Design • Chen Chen (1), joint work with Hans-Arno Jacobsen (1,2) and Roman Vitenberg (3) • (1) Department of Electrical and Computer Engineering, University of Toronto • (2) Department of Computer Science, University of Toronto • (3) Department of Informatics, University of Oslo • ICDCS’10 Genoa, Italy

  2. Example: Pub/Sub [Figure: example with nodes whose interests are "boy" and "girl"]

  3. Pub/Sub • A communication paradigm: subscribers express their interests; publishers disseminate messages • Many applications and industry standards: application integration, financial data dissemination, RSS feed distribution, business process management; WS Notifications, WS Eventing, OMG’s Real-time Data Dissemination Service • Topic-based pub/sub: TIBCO RV, Google’s GooPS

  4. Two components in pub/sub implementation • Construction of overlay: the construction of the overlay topology such that network traffic is minimized (Chockler et al., PODC’07; Onus et al., INFOCOM’09) • Design of routing protocols: the design of protocols so that publications and subscriptions are sent most efficiently across the overlay network (G. Li et al., ICDCS’08; M. Castro et al., JSAC’02)

  5. Desirable properties for overlays • Low average node degree • Low fan-out of a node • Low diameter • Topic-connectivity • Efficiency to construct • Adaptability to churn • Ease of distributed implementation

  6. Our contributions

  7. Topic-connectivity [Figure: an overlay G over nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, shown with two of its suboverlays] • Suboverlay Ga is topic-connected • Suboverlay Gb is NOT topic-connected
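
Topic-connectivity can be checked mechanically: for each topic, take the nodes interested in it and test whether the overlay edges among them form a connected graph. The Python sketch below does this with a breadth-first search. The interest assignment is a reading of the slide's figure; the two edge lists and the function names `is_topic_connected` and `is_tco` are illustrative assumptions, not taken from the talk.

```python
from collections import deque

def is_topic_connected(interests, edges, topic):
    """True if the nodes interested in `topic` induce a connected subgraph."""
    nodes = {v for v, topics in interests.items() if topic in topics}
    if len(nodes) <= 1:
        return True
    # Adjacency restricted to edges whose endpoints both subscribe to `topic`.
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x] - seen:
            seen.add(y)
            queue.append(y)
    return seen == nodes

def is_tco(interests, edges):
    """True if the overlay is topic-connected for every topic."""
    all_topics = set().union(*interests.values())
    return all(is_topic_connected(interests, edges, t) for t in all_topics)

# Interests as read from the slide figure (an assumption about the lost layout).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
# Hypothetical edge sets, chosen only to illustrate both outcomes.
tco_edges = [("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v2", "v4"), ("v4", "v5")]
partial_edges = [("v1", "v3"), ("v2", "v4"), ("v4", "v5")]

print(is_tco(interests, tco_edges))
print(is_topic_connected(interests, partial_edges, "a"))
print(is_topic_connected(interests, partial_edges, "b"))
```

With `partial_edges`, topic a is connected (v2, v4, v5 form a path) while topic b is not, because v4 has no edge to another b-subscriber.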

  8. MinAvg-TCO problem [Figure: two topic-connected overlays over the same five nodes V1, ..., V5] • TCO1 has 5 edges • TCO2 has 10 edges

  9. MinAvg-TCO problem [Figure: the five-node example overlay] • A high-quality overlay is judged by: topic-connectivity; total number of edges • Input: a set of nodes V, a set of topics T, the interest function Int • MinAvg-TCO(V,T,Int) (optimization version): construct a TCO(V,T,Int,E) such that |E| is minimum. • Avg-TCO(V,T,Int,k) (decision version): is there a TCO(V,T,Int,E) such that |E|=k? • Theorem: MinAvg-TCO is NP-complete

  10. Greedy-Merge (GM) algorithm • Greedy: always making the choice that looks best at the moment • GM for MinAvg-TCO: always adding an edge with maximum link contribution • Running time: O(|V|^2 |T|) • Approximation ratio: O(log(|V||T|))
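
A minimal sketch of the greedy rule, under a stated assumption: the slide does not define "link contribution", so the code takes the contribution of an edge (u, v) to be the number of topics shared by u and v for which the two nodes are not yet connected in that topic's suboverlay. The names `greedy_merge` and `find` are illustrative. This naive version rescans every node pair in each round, so it is meant to show the selection rule, not to meet the running-time bound quoted on the slide.

```python
from itertools import combinations

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def greedy_merge(interests):
    """interests: dict node -> set of topics. Returns the list of chosen edges."""
    topics = set().union(*interests.values())
    # One union-find forest per topic, over the nodes subscribed to that topic.
    parent = {t: {v: v for v in interests if t in interests[v]} for t in topics}

    def contribution(u, v):
        # Assumed definition: shared topics in which u and v are still disconnected.
        return sum(1 for t in interests[u] & interests[v]
                   if find(parent[t], u) != find(parent[t], v))

    edges = []
    while True:
        best = max(combinations(sorted(interests), 2),
                   key=lambda e: contribution(*e), default=None)
        if best is None or contribution(*best) == 0:
            return edges  # no edge helps any more: every topic is connected
        u, v = best
        for t in interests[u] & interests[v]:
            parent[t][find(parent[t], u)] = find(parent[t], v)
        edges.append(best)

# Interests as read from the five-node example figure (an assumption).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
print(greedy_merge(interests))
```

On this input the first edge chosen is (v1, v3), the only pair sharing two topics; every later edge contributes one topic, so the result has five edges in total.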

  11. Our contributions

  12. TCO join problem • Given p TCOs: TCO_d(V_d, T_d, Int_d, E_d), d = 1, ..., p • MinAvg-TCO-Join(V,T,Int,p) (optimization version): construct a TCO(V,T,Int,E) such that |E| is minimum • Avg-TCO-Join(V,T,Int,p,k) (decision version): is there a TCO(V,T,Int,E) such that |E|=k? • MinAvg-TCO is a special case of MinAvg-TCO-Join • Theorem: MinAvg-TCO-Join is NP-complete

  13. Solving MinAvg-TCO-Join • MinAvg-TCO-Join could be solved by GM, but this is NOT practical: it would tear down all existing links and rebuild the overlay from scratch using GM • It is better to preserve all existing edges and only add edges incrementally.

  14. Bad case for incremental addition of edges • Vall: a node interested in all topics in T [Figure: overlays TCO0, TCO1, and TCO2 over nodes V1, ..., Vn and Vall, contrasting "constructing incrementally" with "constructing from scratch"]

  15. Naive Merge (NM) algorithm • GM algorithm: Input: (V,T,Int); Output: one TCO; Algorithm: start with an empty edge set, then always add an edge with maximum link contribution; Running time: [formula not preserved] • NM algorithm: Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p; Output: one TCO; Algorithm: start with existing internal-TCO links, then always add a cross-TCO edge with maximum link contribution; Running time: [formula not preserved] • NM is based on the same greedy heuristic as GM.

  16. Example of NM [Figure: example overlay over nodes V0, ..., V14 with interests drawn from {a,b,c,d}] • Still a prohibitively high running time!

  17. Star set • Given a TCO (V,T,Int,E), a star set S is a subset of V that covers all of V’s topics. [Figure: a topic-connected overlay over V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}] • {v3, v5} is a star set which covers all topics {a,b,c,d} • {v2, v3, v4} is not a star set; it only covers {a,b,d}

  18. Star set • Star set nodes: represent the interests of all the nodes; can function as bridges to determine cross-TCO links • Observation: minimal star sets tend to be substantially smaller than the total number of nodes. • How to find a minimum star set S* for (V,T,Int)? Equivalent to the classic set cover problem: NP-complete; can be approximated with a logarithmic approximation ratio
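
The logarithmic approximation mentioned above is the one achieved by the standard greedy set-cover heuristic: repeatedly pick the node that covers the most topics still uncovered. The sketch below applies it to the five-node example; the function name `greedy_star_set` and the tie-breaking by sorted node name are assumptions made for illustration, and the talk does not say which approximation algorithm the authors used.

```python
def greedy_star_set(interests):
    """Greedy set cover: returns a list of nodes whose interests cover every topic."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        # Pick the node covering the most uncovered topics (ties: smallest name).
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

# Interests as read from the star-set figure (an assumption about the lost layout).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
star = greedy_star_set(interests)
print(star)
print(set().union(*(interests[v] for v in star)))
```

Here the heuristic first takes v1, which covers three topics, and then needs one more node to cover topic a, giving a star set of size two, the same size as the slide's {v3, v5}.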

  19. Star Merge (SM) algorithm • NM algorithm: Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p; Output: one TCO; Algorithm: start with existing internal-TCO links, (do nothing), then always add a cross-TCO edge with maximum link contribution • SM algorithm: Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p; Output: one TCO; Algorithm: start with existing internal-TCO links, find a star set for each sub-TCO, then always add a cross-star edge with maximum link contribution

  20. Example of SM [Figure: example overlay over nodes V0, ..., V14 with interests drawn from {a,b,c,d}] • Running time is largely improved because #stars << #nodes in most cases.

  21. Divide and Conquer (DC) for MinAvg-TCO • The number of nodes is a dominant factor in the running time of the GM algorithm. • Divide-and-conquer: divide the MinAvg-TCO problem into several sub-overlay construction problems; conquer the sub-MinAvg-TCO problems independently and build the sub-overlays into sub-TCOs; combine these sub-TCOs into one TCO

  22. Design of DC algorithm • How to divide the node set V: node clustering vs. random partitioning • The number of partitions p sets the balance between conquer and combine: p = 1 (single partition): conquer only = GM; p = |V| (each node is a partition): combine only = GM • How to decentralize DC: the DC algorithm as presented is fully centralized, but it is possible to decentralize it. • Theoretical analysis: not straightforward.

  23. Example of DC [Figure: example overlay over nodes V0, ..., V14 with interests drawn from {a,b,c,d}] • Divide the overlay based on V • Conquer each sub-TCO by GM • Combine the sub-TCOs into one by SM
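
The three steps can be put together in a short Python sketch, under the following assumptions: random partitioning into p parts, link contribution defined as the number of shared topics in which the two endpoints are still disconnected, and star sets found by the greedy set-cover heuristic. All function names (`greedy_add`, `star_set`, `divide_and_conquer`) and the `seed` parameter are hypothetical; this is an illustration of the structure, not the authors' implementation.

```python
import random
from itertools import combinations

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def greedy_add(interests, existing, candidates):
    """Keep `existing` edges, then greedily add candidate edges by link contribution."""
    topics = set().union(*interests.values())
    parent = {t: {v: v for v in interests if t in interests[v]} for t in topics}

    def union(u, v):
        for t in interests[u] & interests[v]:
            parent[t][find(parent[t], u)] = find(parent[t], v)

    def contribution(e):
        u, v = e
        return sum(find(parent[t], u) != find(parent[t], v)
                   for t in interests[u] & interests[v])

    for u, v in existing:
        union(u, v)
    added, candidates = [], list(candidates)
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break  # no candidate edge connects anything new
        union(*best)
        added.append(best)
    return list(existing) + added

def star_set(interests, nodes):
    """Greedy set cover of the topics of `nodes` by nodes from `nodes`."""
    uncovered = set().union(*(interests[v] for v in nodes))
    star = []
    while uncovered:
        best = max(sorted(nodes), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

def divide_and_conquer(interests, p, seed=0):
    # Divide: random partition of the node set into (at most) p parts.
    nodes = sorted(interests)
    random.Random(seed).shuffle(nodes)
    parts = [part for part in (nodes[i::p] for i in range(p)) if part]
    # Conquer: run the greedy merge inside each part; record its star set.
    edges, stars = [], []
    for part in parts:
        sub = {v: interests[v] for v in part}
        edges += greedy_add(sub, [], combinations(part, 2))
        stars.append(star_set(interests, part))
    # Combine: star merge considers only edges between stars of different parts.
    cross = [(u, v) for s1, s2 in combinations(stars, 2) for u in s1 for v in s2]
    return greedy_add(interests, edges, cross)

interests = {"v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
             "v4": {"a", "b"}, "v5": {"a", "c"}}
print(divide_and_conquer(interests, p=2))
```

In this sketch the two extremes from the design slide fall out directly: with p = 1 the combine step has no cross-star candidates, and with p = |V| every node is its own star, so both cases reduce to the plain greedy merge over all node pairs.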

  24. Experiment setting • Number of nodes |V|: 1000 by default, ranging from 1000 to 8000 • Number of topics |T|: 100 by default, ranging from 100 to 1000 • Number of topics subscribed to by a node, NodeIntSize: 20 by default, ranging from 10 to 100 • Topic distribution: uniform, Zipf, exponential

  25. Experiment design • Evaluation: average node degree, running time • Star Merge for MinAvg-TCO-Join • DC for MinAvg-TCO: random node partitioning; the effects of the number of nodes; the effects of the number of topics; the effects of the average subscription size of a node; comparison with RingPT • RingPT is an algorithm that mimics the common practice of building a separate overlay for each topic.

  26. Star Merge: SM vs NM vs GM

  27. Divide-and-conquer: the effect of the number of nodes

  28. Divide-and-conquer: DC vs GM vs RingPT

  29. Algorithm summary

  31. Minimal Number of Links • A typical pub/sub system combines a number of protocols, many of which maintain per-link state • A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state) • If the links are maintained using TCP, there is the cost of connection state for each link • The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits • If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table
