Divide and Conquer Algorithms for Pub/Sub Overlay Design

Chen Chen (1), joint work with Hans-Arno Jacobsen (1,2) and Roman Vitenberg (3)
(1) Department of Electrical and Computer Engineering, University of Toronto
(2) Department of Computer Science, University of Toronto
(3) Department of Informatics, University of Oslo
ICDCS’10, Genoa, Italy

Example: Pub/Sub
[Figure: subscribers with interests "boy", "boy, girl", and "girl".]

Pub/Sub
• A communication paradigm
  - Subscribers express their interests
  - Publishers disseminate messages
• Many applications and industry standards
  - Application integration, financial data dissemination, RSS feed distribution, business process management
  - WS-Notification, WS-Eventing, OMG’s Real-time Data Dissemination Service
• Topic-based pub/sub
  - TIBCO RV
  - Google’s GooPS

Two components in pub/sub implementation
• Construction of the overlay: building the overlay topology such that network traffic is minimized.
  - Chockler et al., PODC’07
  - Onus et al., INFOCOM’09
• Design of routing protocols: designing protocols so that publications and subscriptions are sent most efficiently across the overlay network.
  - G. Li et al., ICDCS’08
  - M. Castro et al., JSAC’02

Desirable properties for overlays
• Low average node degree
• Low fan-out of a node
• Low diameter
• Topic-connectivity
• Efficiency to construct
• Adaptability to churn
• Ease of distributed implementation

Our contributions

Topic-connectivity
[Figure: an overlay G over five nodes with interests V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, shown with two of its suboverlays.]
• Suboverlay Ga is topic-connected
• Suboverlay Gb is NOT topic-connected
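The topic-connectivity property can be checked mechanically. The sketch below assumes the usual reading of the term: an overlay is topic-connected when, for every topic, the nodes interested in that topic form a connected subgraph. The node interests are read off the slide's figure labels; the edge list is hypothetical, since the figure's topology is not recoverable, and the function names are illustrative only.

```python
from collections import deque

def is_topic_connected(interests, edges, topic):
    """True if the nodes interested in `topic` induce a connected subgraph."""
    members = {v for v, ts in interests.items() if topic in ts}
    if len(members) <= 1:
        return True
    # Keep only edges whose endpoints are both interested in the topic.
    adj = {v: set() for v in members}
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary interested node.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == members

def is_tco(interests, edges):
    """True if the overlay is topic-connected for every topic."""
    topics = set().union(*interests.values())
    return all(is_topic_connected(interests, edges, t) for t in topics)

# Interests as labelled on the slide; the edges are a made-up example.
interests = {
    "V1": {"b", "c", "d"},
    "V2": {"a"},
    "V3": {"b", "d"},
    "V4": {"a", "b"},
    "V5": {"a", "c"},
}
edges = [("V2", "V5"), ("V4", "V5"), ("V1", "V3"), ("V1", "V5")]

print(is_topic_connected(interests, edges, "a"))  # True
print(is_topic_connected(interests, edges, "b"))  # False: V4 is cut off
```

With these hypothetical edges, topic a is connected through V5, while V4 cannot reach V1 and V3 using only nodes interested in b.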

MinAvg-TCO problem
[Figure: two topic-connected overlays over the same five nodes. TCO1 has 5 edges; TCO2 has 10 edges.]

MinAvg-TCO problem
• A high-quality overlay
  - Topic-connectivity
  - Total number of edges
• Input:
  - a set of nodes V,
  - a set of topics T,
  - the interest function Int
• MinAvg-TCO(V,T,Int) (optimization version): construct a TCO(V,T,Int,E) such that |E| is minimum.
• Avg-TCO(V,T,Int,k) (decision version): is there a TCO(V,T,Int,E) such that |E| = k?
• Theorem: MinAvg-TCO is NP-complete

Greedy-Merge (GM) algorithm
• Greedy: always making the choice that looks best at the moment
• GM for MinAvg-TCO: always adding an edge with maximum link contribution
• Running time: O(|V|^2 |T|)
• Approximation ratio: O(log(|V||T|))
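The greedy rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the link contribution of an edge (u, v) is the number of topics shared by u and v for which the two nodes currently lie in different topic-connected components. It also rescans all node pairs in every round, so it does not achieve the running time quoted on the slide. The function name `greedy_merge` and the example data are illustrative.

```python
from itertools import combinations

def greedy_merge(interests):
    """interests: dict node -> set of topics. Returns the edges of a TCO."""
    # comp[t][v] is the component label of node v in the sub-overlay of
    # topic t. Initially every interested node is its own component.
    comp = {}
    for v, topics in interests.items():
        for t in topics:
            comp.setdefault(t, {})[v] = v

    def contribution(u, v):
        # Assumed definition: topics for which this edge joins two components.
        return sum(1 for t in interests[u] & interests[v]
                   if comp[t][u] != comp[t][v])

    edges = set()
    nodes = sorted(interests)
    while True:
        best = max(combinations(nodes, 2),
                   key=lambda e: contribution(*e), default=None)
        if best is None or contribution(*best) == 0:
            return edges  # no edge merges anything: topic-connected
        u, v = best
        edges.add((u, v))
        # Merge the components of u and v for every shared topic.
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            if old != new:
                for w in comp[t]:
                    if comp[t][w] == old:
                        comp[t][w] = new

interests = {
    "V1": {"b", "c", "d"},
    "V2": {"a"},
    "V3": {"b", "d"},
    "V4": {"a", "b"},
    "V5": {"a", "c"},
}
tco = greedy_merge(interests)
print(len(tco))  # 5
```

On this five-node instance the first edge chosen is (V1, V3), the only pair sharing two topics; every later edge merges exactly one pair of components, which gives five edges in total.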


TCO join problem
• Given p TCOs: TCO_d(V_d, T_d, Int_d, E_d), d = 1, ..., p
• MinAvg-TCO-Join(V,T,Int,p) (optimization version): construct a TCO(V,T,Int,E) such that |E| is minimum.
• Avg-TCO-Join(V,T,Int,p,k) (decision version): is there a TCO(V,T,Int,E) such that |E| = k?
• MinAvg-TCO is a special case of MinAvg-TCO-Join.
• Theorem: MinAvg-TCO-Join is NP-complete

Solving MinAvg-TCO-Join
• MinAvg-TCO-Join could be solved by GM, but this is NOT practical:
  - Tear down all existing links
  - Rebuild the overlay from scratch using GM
• It is better to preserve all existing edges and only add edges incrementally.

Bad case for incremental addition of edges
• Vall: a node interested in all topics in T
[Figure: overlays TCO0, TCO1, and TCO2 over nodes V1, ..., Vn and Vall, contrasting incremental construction with construction from scratch.]

Naive Merge (NM) algorithm
GM algorithm
• Input: (V,T,Int)
• Output: one TCO
• Algorithm:
  - Start with an empty edge set;
  - Always add an edge with maximum link contribution.
NM algorithm
• Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
• Output: one TCO
• Algorithm:
  - Start with existing internal-TCO links;
  - Always add a cross-TCO edge with maximum link contribution.
NM is based on the same greedy heuristic as GM.

Example of NM
[Figure: NM joining sub-TCOs over nodes V0 to V14.]
Still a prohibitively high running time!

Star set
• Given a TCO (V,T,Int,E), a star set S is a subset of V that covers all of V’s topics.
[Figure: a topic-connected overlay over V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}.]
• {v3, v5} is a star set: it covers all topics {a,b,c,d}
• {v2, v3, v4} is not a star set: it only covers {a,b,d}

Star set
• Star set nodes
  - Represent the interests of all the nodes
  - Can function as bridges to determine cross-TCO links
• Observation: minimal star sets tend to be substantially smaller than the full node set.
• How to find a minimum star set S* for (V,T,Int)?
  - Equivalent to the classic set cover problem: NP-complete
  - Can be approximated within a logarithmic approximation ratio
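The logarithmic approximation mentioned above is what the standard greedy set cover heuristic provides. The sketch below applies that textbook heuristic to star sets; it is an illustration under that assumption, and the slides do not say which approximation the authors actually use. The name `greedy_star_set` is hypothetical.

```python
def greedy_star_set(interests):
    """Greedy set cover: repeatedly pick the node covering the most
    still-uncovered topics. interests: dict node -> set of topics."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        # Ties are broken by node name because candidates are sorted.
        best = max(sorted(interests),
                   key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

interests = {
    "V1": {"b", "c", "d"},
    "V2": {"a"},
    "V3": {"b", "d"},
    "V4": {"a", "b"},
    "V5": {"a", "c"},
}
print(greedy_star_set(interests))  # ['V1', 'V2']
```

The result differs from the star set {v3, v5} shown on the slide, but it has the same size and also covers {a,b,c,d}; star sets are generally not unique.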

Star Merge (SM) algorithm
NM algorithm
• Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
• Output: one TCO
• Algorithm:
  - Start with existing internal-TCO links;
  - // Do nothing;
  - Always add a cross-TCO edge with maximum link contribution.
SM algorithm
• Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
• Output: one TCO
• Algorithm:
  - Start with existing internal-TCO links;
  - Find a star set for each sub-TCO;
  - Always add a cross-star edge with maximum link contribution.
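The three SM steps can be sketched as follows. This is a hedged illustration rather than the authors' code: star sets come from the standard greedy set cover heuristic, link contribution is assumed to count the topics for which an edge joins two topic-connected components, and each input sub-TCO is assumed to be topic-connected already (so its index can serve as the initial component label for all its nodes). All function and variable names are illustrative.

```python
from itertools import combinations

def star_set(nodes, interests):
    """Greedy set cover over the topics of one sub-TCO."""
    uncovered = set().union(*(interests[v] for v in nodes))
    stars = []
    while uncovered:
        best = max(sorted(nodes), key=lambda v: len(interests[v] & uncovered))
        stars.append(best)
        uncovered -= interests[best]
    return stars

def star_merge(interests, sub_tcos):
    """sub_tcos: list of (nodes, edges) pairs, each assumed topic-connected."""
    part = {v: d for d, (nodes, _) in enumerate(sub_tcos) for v in nodes}
    # Inside a topic-connected sub-TCO all nodes sharing a topic are already
    # in one component, so the sub-TCO index is a valid component label.
    comp = {}
    for v, d in part.items():
        for t in interests[v]:
            comp.setdefault(t, {})[v] = d
    edges = {e for _, es in sub_tcos for e in es}  # keep existing links
    stars = [s for nodes, _ in sub_tcos for s in star_set(nodes, interests)]
    # Candidate edges: only pairs of star nodes from different sub-TCOs.
    candidates = [(u, v) for u, v in combinations(stars, 2)
                  if part[u] != part[v]]

    def contribution(e):
        u, v = e
        return sum(1 for t in interests[u] & interests[v]
                   if comp[t][u] != comp[t][v])

    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break
        u, v = best
        edges.add(best)
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            if old != new:
                for w in comp[t]:
                    if comp[t][w] == old:
                        comp[t][w] = new
    return edges

interests = {"V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
             "V4": {"a", "b"}, "V5": {"a", "c"}}
sub_tcos = [({"V1", "V3"}, {("V1", "V3")}),
            ({"V2", "V4", "V5"}, {("V2", "V4"), ("V4", "V5")})]
merged = star_merge(interests, sub_tcos)
print(sorted(merged))
```

Replacing `candidates` with all cross-TCO node pairs turns this sketch into NM; the only difference between the two is the size of the candidate edge set searched in each round.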

Example of SM
[Figure: SM joining sub-TCOs over nodes V0 to V14 through their star nodes.]
Running time is largely improved because #stars << #nodes in most cases.

Divide and Conquer (DC) for MinAvg-TCO
• The number of nodes is a dominant factor in the running time of the GM algorithm.
• Divide-and-conquer
  - Divide the MinAvg-TCO problem into several sub-overlay construction problems
  - Conquer the sub-MinAvg-TCO problems independently and build sub-overlays into sub-TCOs
  - Combine these sub-TCOs into one TCO

Design of DC algorithm
• How to divide the node set V:
  - Node clustering vs. random partitioning
• The number of partitions p
  - The balance between conquer and combine
  - p = 1 (single partition): conquer only = GM
  - p = |V| (each node is a partition): combine only = GM
• How to decentralize DC:
  - Note that the DC algorithm as presented is fully centralized.
  - However, it is possible to decentralize it.
• Theoretical analysis: not straightforward.

Example of DC
[Figure: DC applied to an overlay over nodes V0 to V14.]
- Divide the overlay based on V
- Conquer each sub-TCO by GM
- Combine the sub-TCOs into one by SM
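The divide, conquer, and combine steps can be put together in one short sketch. It makes several assumptions that the slides do not spell out: random partitioning into p round-robin parts after a seeded shuffle, link contribution defined as the number of topics for which an edge joins two topic-connected components, and star sets found by greedy set cover. The names `greedy_add`, `star_set`, and `divide_and_conquer` are illustrative.

```python
import random
from itertools import combinations

def greedy_add(interests, comp, candidates, edges):
    """Repeatedly add the candidate edge with maximum link contribution."""
    def contribution(e):
        u, v = e
        return sum(1 for t in interests[u] & interests[v]
                   if comp[t][u] != comp[t][v])
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break
        u, v = best
        edges.add(best)
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            for w in comp[t]:
                if comp[t][w] == old:
                    comp[t][w] = new
    return edges

def star_set(nodes, interests):
    """Greedy set cover over the topics of one partition."""
    uncovered = set().union(*(interests[v] for v in nodes))
    stars = []
    while uncovered:
        best = max(sorted(nodes), key=lambda v: len(interests[v] & uncovered))
        stars.append(best)
        uncovered -= interests[best]
    return stars

def divide_and_conquer(interests, p, seed=0):
    # Divide: random partitioning of the node set into p parts.
    nodes = sorted(interests)
    random.Random(seed).shuffle(nodes)
    parts = [nodes[i::p] for i in range(p)]
    # Every node starts as its own component for each of its topics.
    comp = {}
    for v in nodes:
        for t in interests[v]:
            comp.setdefault(t, {})[v] = v
    edges = set()
    # Conquer: run the GM rule inside each partition independently.
    for part in parts:
        greedy_add(interests, comp, list(combinations(sorted(part), 2)), edges)
    # Combine: run the SM rule over cross-partition pairs of star nodes.
    owner = {v: d for d, part in enumerate(parts) for v in part}
    stars = [s for part in parts if part for s in star_set(part, interests)]
    cross = [(u, v) for u, v in combinations(stars, 2) if owner[u] != owner[v]]
    return greedy_add(interests, comp, cross, edges)

interests = {"V1": {"b", "c", "d"}, "V2": {"a"}, "V3": {"b", "d"},
             "V4": {"a", "b"}, "V5": {"a", "c"}}
print(len(divide_and_conquer(interests, p=1)))  # 5: conquer only, same as GM
print(len(divide_and_conquer(interests, p=2)))
```

With p = 1 the combine step has no cross-partition pairs and the sketch reduces to GM. With larger p the conquer step searches fewer pairs per partition, at the cost of possibly adding more edges overall.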

Experiment setting
• The number of nodes: |V| = 1000 by default, ranging from 1000 to 8000
• The number of topics: |T| = 100 by default, ranging from 100 to 1000
• The number of topics subscribed to by a node: NodeIntSize = 20 by default, ranging from 10 to 100
• Topic distribution: uniform, Zipf, exponential

Experiment design
• Evaluation: average node degree, running time
• Star Merge for MinAvg-TCO-Join
• DC for MinAvg-TCO
  - Random node partitioning
  - The effects of the number of nodes
  - The effects of the number of topics
  - The effects of the average subscription size of a node
  - Comparison with RingPT
RingPT is an algorithm that mimics the common practice of building a separate overlay for each topic.

Star Merge: SM vs NM vs GM

Divide-and-conquer: The effect of the number of nodes

Divide-and-conquer: DC vs GM vs RingPT

Algorithm summary


Minimal Number of Links
• A typical pub/sub system combines a number of protocols, many of which maintain per-link state
• A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)
• If the links are maintained using TCP, there is the cost of connection state for each link
• The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits
• If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table
