220 likes | 347 Views
Revisiting Content-Based Publish/Subscribe. Costin Raiciu, David Rosenblum, Mark Handley University College London. Problem. Why has Large Scale Content-Based Publish/Subscribe not been adopted in the real world? Intense research efforts - many solutions exist
E N D
Revisiting Content-Based Publish/Subscribe Costin Raiciu, David Rosenblum, Mark Handley University College London
Problem • Why has Large Scale Content-Based Publish/Subscribe not been adopted in the real world? • Intense research efforts - many solutions exist • Gryphon, Siena, Hermes, Jedi, Medym, … • Plenty possible applications • RSS Dissemination, Online Games, Stock Quotes,… • Is this problem important? • Are we exploring the wrong side of the solution space? • What to do next?
Contributions • Reasons for lack of adoption: • Complexity • Application Diversity • Lack of Deployment • Research Agenda: • Layering CBPS • Creating Configurable Solutions • Proof of Concept
Complexity of CBPS • Content-Based Publish/Subscribe is composed of two sub-problems: • Content-Based Matching - for each notification, find the set of matching subscribers • Event Routing - deliver each notification to the matching subscribers • Both sub-problems are difficult on their own! • Event Routing • Optimal delivery tree changes with (almost) every notification • Computing it is equivalent to computing the minimum Steiner tree, which is NP-complete for certain cost metrics
Complexity of Content-Based Matching • Related to search in large-scale networks • Active research field – structured overlays, etc. • Theoretically not scalable • Consider the following static system • N – nodes in the system • R – replication rate for subscriptions • H – (max) number of nodes a notification visits • IS – storage load balancing • IR – routing load balancing
Complexity of Content-Based Matching(2) • We can easily show that: • If a notification matches all subscriptions, then • If a subscription matches all notifications, then • Generic content-based matching solutions cannot be scalable on all directions • Either replicate subscriptions to all the nodes (R=N) • Broadcast notifications to all the nodes (H=N) • Create bottlenecks, for either storage or routing (IS=N, IR=N) • Select a trade-off, e.g. N1/3
Application Diversity • Survey of applications suitable for CBPS, 5 applications selected: • Online Games, RSS Feeds, Stock Quotes, Security Alerts, Location Based Services • Tolerable message latency – 1ms – 1min • Number of publishers – 1 - 106 • Number of subscribers – 102 - 106 • Notification frequency – 10-2/s – 104/s
Application Diversity (2) • High Diversity – can a single solution accommodate all applications? No. • Current solutions – not built with specific applications in mind • Embed optimizations based on expected properties of applications, rather than particular examples • Siena – clustering of subscriptions based on geographic proximity • Hermes – distributing event load using message types + clustering • Applications do not seem to benefit the optimizations of any single architecture!
Lack of Deployment • CBPS is a trade-off between broadcast and publisher-side filtering of messages • If CBPS solutions cannot be easily used, application developers will use alternative solutions! • Advantages • Use current solutions to deploy applications • Find out the impact of different optimizations on the performance of the application • Could develop a research agenda • Difficulties • No single deployment can accommodate all applications! • Multiple solutions should be made available
Our Proposal • Layer Content-Based Publish/Subscribe • Solve Content-Based Matching and Event Routing Separately • Compose full CBPS solutions from pieces • Create configurable solutions for the two sub-problems • Provide parameters that allow a solution to be tuned for a specific application • Supporting a new application • Tune the event routing and content-based matching algorithms • Compose them into a full solution • Deploy them using a predefined infrastructure
Event Routing Content Based Matching s s s Layering Content-Based Publish/Subscribe • Benefits of Layering • Solving CBPS is easier • Modularity • Interoperability between layers • Independent development • Testing • Adaptability • We can combine solutions to support new applications • Existing work can be leveraged n
Benefits of Layering (cont’d) • Supports different scale for content based matching and event routing • In some cases, simple solutions are enough – centralized matching and point to point routing is sufficient • Example: stock quote matching • Different companies could provide the two services • Separates the domains of trust • Confidential content-based publish/subscribe is not cheap • Requires subscribers and publishers share a secret key • Separation minimizes the number of trusted servers • Content-Based Matching nodes need access to subscription and notification payload • Event routing nodes only need the addresses of the subscribers
Drawbacks of Layering • Suboptimal solutions • Independent optimizations of the two layers • Increased costs • Duplicate data structures maintained in the two layers • Inter layer signalling costs – when nodes are specialized in content-based matching or event routing • Our hope • Costs are moderate • Resulting solutions have good-enough performance • Argument similar to SQL?
Configurable Solutions • Optimizations must be application specific • Layering allows us to reutilize solution parts to accommodate new applications • The solution parts must be easily configurable to support new applications • Cut down the process of supporting a new application to fine tuning – similar to creating indexes in DBs. • Composition of different solutions • Focus on content-based matching
Configurable Solutions (2) • What parameters should we control? • Application requirements: notification latency, throughput • Solution parameters: R, H, load balancing • Can we use low-level parameters to control high-level parameters? • Simple model • Latency can be improved by minimizing H • Throughput can be improved by increasing H and optimizing storage road balancing • Bottlenecks can be alleviated with increased replication and routing load balancing • Real dependencies – cannot be inferred from this model
Cluster 1 Cluster X 2160 0 Cluster 2 Cluster X-1 Proof of Concept: Configurable Solution • R, H parameters • N nodes – uniformly distributed identifiers in a circular space • Divide the space into X regions, such that N = X*R • Full mesh of nodes • Cluster membership protocol is distributed • Each node knows R, N
Cluster 1 Cluster X 2160 0 Cluster 2 Cluster X-1 Proof of Concept: Configurable Solution (2) • Rendezvous function f • S -> {1, …, X} • N -> {1, …,X} • Rendezvous function f • S -> {1, …, X} • N -> {1, …,X} • Subscribe process • f(s) = 2 • rnd{-H/2,…,H/2} = -1 • Store s on cluster 1 = 2-1 • Rendezvous function f • S -> {1, …, X} • N -> {1, …,X} • Subscribe process • f(s) = 2 • rnd{-H/2,…,H/2} = -1 • Store s on cluster 1 = 2-1 • Publish process • f(N) = 2 • Route to Clusters 1, 2 and 3
Analysis of the solution • Fine grained trade-off: • Routing Hops: H+1 • Replication Rate: R • Load Balancing: IR = N/R*H, IS = N/R*H • Fine tuning • Choose R and Hops minimal such that Imbalance is not bottleneck for desired performance • Flexibility • Different applications on the same architecture • Rendezvous functions • Primary key attributes • Hashes of attribute type
Analysis of the solution (2) • Some issues • Full mesh of nodes – stretch paths with log N, for log N routing tables • Computing N through sampling – what are the implications on consistency? • Drawbacks • Average Case = Worst Case • It would be pleasant to have average routing hops a lot smaller • Can mitigate to some extent by using optimizations • Assumes all distributions are uniform • Replication is the same for all subscriptions
Deployment Issues • Testbed: PlanetLab • Freely available to the scientific community • ~400 nodes scattered throughout the world • Deploying a single solution is not enough • We would like • A common API for content-based matching and event routing • A common code base • Networking functions • Logging • The ability to • Configure layers easily • Compose full CBPS solutions
Summary • We have analyzed reasons for the lack of adoption of large-scale Content-Based Publish/Subscribe • Complexity • Application Diversity • Lack of deployment of current solutions • We have proposed two techniques to mitigate this state of affairs • Layering Content-Based Publish/Subscribe • Content-Based Matching • Event Routing • Building Configurable Solutions • Proof of concept for Content-Based Matching
Questions? Costin Raiciu c.raiciu@cs.ucl.ac.uk David Rosenblum d.rosenblum@cs.ucl.ac.uk