Mercury: Building Distributed Applications with Publish-Subscribe Ashwin Bharambe Carnegie Mellon University Monday Seminar Talk
Quick Terminology Recap • Basics • Publishers: inject data/events/publications • Subscribers: register interests/subscriptions • Brokers: match subscriptions with publications and deliver them to subscribers • Mercury: a distributed publish-subscribe system • Performs matching and content routing in a distributed fashion • Data model • Example publication: Name = ashwin, Age = 23, X = 192.3, Y = 223.4 • Example subscription: Name = *, Age > 35, X > 100, X < 180
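The data model above can be sketched in a few lines: a publication is a set of attribute/value pairs, and a subscription is a conjunction of predicates over those attributes. This is a hypothetical illustration of the matching semantics, not Mercury's actual API; the function and predicate encoding are made up.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}

def matches(publication, subscription):
    """True iff every predicate in the subscription holds for the publication.

    publication: dict of attribute -> value
    subscription: list of (attribute, op, value) predicates
    """
    for attr, op, value in subscription:
        if op == "=" and value == "*":
            # wildcard: the attribute just has to be present
            if attr not in publication:
                return False
            continue
        if attr not in publication or not OPS[op](publication[attr], value):
            return False
    return True

pub = {"Name": "ashwin", "Age": 23, "X": 192.3, "Y": 223.4}
sub = [("Name", "=", "*"), ("Age", ">", 35), ("X", ">", 100), ("X", "<", 180)]
print(matches(pub, sub))  # → False (Age is 23, not > 35)
```

A broker's job, conceptually, is to run this test between every publication and the subscriptions it stores; Mercury's contribution is doing that matching in a distributed fashion.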
Virtual World Example • Users move in a 2-D arena in a virtual world • Events: position updates, e.g. a user at (100, 200) • Interests: the region around the user, e.g. 50 ≤ x ≤ 150, 150 ≤ y ≤ 250 (corners (50, 250) and (150, 150))
Mercury goals • Implement distributed publish-subscribe • Support range queries • Avoid hot-spots in the system • Flooding anything is bad • Avoid publication flooding completely • Avoid subscription flooding as much as possible • Consider queries like SELECT * FROM RECORDS • Peer-to-peer scenario • No dedicated brokers • Highly dynamic network
Talk Contents • Mercury Architecture • Overlay construction • Routing guarantees • Overlay properties • How randomness is useful • Load balancing; histogram maintenance • Application Design
Attribute Hubs • One hub per attribute in the system (e.g., name, age, x, y) • Each attribute's value range is divided into bins (e.g., an x-hub over [0, 1000) split at 150, 250, 450, 700, 900) • Each node is responsible for a range of attribute values • Ranges are assigned when a node joins and can change dynamically
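The bin assignment above can be sketched as a sorted list of range boundaries, with a lookup that finds the node responsible for a given attribute value. The boundaries and node names below are illustrative, not taken from the system.

```python
import bisect

# Node i owns values in [boundaries[i], boundaries[i+1]).
# Four hypothetical nodes covering the attribute range [0, 320):
boundaries = [0, 80, 160, 240, 320]
node_ids = ["A", "B", "C", "D"]

def responsible_node(value):
    """Return the node whose range contains `value`."""
    i = bisect.bisect_right(boundaries, value) - 1
    if i < 0 or i >= len(node_ids):
        raise ValueError("value outside the attribute range")
    return node_ids[i]

print(responsible_node(100))  # → B (100 falls in [80, 160))
```

When a node joins or a range changes dynamically, only the boundary list has to be updated; the lookup itself stays the same.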
Routing • A subscription (e.g., Name = *, 100 < X < 180) is sent from its generating point to exactly one hub • Which one? An interesting question in itself! • Determine query selectivity and send to the most selective hub
Routing (contd.) • A publication (e.g., Name = ashwin, Age = 23) must be sent from its generating point to all hubs • This ensures that every matching subscription is found
Routing illustrated • Hx partitions x into [0, 105), [105, 210), [210, 320); Hy partitions y into [0, 80), [80, 160), [160, 240), [240, 320) • The subscription 50 ≤ x ≤ 150, 150 ≤ y ≤ 250 is stored in Hx at the nodes covering its x-range • The publication x = 100, y = 200 is sent to both hubs: to [0, 105) in Hx and [160, 240) in Hy • The Hx node owning [0, 105) is the rendezvous point where publication meets subscription
Hub structure and routing (~Symphony) • Naïve routing along the circle scales linearly • Instead, utilize the small-world phenomenon [Kleinberg 2000]: know thy neighbors and one random person, and you can contact anybody quickly • Each node chooses a long link spanning distance x with probability P(x) = 1/(x ln n) • Routing policy: choose the link that gets you closest to the destination • Performance: average hop count = O(log²(n)/k) with k "random" links • Need to be careful when node ranges are not uniform
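The link distribution and greedy routing policy can be sketched as follows. This is a simplified model of a Symphony-style ring, assuming n nodes with uniform ranges; the parameters and graph construction are illustrative. Sampling a link distance with density 1/(x ln n) is done by inverse-transform sampling, i.e. drawing n^U for uniform U in [0, 1).

```python
import math
import random

random.seed(1)
n, k = 1024, 6  # ring size and number of long links per node (illustrative)

def harmonic_link(n):
    """Sample a link distance x in [1, n) with density proportional to 1/x.

    If U is uniform in [0, 1), then X = n**U has P(X <= x) = ln(x)/ln(n),
    i.e. density 1/(x ln n), matching the slide's link distribution.
    """
    return min(n - 1, max(1, int(math.exp(random.random() * math.log(n)))))

# Each node keeps its clockwise successor plus k harmonic long links.
links = {u: sorted({(u + 1) % n} | {(u + harmonic_link(n)) % n for _ in range(k)})
         for u in range(n)}

def route(src, dst):
    """Greedy routing: always take the link closest (clockwise) to dst."""
    hops, cur = 0, src
    while cur != dst:
        cur = min(links[cur], key=lambda v: (dst - v) % n)
        hops += 1
    return hops

print(route(0, 700))  # typically a small number of hops, vs ~700 naively
```

The successor link guarantees progress every hop, so routing always terminates; the long links are what bring the expected hop count down to O(log²(n)/k).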
Caching • O(log²(n)) hops is good, but each hop is still an application-level hop • Latency can be quite large if the overlay is not optimized • For distributed applications like games, this is far from optimal • Exploit locality in an application's access patterns • In addition to the k "random" links, keep cached links • Cache the nodes that were rendezvous points for recent publications
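A minimal sketch of such a cache, assuming an LRU eviction policy (the slides only say "recent publications", so LRU is an assumption; the class and names are illustrative):

```python
from collections import OrderedDict

class RendezvousCache:
    """Remember which node was the rendezvous point for recent publications."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # attribute value -> rendezvous node

    def record(self, value, node):
        self.cache.pop(value, None)
        self.cache[value] = node
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def lookup(self, value):
        node = self.cache.get(value)
        if node is not None:
            self.cache.move_to_end(value)   # mark as recently used
        return node

c = RendezvousCache(capacity=2)
c.record(100, "nodeA"); c.record(200, "nodeB"); c.record(300, "nodeC")
print(c.lookup(100), c.lookup(300))  # → None nodeC (100 was evicted)
```

On a cache hit, a publication can be sent to the likely rendezvous point in one hop instead of walking O(log²(n)) overlay hops, which is what makes the skewed-workload results below possible.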
Performance (Uniform workload) #long links = 6 #cache links = log(n) Publications were generated from a uniform distribution
Performance (Skewed workload) #long links = 6 #cache links = log(n) Publications were generated from a highly skewed Zipf distribution
Performance (Memory reference trace) #long links = 6 #cache links = log(n) Publications were generated from memory references of SPEC2000 benchmark
Two Problems 1. Load Balancing • A concern because publication values need not follow a uniform, or a priori known, distribution Pr(X = x) • Node ranges are assigned when the nodes join
Problems (contd.) 2. Hub Selectivity • Recall: a subscription (e.g., Name = *, 100 < X < 180) is sent to one "randomly" chosen hub (the Name hub vs. the X hub) • Ideally, it should be sent to the most selective hub • Need to estimate the selectivity of a subscription
Hail randomness • Randomized construction of the network gives additional benefits! • It turns out this network is an expander with high probability • Random walks mix rapidly, i.e., they approach the stationary distribution quickly • Uniform sampling is non-trivial because node ranges are not uniform across nodes • Random walks are an efficient way of sampling • No explicit hierarchy required (as in RanSub [USITS '03]) • In general, several statistics about a very dynamic network can be maintained efficiently
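A sketch of random-walk sampling on an overlay. Since node degrees (and ranges) are not uniform, a plain random walk is biased toward high-degree nodes; one standard fix, used here as an illustration (the slides do not specify the mechanism), is a Metropolis-Hastings correction so the stationary distribution is uniform. The graph below is a toy example.

```python
import random
from collections import Counter

random.seed(7)
# Toy overlay: adjacency lists with non-uniform degrees.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}

def random_walk_sample(start, steps):
    """Walk `steps` hops; accept each move with prob min(1, deg(cur)/deg(next)).

    This Metropolis-Hastings correction makes the walk's stationary
    distribution uniform over nodes despite unequal degrees.
    """
    cur = start
    for _ in range(steps):
        nxt = random.choice(graph[cur])
        if random.random() < min(1.0, len(graph[cur]) / len(graph[nxt])):
            cur = nxt
    return cur

counts = Counter(random_walk_sample(0, steps=20) for _ in range(4000))
print(counts)  # each node sampled roughly equally often
```

Because the overlay is an expander with high probability, short walks (polylogarithmic in n) already mix well, which is what makes this sampling cheap.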
Hub Selectivity (ideas) • Use sampling to build approximate histograms • Approach 1 (Push): each rendezvous point selects publications with a certain probability and sends them off with a specific TTL • A random walk of length log²(n) ensures good mixing • Tradeoff: traffic overhead vs. #publications • Approach 2 (Pull): perform uniform random sampling periodically • Each sample = the histogram of the sampled node • Question: how to combine histograms?
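The combination question is left open above; one plausible answer (an assumption of this sketch, not Mercury's method) is to weight each sampled node's histogram by the width of that node's range, so that nodes owning wide ranges are not under-counted. Bin labels and widths below are illustrative.

```python
from collections import Counter

def merge_histograms(samples):
    """Combine per-node histograms into one estimate for the whole hub.

    samples: list of (range_width, Counter of bin -> count), one per
    uniformly sampled node. Each node's counts are normalized to a
    density over its own range, then weighted by the range width.
    """
    merged = Counter()
    for width, hist in samples:
        total = sum(hist.values())
        for bin_id, count in hist.items():
            merged[bin_id] += width * count / total
    return merged

# Two sampled nodes: one owning a range of width 80, one of width 160.
samples = [
    (80, Counter({"[0,80)": 40})),
    (160, Counter({"[80,160)": 10, "[160,240)": 30})),
]
print(merge_histograms(samples))
```

With such a merged histogram, a node can estimate how many publications a given range predicate would match, which is exactly the selectivity estimate the previous slide asks for.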
Load balancing (ideas) • Sample the "average" load in the system • Utilize the histograms to quickly find high- and low-load areas • Strategy 1: a lightly loaded node gracefully leaves the overlay and re-inserts itself into a heavily loaded area • Strategy 2: use load "diffusion": heavily loaded nodes shed load to neighbors, but only if the neighbor is lightly loaded
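Strategy 2 can be sketched as a local averaging step between a heavy node and a light neighbor. The thresholds, the ring of loads, and the pairwise-averaging rule are all illustrative assumptions; the slides only describe the idea of diffusion.

```python
def diffuse(loads, heavy=1.5, light=0.5):
    """One diffusion round over a ring of node loads.

    A node whose load exceeds `heavy` times the average sheds load to its
    right neighbor, but only if that neighbor is below `light` times the
    average; the pair then splits their combined load evenly.
    """
    avg = sum(loads) / len(loads)
    new = loads[:]
    for i in range(len(loads)):
        j = (i + 1) % len(loads)            # right neighbor on the ring
        if new[i] > heavy * avg and new[j] < light * avg:
            transfer = (new[i] - new[j]) / 2
            new[i] -= transfer
            new[j] += transfer
    return new

print(diffuse([10.0, 1.0, 4.0, 5.0]))  # → [5.5, 5.5, 4.0, 5.0]
```

Because each transfer only involves adjacent nodes, diffusion needs no global coordination; the sampled average load is the only system-wide quantity it relies on.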
Distributed Game Design • Current implementation: Distributed version of the Asteroids game! • Questions: • How is state distributed across the system? • How is consistency handled in the system? • Cheating???
Conclusion • A distributed publish-subscribe system supporting • Range queries • Scalable routing and matching • Randomized network construction • Provides routing guarantees • Also yields an elegant way of sampling in a distributed system • Exports an API for applications • Implemented; deployed on Emulab • Distributed game using Mercury • Almost done • To be deployed on PlanetLab soon