This paper discusses the use of distributed hash table (DHT) based overlay networks in large distributed systems, focusing on their scalability, fault-tolerance, security, reliability, and low maintenance cost. It introduces a new approach for achieving proximity awareness in the overlay network by constructing an auxiliary routing network using AS-level topology derived from BGP reports. The simulation results show close to optimal routing performance compared to previous approaches.
Turning Heterogeneity into an Advantage in Overlay Routing
(To be presented at IEEE Infocom’03)
Zhichen Xu, Mallik Mahalingam, Magnus Karlsson
Internet Systems and Storage Lab, Hewlett-Packard Company
Motivation
• For a large distributed system to function well, it must be scalable, fault-tolerant, secure, and reliable, and have low maintenance cost
• Distributed hash table (DHT) based overlay networks provide a simple abstraction that maps “keys” to “values”
• DHTs can be used in many important applications, which then inherit these properties
  • E.g., distributed storage, DNS, media streaming, web caching, content-based searching, distributed firewalls, etc.
• Several proposals: Pastry, Tapestry, CAN, eCAN, SkipNet, etc.
  • They provide a homogeneous abstraction to the applications, but vary in their logical structures and flexibility
Zhichen Xu
Baseline DHT: a 2-dimensional CAN
[Figure: 2-d CAN; labels: node, zone]
• Cartesian space partitioned into zones
• A node serves as “owner” of a zone
• A key is a “point” in the Cartesian space
• “Value” stored on the node that owns the zone that contains the point (key)
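A minimal sketch of the key-to-owner mapping described on this slide. The names `Zone`, `key_to_point`, and `owner_of`, the use of SHA-1 to derive coordinates, and the three-zone layout are assumptions made for illustration; the slides do not specify them.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular zone [x0, x1) x [y0, y1) of the unit square, owned by one node."""
    owner: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, point):
        x, y = point
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def key_to_point(key):
    """Hash a key to a point in [0, 1) x [0, 1) (hypothetical mapping)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return (x, y)


def owner_of(key, zones):
    """Return the owner of the zone that contains the key's point."""
    point = key_to_point(key)
    for zone in zones:
        if zone.contains(point):
            return zone.owner
    raise LookupError("zones do not cover the point")


# Hypothetical partition of the unit square into three zones.
ZONES = [
    Zone("A", 0.0, 0.0, 0.5, 0.5),
    Zone("B", 0.5, 0.0, 1.0, 0.5),
    Zone("C", 0.0, 0.5, 1.0, 1.0),
]
```

Calling `owner_of("report.pdf", ZONES)` returns the node on which the value for that key would be stored.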
Low maintenance cost & self-organizing…
[Figure: labels: new zone, new node]
• Node join: pick a point and split the zone with the node that currently owns the point
• Node departure: a neighboring node takes over the “state” of the departing node
• These dynamics are hidden from the users and applications!
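The join step can be sketched as follows. This is an illustration under stated assumptions, not the authors' protocol: the zone is halved along its longer side, the new node always receives the upper or right half, and `join` is a hypothetical helper name.

```python
def join(zones, new_node, point):
    """Split the zone containing `point` and give one half to `new_node`.

    `zones` maps owner -> (x0, y0, x1, y1). Returns the previous owner.
    Assumption: split the longer side in half; the new node gets the
    upper/right half regardless of where `point` falls.
    """
    px, py = point
    for owner, (x0, y0, x1, y1) in zones.items():
        if x0 <= px < x1 and y0 <= py < y1:
            break
    else:
        raise LookupError("no zone contains the point")

    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        zones[owner] = (x0, y0, mid, y1)
        zones[new_node] = (mid, y0, x1, y1)
    else:
        mid = (y0 + y1) / 2
        zones[owner] = (x0, y0, x1, mid)
        zones[new_node] = (x0, mid, x1, y1)
    return owner
```

Starting from a single zone covering the unit square, two joins at the point (0.7, 0.2) first split the square vertically and then split the right half horizontally.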
Logical routing
[Figure: route through zones with hops labeled 1, 2, 3]
• Routing: traverse a series of neighboring zones from source to destination
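A small sketch of greedy logical routing. It assumes, for simplicity, that zones are equal-sized cells of an n × n grid without wraparound; `neighbors` and `greedy_route` are hypothetical names.

```python
def neighbors(cell, n):
    """Cells sharing an edge with `cell` in an n x n grid (no wraparound)."""
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def greedy_route(src, dst, n):
    """Forward hop by hop to the neighbor closest (Manhattan) to dst."""
    path = [src]
    current = src
    while current != dst:
        current = min(
            neighbors(current, n),
            key=lambda c: abs(c[0] - dst[0]) + abs(c[1] - dst[1]),
        )
        path.append(current)
    return path
```

Each step moves to a neighboring zone that is one cell closer to the destination, so the number of logical hops equals the Manhattan distance between the two cells.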
Each logical hop can correspond to multiple physical hops
[Figure: logical hops 1, 2, 3 and the corresponding physical hops 1, 2, 3]
• It is important that the structure of the overlay efficiently uses the underlying physical network!
Techniques for achieving proximity awareness
• Within the overlay [Castro et al]
  • Geographic layout, e.g., Topologically-aware CAN
    • Uneven distribution of the nodes
    • Chance of overloading nodes
  • Proximity routing, e.g., Chord
    • Choices limited
    • [Figure: s: source, d: destination, Candidates 1 to 3; label: closest to s]
  • Proximity-neighbor selection, e.g., Pastry, Tapestry, eCAN
    • Routing table entries selected according to a proximity metric among nodes that satisfy the constraint
  • Performance constrained by the logical structure of the default overlay
• Auxiliary networks, e.g., Brocade
  • Construct a secondary overlay network
  • Still use logical routing in the secondary network
  • Pushes the problem to an auxiliary network of a smaller size
  • Dilemma in picking the size of the secondary network
Our contributions
• Decouple the homogeneous abstraction from routing
• Construct an auxiliary routing network using
  • AS-level topology derived from BGP reports
  • A landmark-numbering scheme
• Route advertisement using a “distance vector” algorithm with route summarization to reduce state
• Works with all currently existing overlays
• Simulation results show that our approach can achieve close to optimal routing performance
  • 1.04 to 1.12 times optimal for an Internet-like topology
  • Previous approaches: 2.5 to 5 times optimal for the same topology
Outline
• Motivation
• Related work
• Default overlay network eCAN
• Expressway: unconstrained auxiliary network
  • How does a node find the close-by nodes?
  • How do we control the routing state in the expressway?
  • What can the expressway be used for?
• Experimental results
• Discussions & conclusions
eCAN represents the state of the art
[Figure: CAN zones (order-1 zones)]
K default CAN zones make an order-2 zone
[Figure: order-2 zones]
K order-2 zones make an order-3 zone
[Figure: order-3 zones]
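The nesting of zone orders can be sketched as below. The sketch assumes a 2-d grid of equal order-1 zones in which each higher order groups `side` × `side` zones of the order below, so K = `side`²; `zone_at_order` is a hypothetical helper, not an eCAN API.

```python
def zone_at_order(cell, order, side=2):
    """Return the order-`order` zone containing an order-1 zone.

    `cell` is the (x, y) grid coordinate of the order-1 zone. Each
    order groups side x side zones of the previous order (assumption).
    """
    scale = side ** (order - 1)
    return (cell[0] // scale, cell[1] // scale)
```

With `side=2` (K = 4), the order-1 zones (4, 2) and (5, 3) fall into the same order-2 zone (2, 1).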
High-order routing neighbors
• High-order routing tables are soft-state
• Allows for proximity-neighbor selection
• Neighbor selection based on landmark clustering / controlled data placement
• Topology-aware Chord is equivalent to 1-d eCAN
Expressway definitions & challenges
• Expressway nodes are nodes that have good connectivity and availability
• Expressway nodes connect to other close-by expressway nodes to form a backbone
• Ordinary nodes connect to the closest expressway node
• Traffic goes through the expressway whenever possible
• Challenges:
  • How does a node (ordinary or expressway) find the close-by expressway nodes?
  • How do we control the routing state?
  • What can the expressway be used for?
Landmark clustering
• Related work
  • Landmark ordering [Ratnasamy et al 2002]
  • Coordinate-based [Eugene and Zhang 2001]
• Landmark vector <d1, d2, d3>, where di is the distance to landmark i
• Nodes with similar distances to the landmarks are likely to be close to each other
[Figure: landmark space with axes Landmark1, Landmark2, Landmark3]
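A minimal illustration of the landmark-vector idea. It assumes nodes and landmarks are points in a plane and that measured delay equals Euclidean distance, which is a simplification for the example only; `landmark_vector` and `landmark_distance` are hypothetical names.

```python
import math


def landmark_vector(node, landmarks, delay):
    """Vector of measured distances from `node` to each landmark."""
    return tuple(delay(node, lm) for lm in landmarks)


def landmark_distance(u, v):
    """Euclidean distance between two landmark vectors."""
    return math.dist(u, v)


# Hypothetical setup: three landmarks and three nodes in a plane.
LANDMARKS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
a, b, c = (1.0, 1.0), (2.0, 1.0), (9.0, 9.0)

va = landmark_vector(a, LANDMARKS, math.dist)
vb = landmark_vector(b, LANDMARKS, math.dist)
vc = landmark_vector(c, LANDMARKS, math.dist)
```

Here `a` and `b` are physically close and their landmark vectors are also close, while `c` is far from `a` in both spaces.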
Locating close-by expressway node
• The landmark vector is used as the key to store information about the expressway nodes on the DHT, such that distances in the “landmark space” are preserved
• A node uses its landmark vector to search the DHT to find close-by nodes
• Expressway nodes find and connect to physically close-by expressway nodes to form the expressway network
[Figure: nodes a, b, c in the landmark space (Landmark1, Landmark2, Landmark3) mapped onto the DHT]
But the dimensionality of the landmark space and that of the DHT can be different
[Figure: dimension reduction of nodes a, b, c from the landmark space (Landmark1, Landmark2, Landmark3) to the DHT]
Space-Filling Curves: Hilbert Curve
• Points close to each other in n-d space tend to be mapped to points close to each other in 1-d space, and vice versa
[Figure: Hilbert curve through cells numbered 1 to 8]
Proximity-preserving dimension reduction of landmark vectors: landmark numbering
[Figure: panels (a) and (b) with numbered cells; label: Landmark number]
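The dimension reduction can be sketched with the standard 2-d Hilbert-curve index. This is a simplified illustration: it assumes a two-landmark vector quantized onto an n × n grid (the slides use more landmarks), and `landmark_number`, `max_delay`, and the grid size are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two. Standard iterative conversion.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def landmark_number(vector, max_delay, n=16):
    """Quantize a 2-landmark vector to a grid cell and return its Hilbert index."""
    x, y = (min(n - 1, int(v / max_delay * n)) for v in vector)
    return hilbert_index(n, x, y)
```

Cells that are consecutive along the curve are always adjacent in the grid, which is why nodes with nearby landmark numbers tend to have similar landmark vectors.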
Discussions
• A similar procedure can be used for other overlays
  • For Chord, we use the landmark number as the DHT key and store information about the expressway nodes on a node whose ID is greater than or equal to the landmark number
  • For Tapestry and Pastry, we can use a prefix of the node IDs to partition the logical space into grids
• In summary, our goal is to store expressway node information such that information about close-by nodes is stored close together on the overlay
• In contrast, Pastry, for example, relies on the ability to find the physically closest node at node join and requires message exchanges to fix up the existing routing tables
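For the Chord case, the placement rule can be sketched as a successor lookup. The wraparound to the smallest ID when no node ID is large enough follows the usual Chord ring convention and is an assumption here, as is the helper name `chord_successor`.

```python
import bisect


def chord_successor(node_ids, key):
    """Return the first node ID >= key, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key)
    return ids[i % len(ids)]
```

An expressway node with landmark number 41 would therefore be registered on node 200 in a ring with node IDs 10, 40, and 200.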
Route advertisement with summarization
• An expressway node periodically advertises all local nodes that are in its physical proximity to neighboring expressway nodes
• Same as the standard distance vector algorithm, except that
  • it advertises a summarization of multiple nodes, plus the transport address of one representative node
  • only expressway nodes participate in route advertisement
• Route advertisement messages are controlled with a time-to-live (TTL) expressed as the number of expressway hops
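A simplified, synchronous simulation of TTL-limited distance-vector advertisement among expressway nodes. It is a sketch under assumptions: one round per expressway hop, symmetric link delays, and each node advertising only itself rather than a summary; `advertise` is a hypothetical name.

```python
def advertise(links, ttl):
    """Run `ttl` rounds of distance-vector exchange.

    `links` maps node -> {neighbor: delay}. Returns
    tables[node][dest] = (cost, next_hop), containing only destinations
    reachable within `ttl` expressway hops.
    """
    tables = {node: {node: (0, node)} for node in links}
    for _ in range(ttl):
        # Neighbors advertise the tables they held at the start of the round.
        snapshot = {node: dict(table) for node, table in tables.items()}
        for node, nbrs in links.items():
            for nbr, delay in nbrs.items():
                for dest, (cost, _) in snapshot[nbr].items():
                    new_cost = delay + cost
                    if dest not in tables[node] or new_cost < tables[node][dest][0]:
                        tables[node][dest] = (new_cost, nbr)
    return tables


# Hypothetical chain of four expressway nodes with unit delays.
CHAIN = {
    "A": {"B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1},
}
```

With `ttl=2`, node A learns routes to B and C but not to D, which shows how the TTL bounds the routing state each expressway node keeps.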
Route summarization: aggregate multiple nodes
[Figure: virtual grids numbered 0 to 15; label: representative node]
• For CAN, we partition the Cartesian space into virtual grids
  • Nodes whose zone falls in a virtual grid are summarized by the ID of the virtual grid
  • The pair <GridID, IP of representative node> is propagated
• For Pastry, we can summarize multiple nodes with a nodeID prefix
• For Chord, we can summarize multiple nodes with a nodeID range
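A sketch of the CAN summarization step. The assumptions are labeled in the code: a node is assigned to a virtual grid by its zone center, grid IDs are numbered column by column, and the first node seen in a grid acts as its representative. `grid_id` and `summarize` are hypothetical names.

```python
def grid_id(point, cells_per_side=4):
    """ID of the virtual grid containing `point` in the unit square.

    Assumption: IDs run down each column, then across columns.
    """
    x, y = point
    col = min(cells_per_side - 1, int(x * cells_per_side))
    row = min(cells_per_side - 1, int(y * cells_per_side))
    return col * cells_per_side + row


def summarize(nodes, cells_per_side=4):
    """Collapse (ip, zone_center) pairs into <GridID, representative IP> pairs.

    Assumption: the first node seen in a grid is its representative.
    """
    summary = {}
    for ip, center in nodes:
        summary.setdefault(grid_id(center, cells_per_side), ip)
    return sorted(summary.items())


# Hypothetical nodes: two share virtual grid 0, one sits in grid 15.
NODES = [
    ("10.0.0.1", (0.10, 0.10)),
    ("10.0.0.2", (0.20, 0.05)),
    ("10.0.0.3", (0.90, 0.90)),
]
```

Three nodes are advertised as two pairs, so the routing state grows with the number of occupied virtual grids rather than with the number of nodes.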
Direct route vs. expressway-node forwarding
• Direct route:
  • Requires slightly more storage space to keep the route summary and relies on IP routing
• Expressway-node forwarding:
  • If a node leaves the system, it is less expensive to repair
  • May deliver routing performance better than default IP routing [RON 2001, Detour 1999]
• Ordinary nodes cache addresses of nodes associated with the same expressway node
[Figure: source and dest nodes attached to expressway nodes; label: direct route]
Experimental evaluation: 2-d eCAN as default overlay
• AS topology:
  • 1000 ASes from a total of 13,000 active ASes
  • Assume 100 ms inter-AS delay and 10 ms intra-AS delay
  • A node is assigned to one of the 1000 ASes
• Transit-stub graph using GT-ITM:
  • 10,000 nodes, 228 transit domains, 5 nodes per transit domain, 4 stub domains per transit node, and 2 nodes in each stub domain
  • 100 ms for cross-transit links, 20 ms for links inside a transit domain, 5 ms for links connecting a transit node and a stub node, and 2 ms for links inside a stub domain
• Compare against
  • eCAN with roughly the same amount of state
  • Logical auxiliary: a Brocade-like system that uses a homogeneous auxiliary logical overlay network
eCAN with similar state
• For fairness, we compare with eCAN with similar state
• How do we make use of the additional state?
  • Rather than always routing to the physically closest next-hop candidate, we route to the next hop that brings down the overall delay
[Figure: next-hop candidates labeled 1, 2, 3]
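The difference between the two next-hop policies can be sketched as follows. The slide does not say how the remaining delay is estimated, so `est_remaining` is a hypothetical input, as are the function names and the example numbers.

```python
def closest_next_hop(candidates, delay_to):
    """Pick the physically closest candidate."""
    return min(candidates, key=lambda c: delay_to[c])


def best_overall_next_hop(candidates, delay_to, est_remaining):
    """Pick the candidate that minimizes first-hop delay plus the
    estimated delay from that candidate to the destination."""
    return min(candidates, key=lambda c: delay_to[c] + est_remaining[c])


# Hypothetical delays in milliseconds.
CANDIDATES = ["c1", "c2", "c3"]
DELAY_TO = {"c1": 5, "c2": 20, "c3": 40}
EST_REMAINING = {"c1": 200, "c2": 60, "c3": 70}
```

In this example the closest candidate is `c1`, but `c2` gives the lower end-to-end estimate (80 ms against 205 ms).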
Logical auxiliary
• 0: ordinary nodes advertise themselves on the auxiliary using nodeIDs as keys to store their IP addresses
• 0.5: caching along advertising paths
• 1: contact local super node
• 2: look up the IP address of the destination node
• 3: route to the destination
[Figure: default overlay (nodes) and homogeneous auxiliary overlay network]
Parameters used
• Number of nodes: 512 to 8K (4K as default)
• TTL: 1 to 9 (9 as default)
• Virtual grids: from 1 virtual grid per node to 1 virtual grid per 16 nodes (1 per node as default)
• Number of landmarks: 15
• Fraction of nodes that are expressway nodes: 1/1 to 1/64 (1/10 as default)
• Routing: direct, expressway-node forwarding
• Performance metric: stretch
  • Routing delay / shortest-path delay
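The stretch metric can be computed as in the sketch below, which uses Dijkstra's algorithm for the shortest-path delay. The graph, its delays, and the helper names are hypothetical and only illustrate the ratio.

```python
import heapq
import math


def shortest_delay(graph, src, dst):
    """Shortest-path delay from src to dst (Dijkstra)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    raise ValueError("destination unreachable")


def path_delay(graph, path):
    """Total delay along the hops of an overlay route."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def stretch(graph, path):
    """Routing delay divided by shortest-path delay."""
    return path_delay(graph, path) / shortest_delay(graph, path[0], path[-1])


# Hypothetical physical graph with delays in milliseconds.
GRAPH = {
    "s": {"a": 10, "b": 100},
    "a": {"s": 10, "d": 10},
    "b": {"s": 100, "d": 10},
    "d": {"a": 10, "b": 10},
}
```

A route s, b, d has stretch 5.5 on this graph, while s, a, d has stretch 1.0, the optimum.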
Summary of results
• Expressway produces good average routing performance
• Landmark clustering:
  • For the AS topology, 1.07 times shortest-path routing on average, with individual measurements ranging from 1.04 to 1.12
  • For the transit-stub graph, 1.41 on average, with individual measurements ranging from 1.20 to 1.55 (this can be improved, because ordinary nodes associated with the same expressway node do not establish direct routes among themselves)
• eCAN and homogeneous auxiliary stay between 2.5 and 7 times shortest-path routing
Comparison of various approaches
• Our approach: 1.07 to 1.41 times optimal
• Other approaches: 2.5 to 7 times optimal
[Figure panels: AS topology, Transit-stub graph]
Direct route vs. expressway-node forwarding
• Direct route performs better than expressway-node forwarding, due to shortest-path routing
• Performance of our approach improves as the number of nodes increases
Effect of varying the ratio of expressway nodes in the system
• As the percentage of expressway nodes increases, the expressway better approximates the underlying physical network
• In contrast, “logical auxiliary” cannot take advantage of this
Conclusions
• We propose generic techniques to construct an auxiliary network for DHT-based overlays
  • Decouples routing from the DHT abstraction to take advantage of the heterogeneity that exists in the system
  • Achieves routing performance close to optimal
• The protocol is relatively complicated
• The expressway nodes need to be relatively stable
More about eCAN
• Topology-aware Chord: 1-d eCAN
• High-order zones allow for locality-preserving data placement (SkipNet)
  • Placement of objects can be controlled to preserve locality
  • Machines that belong to certain organizations can be co-located logically
[Figure: nodes grouped into high-order zones 1 to 5]
Varying the number of virtual grids
[Figure panels: 1 node/virtual grid, 4 nodes/virtual grid, 16 nodes/virtual grid]
Example applications
• Distributed storage space
  • Content is hashed with SHA-1 to produce the key
  • Place the <Key, document> pair on top of the DHT
  • Object lookup translates to routing
• Distributed content-based search
  • Controlled placement of document info on the DHT such that documents that are similar in content are co-located
  • Search space is effectively controlled
• It is important that the structure of the overlay efficiently uses the underlying physical network!