Routing as a Service Karthik Lakshminarayanan (with Ion Stoica and Scott Shenker) Sahara/i3 retreat, January 2004
Problem • Applications demand greater flexibility in route selection • Resilience: RON, Tapestry • Performance: Detour • Applications need different routing functionality • Multicast: ESM, Overcast • DDoS defense: SOS, Mayday • Anycast: Gia • Difficult to change any routing-level component in the Internet today!
Current approach • Overlay networks • Layer above IP • Advantage: deployability • Problems: • Ossification: overlay solutions again ossify routing in the protocol; they are hard to modify once deployed on a large scale (lessons from the Internet) • Efficiency: packets are replicated multiple times along a physical link; route construction is inefficient • Lack of control for ISPs: overlay traffic is hard for ISPs to control and can circumvent ISPs' policies
Analogy: multiple route providers for routing in a transportation network [Figure: alternative routes chosen by time taken vs. distance]
Our thesis: push routing out of the infrastructure • Argument for "edge-controlled" routing • Related: NIRA (NewArch group, MIT/ISI) • Our contribution: • Fine-grained control over routing • Control plane for achieving this
System architecture 1. Forwarding infrastructure • Provides basic routing (referred to as default routing) • Exports primitives for inserting routes
System architecture 2. NEWS/Route selector • Aggregates network information • Selects routes on behalf of applications • Performance-based and policy-based routing (spanning multiple ISPs) [Figure: NEWS-1 and NEWS-2 collecting network information]
System architecture 3. End-hosts • Query NEWS to set up paths [Figure: clients A, B, C and D send routing-info queries to NEWS-1/NEWS-2, receive replies, and set up routes]
Architectural position • Separate control plane and data plane by using clean abstractions • Internet & infrastructure overlays: control plane and data plane both in the infrastructure • P2P & end-host overlays: control plane and data plane both at the hosts • Our proposal: data plane in the infrastructure, control plane outside it (host side)
Challenges • Open, multi-provider system (design of primitives) • Unlike intra-domain control, e.g. GSMP • Security: the control provided should not be usable for attacking the system • Trust: between entities of the system, e.g. what information the system gives to NEWS • Large-scale system (route selection) • Scalability: monitoring; service to end-hosts • Stability: route selection should not lead to oscillations • Deployability: ISP control
Infrastructure primitives • Label-switching-like primitive • Allows insertion of forwarding entries $(id_1, id_2)$, where $id_1, id_2$ are labels • id = [ NodeID : LocalID ] • Establishing paths: loose virtual path (LVP) • Composition of label switches: $T = (id_1, id_2, \ldots, id_n)$ is composed as $(id_1, id_2), (id_2, id_3), \ldots, (id_{n-1}, id_n)$ • Construct different topologies • Aggregation can be performed at the level of tunnels that end at infrastructure nodes
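A minimal sketch of the label-switching primitive and LVP composition, assuming a single in-memory forwarding table. The names `Label`, `Infrastructure`, `insert_lvp` and `trace` are hypothetical; a label with several entries stands in for packet replication, and `trace` follows only the first entry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    """Hypothetical label: id = [NodeID : LocalID]."""
    node_id: str
    local_id: int


@dataclass
class Infrastructure:
    # Forwarding entries id1 -> [id2, ...]; several entries model replication.
    table: dict = field(default_factory=dict)

    def insert(self, src: Label, dst: Label) -> None:
        self.table.setdefault(src, []).append(dst)

    def insert_lvp(self, labels: list) -> None:
        """Compose T = (id1, ..., idn) as (id1, id2), ..., (id_{n-1}, id_n)."""
        for a, b in zip(labels, labels[1:]):
            self.insert(a, b)

    def trace(self, start: Label) -> list:
        """Follow the first forwarding entry at each hop until none remains."""
        path, cur, seen = [start], start, {start}
        while cur in self.table:
            cur = self.table[cur][0]
            if cur in seen:  # guard against forwarding loops
                raise ValueError("forwarding loop")
            seen.add(cur)
            path.append(cur)
        return path


infra = Infrastructure()
lvp = [Label("A", 1), Label("B", 7), Label("C", 3)]
infra.insert_lvp(lvp)
hops = [lbl.node_id for lbl in infra.trace(lvp[0])]  # ['A', 'B', 'C']
```

Only the pairwise entries are stored, so the infrastructure never needs to know the full path; that is what lets a third party assemble arbitrary topologies from the same primitive.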
1. Trust • Infrastructure provides network information to NEWS • Verification: NEWS should be able to verify this information • Indirect measurement techniques using the primitive alone • Metrics: delay, loss, bandwidth
1. Trust (continued) • NEWS provides routes across the network to clients • Verification: the network verifies their correctness
2. Scalability • Monitoring: • Monitor a subset of links • Update period depends on stability (exploit link stationarity) • For example, updates can be sent when the metric on a link changes by a factor of x • Computation: • Incremental computation of best paths • Multiple paths are returned • Querying: • Default paths are used if special routing is not needed • Hierarchical dissemination • Caching of results: TTL chosen to reflect stability of paths
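A sketch of two of the scalability knobs above: suppressing link updates unless the metric changes by a factor of x, and caching route query results with a TTL. Class names, the default x = 1.5 and the injectable clock are assumptions for illustration only.

```python
import time


class LinkMonitor:
    """Report a link metric to NEWS only when it changes by more than a
    factor x relative to the last reported value (x is hypothetical)."""

    def __init__(self, x: float = 1.5):
        self.x = x
        self.reported = {}

    def observe(self, link, value: float) -> bool:
        last = self.reported.get(link)
        if last is None or value > last * self.x or value < last / self.x:
            self.reported[link] = value
            return True   # update sent
        return False      # change too small; update suppressed


class TTLCache:
    """Cache route query results; the TTL would reflect path stability."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}

    def put(self, key, value, ttl: float) -> None:
        self.entries[key] = (value, self.clock() + ttl)

    def get(self, key):
        item = self.entries.get(key)
        if item is None or self.clock() >= item[1]:
            self.entries.pop(key, None)  # expired or absent
            return None
        return item[0]


mon = LinkMonitor(x=1.5)
sent = [mon.observe(("n1", "n2"), d) for d in (40.0, 45.0, 70.0, 50.0)]

now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put(("A", "D"), ["A", "B", "D"], ttl=30.0)
now[0] = 10.0
hit = cache.get(("A", "D"))
now[0] = 31.0
miss = cache.get(("A", "D"))
```

Here only the first and third delay samples trigger updates, since 45 and 50 ms stay within a factor of 1.5 of the last reported value.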
3. Deployment • Infrastructure nodes • Hosted at certain points within ISPs • NEWS/Route selection • 3rd party provider like Akamai • Few in number • Determined by application requirements • Trust relations • NEWS trusts infrastructure for information (verifiable) • ISPs trust paths that NEWS returns (verifiable) • Export links that obey the underlying policy constraints
Implementation status • i3 primitives for setting up forwarding state • Distributed NEWS implemented • Route computation based on delay, loss and bandwidth • Deployed on PlanetLab • i3 proxy has been modified to query NEWS • Legacy applications can be used with NEWS
Summary of results • Verification of measurement techniques • Delay: 97% of cases have error < 10% • Loss rate: 90% accuracy in over 80% of the cases (links with loss rate > 0.1%) • Bandwidth: within a factor of 1.5 in 60% of cases • Scalability of monitoring • Simulation-based • Logarithmic-degree graph • 90th-percentile RDP (relative delay penalty) of 2.3 for delay-based routing on a 16384-node transit-stub topology (TS-16384)
Summary • Routing control pushed outside the infrastructure • Routes computed by third-party entities (NEWS) using measurement information provided by the infrastructure • Leads to "evolvable" networks • Deploy new routing schemes or optimize existing routing without changing the infrastructure
NEWS: Round-trip delay • To measure: RTT(n1→n2) from measurement point R • Use the path selection primitive to send packet m along R→n1→R • Use path selection in conjunction with packet replication to send a copy $m_1$ along R→n1→n2→n1→R • The difference between the two arrival times yields the RTT of the link (n1↔n2)
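The arithmetic behind the RTT probe can be sketched with a table of hypothetical one-way delays: the shared R→n1 and n1→R legs cancel, leaving the n1→n2→n1 round trip. The delay values and function names are made up for illustration.

```python
def path_delay(delays, hops):
    """Sum one-way delays (ms) along consecutive hops."""
    return sum(delays[(a, b)] for a, b in zip(hops, hops[1:]))


def measure_rtt(delays, r, n1, n2, t_send=0.0):
    """RTT(n1<->n2) as the difference of arrival times of m and its copy m1."""
    t_m = t_send + path_delay(delays, [r, n1, r])            # m:  R->n1->R
    t_m1 = t_send + path_delay(delays, [r, n1, n2, n1, r])   # m1: R->n1->n2->n1->R
    return t_m1 - t_m


# Hypothetical one-way delays in milliseconds.
delays = {("R", "n1"): 12.0, ("n1", "R"): 13.0,
          ("n1", "n2"): 20.0, ("n2", "n1"): 22.0}
rtt = measure_rtt(delays, "R", "n1", "n2")  # 42.0 ms = 20 + 22
```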
NEWS: Measuring loss rate • To measure: loss(n1→n2) from measurement point R • Forwarding links: • $(n1_1 \to n2_1)$ • $(n1_1 \to R)$ • $(n2_1 \to n1_2)$ • $(n2_1 \to R)$ • $(n1_2 \to R)$
NEWS: Measuring loss rate (continued) • To measure: loss(n1→n2) • $m_2$ is used to differentiate loss on (n1→n2) from loss on (n2→n1) • $m \wedge \neg m_1 \wedge \neg m_2$ ⇒ loss on virtual link (n1→n2) • False positives and false negatives are possible • Probability of false positives/negatives ≈ $O(p^2)$
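A Monte Carlo sketch of the replicated loss probe, assuming independent per-packet loss on each virtual link with hypothetical probabilities. Copy m returns from n1, $m_1$ from n2, and $m_2$ via n2→n1→R; the slide's rule flags a loss on (n1→n2) when m arrives but neither copy does. A false positive needs two other losses at once, which is why its probability is second-order in the per-link loss rate.

```python
import random


def infer_loss_n1_n2(m: bool, m1: bool, m2: bool) -> bool:
    """Slide's rule: m AND NOT m1 AND NOT m2 => loss on virtual link (n1->n2)."""
    return m and not m1 and not m2


def send_probe(loss, rng):
    """One replicated probe; `loss` maps virtual links to loss probabilities."""
    def ok(link):
        return rng.random() >= loss[link]

    if not ok(("R", "n1")):
        return False, False, False
    m = ok(("n1", "R"))                        # copy returned directly by n1
    if not ok(("n1", "n2")):
        return m, False, False
    m1 = ok(("n2", "R"))                       # copy returned by n2
    m2 = ok(("n2", "n1")) and ok(("n1", "R"))  # copy returned via n1 again
    return m, m1, m2


def estimate_loss(loss, trials=100_000, seed=1):
    """Fraction of probes that reached n1 (m received) but were lost on n1->n2."""
    rng = random.Random(seed)
    received = lost = 0
    for _ in range(trials):
        m, m1, m2 = send_probe(loss, rng)
        if m:
            received += 1
            lost += infer_loss_n1_n2(m, m1, m2)
    return lost / received


loss = {("R", "n1"): 0.01, ("n1", "R"): 0.01, ("n1", "n2"): 0.05,
        ("n2", "R"): 0.01, ("n2", "n1"): 0.01}
est = estimate_loss(loss)  # close to the true 0.05
```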
NEWS: Available bandwidth • Delay-based bandwidth measurement (TCP Vegas-like) • Increase the sending rate until an increase in delay is seen (congestion window cwd = 1, 2, 4, …) • T = received time − sent time; T′ = smallest RTT seen thus far [Figure: probe windows sent from R through n1 and n2; location of the bottleneck unknown]
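A Vegas-like probing loop can be sketched as follows: double cwd, compare each window's T against T′, and stop at the first noticeable delay increase. The 10% threshold and the queueing model in `make_link_model` (base RTT 50 ms, 0.5 packets/ms available) are assumptions, not values from the slides.

```python
def probe_available_bw(measure_rtt, max_cwd=1024, threshold=0.1):
    """Double cwd until T exceeds T' (smallest RTT so far) by `threshold`;
    return the last rate (packets per unit time) sent without queueing."""
    t_min = None   # T' = smallest RTT seen thus far
    cwd, estimate = 1, 0.0
    while cwd <= max_cwd:
        t = measure_rtt(cwd)   # T = received time - sent time
        t_min = t if t_min is None else min(t_min, t)
        if t > t_min * (1 + threshold):
            break              # delay increase: queue building at the bottleneck
        estimate = cwd / t
        cwd *= 2
    return estimate


def make_link_model(base_rtt=50.0, avail=0.5):
    """Hypothetical bottleneck: RTT (ms) grows once cwd exceeds avail * base_rtt."""
    def rtt(cwd):
        backlog = max(0.0, cwd - avail * base_rtt)
        return base_rtt + backlog / avail
    return rtt


est = probe_available_bw(make_link_model())  # 16 / 50 = 0.32 packets/ms
```

Because cwd doubles, the estimate lands between half the true available bandwidth and the true value (here 0.32 vs. 0.5 packets/ms); a finer search after the first delay increase would tighten it.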
NEWS: Available bandwidth (continued) • Use packet replication to identify whether the bottleneck is on (n1→n2) or not (cwd = 2, 3) • T = received time − sent time
NEWS: Bottleneck bandwidth • Packet-pair-like technique [Figure: packets 1 and 2 cross the bottleneck and arrive at R separated by d]
NEWS: Bottleneck bandwidth (continued) • With replication, the dispersion grows to $d_1 > d$ • $\mathrm{BBW} = k \cdot p / d_1$, where k is the degree of replication • The greater the degree of replication, the greater the possibility of error • Intervening packets would affect this
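The formula $\mathrm{BBW} = k \cdot p / d_1$ as a small helper. The slides do not define p; this sketch assumes it is the probe packet size in bits and that $d_1$ is the measured dispersion in seconds. With k = 1 it reduces to the classic packet-pair estimate p / d.

```python
def bottleneck_bw(k: int, p_bits: float, d1_s: float) -> float:
    """BBW = k * p / d1 (assumed units: p in bits, d1 in seconds -> bits/s)."""
    if k < 1 or d1_s <= 0:
        raise ValueError("need k >= 1 and a positive dispersion")
    return k * p_bits / d1_s


# Hypothetical probe: 1500-byte packets, replication degree 2, 2.4 ms dispersion.
bw = bottleneck_bw(2, 1500 * 8, 0.0024)  # about 10 Mbit/s
```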
1. Trust • Problem: Verify network information (delay, loss, bandwidth) provided by the network • Partial trust relations between the third party (NEWS) that computes routes and the infrastructure • Solution: Ability to measure network characteristics using the simple label-switching primitive alone • Infrastructure cannot differentiate data packets and measurement packets
2. Security • Problem: to prevent construction of illegitimate forwarding graphs (e.g. loops) using the primitives • Implicit mechanisms: • Cryptographic constraints on successive forwarding labels (described in Secure-i3) • Protect against forming loops and confluences in the forwarding graph • Explicit mechanisms: • NEWS servers ensure that computed paths are legal • NEWS signs the paths that it returns • Infrastructure trusts NEWS and inserts the signed paths • Infrastructure can verify the validity of the paths that NEWS returns
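A sketch of the explicit mechanism: NEWS checks a path for loops, tags it, and the infrastructure verifies the tag before inserting entries. As a stand-in for a real signature, this uses an HMAC over a shared key from Python's standard `hmac` module; a deployed system would more likely use public-key signatures. The key and label strings are hypothetical, and confluences between different paths need a global check that is not shown.

```python
import hashlib
import hmac
import json


def is_legal(path) -> bool:
    """Reject paths that are too short or revisit a label (a loop)."""
    return len(path) >= 2 and len(set(path)) == len(path)


def sign_path(key: bytes, path) -> str:
    """NEWS side: tag the label sequence (HMAC stands in for a signature)."""
    msg = json.dumps(path).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify_path(key: bytes, path, tag: str) -> bool:
    """Infrastructure side: accept the path only if the tag matches."""
    return hmac.compare_digest(sign_path(key, path), tag)


KEY = b"hypothetical-shared-key"
path = ["n1:1", "n2:1", "n3:4"]
tag = sign_path(KEY, path) if is_legal(path) else None
accepted = verify_path(KEY, path, tag)
tampered = verify_path(KEY, ["n1:1", "n3:4", "n2:1"], tag)
```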
Scalability • Multiple vantage points (e.g. NEWS-1, NEWS-2) for measurements/monitoring • Maintain a subset of links • Division of the overlay graph to reflect underlying paths
Scalability: 2-level hierarchy • Random partitioning of nodes into buckets • Maintain a few edges within the same bucket • Maintain a few edges to every other bucket • If bucket size is √N, each measurement point is responsible for only O(√N) links
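One way to read the 2-level hierarchy is sketched below, assuming one measurement point per bucket. Each node links to `intra` random peers in its bucket, and each bucket keeps `inter` links to every other bucket. With about √N nodes per bucket and about √N buckets, a bucket monitors at most intra·√N + inter·(√N − 1) links, i.e. O(√N). The function name and parameters are hypothetical.

```python
import math
import random


def hierarchical_random_graph(n, intra=2, inter=1, seed=0):
    """Randomly partition n nodes into buckets of ~sqrt(n) nodes and return
    (buckets, links), where links[b] is the set of links bucket b monitors."""
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)
    size = max(1, math.isqrt(n))
    buckets = [nodes[i:i + size] for i in range(0, n, size)]
    links = {b: set() for b in range(len(buckets))}
    for b, members in enumerate(buckets):
        for u in members:                      # a few edges within the bucket
            peers = [v for v in members if v != u]
            for v in rng.sample(peers, min(intra, len(peers))):
                links[b].add(frozenset((u, v)))
        for c, other in enumerate(buckets):    # a few edges to every other bucket
            if c != b:
                for _ in range(inter):
                    links[b].add(frozenset((rng.choice(members), rng.choice(other))))
    return buckets, links


buckets, links = hierarchical_random_graph(1024)
busiest = max(len(s) for s in links.values())  # at most 2*32 + 31 = 95
```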
Implementation status • i3 primitives are used as the infrastructure primitives • Distributed NEWS is implemented and can perform route computation based on delay, loss and bandwidth • i3 proxy has been modified to query NEWS • Legacy applications can be used with NEWS
Evaluation • Effectiveness of indirect measurements • PlanetLab experiments • Scalability techniques
NEWS: Delay estimation • More than 97% of the samples have error < 10% • Taking the median over 10 consecutive samples, 99.3% of the samples have error < 10%
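A sketch of the two accuracy figures: the fraction of estimates with relative error below 10%, computed on raw samples and on medians over 10 consecutive samples. Whether the slides use sliding or non-overlapping windows is not stated; this sketch assumes non-overlapping windows, and the sample values are made up.

```python
from statistics import median


def fraction_within(estimates, truth, tol=0.10):
    """Fraction of estimates whose relative error vs. truth is below tol."""
    ok = sum(abs(e - truth) / truth < tol for e in estimates)
    return ok / len(estimates)


def median_of_windows(samples, w=10):
    """Median of each non-overlapping run of w consecutive samples."""
    return [median(samples[i:i + w]) for i in range(0, len(samples) - w + 1, w)]


# Hypothetical delay estimates (ms) against a directly measured 50 ms;
# one outlier (62 ms) per window of 10.
samples = [49.0, 51.0, 50.5, 48.0, 62.0, 50.0, 49.5, 50.2, 51.1, 49.8] * 3
raw = fraction_within(samples, 50.0)                          # 0.9
smoothed = fraction_within(median_of_windows(samples), 50.0)  # 1.0
```

The median discards the occasional outlier, which is consistent with the jump from 97% to 99.3% reported above.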
NEWS: Loss-rate estimation • Accuracy of 90% in over 80% of the cases that have a loss rate above 0.1% • Performs well in identifying highly lossy links
NEWS: Available-bandwidth estimation • Relative error < 0.5 in 60% of the cases • Underestimates for Far-Far • Overestimates for Far-Close in some cases • Compared against stable TCP bandwidth • Measurement points are classified by the distance of the two targets from the source of measurement (e.g. Far-Far, Far-Close)
Scalability • Delay-based route selection • 90th-percentile RDP: 2.33 (HRG), 3.74 (RG), 1.16 (PRG) • RG = random graph; PRG = proximity random graph; HRG = hierarchical random graph • Transit-stub network, 16384 nodes • Average node degree = 20
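A sketch of how a 90th-percentile RDP can be computed for delay-based route selection. Here RDP is taken as the shortest-path delay over the monitored overlay graph divided by the direct delay, and the percentile uses the nearest-rank method. The four-node topology and its delays are hypothetical.

```python
import heapq
import math


def dijkstra(adj, src):
    """Shortest-path delays from src over an adjacency list {u: [(v, w), ...]}."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist


def rdp_percentile(delay, overlay_edges, q=0.9):
    """Nearest-rank q-quantile of RDP(s, t) = overlay delay / direct delay."""
    adj = {}
    for u, v in overlay_edges:
        adj.setdefault(u, []).append((v, delay[(u, v)]))
        adj.setdefault(v, []).append((u, delay[(v, u)]))
    ratios = []
    for s in sorted(adj):
        dist = dijkstra(adj, s)
        ratios += [dist[t] / delay[(s, t)] for t in dist if t != s]
    ratios.sort()
    return ratios[max(1, math.ceil(q * len(ratios))) - 1]


# Hypothetical nodes on a line; direct delay = distance.
pos = {"A": 0, "B": 10, "C": 20, "D": 30}
delay = {(u, v): abs(pos[u] - pos[v]) for u in pos for v in pos if u != v}
edges = [("A", "B"), ("B", "D"), ("D", "C")]  # monitored overlay links only
p90 = rdp_percentile(delay, edges)  # 3.0: B->C must detour via D
```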