Routing Economics under Big Data
Murat Yuksel (yuksem@cse.unr.edu)
Computer Science and Engineering, University of Nevada – Reno, USA
Outline
• Routing in a Nutshell
• BigData Routing Problems
  • Economic Granularity
  • Routing Scalability
  • Bottleneck Resolution
• Summary
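Routing in a Nutshell
[Figure: AS-level route from a local network to Google across the ISPs Broad-Band One, AT&T (Tier-1), Level3 (Tier-1), and Pac-Net; NSHE also appears, and the tier labels shown are Tier-1, Regional, and Local.]
• URL: http://www.youtube.com
• IP Address: 74.125.224.169
• IP Prefix: 74.125.224/24
• Path: 1951-7018-3356-10026-15169
AS paths shown along the route, growing by one AS per hop away from Google:
• 15169 (Google)
• 10026-15169
• 3356-10026-15169
• 7018-3356-10026-15169
• 1951-7018-3356-10026-15169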
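The AS paths above grow by one AS number per hop as the advertisement for 74.125.224/24 travels away from Google. A minimal Python sketch of that path-vector prepending, including the usual loop check, might look as follows. The function names `advertise` and `fmt` and the hard-coded chain of ASes are illustrative assumptions, not something the slides define.

```python
def advertise(as_path, receiver_asn):
    """Return the path the receiver would re-advertise, or None on a loop."""
    if receiver_asn in as_path:
        return None  # path-vector loop prevention: never accept our own ASN
    return [receiver_asn] + as_path


def fmt(as_path):
    """Render an AS path in the dash-separated form used on the slide."""
    return "-".join(str(asn) for asn in as_path)


# Chain of ASes between Google (15169) and the edge, as read from the slide.
chain = [10026, 3356, 7018, 1951]

path = [15169]          # Google originates the prefix
advertised = [path]
for asn in chain:
    path = advertise(path, asn)
    advertised.append(path)

for p in advertised:
    print(fmt(p))
```

Each AS only sees the list of ASes on the path, which is the "partial network information" that keeps inter-domain routing scalable.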
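Routing in a Nutshell
[Figure: ISP hierarchy. The Internet core consists of backbone (Tier-1) ISPs (Level3, AT&T, Cogent); below it are regional ISPs and local ISPs, with NSHE and SBC shown. The legend marks customer/provider links, peer/peer links, and a customer cone.]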
Routing in a Nutshell
[Figure labels: AT&T (AS 7018), Level3 (AS 3356), Broad-Band One (AS 1951)]
• Inter-domain routing among ISPs:
  • Single metric (number of hops, ISPs)
  • Partial network information
  • Scalable
• Intra-domain routing within an ISP network:
  • Multi-metric (delay, bandwidth, speed, packet loss rate, …)
  • Computationally heavy
  • Complete network information in terms of links
  • Not scalable for large networks
What if the flow is big? Real big?
[Figure: a flow between Big-Data Alice and Big-Data Bob over the ISPs Google, Pac-Net, Level3, AT&T, Broad-Band One, and NSHE.]
• A few Mb/s: negligible flow cost/value
• 100+ Gb/s: flow-aware economics?
Problem 1: Economic Granularity
[Figure labels: ATT (AS 7018), Level3 (AS 3356), NSHE (AS 3851), Anywhere]
• Point-to-anywhere
• Not automated, rigid SLA (6+ months…)
• Transit service seen as a commodity
• Value sharing structure: the edge gets all the money
Contract Routing Architecture
• An ISP is abstracted as a set of “contract links”
• Contract link: an advertisable contract
  • between peering/edge points i and j of an ISP
  • with flexibility of advertising different prices for edge-to-edge (g2g) intra-domain paths
• Contract components
  • performance component, e.g., capacity
  • financial component, e.g., price
  • time component, e.g., term
• Capability of managing value flows at a finer granularity than point-to-anywhere deals
Global Internet 2008
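The contract-link abstraction can be pictured as a small record with the three components listed above. The sketch below is only an illustration: the class name `ContractLink`, the field names, the helper `offers_between`, and every number in it are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractLink:
    """One advertisable edge-to-edge (g2g) contract inside an ISP."""
    isp: str
    ingress: int            # edge point i
    egress: int             # edge point j
    capacity_mbps: float    # performance component
    price_usd: float        # financial component
    term_minutes: float     # time component


# An ISP abstracted as a set of contract links (all values hypothetical).
isp_a = {
    ContractLink("A", 1, 2, capacity_mbps=30.0, price_usd=2.0, term_minutes=30.0),
    ContractLink("A", 1, 3, capacity_mbps=10.0, price_usd=3.0, term_minutes=20.0),
}


def offers_between(links, i, j):
    """Return the contract links advertised between edge points i and j."""
    return [link for link in links if (link.ingress, link.egress) == (i, j)]


print(offers_between(isp_a, 1, 2))
```

Because each edge-to-edge pair carries its own price and term, the same ISP can price the 1-2 path differently from the 1-3 path, which is what a point-to-anywhere deal cannot express.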
G2G Set-of-Links Abstraction
• Can change things a lot, even for small scenarios
G2G Set-of-Links Abstraction
• Max Throughput Routing
[Plot: average over 50 random topologies]
ICC 2012
G2G Set-of-Links Abstraction
• Min Delay Routing
[Plots: average over 50 random topologies; average over 50 BRITE topologies]
ICC 2012
Path-Vector Contract Routing
[Figure: User X at ISP A sends path requests toward node 5 in ISP C. ISPs A, B, and C are connected through nodes 1 to 5; requests travel outward and replies travel back to the user.]
Path requests, ordered by the path vector accumulated so far:
• [5, 10-30Mb/s, 15-45mins, $10]
• [5, A, 1-2, 15-30Mb/s, 15-30mins, $8]
• [5, A-B, 1-2-4, 15-20Mb/s, 20-30mins, $4]
• [5, A, 1-3, 5-10Mb/s, 15-20mins, $7]
Replies:
• [A-B-C, 1-2-4-5, 20Mb/s, 30mins]
• [A-C, 1-3-5, 10Mb/s, 15mins]
Paths to 5 are found and ISP C sends replies to the user with two specific contract-path-vectors.
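One way to read the request messages on this slide is that each ISP appends itself to the path vector, narrows the acceptable capacity and term ranges, and reduces the remaining budget. The slides do not state the exact narrowing rule, so the sketch below simply assumes interval intersection and a per-link price; `PathRequest`, `Offer`, `forward`, and the offer values are hypothetical, chosen so that the A-B branch matches the numbers shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[float, float]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Overlap of two closed intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


@dataclass(frozen=True)
class PathRequest:
    dest: int
    isps: Tuple[str, ...]       # ISP-level path vector so far
    nodes: Tuple[int, ...]      # edge-point path so far
    capacity_mbps: Range
    term_mins: Range
    budget_usd: float


@dataclass(frozen=True)
class Offer:                    # a contract link an ISP is willing to sell
    isp: str
    ingress: int
    egress: int
    capacity_mbps: Range
    term_mins: Range
    price_usd: float


def forward(req: PathRequest, offer: Offer) -> Optional[PathRequest]:
    """Extend the request over one contract link, or drop it (None)."""
    if offer.isp in req.isps:
        return None                               # path-vector loop prevention
    cap = intersect(req.capacity_mbps, offer.capacity_mbps)
    term = intersect(req.term_mins, offer.term_mins)
    budget = req.budget_usd - offer.price_usd
    if cap is None or term is None or budget < 0:
        return None                               # contract cannot be satisfied
    nodes = req.nodes or (offer.ingress,)
    return PathRequest(req.dest, req.isps + (offer.isp,),
                       nodes + (offer.egress,), cap, term, budget)


# User X's initial request: [5, 10-30Mb/s, 15-45mins, $10]
req = PathRequest(5, (), (), (10, 30), (15, 45), 10)
via_a = forward(req, Offer("A", 1, 2, (15, 30), (15, 30), 2))    # hypothetical offer
via_ab = forward(via_a, Offer("B", 2, 4, (0, 20), (20, 60), 4))  # hypothetical offer
print(via_a)
print(via_ab)
```

With these assumed offers the request becomes [5, A, 1-2, 15-30Mb/s, 15-30mins, $8] and then [5, A-B, 1-2-4, 15-20Mb/s, 20-30mins, $4]. The 1-3 branch on the slide (5-10Mb/s) is not reproduced by plain intersection, so the actual rule in PVCR may differ.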
Results – Path Exploration
• Over 80% path exploration success ratio even at 50% discovery packet filtering, thanks to the diversity of Internet routes.
• With locality, PVCR achieves near 100% path exploration success.
• As the budget increases with bTTL and MAXFWD, PVCR becomes robust to filtering.
GLOBECOM 2012
Results – Traffic Engineering
• PVCR provides end-to-end coordination mechanisms.
• No hot-spots or network bottlenecks
ICC 2012
Problem 2: Routing Scalability
• Routing scalability is a burning issue!
  • Growing routing state and computational complexity
    • Timely lookups are harder to do
    • More control plane burden
  • Growing demand on
    • Customizable routing (VPN)
    • Higher forwarding speeds
    • Path flexibility: policy, quality
Problem 2: Routing Scalability
• Cost of routing unit traffic is not scaling well
• Specialized router designs are getting costlier, currently > $40K
• BigData flows mean more packets at faster speeds
• How to scale routing functionality to BigData levels?
Offload the Complexity to the Cloud?
• Cloud services are getting abundant
• Closer:
  • Delay to the cloud is reducing [CloudCmp, IMC’10]
  • Bandwidth to the cloud is increasing
• Cheaper:
  • CPU and memory are becoming commodity at the cloud
  • Cloud prices are declining
• Computational speedups via parallelization
• Scalable resources, redundancy
CAR: Cloud-Assisted Routing
• Goal: To mitigate the growing routing complexity by offloading it to the cloud
• Research question: If we maintain the full router functionality at the cloud but only partial functionality at the router hardware, can we solve some of the routing scalability problems?
[Figure: a cloud providing CAR services to many routers. Proxy Router X (software with full routing functions) runs in the cloud; Router X (hardware with partial routing functions) exchanges updates and packets with it.]
• Use the valuable router hardware for the most used prefixes and the most urgent computations. Amdahl’s Law in action!
CAR: An Architectural View
[Figure: design space of routing platforms. Vertical axis: Scalability (packets/sec). Horizontal axis: Flexibility (# of configuration parameters), with finer programmability from per interface to per flow to per packet. Region labels: Specialized HW, Hybrid SW/HW, Pure SW. Platforms plotted: Specialized ASIC (Cisco Catalyst Series), NetFPGA, OpenFlow, CAR, PacketShader [25], SwitchBlade [17], RouteBricks [7], RCP [8], Cisco CSR [10], Click. Annotations: "Barrier being pushed", "More Platform Dependence".]
CAR: A Sample BGP Peering Scenario
[Figure: BGP peer establishment scenario]
• Step 1: Table exchange between proxies
• Step 2: ORF list exchange between routers and proxies
• Step 3: Only selected prefixes exchanged initially between routers
BGP peer establishment:
• 400K prefixes exchanged (full table)
• Takes approx. 4-5 minutes
• Only 4K prefixes selected as best path
BGP peer establishment w/ CAR:
• 4K prefixes provided to routers
• Outbound Route Filtering (RFC 5291)
• Takes approx. 1-2 minutes
• Only selected alternative paths out of 400K installed later
CAR’s CPU Principle: Keep the control plane closer to the cloud! Offload heavy computations to the cloud.
CAR: A Sample BGP Peering Scenario
• Potential for 5x speed-up and 5x reduction of CPU load during BGP peer establishment
BGP peer establishment:
• 400K prefixes exchanged (full table)
• Takes approx. 4-5 minutes
• Peak CPU utilization
• Only 4K prefixes selected as best path
CAR: Caching and Delegation
[Figure: a full FIB and full RIB feed a partial FIB and partial RIB through regular updates and replacement, exploiting temporal and prefix continuity/spatiality. Traffic is served by the partial FIB and partial RIB.]
CAR: Caching and Delegation
[Figure: traffic arrives at the partial FIB / partial RIB. Hit (99.9%) or miss (0.1%); a miss is handled by one of two options.]
• 1st option: Traffic into large buffers (150 ms); resolve next hop from the cloud proxy
• 2nd option: Reroute traffic to the cloud proxy via tunnels
• Revisiting Route Caching: The World Should Be Flat, PAM 2009
  • One tenth of the prefixes account for 97% of the traffic
  • One fourth of the FIB can achieve a 0.1% miss rate
  • LRU replacement of cache
CAR’s Memory Principle: Keep the data plane closer to the router. Keep the packet forwarding operations at the router to the extent possible.
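The caching side of this slide can be sketched as a small LRU cache of prefixes in front of a full table. The Python below is an assumption-laden illustration, not the CAR implementation: the class name `PartialFib` is made up, a plain dictionary stands in for the cloud proxy's full FIB, the linear-scan prefix match is only for readability, and the prefixes are assumed not to overlap (caching overlapping prefixes needs extra care so that a cached short prefix does not hide a more specific one).

```python
from collections import OrderedDict
import ipaddress


class PartialFib:
    """LRU cache of FIB entries; misses are resolved from a full table."""

    def __init__(self, capacity, full_fib):
        self.capacity = capacity
        self.full_fib = full_fib          # stand-in for the cloud proxy's full FIB
        self.cache = OrderedDict()        # prefix -> next hop, oldest first
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _match(table, addr):
        """Longest-prefix match by linear scan (fine for a sketch)."""
        best = None
        for prefix in table:
            if addr in prefix and (best is None or prefix.prefixlen > best.prefixlen):
                best = prefix
        return best

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        prefix = self._match(self.cache, addr)
        if prefix is not None:
            self.hits += 1
            self.cache.move_to_end(prefix)        # mark as most recently used
            return self.cache[prefix]
        self.misses += 1
        prefix = self._match(self.full_fib, addr)  # resolve via the full table
        if prefix is None:
            return None
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)         # evict least recently used
        self.cache[prefix] = self.full_fib[prefix]
        return self.cache[prefix]


# Hypothetical full table with three non-overlapping prefixes.
full = {
    ipaddress.ip_network("74.125.224.0/24"): "if0",
    ipaddress.ip_network("10.0.0.0/8"): "if1",
    ipaddress.ip_network("192.0.2.0/24"): "if2",
}
fib = PartialFib(capacity=2, full_fib=full)
for dst in ["74.125.224.169", "74.125.224.169", "10.1.1.1", "192.0.2.1"]:
    print(dst, fib.lookup(dst))
print("hits", fib.hits, "misses", fib.misses)
```

In a real router the miss branch is where the two options on the slide apply: either buffer the packet while the entry is fetched, or tunnel the packet to the cloud proxy.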
Problem 3: Bottleneck Resolution
• BigData flows run over long time scales
  • A few mins: fixed network behavior, fixed bottlenecks
  • Several hours: dynamic network behavior, moving bottlenecks
• We need to respond to network dynamics and resolve bottlenecks as the BigData flows run!
Where is the Bottleneck?
• Intra-Node Bottlenecks
[Figure: a source end-system and a destination end-system, each with multiple CPUs, NICs, and disks, connected through relay nodes across the Internet. Two cases are contrasted.]
• Multiple parallel streams with inter-node network optimizations, but ignoring intra-node bottlenecks
• Truly end-to-end multiple parallel streams with joint intra- and inter-node network optimizations
Leverage Multi-Core CPUs for Parallelism?
• Quality-of-Service (QoS) routing may help! But..
  • NP-hard to configure optimally
  • Route flaps
• Multi-core CPUs are abundant
  • How to leverage them in networking? [CCR’11]
  • Can we use them to parallelize the protocols?
    • Multiple instances of the same protocol
    • Collaborating with each other
    • Each instance working on a separate part of the network?
  • A divide-and-conquer?
  • Should do it with minimal disruption: overlay
Parallel Routing
[Figure: a four-node network (A, B, C, D) with link capacities of 5 Mb/s and 10 Mb/s, sliced into Substrate 1, Substrate 2, and Substrate 3, each with its own per-link labels.]
Parallel Routing
[Figure: the same four-node network (A, B, C, D) with link capacities of 5 Mb/s and 10 Mb/s, sliced into Substrate 1, Substrate 2, and Substrate 3. Annotations: "A-C is maxed out", "B-D is maxed out".]
• Nice! But there is a new complication: how to slice out the substrates?
Summary
• Economic Granularity
  • Finer, more flow-aware network architectures
  • An idea: Contract-Switching, Contract Routing
• Routing Scalability
  • Cheaper solutions to routers’ CPU and memory complexity
  • An idea: CAR
• Bottleneck Resolution
  • Complex algorithms to better resolve bottlenecks and respond to network dynamics
  • An idea: Parallel Routing
THE END
Thank you!
Google “contract switching”
Project Website: http://www.cse.unr.edu/~yuksem/contract-switching.htm
Collaborators & Sponsors
Faculty
• Mona Hella (hellam@ecse.rpi.edu), Rensselaer Polytechnic Institute
• Nezih Pala (palan@fiu.edu), Florida International University
Students
• Abdullah Sevincer (asev@cse.unr.edu) (Ph.D.), UNR
• Behrooz Nakhkoob (nakhkb@rpi.edu) (Ph.D.), RPI
• Michelle Ramirez (beemyladybug1@yahoo.com) (B.S.), UNR
Alumnus
• Mehmet Bilgi (mbilgi@cse.unr.edu) (Ph.D.), UC Corp.
Acknowledgments
This work was supported by the U.S. National Science Foundation under awards 0721452 and 0721612 and DARPA under contract W31P4Q-08-C-0080.
Computational Scenario
[Figure: cloud-assisted BGP routers, their cloud proxy routers, and their peers, connected across the Internet.]
1) Full table exchange
2) Outbound Route Filter exchange
3) Partial table exchange
Delegation Scenario
[Figure: a cloud-assisted router with a FIB cache, peering in an IXP, and its cloud proxy router with the full FIB, connected across the Internet. Arrows are labeled "Unresolved Traffic Delegation" and "Cache Updates".]
Delegation Scenario
[Figure: experimental setup. A proxy Click router on EC2 (N. Virginia) is connected via IP GRE tunnels to Emulab (Utah), where the CAR Click router, the traffic generator, and the traffic sink nodes are shown.]
Delegation Scenario
• Cloud-Assisted Click Router
  • Packet counters for
    • Flows forwarded to cloud
    • Received packets
  • Prefix-based miss ratio
  • Modified radix-trie cache for forwarding table
• Router Controller
  • Processing cache updates
  • Clock-based cache replacement vector
Simulation Results
• Random topology
  • Inter-domain and intra-domain are random
• BRITE topology
  • BRITE model for inter-domain
  • Rocketfuel topologies (ABILENE and GEANT) for intra-domain
• GTITM topology
  • GTITM model for inter-domain
  • Rocketfuel topologies (ABILENE and GEANT) for intra-domain
Forwarding Mechanisms
• bTTL: How many copies of the discovery packet will be made and forwarded? Provides caps on messaging cost.
• dTTL: Time to live, hop-count limit
• MAXFWD: Max. number of neighbors to forward to
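The three knobs above can be pictured with a small discovery simulation. The slides do not say how the copy budget is divided among forwarded copies, so the sketch below assumes each forwarded copy costs one unit of bTTL and the remainder is split as evenly as possible; the function name `discover` and the three-ISP graph are hypothetical.

```python
from collections import deque


def discover(graph, src, dst, bttl, dttl, maxfwd):
    """Flood discovery packets from src; return (paths found, copies made)."""
    found = []
    copies = 0
    queue = deque([((src,), bttl, dttl)])
    while queue:
        path, budget, hops = queue.popleft()
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        if hops == 0 or budget == 0:
            continue                       # dTTL or bTTL exhausted
        # Path-vector loop prevention, then cap the fan-out with MAXFWD.
        candidates = [n for n in graph[node] if n not in path][:maxfwd]
        k = min(len(candidates), budget)
        if k == 0:
            continue
        # Assumed rule: each copy costs 1, the rest is shared among copies.
        share, extra = divmod(budget - k, k)
        for i, nxt in enumerate(candidates[:k]):
            copies += 1
            child_budget = share + (1 if i < extra else 0)
            queue.append((path + (nxt,), child_budget, hops - 1))
    return found, copies


# Hypothetical ISP-level graph: A, B, and C are all neighbors.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
paths, copies = discover(graph, "A", "C", bttl=4, dttl=3, maxfwd=2)
print(paths, copies)
```

Under this splitting rule the total number of copies can never exceed the initial bTTL, which is the "cap on messaging cost" the slide refers to.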
Evaluation
• CAIDA AS-level Internet topology as of January 2010 (33,508 ISPs)
• Trials with 10,000 ISP pairs (src, dest), 101 times
• With various ISP cooperation/participation and packet filtering levels
  • NL: No local information used
  • L: Local information used (with various filtering)
• With no directional or policy improvements, for base-case (worst) performance
Results – Diversity
• Tens of paths discovered, favoring multi-path routing and reliability schemes.
Results – Messaging Cost
• The number of discovery packet copies is well below theoretical bounds thanks to path-vector loop prevention.
Many Possibilities
• Intra-cloud optimizations among routers receiving the CAR service
  • Data plane: Forwarding can be done in the cloud
  • Control plane: Peering exchanges and routing updates can be done in the cloud
• Per-AS optimizations
  • Data plane: Packets do not have to go back to the physical router until the egress point
  • Control plane: iBGP exchanges
Some Interesting Analogies?
• High cloud-router delay
  • CAR miss at the router ≈ page fault
  • Delegation is preferable
  • Forward the packet to the cloud proxy
• Low cloud-router delay
  • CAR miss at the router ≈ cache miss
  • Caching (i.e., immediate resolution) is preferable
  • Buffer the packet at the router and wait until the miss is resolved via the full router state at the cloud proxy
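The analogy amounts to a threshold decision on the cloud-router delay. The sketch below assumes the threshold is the router's buffering budget and borrows the 150 ms figure from the "large buffers (150 ms)" option on the caching slide; both the threshold choice and the function name `handle_miss` are assumptions for illustration.

```python
BUFFER_BUDGET_MS = 150.0   # assumed threshold, taken from the buffering option


def handle_miss(cloud_rtt_ms, buffer_budget_ms=BUFFER_BUDGET_MS):
    """Pick how to treat a CAR miss given the delay to the cloud proxy."""
    if cloud_rtt_ms <= buffer_budget_ms:
        # Low delay: treat like a cache miss, buffer and resolve the entry.
        return "cache"
    # High delay: treat like a page fault, delegate the packet to the proxy.
    return "delegate"


print(handle_miss(20.0))
print(handle_miss(400.0))
```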
Intra-Node Bottlenecks
• Where is the bottleneck?
[Figure: an end-system in SFO with Disk 0, Disk 1, CPU 0, CPU 1, NIC 0, and NIC 1 sends file-to-Miami.dat and file-to-NYC.dat across the Internet to Miami and NYC. Intra-node link labels: 100Mb/s, 1Gb/s, and 50Mb/s.]
Intra-Node Bottlenecks
• Where is the bottleneck?
• Inter-node topology without intra-node visibility
[Figure: the SFO end-system from the previous slide next to the inter-node topology as seen by the network, with NIC 0 toward NYC and NIC 1 toward Miami. Rate labels: 100 Mb/s, 75Mb/s, and 50 Mb/s.]
The network’s routing algorithm finds the shortest paths to NYC and Miami with NIC 0 and NIC 1 being the exit points, respectively. However, the intra-node topology limits the effective transfer rates.
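The effect described here is that the end-to-end rate of a transfer is capped by the slowest hop, including hops inside the end-system. The sketch below uses the capacity labels that appear on the slide (100 Mb/s, 1 Gb/s, 50 Mb/s), but how those links are wired together is an assumption, as are the names `bottleneck`, `same_side`, and `cross_side`.

```python
def bottleneck(capacities_mbps):
    """End-to-end rate of a single path is the minimum hop capacity."""
    return min(capacities_mbps)


# Capacity labels from the slide; the wiring below is assumed.
DISK_TO_CPU = 100      # Mb/s
CPU_TO_NIC = 1000      # Mb/s (1 Gb/s)
CROSS_LINK = 50        # Mb/s between the two halves of the node
INTER_NODE = 100       # Mb/s on the network path chosen by routing

# The file sits on the disk attached to the chosen exit NIC.
same_side = bottleneck([DISK_TO_CPU, CPU_TO_NIC, INTER_NODE])

# The file sits on the other disk and must cross inside the node first.
cross_side = bottleneck([DISK_TO_CPU, CROSS_LINK, CPU_TO_NIC, INTER_NODE])

print(same_side, cross_side)
```

A routing algorithm that sees only `INTER_NODE` would expect 100 Mb/s in both cases; under the assumed wiring, the second case is limited by the intra-node cross link instead.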
Intra-Node Bottlenecks
• Where is the bottleneck?
• Integrated topology with visible intra-node topology
[Figure: the SFO end-system next to an integrated topology in which Disk 0, Disk 1, NIC 0, and NIC 1 appear as nodes on the paths to NYC and Miami. Rate labels: 100 Mb/s, 75 Mb/s, and 50Mb/s.]
When the intra-node topology is included in the calculation of shortest paths by the routing algorithm, it becomes possible to find better end-to-end combinations of flows for a higher aggregate rate.
CAR: An Architectural View
Long transition from the current state of routing to a cloud-integrated next-generation future Internet:
• Parallel architectures (e.g., RouteBricks)
• Clustered commodity hardware (e.g., Trellis, Google)
• Specialized ASIC (e.g., Cisco)
• Routing as a service (e.g., RCP)
• Managing routers from the cloud (e.g., NEBULA, Cisco CSR)
• Separation of control and forwarding planes (e.g., OpenFlow)
• Cloud-Assisted Routing (CAR): a middle ground to realize the architectural transition
Technical Shortcomings
[Figure: ISP A and ISP B connected by two links, labeled 1 and 2. Both carry the same AS-Path B-C-D, annotated with delays of 45ms and 35ms.]