A Clean Slate 4D Approach to Network Control and Management. Hui Zhang, Carnegie Mellon University. Joint work with Hemant Gogineni, Albert Greenberg, Gisli Hjalmtysson, David Maltz, Andy Myers, Eugene Ng, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan
A Conventional View of a Network • Physical topology is a graph of nodes and links (figure: example topology of nodes A through J) • Run Dijkstra's algorithm to find the route to each node
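To make "run Dijkstra" concrete, here is a minimal Python sketch of Dijkstra's shortest-path algorithm using a binary heap. The topology is hypothetical: the slide's figure names nodes A through J, but its links and weights did not survive, so the example uses a small invented graph over A through E.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev) for every node reachable from source.

    graph maps each node to {neighbor: link_cost}; costs must be non-negative.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; this node was already settled
        visited.add(node)
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def path_to(prev, source, target):
    """Follow predecessor links back from target to source."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical undirected topology (the original figure's links are unknown).
LINKS = [("A", "B", 1), ("A", "C", 4), ("B", "C", 2), ("C", "D", 1), ("B", "E", 5), ("D", "E", 1)]
graph = {}
for u, v, w in LINKS:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

dist, prev = dijkstra(graph, "A")
print(dist["E"], path_to(prev, "A", "E"))  # 5 ['A', 'B', 'C', 'D', 'E']
```

Each router in a link-state protocol such as OSPF runs this computation on its own copy of the topology; the next slide's point is that this graph alone says little about how traffic actually flows.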
A Conventional View of a Network • Knowing how the routers are connected says little about how two hosts are communicating • (figure: each router carries a configuration file, EIGRP, BGP, and OSPF processes, and access control, NAT, tunnel, and forwarding tables)
A Study of Operational Production Networks • How complicated or simple are real control planes? • What is the structure of the distributed system? • Use a reverse-engineering methodology, since there are few or no documents, and the ones that exist are out of date • Anonymized configuration files for 31 active networks (>8,000 configuration files): 6 Tier-1 and Tier-2 Internet backbone networks and 25 enterprise networks • Sizes range from 10 to 1,200 routers • 4 enterprise networks are significantly larger than the backbone networks
Router Configuration Files
interface Ethernet0
 ip address 6.2.5.14 255.255.255.128
interface Serial1/0.5 point-to-point
 ip address 6.2.2.85 255.255.255.252
 ip access-group 143 in
 frame-relay interface-dlci 28
router ospf 64
 redistribute connected subnets
 redistribute bgp 64780 metric 1 subnets
 network 66.251.75.128 0.0.0.127 area 0
router bgp 64780
 redistribute ospf 64 match route-map 8aTzlvBrbaW
 neighbor 66.253.160.68 remote-as 12762
 neighbor 66.253.160.68 distribute-list 4 in
access-list 143 deny 1.1.0.0/16
access-list 143 permit any
route-map 8aTzlvBrbaW deny 10
 match ip address 4
route-map 8aTzlvBrbaW permit 20
 match ip address 7
ip route 10.2.2.1/16 10.2.1.7
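One way to see how these stanzas act as glue between protocols is to extract their cross-references mechanically. The sketch below is a hypothetical, simplified parser (the helper names `parse_stanzas` and `glue_edges` are illustrative, not from any real tool) that understands only the few IOS-style keywords in a subset of the configuration above; it is nowhere near a full IOS parser.

```python
import re

# A subset of the configuration above; child lines are indented by one space.
CONFIG = """\
interface Serial1/0.5 point-to-point
 ip address 6.2.2.85 255.255.255.252
 ip access-group 143 in
router ospf 64
 redistribute connected subnets
 redistribute bgp 64780 metric 1 subnets
router bgp 64780
 redistribute ospf 64 match route-map 8aTzlvBrbaW
 neighbor 66.253.160.68 distribute-list 4 in
route-map 8aTzlvBrbaW deny 10
 match ip address 4
route-map 8aTzlvBrbaW permit 20
 match ip address 7
"""


def parse_stanzas(text):
    """Group lines into (header, [child lines]) pairs based on indentation."""
    stanzas = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" ") and stanzas:
            stanzas[-1][1].append(line.strip())
        else:
            stanzas.append((line.strip(), []))
    return stanzas


def glue_edges(stanzas):
    """Return (stanza, relation, target) edges that tie components together."""
    edges = []
    for header, children in stanzas:
        for child in children:
            m = re.match(r"redistribute (\w+)(?: (\d+))?", child)
            if m:
                source = " ".join(filter(None, m.groups()))
                edges.append((header, "imports routes from", source))
            m = re.search(r"route-map (\S+)", child)
            if m:
                edges.append((header, "filtered by route-map", m.group(1)))
            m = re.search(r"(?:access-group|distribute-list|match ip address) (\d+)", child)
            if m:
                edges.append((header, "uses access-list", m.group(1)))
    return edges


edges = glue_edges(parse_stanzas(CONFIG))
for edge in edges:
    print(edge)
```

The output shows OSPF importing routes from BGP and BGP importing routes from OSPF: a mutual redistribution whose behavior depends on route-map 8aTzlvBrbaW and access lists 4 and 7, none of which is visible from either routing process alone.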
Limitation of Today’s Control & Management: Complex System of Systems • Systems are designed as components to be used in larger systems, in different contexts, for different purposes, interacting with different components • Example: OSPF and BGP are complex systems in their own right; they are also components in a network's routing system, interacting with each other and with packet filters … • Complex configuration is needed to enable flexibility • The glue has a tremendous impact on network performance • State of the art: multiple interacting distributed programs written in assembly language • Lack of an intellectual framework to understand global behavior
Are We Going in the Right Direction? • IP control plane function overloading: • Reachability • Policy control • Resiliency and survivability • Traffic engineering, load balancing • VPN • Ethernet control plane overloading: • Spanning Tree, RSTP, MSTP, VLAN, … • Control and management complexity works against robustness, dependability, and security
Limitation of Today’s Control & Management: Difficult to Implement Interesting Policies • Network designers want “simple” things, but achieving them is incredibly hard • (figure: data center, infrastructure, and servers)
Limitation of Today’s Control & Management: Difficult to Implement Interesting Policies • Difficult for designers to express desired behaviors
Limitation of Today’s Control & Management: Hard to Coordinate Multiple Low-Level Mechanisms to Achieve Network-Wide Goals • Lack of a higher-level specification of network-wide goals • Load-balancing objectives vs. per-link OSPF weights • Reachability matrix vs. per-interface access control lists • Difficult to dynamically coordinate multiple mechanisms • Forwarding, access control, NAT, tunnel management
Limitation of Today’s Control & Management: Lack of a Robust Management Communication Channel • Circular dependencies between the management plane, control plane, and data plane • Management and control traffic travels along the very paths it is meant to maintain • Operational networks usually require a separate control/management network • What should the architecture of a converged network be?
Fundamental Problem Never Goes Away: The GENI Management Problem • How does the GMC communicate with each node? • A fundamental architectural issue that affects overall network manageability and security • (figure: the GENI Management Core, with a Slice Manager, Resource Controller, and Auditing Archive, exchanges node control and sensor data with the CM on each node, which runs on virtualization SW over substrate HW)
A Strawman Solution • GMC uses the Internet to manage GENI elements • GENI elements expose external IP addresses that can be reached from the Internet • Management traffic between GMC and GENI elements traverses the Internet
Issues with the Strawman • Not consistent with the architectural vision of a single converged network for the future • Management depends on a separate network • The management plane is susceptible to the security and availability risks prevalent in the current Internet • DDoS attacks target the external IP interfaces of GENI elements • GENI elements are only as secure and available as the Internet
Summary: Limitations of Today’s Control & Management Plane • High complexity • Difficult to implement non-trivial control policies • Difficult to dynamically coordinate control logic • Lack of a robust control communication channel • Root cause: • Wrong partition of functionality in the control/management plane, which leads to lack of flexibility and high complexity
Incremental Approach to Addressing the Problem • Challenge: using a sophisticated management plane to manipulate the control plane through the configuration interface • The configuration interface is too primitive • The control plane's internal logic is too complex • (figure: a management plane converts TE/security policy into control plane configuration commands for routers that each run EIGRP, BGP, and OSPF from a configuration file and hold access control, NAT, tunnel, and forwarding tables; doing so requires reverse-engineering the routing logic)
Good Abstractions Reduce Complexity • All decision-making logic is lifted out of the control plane • Routers no longer make routing decisions • The dissemination plane provides robust communication to and from data plane switches • (figure: today, the management plane pushes configs to the control plane, which writes FIBs and ACLs into the data plane; in 4D, the decision plane writes FIBs and ACLs into the data plane through the dissemination plane)
A Clean-Slate Approach: The 4D Architecture • Decision plane: generates table entries • Dissemination plane: installs table entries • Discovery plane • Data plane: modeled as a set of tables (routing table, access control table, NAT table, tunnel table)
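The table-centric view of the 4D planes can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the 4D implementation: the data plane is a dictionary of named tables, `decide` stands in for the decision plane, `disseminate` stands in for the dissemination plane, and the discovery plane is omitted (its output is passed in directly as next hops). All function names, table keys, and prefixes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataPlane:
    """A switch modeled as a set of named tables."""
    name: str
    tables: dict = field(default_factory=lambda: {
        "routing": {}, "access_control": {}, "nat": {}, "tunnel": {}})


def decide(next_hops, blocked):
    """Hypothetical decision plane: generate table entries for every switch.

    next_hops: {switch: {destination_prefix: next_hop}} (would come from discovery)
    blocked:   {switch: [(src_prefix, dst_prefix), ...]} pairs to drop at that switch
    """
    return {
        switch: {
            "routing": dict(routes),
            "access_control": {pair: "deny" for pair in blocked.get(switch, [])},
        }
        for switch, routes in next_hops.items()
    }


def disseminate(entries, switches):
    """Hypothetical dissemination plane: install generated entries into each data plane."""
    for switch_name, per_table in entries.items():
        for table_name, rows in per_table.items():
            switches[switch_name].tables[table_name].update(rows)


switches = {name: DataPlane(name) for name in ("s1", "s2")}
entries = decide(
    {"s1": {"10.0.2.0/24": "s2"}, "s2": {"10.0.1.0/24": "s1"}},
    {"s1": [("10.0.1.0/24", "10.0.2.0/24")]},
)
disseminate(entries, switches)
print(switches["s1"].tables)
```

The switches hold no decision logic at all: every entry in their tables is generated centrally and only installed locally, which is the separation the slide describes.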
What is Hard and What is Not • Hard • Distributed algorithms for sophisticated functions • Distributed algorithms and protocols that are extensible • Consensus on the right protocol • Not Hard • Centralized algorithms • Basic protocol for Decision Element (DE) recovery • Robust and configuration-free protocol that provides reachability
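To illustrate why a basic DE recovery rule can be simple once decisions are centralized, here is a hypothetical sketch in which the live Decision Element with the lowest ID becomes master. It assumes each partition already knows which DEs are alive and reachable, which is the part a real protocol must actually provide; this is not claimed to be the rule used by the 4D prototype, and the function names are illustrative.

```python
def elect_master(live_des):
    """Hypothetical rule: the live Decision Element with the lowest ID is master.

    Every node that sees the same set of live DEs picks the same master,
    so in this simplified model no extra agreement round is needed.
    """
    return min(live_des) if live_des else None


def masters_per_partition(partitions):
    """partitions: {name: [IDs of DEs reachable inside that partition]} -> {name: master}."""
    return {name: elect_master(des) for name, des in partitions.items()}


des = [1, 2, 3, 4, 5]
print(elect_master(des))                                            # 1
print(elect_master([d for d in des if d != 1]))                     # 2, after DE 1 fails
print(masters_per_partition({"left": [1, 4], "right": [2, 3, 5]}))  # {'left': 1, 'right': 2}
```

Because each switch belongs to exactly one partition, each switch sees exactly one master under this rule.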
4D Separates Distributed Computing Issues from Networking Issues • Distributed computing issues: protocols and network architecture • Overhead • Resiliency • Scalability • Networking issues: decision logic • Traffic engineering and service provisioning • Egress point selection • Tunnel management • Reachability control (VPNs) • Precomputation of backup paths
Example: 4D Approach to Reachability Control • The reachability matrix directly expresses the intended goal • Path computation can jointly balance load and obey reachability constraints • Packet filters are installed only where needed, and changed when routing changes • (figure: in the decision plane, path computation takes the reachability matrix, traffic matrix, and topology as inputs and produces FIBs and ACLs; the discovery/dissemination plane carries load info up from the data plane and FIBs and ACLs down to it)
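A minimal sketch of deriving packet filters from a reachability matrix and the current paths. Assumptions: paths come from the decision plane's path computation, pairs missing from the matrix are denied by default, and each filter is placed on the first switch of its path (one possible placement policy; the slide does not say where 4D places filters). Host and switch names are hypothetical.

```python
def place_filters(paths, reachability):
    """Return {switch: set of (src, dst) pairs to drop}, placed only where needed.

    paths:        {(src, dst): [switch, ...]} from the decision plane's path computation
    reachability: {(src, dst): True/False}; pairs not listed are treated as denied
    """
    acls = {}
    for pair, path in paths.items():
        if not reachability.get(pair, False):
            # Drop at the first switch on the path so the traffic never crosses the network.
            acls.setdefault(path[0], set()).add(pair)
    return acls


paths = {("H1", "H2"): ["s1", "s2"], ("H1", "H3"): ["s1", "s3"], ("H3", "H2"): ["s3", "s1", "s2"]}
reachability = {("H1", "H2"): True, ("H1", "H3"): False, ("H3", "H2"): False}
print(place_filters(paths, reachability))
```

Switch s2 receives no filter because no disallowed traffic enters there. Since filters are derived from the current paths rather than configured per interface, rerunning the function after a routing change yields the updated placement.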
Example: 4D Enables Simpler and Better Traffic Engineering • OSPF normally calculates a single path to each destination • OSPF allows load balancing only across equal-cost paths, to avoid loops • Using ECMP requires careful engineering of link weights • A decision plane with a network-wide view can do more sophisticated optimization
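The equal-cost restriction can be seen directly by computing each router's ECMP next-hop set: a neighbor qualifies only if it lies on some shortest path to the destination. A minimal Python sketch, assuming an undirected graph with non-negative integer link weights; the topology and node names are hypothetical.

```python
import heapq


def distances_to(graph, dest):
    """Dijkstra from dest; on an undirected graph this gives every node's distance to dest."""
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist


def ecmp_next_hops(graph, dest):
    """For each node that can reach dest, the neighbors lying on some shortest path to it."""
    dist = distances_to(graph, dest)
    return {
        u: sorted(v for v, w in graph[u].items() if w + dist.get(v, float("inf")) == dist[u])
        for u in graph
        if u != dest and u in dist
    }


def set_weight(graph, a, b, w):
    """Set the weight of the undirected link a-b."""
    graph[a][b] = graph[b][a] = w


graph = {"S": {"X": 1, "Y": 1}, "X": {"S": 1, "D": 1}, "Y": {"S": 1, "D": 1}, "D": {"X": 1, "Y": 1}}
print(ecmp_next_hops(graph, "D")["S"])  # ['X', 'Y']: both paths cost 2, so load is split
set_weight(graph, "S", "Y", 2)
print(ecmp_next_hops(graph, "D")["S"])  # ['X']: the S-Y-D path now costs 3 and gets no traffic
```

Raising a single weight from 1 to 2 removes Y from S's next-hop set, so the S, Y, D path carries nothing even if it is idle. A decision plane with a network-wide view is not bound to this rule and could, for example, split traffic unequally across both paths.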
Example: Meta-Management for GENI • A thin layer of meta-control software pervades the GENI network, boots up GENI elements, and connects them to the GMC • GENI nodes are able to bootstrap with zero pre-configuration • The meta-control exposes an IP protocol stack interface to the CM and GMC so that all existing management software can run intact • The meta-control protocol uses the same GENI physical resources but runs independently of native data protocols • The meta-control protocol is designed for high robustness and security • Performance is a lesser consideration • It can implement stronger security mechanisms, such as hop-by-hop authentication to prevent man-in-the-middle attacks • It can implement robust protocols such as source routing or gossip
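Hop-by-hop authentication can be sketched with a per-link shared key and an HMAC: each node verifies the tag from its upstream neighbor, then re-tags the message for the next link. This is an illustrative assumption (HMAC-SHA256 from Python's standard `hmac` module, one key per link); the slides do not specify the mechanism, and the node names and the `forward` helper are hypothetical.

```python
import hashlib
import hmac
import os


def tag(key: bytes, payload: bytes) -> bytes:
    """HMAC-SHA256 over the payload with the per-link key."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def forward(path, link_keys, payload, tamper_at=None):
    """Send payload along path, verifying and re-tagging on every hop.

    link_keys maps (a, b) -> key shared by neighbors a and b.
    tamper_at: optional index i; the payload is altered on the link leaving path[i],
    simulating a man-in-the-middle on that link.
    Returns the index of the node that rejects the message, or None if it arrives intact.
    """
    mac = tag(link_keys[(path[0], path[1])], payload)
    for i in range(1, len(path)):
        prev_hop, here = path[i - 1], path[i]
        if tamper_at == i - 1:
            payload = payload + b"!"  # attacker modifies the message on the wire
        if not hmac.compare_digest(mac, tag(link_keys[(prev_hop, here)], payload)):
            return i  # this hop drops the unauthenticated message
        if i + 1 < len(path):
            mac = tag(link_keys[(here, path[i + 1])], payload)  # re-authenticate for next link
    return None


path = ["GMC", "n1", "n2", "n3"]
link_keys = {(a, b): os.urandom(32) for a, b in zip(path, path[1:])}
print(forward(path, link_keys, b"install FIB entry"))               # None: delivered intact
print(forward(path, link_keys, b"install FIB entry", tamper_at=1))  # 2: n2 rejects it
```

Tampering is caught at the very next hop rather than at the destination. Note that a node that is itself compromised can still alter what it forwards; per-link tags only protect the links.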
Using the 4D Architecture • Install a security key on each device • Connect them together • Connect Decision Elements • (figure: example network with 49 switches and 5 DEs)
Does it work? Yes. • 4D designed so performance can be predicted • Recovers from single link failure in < 120 ms • < 1 s response considered “excellent” • Faster forwarding reconvergence possible • Survives failure of master Decision Element • New DE takes control within 170 ms • No disruption unless second fault occurs • Gracefully handles complete network partitions • Less than 170 ms of outage • At no point did two DEs attempt to master the same switch
4D Enables Customized Decision Logic • Example also illustrates the 4D controlling both L2 and L3 (Ethernet and IP)
One Size Fits All? • State of the art: • Different sets of protocols for different data planes: STP for Ethernet, OSPF/BGP for IP • The same protocols (logic) for different environments: data center, campus, ISP • Hard to customize, hard to extend • 4D: • Common dissemination (protocol) for IPv4, IPv6, and Ethernet • Customizable decision plane (algorithm) for data center, enterprise, access, metro, and backbone networks • Reachability, traffic engineering, robustness
Control Plane: The Key Leverage Point • Great potential: the control plane determines the behavior of the network • Reaction to events, reachability, services • Great opportunities • A radical clean-slate control plane can be deployed • Agnostic to packet format: IPv4/v6, Ethernet • No changes to end-system software • The control plane is the nexus of network evolution • Changing the control plane logic can smooth transitions in network technologies and architectures
Learning from Ethernet Evolution Experience • Ethernet or 802.3 early implementations: • Bus-based local area network • Collision domain, CSMA/CD • Bridges and repeaters for distance/capacity extension • 1-10 Mbps: coax, twisted pair (10BaseT) • (figure: WAN, B/R, LAN, hub, switch) • Current implementations: everything changed except name and framing • Switched solution • Little use for collision domains • 80% of traffic leaves the LAN • Servers and routers at 10x station speed • 10/100/1000 Mbps, 10 Gbps coming: copper, fiber • (figure: WAN, router, Ethernet Conc., server)