Rethinking Network Control and Management David A. Maltz dmaltz@microsoft.com
Context for Network Control and Management • Many different network environments • Access, backbone networks • Data-center networks, enterprise/campus • Many different technologies • Longest-prefix routing, label switching, circuit switching • IP, Ethernet, MPLS, optical circuits • Outsourcing of responsibility into the network • Middle-boxes: firewalls, network monitoring, … • Many different policies • Routing, reachability, transit, traffic engineering, robustness
ATT/CMU Study of 31 Production Networks • Provider & enterprise networks (10-1200 routers) • Many different routing designs • Packet filters, multiple OSPF instances, multiple ASs [Figure: lines in config file (axis 0 to 2000) plotted against router ID (axis 0 to 881)]
Fundamental Problem: Wrong Abstractions • Management Plane: figure out what is happening in the network; decide how to change it • Control Plane: multiple routing processes on each router; each router with a different configuration program; huge number of control knobs (metrics, ACLs, policy) • Data Plane: distributed routers; forwarding, filtering, queueing; based on FIB or labels [Figure labels. Management plane: shell scripts, traffic engineering, planning tools, databases, configs, SNMP, netflow, modems. Control plane: OSPF and BGP processes, link metrics, routing policies. Data plane: FIBs, packet filters.]
Inside a Single Network • Management Plane: figure out what is happening in the network; decide how to change it • Control Plane: multiple routing processes on each router; each router with a different configuration program; huge number of control knobs (metrics, ACLs, policy) • Data Plane: distributed routers; forwarding, filtering, queueing; based on FIB or labels • State everywhere! • Dynamic state in FIBs • Configured state in settings, policies, packet filters • Programmed state in magic constants, timers • Many dependencies between bits of state • State updated in uncoordinated, decentralized way! [Figure labels. Management plane: shell scripts, traffic engineering, planning tools, databases, configs, SNMP, netflow, modems. Control plane: OSPF and BGP processes, link metrics, routing policies. Data plane: FIBs, packet filters.]
Inside a Single Network • Management Plane: figure out what is happening in the network; decide how to change it • Control Plane: multiple routing processes on each router; each router with a different configuration program; huge number of control knobs (metrics, ACLs, policy) • Data Plane: distributed routers; forwarding, filtering, queueing; based on FIB or labels • Logic everywhere! • Path computation built into routing protocols • Routing policy distributed across the routers • Packet filters placed by tools in the management plane • No way to arbitrate inconsistencies between logic • State everywhere! • Dynamic state in FIBs • Configured state in settings, policies, packet filters • Programmed state in magic constants, timers • Many dependencies between bits of state • State updated in uncoordinated, decentralized way! [Figure labels. Management plane: shell scripts, traffic engineering, planning tools, databases, configs, SNMP, netflow, modems. Control plane: OSPF and BGP processes, link metrics, routing policies. Data plane: FIBs, packet filters.]
Control Plane: The Key Leverage Point • Great Potential: control plane determines the behavior of the network • Reaction to events, reachability, services • Great Opportunities • Each network (administrative domain) has its own control plane • A radical clean-slate control plane can be deployed • Agnostic to user data format: IPv4/v6, Ethernet, circuit • No changes to end-system software • Control plane is the nexus of network evolution • Changing the control plane logic can smooth transitions in network technologies and architectures
An Alternative: The 4D Architecture • Key principles • Network-level objectives • Network-wide views • Direct control • Corollaries • Predictable behavior (including overload threshold) • Zero device-specific or manual configuration • Data plane support for network-wide view • Define objectives in terms of organizationally salient entities
Good Abstractions Reduce Complexity • All decision-making logic lifted out of control plane • Eliminates duplicate logic in management plane • Dissemination plane provides robust communication to/from data plane switches [Figure labels: Management Plane, Configs, Control Plane, FIBs/ACLs, Data Plane; Decision Plane, Dissemination, FIBs/ACLs, Data Plane]
Overview of the 4D Architecture Decision Plane: • All management logic implemented on centralized servers making all decisions • Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers [Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
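The decision-plane step described above (take a network-wide view and an objective, compute data plane state, hand it to the routers) can be sketched roughly as follows. This is a minimal illustration and not the prototype's logic: `compute_state`, the hop-count shortest paths, and the `blocked_pairs` stand-in for a reachability matrix are all assumptions made for the example.

```python
from collections import deque


def compute_state(links, blocked_pairs):
    """Turn a network-wide view plus an objective into per-router state.

    links         -- iterable of (router_a, router_b) bidirectional links
    blocked_pairs -- set of (src, dst) routers that must not communicate
                     (hypothetical stand-in for a reachability matrix)
    Returns {router: {"fib": {dst: next_hop}, "filters": [(src, dst), ...]}}.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    state = {r: {"fib": {}, "filters": []} for r in neighbors}

    for dst in neighbors:
        # Breadth-first search outward from dst: the router from which a
        # node is first reached is that node's next hop toward dst.
        next_hop = {dst: None}
        queue = deque([dst])
        while queue:
            node = queue.popleft()
            for nbr in sorted(neighbors[node]):
                if nbr not in next_hop:
                    next_hop[nbr] = node
                    queue.append(nbr)
        for router, hop in next_hop.items():
            if router != dst:
                state[router]["fib"][dst] = hop

    # Enforce the objective with a drop filter installed at the source router.
    for src, dst in sorted(blocked_pairs):
        if src in state:
            state[src]["filters"].append((src, dst))
    return state


view = [("A", "B"), ("B", "C")]
state = compute_state(view, {("A", "C")})
print(state["A"])
```

The point of the sketch is that routes and filters come out of one computation over one view, so they cannot disagree with each other; writing `state` to the routers would be the job of the dissemination plane.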
Concerns and Challenges • Distributed Systems issues • How will communication between routers and DEs survive failures in the network? • Latency means DE’s view of network is behind reality. Will the control loop be stable? • What is the overhead to/from the DEs? • What happens in a network partition? • Networking issues • Does the 4D simplify control and management? • Can we create logic to meet multiple objectives?
Evaluation of the 4D Prototype • Evaluated using Emulab (www.emulab.net) • Linux PCs used as routers (650-800 MHz) • Tested on 9 enterprise network topologies (10-100 routers each) [Figure: example network with 49 switches and 5 DEs]
Performance of the 4D Prototype Trivial prototype has performance comparable to well-tuned production networks • Recovers from single link failure in < 300 ms • < 1 s response considered “excellent” • Faster forwarding reconvergence possible • Survives failure of master Decision Element • New DE takes control within 1 s • No disruption unless second fault occurs • Gracefully handles complete network partitions • Less than 1.5 s of outage
Future Work • Scalability • Evaluate over 1-10K switches, 10-100K routes • Networks with backbone-like propagation delays • Structuring decision logic • Arbitrate among multiple, potentially competing objectives • Unify control when some logic takes longer than others • Protocol improvements • Better dissemination and discovery planes • Deployment in today’s networks • Data center, enterprise, campus, backbone (RCP)
Future Work • Expand relationships with security • Securing the infrastructure • Using 4D as mechanism for monitoring/quarantine • Formulate models that establish bounds of 4D • Scale, latency, stability, failure models, objectives • Generate evidence to support/refute principles
Themes of Network Control & Management Holistic Design • Many different technologies – a few common problems • Find the right abstractions: exploit commonality Clean Slate • How much autonomy do routers/switches need? • New principles for controlling networks • Separate networking issues from distributed system issues Leverage Network Structure • Many different types of networks exist - each with different objectives and topologies
Recent Publications • G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, “On Static Reachability Analysis of IP Networks,” IEEE INFOCOM 2005, Orlando, FL, March 2005. • J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H. Zhang, “Network-Wide Decision Making: Toward a Wafer-Thin Control Plane,” Proceedings of ACM HotNets-III, San Diego, CA, November 2004. • D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, “Routing Design in Operational Networks: A Look from the Inside,” Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (ACM SIGCOMM 2004), Portland, Oregon, 2004. • D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford, “Structure Preserving Anonymization of Router Configuration Data,” Proceedings of ACM/Usenix Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.
A Clean-slate Design • What are the fundamental causes of network problems? • How to secure the network and protect the infrastructure? • What functionality needs to be distributed – what can be centralized? • How to reduce/simplify the software in networks? • What would a “RISC” router look like? • How to leverage technology trends? • CPU and link-speed growing faster than # of switches
Three Principles for Network Control & Management Network-level Objectives: • Express goals explicitly • Security policies, QoS, egress point selection • Do not bury goals in box-specific configuration [Figure: management logic driven by a reachability matrix and traffic engineering rules]
Three Principles for Network Control & Management Network-wide Views: • Design network to provide timely, accurate info • Topology, traffic, resource limitations • Give logic the inputs it needs [Figure: management logic driven by a reachability matrix and traffic engineering rules, reading state info from the network]
Three Principles for Network Control & Management Direct Control: • Allow logic to directly set forwarding state • FIB entries, packet filters, queuing parameters • Logic computes the desired network state; let it implement that state [Figure: management logic driven by a reachability matrix and traffic engineering rules, reading state info from the network and writing state to it]
Overview of the 4D Architecture Dissemination Plane: • Provides a robust communication channel to each router – and robustness is the only goal! • May run over same links as user data, but logically separate and independently controlled [Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
Overview of the 4D Architecture Discovery Plane: • Each router discovers its own resources and its local environment • E.g., the identity of its immediate neighbors [Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
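One way to picture how per-router discovery feeds a network-wide view: each router reports the neighbors it has found, and the decision plane merges the reports into a link list. The sketch below is illustrative only; `build_view`, the report format, and the rule that a link counts only when both ends report each other are assumptions, not details from the talk.

```python
def build_view(reports):
    """Merge per-router neighbor reports into a sorted list of links.

    reports -- {router: [neighbors it discovered]} (hypothetical format)
    A link is kept only if both endpoints report each other, which is an
    assumed rule for discarding stale or one-way adjacencies.
    """
    links = set()
    for router, nbrs in reports.items():
        for nbr in nbrs:
            if router in reports.get(nbr, ()):
                links.add(tuple(sorted((router, nbr))))
    return sorted(links)


reports = {"A": ["B"], "B": ["A", "C"], "C": []}
print(build_view(reports))
```

Here B claims C as a neighbor but C does not confirm it, so only the A-B link enters the view.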
Overview of the 4D Architecture Data Plane: • Spatially distributed routers/switches • Can deploy with today’s technology • Looking at ways to unify forwarding paradigms across technologies [Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
Fundamental Problem: Conflation of Issues • Ideal case: all routing information flooded to all routers inside network • Robustness achieved via flooding • Reality: routing information filtered and aggregated extensively • Route filtering used to implement security and resource policies • Route aggregation used to achieve scalability
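To make the filtering and aggregation steps concrete, here is a small sketch using Python's standard `ipaddress` module. The function name `advertise`, the deny list, and the example prefixes are invented for illustration; real routers express the same ideas through vendor-specific route maps and summary statements.

```python
import ipaddress


def advertise(prefixes, denied=()):
    """Filter, then aggregate, the IPv4 prefixes a router would advertise.

    prefixes -- prefixes known locally, as strings
    denied   -- ranges that policy forbids advertising (hypothetical policy)
    """
    deny = [ipaddress.ip_network(d) for d in denied]
    kept = [
        net
        for net in map(ipaddress.ip_network, prefixes)
        if not any(net.subnet_of(d) for d in deny)  # route filtering
    ]
    # Route aggregation: merge adjacent prefixes into shorter ones.
    return [str(n) for n in ipaddress.collapse_addresses(kept)]


known = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "192.168.5.0/24"]
print(advertise(known, denied=["192.168.0.0/16"]))
```

After this step a downstream router sees 10.0.0.0/23 rather than the two /24s behind it, so it can no longer tell which of them has failed. That lost detail is the cost of using one mechanism for policy, scalability, and robustness at once.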
4D Separates Distributed Computing Issues from Networking Issues • Distributed computing issues → protocols and network architecture • Overhead • Resiliency • Scalability • Networking issues → management logic • Traffic engineering and service provisioning • Egress point selection • Reachability control (VPNs) • Precomputation of backup paths
4D Can Leverage Network Structure • Decision plane logic can be specialized for structure of each physical network • Distributed protocols must be prepared for arbitrary topology graphs • 4D enables network logic specialized differently for access and for backbone • E.g., creating aggregation tree in access network • Advantages • Faster route computations • Retain flexibility to evolve network as needed • Support transition to 100x100 architecture
The Feasibility of the 4D Architecture We designed and built a prototype of the 4D Architecture • 4D Architecture permits many designs – prototype is a single, simple design point • Decision plane • Contains logic to simultaneously compute routes and enforce reachability matrix • Multiple Decision Elements per network, using simple election protocol to pick master • Dissemination plane • Uses source routes to direct control messages • Extremely simple & robust • Quickly route around failed data links, even multiple failures
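As a rough picture of source-routed control messages that route around failed links: the sender computes the full hop list itself and recomputes it when a link is known to be down. The sketch below is an assumption-laden illustration, not the prototype's protocol; `source_route`, the breadth-first path choice, and the topology are all made up for the example.

```python
from collections import deque


def source_route(links, src, dst, failed=()):
    """Return the hop list for a control message, or None if unreachable.

    links  -- iterable of (node_a, node_b) bidirectional links
    failed -- links known to be down, in either orientation
    """
    down = {frozenset(link) for link in failed}
    neighbors = {}
    for a, b in links:
        if frozenset((a, b)) in down:
            continue  # skip failed links when building the graph
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search from src, remembering how each node was reached.
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in sorted(neighbors.get(node, ())):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    if dst not in parent:
        return None

    # Walk back from dst to src, then reverse to get the hop list.
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


links = [("DE", "S1"), ("DE", "S2"), ("S1", "S3"), ("S2", "S3")]
print(source_route(links, "DE", "S3"))
print(source_route(links, "DE", "S3", failed=[("S1", "DE")]))
```

Because the route travels with the message, intermediate switches need no routing state of their own to deliver control traffic, which is one reason such a dissemination plane can stay simple.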