Evolving Toward a Self-Managing Network Jennifer Rexford Princeton University http://www.cs.princeton.edu/~jrex
Why is Network Management So Darn Hard?
• Oodles and oodles of complex features
  • Many protocols
  • Many mechanisms
  • Many configurable parameters
• Little guidance for network administrators
  • How to select and compose features?
  • How to set the configurable parameters?
• Managing boxes, rather than networks
  • Routers, switches, firewalls, IDSes, servers, etc.
  • Low-level, box-specific configuration languages
The Enemy is Complexity
• Goal: raising the level of abstraction
  • Network-level design and configuration
  • Composition of protocols and mechanisms
• Idea #1: add abstraction on top
  • Compile high-level spec into box configuration
  • But, must grapple with inherent complexity
• Idea #2: design system for manageability
  • Identify network-level abstractions
  • … and change the boxes and protocols
  • But, must grapple with backwards compatibility
Example: Border Gateway Protocol
• ASes exchange reachability information
  • IP prefix: block of destination IP addresses
  • AS path: sequence of ASes along the path
• Configurable routing policies
  • Path selection (which route to use?)
  • Path export (who to tell about the route?)
[Figure: ASes 88, 1, and 7018. Announcements shown: “12.34.158.0/24: path (88)” and “12.34.158.0/24: path (7018,1,88)”. Data traffic is drawn toward 12.34.158.5.]
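The AS-path bookkeeping in this example can be sketched in a few lines of Python. This is an illustration only: the `Route`, `export`, and `accept` names are hypothetical, and the sketch models just two standard behaviours (an AS prepends its own number when it advertises a route, and it rejects a route whose path already contains its number), not the full protocol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                # block of destination IP addresses
    as_path: Tuple[int, ...]   # ASes along the path, nearest AS first


def export(route: Route, local_as: int) -> Route:
    """Return the route as local_as would advertise it (own AS prepended)."""
    return Route(route.prefix, (local_as,) + route.as_path)


def accept(route: Route, local_as: int) -> Optional[Route]:
    """Reject a route whose AS path already contains local_as (loop avoidance)."""
    return None if local_as in route.as_path else route


# Reproduce the announcements from the slide: AS 88 originates the prefix,
# then AS 1 and AS 7018 each pass it on.
from_88 = export(Route("12.34.158.0/24", ()), 88)
from_1 = export(from_88, 1)
from_7018 = export(from_1, 7018)
print(from_7018.as_path)
```

Path selection and path export policies would sit around these two functions: selection chooses among accepted routes, and export decides which neighbors receive the result.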
Some Things I Hate About BGP…
• Routers in an AS have different views
  • Effect: protocol oscillation and loops
  • Point fix: testing sufficient conditions
• Routing policy distributed across routers
  • Effect: routers need to share information
  • Point fix: complex “tagging” of BGP routes
• Policy has only an indirect effect on traffic
  • Effect: selecting the right policy is hard
  • Point fix: “what if” tools for traffic engineering
• BGP route selection depends on the IGP
  • Effect: disruptions from small internal changes
  • Point fix: “what if” tools to identify risks
[Slide callouts: “Too distributed”, “Too indirect”]
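The last point, that a small internal change can disrupt BGP route selection, can be illustrated with a toy “what if” check. The sketch below assumes each router simply picks the egress point with the lowest IGP cost among otherwise equally good routes; the function names, router names, and costs are all hypothetical.

```python
from typing import Dict, Tuple

# router -> {egress point -> IGP cost from that router}
CostTable = Dict[str, Dict[str, int]]


def closest_egress(igp_cost: Dict[str, int]) -> str:
    """Pick the egress with the lowest IGP cost; break ties by name."""
    return min(igp_cost, key=lambda egress: (igp_cost[egress], egress))


def egress_changes(before: CostTable, after: CostTable) -> Dict[str, Tuple[str, str]]:
    """Report routers whose chosen egress differs between two IGP cost tables."""
    changes = {}
    for router in before:
        old = closest_egress(before[router])
        new = closest_egress(after[router])
        if old != new:
            changes[router] = (old, new)
    return changes


# Hypothetical costs: one internal change adds 2 to every path toward egressA.
before = {"r1": {"egressA": 9, "egressB": 10}, "r2": {"egressA": 3, "egressB": 12}}
after = {"r1": {"egressA": 11, "egressB": 10}, "r2": {"egressA": 5, "egressB": 12}}
print(egress_changes(before, after))
```

Here only `r1` moves its traffic, because its two egress points were nearly tied before the change. Routers with a near tie are the ones a risk-identifying tool would flag.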
Interdomain Routing: Design for Manageability
• Routing Control Platform
  • Represents the AS to others
  • Has complete view of candidate routes
  • Computes answers for the AS’s routers
• Communicates with other ASes
  • Using BGP or (ideally) a brand new protocol
[Figure: one RCP in each of AS 1, AS 2, and AS 3. Labels: Inter-AS Protocol, Physical peering.]
Advantages of RCP Approach
• Lower management complexity
  • Complete, network-wide view
  • Direct control over the routers
  • Single specification of policies and objectives
• Simpler routers
  • Much less control-plane software
  • Much less configuration state
• Enabling innovation
  • New algorithms for selecting paths within an AS
  • New approaches to inter-AS routing
Deployability: Backwards Compatibility using BGP
• Border Gateway Protocol (BGP)
  • Protocol: messages sent between routers
  • Decision logic: route-selection process
  • Policy: configurable rules for path selection/export
• The key point is that BGP has
  • Complex decision logic and policies
  • Yet a simple protocol (and message format)
• Use BGP messages to “program” the routers
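To make “decision logic” concrete, here is a heavily simplified sketch of a route-selection process. It keeps only four criteria (highest local preference, then shortest AS path, then lowest IGP cost to the egress, then lowest router ID); the real BGP decision process has more steps, and the `Candidate` type and its field names are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    as_path: Tuple[int, ...]
    local_pref: int   # set by policy; higher is preferred
    igp_cost: int     # internal cost to reach the egress router
    router_id: int    # final tie-breaker; lower wins


def best_route(candidates: Iterable[Candidate]) -> Candidate:
    """Rank candidates by the simplified criteria above and return the winner."""
    return min(
        candidates,
        key=lambda c: (-c.local_pref, len(c.as_path), c.igp_cost, c.router_id),
    )


a = Candidate(as_path=(7018, 1, 88), local_pref=100, igp_cost=5, router_id=1)
b = Candidate(as_path=(2, 88), local_pref=90, igp_cost=1, router_id=2)
c = Candidate(as_path=(3, 88), local_pref=100, igp_cost=7, router_id=3)

print(best_route([a, b]))  # policy (local_pref) outranks the shorter path of b
print(best_route([a, c]))  # equal local_pref, so the shorter AS path of c wins
```

The ranking logic is where the complexity lives, while the message that carries the chosen route stays simple. That asymmetry is what lets a separate platform run the decision and use ordinary BGP messages to deliver the answer.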
Phase 1: Flexible Path Selection in One AS
• Before: conventional use of BGP in backbone network
• After: RCP learns routes and sends answers to routers
[Figure labels: eBGP, iBGP (before); eBGP, RCP, iBGP (after)]
Phase 2: AS-Wide Path Selection and Export
• Before: RCP gets “best” iBGP routes (and IGP feed)
• After: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP (before and after)]
Phase 3: Direct Communication Between RCPs
• Before: RCP gets all eBGP routes from neighbors
• After: ASes exchange routes via RCP
[Figure labels: eBGP, RCP, iBGP (before); Inter-AS Protocol, RCP in each of AS 1, AS 2, and AS 3, iBGP, Physical peering (after)]
Systems Considerations (NSDI’05)
• Reliability
  • Problem: single point of failure
  • Solution: replication of RCP components
• Consistency
  • Problem: inconsistent decisions by replicas
  • Solution: consistency without inter-replica protocol
• Scalability
  • Problem: storing and computing for all routers
  • Solution: store each route once and amortize work
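One way to read “store each route once and amortize work” is sketched below. This is an assumption about the general idea, not a description of the actual RCP implementation: a shared table keeps a single copy of every distinct route, and the decision is computed once per group of routers that share the same view instead of once per router. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

RouteKey = Tuple[str, Tuple[int, ...]]  # (prefix, AS path)


class RouteStore:
    """Keep one shared copy of each distinct route."""

    def __init__(self) -> None:
        self._routes: Dict[RouteKey, RouteKey] = {}

    def intern(self, route: RouteKey) -> RouteKey:
        # Return the stored copy if present; otherwise store this one.
        return self._routes.setdefault(route, route)

    def __len__(self) -> int:
        return len(self._routes)


def decide_per_group(
    views: Dict[str, Hashable],
    decide: Callable[[Hashable], str],
) -> Dict[str, str]:
    """Run decide once per distinct view and reuse the result for every router."""
    cache: Dict[Hashable, str] = {}
    answers: Dict[str, str] = {}
    for router, view in views.items():
        if view not in cache:
            cache[view] = decide(view)
        answers[router] = cache[view]
    return answers


store = RouteStore()
first = store.intern(("12.34.158.0/24", (7018, 1, 88)))
second = store.intern(("12.34.158.0/24", (7018, 1, 88)))
print(len(store), first is second)
```

In `decide_per_group`, the `views` argument maps each router to whatever hashable summary determines its answer, for example a tuple of (egress, IGP cost) pairs.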
Example Network Management Applications
• Customer-driven route selection
  • Customized load-balancing policies
  • Geographic rules for route selection
• Blocking denial-of-service attacks
  • “Blackhole” routes that drop traffic
  • Only for routers carrying attack traffic
• Hitless maintenance
  • Move traffic away from certain routers
  • Before the operators bring down the routers
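The denial-of-service application can be sketched as a per-router answer table: only routers known to carry attack traffic receive a “blackhole” route for the victim prefix, and every other router keeps its normal next hop. The `BLACKHOLE` sentinel, the function name, and the router names below are hypothetical, and how attack traffic is detected is outside the sketch.

```python
from typing import Dict, Set

BLACKHOLE = "discard"  # hypothetical sentinel meaning "drop traffic for this prefix"


def blackhole_answers(
    normal_next_hop: Dict[str, str],
    attack_routers: Set[str],
) -> Dict[str, str]:
    """Give a blackhole route only to routers that carry attack traffic."""
    return {
        router: (BLACKHOLE if router in attack_routers else next_hop)
        for router, next_hop in normal_next_hop.items()
    }


# Hypothetical next hops toward the victim prefix for three routers.
normal = {"r1": "egressA", "r2": "egressA", "r3": "egressB"}
answers = blackhole_answers(normal, attack_routers={"r2"})
print(answers)
```

Hitless maintenance has the same shape: before a router is taken down, the answers sent to the other routers would steer traffic to a different egress instead of to the discard sentinel.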
Conclusion
• Network management is too hard
  • IP was not designed for management
  • Complex, distributed operation of routers
• Must reduce complexity
  • Network-wide views and objectives
  • Direct control over the data plane
• RCP approach is feasible
  • Deployable, scalable, and reliable
  • Solves important management problems
• Many interesting open problems
Routing Control Platform (RCP)
[Figure: RCP architecture. Components: Route Control Server (RCS), OSPF Viewer, BGP Engine. Other labels: Options, Answers, Topology, BGP updates, OSPF link-state advertisements, Network.]
Scalability: Standard Computing Platform
• Prototype on a high-end PC
  • 3.2 GHz Pentium-4 with 8 GB of RAM
  • Running the Linux 2.6.5 kernel
• Workload from the AT&T backbone
  • Replay the BGP and OSPF messages
• Good RCP performance
  • Memory usage: less than 2 GB
  • Speed, BGP changes: less than 40 msec
  • Speed, topology changes: 0.1-0.8 seconds
Short answer: the system can keep up
Reliability: Replication and Consistency
• Replication: avoid single point of failure
  • Multiple RCPs in a network
  • Connected at different places
• Consistency: no explicit coordination
  • Replica has full view of each partition
  • Replicas perform the same algorithm on the same data, and get the same answer
[Figure: replicas RCP A and RCP B. Other labels: “A, B”, “A”, “B”.]
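The consistency argument (same algorithm, same data, same answer) can be illustrated with a small sketch. It assumes each replica keeps the set of routes it has learned and picks a winner with a deterministic, totally ordered key, so the order in which updates arrive does not matter. The `Replica` class and the selection rule are hypothetical simplifications.

```python
from typing import Set, Tuple

RouteT = Tuple[Tuple[int, ...], str]  # (AS path, egress router name)


class Replica:
    """A replica that decides locally, with no inter-replica protocol."""

    def __init__(self) -> None:
        self.routes: Set[RouteT] = set()

    def learn(self, route: RouteT) -> None:
        self.routes.add(route)

    def answer(self) -> RouteT:
        # Deterministic total order: path length, then egress name, then path.
        return min(self.routes, key=lambda r: (len(r[0]), r[1], r[0]))


updates = [((7018, 1, 88), "east"), ((2, 88), "west"), ((3, 88), "east")]

rcp_a, rcp_b = Replica(), Replica()
for route in updates:            # replica A sees the updates in one order
    rcp_a.learn(route)
for route in reversed(updates):  # replica B sees them in the opposite order
    rcp_b.learn(route)

print(rcp_a.answer() == rcp_b.answer())
```

The sketch covers only the case where both replicas hold the same data. When a replica sees only part of the network, the slide's condition that a replica has a full view of each partition is what would keep the answers consistent.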