
Evolving Toward a Self-Managing Network



Presentation Transcript


  1. Evolving Toward a Self-Managing Network
     Jennifer Rexford, Princeton University
     http://www.cs.princeton.edu/~jrex

  2. Why is Network Management So Darn Hard?
     • Oodles and oodles of complex features
       • Many protocols
       • Many mechanisms
       • Many configurable parameters
     • Little guidance for network administrators
       • How to select and compose features?
       • How to set the configurable parameters?
     • Managing boxes, rather than networks
       • Routers, switches, firewalls, IDSes, servers, etc.
       • Low-level, box-specific configuration languages

  3. The Enemy is Complexity
     • Goal: raising the level of abstraction
       • Network-level design and configuration
       • Composition of protocols and mechanisms
     • Idea #1: add abstraction on top
       • Compile high-level spec into box configuration
       • But, must grapple with inherent complexity
     • Idea #2: design system for manageability
       • Identify network-level abstractions
       • … and change the boxes and protocols
       • But, must grapple with backwards compatibility

  4. Example: Border Gateway Protocol
     • ASes exchange reachability information
       • IP prefix: block of destination IP addresses
       • AS path: sequence of ASes along the path
     • Configurable routing policies
       • Path selection (which route to use?)
       • Path export (who to tell about the route?)
     [Figure: ASes 88, 1, and 7018; route announcements “12.34.158.0/24: path (88)” and “12.34.158.0/24: path (7018,1,88)”; data traffic toward 12.34.158.5]
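The announcements in the figure can be reproduced with a minimal sketch of AS-path prepending. The `Route` class and the `announce` and `accepts` helpers are hypothetical names introduced here for illustration; they are not from the talk and omit everything else a BGP speaker does.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    prefix: str      # block of destination IP addresses
    as_path: tuple   # ASes the announcement has traversed, nearest first


def announce(route, local_as):
    """Return the route as local_as would announce it: own AS number prepended."""
    return Route(route.prefix, (local_as,) + route.as_path)


def accepts(route, local_as):
    """An AS ignores announcements whose path already contains it (loop check)."""
    return local_as not in route.as_path


# AS 88 originates the prefix; AS 1 and then AS 7018 pass it along.
from_88 = announce(Route("12.34.158.0/24", ()), 88)
from_1 = announce(from_88, 1)
from_7018 = announce(from_1, 7018)

print(from_88.as_path)        # (88,)
print(from_7018.as_path)      # (7018, 1, 88)
print(accepts(from_7018, 1))  # False: AS 1 is already on the path
print(ip_address("12.34.158.5") in ip_network(from_7018.prefix))  # True
```

Data traffic for 12.34.158.5 follows the AS path in reverse order of the announcements: 7018, then 1, then 88.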

  5. Some Things I Hate About BGP…
     (Slide callouts: “Too distributed”, “Too indirect”)
     • Routers in an AS have different views
       • Effect: protocol oscillation and loops
       • Point fix: testing sufficient conditions
     • Routing policy distributed across routers
       • Effect: routers need to share information
       • Point fix: complex “tagging” of BGP routes
     • Policy has only an indirect effect on traffic
       • Effect: selecting the right policy is hard
       • Point fix: “what if” tools for traffic engineering
     • BGP route selection depends on the IGP
       • Effect: disruptions from small internal changes
       • Point fix: “what if” tools to identify risks

  6. Interdomain Routing: Design for Manageability
     • Routing Control Platform
       • Represents the AS to others
       • Has complete view of candidate routes
       • Computes answers for the AS’s routers
     • Communicates with other ASes
       • Using BGP or (ideally) a brand new protocol
     [Figure: AS 1, AS 2, and AS 3, each with an RCP; labels: Inter-AS Protocol, Physical peering]
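A rough sketch of what “has complete view of candidate routes” and “computes answers for the AS’s routers” could look like, assuming the RCP knows every candidate route for a prefix and the IGP cost from each router to each egress point. The ranking used here (local preference, then AS-path length, then IGP cost, then egress name) is a simplified stand-in for the full BGP decision process, and the router names, costs, and `compute_answers` function are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    egress: str       # border router where the route was learned
    local_pref: int   # higher is preferred
    as_path: tuple    # shorter is preferred


def compute_answers(candidates, igp_cost):
    """Pick one route per router, using that router's own IGP costs.

    igp_cost maps router -> {egress router -> IGP cost}.
    """
    answers = {}
    for router, cost_to in igp_cost.items():
        answers[router] = min(
            candidates,
            key=lambda c: (-c.local_pref, len(c.as_path), cost_to[c.egress], c.egress),
        )
    return answers


# Two equally good routes for one prefix, learned at different border routers.
candidates = [
    Candidate("nyc", 100, (7018, 88)),
    Candidate("sfo", 100, (1, 88)),
]
igp_cost = {
    "nyc": {"nyc": 0, "sfo": 40},
    "chi": {"nyc": 10, "sfo": 30},
    "sfo": {"nyc": 40, "sfo": 0},
}

answers = compute_answers(candidates, igp_cost)
for router, route in sorted(answers.items()):
    print(router, "->", route.egress)
```

Because one component sees every candidate and every router's costs, the per-router answers come from a single computation rather than from each router's partial view.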

  7. Advantages of RCP Approach
     • Lower management complexity
       • Complete, network-wide view
       • Direct control over the routers
       • Single specification of policies and objectives
     • Simpler routers
       • Much less control-plane software
       • Much less configuration state
     • Enabling innovation
       • New algorithms for selecting paths within an AS
       • New approaches to inter-AS routing

  8. Deployability: Backwards Compatibility using BGP
     • Border Gateway Protocol (BGP)
       • Protocol: messages sent between routers
       • Decision logic: route-selection process
       • Policy: configurable rules for path selection/export
     • The key point is that BGP has
       • Complex decision logic and policies
       • Yet a simple protocol (and message format)
     • Use BGP messages to “program” the routers

  9. Phase 1: Flexible Path Selection in One AS
     Before: conventional use of BGP in backbone network
       [Figure labels: eBGP, iBGP]
     After: RCP learns routes and sends answers to routers
       [Figure labels: eBGP, RCP, iBGP]

  10. Phase 2: AS-Wide Path Selection and Export
      Before: RCP gets “best” iBGP routes (and IGP feed)
        [Figure labels: eBGP, RCP, iBGP]
      After: RCP gets all eBGP routes from neighbors
        [Figure labels: eBGP, RCP, iBGP]

  11. Phase 3: Direct Communication Between RCPs
      Before: RCP gets all eBGP routes from neighbors
        [Figure labels: eBGP, RCP, iBGP]
      After: ASes exchange routes via RCP
        [Figure: AS 1, AS 2, and AS 3, each with an RCP; labels: Inter-AS Protocol, iBGP, Physical peering]

  12. Systems Considerations (NSDI’05)
      • Reliability
        • Problem: single point of failure
        • Solution: replication of RCP components
      • Consistency
        • Problem: inconsistent decisions by replicas
        • Solution: consistency without inter-replica protocol
      • Scalability
        • Problem: storing and computing for all routers
        • Solution: store each route once and amortize work
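One way to read “store each route once” is that per-router tables hold references to shared route objects rather than private copies. The slides do not describe the actual data structures, so the `RouteStore` class below is purely an assumption used to illustrate the idea.

```python
class RouteStore:
    """Keeps a single shared copy of each distinct route (illustrative only)."""

    def __init__(self):
        self._routes = {}     # route tuple -> the one shared instance
        self.per_router = {}  # router -> {prefix: shared route}

    def intern(self, prefix, as_path, egress):
        """Return the shared instance for this route, creating it if new."""
        key = (prefix, tuple(as_path), egress)
        return self._routes.setdefault(key, key)

    def assign(self, router, route):
        """Record that `router` uses `route` for the route's prefix."""
        self.per_router.setdefault(router, {})[route[0]] = route

    def unique_routes(self):
        return len(self._routes)


store = RouteStore()
for router in ("r1", "r2", "r3"):
    # Each router ends up with the same route; only one copy is stored.
    route = store.intern("12.34.158.0/24", [7018, 1, 88], "nyc")
    store.assign(router, route)

print(store.unique_routes())  # 1
print(store.per_router["r1"]["12.34.158.0/24"] is store.per_router["r3"]["12.34.158.0/24"])  # True
```

Memory then grows with the number of distinct routes plus one reference per router, instead of with routes multiplied by routers.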

  13. Example Network Management Applications
      • Customer-driven route selection
        • Customized load-balancing policies
        • Geographic rules for route selection
      • Blocking denial-of-service attacks
        • “Blackhole” routes that drop traffic
        • Only for routers carrying attack traffic
      • Hitless maintenance
        • Move traffic away from certain routers
        • Before the operators bring down the routers
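The denial-of-service item can be sketched as a per-router answer: routers that carry attack traffic receive a blackhole route for the victim prefix, and all others keep their normal next hop. The `BLACKHOLE` marker, the `answers_for_prefix` function, and the router names are hypothetical, chosen only to illustrate the selective assignment.

```python
BLACKHOLE = "blackhole"  # hypothetical marker meaning "drop matching traffic"


def answers_for_prefix(normal_next_hop, attack_routers):
    """Per-router next hop for the victim prefix.

    normal_next_hop: router -> usual next hop for the prefix
    attack_routers:  routers observed carrying attack traffic
    """
    return {
        router: BLACKHOLE if router in attack_routers else next_hop
        for router, next_hop in normal_next_hop.items()
    }


normal = {"r1": "r3", "r2": "r3", "r3": "customer"}
answers = answers_for_prefix(normal, attack_routers={"r2"})
print(answers)
```

Only `r2` drops traffic for the prefix here; legitimate traffic entering at `r1` still reaches the customer.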

  14. Conclusion
      • Network management is too hard
        • IP was not designed for management
        • Complex, distributed operation of routers
      • Must reduce complexity
        • Network-wide views and objectives
        • Direct control over the data plane
      • RCP approach is feasible
        • Deployable, scalable, and reliable
        • Solves important management problems
      • Many interesting open problems

  15. Backup Slides

  16. Routing Control Platform (RCP)
      [Figure: RCP components: Route Control Server (RCS), OSPF Viewer, BGP Engine; labels: Options, Answers, Topology, BGP updates, OSPF link-state advertisements, Network]

  17. Scalability: Standard Computing Platform
      • Prototype on a high-end PC
        • 3.2 GHz Pentium-4 with 8 GB of RAM
        • Running the Linux 2.6.5 kernel
      • Workload from the AT&T backbone
        • Replay the BGP and OSPF messages
      • Good RCP performance
        • Memory usage: less than 2 GB
        • Speed, BGP changes: less than 40 msec
        • Speed, topology changes: 0.1-0.8 seconds
      Short answer: the system can keep up

  18. Reliability: Replication and Consistency
      • Replication: avoid single point of failure
        • Multiple RCPs in a network
        • Connected at different places
      • Consistency: no explicit coordination
        • Replica has full view of each partition
        • Replicas perform the same algorithm on the same data, and get the same answer
      [Figure labels: “A, B”, “A”, “B”, RCP A, RCP B]
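The consistency argument rests on determinism: if every replica applies the same selection function to the same set of routes, the replicas agree without exchanging messages. Below is a minimal sketch, assuming a made-up ranking (shortest AS path, then egress name as a tie-breaker); it does not model network partitions.

```python
def decide(routes):
    """Deterministic selection over (egress, as_path) pairs.

    The key defines a total order, so the result depends only on the
    set of routes, not on the order in which a replica learned them.
    """
    return min(routes, key=lambda r: (len(r[1]), r[0]))


routes = [("B", (1, 88)), ("A", (7018, 88)), ("C", (7018, 1, 88))]

replica_a = decide(routes)                  # learned in one order
replica_b = decide(list(reversed(routes)))  # learned in the opposite order

print(replica_a)               # ('A', (7018, 88))
print(replica_a == replica_b)  # True
```

The tie-breaker matters: without it, two routes of equal length could be ranked by arrival order, and replicas that learned them in different orders could disagree.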
