Why Network Management is a Failure
Craig Labovitz

Why Network Management is a Failure
• Arbor Background
• Focus on tier-1/tier-2 ISPs and large enterprises/hosting
• Commercial perspective on network management
• (though Arbor is not a network management company)

Why Network Management has Failed
• Not due to abstraction
• Not due to 3d, 4d versus 5d
• Not technology (though that is also a major problem)
• Public relations
• Market does not value network management (e.g. Concord 100M versus NetScreen)
• No provider/enterprise budgets for management
• ISP RFPs do not prioritize management (linecards, port density)
• Analysts hostile
• Asset management, SNMP not sexy. Plus you're supposed to have this already, right?

Irony of Network Management
• Arbor provides traffic engineering, asset tracking, market analysis, detection of failures (both security)
• But we're NOT a network management company
• (analyst story)
• We're a security company…

Irony of Network Management
• Despite no market demand, analyst recommendations or market valuation, there is a gaping need for improvement
• Most outages are not due to attacks, script kiddies, backhoes or BGP problems (though those are much more fun to talk about)
• Upgrades
• Configuration management
• Four weeks around new year incredibly stable
• Internal studies/FTCS paper

State of Network Management
• Few providers have any idea what their network normally looks like
• Routing policies (regularly get into arguments)
• Traffic distributions
• Security policies (ask to turn on/off new firewall rule)
• Configuration
• Highly entrenched scripts and staff with vested interest in complexity
• How do you integrate with our OSS-<provider-name>?

Why is Network Management so Poor?
• Networks evolve rapidly
• Flood of information
• But really because tools and infrastructure support are poor. Why?
• No one cares
• Not in ISP vendor RFPs
• Large hosting interesting peering device choice
• Even simple things are surprisingly difficult
• SNMP is many, many things but not simple
• SNMP ifIndex polling vagaries between vendors, and close to completely broken on new core router
• Story about disabling CPU polling
• Most management geared towards devices and not services
• And heh, getting the data (Netflow, Cflowd)
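
Note on the ifIndex bullet above: on many devices an interface's ifIndex can be renumbered after a reboot or hardware change, so counters joined on ifIndex between two polls may be compared against the wrong interface. A minimal sketch of one workaround, joining samples on interface name instead. It assumes the polled values are already in hand; `Sample`, `rekey_by_name` and `octet_deltas` are hypothetical names for illustration, and no real SNMP library is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One polled row (hypothetical shape): index, name, input octet counter."""
    if_index: int
    if_name: str
    in_octets: int


def rekey_by_name(samples):
    """Index samples by interface name, ignoring the unstable ifIndex."""
    return {s.if_name: s for s in samples}


def octet_deltas(previous, current):
    """Return per-interface octet deltas between two polls, joined on name."""
    prev = rekey_by_name(previous)
    deltas = {}
    for name, cur in rekey_by_name(current).items():
        old = prev.get(name)
        if old is None:
            continue  # interface is new since the last poll: no baseline yet
        delta = cur.in_octets - old.in_octets
        if delta < 0:
            continue  # counter reset or wrap: skip rather than report garbage
        deltas[name] = delta
    return deltas


# Illustrative data: the device renumbered its interfaces between polls.
before = [Sample(1, "ge-0/0/0", 1000), Sample(2, "ge-0/0/1", 5000)]
after = [
    Sample(7, "ge-0/0/0", 1600),
    Sample(8, "ge-0/0/1", 5200),
    Sample(9, "ge-0/0/2", 10),
]
result = octet_deltas(before, after)
print(result)
```

Joining on name is itself an assumption: it only helps where the vendor keeps interface names stable.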

Too Much Information, and of Too Low Quality
• What do you do with 10,000 MRTG graphs showing abnormal traffic?
• IDS -> IPS -> SEMs -> NBADs
• Market moving towards holistic view of network events (easier to start from the whole and figure out the pieces than the reverse)
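
Note on the 10,000-graphs question above: one possible way to start from the whole is to score every link against its own recent history and look only at the top of the ranking. A minimal sketch under that assumption. The z-score method, the function names and the sample data are illustrative choices, not something the talk prescribes.

```python
from statistics import mean, pstdev


def anomaly_score(history, latest):
    """Distance of the latest value from the historical mean, in std devs."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is maximally unusual.
        return 0.0 if latest == mu else float("inf")
    return abs(latest - mu) / sigma


def top_anomalies(series, n=3):
    """Rank links by how far their newest sample departs from their own past."""
    scored = [
        (anomaly_score(values[:-1], values[-1]), name)
        for name, values in series.items()
        if len(values) >= 3  # need some history before scoring
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:n]]


# Illustrative traffic samples per link; the last value is the newest poll.
series = {
    "pop1-ge0": [100, 102, 98, 101, 99, 100],
    "pop2-ge3": [50, 52, 48, 51, 49, 400],
    "core-xe1": [900, 910, 890, 905, 895, 1200],
}
ranked = top_anomalies(series, n=2)
print(ranked)
```

Scoring each link against itself means a small customer link that spikes can outrank a busy core link with a proportionally smaller change.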

The Good and the Bad
• Some things getting better
• Cookie-cutter network design and division of functionality (MPLS core, peering/customer PoP)
• Increasing focus on operations and management more visible part of RFPs
• Some things getting worse
• Loss of visibility (traceroute, ping in MPLS/OAM)
• Hidden failures. Determining causality is really, really hard (e.g. recent DNS failure)