Managing your network for availability

Eric Severson – CCNP, CCDP, MCSE
Network Specialties, Inc.
eric@network-specialties.com
(817) 491-0267

Agenda • Network Design for availability • Definition of a Managed Network • Basic tools used to manage a network • Discussion

Availability * Unscheduled downtime

Design for Availability
Availability of a Single Component
Availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair
Example:
MTBF = 120,000 hr
MTTR = 4 hr
Availability = 0.999967 = 99.9967%
Annual downtime = 17.5 minutes
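The single-component example can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and a year is assumed to be 8,760 hours (365 days).

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 8,760 hours, leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Expected unscheduled downtime per year, in minutes."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


a = availability(120_000, 4)
print(f"Availability: {a:.4%}")                                        # 99.9967%
print(f"Annual downtime: {annual_downtime_minutes(a):.1f} minutes")    # 17.5 minutes
```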

Availability – Multiple Components
Multiple Components Availability = Avail(component 1) x Avail(component 2) x … x Avail(component n)
Components in series: ISP, router, firewall, switch, server
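The product rule for components in series might be sketched as follows. The per-component availabilities are made-up placeholders, not figures from the presentation; substitute measured or vendor-supplied values.

```python
from math import prod


def series_availability(availabilities) -> float:
    """Availability of a chain where every component must be up."""
    return prod(availabilities)


# Hypothetical availabilities for illustration only.
chain = {
    "ISP": 0.9990,
    "router": 0.9999,
    "firewall": 0.9999,
    "switch": 0.9999,
    "server": 0.9995,
}

total = series_availability(chain.values())
print(f"End-to-end availability: {total:.4%}")
```

Note that the result is always lower than the weakest single component, which is why long chains of non-redundant devices drag availability down.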

Availability – Server System

Availability – Multiple Components

Availability – Other Components • What about A/C power availability? • What about software errors – IOS bugs, application code errors, bad patches or antivirus updates that cause outages? • How about the human fat-finger?

Availability – Power/Software added

Availability – How Can you Improve? • Add redundancy • Reduce repair time • Manage your network…

Availability – With Redundancy
Parallel availability: take the same product of availabilities, but substitute 1 - ((1 - availability) x (1 - availability)) for each component that has been made redundant.
Diagram: two parallel chains of ISP, router, firewall, switch, server
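A short sketch of the redundancy calculation follows. It assumes the redundant copies fail independently, and it generalizes the two-copy formula to any number of copies, which is an extension beyond the slide. All availability values are hypothetical.

```python
from math import prod


def parallel_availability(avail: float, copies: int = 2) -> float:
    """1 - (1 - availability)^copies; assumes independent failures."""
    return 1.0 - (1.0 - avail) ** copies


def chain_availability(components) -> float:
    """components: iterable of (availability, number_of_copies) pairs in series."""
    return prod(parallel_availability(a, n) for a, n in components)


print(f"{parallel_availability(0.99):.4f}")  # 0.9999

# Hypothetical chain: only the firewall and the server have been made redundant.
design = [(0.999, 1), (0.9999, 1), (0.9999, 2), (0.9999, 1), (0.9995, 2)]
print(f"With redundancy: {chain_availability(design):.4%}")
```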

What is a managed network?

Managed Network Characteristics • Systems must be managed towards a common goal • Network must be secure • Infrastructure is thoroughly documented • Equipment must be manageable • Enterprise synchronized to a common time source

Managed Network Characteristics • Logging • SNMP trapping • SNMP polling • Vendor specific alerting – e.g. Dell iDRAC • Application monitoring • Personnel trained on equipment and management systems • Network Management System

Why do we want a managed network? • To achieve the availability that was designed into the system • Downtime is costly!

Equipment is Manageable • Enterprise grade hardware • Configurable • Supports industry standards • Evolves to support new standards/features • Redundancy available if design demands it • Remotely accessible (SSH, HTTP, Telnet, SNMP)

Comprehensive Documentation • Organized repository (online/offline) • “First Responder” documents • Network diagrams - logical and physical • Network device lists • Circuits lists • Applications/firewall rules • Contact lists – IT/vendors/support/site • Policies/procedures/service level agreements • Business continuity/disaster recovery plan

Enterprise synchronized to a standard time • Hierarchical design • NTP (Network Time Protocol) is used • Real-time clock or approved Internet source • All network hardware must synchronize • All active systems (Windows, UNIX and proprietary platforms) must synchronize

Equipment must be maintained • Vendor hardware maintenance • Vendor software maintenance • Hot/cold spares • Periodic patches to fix software/hardware issues • Upgrades to add new features • Configuration management • Change control • Life cycle planning

Logging • Syslog server for accepting logged events • Windows/UNIX Event logging • Logging properly configured on all systems • Systems in place to interpret log events • Predetermined/prescribed actions for log events • Out-of-band alerting for actionable events
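As one way to interpret log events, the sketch below decodes the priority value at the start of a syslog message (priority = facility x 8 + severity, with severity 0 the most urgent) and flags messages that warrant a predetermined action. The sample message, the hostname `core-sw1`, and the severity cutoff of 3 are assumptions for illustration.

```python
import re

SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_priority(line: str) -> tuple[int, int]:
    """Return (facility, severity) decoded from the leading <PRI> field."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("no syslog priority field found")
    return divmod(int(match.group(1)), 8)


def is_actionable(line: str, max_severity: int = 3) -> bool:
    """True when severity is at or above the cutoff (lower number = more severe)."""
    _, severity = parse_priority(line)
    return severity <= max_severity


# Hypothetical message; 187 = facility 23 (local7) * 8 + severity 3 (error).
sample = "<187>Oct 11 22:14:15 core-sw1 %LINK-3-UPDOWN: Interface Gi0/1, changed state to down"
facility, severity = parse_priority(sample)
print(facility, SEVERITIES[severity], is_actionable(sample))  # 23 error True
```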

SNMP Trapping • SNMP (Simple Network Management Protocol) • NMS to accept SNMP messages • Devices configured to send SNMP messages when events occur • Systems in place to interpret SNMP events • Predetermined/prescribed actions for SNMP events • Out-of-band alerting for actionable events • Operational guidelines for responding to events

SNMP Polling • SNMP server configured to proactively retrieve operational/performance data • NMS in place to interpret SNMP events • Prescribed actions for SNMP events • Provide detailed metrics on hardware/software systems • Out-of-band alerting for actionable events • Operational guidelines for responding to events

Application Monitoring • Specific TCP/UDP ports are checked for proper response - e.g. HTTP, SSL, SMTP, DNS, etc • Synthetic transactions are issued – e.g. a query against a web site/database system • Out-of-band alerting for actionable events • Operational guidelines for responding to events
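The two kinds of check described above, a port probe and a synthetic transaction, could look roughly like this using only the Python standard library. The function names, timeouts, and the idea of matching an expected string in the response body are illustrative assumptions; a production NMS would add scheduling, retries, and alerting.

```python
import socket
from urllib.request import urlopen


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Port check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, expected_text: str = "", timeout: float = 5.0) -> bool:
    """Synthetic transaction: fetch a page and confirm status 200 plus expected content."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected_text in body
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False
```

For example, `tcp_port_open("mail.example.com", 25)` would probe SMTP and `http_check("http://intranet.example.com/", "Welcome")` would verify a page renders; both hostnames are placeholders.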

Trained Personnel • Network design • LAN configuration, operation & troubleshooting • WAN configuration, operation & troubleshooting • Windows Active Directory/networking operations • Vendor specific tools • Generic tools

Systems must be managed towards a common goal • Availability should be specified • Expectations should be explained to customers • Customer expectations should be met • Network metrics should be developed and publicized

Network must be secure • Only authorized access is allowed • Network equipment must be in secure areas • Network equipment must be hardened • AAA (Authentication, Authorization and Accounting) should be in place • Network design should support the security paradigms

Logging • Syslog is native to Unix/Linux • Kiwi Syslog is a free Windows program • Syslog can be a part of a network management software package • Windows event logs can be retrieved by NMS or other application • Define how syslog will be used

SNMP Polling/Trapping • Define what you want to track and thresholds for actionable items • SNMP community strings defined on each device/host • SNMP polling and trapping is configured on NMS • Define actions (NMS and human) should an actionable state occur
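Defining what to track and the thresholds for actionable items can be captured as data before any NMS is configured. The sketch below is one possible shape: the metric names and threshold values are hypothetical, and the polled samples are hard-coded rather than retrieved over SNMP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str       # hypothetical metric name, not a real SNMP OID
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Map a polled value to 'ok', 'warning', or 'critical'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def actionable(samples: dict[str, float], thresholds: list[Threshold]) -> list[tuple[str, str]]:
    """Return (metric, state) for every polled metric that is not 'ok'."""
    results = []
    for t in thresholds:
        if t.metric in samples:
            state = classify(samples[t.metric], t)
            if state != "ok":
                results.append((t.metric, state))
    return results


THRESHOLDS = [
    Threshold("cpu_percent", warning=80, critical=95),
    Threshold("link_utilization_percent", warning=70, critical=90),
]
polled = {"cpu_percent": 97.0, "link_utilization_percent": 42.0}  # made-up samples
print(actionable(polled, THRESHOLDS))  # [('cpu_percent', 'critical')]
```

Each non-"ok" result would then map to a documented action, whether automated by the NMS or carried out by a person.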

How to Build a Managed Network • Document existing infrastructure • Set up logging host • Configure all devices/hosts for logging & SNMP • Set up Network Management Station • Configure logging, polling and traps • Document specific actions for events

No-Cost Systems • Use the tools that vendors provide free • Syslog - Linux or Kiwi Syslog • NMS – Nagios, OpenNMS, Zenoss, Pandora, Groundwork, Hyperic, NetXMS • Configuration management • Kiwi CatTools - routers, switches and firewalls • Scripting – Perl/Tcl/Expect/WMI

Low-Cost Systems • WhatsUp Gold • PRTG • GFI Network Monitor

Enterprise Systems • HP OpenView • SolarWinds Orion • CA eHealth • IBM Tivoli • EMC • CiscoWorks • Cisco MARS

Next Steps • Develop strategy • Develop short-term tactical plan to rapidly move towards a more manageable network

Further Information • Comparison of network monitoring systems - http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems • Popular Network Management Software in Comparison - http://ipinfo.info/html/network_management_software.php

Eric Severson Network Specialties, Inc. eric@network-specialties.com 817-491-0267
