NOC Services and Applications AFNOG 2002 Brian Longwe

Learn about the Network Operations Centre (NOC) and its role in monitoring and managing a service provider's network, including fault management, configuration management, performance management, security management, and accounting management.

Presentation Transcript


  1. NOC Services and Applications
     AFNOG 2002, Brian Longwe
     *Some slides based on the netmgt talks in NTW T2-99 by Abha Ahuja and NTW T4-98 by Scott Bradner

  2. What is a NOC?
     Network Operations Centre
     • Monitors and manages a service provider’s network
     • Fault monitoring and management
     • Network status and operational statistics
     • Information about current, historical and planned availability of systems
     • Engineers can coordinate their work through the NOC

  3. Network Management - What is it?
     “In order to operate a reliable service, the network must be managed according to a determined discipline, using a coherent structure of information management.”
     Geoff Huston, ISP Survival Guide

  4. Network Management - Components
     Parts of Network Management
     • Fault management
     • Configuration/Change management
     • Performance management
     • Security management
     • Accounting management

  5. Fault Management
     • Identify the fault
       • Regular polling of network elements
     • Isolate the fault
       • Diagnosis of the network components
     • Respond to the fault
       • Allocate resources to resolve the fault
       • Priority scheduling
       • Technical/management escalation
     • Resolve the fault
       • Notification

  6. Fault Management - Systems
     • Reporting mechanism
     • Link to NOC
     • Notify on-call personnel
     • Set up & control alarm procedures
     • Repair/recovery procedures
     • Ticket system

  7. Fault Management - Fault Detection
     Who notices a problem with the network?
     • Network Operations Center with 24x7 operations staff
       • Open trouble ticket to track problem
       • Preliminary troubleshooting
       • Assign engineer to problem or escalate ticket status
     • Customer call
     • Other ISPs

  8. Fault Management - Fault Detection (cont.)
     How can you tell if there is a problem with the network?
     • Network monitoring tools
       • Common utilities: ping, traceroute, SNMP
       • Monitoring systems: NOCOL, Big Brother, NetSaint, NMIS, HP OpenView, etc.
     • Report state or unreachability
       • Detect node down
       • Routing problems
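
Slides 5 and 8 describe regular polling with common utilities such as ping to detect a node going down. The following is a minimal Python sketch of that loop, not the code of any of the monitoring systems listed. It assumes a Unix-like `ping` that accepts `-c 1` (send one packet; Windows uses `-n` instead), and the function names are hypothetical. Only state changes are reported, so on-call staff are notified once when a node goes down and once when it recovers, rather than on every poll.

```python
import subprocess


def ping_once(host: str, timeout_s: float = 5.0) -> bool:
    """Send one ICMP echo request with the system ping; True if it answered.

    Assumes a Unix-like ping where "-c 1" means "send one packet".
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def poll_round(hosts) -> dict:
    """Ping every host once and return {host: is_up}."""
    return {host: ping_once(host) for host in hosts}


def state_changes(previous: dict, current: dict) -> list:
    """Compare two polling rounds and report only the transitions."""
    events = []
    for host, is_up in sorted(current.items()):
        was_up = previous.get(host, True)  # hosts not seen before count as up
        if was_up and not is_up:
            events.append(f"DOWN {host}")
        elif not was_up and is_up:
            events.append(f"UP {host}")
    return events
```

In a real loop, `poll_round` would run every few minutes against hosts such as gw-bb.ws.afnog.org, and each event returned by `state_changes` would be handed to the paging or ticket system.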

  9. Exercise: Big Brother
     Download the Big Brother source from http://t2-noc.ws.afnog.org/downloads.htm
     Follow the instructions at http://t2-noc.ws.afnog.org/bigbrother-setup-notes.txt
     Set up bb-hosts to monitor:
       80.248.70.x      tableA.t2.ws.afnog.org
       80.248.72.192    t2-noc.ws.afnog.org   # smtp ssh BBDISPLAY BBPAGER BBNET
       80.248.72.254    noc.ws.afnog.org      # smtp ssh http dns
       page backbone Backbone Routers
       group-compress <H3><I>Backbone Routers</I></H3>
       80.248.72.254    gw-bb.ws.afnog.org    # testip
       80.248.72.251    t1-gw.ws.afnog.org    # testip
       80.248.72.252    t2-gw.ws.afnog.org    # testip
       80.248.72.253    t3-gw.ws.afnog.org    # testip
       page routers T2 Routers
       group <H3><I>T2 Routers</I></H3>
       80.248.70.1      Table A               # testip noconn
       80.248.70.2      Table B               # testip noconn
       80.248.70.3      Table C               # testip noconn
       etc...
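
To check what a bb-hosts file like the one in the exercise will actually monitor, a small parser helps. This Python sketch assumes only the layout shown on the slide (`IP NAME # TESTS`, plus the `page`, `group` and `group-compress` directive lines). It is not Big Brother's own parser, and any other directives the real tool supports are not handled here.

```python
def parse_bb_hosts(text: str) -> list:
    """Return one dict per host line: {"ip", "name", "tests"}.

    Directive lines seen on the slide, blank lines and comment-only lines
    are skipped. Names may contain spaces ("Table A"), so everything between
    the IP address and the "#" is kept as the name.
    """
    directives = {"page", "group", "group-compress"}
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.split()[0] in directives:
            continue
        host_part, _, test_part = line.partition("#")
        fields = host_part.split()
        if len(fields) < 2:
            continue  # not a host line
        hosts.append({
            "ip": fields[0],
            "name": " ".join(fields[1:]),
            "tests": test_part.split(),  # e.g. ["smtp", "ssh", "http", "dns"]
        })
    return hosts
```

Printing `[h["name"] for h in parse_bb_hosts(text) if "testip" in h["tests"]]` is a quick way to confirm which routers are tested by IP address rather than by name.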

  10. Fault Management - Ticket System
      • Very important!
      • Need a mechanism to track:
        • Failures
        • Current status of outage
        • Carrier tickets

  11. Fault Management - Ticket System
      The system provides for:
      • Short-term memory & communication
      • Scheduling and work assignment
      • Referrals and dispatching
      • Oversight
      • Statistical analysis
      • Long-term accountability

  12. Fault Management - Ticket Usage
      • Create a ticket on ALL calls
      • Create a ticket on ALL problems
      • Create a ticket for ALL scheduled events
      • Copy of ticket mailed to reporter and mailing list(s)
      • All milestones in resolution of problem maintain the same ticket #
      • Ticket stays "open" until problem resolved according to problem reporter
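
The usage rules above map naturally onto a small data model. The following is a hypothetical Python sketch, not WebTTS or WebRT code: the class, its fields and the sequential numbering are assumptions chosen to show the rules that every milestone stays on the same ticket number, copies go to the reporter and a mailing list, and only the reporter's confirmation closes the ticket.

```python
from dataclasses import dataclass, field
from itertools import count

_ticket_numbers = count(1)  # hypothetical sequential ticket numbering


@dataclass
class Ticket:
    """A minimal trouble ticket following the usage rules on the slide."""
    reporter: str
    subject: str
    number: int = field(default_factory=lambda: next(_ticket_numbers))
    log: list = field(default_factory=list)
    status: str = "open"

    def add_milestone(self, note: str) -> None:
        # Every milestone is logged against the SAME ticket number.
        self.log.append(note)

    def resolve(self, confirmed_by: str) -> bool:
        # The ticket stays open until the original reporter confirms.
        if confirmed_by != self.reporter:
            return False
        self.status = "resolved"
        return True

    def notification(self, mailing_list: str) -> dict:
        # Copy of the ticket for the reporter and the NOC mailing list.
        return {
            "to": [self.reporter, mailing_list],
            "subject": f"[#{self.number}] {self.subject} ({self.status})",
            "body": "\n".join(self.log),
        }
```

Keeping the ticket number in the subject line is what lets replies by email be threaded back onto the same ticket.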

  13. Fault Management - Ticket Example
      Sample opening ticket
        Subject:            Fix sshd on T1 instructor machines
        Serial Number:      6
        Area:               none
        Queue:              afnog-noc
        Requestors:         B.Candler@pobox.com
        Owner:              inst
        Status:             resolved
        Last User Contact:  Mon May 7 17:02:21 2001 (30 hr ago)
        Current Priority:   1
        Final Priority:     1
        Due:                No date assigned
        Last Action:        Mon May 7 17:02:21 2001 (30 hr ago)
        Created:            Sat May 5 17:08:08 2001 (3 day ago)

  14. Fault Management - Ticket Example
      Sample progress ticket
        TT0000033975 has been MODIFIED.
        Here are the fields that have been changed:
        CopyOfTime : 5
        TTC Temp : 0
        Ticket information log :
        toppi@umich.edu said ...
        While I was investigating this, Debbie from UUNet called (via Merit main number) to tell us they were seeing it down. She can be reached at xxx-xxxx. The UUNet ticket is xxxxx..

  15. Fault Management - Ticket Example
      Sample closing ticket
      • Includes previous ticket contents plus resolution
        Users on the laptop station minihub are not getting correct DHCP responses. No gateway or DNS entries are returned.
        Thanks,
        - Hervey
        -- CUSTOMER INFORMATION ---------------------
        'inst' (AFNOG Instructors)
        -------------------------------------
        There have been several issues. First, the Cisco config-switch was set so the box would forget it's config on a power cycle (and we've had a few). Second, I made a typo when I cleaned up a DNS file. Things *should* be working now (famous last words). Resolving this till I hear otherwise.
        GJ
        ----------------------------------------------------------------
        >otherwise.
        >GJ
        Many thanks!
        - Hervey

  16. Exercise: Ticket System
      • Download the WebTTS source from http://t2-noc.ws.afnog.org/downloads.htm
      • Follow the instructions at http://t2-noc.ws.afnog.org/webtts-setup-notes.txt
      • Create 2-3 users within the ticket system
      • Create tickets to track network occurrences as they occur; network failures will be provided ;-)

  17. Fault Management - Typical Failures
      • Node unpingable
        • No IP connectivity to router
      • Possible reasons:
        • Serial link down: call telco
        • Router down/hardware problem: call engineer
        • Routing problem: troubleshoot with traceroute, routeviews machine
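
Troubleshooting with traceroute usually means finding the last hop that still answers, since the fault is likely at or just beyond it. A Python sketch of that step, assuming traceroute output in the numbered-hop layout shown on slide 45, where a hop whose probes all time out prints only `*`. This is a heuristic, because some routers never answer traceroute probes even when forwarding works.

```python
import re

# A hop line starts with its hop number, e.g. "  3 dca5-core1... 4 msec".
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")


def last_responding_hop(traceroute_output: str):
    """Return (hop_number, hop_text) for the last hop that answered, or None.

    Hops whose probes all timed out ("*  *  *") are treated as silent.
    Header lines such as "Tracing the route to ..." do not start with a
    number and are ignored.
    """
    last = None
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue
        hop, rest = int(match.group(1)), match.group(2).strip()
        if rest and set(rest.split()) != {"*"}:
            last = (hop, rest)
    return last
```

A hop with a single lost probe (for example `8 msec * 12 msec`) still counts as responding, because only all-timeout hops are treated as silent.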

  18. Performance Management
      A consistent level of network performance
      • Data collection
        • Interface stats
        • Throughput
        • Error rates
        • Usage
        • Percent availability
      • Data analysis for performance metrics and trends
      • Establishment of performance thresholds
      • Capacity planning and deployment
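
Percent availability and threshold checks are simple arithmetic on the collected data. A Python sketch, in which the metric names and threshold values are hypothetical examples rather than recommended figures: for instance, 2592 seconds (43.2 minutes) of downtime in a 30-day month works out to 99.9% availability.

```python
def percent_availability(downtime_s: float, period_s: float) -> float:
    """Availability over a measurement period, as a percentage."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 100.0 * (period_s - downtime_s) / period_s


def threshold_alerts(samples: dict, thresholds: dict) -> list:
    """Names of metrics whose latest sample exceeds its configured threshold.

    Metrics without a threshold are ignored.
    """
    return sorted(
        name for name, value in samples.items()
        if name in thresholds and value > thresholds[name]
    )


# Hypothetical thresholds, e.g. start capacity planning above 80% utilisation.
EXAMPLE_THRESHOLDS = {"link_util_pct": 80.0, "error_rate_pct": 1.0}
```

Tracking which thresholds are crossed over weeks, not just today, is what turns these checks into input for capacity planning.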

  19. Importance of Network Statistics
      • Accounting
      • Troubleshooting
      • Long-term trend analysis
      • Capacity planning
      • Two different types
        • Active measurement
        • Passive measurement
      • Management tools have statistical functionality

  20. Performance Management Tools
      • NetFlow
        • cflowd (http://www.caida.org/Tools/Cflowd)
        • Collects flow information from Cisco routers
        • AS-to-AS information
        • Source and destination IP and port information
      • Useful for accounting and statistics
        • How much of my traffic is port 80?
        • How much of my traffic goes to AS237?

  21. NetFlow Examples
      • Top ten lists (or top five)

        ##### Top 5 AS's based on number of bytes #######
        srcAS  dstAS      pkts       bytes
        6461     237   4473872  3808572766
        237      237  22977795  3180337999
        3549     237   6457673  2816009078
        2548     237   5215912  2457515319

        ##### Top 5 Nets based on number of bytes ######
        Net Matrix
        ----------
        number of net entries: 931777
        SRCNET/MASK        DSTNET/MASK          PKTS       BYTES
        165.123.0.0/16     35.8.0.0/13        745858  1036296098
        207.126.96.0/19    198.108.98.0/24    708205   907577874
        206.183.224.0/19   198.108.16.0/22    740218   861538792
        35.8.0.0/13        128.32.0.0/16      671980   467274801

        ##### Top 10 Ports #######
                       input                   output
        port      packets       bytes     packets       bytes
        119      10863322  2808194019     5712783   427304556
        80       36073210   862839291    17312202  1387817094
        20        1079075  1100961902      614910    62754268
        7648      1146864   419882753     1147081   414663212
        25        1532439    97294492     2158042   722584770
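
Top-N tables like these are aggregations over flow records: group by the fields of interest, sum packets and bytes, and sort by bytes. A Python sketch, assuming flows have already been exported and decoded into dicts with `pkts` and `bytes` fields. The field names and sample flows are hypothetical, not cflowd's own record format.

```python
from collections import Counter


def top_n(flows, key_fields, n=5):
    """Aggregate flows by `key_fields`; return [(key, pkts, bytes)] by bytes.

    `key_fields` might be ("src_as", "dst_as") for an AS matrix or
    ("dst_port",) for a port list.
    """
    byte_totals = Counter()
    pkt_totals = Counter()
    for flow in flows:
        key = tuple(flow[f] for f in key_fields)
        byte_totals[key] += flow["bytes"]
        pkt_totals[key] += flow["pkts"]
    return [
        (key, pkt_totals[key], total)
        for key, total in byte_totals.most_common(n)
    ]


def share_of_bytes(flows, field_name, value):
    """Fraction of all bytes whose `field_name` equals `value` (e.g. port 80)."""
    total = sum(f["bytes"] for f in flows)
    matching = sum(f["bytes"] for f in flows if f[field_name] == value)
    return matching / total if total else 0.0
```

`share_of_bytes(flows, "dst_port", 80)` answers the slide's "how much of my traffic is port 80?" question directly, and `top_n(flows, ("dst_as",))` answers the AS237 one.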

  22. Exercise: Cricket
      Load the Track 2 Cricket page from http://t2-noc.ws.afnog.org/downloads.htm
      Observe the various characteristics that are being monitored by the system.

  23. Security Management: Dos & Don’ts
      • Don’t leave things that are likely to be interesting to mice lying on the kitchen table overnight
      • Plug the holes that mice are using to get into the house
      • Don’t provide places within the house for mice to build nests
      • Set traps along walls where you often see mice out of the corner of your eye
      • Check the traps daily to rebait them and to dispose of squashed mice. Full traps don’t catch mice, and they smell
      • Avoid using commercial bait-and-kill poisons. Traditional snap traps are best.
      • Get a cat!

  24. Security Management - Tools
      • Security tools
        • cops - host configuration checker (www.cert.org)
        • swatch - email reports of activity on machine
        • Tcpwrappers - log connections, restrict access
        • ssh/skey - crypto authentication and communications
        • Tripwire - monitor changes to system files
      • Keep up to date with security information
        • Bug reports
        • CERT advisories mailing list: http://www.cert.org./contact_cert/certmaillist.html
        • Bug fixes
        • Intruder alerts
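
Tools like swatch work by matching log lines against patterns and reporting the hits. The following Python sketch works in that spirit and is not swatch itself. The two patterns are assumptions about typical sshd and TCP Wrappers log wording and should be adjusted to the messages your systems actually log.

```python
import re

# Hypothetical watch rules: (compiled pattern, label). The message formats
# are assumptions; check them against your own syslog output.
WATCH_RULES = [
    (re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)"),
     "ssh-auth-failure"),
    (re.compile(r"refused connect from (\S+)"), "tcpwrappers-refused"),
]


def scan_log(lines):
    """Return (label, line) for every log line that matches a watch rule."""
    hits = []
    for line in lines:
        for pattern, label in WATCH_RULES:
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break  # one label per line is enough for a report
    return hits


def summarise(hits):
    """Count hits per label, e.g. for a periodic e-mail report."""
    counts = {}
    for label, _ in hits:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

The summary could then be mailed to the NOC (for example with Python's `smtplib`) on a schedule, which is the "email reports of activity" role the slide gives to swatch.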

  25. Security Management - Good Practice
      • Reporting procedure for security events
        • e.g. break-ins
      • Abuse email address for customers to report complaints (abuse@your-isp.net)
      • Control internal and external gateways
        • Control firewalls (external and internal)
      • Security log management
        • Centralised logging host

  26. Configuration Management
      Maintaining information relating to the design of the network and its current configuration
      • Monitor network state
      • Record of network topology
        • Static
          • What is deployed
          • Where it is deployed
          • How it is attached
        • Dynamic
          • Operational status of the network elements

  27. Configuration Management
      SNMP-driven display
      [Figure: SNMP-driven display of network nodes]

  28. Configuration Management
      Operational control of network
      • Start/stop individual components
      • Alter configuration of devices
      • Load and save config versions
      • Hardware/software upgrades
      • Methods of access
        • SNMP Get / SNMP Set
        • Out-of-band access
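
Saving config versions pays off when you can see exactly what changed between two of them. A Python sketch using the standard library's `difflib.unified_diff`. The file labels and the IOS-style configuration lines in the usage example are hypothetical.

```python
import difflib


def config_diff(old_config: str, new_config: str,
                old_name: str = "running-config.old",
                new_name: str = "running-config.new") -> str:
    """Unified diff between two saved versions of a device configuration.

    Returns an empty string when the two versions are identical.
    """
    diff = difflib.unified_diff(
        old_config.splitlines(keepends=True),
        new_config.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


if __name__ == "__main__":
    before = "hostname gw-bb\nntp server 192.0.2.1\n"
    after = "hostname gw-bb\nntp server 192.0.2.2\n"
    print(config_diff(before, after))
```

Attaching such a diff to the change ticket gives the "history of changes" that configuration management calls for.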

  29. Configuration Management
      • Inventory management
        • Database of network elements
        • History of changes & problems
      • Directory maintenance
        • All hosts & applications
        • Nameserver database
        • Host and service naming coordination
      • "Information is not information if you can't find it"
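
An inventory database needs little more than a table of network elements and a table of dated changes and problems against them. A minimal Python sketch using the standard `sqlite3` module. The schema, column names and sample entries are hypothetical; a production NOC would add fields such as serial numbers, circuit IDs and contacts.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS element (
    hostname TEXT PRIMARY KEY,
    address  TEXT NOT NULL,
    location TEXT,
    role     TEXT
);
CREATE TABLE IF NOT EXISTS history (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    hostname TEXT NOT NULL REFERENCES element(hostname),
    kind     TEXT NOT NULL CHECK (kind IN ('change', 'problem')),
    note     TEXT NOT NULL,
    at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # history rows must name a known element
    db.executescript(SCHEMA)
    return db


def add_element(db, hostname, address, location=None, role=None):
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO element (hostname, address, location, role) "
            "VALUES (?, ?, ?, ?)",
            (hostname, address, location, role),
        )


def record(db, hostname, kind, note):
    """Log a 'change' or 'problem' against an existing element."""
    with db:
        db.execute(
            "INSERT INTO history (hostname, kind, note) VALUES (?, ?, ?)",
            (hostname, kind, note),
        )


def history_of(db, hostname):
    return db.execute(
        "SELECT kind, note FROM history WHERE hostname = ? ORDER BY id",
        (hostname,),
    ).fetchall()
```

Because foreign keys are switched on, recording history for a hostname that is not in the inventory fails, which keeps the two tables consistent.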

  30. What is SNMP?
      • Simple Network Management Protocol
      • Query-response system
        • Can obtain status from a device
        • Standard queries
        • Enterprise-specific queries
      • Uses database defined in MIB
        • Management Information Base

  31. What do we use SNMP for?
      • Query routers for:
        • In and out bytes per second
        • CPU load
        • Uptime
        • BGP peer session status
      • Query hosts for:
        • Network status
        • Message queues
        • Web traffic
        • Squid proxy load
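
"In and out bytes per second" is not read directly from the router. Tools such as MRTG poll octet counters (for example IF-MIB `ifInOctets` and `ifOutOctets`, which are 32-bit Counter32 values) and divide the difference between two polls by the polling interval. A Python sketch of that arithmetic, assuming the counter wrapped at most once between polls; the function names are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # Counter32 values wrap back to 0 after 2**32 - 1


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter readings, allowing for one wrap."""
    return (curr - prev) % modulus


def rate_per_second(prev: int, curr: int, interval_s: float) -> float:
    """Average rate between two polls of a counter such as ifInOctets."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(prev, curr) / interval_s


def utilisation_pct(octets_per_s: float, if_speed_bps: int) -> float:
    """Link utilisation: octets are bytes, so multiply by 8 to get bits."""
    return 100.0 * octets_per_s * 8 / if_speed_bps
```

A device reboot also resets its counters and will look like a wrap, so one implausible sample after a reboot is expected. On fast links a 32-bit counter can wrap more than once within a 5-minute poll, which is why the 64-bit `ifHCInOctets`/`ifHCOutOctets` counters are preferred where the device supports them.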

  32. SNMP Network Management Tools
      • MRTG http://www.ee-staff.ethz.ch/~oetiker/webtools/mrtg/
      • RRDtool http://ee-staff.ethz.ch/~oetiker/webtools/rrdtool/
      • Cricket http://cricket.sourceforge.net/
      • HP OpenView
      • Benefits
        • Simple to use and configure
        • Quickly determine spikes/drops in traffic
        • Can display almost any data that can be collected via SNMP

  33. MRTG

  34. Accounting Management
      • What do you account for?
        • Use of the network and the services it provides
      • Types of accounting data
        • RADIUS/TACACS accounting data from access servers
        • Interface statistics
        • Protocol statistics
      • Accounting data affects business models
        • Bill on usage?
        • Flat-rate billing?
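
RADIUS accounting Stop records carry per-session totals such as `Acct-Input-Octets`, `Acct-Output-Octets` and `Acct-Session-Time` (attribute names from RFC 2866). Summing them per user is the raw material for either billing model on the slide. A Python sketch, assuming the records have already been parsed into dicts. The prices and the billing function are hypothetical illustrations, not a recommended tariff.

```python
from collections import defaultdict


def usage_by_user(stop_records):
    """Sum RADIUS Accounting-Stop records per User-Name.

    Returns {user: {"octets": total bytes in + out, "seconds": online time}}.
    """
    totals = defaultdict(lambda: {"octets": 0, "seconds": 0})
    for rec in stop_records:
        user = totals[rec["User-Name"]]
        user["octets"] += rec["Acct-Input-Octets"] + rec["Acct-Output-Octets"]
        user["seconds"] += rec["Acct-Session-Time"]
    return dict(totals)


def monthly_charge(usage, flat_rate=None, price_per_mb=None, price_per_hour=None):
    """Hypothetical billing: a flat rate, or per-MB and/or per-hour usage."""
    if flat_rate is not None:
        return flat_rate
    charge = 0.0
    if price_per_mb is not None:
        charge += usage["octets"] / 1_000_000 * price_per_mb
    if price_per_hour is not None:
        charge += usage["seconds"] / 3600 * price_per_hour
    return charge
```

Running the same usage totals through both pricing options is a quick way to see how the choice of business model changes what each customer pays.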

  35. NOC Practical
      • Network monitor - NOCOL
        • Observe network status
        • Create a “problem”
        • Observe change in status
        • “Resolve” the problem
        • Statistics?

  36. NOC Practical
      • Ticket system - WebRT
        • Overview
        • Create tickets
          • As customer
          • As engineer
        • Review tickets as engineer
        • Take/assign tickets

  37. Exercises
      • Rows A, C, E, G and I become the NOC
      • Rows B, D, F, H and J become the customers
      • Customers send in fault notifications, automatically creating tickets
      • Engineers take/give tickets and resolve or escalate
      • Changeover … repeat
      • <During this, there are network failures that must be detected and fixed>

  38. Exercise: Ticket Flow
      Customers (rows B, D, F, H, J)
      • Create tickets by sending email to support@noc.ws.afnog.org
      • Receive updates on progress of ticket status
      • Receive notice that ticket has been closed when resolution is complete
      NOC (rows A, C, E, G, I)
      • Use the ticket system web interface http://noc.ws.afnog.org/cgi-bin/webrt.cgi
      • Assign tickets
      • Update tickets
      • Escalate tickets
      • Resolve tickets
      Diagram tiers: First Level; 2nd Tier: Monitoring, …

  39. How do I manage my network?
      • Which tools should I use? What do I really need?
      • Keep it simple!
      • Need to consider engineers working remotely
      • Don’t want to spend too much time maintaining the tool (it should be helping you!)
      • Different tools for NOC and engineers
      • Different tools for statistics
      • RELIABILITY!

  40. References
      • http://www.merit.edu/ipma/docs/isp.html
      • http://www.nanog.org
      • http://www.caida.org
      • http://www.nlanr.net
      • http://www.cisco.com
      • http://www.amazing.com/internet/
      • http://www.isp-resource.com/
      • http://www.merit.edu/ipma
      • http://www.ripe.net

  41. More Tools!
      • http://www.caida.org/Tools/
        • OC3Mon/Coral
      • http://www.merit.edu/~ipma
        • RouteTracker
        • IRRj
        • ASExplorer
      • http://www.geektools.com/
      • http://www.merit.edu/ipma/tools/other.html

  42. ASExplorer

  43. Route Flap Stats

  44. Looking Glass Tools
      • http://www.merit.edu/~ipma/tools/lookingglass.html
        route-views.oregon-ix.net>show ip bgp 35.0.0.0
        BGP routing table entry for 35.0.0.0/8, version 56135569
        Paths: (17 available, best #12)
          11537 237
            198.32.8.252 from 198.32.8.252
              Origin incomplete, localpref 100, valid, external
              Community: 11537:900 11537:950
          2914 5696 237
            129.250.0.3 (inaccessible) from 129.250.0.3
              Origin IGP, metric 0, localpref 100, valid, external
              Community: 2914:420
          2914 5696 237
            129.250.0.1 (inaccessible) from 129.250.0.1
              Origin IGP, metric 0, localpref 100, valid, external
              Community: 2914:420
          3561 237 237 237
            204.70.4.89 from 204.70.4.89
              Origin IGP, localpref 100, valid, external
          267 1225 237
            204.42.253.253 from 204.42.253.253
              Origin IGP, localpref 100, valid, external
              Community: 267:1225 1225:237

  45. More Looking Glass Tools
      • Traceroute servers
        • http://www.merit.edu/ipma/tools/trace.html
        Query: trace  Addr: www.isoc.org
        Translating "www.isoc.org"...domain server (206.205.242.132) [OK]
        Type escape sequence to abort.
        Tracing the route to info.isoc.org (198.6.250.9)
          1 iad1-core2-fa5-0-0.atlas.digex.net (165.117.129.2) 0 msec 0 msec 4 msec
          2 dca5-core2-s5-0-0.atlas.digex.net (165.117.53.41) 0 msec 4 msec 0 msec
          3 dca5-core1-fa5-1-0.atlas.digex.net (165.117.56.117) 4 msec 0 msec 4 msec
          4 Hssi3-1-0.BR1.DCA1.ALTER.NET (209.116.159.98) 0 msec 0 msec 4 msec
          5 101.ATM2-0.XR1.DCA1.ALTER.NET (146.188.160.226) [AS 701] 4 msec 0 msec 4 msec
          6 195.ATM7-0.XR1.TCO1.ALTER.NET (146.188.160.102) [AS 701] 4 msec 0 msec 0 msec
          7 193.ATM8-0-0.GW1.TCO1.ALTER.NET (146.188.160.33) [AS 701] 4 msec 4 msec 4 msec
          8 charlie.isoc.org (198.6.250.1) [AS 701] 8 msec 8 msec 8 msec
          9 info.isoc.org (198.6.250.9) [AS 701] 8 msec * 12 msec

  46. SNMP Tool References
      • MON - http://www.kernel.org/software/mon/
      • NOCOL - ftp://ftp.navya.com/pub/vikas/nocol.tar.gz
      • Sysmon - ftp://puck.nether.net/pub/jared
      • Rover - http://www.merit.edu/~rover
      • Concord - http://www.concord.com
      • http://www.merit.net/~netscarf
