SURFnet6 Network Monitoring and Reporting. Hans Trompert, SURFnet. Information needs. Annual report. Information detail. Connected organizations. NOC / SURFnet / research. Monitoring versus Reporting. Monitoring real-time status alarms Reporting afterwards
SURFnet6 Network Monitoring and Reporting Hans Trompert, SURFnet
Information needs
[Figure: information detail needed for the annual report, connected organizations, and NOC / SURFnet / research]
Monitoring versus Reporting
• Monitoring
  • real-time
  • status
  • alarms
• Reporting
  • afterwards
  • over a specific time period (day, week, month, year)
Information source and destination
[Figure: information sources (Avici SSR, Nortel ERS8600, Nortel OM5200, Nortel OME6500, Nortel OME1060) and destinations (SURFnet6 operations, real-time customer reporting, security)]
Reporting: SNMP metrics
• Interface in/out octet counters
• Interface in/out packet counters (unicast/broadcast/multicast)
• Interface input/output errors
• Interface availability
• Temperature
• Memory
• CPU
• Device uptime
• and more …
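Octet counters only become a traffic volume report once two samples are turned into a rate. The sketch below shows one common way to do that, assuming a single counter wrap at most between samples; the function name `counter_rate_bps` and the `counter_bits` parameter are illustrative and not taken from the slides.

```python
def counter_rate_bps(prev, curr, interval_s, counter_bits=64):
    """Average bits per second between two octet-counter samples.

    Assumes the counter wrapped at most once between the samples
    (counter_bits is 32 for ifInOctets-style counters, 64 for the
    high-capacity variants).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction gives the right delta even after one wrap.
    delta_octets = (curr - prev) % modulus
    return delta_octets * 8 / interval_s


# Two samples taken 60 seconds apart.
print(counter_rate_bps(1000, 8500, 60))
```

With a short polling interval, a 32-bit counter on a fast link can wrap more than once between samples, which this calculation cannot detect; that is one reason to prefer 64-bit counters where the device offers them.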
Reporting: TL1 metrics
• Input/Output Frames
• Errored frames
• Discarded frames
• Transmit and receive power levels
• Errored Seconds - number of seconds that have had CRC errors
• Severely Errored Seconds - after 10 seconds of ES we start counting SES
• UnAvailable Seconds - seconds where we had no sync
• and more …
Monitoring: SNMP traps
• Fan
• Temperature
• Voltage
• Link Up/Down
• Bay Controller
• Module
• PIM + MSDP
• BGP
• VRRP
• ISIS
• and more …
Monitoring: TL1 events
• Equipment
  • Circuit pack missing/mismatch/failed
  • Fan failed/missing
  • Power failure A or B
  • High temperature
• Shelf
  • Software upgrade failed/mismatch/…
  • Database integrity fail/restore in progress/…
• Amplifier
  • input/output loss of signal
  • automatic shutoff
• and many, many more
SNMP based volume reporting
[Figure: Internet, border routers Amsterdam1 (SARA) and Amsterdam2 (TeleCity II), core routers Amsterdam1 (SARA) and Amsterdam2 (TeleCity II), connected organizations]
• Total external traffic
  • Per traffic class (AMS-IX, Global, private peers)
  • Per provider/peer
• Total SURFnet internal traffic
  • Per connected organization
SURFnet external traffic volume
• Ams-IX
  • Private peers (via Ams-IX), including:
    • Chello, Tiscali, @Home, Planet, XS4all
    • Garnier Projects, Abovenet, UUnet, Cogent
• NREN
  • Geant2
  • SINET
  • Abilene
• Global
  • Global Crossing
  • Cable & Wireless
SURFstat: Real-time connected organization traffic volume reporting
• Software
  • Net-SNMP
  • Python
  • RRDtool
• Features
  • Easy administration by labeling connections with keywords in interface description on router
  • Different graph resolutions: day, week, month, year, decade
  • 1 minute measurement interval
• Reports on
  • volume (bits in/out)
  • packets (unicast/multicast/broadcast)
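The slides do not show what the keywords in the interface description look like. As a rough illustration of the idea only, the sketch below assumes a made-up syntax of `key=value` pairs inside square brackets; the syntax, the keys `org` and `class`, and the function `parse_labels` are all hypothetical and not SURFstat's actual format.

```python
import re

# Hypothetical label syntax (NOT the real SURFstat format), e.g.
# "GigE to campus [org=example-univ class=internal]"
LABEL_RE = re.compile(r"\[([^\]]*)\]")


def parse_labels(description):
    """Extract key=value labels from the first [...] group, if any."""
    match = LABEL_RE.search(description)
    if not match:
        return {}
    labels = {}
    for token in match.group(1).split():
        key, sep, value = token.partition("=")
        if sep and key:  # skip tokens that are not key=value
            labels[key] = value
    return labels


print(parse_labels("GigE to campus [org=example-univ class=internal]"))
```

The appeal of this approach is that the router configuration is the single place to administer: a collector can read the description over SNMP and decide which report an interface belongs to without a separate inventory.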
Netflow – flow information
• Netflow uses the common 5-tuple definition, where a flow is a unidirectional sequence of packets that all share the following 5 values:
  • Source IP address
  • Destination IP address
  • Source port (TCP or UDP)
  • Destination port (TCP or UDP)
  • IP protocol
• Most common fields in a Netflow record:
  • 5-tuple information
  • Input and output SNMP interface index
  • Timestamps for the flow start and finish time
  • Number of bytes and packets observed in the flow
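The 5-tuple definition above can be sketched as a small aggregation over packets. This is a simplified illustration, not how a router exporter is implemented: it ignores flow expiry timeouts and interface indexes, and the names `FlowKey` and `aggregate_flows` and the packet tuple layout are assumptions made for the example.

```python
from collections import namedtuple

FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port protocol")


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by the 5-tuple.

    packets: iterable of
        (timestamp, src_ip, dst_ip, src_port, dst_port, protocol, length)
    Returns {FlowKey: {"start", "end", "packets", "bytes"}}.
    """
    flows = {}
    for ts, src_ip, dst_ip, src_port, dst_port, proto, length in packets:
        key = FlowKey(src_ip, dst_ip, src_port, dst_port, proto)
        rec = flows.get(key)
        if rec is None:
            flows[key] = {"start": ts, "end": ts, "packets": 1, "bytes": length}
        else:
            rec["start"] = min(rec["start"], ts)
            rec["end"] = max(rec["end"], ts)
            rec["packets"] += 1
            rec["bytes"] += length
    return flows


packets = [
    (0.0, "192.0.2.1", "198.51.100.7", 40000, 80, 6, 60),
    (0.2, "192.0.2.1", "198.51.100.7", 40000, 80, 6, 1500),
    (0.3, "198.51.100.7", "192.0.2.1", 80, 40000, 6, 1500),  # reverse direction
]
for key, rec in aggregate_flows(packets).items():
    print(key, rec)
```

Because flows are unidirectional, the reply packet in the example lands in a second flow with source and destination swapped.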
Netflow – versions
• v1: First try
• v5: Most used version
• v6: Encapsulation information
• v7: Switch information
• v8: Several aggregation forms
• v9: Template based, allowing many combinations, supports IPv6
• IPFIX: aka v10; IETF standardized NetFlow 9 with Enterprise fields and other community input
Netflow setup
[Figure: Internet, border routers Amsterdam1 (SARA) and Amsterdam2 (TeleCity II), core routers Amsterdam1 (SARA) and Amsterdam2 (TeleCity II), connected organizations; flow fan out to FLOWmon, PeakFlow, perfSONAR, NFSEN, and test]
Netflow applications
• connected organizations:
  • FLOWmon: detailed traffic reporting
  • SURFflow (Arbor Peakflow / NFSEN): suspicious traffic pattern reporting
• SURFnet-CERT:
  • NFSEN
    • suspicious traffic pattern reporting
    • historical flow data queries
    • profiles for custom reports
• Geant2 JRA1 perfSONAR probes
  • Flow Subscription Measurement Point
  • Flow Selection and Aggregation Measurement Archive
FLOWmon
Detailed traffic reporting:
• total traffic
• prefix-based flow grouping
• reports on:
  • IP version (v4/v6)
  • IP protocol (TCP, UDP, ICMP, GRE, …)
  • TCP port (HTTP, SMTP, NNTP, FTP, SSH, …)
  • UDP port (domain, RTSP, VPN, …)
• top N connected organizations
• destination AS traffic
Connected organization to world traffic by TCP destination port
SURFflow
Reports on suspicious traffic patterns like:
• Unusual number of flows: DoS attack
• Flows from one host to many ports on another host: portscan
• Flows from one host to the same port on many hosts: break-in attempt making use of a known bug
• Flows from many hosts to a specific (set of) port(s) on many other hosts: virus/worm
• etc …
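Two of the patterns above (one host to many ports on another host, and one host to the same port on many hosts) can be sketched as simple counting over flow records. This is only an illustration of the idea and not how Arbor Peakflow or NFSEN detect anomalies; the function name, the labels, and the thresholds are hypothetical.

```python
from collections import defaultdict


def classify_sources(flows, port_threshold=100, host_threshold=100):
    """Flag sources that look like port scanners or host sweepers.

    flows: iterable of (src_ip, dst_ip, dst_port)
    Thresholds are illustrative values, not tuned recommendations.
    Returns a set of (src_ip, label) pairs.
    """
    ports_per_pair = defaultdict(set)  # (src, dst) -> distinct dst ports
    hosts_per_port = defaultdict(set)  # (src, port) -> distinct dst hosts
    for src, dst, port in flows:
        ports_per_pair[(src, dst)].add(port)
        hosts_per_port[(src, port)].add(dst)

    findings = set()
    for (src, _dst), ports in ports_per_pair.items():
        if len(ports) >= port_threshold:
            findings.add((src, "portscan"))
    for (src, _port), hosts in hosts_per_port.items():
        if len(hosts) >= host_threshold:
            findings.add((src, "sweep"))
    return findings


# One source probing 200 ports on a single host.
scan = [("203.0.113.5", "192.0.2.10", p) for p in range(1, 201)]
print(classify_sources(scan))
```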
Active measurements: RTTPL
Round Trip Time and Packet Loss monitoring
• measurement probes throughout the network
• central storage of results
• active measurements by injecting ICMP echo request packets
• measuring min/max/avg RTT and jitter
  • both IPv4 and IPv6
  • both unicast and multicast (under development)
• measuring packet loss
• 20 pings per minute
• report matrices per minute/hour/day/month
• results between two probes in graphs
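The per-minute figures listed above can be derived from one batch of ping results roughly as follows. The slides do not say how RTTPL defines jitter, so the sketch assumes the mean absolute difference between consecutive successful RTTs; the function name and the result keys are likewise illustrative.

```python
def summarize_pings(rtts_ms):
    """Summarize one measurement interval (e.g. 20 pings in a minute).

    rtts_ms: list of RTTs in milliseconds, with None for a lost packet.
    Jitter here is the mean absolute difference between consecutive
    successful RTTs (an assumption, not the documented RTTPL formula).
    """
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("no measurements")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(received)) / sent
    if not received:
        return {"loss_pct": loss_pct, "min": None, "max": None,
                "avg": None, "jitter": None}
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": loss_pct,
        "min": min(received),
        "max": max(received),
        "avg": sum(received) / len(received),
        "jitter": sum(diffs) / len(diffs) if diffs else 0.0,
    }


print(summarize_pings([10.0, 12.0, None, 11.0]))
```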
Active measurements: Connected organization availability
• measuring availability by sending ICMP Echo Requests to connected organization router
• measurement includes last mile to connected organization plus connected organization router port (unlike commercial providers)
• Cisco routers with Service Assurance Agent software on both Amsterdam1 and Amsterdam2
• results stored in database and reported monthly
• redundancy in measurements by ORing results from Amsterdam1 and Amsterdam2
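ORing the two measurement series means an interval counts as available if at least one of the two probes reached the connected organization's router, so a failure of a single probe is not reported as customer downtime. A minimal sketch, assuming both probes produce one boolean per measurement interval and the series are aligned; the function name is illustrative.

```python
def availability_pct(probe_a, probe_b):
    """Availability in percent from two aligned series of reachability results.

    probe_a, probe_b: equal-length sequences of booleans, one entry per
    measurement interval (True = echo reply received).
    """
    if len(probe_a) != len(probe_b):
        raise ValueError("series must have the same length")
    if not probe_a:
        raise ValueError("no measurements")
    # An interval is "up" if either probe saw the router (logical OR).
    up = sum(1 for a, b in zip(probe_a, probe_b) if a or b)
    return 100.0 * up / len(probe_a)


amsterdam1 = [True, False, False, True]
amsterdam2 = [True, True, False, False]
print(availability_pct(amsterdam1, amsterdam2))
```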