
Tomáš Podermański, tpoder@cis.vutbr.cz

University Campus Network Monitoring in Everyday Life. Brno University of Technology, CESNET z.s.p.o.


Presentation Transcript


  1. Brno University of Technology • CESNET z.s.p.o. • University Campus Network Monitoring in Everyday Life • Tomáš Podermański, tpoder@cis.vutbr.cz

  2. Brno University of Technology • http://www.vutbr.cz • One of the largest universities in the Czech Republic • Founded in 1899; the 110th anniversary will be celebrated this year • 20,000 students and 2,000 employees • 9 faculties • 6 other organisational units • Student dormitories for 6,000 students

  3. VUT FP, FEKT, Kolejní 4 VUT Koleje, Kolejní 2 VUT FCH, FEKT, Purkyňova 118 VUT Koleje, Mánesova 12 VUT FEKT, Technická 8 VUT FIT, Božetechova 2 VUT FSI, Technická 2 AV VFU, Palackého 1/3 VUT TI, Technická 4 VUT Koleje, Purk. MU CESNET , Botanická 68a AV ČR UPT MZLU, Tauferova VUT, Kounicova 67a VUT Koleje , Kounicova 46/48 AV ČR UFM VUT Rektorát, Antonínská 1 VUT FAST, Veveří 95 VUT FaVU, Údolní 19 VUT , Gorkého 13 VUT FEKT Údolní 53 VUT FA, Poříčí 5 MU, Vinařská 5 AV ČR, Rybářská 13 VUT FaVU, Rybářská 13

  4. Physical Layer • 24 sites connected to each other • Each site is connected from at least two directions (by separate cables) • Over 100 km of optical cables • Most of the cables are the property of the university • IPv4 layer • The network cores are based on Hewlett Packard devices • OSPF-based routing • PIM SM and DM are used for multicast • Most of the traffic is transported through this network • IPv6 layer • IPv6 functionality on HP devices is available only as a beta release • Temporary solution based on 3Com devices or PC routers with XORP • A dedicated IPv6 switch/router sits next to the main IPv4 switch/router • VLANs are used for connections between IPv6 routers • Temporary low-cost solution until the main devices have full IPv6 support

  5. Basic monitoring, active vs. passive • Active monitoring • We send probe data and get a response • A probe of the device, network etc. • Passive monitoring • An observer of the device, network etc.

  6. Components in a Monitoring System • [Diagram: a Manager and several Agents]

  7. Components in a Monitoring System • Agent and protocol • SNMP agent • Get, Set, Walk, Traps • NetFlow, sFlow, IPFIX probe • Accumulated statistics • Many systems use a specialized protocol tied to the main system • Role of a cache on the agent • Active monitoring • We use an appropriate protocol or data depending on the monitored service • Proxy service (a view from another point)

  8. Components in a Monitoring System • Manager & Frontend • The manager collects and processes data from agents • Stores and archives them in a datastore • SQL, RRD, … • User interface • Web, application • Reports, SLA, … • Configuration • Historical view • System of alerts • Email, SMS, phone call • The most popular systems • Zabbix, Nagios, OpenView, nfsen/nfdump, flow-tools, rrdtool, mrtg, cacti, munin, …

  9. Quiz: What causes most of the trouble in IT? • Power supply of systems • Overloaded circuits • Unmanaged UPS • Messy electrical installations • An improper power supply can be a booby trap • Cooling systems • Absence of preventive monitoring • Frozen units • Units jammed by foliage • …

  10. Physical infrastructure LAYER 0,1

  11. Power Supply with 1 + 1 Redundancy PDU I PDU II UPS II ATS UPS I 2x 16A

  12. Power Supply with 1 + 1 Redundancy • PDU I, PDU II: load, voltage • ATS: load, voltage on source 1, voltage on source 2, selected source • UPS I, UPS II: load, input voltage, output voltage, battery status • 2x 16A

  13. Power System with 1 + 1 Redundancy • ATS • UPS • 2x 16A

  14. Power System with 1 + 1 Redundancy • UPS: load, current, input voltage, output voltage, battery status • ATS: load, current, voltage on source 1, voltage on source 2, selected source • 2x 16A

  15. Power System with 1 + 1 Redundancy • Overloaded circuit, tripped circuit breaker • ATS • UPS • 2x 16A

  16. Power System with 1 + 1 Redundancy • When the power comes back again... • In a few minutes the UPS battery is low • The second circuit is overloaded, tripped circuit breaker • ATS • UPS • 2x 16A
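The failure shown on slides 15 and 16 suggests a simple check: with two 16 A circuits, the combined load of both feeds should still fit on one breaker. The following is a minimal sketch only. The function name and the 80 % safety margin are assumptions made for illustration and do not come from the presentation.

```python
def survives_single_feed_loss(current_a, current_b, breaker_rating=16.0, margin=0.8):
    """Return True if one circuit alone could carry the load of both feeds.

    current_a, current_b: measured currents (in amperes) on the two circuits.
    breaker_rating: breaker rating in amperes (16 A as on the slides).
    margin: assumed safety factor; 0.8 is an illustrative choice.
    """
    if current_a < 0 or current_b < 0:
        raise ValueError("currents must be non-negative")
    combined = current_a + current_b
    return combined <= breaker_rating * margin


if __name__ == "__main__":
    print(survives_single_feed_loss(5.0, 6.0))   # 11 A fits within 12.8 A
    print(survives_single_feed_loss(9.0, 9.0))   # 18 A would trip the remaining breaker
```

Each feed in the second call looks harmless on its own (9 A of 16 A), which is why the combined load is the value worth alerting on.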

  17. Cooling Systems • In many cases the cooling system is a part of the building • The majority of cooling systems are difficult to monitor • Some devices support monitoring, but it costs a lot of money • In many cases the monitoring is more expensive than the cooling device • There is no standard interface (RS485 with a closed protocol) • Some devices have a binary output (via relay) which indicates both error and running state • Possible conversion to SNMP • Another, and the easiest, solution: monitoring the temperature in the communication room • Thermometer with an SNMP output • [Diagram: LonWorks, unit status/SNMP and temperature/SNMP feeding the monitoring system]

  18. Monitoring in Data Center Rooms • More complex electrical installation • Having a UPS and an ATS in every rack is inefficient • Devices with 3-phase power • Circuits are divided into 3 groups (direct, genset, UPS) • More detailed information about the electricity distribution is very useful • It is necessary to monitor whether the phases are balanced • The genset could break down
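One way to express "whether the phases are balanced" as a single number is the largest deviation of a phase current from the mean of the three, as a percentage of that mean. This is a sketch under that assumption. The slides do not say which definition or threshold is used, and the function name is hypothetical.

```python
def phase_imbalance_percent(currents):
    """Largest deviation from the mean phase current, as a percentage of the mean.

    currents: sequence of exactly three phase currents in amperes.
    """
    if len(currents) != 3:
        raise ValueError("expected currents for exactly three phases")
    mean = sum(currents) / 3
    if mean == 0:
        return 0.0  # no load at all, nothing to balance
    max_deviation = max(abs(c - mean) for c in currents)
    return 100.0 * max_deviation / mean


if __name__ == "__main__":
    print(phase_imbalance_percent([10.0, 10.0, 10.0]))
    print(phase_imbalance_percent([12.0, 10.0, 8.0]))
```

The readings themselves would come from the ammeters on each phase. An alert threshold on the result is a local policy choice.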

  19. Power in Data Center Rooms • [Diagram: main power, genset, ATS, UPS, bypass, HVAC and devices in racks, with measurement points marked A and V]

  20. temperature in datacenter

  21. temperature in datacenter

  22. Server Monitoring • Hardware • Manufacturers' software support is required (Dell OpenManage, HP InsightControl, …) • Chassis temperature • Fan condition • Power status • Operating system • CPU, load, memory, utilization, processes • Disk subsystem • External disk array with its own management port • RAID status • Disk condition (S.M.A.R.T.) • [Diagram: SNMP, IPMI and other sources feeding the monitoring system]

  23. Network Device Monitoring • Hardware • Chassis temperature • Fan condition • Power status • State of the operating system • CPU load • Memory • [Diagram: SNMP feeding the monitoring system]

  24. Network Connection – L1 Monitoring • Port status • Link UP/DOWN • Speed • Errors on interfaces • Traffic on interfaces • Remote device status • LLDP + data from MIB • Remote interface, remote device, …
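Interface errors and traffic are usually read as ever-increasing SNMP counters, so the useful figure is the difference between two polls, including the case where a 32-bit counter wraps. The sketch below assumes the counter values have already been fetched. The function names are hypothetical and the polling itself is not shown.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping counter (Counter32 by default)."""
    return (current - previous) % (1 << bits)


def error_ratio(err_prev, err_curr, pkt_prev, pkt_curr, bits=32):
    """Share of packets with errors between two polls; 0.0 if no packets were seen."""
    packets = counter_delta(pkt_prev, pkt_curr, bits)
    if packets == 0:
        return 0.0
    return counter_delta(err_prev, err_curr, bits) / packets


if __name__ == "__main__":
    # The counter wrapped between the two polls: 6 steps to the wrap, then 5 more.
    print(counter_delta(4294967290, 5))
    print(error_ratio(0, 5, 0, 1000))
```

The modulo only handles a single wrap between polls. A device reboot also resets the counters and has to be detected separately, for example from the device uptime.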

  25. Link LAYER 2

  26. Network Connection – L2 Monitoring • An L2 ping could be very useful • We have to use information obtained from other layers (L1, L3) • Unfortunately, there is no simple way to check connectivity on a single VLAN • One option is to obtain some information from the MIB, but it is not sufficient • STP/MSTP information, root bridge • VLANs on interfaces

  27. Network Connection – L3 Monitoring • ICMP and ping are still the most important • The problem is how to monitor broken paths (the routing protocol usually covers up any problem) • Checking the state of the routing protocol • ICMP using source routing • Flow-based monitoring • Multicast monitoring • [Diagram: data path between 147.229.6.1 and 147.229.6.2]

  28. Network Connection – L3 Monitoring • Checking that a router has the proper neighbors • OSPF-MIB (RFC 4750): ospfNbrRtrId • VRRP-MIB (RFC 2787): vrrpOperAdminState, vrrpOperState, vrrpOperMasterIpAddr • [Diagram: DR and BDR for OSPF, Master and Backup for VRRP]
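Once the ospfNbrRtrId values have been read from a router, the neighbor check reduces to comparing them with the list of neighbors that router is supposed to have. A minimal sketch follows, assuming the router IDs are already available as strings. The function name and the example router IDs are made up for illustration, and the SNMP retrieval is not shown.

```python
def compare_neighbors(expected, observed):
    """Compare expected OSPF neighbor router IDs with those actually reported.

    Returns a dict with the neighbors that are missing and those that were
    not expected, both as sorted lists.
    """
    expected_set = set(expected)
    observed_set = set(observed)
    return {
        "missing": sorted(expected_set - observed_set),
        "unexpected": sorted(observed_set - expected_set),
    }


if __name__ == "__main__":
    expected = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]   # illustrative router IDs
    observed = ["192.0.2.1", "192.0.2.3", "192.0.2.9"]   # e.g. values of ospfNbrRtrId
    print(compare_neighbors(expected, observed))
```

A missing neighbor is the interesting case here: the routing protocol routes around it, so a plain ping to a destination may still succeed.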

  29. Multicast Monitoring • Quite a demanding task • For each stream the <S,G> path has to be created • A continuously received and transmitted stream does not necessarily reveal a problem on the RP • Almost impossible to monitor the local infrastructure • The only known tool: Multicast Beacon • Written in Perl • Dead project • Last release 2006 • No VLAN support and no support for multiple interfaces on a single host • Homepage unavailable • Our own solution: mcwatch

  30. Multicast Agents Data is periodically sent to a server

  31. Multicast Agent VLAN POSIX SOCKET APPLICATION Multicast Beacon

  32. Multicast Agent VLAN POSIX SOCKET APPLICATION mcwatch

  33. NetFlow Monitoring • Two NetFlow probes watch both external connectivity lines • NetFlow probes are connected directly to the optical fiber via a TAP • Wire-speed accelerated probes (FlowMon) • [Diagram: CESNET PoP, CRS-1/16, university network, 10G Ethernet]

  34. Flow Processing • Two NetFlow probes watch both external connectivity lines • NetFlow probes are connected directly to the optical fiber via a TAP • Wire-speed accelerated probes (FlowMon) • [Diagram: nfcapd, datastore, SQL (aggregated), backbone administrator, all administrators]

  35. Flow Processing • Data are stored on a storage server • Data are kept for 30 days • Analysis of security incidents, statistical purposes • The big issue: how to select useful data and provide them to the people who need them • Security matters • Full data are accessible only to a small and trusted group of administrators • For other IT staff (faculty administrators, IT managers), summarised data are accessible via a web interface • Data are processed by common open source tools: • nfdump • A lot of trouble, but we don't have any better solution • We are trying to optimise the current implementations • Several theses on this topic are in progress • Commercial tools: the situation is not better • Usually plenty of nice charts and statistics • But performance is often terrible (sampling is required)
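The split between full flow data for a small group and summarised data for everyone else can be illustrated by rolling individual flows up to per-network byte counts. This is only a sketch of the idea, not the aggregation used in the deployment described here. The record layout (source address, bytes), the /24 granularity and the function name are assumptions.

```python
import ipaddress
from collections import defaultdict


def summarise_by_prefix(flows, prefix_len=24):
    """Sum transferred bytes per source network.

    flows: iterable of (source_ip, byte_count) tuples.
    prefix_len: prefix length used for grouping (assumed /24 here).
    """
    totals = defaultdict(int)
    for source_ip, byte_count in flows:
        # strict=False lets a host address be mapped to its enclosing network
        network = ipaddress.ip_network(f"{source_ip}/{prefix_len}", strict=False)
        totals[str(network)] += byte_count
    return dict(totals)


if __name__ == "__main__":
    flows = [("147.229.6.1", 100), ("147.229.6.2", 50), ("147.229.7.9", 10)]
    print(summarise_by_prefix(flows))
```

Summaries of this kind no longer expose individual hosts, which is what makes them suitable for a wider audience than the raw flow records.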

  36. Transport, application and the others LAYER 4-7

  37. Layer 7 • Many in-house plugins • Eduroam/RADIUS monitoring • DNS • Database status • Backup server status • … • Collected data are available to administrators on different levels • Eduroam/RADIUS logs • Mail logs (DNSBL, spam classification, statistics) • WiFi/VPN connections • …

  38. Components in the Monitoring System zabbix SNMP Zabbix Spinel SNMP xwho, xhis radius mysql icmp snmp xmon NetIs wifilogs maillogs radiuslogs honeypots incidents … aggflow nfdump netflow

  39. Monitoring: Layers & Technology • Technologies: SNMP, zabbix, NetFlow, radius, ICMP, ICMPv6, Spinel, … • Physical: power, cooling systems, temperature; servers and disk arrays; network devices • Link: port statistics, link status, number of errors; LLDP neighbour • Internet: ICMP tests using the source routing option; OSPF, VRRP peers; multicast traffic monitoring • Application: Radius, DNS, other services • Tools shown: zabbix, xwho, xhis, NetIs, nfdump

  40. Current problems • SNMP protocol • No alternative • Many bugs in various implementations • Absence of an L2 testing tool • NetFlow • We have plenty of data, but nobody knows how to process it effectively • In some cases more detailed information is required than flow data provide • IPv6 brings some new problems and challenges
