IT Monitoring WG: IT/CS Monitoring System • Virginie Longo • September 14th 2011
Summary
• CS Monitoring Systems
• Spectrum CA
• Performance Analysis
• Other Tools
• Data Storage
• Requirements
• NMS Status
• Requirements
• Research
Spectrum CA
Description:
• Commercial tool
• Fault-management-oriented system
• Root cause analysis / alarm correlation
• Topology view
• Service Manager => relation with SLS view
• Basic performance manager
Volumes:
• ~3000 devices monitored
• Supports 3K Laser devices for simple alarms (UP/DOWN)
• Thousands of attributes polled and analyzed
• 6 GB of event data over 30 days
Monitoring protocols:
• SNMP and ICMP
• Information fed only by SNMP (no remote agent)
• A few other protocols supported: DNS, DHCP, traceroute, NTP, HTTP
• A few home-made scripts for DHCP and web monitoring
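The root cause analysis / alarm correlation idea listed above can be illustrated with a minimal sketch. It is not Spectrum's actual algorithm: it assumes a simple tree topology in which each device has one upstream neighbour, and it suppresses the alarm of any unreachable device whose upstream neighbour is also unreachable. The device names and the `root_causes` function are hypothetical.

```python
from typing import Dict, Optional, Set


def root_causes(parent: Dict[str, Optional[str]], down: Set[str]) -> Set[str]:
    """Return the unreachable devices whose upstream neighbour is still reachable.

    Devices hidden behind another unreachable device are treated as
    symptoms, not causes, and are left out of the result.
    """
    roots = set()
    for device in down:
        upstream = parent.get(device)
        if upstream is None or upstream not in down:
            roots.add(device)
    return roots


# Hypothetical topology: each device maps to its upstream neighbour.
topology = {
    "core-1": None,
    "dist-1": "core-1",
    "dist-2": "core-1",
    "access-1": "dist-1",
    "access-2": "dist-1",
}

# Three devices stop answering; only dist-1 is reported as the root cause.
unreachable = {"dist-1", "access-1", "access-2"}
print(root_causes(topology, unreachable))
```

With this kind of suppression, the console shows one alarm for the distribution switch instead of three separate UP/DOWN alarms.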
Alarm Monitoring: Spectrum Architecture (Storage system)
[Diagram. Components shown: Devices (SNMP); Spectrum DB (models, topology, current polling values, alarms); MySQL (events); remote MySQL (Service Manager); Alarm Notifier; Devices Info; SSLogger; Oracle alarm history (LANDB); Oracle stats (CSR); SLS. Legend: Spectrum system / non-Spectrum system.]
Performance Analysis
Statistics architecture:
- Mix of a home-made system and the Spectrum tool
- Data extracted from Spectrum into an Oracle DB
- Data consolidated into RRDs
- Displayed on the Netstat website (PHP)
Volumes:
- ~9000 models (ports + devices) for 24K RRDs
- 36 metrics
- 157 attributes
- ~160K entries loaded into the Oracle DB per 5-minute poll
- Data kept 1 month in Oracle
- 2 years of consolidated data in RRDs
Note: a metric is a group of attributes, e.g. Bandwidth = in/out bits and in/out packets.
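The volumes above imply a row count for the Oracle DB that can be worked out directly. The sketch below uses only the slide's figures, plus one assumption: "1 month" is taken as 30 days.

```python
ENTRIES_PER_POLL = 160_000   # from the slide: ~160K entries per poll
POLL_INTERVAL_MIN = 5        # from the slide: one poll every 5 minutes
RETENTION_DAYS = 30          # assumption: "1 month" taken as 30 days

polls_per_day = 24 * 60 // POLL_INTERVAL_MIN       # 288 polls per day
rows_per_day = ENTRIES_PER_POLL * polls_per_day    # 46,080,000 rows per day
rows_retained = rows_per_day * RETENTION_DAYS      # 1,382,400,000 rows kept

print(f"{polls_per_day} polls/day")
print(f"{rows_per_day:,} rows/day")
print(f"{rows_retained:,} rows retained over {RETENTION_DAYS} days")
```

Under these assumptions the raw table holds roughly 1.4 billion rows at steady state, which is why consolidation into RRDs is used for the longer 2-year history.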
Other Tools
Syslog event recording:
- Gathers all logs from network devices
- Stored in an Oracle DB
- Accessible from CSDB
- Filtering and propagation by notification
LHCOPN: perfSONAR tool
- Decentralized network tool
- Regular OWD (one-way delay), latency and throughput tests
- Other tools such as traceroute
- LHCOPN network analysis
Implementation ongoing: testing phase with a 1 Gb link, security tests not yet complete. (www.perfsonar.net)
Data Storage
Summary:
• Spectrum proprietary DBs for core and alarms
• MySQL database for events and Service Manager
• Oracle database for stats (CSR) and alarm history (LANDB)
• Oracle database for Syslog info
• Standalone MySQL database for perfSONAR tools
• Too many different types of storage
• No correlation between Syslog and SNMP data
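The missing Syslog/SNMP correlation can be pictured as a join on device name and time. The sketch below is only an illustration under assumptions: the `Event` record, the sample data, and the 2-minute window are hypothetical and do not reflect the actual Spectrum or Oracle schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    device: str
    timestamp: datetime
    message: str


def correlate(alarms: List[Event], syslog: List[Event],
              window: timedelta = timedelta(minutes=2)) -> List[Tuple[Event, Event]]:
    """Pair each SNMP alarm with syslog entries from the same device
    whose timestamps fall within `window` of the alarm."""
    pairs = []
    for alarm in alarms:
        for entry in syslog:
            same_device = entry.device == alarm.device
            close_in_time = abs(entry.timestamp - alarm.timestamp) <= window
            if same_device and close_in_time:
                pairs.append((alarm, entry))
    return pairs


# Hypothetical sample data.
alarms = [Event("switch-a", datetime(2011, 9, 14, 10, 0, 0), "link down (SNMP trap)")]
syslog = [
    Event("switch-a", datetime(2011, 9, 14, 9, 59, 30), "interface Gi0/1 changed state to down"),
    Event("switch-a", datetime(2011, 9, 14, 8, 0, 0), "configuration saved"),
    Event("switch-b", datetime(2011, 9, 14, 10, 0, 10), "fan failure"),
]

for alarm, entry in correlate(alarms, syslog):
    print(alarm.message, "<->", entry.message)
```

With all the data in one store this becomes a single query; with the storage split across several databases, each side has to be extracted first.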
NMS Status
Advantages:
• Efficient root cause analysis
• Correct event/alarm management
• High availability
• Very good topology views (useful for the intervention group)
• Support for NICE users
• Very good level of filtering (topology, alarms)
• Notification support
Negative points / weaknesses:
• Expensive
• Polling limit is almost reached (a new version with a complete redesign of the polling system will arrive in 2 years)
• Not a performance system: cannot handle 50K statistics
• Integrating devices from non-certified manufacturers is complex
• Data collection mostly limited to SNMP (changes ongoing)
Requirements
Mandatory:
• Root cause analysis
• High-frequency polling: 1-2 min for critical nodes, 3-5 min for others
• Network topology representation
• Notifications (SMS, mail, XMPP) and a general console
• Distributed environment
• High-availability system
• Complete performance management
• IPv6 support
Nice to have:
• Autodiscovery system
• Mobile version
• Centralized Oracle database
Numbers and storage time:
• Polling capacity for at least 5K nodes
• Performance statistics for 56K ports
• Data lifetime: 1 month without aggregation, maximum with aggregation
• Device alarms: around 2 years
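The polling requirements translate into a sustained polling rate that a candidate tool must support. The sketch below estimates that rate for 5K nodes at the interval bounds given above. The split between critical and other nodes is not stated in the slides, so the figure of 500 critical nodes is purely an assumption, and the rate counts node polls, not individual attributes.

```python
def polls_per_second(nodes: int, interval_min: float) -> float:
    """Average number of node polls per second for a given polling interval."""
    return nodes / (interval_min * 60)


TOTAL_NODES = 5_000       # from the slide: at least 5K nodes
CRITICAL_NODES = 500      # assumption: the slides do not give this split
OTHER_NODES = TOTAL_NODES - CRITICAL_NODES

# Tightest intervals: 1 min for critical nodes, 3 min for the others.
tightest = polls_per_second(CRITICAL_NODES, 1) + polls_per_second(OTHER_NODES, 3)

# Loosest intervals: 2 min for critical nodes, 5 min for the others.
loosest = polls_per_second(CRITICAL_NODES, 2) + polls_per_second(OTHER_NODES, 5)

print(f"tightest intervals: {tightest:.1f} node polls/s")
print(f"loosest intervals:  {loosest:.1f} node polls/s")
```

Under this assumed split the requirement works out to roughly 19 to 33 node polls per second, before multiplying by the number of attributes collected per node.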
Research
Tools that fit best:
• Icinga: Nagios-like (fork) (not yet tested)
• Zabbix: large polling scale, open source, notifications, Oracle database, distributed (not yet tested) (http://www.zabbix.com/features.php)
• SolarWinds: commercial, but includes performance management and is less expensive (not yet tested)
• OpenNMS:
  - Open source, completely customizable
  - High-frequency polling in a distributed environment
  - Event correlation, alarm management, notifications
  - Many data collection methods supported (SNMP, HTTP, JMX, JDBC, Nagios NSClient)
  - (http://www.opennms.org/about/)
Links:
• http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
• http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html