Nagios

Nagios: System monitoring, the easy way

What is Nagios?
• Nagios watches your computers through user-defined check commands
• It can be set to inform you when a service or host becomes unavailable
• In fact, it can inform you, the sysadmin, your best friend, and even run commands to try to bring a system back up

Nagios config
• The main configuration file is "nagios.cfg" in /etc
• It pulls in the object definition files with cfg_file lines:

    cfg_file=/etc/contactgroups.cfg
    cfg_file=/etc/contacts.cfg
    cfg_file=/etc/dependencies.cfg
    cfg_file=/etc/escalations.cfg
    cfg_file=/etc/hostgroups.cfg
    cfg_file=/etc/hosts.cfg
    cfg_file=/etc/services.cfg
    cfg_file=/etc/timeperiods.cfg

• These are much like #include statements, allowing you to structure your files.

Nagios.cfg
• nagios.cfg also holds a number of other controls for Nagios, set through flags.
• These are beyond the scope of this presentation
• Next, we must set up a plan for what Nagios will monitor

Monitoring plan
• Our main server hosts various services:
    • Mail
    • DNS
    • DHCP
• Our second server hosts:
    • DNS slave
    • WWW (Apache)
    • NFS shares

Hosts.cfg

    define host{
        use                     generic-host        ; name of host template
        host_name               server1             ; name of computer
        alias                   server1.localdomain ; canonical name
        address                 10.0.0.1            ; IP address
        check_command           check-host-alive    ; defined in commands.cfg
        max_check_attempts      10                  ; retries before the host is considered down
        notification_interval   60                  ; how long between notification events
        notification_period     24x7                ; defined in timeperiods.cfg
        notification_options    d,u,r               ; down, unreachable, recovery
        }

• Note that services are not defined in this file; they go in services.cfg.
• When the host check command fails, the host is treated as down and notifications for its services are suppressed

Services.cfg

    define service{
        use                     generic-service     ; template
        host_name               server1             ; defined in hosts.cfg
        service_description     PING
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          peoplewhocare       ; defined in contactgroups.cfg
        notification_interval   60
        notification_period     24x7
        notification_options    c,r                 ; critical, recovery
        check_command           check_ping!100.0,20%!500.0,60%
        }

• This pings the server and notifies the contact group if the ping fails
• The two check_ping arguments are the warning and critical thresholds: round-trip time in milliseconds, then packet loss
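To make the two thresholds in the check_command line concrete, here is a minimal sketch of how a result could be classified against them. The function names are hypothetical and the comparison rule (at or above the threshold) is an assumption for illustration, not a copy of what the real check_ping plugin does internally.

```python
def parse_threshold(spec):
    """Split a threshold such as '100.0,20%' into (rta_ms, loss_pct)."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify_ping(rta_ms, loss_pct, warning="100.0,20%", critical="500.0,60%"):
    """Return 'OK', 'WARNING' or 'CRITICAL' for a measured ping result.

    Assumption: a threshold trips when either the round-trip time or the
    packet loss reaches it.
    """
    warn_rta, warn_loss = parse_threshold(warning)
    crit_rta, crit_loss = parse_threshold(critical)
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"
```

With the service above, only a CRITICAL result (and the later recovery) would produce a notification, because notification_options is c,r.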

Commands
• Check commands are installed with Nagios and take arguments in various formats.
• When a command returns a failure, Nagios counts it against max_check_attempts before treating the problem as real.
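A check command reports its result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, plus one line of status text. The sketch below shows that convention with a hypothetical disk-usage check, and a deliberately simplified model of how failures count against max_check_attempts. Both function names and the default thresholds are assumptions made for illustration.

```python
# Exit codes that check commands use to report state to Nagios.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk_usage(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Hypothetical check: map a usage percentage to (exit_code, status_line)."""
    if not 0.0 <= used_pct <= 100.0:
        return UNKNOWN, "DISK UNKNOWN - invalid usage value %r" % (used_pct,)
    if used_pct >= crit_pct:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % used_pct
    if used_pct >= warn_pct:
        return WARNING, "DISK WARNING - %.1f%% used" % used_pct
    return OK, "DISK OK - %.1f%% used" % used_pct


def reaches_hard_state(exit_codes, max_check_attempts):
    """Simplified model: True once max_check_attempts failures occur in a row."""
    consecutive = 0
    for code in exit_codes:
        consecutive = consecutive + 1 if code != OK else 0
        if consecutive >= max_check_attempts:
            return True
    return False
```

A real command would print the status line and then call sys.exit() with the code, so that Nagios can read both.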

Contacts.cfg

    define contact{
        contact_name                    nagios
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           nagios-admin@server1.localdomain
        pager                           pagenagios-admin@server1.localdomain
        }

• Note: a contact can have different notification settings than a host
• It may be a good idea to have email go to an outside address, so that alerts still arrive when the mail server itself is down

Contactgroups.cfg

    define contactgroup{
        contactgroup_name   crit-admin
        alias               critical services
        members             Root
        }

    define contactgroup{
        contactgroup_name   peoplewhocare
        alias               minions
        members             nagios      ; defined in contacts.cfg
        }

• A host or service refers to a contact group, which contains the contacts who get notified, each according to that contact's own notification commands

Timeperiods.cfg

    define timeperiod{
        timeperiod_name workhours
        alias           "Normal" Working Hours
        monday          08:00-17:00
        tuesday         08:00-17:00
        wednesday       08:00-17:00
        thursday        08:00-17:00
        friday          08:00-17:00
        }

• This allows notifications only during certain times.
• Maybe you don't want your pager going off at night?

Awesome tactics
• Oh noes, the service is down!
• So, have Nagios try to stop and start it
• Then get a person involved
• Perhaps we have something like Snort that should raise an alert
• A script run by Nagios can look at Snort's output and then raise the alert through the normal Nagios notification path
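Both tactics can be sketched in a few lines. The first function is the decision an event handler script might make from the state, state type, and attempt number that Nagios can pass to it; the restart policy (third soft failure, or any hard failure) is an assumption for illustration. The second builds a passive check result in the external command format, which is one way a script watching Snort could hand its finding to Nagios. The function names and the service name "SNORT" are hypothetical.

```python
import time


def should_restart(state, state_type, attempt, restart_on_attempt=3):
    """Decide whether an event handler should try to restart the service.

    Assumed policy: restart on the given SOFT attempt so the problem may be
    fixed before anyone is notified, and again once the state goes HARD.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and int(attempt) == restart_on_attempt


def passive_result_line(host, service, return_code, output, now=None):
    """Format a passive service check result for the external command file."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, return_code, output)
```

The line from passive_result_line would be written, followed by a newline, to the external command file named by command_file in nagios.cfg. From there Nagios treats it like any other check result, so the usual contacts and time periods apply.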

Thus ends the Nagios Brief
• Everyone go back to their stuff, which is not paying attention
