Nagios: System monitoring, the easy way
What is Nagios
• Nagios watches your computers through user-defined commands
• It can be set to inform you when a service or host becomes unavailable
• In fact, it can inform you, the sysadmin, your best friend, and even run commands to try to bring a system back up
Nagios config
• The main configuration file is “nagios.cfg” in /etc
    cfg_file=/etc/contactgroups.cfg
    cfg_file=/etc/contacts.cfg
    cfg_file=/etc/dependencies.cfg
    cfg_file=/etc/escalations.cfg
    cfg_file=/etc/hostgroups.cfg
    cfg_file=/etc/hosts.cfg
    cfg_file=/etc/services.cfg
    cfg_file=/etc/timeperiods.cfg
• These are much like #include statements, allowing you to structure your files.
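To see how the cfg_file lines act like includes, here is a minimal Python sketch that lists the files a main config pulls in. It is only an illustration, not how Nagios itself reads the file: the function name and the sample text are made up, and it assumes comments start with "#".

```python
def included_config_files(main_cfg_text):
    """Return the paths named by cfg_file= lines, in the order they appear."""
    paths = []
    for raw_line in main_cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (assumed to start with '#')
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "cfg_file":
            paths.append(value.strip())
    return paths


# Made-up sample in the style of the slide above
SAMPLE = """\
# object files
cfg_file=/etc/hosts.cfg
cfg_file=/etc/services.cfg
log_file=/var/log/nagios.log
"""

for path in included_config_files(SAMPLE):
    print(path)
```

Only the cfg_file lines are returned; other directives such as log_file are ignored.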
Nagios.cfg
• There are a number of other controls for Nagios, set through flags.
• These are beyond the scope of my presentation
• Next, we must set up a plan for what Nagios will monitor
Monitoring plan
• Our main server hosts various services:
    • Mail
    • DNS
    • DHCP
• Our second server hosts:
    • DNS slave
    • WWW – apache
    • NFS shares
Hosts.cfg
    define host{
        use                     generic-host        ; name of host template
        host_name               server1             ; name of computer
        alias                   server1.localdomain ; canonical name
        address                 10.0.0.1            ; IP address
        check_command           check-host-alive    ; defined in commands.cfg
        max_check_attempts      10                  ; tries before the host is declared down
        notification_interval   60                  ; how long between notification events
        notification_period     24x7                ; defined in timeperiods.cfg
        notification_options    d,u,r               ; down, unreachable, recovery
        }
• Note that the services are not checked in this file.
• When the host check command fails, the host is considered down, and problems with its services are not reported separately
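The role of max_check_attempts can be pictured with a small model. This is a simplified sketch of the retry idea, not Nagios's actual scheduler: it assumes a problem only counts as confirmed once that many checks in a row have failed. The function name and the list of booleans are hypothetical.

```python
def problem_confirmed(check_results, max_check_attempts):
    """Return True once max_check_attempts consecutive checks have failed.

    check_results is a sequence of booleans, True meaning the check passed.
    """
    consecutive_failures = 0
    for passed in check_results:
        if passed:
            # Any success resets the retry counter
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_check_attempts:
                return True
    return False


# Two failures, a success, then three failures in a row
results = [True, False, False, True, False, False, False]
print(problem_confirmed(results, 3))
print(problem_confirmed(results, 10))
```

With a limit of 3 the final run of failures confirms the problem; with the limit of 10 from the host definition it does not, so a short blip would not page anyone.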
Services.cfg
    define service{
        use                     generic-service     ; template
        host_name               server1             ; defined in hosts.cfg
        service_description     PING
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          peoplewhocare       ; defined in contactgroups.cfg
        notification_interval   60
        notification_period     24x7
        notification_options    c,r
        check_command           check_ping!100.0,20%!500.0,60%
        }
• This pings the server: warning at a 100 ms round trip or 20% packet loss, critical at 500 ms or 60% loss
• With notification_options c,r, contacts are notified when the service goes critical and when it recovers
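The two arguments after check_ping are the warning and critical thresholds, each a round-trip time in milliseconds and a packet-loss percentage. The sketch below shows how such a pair could be applied to one measurement. It is an illustration rather than the plugin's real code; the function names are made up, and treating a value exactly on the threshold as a breach is an assumption.

```python
def parse_threshold(arg):
    """Split a string like '100.0,20%' into (rta_ms, loss_percent)."""
    rta_text, loss_text = arg.split(",")
    return float(rta_text), float(loss_text.rstrip("%"))


def classify_ping(rta_ms, loss_percent, check_command):
    """Classify one ping measurement against a check_command string."""
    # "check_ping!100.0,20%!500.0,60%" -> name, warning arg, critical arg
    _name, warn_arg, crit_arg = check_command.split("!")
    warn_rta, warn_loss = parse_threshold(warn_arg)
    crit_rta, crit_loss = parse_threshold(crit_arg)
    # Assumption: reaching a threshold exactly already counts as a breach
    if rta_ms >= crit_rta or loss_percent >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_percent >= warn_loss:
        return "WARNING"
    return "OK"


CHECK = "check_ping!100.0,20%!500.0,60%"
print(classify_ping(20.0, 0.0, CHECK))
print(classify_ping(150.0, 0.0, CHECK))
print(classify_ping(20.0, 80.0, CHECK))
```

Either value alone is enough to trip a level: a fast link that drops 80% of its packets is still critical.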
Commands
• Check commands are installed with Nagios and take various argument formats.
• When a command returns a failure, Nagios counts it against the check attempts (max_check_attempts).
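A check command reports its result through its exit code and one line of output. By the usual plugin convention 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. Below is a minimal sketch of a custom check in Python; the mail-queue check, its thresholds and its function names are hypothetical and not something shipped with Nagios.

```python
# Conventional plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_queue_length(length, warn_at=50, crit_at=200):
    """Return (exit_code, status_line) for a hypothetical mail-queue check."""
    if length is None or length < 0:
        return UNKNOWN, "QUEUE UNKNOWN - no usable reading"
    if length >= crit_at:
        return CRITICAL, f"QUEUE CRITICAL - {length} messages waiting"
    if length >= warn_at:
        return WARNING, f"QUEUE WARNING - {length} messages waiting"
    return OK, f"QUEUE OK - {length} messages waiting"


def run_plugin(length):
    """Print the status line and return the code to hand to sys.exit()."""
    code, message = check_queue_length(length)
    print(message)
    return code


# Demonstration with a made-up reading. A real plugin script would measure
# the value itself and finish by calling sys.exit(run_plugin(measured_length)).
exit_code = run_plugin(75)
print("exit code:", exit_code)
```

Any non-zero code is what counts as a failed attempt for the purposes of the retry limit.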
Contacts.cfg
    define contact{
        contact_name                    nagios
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           nagios-admin@server1.localdomain
        pager                           pagenagios-admin@server1.localdomain
        }
• Note: a contact can have different notification settings than a host
• It may be a good idea to have email go to an outside address, so alerts still get through when the mail server itself is down
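Because the service and the contact each carry their own notification options, an alert only reaches a contact when both allow it. The following sketch models that as a simple intersection. It is a simplified picture that ignores time periods and other filters, and the function names are made up for illustration.

```python
# Option letters used in the service and contact definitions
STATE_LETTERS = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "RECOVERY": "r",
}


def parse_options(option_string):
    """Turn 'w,u,c,r' into the set {'w', 'u', 'c', 'r'}."""
    return {opt.strip() for opt in option_string.split(",") if opt.strip()}


def contact_is_notified(event, service_options, contact_options):
    """True if both the service and the contact accept this kind of event."""
    letter = STATE_LETTERS[event]
    return (letter in parse_options(service_options)
            and letter in parse_options(contact_options))


# The PING service sends c,r; the nagios contact accepts w,u,c,r
print(contact_is_notified("CRITICAL", "c,r", "w,u,c,r"))
print(contact_is_notified("WARNING", "c,r", "w,u,c,r"))
```

The contact would accept warnings, but the PING service never sends them, so only critical and recovery events get through.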
Contactgroups.cfg
    define contactgroup{
        contactgroup_name       crit-admin
        alias                   critical services
        members                 Root
        }
    define contactgroup{
        contactgroup_name       peoplewhocare
        alias                   minions
        members                 nagios              ; defined in contacts.cfg
        }
• A host or service refers to a contact group, whose member contacts are notified according to their own notification commands
Timeperiods.cfg
    define timeperiod{
        timeperiod_name         workhours
        alias                   "Normal" Working Hours
        monday                  08:00-17:00
        tuesday                 08:00-17:00
        wednesday               08:00-17:00
        thursday                08:00-17:00
        friday                  08:00-17:00
        }
• This allows notifications only during certain times.
• Maybe you don’t want your pager going off at night?
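The effect of the workhours period can be sketched as a lookup: given a moment, is it inside one of that weekday's ranges? This is only an illustration of the idea. The dictionary layout and function names are assumptions, and the parser handles plain HH:MM-HH:MM ranges like the ones on the slide and nothing more.

```python
from datetime import datetime

# datetime.weekday() returns 0 for Monday
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]

# The workhours period from the slide; days not listed have no valid time
WORKHOURS = {day: ["08:00-17:00"] for day in DAY_NAMES[:5]}


def parse_range(text):
    """Turn '08:00-17:00' into a (start, end) pair of time objects."""
    start_text, end_text = text.split("-")
    start = datetime.strptime(start_text, "%H:%M").time()
    end = datetime.strptime(end_text, "%H:%M").time()
    return start, end


def in_timeperiod(moment, timeperiod):
    """True if the datetime falls inside one of that weekday's ranges."""
    day = DAY_NAMES[moment.weekday()]
    for range_text in timeperiod.get(day, []):
        start, end = parse_range(range_text)
        if start <= moment.time() < end:
            return True
    return False


print(in_timeperiod(datetime(2024, 1, 1, 9, 0), WORKHOURS))   # a Monday morning
print(in_timeperiod(datetime(2024, 1, 1, 22, 0), WORKHOURS))  # the same Monday, at night
```

A contact whose notification period is workhours would get the first alert and be left alone for the second.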
Awesome tactics
• Oh noes, the service is down!
• So, have Nagios try to stop and start it
• Then get a person involved
• Perhaps we have something like Snort that should raise an alert
• A script run by Nagios can look at that signal and then alert through Nagios's normal notification method
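The "restart it first, then get a person involved" tactic can be sketched as a handler script that Nagios runs when a service changes state. The sketch assumes the handler is passed the service state, the state type (SOFT while retries remain, HARD once they are used up) and the attempt number. The function names, the choice of attempt 2 and the stand-in restart action are all hypothetical.

```python
def should_restart(state, state_type, attempt, restart_on_attempt=2):
    """Restart only while Nagios is still retrying, before anyone is paged."""
    return (state == "CRITICAL"
            and state_type == "SOFT"
            and attempt == restart_on_attempt)


def handle_event(state, state_type, attempt, restart):
    """Call restart() if this event warrants it; return True if it was called."""
    if should_restart(state, state_type, int(attempt)):
        restart()
        return True
    return False


# Demonstration: record restarts in a list instead of touching a real service
restarts = []


def fake_restart():
    restarts.append("restart requested")


handle_event("CRITICAL", "SOFT", "1", fake_restart)  # first failure: wait and see
handle_event("CRITICAL", "SOFT", "2", fake_restart)  # second failure: try a restart
handle_event("CRITICAL", "HARD", "3", fake_restart)  # retries used up: leave it to the humans
print(restarts)
```

If the restart works, the next check succeeds and nobody is paged; if it does not, the problem becomes hard and the normal notifications go out.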
Thus ends the Nagios Brief
• Everyone go back to their stuff which is not paying attention