HP-UX Monitoring Standard B&DO IES
General
• HP OpenView Operations 6.x (OVO 6)
• Standard all over Europe
• Trade and HP Account
• Mainly based on WW standard tools (known as GII)
• Divided into default and additional tools
• Also available or in preparation for other UX platforms (Solaris, AIX, Linux, Tru64, …); these are not yet ready and not part of this training
B&DO IES Event Management
Contents (Default Tools)
Area                    Tool/Template
• Processes             ps_mon
• File Systems          df_mon
• LVM                   vol_mon
• Printing (lp only)    lp_mon
• Kernel messages       dmsg_mon
• syslog messages       syslog
Contents (Additional Tools Operating System)
Area                    Tool/Template
• serviceguard          sg_mon
• NFS                   nfs_mon
• swapspace             swap_mon
• cron                  cronlog, cron_mon
• mail                  mailqueue, mail.log
• su                    sulog
• login attempts        btmp
• system startup        rc.log
• disk arrays           disk_array
• housekeeping          housekeep
General mechanisms
• in most cases: scripts/binaries triggered by OVO
• local configuration possible
• a local configuration is preferred over the default configuration
• other cases: default logfiles read by the OVO logfile encapsulator
General mechanisms (cont’d)
File locations:
• executables              /var/opt/OV/bin/OpC/cmds or possibly /var/opt/OV/bin/OpC/monitor
• logfiles                 /var/opt/OV/log/OpC
• default configuration    /var/opt/OV/bin/OpC/cmds
• local configuration      /var/opt/OV/conf/OpC
• temporary files          /var/opt/OV/tmp/OpC
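The precedence of local over default configuration can be pictured with a small sketch. It assumes that a tool uses the local file as a whole when it exists and otherwise falls back to the default file; the slides do not say whether the real tools merge the two files. The function name `resolve_config` is hypothetical.

```python
import os

# Directories taken from the file-location list above.
DEFAULT_DIR = "/var/opt/OV/bin/OpC/cmds"
LOCAL_DIR = "/var/opt/OV/conf/OpC"


def resolve_config(tool, exists=os.path.exists):
    """Return the local cfg path if it exists, else the default cfg path.

    Assumption: whole-file override, no merging of local and default entries.
    `exists` is injectable so the logic can be tried without the real files.
    """
    local = f"{LOCAL_DIR}/{tool}.cfg"
    default = f"{DEFAULT_DIR}/{tool}.cfg"
    return local if exists(local) else default


if __name__ == "__main__":
    print(resolve_config("ps_mon"))
```

On a machine without the OVO agent installed, this prints the default path, because no local file exists.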
General mechanisms (cont’d)
In some cases (ps_mon, df_mon, swap_mon) the processing of a message can be configured. The message can be
• routed to the troubleticket interface (EWM)
• sent out to the notification interface (JET)
• sent to both
• kept in the browser
Process monitoring with ps_mon
Process monitoring: ps_mon
• Binary, triggered every 15 minutes by OVO
• checks the following:
  • process not running
  • too few instances of process running
  • too many instances of process running
  • CPU utilisation of process
  • size of process
• configuration required
  • default cfg-file: /var/opt/OV/bin/OpC/cmds/ps_mon.cfg
  • local cfg-file: /var/opt/OV/conf/OpC/ps_mon.cfg
ps_mon: configuration
#############################################################################
# File: ps_mon.cfg
# Description: The ps_mon Configuration file
# Package : Concorde - CONC_UNIX
# Version: A.01.02
# Syntax:
#   <Process name> <Severity> <Instances> [<Group/Appl. Name> [<Schedule>]]
#                  [; <Mode> [<Arg_string>; <Cmd_string>]]
#   [*PINFO <Max %CPU> <Max Size>]
#   [*ACTION <Command String>]
#
#   <Schedule> = <start time>-<end time> <day of week>[,<day of week>]
#                [<Schedule>]
#   <Mode> = c|v|o|n (command | verbatim | option | none)
#
# Note 1: Some processes change names after they are invoked so be sure
#         to use the name as listed by "ps -ef" (on HPUX systems).
#
# Note 2: The "*PINFO" and "Mode" parameters are only available on HP-UX.
#
#############################################################################
ps_mon: configuration (cont’d)
#
# Examples:
#
# Check if exactly one example process is running with cmd option.
#
# example warning 1; o cmd; cmd
#
# Check if exactly one example process is running with cmd1 cmd2 options
# where cmd1 takes two args.
#
# example warning 1; o cmd1 arg arg cmd2; cmd1 cmd2
#
# or equivalently.
#
# example warning 1; o cmd1 arg arg cmd2; cmd2 cmd1
# example warning 1; o cmd2 cmd1 arg arg; cmd1 cmd2
# example warning 1; o cmd2 cmd1 arg arg; cmd2 cmd1
#
ps_mon: configuration (cont’d)
# For backwards compatibility, if cmd options are prefixed by '-', i.e. "-D", or "-E"
# then <Cmd_string> does not have to be specified. For example, the following two
# statements are equivalent.
#
# (1) example warning 1; o -D arg -E
# (2) example warning 1; o -D arg -E; -D -E
#
# Statement (1) and (2) are both legitimate; however, the syntactic form of statement
# (1) is defunct. To ensure compatibility with future versions of ps_mon, please
# follow the syntactic format of statement (2). Again, note that statements of the
# form in (1) apply *only* when cmd options are prefixed by '-', whereas all
# statements of the form in (2) will work.
#
# Check if netscape process size is more than 1000K. Do not care about the
# actual number of processes.
#
# netscape warning -
# *PINFO - 1000
#
# Check if at least one sendmail process is running. If not start it.
#
# sendmail major 1-
# *ACTION /sbin/init.d/sendmail start
#
#############################################################################
ps_mon: configuration (cont’d)
Check if process midaemon is running in exactly one instance, Monday to Friday from 06:00 to 22:00; if not, send a warning message with the object MWA.
midaemon warning 1 MWA 0600-2200 1,2,3,4,5
Check if process sendmail is running in at least one instance; if not, restart it.
sendmail major 1-
*ACTION /sbin/init.d/sendmail start
Caution: A message is raised that the process is not running, but the message does not show that the process was restarted. To confirm the restart, check the ps_mon logfile manually (/var/opt/OV/log/OpC/ps_mon.log).
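The `<Instances>` field appears in three forms in the examples: `1` (exactly one), `1-` (at least one) and `-` (do not care). The following sketch shows how such a field could be evaluated against a process count. It covers only these three forms; the syntax for an upper bound is not shown in the slides, and the function name `instances_ok` is hypothetical rather than part of ps_mon.

```python
def instances_ok(spec, running):
    """Check a process count against an <Instances> field.

    Only the forms seen in the examples are handled:
      "-"   any number of instances is acceptable
      "N-"  at least N instances
      "N"   exactly N instances
    """
    spec = spec.strip()
    if spec == "-":
        return True
    if spec.endswith("-"):
        return running >= int(spec[:-1])
    return running == int(spec)


if __name__ == "__main__":
    # "sendmail major 1-" with no sendmail running would raise an event.
    print(instances_ok("1-", 0))
```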
ps_mon: configuration (cont’d)
Special feature: processing of the message
• adding a special prefix to the <group> parameter causes special treatment of the message:
  • TT_<group>                  message will go to the TroubleTicket interface (EWM)
  • N_<group>                   message will be sent to the notification interface (JET)
  • TN_<group> or NT_<group>    message will be sent to both (EWM & JET)
Parameters of messages:
• Application: HPUX_ps_mon
• Message Group: Job
• Object: <group> (the parameter specified in the cfg-file will be used)
Make sure that appropriate mappings are set up in EWM or JET (or both)!
IMPORTANT: The default configuration file contains NO entries!
Current version of ps_mon: 1.5.1.9
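The prefix rule can be summarised as a lookup from group name to additional destinations. This is only an illustration of the rule as stated on the slide, not the OVO template logic. It assumes that a group without a prefix stays in the browser only, and the function name `extra_targets` is hypothetical.

```python
# Prefix table as described above: EWM = trouble ticket, JET = notification.
_PREFIX_TARGETS = (
    ("TN_", ("EWM", "JET")),
    ("NT_", ("EWM", "JET")),
    ("TT_", ("EWM",)),
    ("N_", ("JET",)),
)


def extra_targets(group):
    """Return the interfaces a message is forwarded to besides the browser.

    Assumption: an unprefixed group name means "browser only".
    """
    for prefix, targets in _PREFIX_TARGETS:
        if group.startswith(prefix):
            return set(targets)
    return set()


if __name__ == "__main__":
    for name in ("TT_oracle", "N_oracle", "NT_oracle", "MWA"):
        print(name, sorted(extra_targets(name)))
```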
ps_mon: configuration testing
ps_mon [-s <sleep interval>] [-f <configurationfile>] [-l <logfile>] [-d [<debugfile>]] [-w <waittime>] [-t] [-g] [-q <quiet period>]
-s <sleep interval>      run program in daemon mode with time in minutes between checks. To specify time in seconds append 's' (e.g. -s 10s)
-f <configurationfile>   optional path to the configuration file. If omitted, the default configuration file is used (/var/opt/OV/bin/OpC/cmds/ps_mon.cfg)
-l <logfile>             optional path to the logfile. If omitted, the default logfile is used (/var/opt/OV/log/OpC/ps_mon.log)
-d [<debugfile>]         run in debug mode and optionally write debug messages to a file. If <debugfile> does not exist, output goes to stderr.
-w <waittime>            minutes to wait at startup before beginning to monitor
-t                       prefixes event messages with a timestamp
-g                       group event messages. Messages with the same group and event condition are combined into one message
-q <quiet period>        if a process is not running and an autoaction is specified, the autoaction is given a chance to run. <quiet period> is the number of seconds (5-120) to wait for the autoaction to finish successfully before an error is reported
IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.
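A test run therefore needs its own logfile. The sketch below builds an argument list for such a run from the `-f`, `-l` and `-d` options listed above and refuses the production logfile. The helper name `ps_mon_test_argv` and the scratch paths are hypothetical, and the sketch only builds the command; it does not execute ps_mon.

```python
import os

# Default logfile named in the usage text; it is read by the logfile encapsulator.
PRODUCTION_LOG = "/var/opt/OV/log/OpC/ps_mon.log"


def ps_mon_test_argv(cfg_path, scratch_log):
    """Build an argument list for a one-off ps_mon test run.

    Raises ValueError if the scratch logfile is the production logfile,
    because writing there would create messages in the browser.
    """
    if os.path.normpath(scratch_log) == PRODUCTION_LOG:
        raise ValueError("do not test against the encapsulated logfile")
    # -d is placed last and without a file so that debug output goes to stderr.
    return ["ps_mon", "-f", cfg_path, "-l", scratch_log, "-d"]


if __name__ == "__main__":
    # Hypothetical scratch locations for illustration only.
    print(" ".join(ps_mon_test_argv("/tmp/ps_mon_test.cfg", "/tmp/ps_mon_test.log")))
```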
Diskspace monitoring with df_mon
Diskspace monitoring: df_mon
• Shell script, triggered every 15 minutes by OVO
• checks the following:
  • percent of disk space utilization
  • percent of inode utilization
• configuration required
  • default cfg-file: /var/opt/OV/bin/OpC/cmds/df_mon.cfg
  • local cfg-file: /var/opt/OV/conf/OpC/df_mon.cfg
df_mon: configuration
#############################################################################
# File: df_mon.cfg
# Description: The Diskspace Monitor Configuration file
# Package : Concorde - UXSM
# @(#) $Header: /Concorde/uxmon/df_mon.cfg 1.4 2002/02/09 00:22:06 skwok Exp $
# Version: A.01.00
#
# <Filesystem> exclude
# <Filesystem> <space> [<inode> [<Severity> [<Schedule>]]] [ ; <group>]
# [*ACTION <action>]
#
# <space>    = space utilization threshold in percent (decimal value)
# <inode>    = inode utilization threshold in percent (decimal value)
#              Note: Monitoring of INODES is not supported on SunOS
# <Severity> = warning|minor|major|critical
# <Schedule> = hhmm-hhmm [<Daylist> [<Schedule>]]
# <Daylist>  = n[,<Daylist>] | *
#              where n represents day of a week starting with
#              Sunday=0 and Saturday=6;
#              * means all days
# <group>    = group name to associate with event - note the space
#              before the ";" that must be there to separate it from
#              the preceding fields Ex: "* 95 95 warning ; support_grp"
#
# <action>   = action (program) to be called if event occurs
df_mon: configuration (cont’d)
#
# Use '-' in place of <space> or <inode> to skip threshold parameter
# If parameter is not specified the checking of that value is skipped.
#
# Use '*' in place of <Filesystem> to specify ALL filesystems
#
#############################################################################
/var    80    99    major
/var    97    99    critical
/usr    80    99    warning
/tmp    85    99    warning
/tmp    95    99    critical
*       95    95    warning
*       99    99    critical
#############################################################################
# end of df_mon.cfg
#############################################################################
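To make the rule syntax concrete, here is a minimal sketch that splits one rule line into its fields according to the syntax comment in df_mon.cfg. It is an illustration, not the parser used by df_mon.sh. The function name `parse_df_rule` is hypothetical, `*ACTION` lines are not handled, and thresholds are read as floats because "decimal value" could mean either whole or fractional percentages.

```python
def parse_df_rule(line):
    """Split a df_mon.cfg rule line into a dict of its fields.

    Syntax: <Filesystem> <space> [<inode> [<Severity> [<Schedule>]]] [ ; <group>]
            <Filesystem> exclude
    A '-' or a missing threshold is returned as None (check skipped).
    """
    # The group is separated by " ;" (the space before ';' is mandatory).
    body, _, group = line.partition(" ;")
    fields = body.split()
    filesystem = fields[0]
    if len(fields) > 1 and fields[1] == "exclude":
        return {"filesystem": filesystem, "exclude": True}

    def threshold(index):
        if index >= len(fields) or fields[index] == "-":
            return None
        return float(fields[index])

    return {
        "filesystem": filesystem,
        "exclude": False,
        "space": threshold(1),
        "inode": threshold(2),
        "severity": fields[3] if len(fields) > 3 else None,
        "schedule": " ".join(fields[4:]) or None,
        "group": group.strip() or None,
    }


if __name__ == "__main__":
    print(parse_df_rule("* 95 95 warning ; support_grp"))
```

How df_mon.sh chooses between several matching rules, for example `/var` and `*`, is not described in the slides, so the sketch stops at parsing.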
df_mon: configuration (cont’d)
Special feature: processing of the message
• adding a special prefix to the <group> parameter causes special treatment of the message:
  • TT_<group>                  message will go to the TroubleTicket interface (EWM)
  • N_<group>                   message will be sent to the notification interface (JET)
  • TN_<group> or NT_<group>    message will be sent to both (EWM & JET)
Parameters of messages:
• Application: HPUX_df_mon
• Message Group: OS
• Object: <group> (the parameter specified in the cfg-file will be used)
Make sure that appropriate mappings are set up in EWM or JET (or both)!
Current script version: 1.22
df_mon: configuration testing (usage)
df_mon.sh [-t] [-a] [-f <config file>] [-l <logfile>]
-t                  add a timestamp at the beginning of each message
-a                  append the actual value to the event message
-f <config file>    use <config file> as configuration file instead of the default (/var/opt/OV/bin/OpC/cmds/df_mon.cfg)
-l <logfile>        write event messages into <logfile> instead of standard output
IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.
Volume monitoring with vol_mon
HP-UX volume monitor: vol_mon.sh
• Triggered every 15 minutes by OVO
• Checks
  • Logical volumes for stale extents
  • Volume group status
  • Mount information (by default, whether all volumes specified in /etc/fstab are mounted)
• local configuration possible
vol_mon: configuration
#############################################################################
# File: vol_mon.cfg
# Description: The Volume Monitor Configuration file
# Package : Concorde - UXSM
# @(#) $Header: /Concorde/uxmon/vol_mon.cfg 1.2 2002/09/16 22:13:47 skwok Exp $
# Version: A.01.00
#
# <Filesystem>
#
# This configuration file for volume monitor is optional and only useful for
# specifying file systems which should be mounted but don't appear in the
# file /etc/fstab (e.g. ServiceGuard file systems). The configuration file
# consists of one-line entries each specifying a separate filesystem.
#
# Blank lines and comment lines beginning with a "#" are ignored. Also,
# extra fields after the filesystem entry on the same line are ignored.
# This is useful for specifying the disk space monitoring configuration file
# (df_mon.cfg) as the configuration file for volume monitor.
#
#############################################################################
#############################################################################
# end of vol_mon.cfg
#############################################################################
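Because only the first field of each line counts, a df_mon.cfg can double as a vol_mon.cfg. The sketch below shows that reading rule under the stated conventions (blank lines and `#` comments ignored, extra fields ignored). The function name `expected_mounts` is hypothetical, and how vol_mon.sh treats a `*` entry or repeated filesystems from df_mon.cfg is not stated in the slides, so the sketch returns entries unchanged.

```python
def expected_mounts(cfg_text):
    """Return the first field of every non-blank, non-comment line."""
    mounts = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Extra fields (e.g. df_mon thresholds) are ignored.
        mounts.append(stripped.split()[0])
    return mounts


if __name__ == "__main__":
    sample = "# shared with df_mon\n\n/var 80 99 major\n/usr 80 99 warning\n"
    print(expected_mounts(sample))
```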
vol_mon: configuration testing
vol_mon.sh [-v] [-c <configfile>] [-l <logfile>]
-c <configfile>    use <configfile> as configuration file
-l <logfile>       write event messages into <logfile> instead of stdout
-v                 verbose mode
IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.
lp print monitoring with lp_mon
HP-UX lp print monitor: lp_mon.sh
• Triggered every 15 minutes by OVO
• Needs binary lpinfo
• Checks
  • printer queue length
  • time of print request in queue
  • active time of print request
  • status of spooler and printer
• local configuration possible and required
  • default cfg-file: /var/opt/OV/bin/OpC/cmds/lp_mon.cfg
  • local cfg-file: /var/opt/OV/conf/OpC/lp_mon.cfg
lp_mon: configuration
####################################################################################
# File: lp_mon.cfg
# Description: The Lp Monitor Configuration File
# Package: Concorde - UXSM
# Version: A.01.00
# Syntax:
#   lpsched_check=YES|NO|AUTO [lpsched_options=<lpsched_options>][;<time_schedule>]
#   exclude <printer>[,<printer>...]
#   <printer> [queue_length=<#requests>] [request_age=<#days>]\
#             [active_time=<#min>] [disable_check=YES|NO|AUTO]\
#             [reject_check=YES|NO|AUTO] [phantom_check=YES|NO|AUTO]\
#             [;<time_schedule>]
#
#   - queue_length  - max number of pending requests
#   - request_age   - max age of print request
#   - active_time   - max active time of print request
#   - disable_check - check if printer is disabled
#   - reject_check  - check if printer is rejecting print request
#   - phantom_check - look for phantom print request
####################################################################################
lpsched_check=YES
#* queue_length=50 request_age=7 active_time=10\
#disable_check=YES reject_check=YES phantom_check=AUTO
####################################################################################
# end of lp_mon.cfg
####################################################################################
lp_mon: configuration testing
lp_mon.sh [-f <configfile>] [-l <logfile>]
-f <configfile>    use <configfile> instead of the default (/var/opt/OV/bin/OpC/cmds/lp_mon.cfg)
-l <logfile>       write event messages to <logfile> instead of stdout
IMPORTANT: Never use the normal logfile encapsulated by ITO for testing! This would cause unnecessary messages in the browser.
Monitoring of kernel messages with dmsg_mon
HP-UX kernel messages monitor: dmsg_mon.sh
• Triggered every 5 minutes by OVO
• Checks
  • output of “dmesg -”
• local configuration for unwanted messages
• requires an ongoing template review process with each OS patch applied
• local configuration possible
• global configuration via template
dmsg_mon.sh: configuration
###############################################################################
#
# File: dmesg_mon.cfg
# Description: strings listed here don't generate an ITO message for dmesg
# Syntax: just list the strings, one line for each
#         !!! all dmesg lines matching one of the listed strings
#             are taken out of monitoring !!!
#
# Example:
#
# hardware path
#
# If the string "hardware path" is listed, all dmesg lines matching (containing)
# the string "hardware path" are ignored for monitoring purposes.
# Still, the dmesg history contains these lines, but no message is generated.
#
###############################################################################
###############################################################################
# End of dmesg_mon.cfg
###############################################################################
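The suppression rule is a plain substring match. A minimal sketch of that filter follows, assuming the configuration file is read with `#` comment lines and blank lines skipped. The function names and the sample kernel lines are hypothetical and are not taken from dmsg_mon.sh.

```python
def load_suppress_strings(cfg_text):
    """Return the listed strings, skipping blank lines and '#' comments."""
    strings = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            strings.append(stripped)
    return strings


def monitored_lines(dmesg_lines, suppress):
    """Keep only dmesg lines that contain none of the suppress strings."""
    return [
        line for line in dmesg_lines
        if not any(text in line for text in suppress)
    ]


if __name__ == "__main__":
    cfg = "# strings to ignore\nhardware path\n"
    # Made-up sample lines for illustration.
    lines = ["scsi: hardware path 0/0/1/0 reset", "vmunix: file system full"]
    print(monitored_lines(lines, load_suppress_strings(cfg)))
```

The suppressed lines would still be written to the history file; the sketch only models which lines generate messages.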
dmsg_mon.sh: configuration testing
Basically no “real” testing is possible; every run of “dmesg -” sets a new pointer into the dmesg output.
Check the files:
/var/opt/OV/log/OpC/dmsg_mon.hist    history of ALL output
/var/opt/OV/log/OpC/dmsg_mon.log     logfile read by ITO
/var/opt/OV/log/OpC/dmsg_mon.tmp     normally empty
dmsg_mon.sh: ongoing maintenance
• every new OS release, HW patch etc. causes new output
• new messages come with the prefix “DMESG-UNCLASSIFIED:”
• regular reports are cross-checked with UX-PE to classify these messages (match or suppress)
Monitoring of kernel tables with kts_mon
HP-UX kernel tables monitor: kts_mon.sh
• Triggered every 15 minutes by OVO
• Checks
  • nproc over threshold
  • ninode over threshold
  • nfile over threshold
• local configuration possible (thresholds in percent)
kts_mon: configuration
#############################################################################
# File: kts_mon.cfg
# Description: nfile, nproc, ninode Monitor Configuration file
# Package : Concorde - UXSM
# Version: A.01.00
#
# <PARAMETER> <space>
#
# <PARAMETER>  THRESH_NP  nproc
#              THRESH_NI  ninode
#              THRESH_NF  nfile
# <space>      space utilization threshold in percents (decimal value)
#
#############################################################################
THRESH_NP=70
THRESH_NI=101
THRESH_NF=70
#############################################################################
# end of kts_mon.cfg
#############################################################################
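As a sketch of how these thresholds could be applied, the code below reads the `KEY=VALUE` lines and compares a table's usage percentage with its threshold. It assumes a strict "greater than" comparison, which the slides do not confirm, and the function names are hypothetical. Under that assumption a threshold of 101 can never be exceeded by a table that is at most 100% full, so the shipped `THRESH_NI=101` would never raise a ninode event.

```python
def parse_kts_cfg(cfg_text):
    """Read KEY=VALUE threshold lines, skipping blanks and '#' comments."""
    thresholds = {}
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, _, value = stripped.partition("=")
        thresholds[key.strip()] = int(value)
    return thresholds


def over_threshold(used, limit, threshold_pct):
    """True if used/limit, as a percentage, is above the threshold.

    Assumption: strict comparison (">"), not ">=".
    """
    return 100.0 * used / limit > threshold_pct


if __name__ == "__main__":
    cfg = parse_kts_cfg("THRESH_NP=70\nTHRESH_NI=101\nTHRESH_NF=70\n")
    # Hypothetical figures: 800 of 1000 process table slots in use.
    print(over_threshold(800, 1000, cfg["THRESH_NP"]))
```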
kts_mon: configuration testing
kts_mon.sh [ -f <config-file> ]
• the configuration file must be executable (at least for root)
• kts_mon.sh ALWAYS writes to the default logfile!
Monitoring of syslog
HP-UX syslog monitoring
• encapsulates /var/adm/syslog/syslog.log
• polling interval 30s
• no local configuration
• enhancement requests (unwanted messages) go to TEG monitoring
Optional monitors
This chapter includes all standard monitoring solutions that are not applicable to all systems and are therefore not part of the default monitoring:
• serviceguard
• swapspace
• security
• cron
• mail
• system startup
• NFS
• Disk array
• Housekeeping
Monitoring of service guard
HP-UX service guard monitor: sg_mon.ksh
• Triggered every 15 minutes by OVO
• uses output of cmviewcl
• checks:
  • package switching enabled?
  • package running (where / at all)?
  • nodes active?
  • network available?
• local configuration possible and mandatory (default configuration file empty)
• cc_mon may use the same logfile if configured
sg_mon: configuration
#############################################################################
# File: sg_mon.cfg
# Description: Check service guard Package monitoring script
# Package : Concorde - UXSM
# Version: A.01.00
#
# Description of parameters
# -------------------------------------------------------------------
# PKG[0]=xxx        Package name 1
# PKG_NODE[0]=yyy   Primary node on which the pkg must run
# PKG_SWTCH[0]=1    Set to 1 if Package_switching should be ENABLED
#                   Set to 0 if Package_switching must not be ENABLED
#
#############################################################################
#PKG[0]=xxx; PKG_NODE[0]=yyy; PKG_SWTCH[0]=0
#PKG[1]=zzz; PKG_NODE[1]=xyz; PKG_SWTCH[1]=1
#############################################################################
# end of sg_mon.cfg
#############################################################################
Monitoring of swap space
HP-UX swapspace monitor: swap_mon.sh
• Triggered every 15 minutes by OVO
• checks for total swapspace used
• different severities for different usage levels possible
• local configuration possible and mandatory (default configuration 90%)
• configuration of message processing possible
• /var/opt/OV/bin/OpC/cmds/swap_mon.sh
swap_mon: configuration
########################################################################
# Config file for swap_mon.sh
########################################################################
# All lines which start with a hash-sign (#) will be ignored
# the whole config file is case-insensitive
#
# total <percent_used> <severity> <alert> [<from-to> [<days>]]
#
# Every config line must start with "total" because every check is
# performed on the totally free space
# The percent used must be between 0 and 100.
# Possible severities are: warning, major, critical
# possible alert types (processing of the message):
#   B  -> Browser
#   N  -> Browser+Notification
#   T  -> Browser+Trouble Ticket
#   NT -> Browser+Notification+Trouble Ticket
# <from-to>: 24h-format, 0000-2400 as default (if nothing else configured)
# <days>: 0=Sunday, 6=Saturday. values to be separated by “,” or “-”
#   (e.g. "1,3,4" -> Monday, Wednesday, Thursday) or ("2-4" -> from Tuesday until Thursday).
################################################################################
# Example configuration
###############################################################################
#total  percent_used  severity  Alert  FROM-TO    Days
total   90            major     T      0000-2400  *
swap_mon: configuration (cont’d)
Special feature: processing of messages can be configured
Alert type
• B     message appears only in browser
• N     notification using JET
• T     troubleticket in EWM will be created
• TN    JET & EWM
Appropriate mapping in EWM and/or JET required.
Parameters:
• Application    HPUX_swap
• Msg-Group      OS
• Object         swap
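A swap_mon rule line can be broken down as in the following sketch, which follows the syntax comment in the configuration file. It is an illustration and not the parser in swap_mon.sh. It accepts both `NT` and `TN` because the two slides spell the combined alert type differently, treats `*` as "all days" because the example line uses it, and uses hypothetical function names.

```python
# Alert types mapped to the interfaces used in addition to the browser.
ALERT_TARGETS = {
    "B": (),
    "N": ("JET",),
    "T": ("EWM",),
    "NT": ("JET", "EWM"),
    "TN": ("JET", "EWM"),  # assumption: same meaning as NT
}


def parse_days(spec):
    """Expand a <days> field such as "1,3,4", "2-4" or "*" (0 = Sunday)."""
    if spec == "*":
        return set(range(7))
    days = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            days.update(range(int(first), int(last) + 1))
        else:
            days.add(int(part))
    return days


def parse_swap_rule(line):
    """Parse: total <percent_used> <severity> <alert> [<from-to> [<days>]]."""
    fields = line.lower().split()  # the config file is case-insensitive
    if not fields or fields[0] != "total":
        raise ValueError("rule must start with 'total'")
    percent = int(fields[1])
    if not 0 <= percent <= 100:
        raise ValueError("percent used must be between 0 and 100")
    return {
        "percent": percent,
        "severity": fields[2],
        "targets": set(ALERT_TARGETS[fields[3].upper()]),
        "window": fields[4] if len(fields) > 4 else "0000-2400",
        "days": parse_days(fields[5]) if len(fields) > 5 else set(range(7)),
    }


if __name__ == "__main__":
    print(parse_swap_rule("total 90 major T 0000-2400 *"))
```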
Security (bad login attempts, sulog)