ITM – Monitoring Resources Using Remote Agentless Technology Scott Wallace October 20, 2009
Agenda • Overview • Planning • Installation / Configuration • Usage Tips • Troubleshooting • Wrap Up
Overview • Agentless monitoring allows you to oversee the IT environment from a set of remote servers. • Some agents already had remote capabilities • VMware VI and SAP • Introduced initially as an offering on OPAL • Agentless Operating System monitoring added as a product in ITM 6.2.1 • Provides operating system monitoring for AIX, HP-UX, Linux, Solaris and Windows • Monitors using multiple mechanisms: CIM, SNMP, WMI
Agent or Agentless • Agent-based technology resides directly on a managed server and collects data based on policy set locally or by the management server • Agentless technology resides primarily on a management server and gets its data via a remote application programming interface (API) • “Agentless” doesn’t mean nothing is present or running. Some basic operating system function or base application function is running to provide the information as requested over the network • Resources are still being used and services need to be running on the server • Examples include SNMP, CIM, WMI
Agentless OS Monitoring Metric Overview • Key Operating System Metrics Returned • Logical and Physical Disk Utilization • Network Utilization • Virtual and Physical Memory • System Level Information • Aggregate Processor Utilization • Process Availability • Default Situations for • Disk Utilization • Memory Utilization • CPU Utilization • Network Utilization
Remote Node Capabilities • One agent can represent more than one monitored entity • Multiple remote systems in one agent • Each remote node has a unique “Managed System Name” so they can be in different managed system lists for situations • One agent can represent different types of entities • Windows and Solaris agents can monitor different sets of data on different systems • Multiple instances of an agent may co-reside on the same agent server
Agentless Monitoring for Operating Systems http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.1/welcome.htm
Agentless Monitoring Data Collection and Platforms • Agentless monitors can run from most ITM-supported platforms: • Windows (x86 & x64, not IA64) • x/p/z Linux • Solaris • AIX • HP-UX • Agentless monitors may remotely monitor older versions of the listed operating systems and other Linux distributions, depending on the capabilities of the endpoint • If you want to use the Windows API data collectors, the Agentless monitor must run on a Windows platform • You may configure different data providers for the Agentless monitors: • Agentless Monitoring for AIX OS: SNMP v1, v2c, v3 • Agentless Monitoring for HP-UX OS: SNMP v1, v2c, v3 • Agentless Monitoring for Linux OS: SNMP v1, v2c, v3 • Agentless Monitoring for Solaris OS: CIM-XML; SNMP v1, v2c, v3 • Agentless Monitoring for Windows OS: Windows APIs (Windows Management Instrumentation (WMI), Performance Monitor (Perfmon), Event Log); SNMP v1, v2c, v3
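The platform rules above can be captured as a small planning check. This is a minimal sketch, not part of ITM: the provider table is transcribed from the slide, and the names `DATA_PROVIDERS` and `plan_problems` are hypothetical.

```python
from typing import List

# Data providers per Agentless Monitoring product, transcribed from the slide.
SNMP = ["SNMP v1", "SNMP v2c", "SNMP v3"]
DATA_PROVIDERS = {
    "AIX": SNMP,
    "HP-UX": SNMP,
    "Linux": SNMP,
    "Solaris": ["CIM-XML"] + SNMP,
    "Windows": ["WMI", "Perfmon", "Event Log"] + SNMP,
}
# Windows API collectors only work when the monitor itself runs on Windows.
WINDOWS_API_PROVIDERS = {"WMI", "Perfmon", "Event Log"}


def plan_problems(target_os: str, provider: str, monitor_host_os: str) -> List[str]:
    """Return the reasons a planned (target, provider, host) combination fails."""
    problems = []
    if provider not in DATA_PROVIDERS.get(target_os, []):
        problems.append(f"{provider} is not listed for Agentless Monitoring for {target_os} OS")
    if provider in WINDOWS_API_PROVIDERS and monitor_host_os != "Windows":
        problems.append(f"{provider} requires the agentless monitor to run on Windows")
    return problems
```

For example, `plan_problems("Windows", "WMI", "Linux")` flags that WMI collection needs a Windows-hosted monitor, while SNMP against the same Windows target from Linux passes.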
Deployment Considerations • With Agentless Monitoring, part of the preparation time needs to be devoted to verifying the configuration of the native data emitters: • Ensuring SNMP daemons are installed, configured and started (community strings and user/password information verified) • Exposing MIB branches in SNMP configuration files • Verifying Windows passwords and user account rights for Windows API collection • Verifying patch levels for endpoint systems against the User's Guides • If possible, use tools like snmpwalk, WMIExplorer, and perfmon to verify the metrics are exposed before pointing ITM at the environments • Decide how many remote systems need to be monitored, then identify the systems that will run the agentless monitors
Install • Ensure that the prerequisites are met for the system that you are using for the agent. • See the Agentless Agent User Guides for this information
Windows Installer • Select the remote system types that you want to monitor from this Windows host. • Next select the agents you want to install in the depot for remote deploy.
Linux Installer • Select the operating system type or take the default • Select the remote system types that you want to monitor from this Linux host
Install Application Support • TEMS • HUB • Remotes • TEPS • TEP Desktop clients • Warehouse Proxy Agent • Warehouse Summarization Agent
Configuring the Agent – Linux • SNMP – v3 • [root@rc2test4 /]# itmcmd config -A r4 • Agent configuration started... • Enter instance name (default is: ): SLESv3 • Edit "Monitoring Agent for Agentless Linux OS" settings? [ 1=Yes, 2=No ] (default is: 1): • Edit 'SNMP connection' settings? [ 1=Yes, 2=No ] (default is: 1): • Port Number (default is: 161): • SNMP Version [ 1=SNMP Version 1, 2=SNMP Version 2c, 3=SNMP Version 3 ] (default is: 1): 3 • Edit 'SNMP Version 3' settings? [ 1=Yes, 2=No ] (default is: 1): • Security Level [ 1=noAuthNoPriv, 2=authNoPriv, 3=authPriv ] (default is: ): 2 • User Name (default is: ): snmpuser • Auth Protocol [ 1=MD5, 2=SHA ] (default is: ): 1 • Enter Auth Password (default is: ): • Re-type : Auth Password (default is: ):
Configuring the Agent – Linux • Priv Protocol [ 1=DES, 2=CBC DES ] (default is: ): 1 • Enter Priv Password (default is: ): • Re-type : Priv Password (default is: ): • Edit 'Remote System Details' settings? [ 1=Yes, 2=No ] (default is: 1): 1 • No 'Remote System Details' settings available? • Edit 'Remote System Details' settings, [1=Add, 2=Edit, 3=Del, 4=Next, 5=Exit] (default is: 4): 1 • Managed System Name (default is: ): rc2SLES • SNMP host (default is: ): 172.17.4.219 • 'Remote System Details' settings: Managed System Name=rc2SLES • Edit 'Remote System Details' settings, [1=Add, 2=Edit, 3=Del, 4=Next, 5=Exit] (default is: 4): 5 Easy to overlook
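The config dialog asks for both auth and priv values, but which ones SNMP v3 actually uses depends on the security level. A minimal sketch of a pre-check for your answers, assuming the menu codes shown in the prompts above; the `SnmpV3Answers` record and `missing_answers` helper are hypothetical and not part of itmcmd.

```python
from dataclasses import dataclass
from typing import List, Optional

# Menu codes as offered by the itmcmd config prompts.
SECURITY_LEVELS = {1: "noAuthNoPriv", 2: "authNoPriv", 3: "authPriv"}
AUTH_PROTOCOLS = {1: "MD5", 2: "SHA"}
PRIV_PROTOCOLS = {1: "DES", 2: "CBC DES"}


@dataclass
class SnmpV3Answers:
    """Hypothetical record of the answers typed at the SNMP v3 prompts."""
    user: str
    security_level: int
    auth_protocol: Optional[int] = None
    auth_password: str = ""
    priv_protocol: Optional[int] = None
    priv_password: str = ""


def missing_answers(a: SnmpV3Answers) -> List[str]:
    """List the fields the chosen security level needs but that are unset."""
    level = SECURITY_LEVELS.get(a.security_level)
    if level is None:
        return ["security_level"]
    missing = [] if a.user else ["user"]
    # authNoPriv and authPriv both authenticate, so both need auth settings.
    if level in ("authNoPriv", "authPriv"):
        if a.auth_protocol not in AUTH_PROTOCOLS:
            missing.append("auth_protocol")
        if not a.auth_password:
            missing.append("auth_password")
    # Only authPriv encrypts, so only it needs priv settings.
    if level == "authPriv":
        if a.priv_protocol not in PRIV_PROTOCOLS:
            missing.append("priv_protocol")
        if not a.priv_password:
            missing.append("priv_password")
    return missing
```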
Configuring the Agent – Windows • Open the Manage Tivoli Enterprise Monitoring Services (MTEMS) • Select the template for the agent type • Fill in the requested information
Considerations for Using • When a real-time query times out against the endpoint system because of network load or latency, Agentless monitors return the data from the last background collection interval • With Historical Collection enabled and storage “at the Agent” selected, the collected data for all the remote endpoints is stored on the Agentless Monitoring Server • When monitoring large numbers of remote systems, ensure the physical system has sufficient disk space and sufficient network bandwidth to the Warehouse Proxy Agent • Because the Agentless Monitoring Server now maintains connections to hundreds of servers, it becomes a more critical component in the infrastructure than a single agent instance
Agentless Health • Each remote monitor has self-monitoring attribute tables that can be used to monitor the collection process: • Performance Object Status attributes: • Last collection errors encountered • Last collection start/finish times • Last/average collection duration • Refresh interval • Number of collections • Cache hit/miss/hit percent • Intervals skipped (most useful) • Thread Pool attributes: • Current/max Thread pool size • Current/average/min/max active threads • Current/min/max queue length • Average wait time • Total jobs • Situations may be created against these attribute groups to notify of collection failures • It is recommended that an Operating System agent be co-deployed to the Agentless Monitoring Server to watch CPU, Memory, and Network utilization of the monitors
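The logic behind a health situation on the Performance Object Status attributes can be sketched as below. This is an illustration only: in ITM you would define the situation in the TEP, and the `CollectionStatus` field names are hypothetical stand-ins, not the real attribute identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CollectionStatus:
    """Hypothetical snapshot of a few Performance Object Status attributes."""
    object_name: str
    intervals_skipped: int
    cache_hits: int
    cache_misses: int
    last_error: str = ""


def cache_hit_percent(s: CollectionStatus) -> float:
    """Cache hit percentage; 0.0 when there has been no cache activity."""
    total = s.cache_hits + s.cache_misses
    return 0.0 if total == 0 else 100.0 * s.cache_hits / total


def alert_reasons(s: CollectionStatus) -> List[str]:
    """Conditions worth a situation: skipped intervals or a collection error."""
    reasons = []
    if s.intervals_skipped > 0:
        reasons.append(f"{s.object_name}: {s.intervals_skipped} interval(s) skipped")
    if s.last_error:
        reasons.append(f"{s.object_name}: last collection error: {s.last_error}")
    return reasons
```

Skipped intervals get their own check because, as the slide notes, they are the most useful signal that the monitor cannot keep up with its endpoints.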
Troubleshooting Overview • General Diagnosis • Fault Determination • Is the data coming through? • Is the data incorrect? • Specific Diagnosis • Agent issues • Remote system setup • Connectivity • Review logs on the agent • TEP issues • Application support - workspaces / data • TEMS issues • Application support – situation issues
Agent Log Files and Trace Settings • Default location: %CANDLE_HOME%\TMAITM6\logs\<hostname>_<pc>_k<pc>agent_<instance>_<timestamp>-01.log (Windows) $CANDLE_HOME/logs/<hostname>_<pc>_<instance>_<timestamp>-01.log (UNIX/Linux) • Increase the unit trace levels to isolate issues
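The naming patterns above can be turned into a glob that finds the newest log for an instance. A minimal sketch; the helper names are hypothetical, `*` stands in for `<timestamp>`, and `/opt/IBM/ITM` in the usage line is only an assumed CANDLE_HOME.

```python
import glob
import os
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def agent_log_pattern(candle_home: str, hostname: str, pc: str,
                      instance: str, windows: bool) -> str:
    """Glob pattern for an agentless monitor's log; '*' replaces <timestamp>."""
    if windows:
        name = f"{hostname}_{pc}_k{pc}agent_{instance}_*-01.log"
        return str(PureWindowsPath(candle_home, "TMAITM6", "logs", name))
    name = f"{hostname}_{pc}_{instance}_*-01.log"
    return str(PurePosixPath(candle_home, "logs", name))


def newest_log(pattern: str) -> Optional[str]:
    """Most recently modified file matching the pattern, or None."""
    matches = glob.glob(pattern)
    return max(matches, key=os.path.getmtime) if matches else None
```

For the Linux instance configured earlier: `newest_log(agent_log_pattern(os.environ.get("CANDLE_HOME", "/opt/IBM/ITM"), "rc2test4", "r4", "SLESv3", windows=False))`.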
How can I tell if the endpoint is the problem? • Typical endpoint issues: • Connectivity • Firewall • SNMP needs ports 161 and 162 open. • CIM needs ports 5988 and 5989 open. • TCP Stack • Verify network connectivity to the remote system using ping, telnet, etc. • DNS • Use nslookup and/or route to verify that the remote system is known to your domain. • SNMP or CIM • Daemons not running • Incorrect version of SNMPD or CIM • SNMPD not configured correctly (check with snmpget, snmpgetnext, snmpwalk) • snmpget -v 1 -c public rc2testSLES sysUpTime.0 • snmpget -v 3 -u snmpuser -l authNoPriv -a MD5 -A password rc2testSLES sysUpTime.0
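The endpoint checks above can be scripted from the monitoring server. A minimal sketch with hypothetical helper names: DNS and TCP checks use the standard `socket` module, and the snmpget argument lists mirror the two example commands. Note that a TCP connect only tests the CIM ports (5988 HTTP, 5989 HTTPS); SNMP runs over UDP, so use snmpget for it.

```python
import socket
from typing import List


def resolves(host: str) -> bool:
    """DNS check: can the monitoring server resolve this host name?"""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP connect check, suitable for the CIM ports but not for UDP-based SNMP."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def snmpget_argv(host: str, version: str, community: str = "public",
                 user: str = "", auth_protocol: str = "MD5",
                 auth_password: str = "", oid: str = "sysUpTime.0") -> List[str]:
    """net-snmp snmpget command line matching the examples on the slide."""
    if version in ("1", "2c"):
        return ["snmpget", "-v", version, "-c", community, host, oid]
    if version == "3":
        # authNoPriv as in the slide; authPriv would also need -x and -X.
        return ["snmpget", "-v", "3", "-u", user, "-l", "authNoPriv",
                "-a", auth_protocol, "-A", auth_password, host, oid]
    raise ValueError(f"unsupported SNMP version: {version}")
```

The argument list can be passed to `subprocess.run(argv, capture_output=True, text=True, timeout=10)`. A non-zero exit or a "Timeout" message points at the endpoint rather than at ITM.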
How can I tell if the endpoint is the problem? • SNMPD daemon is not running • Check the ITM logs for the following lines: (2009/10/13,20:37:35.0001-A:snmpqueryclass.cpp,1164,"handle_snmp_response_async") ERROR: decoded PDU is null -- this is a timeout scenario (2009/10/13,20:37:35.0003-29:snmpqueryclass.cpp,1782,"internalCollectData") Timeout occurred. No response from agent 172.17.4.219. • Password error – SNMP v3 (2009/10/14,05:58:23.0067-6:snmpqueryclass.cpp,1158,"handle_snmp_response_async") Entry (2009/10/14,05:58:23.0068-6:snmpqueryclass.cpp,1164,"handle_snmp_response_async") ERROR: decoded PDU is null -- this is a timeout scenario • Password working – SNMP v3 (2009/10/14,05:40:01.0017-7:snmpqueryclass.cpp,688,"completeInit") Host: 172.17.4.219, Port: 161, User: snmpuser, Sec Level 1 (2009/10/14,05:40:01.0018-7:snmpqueryclass.cpp,689,"completeInit") Auth password: xxxx, proto: 1, key: (2009/10/14,05:40:01.0019-7:snmpqueryclass.cpp,690,"completeInit") Priv password: xxxx, proto: 1, key: (2009/10/14,05:40:01.001A-7:snmpquerymetric.cpp,89,"getOID") Entry (2009/10/14,05:40:01.001B-7:snmpquerymetric.cpp,91,"getOID") OID=1.3.6.1.2.1.25.2.3.1.1
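Scanning a log for these lines is easier with a parser for the trace format `(timestamp-thread:file,line,"function") message`. A minimal sketch; the regular expressions are inferred from the sample lines above and are not an official format definition. As the samples show, a stopped daemon and a bad SNMP v3 password both surface as a timeout, so a timeout alone does not tell the two apart.

```python
import re
from typing import Dict, Optional

# (timestamp-thread:source_file,line,"function") message
TRACE_RE = re.compile(
    r'\((?P<stamp>[^-]+)-(?P<thread>[0-9A-Fa-f]+):'
    r'(?P<file>[^,]+),(?P<line>\d+),"(?P<func>[^"]+)"\)\s*(?P<msg>.*)'
)
NO_RESPONSE_RE = re.compile(r"No response from agent (\S+?)\.?$")


def parse_trace(line: str) -> Optional[Dict[str, str]]:
    """Split one trace line into its fields, or return None if it does not match."""
    m = TRACE_RE.match(line.strip())
    return m.groupdict() if m else None


def is_timeout(entry: Dict[str, str]) -> bool:
    """True for both 'Timeout occurred' and 'this is a timeout scenario' lines."""
    return "timeout" in entry["msg"].lower()


def timeout_host(entry: Dict[str, str]) -> Optional[str]:
    """Endpoint address from a 'No response from agent <host>.' message."""
    m = NO_RESPONSE_RE.search(entry["msg"])
    return m.group(1) if m else None
```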
Windows Agentless Monitor fails to collect perfmon data • When using the Windows Agentless Monitor (r2), the following errors appear in the log: (4891C694.0066-1558:queryclass.cpp,1006,"start") Error adding query for class PhysicalDisk. (4891C694.0067-1558:queryclass.cpp,1007,"start") \\rc2test3.tivlab.raleigh.ibm.com\PhysicalDisk(*)\% Disk Write Time - add returned C0000BB8 • Potential problems: • The counter may simply not exist. Run the typeperf command (or the perfmon GUI) locally on the server you are trying to collect metrics from, and verify that it returns cleanly without errors. • The Remote Registry service may not be enabled. A remote collector must have registry access to look up the counter indexes. Verify that the service is enabled, then run the typeperf command (or the perfmon GUI) remotely and verify that it returns cleanly without errors. • The counter indexes are corrupt. A request names the counter by its string name, which is then matched to an index on the target computer. All the perfmon name-to-index maps are stored in the registry under HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009. On systems with this problem, the "counter" entry there either has no value or contains garbage (displayed as empty rectangles).
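The first two checks can be prepared as command lines. A minimal sketch with hypothetical helper names: `counter_path` rebuilds the `\\host\Object(instance)\Counter` form seen in the log, `typeperf ... -sc 1` collects a single sample, and `sc \\host query RemoteRegistry` reports the Remote Registry service state. Run them on Windows, for example through `subprocess.run`.

```python
from typing import List


def counter_path(host: str, obj: str, counter: str, instance: str = "*") -> str:
    r"""Perfmon counter path of the form \\host\Object(instance)\Counter."""
    inst = f"({instance})" if instance else ""
    return f"\\\\{host}\\{obj}{inst}\\{counter}"


def typeperf_argv(path: str, samples: int = 1) -> List[str]:
    # typeperf -sc <n> collects n samples and then exits.
    return ["typeperf", path, "-sc", str(samples)]


def remote_registry_query_argv(host: str) -> List[str]:
    # Queries the Remote Registry service on the target host.
    return ["sc", f"\\\\{host}", "query", "RemoteRegistry"]
```

If `typeperf_argv(counter_path("rc2test3.tivlab.raleigh.ibm.com", "PhysicalDisk", "% Disk Write Time"))` fails remotely but the same counter works locally, Remote Registry access is the likelier cause. If it fails locally as well, the counter or its index is.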
Windows Agentless Monitor fails to collect data • I am trying to run the Windows Agentless Monitor (r2) against one of our machines, but I am getting errors and I don't know what they mean: (48BF57E9.0006-EF4:wmiqueryclass.cpp,728,"internalCollectData") ::collectData==>Could not connect. Error code = 0x80070005 (48BF57E9.0007-AD4:queryclass.cpp,790,"internalCollectData") Authentication failed against host testSys1 as user itoperations, return code = 1326 • Potential problems: • The user name was not specified in the format Domain\User.
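Both codes in this log are standard Windows errors: 0x80070005 wraps Win32 error 5 (ERROR_ACCESS_DENIED), and 1326 is ERROR_LOGON_FAILURE. A minimal sketch for decoding them and checking the user name format. The helper names and the `TIVLAB` domain in the test are hypothetical.

```python
from typing import Optional

# Standard winerror.h values for the codes seen in the r2 log.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED (Access is denied.)",
    1326: "ERROR_LOGON_FAILURE (The user name or password is incorrect.)",
}


def win32_from_hresult(hr: int) -> Optional[int]:
    """Unwrap an HRESULT built from a Win32 error (facility 7, e.g. 0x80070005)."""
    if (hr & 0xFFFF0000) == 0x80070000:
        return hr & 0xFFFF
    return None


def is_domain_qualified(user: str) -> bool:
    """True only for names in Domain\\User form."""
    domain, sep, name = user.partition("\\")
    return bool(sep and domain and name)
```

Here `is_domain_qualified("itoperations")` is False, which matches the failure in the log.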
Some workspaces have blank views for Linux • On TEP, the Linux Agentless Monitor (r4) only shows data for the "Network" and "System" navigator items. • Potential problems: • By default, Red Hat Linux allows connection with the Host Resources MIB and the UCD MIB only through SNMPv3 connections. • Verify that the following lines are modified or added in the Access Control portion of the /etc/snmp/snmpd.conf: view systemview included .1.3.6.1.4.1.2021 view systemview included .1.3.6.1.2.1.25 • Verify the SNMP daemon is running by using the ps -ef command.
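Before restarting snmpd, you can check that the two view lines are present and not commented out. A minimal sketch, assuming the whitespace-separated `view <name> included <OID>` form shown above. The `missing_views` helper is hypothetical and does not implement full snmpd.conf parsing (masks and other directives are ignored).

```python
from typing import List

# OIDs for the UCD MIB and Host Resources MIB branches named on the slide.
REQUIRED_VIEWS = [".1.3.6.1.4.1.2021", ".1.3.6.1.2.1.25"]


def missing_views(conf_text: str, view_name: str = "systemview") -> List[str]:
    """Return the required OIDs that no active 'view ... included' line exposes."""
    present = set()
    for raw in conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if (len(tokens) >= 4 and tokens[0] == "view"
                and tokens[1] == view_name and tokens[2] == "included"):
            present.add(tokens[3])
    return [oid for oid in REQUIRED_VIEWS if oid not in present]
```

Typical use: `missing_views(open("/etc/snmp/snmpd.conf").read())`. An empty list means both branches are exposed in `systemview`.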
Ignore These Errors • You can safely ignore messages like these: (48E297DC.0095-17E4:configdata.cpp,65,"getConfigurationProperty") KR2_WMI_WIN_PASSWORD_1 not found in the hash map (48E297DC.0097-17E4:configdata.cpp,65,"getConfigurationProperty") KR2_WMI_WIN_PASSWORD_DEFAULT not found in the hash map • The configuration uses a fall-back lookup for its required parameters: first the subnode configuration, then the default configuration • These messages only indicate that the parameter was not overridden at that level (for example, in the subnode)
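The fall-back order can be sketched as a two-step dictionary lookup. This is an illustration, not the agent's code: the `<NAME>_<subnode>` and `<NAME>_DEFAULT` key scheme is inferred from the two log lines above, and `lookup_property` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def lookup_property(props: Dict[str, str], name: str,
                    subnode: str) -> Tuple[Optional[str], Optional[str]]:
    """Try the subnode-specific key first, then the default key.

    Returns (value, key_used), or (None, None) if neither is set. Each miss
    corresponds to a harmless "not found in the hash map" trace line.
    """
    for key in (f"{name}_{subnode}", f"{name}_DEFAULT"):
        if key in props:
            return props[key], key
    return None, None
```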
Wrap Up The Agentless Agent technology gives you relatively quick startup and value with limited intrusion on the monitored system!