Monitoring Update. Antonio Sgro Service Availability & Business Performance Management Best Practices Senior IT Architect. Agenda. ITM 6.2.2 FP2 ITM 6.2.3 and Roadmap ITCAM for Apps ITCAM for MS Apps ITCAM for Virtual Environment ITCAM for AD/Transactions. Important Disclaimer.
Monitoring Update Antonio Sgro Service Availability & Business Performance Management Best Practices Senior IT Architect
Agenda • ITM 6.2.2 FP2 • ITM 6.2.3 and Roadmap • ITCAM for Apps • ITCAM for MS Apps • ITCAM for Virtual Environment • ITCAM for AD/Transactions
Important Disclaimer THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: • CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR • ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
Saving Time Leveraging Existing Solutions • If a solution exists on OPAL, don’t write it yourself • Check whether an existing OPAL solution can be enhanced to meet customer needs • Source code is provided with many solutions; we can provide source code for most of the other IBM-developed solutions • Many useful OPAL solutions (custom monitoring, management tools, and more): • ITMSuper – audits existing environment (replaces taudit): http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM6L • Writing Situations: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM6E • Optimizing End-to-End TEPS Response Time: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM4T • Admin Workspace & Views: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM0D • Remotely Stop/Start OS Agents: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM0G • ITM Self-Monitoring Workspaces, Queries, and Views: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM28 • EIF Mapping File: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10EC0O • SSM Deploy/Config: http://www-01.ibm.com/software/brandcatalog/portal/opal/details?catalog.label=1TW10TM46
Application and Resource Monitoring Portfolio • ITCAM for Transactions: comprehensive set of response time capabilities • ITCAM for Applications: broad application and application infrastructure monitoring capabilities • ITM: health monitoring of operating systems • ITM for Virtual Servers: health monitoring of VMware, Citrix, and MS Virtual Server environments • ITCAM for SOA Platform: integrated service, middleware, and SOA enterprise management offering • ITM for Microsoft Applications: health monitoring of Microsoft operating systems, application infrastructure, and applications • ITM for Energy Management: reduce power consumption • ITCAM for Application Diagnostics: resource monitoring and deep-dive diagnostics of WebSphere and J2EE servers
ITM v6.2.2 What’s New? eGA: Sept 2009! • Agent Autonomy • Local Config – operates without ITM infrastructure • EIF/SNMP Eventing from Agents • Granular Data Warehouse Collection • Built-In Predictive Analytics • Visual Baselining • ITPA bundling • Workspace Gallery • Agent Management Services • TIP Integration • 64-bit Windows Agents (FP1)
ITM 6.2.2 FP2 Overview (eGA: May 28!) • Base Contents – All Delivered • New Automation Opportunities with CLI • Get/put/exec files against single targets • Update Logical Tree with Agents • Associate Situations in a Tree • Rules to allow OMNIbus to process EIF Events sent directly from ITM Agents • Systems Director Enablement • Ability for Director customers to generate reports using TCR • Director will re-use the TDW, TCR and ITPA Components of ITM • Extended Contents – All Delivered • Visual Baselining for Dynamic Thresholds • TDW Granular Row Filtering • System Monitor Agents • Centralized Config • Upload of History Data into TDW • Agent Mgmt Services: OS Data in PAS Workspace & Support for QI Msg Broker Agt • EIF Heartbeating with OMNIbus Rules The information is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.
Autonomous Agent Operation • Customizable levels of agent behavior: from embeddable (static config, exceptions-only) through integrated, centralized operations • Capabilities • The ability of the agent to operate independently of a centralized infrastructure • Robust across intermittent and extended disconnection incidents • Event data cached persistently until delivered • Operational configuration independent of a centralized TEMS • Local, “human readable” configuration files • Enhancements being made so they apply to a majority of agents • Local Config Options: • Situations – Events as SNMP traps or EIF events from the agent • Overrides • Warehousing • Central Config: Agents may pull configuration from HTTP sources
Autonomous Agent Operation • EIF Heartbeating: the agent can be configured to send heartbeats to the Event Server • OMNIbus triggers are delivered so that ONLY the missed heartbeats are displayed in the Event Server • Historical Collection can now include Summarization & Pruning • KSY_AUTONOMOUS=Y • Some limitations to be aware of: • SNMP Traps won’t send multiple Events for multi-row tables, but EIF Events will • Situation formulas must be simple AND/OR formulas such as numerical comparisons. Count, average, and rate of change are not supported yet • The MISSING function is the only complex function supported at this point
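The heartbeat idea above can be illustrated with a small Python sketch. This is not the OMNIbus trigger logic that ships with ITM; it only shows the principle of surfacing agents whose heartbeat is overdue. The function name, the agent names, and the 1.5x grace factor are all hypothetical.

```python
from datetime import datetime, timedelta


def missed_heartbeats(last_seen, now, interval, grace=1.5):
    """Return the agents whose last heartbeat is older than grace * interval.

    last_seen maps an agent name to the time of its last heartbeat.
    The grace factor is an assumption made for this sketch.
    """
    cutoff = now - interval * grace
    return sorted(agent for agent, seen in last_seen.items() if seen < cutoff)


if __name__ == "__main__":
    now = datetime(2010, 5, 28, 12, 0, 0)
    last_seen = {
        "agent_a": now - timedelta(minutes=2),   # heartbeat is recent
        "agent_b": now - timedelta(minutes=20),  # heartbeat is overdue
    }
    print(missed_heartbeats(last_seen, now, timedelta(minutes=5)))
```

Only the overdue agent is reported, which mirrors the goal of showing missed heartbeats rather than every heartbeat received.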
Central Configuration Server • Centrally manage and control agent configuration • Central Config server can be an Agent or a Web Server • Agent starts periodic configuration task at initialization • Agent performs a one-time configuration download at startup or a continuous periodic update inquiry (default interval is every hour) • Well-known agent artifacts such as Authorization Group Profile, Private Situation XML, threshold XML, and the Load List itself can be optionally activated upon successful download • Two ways to make an Agent use Central Configuration • Supply a Configuration Load List and place the file on the Agent • At Agent startup, the Agent retrieves the Configuration Load List from a Centralized Configuration Server • IRA_CONFIG_SERVER_URL • IRA_CONFIG_SERVER_USERID • IRA_CONFIG_SERVER_PASSWORD • IRA_CONFIG_SERVER_FILE_NAME • IRA_CONFIG_SERVER_FILE_PATH • Multiple XML files can be defined for an Agent; their contents are merged • Current limitation: Situations can’t be deleted
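As a rough illustration of the second approach, the Python sketch below renders the five IRA_CONFIG_SERVER_* variables named on this slide as KEY=value lines. The variable names come from the slide; the plain KEY=value layout, the function name, and every value shown are assumptions for illustration, and the actual environment file format depends on the agent and platform.

```python
def render_central_config_env(url, userid, password, file_name, file_path):
    """Render the central configuration variables as KEY=value lines.

    The KEY=value layout is an assumption made for this sketch.
    """
    settings = {
        "IRA_CONFIG_SERVER_URL": url,
        "IRA_CONFIG_SERVER_USERID": userid,
        "IRA_CONFIG_SERVER_PASSWORD": password,
        "IRA_CONFIG_SERVER_FILE_NAME": file_name,
        "IRA_CONFIG_SERVER_FILE_PATH": file_path,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(render_central_config_env(
        url="http://config.example.com",
        userid="itmuser",
        password="changeme",
        file_name="loadlist.xml",
        file_path="central/config",
    ))
```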
Put File/Get File/Execute Command • putfile and getfile • tacmd putfile -m <managed system> -s <source file> -d <destination file> • tacmd getfile -m <managed system> -s <source file> -d <destination file> • -force flag allows you to replace existing files • -type flag specifies whether to do ASCII or binary transfer • Recommend no more than 10 concurrent putfile/getfile operations • Recommend relatively small files (16 MB or smaller) • Execute Command • tacmd executecommand • {-m|--system} SYSTEM • {-c|--commandstring} COMMAND_STRING • [{-w|--workingdir} REMOTE_WORKING_DIRECTORY] • [{-o|--stdout}] (capture standard out) • [{-e|--stderr}] (capture standard error) • [{-r|--returncode}] (capture the return code) • [{-l|--layout}] (capture the local command string in the results file) • [{-t|--timeout} TIMEOUT] • [{-d|--destination} LOCAL_STD_OUTPUT_ERROR_FILENAME] • [{-s|--remotedestination} REMOTE_STD_OUTPUT_ERROR_FILENAME] • [{-f|--force} FORCE_MODE] • [{-v|--view}] (views the results file) • tacmd executeAction requires you to define the Take Action before executing it.
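The two putfile recommendations (at most 10 concurrent transfers, files of 16 MB or less) lend themselves to a small planning helper. The Python sketch below only builds argument lists using the -m, -s, and -d flags shown on this slide and groups them into batches; it does not run tacmd. The helper names, the managed system names, and the file paths are hypothetical.

```python
MAX_CONCURRENT = 10            # recommended ceiling on concurrent transfers
MAX_BYTES = 16 * 1024 * 1024   # recommended maximum file size (16 MB)


def putfile_argv(system, source, destination):
    """Build the argument list for one tacmd putfile call."""
    return ["tacmd", "putfile", "-m", system, "-s", source, "-d", destination]


def plan_batches(transfers, sizes):
    """Split transfers into batches of at most MAX_CONCURRENT commands.

    transfers is a list of (system, source, destination) tuples and sizes maps
    each source path to its size in bytes. Transfers whose source exceeds
    MAX_BYTES are returned separately instead of being scheduled.
    """
    too_big = [t for t in transfers if sizes[t[1]] > MAX_BYTES]
    ok = [t for t in transfers if sizes[t[1]] <= MAX_BYTES]
    batches = [
        [putfile_argv(*t) for t in ok[i:i + MAX_CONCURRENT]]
        for i in range(0, len(ok), MAX_CONCURRENT)
    ]
    return batches, too_big


if __name__ == "__main__":
    transfers = [(f"host{i}:LZ", "/tmp/small.cfg", "/opt/app/small.cfg") for i in range(23)]
    transfers.append(("host99:LZ", "/tmp/huge.tar", "/opt/app/huge.tar"))
    sizes = {"/tmp/small.cfg": 4096, "/tmp/huge.tar": 64 * 1024 * 1024}
    batches, too_big = plan_batches(transfers, sizes)
    print([len(b) for b in batches], len(too_big))
```

Each batch could then be run to completion before the next one starts, which keeps the number of in-flight transfers within the recommended limit.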
Situation Associations via CLI • This CLI adds the capability for users to list, add, delete, export, and import Tivoli Enterprise Portal situation associations and managed system assignments via the TACMD command line interface. • Ten new CLI commands have been created: • createsysassignment • deletesysassignment • exportsysassignments • importsysassignments • listsysassignments • createsitassociation • deletesitassociation • exportsitassociations • importsitassociations • listsitassociations
Granular Warehouse Data Collection • Filter with a logical set of AND and OR statements • Does NOT support complex assessments like count, avg, etc. • Set up different historical collection intervals for different MSLs • Named collections • Distinct intervals, locations • Distribution to • all agents on a TEMS • specific agents • agents in an MSL • Also participates in Object Grouping à la Situation Groups
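To make the AND/OR row filtering concrete, here is a minimal Python sketch of a filter evaluator restricted to simple comparisons, in the spirit of the limitation above (no count, average, and so on). The tuple-based expression format, the attribute names, and the sample rows are hypothetical and are not the TDW filter syntax.

```python
import operator

# Only simple comparisons are supported in this sketch.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def matches(row, expr):
    """Evaluate a nested ("and", ...) / ("or", ...) / (attr, op, value) filter."""
    if expr[0] == "and":
        return all(matches(row, sub) for sub in expr[1:])
    if expr[0] == "or":
        return any(matches(row, sub) for sub in expr[1:])
    attribute, op, value = expr
    return OPS[op](row[attribute], value)


if __name__ == "__main__":
    rows = [
        {"process": "java", "cpu_pct": 85, "mem_mb": 900},
        {"process": "sshd", "cpu_pct": 1, "mem_mb": 12},
        {"process": "db2sysc", "cpu_pct": 40, "mem_mb": 4096},
    ]
    keep = ("or", ("cpu_pct", ">=", 80), ("and", ("mem_mb", ">", 1024), ("process", "!=", "java")))
    print([r["process"] for r in rows if matches(r, keep)])
```

Only rows that satisfy the filter would be written to the warehouse, which is how row filtering reduces the volume of collected history.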
Agent Management Services • Now enabled via ITM's tacmd command and can be driven from the command line or scripts • Functions below are done automatically by PAS but can also be forced manually via tacmd • Start PAS Management for an agent (tacmd -n "AMS Start Management" ...) • Stop PAS Management for an agent (tacmd -n "AMS Stop Management" ...) • Start Agent Instance (tacmd -n "AMS Start Agent" ...) • Stop Agent Instance (tacmd -n "AMS Stop Agent" ...) • Support Added for: • OSs • AIX • Solaris • HP-UX • Linux and Windows • Other • UNIX Log Agents • Agentless Agents • System Monitor Agents • Support in place to manage non-ITM Agents • Extended agent information: • Agent Version • Build Number • Instance Name – when running more than one agent with the same name, PAS can now manage each independently • User Name – the security level of the agent • Integrated with Agent Builder
Performance Analyzer enhancements • No longer a dependency on TEPS DB • During upgrade to 6.2.2 FP2, configuration DB tables are automatically moved to TDW • Simpler Configuration • Added platform support for 64-bit Linux (although 32-bit runtime) • Now supports TEMS/TEPS on all platforms • Support for Warehouse on DB2 on z/OS • One TDW can support multiple ITPA instances • The 6.2.2 release made dramatic performance improvements [Figure: CPU over time, showing actual monitored data, the predicted trend, the threshold, and the predicted CPU violation]
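The figure on this slide shows a predicted trend crossing a threshold. The slide does not say which model Performance Analyzer uses, so the Python sketch below assumes a plain least-squares straight line purely for illustration; the function names and sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_violation_time(xs, ys, threshold):
    """Return the x value at which the fitted trend reaches the threshold.

    Returns None when the trend is flat or falling, since it then never
    reaches the threshold from below.
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


if __name__ == "__main__":
    days = [0, 1, 2, 3]
    cpu_pct = [40, 45, 50, 55]   # hypothetical daily CPU averages
    print(predicted_violation_time(days, cpu_pct, threshold=90))
```

With these sample values the CPU average grows by 5 points per day from 40, so the fitted line reaches 90 at day 10.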
Visual Baselining • Determine expected efficacy of new thresholds based on known problem times or historical trends • Potential thresholds can be calculated and drawn on the chart • Choose from a set of attributes • Specify the time span to use – real time and time-aligned historical • The threshold for the metric is visualized as a dark shaded area, and changes as the operator or condition value changes, or when the user clicks on the chart • Automatically generate a Situation • Overrides not applicable via TEP, but the CLI can be used to establish baselines with overrides
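One way to picture a calculated threshold is to derive a candidate from historical samples and then check which samples it would have flagged. The Python sketch below uses mean plus k standard deviations, which is an assumption chosen for illustration and not necessarily the calculation the portal performs; the function names and sample data are hypothetical.

```python
import statistics


def candidate_threshold(samples, k=2.0):
    """Candidate threshold: mean plus k population standard deviations."""
    return statistics.mean(samples) + k * statistics.pstdev(samples)


def violations(samples, threshold):
    """Indexes of the samples that would have exceeded the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


if __name__ == "__main__":
    history = [10, 12, 11, 13, 12, 50]   # hypothetical metric history with one spike
    threshold = candidate_threshold(history)
    print(round(threshold, 2), violations(history, threshold))
```

Checking the candidate against known problem times, as the slide suggests, shows whether a threshold would have fired when it should have before a situation is generated from it.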
Visual Baselining for Dynamic Thresholds (6.2.2 FP2 Extended Content) • Leverage historical data to build adaptive situations • Visualize overrides • Model attribute and inline calendar overrides • Establish/edit calendars via TEP • View comparable portions of historical data for trending/abnormal diagnosis
Use Managed System Lists to Populate Navigation Trees • The structure of Navigation Trees is static • 6.2.2 FP2 allows you to assign MSLs to a Navigation Item • Workspaces & Queries are populated by members of the MSL
OS Agent Enhancements • Process Group Leader for UNIX and Linux • Improvements in ITM DLA • Optionally Disable Windows Attribute Groups: • e.g.: NT_EXCLUDE_PERF_OBJS=Job Object, Job Object Details • HP Disks: • Add support for Agile Disks (DFS) • Optionally, EXCLUDE_LEGACY_DISKS=TRUE • NT Reverse Name Lookup failures can cause Agent to fail: • Now the environment variable REVERSE_LOOKUP_ACCEPTED_FAILURES can be specified in the configuration file to allow the user to set the max number of accepted failures in the reverse lookup. Default is 30 failures. • Additional Attributes available for historical collection
Base versus Extended Content • Under the Agile development methodology, content is considered either Base or Extended. • Base Content is release defining: the date will move to accommodate completion (function trumps date). • Extended Content is planned and may be under development at this time: if Base Content falls behind, Extended Content may be sacrificed to hold the initial schedule (date trumps function).
ITM 6.2.3 Target GA 1H 2011 • Total Cost of Ownership (TCO) • Significantly reduce the number of manual tasks involved with agent installation and maintenance via automatic agent data synchronization • Granular Security focused on actions • Data Warehouse compression for both data transfer and DB tables • WPA remote deploy and configuration CLI • TEMS scale/performance improvements • Provide dynamic updates to TADDM when virtual machines are added, deleted, or moved • SSL encryption for EIF events to OMNIBUS • Remote Agent configuration • Improve the total cost of ownership for AIX system monitoring • Connect and remote deploy capabilities for Micro agents • Unix long process names support • TCR v2.1 uplift (Cognos reports) • NSF monitoring • Situation distribution architectural changes • Multi Domain Manager • Time To Value (TTV) • Deployment Planning Tool • Dependency checker Note. Base and Extended content
Self-describing agents • Application data replicated from agent to framework at agent startup time • Infrastructure support refreshed when a newer version of the agent is detected • Components no longer required to be recycled with every update • Can be selectively disabled • Eliminates need for mainframe App Support on distributed systems • Mainframe RTEMS may need to be recycled • Depot files must still be populated [Diagram: application metadata replicated up from Agent Applications through the Remote TEMS to the HUB TEMS (FTO), TEPS, and TEP; Warehouse Proxy Agent and IBM Tivoli Data Warehouse also shown] Note: Base content
Situation Distribution Enhancements • A previous design choice to optimize for network bandwidth during HUB-REMOTE synchronizations (for example, when an agent switches from one remote TEMS to another) led to the potential for high memory and CPU usage at the HUB, as well as missed configuration changes in large environments. • Every TEMS has a complete replica of the configuration • The new design reduces many small synchronization updates down to one larger update, and eliminates the bookkeeping caches at the HUB [Diagram: HUB TEMS synchronizing to create a full replica of config state on Remote TEMS A and Remote TEMS B; Agents 2 and 3 switching over; Agent 1 unchanged] Note: Base content
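The effect of replacing many small synchronization updates with one larger update can be pictured with a simple coalescing step. The Python sketch below is only an analogy for the idea and does not describe the TEMS implementation; the update format and all names are hypothetical.

```python
def coalesce(updates):
    """Collapse ordered (situation, agent, op) updates into one net batch.

    When the same (situation, agent) pair appears more than once, the most
    recent operation wins, so the receiver applies a single consolidated update.
    """
    net = {}
    for situation, agent, op in updates:
        net[(situation, agent)] = op
    return sorted((s, a, op) for (s, a), op in net.items())


if __name__ == "__main__":
    small_updates = [
        ("cpu_high", "agent2", "add"),
        ("disk_full", "agent3", "add"),
        ("cpu_high", "agent2", "remove"),
    ]
    print(coalesce(small_updates))
```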
Goal • Provide ITM customers with the ability to manage the configuration of all ITM product components from a central administrative control point in a world-wide estate. • Includes the generalization of single hub configuration controls to multiple hubs, allowing e.g., a managed system group to contain managed systems from multiple hubs. • ITM is merely first step – extends to all SAPM products and components.
Exemplary Target Capabilities • Allow customers to export Managed System Group and situation configuration data from multiple HUB TEMS into a single relational database, under a single namespace, for management and reporting. • E.g. Provide views and reports of the Managed System Groups and Managed systems to which their situations are distributed. • Allow customers to deploy Managed System Group and situation definitions to new HUB TEMS, optionally including target distributions as part of the situation configuration. • Allow customers to integrate new agents into Managed System Groups automatically, and to select configuration items of the agents by policy. • Allow customers to version situation definitions • Server-side audit of all operations
Architectural Highlights [Diagram: multiple HUB TEMS connected to a Multi-Domain Manager running on WAS, with a UI, TCR reports, TDW, and domain config …]
Problem Statement • ITM Installation Failures Due to: • Missing OS Patches • Insufficient Space • Missing Application requirements • Incorrect Permissions on Target System • Incorrect Agent Configuration Data Input • Username/Password • Port Conflict • Missing Component-Specific Prereqs • Result • Product left in indeterminate state (no install rollback capability) • Remote Installation timeout waiting for an ITM Agent to Start/Re-start • Corrupted Product Installation Note: Base content
Prerequisite Checking Scenarios • Preemptive Local Prereq Checking • Local Installation Prereq Checking • Prereq Checking during Large Remote Deployments • Preemptive Prereq Checking Prior to Large Deployments
Story 3: Prereq Checking during Remote Install • User invokes prereq check command against the group • Prereq Checker is pushed down to each target and executed • If PASS, image is transmitted to target node and install executed • If FAIL, report transmitted back to TEMS • User can execute getdeploystatus for PASS/FAIL Summary [Diagram: TEMS pushing the Prereq Checker to each target, with a PASS or FAIL result per target]
Story 4: Preemptive Prereq Checking - Remote • User invokes prereq check command against target group • Prereq Checker is pushed down to each target and executed • User executes getdeploystatus to obtain PASS/FAIL Summary • Failing server reports transmitted back to TEMS to facilitate remediation [Diagram: TEMS pushing the Prereq Checker to each target, with a PASS or FAIL result per target]
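The Story 3 flow (check each target, install only on PASS, send a report back on FAIL) can be sketched in a few lines of Python. This is a toy model of the control flow only: the check and install callables, the target names, and the failure messages are hypothetical stand-ins, and nothing here invokes the real Prereq Checker or getdeploystatus.

```python
def deploy_with_prereq_check(targets, check, install):
    """Run the prereq check on each target and install only where it passes.

    check(target) returns a list of failed prerequisites (empty means PASS).
    install(target) performs the installation. Returns a PASS/FAIL summary
    and the failure reports that would be sent back for remediation.
    """
    status = {}
    reports = {}
    for target in targets:
        failures = check(target)
        if failures:
            status[target] = "FAIL"
            reports[target] = failures
        else:
            install(target)
            status[target] = "PASS"
    return status, reports


if __name__ == "__main__":
    known_problems = {"srv2": ["insufficient space"], "srv4": ["missing OS patch"]}
    installed = []
    status, reports = deploy_with_prereq_check(
        ["srv1", "srv2", "srv3", "srv4"],
        check=lambda t: known_problems.get(t, []),
        install=installed.append,
    )
    print(status)
    print(reports)
    print(installed)
```

For the preemptive variant in Story 4, the same loop would simply skip the install step and return only the summary and reports.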
Remote OS agent configuration & agent environment configuration
Remote OS agent configuration & agent environment configuration • Background • At present, remote deploy allows reconfiguring application agents but not OS agents. • Also, it is desirable to have the ability to remotely set/update agent environment variables, which is not currently supported. • Solution • Add tacmd setAgentConnection to allow configuration of the TEMS connection for agents. • Provide the ability to choose RXA to update the OS agent configuration by providing the RXA credentials (root/Administrator userid/password) instead of a credential-less update. This is helpful when the OS agent fails to start following reconfiguration, thus disabling the remote deploy connection to the target node. • Modify tacmd configureSystem and addSystem to allow users to set/update environment variables for the agent being configured or deployed. Note: Base content
Role-Based Access Control Introduction • Role-Based Access Control (RBAC) permissions are defined by SUBJECT x ACTION x OBJECT x CONSTRAINT, where • Subject: User or user group • Action: Action that can be performed (e.g. invoke a Take Action, create a situation) • Object: Object that you can perform an action on (e.g. a specific Take Action) • Constraint: Further restricts how, when, or where you can perform the action. Example constraints: • Target: Specific managed system, managed systems in a managed system group, or agent type • Time based • Run As Note: Extended content
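The SUBJECT x ACTION x OBJECT x CONSTRAINT model can be sketched as a small permission check in Python. This is a conceptual illustration only and not the ITM implementation; the class, the function, the subject and object names, and the shape of the context dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Permission:
    subject: str                                         # user or user group
    action: str                                          # e.g. "invoke_take_action"
    obj: str                                             # e.g. a specific Take Action
    constraint: Optional[Callable[[dict], bool]] = None  # how/when/where restriction


def is_allowed(permissions, subject, action, obj, context):
    """Return True if any permission matches and its constraint holds."""
    for perm in permissions:
        if (perm.subject, perm.action, perm.obj) == (subject, action, obj):
            if perm.constraint is None or perm.constraint(context):
                return True
    return False


if __name__ == "__main__":
    # Target constraint: only managed systems in a hypothetical "web_servers" group.
    perms = [
        Permission("ops_group", "invoke_take_action", "restart_service",
                   constraint=lambda ctx: ctx.get("target_group") == "web_servers"),
    ]
    print(is_allowed(perms, "ops_group", "invoke_take_action", "restart_service",
                     {"target_group": "web_servers"}))
    print(is_allowed(perms, "ops_group", "invoke_take_action", "restart_service",
                     {"target_group": "db_servers"}))
```

A time-based or Run As constraint would fit the same shape: another callable that inspects the request context and returns True or False.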
Auditing • Audit records are created for all events associated with granular-security-relevant actions whose result depends upon user input, or that are the result of an access decision. • RBAC administrative actions (e.g. creating a role, granting a role to a user, revoking a permission) • A user attempts to perform an action for which they don’t have permission. • A user performs an action for which they have permission. • Audit records provide the information defined by the TCIM W7 format: who, what, when, on what, where, where from, and where to • ITM administrators can change the audit logging level (minimum, basic, detailed) to change the volume of audit records • Minimum would only reflect major state changes. • Basic would audit any action that modifies any object, plus access failures. • Detailed would record any access decision. • Audit records can be viewed locally on an ITM component or in the warehouse database. • Warehouse database contains the complete set of audit records • Local audit records are the most recent events
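The three audit levels can be read as an ordering in which each level records everything the previous one does, plus more. The Python sketch below encodes that reading; the event kind names are hypothetical labels for the categories on the slide, and the mapping is an interpretation for illustration rather than product behavior.

```python
LEVELS = {"minimum": 0, "basic": 1, "detailed": 2}

# Lowest logging level at which each (hypothetical) event kind is recorded.
REQUIRED_LEVEL = {
    "major_state_change": "minimum",
    "object_modified": "basic",
    "access_denied": "basic",
    "access_granted": "detailed",
}


def should_audit(event_kind, configured_level):
    """Return True if the event kind is recorded at the configured level."""
    return LEVELS[configured_level] >= LEVELS[REQUIRED_LEVEL[event_kind]]


if __name__ == "__main__":
    for level in ("minimum", "basic", "detailed"):
        recorded = [kind for kind in REQUIRED_LEVEL if should_audit(kind, level)]
        print(level, recorded)
```

Raising the level from minimum to detailed therefore only ever adds record types, which matches the slide's point that the level controls the volume of audit records.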