Availability Manager V3.0-2 Overview Barry Kierstein Hewlett-Packard
Overview of This Session: • Availability Manager Overview • Availability Manager Components • Availability Manager Installation • Availability Manager Configuration • Availability Manager New Features for V3.0-2 • Availability Manager Gotcha Items • Availability Manager Unsupported Features • Availability Manager Data Collection Considerations • Availability Manager Live Demonstration
Availability Manager Overview • Real-time display of system(s) being monitored; similar to MONITOR but with additional capabilities • Error and Information display: issues warnings when resources are low • Can be used to “fix” various problems • Display portion is easy to learn (point and click) • Default setup is a good place to start finding performance bottlenecks and resource contentions • With some customization, the Availability Manager helps pinpoint problems specific to the systems being monitored
Availability Manager Overview • Collects data on one or more nodes (systems), analyzes the data, displays it, and issues warnings • For Alpha and VAX, requires only an OpenVMS license to collect data • For I64, requires the EOE platform, or Avail_Man PAK to collect data • Separate installation kit: • Included on the CD-ROM distribution kit • Download is available from the OpenVMS homepage • Manuals included in hardcopy documentation kit, distribution kit, and on-line documentation kit
Availability Manager Groups • Systems can be grouped together for analysis • All members of an OpenVMS Cluster must have the same group name for correct clusterwide data collections • Unclustered systems can be put into the same group • Availability Manager can be configured to display information only from specified groups, reducing the number of systems being monitored • Also known as AMDS groups
Availability Manager Components • Three parts: • Data Collector gathers data on system(s) being monitored • Data Analyzer collects data from the Data Collectors and displays the data • Data Server allows Data Analyzers to collect data over an IP-based wide area network (WAN)
Availability Manager Components • Data Analyzer: • A Java-based application • Runs on OpenVMS Alpha platform from V7.3-2 and later with Motif or X-Windows • Runs on OpenVMS I64 platform from V8.2-1 and later with Motif or X-Windows • Runs on Intel platforms under Windows 2000 and Windows XP Professional
Availability Manager Components • Data Collector: • Consists of an OpenVMS device driver, configuration and startup files • Device driver file is RMDRIVER.EXE on VAX, SYS$RMDRIVER.EXE on Alpha and I64 • Device shows up with $ SHOW DEVICE RMA0: command • Runs on Itanium, Alpha and VAX platforms • Runs on OpenVMS V6.2 and later • Sends out a Hello multicast message to announce an OpenVMS system to the Data Analyzer • Only collects data when a Data Analyzer sends a data collection request, uses little CPU time
Availability Manager Components • Data Server • The Availability Manager uses its own protocol (AMDS protocol) for communication between the Data Analyzer and each Data Collector • Connection not dependent on network software to work (IP, DECnet, LAT, etc.) • Data collection and fixes often work even when the network on the system is not functioning or the system is hung
Availability Manager Components • Data Server • Data Server allows a Data Analyzer to collect data over an IP-based network • Data Server resides on the same extended LAN as the OpenVMS systems so it can communicate to the Data Collectors using the AMDS protocol • Data Analyzer connects to the Data Server using an IP-based secure socket connection over a WAN or LAN • A Data Server can accept connections from several Data Analyzers • A Data Analyzer can connect to several Data Servers • For redundancy, one could have two Data Servers on the same LAN
Availability Manager Installation • Availability Manager kits • OpenVMS Data Collector kit and manifest for secure delivery • Contains files for each OpenVMS version and platform • Can use SYSMAN> DO command to install on a cluster • If updating the Data Collector, a system reboot is necessary to remove the old Data Collector • OpenVMS Data Analyzer/Server kit and manifest • Contains files for both the Data Analyzer and Data Server • System reboot is not necessary with this kit • Windows 2000/XP kit • Normal Windows installation, requires a reboot to install a driver • Install using Administrator account or equivalent
Availability Manager Configuration • Data Collector configuration • Data Collector password • In file SYS$MANAGER:AMDS$DRIVER_ACCESS.DAT • Authentication between Data Analyzer and Data Collector • A Data Collector can have several passwords allowing for various access rights and scopes • Considerations for passwords • Access rights – Read, Write and Control • Scope for a particular password • OpenVMS – password for all OpenVMS systems • AMDS group – common for clusters • Individual node
Availability Manager Configuration • Data Collector configuration • Data Collector settings • In file SYS$MANAGER:AMDS$LOGICALS.COM • AMDS$GROUP_NAME – set as desired, one per cluster • AMDS$DEVICE – Network adapter used for communications using the AMDS protocol • Data Analyzer connections to Data Collectors • Data Server connections to Data Collectors • Note: Data Analyzer to Data Server connections use the IP protocol. The network adapters used are controlled by the IP stack on the particular system.
Availability Manager Configuration • Data Collector configuration • Data Collector settings • Hello multicast message settings • AMDS$RM_DEFAULT_INTERVAL – Broadcast interval in seconds for Hello multicast messages when the system is not being monitored • AMDS$RM_SECONDARY_INTERVAL – Broadcast interval in seconds for Hello multicast messages when the system is being monitored • Determines how quickly the Data Analyzer discovers all the systems. For instance, if the secondary interval is 20 seconds for each system on a LAN, then it will take up to 20 seconds for all the systems on the LAN to be discovered. • Each message is one packet of around 200 bytes, contributes little to the network traffic
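The effect of the Hello interval can be estimated with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 200-byte packet size is the approximate figure from the slide, and the node count is an assumed example.

```python
# Rough estimate of Hello multicast overhead and discovery delay.
# Function names are hypothetical; 200 bytes is the approximate
# packet size quoted on the slide.

def hello_traffic_bytes_per_second(num_systems, interval_seconds, packet_bytes=200):
    """Average Hello traffic if every system sends one packet per interval."""
    return num_systems * packet_bytes / interval_seconds


def worst_case_discovery_seconds(interval_seconds):
    """A system is seen at the latest one full interval after the Analyzer starts listening."""
    return interval_seconds


if __name__ == "__main__":
    # Assumed example: 100 systems, each sending a Hello every 20 seconds.
    print(hello_traffic_bytes_per_second(100, 20))  # bytes per second on the LAN
    print(worst_case_discovery_seconds(20))         # seconds until all are discovered
```

Shorter intervals speed up discovery at the cost of proportionally more multicast traffic, which stays small at these packet sizes.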
Availability Manager Configuration • Data Collector configuration • Data Collector startup • SYS$STARTUP:AMDS$STARTUP is used to start and stop the Data Collector, P1 is the function • START – Loads the configuration data and passwords, and starts the Data Collector. Put this command in SYS$MANAGER:SYSTARTUP_VMS.COM after the network stacks have been started so the MAC addresses of the network adapters have their final value • STOP – Stops the Data Collector • RESTART – Stops and restarts the Data Collector. This is useful if you change the configuration data or passwords, and want the changes loaded into the Data Collector • STATUS – Current status of the Data Collector • HELP – List of possible functions
Availability Manager Configuration • Data Server configuration • Authentication between the Data Analyzer and the Data Server is by Kerberos public and private keys • Create key pair on Data Server system • Start Data Analyzer, create keys, export public key • Create key pair on Data Analyzer system • Start Data Analyzer, create keys, copy to Data Server system • Covered in Chapter 2 of the Availability Manager User's Guide
Availability Manager Configuration • Data Analyzer configuration • Import any Data Server public keys • Start Data Analyzer • Import keys in Network Connection dialog box • Input Data Collector passwords • Use the Security tab in the Customization dialog box • Passwords can be entered at the appropriate level • OpenVMS level – Customize in System Overview • AMDS group level – Right-click on group in System Overview • Node level – Customize in Node pane or right-click on a node
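The three password levels can be pictured as a lookup from most specific to least specific. This is a minimal sketch under a stated assumption: the slides do not say which level wins when several are set, so the node-over-group-over-OpenVMS precedence here is assumed, and every name and password value is a placeholder.

```python
# Sketch of choosing a Data Collector password for a node.
# ASSUMPTION: the most specific level (node, then AMDS group, then
# OpenVMS-wide) takes precedence; the slides do not state this.
# All names and password strings are placeholders.

def resolve_password(node, group, node_passwords, group_passwords, openvms_password):
    """Return the password to use when contacting `node` in AMDS group `group`."""
    if node in node_passwords:
        return node_passwords[node]
    if group in group_passwords:
        return group_passwords[group]
    return openvms_password


if __name__ == "__main__":
    node_pw = {"APPLE": "NODEPASS"}
    group_pw = {"FRUIT": "GRPPASS1"}
    print(resolve_password("APPLE", "FRUIT", node_pw, group_pw, "VMSPASS1"))
    print(resolve_password("PEAR", "FRUIT", node_pw, group_pw, "VMSPASS1"))
    print(resolve_password("ROCK", "STONE", node_pw, group_pw, "VMSPASS1"))
```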
Availability Manager Startup • The first window to appear is the System Overview Window • Event data also goes to the event log file AnalyzerEvents.LOG • On OpenVMS, you can set the location of the event log file with logical names
System Overview Window • Initially the System Overview window is empty. Systems are displayed as the Hello multicast message is received from each Data Collector • Shows all the systems being monitored in one window • Information includes the Name, Utilization, O.S. and Hardware versions • Allows customizations at the application and operating system levels • Shows the connection used to gather the data (network adapter, connections to Data Servers)
Availability Manager New Features • Data Collection over IP • Data Server to tunnel AMDS protocol over IP • Avail_Man_Ana kit renamed to Avail_Man_Ana_Srvr to show that the Data Analyzer and Data Server reside in the same kit – must remove old kit due to name change • Java 5.0 JVM used by Availability Manager • Increased performance on OpenVMS • Requires ODS-5 disk – use /DESTINATION qualifier when installing the Data Analyzer/Server kit to direct the installation on an ODS-5 disk • AMDS$AM_DISABLE_OFFSCREEN_PIXMAP_SUPPORT logical can help remote X-window display performance
Availability Manager New Features • System Overview window changes • New and revised columns • PFLTS shows total and hard page fault rates • PFW/COM shows number of processes in PFW and COM states • DC shows Data Collector capability version and Managed Object registration state • CPU Qs revamped to have more consequential states • PFW and COM removed, leaving COMO, MWAIT, COLPG & FPW • If total is non-zero, show all counts as n/n/n/n (see tooltip) • Events have changed to reflect PFW and COM removal • Memory tooltip shows memory and alignment fault info • Added HIALNR event for high alignment fault rate
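The n/n/n/n rule for the CPU Qs column can be sketched as a small formatting function. The field order (COMO, MWAIT, COLPG, FPW) follows the order listed on the slide, and showing a single "0" when the total is zero is an assumption, since the slide only describes the non-zero case.

```python
# Sketch of the CPU Qs column text.
# ASSUMPTIONS: field order follows the slide (COMO, MWAIT, COLPG, FPW);
# a zero total is rendered as "0" (the slide does not say what is shown).

def format_cpu_queues(como, mwait, colpg, fpw):
    """Return the per-state breakdown when any queue is non-empty, else "0"."""
    counts = (como, mwait, colpg, fpw)
    if sum(counts) == 0:
        return "0"
    return "/".join(str(count) for count in counts)


if __name__ == "__main__":
    print(format_cpu_queues(1, 0, 2, 0))
    print(format_cpu_queues(0, 0, 0, 0))
```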
Availability Manager New Features • Data Collection for Logical Disks (LDcn:) devices • Event Log enhancements • Each data connection has its own log file • Status column – shows when a threshold event begins, ends, is cancelled or expires • EventKey – unique key for an event on a node • For instance, all HICOMQ events for node APPLE • Can use $ SEARCH to easily find all occurrences of an event • EventID – unique key for a single event • Easily find the BEGIN and END/CANCELED/EXPIRED record for an event
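The difference between EventKey and EventID can be shown with already-parsed records. The sketch below does not reproduce the real log file layout: the tuple shape, the key string "APPLE/HICOMQ" and the ID values are all hypothetical; only the status names come from the slide.

```python
# Sketch of using EventKey and EventID on parsed event log records.
# ASSUMPTION: each record is a (event_id, event_key, status) tuple; the
# actual log file layout and key/ID formats are not shown in the slides.
from collections import defaultdict

TERMINAL_STATUSES = {"END", "CANCELED", "EXPIRED"}


def records_for_key(records, event_key):
    """All records for one kind of event on one node (what EventKey selects)."""
    return [record for record in records if record[1] == event_key]


def lifecycle_by_id(records):
    """Statuses seen for each single event, in log order (what EventID selects)."""
    lifecycles = defaultdict(list)
    for event_id, _event_key, status in records:
        lifecycles[event_id].append(status)
    return dict(lifecycles)


def open_events(records):
    """EventIDs that have no END, CANCELED or EXPIRED record yet."""
    return sorted(
        event_id
        for event_id, statuses in lifecycle_by_id(records).items()
        if not TERMINAL_STATUSES & set(statuses)
    )


# Hypothetical sample data for illustration only.
sample = [
    ("1001", "APPLE/HICOMQ", "BEGIN"),
    ("1002", "APPLE/HICOMQ", "BEGIN"),
    ("1001", "APPLE/HICOMQ", "END"),
    ("1003", "PEAR/HICOMQ", "BEGIN"),
]

if __name__ == "__main__":
    print(records_for_key(sample, "APPLE/HICOMQ"))
    print(open_events(sample))
```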
Availability Manager New Features • Fixes • Force a disk volume out of Mount Verify state • Force a shadow set member out of a shadow set that is in Mount Verify state • Data Analyzer supports MAC address changes • CFGDON and PTHLST show MAC address used • CHGMAC and NEWMAC events show address changes • SYS$STARTUP:AMDS$STARTUP.COM • STATUS parameter shows RMA0: status • START and RESTART have LOG qualifier to output configuration data sent to RMA0:
Availability Manager Gotcha Items • Make sure the most recent AMNDIS50.SYS file on Windows systems is installed • Correct date is Nov 28, 2006 • Driver from earlier date can crash system when a second Data Analyzer is started • OpenVMS Data Analyzer/Server V3.0-2 kit requires Data Collector V3.0-2A kit • A check for required logical names is done when the Data Analyzer or Data Server is started. The logical names are defined in AMDS$STARTUP.COM, which is in the Data Collector kit.
Availability Manager Unsupported Features • Installation on Windows Vista • Right-click -> Properties to install under compatibility mode • More testing and compatibility knowledge needed to put on the supported list • Running the Data Analyzer on other OSes • Work done to allow the Data Analyzer to connect to Data Servers by using the JVM only, tested on Linux • Install JVM on system • Copy *.JAR and *.ZIP files into a subdirectory • Create script with JAVA command line from AMDS$AM_RUN.COM
Availability Manager Data Collection Considerations • Data collections on a local LAN typically finish quickly, in under a second or two even for the largest data collections with many continuations • Using a Data Server slows down the round trip time for data collections with many continuations • Affects systems with many processes, disks or a large resource hash table • DCSLOW events are signaled when the data collection takes longer than the data collection interval • DCCOLT events document how long the data collection actually took in seconds
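Why continuations matter more over a Data Server can be seen with a simple model. This sketch assumes each request and continuation is a sequential round trip, so total time is round trips times latency; the latency figures, the round-trip count and the 10-second interval are made-up examples, and the function names are hypothetical.

```python
# Simple model of data collection time versus the collection interval.
# ASSUMPTION: round trips happen one after another, so total time is
# round_trips * round_trip_seconds. All numbers below are illustrative.

def estimated_collection_seconds(round_trips, round_trip_seconds):
    """Total time for a collection that needs `round_trips` request/response pairs."""
    return round_trips * round_trip_seconds


def is_dcslow(collection_seconds, interval_seconds):
    """DCSLOW condition from the slide: collection took longer than the interval."""
    return collection_seconds > interval_seconds


if __name__ == "__main__":
    interval = 10.0  # assumed data collection interval in seconds
    lan = estimated_collection_seconds(300, 0.001)  # assumed 1 ms LAN round trip
    wan = estimated_collection_seconds(300, 0.05)   # assumed 50 ms WAN round trip
    print(lan, is_dcslow(lan, interval))
    print(wan, is_dcslow(wan, interval))
```

With the same number of round trips, the collection that is comfortable on the LAN exceeds the interval once the per-trip latency grows.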
Availability Manager Data Collection Considerations • Lock contention data in particular can take many continuations to finish • Previously, 1K resource hash table entries were scanned per collection, so large tables resulted in hundreds of continuations • Since the data collection time is returned in the AMDS packet, the number of hash table entries scanned is now limited by a 1 ms time limit instead. This was the maximum collection time seen in scanning 1K hash table entries on a DEC 3000/400. On larger Alphas, this time limit results in scanning about 3K hash table entries. • This limit can be changed if necessary, but take care as the IOLOCK8 spinlock is held during the data collection
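The effect of scanning more entries per collection is easy to quantify. The sketch below takes "1K" as 1024 and "about 3K" as 3072, and uses a hypothetical table of 262,144 entries; none of these exact values come from the slides.

```python
# Number of collections needed to walk the whole resource hash table.
# ASSUMPTIONS: "1K" = 1024 entries, "about 3K" = 3072 entries, and the
# 262144-entry table size is a made-up example.
import math


def continuations_needed(table_entries, entries_per_collection):
    """Collections required to scan every hash table entry once."""
    return math.ceil(table_entries / entries_per_collection)


if __name__ == "__main__":
    table = 262144
    print(continuations_needed(table, 1024))  # fixed 1K entries per collection
    print(continuations_needed(table, 3072))  # about 3K entries within the 1 ms limit
```

Raising the time limit would cut the count further, at the cost of holding the IOLOCK8 spinlock longer in each collection.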
Availability Manager Live Demonstration • Initial key configuration for Data Analyzer and Data Server • Overview of new features
Availability Manager Contact Information • Barry Kierstein • Barry.Kierstein@HP.Com, Kiersteinco@Gmail.Com • Shubhabrata Bose • Shubhabrata.Bose@HP.Com • Karthigeyan Kasthuriregan • Karthigeyan.Kasthuriregan@HP.Com • Srividhya Subramanian • Srividhya.Subramanian@HP.Com