Tivoli Storage Manager Crash/Recovery. Presented By Rahul Sharma. Agenda: TSM server startup process overview Configuration files/example TSM server startup failure TSM server crash analysis on AIX platform TSM server DB restore command/syntax. TSM server startup process.
Tivoli Storage Manager Crash/Recovery Presented By Rahul Sharma
Agenda:
• TSM server startup process overview
• Configuration files/example
• TSM server startup failure
• TSM server crash analysis on AIX platform
• TSM server DB restore command/syntax
TSM server startup process (steps in order)
• Execute DSMSERV
• Read the DSMSERV_CONFIG environment variable to locate dsmserv.opt
• Parse dsmserv.dsk to read the DB and log paths
• Perform LVM check
• Read devcnfg.out and volhist.out
• Mount DB and log volumes
• Initiate server module
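The startup chain above fails if any volume named in dsmserv.dsk is missing, so a quick pre-start check can save a failed start. The following is a minimal Python sketch, not part of the original slides. It assumes dsmserv.dsk holds one volume path per line and that lines starting with `#` are comments; the function names are hypothetical.

```python
from pathlib import Path


def read_dsk_volumes(dsk_text):
    """Return the volume paths listed in dsmserv.dsk text.

    Assumption: one path per line; blank lines and lines
    starting with '#' are skipped.
    """
    volumes = []
    for line in dsk_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            volumes.append(line)
    return volumes


def missing_volumes(volumes):
    """Return the listed volumes that do not exist on disk."""
    return [v for v in volumes if not Path(v).exists()]
```

A typical use would be `missing_volumes(read_dsk_volumes(Path("dsmserv.dsk").read_text()))`, run from the server directory before starting DSMSERV.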
TSM server important files
• dsmserv.opt
  • The TSM server's configuration file, containing configurable options.
  • Gives the location of the volhist.out and devcnfg.out files used during server recovery.
• dsmserv.dsk
  • Specifies the location of the database and recovery log volumes that the server reads during initialization.
• volhist.out (volume history file)
  • Records the history of sequential volumes (typically tapes) used by the TSM server.
  • Referenced when restoring the TSM server database.
• devcnfg.out (device configuration file)
  • Specifies the device configuration and media formats the TSM server uses to communicate with the library and tape devices.
  • Referenced when the server database is unavailable.
Example of Volume History file
****************************************************************************
*
* Sequential Volume Usage History
* Updated 03/12/10 14:55:25
*
* Operation            Volume      Backup  Backup  Volume  Device      Volume
* Date/Time            Type        Series  Oper.   Seq     Class Name  Name
****************************************************************************
2010/01/06 13:51:45 BACKUPFULL 1 0 0 1 FILEDEV /test/02669705.DBB
2010/01/07 13:51:45 BACKUPFULL 2 0 0 1 FILEDEV /test/08966987.DBB
2010/01/08 13:53:37 DBSNAPSHOT 1 0 0 1 FILEDEV /test/08796487.DBS
2010/01/15 13:53:42 BACKUPFULL 5 0 0 1 FILEDEV /test/08857543.DBB
2010/01/16 13:53:48 BACKUPFULL 6 0 0 1 FILEDEV /test/08858557.DBB
2010/01/16 13:54:14 BACKUPINCR 6 0 0 1 FILEDEV /test/14076904.DBB
2010/01/16 13:54:20 BACKUPINCR 7 1 0 1 FILEDEV /test/16465612.DBB
2010/01/17 14:54:40 BACKUPFULL 8 0 1 1 FILEDEV /test/24012917.DBB
2010/01/17 14:55:24 BACKUPINCR 9 0 1 1 FILEDEV /test/66353724.DBB
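When planning a restore it helps to know which volumes make up the most recent backup chain. The sketch below, in Python, reads entries like the ones in the example above and returns the latest full backup plus any later incrementals. It is an illustration only: it assumes one whitespace-separated entry per line, and it picks volumes by timestamp rather than by backup series number, which is a simplification and not necessarily how the server selects them. All function names are hypothetical.

```python
from datetime import datetime


def parse_volhist(text):
    """Parse backup entries from volume history text.

    Assumption: each entry is one line of whitespace-separated fields:
    date, time, volume type, numeric fields, device class, volume name.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip comment lines (starting with '*') and lines that are too short.
        if len(fields) < 5 or fields[0].startswith("*"):
            continue
        try:
            when = datetime.strptime(f"{fields[0]} {fields[1]}",
                                     "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # not a dated entry
        entries.append({"when": when, "type": fields[2],
                        "devclass": fields[-2], "volume": fields[-1]})
    return entries


def restore_chain(entries):
    """Return the latest BACKUPFULL entry plus later BACKUPINCR entries."""
    fulls = [e for e in entries if e["type"] == "BACKUPFULL"]
    if not fulls:
        return []
    latest = max(fulls, key=lambda e: e["when"])
    incrs = sorted((e for e in entries
                    if e["type"] == "BACKUPINCR" and e["when"] > latest["when"]),
                   key=lambda e: e["when"])
    return [latest] + incrs
```

Calling `[e["volume"] for e in restore_chain(parse_volhist(text))]` on the example file gives the volume names to have on hand before starting the restore.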
Example of Device configuration file
root@testbox:bin$ more devconfig.out
/* Device Configuration */
DEFINE DEVCLASS LTOCLASS DEVTYPE=LTO FORMAT=DRIVE MOUNTLIMIT=DRIVES MOUNTWAIT=60 MOUNTRETENTION=60 PREFIX=ADSM LIBRARY=LTOLIB WORM=NO DRIVEENCRYPTION=ALLOW
SET SERVERNAME TIVOLI_SERVER1
SET SERVERPASSWORD 21e6dfc45bd2feb57bf909fff999b5003d
DEFINE LIBRARY LTOLIB LIBTYPE=SCSI WWN="500308C09AE06090" SERIAL="000001307830" SHARED=YES AUTOLABEL=NO RESETDRIVE=YES
DEFINE DRIVE LTOLIB DRIVE0 ELEMENT=257 ONLINE=Yes WWN="500308C09AE06094" SERIAL="1210234237"
DEFINE PATH TIVOLI_SERVER1 LTOLIB SRCTYPE=SERVER DESTTYPE=LIBRARY DEVICE=/dev/smc3 ONLINE=YES
DEFINE PATH TIVOLI_SERVER1 DRIVE0 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=LTOLIB DEVICE=/dev/rmt2 ONLINE=YES
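The DEFINE PATH statements name the OS device files the server will use for the library and drives, which are the devices to verify at the OS level before a restore. As a rough Python sketch (assuming each statement sits on a single line, as in the example above; the function name is hypothetical), the devices can be pulled out like this:

```python
def device_paths(devconfig_text):
    """Return (destination, destination type, device) per DEFINE PATH line.

    Assumption: each statement occupies exactly one line.
    """
    paths = []
    for line in devconfig_text.splitlines():
        tokens = line.split()
        if len(tokens) < 4:
            continue
        if [t.upper() for t in tokens[:2]] != ["DEFINE", "PATH"]:
            continue
        destination = tokens[3]  # tokens[2] is the source (server) name
        # Remaining tokens are KEY=VALUE options.
        options = {}
        for token in tokens[4:]:
            if "=" in token:
                key, value = token.split("=", 1)
                options[key.upper()] = value
        paths.append((destination,
                      options.get("DESTTYPE"),
                      options.get("DEVICE")))
    return paths
```

For the example file this yields the library device `/dev/smc3` and the drive device `/dev/rmt2`.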
TSM server crash
• Database and/or recovery log corruption.
• Recovery log has grown to its maximum size of 13.2 GB (13,564 MB).
• Disk failures on the disks where the DB and log volumes reside.
• Fatal operating system errors, which can be caused by the operating system itself, device drivers, or faulty hardware.
• At the application level, a crash may stem from a bug in the product's source code, invalid values passed to a function, or memory that is never freed.
Core Dump Analysis on AIX
• What is needed to analyze a core dump:
  • The libraries/modules mapped into the TSM server's address space at the time of the core dump.
• Package all of the required information using the AIX SNAPCORE utility.
• Run the SNAPCORE utility from the location of the dsmserv executable (typically /usr/tivoli/tsm/server/bin).
Invoking the SNAPCORE utility
• snapcore -d <output dir> <path of /core file> dsmserv
• snapcore -d /coredir /usr/tivoli/tsm/server/bin/core dsmserv
• This packages the core file, dsmserv, and all related libraries and modules into a PAX archive file (similar to TAR) in the output directory given with the "-d" option. The PAX file contains everything required to analyze the core file in-house.
Check the output directory for the Snapcore file
• bash-2.04# ls -l /tmp/coredir/snapcore_381120.pax.Z
  -rw-r--r-- 1 root system 26124703 Jun 26 06:35 snapcore_381120.pax.Z
• Unpack the Snapcore (.pax.Z) file:
• bash-2.04# zcat snapcore_381120.pax.Z | pax -r
• Use the DBX utility to capture the call stack
Extract the Call Stack with the DBX utility
• The DBX utility can be invoked as follows:
• dbx dsmserv <core file path/name>
• Issue the "where" command to see the call stack.
• Issue the "registers" command to see the register values.
Example: DBX Execution
root@bahraintivl2:bharat$ dbx dsmserv core
Type 'help' for help.
[using memory image in /usr/tivoli/tsm/server/bin/core]
reading symbolic information ...
IOT/Abort trap in pthread_kill at 0x9000000004092b4 ($t310)
0x9000000004092b4 (pthread_kill+0x80) e8410028 ld r2,0x28(r1)
(dbx)
Example: Call Stack
(dbx) where
tmParticipate(??, ??, ??, ??, ??) at 0x100072b28
dbParticipate(??) at 0x1000e9edc
tbOpen(??, ??, ??) at 0x100047bb8
admAuthSystem(??, ??) at 0x100041b1c
AdmEstimateDbReorg(??) at 0x100576fac
AdmShow(??) at 0x100896a5c
AdmCommandLocal(??, ??, ??, ??, ??) at 0x10015c3e8
admCommand(??, ??, ??, ??, ??) at 0x10015d5d4
SmExecuteCommand(??) at 0x100581e5c
SmLocalConsoleSession(??) at 0x100581bb4
StartThread(??) at 0x10000fef0
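A call stack like the one above is easier to compare across crashes once it is reduced to function names and addresses. The Python sketch below does that for lines in the `name(args) at 0xADDR` shape shown here. Frames printed in other shapes (for example with source file and line information) are simply skipped, and the function name is hypothetical.

```python
import re

# Matches frames of the form: name(args) at 0xADDR
FRAME = re.compile(r"^\s*(\w+)\(.*\)\s+at\s+(0x[0-9a-fA-F]+)")


def parse_frames(where_output):
    """Return [(function name, address)] in the order dbx printed them."""
    frames = []
    for line in where_output.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames
```

Joining the names, for instance `" <- ".join(name for name, _ in parse_frames(text))`, gives a one-line signature that is convenient when searching known-problem descriptions or reporting to support.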
Prerequisites for TSM server restore
• Know the location of the following files:
  • dsmserv.opt
  • dsmserv.dsk
  • volhist.out (fully qualified path in dsmserv.opt)
  • devcnfg.out (fully qualified path in dsmserv.opt)
• Know the recovery log and database layout and sizes (per volume)
• Know your tape hardware:
  • Ensure the tape library and drives function at the OS level
  • Know the element number layout of the automated library
TSM server restore command/syntax
• Restore the DB to its most current state
  • DSMSERV RESTORE DB
  • Requires the recovery log mode to be set to rollforward
• Restore a single DB volume to its most current state
  • DSMSERV RESTORE DB DBVOL=XXX
  • where XXX is the full path and name of the DB volume to be restored
  • Requires the recovery log mode to be set to rollforward
• Restore the DB to a point in time (PIT) with the volume history file available
  • dsmserv restore db todate=MM/DD/YYYY totime=HH:MM:SS
• Restore the DB to a point in time (PIT) with the volume history file unavailable
  • dsmserv restore db devclass=xxx volumename=YYY commit=YES
  • Note: commit=YES is entered on the LAST volume in the DB backup set.
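A mistyped date or time is an easy way to lose time during a point-in-time restore. As a small illustration, the Python sketch below checks both values before assembling the command string shown above. It only builds the string and does not run anything, and the function name is hypothetical.

```python
from datetime import datetime


def pit_restore_command(todate, totime):
    """Build the point-in-time restore command after checking both values.

    Raises ValueError if todate is not a valid MM/DD/YYYY date
    or totime is not a valid HH:MM:SS time.
    """
    datetime.strptime(todate, "%m/%d/%Y")
    datetime.strptime(totime, "%H:%M:%S")
    return f"dsmserv restore db todate={todate} totime={totime}"
```

For example, `pit_restore_command("01/17/2010", "14:55:24")` returns `dsmserv restore db todate=01/17/2010 totime=14:55:24`, while a two-digit year such as `01/17/10` is rejected.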
Collecting data for support (common to all platforms)
• From the TSM administrative command-line client (DSMADMC), enter the following commands:
  QUERY SYSTEM > querysys.out
  QUERY ACTLOG begind=<mm/dd/yyyy> begint=<hh:mm>
• dsmserv.opt for the TSM server
• dsmserv.err for the TSM server
• errpt -a output
• For a server crash:
  • stack.out, registers.out (if DBX was used)
  • .pax archive created with the snapcore command
  • tsmdiag_results.tar
  • Copies of the coring executable, dsmlicense, and OS libraries
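Before sending a package to support, it is worth confirming that nothing from the list above was left out. The following Python sketch compares a list of collected file names against the named items. The `snapcore_*.pax*` pattern is an assumption based on the archive name seen earlier, and the function and constant names are hypothetical.

```python
from fnmatch import fnmatchcase

# File names taken from the checklist above; the snapcore pattern is assumed.
EXPECTED = [
    "querysys.out",
    "dsmserv.opt",
    "dsmserv.err",
    "stack.out",
    "registers.out",
    "tsmdiag_results.tar",
    "snapcore_*.pax*",
]


def missing_items(present_names):
    """Return the expected items with no matching name in present_names."""
    return [pattern for pattern in EXPECTED
            if not any(fnmatchcase(name, pattern) for name in present_names)]
```

Passing `os.listdir(collection_dir)` as `present_names` reports whatever still needs to be gathered. The `errpt -a` output and the actlog query output are not checked because the slides give them no file name.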
Collecting data for support (different platforms)
• Server or Storage Agent crash on AIX: http://www-01.ibm.com/support/docview.wss?uid=swg21319850
• Server or Storage Agent crash on Solaris: http://www-01.ibm.com/support/docview.wss?uid=swg21322742
• Server or Storage Agent crash on Windows: http://www-01.ibm.com/support/docview.wss?uid=swg21306067
• Server crash on Linux: http://www-01.ibm.com/support/docview.wss?uid=swg21260220