Best Practices for Backing Up Windows Servers Using IBM Tivoli Storage Manager. Session S21, August 30th 2004 - Sept 2nd 2004. Sean O Sperry, ssperry@us.ibm.com
Abstract • Windows Best Practices This session will focus on various best practices surrounding the configuration of TSM to protect Windows. Features of the TSM client that are specific to Windows such as Journal Based Backup, System Objects, Open File Support, ASR, and Image Based Backup will be included. Some typical pitfalls of backing up and restoring Windows will be presented and recommendations will be made on how they can be avoided.
Agenda • What’s unique about Windows? • Windows only features of TSM. • General TSM features important for Windows. • Putting it all together – Best Practices for Backup and Restore.
Backing up Windows – Unique Problems • Registry, Active Directory, and other system objects • Plug and Play • Typical to have many servers with unique configurations • Typical to have very large numbers of directories in wide and deep structures • Large numbers of small files are very common, and together they can make up large amounts of data!
System Objects • Prior to version 5.1, Windows system object files could be backed up incrementally when backing up the system object. • In version 5.1 and above, a full copy of all Windows system object files is taken during each backup. A full set of these objects is kept for each version of the system object that is retained based on policy. • There are many (usually over 2000) TSM objects (e.g. files) associated with the Windows system object. • A typical system object backup for a server can be 400 MB or greater.
System Objects (continued) • Large numbers of Windows systems with longer retention policies can produce large amounts of data! • 500 “small” Windows boxes • 30-version retention • DB size (bytes per database entry: 600 for the first copy of an object, 200 for each additional copy) • 500 * 1 * 2000 * 600 = 0.6 GB (first object) • 500 * 29 * 2000 * 200 = 5.8 GB (additional copies) • 500 * 30 * 2000 * 200 = 6.0 GB (copy storage pool) • Total DB = 12.4 GB • Nightly data • 500 * 400 MB = 200 GB
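The sizing arithmetic for the 500-server example can be reproduced with a short script. This is a minimal sketch, assuming the factors 600 and 200 are bytes of TSM database space per entry, that 1 GB = 10^9 bytes, and that 1 MB = 10^6 bytes. The function and parameter names are illustrative and are not part of TSM.

```python
def system_object_db_bytes(servers, versions, objects_per_backup=2000,
                           first_entry_bytes=600, extra_entry_bytes=200):
    """Estimate TSM database bytes consumed by system object backups.

    Assumption: the first retained version costs first_entry_bytes per
    object, and every further version or copy-pool copy costs
    extra_entry_bytes per object.
    """
    first = servers * 1 * objects_per_backup * first_entry_bytes
    additional = servers * (versions - 1) * objects_per_backup * extra_entry_bytes
    copy_pool = servers * versions * objects_per_backup * extra_entry_bytes
    return {
        "first": first,
        "additional": additional,
        "copy_pool": copy_pool,
        "total": first + additional + copy_pool,
    }


sizes = system_object_db_bytes(servers=500, versions=30)
for name, size in sizes.items():
    print(f"{name:>10}: {size / 1e9:.1f} GB")

# Nightly data moved: 500 servers at roughly 400 MB of system objects each.
nightly_gb = 500 * 400 / 1000
print(f"   nightly: {nightly_gb:.0f} GB")
```

With these inputs the three terms multiply out to 0.6 GB, 5.8 GB, and 6.0 GB, for a total of 12.4 GB of database space, plus 200 GB of nightly data. Changing `versions` shows how quickly a shorter retention policy for system objects shrinks the database footprint.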
System Object Recommendations • You can use an include statement to assign Windows system objects to a different management class so that they are kept for a shorter period of time. • Ex. include.systemobject sysobjclass • include.systemstate sysstateclass • Carefully consider how system objects will be used. If you intend to rebuild a box from scratch in the event of its total loss, use the domain statement to exclude system objects. • Ex. domain all-local -systemobject
Tivoli Storage Manager’s Bare Machine Recovery • Ensure business continuance • Reduce administrative costs • Maximize current hardware investment • Brings back system to state of last backup • Recovers all the OS changes and customizations • Streamlines and automates the OS recovery process • Eliminates the need for highly skilled professionals to manually reinstall hardware, network, patches… • Speeds up the recovery time • Integrates bare machine backups directly to Tivoli Storage Manager server
BMR (Bare Machine Recovery) • Prior to Windows 2003, a 3rd party product was required to do automated Bare Machine Recovery. • Windows 2003 introduced a semi-automated solution for BMR called Automated System Recovery (ASR). • Provides for restore in the event of a catastrophic system or hardware failure • Goal is to return the operating system to the point of last backup – not for restore of application or user data • Invokes the new Windows API calls • Restores the operating system files and system state data
ASR Overview • Insert the Windows installation CD into the CD-ROM drive • Restart the system and boot from CD • You may need to do some BIOS configuration • Press F2 to enter ASR recovery mode • Insert the TSM-created ASR diskette (labeled TSMASR) into the floppy drive • Windows will read the diskette and then reformat the boot partition and possibly other partitions • Windows will copy installation files to the hard drive • Insert the TSM client installation package CD (labeled TSMCLI) into the CD-ROM drive • The TSM client installation package will be copied to the hard disk • Insert the ASR diskette into the floppy drive (if not already there) • Windows copies TSM files from the diskette
ASR Overview (cont) • Remove the diskette and the system reboots • Insert the Windows install CD into the CD-ROM drive • After the Windows setup completes, two command windows are opened • One runs the TSM client silent install • One is available for diagnostic purposes • You will be prompted to choose either network connected restore from the TSM server or local backupset restore • Restore from TSM server requires node name and password • Restore from backupset requires the path and name of the local backupset – you will be prompted when subsequent volumes are needed • TSM restores the system state or system objects • TSM restores the system drive (partition)
ASR Overview (cont) • Remove the TSMASR diskette and the machine reboots • The operating system comes up in fully recovered state • Restore user data and applications • Use traditional TSM restore facilities • Address any “application-specific” recovery issues
ASR Pros and Cons • Pros • Provides a Microsoft / Tivoli supported methodology for doing a Bare Machine Recovery • Cons • Lots of Prerequisites • Same / Similar Hardware • Certain Disk Geometries / Sector Sizes • Only works from a floppy drive (no support for network boot) • Still a “guided process” • Still need to plan for “application recovery”
JBB (Journal Based Backup) • Journal Based Backup allows an incremental backup to occur using a local database called a change journal. The local database is used to determine what has changed and needs to be backed up. • Windows clusters are supported beginning at version 5.1.6. • Can significantly reduce backup time for file systems with large numbers of files of which only a small number change. • Example: a typical Windows file server • Recommended for systems with large numbers of files where the number of files that change is less than 10-15% of the total.
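The 10-15% guideline can be turned into a quick check against numbers taken from your own backup reports. This is a minimal sketch; the function name, its parameters, and the conservative default threshold of 10% are assumptions for illustration, not TSM settings.

```python
def jbb_recommended(total_files, changed_files, threshold=0.10):
    """Return True when the share of changed files is below the threshold.

    threshold is a fraction; the slide's guideline is 10-15%, so values
    between 0.10 and 0.15 are reasonable choices.
    """
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    return changed_files / total_files < threshold


# A file server with 2,000,000 files of which 50,000 change per day (2.5%).
print(jbb_recommended(2_000_000, 50_000))    # True
# The same server with 400,000 files changing per day (20%).
print(jbb_recommended(2_000_000, 400_000))   # False
```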
Image / Snapshot Backup • Image / Snapshot Backup allows you to do a block level backup of a Windows system. • Pros of Image Backup • Much (orders of magnitude) faster than file-by-file when doing a full volume restore. • Snapshot technology can be used to back up open files • Can be combined in a rotation w/ normal incremental backups to achieve a “point in time” restore. • Requires only one entry in the TSM database. • Cons of Image Backup • Requires a separate drive from the one being backed up for the snapshot image. • Can use more network bandwidth when using LAN based backup (because you need to move the entire volume). • Cannot be used to restore a single file; must restore the entire volume.
OFS (Open File Support) • TSM version 5.2 allows for backing up of open files on Windows 2000 (not supported yet on Windows 2003). • Leverages the TSM Logical Volume Snapshot Agent (LVSA) used for image backup. • Recommended for use with applications that do not support backup through other APIs (e.g. MS Access).
Image and OFS Options • SNAPSHOTCACHELOCATION • Controls the location of the OBF (old blocks file) during image backup. • SNAPSHOTFSIDLEWAIT • Time that can pass while waiting for the disk to have no write activity so the snapshot can be taken. • SNAPSHOTFSIDLERETRIES • Number of times the LVSA should try to achieve the snapshot. • INCLUDE.FS • Can be used to exclude a volume from LVSA processing. • “include.fs C: fileleveltype=dynamic” • PRESNAPSHOTCMD and POSTSNAPSHOTCMD • Can be used to quiesce applications before a snapshot and resume them after it.
Image and OFS Caveats • The LVSA is not intended to provide Windows system backup. • This is done through the system object and system state. • The snapshot cache file cannot be located on the drive being backed up. • The restore of an image backup cannot be done to the same drive as the TSM client is installed on. • Essentially, all this adds up to eliminate “single-drive” systems.
Volume Fragmentation • Large Windows file servers are particularly vulnerable to volume fragmentation when using incremental-forever backup methodology. Typically they have large numbers of small files and wide directory structures. • I have seen active files for a Windows system spread out over more than 250 tapes! • [Figure: Active, Inactive, and Expired files over Time, for System A and B]
How Fragmented Is My Data? • For a given node, the following SQL statement will show all the volumes that hold data for that node, provided the node has active backup versions.
select node_name,filespace_name,stgpool_name,volume_name from volumeusage where node_name='MYNODE' and node_name in (select node_name from backups where state='ACTIVE_VERSION' group by node_name)
NODE_NAME   FILESPACE_NAME   STGPOOL_NAME   VOLUME_NAME
---------   --------------   ------------   -----------
SSPERRY     ASR              SUNCPOOL       IOX873
SSPERRY     ASR              SUNCPOOL       IOX921
SSPERRY     ASR              SUNPOOL        IOX872
SSPERRY     ASR              SUNPOOL        IOX947
SSPERRY     SYSTEM OBJECT    SUNCPOOL       IOX921
SSPERRY     SYSTEM OBJECT    SUNPOOL        IOX947
SSPERRY     \\ibmt40\c$      SUNCPOOL       IOX866
SSPERRY     \\ibmt40\c$      SUNCPOOL       IOX873
SSPERRY     \\ibmt40\c$      SUNCPOOL       IOX904
SSPERRY     \\ibmt40\c$      SUNCPOOL       IOX912
SSPERRY     \\ibmt40\c$      SUNCPOOL       IOX921
SSPERRY     \\ibmt40\c$      SUNPOOL        IOX852
SSPERRY     \\ibmt40\c$      SUNPOOL        IOX863
SSPERRY     \\ibmt40\c$      SUNPOOL        IOX872
SSPERRY     \\ibmt40\c$      SUNPOOL        IOX900
SSPERRY     \\ibmt40\c$      SUNPOOL        IOX947
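Once the query output is exported, counting distinct volumes per file space and storage pool gives a simple fragmentation figure to track over time. The sketch below hard-codes the sample rows shown for node SSPERRY; how the rows are exported from the TSM server is left out, and the function name is hypothetical.

```python
from collections import defaultdict

# (node_name, filespace_name, stgpool_name, volume_name) rows from the sample output.
ROWS = [
    ("SSPERRY", "ASR", "SUNCPOOL", "IOX873"),
    ("SSPERRY", "ASR", "SUNCPOOL", "IOX921"),
    ("SSPERRY", "ASR", "SUNPOOL", "IOX872"),
    ("SSPERRY", "ASR", "SUNPOOL", "IOX947"),
    ("SSPERRY", "SYSTEM OBJECT", "SUNCPOOL", "IOX921"),
    ("SSPERRY", "SYSTEM OBJECT", "SUNPOOL", "IOX947"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNCPOOL", "IOX866"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNCPOOL", "IOX873"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNCPOOL", "IOX904"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNCPOOL", "IOX912"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNCPOOL", "IOX921"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNPOOL", "IOX852"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNPOOL", "IOX863"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNPOOL", "IOX872"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNPOOL", "IOX900"),
    ("SSPERRY", r"\\ibmt40\c$", "SUNPOOL", "IOX947"),
]


def volumes_per_filespace_and_pool(rows):
    """Count distinct volumes for each (filespace, storage pool) pair."""
    volumes = defaultdict(set)
    for _node, filespace, pool, volume in rows:
        volumes[(filespace, pool)].add(volume)
    return {key: len(names) for key, names in volumes.items()}


counts = volumes_per_filespace_and_pool(ROWS)
for (filespace, pool), count in sorted(counts.items()):
    print(f"{filespace:<15} {pool:<9} {count} volume(s)")
```

For the sample node, the `\\ibmt40\c$` file space is spread over 5 volumes in each of the two pools. A count that keeps climbing for a large file server is a signal to consider the remedies on the following slides.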
Collocation • Collocation is the classic solution for volume fragmentation • Segregates data from different sources • Can be done by node or by file space • Current level of granularity is by storage pool • Collocation can be thought of as a methodology for moving tape mounts from restore time to backup time (during migration) • Pros of collocation • Collocation will significantly reduce restore time for systems that are prone to fragmentation (small files, large amounts of data) • Cons of collocation • Collocation will waste tape space (particularly with small systems and large sequential volumes) • Collocation will increase the library slot count necessary to keep all volumes in the library
Multisession Restores • [Figure: Client, Server, Session 1, Session 2, Session 3, storage pool volumes] • Restore uses multiple client-server sessions for a single file space. • New to version 5.1. Prior to version 5.1, only a single tape drive could be applied to a volume restore at a time. • Limited by • Number of sequential-access volumes with data to be restored plus one session for random-access disk • Mount points on the client (as defined on the TSM server) • Client's resourceutilization option • Works only for no-query restore (unrestricted wildcard) • I often see 10x improvement in restore speed when using multisession!
Full Backups • Although most people use the incremental-forever methodology, TSM will allow you to do a full file-level backup • Set the copy mode to absolute on the copy group • Do a selective backup on the drive • Prepares for rapid client restore by consolidating the most recent copy of node data (reducing volume fragmentation) • Drawback: significantly increases network utilization • Can be done periodically depending on levels of fragmentation • [Figure: Periodic Full Backup, ALL files for node BOB]
Move Nodedata Command • Moves files for specified node, file space, and data type residing in a specified sequential-access storage pool • Prepares for rapid client restore by • Consolidating node data within a sequential-access pool • Moving data to disk for fast access • Reconstruction option removes unused space within aggregates • Requires lead time before restore for disk, but can be scheduled for tape • Consolidation (files for node BOB): move nodedata bob fromstgpool=tapepool • Movement to disk (files for node BOB): move nodedata bob fromstgpool=tapepool tostgpool=diskpool
Directory Management Class • The dirmc option on the TSM client can be used to send directories to a separate disk pool from files. • While files will migrate to sequential access media, directories can be kept on disk because of their small size. • This can greatly reduce restore time for file servers with large directory structures. • Note that a copy storage pool for DR and redundancy is still strongly recommended. • [Figure: Files for node BOB (Primary Pool w/ migration), Directories for node BOB (Primary Pool w/ no migration), Copy Pool]
Classic Restore vs. No-Query Restore • Classic restore processing • Client queries server for information about files to be restored • Server sends file information to client • Client sorts files by storage location • Client sends restore request for each file in optimal restore order • Server sends each specified file to client • No-query restore (NQR) processing • Client sends restore specification to server • Server sends matching files to client
Classic Restore vs. No-Query Restore (continued) • Classic vs. NQR tradeoffs • NQR reduces client-server interaction and client memory requirement • NQR allows multi-session restores, but classic restore does not • NQR is usually faster for restoring entire file systems or large directories • Classic restore may be faster for restoring directories if data is highly fragmented across sequential volumes with only a small amount of data on each volume • NQR is automatically used if both of the following apply • The file specification is an unrestricted wildcard (/home/mydata/*) • None of the following are used: inactive, latest, pick, fromdate, todate • To force classic restore, use ?* in the file specification (/home/mydata/?*)
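The two conditions for automatic NQR can be written as a small predicate, which is handy when reviewing restore scripts. This is only a sketch of the rule as stated on the slide, not the client's actual logic. It assumes "unrestricted wildcard" means the last path component is exactly `*`, and the function and constant names are hypothetical.

```python
# Options that, per the slide, prevent no-query restore when present.
NQR_BLOCKING_OPTIONS = {"inactive", "latest", "pick", "fromdate", "todate"}


def uses_no_query_restore(file_spec, options=()):
    """Return True if the restore would qualify for NQR under the slide's rule."""
    # Accept both Windows and Unix separators, then take the last component.
    last_component = file_spec.replace("\\", "/").rsplit("/", 1)[-1]
    unrestricted = last_component == "*"
    blocked = {opt.lower() for opt in options} & NQR_BLOCKING_OPTIONS
    return unrestricted and not blocked


print(uses_no_query_restore("/home/mydata/*"))              # True
print(uses_no_query_restore("/home/mydata/?*"))             # False (forces classic)
print(uses_no_query_restore("/home/mydata/*", ["latest"]))  # False
```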
Putting It All Together – Best Practices for Windows Backup and Restore
Best Practices for Windows Backup • Large clients with small files should be backed up to collocated sequential access storage pools where feasible. • It will probably not be feasible to collocate all Windows clients, so consider segregating out the largest ones with the most files into a collocated storage pool. • Carefully consider how you will use system objects. Consider keeping only a small number of system object backups. • For large systems with multiple drives, consider doing regular image (or full) backups. • Consider using Journal Based Backup for very large systems with lots of files and relatively little change. • Consider using OFS on large Windows servers with multiple drives.
Best Practices for Windows Restore • Take advantage of Multisession Restore for Windows systems! Resourceutilization = 6-10 and Max Number of Mountpoints = 3-4. • Monitor volume fragmentation for at-risk systems. Take action to correct excessive fragmentation (e.g. move nodedata) before it’s too late. • Make sure you use No-Query Restore if you need to restore the whole file system. • In full restore situations, if an image backup is available, restore the image and then apply the incremental backups. • If you are using Windows 2003, take advantage of ASR. • Document and test restore procedures. In particular, be aware of “application recovery issues”.