Session #: 5728. Best Practices for TSM Design and Sizing. Ron McCracken, rmccrack@us.ibm.com.
Abstract – Best Practices for TSM Design and Sizing This session will discuss the issues surrounding the sizing of TSM in a business environment. Service level agreements, hardware, network speeds, SAN issues, and solution sizing will be discussed. Typical inputs to the TSM design and sizing process and the outputs from the process will be presented so that attendees can begin designing or improving a TSM solution for their own business.
Agenda • What is TSM Design and Sizing? • The Inputs Needed for TSM Design • Considerations for the Design • Best Practices for TSM Design and Sizing
Definition of TSM Design A TSM Design is a set of configuration and sizing recommendations that will result in a TSM implementation that meets a specific set of needs and expectations.
TSM Design – Simplest Terms In the simplest terms, TSM is just an application that moves data. Developing a TSM design is an exercise in determining the hardware, software, and network configuration required to move the data within certain pre-determined amounts of time.
TSM Design – Typical Components • Transport recommendations (e.g. SAN, LAN, etc.) • Placement of TSM servers in the environment • Number of TSM server instances • Sizing recommendations for the TSM server hardware • Sizing recommendations for the tape libraries • High level storage pool configurations • Feature usage recommendations
The Inputs Needed For TSM Design and Sizing • How much data are you trying to back up? • What types of data are in your environment? (e.g. databases, mail servers, small files, large files, etc.) • What are your retention policies? (i.e. How long do you want to keep the backed up data?) • How fast is the data growing? • What are your backup and restore service level agreements (“SLAs”)?
Data Gathered for Each Server to Back Up • Server Name • Operating System • Network Type/Speed • Backup Window • Restore Window • Estimated Growth • Amount of FS data • Amount of DB data • Type/Version of DB(s) • # of DB versions to retain • # of FS versions to retain • Nightly Change Rate %
IBM Tivoli Storage Resource Manager • IBM Tivoli Storage Resource Manager provides information on the amount and type of data stored on servers. • Should be first in the project plan, if possible.
Can’t get all the inputs? – Agree on assumptions! • % change rate – file data generally 5-10%, database data 15-100% • Most businesses keep 3-4 weeks' worth of DB backups • Need to understand the methodology (e.g. full + incremental?) • Most businesses keep 1-4 weeks of file system data • Get an average server size and the number of servers • Focus on large DB servers, including mail • But remember: garbage in, garbage out!
Inputs Allow Us to Calculate Backup Workload • 20 file servers each with 100GB of disk space • Amount of data = 60GB • Nightly change rate = 5% • Nightly workload per system = 60GB * 5% = 3GB • Total nightly workload = 3GB * 20 = 60GB • 20 DB servers each with 500GB of disk space • Amount of data = 400GB • Nightly change rate = 15% • Nightly workload per system = 400GB * 15% = 60GB • Total nightly workload = 60GB * 20 = 1200GB • Total workload for both file and DB servers, 1200GB + 60GB = 1260GB • Backup window = 6 hours • Nightly Throughput = 1260GB / 6 hours = 210 GB per hour
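The workload arithmetic above can be sketched in a few lines of Python. This is only an illustration of the slide's calculation, not part of any TSM tooling; the names ServerGroup and required_throughput_gb_per_hour are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class ServerGroup:
    """A group of similar servers (hypothetical helper for this sketch)."""
    count: int            # number of servers in the group
    data_gb: float        # data actually stored per server, in GB
    change_rate: float    # nightly change rate as a fraction (0.05 = 5%)

    def nightly_workload_gb(self) -> float:
        # Per-server nightly workload multiplied by the number of servers.
        return self.count * self.data_gb * self.change_rate


def required_throughput_gb_per_hour(groups, window_hours):
    """Total nightly workload divided by the backup window."""
    total_gb = sum(g.nightly_workload_gb() for g in groups)
    return total_gb / window_hours


# Figures taken from the slide: 20 file servers and 20 DB servers, 6-hour window.
file_servers = ServerGroup(count=20, data_gb=60, change_rate=0.05)
db_servers = ServerGroup(count=20, data_gb=400, change_rate=0.15)

total_gb = file_servers.nightly_workload_gb() + db_servers.nightly_workload_gb()
throughput = required_throughput_gb_per_hour([file_servers, db_servers], 6)

print(f"Total nightly workload: {total_gb:.0f} GB")
print(f"Required throughput: {throughput:.0f} GB per hour")
```

Changing the change rates or the backup window in this sketch shows quickly how sensitive the throughput requirement is to the agreed assumptions.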
Size for Both Backup and Restore • Don't forget to size for Restore! • The worst time to find out that your system is undersized for restore is when you must do a large scale restore--by then it's too late to do anything about it • Determine which systems, clients, databases are most important • Determine the throughput requirement for a full system restore of all of the most important clients • In some cases you might want to size primarily for Restore and secondarily for Backup • It's important to practice your Restores after you have implemented • You can then validate your sizing estimates • You can make tuning or configuration adjustments before you are forced to do a “real” restore
Transports – LAN/WAN Design Considerations • Never assume the maximum theoretical throughput for a LAN or WAN. Often reality is much worse. • Gigabit Ethernet performance is often CPU bound. Be very careful with this on TSM servers. • Do real world FTP testing if you can. • Be conservative in your network assumptions.
Transports – LAN/WAN Design Considerations [Diagram: TSM clients attached through an Ethernet hub and through a switch; labels: 10 Mbytes/sec max, 100 Mbit, 1 Gbit] • Understand the bandwidth limitation(s) in your network • On a single non-switched network, the rated speed is the maximum throughput for ALL workload on that network • Consider using an additional adapter on the TSM Server for switched networks
Transports – LAN/WAN Design Considerations • Start with the overall backup/restore throughput that you are trying to achieve • Find a way to configure the network to achieve that level of throughput • Additional adapters on a switched network • Dedicated backup network • Gbit Ethernet • SAN and LAN-Free backup • Don't forget to factor in other workload that must share network bandwidth
LAN-Free Considerations • Although data movement occurs over the SAN infrastructure, metadata must still be sent over the LAN. • Speeds will not be anywhere near fibre channel native speed. • LAN-Free data movement is most appropriate for servers with large files and large amounts of data (e.g. mail and database). • File servers (e.g. Netware and Win2k file servers) are not necessarily good candidates for LAN-Free.
TSM Database Sizing Considerations • TSM database volumes should be spread across multiple disks. • A smaller DB volume size (e.g. 4 GB) typically performs better. • In enterprise environments, the database should be located on the fastest disks/configuration possible. • The TSM database should be mirrored (use TSM mirroring) on a separate set of disks. • Database data has far less impact on TSM DB size (large files, few in number). • Average TSM DB size is 40-60GB; some are as large as 80-100GB (getting big). • Rule of thumb: 3-5% of total file system data • 20 servers * 60GB (of file data) = 1200GB total • 3-5% = 36-60GB for TSM DB • Each file/object occupies 600 bytes of space • Each version occupies 200 bytes • Each copy storage pool entry occupies 200 bytes • Space managed files occupy an additional 200 bytes • Average file size = 100KB (0.1MB) • 1200GB * 1000 = 1,200,000MB • 1,200,000MB / 0.1MB = 12,000,000 files (12 million files) • Primary backup -- 12,000,000 files * 600 bytes = 7.2GB • 14 versions -- 12,000,000 files * 200 bytes * 14 = 33.6GB • Versions plus copy storage pool entries -- 33.6GB * 2 = 67.2GB • Total DB size -- 67.2GB + 7.2GB = 74.4GB
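A minimal Python sketch of the per-object estimate follows. It assumes, as the slide's doubling of the version figure suggests, that one copy storage pool entry of 200 bytes is recorded per version; that reading and the function name estimate_db_size_gb are assumptions of this example, and the byte counts are the slide's rules of thumb rather than measured values.

```python
# Per-entry sizes quoted on the slide (rules of thumb, not measurements).
BYTES_PER_OBJECT = 600
BYTES_PER_VERSION = 200
BYTES_PER_COPY_POOL_ENTRY = 200


def estimate_db_size_gb(total_data_gb, avg_file_kb, versions, copy_pools=1):
    """Estimate TSM database size in GB (decimal units, as on the slide).

    Assumption: each copy storage pool holds one entry per version.
    """
    # 1 GB = 1,000,000 KB in the slide's decimal arithmetic.
    files = total_data_gb * 1_000_000 // avg_file_kb
    primary_bytes = files * BYTES_PER_OBJECT
    version_bytes = files * BYTES_PER_VERSION * versions
    copy_bytes = files * BYTES_PER_COPY_POOL_ENTRY * versions * copy_pools
    return (primary_bytes + version_bytes + copy_bytes) / 1e9


# Slide example: 1200 GB of file data, 100 KB average file size, 14 versions.
db_gb = estimate_db_size_gb(total_data_gb=1200, avg_file_kb=100, versions=14)
print(f"Estimated TSM DB size: {db_gb:.1f} GB")

# The 3-5% rule of thumb for the same data, for comparison.
low_gb, high_gb = 1200 * 3 / 100, 1200 * 5 / 100
print(f"Rule of thumb: {low_gb:.0f}-{high_gb:.0f} GB")
```

The per-object estimate lands above the 3-5% range for this example, which illustrates how a small average file size and a high version count push the database size up.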
TSM Server Hardware Considerations • A Unix box may perform up to 2X faster than a Windows box • S/390 may perform similarly to a high-end Windows box • S/390 and Unix scale well with multiple server instances • File size is important, for everyone, not just TSM! • Rule of Thumb Configurations: • Intel TSM servers, 2 or 4 CPUs (4 CPU capable) • Unix TSM servers, 4-6 CPUs (8-12 CPUs possible with 2 installs of TSM) • 1 GB of memory per CPU • 2 tape drives per CPU • Enough SCSI, Network and HBA cards to meet performance and availability requirements • Bus speed and expansion capabilities to meet performance requirements
Server Instance and Placement Considerations • Server placement will usually be driven by: • Network infrastructure • Disaster recovery considerations • Number of server instances will usually be driven by: • Server throughput capability • How many tape drives can I attach? • How much data can be moved through the bus? • How many CPUs can be put in a system? • Size of TSM Database • 40-100 GB per instance is a good rule of thumb • Remember, one physical system can run multiple TSM instances.
TSM Disk Storage Pool Considerations • Typical Assumptions • File system data is cached to the disk storage pool and migrated to tape after the backup window has ended. • DB data is backed up direct to tape. • Have a large enough disk storage pool to store at least 1 night's worth of file system backup data plus 20%. • Some users have a large enough disk storage pool for multiple days or weeks. • For faster restores, allow the disk storage pool to cache migrated files. • The disk storage pool can be located on SAN attached disk. • The write performance of the disk storage pool is important. Many users choose not to mirror the disk storage pool.
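The "one night plus 20%" guideline can be expressed as a short calculation. The sketch below is illustrative only; the function name disk_pool_size_gb is hypothetical, and the 60GB nightly file system workload is carried over from the backup workload example earlier in the deck.

```python
def disk_pool_size_gb(nightly_file_gb, nights=1, headroom=0.20):
    """Disk storage pool size: N nights of file system backups plus headroom."""
    return nightly_file_gb * nights * (1 + headroom)


# 60 GB of nightly file system data (20 file servers * 3 GB each).
one_night_gb = disk_pool_size_gb(60)
one_week_gb = disk_pool_size_gb(60, nights=7)

print(f"Pool for one night: {one_night_gb:.0f} GB")
print(f"Pool for one week:  {one_week_gb:.0f} GB")
```

DB data is excluded here because the typical assumption above sends it direct to tape.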
Tape Drive Considerations • Throughput • Use 80% of the tape drive's theoretical maximum uncompressed throughput for large files. • 3590-E1A 14MB/s = 49 GB/hr, at 80% = about 40 GB/hr • 3580 LTO 15MB/s = 53 GB/hr, at 80% = 42 GB/hr • File systems should be backed up to a disk storage pool when using a LAN • Database data should be backed up to a tape storage pool. • Mount points • May need to perform a restore during migrations, copy pool operations, or reclamations, or when a drive is broken. • May need twice as many tape drives for copy pool operations. • LAN-Free backup of DB data to tape may need additional tape drives. • Try to always configure a minimum of 4 drives in a physical library per TSM server (absolute minimum 2 drives)
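The throughput figures above follow from converting MB/s to GB/hr and derating to 80%. The Python sketch below reproduces that conversion and then estimates a drive count for a workload. The 1024 MB per GB factor is inferred from the slide's numbers, and the drive count example (1200GB of DB data in a 6-hour window, taken from the earlier workload slide) is an illustration added here, not a figure from the presentation. Function names are hypothetical.

```python
import math

MB_PER_GB = 1024  # inferred from the slide: 14 MB/s is quoted as 49 GB/hr


def gb_per_hour(mb_per_sec):
    """Convert a drive's rated MB/s to GB/hr."""
    return mb_per_sec * 3600 / MB_PER_GB


def planning_gb_per_hour(mb_per_sec, derate=0.80):
    """Planning throughput: 80% of the rated uncompressed speed."""
    return gb_per_hour(mb_per_sec) * derate


def drives_needed(workload_gb, window_hours, mb_per_sec):
    """Drives required to move a workload within a window (backup only).

    Does not include extra drives for restores, copy pool operations,
    reclamation, or a broken drive, which the slide says to allow for.
    """
    required = workload_gb / window_hours
    return math.ceil(required / planning_gb_per_hour(mb_per_sec))


for name, rate in (("3590-E1A", 14), ("3580 LTO", 15)):
    print(f"{name}: {gb_per_hour(rate):.1f} GB/hr rated, "
          f"{planning_gb_per_hour(rate):.1f} GB/hr for planning")

# Illustrative only: 1200 GB of DB data direct to tape in a 6-hour window.
lto_drives = drives_needed(1200, 6, 15)
print(f"3580 LTO drives for the DB workload: {lto_drives}")
```

The result is a floor, not a recommendation: the mount point considerations above usually push the real drive count higher.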
Tape Library Considerations • Typically allow for only 65% tape utilization due to expiring data. • Consider yearly growth • Consider tape drive compression and type of data • Usually 1.5x is a good number for compression • Some data does not compress much at all • Allow additional slots for TSM DB backups, scratch and copy pool tapes, etc. • It is not unusual to recommend libraries with 5-10X the capacity of the raw data • More DB data than file system data drives slot numbers higher • More versions or high growth drives slot numbers higher • Additional slots are required for collocation
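A rough slot estimate can combine the 65% utilization and 1.5x compression figures above. In the sketch below, the stored data volume, cartridge capacity, growth rate, and extra slot count are all hypothetical placeholders chosen for illustration; none of them come from the presentation, and the function name slots_needed is likewise an assumption.

```python
import math


def slots_needed(stored_gb, cartridge_native_gb, compression=1.5,
                 utilization=0.65, yearly_growth=0.0, extra_slots=0):
    """Estimate library slots for a given amount of retained data.

    compression and utilization default to the slide's rules of thumb;
    extra_slots covers DB backups, scratch, and copy pool tapes.
    """
    # Usable capacity per cartridge after compression and expiring data.
    effective_gb = cartridge_native_gb * compression * utilization
    grown_gb = stored_gb * (1 + yearly_growth)
    return math.ceil(grown_gb / effective_gb) + extra_slots


# Hypothetical inputs: 10,000 GB retained on 100 GB (native) cartridges.
base_slots = slots_needed(10_000, 100)
planned_slots = slots_needed(10_000, 100, yearly_growth=0.20, extra_slots=20)

print(f"Slots for current data: {base_slots}")
print(f"Slots with 20% growth and 20 extra slots: {planned_slots}")
```

For data that does not compress, passing compression=1.0 shows how quickly the slot count rises.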
The TSM Schedule • Back up the data (overnight) • Cut disaster recovery copies • Back up the TSM DB • Produce DR plan and take tapes off-site • Migration • Expire old files • Run reclamation • Fix any broken hardware (maintenance window) • Start Over
The TSM Schedule (The Show Must Go On) • The schedule cannot stop because: • A restore must occur • A tape drive is broken • Although certain items can be postponed in the event of an emergency, they must still eventually be done in full! • Reclamation • Storage pool and disaster recovery/offsite copies
Restore Performance Considerations • Restores are only as good as the organization of data on the tape. • Use collocation or other means to minimize fragmentation • Keep # of versions to a minimum • Tape technology, mount time, and seek time affect restore time • Clients often take longer to write data on file system in a restore operation • The bandwidth of a client system as well as the disk performance will often be a bottleneck • Restores are not incremental. You may have to restore all the data on a system.
TSM Performance Numbers (Backup Example) H80 2 Way 450 MHz, 1 GB RAM, 8 Win2k Clients, 2 GB Ethernet Adapters using Jumbo Frames, 3580 LTO Drives
TSM Performance Numbers (Restore Example) H80 2 Way 450 MHz, 1 GB RAM, 8 Win2k Clients, 2 GB Ethernet Adapters using Jumbo Frames, 3580 LTO Drives
Best Practices for TSM Design and Sizing • Understand your environment! • Total Amount of Data • Nightly Data • Retention Requirements • Backup and Restore SLAs • Growth Rates • Work with network resources to “right size” network. Verify network throughput using base tools (e.g. FTP). • Use LAN-Free where it fits best. Usually this is for systems with large amounts of data and large files.
Best Practices for TSM Design and Sizing • As your TSM database starts approaching 80-100 GB, think about multiple instances. • Have realistic expectations of tape and disk performance. • Never have more tapes than slots in a TSM Library. • Size the TSM server and library so that there is some room for growth. A system that is teetering on the edge of feasibility is a maintenance nightmare!
A Good Design Requires Focus on Getting Inputs and Providing Cost Effective Outputs • How much and what types of data do you have? (Environment) • What are your service level agreements (SLAs), growth expectations, and retention policies? (Requirements) • How much hardware will it take to meet those SLAs and is it worth the cost? (Cost/Benefit Analysis)