Tivoli Storage Manager Lunch and Learn Series – December 2010 – TSM V6 Disk Tuning – Dave Canan ddcanan@us.ibm.com – IBM Advanced Technical Skills
ATS Team • Dave Canan ddcanan@us.ibm.com • Dave Daun djdaun@us.ibm.com • Tom Hepner hep@us.ibm.com • Eric Stouffer ecs@us.ibm.com
Topics – TSM V6 Disk Tuning • Why is this topic so difficult to present? • Common pitfalls with tuning disk and TSM • Best practices for the TSM DB2 database, logs, and storage pools • How can I now tell if I have a disk bottleneck with TSM V6? • Not all disk subsystems are alike: TSM V6 with DSxxxx subsystems (excluding DS8xxx), TSM V6 with DS8xxx, TSM V6 with SVC, TSM V6 with XIV, TSM V6 with ??? (discussion) • Reference material
Why is this topic so difficult to present? • There are conflicting ideas on some of the ways to do disk tuning. Examples: What is the best segment size for storage pools? Should I choose software mirroring, hardware mirroring, or no mirroring? Is RAIDX better than RAIDY? Do I still need to worry about separation of TSM components? • TSM V6 continues to evolve. What works today as a best practice might change in the future. • Multiple levels of storage virtualization make the task of disk tuning more confusing. • Storage technology is also changing (and sometimes it seems to be changing in the wrong direction as far as performance is concerned): SSD is getting cheaper and more common (better performance – GOOD), while larger-capacity disks are now produced at the same rotational speed (rpm) as before (slower performance – BAD).
Why is this topic so difficult to present? • Disk tuning is only part of the puzzle. People focus on disk layout, but also need to be concerned with: • HBA speed • Sharing disk and tape on the same HBA (still not recommended) • Disk type and speed • Type of RAID • Data sharing • Filesystem type and mount options • Speed and number of CPUs and amount of memory • Multiple instances of TSM on the same subsystem and host
Why is this topic so difficult to present? • There are many different subsystems now available. We can't understand all of them, all their features, and their technologies. In some cases, a "best practice" may not apply to all subsystems. • Benchmarking takes time and is expensive. We can't benchmark all subsystems or all the configurations they support, and it is difficult to do production-type benchmarking on them. We (IBM) are not allowed to share disk benchmarking information. • Disk setup takes time. We often get last-minute questions on how to set things up without having all the information (service level agreements, backup windows, amount of data, configuration details), and it is sometimes hard for customers to gather what we need. • And most important: best practices mean more cost. Example: in some cases the best DB performance involves wasting disk space. Are you willing to do that? Or, if I tell you RAID10 is the best, are you willing to spend the money for twice as much disk?
Common Pitfalls with Tuning Disk and TSM • People set up disk with TSM based on capacity, not on performance. This can be another philosophical argument: storage administrators often believe that the technology will handle any performance issues that arise. Some technologies can do this, but not all. For the best performance on most disk subsystems, the strong recommendation is still to separate the TSM components (DB, active log, storage pools). • TSM systems are configured once and then not monitored. • There are too many kinds of mirroring going on (limit it to one kind). • TSM best practices for housekeeping tasks are not followed. • Out-of-date documentation is used. Some of the recommendations made early on in V6 have been revised. (Suggestion: the TSM Wiki tends to contain current recommendations.)
Tivoli Storage Manager V6 Best Practices for the TSM DB2 Database, Logs, and Storage Pools
Best Practices for the TSM V6 DB • Use fast, low-latency disks for the DB. (SSD is great for the DB, but is still expensive.) Don't use the slower internal disk included by default in most AIX servers, or consumer-grade PATA/SATA disk in a Linux or Windows system. • Use multiple database containers. For an average-size DB, it is recommended to use at least 4 containers initially, spread across 4 LUNs / physical disks. Larger TSM servers, or TSM servers planning on using data deduplication, should have 8 or more containers. (An exception to multiple LUNs can be XIV.) • Plan for growth with additional containers up front. Adding containers later can result in an imbalance of IO and create hot spots.
Best Practices for the TSM V6 DB • Place each database container in a different filesystem. This improves performance; DB2 will stripe the database data across the various containers. TSM supports up to 128 containers for the DB. • There should be a ratio of one database directory, array, or LUN per inventory expiration process.
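As an illustration of the multiple-container recommendation, a new V6 server instance could be formatted with four database directories, each in its own filesystem. This is a minimal sketch only; the paths and log sizes are hypothetical:

    dsmserv format dbdir=/tsmdb/fs01,/tsmdb/fs02,/tsmdb/fs03,/tsmdb/fs04 \
        activelogsize=65536 activelogdirectory=/tsmlog/active \
        archlogdirectory=/tsmlog/archive archfailoverlogdirectory=/tsmlog/failarch

DB2 then stripes the database across the four directories.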
Best Practices for the TSM V6 DB (cont.) • The block size for the DB varies depending on the tablespace; most are 16K, but a few are 32K. Segment/strip sizes on disk subsystems should be 64K or 128K. • If using RAID, define all your LUNs with the same size and type. (For example, don't mix 4+1 RAID5 and 4+2 RAID6 together.) • RAID10 outperforms RAID5 (when doing large numbers of writes), but comes at the cost of twice as much disk being needed. • Smaller-capacity disks are better than larger ones if they have the same rotational speed. • Have containers on disks that have the same capacity and IO characteristics. (For example, don't mix 10K and 15K rpm drives for the DB containers.)
Best Practices for the TSM V6 Logs • Use faster disks for the Active Logs. Do not mix active logs with disks containing the DB, archive logs, or system files such as page or swap space. • Can use slower disks for archive logs and failover archive logs. • Cache subsystem readahead is good to use for the active logs; it helps in archiving them faster. • RAID1 is good for active logs. • Highly recommended that FailoverArchiveLog space be set aside for possible emergency use. Slower disks can also be used for FailoverArchiveLog space.
Best Practices for the TSM V6 Storagepools • Disk subsystems detect readahead on a LUN-by-LUN basis. If you have multiple reads going against a LUN, this detection fails. What this means is that you should have more LUNs of a smaller size; but too many LUNs can be harder to manage. • Use pre-allocated volumes rather than scratch volumes. Scratch volumes will cause file fragmentation, and each new fragment means cache readahead stops. Use fileplace (AIX), filefrag (Linux), or contig (Windows) to see if file fragmentation is happening. • Pre-define volumes one at a time. At V6.3, "define volume" performance is much faster. • If you use devclass DISK for your storagepools, have 1 storagepool volume per filesystem, and have no more than N volumes for an N+1 RAID5.
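For example, pre-allocated DISK volumes might be defined one at a time and then checked for fragmentation. A sketch only; the pool name, paths, and size below are hypothetical:

    define volume BACKUPPOOL /tsmstg/fs01/vol001.dsm formatsize=51200
    define volume BACKUPPOOL /tsmstg/fs02/vol002.dsm formatsize=51200
    fileplace -v /tsmstg/fs01/vol001.dsm     (AIX)
    filefrag /tsmstg/fs01/vol001.dsm         (Linux)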
Best Practices for the TSM V6 Storagepools • If you use devclass FILE, then you need at least MAXSESSIONS worth of volumes defined, but this will impact performance: the resulting IO will be more random in nature. • Buy as much disk cache for the storagepool volumes as you can afford. • DIO (Direct IO) is enabled by default for stgpools. The DIRECTIO option is not displayed with "q opt"; use the "query option directio" command to see what it is set to. • You MIGHT benefit from DIRECTIO NO if your disk subsystem has a small amount of cache on it. However, using file system cache (DIRECTIO NO) can decrease overall server throughput and increase server processor utilization. Also, if you have set this and call TSM support with a performance issue, you should tell them that you have set it to NO. Setting this to NO is not a best practice.
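A quick way to check the current setting from an administrative command-line client (a sketch; the administrator ID and password are placeholders):

    dsmadmc -id=admin -password=xxxxx "query option directio"

Overriding the default is done with the DIRECTIO option in dsmserv.opt, which, as noted above, is not a best practice.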
Tivoli Storage Manager V6 How Can I Now Tell If I Have A Disk Bottleneck With TSM V6?
How Can I Now Tell If I Have A Disk Bottleneck With TSM V6? • TSM V5 server instrumentation had the ability to look at individual DB/Log volumes. TSM V6 server instrumentation doesn't give us this level of detail, although V6.2.3/V6.1.5.0 did introduce improvements in this area. • The IOSTAT, SAR, and VMSTAT commands (free) are good for UNIX platforms. • Linux and AIX also support nmon (free). Many customers now run this constantly and keep the data for historical purposes. Nmon has relatively low overhead and is valuable in the data it provides. • Perfmon (free) is good for Windows platforms. • Many other tools are also available at additional cost.
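For continuous collection, nmon is typically run in recording mode; for example, one snapshot per minute for 24 hours (the interval and count here are arbitrary):

    nmon -f -s 60 -c 1440

The -f flag writes a .nmon file in the current directory that can be post-processed later (for example, with the nmon analyser spreadsheet).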
IOSTAT Command – Things to look for that may indicate a disk IO bottleneck With TSM V6. • Many parameters on this command. Recommendations: iostat -DlT 10 (AIX); iostat -xtks interval iterations (Linux); iostat -xtc interval iterations (Solaris); "sar -d" is also useful. • This command has low overhead; many customers run it all the time and prune entries after N days. • Look for the column indicating queues are full ("qfull" on AIX). It indicates how many times an IO request was blocked because the OS queue for the disk was full. The value represents a count since the last iteration. Consistently high numbers are bad: they are an indication that there isn't enough parallelism at the OS layer, and might be an indication that queue depth is incorrectly set (non-IBM disk subsystems have a default value of 1). • Remember – queue_depth is at the LUN level, so for parallelism it can be better to have ten 100GB LUNs than one 1TB LUN.
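On AIX, the queue depth for a LUN can be checked and changed per hdisk. A sketch only; the hdisk name and value are examples, and appropriate values depend on the disk subsystem:

    lsattr -El hdisk4 -a queue_depth
    chdev -l hdisk4 -a queue_depth=32 -P     (with -P the change is applied at the next reboot)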
IOSTAT Command – Things to look for that may indicate a disk IO bottleneck With TSM V6. • Also look at the "tps" column. This is the indicator for IOs per second (IOPS). • Long averages of read/write/service times can also indicate a problem. There are different opinions on what constitutes a "long service value." TSM likes service values less than 5 ms for log and DB reads/writes.
VMSTAT Command – Things to look for that may indicate a disk IO bottleneck With TSM V6. • Many parameters on this command. One recommendation (AIX version) is: vmstat -Itw 10 • This command has low overhead; many customers run it all the time and prune entries after N days. • Look at the "b" column if you are using "cooked" filesystems. It shows how many threads are blocked on IO. The TSM DB for V6 must reside on a filesystem. • Look at the "p" column if you are using raw LVs. It shows how many threads are blocked on IO. Note: TSM V6 does not use raw LVs for the DB.
NMON (Diskbusy Tab) Example #1 (some disks having high IO) – chart of the weighted average busy percentage per disk. This is the critical thing to look for: look for disks that are consistently over 50% busy (shown in red). Also see next slide.
Perfmon Counters (Windows) • Physical Disk: Avg. Disk Sec./Transfer – should be less than 25ms • Physical Disk: Avg. Disk Queue Length – optimal when the value is 2-3 times the number of disks in the array • Physical Disk: Avg. Disk Bytes/Transfer – the stripe size for the array should be at least the average of this counter • Physical Disk: Disk Bytes/sec – the sum of this counter for all disks attached to a given controller should be less than 70% of the theoretical throughput • Physical Disk: Split IO/sec – a non-zero value for this counter indicates disk fragmentation
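These counters can also be captured from the command line with typeperf; for example, sampling every 15 seconds for an hour (the interval, count, and output file below are arbitrary):

    typeperf "\PhysicalDisk(*)\Avg. Disk sec/Transfer" "\PhysicalDisk(*)\Avg. Disk Queue Length" -si 15 -sc 240 -o tsmdisk.csv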
DB2 STATISTICS DURING INSTRUMENTATION INTERVAL Report (V6.1.5.0/V6.2.3.0 levels) • Report showing some of the information produced from a DB2 "get snapshot for database" command. • Lots of good information here. Some of it can be used to look at DB and log disk performance. • This is reported across all containers for the DB, so it is difficult to see if a particular container is a problem. • The descriptions of the rows can be looked up in the DB2 Information Center. A sample starting place is here: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.regvars.doc/doc/r0005665.html
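The same data can also be pulled directly from DB2; as a sketch, assuming the default TSM V6 database name TSMDB1 and an instance user such as tsminst1:

    su - tsminst1
    db2 get snapshot for database on TSMDB1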
DB2 STATISTICS DURING INSTRUMENTATION INTERVAL Report – (Example)
Deadlocks detected: 0 --> 0.0/sec
Number of lock escalations: 0 --> 0.0/sec
Lock waits: 0 --> 0.0/sec
Time waited on locks(*): 0.000 sec
Locks held: 3 before, 3 after
Intern Rollbacks Due To Dlock: 0 --> 0.0/sec
Total sorts: 1108 --> 0.9/sec, 0.001 sec/sort
Total sort time(*): 967 --> 0.8/sec
Sort overflows: 1 --> 0.0/sec
Direct reads from database: 19740 --> 16.2/sec, 0.000 sec/read
Direct read time: 0.154
Direct writes to database: 31166 --> 25.6/sec, 0.000 sec/write
Direct write time: 0.221
Number of Log Pages Written: 2011 --> 1.7/sec, 0.0001 sec latency
Log Write Time: 0.217 sec
Number of Log Writes: 898 --> 0.7/sec
(Note: if log write latency is > 5 ms, the user receives a warning message in the report)
Tivoli Storage Manager V6 TSM V6 Disk Tuning with DSxxxx subsystems (excluding DS8xxx)
TSM V6 with DSxxxx subsystems (excluding DS8xxx) • Considered to be “mid-range” disk subsystems. • For best performance, TSM on these types of disk subsystems should have the TSM components separated on different physical spindles. • These subsystems offer great flexibility on how TSM can be configured. • Many types of RAID supported • Very flexible on number of disks per LUN • Can set segment/strip sizes and cache settings by LUN • Different models offer different disk types (fibre vs. SATA) • Different amounts of subsystem cache offered, usually less than a TIER1 class subsystem. • Have the ability to gather performance data to analyze.
Some Suggestions for DB Layout: A “Good” Layout (DSxxxx on fibre disks) • Option #1: 4+1 RAID5 array, stripe size 256KB, 1 container, 1 LV, DB2_Parallel_IO=*:4 • Pros: follows the stripe size recommendation • Cons: write parity (but OK if disk cache exists); pre-fetching limited to 1 container; data and indexes only spread across 5 disks
Some Suggestions for DB Layout: A “Better” Layout (DSxxxx on fibre disks) • Option #2: 4+4 RAID10 array, stripe size 256KB, 1 container, DB2_Parallel_IO=*:4 • Pros: follows the stripe size recommendation; no parity write overhead; faster reads (can read from either set of disks) • Cons: more expensive
Some Suggestions for DB Layout: An Even Better Layout (DSxxxx on fibre disks) • Option #3: 2 x (4+4) RAID10 arrays, stripe size 256KB, 2 containers, 2 LVs, DB2_Parallel_IO=*:4 • Pros: follows the stripe size recommendation; lots more physical spindles; more containers (more data is pre-fetched); no parity write overhead; faster writes, data more balanced across spindles • Cons: much more expensive
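The DB2_Parallel_IO value shown in these layouts is a DB2 registry variable; as a sketch, it would be set from the DB2 instance (TSM server) user and takes effect after the instance is restarted:

    db2set DB2_PARALLEL_IO=*:4
    db2set -all     (lists the current registry settings)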
Some Suggestions for Log Layout: A “Good” Layout (DSxxxx on fibre disks) • Option #1: JBOD disks for the active, archive, and failover logs • Pros: fast fibre disk; cache readahead on disks • Cons: single point of failure; all logs on 1 disk might be slower
Some Suggestions for Log Layout: A “Better” Layout (DSxxxx on fibre disks) • Option #2: RAID1 for the active log (or TSM active log mirroring), RAID1 for the archive log • Pros: fast fibre disk; cache readahead on disks; RAID1 mirror on the active log (could also do TSM active log mirroring here); RAID1 mirror on the archive log • Cons: all logs on 1 disk might be slower
Some Suggestions for Storage Pool Layout: A “Good” Layout (DSxxxx on fibre or SATA disks) • Option #1: 4+1 RAID5 array, stripe size 256KB, devclass DISK, 4 LVs, 4 stgpool volumes spread across the array • Pros: follows the stripe size recommendation (full stripe write); follows the best practice that an N+1 array should have no more than N volumes • Cons: write parity (but OK if disk cache exists)
DS8xxx • TIER1 class subsystem. • Accepts several different RAID types (RAID5 and RAID10) • The size of the arrays, in terms of the quantity of disk units, is fixed. • No ability to set stripe/segment sizes • Can’t tune cache (for example, you can’t disable it for the DB LUNs). This is not usually a problem, as these TIER1 class subsystems usually have large amounts of cache on them.
DS8xxx • Differing opinions on the best ways to set these subsystems up: mix it all together and let the technology figure it out, or put the same types of workload on the same arrays. • The fixed number and size of arrays make it difficult to follow some recommended best practices for Tivoli Storage Manager. One good example is dedicating a set of disks to the TSM DB: if the decision is made to follow this practice, there might be a substantial waste of storage if the database consumes a small percentage of a given RAID array. • This is a decision that needs to be made. TSM support still recommends, even for a TIER1 subsystem such as the DS8xxx hosting a large TSM server, that this be done.
DS8xxx – One Customer Setup That Was Recommended • Used RAID5. The DS8xxx is fast and can manage heavy IO; the subsystem was dedicated to TSM. • Had plenty of cache (64GB). • TSM DBs, logs, and diskpools were on separate extent pools. • Distributed the components over as many ranks as possible. • Ranks were on different DA (device adapter) pairs. • Used as many adapters as possible on the DS8xxx and the TSM server to access LUNs.
SVC (SAN Volume Controller) Overview – diagram: an MDisk group built from LUNs on a DS4800 subsystem
SVC (SAN Volume Controller) Overview – diagram: SVC VDisks presented to the host operating system as hdisks
SVC + DSxxxx And StoragePools (SATA drives) • Goal is to maximize the opportunity to maintain sequential performance and read-ahead capability. • No more than 2 LUNs per SATA array; for 2 arrays, this would mean 4 mdisks. • DS4xxx - Stripe size / segment size: 256K • DS4xxx - Read cache = YES • DS4xxx - Write cache = YES • DS4xxx - Write cache mirror = NO (understand the implications of doing this) • DS4xxx - Read prefetch = YES (value > 1) • SVC – Cache = NONE (done with the chvdisk command; recommended with SVC code levels 6.1 and above)
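The SVC cache setting is applied per VDisk; as a sketch (the VDisk name here is hypothetical):

    svctask chvdisk -cache none stgpool_vdisk1
    svcinfo lsvdisk stgpool_vdisk1     (verify the cache attribute)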
SVC And Storage Pools (SATA drives) – sample customer • Step 1: Create a 4+P RAID5 array of 500GB drives (~2TB). • Step 2: Carve 2 LUNs from this array, each 1TB in size. • Step 3: Present these 2 LUNs to the SVC as 2 mdisks (LUN1 = Mdisk1, LUN2 = Mdisk2).
SVC And StoragePools (SATA drives) – diagram: Mdisk1 and Mdisk2 are mapped to VDisk1 and VDisk2 as "image mode" VDisks • 1 volume group with 2 PVs • 2 LVs striped across the 2 PVs • 1 filesystem per LV • Pre-allocate storagepool volumes across the filesystems
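On AIX, that layout might be built roughly as follows. A sketch only; the hdisk names, LV size, and mount point are hypothetical, and the second LV and filesystem would be created the same way:

    mkvg -y tsmstgvg hdisk10 hdisk11
    mklv -y stglv01 -t jfs2 -S 256K tsmstgvg 400 hdisk10 hdisk11     (LV striped across both PVs)
    crfs -v jfs2 -d stglv01 -m /tsmstg/fs01 -A yes
    mount /tsmstg/fs01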
SVC And Database/Log (Fibre drives) – sample customer (1TB database) • Step 1: Create two 4+1 RAID5 arrays of 146GB drives (~1.2TB). • Step 2: Carve 2 LUNs from these arrays, each 550GB in size. • Step 3: Present these 2 LUNs to the SVC as 2 mdisks (LUN1 = Mdisk1, LUN2 = Mdisk2).
SVC And Database/Log (Fibre drives) – diagram: VDisk1, VDisk2, and VDisk3 are built from extents spread across Mdisk1 and Mdisk2 • VG1 – TSM DB: 2 LVs, 2 containers, ~1TB • VG2 – Active log: 128GB
SVC And Database/Log (Fibre drives) • Again, goal is to maximize the IOPS to the fibre drives. • Consider wasting space on fastest drives to gain benefit of performance • Strive to isolate the components if possible or use extra space for install images or temp space • DS4xxx - Stripe size / segment size: 64K • DS4xxx - Read cache = NO • DS4xxx - Write Cache=YES • DS4xxx - Write Cache mirror=NO (Understand implications of doing this) • DS4xxx - Read Prefetch=NO. (value = 0) • SVC – Cache = READWRITE (done with chvdisk command)
TSM and XIV • This is an example of a subsystem where there is ongoing discussion about how to set it up optimally. (Hardware vs. Software) • Advantages of XIV include: • it has massive parallelism capabilities. • All LUN storage is randomly spread across all disk units in 1MB stripes. • This topology makes it impossible to separate the DB from the storage pools, but it is offset by the many disks that the IO is spread across. • Recommendation: • Test with the workload you have. • Monitor it for potential IO bottlenecks.
TSM and XIV • If putting the DB under XIV, then use 1 LV, but have multiple containers as recommended for other disk subsystems (at least 4, or 8 for larger environments). • For storagepools, consider the following: • If you have many volumes on too few LUNs, then you don't get much benefit from cache readahead from the XIV, and you also have to deal with fragmentation with this setup. • Each LUN has a queue depth, so if you have too few LUNs, this limits the number of outstanding IOs to a given LUN. • If you have too many LUNs, they become harder for the administrator to manage.