
DATA PROTECTOR BACKUP PERFORMANCE WITH TAPE DRIVES


smayorga

Presentation Transcript


  1. DATA PROTECTOR BACKUP PERFORMANCE WITH TAPE DRIVES

  2. [Diagram: a Data Protector cell (Cell Manager with IDB, application server, disk agents, media agents, a media agent with robotic control, tape library) shown in three layouts: local backup, SAN backup over a storage area network, and network-based backup over Gbit LAN, plus a path via NDMP. Annotations: LTO4 drives, 2:1 compression; two disk agents at 30 MB/sec each; "30 + 30 = 20?" (60 MB/sec in, 20 MB/sec out).]

  3. In This Presentation • Performance - What do we mean? • Is there a performance issue? • Is there a performance issue for HP DP Support? • Proviso • Why HP DP Support does not “do” performance. • Why HP DP Support does help with performance. • Streaming – The Secret to High Performance • Stream-Fail – The Secret to Dismal Performance • Data collection • Analysis • Options • Reference

  4. What does performance mean? • In this presentation we are talking about how fast a Backup backs up data to tape. • This is not about Restores, although the principles are similar. • This is not about Virtual Tape. • This is about Physical Tape. • All drive references here are to LTO/Ultrium. Ultrium drives have a range of streaming rates, while other drives have a single rate; otherwise the same principles apply to all commonly used backup tape drives.

  5. Performance • “Performance” can mean a number. • “Performance” can mean efficiency. • “Performance” can be subjective. • A customer who upgrades from DLT8000 to LTO3 might be pleased with the faster backups, and not realize the performance is poor compared to the capability of the LTO3. • Or that customer might be very displeased because the LTO3 backup could be slower than the DLT8000 backup. We’ll see why.

  6. Is There a Performance Issue? • Calculate: bytes backed up, divided by time, divided by number of drives. • Equals performance: how many bytes per second/minute/hour per drive. • Is this value within the STREAMING range for the drive? • No: a performance issue. • An issue for HP DP Support? Sometimes.

  7. An issue for HP DP Support? Things To Check • Recent changes? • Patches? Upgrade? New drive? • Patch: does backing out the patch help? • Upgrade: to 6.00? 6.10? • writedb/readdb to defragment the Filenames tablespace, then keep an eye on it (omnidbutil -info). • New drive? Customer claims “no compression”? • Probably not a Data Protector issue. More likely the backup is not streaming, which takes longer and reduces tape capacity.

  8. Borderline cases • Added an object to the backup and now it takes twice as long. • Two backups of same size, one takes much longer. • “Same” backup in another Cell is twice as fast. • In cases like these, HP DP Support probably will explain streaming, give advice, URLs, proviso (slide 10).

  9. Proviso (1) • AFTER conversation with customer, understanding the issue and agreeing to provide some guidance for a performance issue, HP DP Support will send advice along with a proviso.

  10. Proviso (2) For Customers _________________________________________________________________________ Please note that, in the absence of errors, performance analysis and tuning is not a Response Center service for Data Protector. However, I am familiar with some of the issues and may be able to offer some advice. If my advice does not help, you may wish to obtain the services of a performance specialist. There is excellent performance guidance at this website: HP Surestore and StorageWorks - Performance Troubleshooting and Using Performance Assessment Tools http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=lpg50460 This whitepaper for the Ultrium 960 discusses performance issues. http://h71028.www7.hp.com/ERC/downloads/5982-9971EN.pdf If email puts a space in the URL, just remove it. There are no spaces in these URLs. _________________________________________________________________________

  11. Why HP DP Support Does Not “do” performance • HP DP Support supports the Data Protector product, not its entire environment. • HP DP Support does not control the environment that Data Protector operates in. • HP DP Support personnel are not and cannot be sufficiently familiar with customer’s Data Protector environment. • There are many factors outside Data Protector that affect performance. • Performance analysis and tuning is its own specialty.

  12. Why HP DP Support DOES help with performance HP DP Support personnel: • Know some of the factors that customer might not know. • Know some of the common problems and solutions. • Know of Data Protector issues. • Know the options in Data Protector.

  13. S T R E A M I N G The obscure, misunderstood, esoteric, recondite, abstruse, arcane, shadowy, unseen, unnoticed component in tape backup performance.

  14. Streaming – The Secret to High Performance • Streaming: moving the tape continuously during a backup. Never stop/start. • Streaming requires that data be delivered to the drive above the minimum streaming rate. • Tape drives perform best when they are streaming. • Tapes hold the most data when streaming: 100% of capacity. • Performance degrades severely and precipitously when streaming is not achieved (“stream-fail”). • Stream-fail dramatically reduces capacity.

  15. Streaming Rate Varies with Compression(1) • No compression, the drive writes one byte of data for each one byte received. One byte in – one byte out. • 2:1 compression, the drive writes one byte for each two bytes received. Two bytes in – one byte out. • 4:1 compression, the drive writes one byte for each four bytes received. Four bytes in – one byte out. • 8:1 compression, the drive writes one byte for each eight bytes received. Eight bytes in – one byte out. • The greater the compression, the faster the data must be delivered to write that one byte.

  16. Streaming Rate Varies with Compression (2) • LTO1 minimum streaming rate with no compression is 6 MB/second. • LTO1 minimum streaming rate with 2:1 compression is 12 MB/second. • LTO1 minimum streaming rate with 4:1 compression is 24 MB/second. • LTO1 minimum streaming rate with 8:1 compression is 48 MB/second. • LTO4 minimum streaming rate with 8:1 compression is 320 MB/second! • Data must be delivered at these rates so the drive can stream the data onto the tape. • Lower data delivery rate results in stream-fail.
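
The rule on the two slides above can be sketched as a one-line calculation: the host must deliver data at the drive's minimum native streaming rate multiplied by the compression ratio. This is a minimal sketch, not a vendor tool. The LTO1 figure (6 MB/s) comes from the slide; the LTO4 native figure (40 MB/s) is an assumption inferred from the slide's "320 MB/second at 8:1". The function and table names are illustrative only.

```python
# Minimum native (uncompressed) streaming rates in MB/s.
# LTO1 = 6 is stated on the slide; LTO4 = 40 is inferred from the
# slide's "320 MB/second at 8:1" figure (320 / 8), so treat it as an assumption.
MIN_NATIVE_MB_S = {"LTO1": 6, "LTO4": 40}


def min_delivery_rate(drive: str, compression_ratio: float) -> float:
    """Host data rate (MB/s) needed to keep the drive streaming."""
    return MIN_NATIVE_MB_S[drive] * compression_ratio


if __name__ == "__main__":
    for ratio in (1, 2, 4, 8):
        rate = min_delivery_rate("LTO1", ratio)
        print(f"LTO1 at {ratio}:1 needs at least {rate:.0f} MB/s")
```

Any delivery rate below the computed value means the drive cannot stream at that compression ratio.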

  17. S T R E A M I N G Big deal! Of course modern tape drives are fast. So what?

  18. People naturally expect that when the rate of data delivery slows down the backup will slow down at about the same rate.

  19. It’s much worse than that! • Performance drops precipitously, down to less than 1% of best performance, because of repeated repositioning (0.25 – 3 seconds each time).

  20. Stream-Fail – The Secret to DISMAL Performance • Stream-fail • Costs Time • Costs Capacity • When the rate of data delivery drops below the minimum streaming rate, the tape repositions (stop, reverse, forward, stop), known as the “shoe-shining” effect. This is “stream-fail”. • Each stream-fail costs 0.25 - 3 seconds. • Each stream-fail leaves 10-100 MB of position-markers on the tape between data blocks. • Stream-fail can reduce performance to less than 1% and capacity to 8% or less! We Want STREAMING. We Don’t Want “S T R E A M – F A I L”. ☹ Bad, bad, bad.

  21. True Story (1) • RCE said customer’s new LTO3 was not compressing. • How did he know? How did he conclude this? • The tape was full at 50 GB and it took 8 hours to get full! • Hmmmm. LTO3 at 2:1 compression can put 800 GB on a tape and will put 50 GB on that tape in 6-18 minutes, but needs to receive 54-160 MB/second to do it. • NOTE that the compression factor is stated. It is needed to calculate the streaming rate for compressed data. ☹ He was backing up one filesystem from a simple disk. ☹ Reading the filesystem at perhaps 6 MB/s, far short of 54 MB/s. ☹ NOT STREAMING! NOT EVEN CLOSE! ☹ He had 50 GB of data intermixed with 750 GB of non-data gaps from stream-fails!

  22. True Story (2): Stream-Fail Reduces Capacity
      Streamed tape:      | data data data data data data data data data data |
          ☺ data = 100% of tape
          ☺ Elapsed time = 6-18 minutes
      Stream-failed tape: | data | space | data | space | data | space | data |
          ☹ data = 8% of tape
          ☹ non-data = 92% of tape
          ☹ Elapsed time = 480 minutes = 8 hours
          ☹ 0.8% of best performance

  23. Did It Stream? 1 TB in ten hours @2:1 (1) • 1 x LTO1 (40-100 GB/hour) - Streaming at top speed! • 2 x LTO1 - ??? • 3 x LTO1 - Not • 1 x LTO2 (67-200 GB/hour) - ??? • 2 x LTO2 - ??? • 3 x LTO2 - ??? • 1 x LTO3 (190-576 GB/hour) - ??? • 1 x LTO4 (288-864 GB/hour) - ???

  24. Did It Stream? 1 TB in ten hours @2:1 (2) • 1 x LTO1 (40-100 GB/hour) - Streaming at top speed! • 2 x LTO1 - Streaming at 50 GB/hour. • 3 x LTO1 - Not! 33 GB/hour. • 1 x LTO2 (67-200 GB/hour) - Streaming at 100 GB/hour. • 2 x LTO2 - Not! 50 GB/hour. • 3 x LTO2 - Not! 33 GB/hour. • 1 x LTO3 (190-576 GB/hour) - No way! 100 GB/hour. • 1 x LTO4 (288-864 GB/hour) - Sorry! 100 GB/hour.
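
The answers on this slide can be reproduced with a short check: divide the data by the elapsed time and by the number of drives, then compare the result with the drive's streaming range. This is a sketch under stated assumptions: the ranges are the GB/hour figures at 2:1 compression quoted on the slide, 1 TB is taken as 1000 GB, and the names `STREAMING_RANGE_GB_H` and `did_it_stream` are illustrative.

```python
# Streaming ranges in GB/hour at 2:1 compression, as listed on the slide.
STREAMING_RANGE_GB_H = {
    "LTO1": (40, 100),
    "LTO2": (67, 200),
    "LTO3": (190, 576),
    "LTO4": (288, 864),
}


def did_it_stream(drive: str, n_drives: int, total_gb: float, hours: float):
    """Return (GB/hour per drive, True if that rate is inside the streaming range)."""
    rate = total_gb / hours / n_drives
    low, high = STREAMING_RANGE_GB_H[drive]
    return rate, low <= rate <= high


if __name__ == "__main__":
    cases = [("LTO1", 1), ("LTO1", 2), ("LTO1", 3),
             ("LTO2", 1), ("LTO2", 2), ("LTO2", 3),
             ("LTO3", 1), ("LTO4", 1)]
    for drive, n in cases:
        rate, ok = did_it_stream(drive, n, total_gb=1000, hours=10)
        verdict = "streaming" if ok else "NOT streaming"
        print(f"{n} x {drive}: {rate:.0f} GB/hour per drive, {verdict}")
```

Adding drives lowers the per-drive rate, which is why the same backup streams on one LTO2 but not on two.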

  25. DATA COLLECTION

  26. Data from Customer • How many tape drives? • How many tapes? • How much data was backed up? • How long did it take? • What kind of tape drive(s)? • What generation cartridges? (LTO2 cartridge in LTO3 drive?) • Needed to calculate whether the backup streamed. • Network backup or local backup? • What is the network speed? • How many Objects? (Filesystem on UNIX; Disk on Windows) • Recent changes?

  27. NETWORK BACKUP OR LOCAL BACKUP? [Diagram: two layouts side by side, Local Backup and Network Backup. Components: Cell Console, Cell Manager with Scheduler and Session Manager, Disk Agent, Media Agent. Connections are labelled TCP/IP, except one link labelled Shared Memory.]

  28. Data in DP Session • Session Report from GUI or CLI • Command line: omnidb -session 2005/05/21-3 -report • Reports bytes per backup. Use session_devices report to get bytes per drive. • For capacity, you still must determine bytes per tape if multiple tapes.
      [Normal] From: BMA@server2.com "drive_5" Time: 05/21/05 03:01:23
          STARTING Media Agent "drive_5"
      [Normal] From: BMA@server2.com "drive_5" Time: 05/21/05 03:01:25
          Loading medium from slot 50 to device /dev/rmt/2mn
      [Normal] From: OB2BAR@server1.com "server1" Time: 05/21/05 03:02:39
          Starting OB2BAR Backup: /dbs01/0 (dbspace)
      [Normal] From: OB2BAR@server1.com "server1" Time: 05/21/05 03:18:47
          Completed OB2BAR Backup: /dbs01/0 (dbspace)
      [Normal] From: BMA@server2.com "drive_5" Time: 05/21/05 12:22:47
          COMPLETED Media Agent "drive_5"
      Backup Statistics:
          Session Queuing Time (hours)    0.00
          ----------------------------------------
          Completed Disk Agents ........  62
          Failed Disk Agents ...........  0
          Aborted Disk Agents ..........  0
          ----------------------------------------
          Disk Agents Total ............  62
          ========================================
          Completed Media Agents .......  1
          Failed Media Agents ..........  0
          Aborted Media Agents .........  0
          ----------------------------------------
          Media Agents Total ...........  1
          ========================================
          Mbytes Total .................  270983 MB
          Used Media Total .............  1
          Disk Agent Errors Total ......  0

  29. “Display Statistical Information” (1) [Screenshot callouts: Options tab, Advanced options, Other tab, Display statistical info]

  30. “Display Statistical Information” (2) • Enable “Display Statistical Info” in the backup specification (Backup GUI, select backup, Options tab, Filesystem Options, Advanced button, Other tab). • Statistical information is displayed for each Disk Agent in the Session Report.
      [Normal] From: OB2BAR@server1.com "server1" Time: 05/21/05 03:02:39
          Starting OB2BAR Backup: /dbs01/0 (dbspace)
          Directories ........ 0
          Regular Files ...... 1
          -------------------------------------------------
          Objects Total ...... 1
          Kbytes Total ....... 6181958      <-- Note that this is KB, not MB.
      At completion of Disk Agent:
      [Normal] From: OB2BAR@server1.com "server1" Time: 05/21/05 03:18:47
          Backup Profile:
          Run Time ........... 0:16:08
          Backup Speed ....... 6386.32 (KB/s)      <-- Note that this is KB, not MB.

  31. Session Objects Report • To see each object’s performance: omnirpt -report session_objects -session <sess_ID>
      omnirpt -report session_objects -session 2005/05/21-3
      Session Objects Report
      Cell Manager: server1.com
      Creation Date: 05/23/05 11:29:03
      Object Type | Client | Mountpoint | Description | Status | Mode | Start Time | Duration [hh:mm] | Size [kB] | # Files | Performance [MB/min] | Protection | # Errors | # Warnings | Device
      BAR | server1.com | /dbs01/0 | Informix | Completed | full | 05/21/05 03:01:48 | 0:16 | 6181958 | 1 | 373.81 | 07/16/05 03:01:48 | 0 | 0 | drive_5
      BAR | server1.com | /dbs02/0 | Informix | Completed | full | 05/21/05 03:17:57 | 0:14 | 6187042 | 1 | 405.05 | 07/16/05 03:17:57 | 0 | 0 | drive_5

  32. Session Devices Report • To see a tape drive’s write rate: omnirpt -report session_devices -session <sess_ID> • This is per drive, not per tape.
      omnirpt -report session_devices -session 2005/05/21-3
      Session Devices Report
      Cell Manager: server1.com
      Creation Date: 05/23/05 11:28:18
      Device  | Start             | End               | Duration | GB Written | Perf [GB/h] | # Objects | # Media
      drive_5 | 05/21/05 03:01:28 | 05/21/05 12:19:28 | 9:18     | 264.31     | 28.42       | 62        | 1
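
The Perf [GB/h] column in this report is simply GB written divided by the duration, and converting it to MB/s makes it directly comparable with a drive's minimum streaming rate. The sketch below recomputes the figure from the sample row; the helper name `gb_per_hour` is illustrative, and the MB/s conversion assumes 1 GB = 1024 MB.

```python
def gb_per_hour(gb_written: float, duration_hhmm: str) -> float:
    """Write rate in GB/hour from the report's GB Written and Duration columns."""
    hours, minutes = map(int, duration_hhmm.split(":"))
    return gb_written / (hours + minutes / 60)


if __name__ == "__main__":
    # Values from the drive_5 row of the sample report.
    rate = gb_per_hour(264.31, "9:18")
    print(f"{rate:.2f} GB/h")
    # Assumes 1 GB = 1024 MB; the result is roughly 8 MB/s.
    print(f"{rate * 1024 / 3600:.1f} MB/s")
```

At roughly 8 MB/s per drive, this session is far below the LTO3 minimum of 54 MB/s at 2:1 quoted elsewhere in the deck.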

  33. Ob2TapeStatistics (1) • global file option • Disabled by default
      # Ob2TapeStatistics=0 or 1
      # default: 0
      # If enabled, this option allows tape statistics logging into
      # media.log file.

  34. Ob2TapeStatistics(2) • Set in global file on Cell Manager • Windows C:\Program Files\OmniBack\Config\Server\Options\global C:\ProgramData\OmniBack\Config\Server\Options\global • UNIX /etc/opt/omni/server/options/global • Logged to media.log on Cell Manager • Windows C:\Program Files\OmniBack\log\Server\media.log C:\ProgramData\OmniBack\log\Server\media.log • UNIX /var/opt/omni/server/log/media.log

  35. Ob2TapeStatistics (3) • Entries are placed in media.log upon close of media:
      05/21/05 12:21:45 cf98e98f:403a4182:2e50:0001 "[DKL002] INFORMIX_60" [TAPE WRITE STATISTICS] logical drive=drive_5 errsubdel=59586 errposdel=0 total=744 toterrcorr=744 totcorralgproc=0 totb=97714108800 totuncorrerr=0
      • errsubdel = errors corrected with substantial delays
      • errposdel = errors corrected with possible delays
      • total = total number of re-writes
      • toterrcorr = total errors corrected
      • totcorralgproc = total number of times correction algorithm processed
      • totb = total blocks processed, after compression. This field has different units for different drive types. For many drives it’s bytes.
          • For LTO it’s the number of datasets, which is the Data Protector data within the data blocks.
          • For LTO1/2 the size of the dataset is 403,884 compressed bytes. Slightly larger for LTO3/4.
          • totb x 403884 bytes = data-bytes-written-to-tape
          • The data bytes written to tape, divided into the bytes sent to the drive (see Session report or Session Devices report), give the compression ratio:
            bytes-sent-to-drive-in-session / (data-bytes-written-to-tape-1 + data-bytes-written-to-tape-2 + ...) = compression ratio
          • NOTE: totb must be summed for all the tapes written by the backup, because session data-to-drive totals are not available for individual tapes.
      • totuncorrerr = total uncorrected errors
      • Usually error counts will be very low. High error rates degrade performance and merit a hardware call.
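
The compression-ratio recipe above can be sketched in two steps: pull the counters out of each [TAPE WRITE STATISTICS] entry, then divide the bytes sent to the drive by the summed bytes written to tape. This is an illustrative sketch, not a Data Protector utility. The function names are hypothetical, the entry layout is assumed to match the sample line on the slide, and `bytes_per_unit` must be chosen by the reader (403884 for LTO1/2 datasets per the slide, 1 for drives that report totb in bytes).

```python
import re

SAMPLE = ('[TAPE WRITE STATISTICS] logical drive=drive_5 errsubdel=59586 '
          'errposdel=0 total=744 toterrcorr=744 totcorralgproc=0 '
          'totb=97714108800 totuncorrerr=0')


def parse_tape_stats(line: str) -> dict:
    """Return the numeric key=value counters from one statistics entry."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def compression_ratio(bytes_sent: int, totb_values: list, bytes_per_unit: int = 1) -> float:
    """bytes sent to the drive(s) / bytes written to all tapes of the session."""
    bytes_written = sum(totb * bytes_per_unit for totb in totb_values)
    return bytes_sent / bytes_written


if __name__ == "__main__":
    stats = parse_tape_stats(SAMPLE)
    print(stats["totb"], stats["totuncorrerr"])
    # Hypothetical LTO1/2 session: 10000 datasets on one tape, 8077680000 bytes sent.
    print(compression_ratio(8_077_680_000, [10_000], bytes_per_unit=403_884))
```

Passing one totb value per tape keeps the summing requirement from the NOTE explicit.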

  36. A N A L Y S I S

  37. Measuring Backup Performance • What is the performance of each Disk Agent? • Are there enough DAs to achieve data delivery at the minimum streaming rate? • Compute overall compression ratio for the session. • Find all the tapes in media.log and add up “totb” for them, divide into bytes backed up. • Calculate what would be a streaming number of tapes used. (For LTO3, one tape per 800 GB at 2:1) • Overall, how many bytes were sent to each drive? bytes-to-drives / session duration / # of drives = bytes per time per drive. • 1.6 TB per 2 hours per 2 drives = 800 GB per hour per 2 drives = 400 GB per hour per drive • Is that a streaming rate for LTO3 with LTO3 cartridge at 2:1 compression? • Is this within streaming range for the drive?
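
The worked example on this slide comes down to two small calculations: the per-drive rate, and the number of tapes a fully streaming backup would have used. The sketch below uses the slide's own numbers (1.6 TB taken as 1600 GB, 2 hours, 2 drives, 800 GB per LTO3 tape at 2:1); the function names are illustrative.

```python
import math


def per_drive_rate_gb_h(total_gb: float, hours: float, n_drives: int) -> float:
    """Data sent to each drive per hour."""
    return total_gb / hours / n_drives


def streaming_tape_count(total_gb: float, tape_capacity_gb: float) -> int:
    """Tapes needed if every tape is filled to its compressed capacity."""
    return math.ceil(total_gb / tape_capacity_gb)


if __name__ == "__main__":
    rate = per_drive_rate_gb_h(1600, hours=2, n_drives=2)
    tapes = streaming_tape_count(1600, tape_capacity_gb=800)
    print(f"{rate:.0f} GB/hour per drive, {tapes} tapes if streaming")
```

If the session actually consumed many more tapes than `streaming_tape_count` suggests, that is a sign of capacity lost to stream-fail.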

  38. Troubleshooting Media Agent Performance • Performance is: bytes written per time period per drive. • Use “omnirpt -report session_devices -session <sess_ID>” plus Ob2TapeStatistics to calculate compression rate. • Must total the statistics for every tape written to all the drives that the backup used. • It is possible to calculate per drive only if you can determine which drive used which tapes. • UNFORTUNATELY, if a backup uses two or more drives, the only way to determine which tapes were used in which drives is to use the Backup Session report and list the storage slots used by the BMAs, and to know which tapes were in those slots at the time of the backup. That slot-tape information is not available after a tape has been moved. • Test a drive’s performance independently of Data Protector with LTT - Library and Tape Tools http://h18006.www1.hp.com/products/storageworks/ltt/index.html?jumpid=reg_R1002_USEN

  39. Troubleshooting Disk Agent (disk) performance • Use “session_objects” report • Enable DP’s “Display Statistical Info” • Shows object Backup Profile/Statistics in session report • Run VBDA (Volume Backup Disk Agent) standalone • The standalone test shows what the Disk Agent actually can do without being slowed down by tape drive repositioning. • Run test backup to /dev/null (Unix) or C:\nul (Windows). • “-profile” means “Display statistical info”.
      /opt/omni/lbin/vbda -vol /opt -trees /opt/omni -out /dev/null -profile
          Kbytes Total ....... 774347
          Run Time ........... 0:01:10
          Backup Speed ....... 11062.10 (KB/s)

  40. Troubleshooting Network Performance [Diagram: Cell Manager with IDB; disk agents, media agents and tape drives arranged as local backup, network-based backup, SAN (LAN-free) backup, and backup via NDMP.]

  41. Troubleshooting Network performance • Network backup = DA and MA on different hosts. • Network must have sufficient capacity to move data at high rates. • 10BaseT (<1 MB/sec) and 100BaseT (<10 MB/sec) are slow compared to current tape drive transfer rates. • 1000BaseT (<100 MB/sec) can handle some network backups. • 1000BaseT cannot feed an LTO3/LTO4 drive much above its minimum streaming rate (54/80 MB/s), and it can feed no more than one such drive. • Better performance is possible using SAN. • TEST: ftp large or many files between DA and MA hosts. • TEST: Run test backup to a Data Protector drive defined with /dev/null (Unix) or nul (Windows). • TEST: Use PAT - Performance Analysis Tools http://www.hp.com/support/pat
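
The network limits on this slide can be turned into a quick sanity check: divide the link's usable throughput by the drive's minimum streaming rate to see how many drives the link could keep streaming. This is a rough sketch. The link figures are the upper bounds quoted on the slide, so real results will be lower, and the names are illustrative.

```python
# Upper bounds on usable throughput in MB/s, as quoted on the slide.
LINK_CEILING_MB_S = {"10BaseT": 1, "100BaseT": 10, "1000BaseT": 100}

# Minimum streaming rates in MB/s quoted on the slide for LTO3 and LTO4.
MIN_STREAMING_MB_S = {"LTO3": 54, "LTO4": 80}


def drives_fed(link: str, drive: str) -> int:
    """How many drives the link could keep at minimum streaming rate, at best."""
    return int(LINK_CEILING_MB_S[link] // MIN_STREAMING_MB_S[drive])


if __name__ == "__main__":
    for link in LINK_CEILING_MB_S:
        for drive in MIN_STREAMING_MB_S:
            print(f"{link} -> {drive}: at most {drives_fed(link, drive)} drive(s)")
```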

  42. - OPTIONS - - SOLUTIONS - - TUNING -

  43. Tuning for Performance (1) • Ensure current patches, firmware, drivers are installed • Data Protector, Operating System, Drives, Tape Library, NSR, SAN Switch, etcetera. • Software Compression • Don’t use it - causes high CPU overhead. • If used, except for Ultrium/LTO, disable hardware compression, otherwise non-LTO drive will produce larger blocks and run slower. • Hardware Compression • On by default.

  44. Tuning for Performance (2) Data Protector Settings • Block Size • Equivalent to data transfer size. • Should be at least 64KB. • Use the Data Protector default setting, except for LTO/Ultrium. • See “Whitepaper for the Ultrium 960” link in Reference section. • Segment Size • Defines the amount of data DP writes to tape before a catalog segment is written. • Increasing this parameter will improve the importing speed of tapes and may improve write performance. • Uses more memory on the MA host.

  45. Tuning for Performance (3) Data Protector Settings • Disk Agent buffers • Set in the Data Protector tape drive definition. • It’s the number of buffers set up in memory for Disk Agents on both the Disk Agent host(s) and the BMA host. • The memory is shared if both the DAs and BMA are on the same host. • Default of 8 is reasonable. • Might help to increase number of buffers. Usually not helpful. But in one case, a Media Copy that took 24 hours was reduced to 4 hours by increasing DA buffers from 8 to 32. • Memory used on the MA host • Block Size x DA Buffers x Concurrency
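
The memory formula on this slide is easy to evaluate before changing a drive definition. The sketch below applies it to the values mentioned in the deck: a 64 KB block size, the default of 8 DA buffers and the default concurrency of 4, then the 256 KB block size with 32 buffers. The function name is illustrative, and the result covers only the buffer memory described by the formula.

```python
def ma_buffer_memory_kb(block_size_kb: int, da_buffers: int, concurrency: int) -> int:
    """Buffer memory on the Media Agent host: Block Size x DA Buffers x Concurrency."""
    return block_size_kb * da_buffers * concurrency


if __name__ == "__main__":
    default = ma_buffer_memory_kb(block_size_kb=64, da_buffers=8, concurrency=4)
    tuned = ma_buffer_memory_kb(block_size_kb=256, da_buffers=32, concurrency=4)
    print(f"default: {default} KB ({default / 1024:.0f} MB)")
    print(f"tuned:   {tuned} KB ({tuned / 1024:.0f} MB)")
```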

  46. Tuning for Performance (4) Data Protector Settings • Concurrency • Specifies the number of Disk Agents writing simultaneously to a Media Agent. • Range of 1-32, default is 4. • Has negative effect on restore performance when not doing complete restore of the backup. • Keep to minimum needed for tape drive streaming. • Data Protector IDB logging level • Limit logging to level required for restore.

  47. Tuning for Performance (5) Data Protector Settings • CRC checking • More useful with older, less robust tape & drive technology. • Allows discovery of data corruption AFTER the backup, during Verify or Restore. • High CPU usage. • See “HP Data Protector software performance white paper” in Reference section: In tests with LTO3, enabling CRC reduced backup performance by 20%.

  48. Tuning for Performance (6) Data Protector Settings • Software Compression • Helps with limited capacity LAN. • Except for LTO/Ultrium drives, disable hardware compression, otherwise, drive performance will drop and data blocks will expand. • UNIX - device file • Windows - N at the end of the SCSI path specification • High CPU • See “HP Data Protector software performance white paper” link, page 54, in Reference section: Figure 36 shows that enabling the software compression increased the CPU load from 13% to 99%. The CPU load was very high because Data Protector compressed five file systems in parallel.

  49. Tuning for Performance (7) Data Protector Settings • Set Object Order by Size • Backups typically run slower, often dramatically slower, near the end. • Concurrency effectively drops because there are no additional Disk Agents to replace a completed Disk Agent, so fewer and fewer Disk Agents are running. • With reduced Concurrency, the data delivery rate drops below the drive’s minimum streaming rate. This causes a precipitous drop in performance. • Rearrange the order in which Objects will be backed up so that Objects of about the same size are grouped together and backed up concurrently. • On the Summary tab for a Backup Specification, or on the Summary page of the wizard for a new Backup Specification, change the Object order by right-click, “Move up” or “Move down”. • The Objects stay listed in the same order but the new Object Order is shown on the far right, so either stretch the window or scroll right to see the ordering. • The Object listing can be rearranged by clicking on the “Order” column heading.
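
One way to plan the new Object order before applying it by hand in the GUI is to sort the Objects by size and read them off in groups of the drive's concurrency. This is only a planning sketch: the object names and sizes are hypothetical, and the fixed-size groups are an approximation, since Disk Agents are replaced one at a time as they complete rather than in batches.

```python
def order_by_size(objects: dict[str, int], concurrency: int) -> list[list[str]]:
    """Sort objects largest first and split them into groups of similar size."""
    names = sorted(objects, key=objects.get, reverse=True)
    return [names[i:i + concurrency] for i in range(0, len(names), concurrency)]


if __name__ == "__main__":
    # Hypothetical mount points with sizes in GB.
    sizes_gb = {"/a": 500, "/b": 20, "/c": 480, "/d": 25}
    for group in order_by_size(sizes_gb, concurrency=2):
        print(group)
```

Objects that land in the same group finish at about the same time, so the number of running Disk Agents stays closer to the configured concurrency.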

  50. Tuning: Data Protector Tape Drive Options (1) [Screenshot callouts: Devices & Media, Settings, Advanced options, Sizes] • 256 KB for Ultrium 960 and newer. See “Whitepaper for the Ultrium 960” in Reference section.
