
Understanding Disk I/O By Charles Pfeiffer (888) 235-8916 CJPfeiffer@RemoteControlDBA.com www.RemoteControlDBA.com




Presentation Transcript


  1. Understanding Disk I/O By Charles Pfeiffer (888) 235-8916 CJPfeiffer@RemoteControlDBA.com www.RemoteControlDBA.com

  2. Agenda • Arrive 0900 – 0910 • Section 1 0910 – 1000 • Break 1000 – 1010 • Section 2 1010 – 1100 • Break 1100 – 1110 • Section 3 1110 – 1200 • Break 1200 – 1330 • Section 4 1330 – 1420 • Break 1420 – 1430 • Section 5 1430 – 1520 • Break 1520 – 1530 • Q&A 1530 – 1630

  3. Section 1 • General Information • RAID • Throughput v. Response Time

  4. Who Is This Guy? • Been an independent consultant for 11 years • Sun Certified Systems Administrator • Oracle Certified Professional • Taught Performance and Optimization class at Learning Tree • Taught UNIX Administration class at Virginia Commonwealth University • Primarily focus on complete system performance analysis and tuning

  5. What Is He Talking About? • Disks are horrible! • Disks are slow! • Disks are a real pain to tune properly! • Multiple interfaces and points of bottlenecking! • What is the best way to tune disk IO? Avoid it! • Disks are sensitive to minor changes! • Disks don’t play well in the SAN Box! • You never get what you pay for! • Thankfully, disks are cheap!

  6. What Is He Talking About? (continued) • Optimize IO for specific data transfers • Small IO is easy, based on response time • Improved with parallelism, depending on IOps • Improved with better quality disks • Large IO is much more difficult • Increase transfer size. Larger IO slows response time! • Spend money on quantity not quality. Stripe wider! • You don’t get what you expect (label spec) • You don’t even come close!

  7. Where Do Vendors Get The Speed Spec From? • 160 MBps capable does not mean 160 MBps sustained • Achieved in optimal conditions • Perfectly sized and contiguous disk blocks • Streamline disk processing • Achieved via a disk-to-disk transfer • No OS or FileSystem

  8. What Do I Need To Know? • What is good v. bad? • What are realistic expectations in different cases? • How can you get the real numbers for yourself? • What should you do to optimize your IO?

  9. Why Do I Care? • IO is the slowest part of the computer • IO improves slower than other components • CPU performance doubles every year or two • Memory and disk capacity double every year or two • Disk IO throughput doubles every 10 to 12 years! • A cheap way to gain performance • Disks are bottlenecks! • Disks are cheap. SANs are not, but disk arrays are!

  10. What Do Storage Vendors Say? • Buy more controllers • Sure, if you need them • How do you know what you need? • Don’t just buy them to see if it helps • Buy more disks • Average SAN disk performs at < 1% • 50 disks performing at 1% = ½ disk • Try getting 20 disks to perform at 5% instead (= 1 whole disk)
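The "50 disks at 1% = ½ disk" arithmetic can be sketched in a few lines. This is only an illustration of the slide's sums; the function name and the idea of expressing utilization as a whole-number percentage are assumptions made for the example.

```python
def whole_disk_equivalents(disk_count: int, utilization_pct: float) -> float:
    """Work delivered by disk_count disks, each running at utilization_pct
    percent of what it could do, expressed in fully used disks."""
    return disk_count * utilization_pct / 100


# The two cases from the slide.
print(whole_disk_equivalents(50, 1))  # 0.5 -> half a disk's worth of work
print(whole_disk_equivalents(20, 5))  # 1.0 -> one whole disk's worth of work
```

The point of the comparison: fewer disks driven harder can deliver more than many disks that are barely used.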

  11. What Do Storage Vendors Say? (continued) • Buy more cache • Sure, but it's expensive • Get all you can get out of the cheap disks first • Fast response time is good • Not if you are moving large amounts of data • Large transfers shouldn’t get super-fast response time • Fast response time means you are doing small transfers

  12. What Do Storage Vendors Say? (continued) • Isolate the IO on different subsystems • Just isolate the IO on different disks • Disks are the bottleneck, not controllers, cache, etc. • Again, expensive. Make sure you are maximizing the disks first.

  13. What Do Storage Vendors Say? (continued) • Remove hot spots • Yes, but don’t do this blindly! • Contiguous blocks reduce IOps • Balance contention (waits) v. IOps (requests) carefully! • RAID-5 is best • No, it's not; it's just easier for them!

  14. The Truth About SAN • SAN = scalability • Yeah, but internal disk capacity has caught up • SAN != easy to manage • SAN = performance • Who told you that lie? • SAN definitely != performance

  15. The Truth About SAN (continued) • But I can stripe wider and I have cache, so performance must be good • You share IO with everyone else • You have little control over what is on each disk • Hot Spots v. Fragmentation • Small transfer sizes • Contention

  16. How Should I Plan? • What do you need? • Quick response for small data sets • Move large chunks of data fast • A little of both • Corvettes v. Dump Trucks • Corvettes get from A to B fast • Dump Trucks get a ton of dirt from A to B fast

  17. RAID Performance Penalties • Loss of performance for RAID overhead • Applies against each disk in the RAID • The penalties are: • RAID-0 = None • 1, 0+1, 10 = 20% • 2 = 10% • 3, 30 = 25% • 4 = 33% • 5, 50 = 43%
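A minimal sketch of applying these penalties to a single disk's throughput. The percentages come straight from the slide; the function name and the 100 MBps example figure are hypothetical, and the sketch deliberately stops at the per-disk number because the slide says the penalty applies against each disk.

```python
# Penalty fractions as listed on the slide, keyed by RAID level.
RAID_PENALTY = {
    "0": 0.00,
    "1": 0.20, "0+1": 0.20, "10": 0.20,
    "2": 0.10,
    "3": 0.25, "30": 0.25,
    "4": 0.33,
    "5": 0.43, "50": 0.43,
}


def per_disk_after_penalty(disk_mbps: float, raid_level: str) -> float:
    """Throughput left on one disk after the RAID overhead is taken off."""
    return disk_mbps * (1 - RAID_PENALTY[raid_level])


# Hypothetical disk that sustains 100 MBps on its own.
for level in ("0", "10", "5"):
    print(level, round(per_disk_after_penalty(100, level), 1))
```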

  18. Popular RAID Configurations • RAID-0 (Stripe or Concatenation) • Don’t concatenate unless you have to • No fault-tolerance, great performance, cheap • RAID-1 (Mirror) • Great fault-tolerance, no performance gain, expensive • RAID-5 (Stripe With Parity) • Medium fault-tolerance, low performance gain, cheap

  19. Popular RAID Configurations (continued) • RAID-0+1 (Two or more stripes, mirrored) • Great performance/fault-tolerance, expensive • RAID-10 (Two or more mirrors, striped) • Great performance/fault-tolerance, expensive • Better than RAID-0+1 • Not all hardware/software offer it yet

  20. RAID-10 Is Better Than RAID-0+1 • Given: six disks • RAID-0+1 • Stripe disks one through three (Stripe A) • Stripe disks four through six (Stripe B) • Mirror stripe A to stripe B • Lose Disk two. Stripe A is gone • Requires you to rebuild the stripe

  21. RAID-10 Is Better Than RAID-0+1 • RAID-10 • Mirror disk one to disk two • Mirror disk three to disk four • Mirror disk five to disk six • Stripe all six disks • Lose Disk two. Just disk two is gone • Only requires you to rebuild disk two as a submirror
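The six-disk comparison can be modelled as a toy sketch. It only encodes the rule described on the two slides (a failed disk takes out its whole stripe in RAID-0+1, but only itself in RAID-10); the function names and list-of-lists layout are assumptions for illustration, not how any volume manager represents its metadata.

```python
def rebuild_set_raid01(stripes: list[list[int]], failed_disk: int) -> list[int]:
    """RAID-0+1: losing one disk invalidates the whole stripe it belongs to."""
    for stripe in stripes:
        if failed_disk in stripe:
            return list(stripe)
    return []


def rebuild_set_raid10(mirrors: list[list[int]], failed_disk: int) -> list[int]:
    """RAID-10: losing one disk only requires resyncing that submirror."""
    for mirror in mirrors:
        if failed_disk in mirror:
            return [failed_disk]
    return []


# Six disks, disk two fails, as on the slides.
print(rebuild_set_raid01([[1, 2, 3], [4, 5, 6]], 2))    # [1, 2, 3]
print(rebuild_set_raid10([[1, 2], [3, 4], [5, 6]], 2))  # [2]
```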

  22. The Best RAID For The Job

  23. Throughput Is Opposite Of Response Time

  24. Common Throughput Speeds (MBps) • Serial = 0.014 • IDE = 16.7, Ultra IDE = 33 • USB1 = 1.5, USB2 = 60 • Firewire = 50 • ATA/100 = 100, SATA = 150, Ultra SATA = 187.5

  25. Common Throughput Speeds (MBps) (continued) • FW SCSI = 20, Ultra SCSI = 40, Ultra2 SCSI = 80, Ultra160 SCSI = 160, Ultra320 SCSI = 320 • Gb Fiber = 120, 2Gb Fiber = 240, 4Gb Fiber = 480

  26. Expected Throughput • Vendor specs are maximum (burst) speeds • You won’t get burst speeds consistently • Except for disk-to-disk with no OS (e.g. EMC BCV) • So what should you expect? • Fiber = 80% as best-case in ideal conditions • SCSI = 70% as best-case in ideal conditions • Disk = 60% as best-case in ideal conditions • But even that is before we get to transfer size
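As a rough sketch, the best-case fractions above can be applied to a label spec to get a more realistic ceiling. The fractions are the slide's; the function name and dictionary keys are made up for the example, and the result is still before any transfer-size effects.

```python
# Best-case fraction of the label (burst) speed, per the slide.
BEST_CASE_FRACTION = {"fiber": 0.80, "scsi": 0.70, "disk": 0.60}


def best_case_mbps(spec_mbps: float, kind: str) -> float:
    """Best-case sustained MBps in ideal conditions for a given label spec."""
    return spec_mbps * BEST_CASE_FRACTION[kind]


# Label speeds taken from the Common Throughput Speeds slides.
print(round(best_case_mbps(320, "scsi")))   # Ultra320 SCSI
print(round(best_case_mbps(240, "fiber")))  # 2Gb Fiber
```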

  27. BREAK See you in 10 minutes

  28. Section 2 • Transfer Size • Mkfile • Metrics

  29. Transfer Size • Amount of data moved in one IO • Must be contiguous block IO • Fragmentation carries a large penalty! • Device IOps limits restrict throughput • Maximum transfer size allowed is different for different file systems and devices • Is Linux good or bad for large IO?

  30. Transfer Size Limits • Controllers = Unlimited • Disks and W2K3 NTFS = 2 MB • Remember the vendor Speed Spec • W2K NTFS, VxFS and UFS = 1 MB

  31. Transfer Size Limits (continued) • NT NTFS and ext3 = 512 KB • ext2 = 256 KB • FAT16 = 128 KB • Old Linux = 64 KB • FAT = 32 KB

  32. So Linux Is Bad?! • Again, what are you using the server for? • Transactional (OLTP) DB = fine • Web server, small file share = fine • DW, large file share = Might be a problem!

  33. Good Transfer Sizes • Small IO / Transactional DB • Should be 8K to 128K • Tend to average 8K to 32K • Large IO / Data Warehouse • Should be 64K to 1M • Tend to average 16K to 64K • Not very proportional compared to Small IO! • And it takes some tuning to get there!
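A small sketch for checking a measured average transfer size against the "should be" ranges above. The ranges are the slide's (1M taken as 1024K); the function name, the workload labels and the sample values are assumptions for illustration.

```python
# "Should be" ranges from the slide, in KB.
GOOD_RANGE_KB = {
    "small": (8, 128),    # small IO / transactional DB
    "large": (64, 1024),  # large IO / data warehouse
}


def transfer_size_ok(avg_kb: float, workload: str) -> bool:
    """True if the average transfer size falls in the recommended range."""
    low, high = GOOD_RANGE_KB[workload]
    return low <= avg_kb <= high


print(transfer_size_ok(16, "small"))  # True: inside 8K to 128K
print(transfer_size_ok(16, "large"))  # False: below 64K
```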

  34. Find Your AVG Transfer Size • iostat -exn (from a live Solaris server):
        extended device statistics ---- errors ---
        r/s   w/s   kr/s   kw/s  wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
        2.8   1.1   570.7  365.3 0.0  0.1  2.9    19.0   1  3  0   0   0   0   d10
      • (kr/s + kw/s) / (r/s + w/s) • (570.7 + 365.3) / (2.8 + 1.1) = 240K
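The same calculation as a minimal sketch, using the four iostat columns from the sample row. The function name is an assumption; the zero-operations guard is an addition so that an idle device does not cause a division by zero.

```python
def avg_transfer_kb(r_s: float, w_s: float, kr_s: float, kw_s: float) -> float:
    """Average KB moved per IO: (kr/s + kw/s) / (r/s + w/s)."""
    ops_per_sec = r_s + w_s
    if ops_per_sec == 0:
        return 0.0  # idle device: no transfers to average
    return (kr_s + kw_s) / ops_per_sec


# Values from the d10 row above.
print(round(avg_transfer_kb(2.8, 1.1, 570.7, 365.3)))  # about 240 (KB per IO)
```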

  35. Find Your AVG Transfer Size (continued) • PerfMon

  36. Find Your AVG Transfer Size (continued) • AVG Disk Bytes / AVG Disk Transfers • Allow PerfMon to run for several minutes • Look at the average field for Disk Bytes/sec • Look at the average field for Disk Transfers/sec

  37. The mkfile Test • Simple, low-overhead write of a contiguous (as much as possible) empty file • Really is no comparison! Get cygwin/SFU on Windows to run the same test • ‘time mkfile 100m /mountpoint/testfile’ • Real is total elapsed (wall-clock) time • Sys is CPU time spent in the kernel on the process's behalf (system calls, e.g. writing blocks) • User is CPU time spent in the process's own user-mode code

  38. The mkfile Test (continued) • User time should be minimal • CPU time spent in user space, outside the kernel • Not interacting with hardware • Does not include time spent waiting for user input; waiting at a prompt (e.g. to confirm overwriting a file) adds to real time only

  39. The mkfile Test (continued) • System time should be 80% of real time • Time in system space in the kernel • Interacting with hardware • Doing what you want, reading from disk, etc. • Real – (System + User) = WAIT • Any time not directly accounted for by the kernel is time spent waiting for a resource • Usually this is waiting for disk access
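The wait calculation can be sketched as follows. The formula and the 80% guideline are from the slide; the function names and the sample timings are hypothetical and are not measurements from any real mkfile run.

```python
def wait_seconds(real_s: float, user_s: float, sys_s: float) -> float:
    """Real - (System + User): time not accounted for by the kernel."""
    return real_s - (sys_s + user_s)


def system_share(real_s: float, sys_s: float) -> float:
    """Fraction of real time spent in system time (the slide's target is 0.80)."""
    if real_s <= 0:
        raise ValueError("real time must be positive")
    return sys_s / real_s


# Hypothetical output of 'time mkfile ...': real 10.0s, user 0.1s, sys 6.0s.
real_s, user_s, sys_s = 10.0, 0.1, 6.0
print(round(wait_seconds(real_s, user_s, sys_s), 2))  # seconds spent waiting
print(round(system_share(real_s, sys_s), 2))          # compare against 0.80
```

In this made-up run, system time is well under 80% of real time, so the remainder is wait time worth investigating.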

  40. The mkfile Test (continued) • Common causes for waits • Resource contention (disk or non-disk) • Disks are too busy • Need wider stripes • Not using all of the disks in a stripe • Disks repositioning • Many small transfers due to fragmentation • Bad block/stripe/transfer sizes

  41. The Right Block Size • Smaller for small IO, bigger for large IO • The avg size of data written to disk per individual write • In most cases you want to be at one extreme • As big as you can for large IO / as small as you can for small IO • Balance performance v. wasted space. Disks are cheap! • Is there an application block size? • OS block size should be <= app block size

  42. More iostat Metrics • iostat -exn (from a live Solaris server):
        extended device statistics ---- errors ---
        r/s   w/s   kr/s   kw/s  wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
        2.8   1.1   570.7  365.3 0.0  0.1  2.9    19.0   1  3  0   0   0   0   d10
      • %w (wait) = 1. Should be <= 10. • %b (busy) = 3. Should be <= 60. • asvc_t = 19 (ms response). Most argue that this should be <= 5, 10 or 20 with today’s technology. Again, response v. throughput.
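A minimal sketch of checking one iostat data row against these thresholds. The column order is taken from the sample header; the function names are assumptions, and the asvc_t limit is a parameter (defaulting to 20 ms here) because the slide notes that people disagree on it.

```python
COLUMNS = ("r/s w/s kr/s kw/s wait actv wsvc_t asvc_t "
           "%w %b s/w h/w trn tot device").split()


def parse_row(line: str) -> dict:
    """Map one 'iostat -exn' data row onto its column names."""
    row = dict(zip(COLUMNS, line.split()))
    return {k: (v if k == "device" else float(v)) for k, v in row.items()}


def check_row(row: dict, max_wait_pct=10, max_busy_pct=60, max_asvc_ms=20) -> list:
    """Return a list of threshold violations (empty means the row looks fine)."""
    problems = []
    if row["%w"] > max_wait_pct:
        problems.append(f"%w {row['%w']} > {max_wait_pct}")
    if row["%b"] > max_busy_pct:
        problems.append(f"%b {row['%b']} > {max_busy_pct}")
    if row["asvc_t"] > max_asvc_ms:
        problems.append(f"asvc_t {row['asvc_t']} > {max_asvc_ms}")
    return problems


row = parse_row("2.8 1.1 570.7 365.3 0.0 0.1 2.9 19.0 1 3 0 0 0 0 d10")
print(check_row(row))                  # passes with a 20 ms limit
print(check_row(row, max_asvc_ms=10))  # flags asvc_t with a stricter limit
```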

  43. iostat On Windows • Not so easy • PerfMon can get you %b • Physical Disk > % Disk Time • Not available in cygwin or SFU • So what do you do for %w or asvc_t • Not much • You can ID wait issues as demonstrated later • Depend on the array/SAN tools

  44. vmstat Metrics • vmstat:
        procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
         r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
         0  0  0 163608  77620      0      0    3    1     1     0    5    11  1  3 96  0
      • b+w = (blocked/waiting) processes • Should be <= # of logical CPUs • us(er) v. sy(stem) CPU time

  45. vmstat Metrics (continued) • Is low CPU idle bad? • Low is not 0 • Idle cycles = money wasted • Need to be able to process all jobs at peak • Don’t need to be able to process all jobs at peak and have idle cycles for show! • Better off watching the run/wait/block queues • Run queue should be <= 4 * # of logical CPUs
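The two queue rules of thumb (b+w <= logical CPUs, run queue <= 4 * logical CPUs) can be sketched like this. The limits are the slides'; the function name and the sample values are assumptions for illustration.

```python
def queue_warnings(run_queue: int, blocked: int, waiting: int,
                   logical_cpus: int) -> list:
    """Check vmstat r, b and w against the per-CPU rules of thumb."""
    warnings = []
    if blocked + waiting > logical_cpus:
        warnings.append("blocked+waiting exceeds logical CPU count")
    if run_queue > 4 * logical_cpus:
        warnings.append("run queue exceeds 4 x logical CPU count")
    return warnings


# r, b, w from the sample vmstat row, on a hypothetical 2-CPU server.
print(queue_warnings(0, 0, 0, logical_cpus=2))   # []
# A made-up busy sample on the same server.
print(queue_warnings(12, 2, 1, logical_cpus=2))  # both rules are broken
```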

  46. vmstat On Windows • Cygwin works (b/w consolidated to b)

  47. vmstat On Windows (continued) • PerfMon • System time = 100% - idle time - user time

  48. vmstat on Windows (continued) • PerfMon • Run Queue is per processor (<=4) • Block/Wait queue is blocking queue length

  49. Additional Metrics • Do not swap! • On UNIX you should never swap • Use your native OS commands to verify • Don’t trust vmstat • On Windows some swap is OK • Use PerfMon to check Pages/sec. • Should be <= 100 • Use ‘free’ in cygwin

  50. Additional Metrics (continued) • Network IO issues will make your server appear slow • ‘netstat -in’ displays errors/collisions • Collisions are common on auto-negotiate networks • Hard set the switch and server link speed/mode • Use ‘net statistics workstation’ on Windows
