
SAN Disk Metrics



Presentation Transcript


  1. SAN Disk Metrics Measured on Sun Ultra & HP PA-RISC Servers, StorageWorks MAs & EVAs, using iozone V3.152

  2. Current Situation • UNIX External Storage has migrated to SAN • Oracle Data File Sizes: 1 to 36 GB (R&D) • Oracle Servers are predominantly Sun “Entry Level” • HPQ StorageWorks: 24 MAs, 2 EVAs • 2Q03 SAN LUN restructuring using RAID 5 only • Oracle DBAs continue to request RAID 1+0 • Roadmap for the future is needed

  3. Purpose Of Filesystem Benchmarks • Find Best Performance • Storage, Server, HW options, OS, and Filesystem • Find Best Price/Performance • Restrain Costs • Replace “Opinions” with Factual Analysis • Continue Abbott UNIX Benchmarks • Filesystems, Disks, and SAN • Benchmarking began in 1999

  4. Goals • Measure Current Capabilities • Find Bottlenecks • Find Best Price/Performance • Set Cost Expectations For Customers • Provide a Menu of Configurations • Find Simplest Configuration • Satisfy Oracle DBA Expectations • Harmonize Abbott Oracle Filesystem Configuration • Create a Road Map for Data Storage

  5. Preconceptions • UNIX SysAdmins • RAID 1+0 does not vastly outperform RAID 5 • Distribute Busy Filesystems among LUNs • At least 3+ LUNs should be used for Oracle • Oracle DBAs • RAID 1+0 is Required for Production • I Paid For It, So I Should Get It • Filesystem Expansion On Demand

  6. Oracle Server Resource Needs in 3D • Chart axes: CPU, Memory, I/O • Web serving: small, integrated system • Database/CRM/ERP: storage

  7. Sun Servers for Oracle Databases • Sun UltraSPARC UPA Bus Entry Level Servers • Ultra 2, 2x300 MHz Ultra SPARC-II, Sbus, 2 GB • 220R, 2x450 MHz Ultra SPARC-II, PCI, 2 GB • 420R, 4x450 MHz Ultra SPARC-II, PCI, 4 GB • Enterprise Class Sun UPA Bus Servers • E3500, 4x400 MHz Ultra SPARC-II, UPA, Sbus, 8 GB • Sun UltraSPARC Fireplane (Safari) Entry Level Servers • 280R, 2x750 MHz Ultra SPARC-III, Fireplane, PCI, 8 GB • 480R, 4x900 MHz Ultra SPARC-III, Fireplane, PCI, 32 GB • V880, 8x900 MHz Ultra SPARC-III, Fireplane, PCI, 64 GB • Other UNIX • HP L1000, 2x450 MHz PA-RISC, Astro, PCI, 1024 MB

  8. Oracle UNIX Filesystems • Cooperative Standard between UNIX and R&D DBAs • 8 Filesystems in 3 LUNs • /exp/array.1/oracle/<instance> binaries & config • /exp/array.2-6/oradb/<instance> data, index, temp, etc… • /exp/array.7/oraarch/<instance> archive logs • /exp/array.8/oraback/<instance> export, backup (RMAN) • Basic LUN Usage • Lun1: array.1-3 • Lun2: array.4-6 • Lun3: array.7-8 (Initially on “far” Storage Node)
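
  A minimal sketch of how this 8-filesystem / 3-LUN layout could be expressed in a Solaris /etc/vfstab; the device paths below are hypothetical placeholders, not the production names, and only a few of the eight entries are shown.

    # Hypothetical /etc/vfstab fragment (Lun1 -> array.1-3, Lun2 -> array.4-6, Lun3 -> array.7-8)
    #device to mount    device to fsck       mount point                      FS   pass  at-boot  options
    /dev/dsk/c2t0d1s0   /dev/rdsk/c2t0d1s0   /exp/array.1/oracle/<instance>   ufs  2     yes      logging
    /dev/dsk/c2t0d1s1   /dev/rdsk/c2t0d1s1   /exp/array.2/oradb/<instance>    ufs  2     yes      logging
    /dev/dsk/c2t0d2s0   /dev/rdsk/c2t0d2s0   /exp/array.4/oradb/<instance>    ufs  2     yes      logging
    /dev/dsk/c2t0d3s0   /dev/rdsk/c2t0d3s0   /exp/array.7/oraarch/<instance>  ufs  2     yes      logging
    /dev/dsk/c2t0d3s1   /dev/rdsk/c2t0d3s1   /exp/array.8/oraback/<instance>  ufs  2     yes      logging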

  9. StorageWorks SAN Storage Nodes • StorageWorks: DEC -> Compaq -> HPQ • A traditional DEC Shop • Initial SAN equipment vendor • Brocade Switches resold under StorageWorks label • Only vendor with complete UNIX coverage (2000) • Sun, HP, SGI, Tru64 UNIX, Linux • EMC, Hitachi, etc… could not match UNIX coverage • Enterprise Modular Array (MA) – “Stone Soup” SAN • Buy the controller, then 2 to 6 disk shelves, then disks • 2-3 disk shelf configs have led to problem RAIDsets which have finally been reconfigured in 2Q2003 • Enterprise Virtual Array (EVA) – Next Generation

  10. MA 8000

  11. EVA

  12. 2Q03 LUN Restructuring – 2nd Gen SAN • “Far” LUNs pulled back to “near” Data Center • 6 disk, 6 shelf MA RAID 5 RAIDsets • LUNs are partitioned from RAIDsets • LUNs are sized as multiples of disk size • Multiple LUNs from different RAIDsets • Busy filesystems are distributed among LUNs • Server and Storage Node SAN Fabric Connections mated to common switch

  13. Results – Generalizations • Read Performance - Server Performance Baseline • Basic Measure of System Bus, Memory/Cache, & HBA • Good evaluation of dissimilar server I/O potential • Random Write - Largest Variations in Performance • Filesystem & Storage Node Selection • Dominant Variables • Memory & Cache – Important • Processor Cache, System I/O Buffers, Virtual Memory • All boost different data stream size performance • More Hardware, OS, & Fsys selections • To be evaluated

  14. IOZONE Benchmark Utility • File Operations • Sequential Write & Re-write • Sequential Read & Re-read • Random Read & Random Write • Others are available: • record rewrite, read backwards, read strided, fread/fwrite, pread/pwrite, aio_read/aio_write • File & Record Sizes • Ranges or individual sizes may be specified
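
  For reference, a run covering the operations above might look like the following; the flags are standard iozone switches, but the sizes, test path, and output file are illustrative rather than the exact options used in these measurements.

    # Sweep record and file sizes automatically (-a) between 64 KB and 2 GB,
    # running write/rewrite (-i 0), read/re-read (-i 1), and random read/write (-i 2).
    # The test file sits on the filesystem under test; -R -b emits a spreadsheet report.
    iozone -a -n 64k -g 2g -i 0 -i 1 -i 2 -f /exp/array.4/oradb/testfile -R -b results.xls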

  15. IOZONE – Output: UFS Seq Read

  16. IOZONE – UFS Sequential Read

  17. IOZONE – UFS Random Read

  18. IOZONE – UFS Sequential Write

  19. IOZONE – UFS Random Write

  20. Results – Server Memory • Cache • Influences small data stream performance • Memory - I/O buffers and virtual memory • Influences larger data stream performance • Large Data Streams need Large Memory • Past this limit => Synchronous performance

  21. Results – Server I/O Potential • System Bus • Sun: UPA replaced by Fireplane (Sun Fire) • Peripheral Bus: PCI vs. SBus • SBus (older Sun only) • Peak Bandwidth (25 MHz/64-bit) ~200 MB/sec • Actual Throughput ~50-60 MB/sec (~25+%) • PCI (Peripheral Component Interconnect) • Peak Bandwidth (66 MHz/64-bit) ~530 MB/sec • Actual Throughput ~440 MB/sec (~80+%)
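
  The peak figures follow directly from bus clock times bus width (64 bits = 8 bytes), ignoring arbitration and protocol overhead; a quick check in the shell:

    # peak bandwidth = bus clock (MHz) x bus width (bytes), before protocol overhead
    echo "SBus peak: $((25 * 8)) MB/sec"   # 200 MB/sec peak, ~50-60 MB/sec measured (~25%)
    echo "PCI peak:  $((66 * 8)) MB/sec"   # 528 (~530) MB/sec peak, ~440 MB/sec measured (~80%)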

  22. Server – Sun, UPA, SBus

  23. Server – Sun Enterprise, Gigaplane/UPA, SBus

  24. Server – Sun, UPA, PCI

  25. Server – HP, Astro Chipset, PCI

  26. Server – Sun, Fireplane, PCI

  27. Results – MA vs. EVA • MA RAID 1+0 & RAID 5 vs. EVA RAID 5 • Sequential Write • EVA RAID 5 is 30-40% faster than MA RAID 1+0 • EVA RAID 5 is up to 2x faster than MA RAID 5 • Random Write • EVA RAID 5 is 10-20% slower than MA RAID 1+0 • EVA RAID 5 is up to 4x faster than MA RAID 5 • Servers were SunFire 480Rs, using UFS+logging. • EVA: 12x72 GB FCAL Disk RAID 5 partitioned LUN • MA: 6x36 GB SCSI Disk RAIDset

  28. RAID 0 RAID 1

  29. RAID 3 RAID 5

  30. RAID 1+0 RAID 0+1

  31. Results – MA RAIDsets • Best: 3 mirror, 6 shelf RAID 1+0 • 3 mirror RAID 1+0 on 2 shelves only yields 80% of the 6 shelf version • 2 disk mirror (2 shelves) yields 50%

  32. Results – MA RAIDsets • Best: 3 mirror, 6 shelf RAID 1+0 • 6 disk, 6 shelf RAID 5: • Sequential Write: 75-80% • Random Write: 25-50% (2 to 4 times slower) • 3 disk, 3 shelf RAID 5: • Sequential Write: 40-60% • Random Write: 25-60% • Can outperform 6 disk RAID 5 on random write

  33. Results – LUNs from Partitions • 3 Simultaneous Writers • Partitions of the same RAIDset • Write performance (sequential or random) • Less than 50% of no-contention performance • No control test performed: • 3 servers write to 3 different RAIDsets of the same Storage Node • Where is the Bottleneck? • RAIDset, SCSI channels, or Controllers?

  34. Results – Fabric Locality • In production, “far” LUNs underperform • Monitoring “sar” disk data, “far” LUN filesystems are 4 to 10 times slower. • Fabric-based service disruptions are drawn into the server when any LUNs are not local. • This round of testing did not show wide variations in performance whether the server was connected to its Storage Node’s SAN Switch or 3 or 4 hops away.
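
  The “sar” comparison can be reproduced with the standard Solaris disk activity report; the device names below are placeholders for one “near” and one “far” LUN.

    # Sample disk activity every 60 seconds for an hour and compare
    # avwait/avserv for the near vs. far LUN devices.
    sar -d 60 60 | egrep 'device|ssd10|ssd42'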

  35. Results – UFS Options • Logging • The journaling UFS Filesystem • Advised on large filesystems to avoid a long-running “fsck”. • Under Solaris 8, logging introduces a 10% write performance penalty. • Solaris 9 advertises that its logging algorithm is much more efficient. • Forcedirectio • No useful testing without an Oracle workload
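
  Both are ordinary UFS mount options on Solaris; a brief sketch (the device and mount point are placeholders, and the vfstab options field is the persistent way to set them):

    # Turn on UFS logging for an already-mounted filesystem
    mount -F ufs -o remount,logging /exp/array.4/oradb/<instance>

    # forcedirectio bypasses the filesystem page cache; as noted above, it is
    # only meaningful to evaluate under a real Oracle workload
    mount -F ufs -o forcedirectio /dev/dsk/c2t0d2s0 /exp/array.4/oradb/<instance>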

  36. Results – UFS Tuning • Bufhwm: • Default 2% of memory, Max 20% of memory • Extends I/O Buffer effect • improves write performance on moderately large files • Ufs:ufs_LW & ufs:ufs_HW • Solaris 7 & 8: 256K & 384K bytes • Solaris 9: 8M & 16M bytes • More data is held in system buffer before being flushed. • Fsflush() effect on “sar” data: large service times
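
  These tunables live in /etc/system and take effect after a reboot; the values below are examples only (mimicking the larger Solaris 9 defaults on Solaris 8), not recommendations from the study.

    * Illustrative /etc/system entries; a reboot is required
    * Raise the buffer-cache high-water mark (bufhwm is given in KB, here 8 MB)
    set bufhwm=8192
    * Allow more dirty UFS data to accumulate before fsflush pushes it out
    set ufs:ufs_LW=8388608
    set ufs:ufs_HW=16777216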

  37. Results – VERITAS VxFS • Outstanding Write Performance • VxFS only on MA 6-disk RAID 5 • UFS on MA 6-disk RAID 5 • Sequential Write VxFS is 15 times faster • Random Write VxFS is 40 times faster • UFS on MA 6-disk RAID 1+0 • Sequential Write VxFS is 10 times faster • Random Write VxFS is 10 times faster • UFS on EVA 12-disk RAID 5 • Sequential Write VxFS is 7 times faster • Random Write VxFS is 12 times faster
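
  For comparison, creating and mounting a VERITAS filesystem on Solaris looks roughly like this; the VxVM disk group and volume names are placeholders.

    # Make a VxFS filesystem on an existing VxVM volume and mount it
    mkfs -F vxfs /dev/vx/rdsk/oradg/oravol01
    mount -F vxfs /dev/vx/dsk/oradg/oravol01 /exp/array.4/oradb/<instance>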

  38. Results – Random Write • Hardware-only Storage Node Performance • MA 1+0 = EVA RAID 5 • EVA RAID 5 pro-rata cost similar to MA RAID 5 • RAID 1+0 is Not Cost Effective • Improved Filesystem is Your Choice • Order-of-Magnitude Better Performance • Less expensive • Server Memory • Memory Still Is Important for Large Data Streams

  39. Random Write: UFS, MA, RAID 5

  40. Random Write: UFS, MA, RAID 1+0

  41. Random Write: UFS, EVA, RAID 5

  42. Random Write: VxFS, MA, RAID 5

  43. Closer Look: VxFS vs. UFS • Graphical Comparison: Sun Servers provided with RAID 5 LUNs • UFS on EMA, UFS on EVA • VxFS on EMA, VxFS on EVA • File Operations • Sequential Read • Random Read • Sequential Write • Random Write

  44. Sequential Read

  45. Random Read

  46. Sequential Write

  47. Random Write

  48. Results – VERITAS VxFS • Biggest Performance gains • Everything else is of secondary importance • Memory Overhead for VxFS • Dominates Sequential Write of small files • Needs further investigation • VxFS & EVA RAID 1+0 not measured • Don’t mention what you don’t want to sell

  49. Implications – VERITAS VxFS • Where is the Bottleneck? • Changes at Storage Node • Modest Increases in Performance • Changes within Server • Dramatically Increase Performance • The Bottleneck is in the Server, not the SAN • The relative cost is just good fortune • Changing the filesystem is much less expensive

  50. Results – Bottom Line • Bottleneck Identified • It’s the Server, not Storage • VERITAS VxFS • Use it on UNIX Servers • RAID 1+0 is Not Cost Effective • VxFS is much cheaper – Tier 1 servers • Server Memory • Memory is cheaper than Mirrored Disk • Operating System I/O Buffers • Configure as large as possible
