SAN Disk Metrics
Measured on Sun Ultra & HP PA-RISC Servers, StorageWorks MAs & EVAs, using iozone V3.152
Current Situation
• UNIX External Storage has migrated to SAN
• Oracle Data File Sizes: 1 to 36 GB (R&D)
• Oracle Servers are predominantly Sun "Entry Level"
• HPQ StorageWorks: 24 MAs, 2 EVAs
• 2Q03 SAN LUN restructuring used RAID 5 only
• Oracle DBAs continue to request RAID 1+0
• A roadmap for the future is needed
Purpose Of Filesystem Benchmarks
• Find Best Performance
  • Storage, Server, HW options, OS, and Filesystem
• Find Best Price/Performance
  • Restrain Costs
• Replace "Opinions" with Factual Analysis
• Continue Abbott UNIX Benchmarks
  • Filesystems, Disks, and SAN
  • Benchmarking began in 1999
Goals
• Measure Current Capabilities
• Find Bottlenecks
• Find Best Price/Performance
• Set Cost Expectations For Customers
• Provide a Menu of Configurations
• Find Simplest Configuration
• Satisfy Oracle DBA Expectations
• Harmonize Abbott Oracle Filesystem Configuration
• Create a Road Map for Data Storage
Preconceptions
• UNIX SysAdmins
  • RAID 1+0 does not vastly outperform RAID 5
  • Distribute Busy Filesystems among LUNs
  • At least 3+ LUNs should be used for Oracle
• Oracle DBAs
  • RAID 1+0 is Required for Production
  • "I Paid For It, So I Should Get It"
  • Filesystem Expansion On Demand
Oracle Server Resource Needs in 3D
[Figure: workloads plotted along CPU, Memory, and I/O axes. Web serving: small, integrated system. Database/CRM/ERP: storage-heavy.]
Sun Servers for Oracle Databases
• Sun UltraSPARC UPA Bus Entry Level Servers
  • Ultra 2, 2x300 MHz UltraSPARC-II, SBus, 2 GB
  • 220R, 2x450 MHz UltraSPARC-II, PCI, 2 GB
  • 420R, 4x450 MHz UltraSPARC-II, PCI, 4 GB
• Enterprise Class Sun UPA Bus Servers
  • E3500, 4x400 MHz UltraSPARC-II, UPA, SBus, 8 GB
• Sun UltraSPARC Fireplane (Safari) Entry Level Servers
  • 280R, 2x750 MHz UltraSPARC-III, Fireplane, PCI, 8 GB
  • 480R, 4x900 MHz UltraSPARC-III, Fireplane, PCI, 32 GB
  • V880, 8x900 MHz UltraSPARC-III, Fireplane, PCI, 64 GB
• Other UNIX
  • HP L1000, 2x450 MHz PA-RISC, Astro, PCI, 1024 MB
Oracle UNIX Filesystems
• Cooperative Standard between UNIX and R&D DBAs
• 8 Filesystems in 3 LUNs
  • /exp/array.1/oracle/<instance>: binaries & config
  • /exp/array.2-6/oradb/<instance>: data, index, temp, etc.
  • /exp/array.7/oraarch/<instance>: archive logs
  • /exp/array.8/oraback/<instance>: export, backup (RMAN)
• Basic LUN Usage (see the sketch below)
  • Lun1: array.1-3
  • Lun2: array.4-6
  • Lun3: array.7-8 (initially on "far" Storage Node)
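For illustration only, here is how part of that layout might appear in /etc/vfstab; the device names and slices are hypothetical and are not taken from the benchmark systems:

    # Hypothetical /etc/vfstab entries for the LUN 2 filesystems (array.4-6).
    # Fields: device to mount, device to fsck, mount point, type, fsck pass,
    # mount at boot, mount options.
    /dev/dsk/c2t1d0s0  /dev/rdsk/c2t1d0s0  /exp/array.4  ufs  2  yes  logging
    /dev/dsk/c2t1d0s1  /dev/rdsk/c2t1d0s1  /exp/array.5  ufs  2  yes  logging
    /dev/dsk/c2t1d0s3  /dev/rdsk/c2t1d0s3  /exp/array.6  ufs  2  yes  logging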
StorageWorks SAN Storage Nodes
• StorageWorks: DEC -> Compaq -> HPQ
  • A traditional DEC Shop
  • Initial SAN equipment vendor
  • Brocade Switches resold under the StorageWorks label
• Only vendor with complete UNIX coverage (2000)
  • Sun, HP, SGI, Tru64 UNIX, Linux
  • EMC, Hitachi, etc. could not match UNIX coverage
• Enterprise Modular Array (MA) – "Stone Soup" SAN
  • Buy the controller, then 2 to 6 disk shelves, then disks
  • 2-3 disk shelf configs led to problem RAIDsets, finally reconfigured in 2Q2003
• Enterprise Virtual Array (EVA) – Next Generation
2Q03 LUN Restructuring – 2nd Gen SAN
• "Far" LUNs pulled back to "near" Data Center
• 6 disk, 6 shelf MA RAID 5 RAIDsets
  • LUNs are partitioned from RAIDsets
  • LUNs are sized as multiples of disk size
• Multiple LUNs from different RAIDsets
  • Busy filesystems are distributed among LUNs
• Server and Storage Node SAN Fabric Connections mated to common switch
Results – Generalizations
• Read Performance: the server performance baseline
  • Basic measure of System Bus, Memory/Cache, & HBA
  • Good evaluation of dissimilar server I/O potential
• Random Write: the largest variations in performance
  • Filesystem & Storage Node selection are the dominant variables
• Memory & Cache: important
  • Processor cache, system I/O buffers, and virtual memory each boost performance for a different data stream size
• More hardware, OS, & filesystem selections remain to be evaluated
IOZONE Benchmark Utility
• File Operations
  • Sequential Write & Re-write
  • Sequential Read & Re-read
  • Random Read & Random Write
  • Others are available: record rewrite, read backwards, read strided, fread/fwrite, pread/pwrite, aio_read/aio_write
• File & Record Sizes
  • Ranges or individual sizes may be specified
• A sample invocation is sketched below
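A minimal iozone run covering the operations listed above; the maximum file size and the test-file path are illustrative, not the exact benchmark parameters used here:

    # -a        auto mode: sweep file and record sizes
    # -i 0/1/2  sequential write/re-write, sequential read/re-read, random read/write
    # -g        maximum file size for the sweep, in KB (4 GB here, large enough
    #           to exceed server memory and reach synchronous rates)
    # -f        temporary test file placed on the filesystem/LUN under test
    iozone -a -i 0 -i 1 -i 2 -g 4194304 -f /exp/array.4/oradb/iozone.tmp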
Results – Server Memory
• Cache
  • Influences small data stream performance
• Memory: I/O buffers and virtual memory
  • Influences larger data stream performance
• Large Data Streams need Large Memory
  • Past this limit, performance falls to synchronous (on-disk) rates
Results – Server I/O Potential
• System Bus
  • Sun: UPA replaced by SunFire (Fireplane)
• Peripheral Bus: PCI vs. SBus
  • SBus (older Sun only)
    • Peak Bandwidth (25 MHz / 64-bit): ~200 MB/sec
    • Actual Throughput: ~50-60 MB/sec (~25+% of peak)
  • PCI (Peripheral Component Interconnect)
    • Peak Bandwidth (66 MHz / 64-bit): ~530 MB/sec
    • Actual Throughput: ~440 MB/sec (~80+% of peak)
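The peak figures are simply bus clock times bus width (64 bits = 8 bytes); a quick shell check of the arithmetic:

    echo $(( 25 * 8 ))   # SBus: 25 MHz x 8 bytes = 200 MB/sec peak
    echo $(( 66 * 8 ))   # PCI:  66 MHz x 8 bytes = 528 MB/sec peak (~530)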
Results – MA vs. EVA
• MA RAID 1+0 & RAID 5 vs. EVA RAID 5
  • Sequential Write
    • EVA RAID 5 is 30-40% faster than MA RAID 1+0
    • EVA RAID 5 is up to 2x faster than MA RAID 5
  • Random Write
    • EVA RAID 5 is 10-20% slower than MA RAID 1+0
    • EVA RAID 5 is up to 4x faster than MA RAID 5
• Servers were SunFire 480Rs, using UFS with logging
• EVA: 12 x 72 GB FC-AL disk RAID 5, partitioned LUN
• MA: 6 x 36 GB SCSI disk RAIDset
Results – MA RAIDsets
• Best: 3 mirror, 6 shelf RAID 1+0
  • A 3 mirror RAID 1+0 confined to 2 shelves yields only 80% of the 6 shelf version
  • A 2 disk mirror (2 shelves) yields 50%
Results – MA RAIDsets
• Best: 3 mirror, 6 shelf RAID 1+0 (baseline for the percentages below)
• 6 disk, 6 shelf RAID 5
  • Sequential Write: 75-80%
  • Random Write: 25-50% (2 to 4 times slower)
• 3 disk, 3 shelf RAID 5
  • Sequential Write: 40-60%
  • Random Write: 25-60%
  • Can outperform 6 disk RAID 5 on random write
Results – LUNs from Partitions
• 3 simultaneous writers on partitions of the same RAIDset
  • Write performance (sequential or random) drops to less than 50% of the no-contention figure
• No control test performed: 3 servers writing to 3 different RAIDsets of the same Storage Node
• Where is the bottleneck? The RAIDset, the SCSI channels, or the controllers?
Results – Fabric Locality
• In production, "far" LUNs underperform
  • Monitoring "sar" disk data, filesystems on "far" LUNs are 4 to 10 times slower (sample command below)
  • Fabric-based service disruptions are drawn into the server when any LUN is not local
• This round of testing did not show wide variations in performance whether the server was connected to its Storage Node's SAN switch or was 3-4 hops away
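An example of the kind of monitoring referred to above: on Solaris, sar disk statistics sampled over a busy period let you compare service times across the devices backing "near" and "far" LUNs (interval and count are illustrative):

    # Sample disk activity every 5 seconds, 12 times; compare avwait/avserv
    # per device between "near" and "far" LUNs
    sar -d 5 12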
Results – UFS Options
• Logging: the journaling UFS option
  • Advised on large filesystems to avoid long-running "fsck"
  • Under Solaris 8, logging introduces a 10% write performance penalty
  • Solaris 9 advertises a much more efficient logging algorithm
• Forcedirectio
  • No useful testing without an Oracle workload
• Example mount commands are sketched below
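How the two options are applied on Solaris; the device and mount point are illustrative only, and the two commands show alternatives, not a sequence:

    mount -F ufs -o logging       /dev/dsk/c2t1d0s0 /exp/array.4   # journaling UFS
    mount -F ufs -o forcedirectio /dev/dsk/c2t1d0s0 /exp/array.4   # bypass the page cache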
Results – UFS Tuning
• bufhwm
  • Default 2% of memory, maximum 20% of memory
  • Extends the I/O buffer effect; improves write performance on moderately large files
• ufs:ufs_LW & ufs:ufs_HW
  • Solaris 7 & 8 defaults: 256 KB & 384 KB
  • Solaris 9 defaults: 8 MB & 16 MB
  • More data is held in the system buffer before being flushed
  • fsflush() of the larger buffers shows up in "sar" data as large service times
• Sample /etc/system entries are sketched below
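A sketch of how these tunables are set in /etc/system; the values shown are illustrative, not recommendations, and a reboot is required for them to take effect:

    set bufhwm=8192            # buffer cache high-water mark, in KB
    set ufs:ufs_LW=8388608     # deferred-write low-water mark, in bytes (8 MB)
    set ufs:ufs_HW=16777216    # deferred-write high-water mark, in bytes (16 MB)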
Results – VERITAS VxFS
• Outstanding Write Performance
• VxFS measured only on the MA 6-disk RAID 5, compared against:
  • UFS on MA 6-disk RAID 5
    • Sequential Write: VxFS is 15 times faster
    • Random Write: VxFS is 40 times faster
  • UFS on MA 6-disk RAID 1+0
    • Sequential Write: VxFS is 10 times faster
    • Random Write: VxFS is 10 times faster
  • UFS on EVA 12-disk RAID 5
    • Sequential Write: VxFS is 7 times faster
    • Random Write: VxFS is 12 times faster
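For context, a minimal sketch of placing VxFS on an existing LUN slice; the device and mount point are hypothetical, and older VxFS releases may also require an explicit size argument to mkfs:

    mkfs -F vxfs /dev/rdsk/c2t1d0s0                  # build the VxFS filesystem
    mount -F vxfs /dev/dsk/c2t1d0s0 /exp/array.4     # mount it in place of UFS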
Results – Random Write
• Hardware-only Storage Node performance
  • MA RAID 1+0 ≈ EVA RAID 5
  • EVA RAID 5 pro-rata cost is similar to MA RAID 5
  • RAID 1+0 is not cost effective
• An improved filesystem is the better choice
  • Order-of-magnitude better performance
  • Less expensive
• Server Memory
  • Memory still matters for large data streams
Closer Look: VxFS vs. UFS
• Graphical Comparison
  • Sun Servers provided with RAID 5 LUNs
  • Series: UFS EMA, UFS EVA, VxFS EMA, VxFS EVA
• File Operations
  • Sequential Read
  • Random Read
  • Sequential Write
  • Random Write
Results – VERITAS VxFS
• The biggest performance gains
  • Everything else is of secondary importance
• Memory overhead for VxFS
  • Dominates Sequential Write of small files
  • Needs further investigation
• VxFS & EVA RAID 1+0 not measured
  • "Don't mention what you don't want to sell"
Implications – VERITAS VxFS
• Where is the Bottleneck?
  • Changes at the Storage Node yield modest increases in performance
  • Changes within the Server dramatically increase performance
  • The bottleneck is in the Server, not the SAN
• The relative cost is just good fortune
  • Changing the filesystem is much less expensive
Results – Bottom Line
• Bottleneck identified: it's the Server, not Storage
• VERITAS VxFS: use it on UNIX Servers
• RAID 1+0 is not cost effective
  • VxFS is much cheaper on Tier 1 servers
• Server Memory: memory is cheaper than Mirrored Disk
• Operating System I/O Buffers: configure as large as possible