430 likes | 445 Views
Explore testing and performance evaluation methods for Oracle DB storage at CERN, including hardware setup, RAID configuration, and the benefits of using Oracle ASM with commodity hardware.
E N D
Evaluating and testing storage performance for Oracle DBs Luca Canali, CERN Dawid Wojcik, CERN UKOUG, Birmingham, December 1st, 2009
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik CERN and LHC experiments
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Outline • Storage for Physics DB Services @ CERN • Architecture: 10g RAC + ASM on commodity HW • DBAs and storage – our working model • Performance testing and (new) HW evaluation • Tools: ORION, SQL-based test harness • Discussion of some test results • What is interesting to measure • FC commodity HW, tests of iSCSI 1 and 10Gig • Measuring IO performance from prod DBs • Our experience of production workload measurement • Metrics for capacity planning • Testing for robustness • Pre-prod stress testing
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik CERN Set-up • Dual-CPU quad-core 2950 DELL servers, 16GB memory, Intel 5400-series “Harpertown”; 2.33GHz clock • Dual power supplies, mirrored local disks, 4 NIC (2 private/ 2 public), dual HBAs, “RAID 1+0 like” with ASM
Compressing Very Large Data Sets in Oracle, Luca Canali Mirroring Striping Striping Oracle Cluster Storage, ASM • ASM for mirroring and striping across storage arrays • Allows the use of commodity HW (mirror across arrays) • Disks can be added and removed online for scheduled and • unscheduled changes • Example: • disk groups: data and flash recovery DATADG RECODG
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Why ASM and commodity HW • Storage market is very conservative • Enterprise storage typically priced much higher than consumer storage • Opportunities • Commodity HW/grid-like solutions • provide order of magnitude gain in • cost/performance
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Service Management and DBAs • DBAs at CERN work across the full stack of activities necessary to run the DB service: • User-facing: • match users requirements with DB deployments • help developers to optimize their DB usage • Technology-facing: • Involved also in configuration and tuning of ‘lower level stack’ • Guarantee SLA, tune SQL execution, backup, security, replication • Database software installation, monitoring, patching
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik DBAs and Storage Testing • Opportunity for testing new HW • Test for performance • Test for stability • Measure what we know are the critical metrics • Often this means small-read random IOPS
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik HD – the basic element • Hard disk technology • Basic block of storage since 40 years • Main intrinsic limitation: latency
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik HD specs • HDs are limited • In particular seek time is unavoidable (7.2k to 15k rpm, ~2-10 ms) • 100-200 IOPS • Throughput ~100MB/s, typically limited by interface • Capacity range 300GB -2TB • Failures: mechanical, electric, magnetic, firmware issues. • In our experience with ~2000 disks in prod we have about 1 disk failure per week
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Enterprise disks? • Performance • Enterprise disks offer more ‘performance’ • They spin faster and have better interconnect protocols (e.g. SAS vs SATA) • Typically of low capacity • Our experience: often not competitive in cost/perf vs. SATA • Reliability • Evidence that low-end and high end disks don’t differ significantly • Still an open question
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Scaling out the disk • The challenge for storage systems • Scale out the disk performance to meet demands • Throughput • IOPS • Latency • Capacity • Sizing storage systems • Sizing must focus on critical metric(s) • Depend on the workload of the DB • Avoid ‘capacity trap’
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik RAID and redundancy • Storage arrays with RAID are the traditional approach • implement RAID to protect data. • Parity based: RAID5, RAID6 • Stripe and mirror: RAID10 • Scalability problem of RAID • For very large configurations the time between two failures can become close to RAID volume rebuild time (!) • That’s also why RAID6 is becoming more popular than RAID5
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Oracle ASM way • Physics DB storage uses Oracle ASM • Volume manager and cluster file system integrated with Oracle • Oracle files are divided in chunks • Chunks are distributed evenly across storage • Chunks are written in multiple copies (2 or 3 it depends on file type and configuration) • Allows the use of low-cost storage arrays: does not need RAID support (JBOD is enough)
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik VLDB and storage interconnect • Throughput challenge • It takes 1 day to copy/backup 10 TB over 1 GBPS network
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Techology • Several technologies available with different characteristics • SAN • NAS • iSCSI • Direct attach
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Fiber channel SAN • FC SAN is currently most used architecture for enterprise level storage • Fast and low overhead on server CPU • Used for physics DBs at CERN and Tier1s • SAN networks with 64 ports at low cost • Measured: 8 Gbps transfer rate (4+4 dual ported HBAs for redundancy and load balancing) • Proof of concept FC backup (LAN free) reached full utilization of tape heads • Scalable: proof of concept ‘Oracle supercluster’ of 410 SATA disks, and 14 dual quadcore servers
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik How to Test the Whole stack? • Oracle workload based tests • Testing using production copies and application-like behaviour • Best option but very difficult • Benchmarking • Not the real thing but can save a lot of time • Oracle’s ORION • Tests only the access to storage • Uses async IO as ASM would do • Results from our tests prove it reliable to compare storage solutions
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Testing storage with ORION • ORION • Oracle utility that has proven to give similar results as more complex SQL-based tests • In the following some examples of results • IOPS measured for various disk types • FC results • iSCSI 1 Gbps and 10 GigE results
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Metrics of interest • Basic IO metrics measured by ORION • IOPS for random I/O (8KB) • MBPS for sequential I/O (in chunks of 1 MB) • Latency associated with the IO operations • Simple to use • Getting started:./orion_linux_em64t -run simple -testname mytest -num_disks 2 • More info: https://twiki.cern.ch/twiki/bin/view/PDBService/OrionTests
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik ORION output, an example
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik ORION results,small random read IOPS
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Example of HW testing • 14 servers
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Test hardware • 8 FC switches: 4Gbps (10Gbps uplink)
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Test hardware • 26 storage arrays (16 SATA disks each)
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Results • Measured, sequential I/O • Read: 6 GB/sec • Read-Write: 3+3 GB/sec • Measured, small random IO • Read: 40K IOPS (8 KB read ops) • Note: • 410 SATA disks, 26 HBAS on the storage arrays • Servers: 14 x 4+4Gbps HBAs, 112 cores, 224 GB of RAM
SQL-Based Custom Testing • A custom SQL-based DB workload: • IOPS: Probe randomly a large table (several TBs) via several parallel queries slaves (each reads a single block at a time) • MBPS: Read a large (several TBs) table with parallel query • The test table was 5 TB in size • To reduce storage caching effects • Test table was probed by a small table via nested loop join • V$systat and v$system_event data used to ensure each row of the ‘probe table’ when joined caused a random read on the large test table
SQL-Based vs. ORION • SQL-based vs. ORION • SQL-based uses more CPU • Closer to a ‘real workload’ • Each physical IO in Oracle needs a logical IO too, that is CPU + latch, etc • ORION just stresses async IO of the OS • Recent experience • Tested a config of 371 raptor disks of 150 random IOPS per disk • SQL-based test maxed out CPU (60% usr, 40% sys) at 42K random IOPS • Should have gone up to 55K IOPS
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik iSCSI Storage • iSCSI is interesting for cost reduction • Some trends to get rid of ‘specialized’ FC network • Reuse existing infrastructure and know-how • Not applicable to all systems • IP interconnect throughput • CPU usage • Adoption seems to be only for low throughput systems at the moment • However 10GigE tests are very promising
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik iSCSI 1 Gbps • Orion scalability tests, small IO, FC vs. iSCSI • ./orion_linux_x86-64 -run advanced -write 0 -matrix point -duration 120-testname mytest -simulate raid0 -num_disks 24 -cache_size 500-num_small 1000 -num_large 0 Double the number of actual disks to push them to their limits
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik iSCSI 1 Gbps • Orion scalability tests, large IO, FC vs. iSCSI • ./orion_linux_x86-64 -run advanced -write 0 -matrix point -duration 120-testname mytest2 -simulate raid0 -num_disks 24 -cache_size 500-num_small 0 -num_large 1000
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik iSCSI on 10 GigE • We have recently tested custom iSCSI 10GigE storage • ‘CERN-made’ disk servers that export storage as iSCSI over 10 GigE • 2 Clovertown quad-core processors of 2.00 GHz • 8 GB of RAM • 16 SATA-II drives of 500 GB, 7'200 rpm • RAID controller 3ware 9650SE-16ML • Intel 10GigE dual port server adapter PCIexpress (EXPX9502CX4 - Oplin) • HP Procurve 10GigE switch Data: H. Meinhard, CERN
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik iSCSI tests, 10 GigE • Preliminary results • ORION tests with up to 3 disk arrays (14 disks each) • Almost linear scalability • Up to 42 disks tested -> 4000 IOPS at saturation • 85% CPU idle during test • IOPS of a single disk: ~110 IOPS • Promising interconnect technology, if it can be made rock solid like FC • Performance of iSCSI stacks varies a lot from version to version Data: A. Horvath, CERN
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Disclaimer • Simple testing with ORION is not the same as production workload testing • CPU cycles also important for IOPS • You need to combine different IO patterns • ... see next slides • Testing has many sides • We need reasonable numbers quickly • Results should allow comparison between platforms and HW types • Results needs to be confronted with real life performance
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Storage Capacity Planning • Capacity planning – iterative process, also for ‘stable’ services • Size is not the only metric for storage • Measure and understand current system utilization • Identify bottlenecks (if any) • Prepare forecast for future growth • Plan for HA • Try to plan for the ‘unexpected’ • Service expansion • Plan stability issues • HW failures • Consolidate if possible – better resource utilization
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Storage Capacity Planning • Capacity planning requires historical data • Diagnostic Pack (AWR – DBA_HIST_SYSMETRIC_SUMMARY) • Custom probing (V$SYSMETRIC_SUMMARY) • Choose some important metrics: • Physical Read Total Bytes Per Sec • Physical Read Total IO Requests Per Sec • Physical Write Total Bytes Per Sec • Physical Write Total IO Requests Per Sec • Redo Generated Per Sec • Redo Writes Per Sec • ... • DB size
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Capacity Planning – plots
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Capacity Planning – plots
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Reproducing IO load pattern • Historical data allows to set-up simple IO tests that will resemble production workload • Reproduce different IO sizes (large vs small IO ratio) • Reproduce mixed workload (read/write ratio) • Reproduce cluster-wide IO load (nodes IO ratios) • Single node testing trap - parallel (multi-node) IO tests required to detect some potential SCSI level or storage issues (both performance and stability) • Test for spikes (short term and long term) • Test on storage of similar size • Cache influence, uneven platter performance • Watch for storage cache • Disable, purge, warm-up
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Testing for Robustness • Our experience • Putting in production new HW and OS versions needs extensive testing • Especially true with commodity HW • Lessons learned after few years of running different tests • Mixed read/write (if possible) • Mixed IO size • Simulate workload from several nodes • Periodic testing: high load, low load, high load ... • Proper testing should also detect most of the 'infant mortality’ cases • Beware – disk or storage firmware issues can take long time to debug
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Testing for Robustness • Some issues can still appear only on a running production system • Be proactive – schedule for tests! • ASM normal redundancy – backup validate check logical is not enough! • Corruption can affect secondary extents only! • We have developed a tool that scans all ASM files and compares primary and secondary extents • Set-up media scans on the storage arrays in low activity periods • Test HA features (new kernel version may have bugs)
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Conclusions • Storage is key for successful • DB implementations • Testing different technologies • DBAs can profit by getting a deeper knowledge of their storage subsystem • Typical case: new HW evaluation, pre-prod stress testing • There are simple tools for testing, ORION • Monitoring and measuring production environment • Helps in keeping service level/ production performance • Provides data for capacity planning
Testing Storage Performance for Oracle DBs – L.Canali, D.Wojcik Acknowledments • Many thanks to Physics DB team at CERN, in particular for this work: • Maria Girone, Jacek Wojcieszuk • http://phydb.web.cern.ch/phydb/ • Physics DB@UKOUG later today: • Eva Dafonte Pérez, 13:15, “Worldwide distribution of experimental physics data using Oracle Streams” • Luca Canali, 15:55, “Compressing very large data sets in Oracle”