Storage Performance on SQL Server Joe Chang
Coverage • Emphasis is on Line of Business DB • Different priorities for less critical apps • Performance • Fault-tolerance covered elsewhere
Overview • IO Performance Objectives • The Complete Storage Environment • Direct-Attach and SAN • Storage Components • Disk Performance • SQL Server IO Characteristics • Configuration Examples • SSD
Old Rules • Meet transaction throughput • Disk Performance Criteria • Read from Data, Write to Logs • Separate Data and Log files? • Disk Queue Depth < 2 per disk • Prevalent use of SAN • LUNs with unknown number of disks • Latency (Avg Disk Sec/Read)
Storage Performance Criteria • SELECT (Read) Query • Data must be read into buffer cache if not already in cache – read from data • INSERT/UPDATE/DELETE (Write) Query • Data must be read into buffer cache • Transaction must be written to log • Buffer is marked as dirty, lazy writer handles • Large Query (as necessary) • Write and Read to tempdb
Flashback: 1994 - 2009 • 1994: Pentium 100MHz • 64MB, 4 x 16MB SIMM ($700+ each?) • OS + DB executable ~ 16-24MB • Net: 40MB Buffer cache • Difficult to support transactions • Reports run on 1st of month • Today: 4 x Quad Core • 128GB, 32 x 4GB, $4800 • 3000 X increase in buffer cache
Requirements Then and Now • Old: Support transactions • No longer really an issue for most environments (after proper SQL tuning!) • Today: Minimize disruptions to transactions • Large query or table scan while supporting transactions • Checkpoint – write dirty buffers to data • Transaction Log backup • Backup & Restore
Cost versus Value/Requirements • Money is no object: • With sufficient number of disks, IO channels, proper configuration • It is possible to avoid most disruptions • Otherwise – Manage IO disruptions • Establish tolerable disruptions: 5-30 seconds? • Large reports run off-hours • Configure sufficient performance to handle transient events
Most Common Mistakes • Storage sized to capacity requirements • 2 HBAs (or RAID Controllers) • Too few big-capacity disk drives • Instead: fill system PCI-E slots with controllers • Many small 15K drives (146GB 3.5in or 73GB 2.5in)
Direct Attach
[Diagram: server system with four CPUs, two IO hubs, multiple PCI-E slots, HBAs, and SAS links to disk enclosures]
• System IO capability is distributed across multiple PCI-E slots
• A single controller does not have sufficient IO; a single (or even dual) SAS/FC port does not have sufficient IO
• Distribute IO over multiple PCI-E channels: controllers (SAS or FC), dual-port SAS or FC, Disk Array Enclosures (DAE)
• Do not daisy chain (shared SAS/FC) until all channels are filled!
SAN
[Diagram: server with multiple FC HBAs connected through switches to SAN service processors SP A and SP B]
• A SAN is really a computer system (or systems), typically connected by FC to hosts and storage
• Can be fault-tolerant in all components and paths: HBA, cables, switches, SP, disks
• No special performance enhancements; slight degradation (excessive layers)
• Write cache is mirrored between SPs – really important!
• Distribute load over all front-end and back-end FC ports
Direct Attach & SAN • Direct Attach • RAID Controller in Server • Fault-tolerant disks, • sometimes controller/path, 2-node clusters • SAN • Host Bus Adapter, (switches) • Service Processor • Full component and path fault tolerance • Multi-node clusters
SAN Vendor View
[Diagram: one SAN behind a switch serving DW/BI DB, OLTP DB, Email, Web, SharePoint, and QA DB servers]
• One immensely powerful SAN serving the storage needs of all servers
• Storage consolidation – centralize management and minimize unused space
• Problem is: the SAN is not immensely powerful
• What happens if a LUN for another server fails, and a restore from backup is initiated during busy hours?
Proper View
[Diagram: separate storage systems – DW/BI and OLTP each on dedicated storage; Email, SharePoint, and File Server sharing another]
• Nothing should disrupt the operation of a line-of-business server
• A Data Warehouse should not be mixed with a transaction processing DB
• Consider multiple storage systems for very large IOPS loads instead of a single SAN
Storage Systems
• Direct Attach: HP MSA 60, Dell MD 1000
• DA High Density: HP MSA 50, 70, Dell MD 1120
• SAN Entry: HP MSA 2000, (Dell MD 3000)
• Mid range: EMC CLARiiON, HP EVA, NetApp FAS3100
• Enterprise: EMC DMX, Hitachi, 3PAR, FAS6000
Multi-Core Processors
[Diagram: storage processor architecture – two multi-core CPU modules with memory, x8 CMI link, IO complexes, Fibre Channel and iSCSI modules, power supplies, SPS, and LCC]
• Multi-core processors, increased memory, 64-bit FLARE, up to 960 drives = up to twice the performance and scale
• Adaptive cooling = energy efficiency
• High-performance Flash drives, low-power SATA II drives, Virtual Provisioning, Spin Down = capacity optimization
Cache • If system memory is 128GB, what do you expect to find in a 16GB SAN cache that is not already in the buffer cache? • Performance benchmarks • Most use direct attach storage • With SAN: cache disabled • Alternative: tiny read cache, almost all allocated to write cache
Complete Environment Summary • Server System • Memory Bandwidth • IO bandwidth, ports, PCI-E slots • Pipes/channels from Server to Storage • Storage System • RAID controller, etc • Pipes to disk drives • Disk drives
Storage Components/Interfaces • System IO • Disk Drives • HBA and RAID Controller • SAS (3Gbit/s going to 6), FC (4Gbit/s to 8) • Storage Enclosures (DAE) • Disk Drives • SAN – Systems • SAN – Switches
Server Systems: PCI-E Gen 1 PCI-E Gen 1: 2.5Gbit/s per lane, bi-directional • Dell PowerEdge 2950 – 2 x8, 1 x4 • Dell PowerEdge R900 – 4 x8, 3 x4 (shared) • HP ProLiant DL385G5p – 2 x8, 2 x4 • HP ProLiant DL585G5 – 3 x8, 4 x4 • HP ProLiant DL785G5 – 3 x16, 3 x8, 5 x4 Most PCI-E slots have dedicated bandwidth, some may be shared bandwidth (with expander chip)
Server Systems: PCI-E Gen 2 PCI-E Gen 2: 5.0Gb/s per lane; x4: 2 GB/sec in each direction • Dell PowerEdge R710 – 2 x8, 2 x4 • Dell PowerEdge R910(?) • HP ProLiant DL370G6 – 2 x16, 2 x8, 6 x4 Intel 5520 chipset: 36 PCI-E Gen 2 lanes, 1 ESI (x4); ProLiant ML/DL 370G6 has two 5520 IOH devices
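The per-slot numbers above can be sanity-checked. PCI-E Gen 1 signals at 2.5 Gbit/s per lane and Gen 2 at 5.0 Gbit/s, both with 8b/10b line encoding (80% efficiency); a minimal sketch, ignoring packet/protocol overhead:

```python
# Usable PCI-E bandwidth per direction, assuming 8b/10b encoding
# (Gen 1 and Gen 2 both use 8b/10b; protocol overhead is ignored here).

def pcie_bandwidth_gb_s(gbit_per_lane, lanes):
    """Usable GB/s in each direction for a PCI-E link."""
    encoding_efficiency = 8.0 / 10.0           # 8b/10b line code
    usable_gbit = gbit_per_lane * encoding_efficiency * lanes
    return usable_gbit / 8.0                   # bits -> bytes

gen1_x8 = pcie_bandwidth_gb_s(2.5, 8)   # Gen 1 x8 slot
gen2_x4 = pcie_bandwidth_gb_s(5.0, 4)   # Gen 2 x4 slot
print(gen1_x8, gen2_x4)                 # both land at ~2 GB/s per direction
```

This is why a Gen 2 x4 slot matches a Gen 1 x8 slot, consistent with the "x4: 2 GB/sec in each direction" figure above.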
Disk Drives • Rotational Speed – 7200, 10K, 15K • Average Rotational latency 4, 3, 2 milli-sec • Average Seek Time • 8.5, 4.7, 3.4ms (7200, 10K, 15K RPM) • 2.5 in 15K 2.9 ms avg. seek • Average Random Access Time • Rotational + Seek + Transfer + Overhead • Native Command Queuing
Disk Interfaces • SATA – mostly 7200RPM • SATA disk can be used in a SAS system • SATA adapter cannot connect to a SAS disk • SAS – 15K • 3.5in LFF, 2.5in SFF • Currently 3 Gbit/s, next gen: 6 Gb/s • FC – typically in SAN • 4 Gbit/s, next: 8 Gbit/s
Disk Drives (3.5in, LFF)
Platter diameters: 65mm, 84mm, 95mm
• 7200RPM, 1TB – Barracuda 12: 8.5ms, 125MB/s; Barracuda LP (5900RPM): 95MB/s
• 10,000RPM, 5ms – end of life?
• 15,000RPM, 3.4ms – 146, 300, 450GB, 167MB/sec
Lower RPM drives have higher bit density and larger platters, contributing to very low $/GB. Desktop drives are rated for 2 years @ 20% duty cycle, server drives for 5 years @ 100%.
Seagate Drives
Cheetah 3.5in LFF drives:
• 15K.4: 36/73/146GB, 3.5/4.0ms, 95MB/s(?)
• 15K.5: 73/146/300GB, 3.5/4.0ms, 125-73MB/s
• 15K.6: 146/300/450GB, 3.4/3.9ms, 171-112MB/sec
• 15K.7: 300/450/600GB
Savvio 2.5in SFF drives:
• 15K.1: 36/72GB, 2.9/3.3ms, 112-79MB/sec
• 15K.2: 73/146GB, 2.9/3.3ms, 160-120MB/s
Dell PowerVault • Dell PowerVault MD 1000 – 15 3.5in • $7K for 15 x 146GB 15K drives • Dell PowerVault MD 1120 – 24 2.5in • $11K for 24 x 73GB 15K
HP MSA • MSA 60: 12 LFF drives • MSA 70: 25 SFF drives
Direct Attach Cluster Capable • Dell PowerVault MD 3000 – 15 3.5in • 2 internal dual-port RAID controllers • $11.5K for 15 x 146G 15K drives Listed as Direct Attach, but essentially an entry SAN
PCI-E SAS RAID Controllers • First Generation • PCI-E host interface • PCI-X SAS controller • PCI-E to PCI-X bridge • 800MB/sec • Second Generation • Native PCI-E to SAS • 1.6GB/sec in x8 PCI-E, 2 x4 SAS ports
FC HBA • QLogic QLE2562 • Dual port 8Gb/s FC, x8 PCI-E Gen 2 • QLogic QLE2462 • Dual port 4Gb/s, x4 PCI-E Gen 1 • QLogic QLE2464 • Quad port FC, x8 PCI-E Gen 1 • Emulex LPe12002 • Emulex LPe11002/11004
Random IO Theory – Queue Depth 1

Drive    Rotational latency  Avg seek  8KB transfer  Total (ms)  IOPS
7200     4.17                8.5       0.06          12.7        78.6
10K      3.0                 4.7       0.07          7.77        128.7
15K      2.0                 3.4       0.05          5.45        183.6
15K SFF  2.0                 2.9       0.05          4.95        202

IO rate based on data distributed over the entire disk, accessed at random, one IO command issued at a time. Not accounting for other delays.
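The IOPS column follows directly from the latency components: with one IO outstanding, the rate is simply 1000 ms divided by the per-IO service time. A minimal sketch of that model:

```python
# Queue-depth-1 random IOPS: one IO at a time, so throughput is just
# the reciprocal of the total per-IO service time.

def random_iops(rotational_ms, seek_ms, transfer_ms):
    """IOPS for serial random access (rotational + seek + transfer)."""
    total_ms = rotational_ms + seek_ms + transfer_ms
    return 1000.0 / total_ms

# Latency components (ms) from the table above
drives = {
    "7200":    (4.17, 8.5, 0.06),
    "10K":     (3.0,  4.7, 0.07),
    "15K":     (2.0,  3.4, 0.05),
    "15K SFF": (2.0,  2.9, 0.05),
}
for name, (rot, seek, xfer) in drives.items():
    print(f"{name}: {random_iops(rot, seek, xfer):.1f} IOPS")
```

Running this reproduces the table's IOPS column to within rounding.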
Other Factors • Short Stroke: data is distributed over a fraction of the entire disk, so the average seek time is lower (approaching the track-to-track minimum) • Command Queuing: more than one IO is issued at a time; the disk can reorder individual IO accesses, lowering the access time per IO
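The short-stroke effect can be illustrated with a simple model. This is a hypothetical approximation, not from the slides: assume seek time is the track-to-track minimum plus a term growing with the square root of seek distance, a commonly used rough model for actuator travel:

```python
import math

# Hypothetical short-stroke model (an assumption for illustration):
# seek time ~ track-to-track minimum plus a component that grows with
# the square root of the seek distance. Restricting data to a fraction
# of the disk therefore pulls the average seek toward the minimum.

def avg_seek_ms(min_ms, full_avg_ms, fraction_used):
    """Rough average seek when data occupies `fraction_used` of the disk."""
    return min_ms + (full_avg_ms - min_ms) * math.sqrt(fraction_used)

full_stroke  = avg_seek_ms(0.5, 3.4, 1.0)   # 15K drive, entire disk used
short_stroke = avg_seek_ms(0.5, 3.4, 0.1)   # data on only 10% of the disk
print(full_stroke, short_stroke)
```

Under these assumed numbers, using 10% of the disk cuts the average seek by more than half, which is why short-stroked configurations beat the queue-depth-1 IOPS figures above.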
8K Random IOPS vs Utilization IOPS for range of Queue depth and space utilization
Latency versus Queue Depth Latency versus Queue depth for range of space utilization
Disk Summary • Frequently cited rules for random IO • Apply to Queue Depth 1 • Data spread across entire disk • Key Factors • Short-stroke • High queue depth • SAN • A complex SAN may hide short-stroke and high-queue-depth behavior
SQL Server IO • Transactional queries • Read/Write • Reporting / DW queries • Checkpoints • T-Log backups • Differential/Full backups
Transactional Query • Few rows involved • SELECT xx FROM Table WHERE Col1 = yy • Execution Plan has bookmark lookup or loop joins • IO for data not in buffer cache • 8KB, random • issued 1 at a time, serially (5ms min latency) • (up to around 24-26 rows) • Even if LUN has many disks, IO depth is 1!
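The last point above is worth quantifying: because each lookup is issued one at a time, per-query read throughput is capped by latency alone, no matter how many disks sit behind the LUN. A minimal sketch, assuming the ~5 ms per random 8KB read cited above:

```python
# Serial (queue depth 1) random reads: a single query's throughput is
# bounded by per-IO latency, not by the number of disks in the LUN.

def rows_per_second(latency_ms):
    """Key lookups a single serial IO stream can sustain."""
    return 1000.0 / latency_ms

def query_io_wait_ms(rows_not_in_cache, latency_ms=5.0):
    """Disk wait time for a bookmark-lookup plan on a cold cache."""
    return rows_not_in_cache * latency_ms

print(rows_per_second(5.0))    # ~200 lookups/sec, regardless of spindle count
print(query_io_wait_ms(100))   # 100 cold rows -> ~500 ms of IO wait
```

Adding spindles raises aggregate IOPS across concurrent queries, but a single serial query still waits out each 5 ms round trip.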
Large Query • Plan has bookmark lookup or loop join • Uses Scatter-Gather IO • More than (approximately) 30 rows • Depending on Standard or Enterprise Edition • Multiple IOs issued with one call • Generates high queue depth • A query for 100 rows can run faster than one for 20! High row count non-clustered index seek: are key lookups really random? Build the index with care – only highly selective SARGs in the key.
Tempdb • Large Query may spool intermediate results to tempdb • Sequence of events is: • Read from data • Write to tempdb • Read from tempdb (sometimes) • Repeat • Disk load is not temporally uniform! • Data and tempdb should share a common pool of Disks/LUNs
Checkpoint • Dirty data buffers written to disk • User does not wait on data write • SQL Server should throttle checkpoint writes • But high-queue depth of writes may result in high-latency reads
Log Backup • Disrupts sequential log writes
Update • Problem in SQL Server 2000 • UPDATE uses non-clustered index • Plan does not factor in key lookups • Execution – fetch one row at a time • ~5-10ms per key lookup
General Strategy – Distribute IO • Distribute IO across multiple PCI-E slots • Distribute IO across multiple HBA/Controllers • Distribute IO across many disk drives • Daisy chain DAE only after all channels are filled • High transaction (write) volume: dedicate HBA/controller, SAN SP, disk drives for logs?